Mohammad Bajammal

1.4 CV Dec 11, 2021

Page Segmentation using Visual Adjacency Analysis

Mohammad Bajammal, Ali Mesbah

Page segmentation is a web page analysis process that divides a page into cohesive segments, such as sidebars, headers, and footers. Current page segmentation approaches use the DOM, textual content, or rendering style information of the page. However, these approaches have a number of drawbacks, such as a large number of parameters and rigid assumptions about the page, which negatively impact their segmentation accuracy. We propose a novel page segmentation approach based on visual analysis of localized adjacency regions. It combines DOM attributes and visual analysis to build features of a given page and guide an unsupervised clustering. We evaluate our approach on 35 real-world web pages, and examine the effectiveness and efficiency of segmentation. The results show that, compared with the state of the art, our approach achieves an average 156% increase in precision and a 249% improvement in F-measure.

3.7 HC Nov 23, 2021

Style-Guided Web Application Exploration

Davood Mazinanian, Mohammad Bajammal, Ali Mesbah

A wide range of analysis and testing techniques targeting modern web apps rely on the automated exploration of their state space by firing events that mimic user interactions. However, finding out which elements are actionable in web apps is not a trivial task. To improve the efficacy of exploring the event space of web apps, we propose a browser-independent, instrumentation-free approach based on structural and visual stylistic cues. Our approach, implemented in a tool called StyleX, employs machine learning models, trained on 700,000 web elements from 1,000 real-world websites, to predict actionable elements on a webpage a priori. In addition, our approach uses stylistic cues for ranking these actionable elements while exploring the app. Our actionable predictor models achieve 90.14% precision and 87.76% recall when considering the click event listener, and on average, 75.42% precision and 77.76% recall when considering the five most-frequent event types. Our evaluations show that StyleX can improve the JavaScript code coverage achieved by a general-purpose crawler by up to 23%.

2 Papers