PL · Sep 15, 2017 · Code
Erlang Code Evolution Control
David Insa, Sergio Pérez, Josep Silva et al.
During the software lifecycle, a program can evolve several times for different reasons, such as the optimisation of a bottleneck or the refactoring of an obscure function. These code changes often involve several functions or modules, so it can be difficult to know whether the correct behaviour of previous releases has been preserved in the new release. Most developers rely on a previously defined test suite to check this behaviour preservation. We propose here an alternative approach that automatically obtains a test suite specifically focused on comparing the old and new versions of the code. Our test case generation is driven by a combination of several existing tools, such as TypEr, CutEr, and PropEr, together with other ideas, such as allowing the programmer to choose an expression of interest whose behaviour must be preserved, and recording the sequences of values to which this expression is evaluated. All the presented work has been implemented in an open-source tool that is publicly available on GitHub.
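The core idea of comparing two versions through the sequence of values taken by an expression of interest can be illustrated outside Erlang. The Python sketch below is not SecEr's implementation or API. The functions `old_version`, `new_version`, `record`, `compare`, and `gen_list` are hypothetical, the `trace` callback stands in for the chosen expression of interest, and inputs come from a plain seeded random generator rather than from TypEr, CutEr, or PropEr.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical old and new versions of a function. The `trace` callback
# plays the role of the expression of interest: each value it evaluates
# to is reported, in order.
def old_version(xs: List[int], trace: Callable[[int], None]) -> int:
    total = 0
    for x in xs:
        total += x * x
        trace(total)  # point of interest: running sum of squares
    return total

def new_version(xs: List[int], trace: Callable[[int], None]) -> int:
    total = 0
    for x in xs:
        total += x ** 2  # refactored; should behave identically
        trace(total)
    return total

def record(fn, args) -> List[int]:
    """Run fn on args and return the sequence of traced values."""
    values: List[int] = []
    fn(args, values.append)
    return values

def compare(old, new, gen, n_tests: int = 100, seed: int = 0) -> Optional[Tuple]:
    """Return (input, old trace, new trace) for the first input whose traces
    differ, or None if every generated input produces identical traces."""
    rng = random.Random(seed)
    for _ in range(n_tests):
        args = gen(rng)
        old_trace, new_trace = record(old, args), record(new, args)
        if old_trace != new_trace:
            return args, old_trace, new_trace
    return None

def gen_list(rng: random.Random) -> List[int]:
    """Generate a short random list of small integers."""
    return [rng.randint(-10, 10) for _ in range(rng.randint(0, 8))]

print(compare(old_version, new_version, gen_list))  # None
```

Comparing whole traces rather than only return values means that a discrepancy at the point of interest is reported even when the final results happen to coincide.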
PL · Feb 12, 2018
Erlang Code Evolution Control (Use Cases)
David Insa, Sergio Pérez, Josep Silva et al.
The main goal of this work is to show how SecEr can be used in different scenarios. Concretely, we demonstrate how a user can run SecEr to obtain reports about behaviour preservation between versions, and how SecEr can be used to find the source of a discrepancy. Three use cases are presented: two completely different versions of the same program, a performance improvement to a function, and a program into which an error has been introduced. A complete description of the technique and the tool is available at [1] and [2].
IR · Jan 9, 2015
Web Template Extraction Based on Hyperlink Analysis
Julián Alarte, David Insa, Josep Silva et al.
Web templates are one of the main development resources for website engineers. Templates allow them to increase productivity by plugging content into already formatted and prepared pagelets. Templates are also useful for end users, because they provide uniformity and a common look and feel across all webpages. However, from the point of view of crawlers and indexers, templates are an important problem, because they usually contain irrelevant information such as advertisements, menus, and banners. Processing and storing this information is likely to waste resources (storage space, bandwidth, etc.). Templates have been measured to represent between 40% and 50% of the data on the Web. Therefore, identifying templates is essential for indexing tasks. In this work we propose a novel method for automatic template extraction that is based on a similarity analysis between the DOM trees of a collection of webpages, which are detected using menu information. Our implementation and experiments demonstrate the usefulness of the technique.
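As a rough illustration of extracting a template from the DOM trees of several pages, the Python sketch below keeps only the nodes that occur with the same tag at the same position in every tree. This positional, top-down intersection is a simplifying assumption of ours, not the similarity analysis used in the paper, and `Node`, `common_template`, and `size` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)

def common_template(trees: List[Node]) -> Optional[Node]:
    """Keep the nodes that have the same tag at the same position in every
    tree (a simplified top-down intersection of the DOM trees)."""
    first = trees[0]
    if any(t.tag != first.tag for t in trees):
        return None  # the pages disagree here, so nothing below is shared
    kept: List[Node] = []
    # Compare children position by position, up to the shortest child list.
    for i in range(min(len(t.children) for t in trees)):
        child = common_template([t.children[i] for t in trees])
        if child is not None:
            kept.append(child)
    return Node(first.tag, kept)

def size(node: Node) -> int:
    """Number of nodes in a tree."""
    return 1 + sum(size(c) for c in node.children)

page_a = Node("html", [Node("body", [
    Node("nav", [Node("a"), Node("a")]),
    Node("div", [Node("p")]),  # page-specific content
    Node("footer"),
])])
page_b = Node("html", [Node("body", [
    Node("nav", [Node("a"), Node("a")]),
    Node("table"),  # different page-specific content
    Node("footer"),
])])
template = common_template([page_a, page_b])
print(size(template))  # 6: html, body, nav, a, a, footer
```

Positional matching breaks down when page-specific blocks shift the position of template nodes, which is one reason a real technique needs a more tolerant tree mapping.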
IR · Sep 9, 2014
Automatic Detection of Webpages that Share the Same Web Template
Julián Alarte, David Insa, Josep Silva et al.
Template extraction is the process of isolating the template of a given webpage. It is widely used in several disciplines, including webpage development, content extraction, block detection, and webpage indexing. One of the main goals of template extraction is to identify a set of webpages with the same template without having to load and analyze too many webpages beforehand. This work introduces a new technique to automatically discover a reduced set of webpages in a website that implement the template. This set is computed with a hyperlink analysis that yields a very small set with a high level of confidence.
IR · Oct 23, 2012
Using the DOM Tree for Content Extraction
Sergio López, Josep Silva, David Insa
The main information of a webpage is usually mixed with menus, advertisements, panels, and other not necessarily related information, and it is often difficult to isolate this information automatically. This is precisely the objective of content extraction, a research area of wide interest due to its many applications. Content extraction is useful not only for the end user, but it is also frequently used as a preprocessing stage of different systems that need to extract the main content of a web document to avoid processing other useless information. Another interesting application of content extraction is displaying webpages on small screens such as mobile phones or PDAs. In this work we present a new technique for content extraction that uses the DOM tree of the webpage to analyze the hierarchical relations of its elements. Thanks to this information, the technique achieves considerable recall and precision. Using the DOM structure for content extraction gives us the benefits of other approaches based on the syntax of the webpage (such as characters, words, and tags), but it also gives us very precise information about the related components in a block, thus producing very cohesive blocks.
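To make the idea of using the DOM hierarchy for content extraction concrete, the Python sketch below builds a minimal tree with the standard `html.parser` module and, as an illustrative assumption, scores each element by the ratio of text characters to elements in its subtree. Among the elements whose subtree holds at least a given share of the page's text (the hypothetical `min_share` parameter), it returns the one with the highest ratio, so the menu and footer fall outside the selected block. `Element`, `TreeBuilder`, `collect`, and `main_block` are hypothetical names, and the tree builder ignores HTML subtleties such as void elements.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

class Element:
    """A minimal DOM-like node: tag, parent, children, and direct text."""
    def __init__(self, tag: str, parent: Optional["Element"] = None):
        self.tag = tag
        self.parent = parent
        self.children: List["Element"] = []
        self.text = ""  # stripped text appearing directly inside this element

class TreeBuilder(HTMLParser):
    """Build an Element tree; void elements (e.g. <br>) are not special-cased."""
    def __init__(self) -> None:
        super().__init__()
        self.root = Element("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Element(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag, then leave it.
        node: Optional[Element] = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data.strip()

def collect(node: Element, rows: List[Tuple[Element, int, int]]) -> Tuple[int, int]:
    """Return (characters, elements) for node's subtree and record every node."""
    chars, count = len(node.text), 1
    for child in node.children:
        c, k = collect(child, rows)
        chars += c
        count += k
    rows.append((node, chars, count))
    return chars, count

def main_block(html: str, min_share: float = 0.5) -> Element:
    """Pick the element with the highest characters-per-element ratio among
    those whose subtree holds at least `min_share` of the page's text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    rows: List[Tuple[Element, int, int]] = []
    total, _ = collect(builder.root, rows)
    # The root always qualifies, so the candidate list is never empty.
    candidates = [r for r in rows if r[1] >= min_share * total]
    return max(candidates, key=lambda r: r[1] / r[2])[0]

page = (
    "<html><body>"
    "<ul><li><a>Home</a></li><li><a>News</a></li><li><a>Contact</a></li></ul>"
    "<div><h1>Title</h1><p>This is the long main article text of the page.</p>"
    "<p>It continues with a second paragraph of content.</p></div>"
    "<footer>Copyright</footer>"
    "</body></html>"
)
print(main_block(page).tag)  # div
```

Selecting a whole subtree, rather than individual text nodes, is what keeps the heading and both paragraphs together as one cohesive block in this example.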