M. Teixeira

2 Papers

5.0 · LG · May 14
Proposal and study of statistical features for string similarity computation and classification

E. O. Rodrigues, D. Casanova, M. Teixeira et al.

Adaptations of two features commonly applied in visual computing, the co-occurrence matrix (COM) and the run-length matrix (RLM), are proposed for computing the similarity of strings in general (words, phrases, codes and texts). The proposed features are not sensitive to language-related information: they are purely statistical and can be used in any context, with any language or grammatical structure. Other statistical measures commonly employed in the field, such as longest common subsequence, maximal consecutive longest common subsequence, mutual information and edit distances, are evaluated and compared. In the first, synthetic set of experiments, the COM and RLM features outperform the remaining state-of-the-art statistical features. In 3 out of 4 cases, the RLM and COM features were significantly better, in the statistical sense, than the second-best group, which is based on distances (P-value < 0.001). On a real text plagiarism dataset, the RLM features obtained the best results.
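As a rough illustration of the idea of carrying co-occurrence and run-length statistics over from images to strings, the sketch below counts character pairs at a fixed offset and runs of repeated characters. It is only a minimal sketch under stated assumptions: the function names, the `offset` parameter and the cosine comparison are hypothetical choices made for this example, and the abstract does not specify the features actually derived from the matrices.

```python
from collections import Counter
from itertools import groupby
import math


def cooccurrence(s, offset=1):
    """Count pairs (s[i], s[i + offset]); a sparse co-occurrence matrix."""
    return Counter(zip(s, s[offset:]))


def run_lengths(s):
    """Count (character, run length) pairs; a sparse run-length matrix."""
    return Counter((ch, len(list(group))) for ch, group in groupby(s))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors.

    Assumption: cosine is used here only as a simple way to compare
    two count tables; it is not taken from the paper.
    """
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    x, y = "the quick brown fox", "the quick brown dog"
    print(cosine(cooccurrence(x), cooccurrence(y)))
    print(cosine(run_lengths(x), run_lengths(y)))
```

Both tables depend only on character statistics, which is consistent with the claim that such features need no language-specific information.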

SP · Oct 14, 2019
Simplification of the digital representation of the tent map through biased fixed point

M. Teixeira, N. P. Basilio, D. L. Firmo et al.

Chaotic systems have been investigated in several areas of engineering. In control theory, such systems have prompted the emergence of new techniques and have also been used as a source of noise. Chaotic systems have also been widely employed as pseudo-random number generators in cryptography. One of the central aspects of these applications in high-performance situations, such as those involving a large amount of data (Big Data), is the response of these systems in a short period of time. Despite the great advances in the design of chaotic systems in analog circuits, less attention has been paid to the optimized design of these systems in the digital domain. In this work, the polarized (biased) fixed-point representation is applied to reduce the number of digital elements. Using this approach, it was possible to significantly reduce the number of logic gates in the subtraction operation. Compared with other works in the literature, the number of elements per bit of the digital representation of the tent map was reduced by 50%. Chaoticity was evidenced by calculating the Lyapunov exponent. Histogram, entropy and autocorrelation tests were used to evaluate the randomness of the represented system, with satisfactory results.
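For readers unfamiliar with the tent map, the following minimal sketch iterates it in ordinary floating point and estimates the Lyapunov exponent as the orbit average of log |f'(x)|. It does not reproduce the biased fixed-point hardware representation described in the abstract; the function names, the parameter value `mu = 1.99` and the initial condition are assumptions chosen for this example. A value slightly below 2 is used because, with `mu = 2`, binary floating-point orbits tend to collapse to zero after a few dozen iterations.

```python
import math


def tent_map(x, mu=1.99):
    """One step of the tent map on [0, 1]."""
    return mu * x if x < 0.5 else mu * (1.0 - x)


def tent_slope(x, mu=1.99):
    """Derivative of the tent map: +mu on the left branch, -mu on the right."""
    return mu if x < 0.5 else -mu


def lyapunov_exponent(x0=0.1234, mu=1.99, n_iter=10000, n_discard=100):
    """Estimate the Lyapunov exponent as the mean of log|f'(x)| along an orbit."""
    x = x0
    for _ in range(n_discard):  # discard the initial transient
        x = tent_map(x, mu)
    total = 0.0
    for _ in range(n_iter):
        total += math.log(abs(tent_slope(x, mu)))
        x = tent_map(x, mu)
    return total / n_iter


if __name__ == "__main__":
    print(lyapunov_exponent())  # positive, which indicates chaotic behaviour
```

Because the slope has magnitude `mu` on both branches, the estimate equals ln(mu), which is positive whenever `mu > 1`.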