Valmir C. Barbosa

IR
3papers
14citations
Novelty32%
AI Score18

3 Papers

LGMay 17, 2022
Shape complexity in cluster analysis

Eduardo J. Aguilar, Valmir C. Barbosa

In cluster analysis, a common first step is to scale the data aiming to better partition them into clusters. Even though many different techniques have throughout many years been introduced to this end, it is probably fair to say that the workhorse in this preprocessing phase has been to divide the data by the standard deviation along each dimension. Like division by the standard deviation, the great majority of scaling techniques can be said to have roots in some sort of statistical take on the data. Here we explore the use of multidimensional shapes of data, aiming to obtain scaling factors for use prior to clustering by some method, like k-means, that makes explicit use of distances between samples. We borrow from the field of cosmology and related areas the recently introduced notion of shape complexity, which in the variant we use is a relatively simple, data-dependent nonlinear function that we show can be used to help with the determination of appropriate scaling factors. Focusing on what might be called "midrange" distances, we formulate a constrained nonlinear programming problem and use it to produce candidate scaling-factor sets that can be sifted on the basis of further considerations of the data, say via expert knowledge. We give results on some iconic data sets, highlighting the strengths and potential weaknesses of the new approach. These results are generally positive across all the data sets used.

IRNov 11, 2017
A distributed system for SearchOnMath based on the Microsoft BizSpark program

Ricardo M. Oliveira, Flavio B. Gonzaga, Valmir C. Barbosa et al.

Mathematical information retrieval is a relatively new area, so the first search tools capable of retrieving mathematical formulas began to appear only a few years ago. The proposals made public so far mostly implement searches on internal university databases, small sets of scientific papers, or Wikipedia in English. As such, only modest computing power is required. In this context, SearchOnMath has emerged as a pioneering tool in that it indexes several different databases and is compatible with several mathematical representation languages. Given the significantly greater number of formulas it handles, a distributed system becomes necessary to support it. The present study is based on the Microsoft BizSpark program and has aimed, for 38 different distributed-system scenarios, to pinpoint the one affording the best response times when searching the SearchOnMath databases for a collection of 120 formulas.

CGApr 27, 2012
The conduciveness of CA-rule graphs

Valmir C. Barbosa

Given two subsets A and B of nodes in a directed graph, the conduciveness of the graph from A to B is the ratio representing how many of the edges outgoing from nodes in A are incoming to nodes in B. When the graph's nodes stand for the possible solutions to certain problems of combinatorial optimization, choosing its edges appropriately has been shown to lead to conduciveness properties that provide useful insight into the performance of algorithms to solve those problems. Here we study the conduciveness of CA-rule graphs, that is, graphs whose node set is the set of all CA rules given a cell's number of possible states and neighborhood size. We consider several different edge sets interconnecting these nodes, both deterministic and random ones, and derive analytical expressions for the resulting graph's conduciveness toward rules having a fixed number of non-quiescent entries. We demonstrate that one of the random edge sets, characterized by allowing nodes to be sparsely interconnected across any Hamming distance between the corresponding rules, has the potential of providing reasonable conduciveness toward the desired rules. We conjecture that this may lie at the bottom of the best strategies known to date for discovering complex rules to solve specific problems, all of an evolutionary nature.