LG Jan 18, 2016
Improved Sampling Techniques for Learning an Imbalanced Data Set
Maureen Lyndel C. Lauron, Jaderick P. Pabico
This paper presents the performance of a classifier built using the stackingC algorithm on nine different data sets. Each data set is generated by applying a sampling technique to the original imbalanced data set. Five new sampling techniques are proposed in this paper: SMOTERandRep, Lax Random Oversampling (LRO), Lax Random Undersampling (LRU), Combined-Lax Random Oversampling Undersampling (C-LROU), and Combined-Lax Random Undersampling Oversampling (C-LRUO). These are based on the three sampling techniques usually used as solutions in imbalance learning: Random Undersampling (RU), Random Oversampling (RO), and the Synthetic Minority Oversampling Technique (SMOTE). The metrics used to evaluate the classifier's performance were F-measure and G-mean. F-measure determines the performance of the classifier for every class, while G-mean measures the overall performance of the classifier. The F-measure results showed that, for the data without a sampling technique, the classifier performs well only on the majority class. They also showed that, among the eight sampling techniques, RU and LRU have the worst performance, while other techniques (i.e., RO, C-LRUO, and C-LROU) performed well only on some classes. The best-performing techniques in all data sets were SMOTE, SMOTERandRep, and LRO, whose lowest F-measure values were between 0.5 and 0.65. The G-mean results showed that the technique with the highest G-mean value is LRO (0.86), followed by C-LROU (0.85), SMOTE (0.84), and SMOTERandRep (0.83). Combining the results of the two metrics (F-measure and G-mean), only three sampling techniques are considered good performers: LRO, SMOTE, and SMOTERandRep.
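The two metrics named in the abstract above can be sketched from a confusion matrix. This is a minimal illustration, not the authors' evaluation code: it assumes the F-measure is the balanced F1 score and that G-mean is the geometric mean of the per-class recalls, which is a common definition but is not spelled out in the abstract. The function names and the example matrix are hypothetical.

```python
from math import prod


def per_class_scores(confusion):
    """Return precision, recall, and F-measure for each class.

    confusion[i][j] is the number of class-i samples predicted as class j.
    Assumption: F-measure means the balanced F1 score (beta = 1).
    """
    k = len(confusion)
    scores = []
    for c in range(k):
        tp = confusion[c][c]
        actual = sum(confusion[c])                          # samples truly in class c
        predicted = sum(confusion[r][c] for r in range(k))  # samples labeled as class c
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        scores.append({"precision": precision, "recall": recall, "f_measure": f_measure})
    return scores


def g_mean(confusion):
    """Geometric mean of per-class recalls (assumed definition of G-mean)."""
    recalls = [s["recall"] for s in per_class_scores(confusion)]
    return prod(recalls) ** (1 / len(recalls))


# Hypothetical imbalanced two-class result: 100 majority and 10 minority samples.
example = [[90, 10],
           [5, 5]]
minority = per_class_scores(example)[1]
overall = g_mean(example)
```

With this made-up matrix the majority class looks fine on its own, but the minority recall of 0.5 pulls the G-mean down, which is the kind of effect the abstract describes for the unsampled data.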
CL Aug 6, 2015
On Gobbledygook and Mood of the Philippine Senate: An Exploratory Study on the Readability and Sentiment of Selected Philippine Senators' Microposts
Fatima M. Moncada, Jaderick P. Pabico
This paper presents the findings of a readability assessment and sentiment analysis of the microposts of six selected Philippine senators on the popular microblogging platform Twitter. Using the Simple Measure of Gobbledygook (SMOG), tweets of Senators Cayetano, Defensor-Santiago, Pangilinan, Marcos, Guingona, and Escudero were assessed. A sentiment analysis was also done to determine the polarity of the senators' respective microposts. Results showed that, on average, the six senators tweet at a SMOG level of eight to ten. This means that at least a sixth grader will be able to understand the senators' tweets. Moreover, their tweets are mostly neutral, and their sentiments vary in unison over some periods of time. This could mean that a senator's tweet sentiment is affected by specific Philippine-based events.
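For reference, the SMOG grade mentioned above can be computed from two counts. The sketch below uses McLaughlin's published SMOG formula; how the study counted sentences and polysyllabic words in tweets is not stated in the abstract, so the function name and inputs here are assumptions for illustration only.

```python
from math import sqrt


def smog_grade(polysyllable_count, sentence_count):
    """SMOG grade from the number of words with three or more syllables
    and the number of sentences in the sample.

    The count is normalized to a 30-sentence sample, as in the standard formula.
    """
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 1.0430 * sqrt(polysyllable_count * 30 / sentence_count) + 3.1291


# Hypothetical sample: 30 sentences containing 30 polysyllabic words.
grade = smog_grade(30, 30)
```

A sample with one polysyllabic word per sentence lands between grades 8 and 9, inside the eight-to-ten range reported for the senators.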
NE Aug 1, 2015
The Interactive Effects of Operators and Parameters to GA Performance Under Different Problem Sizes
Jaderick P. Pabico, Elizer A. Albacea
The complex effect of a genetic algorithm's (GA) operators and parameters on its performance has been studied extensively by researchers in the past, but none studied their interactive effects while the GA is under different problem sizes. In this paper, we present the use of an experimental model (1) to investigate whether the genetic operators and their parameters interact to affect the offline performance of the GA, (2) to find what combination of genetic operators and parameter settings will provide the optimum performance for the GA, and (3) to investigate whether this operator-parameter combination is dependent on the problem size. We designed a GA to optimize a family of traveling salesman problems (TSP), with their optimal solutions known for convenient benchmarking. Our GA was set to use different algorithms in simulating selection ($\Omega_s$), different algorithms ($\Omega_c$) and parameters ($p_c$) in simulating crossover, and different parameters ($p_m$) in simulating mutation. We used several $n$-city TSPs ($n=\{5, 7, 10, 100, 1000\}$) to represent the different problem sizes (i.e., size of the resulting search space as represented by GA schemata). Using analysis of variance of 3-factor factorial experiments, we found that GA performance is affected by $\Omega_s$ at small problem size (5-city TSP), where the algorithm Partially Matched Crossover significantly outperforms Cycle Crossover at the $95\%$ confidence level.
CV Jul 26, 2015
Capturing the Dynamics of Pedestrian Traffic Using a Machine Vision System
Louie Vincent A. Ngoho, Jaderick P. Pabico
We developed a machine vision system to automatically capture the dynamics of pedestrians under four different traffic scenarios. By considering the overhead view of each pedestrian as a digital object, the system processes the image sequences to track the pedestrians. Considering the perspective effect of the camera lens and the projected area of the hallway at the top-view scene, the distance of each tracked object from its original position to its current position is approximated every video frame. Using the approximated distance and the video frame rate (30 frames per second), the respective velocity and acceleration of each tracked object are later derived. The quantified motion characteristics of the pedestrians are displayed by the system through 2-dimensional graphs of the kinematics of motion. The system also outputs video images of the pedestrians with superimposed markers for tracking. These visual markers were used to visually describe and quantify the behavior of the pedestrians under different traffic scenarios.
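The step from per-frame distances to velocity and acceleration described above can be illustrated with finite differences. This is a minimal sketch under stated assumptions, not the system's implementation: it assumes distances are already corrected for perspective and expressed in meters, and the function name and sample values are hypothetical. Only the 30 frames-per-second rate comes from the abstract.

```python
FPS = 30.0  # video frame rate stated in the abstract


def kinematics(distances, fps=FPS):
    """Derive velocity and acceleration from per-frame distances.

    distances[i] is the tracked object's distance (assumed in meters) from
    its original position at frame i. Forward differences are used, so the
    velocity list is one element shorter than the input, and the
    acceleration list is one element shorter again.
    """
    dt = 1.0 / fps  # seconds between consecutive frames
    velocity = [(b - a) / dt for a, b in zip(distances, distances[1:])]
    acceleration = [(b - a) / dt for a, b in zip(velocity, velocity[1:])]
    return velocity, acceleration


# Hypothetical track: steady walking for two frames, then a speed-up.
v, a = kinematics([0.0, 0.05, 0.10, 0.20])
```

Differencing amplifies tracking noise, so a real pipeline would likely smooth the distances first; that choice is outside what the abstract specifies.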
NE Jul 26, 2015
A Neural Prototype for a Virtual Chemical Spectrophotometer
Jaderick P. Pabico, Jose Rene L. Micor, Elmer Rico E. Mojica
A virtual chemical spectrophotometer for the simultaneous analysis of nickel (Ni) and cobalt (Co) was developed based on an artificial neural network (ANN). The developed ANN predicts the respective concentrations of Co and Ni given the absorbance profile of a Co-Ni mixture, based on Beer's Law. The virtual chemical spectrophotometer was trained using a 3-layer jump connection neural network model (NNM) with 126 input nodes corresponding to the 126 absorbance readings from 350 nm to 600 nm, 70 nodes in the hidden layer using a logistic activation function, and 2 nodes in the output layer with a logistic function. Test results show that the NNM has correlation coefficients of 0.9953 and 0.9922 when predicting [Co] and [Ni], respectively. We observed, however, that the NNM has a duality property and that there exists a real-world practical application in solving the dual problem: predict the Co-Ni mixture's absorbance profile given [Co] and [Ni]. It turns out that the dual problem is much harder to solve because the intended output has a much bigger cardinality than that of the input. Thus, we trained the dual ANN, a 3-layer jump connection net with 2 input nodes corresponding to [Co] and [Ni], 70 logistic-activated nodes in the hidden layer, and 126 output nodes corresponding to the 126 absorbance readings from 250 nm to 600 nm. Test results show that the dual NNM has correlation coefficients that range from 0.9050 through 0.9980 at 356 nm through 578 nm, with the maximum coefficient observed at 480 nm. This means that the dual ANN can be used to predict the absorbance profile given the respective Co-Ni concentrations, which can be of importance in creating academic models for a virtual chemical spectrophotometer.
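As background for the Beer's Law relationship that the network above learns from, the absorbance of a two-component mixture is usually modeled as additive at each wavelength. The symbols $\varepsilon$ (molar absorptivity) and $\ell$ (path length) are standard notation and are assumptions here; the abstract does not define them.

```latex
A(\lambda) \;=\; \ell \left( \varepsilon_{\mathrm{Co}}(\lambda)\,[\mathrm{Co}]
                           + \varepsilon_{\mathrm{Ni}}(\lambda)\,[\mathrm{Ni}] \right)
```

Under this idealized model, the forward problem maps 126 values of $A(\lambda)$ to the two unknowns $[\mathrm{Co}]$ and $[\mathrm{Ni}]$, while the dual problem maps those two concentrations back to the 126 absorbance values.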
CY Jul 22, 2015
Towards Input Device Satisfaction Through Hand Anthropometry
Katrina Joy H. Magno, Jaderick P. Pabico
We collected the hand anthropometric data of 91 respondents to come up with a Filipino-based measurement for determining the suitability of an input device for digital equipment, namely the standard PC keyboard. For correlation purposes, we also collected other relevant information such as age, height, province of origin, and gender, among others. We computed the percentiles for each finger to classify various finger dimensions and identify length-specific anthropometric cut-points. We compared the percentiles of each finger dimension against the actual length of the longest key combinations when correct finger placement is used for typing, to determine whether the standard PC keyboard is fit for use by our sampled population. Our analysis shows that the members of the population with hand dimensions at extended position below the 75th percentile and at the 99th percentile are the ones who would most likely not reach the longest key combination for the left and the right hands, respectively. Using machine vision and image processing techniques, we automated the anthropometric process and compared its measurements with those of the manual process; we found that the two have a very minimal absolute difference. The data collected from this study could be used in other studies, such as determining a good design for mobile and other handheld devices, or for input devices other than the keyboard. The automated method that we developed could be used to easily measure hand dimensions given a digital image of the hand and could be extended to measure the entire human body for various other applications.
SD Jul 20, 2015
Automatic Identification of Animal Breeds and Species Using Bioacoustics and Artificial Neural Networks
Jaderick P. Pabico, Anne Muriel V. Gonzales, Mariann Jocel S. Villanueva et al.
In this research endeavor, it was hypothesized that the sounds produced by animals during their vocalizations can be used as identifiers of the animal breed or species even if they sound the same to the unaided human ear. To test this hypothesis, three artificial neural networks (ANNs) were developed using bioacoustics properties as inputs for the respective automatic identification of 13 bird species, eight dog breeds, and 11 frog species. Recorded vocalizations of these animals were collected and processed using several known signal processing techniques to convert the respective sounds into computable bioacoustics values. The converted values of the vocalizations, together with the breed or species identifications, were used to train the ANNs following a ten-fold cross-validation technique. Tests show that the respective ANNs can correctly identify 71.43% of the birds, 94.44% of the dogs, and 90.91% of the frogs. These results show that bioacoustics and ANNs can be used to automatically determine animal breeds and species, and together could be a promising automated tool for animal identification, biodiversity determination, animal conservation, and other animal welfare efforts.
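The ten-fold cross-validation mentioned above partitions the samples into ten folds and holds each fold out once for testing. The sketch below shows only the index bookkeeping; the function name, the seed, and the sample count are hypothetical, and the abstract does not say whether the authors shuffled or stratified their folds.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt into k folds
    round-robin so that fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 25 recorded vocalizations.
splits = list(k_fold_indices(25))
```

Each sample appears in exactly one test fold, so every recording is used for evaluation once and for training nine times.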
CV Jul 9, 2015
Neural Network Classifiers for Natural Food Products
Jaderick P. Pabico, Alona V. De Grano, Alan L. Zarsuela
Two cheap, off-the-shelf machine vision systems (MVS), each using an artificial neural network (ANN) as classifier, were developed, improved, and evaluated to automate the classification of tomato ripeness and acceptability of eggs, respectively. Six thousand color images of human-graded tomatoes and 750 images of human-graded eggs were used to train, test, and validate several multi-layered ANNs. The ANNs output the corresponding grade of the produce by accepting as inputs the spectral patterns of the background-less image. In both MVS, the ANN with the highest validation rate was automatically chosen by a heuristic and its performance compared to that of the human graders. Using the validation set, the MVS correctly graded 97.00% and 86.00% of the tomato and egg data, respectively. The human graders, however, were measured to perform at a daily average of 92.65% and 72.67% for tomato and egg grading, respectively. These results show that an ANN-based MVS is a potential alternative to manual grading.
ET Jun 30, 2015
Artificial Catalytic Reactions in 2D for Combinatorial Optimization
Jaderick P. Pabico
Presented in this paper is a derivation of a 2D catalytic reaction-based model to solve combinatorial optimization problems (COPs). The simulated catalytic reactions, a computational metaphor, occur in an artificial chemical reactor that finds near-optimal solutions to COPs. The artificial environment is governed by catalytic reactions that can alter the structure of artificial molecular elements. Altering the molecular structure means finding new solutions to the COP. The molecular mass of the elements was considered as a measure of goodness of fit of the solutions. Several data structures and matrices were used to record the directions and locations of the molecules. These gave the model its 2D topology. The Traveling Salesperson Problem (TSP) was used as a working example. The performance of the model in finding a solution for the TSP was compared to the performance of a topology-less model. Experimental results show that the 2D model performs better than the topology-less one.
ET Jun 28, 2015
Simultaneously Solving Computational Problems Using an Artificial Chemical Reactor
Jaderick P. Pabico
This paper centers on using chemical reactions as a computational metaphor for simultaneously solving problems. An artificial chemical reactor that can simultaneously solve instances of three unrelated problems was created. The reactor is a distributed stochastic algorithm that simulates a chemical universe wherein the molecular species are represented by a human genomic contig panel, a Hamiltonian cycle, or an aircraft landing schedule. The chemical universe is governed by reactions that can alter genomic sequences, re-order Hamiltonian cycles, or reschedule an aircraft landing program. Molecular masses were considered as measures of goodness of solutions, and represented radiation hybrid (RH) vector similarities, costs of Hamiltonian cycles, and penalty costs for landing an aircraft before and after target landing times. This method, tested by solving in tandem with deterministic algorithms, has been shown to find quality solutions in finding the minimum RH vector similarities of genomic data, minimum costs in Hamiltonian cycles of the traveling salesman, and minimum costs for landing aircraft before or after target landing times.
CV Jun 24, 2015
Unshredding of Shredded Documents: Computational Framework and Implementation
Lei Kristoffer R. Lactuan, Jaderick P. Pabico
A shredded document $D$ is a document whose pages have been cut into strips for the purpose of destroying private, confidential, or sensitive information $I$ contained in $D$. Shredding has become a standard means for government organizations, businesses, and private individuals to destroy archival records that have been officially classified for disposal. It can also be used to destroy documentary evidence of wrongdoing by entities who are trying to hide $I$. In this paper, we present an optimal $O((n\times m)^2)$ algorithm $A$ that reconstructs an $n$-page $D$, where each page $p$ is shredded into $m$ strips. We also present the efficacy of $A$ in reconstructing three document types: hand-written, machine-typeset, and images.
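To give a feel for the reconstruction problem above, the sketch below orders strips by comparing the pixel columns along their cut edges. It is not the paper's algorithm $A$, whose details are not in the abstract: the edge-difference cost, the greedy chaining, the strip representation, and the `first` parameter (the index of the known leftmost strip) are all assumptions made for illustration. The pairwise cost table alone compares every strip against every other, which is quadratic in the number of strips.

```python
def edge_cost(right_edge, left_edge):
    """Sum of absolute pixel differences between two touching strip edges."""
    return sum(abs(a - b) for a, b in zip(right_edge, left_edge))


def pairwise_costs(strips):
    """cost[i][j] = cost of placing strip j immediately to the right of strip i."""
    n = len(strips)
    return [
        [
            edge_cost(strips[i]["right"], strips[j]["left"]) if i != j else float("inf")
            for j in range(n)
        ]
        for i in range(n)
    ]


def greedy_order(strips, first):
    """Chain strips left to right, always taking the cheapest unused neighbor.

    `first` is the index of the leftmost strip, assumed to be known.
    """
    cost = pairwise_costs(strips)
    order = [first]
    remaining = set(range(len(strips))) - {first}
    while remaining:
        nxt = min(remaining, key=lambda j: cost[order[-1]][j])
        order.append(nxt)
        remaining.remove(nxt)
    return order


# Hypothetical 3-pixel-tall strips given in shuffled order; each strip stores
# only its leftmost and rightmost pixel columns as grayscale values.
strips = [
    {"left": [50, 60, 70], "right": [255, 255, 255]},  # true position 3
    {"left": [0, 0, 0], "right": [10, 20, 30]},        # true position 1
    {"left": [10, 20, 30], "right": [50, 60, 70]},     # true position 2
]
order = greedy_order(strips, first=1)
```

Greedy chaining can be misled when many edges look alike, for example in wide white margins, so this shows only the shape of the problem and should not be read as a method with any optimality guarantee.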