LGOct 2, 2025
C2AL: Cohort-Contrastive Auxiliary Learning for Large-scale Recommendation SystemsMertcan Cokbas, Ziteng Liu, Zeyi Tao et al.
Training large-scale recommendation models under a single global objective implicitly assumes homogeneity across user populations. However, real-world data are composites of heterogeneous cohorts with distinct conditional distributions. As models increase in scale and complexity and as more data is used for training, they become dominated by central distribution patterns, neglecting head and tail regions. This imbalance limits the model's learning ability and can result in inactive attention weights or dead neurons. In this paper, we reveal how the attention mechanism can play a key role in factorization machines for shared embedding selection, and propose to address this challenge by analyzing the substructures in the dataset and exposing those with strong distributional contrast through auxiliary learning. Unlike previous research, which heuristically applies weighted labels or multi-task heads to mitigate such biases, we leverage partially conflicting auxiliary labels to regularize the shared representation. This approach customizes the learning process of attention layers to preserve mutual information with minority cohorts while improving global performance. We evaluated C2AL on massive production datasets with billions of data points each for six SOTA models. Experiments show that the factorization machine is able to capture fine-grained user-ad interactions using the proposed method, achieving up to a 0.16% reduction in normalized entropy overall and delivering gains exceeding 0.30% on targeted minority cohorts.
IRJul 25, 2014
Fast Spammer Detection Using Structural RankSeungyeon Kim, Haesun Park, Guy Lebanon
Comments for a product or a news article are rapidly growing and became a medium of measuring quality products or services. Consequently, spammers have been emerged in this area to bias them toward their favor. In this paper, we propose an efficient spammer detection method using structural rank of author specific term-document matrices. The use of structural rank was found effective and far faster than similar methods.
CLJan 15, 2013
The Manifold of Human EmotionsSeungyeon Kim, Fuxin Li, Guy Lebanon et al.
Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper, we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities.
LGJan 15, 2013
Matrix Approximation under Local Low-Rank AssumptionJoonseok Lee, Seungyeon Kim, Guy Lebanon et al.
Matrix approximation is a common tool in machine learning for building accurate prediction models for recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the partially observed matrix is of low-rank. We propose a new matrix approximation model where we assume instead that the matrix is only locally of low-rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices. We analyze the accuracy of the proposed local low-rank modeling. Our experiments show improvements in prediction accuracy in recommendation tasks.
LGOct 19, 2012
Learning Riemannian MetricsGuy Lebanon
We propose a solution to the problem of estimating a Riemannian metric associated with a given differentiable manifold. The metric learning problem is based on minimizing the relative volume of a given set of points. We derive the details for a family of metrics on the multinomial simplex. The resulting metric has applications in text classification and bears some similarity to TFIDF representation of text documents.
MLOct 3, 2012
Smooth Sparse Coding via Marginal Regression for Learning Sparse RepresentationsKrishnakumar Balasubramanian, Kai Yu, Guy Lebanon
We propose and analyze a novel framework for learning sparse representations, based on two statistical techniques: kernel smoothing and marginal regression. The proposed approach provides a flexible framework for incorporating feature similarity or temporal information present in data sets, via non-parametric kernel smoothing. We provide generalization bounds for dictionary learning using smooth sparse coding and show how the sample complexity depends on the L1 norm of kernel function used. Furthermore, we propose using marginal regression for obtaining sparse codes, which significantly improves the speed and allows one to scale to large dictionary sizes easily. We demonstrate the advantages of the proposed approach, both in terms of accuracy and speed by extensive experimentation on several real data sets. In addition, we demonstrate how the proposed approach could be used for improving semi-supervised sparse coding.
LGJul 11, 2012
An Extended Cencov-Campbell Characterization of Conditional Information GeometryGuy Lebanon
We formulate and prove an axiomatic characterization of conditional information geometry, for both the normalized and the nonnormalized cases. This characterization extends the axiomatic derivation of the Fisher geometry by Cencov and Campbell to the cone of positive conditional models, and as a special case to the manifold of conditional distributions. Due to the close connection between the conditional I-divergence and the product Fisher information metric the characterization provides a new axiomatic interpretation of the primal problems underlying logistic regression and AdaBoost.
LGJun 27, 2012
The Landmark Selection Method for Multiple Output PredictionKrishnakumar Balasubramanian, Guy Lebanon
Conditional modeling x \to y is a central problem in machine learning. A substantial research effort is devoted to such modeling when x is high dimensional. We consider, instead, the case of a high dimensional y, where x is either low dimensional or high dimensional. Our approach is based on selecting a small subset y_L of the dimensions of y, and proceed by modeling (i) x \to y_L and (ii) y_L \to y. Composing these two models, we obtain a conditional model x \to y that possesses convenient statistical properties. Multi-label classification and multivariate regression experiments on several datasets show that this model outperforms the one vs. all approach as well as several sophisticated multiple output prediction methods.
IRJun 27, 2012
Sequential Document Representations and Simplicial CurvesGuy Lebanon
The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present a continuous and differentiable sequential document representation that goes beyond the bag of words assumption, and yet is efficient and effective. This representation employs smooth curves in the multinomial simplex to account for sequential information. We discuss the representation and its geometric properties and demonstrate its applicability for the task of text classification.
LGJun 20, 2012
Statistical Translation, Heat Kernels and Expected DistancesJoshua Dillon, Yi Mao, Guy Lebanon et al.
High dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from high variance and performs poorly in general. We explore novel connections between statistical translation, heat kernels on manifolds and graphs, and expected distances. These connections provide a new framework for unsupervised metric learning for text documents. Experiments indicate that the resulting distances are generally superior to their more standard counterparts.
LGJun 18, 2012
A Linear Approximation to the chi^2 Kernel with Geometric ConvergenceFuxin Li, Guy Lebanon, Cristian Sminchisescu
We propose a new analytical approximation to the $χ^2$ kernel that converges geometrically. The analytical approximation is derived with elementary methods and adapts to the input distribution for optimal convergence rate. Experiments show the new approximation leads to improved performance in image classification and semantic segmentation tasks using a random Fourier feature approximation of the $\exp-χ^2$ kernel. Besides, out-of-core principal component analysis (PCA) methods are introduced to reduce the dimensionality of the approximation and achieve better performance at the expense of only an additional constant factor to the time complexity. Moreover, when PCA is performed jointly on the training and unlabeled testing data, further performance improvements can be obtained. Experiments conducted on the PASCAL VOC 2010 segmentation and the ImageNet ILSVRC 2010 datasets show statistically significant improvements over alternative approximation methods.
HCMay 14, 2012
Cumulative Revision MapSeungyeon Kim, Joshua V. Dillon, Guy Lebanon
Unlike static documents, version-controlled documents are edited by one or more authors over a certain period of time. Examples include large scale computer code, papers authored by a team of scientists, and online discussion boards. Such collaborative revision process makes traditional document modeling and visualization techniques inappropriate. In this paper we propose a new visualization technique for version-controlled documents that reveals interesting authoring patterns in papers, computer code and Wikipedia articles. The revealed authoring patterns are useful for the readers, participants in the authoring process, and supervisors.
IRMay 14, 2012
A Comparative Study of Collaborative Filtering AlgorithmsJoonseok Lee, Mingxuan Sun, Guy Lebanon
Collaborative filtering is a rapidly advancing research area. Every year several new techniques are proposed and yet it is not clear which of the techniques work best and under what conditions. In this paper we conduct a study comparing several collaborative filtering techniques -- both classic and recent state-of-the-art -- in a variety of experimental contexts. Specifically, we report conclusions controlling for number of items, number of users, sparsity level, performance criteria, and computational complexity. Our conclusions identify what algorithms work well and in what conditions, and contribute to both industrial deployment collaborative filtering algorithms and to the research community.
LGMay 9, 2012
Domain Knowledge Uncertainty and Probabilistic Parameter ConstraintsYi Mao, Guy Lebanon
Incorporating domain knowledge into the modeling process is an effective way to improve learning accuracy. However, as it is provided by humans, domain knowledge can only be specified with some degree of uncertainty. We propose to explicitly model such uncertainty through probabilistic constraints over the parameter space. In contrast to hard parameter constraints, our approach is effective also when the domain knowledge is inaccurate and generally results in superior modeling accuracy. We focus on generative and conditional modeling where the parameters are assigned a Dirichlet or Gaussian prior and demonstrate the framework with experiments on both synthetic and real-world data.
CLFeb 8, 2012
Beyond Sentiment: The Manifold of Human EmotionsSeungyeon Kim, Fuxin Li, Guy Lebanon et al.
Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without manifold, we are also able to visualize different notions of positive sentiment in different domains.