LGNov 26, 2023Code
Hessian Aware Low-Rank Perturbation for Order-Robust Continual LearningJiaqi Li, Yuanhao Lai, Rui Wang et al.
Continual learning aims to learn a series of tasks sequentially without forgetting the knowledge acquired from the previous ones. In this work, we propose the Hessian Aware Low-Rank Perturbation algorithm for continual learning. By modeling the parameter transitions along the sequential tasks with the weight matrix transformation, we propose to apply the low-rank approximation on the task-adaptive parameters in each layer of the neural networks. Specifically, we theoretically demonstrate the quantitative relationship between the Hessian and the proposed low-rank approximation. The approximation ranks are then globally determined according to the marginal increment of the empirical loss estimated by the layer-specific gradient and low-rank approximation error. Furthermore, we control the model capacity by pruning less important parameters to diminish the parameter growth. We conduct extensive experiments on various benchmarks, including a dataset with large-scale tasks, and compare our method against some recent state-of-the-art methods to demonstrate the effectiveness and scalability of our proposed method. Empirical results show that our method performs better on different benchmarks, especially in achieving task order robustness and handling the forgetting issue. The source code is at https://github.com/lijiaqi/HALRP.
LGMay 2, 2024Code
Physics-Informed Neural Networks: Minimizing Residual Loss with Wide Networks and Effective ActivationsNima Hosseini Dashtbayaz, Ghazal Farhani, Boyu Wang et al.
The residual loss in Physics-Informed Neural Networks (PINNs) alters the simple recursive relation of layers in a feed-forward neural network by applying a differential operator, resulting in a loss landscape that is inherently different from those of common supervised problems. Therefore, relying on the existing theory leads to unjustified design choices and suboptimal performance. In this work, we analyze the residual loss by studying its characteristics at critical points to find the conditions that result in effective training of PINNs. Specifically, we first show that under certain conditions, the residual loss of PINNs can be globally minimized by a wide neural network. Furthermore, our analysis also reveals that an activation function with well-behaved high-order derivatives plays a crucial role in minimizing the residual loss. In particular, to solve a $k$-th order PDE, the $k$-th derivative of the activation function should be bijective. The established theory paves the way for designing and choosing effective activation functions for PINNs and explains why periodic activations have shown promising performance in certain cases. Finally, we verify our findings by conducting a set of experiments on several PDEs. Our code is publicly available at https://github.com/nimahsn/pinns_tf2.
CLMay 6, 2021Code
Hone as You Read: A Practical Type of Interactive SummarizationTanner Bohn, Charles X. Ling
We present HARE, a new task where reader feedback is used to optimize document summaries for personal interest during the normal flow of reading. This task is related to interactive summarization, where personalized summaries are produced following a long feedback stage where users may read the same sentences many times. However, this process severely interrupts the flow of reading, making it impractical for leisurely reading. We propose to gather minimally-invasive feedback during the reading process to adapt to user interests and augment the document in real-time. Building off of recent advances in unsupervised summarization evaluation, we propose a suitable metric for this task and use it to evaluate a variety of approaches. Our approaches range from simple heuristics to preference-learning and their analysis provides insight into this important task. Human evaluation additionally supports the practicality of HARE. The code to reproduce this work is available at https://github.com/tannerbohn/HoneAsYouRead.
CLMay 27, 2020Code
Catching Attention with Automatic Pull Quote SelectionTanner Bohn, Charles X. Ling
To advance understanding on how to engage readers, we advocate the novel task of automatic pull quote selection. Pull quotes are a component of articles specifically designed to catch the attention of readers with spans of text selected from the article and given more salient presentation. This task differs from related tasks such as summarization and clickbait identification by several aspects. We establish a spectrum of baseline approaches to the task, ranging from handcrafted features to a neural mixture-of-experts to cross-task models. By examining the contributions of individual features and embedding dimensions from these models, we uncover unexpected properties of pull quotes to help answer the important question of what engages readers. Human evaluation also supports the uniqueness of this task and the suitability of our selection models. The benefits of exploring this problem further are clear: pull quotes increase enjoyment and readability, shape reader perceptions, and facilitate learning. Code to reproduce this work is available at https://github.com/tannerbohn/AutomaticPullQuoteSelection.
LGFeb 12, 2024
Generalizing across Temporal Domains with Koopman OperatorsQiuhao Zeng, Wei Wang, Fan Zhou et al.
In the field of domain generalization, the task of constructing a predictive model capable of generalizing to a target domain without access to target data remains challenging. This problem becomes further complicated when considering evolving dynamics between domains. While various approaches have been proposed to address this issue, a comprehensive understanding of the underlying generalization theory is still lacking. In this study, we contribute novel theoretic results that aligning conditional distribution leads to the reduction of generalization bounds. Our analysis serves as a key motivation for solving the Temporal Domain Generalization (TDG) problem through the application of Koopman Neural Operators, resulting in Temporal Koopman Networks (TKNets). By employing Koopman Operators, we effectively address the time-evolving distributions encountered in TDG using the principles of Koopman theory, where measurement functions are sought to establish linear transition relations between evolving domains. Through empirical evaluations conducted on synthetic and real-world datasets, we validate the effectiveness of our proposed approach.
LGDec 16, 2024
Leveraging Group Classification with Descending Soft Labeling for Deep Imbalanced RegressionRuizhi Pu, Gezheng Xu, Ruiyi Fang et al.
Deep imbalanced regression (DIR), where the target values have a highly skewed distribution and are also continuous, is an intriguing yet under-explored problem in machine learning. While recent works have already shown that incorporating various classification-based regularizers can produce enhanced outcomes, the role of classification remains elusive in DIR. Moreover, such regularizers (e.g., contrastive penalties) merely focus on learning discriminative features of data, which inevitably results in ignorance of either continuity or similarity across the data. To address these issues, we first bridge the connection between the objectives of DIR and classification from a Bayesian perspective. Consequently, this motivates us to decompose the objective of DIR into a combination of classification and regression tasks, which naturally guides us toward a divide-and-conquer manner to solve the DIR problem. Specifically, by aggregating the data at nearby labels into the same groups, we introduce an ordinal group-aware contrastive learning loss along with a multi-experts regressor to tackle the different groups of data thereby maintaining the data continuity. Meanwhile, considering the similarity between the groups, we also propose a symmetric descending soft labeling strategy to exploit the intrinsic similarity across the data, which allows classification to facilitate regression more effectively. Extensive experiments on real-world datasets also validate the effectiveness of our method.
AIMay 1, 2021
A Deep Learning Framework for Lifelong Machine LearningCharles X. Ling, Tanner Bohn
Humans can learn a variety of concepts and skills incrementally over the course of their lives while exhibiting many desirable properties, such as continual learning without forgetting, forward transfer and backward transfer of knowledge, and learning a new concept or task with only a few examples. Several lines of machine learning research, such as lifelong machine learning, few-shot learning, and transfer learning attempt to capture these properties. However, most previous approaches can only demonstrate subsets of these properties, often by different complex mechanisms. In this work, we propose a simple yet powerful unified deep learning framework that supports almost all of these properties and approaches through one central mechanism. Experiments on toy examples support our claims. We also draw connections between many peculiarities of human learning (such as memory loss and "rain man") and our framework. As academics, we often lack resources required to build and train, deep neural networks with billions of parameters on hundreds of TPUs. Thus, while our framework is still conceptual, and our experiment results are surely not SOTA, we hope that this unified lifelong learning framework inspires new work towards large-scale experiments and understanding human learning in general. This paper is summarized in two short YouTube videos: https://youtu.be/gCuUyGETbTU (part 1) and https://youtu.be/XsaGI01b-1o (part 2).
LGNov 21, 2019
A Conceptual Framework for Lifelong LearningCharles X. Ling, Tanner Bohn
Humans can learn a variety of concepts and skills incrementally over the course of their lives while exhibiting many desirable properties, such as continual learning without forgetting, forward transfer and backward transfer of knowledge, and learning a new concept or task with only a few examples. Several lines of machine learning research, such as lifelong learning, few-shot learning, and transfer learning, attempt to capture these properties. However, most previous approaches can only demonstrate subsets of these properties, often by different complex mechanisms. In this work, we propose a simple yet powerful unified framework that supports almost all of these properties and approaches through one central mechanism. We also draw connections between many peculiarities of human learning (such as memory loss and "rain man") and our framework. While we do not present any state-of-the-art results, we hope that this conceptual framework provides a novel perspective on existing work and proposes many new research directions.
LGOct 4, 2019
Few-Shot Abstract Visual Reasoning With Spectral FeaturesTanner Bohn, Yining Hu, Charles X. Ling
We present an image preprocessing technique capable of improving the performance of few-shot classifiers on abstract visual reasoning tasks. Many visual reasoning tasks with abstract features are easy for humans to learn with few examples but very difficult for computer vision approaches with the same number of samples, despite the ability for deep learning models to learn abstract features. Same-different (SD) problems represent a type of visual reasoning task requiring knowledge of pattern repetition within individual images, and modern computer vision approaches have largely faltered on these classification problems, even when provided with vast amounts of training data. We propose a simple method for solving these problems based on the insight that removing peaks from the amplitude spectrum of an image is capable of emphasizing the unique parts of the image. When combined with several classifiers, our method performs well on the SD SVRT tasks with few-shot learning, improving upon the best comparable results on all tasks, with average absolute accuracy increases nearly 40% for some classifiers. In particular, we find that combining Relational Networks with this image preprocessing approach improves their performance from chance-level to over 90% accuracy on several SD tasks.
CLApr 22, 2018
Learning Sentence Embeddings for Coherence Modelling and BeyondTanner Bohn, Yining Hu, Jinhang Zhang et al.
We present a novel and effective technique for performing text coherence tasks while facilitating deeper insights into the data. Despite obtaining ever-increasing task performance, modern deep-learning approaches to NLP tasks often only provide users with the final network decision and no additional understanding of the data. In this work, we show that a new type of sentence embedding learned through self-supervision can be applied effectively to text coherence tasks while serving as a window through which deeper understanding of the data can be obtained. To produce these sentence embeddings, we train a recurrent neural network to take individual sentences and predict their location in a document in the form of a distribution over locations. We demonstrate that these embeddings, combined with simple visual heuristics, can be used to achieve performance competitive with state-of-the-art on multiple text coherence tasks, outperforming more complex and specialized approaches. Additionally, we demonstrate that these embeddings can provide insights useful to writers for improving writing quality and informing document structuring, and assisting readers in summarizing and locating information.
CVApr 18, 2018
Pelee: A Real-Time Object Detection System on Mobile DevicesRobert J. Wang, Xiang Li, Charles X. Ling
An increasing need of running Convolutional Neural Network (CNN) models on mobile devices with limited computing power and memory resource encourages studies on efficient model design. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and MobileNetV2. However, all these models are heavily dependent on depthwise separable convolution which lacks efficient implementation in most deep learning frameworks. In this study, we propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves a higher accuracy and over 1.8 times faster speed than MobileNet and MobileNetV2 on NVIDIA TX2. Meanwhile, PeleeNet is only 66% of the model size of MobileNet. We then propose a real-time object detection system by combining PeleeNet with Single Shot MultiBox Detector (SSD) method and optimizing the architecture for fast speed. Our proposed detection system2, named Pelee, achieves 76.4% mAP (mean average precision) on PASCAL VOC2007 and 22.4 mAP on MS COCO dataset at the speed of 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2. The result on COCO outperforms YOLOv2 in consideration of a higher precision, 13.6 times lower computational cost and 11.3 times smaller model size.
LGJan 28, 2013
Clustering-Based Matrix FactorizationNima Mirbakhsh, Charles X. Ling
Recommender systems are emerging technologies that nowadays can be found in many applications such as Amazon, Netflix, and so on. These systems help users to find relevant information, recommendations, and their preferred items. Slightly improvement of the accuracy of these recommenders can highly affect the quality of recommendations. Matrix Factorization is a popular method in Recommendation Systems showing promising results in accuracy and complexity. In this paper we propose an extension of matrix factorization which adds general neighborhood information on the recommendation model. Users and items are clustered into different categories to see how these categories share preferences. We then employ these shared interests of categories in a fusion by Biased Matrix Factorization to achieve more accurate recommendations. This is a complement for the current neighborhood aware matrix factorization models which rely on using direct neighborhood information of users and items. The proposed model is tested on two well-known recommendation system datasets: Movielens100k and Netflix. Our experiment shows applying the general latent features of categories into factorized recommender models improves the accuracy of recommendations. The current neighborhood-aware models need a great number of neighbors to acheive good accuracies. To the best of our knowledge, the proposed model is better than or comparable with the current neighborhood-aware models when they consider fewer number of neighbors.