SP Aug 24, 2023
Fall Detection using Knowledge Distillation Based Long short-term memory for Offline Embedded and Low Power Devices
Hannah Zhou, Allison Chen, Celine Buer et al.
This paper presents a cost-effective, low-power approach to unintentional fall detection that uses knowledge distillation-based LSTM (Long Short-Term Memory) models to improve accuracy. Focusing on time-series data collected from various sensors, the solution offers real-time detection for prompt and reliable identification of falls. The authors compare fall detection models built on different sensors in terms of accuracy and performance. They then apply knowledge distillation to improve the models' precision, yielding refined configurations that remain accurate while consuming less power. The proposed solution thus offers a promising route to energy-efficient fall detection systems.
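As a rough illustration of the knowledge distillation step mentioned above, the sketch below computes a standard temperature-softened distillation loss for one sample. The summary does not state which loss the authors use, so the formulation (softened KL term plus hard-label cross-entropy), the function names, and the `temperature` and `alpha` values are all assumptions chosen for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    # Hypothetical loss: not taken from the paper, only a common formulation.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy of the student against the true label (fall / no fall).
    hard = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard

# Example: two classes (0 = no fall, 1 = fall), true label 1.
teacher = [0.2, 2.5]
print(distillation_loss([0.1, 2.0], teacher, label=1))
```

In a setup like this, a small student LSTM would be trained against the logits of a larger teacher, which is one plausible way to keep accuracy while reducing the compute needed on an embedded device.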
CV Mar 30, 2022
Monitored Distillation for Positive Congruent Depth Completion
Tian Yu Liu, Parth Agrawal, Allison Chen et al.
We propose a method to infer a dense depth map from a single image, its calibration, and the associated sparse point cloud. In order to leverage existing models (teachers) that produce putative depth maps, we propose an adaptive knowledge distillation approach that yields a positive congruent training process, wherein a student model avoids learning the error modes of the teachers. In the absence of ground truth for model selection and training, our method, termed Monitored Distillation, allows a student to exploit a blind ensemble of teachers by selectively learning from predictions that best minimize the reconstruction error for a given image. Monitored Distillation yields a distilled depth map and a confidence map, or "monitor", for how well a prediction from a particular teacher fits the observed image. The monitor adaptively weights the distilled depth: if all of the teachers exhibit high residuals, the standard unsupervised image reconstruction loss takes over as the supervisory signal. On indoor scenes (VOID), we outperform blind ensembling baselines by 17.53% and unsupervised methods by 24.25%; we boast a 79% model size reduction while maintaining comparable performance to the best supervised method. For outdoors (KITTI), we tie for 5th overall on the benchmark despite not using ground truth. Code available at: https://github.com/alexklwong/mondi-python.
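The selection and weighting idea described above can be sketched per pixel as follows. This is not the implementation from the linked repository: the exponential form of the monitor, the `lam` parameter, the L1 distillation term, and all function names are assumptions made for illustration, and images are represented as flat lists of pixel values.

```python
import math

def distill_with_monitor(teacher_depths, teacher_errors, lam=1.0):
    """Per pixel, keep the teacher prediction with the lowest reconstruction error.

    teacher_depths / teacher_errors: one flat list of pixel values per teacher.
    Returns the distilled depth and a confidence ("monitor") value per pixel.
    """
    distilled, monitor = [], []
    for depths, errors in zip(zip(*teacher_depths), zip(*teacher_errors)):
        best = min(range(len(errors)), key=errors.__getitem__)
        distilled.append(depths[best])
        # Assumed monitor: confidence decays as the best residual grows.
        monitor.append(math.exp(-lam * errors[best]))
    return distilled, monitor

def monitored_loss(student_depth, distilled, monitor, unsupervised_loss):
    # Where the monitor is low (all teachers fit poorly), the unsupervised
    # reconstruction term dominates the per-pixel loss.
    total = 0.0
    for d_s, d_t, w, l_un in zip(student_depth, distilled, monitor,
                                 unsupervised_loss):
        total += w * abs(d_s - d_t) + (1.0 - w) * l_un
    return total / len(student_depth)

# Two teachers, two pixels: teacher 0 fits pixel 0 best, teacher 1 fits pixel 1.
depths = [[1.0, 5.0], [2.0, 4.0]]
errors = [[0.1, 0.9], [0.5, 0.2]]
distilled, monitor = distill_with_monitor(depths, errors)
print(distilled, monitor)
```

The point of the weighting is that the student is never forced to imitate a teacher whose prediction is inconsistent with the observed image.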
IV Sep 18, 2021
Small Lesion Segmentation in Brain MRIs with Subpixel Embedding
Alex Wong, Allison Chen, Yangchao Wu et al.
We present a method to segment MRI scans of the human brain into ischemic stroke lesion and normal tissues. We propose a neural network architecture in the form of a standard encoder-decoder where predictions are guided by a spatial expansion embedding network. Our embedding network learns features that can resolve detailed structures in the brain without the need for high-resolution training images, which are often unavailable and expensive to acquire. Alternatively, the encoder-decoder learns global structures by means of striding and max pooling. Our embedding network complements the encoder-decoder architecture by guiding the decoder with fine-grained details lost to spatial downsampling during the encoder stage. Unlike previous works, our decoder outputs at 2 times the input resolution, where a single pixel in the input resolution is predicted by four neighboring subpixels in our output. To obtain the output at the original scale, we propose a learnable downsampler (as opposed to hand-crafted ones e.g. bilinear) that combines subpixel predictions. Our approach improves the baseline architecture by approximately 11.7% and achieves the state of the art on the ATLAS public benchmark dataset with a smaller memory footprint and faster runtime than the best competing method. Our source code has been made available at: https://github.com/alexklwong/subpixel-embedding-segmentation.
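To make the subpixel idea concrete, the sketch below combines each 2x2 block of a double-resolution prediction map into one pixel at the original resolution using four weights. In the paper the downsampler is learned; here the weights are simply passed in, and the function name and the fixed-weight form are assumptions for illustration. With uniform weights of 0.25 the function reduces to a hand-crafted average pooling.

```python
def downsample_2x(subpixel_map, weights):
    """Combine each 2x2 block of a (2H x 2W) map into one pixel of an (H x W) map.

    weights: four values applied to the block in row-major order
    (top-left, top-right, bottom-left, bottom-right).
    """
    height, width = len(subpixel_map) // 2, len(subpixel_map[0]) // 2
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            block = [
                subpixel_map[2 * i][2 * j], subpixel_map[2 * i][2 * j + 1],
                subpixel_map[2 * i + 1][2 * j], subpixel_map[2 * i + 1][2 * j + 1],
            ]
            row.append(sum(w * v for w, v in zip(weights, block)))
        out.append(row)
    return out

# A 2x2 subpixel prediction for a single input pixel.
prediction = [[1.0, 2.0],
              [3.0, 4.0]]
print(downsample_2x(prediction, [0.25, 0.25, 0.25, 0.25]))  # average pooling
```

A learned version would fit the weights (or a small network producing them) jointly with the segmentation model rather than fixing them in advance.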
CY Mar 30
Using Games to Learn How Large Language Models Work
Allison Chen, Isabella Pu
While artificial intelligence (AI) technology is becoming increasingly popular, its underlying mechanisms tend to remain opaque to most people. To address this gap, the field of AI literacy aims to develop various resources to teach people how AI systems function. Here we contribute to this line of work by proposing two games that demonstrate principles behind how large language models (LLMs) work and use data. The first game, Learn Like an LLM, aims to convey that LLMs are trained to predict sequences of text based on a particular dataset. The second game, Tag-Team Text Generation, focuses on teaching that LLMs generate text one word at a time, using both predicted probabilities of the data and randomness. While the games proposed are still in early stages and would benefit greatly from further discussion, we hope they can contribute to using game-based learning to teach about complex AI systems like LLMs.
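The two principles the games target can be illustrated with a toy bigram model: it "learns" only from the text it is given, and it generates one word at a time by sampling from the observed follow-up frequencies. This is a deliberately simplified analogy, not how an LLM is actually built and not part of the proposed games; the corpus, function names, and `seed` parameter are hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_bigram(text):
    # Count which word follows which in the training text.
    counts = defaultdict(Counter)
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_words=8, seed=None):
    # Generate one word at a time, sampling in proportion to observed counts.
    rng = random.Random(seed)
    words = [start]
    for _ in range(max_words - 1):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word never had a follower in the training text
        candidates, freqs = zip(*followers.items())
        words.append(rng.choices(candidates, weights=freqs)[0])
    return " ".join(words)

model = train_bigram("the cat sat on the mat and the cat ran")
print(generate(model, "the", seed=0))
```

Running `generate` with different seeds gives different sentences from the same counts, which mirrors the role of randomness in text generation, while changing the training text changes what the model can say at all.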
LG Feb 15, 2024
Analyzing the Roles of Language and Vision in Learning from Limited Data
Allison Chen, Ilia Sucholutsky, Olga Russakovsky et al.
Does language help make sense of the visual world? How important is it to actually see the world rather than having it described with words? These basic questions about the nature of intelligence have been difficult to answer because we only had one example of an intelligent system -- humans -- and limited access to cases that isolated language or vision. However, the development of sophisticated Vision-Language Models (VLMs) by artificial intelligence researchers offers us new opportunities to explore the contributions that language and vision make to learning about the world. We ablate components from the cognitive architecture of these models to identify their contributions to learning new tasks from limited data. We find that a language model leveraging all components recovers a majority of a VLM's performance, despite its lack of visual input, and that language seems to allow this by providing access to prior knowledge and reasoning.