SY · Feb 6, 2019
Guaranteed Safe Reachability-based Trajectory Design for a High-Fidelity Model of an Autonomous Passenger Vehicle
Sean Vaskov, Utkarsh Sharma, Shreyas Kousik et al. · gatech
Trajectory planning is challenging for autonomous cars since they operate in unpredictable environments with limited sensor horizons. To incorporate new information as it is sensed, planning is done in a loop, with the next plan being computed as the previous plan is executed. The recent Reachability-based Trajectory Design (RTD) is a provably safe, real-time algorithm for trajectory planning. RTD consists of an offline component, in which a Forward Reachable Set (FRS) is computed for the vehicle tracking parameterized trajectories, and an online component, in which the FRS is used to map obstacles to constraints for trajectory optimization in a provably safe way. In the literature, RTD has only been applied to small mobile robots. The contribution of this work is applying RTD to a passenger vehicle in CarSim, with a full powertrain model, chassis and tire dynamics. RTD produces safe trajectory plans with the vehicle traveling up to 15 m/s on a two-lane road, with randomly placed obstacles only known to the vehicle when detected within its sensor horizon. RTD is compared with a Nonlinear Model-Predictive Control (NMPC) approach and a Rapidly-exploring Random Tree (RRT) approach. The experiment demonstrates RTD's ability to plan safe trajectories in real time, in contrast to the existing state-of-the-art approaches.
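The online step (using a precomputed FRS to turn obstacles into constraints on the trajectory parameter) can be pictured with a toy sketch. It assumes, hypothetically, that the offline FRS has been stored as a lookup table from each trajectory parameter to the grid cells the vehicle might occupy while tracking it, and that obstacles arrive as points in the vehicle frame. This grid representation is an illustration only, not the FRS representation used in the paper, and snapping points to cells is not conservative by itself (a real implementation would also buffer obstacles). The names `to_cell` and `plan_step` are hypothetical.

```python
import math
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, Optional, Tuple

Cell = Tuple[int, int]

def to_cell(x: float, y: float, res: float) -> Cell:
    """Snap a point in the vehicle's local frame to a grid cell of side `res`."""
    return (math.floor(x / res), math.floor(y / res))

def plan_step(
    frs: Dict[Hashable, FrozenSet[Cell]],
    obstacle_points: Iterable[Tuple[float, float]],
    cost: Callable[[Hashable], float],
    res: float,
) -> Optional[Hashable]:
    """Pick the lowest-cost trajectory parameter whose reachable cells avoid all obstacles.

    frs: hypothetical offline table mapping each trajectory parameter to the
    cells the vehicle could occupy while tracking it (tracking error included).
    """
    obstacle_cells = {to_cell(x, y, res) for x, y in obstacle_points}
    # Obstacles become constraints: a parameter whose FRS slice touches one is unsafe.
    safe = [k for k, cells in frs.items() if cells.isdisjoint(obstacle_cells)]
    if not safe:
        return None  # no safe new plan; the caller keeps executing its previous plan
    return min(safe, key=cost)

# Usage: two candidate maneuvers, preferring to keep going straight.
frs = {
    "straight": frozenset({(0, 0), (1, 0), (2, 0)}),
    "left": frozenset({(0, 0), (1, 1), (2, 1)}),
}

def prefer_straight(k: Hashable) -> float:
    return 0.0 if k == "straight" else 1.0

choice = plan_step(frs, [(2.5, 0.5)], prefer_straight, res=1.0)  # -> "left"
```

Returning `None` mirrors the receding-horizon loop described above: if no new plan is certified safe, this sketch assumes the previous plan (which would need its own fail-safe ending) continues to execute.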
LG · Dec 22, 2025
When Less is More: 8-bit Quantization Improves Continual Learning in Large Language Models
Michael S. Zhang, Rishi A. Ruia, Arnav Kewalram et al.
Catastrophic forgetting poses a fundamental challenge in continual learning, particularly when models are quantized for deployment efficiency. We systematically investigate the interplay between quantization precision (FP16, INT8, INT4) and replay buffer strategies in large language models, revealing unexpected dynamics. While FP16 achieves superior initial task performance (74.44% on NLU), we observe a striking inversion on subsequent tasks: quantized models outperform FP16 by 8-15% on final task forward accuracy, with INT4 achieving nearly double FP16's performance on Code generation (40% vs. 20%). Critically, even minimal replay buffers (0.1%) dramatically improve retention (increasing NLU retention after Math training from 45% to 65% across all precision levels), with INT8 consistently achieving the optimal balance between learning plasticity and knowledge retention. We hypothesize that quantization-induced noise acts as implicit regularization, preventing the overfitting to new task gradients that plagues high-precision models. These findings challenge the conventional wisdom that higher precision is always preferable, suggesting instead that INT8 quantization offers both computational efficiency and superior continual learning dynamics. Our results provide practical guidelines for deploying compressed models in continual learning scenarios: small replay buffers (1-2%) suffice for NLU tasks, while Math and Code benefit from moderate buffers (5-10%), with quantized models requiring less replay than FP16 to achieve comparable retention. Code is available at https://github.com/Festyve/LessIsMore.
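The two ingredients the abstract varies, quantization precision and replay buffer size, can be illustrated with a small sketch. The abstract does not state the quantization scheme or how buffer percentages are measured, so both are assumptions here: `fake_quantize_int8` uses symmetric per-tensor rounding to [-127, 127] (one common INT8 scheme, showing the bounded rounding noise the authors hypothesize acts as a regularizer), and `build_replay_mix` treats the buffer percentage as a fraction of each earlier task's examples. Both function names are hypothetical and are not taken from the linked repository.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def fake_quantize_int8(weights: Sequence[float]) -> List[float]:
    """Symmetric per-tensor INT8 round trip: quantize to [-127, 127], then dequantize.

    The rounding error per weight is at most scale / 2, where scale = max|w| / 127.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return [0.0] * len(weights)
    scale = max_abs / 127.0
    return [max(-127, min(127, round(w / scale))) * scale for w in weights]

def build_replay_mix(
    new_task: Sequence[T],
    old_tasks: Sequence[Sequence[T]],
    fraction: float,
    seed: int = 0,
) -> List[T]:
    """Training set for the next task: all new examples plus a random
    `fraction` of each earlier task (at least one example if fraction > 0)."""
    rng = random.Random(seed)
    mixed: List[T] = list(new_task)
    for old in old_tasks:
        if not old or fraction <= 0:
            continue
        k = min(len(old), max(1, round(fraction * len(old))))
        mixed.extend(rng.sample(list(old), k))
    rng.shuffle(mixed)  # interleave replayed and new examples
    return mixed
```

For example, a 1% buffer over a 1,000-example earlier task adds 10 replayed examples to the new task's training set.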
CV · Oct 29, 2024
Enhanced Survival Prediction in Head and Neck Cancer Using Convolutional Block Attention and Multimodal Data Fusion
Aiman Farooq, Utkarsh Sharma, Deepak Mishra
Accurate survival prediction in head and neck cancer (HNC) is essential for guiding clinical decision-making and optimizing treatment strategies. Traditional models, such as Cox proportional hazards, have been widely used but are limited in their ability to handle complex multimodal data. This paper proposes a deep learning-based approach leveraging CT and PET imaging modalities to predict survival outcomes in HNC patients. Our method integrates feature extraction with a Convolutional Block Attention Module (CBAM) and a multimodal data fusion layer that combines imaging data to generate a compact feature representation. The final prediction is achieved through a fully parametric discrete-time survival model, allowing for flexible hazard functions that overcome the limitations of traditional survival models. We evaluated our approach using the HECKTOR and HEAD-NECK-RADIOMICS-HN1 datasets, demonstrating its superior performance compared to conventional statistical and machine learning models. The results indicate that our deep learning model significantly improves survival prediction accuracy, offering a robust tool for personalized treatment planning in HNC.
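A discrete-time survival model of the kind mentioned above splits follow-up into intervals and predicts a hazard (conditional event probability) for each one. The sketch below assumes a logistic-hazard parameterization, where the network outputs one logit per interval, and assumes censored patients survived through their last observed interval; the abstract does not specify either choice, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function (maps a logit to a hazard in [0, 1])."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def survival_curve(hazards: Sequence[float]) -> List[float]:
    """S_j = prod_{k <= j} (1 - h_k): probability of surviving past interval j."""
    s, curve = 1.0, []
    for h in hazards:
        s *= 1.0 - h
        curve.append(s)
    return curve

def discrete_time_nll(
    logits: Sequence[float], interval: int, event: bool, eps: float = 1e-12
) -> float:
    """Negative log-likelihood for one patient.

    event=True: survived intervals 0..interval-1, then had the event in `interval`.
    event=False (censored): assumed to have survived through `interval` as well.
    """
    h = [sigmoid(z) for z in logits]
    nll = -sum(math.log(1.0 - h[k] + eps) for k in range(interval))
    last = h[interval] if event else 1.0 - h[interval]
    return nll - math.log(last + eps)
```

Because each interval has its own hazard, the curve is not forced into a proportional-hazards shape, which is the flexibility the abstract contrasts with Cox models.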
LG · Feb 12, 2021
Explaining Neural Scaling Laws
Yasaman Bahri, Ethan Dyer, Jared Kaplan et al.
The population loss of trained deep neural networks often follows precise power-law scaling relations with either the size of the training dataset or the number of parameters in the network. We propose a theory that explains the origins of and connects these scaling laws. We identify variance-limited and resolution-limited scaling behavior for both dataset and model size, for a total of four scaling regimes. The variance-limited scaling follows simply from the existence of a well-behaved infinite data or infinite width limit, while the resolution-limited regime can be explained by positing that models are effectively resolving a smooth data manifold. In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality. We exhibit all four scaling regimes in the controlled setting of large random feature and pretrained models and test the predictions empirically on a range of standard architectures and datasets. We also observe several empirical relationships between datasets and scaling exponents under modifications of task and architecture aspect ratio. Our work provides a taxonomy for classifying different scaling regimes, underscores that there can be different mechanisms driving improvements in loss, and lends insight into the microscopic origins of and relationships between scaling exponents.
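The abstract attributes variance-limited scaling to the existence of a well-behaved infinite-data limit. A toy instance, chosen here for illustration and not taken from the paper: a model with a single parameter (a constant predictor) fit to $D$ noisy samples converges to its infinite-data answer, and its excess test loss is the variance of the sample mean, $\sigma^2 / D$. The sketch measures that excess loss by simulation and reads off the exponent with a least-squares fit on log-log axes. The helper names are hypothetical.

```python
import math
import random
from typing import Sequence

def fit_power_law_exponent(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of log(y) vs. log(x); returns alpha for y ~ x**(-alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return -cov / var

def excess_loss(d: int, trials: int, rng: random.Random,
                mu: float = 1.0, sigma: float = 1.0) -> float:
    """Average excess test MSE of a constant predictor fit to d samples of N(mu, sigma^2).

    Test MSE is sigma^2 + (c - mu)^2 for a prediction c; the infinite-data limit
    gives c = mu, so (c - mu)^2 is the excess loss.
    """
    total = 0.0
    for _ in range(trials):
        estimate = sum(rng.gauss(mu, sigma) for _ in range(d)) / d  # sample mean
        total += (estimate - mu) ** 2
    return total / trials

rng = random.Random(0)
sizes = [10, 30, 100, 300]
losses = [excess_loss(d, 2000, rng) for d in sizes]
alpha_d = fit_power_law_exponent(sizes, losses)  # close to 1 (a Monte Carlo estimate)
```

The fitted exponent is a simulation estimate, so it lands near 1 rather than exactly on it.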
LG · Apr 22, 2020
A Neural Scaling Law from the Dimension of the Data Manifold
Utkarsh Sharma, Jared Kaplan
When data is plentiful, the loss achieved by well-trained neural networks scales as a power law $L \propto N^{-\alpha}$ in the number of network parameters $N$. This empirical scaling law holds for a wide variety of data modalities and may persist over many orders of magnitude. The scaling law can be explained if neural models are effectively just performing regression on a data manifold of intrinsic dimension $d$. This simple theory predicts a scaling exponent $\alpha \approx 4/d$ for both cross-entropy and mean-squared error losses. We confirm the theory by independently measuring the intrinsic dimension and the scaling exponents in a teacher/student framework, where we can study a variety of $d$ and $\alpha$ by dialing the properties of random teacher networks. We also test the theory with CNN image classifiers on several datasets and with GPT-type language models.
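One way to see where $4/d$ comes from is a standard approximation argument, stated here as background rather than quoted from the paper. With $N$ parameters covering a $d$-dimensional manifold, each parameter controls a cell of side $s \propto N^{-1/d}$. A piecewise-linear fit of a smooth function has pointwise error $\propto s^2$, so the mean-squared error scales as $s^4 \propto N^{-4/d}$. The check below uses a piecewise-linear interpolant (not a neural network) of $\sin x$ on $[0, \pi]$, so $d = 1$ and the fitted exponent should be close to 4. The helper names are hypothetical.

```python
import math
from typing import Callable, Sequence

def fit_power_law_exponent(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of log(y) vs. log(x); returns alpha for y ~ x**(-alpha)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return -cov / var

def piecewise_linear_mse(f: Callable[[float], float], n_segments: int,
                         lo: float, hi: float, n_eval: int = 10_000) -> float:
    """MSE of the piecewise-linear interpolant of f with n_segments equal segments."""
    h = (hi - lo) / n_segments
    knots = [f(lo + i * h) for i in range(n_segments + 1)]
    total = 0.0
    for j in range(n_eval):
        x = lo + (hi - lo) * j / (n_eval - 1)
        i = min(int((x - lo) / h), n_segments - 1)  # segment containing x
        t = (x - lo) / h - i                          # position within segment, in [0, 1]
        approx = (1 - t) * knots[i] + t * knots[i + 1]
        total += (f(x) - approx) ** 2
    return total / n_eval

sizes = [16, 32, 64, 128]  # number of segments, playing the role of N
mses = [piecewise_linear_mse(math.sin, n, 0.0, math.pi) for n in sizes]
alpha = fit_power_law_exponent(sizes, mses)  # expected near 4 / d = 4 for d = 1
```

In higher dimensions the same cell-size argument gives $s \propto N^{-1/d}$, which is why the exponent shrinks as the intrinsic dimension grows.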