CLSep 29, 2023
Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt EngineeringHan Zhou, Xingchen Wan, Lev Proleev et al. · cambridge
Prompting and in-context learning (ICL) have become efficient learning paradigms for large language models (LLMs). However, LLMs suffer from prompt brittleness and various bias factors in the prompt, including but not limited to the formatting, the choice verbalizers, and the ICL examples. To address this problem that results in unexpected performance degradation, calibration methods have been developed to mitigate the effects of these biases while recovering LLM performance. In this work, we first conduct a systematic analysis of the existing calibration methods, where we both provide a unified view and reveal the failure cases. Inspired by these analyses, we propose Batch Calibration (BC), a simple yet intuitive method that controls the contextual bias from the batched input, unifies various prior approaches, and effectively addresses the aforementioned issues. BC is zero-shot, inference-only, and incurs negligible additional costs. In the few-shot setup, we further extend BC to allow it to learn the contextual bias from labeled data. We validate the effectiveness of BC with PaLM 2-(S, M, L) and CLIP models and demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding and image classification tasks.
LGApr 8, 2022
Disability prediction in multiple sclerosis using performance outcome measures and demographic dataSubhrajit Roy, Diana Mincu, Lev Proleev et al.
Literature on machine learning for multiple sclerosis has primarily focused on the use of neuroimaging data such as magnetic resonance imaging and clinical laboratory tests for disease identification. However, studies have shown that these modalities are not consistent with disease activity such as symptoms or disease progression. Furthermore, the cost of collecting data from these modalities is high, leading to scarce evaluations. In this work, we used multi-dimensional, affordable, physical and smartphone-based performance outcome measures (POM) in conjunction with demographic data to predict multiple sclerosis disease progression. We performed a rigorous benchmarking exercise on two datasets and present results across 13 clinically actionable prediction endpoints and 6 machine learning models. To the best of our knowledge, our results are the first to show that it is possible to predict disease progression using POMs and demographic data in the context of both clinical trials and smartphone-base studies by using two datasets. Moreover, we investigate our models to understand the impact of different POMs and demographics on model performance through feature ablation studies. We also show that model performance is similar across different demographic subgroups (based on age and sex). To enable this work, we developed an end-to-end reusable pre-processing and machine learning framework which allows quicker experimentation over disparate MS datasets.
SIJun 2, 2023
STUDY: Socially Aware Temporally Causal Decoder Recommender SystemsEltayeb Ahmed, Diana Mincu, Lauren Harrell et al.
Recommender systems are widely used to help people find items that are tailored to their interests. These interests are often influenced by social networks, making it important to use social network information effectively in recommender systems. This is especially true for demographic groups with interests that differ from the majority. This paper introduces STUDY, a Socially-aware Temporally caUsal Decoder recommender sYstem. STUDY introduces a new socially-aware recommender system architecture that is significantly more efficient to learn and train than existing methods. STUDY performs joint inference over socially connected groups in a single forward pass of a modified transformer decoder network. We demonstrate the benefits of STUDY in the recommendation of books for students who are dyslexic, or struggling readers. Dyslexic students often have difficulty engaging with reading material, making it critical to recommend books that are tailored to their interests. We worked with our non-profit partner Learning Ally to evaluate STUDY on a dataset of struggling readers. STUDY was able to generate recommendations that more accurately predicted student engagement, when compared with existing methods.
LGJul 6, 2022
Boosting the interpretability of clinical risk scores with intervention predictionsEric Loreaux, Ke Yu, Jonas Kemp et al.
Machine learning systems show significant promise for forecasting patient adverse events via risk scores. However, these risk scores implicitly encode assumptions about future interventions that the patient is likely to receive, based on the intervention policy present in the training data. Without this important context, predictions from such systems are less interpretable for clinicians. We propose a joint model of intervention policy and adverse event risk as a means to explicitly communicate the model's assumptions about future interventions. We develop such an intervention policy model on MIMIC-III, a real world de-identified ICU dataset, and discuss some use cases that highlight the utility of this approach. We show how combining typical risk scores, such as the likelihood of mortality, with future intervention probability scores leads to more interpretable clinical predictions.
LGFeb 15, 2023
Benchmarking Continuous Time Models for Predicting Multiple Sclerosis ProgressionAlexander Norcliffe, Lev Proleev, Diana Mincu et al.
Multiple sclerosis is a disease that affects the brain and spinal cord, it can lead to severe disability and has no known cure. The majority of prior work in machine learning for multiple sclerosis has been centered around using Magnetic Resonance Imaging scans or laboratory tests; these modalities are both expensive to acquire and can be unreliable. In a recent paper it was shown that disease progression can be predicted effectively using performance outcome measures and demographic data. In our work we build on this to investigate the modeling side, using continuous time models to predict progression. We benchmark four continuous time models using a publicly available multiple sclerosis dataset. We find that the best continuous model is often able to outperform the best benchmarked discrete time model. We also carry out an extensive ablation to discover the sources of performance gains, we find that standardizing existing features leads to a larger performance increase than interpolating missing features.
CLJul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic CapabilitiesGheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu
In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.
SPMar 19, 2019Code
Machine Learning for removing EEG artifacts: Setting the benchmarkSubhrajit Roy
Electroencephalograms (EEG) are often contaminated by artifacts which make interpreting them more challenging for clinicians. Hence, automated artifact recognition systems have the potential to aid the clinical workflow. In this abstract, we share the first results on applying various machine learning algorithms to the recently released world's largest open-source artifact recognition dataset. We envision that these results will serve as a benchmark for researchers who might work with this dataset in future.
CRJun 10, 2024
Safety Alignment Should Be Made More Than Just a Few Tokens DeepXiangyu Qi, Ashwinee Panda, Kaifeng Lyu et al.
The safety alignment of current Large Language Models (LLMs) is vulnerable. Relatively simple attacks, or even benign fine-tuning, can jailbreak aligned models. We argue that many of these vulnerabilities are related to a shared underlying issue: safety alignment can take shortcuts, wherein the alignment adapts a model's generative distribution primarily over only its very first few output tokens. We refer to this issue as shallow safety alignment. In this paper, we present case studies to explain why shallow safety alignment can exist and provide evidence that current aligned LLMs are subject to this issue. We also show how these findings help explain multiple recently discovered vulnerabilities in LLMs, including the susceptibility to adversarial suffix attacks, prefilling attacks, decoding parameter attacks, and fine-tuning attacks. Importantly, we discuss how this consolidated notion of shallow safety alignment sheds light on promising research directions for mitigating these vulnerabilities. For instance, we show that deepening the safety alignment beyond just the first few tokens can often meaningfully improve robustness against some common exploits. Finally, we design a regularized finetuning objective that makes the safety alignment more persistent against fine-tuning attacks by constraining updates on initial tokens. Overall, we advocate that future safety alignment should be made more than just a few tokens deep.
CLDec 19, 2023
Gemini: A Family of Highly Capable Multimodal ModelsGemini Team, Rohan Anil, Sebastian Borgeaud et al.
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of the Gemini family in cross-modal reasoning and language understanding will enable a wide variety of use cases. We discuss our approach toward post-training and deploying Gemini models responsibly to users through services including Gemini, Gemini Advanced, Google AI Studio, and Cloud Vertex AI.
AIFeb 26, 2022
Healthsheet: Development of a Transparency Artifact for Health DatasetsNegar Rostamzadeh, Diana Mincu, Subhrajit Roy et al.
Machine learning (ML) approaches have demonstrated promising results in a wide range of healthcare applications. Data plays a crucial role in developing ML-based healthcare systems that directly affect people's lives. Many of the ethical issues surrounding the use of ML in healthcare stem from structural inequalities underlying the way we collect, use, and handle data. Developing guidelines to improve documentation practices regarding the creation, use, and maintenance of ML healthcare datasets is therefore of critical importance. In this work, we introduce Healthsheet, a contextualized adaptation of the original datasheet questionnaire ~\cite{gebru2018datasheets} for health-specific applications. Through a series of semi-structured interviews, we adapt the datasheets for healthcare data documentation. As part of the Healthsheet development process and to understand the obstacles researchers face in creating datasheets, we worked with three publicly-available healthcare datasets as our case studies, each with different types of structured data: Electronic health Records (EHR), clinical trial study data, and smartphone-based performance outcome measures. Our findings from the interviewee study and case studies show 1) that datasheets should be contextualized for healthcare, 2) that despite incentives to adopt accountability practices such as datasheets, there is a lack of consistency in the broader use of these practices 3) how the ML for health community views datasheets and particularly \textit{Healthsheets} as diagnostic tool to surface the limitations and strength of datasets and 4) the relative importance of different fields in the datasheet to healthcare concerns.
LGFeb 2, 2022
Diagnosing failures of fairness transfer across distribution shift in real-world medical settingsJessica Schrouff, Natalie Harris, Oluwasanmi Koyejo et al.
Diagnosing and mitigating changes in model fairness under distribution shift is an important component of the safe deployment of machine learning in healthcare settings. Importantly, the success of any mitigation strategy strongly depends on the structure of the shift. Despite this, there has been little discussion of how to empirically assess the structure of a distribution shift that one is encountering in practice. In this work, we adopt a causal framing to motivate conditional independence tests as a key tool for characterizing distribution shifts. Using our approach in two medical applications, we show that this knowledge can help diagnose failures of fairness transfer, including cases where real-world shifts are more complex than is often assumed in the literature. Based on these results, we discuss potential remedies at each step of the machine learning pipeline.
LGNov 30, 2021
A collection of the accepted abstracts for the Machine Learning for Health (ML4H) symposium 2021Fabian Falck, Yuyin Zhou, Emma Rocheteau et al.
A collection of the accepted abstracts for the Machine Learning for Health (ML4H) symposium 2021. This index is not complete, as some accepted abstracts chose to opt-out of inclusion.
LGNov 19, 2020
ML4H Abstract Track 2020Emily Alsentzer, Matthew B. A. McDermott, Fabian Falck et al.
A collection of the accepted abstracts for the Machine Learning for Health (ML4H) workshop at NeurIPS 2020. This index is not complete, as some accepted abstracts chose to opt-out of inclusion.
PLMay 24, 2019
Type-Driven Automated Learning with LaleMartin Hirzel, Kiran Kate, Avraham Shinnar et al.
Machine-learning automation tools, ranging from humble grid-search to hyperopt, auto-sklearn, and TPOT, help explore large search spaces of possible pipelines. Unfortunately, each of these tools has a different syntax for specifying its search space, leading to lack of portability, missed relevant points, and spurious points that are inconsistent with error checks and documentation of the searchable base components. This paper proposes using types (such as enum, float, or dictionary) both for checking the correctness of, and for automatically searching over, hyperparameters and pipeline configurations. Using types for both of these purposes guarantees consistency. We present Lale, an embedded language that resembles scikit learn but provides better automation, correctness checks, and portability. Lale extends the reach of existing automation tools across data modalities (tables, text, images, time-series) and programming languages (Python, Java, R). Thus, data scientists can leverage automation while remaining in control of their work.
LGMar 19, 2019
A semi-supervised deep learning algorithm for abnormal EEG identificationSubhrajit Roy, Kiran Kate, Martin Hirzel
Systems that can automatically analyze EEG signals can aid neurologists by reducing heavy workload and delays. However, such systems need to be first trained using a labeled dataset. While large corpuses of EEG data exist, a fraction of them are labeled. Hand-labeling data increases workload for the very neurologists we try to aid. This paper proposes a semi-supervised learning workflow that can not only extract meaningful information from large unlabeled EEG datasets but also make predictions with minimal supervision, using labeled datasets as small as 5 examples.
LGMar 8, 2019
SeizureNet: Multi-Spectral Deep Feature Learning for Seizure Type ClassificationUmar Asif, Subhrajit Roy, Jianbin Tang et al.
Automatic classification of epileptic seizure types in electroencephalograms (EEGs) data can enable more precise diagnosis and efficient management of the disease. This task is challenging due to factors such as low signal-to-noise ratios, signal artefacts, high variance in seizure semiology among epileptic patients, and limited availability of clinical data. To overcome these challenges, in this paper, we present SeizureNet, a deep learning framework which learns multi-spectral feature embeddings using an ensemble architecture for cross-patient seizure type classification. We used the recently released TUH EEG Seizure Corpus (V1.4.0 and V1.5.2) to evaluate the performance of SeizureNet. Experiments show that SeizureNet can reach a weighted F1 score of up to 0.94 for seizure-wise cross validation and 0.59 for patient-wise cross validation for scalp EEG based multi-class seizure type classification. We also show that the high-level feature embeddings learnt by SeizureNet considerably improve the accuracy of smaller networks through knowledge distillation for applications with low-memory constraints.
LGFeb 4, 2019
Seizure Type Classification using EEG signals and Machine Learning: Setting a benchmarkSubhrajit Roy, Umar Asif, Jianbin Tang et al.
Accurate classification of seizure types plays a crucial role in the treatment and disease management of epileptic patients. Epileptic seizure types not only impact the choice of drugs but also the range of activities a patient can safely engage in. With recent advances being made towards artificial intelligence enabled automatic seizure detection, the next frontier is the automatic classification of seizure types. On that note, in this paper, we explore the application of machine learning algorithms for multi-class seizure type classification. We used the recently released TUH EEG seizure corpus (V1.4.0 and V1.5.2) and conducted a thorough search space exploration to evaluate the performance of a combination of various pre-processing techniques, machine learning algorithms, and corresponding hyperparameters on this task. We show that our algorithms can reach a weighted $F1$ score of up to 0.901 for seizure-wise cross validation and 0.561 for patient-wise cross validation thereby setting a benchmark for scalp EEG based multi-class seizure type classification.
CVNov 19, 2018
Fast Efficient Object Detection Using Selective AttentionShivanthan Yohanandan, Andy Song, Adrian G. Dyer et al.
Retraction due to significant oversight
SPJan 30, 2018
ChronoNet: A Deep Recurrent Neural Network for Abnormal EEG IdentificationSubhrajit Roy, Isabell Kiral-Kornek, Stefan Harrer
Brain-related disorders such as epilepsy can be diagnosed by analyzing electroencephalograms (EEG). However, manual analysis of EEG data requires highly trained clinicians, and is a procedure that is known to have relatively low inter-rater agreement (IRA). Moreover, the volume of the data and the rate at which new data becomes available make manual interpretation a time-consuming, resource-hungry, and expensive process. In contrast, automated analysis of EEG data offers the potential to improve the quality of patient care by shortening the time to diagnosis and reducing manual error. In this paper, we focus on one of the first steps in interpreting an EEG session - identifying whether the brain activity is abnormal or normal. To solve this task, we propose a novel recurrent neural network (RNN) architecture termed ChronoNet which is inspired by recent developments from the field of image classification and designed to work efficiently with EEG data. ChronoNet is formed by stacking multiple 1D convolution layers followed by deep gated recurrent unit (GRU) layers where each 1D convolution layer uses multiple filters of exponentially varying lengths and the stacked GRU layers are densely connected in a feed-forward manner. We used the recently released TUH Abnormal EEG Corpus dataset for evaluating the performance of ChronoNet. Unlike previous studies using this dataset, ChronoNet directly takes time-series EEG as input and learns meaningful representations of brain activity patterns. ChronoNet outperforms the previously reported best results by 7.79% thereby setting a new benchmark for this dataset. Furthermore, we demonstrate the domain-independent nature of ChronoNet by successfully applying it to classify speech commands.
NEApr 19, 2016
An Online Structural Plasticity Rule for Generating Better ReservoirsSubhrajit Roy, Arindam Basu
In this article, a novel neuro-inspired low-resolution online unsupervised learning rule is proposed to train the reservoir or liquid of Liquid State Machine. The liquid is a sparsely interconnected huge recurrent network of spiking neurons. The proposed learning rule is inspired from structural plasticity and trains the liquid through formation and elimination of synaptic connections. Hence, the learning involves rewiring of the reservoir connections similar to structural plasticity observed in biological neural networks. The network connections can be stored as a connection matrix and updated in memory by using Address Event Representation (AER) protocols which are generally employed in neuromorphic systems. On investigating the 'pairwise separation property' we find that trained liquids provide 1.36 $\pm$ 0.18 times more inter-class separation while retaining similar intra-class separation as compared to random liquids. Moreover, analysis of the 'linear separation property' reveals that trained liquids are 2.05 $\pm$ 0.27 times better than random liquids. Furthermore, we show that our liquids are able to retain the 'generalization' ability and 'generality' of random liquids. A memory analysis shows that trained liquids have 83.67 $\pm$ 5.79 ms longer fading memory than random liquids which have shown 92.8 $\pm$ 5.03 ms fading memory for a particular type of spike train inputs. We also throw some light on the dynamics of the evolution of recurrent connections within the liquid. Moreover, compared to 'Separation Driven Synaptic Modification' - a recently proposed algorithm for iteratively refining reservoirs, our learning rule provides 9.30%, 15.21% and 12.52% more liquid separations and 2.8%, 9.1% and 7.9% better classification accuracies for four, eight and twelve class pattern recognition tasks respectively.
NEDec 4, 2015
An Online Unsupervised Structural Plasticity Algorithm for Spiking Neural NetworksSubhrajit Roy, Arindam Basu
In this article, we propose a novel Winner-Take-All (WTA) architecture employing neurons with nonlinear dendrites and an online unsupervised structural plasticity rule for training it. Further, to aid hardware implementations, our network employs only binary synapses. The proposed learning rule is inspired by spike time dependent plasticity (STDP) but differs for each dendrite based on its activation level. It trains the WTA network through formation and elimination of connections between inputs and synapses. To demonstrate the performance of the proposed network and learning rule, we employ it to solve two, four and six class classification of random Poisson spike time inputs. The results indicate that by proper tuning of the inhibitory time constant of the WTA, a trade-off between specificity and sensitivity of the network can be achieved. We use the inhibitory time constant to set the number of subpatterns per pattern we want to detect. We show that while the percentage of successful trials are 92%, 88% and 82% for two, four and six class classification when no pattern subdivisions are made, it increases to 100% when each pattern is subdivided into 5 or 10 subpatterns. However, the former scenario of no pattern subdivision is more jitter resilient than the later ones.
NEJun 17, 2015
Learning Spike time codes through Morphological Learning with Binary SynapsesSubhrajit Roy, Phyo Phyo San, Shaista Hussain et al.
In this paper, a neuron with nonlinear dendrites (NNLD) and binary synapses that is able to learn temporal features of spike input patterns is considered. Since binary synapses are considered, learning happens through formation and elimination of connections between the inputs and the dendritic branches to modify the structure or "morphology" of the NNLD. A morphological learning algorithm inspired by the 'Tempotron', i.e., a recently proposed temporal learning algorithm-is presented in this work. Unlike 'Tempotron', the proposed learning rule uses a technique to automatically adapt the NNLD threshold during training. Experimental results indicate that our NNLD with 1-bit synapses can obtain similar accuracy as a traditional Tempotron with 4-bit synapses in classifying single spike random latency and pair-wise synchrony patterns. Hence, the proposed method is better suited for robust hardware implementation in the presence of statistical variations. We also present results of applying this rule to real life spike classification problems from the field of tactile sensing.
ETNov 20, 2014
Liquid State Machine with Dendritically Enhanced Readout for Low-power, Neuromorphic VLSI ImplementationsSubhrajit Roy, Amitava Banerjee, Arindam Basu
In this paper, we describe a new neuro-inspired, hardware-friendly readout stage for the liquid state machine (LSM), a popular model for reservoir computing. Compared to the parallel perceptron architecture trained by the p-delta algorithm, which is the state of the art in terms of performance of readout stages, our readout architecture and learning algorithm can attain better performance with significantly less synaptic resources making it attractive for VLSI implementation. Inspired by the nonlinear properties of dendrites in biological neurons, our readout stage incorporates neurons having multiple dendrites with a lumped nonlinearity. The number of synaptic connections on each branch is significantly lower than the total number of connections from the liquid neurons and the learning algorithm tries to find the best 'combination' of input connections on each branch to reduce the error. Hence, the learning involves network rewiring (NRW) of the readout network similar to structural plasticity observed in its biological counterparts. We show that compared to a single perceptron using analog weights, this architecture for the readout can attain, even by using the same number of binary valued synapses, up to 3.3 times less error for a two-class spike train classification problem and 2.4 times less error for an input rate approximation task. Even with 60 times larger synapses, a group of 60 parallel perceptrons cannot attain the performance of the proposed dendritically enhanced readout. An additional advantage of this method for hardware implementations is that the 'choice' of connectivity can be easily implemented exploiting address event representation (AER) protocols commonly used in current neuromorphic systems where the connection matrix is stored in memory. Also, due to the use of binary synapses, our proposed method is more robust against statistical variations.