SEMay 28
AI-PROPELLER: Warehouse-Scale Interprocedural Code Layout Optimization with AlphaEvolveChaitanya Mamatha Ananda, Rajiv Gupta, Mircea Trofin et al.
Post-link optimizers (PLOs) such as Propeller and BOLT have demonstrated that precise, profile-guided code layout can extract significant performance gains from heavily optimized binaries. However, these systems are currently restricted to intraprocedural techniques, leaving the global potential of interprocedural layout largely untapped. Interprocedural code layout is historically difficult due to a combinatorially intractable search space and complex call-return semantics that are challenging to model. Consequently, the performance potential of fine-grained interprocedural layout remains unproven in practice. AI-PROPELLER uses Magellan, an agentic workflow that evolves the compiler heuristic in Propeller into a fine-grained interprocedural optimizer and fine-tunes the resulting policy hyperparameters. To ensure high-fidelity, we move away from approximate static cost models and the agentic workflow generates multiple layout variants that are executed on actual hardware to measure real performance counters, providing a precise reward signal for the evolutionary loop. AI-PROPELLER has been evaluated on several benchmarks including large warehouse-scale applications and experiments show performance improvements of 0.23% to 1.6% optimized with state-of-the-art FDO and PLO which is significant for real-world binaries. This is the first time ever that large warehouse-scale applications in industrial settings have been optimized with fine-grained interprocedural code layout.
AIMar 17, 2025
The Amazon Nova Family of Models: Technical Report and Model CardAmazon AGI, Aaron Langford, Aayush Shah et al. · amazon-science
We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.
SEMar 31
SemLoc: Structured Grounding of Free-Form LLM Reasoning for Fault LocalizationZhaorui Yang, Haichao Zhu, Qian Zhang et al.
Fault localization identifies program locations responsible for observed failures. Existing techniques rank suspicious code using syntactic spectra--signals derived from execution structure such as statement coverage, control-flow divergence, or dependency reachability. These signals collapse for semantic bugs, where failing and passing executions follow identical code paths and differ only in whether semantic intent is satisfied. Recent LLM-based approaches introduce semantic reasoning but produce stochastic, unverifiable outputs that cannot be systematically cross-referenced across tests or distinguish root causes from cascading effects. We present SemLoc, a fault localization framework based on structured semantic grounding. SemLoc converts free-form LLM reasoning into a closed intermediate representation that binds each inferred property to a typed program anchor, enabling runtime checking and attribution to program structure. It executes instrumented programs to construct a semantic violation spectrum--a constraint-by-test matrix--from which suspiciousness scores are derived analogously to coverage-based methods. A counterfactual verification step further prunes over-approximate constraints and isolates primary causal violations. We evaluate SemLoc on SemFault-250, a corpus of 250 Python programs with single semantic faults. SemLoc outperforms five coverage-, reduction-, and LLM-based baselines, achieving Top-1 accuracy of 42.8% and Top-3 of 68%, while reducing inspection to 7.6% of executable lines. Counterfactual verification provides an additional 12% accuracy gain and identifies primary causal semantic constraints.
CVFeb 13, 2025
Noise Controlled CT Super-Resolution with Conditional Diffusion ModelYuang Wang, Siyeop Yoon, Rui Hu et al.
Improving the spatial resolution of CT images is a meaningful yet challenging task, often accompanied by the issue of noise amplification. This article introduces an innovative framework for noise-controlled CT super-resolution utilizing the conditional diffusion model. The model is trained on hybrid datasets, combining noise-matched simulation data with segmented details from real data. Experimental results with real CT images validate the effectiveness of our proposed framework, showing its potential for practical applications in CT imaging.
LGFeb 1, 2022
Generalizability of Machine Learning Models: Quantitative Evaluation of Three Methodological PitfallsFarhad Maleki, Katie Ovens, Rajiv Gupta et al.
Purpose: Despite the potential of machine learning models, the lack of generalizability has hindered their widespread adoption in clinical practice. We investigate three methodological pitfalls: (1) violation of independence assumption, (2) model evaluation with an inappropriate performance indicator or baseline for comparison, and (3) batch effect. Materials and Methods: Using several retrospective datasets, we implement machine learning models with and without the pitfalls to quantitatively illustrate these pitfalls' effect on model generalizability. Results: Violation of independence assumption, more specifically, applying oversampling, feature selection, and data augmentation before splitting data into train, validation, and test sets, respectively, led to misleading and superficial gains in F1 scores of 71.2% in predicting local recurrence and 5.0% in predicting 3-year overall survival in head and neck cancer as well as 46.0% in distinguishing histopathological patterns in lung cancer. Further, randomly distributing data points for a subject across training, validation, and test sets led to a 21.8% superficial increase in F1 score. Also, we showed the importance of the choice of performance measures and baseline for comparison. In the presence of batch effect, a model built for pneumonia detection led to F1 score of 98.7%. However, when the same model was applied to a new dataset of normal patients, it only correctly classified 3.86% of the samples. Conclusions: These methodological pitfalls cannot be captured using internal model evaluation, and the inaccurate predictions made by such models may lead to wrong conclusions and interpretations. Therefore, understanding and avoiding these pitfalls is necessary for developing generalizable models.
QMNov 19, 2021
Urine Microscopic Image DatasetDipam Goswami, Hari Om Aggrawal, Rajiv Gupta et al.
Urinalysis is a standard diagnostic test to detect urinary system related problems. The automation of urinalysis will reduce the overall diagnostic time. Recent studies used urine microscopic datasets for designing deep learning based algorithms to classify and detect urine cells. But these datasets are not publicly available for further research. To alleviate the need for urine datsets, we prepare our urine sediment microscopic image (UMID) dataset comprising of around 3700 cell annotations and 3 categories of cells namely RBC, pus and epithelial cells. We discuss the several challenges involved in preparing the dataset and the annotations. We make the dataset publicly available.
CRMar 19, 2020
Apps Gone Rogue: Maintaining Personal Privacy in an EpidemicRamesh Raskar, Isabel Schunemann, Rachel Barbar et al.
Containment, the key strategy in quickly halting an epidemic, requires rapid identification and quarantine of the infected individuals, determination of whom they have had close contact with in the previous days and weeks, and decontamination of locations the infected individual has visited. Achieving containment demands accurate and timely collection of the infected individual's location and contact history. Traditionally, this process is labor intensive, susceptible to memory errors, and fraught with privacy concerns. With the recent almost ubiquitous availability of smart phones, many people carry a tool which can be utilized to quickly identify an infected individual's contacts during an epidemic, such as the current 2019 novel Coronavirus crisis. Unfortunately, the very same first-generation contact tracing tools have been used to expand mass surveillance, limit individual freedoms and expose the most private details about individuals. We seek to outline the different technological approaches to mobile-phone based contact-tracing to date and elaborate on the opportunities and the risks that these technologies pose to individuals and societies. We describe advanced security enhancing approaches that can mitigate these risks and describe trade-offs one must make when developing and deploying any mass contact-tracing technology. With this paper, our aim is to continue to grow the conversation regarding contact-tracing for epidemic and pandemic containment and discuss opportunities to advance this space. We invite feedback and discussion.
LGDec 27, 2019
Split Learning for collaborative deep learning in healthcareMaarten G. Poirot, Praneeth Vepakomma, Ken Chang et al.
Shortage of labeled data has been holding the surge of deep learning in healthcare back, as sample sizes are often small, patient information cannot be shared openly, and multi-center collaborative studies are a burden to set up. Distributed machine learning methods promise to mitigate these problems. We argue for a split learning based approach and apply this distributed learning method for the first time in the medical field to compare performance against (1) centrally hosted and (2) non collaborative configurations for a range of participants. Two medical deep learning tasks are used to compare split learning to conventional single and multi center approaches: a binary classification problem of a data set of 9000 fundus photos, and multi-label classification problem of a data set of 156,535 chest X-rays. The several distributed learning setups are compared for a range of 1-50 distributed participants. Performance of the split learning configuration remained constant for any number of clients compared to a single center study, showing a marked difference compared to the non collaborative configuration after 2 clients (p < 0.001) for both sets. Our results affirm the benefits of collaborative training of deep neural networks in health care. Our work proves the significant benefit of distributed learning in healthcare, and paves the way for future real-world implementations.