Francisco Romero

DC
h-index: 8
7 papers
65 citations
Novelty: 61%
AI Score: 50

7 Papers

NA Oct 1, 2016
Mathematical Analysis of Ultrafast Ultrasound Imaging

Giovanni S. Alberti, Habib Ammari, Francisco Romero et al.

This paper provides a mathematical analysis of ultrafast ultrasound imaging. This newly emerging modality for biomedical imaging uses plane waves instead of focused waves in order to achieve very high frame rates. We derive the point spread function of the system in the Born approximation for wave propagation and study its properties. We consider dynamic data for blood flow imaging, and introduce a suitable random model for blood cells. We show that a singular value decomposition method can successfully remove the clutter signal by using the different spatial coherence of tissue and blood signals, thereby providing high-resolution images of blood vessels, even in cases when the clutter and blood speeds are comparable in magnitude. Several numerical simulations are presented to illustrate and validate the approach.
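The singular value decomposition step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `svd_clutter_filter`, the parameter `n_clutter`, and the synthetic data are assumptions, and the sketch assumes that clutter is carried by the largest singular values.

```python
import numpy as np

def svd_clutter_filter(frames, n_clutter):
    """Zero the n_clutter largest singular components of a frame stack.

    frames: array of shape (n_frames, height, width).
    """
    n_frames, h, w = frames.shape
    # Space-time (Casorati) matrix: one row per pixel, one column per frame.
    casorati = frames.reshape(n_frames, h * w).T
    u, s, vt = np.linalg.svd(casorati, full_matrices=False)
    s_filtered = s.copy()
    # Assumption: the spatially coherent clutter dominates the top singular values.
    s_filtered[:n_clutter] = 0.0
    filtered = (u * s_filtered) @ vt
    return filtered.T.reshape(n_frames, h, w)

# Synthetic example (illustrative values only).
rng = np.random.default_rng(0)
tissue = 100.0 * np.ones((20, 8, 8))      # strong, fully coherent "clutter"
blood = rng.standard_normal((20, 8, 8))   # weak, incoherent "blood" signal
frames = tissue + blood
filtered = svd_clutter_filter(frames, n_clutter=1)
```

In this toy setting the tissue term is exactly rank one, so removing a single component leaves a residual no larger in norm than the blood term. Real data would need a data-driven choice of `n_clutter`.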

NA Nov 30, 2018
Dynamic Spike Super-resolution and Applications to Ultrafast Ultrasound Imaging

Giovanni S. Alberti, Habib Ammari, Francisco Romero et al.

We consider the dynamical super-resolution problem, which consists of recovering the positions and velocities of moving particles from low-frequency static measurements taken over multiple time steps. The standard approach to this issue is a two-step process: first, at each time step some static reconstruction method is applied to locate the positions of the particles with super-resolution and, second, some tracking technique is applied to obtain the velocities. In this paper we propose a fully dynamical method based on a phase-space lifting of the positions and the velocities of the particles, which are simultaneously reconstructed with super-resolution. We provide a rigorous mathematical analysis of the recovery problem, both for the noiseless case and in the presence of noise (in the discrete setting). Several numerical simulations illustrate and validate our method, which shows some advantage over existing techniques. We then discuss the application of this approach to the dynamical super-resolution problem in ultrafast ultrasound imaging: blood vessels' locations and blood flow velocities are recovered with super-resolution.

87.4 DB Mar 18 Code
Halo: Domain-Aware Query Optimization for Long-Context Question Answering

Pramod Chunduri, Francisco Romero, Ali Payani et al.

Long-context question answering (QA) over lengthy documents is critical for applications such as financial analysis, legal review, and scientific research. Current approaches, such as processing entire documents via a single LLM call or retrieving relevant chunks via RAG, have two drawbacks. First, as context size increases, response quality can degrade, impacting accuracy. Second, iteratively processing hundreds of input documents can incur prohibitively high costs in API calls. To improve response quality and reduce the number of iterations needed to get the desired response, users tend to add domain knowledge to their prompts. However, existing systems fail to systematically capture and use this knowledge to guide query processing. Domain knowledge is treated as prompt tokens alongside the document: the LLM may or may not follow it, there is no reduction in computational cost, and when outputs are incorrect, users must manually iterate. We present Halo, a long-context QA framework that automatically extracts domain knowledge from user prompts and applies it as executable operators across a multi-stage query execution pipeline. Halo identifies three common forms of domain knowledge - where in the document to look, what content to ignore, and how to verify the answer - and applies each at the pipeline stage where it is most effective: pruning the document before chunk selection, filtering irrelevant chunks before inference, and ranking candidate responses after generation. To handle imprecise or invalid domain knowledge, Halo includes a fallback mechanism that detects low-quality operators at runtime and selectively disables them. Our evaluation across finance, literature, and scientific datasets shows that Halo achieves up to 13% higher accuracy and 4.8x lower cost compared to baselines, and enables a lightweight open-source model to approach frontier LLM accuracy at 78x lower cost.
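The three operator stages named in the abstract (prune, filter, rank) can be pictured with a toy pipeline. This is only a sketch under stated assumptions: `run_pipeline`, the operator functions, the chunk layout, and the stubbed `answer_fn` are hypothetical and do not reflect Halo's actual API, and the fallback mechanism is omitted.

```python
def run_pipeline(chunks, where, ignore, verify, answer_fn):
    """Apply domain-knowledge operators at three stages of a QA pipeline."""
    # Stage 1: "where to look" prunes the document before chunk selection.
    pruned = [c for c in chunks if where(c)]
    # Stage 2: "what to ignore" filters irrelevant chunks before inference.
    kept = [c for c in pruned if not ignore(c)]
    # Stage 3: inference, stubbed here by answer_fn instead of an LLM call.
    candidates = [answer_fn(c) for c in kept]
    # Stage 4: "how to verify" scores candidates; best-scoring first.
    return sorted(candidates, key=verify, reverse=True)

# Hypothetical operators for a made-up financial question.
def where(chunk):
    return chunk["section"] == "Income Statement"

def ignore(chunk):
    return "Forward-looking" in chunk["text"]

def verify(answer):
    return any(ch.isdigit() for ch in answer)  # prefer answers with a figure

def answer_fn(chunk):
    return chunk["text"]  # stand-in for model inference

chunks = [
    {"section": "Income Statement", "text": "Revenue was 120"},
    {"section": "Income Statement", "text": "Forward-looking revenue may reach 200"},
    {"section": "Risk Factors", "text": "Revenue may decline"},
]
ranked = run_pipeline(chunks, where, ignore, verify, answer_fn)
```

Only the first chunk survives both pruning and filtering, so the stubbed model is called once instead of three times, which is the cost effect the abstract attributes to applying knowledge as operators rather than as prompt text.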

DC Nov 14, 2025
Flash-Fusion: Enabling Expressive, Low-Latency Queries on IoT Sensor Streams with LLMs

Kausar Patherya, Ashutosh Dhekne, Francisco Romero

Smart cities and pervasive IoT deployments have generated interest in IoT data analysis across transportation and urban planning. At the same time, Large Language Models offer a new interface for exploring IoT data - particularly through natural language. Users today face two key challenges when working with IoT data using LLMs: (1) data collection infrastructure is expensive, producing terabytes of low-level sensor readings that are too granular for direct use, and (2) data analysis is slow, requiring iterative effort and technical expertise. Directly feeding all IoT telemetry to LLMs is impractical due to finite context windows, prohibitive token costs at scale, and non-interactive latencies. What is missing is a system that first parses a user's query to identify the analytical task, then selects the relevant data slices, and finally chooses the right representation before invoking an LLM. We present Flash-Fusion, an end-to-end edge-cloud system that reduces the IoT data collection and analysis burden on users. Two principles guide its design: (1) edge-based statistical summarization (achieving 73.5% data reduction) to address data volume, and (2) cloud-based query planning that clusters behavioral data and assembles context-rich prompts to address data interpretation. We deploy Flash-Fusion on a university bus fleet and evaluate it against a baseline that feeds raw data to a state-of-the-art LLM. Flash-Fusion achieves a 95% latency reduction and 98% decrease in token usage and cost while maintaining high-quality responses. It enables personas across disciplines - safety officers, urban planners, fleet managers, and data scientists - to efficiently iterate over IoT data without the burden of manual query authoring or preprocessing.
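Edge-based statistical summarization, the first design principle above, might look like the following in its simplest form. The function name `summarize_windows`, the fixed window size, and the choice of (min, mean, max) are assumptions made for illustration; the abstract does not specify which statistics Flash-Fusion computes.

```python
from statistics import mean

def summarize_windows(readings, window):
    """Collapse fixed-size windows of raw readings into (min, mean, max) tuples."""
    summaries = []
    for start in range(0, len(readings), window):
        chunk = readings[start:start + window]
        summaries.append((min(chunk), mean(chunk), max(chunk)))
    return summaries

# Made-up speed readings from a single sensor.
speeds = [10.0, 12.0, 11.0, 13.0, 30.0, 31.0, 29.0, 30.0]
summary = summarize_windows(speeds, window=4)
```

Here eight raw readings become two summary tuples. The reduction ratio depends entirely on the window size and the statistics kept, so this toy example says nothing about the figures reported in the abstract.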

91.0 DC Apr 7
CoStream: Codec-Guided Resource-Efficient System for Video Streaming Analytics

Yulin Zou, Yan Chen, Wenyan Chen et al.

Video streaming analytics is a crucial workload for vision-language model serving, but the high cost of multimodal inference limits scalability. Prior systems reduce inference cost by exploiting temporal and spatial redundancy in video streams, but they target either the vision transformer (ViT) or the LLM with a limited view, leaving end-to-end opportunities untapped. Moreover, existing methods incur significant overhead to identify redundancy, either through offline profiling and training or costly online computation, making them ill-suited for dynamic real-time streams. We present CoStream, a codec-guided streaming video analytics system built on a key observation that video codecs already extract the temporal and spatial structure of each stream as a byproduct of compression. CoStream treats this codec metadata as a low-cost runtime signal to unify optimization across video decoding, visual processing, and LLM prefilling, with transmission reduction as an inherent benefit of operating directly on compressed bitstreams. This drives codec-guided patch pruning before ViT encoding and selective key-value cache refresh during LLM prefilling, both of which are fully online and do not require offline training. Experiments show that CoStream achieves up to 3x throughput improvement and up to 87% GPU compute reduction over state-of-the-art baselines, while maintaining competitive accuracy with only 0-8% F1 drop.

DC May 30, 2019
INFaaS: A Model-less and Managed Inference Serving System

Francisco Romero, Qian Li, Neeraja J. Yadwadkar et al.

Despite existing work in machine learning inference serving, ease-of-use and cost efficiency remain challenges at large scales. Developers must manually search through thousands of model-variants -- versions of already-trained models that differ in hardware, resource footprints, latencies, costs, and accuracies -- to meet the diverse application requirements. Since requirements, query load, and applications themselves evolve over time, these decisions need to be made dynamically for each inference query to avoid excessive costs through naive autoscaling. To avoid navigating through the large and complex trade-off space of model-variants, developers often fix a variant across queries, and replicate it when load increases. However, given the diversity across variants and hardware platforms in the cloud, a lack of understanding of the trade-off space can incur significant costs to developers. This paper introduces INFaaS, a managed and model-less system for distributed inference serving, where developers simply specify the performance and accuracy requirements for their applications without needing to specify a specific model-variant for each query. INFaaS generates model-variants, and efficiently navigates the large trade-off space of model-variants on behalf of developers to meet application-specific objectives: (a) for each query, it selects a model, hardware architecture, and model optimizations, (b) it combines VM-level horizontal autoscaling with model-level autoscaling, where multiple, different model-variants are used to serve queries within each machine. By leveraging diverse variants and sharing hardware resources across models, INFaaS achieves 1.3x higher throughput, violates latency objectives 1.6x less often, and saves up to 21.6x in cost (8.5x on average) compared to state-of-the-art inference serving systems on AWS EC2.
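The per-query choice among model-variants can be illustrated with a small selection routine. This is a hedged sketch, not INFaaS's selection algorithm: the `Variant` fields, the `select_variant` function, the cheapest-feasible policy, and all numbers are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    name: str
    accuracy: float
    latency_ms: float
    cost_per_hour: float

def select_variant(variants, min_accuracy, max_latency_ms):
    """Return the cheapest variant meeting both requirements, or None."""
    feasible = [
        v for v in variants
        if v.accuracy >= min_accuracy and v.latency_ms <= max_latency_ms
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v.cost_per_hour)

# Hypothetical variants of one trained model (values are invented).
variants = [
    Variant("small-cpu", accuracy=0.70, latency_ms=40.0, cost_per_hour=0.10),
    Variant("large-cpu", accuracy=0.76, latency_ms=120.0, cost_per_hour=0.20),
    Variant("large-gpu", accuracy=0.76, latency_ms=15.0, cost_per_hour=0.90),
]
tight = select_variant(variants, min_accuracy=0.75, max_latency_ms=50.0)
loose = select_variant(variants, min_accuracy=0.75, max_latency_ms=200.0)
```

With a tight latency bound only the GPU variant qualifies, while a looser bound lets the cheaper CPU variant serve the same accuracy target. This is the kind of trade-off the developer no longer has to navigate by hand when only requirements are specified.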

NA Aug 15, 2016
A signal separation technique for sub-cellular imaging using dynamic optical coherence tomography

Habib Ammari, Francisco Romero, Cong Shi

This paper aims at imaging the dynamics of metabolic activity of cells. Using dynamic optical coherence tomography, we introduce a new multi-particle dynamical model to simulate the movements of the collagen and the cell metabolic activity and develop an efficient signal separation technique for sub-cellular imaging. We perform a singular-value decomposition of the dynamic optical images to isolate the intensity of the metabolic activity. We prove that the largest eigenvalue of the associated Casorati matrix corresponds to the collagen. We present several numerical simulations to illustrate and validate our approach.