Arjun Rao

h-index9

7papers

152citations

Novelty55%

AI Score53

Ranked #31,801 of 201,018 authors (top 16%)#12,846 in CV (top 22%)

7 Papers

AIFeb 24

From Logs to Language: Learning Optimal Verbalization for LLM-Based Recommendation in Production

Yucheng Shi, Ying Li, Yu Wang et al.

Large language models (LLMs) are promising backbones for generative recommender systems, yet a key challenge remains underexplored: verbalization, i.e., converting structured user interaction logs into effective natural language inputs. Existing methods rely on rigid templates that simply concatenate fields, yielding suboptimal representations for recommendation. We propose a data-centric framework that learns verbalization for LLM-based recommendation. Using reinforcement learning, a verbalization agent transforms raw interaction histories into optimized textual contexts, with recommendation accuracy as the training signal. This agent learns to filter noise, incorporate relevant metadata, and reorganize information to improve downstream predictions. Experiments on a large-scale industrial streaming dataset show that learned verbalization delivers up to 93% relative improvement in discovery item recommendation accuracy over template-based baselines. Further analysis reveals emergent strategies such as user interest summarization, noise removal, and syntax normalization, offering insights into effective context construction for LLM-based recommender systems.

CVDec 8, 2025

Persistent Homology-Guided Frequency Filtering for Image Compression

Anil Chintapalli, Peter Tenholder, Henry Chen et al.

Feature extraction in noisy image datasets presents many challenges in model reliability. In this paper, we use the discrete Fourier transform in conjunction with persistent homology analysis to extract specific frequencies that correspond with certain topological features of an image. This method allows the image to be compressed and reformed while ensuring that meaningful data can be differentiated. Our experimental results show a level of compression comparable to that of using JPEG using six different metrics. The end goal of persistent homology-guided frequency filtration is its potential to improve performance in binary classification tasks (when augmenting a Convolutional Neural Network) compared to traditional feature extraction and compression methods. These findings highlight a useful end result: enhancing the reliability of image compression under noisy conditions.

LGJan 30

Localized, High-resolution Geographic Representations with Slepian Functions

Arjun Rao, Ruth Crasto, Tessa Ooms et al.

Geographic data is fundamentally local. Disease outbreaks cluster in population centers, ecological patterns emerge along coastlines, and economic activity concentrates within country borders. Machine learning models that encode geographic location, however, distribute representational capacity uniformly across the globe, struggling at the fine-grained resolutions that localized applications require. We propose a geographic location encoder built from spherical Slepian functions that concentrate representational capacity inside a region-of-interest and scale to high resolutions without extensive computational demands. For settings requiring global context, we present a hybrid Slepian-Spherical Harmonic encoder that efficiently bridges the tradeoff between local-global performance, while retaining desirable properties such as pole-safety and spherical-surface-distance preservation. Across five tasks spanning classification, regression, and image-augmented prediction, Slepian encodings outperform baselines and retain performance advantages across a wide range of neural network architectures.

LGNov 3, 2025

Measuring the Intrinsic Dimension of Earth Representations

Arjun Rao, Marc Rußwurm, Konstantin Klemmer et al.

Within the context of representation learning for Earth observation, geographic Implicit Neural Representations (INRs) embed low-dimensional location inputs (longitude, latitude) into high-dimensional embeddings, through models trained on geo-referenced satellite, image or text data. Despite the common aim of geographic INRs to distill Earth's data into compact, learning-friendly representations, we lack an understanding of how much information is contained in these Earth representations, and where that information is concentrated. The intrinsic dimension of a dataset measures the number of degrees of freedom required to capture its local variability, regardless of the ambient high-dimensional space in which it is embedded. This work provides the first study of the intrinsic dimensionality of geographic INRs. Analyzing INRs with ambient dimension between 256 and 512, we find that their intrinsic dimensions fall roughly between 2 and 10 and are sensitive to changing spatial resolution and input modalities during INR pre-training. Furthermore, we show that the intrinsic dimension of a geographic INR correlates with downstream task performance and can capture spatial artifacts, facilitating model evaluation and diagnostics. More broadly, our work offers an architecture-agnostic, label-free metric of information content that can enable unsupervised evaluation, model selection, and pre-training design across INRs.

IRMay 28, 2025

Rethinking Hybrid Retrieval: When Small Embeddings and LLM Re-ranking Beat Bigger Models

Arjun Rao, Hanieh Alipour, Nick Pendar

This paper presents a comparison of embedding models in tri-modal hybrid retrieval for Retrieval-Augmented Generation (RAG) systems. We investigate the fusion of dense semantic, sparse lexical, and graph-based embeddings, focusing on the performance of the MiniLM-v6 and BGE-Large architectures. Contrary to conventional assumptions, our results show that the compact MiniLM-v6 outperforms the larger BGE-Large when integrated with LLM-based re-ranking within our tri-modal hybrid framework. Experiments conducted on the SciFact, FIQA, and NFCorpus datasets demonstrate significant improvements in retrieval quality with the MiniLM-v6 configuration. The performance difference is particularly pronounced in agentic re-ranking scenarios, indicating better alignment between MiniLM-v6's embedding space and LLM reasoning. Our findings suggest that embedding model selection for RAG systems should prioritize compatibility with multi-signal fusion and LLM alignment, rather than relying solely on larger models. This approach may reduce computational requirements while improving retrieval accuracy and efficiency.

CVJul 15, 2025

Using Multiple Input Modalities Can Improve Data-Efficiency and O.O.D. Generalization for ML with Satellite Imagery

Arjun Rao, Esther Rolf

A large variety of geospatial data layers is available around the world ranging from remotely-sensed raster data like satellite imagery, digital elevation models, predicted land cover maps, and human-annotated data, to data derived from environmental sensors such as air temperature or wind speed data. A large majority of machine learning models trained on satellite imagery (SatML), however, are designed primarily for optical input modalities such as multi-spectral satellite imagery. To better understand the value of using other input modalities alongside optical imagery in supervised learning settings, we generate augmented versions of SatML benchmark tasks by appending additional geographic data layers to datasets spanning classification, regression, and segmentation. Using these augmented datasets, we find that fusing additional geographic inputs with optical imagery can significantly improve SatML model performance. Benefits are largest in settings where labeled data are limited and in geographic out-of-sample settings, suggesting that multi-modal inputs may be especially valuable for data-efficiency and out-of-sample performance of SatML models. Surprisingly, we find that hard-coded fusion strategies outperform learned variants, with interesting implications for future work.

NEJul 8, 2021

A Long Short-Term Memory for AI Applications in Spike-based Neuromorphic Hardware

Philipp Plank, Arjun Rao, Andreas Wild et al.

Spike-based neuromorphic hardware holds the promise to provide more energy efficient implementations of Deep Neural Networks (DNNs) than standard hardware such as GPUs. But this requires to understand how DNNs can be emulated in an event-based sparse firing regime, since otherwise the energy-advantage gets lost. In particular, DNNs that solve sequence processing tasks typically employ Long Short-Term Memory (LSTM) units that are hard to emulate with few spikes. We show that a facet of many biological neurons, slow after-hyperpolarizing (AHP) currents after each spike, provides an efficient solution. AHP-currents can easily be implemented in neuromorphic hardware that supports multi-compartment neuron models, such as Intel's Loihi chip. Filter approximation theory explains why AHP-neurons can emulate the function of LSTM units. This yields a highly energy-efficient approach to time series classification. Furthermore it provides the basis for implementing with very sparse firing an important class of large DNNs that extract relations between words and sentences in a text in order to answer questions about the text.