18.0 | IR | May 22 | Code
RE-TRIANGLE: Does TRIANGLE Enable Multimodal Alignment Beyond Cosine Similarity in Retrieval?
Arijit Ghosh, Aritra Bandyopadhyay, Chiranjeev Bindra et al.
Multimodal alignment is critical for bridging the semantic gap in information retrieval. However, traditional pairwise strategies introduce a geometric blind spot: while they align anchor modalities (e.g., text) with others, they lack constraints to enforce mutual consistency between peripheral modalities (e.g., video and audio). The TRIANGLE framework addresses this by minimizing the area of modality triplets on a hypersphere to enforce holistic alignment. In this reproducibility study, we verify the robustness of this geometric objective for retrieval tasks. We confirm that TRIANGLE outperforms pairwise baselines in zero-shot settings, achieving Recall@1 gains of up to +8.7 points, though benefits are domain-dependent. However, we fail to reproduce the reported learning-from-scratch results. Analysis using a synthetic toy dataset attributes this to instability when jointly optimizing geometric alignment with Data-Text Matching (DTM) loss. Furthermore, we find that cosine regularization primarily stabilizes text-to-video retrieval, and fine-tuning with domain supervision amplifies geometric benefits but reduces cross-dataset generalization. Our findings support the efficacy of geometric alignment while highlighting critical optimization sensitivities. Code available at https://github.com/ARIJIT00171/RE-TRIANGLE.
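The geometric objective described above can be illustrated with a small sketch. The abstract only states that TRIANGLE minimizes the area of modality triplets on a hypersphere; the exact area definition is not given here, so the code below assumes the flat triangle whose vertices are the three L2-normalized embeddings and computes its area from a Gram determinant. The function names `normalize` and `triangle_area` are hypothetical and are not taken from the RE-TRIANGLE repository.

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def triangle_area(text, video, audio):
    """Area of the triangle spanned by three normalized embeddings.

    Assumption: area = 0.5 * sqrt(|u|^2 |w|^2 - (u.w)^2), with edge
    vectors u = video - text and w = audio - text. This is one plausible
    reading of "area of a modality triplet", not the paper's definition.
    """
    t, v, a = normalize(text), normalize(video), normalize(audio)
    u = [vi - ti for ti, vi in zip(t, v)]
    w = [ai - ti for ti, ai in zip(t, a)]
    gram = dot(u, u) * dot(w, w) - dot(u, w) ** 2
    # Clamp tiny negative values caused by floating-point rounding.
    return 0.5 * math.sqrt(max(gram, 0.0))


# Perfectly aligned modalities collapse the triangle to a point.
aligned = triangle_area([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [0.5, 1.0, 1.5])

# Mutually orthogonal modalities give a large triangle.
spread = triangle_area([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0])
```

Under this reading, the area is zero only when all three embeddings are consistent with one another, which is the constraint that a purely pairwise, anchor-based loss does not impose on the two peripheral modalities.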
87.3 | CV | Apr 7 | Code
PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
David Picard, Nicolas Dufour, Lucas Degeorge et al.
This paper introduces the Polynomial Mixer (PoM), a novel token mixing mechanism with linear complexity that serves as a drop-in replacement for self-attention. PoM aggregates input tokens into a compact representation through a learned polynomial function, from which each token retrieves contextual information. We prove that PoM satisfies the contextual mapping property, ensuring that transformers equipped with PoM remain universal sequence-to-sequence approximators. We replace standard self-attention with PoM across five diverse domains: text generation, handwritten text recognition, image generation, 3D modeling, and Earth observation. PoM matches the performance of attention-based models while drastically reducing computational cost when working with long sequences. The code is available at https://github.com/davidpicard/pom.
CV | Jun 2, 2024 | Code
EchoNet-Synthetic: Privacy-preserving Video Generation for Safe Medical Data Sharing
Hadrien Reynaud, Qingjie Meng, Mischa Dombrowski et al.
To make medical datasets accessible without sharing sensitive patient information, we introduce a novel end-to-end approach for generative de-identification of dynamic medical imaging data. Until now, generative methods have faced constraints in terms of fidelity, spatio-temporal coherence, and the length of generation, failing to capture the complete details of dataset distributions. We present a model designed to produce high-fidelity, long and complete data samples with near-real-time efficiency and explore our approach on a challenging task: generating echocardiogram videos. We develop our generation method based on diffusion models and introduce a protocol for medical video dataset anonymization. As an exemplar, we present EchoNet-Synthetic, a fully synthetic, privacy-compliant echocardiogram dataset with paired ejection fraction labels. As part of our de-identification protocol, we evaluate the quality of the generated dataset and propose to use clinical downstream tasks as a measurement on top of widely used but potentially biased image quality metrics. Experimental outcomes demonstrate that EchoNet-Synthetic achieves comparable dataset fidelity to the actual dataset, effectively supporting the ejection fraction regression task. Code, weights and dataset are available at https://github.com/HReynaud/EchoNet-Synthetic.
CO | Feb 27, 2025
About almost covering subsets of the hypercube
Arijit Ghosh, Chandrima Kayal, Soumi Nandi
Let $\mathbb{F}$ be a field, and consider the hypercube $\{ 0, 1 \}^{n}$ in $\mathbb{F}^{n}$. Sziklai and Weiner (Journal of Combinatorial Theory, Series A 2022) showed that if a polynomial $P ( X_{1}, \dots, X_{n} ) \in \mathbb{F}[ X_{1}, \dots, X_{n}]$ vanishes on every point of the hypercube $\{0,1\}^{n}$ except those with at most $r$ ones, then the degree of the polynomial is at least $n-r$. This generalizes Alon and Füredi's fundamental result (European Journal of Combinatorics 1993) about polynomials vanishing on every point of the hypercube except the origin (the point with all coordinates zero). Sziklai and Weiner proved their result using the Möbius inversion formula and the Zeilberger method for proving binomial equalities. In this short note, we show that a stronger version of Sziklai and Weiner's result can be derived directly from Alon and Füredi's result.
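To make the degree bound concrete, the following minimal sketch checks by brute force that a product of $n-r$ linear factors meets the covering condition, so the bound $n-r$ is attained over the rationals. It reads the condition as: the polynomial is zero on points with more than $r$ ones and nonzero on the rest. The witness polynomial $\prod_{j=r+1}^{n} (X_{1} + \dots + X_{n} - j)$ and the helper names are illustrative choices, not taken from the note, and the check uses integer arithmetic, so it says nothing about fields of positive characteristic.

```python
from itertools import product


def witness(point, r):
    """Evaluate prod_{j=r+1}^{n} (x_1 + ... + x_n - j) at a 0/1 point.

    The polynomial is a product of n - r linear factors, so its degree
    is at most n - r.
    """
    n = len(point)
    weight = sum(point)
    value = 1
    for j in range(r + 1, n + 1):
        value *= weight - j
    return value


def attains_bound(n, r):
    """Check the covering condition on every point of {0,1}^n."""
    for point in product((0, 1), repeat=n):
        value = witness(point, r)
        many_ones = sum(point) > r
        # Must vanish exactly on the points with more than r ones.
        if many_ones != (value == 0):
            return False
    return True


# Exhaustive check for all small cases 0 <= r <= n <= 6.
all_ok = all(attains_bound(n, r) for n in range(1, 7) for r in range(n + 1))
```

With $r = 0$ the witness is nonzero only at the origin, which is the Alon and Füredi setting with degree $n$.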
CV | Oct 29, 2025
MIRO: MultI-Reward cOnditioned pretraining improves T2I quality and efficiency
Nicolas Dufour, Lucas Degeorge, Arijit Ghosh et al.
Current text-to-image generative models are trained on large uncurated datasets to enable diverse generation capabilities. However, this does not align well with user preferences. Recently, reward models have been specifically designed to perform post-hoc selection of generated images and align them to a reward, typically user preference. This discarding of informative data, together with optimizing for a single reward, tends to harm diversity, semantic fidelity, and efficiency. Instead of this post-processing, we propose to condition the model on multiple reward models during training to let the model learn user preferences directly. We show that this not only dramatically improves the visual quality of the generated images but also significantly speeds up training. Our proposed method, called MIRO, achieves state-of-the-art performance on the GenEval compositional benchmark and user-preference scores (PickAScore, ImageReward, HPSv2).
LG | Aug 28, 2025
Dimension Agnostic Testing of Survey Data Credibility through the Lens of Regression
Debabrota Basu, Sourav Chakraborty, Debarshi Chanda et al.
Assessing whether a sample survey credibly represents the population is a critical question for ensuring the validity of downstream research. Generally, this problem reduces to estimating the distance between two high-dimensional distributions, which typically requires a number of samples that grows exponentially with the dimension. However, depending on the model used for data analysis, the conclusions drawn from the data may remain consistent across different underlying distributions. In this context, we propose a task-based approach to assess the credibility of sampled surveys. Specifically, we introduce a model-specific distance metric to quantify this notion of credibility. We also design an algorithm to verify the credibility of survey data in the context of regression models. Notably, the sample complexity of our algorithm is independent of the data dimension. This efficiency stems from the fact that the algorithm focuses on verifying the credibility of the survey data rather than reconstructing the underlying regression model. Furthermore, we show that if one attempts to verify credibility by reconstructing the regression model, the sample complexity scales linearly with the dimensionality of the data. We prove the theoretical correctness of our algorithm and numerically demonstrate our algorithm's performance.
DS | Nov 19, 2021
Uniform Brackets, Containers, and Combinatorial Macbeath Regions
Kunal Dutta, Arijit Ghosh, Shay Moran
We study the connections between three seemingly different combinatorial structures - "uniform" brackets in statistics and probability theory, "containers" in online and distributed learning theory, and "combinatorial Macbeath regions", or Mnets in discrete and computational geometry. We show that these three concepts are manifestations of a single combinatorial property that can be expressed under a unified framework along the lines of Vapnik-Chervonenkis type theory for uniform convergence. These new connections help us to bring tools from discrete and computational geometry to prove improved bounds for these objects. Our improved bounds help to get an optimal algorithm for distributed learning of halfspaces, an improved algorithm for the distributed convex set disjointness problem, and improved regret bounds for online algorithms against a smoothed adversary for a large class of semi-algebraic threshold functions.
DM | Dec 12, 2014
Size sensitive packing number for Hamming cube and its consequences
Kunal Dutta, Arijit Ghosh
We prove a size-sensitive version of Haussler's Packing lemma \cite{Haussler92spherepacking} for set-systems with bounded primal shatter dimension, which have an additional size-sensitive property. This answers a question asked by Ezra \cite{Ezra-sizesendisc-soda-14}. We also partially address another point raised by Ezra regarding overcounting of sets in her chaining procedure. As a consequence of these improvements, we get an improvement on the size-sensitive discrepancy bounds for set systems with the above property. Improved bounds on the discrepancy for these special set systems also imply an improvement in the sizes of relative $(\varepsilon, \delta)$-approximations and $(\nu, \alpha)$-samples.