CV · AI · Feb 22, 2022

RuCLIP -- new models and experiments: a technical report

arXiv:2202.10784v1 · 4 citations
Originality: Synthesis-oriented
AI Analysis

This work provides improved models for Russian-language vision-language tasks, but it is incremental as it builds on existing CLIP architectures.

The authors introduce six new implementations of the ruCLIP model, trained on 240M pairs. The best of them outperform the baseline (the original CLIP model with OPUS-MT Ru-En translation) on most of 16 datasets in few-shot and zero-shot tasks.

In this report we propose six new implementations of the ruCLIP model, trained on our 240M pairs. Their accuracy is compared with that of the original CLIP model combined with Ru-En translation (OPUS-MT) on 16 datasets from different domains. Our best implementations outperform the CLIP + OPUS-MT solution on most of the datasets in few-shot and zero-shot tasks. We briefly describe the implementations and concentrate on the experiments conducted. A comparison of inference execution times is also presented.
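For readers unfamiliar with the zero-shot setting mentioned in the abstract, the following is a minimal sketch of how a CLIP-style model is typically used for zero-shot classification: the image embedding is compared by cosine similarity with one text embedding per class, and the closest class wins. This is not the authors' code. The function names `l2_normalize` and `zero_shot_predict` are hypothetical, and the 3-dimensional embeddings are hand-made toy values standing in for the outputs of real image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def zero_shot_predict(image_emb, class_text_embs):
    """Pick the class whose text embedding is closest to the image embedding.

    image_emb: list of floats (assumed output of an image encoder).
    class_text_embs: dict mapping a class label to its text embedding
        (assumed output of a text encoder for a prompt naming that class).
    Returns (best_label, scores), where scores maps label -> cosine similarity.
    """
    img = l2_normalize(image_emb)
    scores = {}
    for label, text_emb in class_text_embs.items():
        txt = l2_normalize(text_emb)
        scores[label] = sum(a * b for a, b in zip(img, txt))
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Toy embeddings (hypothetical values, for illustration only).
class_text_embs = {
    "кошка": [0.9, 0.1, 0.0],   # "cat"
    "собака": [0.1, 0.9, 0.0],  # "dog"
}
image_emb = [0.8, 0.2, 0.1]

label, scores = zero_shot_predict(image_emb, class_text_embs)
print(label, scores)
```

In the translation baseline described in the abstract, the Russian class prompts would first be translated into English before being passed to the original CLIP text encoder, whereas a Russian-language model would embed the Russian prompts directly. The scoring step sketched here is the same in both cases.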

Code Implementations: 1 repo
