Blagoj Mitrevski

h-index: 3

Papers: 4

Citations: 23

Novelty: 57%

AI Score: 43

Ranked #52,308 of 194,257 authors (top 27%); #18,205 in CV (top 31%)

4 Papers

3.6 | CV | Oct 31, 2025 | Code

Sketch-to-Layout: Sketch-Guided Multimodal Layout Generation

Riccardo Brioschi, Aleksandr Alekseev, Emanuele Nevali et al.

Graphic layout generation is a growing research area focusing on generating aesthetically pleasing layouts ranging from poster designs to documents. While recent research has explored ways to incorporate user constraints to guide the layout generation, these constraints often require complex specifications which reduce usability. We introduce an innovative approach exploiting user-provided sketches as intuitive constraints and we demonstrate empirically the effectiveness of this new guidance method, establishing the sketch-to-layout problem as a promising research direction, which is currently under-explored. To tackle the sketch-to-layout problem, we propose a multimodal transformer-based solution using the sketch and the content assets as inputs to produce high quality layouts. Since collecting sketch training data from human annotators to train our model is very costly, we introduce a novel and efficient method to synthetically generate training sketches at scale. We train and evaluate our model on three publicly available datasets: PubLayNet, DocLayNet and SlidesVQA, demonstrating that it outperforms state-of-the-art constraint-based methods, while offering a more intuitive design experience. In order to facilitate future sketch-to-layout research, we release O(200k) synthetically-generated sketches for the public datasets above. The datasets are available at https://github.com/google-deepmind/sketch_to_layout.

3.7 | CV | Feb 8, 2024 | Code

InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write

Blagoj Mitrevski, Arina Rak, Julian Schnitzler et al.

Digital note-taking is gaining popularity, offering a durable, editable, and easily indexable way of storing notes in a vectorized form, known as digital ink. However, a substantial gap remains between this way of note-taking and traditional pen-and-paper note-taking, a practice that is still favored by a vast majority. Our work, InkSight, aims to bridge the gap by empowering physical note-takers to effortlessly convert their work (offline handwriting) to digital ink (online handwriting), a process we refer to as derendering. Prior research on the topic has focused on the geometric properties of images, resulting in limited generalization beyond their training domains. Our approach combines reading and writing priors, allowing training a model in the absence of large amounts of paired samples, which are difficult to obtain. To our knowledge, this is the first work that effectively derenders handwritten text in arbitrary photos with diverse visual characteristics and backgrounds. Furthermore, it generalizes beyond its training domain into simple sketches. Our human evaluation reveals that 87% of the samples produced by our model on the challenging HierText dataset are considered a valid tracing of the input image and 67% look like a pen trajectory traced by a human.

5.4 | IR | Sep 19, 2020

Modeling Online Behavior in Recommender Systems: The Importance of Temporal Context

Milena Filipovic, Blagoj Mitrevski, Diego Antognini et al.

Recommender systems research tends to evaluate model performance offline and on randomly sampled targets, yet the same systems are later used to predict user behavior sequentially from a fixed point in time. Simulating online recommender system performance is notoriously difficult and the discrepancy between online and offline behaviors is typically not accounted for in offline evaluations. This disparity permits weaknesses to go unnoticed until the model is deployed in a production setting. In this paper, we first demonstrate how omitting temporal context when evaluating recommender system performance leads to false confidence. To overcome this, we postulate that offline evaluation protocols can only model real-life use-cases if they account for temporal context. Next, we propose a training procedure to further embed the temporal context in existing models. We use a multi-objective approach to introduce temporal context into traditionally time-unaware recommender systems and confirm its advantage via the proposed evaluation protocol. Finally, we validate that the Pareto Fronts obtained with the added objective dominate those produced by state-of-the-art models that are only optimized for accuracy on three real-world publicly available datasets. The results show that including our temporal objective can improve recall@20 by up to 20%.
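The contrast between random and time-aware evaluation described above can be illustrated with a small sketch. The abstract does not spell out the exact protocol, so the single-cutoff split, the `(user, item, timestamp)` tuple layout, and the function names `temporal_split` and `recall_at_k` are all assumptions made for illustration, not the authors' implementation.

```python
def temporal_split(interactions, cutoff):
    """Split (user, item, timestamp) tuples at a fixed point in time.

    Assumed protocol: everything before `cutoff` is training data,
    everything at or after it is held out as future behavior to predict.
    """
    train = [x for x in interactions if x[2] < cutoff]
    test = [x for x in interactions if x[2] >= cutoff]
    return train, test


def recall_at_k(ranked_items, relevant_items, k=20):
    """Fraction of the relevant items that appear in the top-k ranking."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits = len(set(ranked_items[:k]) & relevant)
    return hits / len(relevant)


# Hypothetical interaction log for one user.
interactions = [("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 5), ("u1", "d", 6)]
train, test = temporal_split(interactions, cutoff=5)
future_items = {item for _, item, _ in test}
score = recall_at_k(["c", "x", "d"], future_items, k=2)
```

A random split would instead let the model see interactions that occur after some of its targets, which is the source of the false confidence the abstract warns about.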

4.2 | LG | Sep 10, 2020

Momentum-based Gradient Methods in Multi-Objective Recommendation

Blagoj Mitrevski, Milena Filipovic, Diego Antognini et al.

Multi-objective gradient methods are becoming the standard for solving multi-objective problems. Among others, they show promising results in developing multi-objective recommender systems with both correlated and conflicting objectives. Classic multi-gradient descent usually relies on the combination of the gradients, not including the computation of first and second moments of the gradients. This leads to brittle behavior and misses important areas in the solution space. In this work, we create a multi-objective model-agnostic Adamize method that leverages the benefits of the Adam optimizer in single-objective problems. This corrects and stabilizes the gradients of every objective before calculating a common gradient descent vector that optimizes all the objectives simultaneously. We evaluate the benefits of Multi-objective Adamize on two multi-objective recommender systems and for three different objective combinations, both correlated and conflicting. We report significant improvements, measured with three different Pareto front metrics: hypervolume, coverage, and spacing. Finally, we show that the Adamized Pareto front strictly dominates the previous one on multiple objective pairs.
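The idea of stabilizing each objective's gradient before combining them can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `AdamizedGradient` is hypothetical, the moment estimates follow the standard Adam update with its usual default hyperparameters, and the common descent vector is assumed to be the closed-form minimum-norm convex combination for two objectives, since the abstract does not say how it is computed.

```python
import math


class AdamizedGradient:
    """Keeps Adam-style first and second moments for one objective (hypothetical helper)."""

    def __init__(self, dim, beta1=0.9, beta2=0.999, eps=1e-8):
        self.m = [0.0] * dim  # first-moment (mean) estimate
        self.v = [0.0] * dim  # second-moment (uncentered variance) estimate
        self.beta1, self.beta2, self.eps = beta1, beta2, eps
        self.t = 0

    def step(self, grad):
        """Return the bias-corrected, variance-normalized gradient."""
        self.t += 1
        b1, b2 = self.beta1, self.beta2
        self.m = [b1 * m + (1 - b1) * g for m, g in zip(self.m, grad)]
        self.v = [b2 * v + (1 - b2) * g * g for v, g in zip(self.v, grad)]
        m_hat = [m / (1 - b1 ** self.t) for m in self.m]
        v_hat = [v / (1 - b2 ** self.t) for v in self.v]
        return [m / (math.sqrt(v) + self.eps) for m, v in zip(m_hat, v_hat)]


def min_norm_combination(g1, g2):
    """Assumed combiner: min-norm point on the segment between g1 and g2."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return list(g1)  # identical gradients, nothing to balance
    alpha = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    alpha = min(1.0, max(0.0, alpha))  # keep the combination convex
    return [alpha * a + (1 - alpha) * b for a, b in zip(g1, g2)]


# Two objectives with raw gradients of very different scale.
obj1, obj2 = AdamizedGradient(dim=2), AdamizedGradient(dim=2)
g1 = obj1.step([2.0, -4.0])
g2 = obj2.step([0.5, 0.5])
direction = min_norm_combination(g1, g2)
```

On the first step the normalized gradients are close to the sign of the raw ones, so neither objective dominates the combination purely because of its scale; the parameters would then be moved against `direction` by a learning rate of the caller's choosing.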