Stanislav Moiseev

SE
h-index18
8papers
17citations
Novelty51%
AI Score54

8 Papers

81.4CCApr 14
Hyperplanes Avoiding Problem and Integer Points Counting in Polyhedra

Grigorii Dakhno, Dmitry Gribanov, Nikita Kasianov et al.

In our work, we consider the problem of computing a vector $x \in Z^n$ of minimum $\|\cdot\|_p$-norm such that $a^\top x \not= a_0$, for any vector $(a,a_0)$ from a given subset of $Z^n$ of size $m$. In other words, we search for a vector of minimum norm that avoids a given finite set of hyperplanes, which is natural to call as the $\textit{Hyperplanes Avoiding Problem}$. This problem naturally appears as a subproblem in Barvinok-type algorithms for counting integer points in polyhedra. We show that: 1) With respect to $\|\cdot\|_1$, the problem admits a feasible solution $x$ with $\|x\|_1 \leq (m+n)/2$, and show that such solution can be constructed by a deterministic polynomial-time algorithm with $O(n \cdot m)$ operations. Moreover, this inequality is the best possible. This is a significant improvement over the previous randomized algorithm, which computes $x$ with a guaranty $\|x\|_{1} \leq n \cdot m$. The original approach of A.~Barvinok can guarantee only $\|x\|_1 = O\bigl((n \cdot m)^n\bigr)$. To prove this result, we use a newly established algorithmic variant of the Combinatorial Nullstellensatz; 2) The problem is NP-hard with respect to any norm $\|\cdot\|_p$, for $p \in \bigl(R_{\geq 1} \cup \{\infty\}\bigr)$. 3) As an application, we show that the problem to count integer points in a polytope $P = \{x \in R^n \colon A x \leq b\}$, for given $A \in Z^{m \times n}$ and $b \in Q^m$, can be solved by an algorithm with $O\bigl(ν^2 \cdot n^3 \cdot Δ^3 \bigr)$ operations, where $ν$ is the maximum size of a normal fan triangulation of $P$, and $Δ$ is the maximum value of rank-order subdeterminants of $A$. As a further application, it provides a refined complexity bound for the counting problem in polyhedra of bounded codimension. For example, in the polyhedra of the Unbounded Subset-Sum problem.

49.0DMMay 7
Near-optimal edge partitioning via intersecting families

Alexander Yakunin, Andrey Kupavskii, Alexander Sushin et al.

We study the problem of edge partitioning, where the goal is to partition the edge set of a graph into several parts. The replication factor of a vertex $v$ is the number of parts that contain edges incident to $v$. The goal is to minimize the average replication factor of the vertices while keeping the sizes of the parts nearly equal. We study the regime where the number of parts is significantly smaller than the size of the graph. To this end, we introduce a new class of edge partitioning algorithms. These algorithms guarantee asymptotically worst-case-optimal upper bounds on the replication factor for any constant number of parts $k$, and when $k$ grows slowly with the number of vertices. In particular, we show that the optimal replication factor for growing $k$ is $\sqrt{k}(1+o(1))$. The algorithms are computationally efficient, including in the LOCAL and CONGEST models, and can be implemented as stateless streaming algorithms in graph processing frameworks. Some of the worst-case graphs are complete graphs and jumbled graphs, also known as pseudo-random graphs. Our method generalizes a family of algorithms based on symmetric intersecting families of sets. Informally, we replace the symmetry condition by a weaker balance condition that is still sufficient for the algorithms. This relaxation makes it possible to construct such families with asymptotically optimal rank $\sqrt{k}(1+o(1))$.

SEJan 26Code
TAM-Eval: Evaluating LLMs for Automated Unit Test Maintenance

Elena Bruches, Vadim Alperovich, Dari Baturova et al.

While Large Language Models (LLMs) have shown promise in software engineering, their application to unit testing remains largely confined to isolated test generation or oracle prediction, neglecting the broader challenge of test suite maintenance. We introduce TAM-Eval (Test Automated Maintenance Evaluation), a framework and benchmark designed to evaluate model performance across three core test maintenance scenarios: creation, repair, and updating of test suites. Unlike prior work limited to function-level tasks, TAM-Eval operates at the test file level, while maintaining access to full repository context during isolated evaluation, better reflecting real-world maintenance workflows. Our benchmark comprises 1,539 automatically extracted and validated scenarios from Python, Java, and Go projects. TAM-Eval supports system-agnostic evaluation of both raw LLMs and agentic workflows, using a reference-free protocol based on test suite pass rate, code coverage, and mutation testing. Empirical results indicate that state-of-the-art LLMs have limited capabilities in realistic test maintenance processes and yield only marginal improvements in test effectiveness. We release TAM-Eval as an open-source framework to support future research in automated software testing. Our data and code are publicly available at https://github.com/trndcenter/TAM-Eval.

LGJun 4, 2025Code
Leveraging Coordinate Momentum in SignSGD and Muon: Memory-Optimized Zero-Order

Egor Petrov, Grigoriy Evseev, Aleksey Antonov et al.

Fine-tuning Large Language Models (LLMs) is essential for adapting pre-trained models to downstream tasks. Yet traditional first-order optimizers such as Stochastic Gradient Descent (SGD) and Adam incur prohibitive memory and computational costs that scale poorly with model size. In this paper, we investigate zero-order (ZO) optimization methods as a memory- and compute-efficient alternative, particularly in the context of parameter-efficient fine-tuning techniques like LoRA. We propose $\texttt{JAGUAR SignSGD}$, a ZO momentum-based algorithm that extends ZO SignSGD, requiring the same number of parameters as the standard ZO SGD and only $\mathcal{O}(1)$ function evaluations per iteration. To the best of our knowledge, this is the first study to establish rigorous convergence guarantees for SignSGD in the stochastic ZO case. We further propose $\texttt{JAGUAR Muon}$, a novel ZO extension of the Muon optimizer that leverages the matrix structure of model parameters, and we provide its convergence rate under arbitrary stochastic noise. Through extensive experiments on challenging LLM fine-tuning benchmarks, we demonstrate that the proposed algorithms meet or exceed the convergence quality of standard first-order methods, achieving significant memory reduction. Our theoretical and empirical results establish new ZO optimization methods as a practical and theoretically grounded approach for resource-constrained LLM adaptation. Our code is available at https://github.com/brain-mmo-lab/ZO_LLM

SEJul 16, 2025Code
MERA Code: A Unified Framework for Evaluating Code Generation Across Tasks

Artem Chervyakov, Alexander Kharitonov, Pavel Zadorozhny et al.

Advancements in LLMs have enhanced task automation in software engineering; however, current evaluations primarily focus on natural language tasks, overlooking code quality. Most benchmarks prioritize high-level reasoning over executable code and real-world performance, leaving gaps in understanding true capabilities and risks associated with these models in production. To address this issue, we propose MERA Code, a new addition to the MERA benchmark family, specifically focused on evaluating code for the latest code generation LLMs in Russian. This benchmark includes 11 evaluation tasks that span 8 programming languages. Our proposed evaluation methodology features a taxonomy that outlines the practical coding skills necessary for models to complete these tasks. The benchmark comprises an open-source codebase for users to conduct MERA assessments, a scoring system compatible with various programming environments, and a platform featuring a leaderboard and submission system. We evaluate open LLMs and frontier API models, analyzing their limitations in terms of practical coding tasks in non-English languages. We are publicly releasing MERA to guide future research, anticipate groundbreaking features in model development, and standardize evaluation procedures.

63.7DSApr 14
Algorithms for Standard-form ILP Problems via Komlós' Discrepancy Setting

Dmitry Gribanov, Tagir Khayaleyev, Mikhail Cherniavskii et al.

We study the standard-form ILP problem $\max\{ c^\top x \colon A x = b,\; x \in Z_{\geq 0}^n \}$, where $A\in Z^{k\times n}$ has full row rank. We obtain refined FPT algorithms parameterized by $k$ and $Δ$, the maximum absolute value of a $k\times k$ minor of $A$. Our approach combines discrepancy-based dynamic programming with matrix discrepancy bounds in Komlós' setting. Let $κ_k$ denote the maximum discrepancy over all matrices with $k$ columns whose columns have Euclidean norm at most $1$. Up to polynomial factors in the input size, the optimization problem can be solved in time $O(κ_k)^{2k}Δ^2$, and the corresponding feasibility problem in time $O(κ_k)^kΔ$. Using the best currently known bound $κ_k=\widetilde O(\log^{1/4}k)$, this yields running times $O(\log k)^{\frac{k}{2}(1+o(1))}Δ^2$ and $O(\log k)^{\frac{k}{4}(1+o(1))}Δ$, respectively. Under the Komlós conjecture, the dependence on $k$ in both running times reduces to $2^{O(k)}$.

SEJan 19
RM -RF: Reward Model for Run-Free Unit Test Evaluation

Elena Bruches, Daniil Grebenkin, Mikhail Klementev et al.

We present RM-RF, a lightweight reward model for run-free evaluation of automatically generated unit tests. Instead of repeatedly compiling and executing candidate tests, RM-RF predicts - from source and test code alone - three execution-derived signals: (1) whether the augmented test suite compiles and runs successfully, (2) whether the generated test cases increase code coverage, and (3) whether the generated test cases improve the mutation kill rate. To train and evaluate RM-RF we assemble a multilingual dataset (Java, Python, Go) of focal files, test files, and candidate test additions labeled by an execution-based pipeline, and we release an associated dataset and methodology for comparative evaluation. We tested multiple model families and tuning regimes (zero-shot, full fine-tuning, and PEFT via LoRA), achieving an average F1 of 0.69 across the three targets. Compared to conventional compile-and-run instruments, RM-RF provides substantially lower latency and infrastructure cost while delivering competitive predictive fidelity, enabling fast, scalable feedback for large-scale test generation and RL-based code optimization.

SESep 12, 2025
Targeted Test Selection Approach in Continuous Integration

Pavel Plyusnin, Aleksey Antonov, Vasilii Ermakov et al.

In modern software development change-based testing plays a crucial role. However, as codebases expand and test suites grow, efficiently managing the testing process becomes increasingly challenging, especially given the high frequency of daily code commits. We propose Targeted Test Selection (T-TS), a machine learning approach for industrial test selection. Our key innovation is a data representation that represent commits as Bags-of-Words of changed files, incorporates cross-file and additional predictive features, and notably avoids the use of coverage maps. Deployed in production, T-TS was comprehensively evaluated against industry standards and recent methods using both internal and public datasets, measuring time efficiency and fault detection. On live industrial data, T-TS selects only 15% of tests, reduces execution time by $5.9\times$, accelerates the pipeline by $5.6\times$, and detects over 95% of test failures. The implementation is publicly available to support further research and practical adoption.