Viacheslav Vyshegorodtsev

CL
h-index: 2
Papers: 4
Citations: 95
Novelty: 24%
AI Score: 30

4 Papers

CL · Jul 20, 2024
Conversational Rubert for Detecting Competitive Interruptions in ASR-Transcribed Dialogues

Dmitrii Galimzianov, Viacheslav Vyshegorodtsev

Interruption in a dialogue occurs when the listener begins their speech before the current speaker finishes speaking. Interruptions can be broadly divided into two groups: cooperative (when the listener wants to support the speaker), and competitive (when the listener tries to take control of the conversation against the speaker's will). A system that automatically classifies interruptions can be used in call centers, specifically in the tasks of customer satisfaction monitoring and agent monitoring. In this study, we developed a text-based interruption classification model by preparing an in-house dataset consisting of ASR-transcribed customer support telephone dialogues in Russian. We fine-tuned Conversational RuBERT on our dataset and optimized hyperparameters, and the model performed well. With further improvements, the proposed model can be applied to automatic monitoring systems.
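The definition above (the listener starts speaking before the current speaker finishes) suggests a simple first step before any text classification: find overlapping turns from different speakers in the timestamped ASR output. The sketch below is illustrative only. The `Turn` structure, its field names, and the overlap rule are assumptions, not the authors' preprocessing. Each returned pair would then go to the text classifier (in the paper, a fine-tuned Conversational RuBERT) to decide between cooperative and competitive, a step not shown here.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    # Hypothetical ASR turn format: speaker label, start/end time in seconds, text.
    speaker: str
    start: float
    end: float
    text: str


def candidate_interruptions(turns):
    """Return (current_turn, interrupting_turn) pairs in which a different
    speaker starts talking before the current turn has ended."""
    ordered = sorted(turns, key=lambda t: t.start)
    pairs = []
    for i, cur in enumerate(ordered):
        for nxt in ordered[i + 1:]:
            # Turns are sorted by start time, so once one starts after cur ends,
            # no later turn can overlap cur either.
            if nxt.start >= cur.end:
                break
            # Overlap within the same speaker is an ASR artifact, not an interruption.
            if nxt.speaker != cur.speaker:
                pairs.append((cur, nxt))
    return pairs
```

For example, an agent turn spanning 0.0 to 4.0 s followed by a client turn starting at 3.2 s yields one candidate pair; a later agent turn starting after the client finishes yields none.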

CL · Jul 13, 2024
Text-Based Detection of On-Hold Scripts in Contact Center Calls

Dmitrii Galimzianov, Viacheslav Vyshegorodtsev

Average hold time is a concern for call centers because it affects customer satisfaction. Contact centers should instruct their agents to use special on-hold scripts to maintain positive interactions with clients. This study presents a natural language processing model that detects on-hold phrases in customer service calls transcribed by automatic speech recognition technology. The task of finding hold scripts in dialogue was formulated as a multiclass text classification problem with three mutually exclusive classes: scripts for putting a client on hold, scripts for returning to a client, and phrases irrelevant to on-hold scripts. We collected an in-house dataset of calls and labeled each dialogue turn in each call. We fine-tuned RuBERT on the dataset by exploring various hyperparameter sets and achieved high model performance. The developed model can help agent monitoring by providing a way to check whether an agent follows predefined on-hold scripts.
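Once each agent turn carries one of the three mutually exclusive labels, checking whether an agent follows the on-hold scripts becomes a sequence check over the predicted labels. The sketch below is a hypothetical compliance rule, not the authors' monitoring logic: the `HoldLabel` names and the rule "every hold script must be closed by a return script before the next hold or the end of the call" are assumptions made for illustration.

```python
from enum import Enum


class HoldLabel(Enum):
    # The three mutually exclusive classes from the abstract; names are hypothetical.
    HOLD = "hold"      # script for putting the client on hold
    RETURN = "return"  # script for returning to the client
    OTHER = "other"    # phrase irrelevant to on-hold scripts


def check_hold_scripts(labels):
    """Return a list of rule violations found in a sequence of per-turn labels."""
    problems = []
    on_hold_since = None  # index of the turn that opened the current hold, if any
    for i, label in enumerate(labels):
        if label is HoldLabel.HOLD:
            if on_hold_since is not None:
                problems.append(
                    f"turn {i}: new hold before returning from hold at turn {on_hold_since}"
                )
            on_hold_since = i
        elif label is HoldLabel.RETURN:
            if on_hold_since is None:
                problems.append(f"turn {i}: return script without a preceding hold script")
            on_hold_since = None
    if on_hold_since is not None:
        problems.append(f"turn {on_hold_since}: hold script never followed by a return script")
    return problems
```

A call labeled `OTHER, HOLD, OTHER, RETURN` passes with no problems, while a call that ends right after a `HOLD` turn is flagged.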

CV · Oct 20, 2025
Monitoring Horses in Stalls: From Object to Event Detection

Dmitrii Galimzianov, Viacheslav Vyshegorodtsev, Ivan Nezhivykh

Monitoring the behavior of stalled horses is essential for early detection of health and welfare issues but remains labor-intensive and time-consuming. In this study, we present a prototype vision-based monitoring system that automates the detection and tracking of horses and people inside stables using object detection and multi-object tracking techniques. The system leverages YOLOv11 and BoT-SORT for detection and tracking, while event states are inferred based on object trajectories and spatial relations within the stall. To support development, we constructed a custom dataset annotated with assistance from foundation models CLIP and GroundingDINO. The system distinguishes between five event types and accounts for the camera's blind spots. Qualitative evaluation demonstrated reliable performance for horse-related events, while highlighting limitations in detecting people due to data scarcity. This work provides a foundation for real-time behavioral monitoring in equine facilities, with implications for animal welfare and stable management.
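The abstract says event states are inferred from object trajectories and spatial relations within the stall, with the camera's blind spots taken into account. A minimal rule-based sketch of that idea follows, assuming axis-aligned zones drawn by hand on the camera image and a detector/tracker (such as YOLOv11 with BoT-SORT) that supplies the horse's current box or its last seen box. The zone names, the state strings, and the rules are hypothetical and do not correspond to the paper's five event types, which the abstract does not list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def inside(point: Tuple[float, float], zone: Box) -> bool:
    x, y = point
    x1, y1, x2, y2 = zone
    return x1 <= x <= x2 and y1 <= y <= y2


@dataclass
class StallZones:
    # Hypothetical hand-drawn regions of the camera view.
    stall: Box       # stall floor
    blind_spot: Box  # part of the stall the camera cannot see well
    door: Box        # area next to the stall exit


def horse_state(horse_box: Optional[Box], last_seen: Optional[Box], zones: StallZones) -> str:
    """Infer a coarse per-frame horse state (state names are hypothetical)."""
    if horse_box is not None:
        in_stall = inside(center(horse_box), zones.stall)
        return "visible_in_stall" if in_stall else "visible_outside_stall"
    if last_seen is None:
        return "unknown"
    c = center(last_seen)
    if inside(c, zones.blind_spot):
        return "in_blind_spot"  # track lost where the camera cannot see: assume still inside
    if inside(c, zones.door):
        return "left_stall"     # track lost at the exit: assume the horse was led out
    return "unknown"
```

The key design choice is that losing a track is not treated as an event by itself: where the track was last seen decides whether the horse is probably hidden in a blind spot or has actually left.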

SD · Jun 3, 2021
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition

Sergey Verbitskiy, Vladimir Berikov, Viacheslav Vyshegorodtsev

Audio pattern recognition (APR) is an important research topic with applications in many areas of everyday life, so accurate and efficient APR systems are needed for real-world use. In this paper, we propose a new convolutional neural network (CNN) architecture and a method for improving the inference speed of CNN-based systems for APR tasks. The proposed method also improves the performance of our systems, as confirmed by experiments on four audio datasets. In addition, we investigate the impact of data augmentation techniques and transfer learning on the performance of our systems. Our best system achieves a mean average precision (mAP) of 0.450 on the AudioSet dataset. Although this value is lower than that of the state-of-the-art system, the proposed system is 7.1x faster and 9.7x smaller. On the ESC-50, UrbanSound8K, and RAVDESS datasets, we obtain state-of-the-art results with accuracies of 0.961, 0.908, and 0.748, respectively. Our system for the ESC-50 dataset is 1.7x faster and 2.3x smaller than the previous best system. For the RAVDESS dataset, our system is 3.3x smaller than the previous best system. We name our systems "Efficient Residual Audio Neural Networks".