Harsh Purohit

AS
h-index18
18papers
1,487citations
Novelty32%
AI Score46

18 Papers

ASJun 1
Description and Discussion on DCASE 2026 Challenge Task 2: Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Tomoya Nishida, Noboru Harada, Daiki Takeuchi et al.

This paper presents an overview of DCASE 2026 Challenge Task 2, titled "Noise-aware unsupervised anomalous sound detection (UASD) for machine condition monitoring." The task aims to advance noise-robust anomalous sound detection for machine condition monitoring under the unsupervised setting, where only normal machine sounds are available for training. Reliable detection under noisy conditions is crucial for practical deployment, but previous DCASE Task 2 settings provided limited information about environmental noise, potentially limiting UASD performance in highly noisy situations. To address this limitation, DCASE 2026 allows participants to exploit two-channel audio samples simultaneously captured at locations near and far from the target machine. Since the distant microphone is expected to contain relatively stronger environmental noise and weaker direct machine sounds, it may help distinguish environmental noise components from the target machine sounds. After the challenge submission deadline, challenge results and an analysis of the submitted systems will be added.

SDJun 13, 2022
Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques

Kota Dohi, Keisuke Imoto, Noboru Harada et al.

We present the task description and discussion on the results of the DCASE 2022 Challenge Task 2: ``Unsupervised anomalous sound detection (ASD) for machine condition monitoring applying domain generalization techniques''. Domain shifts are a critical problem for the application of ASD systems. Because domain shifts can change the acoustic characteristics of data, a model trained in a source domain performs poorly for a target domain. In DCASE 2021 Challenge Task 2, we organized an ASD task for handling domain shifts. In this task, it was assumed that the occurrences of domain shifts are known. However, in practice, the domain of each sample may not be given, and the domain shifts can occur implicitly. In 2022 Task 2, we focus on domain generalization techniques that detects anomalies regardless of the domain shifts. Specifically, the domain of each sample is not given in the test data and only one threshold is allowed for all domains. Analysis of 81 submissions from 31 teams revealed two remarkable types of domain generalization techniques: 1) domain-mixing-based approach that obtains generalized representations and 2) domain-classification-based approach that explicitly or implicitly classifies different domains to improve detection performance for each domain.

SDMay 27, 2022
MIMII DG: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection for Domain Generalization Task

Kota Dohi, Tomoya Nishida, Harsh Purohit et al.

We present a machine sound dataset to benchmark domain generalization techniques for anomalous sound detection (ASD). Domain shifts are differences in data distributions that can degrade the detection performance, and handling them is a major issue for the application of ASD systems. While currently available datasets for ASD tasks assume that occurrences of domain shifts are known, in practice, they can be difficult to detect. To handle such domain shifts, domain generalization techniques that perform well regardless of the domains should be investigated. In this paper, we present the first ASD dataset for the domain generalization techniques, called MIMII DG. The dataset consists of five machine types and three domain shift scenarios for each machine type. The dataset is dedicated to the domain generalization task with features such as multiple different values for parameters that cause domain shifts and introduction of domain shifts that can be difficult to detect, such as shifts in the background noise. Experimental results using two baseline systems indicate that the dataset reproduces domain shift scenarios and is useful for benchmarking domain generalization techniques.

LGJun 11, 2022
Hierarchical Conditional Variational Autoencoder Based Acoustic Anomaly Detection

Harsh Purohit, Takashi Endo, Masaaki Yamamoto et al.

This paper aims to develop an acoustic signal-based unsupervised anomaly detection method for automatic machine monitoring. Existing approaches such as deep autoencoder (DAE), variational autoencoder (VAE), conditional variational autoencoder (CVAE) etc. have limited representation capabilities in the latent space and, hence, poor anomaly detection performance. Different models have to be trained for each different kind of machines to accurately perform the anomaly detection task. To solve this issue, we propose a new method named as hierarchical conditional variational autoencoder (HCVAE). This method utilizes available taxonomic hierarchical knowledge about industrial facility to refine the latent space representation. This knowledge helps model to improve the anomaly detection performance as well. We demonstrated the generalization capability of a single HCVAE model for different types of machines by using appropriate conditions. Additionally, to show the practicability of the proposed approach, (i) we evaluated HCVAE model on different domain and (ii) we checked the effect of partial hierarchical knowledge. Our results show that HCVAE method validates both of these points, and it outperforms the baseline system on anomaly detection task by utmost 15 % on the AUC score metric.

CLSep 25, 2024
Domain-Independent Automatic Generation of Descriptive Texts for Time-Series Data

Kota Dohi, Aoi Ito, Harsh Purohit et al.

Due to scarcity of time-series data annotated with descriptive texts, training a model to generate descriptive texts for time-series data is challenging. In this study, we propose a method to systematically generate domain-independent descriptive texts from time-series data. We identify two distinct approaches for creating pairs of time-series data and descriptive texts: the forward approach and the backward approach. By implementing the novel backward approach, we create the Temporal Automated Captions for Observations (TACO) dataset. Experimental results demonstrate that a contrastive learning based model trained using the TACO dataset is capable of generating descriptive texts for time-series data in novel domains.

ASSep 27, 2024
MIMII-Gen: Generative Modeling Approach for Simulated Evaluation of Anomalous Sound Detection System

Harsh Purohit, Tomoya Nishida, Kota Dohi et al.

Insufficient recordings and the scarcity of anomalies present significant challenges in developing and validating robust anomaly detection systems for machine sounds. To address these limitations, we propose a novel approach for generating diverse anomalies in machine sound using a latent diffusion-based model that integrates an encoder-decoder framework. Our method utilizes the Flan-T5 model to encode captions derived from audio file metadata, enabling conditional generation through a carefully designed U-Net architecture. This approach aids our model in generating audio signals within the EnCodec latent space, ensuring high contextual relevance and quality. We objectively evaluated the quality of our generated sounds using the Fréchet Audio Distance (FAD) score and other metrics, demonstrating that our approach surpasses existing models in generating reliable machine audio that closely resembles actual abnormal conditions. The evaluation of the anomaly detection system using our generated data revealed a strong correlation, with the area under the curve (AUC) score differing by 4.8\% from the original, validating the effectiveness of our generated data. These results demonstrate the potential of our approach to enhance the evaluation and robustness of anomaly detection systems across varied and previously unseen conditions. Audio samples can be found at \url{https://hpworkhub.github.io/MIMII-Gen.github.io/}.

CLSep 24, 2025
DiffNator: Generating Structured Explanations of Time-Series Differences

Kota Dohi, Tomoya Nishida, Harsh Purohit et al.

In many IoT applications, the central interest lies not in individual sensor signals but in their differences, yet interpreting such differences requires expert knowledge. We propose DiffNator, a framework for structured explanations of differences between two time series. We first design a JSON schema that captures the essential properties of such differences. Using the Time-series Observations of Real-world IoT (TORI) dataset, we generate paired sequences and train a model that combine a time-series encoder with a frozen LLM to output JSON-formatted explanations. Experimental results show that DiffNator generates accurate difference explanations and substantially outperforms both a visual question answering (VQA) baseline and a retrieval method using a pre-trained time-series encoder.

ASJul 28, 2025
MIMII-Agent: Leveraging LLMs with Function Calling for Relative Evaluation of Anomalous Sound Detection

Harsh Purohit, Tomoya Nishida, Kota Dohi et al.

This paper proposes a method for generating machine-type-specific anomalies to evaluate the relative performance of unsupervised anomalous sound detection (UASD) systems across different machine types, even in the absence of real anomaly sound data. Conventional keyword-based data augmentation methods often produce unrealistic sounds due to their reliance on manually defined labels, limiting scalability as machine types and anomaly patterns diversify. Advanced audio generative models, such as MIMII-Gen, show promise but typically depend on anomalous training data, making them less effective when diverse anomalous examples are unavailable. To address these limitations, we propose a novel synthesis approach leveraging large language models (LLMs) to interpret textual descriptions of faults and automatically select audio transformation functions, converting normal machine sounds into diverse and plausible anomalous sounds. We validate this approach by evaluating a UASD system trained only on normal sounds from five machine types, using both real and synthetic anomaly data. Experimental results reveal consistent trends in relative detection difficulty across machine types between synthetic and real anomalies. This finding supports our hypothesis and highlights the effectiveness of the proposed LLM-based synthesis approach for relative evaluation of UASD systems.

CLMar 27, 2025
Retrieving Time-Series Differences Using Natural Language Queries

Kota Dohi, Tomoya Nishida, Harsh Purohit et al.

Effectively searching time-series data is essential for system analysis; however, traditional methods often require domain expertise to define search criteria. Recent advancements have enabled natural language-based search, but these methods struggle to handle differences between time-series data. To address this limitation, we propose a natural language query-based approach for retrieving pairs of time-series data based on differences specified in the query. Specifically, we define six key characteristics of differences, construct a corresponding dataset, and develop a contrastive learning-based model to align differences between time-series data with query texts. Experimental results demonstrate that our model achieves an overall mAP score of 0.994 in retrieving time-series pairs.

ASJun 11, 2024
Description and Discussion on DCASE 2024 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Tomoya Nishida, Noboru Harada, Daisuke Niizumi et al.

We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2024 Challenge Task 2: First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring. Continuing from last year's DCASE 2023 Challenge Task 2, we organize the task as a first-shot problem under domain generalization required settings. The main goal of the first-shot problem is to enable rapid deployment of ASD systems for new kinds of machines without the need for machine-specific hyperparameter tunings. This problem setting was realized by (1) giving only one section for each machine type and (2) having completely different machine types for the development and evaluation datasets. For the DCASE 2024 Challenge Task 2, data of completely new machine types were newly collected and provided as the evaluation dataset. In addition, attribute information such as the machine operation conditions were concealed for several machine types to mimic situations where such information are unavailable. We will add challenge results and analysis of the submissions after the challenge submission deadline.

SDMay 13, 2023
Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Kota Dohi, Keisuke Imoto, Noboru Harada et al.

We present the task description of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge Task 2: ``First-shot unsupervised anomalous sound detection (ASD) for machine condition monitoring''. The main goal is to enable rapid deployment of ASD systems for new kinds of machines without the need for hyperparameter tuning. In the past ASD tasks, developed methods tuned hyperparameters for each machine type, as the development and evaluation datasets had the same machine types. However, collecting normal and anomalous data as the development dataset can be infeasible in practice. In 2023 Task 2, we focus on solving the first-shot problem, which is the challenge of training a model on a completely novel machine type. Specifically, (i) each machine type has only one section (a subset of machine type) and (ii) machine types in the development and evaluation datasets are completely different. Analysis of 86 submissions from 23 teams revealed that the keys to outperform baselines were: 1) sampling techniques for dealing with class imbalances across different domains and attributes, 2) generation of synthetic samples for robust detection, and 3) use of multiple large pre-trained models to extract meaningful embeddings for the anomaly detector.

ASJun 8, 2021
Description and Discussion on DCASE 2021 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring under Domain Shifted Conditions

Yohei Kawaguchi, Keisuke Imoto, Yuma Koizumi et al.

We present the task description and discussion on the results of the DCASE 2021 Challenge Task 2. In 2020, we organized an unsupervised anomalous sound detection (ASD) task, identifying whether a given sound was normal or anomalous without anomalous training data. In 2021, we organized an advanced unsupervised ASD task under domain-shift conditions, which focuses on the inevitable problem of the practical use of ASD systems. The main challenge of this task is to detect unknown anomalous sounds where the acoustic characteristics of the training and testing samples are different, i.e., domain-shifted. This problem frequently occurs due to changes in seasons, manufactured products, and/or environmental noise. We received 75 submissions from 26 teams, and several novel approaches have been developed in this challenge. On the basis of the analysis of the evaluation results, we found that there are two types of remarkable approaches that TOP-5 winning teams adopted: 1) ensemble approaches of ``outlier exposure'' (OE)-based detectors and ``inlier modeling'' (IM)-based detectors and 2) approaches based on IM-based detection for features learned in a machine-identification task.

SDMay 6, 2021
MIMII DUE: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection with Domain Shifts due to Changes in Operational and Environmental Conditions

Ryo Tanabe, Harsh Purohit, Kota Dohi et al.

In this paper, we introduce MIMII DUE, a new dataset for malfunctioning industrial machine investigation and inspection with domain shifts due to changes in operational and environmental conditions. Conventional methods for anomalous sound detection face practical challenges because the distribution of features changes between the training and operational phases (called domain shift) due to various real-world factors. To check the robustness against domain shifts, we need a dataset that actually includes domain shifts, but such a dataset does not exist so far. The new dataset we created consists of the normal and abnormal operating sounds of five different types of industrial machines under two different operational/environmental conditions (source domain and target domain) independent of normal/abnormal, with domain shifts occurring between the two domains. Experimental results showed significant performance differences between the source and target domains, indicating that the dataset contains the domain shifts. These findings demonstrate that the dataset will be helpful for checking the robustness against domain shifts. The dataset is a subset of the dataset for DCASE 2021 Challenge Task 2 and freely available for download at https://zenodo.org/record/4740355

ASMar 16, 2021
Flow-based Self-supervised Density Estimation for Anomalous Sound Detection

Kota Dohi, Takashi Endo, Harsh Purohit et al.

To develop a machine sound monitoring system, a method for detecting anomalous sound is proposed. Exact likelihood estimation using Normalizing Flows is a promising technique for unsupervised anomaly detection, but it can fail at out-of-distribution detection since the likelihood is affected by the smoothness of the data. To improve the detection performance, we train the model to assign higher likelihood to target machine sounds and lower likelihood to sounds from other machines of the same machine type. We demonstrate that this enables the model to incorporate a self-supervised classification-based approach. Experiments conducted using the DCASE 2020 Challenge Task2 dataset showed that the proposed method improves the AUC by 4.6% on average when using Masked Autoregressive Flow (MAF) and by 5.8% when using Glow, which is a significant improvement over the previous method.

ASSep 25, 2020
Deep Autoencoding GMM-based Unsupervised Anomaly Detection in Acoustic Signals and its Hyper-parameter Optimization

Harsh Purohit, Ryo Tanabe, Takashi Endo et al.

Failures or breakdowns in factory machinery can be costly to companies, so there is an increasing demand for automatic machine inspection. Existing approaches to acoustic signal-based unsupervised anomaly detection, such as those using a deep autoencoder (DA) or Gaussian mixture model (GMM), have poor anomaly-detection performance. In this work, we propose a new method based on a deep autoencoding Gaussian mixture model with hyper-parameter optimization (DAGMM-HO). In our method, the DAGMM-HO applies the conventional DAGMM to the audio domain for the first time, with the idea that its total optimization on reduction of dimensions and statistical modelling will improve the anomaly-detection performance. In addition, the DAGMM-HO solves the hyper-parameter sensitivity problem of the conventional DAGMM by performing hyper-parameter optimization based on the gap statistic and the cumulative eigenvalues. Our evaluation of the proposed method with experimental data of the industrial fans showed that it significantly outperforms previous approaches and achieves up to a 20% improvement based on the standard AUC score.

ASJun 10, 2020
Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring

Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto et al.

In this paper, we present the task description and discuss the results of the DCASE 2020 Challenge Task 2: Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring. The goal of anomalous sound detection (ASD) is to identify whether the sound emitted from a target machine is normal or anomalous. The main challenge of this task is to detect unknown anomalous sounds under the condition that only normal sound samples have been provided as training data. We have designed this challenge as the first benchmark of ASD research, which includes a large-scale dataset, evaluation metrics, and a simple baseline system. We received 117 submissions from 40 teams, and several novel approaches have been developed as a result of this challenge. On the basis of the analysis of the evaluation results, we discuss two new approaches and their problems.

ASMay 19, 2020
Anomalous sound detection based on interpolation deep neural network

Kaori Suefusa, Tomoya Nishida, Harsh Purohit et al.

As the labor force decreases, the demand for labor-saving automatic anomalous sound detection technology that conducts maintenance of industrial equipment has grown. Conventional approaches detect anomalies based on the reconstruction errors of an autoencoder. However, when the target machine sound is non-stationary, a reconstruction error tends to be large independent of an anomaly, and its variations increased because of the difficulty of predicting the edge frames. To solve the issue, we propose an approach to anomalous detection in which the model utilizes multiple frames of a spectrogram whose center frame is removed as an input, and it predicts an interpolation of the removed frame as an output. Rather than predicting the edge frames, the proposed approach makes the reconstruction error consistent with the anomaly. Experimental results showed that the proposed approach achieved 27% improvement based on the standard AUC score, especially against non-stationary machinery sounds.

SDSep 20, 2019
MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection

Harsh Purohit, Ryo Tanabe, Kenji Ichige et al.

Factory machinery is prone to failure or breakdown, resulting in significant expenses for companies. Hence, there is a rising interest in machine monitoring using different sensors including microphones. In the scientific community, the emergence of public datasets has led to advancements in acoustic detection and classification of scenes and events, but there are no public datasets that focus on the sound of industrial machines under normal and anomalous operating conditions in real factory environments. In this paper, we present a new dataset of industrial machine sounds that we call a sound dataset for malfunctioning industrial machine investigation and inspection (MIMII dataset). Normal sounds were recorded for different types of industrial machines (i.e., valves, pumps, fans, and slide rails), and to resemble a real-life scenario, various anomalous sounds were recorded (e.g., contamination, leakage, rotating unbalance, and rail damage). The purpose of releasing the MIMII dataset is to assist the machine-learning and signal-processing community with their development of automated facility maintenance. The MIMII dataset is freely available for download at: https://zenodo.org/record/3384388