Karan Jani

CY, Oct 27, 2016

The Promise and Prejudice of Big Data in Intelligence Community

Karan Jani

Big data is critically important to the current generation of information technology, with applications ranging from the financial, industrial, and academic sectors to defense. With the exponential rise of open-source data from social media and increasing government monitoring, big data is now also linked to national security and, in turn, to the intelligence community. In this study I review the scope of big data science in the functioning of the intelligence community. The major part of the study focuses on the inherent limitations of big data, which affect intelligence agencies at every stage, from gathering information to anticipating surprises. The limiting factors range from technical to ethical issues. The study concludes that experts with domain knowledge from the intelligence community are needed to guide big data analysis efficiently and to fill knowledge gaps in a timely manner. As a case study on the limitations of big data, I describe some of the ongoing work in nuclear intelligence that uses simple analytics, and I argue that big data analysis in that case would lead to unnecessary complications. For further investigation, I highlight cases of crowdsourced forecasting tournaments and of predicting unrest from social media.

IM, Jan 27

The Sound of Noise: Leveraging the Inductive Bias of Pre-trained Audio Transformers for Glitch Identification in LIGO

Suyash Deshmukh, Chayan Chatterjee, Abigail Petulante et al.

Transient noise artifacts, or glitches, fundamentally limit the sensitivity of gravitational-wave (GW) interferometers and can mimic true astrophysical signals, particularly short-duration intermediate-mass black hole (IMBH) mergers. Current glitch classification methods, such as Gravity Spy, rely on supervised models trained from scratch on labeled datasets. These approaches suffer from a significant "label bottleneck": they require massive, expertly annotated datasets to achieve high accuracy and often struggle to generalize to new glitch morphologies or exotic GW signals encountered in observing runs. In this work, we present a novel cross-domain framework that treats GW strain data through the lens of audio processing. We take the Audio Spectrogram Transformer (AST), a model pre-trained on large-scale audio datasets, and adapt it to the GW domain. Instead of learning time-frequency features from scratch, our method exploits the strong inductive bias inherent in pre-trained audio models, transferring learned representations of natural sound to the characterization of detector noise and GW signals, including IMBHs. We validate this approach by analyzing strain data from the third (O3) and fourth (O4) observing runs of the LIGO detectors. We use t-Distributed Stochastic Neighbor Embedding (t-SNE), an unsupervised dimensionality-reduction technique, to visualize the AST-derived embeddings of signals and glitches, revealing well-separated groups that align closely with independently validated Gravity Spy glitch classes. Our results indicate that the inductive bias from audio pre-training allows superior feature extraction compared to traditional supervised techniques, offering a robust, data-efficient pathway for discovering new, anomalous transients and classifying complex noise artifacts in the era of next-generation detectors.
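To make the "strain data as audio" idea concrete, the sketch below turns a one-dimensional time series into a time-frequency magnitude spectrogram, the kind of representation a spectrogram-based model consumes. It is a minimal illustration and not the authors' pipeline: the signal is a synthetic tone standing in for strain data, the sample rate and frame sizes are arbitrary choices, the function names `hann` and `spectrogram` are hypothetical, and a plain windowed DFT is used in place of whatever front end the paper applies.

```python
import cmath
import math


def hann(n):
    """Return a symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def spectrogram(samples, frame_len=64, hop=32):
    """Magnitude spectrogram via a naive windowed DFT.

    Rows are time frames, columns are frequency bins 0..frame_len/2.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        # Taper the frame to reduce spectral leakage at its edges.
        chunk = [samples[start + i] * window[i] for i in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(
                chunk[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i in range(frame_len)
            )
            row.append(abs(acc))
        frames.append(row)
    return frames


# Assumed values for illustration only: a 512 Hz tone sampled at 4096 Hz.
SAMPLE_RATE = 4096
TONE_HZ = 512
FRAME_LEN = 64

signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(1024)]
spec = spectrogram(signal, frame_len=FRAME_LEN, hop=32)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

With these settings each bin spans 64 Hz, so the tone's energy concentrates in a single bin in every frame. A real glitch or merger signal would instead trace a characteristic shape across frames and bins, which is the structure a pre-trained audio model is expected to pick up on.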


2 Papers