Anthony Miyaguchi

h-index12

15papers

40citations

Novelty22%

AI Score47

Ranked #56,252 of 201,326 authors (top 28%)#20,727 in CV (top 35%)

15 Papers

CVJul 8, 2024Code

Multi-Label Plant Species Classification with Self-Supervised Vision Transformers

Murilo Gustineli, Anthony Miyaguchi, Ian Stalter

We present a transfer learning approach using a self-supervised Vision Transformer (DINOv2) for the PlantCLEF 2024 competition, focusing on the multi-label plant species classification. Our method leverages both base and fine-tuned DINOv2 models to extract generalized feature embeddings. We train classifiers to predict multiple plant species within a single image using these rich embeddings. To address the computational challenges of the large-scale dataset, we employ Spark for distributed data processing, ensuring efficient memory management and processing across a cluster of workers. Our data processing pipeline transforms images into grids of tiles, classifying each tile, and aggregating these predictions into a consolidated set of probabilities. Our results demonstrate the efficacy of combining transfer learning with advanced data processing techniques for multi-label image classification tasks. Our code is available at https://github.com/dsgt-kaggle-clef/plantclef-2024.

CVJul 8, 2024Code

Transfer Learning with Self-Supervised Vision Transformers for Snake Identification

Anthony Miyaguchi, Murilo Gustineli, Austin Fischer et al.

We present our approach for the SnakeCLEF 2024 competition to predict snake species from images. We explore and use Meta's DINOv2 vision transformer model for feature extraction to tackle species' high variability and visual similarity in a dataset of 182,261 images. We perform exploratory analysis on embeddings to understand their structure, and train a linear classifier on the embeddings to predict species. Despite achieving a score of 39.69, our results show promise for DINOv2 embeddings in snake identification. All code for this project is available at https://github.com/dsgt-kaggle-clef/snakeclef-2024.

CVJul 10, 2024Code

Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning

Christopher Chiu, Maximilian Heil, Teresa Kim et al.

FungiCLEF 2024 addresses the fine-grained visual categorization (FGVC) of fungi species, with a focus on identifying poisonous species. This task is challenging due to the size and class imbalance of the dataset, subtle inter-class variations, and significant intra-class variability amongst samples. In this paper, we document our approach in tackling this challenge through the use of ensemble classifier heads on pre-computed image embeddings. Our team (DS@GT) demonstrate that state-of-the-art self-supervised vision models can be utilized as robust feature extractors for downstream application of computer vision tasks without the need for task-specific fine-tuning on the vision backbone. Our approach achieved the best Track 3 score (0.345), accuracy (78.4%) and macro-F1 (0.577) on the private test set in post competition evaluation. Our code is available at https://github.com/dsgt-kaggle-clef/fungiclef-2024.

CLJul 10, 2024Code

DS@GT eRisk 2024: Sentence Transformers for Social Media Risk Assessment

David Guecha, Aaryan Potdar, Anthony Miyaguchi

We present working notes for DS@GT team in the eRisk 2024 for Tasks 1 and 3. We propose a ranking system for Task 1 that predicts symptoms of depression based on the Beck Depression Inventory (BDI-II) questionnaire using binary classifiers trained on question relevancy as a proxy for ranking. We find that binary classifiers are not well calibrated for ranking, and perform poorly during evaluation. For Task 3, we use embeddings from BERT to predict the severity of eating disorder symptoms based on user post history. We find that classical machine learning models perform well on the task, and end up competitive with the baseline models. Representation of text data is crucial in both tasks, and we find that sentence transformers are a powerful tool for downstream modeling. Source code and models are available at \url{https://github.com/dsgt-kaggle-clef/erisk-2024}.

CVJul 8, 2024Code

Tile Compression and Embeddings for Multi-Label Classification in GeoLifeCLEF 2024

Anthony Miyaguchi, Patcharapong Aphiwetsa, Mark McDuffie

We explore methods to solve the multi-label classification task posed by the GeoLifeCLEF 2024 competition with the DS@GT team, which aims to predict the presence and absence of plant species at specific locations using spatial and temporal remote sensing data. Our approach uses frequency-domain coefficients via the Discrete Cosine Transform (DCT) to compress and pre-compute the raw input data for convolutional neural networks. We also investigate nearest neighborhood models via locality-sensitive hashing (LSH) for prediction and to aid in the self-supervised contrastive learning of embeddings through tile2vec. Our best competition model utilized geolocation features with a leaderboard score of 0.152 and a best post-competition score of 0.161. Source code and models are available at https://github.com/dsgt-kaggle-clef/geolifeclef-2024.

SDJun 29, 2023

Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification

Anthony Miyaguchi, Nathan Zhong, Murilo Gustineli et al.

We present working notes on transfer learning with semi-supervised dataset annotation for the BirdCLEF 2023 competition, focused on identifying African bird species in recorded soundscapes. Our approach utilizes existing off-the-shelf models, BirdNET and MixIT, to address representation and labeling challenges in the competition. We explore the embedding space learned by BirdNET and propose a process to derive an annotated dataset for supervised learning. Our experiments involve various models and feature engineering approaches to maximize performance on the competition leaderboard. The results demonstrate the effectiveness of our approach in classifying bird species and highlight the potential of transfer learning and semi-supervised dataset annotation in similar tasks.

CVNov 14, 2025

DINOv3 as a Frozen Encoder for CRPS-Oriented Probabilistic Rainfall Nowcasting

Luciano Araujo Dourado Filho, Almir Moreira da Silva Neto, Anthony Miyaguchi et al.

This paper proposes a competitive and computationally efficient approach to probabilistic rainfall nowcasting. A video projector (V-JEPA Vision Transformer) associated to a lightweight probabilistic head is attached to a pre-trained satellite vision encoder (DINOv3-SAT493M) to map encoder tokens into a discrete empirical CDF (eCDF) over 4-hour accumulated rainfall. The projector-head is optimized end-to-end over the Ranked Probability Score (RPS). As an alternative, 3D-UNET baselines trained with an aggregate Rank Probability Score and a per-pixel Gamma-Hurdle objective are used. On the Weather4Cast 2025 benchmark, the proposed method achieved a promising performance, with a CRPS of 3.5102, which represents $\approx$ 26% in effectiveness gain against the best 3D-UNET.

SDJun 8, 2022

Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022

Anthony Miyaguchi, Jiangyue Yu, Bryan Cheungvivatpant et al.

We build a classification model for the BirdCLEF 2022 challenge using unsupervised methods. We implement an unsupervised representation of the training dataset using a triplet loss on spectrogram representation of audio motifs. Our best model performs with a score of 0.48 on the public leaderboard.

IRJul 12, 2025Code

DS@GT at Touché: Large Language Models for Retrieval-Augmented Debate

Anthony Miyaguchi, Conor Johnston, Aaryan Potdar

Large Language Models (LLMs) demonstrate strong conversational abilities. In this Working Paper, we study them in the context of debating in two ways: their ability to perform in a structured debate along with a dataset of arguments to use and their ability to evaluate utterances throughout the debate. We deploy six leading publicly available models from three providers for the Retrieval-Augmented Debate and Evaluation. The evaluation is performed by measuring four key metrics: Quality, Quantity, Manner, and Relation. Throughout this task, we found that although LLMs perform well in debates when given related arguments, they tend to be verbose in responses yet consistent in evaluation. The accompanying source code for this paper is located at https://github.com/dsgt-arc/touche-2025-rad.

CVJul 11, 2025Code

Transfer Learning and Mixup for Fine-Grained Few-Shot Fungi Classification

Jason Kahei Tam, Murilo Gustineli, Anthony Miyaguchi

Accurate identification of fungi species presents a unique challenge in computer vision due to fine-grained inter-species variation and high intra-species variation. This paper presents our approach for the FungiCLEF 2025 competition, which focuses on few-shot fine-grained visual categorization (FGVC) using the FungiTastic Few-Shot dataset. Our team (DS@GT) experimented with multiple vision transformer models, data augmentation, weighted sampling, and incorporating textual information. We also explored generative AI models for zero-shot classification using structured prompting but found them to significantly underperform relative to vision-based models. Our final model outperformed both competition baselines and highlighted the effectiveness of domain specific pretraining and balanced sampling strategies. Our approach ranked 35/74 on the private test set in post-completion evaluation, this suggests additional work can be done on metadata selection and domain-adapted multi-modal learning. Our code is available at https://github.com/dsgt-arc/fungiclef-2025.

CVSep 15, 2025Code

DS@GT AnimalCLEF: Triplet Learning over ViT Manifolds with Nearest Neighbor Classification for Animal Re-identification

Anthony Miyaguchi, Chandrasekaran Maruthaiyannan, Charles R. Clark

This paper details the DS@GT team's entry for the AnimalCLEF 2025 re-identification challenge. Our key finding is that the effectiveness of post-hoc metric learning is highly contingent on the initial quality and domain-specificity of the backbone embeddings. We compare a general-purpose model (DINOv2) with a domain-specific model (MegaDescriptor) as a backbone. A K-Nearest Neighbor classifier with robust thresholding then identifies known individuals or flags new ones. While a triplet-learning projection head improved the performance of the specialized MegaDescriptor model by 0.13 points, it yielded minimal gains (0.03) for the general-purpose DINOv2 on averaged BAKS and BAUS. We demonstrate that the general-purpose manifold is more difficult to reshape for fine-grained tasks, as evidenced by stagnant validation loss and qualitative visualizations. This work highlights the critical limitations of refining general-purpose features for specialized, limited-data re-ID tasks and underscores the importance of domain-specific pre-training. The implementation for this work is publicly available at github.com/dsgt-arc/animalclef-2025.

CVJul 8, 2025Code

Tile-Based ViT Inference with Visual-Cluster Priors for Zero-Shot Multi-Species Plant Identification

Murilo Gustineli, Anthony Miyaguchi, Adrian Cheung et al.

We describe DS@GT's second-place solution to the PlantCLEF 2025 challenge on multi-species plant identification in vegetation quadrat images. Our pipeline combines (i) a fine-tuned Vision Transformer ViTD2PC24All for patch-level inference, (ii) a 4x4 tiling strategy that aligns patch size with the network's 518x518 receptive field, and (iii) domain-prior adaptation through PaCMAP + K-Means visual clustering and geolocation filtering. Tile predictions are aggregated by majority vote and re-weighted with cluster-specific Bayesian priors, yielding a macro-averaged F1 of 0.348 (private leaderboard) while requiring no additional training. All code, configuration files, and reproducibility scripts are publicly available at https://github.com/dsgt-arc/plantclef-2025.

IRJan 21

DS@GT at TREC TOT 2025: Bridging Vague Recollection with Fusion Retrieval and Learned Reranking

Wenxin Zhou, Ritesh Mehta, Anthony Miyaguchi

We develop a two-stage retrieval system that combines multiple complementary retrieval methods with a learned reranker and LLM-based reranking, to address the TREC Tip-of-the-Tongue (ToT) task. In the first stage, we employ hybrid retrieval that merges LLM-based retrieval, sparse (BM25), and dense (BGE-M3) retrieval methods. We also introduce topic-aware multi-index dense retrieval that partitions the Wikipedia corpus into 24 topical domains. In the second stage, we evaluate both a trained LambdaMART reranker and LLM-based reranking. To support model training, we generate 5000 synthetic ToT queries using LLMs. Our best system achieves recall of 0.66 and NDCG@1000 of 0.41 on the test set by combining hybrid retrieval with Gemini-2.5-flash reranking, demonstrating the effectiveness of fusion retrieval.

CLJul 15, 2025

DS@GT at eRisk 2025: From prompts to predictions, benchmarking early depression detection with conversational agent based assessments and temporal attention models

Anthony Miyaguchi, David Guecha, Yuwen Chiu et al.

This Working Note summarizes the participation of the DS@GT team in two eRisk 2025 challenges. For the Pilot Task on conversational depression detection with large language-models (LLMs), we adopted a prompt-engineering strategy in which diverse LLMs conducted BDI-II-based assessments and produced structured JSON outputs. Because ground-truth labels were unavailable, we evaluated cross-model agreement and internal consistency. Our prompt design methodology aligned model outputs with BDI-II criteria and enabled the analysis of conversational cues that influenced the prediction of symptoms. Our best submission, second on the official leaderboard, achieved DCHR = 0.50, ADODL = 0.89, and ASHR = 0.27.

CVDec 10, 2024

Annotation Techniques for Judo Combat Phase Classification from Tournament Footage

Anthony Miyaguchi, Jed Moutahir, Tanmay Sutar

This paper presents a semi-supervised approach to extracting and analyzing combat phases in judo tournaments using live-streamed footage. The objective is to automate the annotation and summarization of live streamed judo matches. We train models that extract relevant entities and classify combat phases from fixed-perspective judo recordings. We employ semi-supervised methods to address limited labeled data in the domain. We build a model of combat phases via transfer learning from a fine-tuned object detector to classify the presence, activity, and standing state of the match. We evaluate our approach on a dataset of 19 thirty-second judo clips, achieving an F1 score on a $20\%$ test hold-out of 0.66, 0.78, and 0.87 for the three classes, respectively. Our results show initial promise for automating more complex information retrieval tasks using rigorous methods with limited labeled data.