Brice Valentin Kok-Shun

11.4LGApr 2

Agentopic: A Generative AI Agent Workflow for Explainable Topic Modeling

Brice Valentin Kok-Shun, Johnny Chan, Gabrielle Peko et al.

Agentopic is a novel agent-based workflow for explainable topic modeling that leverages the reasoning capabilities of Large Language Models (LLMs). Existing topic modeling approaches such as Latent Dirichlet Allocation (LDA) and BERTopic often lack transparency on how topics are assigned or grouped. Agentopic addresses this by using multiple agents that collaboratively perform topic identification, validation, hierarchical grouping, and natural language explanation. This design enables users to trace the reasoning behind topic assignments, enhancing interpretability without sacrificing accuracy. When seeded with topics from the British Broadcasting Corporation (BBC) dataset, Agentopic achieves an F1-score of 0.95, matching GPT-4.1, improving on LDA (0.93), and close to BERTopic (0.98). We used Agentopic to augment the BBC dataset with generated explanations to improve the dataset's richness and context. The unseeded Agentopic generated 2045 semantically coherent topics organized across six hierarchical levels, vastly enriching the original five-category structure. By embedding explainability throughout the workflow, Agentopic offers an interpretable alternative to black-box models, making it particularly valuable for crucial applications like finance and healthcare.

LGFeb 20, 2025

Leveraging ChatGPT for Sponsored Ad Detection and Keyword Extraction in YouTube Videos

Brice Valentin Kok-Shun, Johnny Chan

This work-in-progress paper presents a novel approach to detecting sponsored advertisement segments in YouTube videos and comparing the advertisement with the main content. Our methodology involves the collection of 421 auto-generated and manual transcripts which are then fed into a prompt-engineered GPT-4o for ad detection, a KeyBERT for keyword extraction, and another iteration of ChatGPT for category identification. The results revealed a significant prevalence of product-related ads across various educational topics, with ad categories refined using GPT-4o into succinct 9 content and 4 advertisement categories. This approach provides a scalable and efficient alternative to traditional ad detection methods while offering new insights into the types and relevance of ads embedded within educational content. This study highlights the potential of LLMs in transforming ad detection processes and improving our understanding of advertisement strategies in digital media.

Brice Valentin Kok-Shun

2 Papers