Jun Kato

HC
h-index19
6papers
13citations
Novelty40%
AI Score24

6 Papers

HCMay 20, 2025
Super Kawaii Vocalics: Amplifying the "Cute" Factor in Computer Voice

Yuto Mandai, Katie Seaborn, Tomoyasu Nakano et al.

"Kawaii" is the Japanese concept of cute, which carries sociocultural connotations related to social identities and emotional responses. Yet, virtually all work to date has focused on the visual side of kawaii, including in studies of computer agents and social robots. In pursuit of formalizing the new science of kawaii vocalics, we explored what elements of voice relate to kawaii and how they might be manipulated, manually and automatically. We conducted a four-phase study (grand N = 512) with two varieties of computer voices: text-to-speech (TTS) and game character voices. We found kawaii "sweet spots" through manipulation of fundamental and formant frequencies, but only for certain voices and to a certain extent. Findings also suggest a ceiling effect for the kawaii vocalics of certain voices. We offer empirical validation of the preliminary kawaii vocalics model and an elementary method for manipulating kawaii perceptions of computer voice.

HCMay 20, 2025
Inter(sectional) Alia(s): Ambiguity in Voice Agent Identity via Intersectional Japanese Self-Referents

Takao Fujii, Katie Seaborn, Madeleine Steeds et al.

Conversational agents that mimic people have raised questions about the ethics of anthropomorphizing machines with human social identity cues. Critics have also questioned assumptions of identity neutrality in humanlike agents. Recent work has revealed that intersectional Japanese pronouns can elicit complex and sometimes evasive impressions of agent identity. Yet, the role of other "neutral" non-pronominal self-referents (NPSR) and voice as a socially expressive medium remains unexplored. In a crowdsourcing study, Japanese participants (N = 204) evaluated three ChatGPT voices (Juniper, Breeze, and Ember) using seven self-referents. We found strong evidence of voice gendering alongside the potential of intersectional self-referents to evade gendering, i.e., ambiguity through neutrality and elusiveness. Notably, perceptions of age and formality intersected with gendering as per sociolinguistic theories, especially boku and watakushi. This work provides a nuanced take on agent identity perceptions and champions intersectional and culturally-sensitive work on voice agents.

LGOct 21, 2024
Deep Graph Attention Networks

Jun Kato, Airi Mita, Keita Gobara et al.

Graphs are useful for representing various realworld objects. However, graph neural networks (GNNs) tend to suffer from over-smoothing, where the representations of nodes of different classes become similar as the number of layers increases, leading to performance degradation. A method that does not require protracted tuning of the number of layers is needed to effectively construct a graph attention network (GAT), a type of GNN. Therefore, we introduce a method called "DeepGAT" for predicting the class to which nodes belong in a deep GAT. It avoids over-smoothing in a GAT by ensuring that nodes in different classes are not similar at each layer. Using DeepGAT to predict class labels, a 15-layer network is constructed without the need to tune the number of layers. DeepGAT prevented over-smoothing and achieved a 15-layer GAT with similar performance to a 2-layer GAT, as indicated by the similar attention coefficients. DeepGAT enables the training of a large network to acquire similar attention coefficients to a network with few layers. It avoids the over-smoothing problem and obviates the need to tune the number of layers, thus saving time and enhancing GNN performance.

HCJul 27, 2021
Guided Optimization for Image Processing Pipelines

Yuka Ikarashi, Jonathan Ragan-Kelley, Tsukasa Fukusato et al.

Writing high-performance image processing code is challenging and labor-intensive. The Halide programming language simplifies this task by decoupling high-level algorithms from "schedules" which optimize their implementation. However, even with this abstraction, it is still challenging for Halide programmers to understand complicated scheduling strategies and productively write valid, optimized schedules. To address this, we propose a programming support method called "guided optimization." Guided optimization provides programmers a set of valid optimization options and interactive feedback about their current choices, which enables them to comprehend and efficiently optimize image processing code without the time-consuming trial-and-error process of traditional text editors. We implemented a proof-of-concept system, Roly-poly, which integrates guided optimization, program visualization, and schedule cost estimation to support the comprehension and development of efficient Halide image processing code. We conducted a user study with novice Halide programmers and confirmed that Roly-poly and its guided optimization was informative, increased productivity, and resulted in higher-performing schedules in less time.

CVJun 21, 2020
Lyric Video Analysis Using Text Detection and Tracking

Shota Sakaguchi, Jun Kato, Masataka Goto et al.

We attempt to recognize and track lyric words in lyric videos. Lyric video is a music video showing the lyric words of a song. The main characteristic of lyric videos is that the lyric words are shown at frames synchronously with the music. The difficulty of recognizing and tracking the lyric words is that (1) the words are often decorated and geometrically distorted and (2) the words move arbitrarily and drastically in the video frame. The purpose of this paper is to analyze the motion of the lyric words in lyric videos, as the first step of automatic lyric video generation. In order to analyze the motion of lyric words, we first apply a state-of-the-art scene text detector and recognizer to each video frame. Then, lyric-frame matching is performed to establish the optimal correspondence between lyric words and the frames. After fixing the motion trajectories of individual lyric words from correspondence, we analyze the trajectories of the lyric words by k-medoids clustering and dynamic time warping (DTW).

CYJan 22, 2020
ClassCode: An Interactive Teaching and Learning Environment for Programming Education in Classrooms

Ryo Suzuki, Jun Kato, Koji Yatani

Programming education is becoming important as demands on computer literacy and coding skills are growing. Despite the increasing popularity of interactive online learning systems, many programming courses in schools have not changed their teaching format from the conventional classroom setting. We see two research opportunities here. Students may have diverse expertise and experience in programming. Thus, particular content and teaching speed can be disengaging for experienced students or discouraging for novice learners. In a large classroom, instructors cannot oversee the learning progress of each student, and have difficulty matching teaching materials with the comprehension level of individual students. We present ClassCode, a web-based environment tailored to programming education in classrooms. Students can take online tutorials prepared by instructors at their own pace. They can then deepen their understandings by performing interactive coding exercises interleaved within tutorials. ClassCode tracks all interactions by each student, and summarizes them to instructors. This serves as a progress report, facilitating the instructors to provide additional explanations in-situ or revise course materials. Our user evaluation through a small lecture and expert review by instructors and teaching assistants confirm the potential of ClassCode by uncovering how it could address issues in existing programming courses at universities.