Joonhyung Bae

SD
h-index: 8
Papers: 7
Citations: 6
Novelty: 29%
AI Score: 47

7 Papers

IR | May 15 | Code
BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart

Joonhyung Bae

Bioart's hybrid nature spanning art, science, technology, ethics, and politics defies traditional single-axis categorization. I present BioArtlas, analyzing 81 bioart works across thirteen curated dimensions using novel axis-aware representations that preserve semantic distinctions while enabling cross-dimensional comparison. The codebook-based approach groups related concepts into unified clusters, addressing polysemy in cultural terminology. Comprehensive evaluation of up to 800 representation-space-algorithm combinations identifies Agglomerative clustering at k=15 on 4D UMAP as optimal (silhouette 0.664 ± 0.008, trustworthiness/continuity 0.805/0.812). The approach reveals four organizational patterns: artist-specific methodological cohesion, technique-based segmentation, temporal artistic evolution, and trans-temporal conceptual affinities. By separating analytical optimization from public communication, I provide rigorous analysis and accessible exploration through an interactive web interface (https://www.bioartlas.com) with the dataset publicly available (https://github.com/joonhyungbae/BioArtlas).
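The silhouette value quoted in this abstract measures how well each item sits inside its assigned cluster relative to the nearest other cluster. The sketch below is a minimal pure-Python version of that metric, assuming Euclidean distance on already-reduced coordinates; the function name and the toy points are illustrative and are not taken from the BioArtlas code.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (pure Python, O(n^2))."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight, well-separated pairs.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])
bad = silhouette_score(toy_points, [0, 1, 0, 1])
```

Values near 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to a neighboring cluster than to their own.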

MM | Feb 28 | Code
Amanous: Distribution-Switching for Superhuman Piano Density on Disklavier

Joonhyung Bae

The automated piano enables note densities, polyphony, and register changes far beyond human physical limits, yet the three dominant traditions for composing such textures--Nancarrow's tempo canons, Xenakis's stochastic distributions, and L-system grammars--have developed in isolation. This paper presents Amanous, a hardware-aware composition system for Yamaha Disklavier that unifies these methodologies through distribution-switching: L-system symbols select distinct distributional regimes rather than merely modulating parameters within a fixed family. Four contributions are reported. (1) A four-layer architecture (symbolic, parametric, numeric, physical) produces statistically distinct sections with large effect sizes (d = 3.70-5.34), validated by per-layer degradation and ablation experiments. (2) A hardware abstraction layer formalizes velocity-dependent latency and key reset constraints, keeping superhuman textures within the Disklavier's actuable envelope. (3) A density sweep reveals a computational saturation transition at 24-30 notes/s (bootstrap 95% CI: 23.3-50.0), beyond which single-domain melodic metrics lose discriminative power and cross-domain coupling becomes necessary. (4) A convergence point calculus operationalizes tempo-canon geometry as a control interface, enabling convergence events to trigger distribution switches linking macro-temporal structure to micro-level texture. All results are computational; a psychoacoustic validation protocol is proposed for future work. The pipeline has been deployed on a physical Disklavier, demonstrating algorithmic self-consistency and sub-millisecond software precision. Supplementary materials (Excerpts 1-4): https://www.amanous.xyz. Source code: https://github.com/joonhyungbae/Amanous.
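The distribution-switching idea in this abstract can be sketched as follows: an L-system produces a symbol string, and each symbol selects a different family of inter-onset-interval distribution instead of adjusting one family's parameters. The rewrite rules, the choice of exponential versus uniform distributions, and every numeric value here are assumptions made for illustration; none of them come from the Amanous source code.

```python
import random

# Hypothetical rewrite rules (the classic two-symbol "algae" L-system).
RULES = {"A": "AB", "B": "A"}


def expand(axiom, generations):
    """Apply the rewrite rules to every symbol, once per generation."""
    s = axiom
    for _ in range(generations):
        s = "".join(RULES.get(ch, ch) for ch in s)
    return s


def sample_ioi(symbol, rng):
    """Draw one inter-onset interval in seconds.

    Each symbol selects a different distribution family, not just a
    different parameter of the same family.
    """
    if symbol == "A":
        # Dense regime: exponential gaps with a mean of 1/30 s (assumed).
        return rng.expovariate(30.0)
    if symbol == "B":
        # Sparse regime: uniform gaps between 0.2 s and 0.4 s (assumed).
        return rng.uniform(0.2, 0.4)
    raise ValueError(f"no regime defined for symbol {symbol!r}")


def render(symbols, notes_per_section, rng):
    """Turn a symbol string into a list of (onset_time, symbol) events."""
    t = 0.0
    events = []
    for sym in symbols:
        for _ in range(notes_per_section):
            t += sample_ioi(sym, rng)
            events.append((t, sym))
    return events


rng = random.Random(0)
structure = expand("A", 3)
events = render(structure, notes_per_section=8, rng=rng)
```

A hardware layer of the kind the abstract describes would additionally filter or shift these onsets to respect key-reset and latency limits; that step is omitted here.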

AI | Apr 6
Tipiano: Cascaded Piano Hand Motion Synthesis via Fingertip Priors

Joonhyung Bae, Kirak Kim, Hyeyoon Cho et al.

Synthesizing realistic piano hand motions requires both precision and naturalness. Physics-based methods achieve precision but produce stiff motions; data-driven models learn natural dynamics but struggle with positional accuracy. Piano motion exhibits a natural hierarchy: fingertip positions are nearly deterministic given piano geometry and fingering, while wrist and intermediate joints offer stylistic freedom. We present Tipiano, a four-stage framework exploiting this hierarchy: (1) statistics-based fingertip positioning, (2) FiLM-conditioned trajectory refinement, (3) wrist estimation, and (4) STGCN-based pose synthesis. We contribute expert-annotated fingerings for the FürElise dataset (153 pieces, ~10 hours). Experiments demonstrate F1 = 0.910, substantially outperforming diffusion baselines (F1 = 0.121), with user study (N=41) confirming quality approaching motion capture. Expert evaluation by professional pianists (N=5) identified anticipatory motion as the key remaining gap, providing concrete directions for future improvement.

SD | May 14
PiAnnotate: A Web Annotation Tool for Piano Fingering, with a Diagnostic Probe

Joonhyung Bae, Kirak Kim, Hyeyoon Cho et al.

Piano fingering shapes how a passage can be played, yet it is difficult to label after a performance. An annotator must decide which finger produced each note while reconciling the score, timing, video, and hand motion. We present PiAnnotate, a web-based pipeline for adding expert fingering annotations to the FürElise performance dataset. The tool brings together a piano-roll view, performance video, and a 3D MANO hand mesh so that reviewers can inspect each assignment in musical and physical context. Rather than storing only the final answer, PiAnnotate keeps paired rule-based and human-edited fingering tracks. These paired tracks make the annotation history auditable by showing where a geometric rule was sufficient, where experts intervened, and how labels changed across review passes. As a final diagnostic, we train a small Transformer probe on the paired tracks. The probe improves on the rule baseline on held-out pieces while remaining conservative about changing labels that were already correct, suggesting that the edited labels contain learnable structure rather than only isolated fixes.
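One way to picture the paired-track audit described in this abstract is a per-note comparison of the rule-based label against the human-edited label. The sketch below assumes a hypothetical record type (`NoteLabel`) and a 1 to 5 finger numbering; it is not PiAnnotate's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoteLabel:
    """Hypothetical record: one fingering label for one note."""
    note_id: int
    finger: int  # assumed numbering: 1 = thumb ... 5 = little finger


def audit(rule_track, human_track):
    """Split notes into those where the rule stood and those an expert edited.

    Returns two lists of (note_id, rule_finger, final_finger) tuples.
    Notes missing from the human track are treated as unedited.
    """
    human = {n.note_id: n.finger for n in human_track}
    kept, edited = [], []
    for n in rule_track:
        final = human.get(n.note_id, n.finger)
        record = (n.note_id, n.finger, final)
        if final == n.finger:
            kept.append(record)
        else:
            edited.append(record)
    return kept, edited


# Toy example (hypothetical labels): the expert changes note 1 from finger 2 to 3.
rule = [NoteLabel(0, 1), NoteLabel(1, 2), NoteLabel(2, 3)]
human = [NoteLabel(0, 1), NoteLabel(1, 3), NoteLabel(2, 3)]
kept, edited = audit(rule, human)
```

Keeping both lists, and not just the final labels, is what allows a later probe to be scored separately on notes the rule got right and notes that needed correction.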

DL | Mar 28
ARTLAS: Mapping Art-Technology Institutions via Conceptual Axes, Text Embeddings, and Unsupervised Clustering

Joonhyung Bae

The global landscape of art-technology institutions, including festivals, biennials, research labs, conferences, and hybrid organizations, has grown increasingly diverse, yet systematic frameworks for analyzing their multidimensional characteristics remain scarce. This paper proposes ARTLAS, a computational methodology combining an eight-axis conceptual framework (Curatorial Philosophy, Territorial Relation, Knowledge Production Mode, Institutional Genealogy, Temporal Orientation, Ecosystem Function, Audience Relation, and Disciplinary Positioning) with a text-embedding and clustering pipeline to map 78 cultural-technology institutions into a unified analytical space. Each institution is characterized through qualitative descriptions along the eight axes, encoded via E5-large-v2 sentence embeddings and quantized through a word-level codebook into TF-IDF feature vectors. Dimensionality reduction using UMAP, followed by agglomerative clustering (Average linkage, k=10), yields a composite score of 0.825, a silhouette coefficient of 0.803, and a Calinski-Harabasz index of 11,196. Non-negative matrix factorization extracts ten latent topics, and a neighbor-cluster entropy measure identifies boundary institutions bridging multiple thematic communities. An interactive web-based visualization tool built with React enables stakeholders to explore institutional similarities, thematic profiles, and cross-disciplinary connections. The results reveal coherent groupings such as an art-science hub cluster anchored by ZKM and ArtScience Museum, an innovation and industry cluster including Ars Electronica, transmediale, and Sonar, an ACM academic community cluster comprising TEI, DIS, and NIME, and an electronic music and media cluster including CTM Festival, MUTEK, and Sonic Acts. This work contributes a replicable, data-driven approach to institutional ecology in the cultural-technology sector.
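The abstract does not define its neighbor-cluster entropy measure, so the sketch below shows one plausible reading: for each item, take its k nearest neighbors in the embedding and compute the Shannon entropy of their cluster labels. The function name, the Euclidean metric, and the toy coordinates are all assumptions, not details from the ARTLAS pipeline.

```python
import math
from collections import Counter


def neighbor_cluster_entropy(points, labels, k):
    """Shannon entropy (in bits) of cluster labels among each point's k nearest neighbors."""
    entropies = []
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to point i.
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        counts = Counter(labels[j] for j in order[:k])
        total = sum(counts.values())
        h = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies


# Toy 1-D embedding (hypothetical): two clusters and one point midway between them.
toy_points = [(0,), (1,), (2,), (10,), (11,), (12,), (6,)]
toy_labels = [0, 0, 0, 1, 1, 1, 0]
ent = neighbor_cluster_entropy(toy_points, toy_labels, k=2)
boundary_index = max(range(len(ent)), key=ent.__getitem__)
```

Under this reading, an item whose neighbors all share one cluster scores 0 bits, and an item whose neighbors are split across clusters scores higher, which flags it as a candidate boundary institution.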

SD | Sep 10, 2025
PianoVAM: A Multimodal Piano Performance Dataset

Yonghyun Kim, Junhyung Park, Joonhyung Bae et al.

The multimodal nature of music performance has driven increasing interest in data beyond the audio domain within the music information retrieval (MIR) community. This paper introduces PianoVAM, a comprehensive piano performance dataset that includes videos, audio, MIDI, hand landmarks, fingering labels, and rich metadata. The dataset was recorded using a Disklavier piano, capturing audio and MIDI from amateur pianists during their daily practice sessions, alongside synchronized top-view videos in realistic and varied performance conditions. Hand landmarks and fingering labels were extracted using a pretrained hand pose estimation model and a semi-automated fingering annotation algorithm. We discuss the challenges encountered during data collection and the alignment process across different modalities. Additionally, we describe our fingering annotation method based on hand landmarks extracted from videos. Finally, we present benchmarking results for both audio-only and audio-visual piano transcription using the PianoVAM dataset and discuss additional potential applications.

SD | Sep 18, 2025
Two Web Toolkits for Multimodal Piano Performance Dataset Acquisition and Fingering Annotation

Junhyung Park, Yonghyun Kim, Joonhyung Bae et al.

Piano performance is a multimodal activity that intrinsically combines physical actions with the acoustic rendition. Despite growing research interest in analyzing the multimodal nature of piano performance, the laborious process of acquiring large-scale multimodal data remains a significant bottleneck, hindering further progress in this field. To overcome this barrier, we present an integrated web toolkit comprising two graphical user interfaces (GUIs): (i) PiaRec, which supports the synchronized acquisition of audio, video, MIDI, and performance metadata; and (ii) ASDF, which enables the efficient annotation of performer fingering from the visual data. Together, these tools can streamline the acquisition of multimodal piano performance datasets.