LG | Oct 4, 2023
Memoria: Resolving Fateful Forgetting Problem through Human-Inspired Memory Architecture
Sangjun Park, JinYeong Bak
Making neural networks remember over the long term has been a longstanding issue. Although several external memory techniques have been introduced, most focus on retaining recent information in the short term. Regardless of its importance, information tends to be fatefully forgotten over time. We present Memoria, a memory system for artificial neural networks, drawing inspiration from humans and applying various neuroscientific and psychological theories. The experimental results prove the effectiveness of Memoria in the diverse tasks of sorting, language modeling, and classification, surpassing conventional techniques. Engram analysis reveals that Memoria exhibits the primacy, recency, and temporal contiguity effects which are characteristics of human memory.
CL | Jul 3, 2024
MentalAgora: A Gateway to Advanced Personalized Care in Mental Health through Multi-Agent Debating and Attribute Control
Yeonji Lee, Sangjun Park, Kyunghyun Cho et al.
As mental health issues escalate globally, there is a tremendous need for advanced digital support systems. We introduce MentalAgora, a novel framework employing large language models enhanced by interaction between multiple agents for tailored mental health support. This framework operates through three stages: strategic debating, tailored counselor creation, and response generation, enabling the dynamic customization of responses based on individual user preferences and therapeutic needs. We conduct experiments utilizing a high-quality evaluation dataset, TherapyTalk, crafted with mental health professionals, showing that MentalAgora generates expert-aligned and user preference-enhanced responses. Our evaluations, including experiments and user studies, demonstrate that MentalAgora aligns with professional standards and effectively meets user preferences, setting a new benchmark for digital mental health interventions.
AS | Apr 4, 2022
Into-TTS: Intonation Template Based Prosody Control System
Jihwan Lee, Joun Yeop Lee, Heejin Choi et al.
Intonations play an important role in delivering the intention of a speaker. However, current end-to-end TTS systems often fail to model proper intonations. To alleviate this problem, we propose a novel, intuitive method to synthesize speech in different intonations using predefined intonation templates. Prior to TTS model training, speech data are grouped into intonation templates in an unsupervised manner. Two proposed modules are added to the end-to-end TTS framework: an intonation predictor and an intonation encoder. The intonation predictor recommends a suitable intonation template for the given text. The intonation encoder, attached to the text encoder output, synthesizes speech abiding by the requested intonation template. The main contributions of our paper are: (a) an easy-to-use intonation control system covering a wide range of users; (b) better performance in wrapping speech in a requested intonation with improved objective and subjective evaluation; and (c) incorporating a pre-trained language model for intonation modelling. Audio samples are available at https://srtts.github.io/IntoTTS.
AS | Mar 27, 2022
Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge
Sangjun Park, Kihyun Choo, Joohyung Lee et al.
Text-to-Speech (TTS) services that run on edge devices have many advantages over cloud TTS, e.g., in latency and privacy. However, neural vocoders with low complexity and a small model footprint inevitably generate annoying sounds. This study proposes Bunched LPCNet2, an improved LPCNet architecture that runs efficiently both in a high-quality configuration for cloud servers and in a low-complexity configuration for low-resource edge devices. A single logistic distribution achieves computational efficiency, and insightful tricks reduce the model footprint while maintaining speech quality. A DualRate architecture, which generates a lower sampling rate from a prosody model, is also proposed to reduce maintenance costs. The experiments demonstrate that Bunched LPCNet2 generates satisfactory speech quality with a model footprint of 1.1MB while operating faster than real-time on an RPi 3B. Our audio samples are available at https://srtts.github.io/bunchedLPCNet2.
RO | May 14
Diffusion Policy for Coordinated Control of a Nonholonomic Mobile Base and Dual Arms in Door Opening and Passing
Shangqun Yu, Matthew En, Daniel Wu et al.
Opening heavy, self-closing doors, especially those that require pulling, remains a long-standing challenge in robotics. Humans naturally employ both arms in a dexterous manner: rotating the handle, widening the gap, holding the door, switching arms when needed, and moving through while maintaining clearance. To replicate such behaviors, a robot must perform a long sequence of motions spanning multiple stages and interactions with different parts of the door. Traditional approaches rely on state machines that transition between manually defined stages (e.g., pulling after the knob is rotated, passing after the gap is sufficiently wide). While intuitive, these methods lack robustness, as hand-crafted trajectories fail to generalize to the diversity of real-world conditions without extensive engineering effort. Recent advances in imitation learning offer a scalable alternative, yet no existing visual action model has demonstrated simultaneous coordination of a nonholonomic base and dual arms for the complete door opening and passing task. In this paper, we tackle this complex, highly constrained problem using a diffusion-based visuomotor control policy. Our results demonstrate that a single end-to-end policy can be learned to execute long-horizon tasks requiring tight coordination between manipulation and locomotion. The resulting policy not only achieves a high success rate in opening and traversing damped pull doors but also demonstrates strong robustness to external disturbances, capabilities that are difficult to realize with traditional methods.
CL | May 23, 2025 | Code
CReSt: A Comprehensive Benchmark for Retrieval-Augmented Generation with Complex Reasoning over Structured Documents
Minsoo Khang, Sangjun Park, Teakgyu Hong et al.
Large Language Models (LLMs) have made substantial progress in recent years, yet evaluating their capabilities in practical Retrieval-Augmented Generation (RAG) scenarios remains challenging. In practical applications, LLMs must demonstrate complex reasoning, refuse to answer appropriately, provide precise citations, and effectively understand document layout. These capabilities are crucial for advanced task handling, uncertainty awareness, maintaining reliability, and structural understanding. While some of the prior works address these aspects individually, there is a need for a unified framework that evaluates them collectively in practical RAG scenarios. To address this, we present CReSt (A Comprehensive Benchmark for Retrieval-Augmented Generation with Complex Reasoning over Structured Documents), a benchmark designed to assess these key dimensions holistically. CReSt comprises 2,245 human-annotated examples in English and Korean, designed to capture practical RAG scenarios that require complex reasoning over structured documents. It also introduces a tailored evaluation methodology to comprehensively assess model performance in these critical areas. Our evaluation shows that even advanced LLMs struggle to perform consistently across these dimensions, underscoring key areas for improvement. We release CReSt to support further research and the development of more robust RAG systems. The dataset and code are available at: https://github.com/UpstageAI/CReSt.
CR | Jan 26, 2021 | Code
Ethereum ECCPoW
Hyoungsung Kim, Jehyuk Jang, Sangjun Park et al.
The error-correction code based proof-of-work (ECCPoW) algorithm is based on a low-density parity-check (LDPC) code. ECCPoW can impair ASICs because the parameters of its LDPC code vary over time. Previous research on the ECCPoW algorithm has presented its theory and its implementation on Bitcoin, but has not discussed how stable the block generation time is. This study focuses on two properties: a finite mean block generation time (BGT) and a BGT distribution that is not heavy-tailed. In the ECCPoW algorithm, BGT may show a long-tailed distribution due to time-varying cryptographic puzzles. Thus, it is of interest to see if the BGT distribution is not heavy-tailed and if it shows a finite mean. If the distribution is heavy-tailed, then confirmation of a transaction cannot be guaranteed. We present the implementation, simulation, and validation of ECCPoW Ethereum. In the implementation, we explain how the ECCPoW algorithm is integrated into Ethereum 1.0 as a new consensus algorithm. In the simulation, we perform a multinode simulation to show that ECCPoW Ethereum works well with automatic difficulty change. In the validation, we present the statistical results of the two-sample Anderson-Darling test to show that the distribution of BGT satisfies the necessary condition of the exponential distribution. Our implementation is downloadable at https://github.com/cryptoecc/ETH-ECC.
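To make the finite-mean condition concrete, the following is a minimal Python sketch of an idealized proof-of-work baseline, not of the ECCPoW puzzle itself. It assumes every mining attempt succeeds independently with a fixed probability, so the number of attempts per block is geometric (the discrete counterpart of the exponential distribution) with mean `1 / success_prob`. The names `simulate_block_attempts` and `success_prob` are hypothetical and do not come from the ETH-ECC code.

```python
import random
from typing import List


def simulate_block_attempts(success_prob: float, n_blocks: int,
                            rng: random.Random) -> List[int]:
    """Count mining attempts per block in an idealized memoryless PoW.

    Assumption (not from the paper): each attempt succeeds independently
    with the same probability, so the count per block is geometric.
    """
    attempts_per_block = []
    for _ in range(n_blocks):
        attempts = 1
        # Keep trying until one attempt falls below the success threshold.
        while rng.random() >= success_prob:
            attempts += 1
        attempts_per_block.append(attempts)
    return attempts_per_block


rng = random.Random(0)  # fixed seed so the run is reproducible
success_prob = 0.05     # hypothetical per-attempt success probability
attempts = simulate_block_attempts(success_prob, 5000, rng)

# For a geometric distribution the theoretical mean is 1 / success_prob.
mean_attempts = sum(attempts) / len(attempts)
print(f"empirical mean: {mean_attempts:.2f}, theoretical: {1 / success_prob:.2f}")
```

In this baseline the empirical mean settles near the theoretical value as more blocks are simulated. A heavy-tailed distribution would not behave this way, which is why the abstract treats a finite mean as a condition worth checking.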
NE | Feb 2
Fine-Tuning Language Models to Know What They Know
Sangjun Park, Elliot Meyerson, Xin Qiu et al.
Metacognition is a critical component of intelligence, specifically regarding the awareness of one's own knowledge. While humans rely on shared internal memory for both answering questions and reporting their knowledge state, this dependency in LLMs remains underexplored. This study proposes a framework to measure metacognitive ability $d_{\rm{type2}}'$ using a dual-prompt method, followed by the introduction of Evolution Strategy for Metacognitive Alignment (ESMA) to bind a model's internal knowledge to its explicit behaviors. ESMA demonstrates robust generalization across diverse untrained settings, indicating an enhancement in the model's ability to reference its own knowledge. Furthermore, parameter analysis attributes these improvements to a sparse set of significant modifications.
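The abstract does not spell out how $d_{\rm{type2}}'$ is computed. As a rough illustration only, the sketch below uses the common signal-detection form of a type-2 sensitivity: the z-scored rate of "I know this" reports on correctly answered questions minus the z-scored rate on incorrectly answered ones. The function name `type2_d_prime`, the two boolean inputs, and the log-linear correction are assumptions for this sketch and may differ from the paper's dual-prompt procedure.

```python
from statistics import NormalDist
from typing import Sequence


def type2_d_prime(correct: Sequence[bool], claims_known: Sequence[bool]) -> float:
    """Type-2 d' from per-question correctness and self-reported knowledge.

    Assumed definition: z(hit rate) - z(false-alarm rate), where a hit is
    "claims to know" on a correct answer and a false alarm is "claims to
    know" on an incorrect answer.
    """
    z = NormalDist().inv_cdf  # inverse CDF of the standard normal
    hits = sum(1 for c, k in zip(correct, claims_known) if c and k)
    false_alarms = sum(1 for c, k in zip(correct, claims_known) if not c and k)
    n_correct = sum(1 for c in correct if c)
    n_incorrect = len(correct) - n_correct
    # Log-linear correction keeps both rates strictly inside (0, 1).
    hit_rate = (hits + 0.5) / (n_correct + 1)
    fa_rate = (false_alarms + 0.5) / (n_incorrect + 1)
    return z(hit_rate) - z(fa_rate)


# First prompt: was the answer correct? Second prompt: did the model say it knows?
correct = [True, True, True, True, False, False, False, False]
informative = [True, True, True, False, True, False, False, False]
uninformative = [True] * 8  # always claims to know, regardless of correctness

d_informative = type2_d_prime(correct, informative)
d_uninformative = type2_d_prime(correct, uninformative)
print(d_informative, d_uninformative)
```

A model whose knowledge reports track its correctness scores above zero, while a model that always claims to know scores zero, since its reports carry no information about its answers.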
LG | Aug 4, 2025
Pigeon-SL: Robust Split Learning Framework for Edge Intelligence under Malicious Clients
Sangjun Park, Tony Q. S. Quek, Hyowoon Seo
Recent advances in split learning (SL) have established it as a promising framework for privacy-preserving, communication-efficient distributed learning at the network edge. However, SL's sequential update process is vulnerable to even a single malicious client, which can significantly degrade model accuracy. To address this, we introduce Pigeon-SL, a novel scheme grounded in the pigeonhole principle that guarantees at least one entirely honest cluster among M clients, even when up to N of them are adversarial. In each global round, the access point partitions the clients into N+1 clusters, trains each cluster independently via vanilla SL, and evaluates their validation losses on a shared dataset. Only the cluster with the lowest loss advances, thereby isolating and discarding malicious updates. We further enhance training and communication efficiency with Pigeon-SL+, which repeats training on the selected cluster to match the update throughput of standard SL. We validate the robustness and effectiveness of our approach under three representative attack models -- label flipping, activation and gradient manipulation -- demonstrating significant improvements in accuracy and resilience over baseline SL methods in future intelligent wireless networks.
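The pigeonhole argument and the lowest-loss selection step can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' implementation: `partition_clients`, `select_cluster`, and `toy_validation_loss` are hypothetical names, the loss function is a stand-in for actually training each cluster with vanilla SL, and the sketch assumes there are at least N+1 clients so that no cluster is empty.

```python
import random
from typing import Callable, List, Sequence, Set


def partition_clients(clients: Sequence[int], num_malicious: int,
                      rng: random.Random) -> List[List[int]]:
    """Randomly split clients into num_malicious + 1 near-equal clusters."""
    shuffled = list(clients)
    rng.shuffle(shuffled)
    num_clusters = num_malicious + 1
    return [shuffled[i::num_clusters] for i in range(num_clusters)]


def select_cluster(clusters: List[List[int]],
                   validation_loss: Callable[[List[int]], float]) -> List[int]:
    """Return the cluster whose model has the lowest validation loss."""
    return min(clusters, key=validation_loss)


def toy_validation_loss(cluster: List[int], malicious: Set[int]) -> float:
    """Hypothetical stand-in for training a cluster and validating it:
    every malicious member adds a fixed penalty to a small base loss."""
    return 0.1 + sum(1.0 for client in cluster if client in malicious)


rng = random.Random(0)
clients = list(range(12))   # M = 12 clients
malicious = {1, 5, 9}       # N = 3 adversarial clients (unknown in practice)

clusters = partition_clients(clients, len(malicious), rng)
# Pigeonhole: N malicious clients can touch at most N of the N + 1 clusters.
honest_clusters = [c for c in clusters if not malicious & set(c)]
winner = select_cluster(clusters, lambda c: toy_validation_loss(c, malicious))
print(len(clusters), len(honest_clusters), sorted(winner))
```

In this toy setting the selected cluster is always free of malicious clients because the penalty is deterministic. In a real system the guarantee covers only the existence of an honest cluster, and selection relies on attacks actually raising the validation loss.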
AS | Aug 11, 2020
Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems
Ravichander Vipperla, Sangjun Park, Kihyun Choo et al.
LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) system. These techniques are: 1) Sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) Bit-bunching, which reduces the computations in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, shows a 2.19x improvement over the baseline run-time when running on a mobile device, with a less than 0.1 decrease in TTS mean opinion score (MOS).
CR | Jun 22, 2020
Time-Variant Proof-of-Work Using Error-Correction Codes
Sangjun Park, Haeung Choi, Heung-No Lee
The protocol for cryptocurrencies can be divided into three parts, namely consensus, wallet, and networking overlay. The aim of the consensus part is to bring trustless rational peer-to-peer nodes to agreement on the current status of the blockchain. The status must be updated through valid transactions. A proof-of-work (PoW) based consensus mechanism has been proven to be secure and robust owing to its simple rule and has served as a firm foundation for cryptocurrencies such as Bitcoin and Ethereum. Specialized mining devices have emerged, as rational miners aim to maximize profit, and caused two problems: i) the re-centralization of the mining market and ii) the huge energy spending in mining. In this paper, we propose a new PoW called Error-Correction Codes PoW (ECCPoW), in which error-correction codes and their decoder are utilized for PoW. In ECCPoW, puzzles can be intentionally generated to vary from block to block, leading to a time-variant puzzle generation mechanism. This mechanism is useful in repressing the emergence of specialized mining devices. It can serve as a solution to the two problems of re-centralization and energy spending.