Yeonsoo Kim

h-index: 5
2 papers

2 Papers

CL, Oct 13, 2025
KOTOX: A Korean Toxic Dataset for Deobfuscation and Detoxification

Yejin Lee, Su-Hyeon Kim, Hyundong Jin et al.

Toxic content has become an increasingly critical social issue with the rapid expansion of online communication. While numerous studies have explored methods for detecting and detoxifying such content, most have focused primarily on English, leaving low-resource languages underrepresented. Consequently, Large Language Models (LLMs) often struggle to identify and neutralize toxic expressions in these languages. This challenge becomes even more pronounced when users employ obfuscation techniques to evade detection systems. Therefore, we propose KOTOX: Korean Toxic Dataset for deobfuscation and detoxification to address this issue. We categorize various obfuscation approaches based on linguistic characteristics of Korean and define a set of transformation rules grounded in real-world examples. Using these rules, we construct three dataset versions (easy, normal, and hard) representing different levels of obfuscation difficulty. This is the first dataset that simultaneously supports deobfuscation and detoxification for the Korean language. We expect it to facilitate better understanding and mitigation of obfuscated toxic content in LLMs for low-resource languages. Our code and data are available at https://github.com/leeyejin1231/KOTOX.
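
To make the idea of a rule-based transformation over Korean text concrete, here is a minimal sketch of one hypothetical rule: splitting each precomposed Hangul syllable into its component jamo. The abstract does not list the dataset's rules, so this rule and the function name `decompose_hangul` are illustrative assumptions and not taken from KOTOX. The sketch relies only on the standard Unicode layout of Hangul syllables (19 initial consonants, 21 vowels, 28 final slots starting at U+AC00).

```python
# Hypothetical illustration only: this is NOT a rule taken from the KOTOX dataset.
# It uses the standard Unicode arithmetic for precomposed Hangul syllables.

CHOSEONG = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
JUNGSEONG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"    # 21 vowels
JONGSEONG = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 final slots

HANGUL_BASE = 0xAC00   # code point of the first precomposed syllable
HANGUL_LAST = 0xD7A3   # code point of the last precomposed syllable


def decompose_hangul(text: str) -> str:
    """Split every precomposed Hangul syllable into compatibility jamo.

    Characters outside the Hangul syllable block are copied unchanged.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            initial, rest = divmod(index, 21 * 28)
            vowel, final = divmod(rest, 28)
            out.append(CHOSEONG[initial] + JUNGSEONG[vowel] + JONGSEONG[final])
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(decompose_hangul("한글"))
```

Undoing such a rule is harder than applying it, because the flat jamo sequence no longer marks syllable boundaries; that asymmetry is one reason a paired obfuscated/clean dataset is useful for training deobfuscation.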

DC, Oct 23, 2019
The Economics of Smart Contracts

Kirk Baird, Seongho Jeong, Yeonsoo Kim et al.

Ethereum is a distributed blockchain that can execute smart contracts, which inter-communicate and perform transactions automatically. The execution of smart contracts is paid for in gas, which is a monetary unit used in the Ethereum blockchain. The Ethereum Virtual Machine (EVM) provides the metering capability for smart contract execution. Instruction costs vary depending on the instruction type and the approximate computational resources required to execute the instruction on the network. The cost of gas is adjusted using transaction fees to ensure that the network is adequately paid. In this work, we highlight the "real" economics of smart contracts. We show that the actual costs of executing smart contracts are disproportionate to the computational costs and that this gap is continuously widening. We show that the gas-cost model of the underlying EVM instruction set is incorrectly modeled. Specifically, the computational cost for the SLOAD instruction increases with the length of the blockchain. Our proposed performance model estimates gas usage and execution time of a smart contract at a given block-height. The new gas-cost model incorporates the block-height to eliminate irregularities in the Ethereum gas calculations. Our findings are based on extensive experiments over the entire history of the EVM blockchain.
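
The difference between a fixed per-instruction gas schedule and a block-height-aware one can be sketched as follows. This is a toy model under stated assumptions: the cost table, the linear form of `sload_cost`, and its parameters are placeholders chosen for illustration. They are neither the real EVM gas schedule nor the model proposed in the paper, whose functional form the abstract does not give.

```python
# Toy gas meter. All numbers are hypothetical placeholders, not the real
# EVM schedule and not the paper's fitted model.

STATIC_COSTS = {"ADD": 3, "MUL": 5, "SLOAD": 200}


def sload_cost(block_height: int, base: int = 200, per_million_blocks: int = 50) -> int:
    """Assumed height-aware SLOAD cost: grows linearly per million blocks."""
    return base + per_million_blocks * (block_height // 1_000_000)


def estimate_gas(instructions, block_height=None):
    """Sum instruction costs.

    With block_height=None every instruction uses the fixed table.
    With a block height, SLOAD is charged by the height-aware function.
    """
    total = 0
    for op in instructions:
        if op == "SLOAD" and block_height is not None:
            total += sload_cost(block_height)
        else:
            total += STATIC_COSTS[op]
    return total


if __name__ == "__main__":
    program = ["SLOAD", "ADD", "SLOAD", "MUL"]
    print("fixed schedule:", estimate_gas(program))
    print("height-aware at block 8,000,000:", estimate_gas(program, 8_000_000))
```

Under the fixed table the same program always costs the same, whereas the height-aware variant charges more for storage reads as the chain grows, which is the kind of correction the abstract describes.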