SD Jul 20, 2022
Cross-Modal Contrastive Representation Learning for Audio-to-Image Generation
HaeChun Chung, JooYong Shim, Jong-Kook Kim
Representing the same information in multiple modalities provides a variety of perspectives on it, which can improve understanding. It can therefore be valuable to generate data in a different modality from existing data. In this paper, we investigate the cross-modal audio-to-image generation problem and propose Cross-Modal Contrastive Representation Learning (CMCRL) to extract useful features from audio and use them in the generation phase. Experimental results show that CMCRL improves the quality of generated images compared with previous research.
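The abstract does not state which contrastive objective CMCRL uses. As a rough illustration of cross-modal contrastive learning in general, the sketch below computes an InfoNCE-style loss over paired audio and image embeddings. The loss choice, the function names `cosine` and `info_nce`, the temperature value, and the toy vectors are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(audio_embs, image_embs, temperature=0.1):
    """Mean audio-to-image InfoNCE loss; index i in both lists is a positive pair."""
    losses = []
    for i, audio in enumerate(audio_embs):
        logits = [cosine(audio, image) / temperature for image in image_embs]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

# Hypothetical toy embeddings: two audio clips and their matching images.
audio = [[1.0, 0.0], [0.0, 1.0]]
aligned_images = [[0.9, 0.1], [0.1, 0.9]]
swapped_images = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce(audio, aligned_images)
loss_swapped = info_nce(audio, swapped_images)
```

The loss is small when each audio embedding is closest to its own image and large when the pairs are swapped, which is the signal a contrastive encoder is trained on.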
LG Jul 25, 2024
Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation
Jean Seong Bjorn Choe, Jong-Kook Kim
Entropy Regularisation is a widely adopted technique that enhances policy optimisation performance and stability. A notable form of entropy regularisation is augmenting the objective with an entropy term, thereby simultaneously optimising the expected return and the entropy. This framework, known as maximum entropy reinforcement learning (MaxEnt RL), has shown theoretical and empirical successes. However, its practical application in straightforward on-policy actor-critic settings remains surprisingly underexplored. We hypothesise that this is due to the difficulty of managing the entropy reward in practice. This paper proposes a simple method of separating the entropy objective from the MaxEnt RL objective, which facilitates the implementation of MaxEnt RL in on-policy settings. Our empirical evaluations demonstrate that extending Proximal Policy Optimisation (PPO) and Trust Region Policy Optimisation (TRPO) within the MaxEnt framework improves policy optimisation performance in both MuJoCo and Procgen tasks. Additionally, our results highlight MaxEnt RL's capacity to enhance generalisation.
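One way to picture the separation of the entropy objective is that, by linearity, the discounted return of an entropy-augmented reward equals the ordinary discounted return plus a temperature-weighted discounted sum of entropies. The sketch below checks this on a toy trajectory. It is only an illustration of that identity, not the paper's estimator; the trajectory, `gamma`, and the temperature `tau` are assumed values.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def discounted_returns(values, gamma):
    """Discounted sum from each time step to the end of the trajectory."""
    out, running = [], 0.0
    for v in reversed(values):
        running = v + gamma * running
        out.append(running)
    return out[::-1]

# Hypothetical three-step trajectory: rewards and the policy's action
# distribution at each visited state.
rewards = [1.0, 0.0, 2.0]
policies = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
gamma, tau = 0.99, 0.1

entropies = [entropy(p) for p in policies]

# Separate streams: one for the reward, one for the entropy.
reward_returns = discounted_returns(rewards, gamma)
entropy_returns = discounted_returns(entropies, gamma)
separated = [g + tau * h for g, h in zip(reward_returns, entropy_returns)]

# Single stream: fold the entropy into the reward before discounting.
augmented = discounted_returns(
    [r + tau * h for r, h in zip(rewards, entropies)], gamma
)
```

Keeping the two streams apart means each can have its own value or advantage estimate, with the temperature applied only when they are combined.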
RO Sep 13, 2024
Average-Reward Maximum Entropy Reinforcement Learning for Underactuated Double Pendulum Tasks
Jean Seong Bjorn Choe, Bumkyu Choi, Jong-kook Kim
This report presents a solution for the swing-up and stabilisation tasks of the acrobot and the pendubot, developed for the AI Olympics competition at IROS 2024. Our approach employs the Average-Reward Entropy Advantage Policy Optimization (AR-EAPO), a model-free reinforcement learning (RL) algorithm that combines average-reward RL and maximum entropy RL. Results demonstrate that our controller achieves improved performance and robustness scores compared to established baseline methods in both the acrobot and pendubot scenarios, without the need for a heavily engineered reward function or system model. The current results are applicable exclusively to the simulation stage setup.
ST Apr 28, 2023
Using a Deep Learning Model to Simulate Human Stock Trader's Methods of Chart Analysis
Sungwoo Kang, Jong-Kook Kim
Despite the efficient market hypothesis, many studies suggest the existence of inefficiencies in the stock market, which has led to the development of techniques to gain above-market returns. Systematic trading has undergone significant advances in recent decades, with deep learning schemes emerging as a powerful tool for analyzing and predicting market behavior. In this paper, a method is proposed that is inspired by how professional technical analysts trade. This scheme looks at the stock prices of the previous 600 days and predicts whether the stock price will rise or fall by 10% or 20% within the next D days. The proposed method uses the skip connections and logits of ResNet (a deep learning model) to increase the probability of the prediction. The model was trained and tested using historical data from both the Korean and US stock markets. The backtest uses data from 2020 to 2022. On the Korean market, the proposed method gave a return of 75.36% with a Sharpe ratio of 1.57, exceeding the market by 36% and 0.61, respectively. On the US market, it gave a total return of 27.17% with a Sharpe ratio of 0.61, outperforming benchmarks such as the NASDAQ, S&P 500, and Dow Jones indices by 17.69% and 0.27, respectively.
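The prediction target described above can be sketched as a labeling rule over a price series. The abstract does not say how a window that touches both thresholds is handled, so the first-touch rule below, the function name `label_move`, and the sample prices are assumptions for illustration only.

```python
def label_move(prices, t, horizon, threshold):
    """Label day t by the first threshold the price touches in the next `horizon` days.

    Returns "rise" if the price first gains at least `threshold` (e.g. 0.10),
    "fall" if it first loses at least `threshold`, and "none" otherwise.
    """
    base = prices[t]
    for future in prices[t + 1 : t + 1 + horizon]:
        change = (future - base) / base
        if change >= threshold:
            return "rise"
        if change <= -threshold:
            return "fall"
    return "none"

# Hypothetical daily closing prices.
prices = [100.0, 103.0, 97.0, 111.0, 88.0]

label_d3 = label_move(prices, t=0, horizon=3, threshold=0.10)
label_d2 = label_move(prices, t=0, horizon=2, threshold=0.10)
label_fall = label_move(prices, t=3, horizon=1, threshold=0.20)
```

With a three-day horizon the first day is labeled a rise because the price reaches 111; with a two-day horizon neither threshold is reached.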
LG Jan 21, 2025
CroMe: Multimodal Fake News Detection using Cross-Modal Tri-Transformer and Metric Learning
Eunjee Choi, Junhyun Ahn, XinYu Piao et al.
Multimodal Fake News Detection has received increasing attention recently. Existing methods rely on independently encoded unimodal data and overlook the advantages of capturing intra-modality relationships and integrating inter-modal similarities using advanced techniques. To address these issues, Cross-Modal Tri-Transformer and Metric Learning for Multimodal Fake News Detection (CroMe) is proposed. CroMe utilizes Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (BLIP2) as encoders to capture detailed text, image and combined image-text representations. The metric learning module employs a proxy anchor method to capture intra-modality relationships while the feature fusion module uses a Cross-Modal and Tri-Transformer for effective integration. The final fake news detector processes the fused features through a classifier to predict the authenticity of the content. Experiments on datasets show that CroMe excels in multimodal fake news detection.
LG Mar 20, 2024
The Bid Picture: Auction-Inspired Multi-player Generative Adversarial Networks Training
Joo Yong Shim, Jean Seong Bjorn Choe, Jong-Kook Kim
This article proposes auction-inspired multi-player generative adversarial networks training, which mitigates the mode collapse problem of GANs. Mode collapse occurs when an over-fitted generator generates a limited range of samples, often concentrating on a small subset of the data distribution. Despite the restricted diversity of generated samples, the discriminator can still be deceived into classifying these samples as real samples from the actual distribution. In the absence of external standards, a model cannot recognize its failure during the training phase. We extend the two-player game of generative adversarial networks to a multi-player game. During training, the value of each model is determined by the bids submitted by other players in an auction-like process.
LG Mar 19, 2024
TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer
Eunjee Choi, Jong-Kook Kim
Detecting fake news has received a lot of attention. Many previous methods concatenate independently encoded unimodal data, ignoring the benefits of integrated multimodal information. Also, the absence of specialized feature extraction for text and images further limits these methods. This paper introduces an end-to-end model called TT-BLIP that applies the bootstrapping language-image pretraining for unified vision-language understanding and generation (BLIP) for three types of information: BERT and BLIPTxt for text, ResNet and BLIPImg for images, and bidirectional BLIP encoders for multimodal information. The Multimodal Tri-Transformer fuses tri-modal features using three types of multi-head attention mechanisms, ensuring integrated modalities for enhanced representations and improved multimodal data analysis. The experiments are performed using two fake news datasets, Weibo and Gossipcop. The results indicate TT-BLIP outperforms the state-of-the-art models.
LG Oct 24, 2021
Enabling Large Batch Size Training for DNN Models Beyond the Memory Limit While Maintaining Performance
XinYu Piao, DoangJoo Synn, JooYoung Park et al.
Recent deep learning models are difficult to train using a large batch size, because commodity machines may not have enough memory to accommodate both the model and a large data batch. The batch size is one of the training hyper-parameters, and it is limited by the memory capacity of the target machine because the batch must fit into the memory that remains after the model is loaded. The size of each data item is also an important factor: the larger each item, the smaller the batch size that fits into the remaining memory. This paper proposes a method called Micro-Batch Processing (MBP) to address this problem. MBP splits a batch into smaller batches that fit in the remaining memory and processes them sequentially. After the small batches are processed individually, a loss normalization algorithm based on gradient accumulation is used to maintain performance. The purpose of our method is to allow deep learning models to train using larger batch sizes that exceed the memory capacity of a system without increasing the memory size or using multiple devices (GPUs).
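The idea of splitting a batch and normalizing the loss during gradient accumulation can be sketched on a one-parameter linear model. This is a minimal sketch, not the MBP implementation: it assumes a mean-squared-error loss and weights each small batch by its share of the full batch, and the names `mse_grad` and `micro_batch_grad` and the data are made up for the example.

```python
def mse_grad(w, xs, ys):
    """Gradient of the mean squared error of y = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def micro_batch_grad(w, xs, ys, micro_size):
    """Accumulate per-micro-batch gradients, each scaled by its share of the batch."""
    total = len(xs)
    accumulated = 0.0
    for start in range(0, total, micro_size):
        micro_x = xs[start:start + micro_size]
        micro_y = ys[start:start + micro_size]
        # Normalization: a micro-batch mean counts for len(micro) / total of the batch mean.
        accumulated += mse_grad(w, micro_x, micro_y) * len(micro_x) / total
    return accumulated

# Hypothetical batch of five samples, processed two at a time (last chunk has one).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.9]

full = mse_grad(0.5, xs, ys)
accumulated = micro_batch_grad(0.5, xs, ys, micro_size=2)
```

With this scaling the accumulated gradient matches the full-batch gradient even when the last chunk is smaller, so the parameter update is the same as if the whole batch had fit in memory.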
LG Aug 16, 2021
Introduction to Quantum Reinforcement Learning: Theory and PennyLane-based Implementation
Yunseok Kwak, Won Joon Yun, Soyi Jung et al.
The emergence of quantum computing enables researchers to apply quantum circuits to many existing areas of study. Using quantum circuits and quantum differential programming, much research is being conducted in areas such as Quantum Machine Learning (QML). In particular, quantum reinforcement learning is a good field in which to test the potential of quantum machine learning, and a lot of research is being done. This work introduces the concept of quantum reinforcement learning using a variational quantum circuit and confirms its feasibility through implementation and experimentation. We first present the background knowledge and working principle of quantum reinforcement learning, and then walk through the implementation using the PennyLane library. We also discuss the power and potential of quantum reinforcement learning based on the experimental results obtained in this work.
CL Feb 17, 2021
Contextual Skipgram: Training Word Representation Using Context Information
Dongjae Kim, Jong-Kook Kim
The skip-gram (SG) model learns word representations by predicting the words surrounding a center word from unstructured text data. However, not all words in the context window contribute to the meaning of the center word. For example, less relevant words could be in the context window, hindering the SG model from learning a better quality representation. In this paper, we propose an enhanced version of SG that leverages context information to produce word representations. The proposed model, Contextual Skip-gram, is designed to predict contextual words using both the center word and the context information. This simple idea helps to reduce the impact of irrelevant words on the training process, thus enhancing the final performance.
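For reference, the baseline skip-gram setup generates one training pair for every word that falls inside the context window of a center word. The sketch below shows only this standard pair generation, not the Contextual Skip-gram model; the function name `skipgram_pairs` and the sample sentence are chosen for the example.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "cat", "sat"], window=1)
# pairs == [("the", "cat"), ("cat", "the"), ("cat", "sat"), ("sat", "cat")]
```

Every word in the window produces a pair regardless of relevance, which is the behavior the abstract identifies as a weakness of plain skip-gram.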
LG Jan 9, 2020
Privacy-Preserving Deep Learning Computation for Geo-Distributed Medical Big-Data Platforms
Joohyung Jeon, Junhui Kim, Joongheon Kim et al.
This paper proposes a distributed deep learning framework for privacy-preserving medical data training. To avoid leakage of patients' data in medical platforms, the hidden layers of the deep learning framework are separated: the first layer is kept in each platform and the other layers are kept in a centralized server. Keeping the original patients' data in the local platforms maintains their privacy, while using the server for the subsequent layers improves learning performance by using all data from each platform during training.
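The layer split described above can be pictured as a forward pass in which only first-layer activations leave the platform. The sketch below covers inference only and is a loose illustration under assumed details: the `Platform` and `Server` classes, the ReLU dense layers, and all weights are hypothetical and do not come from the paper.

```python
def dense_relu(weights, biases, inputs):
    """One fully connected layer followed by ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

class Platform:
    """Holds the raw records and the first layer; only activations leave it."""
    def __init__(self, weights, biases):
        self.weights, self.biases = weights, biases

    def encode(self, record):
        return dense_relu(self.weights, self.biases, record)

class Server:
    """Holds the remaining layer(s) and never receives raw records."""
    def __init__(self, weights, biases):
        self.weights, self.biases = weights, biases

    def predict(self, activations):
        return dense_relu(self.weights, self.biases, activations)

# Hypothetical weights and a hypothetical two-feature patient record.
platform = Platform([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.1])
server = Server([[1.0, 2.0]], [0.5])

record = [2.0, 1.0]
activations = platform.encode(record)  # computed locally on the platform
output = server.predict(activations)   # computed on the central server
```

Training such a split model would also require the server to send gradients for the activations back to each platform; that step is omitted here.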