LGSep 20, 2022
SleePyCo: Automatic Sleep Scoring with Feature Pyramid and Contrastive LearningSeongju Lee, Yeonguk Yu, Seunghyeok Back et al.
Automatic sleep scoring is essential for the diagnosis and treatment of sleep disorders and enables longitudinal sleep tracking in home environments. Conventionally, learning-based automatic sleep scoring on single-channel electroencephalogram (EEG) is actively studied because obtaining multi-channel signals during sleep is difficult. However, learning representation from raw EEG signals is challenging owing to the following issues: 1) sleep-related EEG patterns occur on different temporal and frequency scales and 2) sleep stages share similar EEG patterns. To address these issues, we propose a deep learning framework named SleePyCo that incorporates 1) a feature pyramid and 2) supervised contrastive learning for automatic sleep scoring. For the feature pyramid, we propose a backbone network named SleePyCo-backbone to consider multiple feature sequences on different temporal and frequency scales. Supervised contrastive learning allows the network to extract class discriminative features by minimizing the distance between intra-class features and simultaneously maximizing that between inter-class features. Comparative analyses on four public datasets demonstrate that SleePyCo consistently outperforms existing frameworks based on single-channel EEG. Extensive ablation experiments show that SleePyCo exhibits enhanced overall performance, with significant improvements in discrimination between the N1 and rapid eye movement (REM) stages.
CVJun 28, 2023
High-Quality Unknown Object Instance Segmentation via Quadruple Boundary Error RefinementSeunghyeok Back, Sangbeom Lee, Kangmin Kim et al.
Accurate and efficient segmentation of unknown objects in unstructured environments is essential for robotic manipulation. Unknown Object Instance Segmentation (UOIS), which aims to identify all objects in unknown categories and backgrounds, has become a key capability for various robotic tasks. However, existing methods struggle with over-segmentation and under-segmentation, leading to failures in manipulation tasks such as grasping. To address these challenges, we propose QuBER (Quadruple Boundary Error Refinement), a novel error-informed refinement approach for high-quality UOIS. QuBER first estimates quadruple boundary errors-true positive, true negative, false positive, and false negative pixels-at the instance boundaries of the initial segmentation. It then refines the segmentation using an error-guided fusion mechanism, effectively correcting both fine-grained and instance-level segmentation errors. Extensive evaluations on three public benchmarks demonstrate that QuBER outperforms state-of-the-art methods and consistently improves various UOIS methods while maintaining a fast inference time of less than 0.1 seconds. Furthermore, we show that QuBER improves the success rate of grasping target objects in cluttered environments. Code and supplementary materials are available at https://sites.google.com/view/uois-quber.
ROMar 17
BiGraspFormer: End-to-End Bimanual Grasp TransformerKangmin Kim, Seunghyeok Back, Geonhyup Lee et al.
Bimanual grasping is essential for robots to handle large and complex objects. However, existing methods either focus solely on single-arm grasping or employ separate grasp generation and bimanual evaluation stages, leading to coordination problems including collision risks and unbalanced force distribution. To address these limitations, we propose BiGraspFormer, a unified end-to-end transformer framework that directly generates coordinated bimanual grasps from object point clouds. Our key idea is the Single-Guided Bimanual (SGB) strategy, which first generates diverse single grasp candidates using a transformer decoder, then leverages their learned features through specialized attention mechanisms to jointly predict bimanual poses and quality scores. This conditioning strategy reduces the complexity of the 12-DoF search space while ensuring coordinated bimanual manipulation. Comprehensive simulation experiments and real-world validation demonstrate that BiGraspFormer consistently outperforms existing methods while maintaining efficient inference speed (<0.05s), confirming the effectiveness of our framework. Code and supplementary materials are available at https://sites.google.com/view/bigraspformer
CVApr 17, 2024Code
Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time AdaptationYeonguk Yu, Sungho Shin, Seunghyeok Back et al.
Test-time adaptation (TTA) aims to adapt a pre-trained model to a new test domain without access to source data after deployment. Existing approaches typically rely on self-training with pseudo-labels since ground-truth cannot be obtained from test data. Although the quality of pseudo labels is important for stable and accurate long-term adaptation, it has not been previously addressed. In this work, we propose DPLOT, a simple yet effective TTA framework that consists of two components: (1) domain-specific block selection and (2) pseudo-label generation using paired-view images. Specifically, we select blocks that involve domain-specific feature extraction and train these blocks by entropy minimization. After blocks are adjusted for current test domain, we generate pseudo-labels by averaging given test images and corresponding flipped counterparts. By simply using flip augmentation, we prevent a decrease in the quality of the pseudo-labels, which can be caused by the domain gap resulting from strong augmentation. Our experimental results demonstrate that DPLOT outperforms previous TTA methods in CIFAR10-C, CIFAR100-C, and ImageNet-C benchmarks, reducing error by up to 5.4%, 9.1%, and 2.9%, respectively. Also, we provide an extensive analysis to demonstrate effectiveness of our framework. Code is available at https://github.com/gist-ailab/domain-specific-block-selection-and-paired-view-pseudo-labeling-for-online-TTA.
ROApr 9, 2025
GraspClutter6D: A Large-scale Real-world Dataset for Robust Perception and Grasping in Cluttered ScenesSeunghyeok Back, Joosoon Lee, Kangmin Kim et al.
Robust grasping in cluttered environments remains an open challenge in robotics. While benchmark datasets have significantly advanced deep learning methods, they mainly focus on simplistic scenes with light occlusion and insufficient diversity, limiting their applicability to practical scenarios. We present GraspClutter6D, a large-scale real-world grasping dataset featuring: (1) 1,000 highly cluttered scenes with dense arrangements (14.1 objects/scene, 62.6\% occlusion), (2) comprehensive coverage across 200 objects in 75 environment configurations (bins, shelves, and tables) captured using four RGB-D cameras from multiple viewpoints, and (3) rich annotations including 736K 6D object poses and 9.3B feasible robotic grasps for 52K RGB-D images. We benchmark state-of-the-art segmentation, object pose estimation, and grasp detection methods to provide key insights into challenges in cluttered environments. Additionally, we validate the dataset's effectiveness as a training resource, demonstrating that grasping networks trained on GraspClutter6D significantly outperform those trained on existing datasets in both simulation and real-world experiments. The dataset, toolkit, and annotation tools are publicly available on our project website: https://sites.google.com/view/graspclutter6d.
ROOct 7, 2025
DeLTa: Demonstration and Language-Guided Novel Transparent Object ManipulationTaeyeop Lee, Gyuree Kang, Bowen Wen et al.
Despite the prevalence of transparent object interactions in human everyday life, transparent robotic manipulation research remains limited to short-horizon tasks and basic grasping capabilities.Although some methods have partially addressed these issues, most of them have limitations in generalizability to novel objects and are insufficient for precise long-horizon robot manipulation. To address this limitation, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot for long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. Project page: https://sites.google.com/view/DeLTa25/
CVOct 22, 2021
Automatic Detection of Injection and Press Mold Parts on 2D Drawing Using Deep Neural NetworkJunseok Lee, Jongwon Kim, Jumi Park et al.
This paper proposes a method to automatically detect the key feature parts in a CAD of commercial TV and monitor using a deep neural network. We developed a deep learning pipeline that can detect the injection parts such as hook, boss, undercut and press parts such as DPS, Embo-Screwless, Embo-Burring, and EMBO in the 2D CAD drawing images. We first cropped the drawing to a specific size for the training efficiency of a deep neural network. Then, we use Cascade R-CNN to find the position of injection and press parts and use Resnet-50 to predict the orientation of the parts. Finally, we convert the position of the parts found through the cropped image to the position of the original image. As a result, we obtained detection accuracy of injection and press parts with 84.1% in AP (Average Precision), 91.2% in AR(Average Recall), 72.0% in AP, 87.0% in AR, and orientation accuracy of injection and press parts with 94.4% and 92.0%, which can facilitate the faster design in industrial product design.
ROSep 23, 2021
Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion ModelingSeunghyeok Back, Joosoon Lee, Taewon Kim et al.
Instance-aware segmentation of unseen objects is essential for a robotic system in an unstructured environment. Although previous works achieved encouraging results, they were limited to segmenting the only visible regions of unseen objects. For robotic manipulation in a cluttered scene, amodal perception is required to handle the occluded objects behind others. This paper addresses Unseen Object Amodal Instance Segmentation (UOAIS) to detect 1) visible masks, 2) amodal masks, and 3) occlusions on unseen object instances. For this, we propose a Hierarchical Occlusion Modeling (HOM) scheme designed to reason about the occlusion by assigning a hierarchy to a feature fusion and prediction order. We evaluated our method on three benchmarks (tabletop, indoors, and bin environments) and achieved state-of-the-art (SOTA) performance. Robot demos for picking up occluded objects, codes, and datasets are available at https://sites.google.com/view/uoais
ROJan 7, 2021
Object Detection for Understanding Assembly Instruction Using Context-aware Data Augmentation and Cascade Mask R-CNNJoosoon Lee, Seongju Lee, Seunghyeok Back et al.
Understanding assembly instruction has the potential to enhance the robot s task planning ability and enables advanced robotic applications. To recognize the key components from the 2D assembly instruction image, We mainly focus on segmenting the speech bubble area, which contains lots of information about instructions. For this, We applied Cascade Mask R-CNN and developed a context-aware data augmentation scheme for speech bubble segmentation, which randomly combines images cuts by considering the context of assembly instructions. We showed that the proposed augmentation scheme achieves a better segmentation performance compared to the existing augmentation algorithm by increasing the diversity of trainable data while considering the distribution of components locations. Also, we showed that deep learning can be useful to understand assembly instruction by detecting the essential objects in the assembly instruction, such as tools and parts.
CVFeb 10, 2020
Segmenting Unseen Industrial Components in a Heavy Clutter Using RGB-D Fusion and Synthetic DataSeunghyeok Back, Jongwon Kim, Raeyoung Kang et al.
Segmentation of unseen industrial parts is essential for autonomous industrial systems. However, industrial components are texture-less, reflective, and often found in cluttered and unstructured environments with heavy occlusion, which makes it more challenging to deal with unseen objects. To tackle this problem, we present a synthetic data generation pipeline that randomizes textures via domain randomization to focus on the shape information. In addition, we propose an RGB-D Fusion Mask R-CNN with a confidence map estimator, which exploits reliable depth information in multiple feature levels. We transferred the trained model to real-world scenarios and evaluated its performance by making comparisons with baselines and ablation studies. We demonstrate that our methods, which use only synthetic data, could be effective solutions for unseen industrial components segmentation.
LGFeb 18, 2019
Intra- and Inter-epoch Temporal Context Network (IITNet) Using Sub-epoch Features for Automatic Sleep Scoring on Raw Single-channel EEGHogeon Seo, Seunghyeok Back, Seongju Lee et al.
A deep learning model, named IITNet, is proposed to learn intra- and inter-epoch temporal contexts from raw single-channel EEG for automatic sleep scoring. To classify the sleep stage from half-minute EEG, called an epoch, sleep experts investigate sleep-related events and consider the transition rules between the found events. Similarly, IITNet extracts representative features at a sub-epoch level by a residual neural network and captures intra- and inter-epoch temporal contexts from the sequence of the features via bidirectional LSTM. The performance was investigated for three datasets as the sequence length (L) increased from one to ten. IITNet achieved the comparable performance with other state-of-the-art results. The best accuracy, MF1, and Cohen's kappa ($κ$) were 83.9%, 77.6%, 0.78 for SleepEDF (L=10), 86.5%, 80.7%, 0.80 for MASS (L=9), and 86.7%, 79.8%, 0.81 for SHHS (L=10), respectively. Even though using four epochs, the performance was still comparable. Compared to using a single epoch, on average, accuracy and MF1 increased by 2.48%p and 4.90%p and F1 of N1, N2, and REM increased by 16.1%p, 1.50%p, and 6.42%p, respectively. Above four epochs, the performance improvement was not significant. The results support that considering the latest two-minute raw single-channel EEG can be a reasonable choice for sleep scoring via deep neural networks with efficiency and reliability. Furthermore, the experiments with the baselines showed that introducing intra-epoch temporal context learning with a deep residual network contributes to the improvement in the overall performance and has the positive synergy effect with the inter-epoch temporal context learning.