CVOct 5, 2022
Coarse-to-Fine Point Cloud Registration with SE(3)-Equivariant RepresentationsCheng-Wei Lin, Tung-I Chen, Hsin-Ying Lee et al.
Point cloud registration is a crucial problem in computer vision and robotics. Existing methods either rely on matching local geometric features, which are sensitive to the pose differences, or leverage global shapes, which leads to inconsistency when facing distribution variances such as partial overlapping. Combining the advantages of both types of methods, we adopt a coarse-to-fine pipeline that concurrently handles both issues. We first reduce the pose differences between input point clouds by aligning global features; then we match the local features to further refine the inaccurate alignments resulting from distribution variances. As global feature alignment requires the features to preserve the poses of input point clouds and local feature matching expects the features to be invariant to these poses, we propose an SE(3)-equivariant feature extractor to simultaneously generate two types of features. In this feature extractor, representations that preserve the poses are first encoded by our novel SE(3)-equivariant network and then converted into pose-invariant ones by a pose-detaching module. Experiments demonstrate that our proposed method increases the recall rate by 20% compared to state-of-the-art methods when facing both pose differences and distribution variances.
93.4IVMay 18
CATRF: Codec-Adaptive TriPlane Radiance Fields for Volumetric Content DeliveryTung-I Chen, Lingdong Wang, Subhransu Maji et al.
Volumetric media promises next-generation content delivery applications, but its bandwidth demand remains a key bottleneck. Implicit and hybrid volumetric representations reduce model sizes, yet still require careful coding to reach 2D video-like bitrates. We present CATRF, a standard-codec-in-the-loop compression framework for plane-factorized radiance fields. During training, we quantize and pack 2D feature planes into codec-friendly canvases, run a standard codec roundtrip (JPEG/VP9/HEVC/AV1), then unpack and dequantize the decoded features before volume rendering. We use a straight-through estimator (STE) to insert the non-differentiable, standard codec pipeline into the training loop, allowing radiance-field features to adapt directly to the real, client-side codec distortions without introducing any learned codec parameters. On both static and dynamic benchmarks, CATRF consistently achieves a better rate-distortion trade-off over codec-agnostic and learned-codec-in-the-loop baselines, and also outperforms recent compressed 3DGS methods in both compression efficiency and decoding speed. These results highlight a practical path toward low-bitrate, compression-resilient volumetric representations for free-viewpoint video streaming.
CVFeb 14, 2022Code
D2ADA: Dynamic Density-aware Active Domain Adaptation for Semantic SegmentationTsung-Han Wu, Yi-Syuan Liou, Shao-Ji Yuan et al.
In the field of domain adaptation, a trade-off exists between the model performance and the number of target domain annotations. Active learning, maximizing model performance with few informative labeled data, comes in handy for such a scenario. In this work, we present D2ADA, a general active domain adaptation framework for semantic segmentation. To adapt the model to the target domain with minimum queried labels, we propose acquiring labels of the samples with high probability density in the target domain yet with low probability density in the source domain, complementary to the existing source domain labeled data. To further facilitate labeling efficiency, we design a dynamic scheduling policy to adjust the labeling budgets between domain exploration and model uncertainty over time. Extensive experiments show that our method outperforms existing active learning and domain adaptation baselines on two benchmarks, GTA5 -> Cityscapes and SYNTHIA -> Cityscapes. With less than 5% target domain annotations, our method reaches comparable results with that of full supervision. Our code is publicly available at https://github.com/tsunghan-wu/D2ADA.
CVNov 29, 2021
Anomaly-Aware Semantic Segmentation by Leveraging Synthetic-Unknown DataGuan-Rong Lu, Yueh-Cheng Liu, Tung-I Chen et al.
Anomaly awareness is an essential capability for safety-critical applications such as autonomous driving. While recent progress of robotics and computer vision has enabled anomaly detection for image classification, anomaly detection on semantic segmentation is less explored. Conventional anomaly-aware systems assuming other existing classes as out-of-distribution (pseudo-unknown) classes for training a model will result in two drawbacks. (1) Unknown classes, which applications need to cope with, might not actually exist during training time. (2) Model performance would strongly rely on the class selection. Observing this, we propose a novel Synthetic-Unknown Data Generation, intending to tackle the anomaly-aware semantic segmentation task. We design a new Masked Gradient Update (MGU) module to generate auxiliary data along the boundary of in-distribution data points. In addition, we modify the traditional cross-entropy loss to emphasize the border data points. We reach the state-of-the-art performance on two anomaly segmentation datasets. Ablation studies also demonstrate the effectiveness of proposed modules.
ROAug 3, 2021
ODIP: Towards Automatic Adaptation for Object Detection by Interactive PerceptionTung-I Chen, Jen-Wei Wang, Winston H. Hsu
Object detection plays a deep role in visual systems by identifying instances for downstream algorithms. In industrial scenarios, however, a slight change in manufacturing systems would lead to costly data re-collection and human annotation processes to re-train models. Existing solutions such as semi-supervised and few-shot methods either rely on numerous human annotations or suffer low performance. In this work, we explore a novel object detector based on interactive perception (ODIP), which can be adapted to novel domains in an automated manner. By interacting with a grasping system, ODIP accumulates visual observations of novel objects, learning to identify previously unseen instances without human-annotated data. Extensive experiments show ODIP outperforms both the generic object detector and state-of-the-art few-shot object detector fine-tuned in traditional manners. A demo video is provided to further illustrate the idea.
CVFeb 24, 2021
Dual-Awareness Attention for Few-Shot Object DetectionTung-I Chen, Yueh-Cheng Liu, Hung-Ting Su et al.
While recent progress has significantly boosted few-shot classification (FSC) performance, few-shot object detection (FSOD) remains challenging for modern learning systems. Existing FSOD systems follow FSC approaches, ignoring critical issues such as spatial variability and uncertain representations, and consequently result in low performance. Observing this, we propose a novel \textbf{Dual-Awareness Attention (DAnA)} mechanism that enables networks to adaptively interpret the given support images. DAnA transforms support images into \textbf{query-position-aware} (QPA) features, guiding detection networks precisely by assigning customized support information to each local region of the query. In addition, the proposed DAnA component is flexible and adaptable to multiple existing object detection frameworks. By adopting DAnA, conventional object detection networks, Faster R-CNN and RetinaNet, which are not designed explicitly for few-shot learning, reach state-of-the-art performance in FSOD tasks. In comparison with previous methods, our model significantly increases the performance by 47\% (+6.9 AP), showing remarkable ability under various evaluation settings.