CV | Nov 11, 2025 | Code
RAPTR: Radar-based 3D Pose Estimation using Transformer
Sorachi Kato, Ryoma Yataka, Pu Perry Wang et al.
Radar-based indoor 3D human pose estimation has typically relied on fine-grained 3D keypoint labels, which are costly to obtain, especially in complex indoor settings involving clutter, occlusions, or multiple people. In this paper, we propose RAPTR (RAdar Pose esTimation using tRansformer) under weak supervision, using only 3D BBox and 2D keypoint labels, which are considerably easier and more scalable to collect. RAPTR is characterized by a two-stage pose decoder architecture with a pseudo-3D deformable attention to enhance (pose/joint) queries with multi-view radar features: a pose decoder estimates initial 3D poses with a 3D template loss designed to utilize the 3D BBox labels and mitigate depth ambiguities; and a joint decoder refines the initial poses with 2D keypoint labels and a 3D gravity loss. Evaluated on two indoor radar datasets, RAPTR outperforms existing methods, reducing joint position error by 34.3% on HIBER and 76.9% on MMVR. Our implementation is available at https://github.com/merlresearch/radar-pose-transformer.
CV | May 15 | Code
Unsupervised 3D Human Pose Estimation via Conditional Multi-view Ancestral Sampling
Ryohei Goto, Takuya Fujihashi, Shunsuke Saruwatari et al.
We propose a method for estimating a 3D human pose from a single view without 3D supervision. The key to our method is to leverage the 2D diffusion priors of motion diffusion models (MDMs) pre-trained on large 2D human pose datasets. Specifically, we extend multi-view ancestral sampling of diffusion models to the task of 2D-3D lifting of human pose. To this end, we propose conditional multi-view ancestral sampling (cMAS), which optimizes the 3D pose such that its multi-view projections follow the manifold in 2D MDM noise space, while conditioning the 3D pose to match the given 2D poses and the anatomical constraints of humans. Experiments on the Yoga dataset demonstrate that our method achieves better cross-domain performance than state-of-the-art supervised and unsupervised 3D pose estimation methods, including on extreme human poses where 3D supervision is unavailable. Code is available at: https://github.com/asaa0001/c-MAS.
CV | Apr 24, 2025
Range Image-Based Implicit Neural Compression for LiDAR Point Clouds
Akihiro Kuwabara, Sorachi Kato, Takuya Fujihashi et al.
This paper presents a novel scheme to efficiently compress Light Detection and Ranging (LiDAR) point clouds. Efficient compression enables high-precision 3D scene archives, which in turn pave the way for a detailed understanding of the corresponding 3D scenes. We focus on 2D range images (RIs) as a lightweight format for representing 3D LiDAR observations. Although conventional image compression techniques can be adapted to improve compression efficiency for RIs, their practical performance is expected to be limited due to differences in bit precision and the distinct pixel value distribution characteristics between natural images and RIs. We propose a novel implicit neural representation (INR)-based RI compression method that effectively handles floating-point valued pixels. The proposed method divides RIs into depth and mask images and compresses them using patch-wise and pixel-wise INR architectures with model pruning and quantization, respectively. Experiments on the KITTI dataset show that the proposed method outperforms existing image, point cloud, RI, and INR-based compression methods in terms of 3D reconstruction and detection quality at low bitrates and decoding latency.
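As a rough illustration of the range-image format and the depth/mask split mentioned in the abstract, the following sketch projects 3D points onto a 2D grid. The spherical projection, the function name `to_range_image`, and the field-of-view and resolution values are assumptions chosen for illustration; the abstract does not specify how the authors build their RIs.

```python
import math

def to_range_image(points, height, width, fov_up_deg, fov_down_deg):
    """Project (x, y, z) points onto a height x width grid (hypothetical helper).

    Returns a depth image (range in the same unit as the input) and a binary
    mask image marking which pixels received a LiDAR return.
    """
    fov_up = math.radians(fov_up_deg)
    fov = fov_up - math.radians(fov_down_deg)
    depth = [[0.0] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        yaw = math.atan2(y, x)    # horizontal angle in [-pi, pi]
        pitch = math.asin(z / r)  # vertical angle
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((fov_up - pitch) / fov * height)
        col = min(max(col, 0), width - 1)   # clamp to the image bounds
        row = min(max(row, 0), height - 1)
        # Keep the closest return when several points share a pixel.
        if mask[row][col] == 0 or r < depth[row][col]:
            depth[row][col] = r
            mask[row][col] = 1
    return depth, mask

# Two toy points and an assumed +/-15 degree vertical field of view.
depth, mask = to_range_image([(10.0, 0.0, 0.0), (0.0, 5.0, 0.0)], 4, 8, 15.0, -15.0)
```

The depth image holds floating-point ranges, while the mask image is binary, which is one plausible reason for compressing the two with different architectures.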
IV | Dec 19, 2024
Quantum Implicit Neural Compression
Takuya Fujihashi, Toshiaki Koike-Akino
Signal compression based on implicit neural representation (INR) is an emerging technique to represent multimedia signals with a small number of bits. While INR-based signal compression achieves high-quality reconstruction for relatively low-resolution signals, the accuracy of high-frequency details is significantly degraded with a small model. To improve the compression efficiency of INR, we introduce quantum INR (quINR), which leverages the exponentially rich expressivity of quantum neural networks for data compression. Evaluations on benchmark datasets show that the proposed quINR-based compression can improve rate-distortion performance in image compression compared with traditional codecs and classic INR-based coding methods, with gains of up to 1.2 dB.
MM | Jan 12, 2022
Federated AirNet: Hybrid Digital-Analog Neural Network Transmission for Federated Learning
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe
A key issue in federated learning over wireless channels is how to exchange a large number of model parameters via time-varying channels. Two types of solutions, based on digital and analog schemes, are typically used. The digital-based solution uses quantization and entropy coding for compression, whereas transmissions via wireless channels may cause catastrophic errors owing to the all-or-nothing behavior of entropy coding. The analog-based solutions such as AirNet and AirComp use analog modulation for the parameter transmissions. However, such an analog scheme often causes significant distortion due to the source signal's large power without compression gain. This paper proposes a novel hybrid digital-analog transmission scheme, Federated AirNet, for the model parameter transmissions in federated learning. Federated AirNet integrates low-rate digital coding and energy-compact analog modulation. The digital coding offers the baseline of the model parameters and compacts the source signal power. In addition, the residual parameters, which are obtained from the original and encoded model parameters, are analog-modulated to enhance the baseline according to the instantaneous wireless channel quality. We show that the proposed Federated AirNet yields better image classification accuracy compared with the digital-based and analog-based solutions over a wide range of wireless channel signal-to-noise ratios (SNRs).
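The hybrid idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's implementation: a uniform quantizer stands in for the low-rate digital coder, the digital baseline is assumed to arrive error-free, the channel is additive Gaussian noise without fading, and all names (`quantize`, `analog_send`) and numeric values are hypothetical.

```python
import random

def quantize(params, step):
    """Coarse uniform quantizer standing in for the low-rate digital coder (assumption)."""
    return [round(p / step) * step for p in params]

def analog_send(signal, power, noise):
    """Scale `signal` to the average transmit power, add channel noise,
    and undo the scaling at the receiver."""
    mean_sq = sum(s * s for s in signal) / len(signal)
    gain = (power / mean_sq) ** 0.5
    received = [gain * s + n for s, n in zip(signal, noise)]
    return [y / gain for y in received]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
params = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # toy model parameters
noise = [rng.gauss(0.0, 0.3) for _ in range(1000)]   # same noise for both schemes

# Analog-only: the full-power source signal is modulated directly.
analog_only = analog_send(params, 1.0, noise)

# Hybrid: digital baseline plus an analog-modulated low-power residual.
baseline = quantize(params, 0.5)
residual = [p - b for p, b in zip(params, baseline)]
hybrid = [b + r for b, r in zip(baseline, analog_send(residual, 1.0, noise))]

analog_mse = mse(params, analog_only)
hybrid_mse = mse(params, hybrid)
```

Because the residual has much lower power than the original parameters, the same transmit power buys a larger gain, so the channel noise is shrunk more strongly at the receiver. That is the sense in which the digital part "compacts the source signal power" in this sketch.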
MM | Nov 16, 2021
Soft Delivery: Survey on A New Paradigm for Wireless and Mobile Multimedia Streaming
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe
The increasing demand for video streaming services is the key driver of modern wireless and mobile communications. For robust and high-quality delivery of video content over wireless and mobile networks, the main challenge is sending image and video signals to single and multiple users over unstable and diverse channel environments. To this end, many studies have designed digital-based video delivery schemes, which mainly consist of a sequence of digital-based coding and transmission schemes. Although digital-based schemes perform well when the channel characteristics are known in advance, significant quality degradation, known as cliff and leveling effects, often occurs owing to the fluctuating channel characteristics. To prevent cliff and leveling effects irrespective of the channel characteristics of each user, a new paradigm for wireless and mobile video streaming has been proposed. Soft delivery schemes skip the digital operations of quantization, entropy coding, and channel coding, and instead directly map the power-assigned frequency-domain coefficients onto the transmission symbols. This modification is based on the fact that the pixel distortion due to communication noise is proportional to the magnitude of the noise, resulting in graceful quality improvement, wherein quality improves gradually according to the wireless channel quality without any cliff and leveling effects. Herein, we present a comprehensive summary of soft delivery schemes.
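To make the "power-assigned coefficients" and the graceful behavior more concrete, here is a small sketch. It assumes the power-allocation rule commonly associated with SoftCast-style schemes, g_i proportional to lambda_i^(-1/4) under a total power budget, together with a linear least-squares decoder; the abstract itself does not state which rule any particular scheme uses, and the variances and budget below are made-up values.

```python
def power_gains(variances, total_power):
    """Per-coefficient gains g_i = lambda_i^(-1/4) * sqrt(P / sum_j sqrt(lambda_j)).

    `variances` are the (positive) variances lambda_i of the frequency-domain
    coefficients; the rule itself is an assumption for this sketch.
    """
    norm = total_power / sum(v ** 0.5 for v in variances)
    return [v ** -0.25 * norm ** 0.5 for v in variances]

def expected_distortion(variances, gains, noise_var):
    """Total MSE after linear least-squares decoding of y_i = g_i * x_i + n_i."""
    return sum(v * noise_var / (g * g * v + noise_var)
               for v, g in zip(variances, gains))

variances = [100.0, 25.0, 4.0, 1.0]  # hypothetical coefficient variances
gains = power_gains(variances, total_power=10.0)
transmit_power = sum(g * g * v for g, v in zip(gains, variances))

# Distortion shrinks smoothly as the channel noise variance shrinks.
distortions = [expected_distortion(variances, gains, nv) for nv in (4.0, 1.0, 0.25)]
```

The distortion changes continuously with the noise variance instead of jumping at a threshold, which is the graceful behavior that the abstract contrasts with the cliff effect of digital schemes.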
CV | Sep 15, 2020
CSI2Image: Image Reconstruction from Channel State Information Using Generative Adversarial Networks
Sorachi Kato, Takeru Fukushima, Tomoki Murakami et al.
This study aims to find the upper limit of the wireless sensing capability of acquiring physical space information. This is a challenging objective, because at present, wireless sensing studies continue to succeed in acquiring novel phenomena. Thus, although a complete answer cannot be obtained yet, a step is taken towards it here. To achieve this, CSI2Image, a novel channel-state-information (CSI)-to-image conversion method based on generative adversarial networks (GANs), is proposed. The type of physical information acquired using wireless sensing can be estimated by checking whether the reconstructed image captures the desired physical space information. Three types of learning methods are demonstrated: generator-only learning, GAN-only learning, and hybrid learning. Evaluating the performance of CSI2Image is difficult, because both the clarity of the image and the presence of the desired physical space information must be evaluated. To solve this problem, a quantitative evaluation methodology using an object detection library is also proposed. CSI2Image was implemented using IEEE 802.11ac compressed CSI, and the evaluation results show that the image was successfully reconstructed. The results demonstrate that generator-only learning is sufficient for simple wireless sensing problems, but in complex wireless sensing problems, GANs are important for reconstructing generalized images with more accurate physical space information.
SP | Jun 17, 2020
Wireless 3D Point Cloud Delivery Using Deep Graph Neural Networks
Takuya Fujihashi, Toshiaki Koike-Akino, Siheng Chen et al.
In typical point cloud delivery, a sender uses octree-based digital video compression to send three-dimensional (3D) points and color attributes over band-limited links. However, the digital-based schemes have an issue called the cliff effect, where the 3D reconstruction quality is a step function of wireless channel quality. To prevent the cliff effect under channel quality fluctuation, we have proposed a soft point cloud delivery scheme called HoloCast. Although HoloCast realizes graceful quality improvement according to wireless channel quality, it requires large communication overheads. In this paper, we propose a novel scheme for soft point cloud delivery to simultaneously realize better quality and lower communication overheads. The proposed scheme introduces an end-to-end deep learning framework based on a graph neural network (GNN) to reconstruct high-quality point clouds from their distorted observations under wireless fading channels. We demonstrate that the proposed GNN-based scheme can reconstruct clean 3D point clouds with low overheads by removing fading and noise effects.
MM | Mar 8, 2019
HoloCast: Graph Signal Processing for Graceful Point Cloud Delivery
Takuya Fujihashi, Toshiaki Koike-Akino, Takashi Watanabe et al.
In conventional point cloud delivery, a sender uses octree-based digital video compression to stream three-dimensional (3D) points and the corresponding color attributes over band-limited links, e.g., wireless channels, for 3D scene reconstructions. However, the digital-based delivery schemes have an issue called the cliff effect, where the 3D reconstruction quality is a step function of wireless channel quality. We propose a novel scheme of point cloud delivery, called HoloCast, to gracefully improve the reconstruction quality with the improvement of wireless channel quality. HoloCast regards the 3D points and color components as graph signals and directly transmits linear-transformed signals based on the graph Fourier transform (GFT), without digital quantization and entropy coding operations. One of the main contributions of HoloCast is that, unlike conventional delivery schemes, its use of the GFT can deal with non-ordered and non-uniformly distributed multi-dimensional signals such as holographic data. Performance results with point cloud data show that HoloCast yields better 3D reconstruction quality than digital-based methods in noisy wireless environments.