Tianyu Song

CV
h-index: 11
12 papers
110 citations
Novelty: 45%
AI Score: 55

12 Papers

MED-PH Feb 13
Ultrasound-Guided Real-Time Spinal Motion Visualization for Spinal Instability Assessment

Feng Li, Yuan Bi, Tianyu Song et al.

Purpose: Spinal instability is a widespread condition that causes pain, fatigue, and restricted mobility, profoundly affecting patients' quality of life. In clinical practice, the gold standard for diagnosis is dynamic X-ray imaging. However, X-ray provides only 2D motion information, while 3D modalities such as computed tomography (CT) or cone beam computed tomography (CBCT) cannot efficiently capture motion. Therefore, there is a need for a system capable of visualizing real-time 3D spinal motion while minimizing radiation exposure. Methods: We propose ultrasound as an auxiliary modality for 3D spine visualization. Due to acoustic limitations, ultrasound captures only the superficial spinal surface. Therefore, the partially compounded ultrasound volume is registered to preoperative 3D imaging. In this study, CBCT provides the neutral spine configuration, while robotic ultrasound acquisition is performed at maximal spinal bending. A kinematic model is applied to the CBCT-derived spine model for coarse registration, followed by ICP for fine registration, with kinematic parameters optimized based on the registration results. Real-time ultrasound motion tracking is then used to estimate continuous 3D spinal motion by interpolating between the neutral and maximally bent states. Results: The pipeline was evaluated on a bendable 3D-printed lumbar spine phantom. The registration error was $1.941 \pm 0.199$ mm and the interpolated spinal motion error was $2.01 \pm 0.309$ mm (median). Conclusion: The proposed robotic ultrasound framework enables radiation-reduced, real-time 3D visualization of spinal motion, offering a promising 3D alternative to conventional dynamic X-ray imaging for assessing spinal instability.
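The interpolation step described in the abstract can be illustrated with a small sketch. The abstract does not state the interpolation scheme, so the code below assumes a simple linear blend of a planar vertebra pose between the neutral and maximally bent states, driven by a normalized motion parameter `t`. The pose format, the function names, and the example values are all hypothetical.

```python
import math

def interpolate_pose(neutral, bent, t):
    """Linearly blend two planar poses (angle_deg, tx_mm, ty_mm).

    t = 0 returns the neutral pose, t = 1 the maximally bent pose.
    Linear blending is an assumption, not the paper's stated method.
    """
    t = min(1.0, max(0.0, t))  # clamp the tracked motion parameter to [0, 1]
    return tuple(n + t * (b - n) for n, b in zip(neutral, bent))

def apply_pose(pose, point):
    """Rotate a 2D point by the pose angle, then translate it."""
    angle_deg, tx, ty = pose
    a = math.radians(angle_deg)
    x, y = point
    return (math.cos(a) * x - math.sin(a) * y + tx,
            math.sin(a) * x + math.cos(a) * y + ty)

# Hypothetical poses for one vertebra (not values from the paper).
neutral_pose = (0.0, 0.0, 0.0)
bent_pose = (12.0, 3.0, -1.0)

half_pose = interpolate_pose(neutral_pose, bent_pose, 0.5)
landmark = apply_pose(half_pose, (10.0, 0.0))
```

In a 3D setting the rotation would more plausibly be blended with a quaternion or axis-angle scheme; the planar case is used here only to keep the sketch short.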

CL Jun 15, 2023
MPSA-DenseNet: A novel deep learning model for English accent classification

Tianyu Song, Linh Thi Hoai Nguyen, Ton Viet Ta

This paper presents three innovative deep learning models for English accent classification: Multi-DenseNet, PSA-DenseNet, and MPSA-DenseNet, which combine multi-task learning and the PSA module attention mechanism with DenseNet. We applied these models to data collected from six dialects of English across native English-speaking regions (Britain, the United States, Scotland) and nonnative English-speaking regions (China, Germany, India). Our experimental results show a significant improvement in classification accuracy, particularly with MPSA-DenseNet, which outperforms all other models, including DenseNet and EPSA models previously used for accent identification. Our findings indicate that MPSA-DenseNet is a highly promising model for accurately identifying English accents.

81.4 HC Mar 23
Feasibility of Augmented Reality-Guided Robotic Ultrasound with Cone-Beam CT Integration for Spine Procedures

Tianyu Song, Felix Pabst, Feng Li et al.

Accurate needle placement in spine interventions is critical for effective pain management, yet it depends on reliable identification of anatomical landmarks and careful trajectory planning. Conventional imaging guidance often relies on both CT and X-ray fluoroscopy, exposing patients and staff to high doses of radiation while providing limited real-time 3D feedback. We present an optical see-through augmented reality (OST-AR)-guided robotic system for spine procedures that provides in situ visualization of spinal structures to support needle trajectory planning. We integrate a cone-beam CT (CBCT)-derived 3D spine model that is co-registered with live ultrasound, enabling users to combine global anatomical context with local, real-time imaging. We evaluated the system in a phantom user study involving two representative spine procedures: facet joint injection and lumbar puncture. Sixteen participants performed insertions under two visualization conditions: conventional screen vs. AR. Results show that AR significantly reduces execution time and across-task placement error, while also improving usability, trust, and spatial understanding and lowering cognitive workload. These findings demonstrate the feasibility of AR-guided robotic ultrasound for spine interventions, highlighting its potential to enhance accuracy, efficiency, and user experience in image-guided procedures.

CV May 10, 2024 Code
Learning A Spiking Neural Network for Efficient Image Deraining

Tianyu Song, Guiyue Jin, Pengpeng Li et al.

Recently, spiking neural networks (SNNs) have demonstrated substantial potential in computer vision tasks. In this paper, we present an Efficient Spiking Deraining Network, called ESDNet. Our work is motivated by the observation that rain pixel values lead to a more pronounced intensity of spike signals in SNNs. However, directly applying deep SNNs to the image deraining task remains a significant challenge. This is attributed to the information loss and training difficulties that arise from discrete binary activation and complex spatio-temporal dynamics. To this end, we develop a spiking residual block to convert the input into spike signals, then adaptively optimize the membrane potential by introducing attention weights to adjust spike responses in a data-driven manner, alleviating information loss caused by discrete binary activation. In this way, our ESDNet can effectively detect and analyze the characteristics of rain streaks by learning their fluctuations. This also enables better guidance for the deraining process and facilitates high-quality image reconstruction. Instead of relying on the ANN-SNN conversion strategy, we introduce a gradient proxy strategy to directly train the model and overcome the training difficulties. Experimental results show that our approach achieves performance comparable to ANN-based methods while reducing energy consumption by 54%. The source code is available at https://github.com/MingTian99/ESDNet.
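The observation that brighter pixel values produce more pronounced spike activity can be illustrated with a generic leaky integrate-and-fire (LIF) neuron. This is a textbook sketch, not ESDNet's spiking residual block; the function name and the `decay`, `threshold`, and `steps` values are assumptions chosen for illustration.

```python
def lif_spike_train(input_current, steps=8, decay=0.5, threshold=1.0):
    """Simulate one leaky integrate-and-fire neuron with a constant input.

    Returns a list of binary spikes (0 or 1), one per time step.
    All parameter values are illustrative assumptions.
    """
    membrane = 0.0
    spikes = []
    for _ in range(steps):
        # Leak the previous potential, then integrate the new input.
        membrane = decay * membrane + input_current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes

# A dim pixel never reaches the threshold; a bright one fires regularly.
dim_spikes = lif_spike_train(0.3)
bright_spikes = lif_spike_train(0.9)
```

The binary output is also where the information loss mentioned in the abstract comes from: every input below the firing threshold maps to the same all-zero train.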

CV Sep 15, 2025 Code
WeatherBench: A Real-World Benchmark Dataset for All-in-One Adverse Weather Image Restoration

Qiyuan Guan, Qianfeng Yang, Xiang Chen et al.

Existing all-in-one image restoration approaches, which aim to handle multiple weather degradations within a single framework, are predominantly trained and evaluated using mixed single-weather synthetic datasets. However, these datasets often differ significantly in resolution, style, and domain characteristics, leading to substantial domain gaps that hinder the development and fair evaluation of unified models. Furthermore, the lack of a large-scale, real-world all-in-one weather restoration dataset remains a critical bottleneck in advancing this field. To address these limitations, we present a real-world all-in-one adverse weather image restoration benchmark dataset, which contains image pairs captured under various weather conditions, including rain, snow, and haze, as well as diverse outdoor scenes and illumination settings. The resulting dataset provides precisely aligned degraded and clean images, enabling supervised learning and rigorous evaluation. We conduct comprehensive experiments by benchmarking a variety of task-specific, task-general, and all-in-one restoration methods on our dataset. Our dataset offers a valuable foundation for advancing robust and practical all-in-one image restoration in real-world scenarios. The dataset has been publicly released and is available at https://github.com/guanqiyuan/WeatherBench.

CV Oct 20, 2025 Code
Rethinking Nighttime Image Deraining via Learnable Color Space Transformation

Qiyuan Guan, Xiang Chen, Guiyue Jin et al.

Compared to daytime image deraining, nighttime image deraining poses significant challenges due to the inherent complexities of nighttime scenarios and the lack of high-quality datasets that accurately represent the coupling effect between rain and illumination. In this paper, we rethink the task of nighttime image deraining and contribute a new high-quality benchmark, HQ-NightRain, which offers higher harmony and realism compared to existing datasets. In addition, we develop an effective Color Space Transformation Network (CST-Net) for better removing complex rain from nighttime scenes. Specifically, we propose a learnable color space converter (CSC) to better facilitate rain removal in the Y channel, as nighttime rain is more pronounced in the Y channel compared to the RGB color space. To capture illumination information for guiding nighttime deraining, implicit illumination guidance is introduced, enabling the learned features to improve the model's robustness in complex scenarios. Extensive experiments show the value of our dataset and the effectiveness of our method. The source code and datasets are available at https://github.com/guanqiyuan/CST-Net.
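For readers unfamiliar with the Y channel, the sketch below computes luma from RGB using the fixed ITU-R BT.601 weights. The paper's converter is learnable, so this fixed transform is only a reference point of the kind such a converter might start from; the function name and sample pixels are hypothetical.

```python
# Fixed BT.601 luma weights; CST-Net learns its conversion instead.
BT601_WEIGHTS = (0.299, 0.587, 0.114)

def rgb_to_luma(pixel, weights=BT601_WEIGHTS):
    """Return the Y (luma) value of an (R, G, B) pixel as a weighted sum."""
    return sum(w * c for w, c in zip(weights, pixel))

# Hypothetical pixels: a dark night-sky background and a bright rain streak.
background_y = rgb_to_luma((20, 24, 40))
streak_y = rgb_to_luma((180, 185, 200))
```

A learnable converter would replace the fixed weights with trainable parameters, which is why `weights` is exposed as an argument here.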

CV Mar 9, 2019 Code
LumiPath -- Towards Real-time Physically-based Rendering on Embedded Devices

Laura Fink, Sing Chun Lee, Jie Ying Wu et al.

With the increasing computational power of today's workstations, real-time physically-based rendering is within reach, rapidly gaining attention across a variety of domains. It has quickly been applied to medicine, where it is a powerful tool for intuitive 3D data visualization. Embedded devices such as optical see-through head-mounted displays (OST HMDs) have become a trend in medical augmented reality. However, leveraging the obvious benefits of physically-based rendering remains challenging on these devices because of limits on computational power, memory usage, and power consumption. We navigate the compromise between device limitations and image quality to achieve reasonable rendering results by introducing a novel light field that can be sampled in real-time on embedded devices. We demonstrate its applications in medicine and discuss limitations of the proposed method. An open-source version of this project is available at https://github.com/lorafib/LumiPath, which provides full insight into the implementation as well as exemplary demonstration material.

22.1 SD May 2
BioSEN: A Bio-acoustic Signal Enhancement Network for Animal Vocalizations

Tianyu Song, Ton Viet Ta, Ngamta Thamwattana et al.

Most work in audio enhancement targets human speech, while bioacoustics is less studied due to noisy recordings and the distinct traits of animal sounds. To fill this gap, we adapt speech enhancement methods and build BioSEN, a model made for bioacoustic signals. BioSEN has three modules: a multi-scale dual-axis attention unit for time-frequency feature extraction, a bio-harmonic multi-scale enhancement unit for capturing harmonic structures, and an energy-adaptive gating connection unit that uses frequency weights to keep vocalizations from being removed as noise. Tests on three bioacoustic datasets show that BioSEN matches or exceeds state-of-the-art speech enhancement models while using far less computation. These results show BioSEN's strength for bioacoustic audio enhancement and its promise for biodiversity monitoring and conservation.

ML Jul 29, 2025
Stochastic forest transition model dynamics and parameter estimation via deep learning

Satoshi Kumabe, Tianyu Song, Ton Viet Ta

Forest transitions, characterized by dynamic shifts between forest, agricultural, and abandoned lands, are complex phenomena. This study developed a stochastic differential equation model to capture the intricate dynamics of these transitions. We established the existence of global positive solutions for the model and conducted numerical analyses to assess the impact of model parameters on deforestation incentives. To address the challenge of parameter estimation, we proposed a novel deep learning approach that estimates all model parameters from a single sample containing time-series observations of forest and agricultural land proportions. This innovative approach enables us to understand forest transition dynamics and deforestation trends at any future time.
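To make the notion of simulating a stochastic differential equation concrete, the sketch below applies the Euler-Maruyama scheme to a one-dimensional SDE. The abstract does not give the model's equations, so the drift and diffusion used here (logistic growth with multiplicative noise) are a stand-in and not the paper's forest transition model; all names and parameter values are hypothetical.

```python
import math
import random

def euler_maruyama(x0, r, sigma, dt, steps, seed=0):
    """Simulate dX = r*X*(1 - X) dt + sigma*X dW with Euler-Maruyama.

    The logistic drift and multiplicative noise are illustrative
    assumptions, not the model from the paper.
    """
    rng = random.Random(seed)  # seeded generator for reproducible paths
    path = [x0]
    x = x0
    for _ in range(steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + r * x * (1.0 - x) * dt + sigma * x * dw
        path.append(x)
    return path

# Hypothetical forest-cover proportion starting at 20%.
deterministic_path = euler_maruyama(0.2, r=1.0, sigma=0.0, dt=0.01, steps=1000)
noisy_path = euler_maruyama(0.2, r=1.0, sigma=0.1, dt=0.01, steps=1000, seed=42)
```

Time series generated this way are the kind of observation a parameter estimator would take as input, here with `r` and `sigma` as the unknowns to recover.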

CV Aug 13, 2025
Deep Learning for Automated Identification of Vietnamese Timber Species: A Tool for Ecological Monitoring and Conservation

Tianyu Song, Van-Doan Duong, Thi-Phuong Le et al.

Accurate identification of wood species plays a critical role in ecological monitoring, biodiversity conservation, and sustainable forest management. Traditional classification approaches relying on macroscopic and microscopic inspection are labor-intensive and require expert knowledge. In this study, we explore the application of deep learning to automate the classification of ten wood species commonly found in Vietnam. A custom image dataset was constructed from field-collected wood samples, and five state-of-the-art convolutional neural network architectures--ResNet50, EfficientNet, MobileViT, MobileNetV3, and ShuffleNetV2--were evaluated. Among these, ShuffleNetV2 achieved the best balance between classification performance and computational efficiency, with an average accuracy of 99.29% and F1-score of 99.35% over 20 independent runs. These results demonstrate the potential of lightweight deep learning models for real-time, high-accuracy species identification in resource-constrained environments. Our work contributes to the growing field of ecological informatics by providing scalable, image-based solutions for automated wood classification and forest biodiversity assessment.

CV Mar 4, 2020
Spatiotemporal-Aware Augmented Reality: Redefining HCI in Image-Guided Therapy

Javad Fotouhi, Arian Mehrfard, Tianyu Song et al.

Suboptimal interaction with patient data and challenges in mastering 3D anatomy based on ill-posed 2D interventional images are essential concerns in image-guided therapies. Augmented reality (AR) has been introduced in operating rooms in the last decade; however, in image-guided interventions, it has often only been considered as a visualization device improving traditional workflows. As a consequence, the technology is gaining the minimum maturity that it requires to redefine procedures, user interfaces, and interactions. The main contribution of this paper is to reveal how exemplary workflows are redefined by taking full advantage of head-mounted displays when entirely co-registered with the imaging system at all times. The proposed AR landscape is enabled by co-localizing the users and the imaging devices via the operating room environment and exploiting all involved frustums to move spatial information between different bodies. The system's awareness of the geometric and physical characteristics of X-ray imaging allows the redefinition of different human-machine interfaces. We demonstrate that this AR paradigm is generic and can benefit a wide variety of procedures. Our system achieved an error of $4.76\pm2.91$ mm for placing a K-wire in a fracture management procedure, and yielded errors of $1.57\pm1.16^\circ$ and $1.46\pm1.00^\circ$ in the abduction and anteversion angles, respectively, for total hip arthroplasty. We hope that our holistic approach towards improving the interface of surgery not only augments the surgeon's capabilities but also augments the surgical team's experience in carrying out an effective intervention with reduced complications, and provides novel approaches to documenting procedures for training purposes.

RO Jul 23, 2019
Reflective-AR Display: An Interaction Methodology for Virtual-Real Alignment in Medical Robotics

Javad Fotouhi, Tianyu Song, Arian Mehrfard et al.

Robot-assisted minimally invasive surgery has been shown to improve patient outcomes, as well as reduce complications and recovery time for several clinical applications. While increasingly configurable robotic arms can maximize reach and avoid collisions in cluttered environments, positioning them appropriately during surgery is complicated because safety regulations prevent automatic driving. We propose a head-mounted display (HMD)-based augmented reality (AR) system designed to guide optimal surgical arm setup. Staff equipped with an HMD align the robot with its planned virtual counterpart. In this user-centric setting, the main challenge is the perspective ambiguity that hinders such collaborative robotic solutions. To overcome this challenge, we introduce a novel registration concept for intuitive alignment of AR content to its physical counterpart by providing a multi-view AR experience via reflective-AR displays that simultaneously show the augmentations from multiple viewpoints. Using this system, users can visualize different perspectives while actively adjusting the pose to determine the registration transformation that most closely superimposes the virtual onto the real. The experimental results demonstrate improvement in the interactive alignment of a virtual and real robot when using a reflective-AR display. We also present measurements from configuring a robotic manipulator in a simulated trocar placement surgery using the AR guidance methodology.