CVMay 17, 2025Code
Beluga Whale Detection from Satellite Imagery with Point LabelsYijie Zheng, Jinxuan Yang, Yu Chen et al.
Very high-resolution (VHR) satellite imagery has emerged as a powerful tool for monitoring marine animals on a large scale. However, existing deep learning-based whale detection methods usually require manually created, high-quality bounding box annotations, which are labor-intensive to produce. Moreover, existing studies often exclude ``uncertain whales'', individuals that have ambiguous appearances in satellite imagery, limiting the applicability of these models in real-world scenarios. To address these limitations, this study introduces an automated pipeline for detecting beluga whales and harp seals in VHR satellite imagery. The pipeline leverages point annotations and the Segment Anything Model (SAM) to generate precise bounding box annotations, which are used to train YOLOv8 for multiclass detection of certain whales, uncertain whales, and harp seals. Experimental results demonstrated that SAM-generated annotations significantly improved detection performance, achieving higher $\text{F}_\text{1}$-scores compared to traditional buffer-based annotations. YOLOv8 trained on SAM-labeled boxes achieved an overall $\text{F}_\text{1}$-score of 72.2% for whales overall and 70.3% for harp seals, with superior performance in dense scenes. The proposed approach not only reduces the manual effort required for annotation but also enhances the detection of uncertain whales, offering a more comprehensive solution for marine animal monitoring. This method holds great potential for extending to other species, habitats, and remote sensing platforms, as well as for estimating whale biometrics, thereby advancing ecological monitoring and conservation efforts. The codes for our label and detection pipeline are publicly available at http://github.com/voyagerxvoyagerx/beluga-seeker .
IRFeb 19, 2025Code
Towards Adaptive Memory-Based Optimization for Enhanced Retrieval-Augmented GenerationQitao Qin, Yucong Luo, Yihang Lu et al.
Retrieval-Augmented Generation (RAG), by integrating non-parametric knowledge from external knowledge bases into models, has emerged as a promising approach to enhancing response accuracy while mitigating factual errors and hallucinations. This method has been widely applied in tasks such as Question Answering (QA). However, existing RAG methods struggle with open-domain QA tasks because they perform independent retrieval operations and directly incorporate the retrieved information into generation without maintaining a summarizing memory or using adaptive retrieval strategies, leading to noise from redundant information and insufficient information integration. To address these challenges, we propose Adaptive memory-based optimization for enhanced RAG (Amber) for open-domain QA tasks, which comprises an Agent-based Memory Updater, an Adaptive Information Collector, and a Multi-granular Content Filter, working together within an iterative memory updating paradigm. Specifically, Amber integrates and optimizes the language model's memory through a multi-agent collaborative approach, ensuring comprehensive knowledge integration from previous retrieval steps. It dynamically adjusts retrieval queries and decides when to stop retrieval based on the accumulated knowledge, enhancing retrieval efficiency and effectiveness. Additionally, it reduces noise by filtering irrelevant content at multiple levels, retaining essential information to improve overall model performance. We conduct extensive experiments on several open-domain QA datasets, and the results demonstrate the superiority and effectiveness of our method and its components. The source code is available \footnote{https://anonymous.4open.science/r/Amber-B203/}.
LGApr 17, 2025
TimeCapsule: Solving the Jigsaw Puzzle of Long-Term Time Series Forecasting with Compressed Predictive RepresentationsYihang Lu, Yangyang Xu, Qitao Qing et al.
Recent deep learning models for Long-term Time Series Forecasting (LTSF) often emphasize complex, handcrafted designs, while simpler architectures like linear models or MLPs have often outperformed these intricate solutions. In this paper, we revisit and organize the core ideas behind several key techniques, such as redundancy reduction and multi-scale modeling, which are frequently employed in advanced LTSF models. Our goal is to streamline these ideas for more efficient deep learning utilization. To this end, we introduce TimeCapsule, a model built around the principle of high-dimensional information compression that unifies these techniques in a generalized yet simplified framework. Specifically, we model time series as a 3D tensor, incorporating temporal, variate, and level dimensions, and leverage mode production to capture multi-mode dependencies while achieving dimensionality compression. We propose an internal forecast within the compressed representation domain, supported by the Joint-Embedding Predictive Architecture (JEPA), to monitor the learning of predictive representations. Extensive experiments on challenging benchmarks demonstrate the versatility of our method, showing that TimeCapsule can achieve state-of-the-art performance.
LGSep 30, 2025
ReNF: Rethinking the Design Space of Neural Long-Term Time Series ForecastersYihang Lu, Xianwei Meng, Enhong Chen
Neural Forecasters (NFs) are a cornerstone of Long-term Time Series Forecasting (LTSF). However, progress has been hampered by an overemphasis on architectural complexity at the expense of fundamental forecasting principles. In this work, we return to first principles to redesign the LTSF paradigm. We begin by introducing a Multiple Neural Forecasting Theorem that provides a theoretical basis for our approach. We propose Boosted Direct Output (BDO), a novel forecasting strategy that synergistically combines the advantages of both Auto-Regressive (AR) and Direct Output (DO). In addition, we stabilize the learning process by smoothly tracking the model's parameters. Extensive experiments show that these principled improvements enable a simple MLP to achieve state-of-the-art performance, outperforming recent, complex models in nearly all cases, without any specific considerations in the area. Finally, we empirically verify our theorem, establishing a dynamic performance bound and identifying promising directions for future research. The code for review is available at: .
LGMar 29, 2025
MNT-TNN: Spatiotemporal Traffic Data Imputation via Compact Multimode Nonlinear Transform-based Tensor Nuclear NormYihang Lu, Mahwish Yousaf, Xianwei Meng et al.
Imputation of random or non-random missing data is a long-standing research topic and a crucial application for Intelligent Transportation Systems (ITS). However, with the advent of modern communication technologies such as Global Satellite Navigation Systems (GNSS), traffic data collection has introduced new challenges in random missing value imputation and increasing demands for spatiotemporal dependency modelings. To address these issues, we propose a novel spatiotemporal traffic imputation method based on a Multimode Nonlinear Transformed Tensor Nuclear Norm (MNT-TNN), which can effectively capture the intrinsic multimode spatiotemporal correlations and low-rankness of the traffic tensor, represented as location $\times$ location $\times$ time. To solve the nonconvex optimization problem, we design a proximal alternating minimization (PAM) algorithm with theoretical convergence guarantees. We also suggest an Augmented Transform-based Tensor Nuclear Norm Families (ATTNNs) framework to enhance the imputation results of TTNN techniques, especially at very high miss rates. Extensive experiments on real datasets demonstrate that our proposed MNT-TNN and ATTNNs can outperform the compared state-of-the-art imputation methods, completing the benchmark of random missing traffic value imputation.
IVJan 6, 2021
Ensemble and Random Collaborative Representation-Based Anomaly Detector for Hyperspectral ImageryRong Wang, Yihang Lu, Qianrong Zhang et al.
In recent years, hyperspectral anomaly detection (HAD) has become an active topic and plays a significant role in military and civilian fields. As a classic HAD method, the collaboration representation-based detector (CRD) has attracted extensive attention and in-depth research. Despite the good performance of the CRD method, its computational cost mainly arising from the sliding dual window strategy is too high for wide applications. Moreover, it takes multiple repeated tests to determine the size of the dual window, which needs to be reset once the dataset changes and cannot be identified in advance with prior knowledge. To alleviate this problem, we proposed a novel ensemble and random collaborative representation-based detector (ERCRD) for HAD, which comprises two closely related stages. Firstly, we process the random sub-sampling on CRD (RCRD) to gain several detection results instead of the sliding dual window strategy, which significantly reduces the computational complexity and makes it more feasible in practical applications. Secondly, ensemble learning is employed to refine the multiple results of RCRD, which act as various "experts" providing abundant complementary information to better target different anomalies. Such two stages form an organic and theoretical detector, which can not only improve the accuracy and stability of HAD methods but also enhance its generalization ability. Experiments on four real hyperspectral datasets exhibit the accuracy and efficiency of this proposed ERCRD method compared with ten state-of-the-art HAD methods.