9 Papers

LGMay 24
TGFormer: Towards Temporal Graph Transformer with Auto-Correlation Mechanism

Hongjiang Chen, Pengfei Jiao, Ming Du et al.

The growing interest in Temporal Graph Neural Networks (TGNNs) stems from their ability to model complex dynamics and deliver superior performance. However, TGNNs encounter fundamental challenges in capturing long-term dependencies and identifying periodic patterns. To address these limitations, we propose TGFormer, a novel Transformer architecture specifically designed for temporal graphs. Our model redefines temporal graph learning by establishing a trajectory framework that aligns with time series analysis principles. This approach allows TGFormer to derive node representations through systematic analysis of historical interactions, enabling granular examination of node relationships across sequential timestamps. Building upon stochastic process theory, we develop an auto-correlation mechanism that systematically uncovers periodic dependencies in node interactions. This innovation empowers TGFormer to perform dependency discovery and representation aggregation at sub-interaction levels, demonstrating superior efficiency and accuracy compared to conventional attention mechanisms. Experimental validation across six public benchmarks confirms the effectiveness of our approach, with TGFormer at most achieving 9.35\% precision improvement compared to state-of-the-art approaches.

LGSep 27, 2023
Fast Locality Sensitive Hashing with Theoretical Guarantee

Zongyuan Tan, Hongya Wang, Bo Xu et al.

Locality-sensitive hashing (LSH) is an effective randomized technique widely used in many machine learning tasks. The cost of hashing is proportional to data dimensions, and thus often the performance bottleneck when dimensionality is high and the number of hash functions involved is large. Surprisingly, however, little work has been done to improve the efficiency of LSH computation. In this paper, we design a simple yet efficient LSH scheme, named FastLSH, under l2 norm. By combining random sampling and random projection, FastLSH reduces the time complexity from O(n) to O(m) (m<n), where n is the data dimensionality and m is the number of sampled dimensions. Moreover, FastLSH has provable LSH property, which distinguishes it from the non-LSH fast sketches. We conduct comprehensive experiments over a collection of real and synthetic datasets for the nearest neighbor search task. Experimental results demonstrate that FastLSH is on par with the state-of-the-arts in terms of answer quality, space occupation and query efficiency, while enjoying up to 80x speedup in hash function evaluation. We believe that FastLSH is a promising alternative to the classic LSH scheme.

GRSep 2, 2025
Fidelity-preserving enhancement of ptychography with foundational text-to-image models

Ming Du, Volker Rose, Junjing Deng et al.

Ptychographic phase retrieval enables high-resolution imaging of complex samples but often suffers from artifacts such as grid pathology and multislice crosstalk, which degrade reconstructed images. We propose a plug-and-play (PnP) framework that integrates physics model-based phase retrieval with text-guided image editing using foundational diffusion models. By employing the alternating direction method of multipliers (ADMM), our approach ensures consensus between data fidelity and artifact removal subproblems, maintaining physics consistency while enhancing image quality. Artifact removal is achieved using a text-guided diffusion image editing method (LEDITS++) with a pre-trained foundational diffusion model, allowing users to specify artifacts for removal in natural language. Demonstrations on simulated and experimental datasets show significant improvements in artifact suppression and structural fidelity, validated by metrics such as peak signal-to-noise ratio (PSNR) and diffraction pattern consistency. This work highlights the combination of text-guided generative models and model-based phase retrieval algorithms as a transferable and fidelity-preserving method for high-quality diffraction imaging.

AIMay 12
CVEvolve: Autonomous Algorithm Discovery for Unstructured Scientific Data Processing

Ming Du, Xiangyu Yin, Yanqi Luo et al.

Scientific data processing often requires task-specific algorithms or AI models, creating a barrier for domain scientists who need to analyze their data but may not have extensive computing or image-processing expertise. This barrier is especially pronounced when data are noisy, have a high dynamic range, are sparsely labeled, or are only loosely specified. We introduce CVEvolve, an autonomous agentic harness with a zero-code interface for scientific data-processing algorithm discovery. CVEvolve combines a multi-round search strategy with tools for code execution, evaluation implementation, history management, holdout testing, and optional inspection of scientific data and visual outputs. The search alternates between discovery and improvement actions, and uses lineage-aware stochastic candidate sampling to balance exploration and exploitation. We demonstrate CVEvolve on x-ray fluorescence microscopy image registration, Bragg peak detection, and high-energy diffraction microscopy image segmentation. Across these tasks, CVEvolve discovers algorithms that improve over baseline methods, while holdout test tracking helps identify candidates that generalize better than later over-optimized alternatives. These results show that zero-code, autonomous LLM-powered algorithm development can help domain scientists turn unstructured scientific image data into practical algorithms and downstream scientific discoveries.

AIFeb 17
EAA: Automating materials characterization with vision language model agents

Ming Du, Yanqi Luo, Srutarshi Banerjee et al.

We present Experiment Automation Agents (EAA), a vision-language-model-driven agentic system designed to automate complex experimental microscopy workflows. EAA integrates multimodal reasoning, tool-augmented action, and optional long-term memory to support both autonomous procedures and interactive user-guided measurements. Built on a flexible task-manager architecture, the system enables workflows ranging from fully agent-driven automation to logic-defined routines that embed localized LLM queries. EAA further provides a modern tool ecosystem with two-way compatibility for Model Context Protocol (MCP), allowing instrument-control tools to be consumed or served across applications. We demonstrate EAA at an imaging beamline at the Advanced Photon Source, including automated zone plate focusing, natural language-described feature search, and interactive data acquisition. These results illustrate how vision-capable agents can enhance beamline efficiency, reduce operational burden, and lower the expertise barrier for users.

APP-PHApr 23, 2025
Demonstration of an AI-driven workflow for dynamic x-ray spectroscopy

Ming Du, Mark Wolfman, Chengjun Sun et al.

X-ray absorption near edge structure (XANES) spectroscopy is a powerful technique for characterizing the chemical state and symmetry of individual elements within materials, but requires collecting data at many energy points which can be time-consuming. While adaptive sampling methods exist for efficiently collecting spectroscopic data, they often lack domain-specific knowledge about XANES spectra structure. Here we demonstrate a knowledge-injected Bayesian optimization approach for adaptive XANES data collection that incorporates understanding of spectral features like absorption edges and pre-edge peaks. We show this method accurately reconstructs the absorption edge of XANES spectra using only 15-20% of the measurement points typically needed for conventional sampling, while maintaining the ability to determine the x-ray energy of the sharp peak after absorption edge with errors less than 0.03 eV, the absorption edge with errors less than 0.1 eV; and overall root-mean-square errors less than 0.005 compared to compared to traditionally sampled spectra. Our experiments on battery materials and catalysts demonstrate the method's effectiveness for both static and dynamic XANES measurements, improving data collection efficiency and enabling better time resolution for tracking chemical changes. This approach advances the degree of automation in XANES experiments reducing the common errors of under- or over-sampling points in near the absorption edge and enabling dynamic experiments that require high temporal resolution or limited measurement time.

LGJul 18, 2025
DONUT: Physics-aware Machine Learning for Real-time X-ray Nanodiffraction Analysis

Aileen Luo, Tao Zhou, Ming Du et al.

Coherent X-ray scattering techniques are critical for investigating the fundamental structural properties of materials at the nanoscale. While advancements have made these experiments more accessible, real-time analysis remains a significant bottleneck, often hindered by artifacts and computational demands. In scanning X-ray nanodiffraction microscopy, which is widely used to spatially resolve structural heterogeneities, this challenge is compounded by the convolution of the divergent beam with the sample's local structure. To address this, we introduce DONUT (Diffraction with Optics for Nanobeam by Unsupervised Training), a physics-aware neural network designed for the rapid and automated analysis of nanobeam diffraction data. By incorporating a differentiable geometric diffraction model directly into its architecture, DONUT learns to predict crystal lattice strain and orientation in real-time. Crucially, this is achieved without reliance on labeled datasets or pre-training, overcoming a fundamental limitation for supervised machine learning in X-ray science. We demonstrate experimentally that DONUT accurately extracts all features within the data over 200 times more efficiently than conventional fitting methods.

DBOct 2, 2021
Tao: A Learning Framework for Adaptive Nearest Neighbor Search using Static Features Only

Kaixiang Yang, Hongya Wang, Bo Xu et al.

Approximate nearest neighbor (ANN) search is a fundamental problem in areas such as data management,information retrieval and machine learning. Recently, Li et al. proposed a learned approach named AdaptNN to support adaptive ANN query processing. In the middle of query execution, AdaptNN collects a number of runtime features and predicts termination condition for each individual query, by which better end-to-end latency is attained. Despite its efficiency, using runtime features complicates the learning process and leads to performance degradation. Radically different from AdaptNN, we argue that it is promising to predict termination condition before query exetution. Particularly, we developed Tao, a general learning framework for Terminating ANN queries Adaptively using Only static features. Upon the arrival of a query, Tao first maps the query to a local intrinsic dimension (LID) number, and then predicts the termination condition using LID instead of runtime features. By decoupling prediction procedure from query execution, Tao eliminates the laborious feature selection process involved in AdaptNN. Besides, two design principles are formulated to guide the application of Tao and improve the explainability of the prediction model. We integrate two state-of-the-art indexing approaches, i.e., IMI and HNSW, into Tao, and evaluate the performance over several million to billion-scale datasets. Experimental results show that, in addition to its simplicity and generality , Tao achieves up to 2.69x speedup even compared to its counterpart, at the same high accuracy targets.

CVJul 4, 2019
Searching for Apparel Products from Images in the Wild

Son Tran, Ming Du, Sampath Chanda et al.

In this age of social media, people often look at what others are wearing. In particular, Instagram and Twitter influencers often provide images of themselves wearing different outfits and their followers are often inspired to buy similar clothes.We propose a system to automatically find the closest visually similar clothes in the online Catalog (street-to-shop searching). The problem is challenging since the original images are taken under different pose and lighting conditions. The system initially localizes high-level descriptive regions (top, bottom, wristwear. . . ) using multiple CNN detectors such as YOLO and SSD that are trained specifically for apparel domain. It then classifies these regions into more specific regions such as t-shirts, tunic or dresses. Finally, a feature embedding learned using a multi-task function is recovered for every item and then compared with corresponding items in the online Catalog database and ranked according to distance. We validate our approach component-wise using benchmark datasets and end-to-end using human evaluation.