Wu

CV
h-index117
15papers
3,285citations
Novelty43%
AI Score56

15 Papers

AIMar 17, 2025
The Amazon Nova Family of Models: Technical Report and Model Card

Amazon AGI, Aaron Langford, Aayush Shah et al. · amazon-science

We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents and text. Amazon Nova Micro is a text-only model that delivers our lowest-latency responses at very low cost. Amazon Nova Canvas is an image generation model that creates professional grade images with rich customization controls. Amazon Nova Reel is a video generation model offering high-quality outputs, customization, and motion control. Our models were built responsibly and with a commitment to customer trust, security, and reliability. We report benchmarking results for core capabilities, agentic performance, long context, functional adaptation, runtime performance, and human evaluation.

81.6HCJun 3
Speculating the Impacts of Mediated Social Touch Technology

Russian, Wu, Tim Moesgen et al.

With growing research on haptic interfaces, Mediated Social Touch (MST) technologies offer the potential to record, synthesise, and reproduce (RSR) touch experiences across space and time, enabling, for instance, a hug from afar and from the past. Although much of the existing research highlights the direct benefits of these systems, such as reducing loneliness and providing emotional support, little attention has been paid to their broader sociotechnical impacts. To address this gap, we used the Future Ripples method to speculate on possible effects of MST. We conducted three workshops with 24 participants, including potential users, domain experts, and haptics researchers. Throughout these sessions, participants collectively envisioned possible future scenarios, alongside opportunities and threats, and proposed actionable responses. Our qualitative analysis organised these insights into four themes and three distinctive challenges. These findings offer haptics researchers intervention points across the RSR pipeline to inform MST design, alongside methodological insights from applying Future Ripples to MST technology.

LGJul 25, 2022
C3-SL: Circular Convolution-Based Batch-Wise Compression for Communication-Efficient Split Learning

Cheng-Yen Hsieh, Yu-Chuan Chuang, An-Yeu et al.

Most existing studies improve the efficiency of Split learning (SL) by compressing the transmitted features. However, most works focus on dimension-wise compression that transforms high-dimensional features into a low-dimensional space. In this paper, we propose circular convolution-based batch-wise compression for SL (C3-SL) to compress multiple features into one single feature. To avoid information loss while merging multiple features, we exploit the quasi-orthogonality of features in high-dimensional space with circular convolution and superposition. To the best of our knowledge, we are the first to explore the potential of batch-wise compression under the SL scenario. Based on the simulation results on CIFAR-10 and CIFAR-100, our method achieves a 16x compression ratio with negligible accuracy drops compared with the vanilla SL. Moreover, C3-SL significantly reduces 1152x memory and 2.25x computation overhead compared to the state-of-the-art dimension-wise compression method.

IVJul 16, 2022
Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation

Yu-Shan Tai, Cheng-Yang Chang, Chieh-Fang Teng et al.

Recently, deep convolutional neural networks (CNNs) have achieved many eye-catching results. However, deploying CNNs on resource-constrained edge devices is constrained by limited memory bandwidth for transmitting large intermediated data during inference, i.e., activation. Existing research utilizes mixed-precision and dimension reduction to reduce computational complexity but pays less attention to its application for activation compression. To further exploit the redundancy in activation, we propose a learnable mixed-precision and dimension reduction co-design system, which separates channels into groups and allocates specific compression policies according to their importance. In addition, the proposed dynamic searching technique enlarges search space and finds out the optimal bit-width allocation automatically. Our experimental results show that the proposed methods improve 3.54%/1.27% in accuracy and save 0.18/2.02 bits per value over existing mixed-precision methods on ResNet18 and MobileNetv2, respectively.

ARSep 12, 2024
Efficient and Reliable Vector Similarity Search Using Asymmetric Encoding with NAND-Flash for Many-Class Few-Shot Learning

Hao-Wei Chiang, Chi-Tse Huang, Hsiang-Yun Cheng et al.

While memory-augmented neural networks (MANNs) offer an effective solution for few-shot learning (FSL) by integrating deep neural networks with external memory, the capacity requirements and energy overhead of data movement become enormous due to the large number of support vectors in many-class FSL scenarios. Various in-memory search solutions have emerged to improve the energy efficiency of MANNs. NAND-based multi-bit content addressable memory (MCAM) is a promising option due to its high density and large capacity. Despite its potential, MCAM faces limitations such as a restricted number of word lines, limited quantization levels, and non-ideal effects like varying string currents and bottleneck effects, which lead to significant accuracy drops. To address these issues, we propose several innovative methods. First, the Multi-bit Thermometer Code (MTMC) leverages the extensive capacity of MCAM to enhance vector precision using cumulative encoding rules, thereby mitigating the bottleneck effect. Second, the Asymmetric vector similarity search (AVSS) reduces the precision of the query vector while maintaining that of the support vectors, thereby minimizing the search iterations and improving efficiency in many-class scenarios. Finally, the Hardware-Aware Training (HAT) method optimizes controller training by modeling the hardware characteristics of MCAM, thus enhancing the reliability of the system. Our integrated framework reduces search iterations by up to 32 times, and increases overall accuracy by 1.58% to 6.94%.

75.2SOC-PHMay 19
A Bounded-Confidence Model of Opinion Dynamics with Adaptive Interaction Probabilities

Leila Thompsky, Yuexuan, Wu et al.

Models of opinion dynamics aim to capture how individuals' opinions change when they interact with each other. One well-known model of opinion dynamics is the Deffuant--Weisbuch (DW) model, which is a type of bounded-confidence model (BCM). In the DW model, agents have pairwise interactions, and they are receptive to other agents' opinions when their opinions are sufficiently close to each other. In this paper, we extend the DW model by studying it on networks with heterogeneous and adaptive edge weights between pairs of agents. These edge weights govern the interaction probabilities between the agents and thereby encode the idea that people are more likely to communicate with individuals with whom they have previously compromised or had other positive interactions. We prove theoretical guarantees of our adaptive edge-weighted DW model's convergence properties, the long-time dynamics of its edge weights, and the model's associated ``effective graph", which is a time-dependent subgraph that includes edges only between agents that are receptive to each other's opinions. We support our theoretical results with numerical simulations of our adaptive edge-weighted DW model on a variety of networks and find that including adaptive edge weights yields different qualitative dynamics for different types of networks. In particular, for small confidence bounds, we observe that incorporating adaptive edge weights decreases the convergence time for dense networks but increases the convergence time for sparse networks.

CLJul 7, 2025
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gheorghe Comanici, Eric Bieber, Mike Schaekermann et al. · amazon-science, baidu

In this report, we introduce the Gemini 2.X model family: Gemini 2.5 Pro and Gemini 2.5 Flash, as well as our earlier Gemini 2.0 Flash and Flash-Lite models. Gemini 2.5 Pro is our most capable model yet, achieving SoTA performance on frontier coding and reasoning benchmarks. In addition to its incredible coding and reasoning skills, Gemini 2.5 Pro is a thinking model that excels at multimodal understanding and it is now able to process up to 3 hours of video content. Its unique combination of long context, multimodal and reasoning capabilities can be combined to unlock new agentic workflows. Gemini 2.5 Flash provides excellent reasoning abilities at a fraction of the compute and latency requirements and Gemini 2.0 Flash and Flash-Lite provide high performance at low latency and cost. Taken together, the Gemini 2.X model generation spans the full Pareto frontier of model capability vs cost, allowing users to explore the boundaries of what is possible with complex agentic problem solving.

34.5CVMar 22
Efficient Coarse-to-Fine Diffusion Models with Time Step Sequence Redistribution

Yu-Shan Tai, An-Yeu, Wu

Recently, diffusion models (DMs) have made significant strides in high-quality image generation. However, the multi-step denoising process often results in considerable computational overhead, impeding deployment on resource-constrained edge devices. Existing methods mitigate this issue by compressing models and adjusting the time step sequence. However, they overlook input redundancy and require lengthy search times. In this paper, we propose Coarse-to-Fine Diffusion Models with Time Step Sequence Redistribution. Recognizing indistinguishable early-stage generated images, we introduce Coarse-to-Fine Denoising (C2F) to reduce computation during coarse feature generation. Furthermore, we design Time Step Sequence Redistribution (TRD) for efficient sampling trajectory adjustment, requiring less than 10 minutes for search. Experimental results demonstrate that the proposed methods achieve near-lossless performance with an 80% to 90% reduction in computation on CIFAR10 and LSUN-Church.

CROct 16, 2024
SoK: Prompt Hacking of Large Language Models

Baha Rababah, Shang, Wu et al.

The safety and robustness of large language models (LLMs) based applications remain critical challenges in artificial intelligence. Among the key threats to these applications are prompt hacking attacks, which can significantly undermine the security and reliability of LLM-based systems. In this work, we offer a comprehensive and systematic overview of three distinct types of prompt hacking: jailbreaking, leaking, and injection, addressing the nuances that differentiate them despite their overlapping characteristics. To enhance the evaluation of LLM-based applications, we propose a novel framework that categorizes LLM responses into five distinct classes, moving beyond the traditional binary classification. This approach provides more granular insights into the AI's behavior, improving diagnostic precision and enabling more targeted enhancements to the system's safety and robustness.

IVApr 11, 2024
LATTE: Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer

Jiing-Ping Wang, Ming-Guang Lin, An-Yeu et al.

With the rise of Transformer models in NLP and CV domain, Multi-Head Attention has been proven to be a game-changer. However, its expensive computation poses challenges to the model throughput and efficiency, especially for the long sequence tasks. Exploiting the sparsity in attention has been proven to be an effective way to reduce computation. Nevertheless, prior works do not consider the various distributions among different heads and lack a systematic method to determine the threshold. To address these challenges, we propose Low-Precision Approximate Attention with Head-wise Trainable Threshold for Efficient Transformer (LATTE). LATTE employs a headwise threshold-based filter with the low-precision dot product and computation reuse mechanism to reduce the computation of MHA. Moreover, the trainable threshold is introduced to provide a systematic method for adjusting the thresholds and enable end-to-end optimization. Experimental results indicate LATTE can smoothly adapt to both NLP and CV tasks, offering significant computation savings with only a minor compromise in performance. Also, the trainable threshold is shown to be essential for the leverage between the performance and the computation. As a result, LATTE filters up to 85.16% keys with only a 0.87% accuracy drop in the CV task and 89.91% keys with a 0.86 perplexity increase in the NLP task.

CVJul 6, 2025
Quick Bypass Mechanism of Zero-Shot Diffusion-Based Image Restoration

Yu-Shan Tai, An-Yeu, Wu

Recent advancements in diffusion models have demonstrated remarkable success in various image generation tasks. Building upon these achievements, diffusion models have also been effectively adapted to image restoration tasks, e.g., super-resolution and deblurring, aiming to recover high-quality images from degraded inputs. Although existing zero-shot approaches enable pretrained diffusion models to perform restoration tasks without additional fine-tuning, these methods often suffer from prolonged iteration times in the denoising process. To address this limitation, we propose a Quick Bypass Mechanism (QBM), a strategy that significantly accelerates the denoising process by initializing from an intermediate approximation, effectively bypassing early denoising steps. Furthermore, recognizing that approximation may introduce inconsistencies, we introduce a Revised Reverse Process (RRP), which adjusts the weighting of random noise to enhance the stochasticity and mitigate potential disharmony. We validate proposed methods on ImageNet-1K and CelebA-HQ across multiple image restoration tasks, e.g., super-resolution, deblurring, and compressed sensing. Our experimental results show that the proposed methods can effectively accelerate existing methods while maintaining original performance.

CVJan 26, 2024
MPTQ-ViT: Mixed-Precision Post-Training Quantization for Vision Transformer

Yu-Shan Tai, An-Yeu, Wu

While vision transformers (ViTs) have shown great potential in computer vision tasks, their intense computation and memory requirements pose challenges for practical applications. Existing post-training quantization methods leverage value redistribution or specialized quantizers to address the non-normal distribution in ViTs. However, without considering the asymmetry in activations and relying on hand-crafted settings, these methods often struggle to maintain performance under low-bit quantization. To overcome these challenges, we introduce SmoothQuant with bias term (SQ-b) to alleviate the asymmetry issue and reduce the clamping loss. We also introduce optimal scaling factor ratio search (OPT-m) to determine quantization parameters by a data-dependent mechanism automatically. To further enhance the compressibility, we incorporate the above-mentioned techniques and propose a mixed-precision post-training quantization framework for vision transformers (MPTQ-ViT). We develop greedy mixed-precision quantization (Greedy MP) to allocate layer-wise bit-width considering both model performance and compressibility. Our experiments on ViT, DeiT, and Swin demonstrate significant accuracy improvements compared with SOTA on the ImageNet dataset. Specifically, our proposed methods achieve accuracy improvements ranging from 0.90% to 23.35% on 4-bit ViTs with single-precision and from 3.82% to 78.14% on 5-bit fully quantized ViTs with mixed-precision.

CVDec 9, 2020
High-resolution global irrigation prediction with Sentinel-2 30m data

Weixin, Wu, Sonal Thakkar et al.

An accurate and precise understanding of global irrigation usage is crucial for a variety of climate science efforts. Irrigation is highly energy-intensive, and as population growth continues at its current pace, increases in crop need and water usage will have an impact on climate change. Precise irrigation data can help with monitoring water usage and optimizing agricultural yield, particularly in developing countries. Irrigation data, in tandem with precipitation data, can be used to predict water budgets as well as climate and weather modeling. With our research, we produce an irrigation prediction model that combines unsupervised clustering of Normalized Difference Vegetation Index (NDVI) temporal signatures with a precipitation heuristic to label the months that irrigation peaks for each cropland cluster in a given year. We have developed a novel irrigation model and Python package ("Irrigation30") to generate 30m resolution irrigation predictions of cropland worldwide. With a small crowdsourced test set of cropland coordinates and irrigation labels, using a fraction of the resources used by the state-of-the-art NASA-funded GFSAD30 project with irrigation data limited to India and Australia, our model was able to achieve consistency scores in excess of 97\% and an accuracy of 92\% in a small geo-diverse randomly sampled test set.

LGSep 16, 2019
AdaBoost-assisted Extreme Learning Machine for Efficient Online Sequential Classification

Yi-Ta Chen, Yu-Chuan Chuang, An-Yeu et al.

In this paper, we propose an AdaBoost-assisted extreme learning machine for efficient online sequential classification (AOS-ELM). In order to achieve better accuracy in online sequential learning scenarios, we utilize the cost-sensitive algorithm-AdaBoost, which diversifying the weak classifiers, and adding the forgetting mechanism, which stabilizing the performance during the training procedure. Hence, AOS-ELM adapts better to sequentially arrived data compared with other voting based methods. The experiment results show AOS-ELM can achieve 94.41% accuracy on MNIST dataset, which is the theoretical accuracy bound performed by an original batch learning algorithm, AdaBoost-ELM. Moreover, with the forgetting mechanism, the standard deviation of accuracy during the online sequential learning process is reduced to 8.26x.

CVJul 9, 2017
Integration of LiDAR and Hyperspectral Data for Land-cover Classification: A Case Study

Pedram Ghamisi, Gabriele Cavallaro, Dan et al.

In this paper, an approach is proposed to fuse LiDAR and hyperspectral data, which considers both spectral and spatial information in a single framework. Here, an extended self-dual attribute profile (ESDAP) is investigated to extract spatial information from a hyperspectral data set. To extract spectral information, a few well-known classifiers have been used such as support vector machines (SVMs), random forests (RFs), and artificial neural networks (ANNs). The proposed method accurately classify the relatively volumetric data set in a few CPU processing time in a real ill-posed situation where there is no balance between the number of training samples and the number of features. The classification part of the proposed approach is fully-automatic.