Jason Wang

CV
h-index58
20papers
3,701citations
Novelty45%
AI Score55

20 Papers

CLDec 19, 2025
OpenAI GPT-5 System Card

Aaditya Singh, Adam Fry, Adam Perelman et al. · berkeley, mila

This is the system card published alongside the OpenAI GPT-5 launch, August 2025. GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say 'think hard about this' in the prompt). The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time. Once usage limits are reached, a mini version of each model handles remaining queries. This system card focuses primarily on gpt-5-thinking and gpt-5-main, while evaluations for other models are available in the appendix. The GPT-5 system not only outperforms previous models on benchmarks and answers questions more quickly, but -- more importantly -- is more useful for real-world queries. We've made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy, and have leveled up GPT-5's performance in three of ChatGPT's most common uses: writing, coding, and health. All of the GPT-5 models additionally feature safe-completions, our latest approach to safety training to prevent disallowed content. Similarly to ChatGPT agent, we have decided to treat gpt-5-thinking as High capability in the Biological and Chemical domain under our Preparedness Framework, activating the associated safeguards. While we do not have definitive evidence that this model could meaningfully help a novice to create severe biological harm -- our defined threshold for High capability -- we have chosen to take a precautionary approach.

100.0LGApr 16
$π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities

Physical Intelligence, Bo Ai, Ali Amin et al. · mit

We present a new robotic foundation model, called $π_{0.7}$, that can enable strong out-of-the-box performance in a wide range of scenarios. $π_{0.7}$ can follow diverse language instructions in unseen environments, including multi-stage tasks with various kitchen appliances, provide zero-shot cross-embodiment generalization, for example enabling a robot to fold laundry without seeing the task before, and perform challenging tasks such as operating an espresso machine out of the box at a level of performance that matches much more specialized RL-finetuned models. The main idea behind $π_{0.7}$ is to use diverse context conditioning during training. This conditioning information, contained in the prompt, makes it possible to steer the model precisely to perform many tasks with different strategies. It is conditioned not just on a language command that describes what it should do, but on additional multimodal information that also describes the manner or strategy in which it should do it, including metadata about task performance and subgoal images. This enables $π_{0.7}$ to use very diverse data, including demonstrations, potentially suboptimal (autonomous) data including failures, and data from non-robot sources. Our experiments evaluate $π_{0.7}$ across numerous tasks with multiple robot platforms, on tasks that require speed and dexterity, language following, and compositional task generalization.

66.0CVMay 29
Non-Learning Low-Light Stereo Vision

Jason Wang, Lucas Nguyen, Hyunseung Eom et al.

We present a non-learning stereo framework for disparity estimation from severely noisy images. Using the Field of Junctions (FoJ), it retains coarse visual features stable under severe noise for cost volume construction while discarding fine textures inseparable from photon noise. The resulting structural information guides boundary-aware Semi-Global Matching (SGM) that dynamically adapts smoothness penalties to preserve true disparity discontinuities. The output is a sparse disparity map more accurate than those of recent stereo algorithms over unmasked pixels on widely-used benchmark datasets.

CLNov 25, 2022
The Naughtyformer: A Transformer Understands Offensive Humor

Leonard Tang, Alexander Cai, Steve Li et al. · harvard

Jokes are intentionally written to be funny, but not all jokes are created the same. Some jokes may be fit for a classroom of kindergarteners, but others are best reserved for a more mature audience. While recent work has shown impressive results on humor detection in text, here we instead investigate the more nuanced task of detecting humor subtypes, especially of the less innocent variety. To that end, we introduce a novel jokes dataset filtered from Reddit and solve the subtype classification task using a finetuned Transformer dubbed the Naughtyformer. Moreover, we show that our model is significantly better at detecting offensiveness in jokes compared to state-of-the-art methods.

AIJan 5Code
Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications

YuanLab. ai, Shawn Wu, Sean Wang et al.

We introduce Yuan3.0 Flash, an open-source Mixture-of-Experts (MoE) MultiModal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enhance performance on enterprise-oriented tasks while maintaining competitive capabilities on general-purpose tasks. To address the overthinking phenomenon commonly observed in Large Reasoning Models (LRMs), we propose Reflection-aware Adaptive Policy Optimization (RAPO), a novel RL training algorithm that effectively regulates overthinking behaviors. In enterprise-oriented tasks such as retrieval-augmented generation (RAG), complex table understanding, and summarization, Yuan3.0 Flash consistently achieves superior performance. Moreover, it also demonstrates strong reasoning capabilities in domains such as mathematics, science, etc., attaining accuracy comparable to frontier model while requiring only approximately 1/4 to 1/2 of the average tokens. Yuan3.0 Flash has been fully open-sourced to facilitate further research and real-world deployment: https://github.com/Yuan-lab-LLM/Yuan3.0.

LGOct 22, 2023
MoPe: Model Perturbation-based Privacy Attacks on Language Models

Marvin Li, Jason Wang, Jeffrey Wang et al.

Recent work has shown that Large Language Models (LLMs) can unintentionally leak sensitive information present in their training data. In this paper, we present Model Perturbations (MoPe), a new method to identify with high confidence if a given text is in the training data of a pre-trained language model, given white-box access to the models parameters. MoPe adds noise to the model in parameter space and measures the drop in log-likelihood at a given point $x$, a statistic we show approximates the trace of the Hessian matrix with respect to model parameters. Across language models ranging from $70$M to $12$B parameters, we show that MoPe is more effective than existing loss-based attacks and recently proposed perturbation-based methods. We also examine the role of training point order and model size in attack success, and empirically demonstrate that MoPe accurately approximate the trace of the Hessian in practice. Our results show that the loss of a point alone is insufficient to determine extractability -- there are training points we can recover using our method that have average loss. This casts some doubt on prior works that use the loss of a point as evidence of memorization or unlearning.

CRFeb 26, 2024Code
Pandora's White-Box: Precise Training Data Detection and Extraction in Large Language Models

Jeffrey G. Wang, Jason Wang, Marvin Li et al.

In this paper we develop state-of-the-art privacy attacks against Large Language Models (LLMs), where an adversary with some access to the model tries to learn something about the underlying training data. Our headline results are new membership inference attacks (MIAs) against pretrained LLMs that perform hundreds of times better than baseline attacks, and a pipeline showing that over 50% (!) of the fine-tuning dataset can be extracted from a fine-tuned LLM in natural settings. We consider varying degrees of access to the underlying model, pretraining and fine-tuning data, and both MIAs and training data extraction. For pretraining data, we propose two new MIAs: a supervised neural network classifier that predicts training data membership on the basis of (dimensionality-reduced) model gradients, as well as a variant of this attack that only requires logit access to the model by leveraging recent model-stealing work on LLMs. To our knowledge this is the first MIA that explicitly incorporates model-stealing information. Both attacks outperform existing black-box baselines, and our supervised attack closes the gap between MIA attack success against LLMs and the strongest known attacks for other machine learning models. In fine-tuning, we find that a simple attack based on the ratio of the loss between the base and fine-tuned models is able to achieve near-perfect MIA performance; we then leverage our MIA to extract a large fraction of the fine-tuning dataset from fine-tuned Pythia and Llama models. Our code is available at github.com/safr-ai-lab/pandora-llm.

CVJan 28, 2022Code
Detecting Owner-member Relationship with Graph Convolution Network in Fisheye Camera System

Zizhang Wu, Jason Wang, Tianhao Xu et al.

The owner-member relationship between wheels and vehicles contributes significantly to the 3D perception of vehicles, especially in embedded environments. However, to leverage this relationship we must face two major challenges: i) Traditional IoU-based heuristics have difficulty handling occluded traffic congestion scenarios. ii) The effectiveness and applicability of the solution in a vehicle-mounted system is difficult. To address these issues, we propose an innovative relationship prediction method, DeepWORD, by designing a graph convolutional network (GCN). Specifically, to improve the information richness, we use feature maps with local correlation as input to the nodes. Subsequently, we introduce a graph attention network (GAT) to dynamically correct the a priori estimation bias. Finally, we designed a dataset as a large-scale benchmark which has annotated owner-member relationship, called WORD. In the experiments we learned that the proposed method achieved state-of-the-art accuracy and real-time performance. The WORD dataset is made publicly available at https://github.com/NamespaceMain/ownermember-relationship-dataset.

LGApr 2, 2025
Towards Interpretable Soft Prompts

Oam Patel, Jason Wang, Nikhil Shivakumar Nayak et al. · harvard

Soft prompts have been popularized as a cheap and easy way to improve task-specific LLM performance beyond few-shot prompts. Despite their origin as an automated prompting method, however, soft prompts and other trainable prompts remain a black-box method with no immediately interpretable connections to prompting. We create a novel theoretical framework for evaluating the interpretability of trainable prompts based on two desiderata: faithfulness and scrutability. We find that existing methods do not naturally satisfy our proposed interpretability criterion. Instead, our framework inspires a new direction of trainable prompting methods that explicitly optimizes for interpretability. To this end, we formulate and test new interpretability-oriented objective functions for two state-of-the-art prompt tuners: Hard Prompts Made Easy (PEZ) and RLPrompt. Our experiments with GPT-2 demonstrate a fundamental trade-off between interpretability and the task-performance of the trainable prompt, explicating the hardness of the soft prompt interpretability problem and revealing odd behavior that arises when one optimizes for an interpretability proxy.

SEApr 29, 2025
Automated Unit Test Case Generation: A Systematic Literature Review

Jason Wang, Basem Suleiman, Muhammad Johan Alibasa

Software is omnipresent within all factors of society. It is thus important to ensure that software are well tested to mitigate bad user experiences as well as the potential for severe financial and human losses. Software testing is however expensive and absorbs valuable time and resources. As a result, the field of automated software testing has grown of interest to researchers in past decades. In our review of present and past research papers, we have identified an information gap in the areas of improvement for the Genetic Algorithm and Particle Swarm Optimisation. A gap in knowledge in the current challenges that face automated testing has also been identified. We therefore present this systematic literature review in an effort to consolidate existing knowledge in regards to the evolutionary approaches as well as their improvements and resulting limitations. These improvements include hybrid algorithm combinations as well as interoperability with mutation testing and neural networks. We will also explore the main test criterion that are used in these algorithms alongside the challenges currently faced in the field related to readability, mocking and more.

CVOct 15, 2024
Scalable Indoor Novel-View Synthesis using Drone-Captured 360 Imagery with 3D Gaussian Splatting

Yuanbo Chen, Chengyu Zhang, Jason Wang et al.

Scene reconstruction and novel-view synthesis for large, complex, multi-story, indoor scenes is a challenging and time-consuming task. Prior methods have utilized drones for data capture and radiance fields for scene reconstruction, both of which present certain challenges. First, in order to capture diverse viewpoints with the drone's front-facing camera, some approaches fly the drone in an unstable zig-zag fashion, which hinders drone-piloting and generates motion blur in the captured data. Secondly, most radiance field methods do not easily scale to arbitrarily large number of images. This paper proposes an efficient and scalable pipeline for indoor novel-view synthesis from drone-captured 360 videos using 3D Gaussian Splatting. 360 cameras capture a wide set of viewpoints, allowing for comprehensive scene capture under a simple straightforward drone trajectory. To scale our method to large scenes, we devise a divide-and-conquer strategy to automatically split the scene into smaller blocks that can be reconstructed individually and in parallel. We also propose a coarse-to-fine alignment strategy to seamlessly match these blocks together to compose the entire scene. Our experiments demonstrate marked improvement in both reconstruction quality, i.e. PSNR and SSIM, and computation time compared to prior approaches.

LGJun 3, 2024
Hardness of Learning Neural Networks under the Manifold Hypothesis

Bobak T. Kiani, Jason Wang, Melanie Weber

The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. Gaussian or uniform Boolean data distributions. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable. We prove that learning is hard under input manifolds of bounded curvature by extending proofs of hardness in the SQ and cryptographic settings for Boolean data inputs to the geometric setting. On the other hand, we show that additional assumptions on the volume of the data manifold alleviate these fundamental limitations and guarantee learnability via a simple interpolation argument. Notable instances of this regime are manifolds which can be reliably reconstructed via manifold learning. Looking forward, we comment on and empirically explore intermediate regimes of manifolds, which have heterogeneous features commonly found in real world data.

LGOct 31, 2023
Closed Drafting as a Case Study for First-Principle Interpretability, Memory, and Generalizability in Deep Reinforcement Learning

Ryan Rezai, Jason Wang

Closed drafting or "pick and pass" is a popular game mechanic where each round players select a card or other playable element from their hand and pass the rest to the next player. In this paper, we establish first-principle methods for studying the interpretability, generalizability, and memory of Deep Q-Network (DQN) models playing closed drafting games. In particular, we use a popular family of closed drafting games called "Sushi Go Party", in which we achieve state-of-the-art performance. We fit decision rules to interpret the decision-making strategy of trained DRL agents by comparing them to the ranking preferences of different types of human players. As Sushi Go Party can be expressed as a set of closely-related games based on the set of cards in play, we quantify the generalizability of DRL models trained on various sets of cards, establishing a method to benchmark agent performance as a function of environment unfamiliarity. Using the explicitly calculable memory of other player's hands in closed drafting games, we create measures of the ability of DRL models to learn memory.

CVSep 10, 2021
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation

Yanjun Gao, Lulu Liu, Jason Wang et al.

Temporal grounding aims to predict a time interval of a video clip corresponding to a natural language query input. In this work, we present EVOQUER, a temporal grounding framework incorporating an existing text-to-video grounding model and a video-assisted query generation network. Given a query and an untrimmed video, the temporal grounding model predicts the target interval, and the predicted video clip is fed into a video translation task by generating a simplified version of the input query. EVOQUER forms closed-loop learning by incorporating loss functions from both temporal grounding and query generation serving as feedback. Our experiments on two widely used datasets, Charades-STA and ActivityNet, show that EVOQUER achieves promising improvements by 1.05 and 1.31 at R@0.7. We also discuss how the query generation task could facilitate error analysis by explaining temporal grounding model behavior.

IRApr 16, 2021
Hierarchical Topic Presence Models

Jason Wang, Robert E. Weiss

Topic models analyze text from a set of documents. Documents are modeled as a mixture of topics, with topics defined as probability distributions on words. Inferences of interest include the most probable topics and characterization of a topic by inspecting the topic's highest probability words. Motivated by a data set of web pages (documents) nested in web sites, we extend the Poisson factor analysis topic model to hierarchical topic presence models for analyzing text from documents nested in known groups. We incorporate an unknown binary topic presence parameter for each topic at the web site and/or the web page level to allow web sites and/or web pages to be sparse mixtures of topics and we propose logistic regression modeling of topic presence conditional on web site covariates. We introduce local topics into the Poisson factor analysis framework, where each web site has a local topic not found in other web sites. Two data augmentation methods, the Chinese table distribution and Pólya-Gamma augmentation, aid in constructing our sampler. We analyze text from web pages nested in United States local public health department web sites to abstract topical information and understand national patterns in topic presence.

IRMar 30, 2021
Local and Global Topics in Text Modeling of Web Pages Nested in Web Sites

Jason Wang, Robert E. Weiss

Topic models are popular models for analyzing a collection of text documents. The models assert that documents are distributions over latent topics and latent topics are distributions over words. A nested document collection is where documents are nested inside a higher order structure such as stories in a book, articles in a journal, or web pages in a web site. In a single collection of documents, topics are global, or shared across all documents. For web pages nested in web sites, topic frequencies likely vary between web sites. Within a web site, topic frequencies almost certainly vary between web pages. A hierarchical prior for topic frequencies models this hierarchical structure and specifies a global topic distribution. Web site topic distributions vary around the global topic distribution and web page topic distributions vary around the web site topic distribution. In a nested collection of web pages, some topics are likely unique to a single web site. Local topics in a nested collection of web pages are topics unique to one web site. For US local health department web sites, brief inspection of the text shows local geographic and news topics specific to each department that are not present in others. Topic models that ignore the nesting may identify local topics, but do not label topics as local nor do they explicitly identify the web site owner of the local topic. For web pages nested inside web sites, local topic models explicitly label local topics and identifies the owning web site. This identification can be used to adjust inferences about global topics. In the US public health web site data, topic coverage is defined at the web site level after removing local topic words from pages. Hierarchical local topic models can be used to identify local topics, adjust inferences about if web sites cover particular health topics, and study how well health topics are covered.

CVMar 30, 2021
DeepWORD: A GCN-based Approach for Owner-Member Relationship Detection in Autonomous Driving

Zizhang Wu, Man Wang, Jason Wang et al.

It's worth noting that the owner-member relationship between wheels and vehicles has an significant contribution to the 3D perception of vehicles, especially in the embedded environment. However, there are currently two main challenges about the above relationship prediction: i) The traditional heuristic methods based on IoU can hardly deal with the traffic jam scenarios for the occlusion. ii) It is difficult to establish an efficient applicable solution for the vehicle-mounted system. To address these issues, we propose an innovative relationship prediction method, namely DeepWORD, by designing a graph convolution network (GCN). Specifically, we utilize the feature maps with local correlation as the input of nodes to improve the information richness. Besides, we introduce the graph attention network (GAT) to dynamically amend the prior estimation deviation. Furthermore, we establish an annotated owner-member relationship dataset called WORD as a large-scale benchmark, which will be available soon. The experiments demonstrate that our solution achieves state-of-the-art accuracy and real-time in practice.

CVJun 30, 2020
Vehicle Re-ID for Surround-view Camera System

Zizhang Wu, Man Wang, Lingxiao Yin et al.

The vehicle re-identification (ReID) plays a critical role in the perception system of autonomous driving, which attracts more and more attention in recent years. However, to our best knowledge, there is no existing complete solution for the surround-view system mounted on the vehicle. In this paper, we argue two main challenges in above scenario: i) In single camera view, it is difficult to recognize the same vehicle from the past image frames due to the fisheye distortion, occlusion, truncation, etc. ii) In multi-camera view, the appearance of the same vehicle varies greatly from different camera's viewpoints. Thus, we present an integral vehicle Re-ID solution to address these problems. Specifically, we propose a novel quality evaluation mechanism to balance the effect of tracking box's drift and target's consistency. Besides, we take advantage of the Re-ID network based on attention mechanism, then combined with a spatial constraint strategy to further boost the performance between different cameras. The experiments demonstrate that our solution achieves state-of-the-art accuracy while being real-time in practice. Besides, we will release the code and annotated fisheye dataset for the benefit of community.

SRJun 22, 2020
Machine Learning in Heliophysics and Space Weather Forecasting: A White Paper of Findings and Recommendations

Gelu Nita, Manolis Georgoulis, Irina Kitiashvili et al.

The authors of this white paper met on 16-17 January 2020 at the New Jersey Institute of Technology, Newark, NJ, for a 2-day workshop that brought together a group of heliophysicists, data providers, expert modelers, and computer/data scientists. Their objective was to discuss critical developments and prospects of the application of machine and/or deep learning techniques for data analysis, modeling and forecasting in Heliophysics, and to shape a strategy for further developments in the field. The workshop combined a set of plenary sessions featuring invited introductory talks interleaved with a set of open discussion sessions. The outcome of the discussion is encapsulated in this white paper that also features a top-level list of recommendations agreed by participants.

CVDec 13, 2017
The Effectiveness of Data Augmentation in Image Classification using Deep Learning

Luis Perez, Jason Wang

In this paper, we explore and compare multiple solutions to the problem of data augmentation in image classification. Previous work has demonstrated the effectiveness of data augmentation through simple techniques, such as cropping, rotating, and flipping input images. We artificially constrain our access to data to a small subset of the ImageNet dataset, and compare each data augmentation technique in turn. One of the more successful data augmentations strategies is the traditional transformations mentioned above. We also experiment with GANs to generate images of different styles. Finally, we propose a method to allow a neural net to learn augmentations that best improve the classifier, which we call neural augmentation. We discuss the successes and shortcomings of this method on various datasets.