43.8ARMar 28
A 64-Spin All-to-All CMOS Ising Machine with Landscape Perturbation Achieving 2.28 nJ/Edge-Bit Energy-to-SolutionAhmet Yusuf Salim, Jianan Wu, Soner Seçkiner et al.
A 64-spin all-to-all current-mode coupling Ising machine is implemented in 65 nm CMOS. The design supports 31 coefficient levels in 0.943 mm2 and achieves Energy-to-Solution (ETS) of 2.28 nJ/edge-bit. Continuous programming refresh not only mitigates leakage but also provides a mechanism for deterministic energy landscape perturbation, which consistently improves solution quality with higher success rate compared to operation without landscape perturbation.
48.3LGMay 11
Identified-Set Geometry of Distributional Model Extraction under Top-$K$ Censored API AccessWenhua Nie, ZiCheng Zhu, Jianan Wu et al.
Modern LLM APIs often reveal only top-$K$ logit scores and censor the remaining vocabulary. We study the per-position distribution-recovery limits of this access model. For censoring threshold $τ$, the compatible teacher distributions form an identified set whose total-variation diameter is exactly $U_K=(V-K)\exp(τ)/(Z_A+(V-K)\exp(τ))$, where $Z_A$ is the observed partition function. For KL recovery, we give a computable binary-endpoint lower bound and an asymptotically matching small-ambiguity upper bound, with an extension to reference-aware attackers. Experiments on a Qwen3 math-reasoning teacher reveal a layered extraction hierarchy: on-task top-$K$ distillation recovers 12% of private capability, full-logit distillation recovers 56% despite 99% KL closure, and generation-based extraction recovers 96%. Top-$K$ censoring therefore limits per-position distribution recovery but does not by itself prevent capability extraction, separating fidelity from transfer in prompt-only logit distillation.
82.4LGMay 8
Gradient Starvation in Binary-Reward GRPO: Why Group-Mean Centering Fails and Why the Simplest Fix WorksWenhua Nie, Jianan Wu, Junlin Liu et al.
Group Relative Policy Optimization (GRPO) is a standard algorithm for reinforcement learning from verifiable rewards, but its group-mean-centered advantage can fail under binary rewards. The failure mode is gradient starvation: when every response in a group is correct or every response is wrong, the centered advantage is exactly zero and the policy receives no learning signal. We prove that the true degeneracy rate always exceeds the i.i.d. Bernoulli prediction by Jensen's inequality, and observe a 0.69 degeneracy rate at group size four in logged Qwen3.5-9B GSM8K training. We then show that the fixed-reference Sign advantage, $A=2r-1$, performs pass@$G$ failure descent by increasing the probability that at least one sample in the group succeeds. On the full GSM8K test set across seven seeds, Sign reaches 73.8% accuracy versus 28.4% for standard normalized group-mean DrGRPO at group size four, a 45.4 point gain with $p<0.0001$. The effect is directionally consistent on Llama-3.1-8B and positive but underpowered on a MATH-500 transfer check. Pass@$k$ analysis indicates that the main benefit is search compression rather than large capacity expansion, aligning the empirical gains with recent RLVR ceiling observations.
70.7LGMay 8
The Coupling Tax: How Shared Token Budgets Undermine Visible Chain-of-Thought Under Fixed Output LimitsWenhua Nie, Junlin Liu, Jianan Wu et al.
Chain-of-thought reasoning is often treated as a monotone way to improve language-model accuracy by letting a model think longer. We identify a countervailing effect, the coupling tax: when reasoning traces and final answers share one output-token budget, long traces can crowd out the answer they are meant to support. Across GSM8K, MATH-500, and five BIG-Bench Hard tasks with Qwen3 models at three scales, non-thinking mode matches or outperforms thinking mode on GSM8K and MATH-500 at every budget up to 2048 tokens, while harder tasks shift the crossover to larger budgets. We derive a truncation-waste decomposition, $\mathrm{Acc}_{\mathrm{think}}(b)=α_c F_L(b)+α_t(1-F_L(b))$, that predicts this crossover from chain-length and accuracy statistics and explains inverse scaling within the Qwen family. A DeepSeek-R1-Distill-Llama-8B replication shows the same pattern under a different thinking interface. As a mitigation, split-budget generation decouples reasoning and answer budgets; on full MATH-500, IRIS reaches 74.0% accuracy, a strengthened extraction variant reaches 78.8%, and a fixed non-oracle SC+IRIS gate reaches 83.6%. The results show that test-time reasoning should be evaluated as a budget-allocation problem, not only as a question of whether longer traces are available.
CVAug 27, 2021
Binocular Mutual Learning for Improving Few-shot ClassificationZiqi Zhou, Xi Qiu, Jiangtao Xie et al.
Most of the few-shot learning methods learn to transfer knowledge from datasets with abundant labeled data (i.e., the base set). From the perspective of class space on base set, existing methods either focus on utilizing all classes under a global view by normal pretraining, or pay more attention to adopt an episodic manner to train meta-tasks within few classes in a local view. However, the interaction of the two views is rarely explored. As the two views capture complementary information, we naturally think of the compatibility of them for achieving further performance gains. Inspired by the mutual learning paradigm and binocular parallax, we propose a unified framework, namely Binocular Mutual Learning (BML), which achieves the compatibility of the global view and the local view through both intra-view and cross-view modeling. Concretely, the global view learns in the whole class space to capture rich inter-class relationships. Meanwhile, the local view learns in the local class space within each episode, focusing on matching positive pairs correctly. In addition, cross-view mutual interaction further promotes the collaborative learning and the implicit exploration of useful knowledge from each other. During meta-test, binocular embeddings are aggregated together to support decision-making, which greatly improve the accuracy of classification. Extensive experiments conducted on multiple benchmarks including cross-domain validation confirm the effectiveness of our method.
CVAug 20, 2021
DeFRCN: Decoupled Faster R-CNN for Few-Shot Object DetectionLimeng Qiao, Yuxuan Zhao, Zhiyuan Li et al.
Few-shot object detection, which aims at detecting novel objects rapidly from extremely few annotated examples of previously unseen classes, has attracted significant research interest in the community. Most existing approaches employ the Faster R-CNN as basic detection framework, yet, due to the lack of tailored considerations for data-scarce scenario, their performance is often not satisfactory. In this paper, we look closely into the conventional Faster R-CNN and analyze its contradictions from two orthogonal perspectives, namely multi-stage (RPN vs. RCNN) and multi-task (classification vs. localization). To resolve these issues, we propose a simple yet effective architecture, named Decoupled Faster R-CNN (DeFRCN). To be concrete, we extend Faster R-CNN by introducing Gradient Decoupled Layer for multi-stage decoupling and Prototypical Calibration Block for multi-task decoupling. The former is a novel deep layer with redefining the feature-forward operation and gradient-backward operation for decoupling its subsequent layer and preceding layer, and the latter is an offline prototype-based classification model with taking the proposals from detector as input and boosting the original classification scores with additional pairwise scores for calibration. Extensive experiments on multiple benchmarks show our framework is remarkably superior to other existing approaches and establishes a new state-of-the-art in few-shot literature.
CVApr 16, 2020
SQE: a Self Quality Evaluation Metric for Parameters Optimization in Multi-Object TrackingYanru Huang, Feiyu Zhu, Zheni Zeng et al.
We present a novel self quality evaluation metric SQE for parameters optimization in the challenging yet critical multi-object tracking task. Current evaluation metrics all require annotated ground truth, thus will fail in the test environment and realistic circumstances prohibiting further optimization after training. By contrast, our metric reflects the internal characteristics of trajectory hypotheses and measures tracking performance without ground truth. We demonstrate that trajectories with different qualities exhibit different single or multiple peaks over feature distance distribution, inspiring us to design a simple yet effective method to assess the quality of trajectories using a two-class Gaussian mixture model. Experiments mainly on MOT16 Challenge data sets verify the effectiveness of our method in both correlating with existing metrics and enabling parameters self-optimization to achieve better performance. We believe that our conclusions and method are inspiring for future multi-object tracking in practice.
CVDec 27, 2017
Multi-Target, Multi-Camera Tracking by Hierarchical Clustering: Recent Progress on DukeMTMC ProjectZhimeng Zhang, Jianan Wu, Xuan Zhang et al.
Although many methods perform well in single camera tracking, multi-camera tracking remains a challenging problem with less attention. DukeMTMC is a large-scale, well-annotated multi-camera tracking benchmark which makes great progress in this field. This report is dedicated to briefly introduce our method on DukeMTMC and show that simple hierarchical clustering with well-trained person re-identification features can get good results on this dataset.
CVDec 4, 2017
SOT for MOTQizheng He, Jianan Wu, Gang Yu et al.
In this paper we present a robust tracker to solve the multiple object tracking (MOT) problem, under the framework of tracking-by-detection. As the first contribution, we innovatively combine single object tracking (SOT) algorithms with multiple object tracking algorithms, and our results show that SOT is a general way to strongly reduce the number of false negatives, regardless of the quality of detection. Another contribution is that we show with a deep learning based appearance model, it is easy to associate detections of the same object efficiently and also with high accuracy. This appearance model plays an important role in our MOT algorithm to correctly associate detections into long trajectories, and also in our SOT algorithm to discover new detections mistakenly missed by the detector. The deep neural network based model ensures the robustness of our tracking algorithm, which can perform data association in a wide variety of scenes. We ran comprehensive experiments on a large-scale and challenging dataset, the MOT16 benchmark, and results showed that our tracker achieved state-of-the-art performance based on both public and private detections.
CVNov 30, 2015
Incidental Scene Text Understanding: Recent Progresses on ICDAR 2015 Robust Reading Competition Challenge 4Cong Yao, Jianan Wu, Xinyu Zhou et al.
Different from focused texts present in natural images, which are captured with user's intention and intervention, incidental texts usually exhibit much more diversity, variability and complexity, thus posing significant difficulties and challenges for scene text detection and recognition algorithms. The ICDAR 2015 Robust Reading Competition Challenge 4 was launched to assess the performance of existing scene text detection and recognition methods on incidental texts as well as to stimulate novel ideas and solutions. This report is dedicated to briefly introduce our strategies for this challenging problem and compare them with prior arts in this field.