AIApr 9, 2023
ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D ScenesRan Gong, Jiangyong Huang, Yizhou Zhao et al. · berkeley
Understanding the continuous states of objects is essential for task learning and planning in the real world. However, most existing task learning benchmarks assume discrete (e.g., binary) object goal states, which poses challenges for the learning of complex tasks and transferring learned policy from simulated environments to the real world. Furthermore, state discretization limits a robot's ability to follow human instructions based on the grounding of actions and states. To tackle these challenges, we present ARNOLD, a benchmark that evaluates language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD is comprised of 8 language-conditioned tasks that involve understanding object states and learning policies for continuous goals. To promote language-instructed learning, we provide expert demonstrations with template-generated language descriptions. We assess task performance by utilizing the latest language-conditioned policy learning models. Our results indicate that current models for language-conditioned manipulations continue to experience significant challenges in novel goal-state generalizations, scene generalizations, and object generalizations. These findings highlight the need to develop new algorithms that address this gap and underscore the potential for further research in this area. Project website: https://arnold-benchmark.github.io.
98.4SEMay 31
BenchEvolver: Frontier Task Synthesis via Solution-Centric EvolutionYangzhen Wu, Aaron J. Li, Wenjie Ma et al.
The rapid progress of frontier large language models has led to widespread benchmark saturation, limiting the ability of existing datasets to differentiate model capabilities or provide useful training signal. For instance, on LiveCodeBench, frontier models achieve over 99% Pass@1 on easy splits and exceed 90% Pass@1 on average across difficulty levels. Constructing new, challenging datasets typically requires substantial human effort, creating a bottleneck for progress. We introduce BenchEvolver, a solution-centric evolutionary framework that automatically transforms existing coding problems into harder variants. Rather than generating problems from scratch, BenchEvolver evolves reference solutions through structured transformations and derives corresponding statements and tests from the evolved solutions. This design grounds generation in executable semantics, enabling scalable construction of high-quality, diverse, and difficult tasks with verifiable correctness. Applying BenchEvolver to LiveCodeBench and SciCode, we obtain evolved tasks that are substantially harder while maintaining validity, reference correctness, and diversity. We further curate LiveCodeBench-Plus, a 91-problem benchmark combining evolved and difficult original LCB-v6 tasks, where frontier-model Pass@1 ranges from 27.5% to 62.6%, restoring clear discrimination among strong coding models. Importantly, evolved tasks remain challenging even for the model that generates them, enabling self-improvement. We further show that RL on evolved LCB tasks improves held-out coding performance: for gpt-oss-20b, seed+evolved training achieves +8.7 and +8.3 Pass@1 gains on LCB v6 Hard and LCB-Pro Easy, exceeding seed-only gains by 70.7% and 34.8%, respectively. Our results show that BenchEvolver can convert saturated benchmarks into frontier-level evaluation suites and reusable training signal.
CYMar 3Code
Credibility Governance: A Social Mechanism for Collective Self-Correction under Weak Truth SignalsWanying He, Yanxi Lin, Ziheng Zhou et al.
Online platforms increasingly rely on opinion aggregation to allocate real-world attention and resources, yet common signals such as engagement votes or capital-weighted commitments are easy to amplify and often track visibility rather than reliability. This makes collective judgments brittle under weak truth signals, noisy or delayed feedback, early popularity surges, and strategic manipulation. We propose Credibility Governance (CG), a mechanism that reallocates influence by learning which agents and viewpoints consistently track evolving public evidence. CG maintains dynamic credibility scores for both agents and opinions, updates opinion influence via credibility-weighted endorsements, and updates agent credibility based on the long-run performance of the opinions they support, rewarding early and persistent alignment with emerging evidence while filtering short-lived noise. We evaluate CG in POLIS, a socio-physical simulation environment that models coupled belief dynamics and downstream feedback under uncertainty. Across settings with initial majority misalignment, observation noise and contamination, and misinformation shocks, CG outperforms vote-based, stake-weighted, and no-governance baselines, yielding faster recovery to the true state, reduced lock-in and path dependence, and improved robustness under adversarial pressure. Our implementation and experimental scripts are publicly available at https://github.com/Wanying-He/Credibility_Governance.
CVNov 8, 2025Code
Commonality in Few: Few-Shot Multimodal Anomaly Detection via Hypergraph-Enhanced MemoryYuxuan Lin, Hanjing Yan, Xuan Tong et al.
Few-shot multimodal industrial anomaly detection is a critical yet underexplored task, offering the ability to quickly adapt to complex industrial scenarios. In few-shot settings, insufficient training samples often fail to cover the diverse patterns present in test samples. This challenge can be mitigated by extracting structural commonality from a small number of training samples. In this paper, we propose a novel few-shot unsupervised multimodal industrial anomaly detection method based on structural commonality, CIF (Commonality In Few). To extract intra-class structural information, we employ hypergraphs, which are capable of modeling higher-order correlations, to capture the structural commonality within training samples, and use a memory bank to store this intra-class structural prior. Firstly, we design a semantic-aware hypergraph construction module tailored for single-semantic industrial images, from which we extract common structures to guide the construction of the memory bank. Secondly, we use a training-free hypergraph message passing module to update the visual features of test samples, reducing the distribution gap between test features and features in the memory bank. We further propose a hyperedge-guided memory search module, which utilizes structural information to assist the memory search process and reduce the false positive rate. Experimental results on the MVTec 3D-AD dataset and the Eyecandies dataset show that our method outperforms the state-of-the-art (SOTA) methods in few-shot settings. Code is available at https://github.com/Sunny5250/CIF.
IVMar 12, 2022
MDT-Net: Multi-domain Transfer by Perceptual Supervision for Unpaired Images in OCT ScanWeinan Song, Gaurav Fotedar, Nima Tajbakhsh et al.
Deep learning models tend to underperform in the presence of domain shifts. Domain transfer has recently emerged as a promising approach wherein images exhibiting a domain shift are transformed into other domains for augmentation or adaptation. However, with the absence of paired and annotated images, models merely learned by adversarial loss and cycle consistency loss could result in poor consistency of anatomy structures during the translation. Additionally, the complexity of learning multi-domain transfer could significantly increase with the number of target domains and source images. In this paper, we propose a multi-domain transfer network, named MDT-Net, to address the limitations above through perceptual supervision. Specifically, our model consists of a single encoder-decoder network and multiple domain-specific transfer modules to disentangle feature representations of the anatomy content and domain variance. Owing to this architecture, the model could significantly reduce the complexity when the translation is conducted among multiple domains. To demonstrate the performance of our method, we evaluate our model qualitatively and quantitatively on RETOUCH, an OCT dataset comprising scans from three different scanner devices (domains). Furthermore, we take the transfer results as additional training data for fluid segmentation to prove the advantage of our model indirectly, i.e., in the task of data adaptation and augmentation. Experimental results show that our method could bring universal improvement in these segmentation tasks, which demonstrates the effectiveness and efficiency of MDT-Net in multi-domain transfer.
CVJul 22, 2024
All rivers run into the sea: Unified Modality Brain-like Emotional Central MechanismXinji Mai, Junxiong Lin, Haoran Wang et al.
In the field of affective computing, fully leveraging information from a variety of sensory modalities is essential for the comprehensive understanding and processing of human emotions. Inspired by the process through which the human brain handles emotions and the theory of cross-modal plasticity, we propose UMBEnet, a brain-like unified modal affective processing network. The primary design of UMBEnet includes a Dual-Stream (DS) structure that fuses inherent prompts with a Prompt Pool and a Sparse Feature Fusion (SFF) module. The design of the Prompt Pool is aimed at integrating information from different modalities, while inherent prompts are intended to enhance the system's predictive guidance capabilities and effectively manage knowledge related to emotion classification. Moreover, considering the sparsity of effective information across different modalities, the SSF module aims to make full use of all available sensory data through the sparse integration of modality fusion prompts and inherent prompts, maintaining high adaptability and sensitivity to complex emotional states. Extensive experiments on the largest benchmark datasets in the Dynamic Facial Expression Recognition (DFER) field, including DFEW, FERV39k, and MAFW, have proven that UMBEnet consistently outperforms the current state-of-the-art methods. Notably, in scenarios of Modality Missingness and multimodal contexts, UMBEnet significantly surpasses the leading current methods, demonstrating outstanding performance and adaptability in tasks that involve complex emotional understanding with rich multimodal information.
CVDec 17, 2024Code
Dense Audio-Visual Event Localization under Cross-Modal Consistency and Multi-Temporal Granularity CollaborationZiheng Zhou, Jinxing Zhou, Wei Qian et al.
In the field of audio-visual learning, most research tasks focus exclusively on short videos. This paper focuses on the more practical Dense Audio-Visual Event Localization (DAVEL) task, advancing audio-visual scene understanding for longer, untrimmed videos. This task seeks to identify and temporally pinpoint all events simultaneously occurring in both audio and visual streams. Typically, each video encompasses dense events of multiple classes, which may overlap on the timeline, each exhibiting varied durations. Given these challenges, effectively exploiting the audio-visual relations and the temporal features encoded at various granularities becomes crucial. To address these challenges, we introduce a novel CCNet, comprising two core modules: the Cross-Modal Consistency Collaboration (CMCC) and the Multi-Temporal Granularity Collaboration (MTGC). Specifically, the CMCC module contains two branches: a cross-modal interaction branch and a temporal consistency-gated branch. The former branch facilitates the aggregation of consistent event semantics across modalities through the encoding of audio-visual relations, while the latter branch guides one modality's focus to pivotal event-relevant temporal areas as discerned in the other modality. The MTGC module includes a coarse-to-fine collaboration block and a fine-to-coarse collaboration block, providing bidirectional support among coarse- and fine-grained temporal features. Extensive experiments on the UnAV-100 dataset validate our module design, resulting in a new state-of-the-art performance in dense audio-visual event localization. The code is available at https://github.com/zzhhfut/CCNet-AAAI2025.
39.9CLMar 29
PRBench: End-to-end Paper Reproduction in Physics ResearchShi Qiu, Junyi Deng, Yiwei Deng et al.
AI agents powered by large language models exhibit strong reasoning and problem-solving capabilities, enabling them to assist scientific research tasks such as formula derivation and code generation. However, whether these agents can reliably perform end-to-end reproduction from real scientific papers remains an open question. We introduce PRBench, a benchmark of 30 expert-curated tasks spanning 11 subfields of physics. Each task requires an agent to comprehend the methodology of a published paper, implement the corresponding algorithms from scratch, and produce quantitative results matching the original publication. Agents are provided only with the task instruction and paper content, and operate in a sandboxed execution environment. All tasks are contributed by domain experts from over 20 research groups at the School of Physics, Peking University, each grounded in a real published paper and validated through end-to-end reproduction with verified ground-truth results and detailed scoring rubrics. Using an agentified assessment pipeline, we evaluate a set of coding agents on PRBench and analyze their capabilities across key dimensions of scientific reasoning and execution. The best-performing agent, OpenAI Codex powered by GPT-5.3-Codex, achieves a mean overall score of 34%. All agents exhibit a zero end-to-end callback success rate, with particularly poor performance in data accuracy and code correctness. We further identify systematic failure modes, including errors in formula implementation, inability to debug numerical simulations, and fabrication of output data. Overall, PRBench provides a rigorous benchmark for evaluating progress toward autonomous scientific research.
41.3CVApr 14
Cognition-Inspired Dual-Stream Semantic Enhancement for Vision-Based Dynamic Emotion ModelingHuanzhen Wang, Ziheng Zhou, Zeng Tao et al.
The human brain constructs emotional percepts not by processing facial expressions in isolation, but through a dynamic, hierarchical integration of sensory input with semantic and contextual knowledge. However, existing vision-based dynamic emotion modeling approaches often neglect emotion perception and cognitive theories. To bridge this gap between machine and human emotion perception, we propose cognition-inspired Dual-stream Semantic Enhancement (DuSE). Our model instantiates a dual-stream cognitive architecture. The first stream, a Hierarchical Temporal Prompt Cluster (HTPC), operationalizes the cognitive priming effect. It simulates how linguistic cues pre-sensitize neural pathways, modulating the processing of incoming visual stimuli by aligning textual semantics with fine-grained temporal features of facial dynamics. The second stream, a Latent Semantic Emotion Aggregator (LSEA), computationally models the knowledge integration process, akin to the mechanism described by the Conceptual Act Theory. It aggregates sensory inputs and synthesizes them with learned conceptual knowledge, reflecting the role of the hippocampus and default mode network in constructing a coherent emotional experience. By explicitly modeling these neuro-cognitive mechanisms, DuSE provides a more neurally plausible and robust framework for dynamic facial expression recognition (DFER). Extensive experiments on challenging in-the-wild benchmarks validate our cognition-centric approach, demonstrating that emulating the brain's strategies for emotion processing yields state-of-the-art performance and enhances model interpretability.
59.9CVApr 14
ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion PerceptionHuanzhen Wang, Ziheng Zhou, Jiaqi Song et al.
Dynamic facial expression recognition in the wild remains challenging due to data scarcity and long-tail distributions, which hinder models from effectively learning the temporal dynamics of scarce emotions. To address these limitations, we propose ARGen, an Affect-Reinforced Generative Augmentation Framework that enables data-adaptive dynamic expression generation for robust emotion perception. ARGen operates in two stages: Affective Semantic Injection (ASI) and Adaptive Reinforcement Diffusion (ARD). The ASI stage establishes affective knowledge alignment through facial Action Units and employs a retrieval-augmented prompt generation strategy to synthesize consistent and fine-grained affective descriptions via large-scale visual-language models, thereby injecting interpretable emotional priors into the generation process. The ARD stage integrates text-conditioned image-to-video diffusion with reinforcement learning, introducing inter-frame conditional guidance and a multi-objective reward function to jointly optimize expression naturalness, facial integrity, and generative efficiency. Extensive experiments on both generation and recognition tasks verify that ARGen substantially enhances synthesis fidelity and improves recognition performance, establishing an interpretable and generalizable generative augmentation paradigm for vision-based affective computing.
95.4CRApr 4Code
CREBench: Evaluating Large Language Models in Cryptographic Binary Reverse EngineeringBaicheng Chen, Yu Wang, Ziheng Zhou et al.
Reverse engineering (RE) is central to software security, particularly for cryptographic programs that handle sensitive data and are highly prone to vulnerabilities. It supports critical tasks such as vulnerability discovery and malware analysis. Despite its importance, RE remains labor-intensive and requires substantial expertise, making large language models (LLMs) a potential solution for automating the process. However, their capabilities for RE remain systematically underexplored. To address this gap, we study the cryptographic binary RE capabilities of LLMs and introduce \textbf{CREBench}, a benchmark comprising 432 challenges built from 48 standard cryptographic algorithms, 3 insecure crypto key usage scenarios, and 3 difficulty levels. Each challenge follows a Capture-the-Flag (CTF) RE challenge, requiring the model to analyze the underlying cryptographic logic and recover the correct input. We design an evaluation framework comprising four sub-tasks, from algorithm identification to correct flag recovery. We evaluate eight frontier LLMs on CREBench. GPT-5.4, the best-performing model, achieves 64.03 out of 100 and recovers the flag in 59\% of challenges. We also establish a strong human expert baseline of 92.19 points, showing that humans maintain an advantage in cryptographic RE tasks. Our code and dataset are available at https://github.com/wangyu-ovo/CREBench.
45.1CVMay 8
ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine MolluscsZiheng Zhou, Yang Wang, Nan Wang et al.
The decline of global shellfish biodiversity poses a severe threat to coastal ecosystems. Although artificial intelligence (AI) technologies show potential for automated ecological monitoring, existing marine benthic datasets often lack adaptation to the complexities of real underwater environments (e.g., variable lighting conditions and diverse species postures), posing challenges for the robust generalization of vision models in practical ecological monitoring. To address this problem, we construct ShellfishNet, a comprehensive image benchmark dataset designed specifically for real-world ecological monitoring constraints. Comprising 8,691 images across 32 taxa, this dataset includes a curated subset annotated with descriptive captions. It is constructed through field photography and web scraping, encompassing samples from complex real-world environments. Based on this benchmark, we systematically evaluate 80 representative neural network models, including Convolutional Neural Networks (CNNs), Vision Transformers (ViTs), State Space Models (SSMs), and Self-Supervised Learning (SSL) methods. Furthermore, we evaluate the performance of fine-grained visual categorization (FGVC) models and investigate the image captioning capabilities of several mainstream multimodal large language models (MLLMs). Meanwhile, we introduce image corruption benchmark tests to simulate common underwater degradation scenarios (turbidity, severe weather) and assess the robustness of vision models, enabling trustworthy decisions on ecological protection in the wild. ShellfishNet is dedicated to providing a data foundation and a model-evaluation benchmark for the intelligent monitoring of benthic organisms.
CLApr 22, 2025
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language ModelsShi Qiu, Shaoyang Guo, Zhuo-Yang Song et al.
Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olympiad difficulty. PHYBench addresses data contamination through original content and employs a systematic curation pipeline to eliminate flawed items. Evaluations show that PHYBench activates more tokens and provides stronger differentiation between reasoning models compared to other baselines like AIME 2024, OlympiadBench and GPQA. Even the best-performing model, Gemini 2.5 Pro, achieves only 36.9% accuracy compared to human experts' 61.9%. To further enhance evaluation precision, we introduce the Expression Edit Distance (EED) Score for mathematical expression assessment, which improves sample efficiency by 204% over binary scoring. Moreover, PHYBench effectively elicits multi-step and multi-condition reasoning, providing a platform for examining models' reasoning robustness, preferences, and deficiencies. The benchmark results and dataset are publicly available at https://www.phybench.cn/.
CVMar 7, 2024
A$^{3}$lign-DFER: Pioneering Comprehensive Dynamic Affective Alignment for Dynamic Facial Expression Recognition with CLIPZeng Tao, Yan Wang, Junxiong Lin et al.
The performance of CLIP in dynamic facial expression recognition (DFER) task doesn't yield exceptional results as observed in other CLIP-based classification tasks. While CLIP's primary objective is to achieve alignment between images and text in the feature space, DFER poses challenges due to the abstract nature of text and the dynamic nature of video, making label representation limited and perfect alignment difficult. To address this issue, we have designed A$^{3}$lign-DFER, which introduces a new DFER labeling paradigm to comprehensively achieve alignment, thus enhancing CLIP's suitability for the DFER task. Specifically, our A$^{3}$lign-DFER method is designed with multiple modules that work together to obtain the most suitable expanded-dimensional embeddings for classification and to achieve alignment in three key aspects: affective, dynamic, and bidirectional. We replace the input label text with a learnable Multi-Dimensional Alignment Token (MAT), enabling alignment of text to facial expression video samples in both affective and dynamic dimensions. After CLIP feature extraction, we introduce the Joint Dynamic Alignment Synchronizer (JAS), further facilitating synchronization and alignment in the temporal dimension. Additionally, we implement a Bidirectional Alignment Training Paradigm (BAP) to ensure gradual and steady training of parameters for both modalities. Our insightful and concise A$^{3}$lign-DFER method achieves state-of-the-art results on multiple DFER datasets, including DFEW, FERV39k, and MAFW. Extensive ablation experiments and visualization studies demonstrate the effectiveness of A$^{3}$lign-DFER. The code will be available in the future.
CVAug 6, 2025
CLASP: Cross-modal Salient Anchor-based Semantic Propagation for Weakly-supervised Dense Audio-Visual Event LocalizationJinxing Zhou, Ziheng Zhou, Yanghao Zhou et al.
The Dense Audio-Visual Event Localization (DAVEL) task aims to temporally localize events in untrimmed videos that occur simultaneously in both the audio and visual modalities. This paper explores DAVEL under a new and more challenging weakly-supervised setting (W-DAVEL task), where only video-level event labels are provided and the temporal boundaries of each event are unknown. We address W-DAVEL by exploiting \textit{cross-modal salient anchors}, which are defined as reliable timestamps that are well predicted under weak supervision and exhibit highly consistent event semantics across audio and visual modalities. Specifically, we propose a \textit{Mutual Event Agreement Evaluation} module, which generates an agreement score by measuring the discrepancy between the predicted audio and visual event classes. Then, the agreement score is utilized in a \textit{Cross-modal Salient Anchor Identification} module, which identifies the audio and visual anchor features through global-video and local temporal window identification mechanisms. The anchor features after multimodal integration are fed into an \textit{Anchor-based Temporal Propagation} module to enhance event semantic encoding in the original temporal audio and visual features, facilitating better temporal localization under weak supervision. We establish benchmarks for W-DAVEL on both the UnAV-100 and ActivityNet1.3 datasets. Extensive experiments demonstrate that our method achieves state-of-the-art performance.
CVJul 23, 2025
IONext: Unlocking the Next Era of Inertial OdometryShanshan Zhang, Qi Zhang, Siyue Wang et al.
Researchers have increasingly adopted Transformer-based models for inertial odometry. While Transformers excel at modeling long-range dependencies, their limited sensitivity to local, fine-grained motion variations and lack of inherent inductive biases often hinder localization accuracy and generalization. Recent studies have shown that incorporating large-kernel convolutions and Transformer-inspired architectural designs into CNN can effectively expand the receptive field, thereby improving global motion perception. Motivated by these insights, we propose a novel CNN-based module called the Dual-wing Adaptive Dynamic Mixer (DADM), which adaptively captures both global motion patterns and local, fine-grained motion features from dynamic inputs. This module dynamically generates selective weights based on the input, enabling efficient multi-scale feature aggregation. To further improve temporal modeling, we introduce the Spatio-Temporal Gating Unit (STGU), which selectively extracts representative and task-relevant motion features in the temporal domain. This unit addresses the limitations of temporal modeling observed in existing CNN approaches. Built upon DADM and STGU, we present a new CNN-based inertial odometry backbone, named Next Era of Inertial Odometry (IONext). Extensive experiments on six public datasets demonstrate that IONext consistently outperforms state-of-the-art (SOTA) Transformer- and CNN-based methods. For instance, on the RNIN dataset, IONext reduces the average ATE by 10% and the average RTE by 12% compared to the representative model iMOT.
CVJun 24, 2024
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective RecognitionHaoran Wang, Xinji Mai, Zeng Tao et al.
The contemporary state-of-the-art of Dynamic Facial Expression Recognition (DFER) technology facilitates remarkable progress by deriving emotional mappings of facial expressions from video content, underpinned by training on voluminous datasets. Yet, the DFER datasets encompass a substantial volume of noise data. Noise arises from low-quality captures that defy logical labeling, and instances that suffer from mislabeling due to annotation bias, engendering two principal types of uncertainty: the uncertainty regarding data usability and the uncertainty concerning label reliability. Addressing the two types of uncertainty, we have meticulously crafted a two-stage framework aiming at \textbf{S}eeking \textbf{C}ertain data \textbf{I}n extensive \textbf{U}ncertain data (SCIU). This initiative aims to purge the DFER datasets of these uncertainties, thereby ensuring that only clean, verified data is employed in training processes. To mitigate the issue of low-quality samples, we introduce the Coarse-Grained Pruning (CGP) stage, which assesses sample weights and prunes those deemed unusable due to their low weight. For samples with incorrect annotations, the Fine-Grained Correction (FGC) stage evaluates prediction stability to rectify mislabeled data. Moreover, SCIU is conceived as a universally compatible, plug-and-play framework, tailored to integrate seamlessly with prevailing DFER methodologies. Rigorous experiments across prevalent DFER datasets and against numerous benchmark methods substantiates SCIU's capacity to markedly elevate performance metrics.
LGJun 3, 2019
C2P2: A Collective Cryptocurrency Up/Down Price Prediction EngineChongyang Bai, Tommy White, Linda Xiao et al.
We study the problem of predicting whether the price of the 21 most popular cryptocurrencies (according to coinmarketcap.com) will go up or down on day d, using data up to day d-1. Our C2P2 algorithm is the first algorithm to consider the fact that the price of a cryptocurrency c might depend not only on historical prices, sentiments, global stock indices, but also on the prices and predicted prices of other cryptocurrencies. C2P2 therefore does not predict cryptocurrency prices one coin at a time --- rather it uses similarity metrics in conjunction with collective classification to compare multiple cryptocurrency features to jointly predict the cryptocurrency prices for all 21 coins considered. We show that our C2P2 algorithm beats out a recent competing 2017 paper by margins varying from 5.1-83% and another Bitcoin-specific prediction paper from 2018 by 16%. In both cases, C2P2 is the winner on all cryptocurrencies considered. Moreover, we experimentally show that the use of similarity metrics within our C2P2 algorithm leads to a direct improvement for 20 out of 21 cryptocurrencies ranging from 0.4% to 17.8%. Without the similarity component, C2P2 still beats competitors on 20 out of 21 cryptocurrencies considered. We show that all these results are statistically significant via a Student's t-test with p<1e-5. Check our demo at https://www.cs.dartmouth.edu/dsail/demos/c2p2
CRMar 15, 2018
Distributed Data Vending on BlockchainJiayu Zhou, Fengyi Tang, He Zhu et al.
Recent advances in blockchain technologies have provided exciting opportunities for decentralized applications. Specifically, blockchain-based smart contracts enable credible transactions without authorized third parties. The attractive properties of smart contracts facilitate distributed data vending, allowing for proprietary data to be securely exchanged on a blockchain. Distributed data vending can transform domains such as healthcare by encouraging data distribution from owners and enabling large-scale data aggregation. However, one key challenge in distributed data vending is the trade-off dilemma between the effectiveness of data retrieval, and the leakage risk from indexing the data. In this paper, we propose a framework for distributed data vending through a combination of data embedding and similarity learning. We illustrate our framework through a practical scenario of distributing and aggregating electronic medical records on a blockchain. Extensive empirical results demonstrate the effectiveness of our framework.
CVSep 12, 2016
Image denoising via group sparsity residual constraintZhiyuan Zha, Xin Liu, Ziheng Zhou et al.
Group sparsity has shown great potential in various low-level vision tasks (e.g, image denoising, deblurring and inpainting). In this paper, we propose a new prior model for image denoising via group sparsity residual constraint (GSRC). To enhance the performance of group sparse-based image denoising, the concept of group sparsity residual is proposed, and thus, the problem of image denoising is translated into one that reduces the group sparsity residual. To reduce the residual, we first obtain some good estimation of the group sparse coefficients of the original image by the first-pass estimation of noisy image, and then centralize the group sparse coefficients of noisy image to the estimation. Experimental results have demonstrated that the proposed method not only outperforms many state-of-the-art denoising methods such as BM3D and WNNM, but results in a faster speed.