Chen Ding

CL
h-index: 74
26 papers
5,202 citations
Novelty: 48%
AI Score: 59

26 Papers

CL · Oct 25, 2024
GPT-4o System Card

Aaron Hurst, Adam Lerer, Adam P. Goucher et al. · openai

GPT-4o is an autoregressive omni model that accepts as input any combination of text, audio, image, and video, and generates any combination of text, audio, and image outputs. It's trained end-to-end across text, vision, and audio, meaning all inputs and outputs are processed by the same neural network. GPT-4o can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models. In line with our commitment to building AI safely and consistent with our voluntary commitments to the White House, we are sharing the GPT-4o System Card, which includes our Preparedness Framework evaluations. In this System Card, we provide a detailed look at GPT-4o's capabilities, limitations, and safety evaluations across multiple categories, focusing on speech-to-speech while also evaluating text and image capabilities, and measures we've implemented to ensure the model is safe and aligned. We also include third-party assessments on dangerous capabilities, as well as discussion of potential societal impacts of GPT-4o's text and vision capabilities.

CL · Dec 19, 2025
OpenAI GPT-5 System Card

Aaditya Singh, Adam Fry, Adam Perelman et al. · berkeley, mila

This is the system card published alongside the OpenAI GPT-5 launch, August 2025. GPT-5 is a unified system with a smart and fast model that answers most questions, a deeper reasoning model for harder problems, and a real-time router that quickly decides which model to use based on conversation type, complexity, tool needs, and explicit intent (for example, if you say 'think hard about this' in the prompt). The router is continuously trained on real signals, including when users switch models, preference rates for responses, and measured correctness, improving over time. Once usage limits are reached, a mini version of each model handles remaining queries. This system card focuses primarily on gpt-5-thinking and gpt-5-main, while evaluations for other models are available in the appendix. The GPT-5 system not only outperforms previous models on benchmarks and answers questions more quickly, but -- more importantly -- is more useful for real-world queries. We've made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy, and have leveled up GPT-5's performance in three of ChatGPT's most common uses: writing, coding, and health. All of the GPT-5 models additionally feature safe-completions, our latest approach to safety training to prevent disallowed content. Similarly to ChatGPT agent, we have decided to treat gpt-5-thinking as High capability in the Biological and Chemical domain under our Preparedness Framework, activating the associated safeguards. While we do not have definitive evidence that this model could meaningfully help a novice to create severe biological harm -- our defined threshold for High capability -- we have chosen to take a precautionary approach.

CL · Aug 3, 2023
NBIAS: A Natural Language Processing Framework for Bias Identification in Text

Shaina Raza, Muskan Garg, Deepak John Reji et al.

Bias in textual data can lead to skewed interpretations and outcomes when the data is used. These biases could perpetuate stereotypes, discrimination, or other forms of unfair treatment. An algorithm trained on biased data may end up making decisions that disproportionately impact a certain group of people. Therefore, it is crucial to detect and remove these biases to ensure the fair and ethical use of data. To this end, we develop a comprehensive and robust framework, NBIAS, that consists of four main layers: data, corpus construction, model development, and evaluation. The dataset is constructed by collecting diverse data from various domains, including social media, healthcare, and job hiring portals. We then apply a transformer-based token classification model that identifies biased words and phrases through a unique named entity, BIAS. In the evaluation procedure, we incorporate a blend of quantitative and qualitative measures to gauge the effectiveness of our models. We achieve accuracy improvements ranging from 1% to 8% compared to baselines. We are also able to generate a robust understanding of the model functioning. The proposed approach is applicable to a variety of biases and contributes to the fair and ethical use of textual data.

LG · Jul 9, 2024
Sampling and active learning methods for network reliability estimation using K-terminal spanning tree

Chen Ding, Pengfei Wei, Yan Shi et al.

Network reliability analysis remains a challenge due to the increasing size and complexity of networks. This paper presents a novel sampling method and an active learning method for efficient and accurate network reliability estimation under node failure and edge failure scenarios. The proposed sampling method adopts the Monte Carlo technique to sample component lifetimes and the K-terminal spanning tree algorithm to accelerate structure function computation. Unlike existing methods that compute only one structure function value per sample, our method generates multiple component state vectors and corresponding structure function values from each sample. Network reliability is estimated based on survival signatures derived from these values. A transformation technique extends this method to handle both node failure and edge failure. To enhance the efficiency of the proposed sampling method and achieve adaptability to network topology changes, we introduce an active learning method utilizing a random forest (RF) classifier. This classifier directly predicts structure function values, integrates network behaviors across diverse topologies, and undergoes iterative refinement to enhance predictive accuracy. Importantly, the trained RF classifier can directly predict reliability for variant networks, a capability beyond the sampling method alone. The effectiveness of both proposed methods is demonstrated on several network examples and two practical applications.
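
For orientation, the baseline this abstract improves on can be sketched as a crude Monte Carlo estimate of K-terminal reliability under edge failure. This is not the authors' survival-signature method or their spanning tree algorithm: it computes one structure function value per sample, which is exactly the limitation the paper addresses. The function names, the single shared edge survival probability `p_up`, and the union-find connectivity check are all illustrative assumptions.

```python
import random


def find(parent, x):
    """Union-find lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def terminals_connected(num_nodes, edges, edge_up, terminals):
    """Structure function: 1 if every terminal lies in one component, else 0."""
    parent = list(range(num_nodes))
    for (u, v), up in zip(edges, edge_up):
        if up:
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[ru] = rv
    roots = {find(parent, t) for t in terminals}
    return 1 if len(roots) == 1 else 0


def estimate_reliability(num_nodes, edges, p_up, terminals,
                         n_samples=10000, seed=0):
    """Fraction of sampled edge-state vectors in which the terminals stay connected.

    Assumption: every edge survives independently with probability p_up.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        edge_up = [rng.random() < p_up for _ in edges]
        hits += terminals_connected(num_nodes, edges, edge_up, terminals)
    return hits / n_samples
```

As a sanity check, a two-edge path `[(0, 1), (1, 2)]` with terminals `[0, 2]` and `p_up = 0.5` has exact reliability 0.25, since both edges must survive; the estimate should approach that value as `n_samples` grows.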

QUANT-PH · Aug 3, 2022
Active Learning on a Programmable Photonic Quantum Processor

Chen Ding, Xiao-Yue Xu, Yun-Fei Niu et al.

Training a quantum machine learning model generally requires a large labeled dataset, which incurs high labeling and computational costs. To reduce such costs, a selective training strategy, called active learning (AL), chooses only a subset of the original dataset to learn while maintaining the trained model's performance. Here, we design and implement two AL-empowered variational quantum classifiers to investigate the potential applications and effectiveness of AL in quantum machine learning. First, we build a programmable free-space photonic quantum processor, which enables the programmed implementation of various hybrid quantum-classical computing algorithms. Then, we code the designed variational quantum classifier with AL into the quantum processor, and execute comparative tests for the classifiers with and without the AL strategy. The results validate the great advantage of AL in quantum machine learning, as it saves up to 85% of the labeling effort and 91.6% of the computational effort compared to training without AL on a data classification task. Our results inspire AL's further applications in large-scale quantum machine learning to drastically reduce training data and speed up training, underpinning the exploration of practical quantum advantages in quantum physics or real-world applications.

QUANT-PH · Jul 31, 2022
Parameter-Parallel Distributed Variational Quantum Algorithm

Yun-Fei Niu, Shuo Zhang, Chen Ding et al.

Variational quantum algorithms (VQAs) have emerged as a promising near-term technique to explore practical quantum advantage on noisy intermediate-scale quantum (NISQ) devices. However, the inefficient parameter training process, caused by the incompatibility with backpropagation and the cost of a large number of measurements, poses a great challenge to the large-scale development of VQAs. Here, we propose a parameter-parallel distributed variational quantum algorithm (PPD-VQA) to accelerate the training process by parameter-parallel training with multiple quantum processors. To maintain the high performance of PPD-VQA in realistic noise scenarios, an alternate training strategy is proposed to alleviate the acceleration attenuation caused by noise differences among multiple quantum processors, which is an unavoidable common problem of distributed VQA. In addition, gradient compression is employed to overcome potential communication bottlenecks. The results suggest that PPD-VQA could provide a practical solution for coordinating multiple quantum processors to handle large-scale real-world applications.

CL · Jul 14, 2023
Mitigating Bias in Conversations: A Hate Speech Classifier and Debiaser with Prompts

Shaina Raza, Chen Ding, Deval Pandya

Discriminatory language and biases are often present in hate speech during conversations, which usually leads to negative impacts on targeted groups such as those based on race, gender, and religion. To tackle this issue, we propose an approach that involves a two-step process: first, detecting hate speech using a classifier, and then utilizing a debiasing component that generates less biased or unbiased alternatives through prompts. We evaluated our approach on a benchmark dataset and observed a reduction in negativity due to hate speech comments. The proposed method contributes to the ongoing efforts to reduce biases in online discourse and promote a more inclusive and fair environment for communication.

AI · Sep 16, 2023
BG-GAN: Generative AI Enable Representing Brain Structure-Function Connections for Alzheimer's Disease

Tong Zhou, Chen Ding, Changhong Jing et al.

The relationship between brain structure and function is critical for revealing the pathogenesis of brain disorders, including Alzheimer's disease (AD). However, mapping brain structure to function connections is a very challenging task. In this work, a bidirectional graph generative adversarial network (BG-GAN) is proposed to represent brain structure-function connections. Specifically, by designing a module incorporating inner graph convolution network (InnerGCN), the generators of BG-GAN can employ features of direct and indirect brain regions to learn the mapping function between the structural domain and the functional domain. Besides, a new module named Balancer is designed to counterpoise the optimization between generators and discriminators. By introducing the Balancer into BG-GAN, both the structural generator and functional generator can not only alleviate the issue of mode collapse but also learn complementarity of structural and functional features. Experimental results using the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset show that both generated structure and function connections can improve the identification accuracy of AD. The experimental findings suggest that the relationship between brain structure and function is not a complete one-to-one correspondence. They also suggest that brain structure is the basis of brain function, and the strong structural connections are majorly accompanied by strong functional connections.

LG · Nov 22, 2023
Scalable CP Decomposition for Tensor Learning using GPU Tensor Cores

Zeliang Zhang, Zhuo Liu, Susan Liang et al.

CP decomposition is a powerful tool for data science, especially gene analysis, deep learning, and quantum computation. However, the application of tensor decomposition is largely hindered by the exponential growth of computational complexity and storage consumption with the size of tensors. While real-world data is usually presented as trillion-scale or even exascale tensors, existing work can only support billion-scale tensors. In our work, we propose the Exascale-Tensor to mitigate this significant gap. Specifically, we propose a compression-based tensor decomposition framework, namely the exascale-tensor, to support exascale tensor decomposition. Then, we carefully analyze the inherent parallelism and propose a bag of strategies to improve computational efficiency. Last, we conduct experiments to decompose tensors ranging from million-scale to trillion-scale for evaluation. Compared to the baselines, the exascale-tensor supports 8,000x larger tensors and achieves a speedup of up to 6.95x. We also apply our method to two real-world applications, gene analysis and tensor layer neural networks, for which the numerical results demonstrate the scalability and effectiveness of our method.

CV · Dec 15, 2025
JoDiffusion: Jointly Diffusing Image with Pixel-Level Annotations for Semantic Segmentation Promotion

Haoyu Wang, Lei Zhang, Wenrui Liu et al.

Given the inherently costly and time-intensive nature of pixel-level annotation, the generation of synthetic datasets comprising sufficiently diverse synthetic images paired with ground-truth pixel-level annotations has garnered increasing attention recently for training high-performance semantic segmentation models. However, existing methods need to either predict pseudo annotations after image generation or generate images conditioned on manual annotation masks, which incurs image-annotation semantic inconsistency or a scalability problem. To mitigate both problems at once, we present a novel dataset generative diffusion framework for semantic segmentation, termed JoDiffusion. First, given a standard latent diffusion model, JoDiffusion incorporates an independent annotation variational auto-encoder (VAE) network to map annotation masks into the latent space shared by images. Then, the diffusion model is tailored to capture the joint distribution of each image and its annotation mask conditioned on a text prompt. By doing so, JoDiffusion enables simultaneously generating paired images and semantically consistent annotation masks solely conditioned on text prompts, thereby demonstrating superior scalability. Additionally, a mask optimization strategy is developed to mitigate the annotation noise produced during generation. Experiments on Pascal VOC, COCO, and ADE20K datasets show that the annotated dataset generated by JoDiffusion yields substantial performance improvements in semantic segmentation compared to existing methods.

50.4 · CV · May 14
UniTriGen: Unified Triplet Generation of Aligned Visible-Infrared-Label for Few-Shot RGB-T Semantic Segmentation

Ping Zhou, Haoyu Wang, Mengmeng Zheng et al.

RGB-T semantic segmentation requires strictly aligned VIS-IR-Label triplets; however, such aligned triplet data are often scarce in real-world scenarios. Existing generative augmentation methods usually adopt cascaded generation paradigms, decomposing joint triplet generation into local conditional processes. As a result, consistency among VIS, IR, and Label in spatial structure, semantic content, and cross-modal details cannot be reliably maintained. To address this issue, we propose UniTriGen, a unified triplet generation framework that directly generates spatially aligned, semantically consistent, and modality complementary VIS-IR-Label triplets under the guidance of text prompts. UniTriGen first introduces a unified triplet generation mechanism, where VIS, IR, and Label are jointly encoded into a shared latent space and modeled with a diffusion process to enforce global cross-modal consistency. Lightweight modality-specific residual adapters are further integrated into this mechanism to accommodate modality-specific imaging characteristics and output formats. To mitigate generation bias caused by imbalanced scene and class distributions in limited paired triplets, UniTriGen also employs a scene-balanced and class-aware few-shot sampling strategy, which induces a more balanced sampling distribution and enhances the scene and class diversity of generated triplets. Experiments show that UniTriGen generates high-quality aligned triplets from limited real paired data, thereby achieving consistent performance improvements across various RGB-T semantic segmentation models.

89.0 · PL · Apr 6 · Code
AutoLALA: Automatic Loop Algebraic Locality Analysis for AI and HPC Kernels

Yifan Zhu, Yekai Pan, Yanghui Wu et al.

Data movement is the primary bottleneck in modern computing systems. For loop-based programs common in high-performance computing (HPC) and AI workloads, including matrix multiplication, tensor contraction, stencil computation, and einsum operations, the cost of moving data through the memory hierarchy often exceeds the cost of arithmetic. This paper presents AutoLALA, an open-source tool that analyzes data locality in affine loop programs. The tool accepts programs written in a small domain-specific language (DSL), lowers them to polyhedral sets and maps, and produces closed-form symbolic formulas for reuse distance and data movement complexity. AutoLALA implements the fully symbolic locality analysis of Zhu et al. together with the data movement distance (DMD) framework of Smith et al. In particular, it computes reuse distance as the image of the access space under the access map, avoiding both stack simulation and Denning's recursive working-set formulation. We describe the DSL syntax and its formal semantics, the polyhedral lowering pipeline that constructs timestamp spaces and access maps via affine transformations, and the sequence of Barvinok counting operations used to derive symbolic reuse-interval and reuse-distance distributions. The system is implemented in Rust as a modular library spanning three crates, with safe bindings to the Barvinok library. We provide both a command-line interface and an interactive web playground with LaTeX rendering of the output formulas. The tool handles arbitrary affine loop nests, covering workloads such as tensor contractions, einsum expressions, stencil computations, and general polyhedral programs.
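
To make the quantity concrete, reuse distance can be computed for a small explicit access trace with the classic LRU stack simulation, which is the approach the abstract says AutoLALA avoids in favor of closed-form symbolic formulas. The sketch below is only a reference definition, not the tool's method. It assumes reuse distance means the number of distinct other addresses touched since the previous access to the same address, with `None` marking a first access; the function name is hypothetical.

```python
def reuse_distances(trace):
    """Per-access reuse distance via a naive LRU stack simulation.

    Assumed definition: the count of distinct other addresses accessed
    since the previous access to the same address; None on first access.
    """
    stack = []  # least recently used first, most recently used last
    distances = []
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Everything above pos was touched after the last use of addr.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(addr)
    return distances
```

This runs in time quadratic in the trace length in the worst case, which illustrates why a symbolic derivation that does not enumerate the trace is attractive for large loop nests.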

CV · Jul 8, 2025 · Code
Prompt-Free Conditional Diffusion for Multi-object Image Augmentation

Haoyu Wang, Lei Zhang, Wei Wei et al.

Diffusion models have underpinned many recent advances in dataset augmentation for various computer vision tasks. However, when generating multi-object images that resemble real scenarios, most existing methods either rely entirely on text conditions, resulting in a deviation between the generated objects and the original data, or rely too much on the original images, resulting in a lack of diversity in the generated images, which is of limited help to downstream tasks. To mitigate both problems at once, we propose a prompt-free conditional diffusion framework for multi-object image augmentation. Specifically, we introduce a local-global semantic fusion strategy to extract semantics from images to replace text, and inject knowledge into the diffusion model through LoRA to alleviate the category deviation between the original model and the target dataset. In addition, we design a reward-model-based counting loss to assist the traditional reconstruction loss for model training. By constraining the object counts of each category instead of applying pixel-by-pixel constraints, we bridge the quantity deviation between the generated data and the original data while improving the diversity of the generated data. Experimental results demonstrate the superiority of the proposed method over several representative state-of-the-art baselines and showcase strong downstream task gain and out-of-domain generalization capabilities. Code is available at https://github.com/00why00/PFCD.

30.0 · SI · May 10
Astro Generative Network: A Variational Framework for Controlled Node Insertion in Incomplete Complex Networks

Mehrdad Jalali, Binh Vu, Swati Chandna et al.

Empirical networked systems are often only partially observed: sampling frames, crawling policies, privacy constraints, and temporal gaps can leave actors and edges unobserved. This complicates robustness and sensitivity analysis because many graph-learning pipelines implicitly treat the observed node set as exhaustive. Link prediction and graph completion repair structure among known vertices, whereas full-graph generators synthesize new graphs rather than extending an observed one as a fixed backbone. We study the complementary task of controlled node insertion: generating plausible new actors and attaching them to an existing graph while preserving interpretable global topology. We introduce the Astro Generative Network (AGN), a variational graph autoencoder that samples latent vectors to decode node features and then integrates new vertices through similarity-based attachment to the observed backbone. We distinguish the recommended configuration, AGN, from AGN-original, a diagnostic baseline that permits generated-generated edges. Across three synthetic regimes, AGN-original forms dense generated-generated subgraphs that artificially inflate clustering and density. Disabling those edges removes this artifact while preserving degree and path-length behavior. In our experiments, AGN keeps clustering and modularity changes modest relative to pre-insertion values, while novelty diagnostics show non-trivial separation from existing nodes without claiming domain-grounded identities. Our contribution is methodological: a reproducible insertion protocol and evaluation lens for incomplete network science and engineering.

51.8 · PL · Mar 10
Fully Symbolic Analysis of Loop Locality: Using Imaginary Reuse to Infer Real Performance

Yifan Zhu, Yekai Pan, Chen Ding et al.

This paper presents a new theory of locality and its compiler support. The theory is fully symbolic and derives locality as polynomials, and the compiler analysis supports affine loop nests. They derive cache-performance scaling in quadratic and reciprocal expressions and are more general and precise than empirical scaling rules. Evaluated on a benchmark suite of 41 scientific kernels and tensor operations, the compiler requires an average of 41 seconds to derive the locality polynomials. After derivation, predicting the cache miss count for any given input size and cache configuration takes less than a millisecond. Across all tests -- with and without loop fusion -- the accuracy in the data movement prediction is 99.6%, compared to a simulated set-associative L1 data cache.

CL · Dec 18, 2024
Fake News Detection: Comparative Evaluation of BERT-like Models and Large Language Models with Generative AI-Annotated Data

Shaina Raza, Drai Paulen-Patterson, Chen Ding

Fake news poses a significant threat to public opinion and social stability in modern society. This study presents a comparative evaluation of BERT-like encoder-only models and autoregressive decoder-only large language models (LLMs) for fake news detection. We introduce a dataset of news articles labeled with GPT-4 assistance (an AI-labeling method) and verified by human experts to ensure reliability. Both BERT-like encoder-only models and LLMs were fine-tuned on this dataset. Additionally, we developed an instruction-tuned LLM approach with majority voting during inference for label generation. Our analysis reveals that BERT-like models generally outperform LLMs in classification tasks, while LLMs demonstrate superior robustness against text perturbations. Compared to weakly labeled (distant supervision) data, AI labels with human supervision achieve better classification results. This study highlights the effectiveness of combining AI-based annotation with human oversight and demonstrates the performance of different families of machine learning models for fake news detection.
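
The majority-voting step mentioned above can be illustrated with a minimal sketch. The abstract does not specify how many generations are sampled or how ties are resolved, so both the tie-breaking rule (first label seen wins) and the function name are assumptions made here for illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among several sampled model outputs.

    Assumption: ties are broken in favor of the label that appears first.
    """
    if not labels:
        raise ValueError("at least one label is required")
    counts = Counter(labels)
    top = max(counts.values())
    for label in labels:
        if counts[label] == top:
            return label
```

For example, three sampled generations labeled `["fake", "real", "fake"]` would be aggregated to `"fake"`.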

PF · Jan 22
Sawtooth Wavefront Reordering: Enhanced CuTile FlashAttention on NVIDIA GB10

Yifan Zhu, Yekai Pan, Chen Ding

High-performance attention kernels are essential for Large Language Models. This paper presents an analysis of CuTile-based Flash Attention memory behavior and a technique to improve its cache performance. In particular, our analysis on the NVIDIA GB10 (Grace Blackwell) identifies the main cause of L2 cache misses. Leveraging this insight, we introduce a new programming technique called Sawtooth Wavefront Reordering that reduces L2 misses. We validate it in both CUDA and CuTile, observing a 50% or greater reduction in L2 misses and up to a 60% increase in throughput on GB10.

CV · Nov 3, 2024
Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning

Fei Zhou, Peng Wang, Lei Zhang et al.

Meta-learning offers a promising avenue for few-shot learning (FSL), enabling models to glean a generalizable feature embedding through episodic training on synthetic FSL tasks in a source domain. Yet, in practical scenarios where the target task diverges from that in the source domain, meta-learning-based methods are susceptible to over-fitting. To overcome this, we introduce a novel framework, Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning, which is crafted to comprehensively exploit the cross-domain transferable image prior that each image can be decomposed into complementary low-frequency content details and high-frequency robust structural characteristics. Motivated by this insight, we propose to decompose each query image into its high-frequency and low-frequency components, and incorporate them in parallel into the feature embedding network to enhance the final category prediction. More importantly, we introduce a feature reconstruction prior and a prediction consistency prior to separately encourage the consistency of the intermediate feature as well as the final category prediction between the original query image and its decomposed frequency components. This allows for collectively guiding the network's meta-learning process with the aim of learning generalizable image feature embeddings, while not introducing any extra computational cost in the inference phase. Our framework establishes new state-of-the-art results on multiple cross-domain few-shot learning benchmarks.

SE · Jul 16, 2025
QSpark: Towards Reliable Qiskit Code Generation

Kiana Kheiri, Aamna Aamir, Andriy Miranskyy et al.

Quantum circuits must be error-resilient, yet LLMs like Granite-20B-Code and StarCoder often output flawed Qiskit code. We fine-tuned the Qwen2.5-Coder-32B model with two RL methods, Group Relative Policy Optimization (GRPO) and Odds-Ratio Preference Optimization (ORPO), using a richly annotated synthetic dataset. On the Qiskit HumanEval benchmark, ORPO reaches 56.29% Pass@1 (about +10 pp over Granite-8B-QK) and GRPO hits 49%, both beating all general-purpose baselines; on the original HumanEval they score 65.90% and 63.00%. GRPO performs well on basic tasks (44/78) and excels on intermediate ones (41/68), but neither GRPO nor ORPO solves any of the five advanced tasks, highlighting clear gains yet room for progress in AI-assisted quantum programming.

QUANT-PH · Dec 18, 2024
AI-Powered Algorithm-Centric Quantum Processor Topology Design

Tian Li, Xiao-Yue Xu, Chen Ding et al.

Quantum computing promises to revolutionize various fields, yet the execution of quantum programs necessitates an effective compilation process. This involves strategically mapping quantum circuits onto the physical qubits of a quantum processor. The qubits' arrangement, or topology, is pivotal to the circuit's performance, a factor that often defies traditional heuristic or manual optimization methods due to its complexity. In this study, we introduce a novel approach leveraging reinforcement learning to dynamically tailor qubit topologies to the unique specifications of individual quantum circuits, guiding algorithm-driven quantum processor topology design for reducing the depth of the mapped circuit, which is particularly critical for the output accuracy on noisy quantum processors. Our method marks a significant departure from previous methods that have been constrained to mapping circuits onto a fixed processor topology. Experiments demonstrate that we have achieved notable enhancements in circuit performance, with a minimum of 20% reduction in circuit depth in 60% of the cases examined, and a maximum enhancement of up to 46%. Furthermore, the pronounced benefits of our approach in reducing circuit depth become increasingly evident as the scale of the quantum circuits increases, exhibiting the scalability of our method in terms of problem size. This work advances the co-design of quantum processor architecture and algorithm mapping, offering a promising avenue for future research and development in the field.

CL · Jan 19, 2024
FAIR Enough: How Can We Develop and Assess a FAIR-Compliant Dataset for Large Language Models' Training?

Shaina Raza, Shardul Ghuge, Chen Ding et al.

The rapid evolution of Large Language Models (LLMs) highlights the necessity for ethical considerations and data integrity in AI development, particularly emphasizing the role of FAIR (Findable, Accessible, Interoperable, Reusable) data principles. While these principles are crucial for ethical data stewardship, their specific application in the context of LLM training data remains an under-explored area. This research gap is the focus of our study, which begins with an examination of existing literature to underline the importance of FAIR principles in managing data for LLM training. Building upon this, we propose a novel framework designed to integrate FAIR principles into the LLM development lifecycle. A contribution of our work is the development of a comprehensive checklist intended to guide researchers and developers in applying FAIR data principles consistently across the model development process. The utility and effectiveness of our framework are validated through a case study on creating a FAIR-compliant dataset aimed at detecting and mitigating biases in LLMs. We present this framework to the community as a tool to foster the creation of technologically advanced, ethically grounded, and socially responsible AI models.

SY · Dec 22, 2023
DMC4ML: Data Movement Complexity for Machine Learning

Chen Ding, Christopher Kanan, Dylan McKellips et al.

The greatest demand for today's computing is machine learning. This paper analyzes three machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is novel in three aspects. First, it measures the cost of memory access on an abstract memory hierarchy, instead of traditional time or space complexity. Second, the analysis is asymptotic and identifies the primary sources of the memory cost. Finally, the result is symbolic, which can be used to select algorithmic parameters such as the group size in grouped query attention for any dimension size and number of heads and the batch size for batched convolution for any image size and kernel size.

SD · Sep 5, 2021
The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021

Keke Wang, Xudong Mao, Hao Wu et al.

This paper describes the ByteDance speaker diarization system for the fourth track of the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21). VoxSRC-21 provides both the dev set and test set of VoxConverse for use in validation and a standalone test set for evaluation. We first collect the duration and signal-to-noise ratio (SNR) of all audio and find that the distributions of the VoxConverse test set and the VoxSRC-21 test set are closer. Our system consists of voice activity detection (VAD), speaker embedding extraction, spectral clustering followed by a re-clustering step based on agglomerative hierarchical clustering (AHC), and overlapped speech detection and handling. Finally, we integrate systems with different time scales using DOVER-Lap. Our best system achieves a diarization error rate (DER) of 5.15% on the evaluation set, ranking second in the diarization track of the challenge.

IRMar 15, 2021
Deep Dynamic Neural Network to trade-off between Accuracy and Diversity in a News Recommender System

Shaina Raza, Chen Ding

News recommender systems are marked by a few unique challenges specific to the news domain. These challenges emerge from readers' rapidly evolving interests over dynamically generated news items that continuously change over time. News reading is also driven by a blend of a reader's long-term and short-term interests. In addition, diversity is required in a news recommender system, not only to keep the reader engaged in the reading process but also to expose them to different views and opinions. In this paper, we propose a deep neural network that jointly learns informative news and readers' interests in a unified framework. We learn the news representation (features) from the headlines, snippets (body) and taxonomy (category, subcategory) of news. We learn a reader's long-term interests from the reader's click history, short-term interests from the recent clicks via LSTMs, and the reader's diversified interests through the attention mechanism. We also apply different levels of attention to our model. We conduct extensive experiments on two news datasets to demonstrate the effectiveness of our approach.

IRSep 10, 2020
News Recommender System: A review of recent progress, challenges, and opportunities

Shaina Raza, Chen Ding

Nowadays, more and more news readers tend to read news online, where they have access to millions of news articles from multiple sources. To help users find the right and relevant content, news recommender systems (NRS) are developed to relieve the information overload problem and suggest news items that users might be interested in. In this paper, we highlight the major challenges faced by the news recommendation domain and identify possible solutions from the state of the art. Due to the rapid growth of building recommender systems using deep learning models, we divide our discussion into two parts. In the first part, we present an overview of the conventional recommendation solutions, datasets, evaluation criteria beyond accuracy, and recommendation platforms being used in NRS. In the second part, we explain the deep learning-based recommendation solutions applied in NRS. Different from previous surveys, we also study the effects of news recommendations on user behavior and try to suggest possible remedies to mitigate these effects. By providing state-of-the-art knowledge, this survey can help researchers and practitioners in their understanding of developments in news recommendation algorithms. It also sheds light on potential new directions.

LGJun 21, 2019
Quantum-Inspired Support Vector Machine

Chen Ding, Tian-Yi Bao, He-Liang Huang

Support vector machine (SVM) is a particularly powerful and flexible supervised learning model that analyzes data for both classification and regression, whose usual algorithm complexity scales polynomially with the dimension of the data space and the number of data points. To tackle the big data challenge, a quantum SVM algorithm was proposed, which is claimed to achieve exponential speedup for least squares SVM (LS-SVM). Here, inspired by the quantum SVM algorithm, we present a quantum-inspired classical algorithm for LS-SVM. In our approach, an improved fast sampling technique, namely indirect sampling, is proposed for sampling the kernel matrix and classifying. We first consider the LS-SVM with a linear kernel, and then discuss the generalization of our method to non-linear kernels. Theoretical analysis shows that our algorithm can classify with arbitrary success probability in runtime logarithmic in both the dimension of the data space and the number of data points for a low-rank, low-condition-number, high-dimensional data matrix, matching the runtime of the quantum SVM.
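
For readers unfamiliar with LS-SVM, the following is a minimal sketch of the standard linear-kernel formulation that the abstract takes as its starting point. It solves the usual LS-SVM linear system exactly with a dense solver, so it takes time polynomial in the number of points; it is not the paper's indirect sampling algorithm, which the abstract does not spell out. The function names, the regularization value `gamma`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def lssvm_fit(X, y, gamma=10.0):
    """Fit a linear-kernel LS-SVM by solving its (n+1) x (n+1) linear system.

    System solved (gamma is an assumed regularization value):
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    """
    n = X.shape[0]
    K = X @ X.T                      # linear kernel matrix
    F = np.zeros((n + 1, n + 1))
    F[0, 1:] = 1.0                   # top row enforces sum(alpha) = 0
    F[1:, 0] = 1.0                   # bias column
    F[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(F, rhs)
    return sol[0], sol[1:]           # bias b, dual coefficients alpha

def lssvm_predict(X_train, alpha, b, X_new):
    """Classify new points by the sign of the decision function."""
    return np.sign(X_new @ X_train.T @ alpha + b)

# Toy data (hypothetical): two well-separated clusters with labels +1 / -1.
X_train = np.array([[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
                    [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]])
y_train = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])

b, alpha = lssvm_fit(X_train, y_train)
X_new = np.array([[2.5, 2.5], [-2.5, -2.5]])
predictions = lssvm_predict(X_train, alpha, b, X_new)
print(predictions)
```

The dense solve is the step whose cost grows polynomially with the data size; the abstract's claim is that sampling the kernel matrix avoids this cost for low-rank, well-conditioned data.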