SD | Oct 6, 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen et al.
Speech is the fundamental mode of human communication, and its synthesis has long been a core priority in human-computer interaction research. In recent years, machines have managed to master the art of generating speech that is understandable by humans. But the linguistic content of an utterance encompasses only a part of its meaning. Affect, or expressivity, has the capacity to turn speech into a medium capable of conveying intimate thoughts, feelings, and emotions -- aspects that are essential for engaging and naturalistic interpersonal communication. While the goal of imparting expressivity to synthesised utterances has so far remained elusive, following recent advances in text-to-speech synthesis, a paradigm shift is well under way in the fields of affective speech synthesis and conversion as well. Deep learning, as the technology which underlies most of the recent advances in artificial intelligence, is spearheading these efforts. In the present overview, we outline ongoing trends and summarise state-of-the-art approaches in an attempt to provide a comprehensive overview of this exciting field.
SD | Jun 22, 2022
Dynamic Restrained Uncertainty Weighting Loss for Multitask Learning of Vocal Expression
Meishu Song, Zijiang Yang, Andreas Triantafyllopoulos et al.
We propose a novel Dynamic Restrained Uncertainty Weighting Loss to experimentally handle the problem of balancing the contributions of multiple tasks on the ICML ExVo 2022 Challenge. The multitask aims to recognize expressed emotions and demographic traits from vocal bursts jointly. Our strategy combines the advantages of Uncertainty Weight and Dynamic Weight Average, by extending weights with a restraint term to make the learning process more explainable. We use a lightweight multi-exit CNN architecture to implement our proposed loss approach. The experimental H-Mean score (0.394) shows a substantial improvement over the baseline H-Mean score (0.335).
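The two ingredients named in the abstract above can be sketched in their standard textbook forms. This is a minimal illustration only: the restraint term that the paper adds is not described in the abstract and is not reproduced here, and the function names and the temperature default are assumptions.

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Uncertainty weighting: sum_i exp(-s_i) * L_i + s_i,
    where s_i is a learnable log-variance for task i."""
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))

def dynamic_weight_average(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic Weight Average: tasks whose loss decreased more slowly
    over the last two steps receive a larger weight.
    The weights sum to the number of tasks."""
    ratios = [p / pp for p, pp in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    k = len(ratios)
    return [k * e / total for e in exps]
```

With all log-variances at zero the first function reduces to a plain sum of task losses, which makes a convenient sanity check.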
CV | Sep 24, 2023
MM-NeRF: Multimodal-Guided 3D Multi-Style Transfer of Neural Radiance Field
Zijiang Yang, Zhongwei Qiu, Chang Xu et al.
3D style transfer aims to generate stylized views of 3D scenes with specified styles, which requires high-quality generation while maintaining multi-view consistency. Existing methods still struggle with high-quality stylization of texture details and with stylization under multimodal guidance. In this paper, we reveal that the common training method of stylization with NeRF, which generates stylized multi-view supervision by 2D style transfer models, causes the same object in supervision to show various states (color tone, details, etc.) in different views, leading NeRF to smooth the texture details and resulting in low-quality rendering for 3D multi-style transfer. To tackle these problems, we propose a novel Multimodal-guided 3D Multi-style transfer of NeRF, termed MM-NeRF. First, MM-NeRF projects multimodal guidance into a unified space to keep the multimodal styles consistent and extracts multimodal features to guide the 3D stylization. Second, a novel multi-head learning scheme is proposed to relieve the difficulty of learning multi-style transfer, and a multi-view style consistent loss is proposed to track the inconsistency of multi-view supervision data. Finally, a novel incremental learning mechanism is proposed to generalize MM-NeRF to any new style at small cost. Extensive experiments on several real-world datasets show that MM-NeRF achieves high-quality 3D multi-style stylization with multimodal guidance, and keeps multi-view consistency and style consistency between multimodal guidance.
NE | May 16, 2022
Explanation-Guided Fairness Testing through Genetic Algorithm
Ming Fan, Wenying Wei, Wuxia Jin et al.
The fairness characteristic is a critical attribute of trusted AI systems. A plethora of research has proposed diverse methods for individual fairness testing. However, they suffer from three major limitations, i.e., low efficiency, low effectiveness, and model-specificity. This work proposes ExpGA, an explanation-guided fairness testing approach through a genetic algorithm (GA). ExpGA employs the explanation results generated by interpretable methods to collect high-quality initial seeds, which are prone to derive discriminatory samples by slightly modifying feature values. ExpGA then adopts GA to search discriminatory sample candidates by optimizing a fitness value. Benefiting from this combination of explanation results and GA, ExpGA is both efficient and effective in detecting discriminatory individuals. Moreover, ExpGA only requires prediction probabilities of the tested model, resulting in a better generalization capability to various models. Experiments on multiple real-world benchmarks, including tabular and text datasets, show that ExpGA presents higher efficiency and effectiveness than four state-of-the-art approaches.
LG | Oct 11, 2023
RaftFed: A Lightweight Federated Learning Framework for Vehicular Crowd Intelligence
Changan Yang, Yaxing Chen, Yao Zhang et al.
Vehicular crowd intelligence (VCI) is an emerging research field. Facilitated by state-of-the-art vehicular ad-hoc networks and artificial intelligence, various VCI applications have emerged, e.g., collaborative sensing, positioning, and mapping. The collaborative property of VCI applications generally requires data to be shared among participants, thus forming network-wide intelligence. How to fulfill this process without compromising data privacy remains a challenging issue. Although federated learning (FL) is a promising tool to solve the problem, adapting conventional FL frameworks to VCI is nontrivial. First, centralized model aggregation is unreliable in VCI because of the existence of stragglers with unfavorable channel conditions. Second, existing FL schemes are vulnerable to Non-IID data, which is intensified by the data heterogeneity in VCI. This paper proposes a novel federated learning framework called RaftFed to facilitate privacy-preserving VCI. The experimental results show that RaftFed performs better than baselines regarding communication overhead, model accuracy, and model convergence.
RO | Aug 6, 2024
Adversarial Safety-Critical Scenario Generation using Naturalistic Human Driving Priors
Kunkun Hao, Yonggang Luo, Wen Cui et al.
Evaluating the decision-making system is indispensable in developing autonomous vehicles, and realistic and challenging safety-critical test scenarios play a crucial role. Obtaining these scenarios is non-trivial, owing to the long-tailed distribution, sparsity, and rarity in real-world data sets. To tackle this problem, in this paper, we introduce a natural adversarial scenario generation solution using naturalistic human driving priors and reinforcement learning techniques. By doing this, we can obtain large-scale test scenarios that are both diverse and realistic. Specifically, we build a simulation environment that mimics natural traffic interaction scenarios. Informed by this environment, we implement a two-stage procedure. The first stage incorporates conventional rule-based models, e.g., the IDM (Intelligent Driver Model) and MOBIL (Minimizing Overall Braking Induced by Lane changes) models, to coarsely and discretely capture and calibrate key control parameters from the real-world dataset. Next, we leverage GAIL (Generative Adversarial Imitation Learning) to represent driver behaviors continuously. The derived GAIL can be further used to design a PPO (Proximal Policy Optimization)-based actor-critic network framework to fine-tune the reward function, which then optimizes our natural adversarial scenario generation solution. Extensive experiments have been conducted on the NGSIM dataset, including the trajectories of 3,000 vehicles. Essential traffic parameters were measured in comparison with the baseline model, e.g., the collision rate, accelerations, steering, and the number of lane changes. Our findings demonstrate that the proposed model can generate realistic safety-critical test scenarios covering both naturalness and adversariality, which can be a cornerstone for the development of autonomous vehicles.
SD | Jun 14, 2022
Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction
Andreas Triantafyllopoulos, Meishu Song, Zijiang Yang et al.
In this work, we explore a novel few-shot personalisation architecture for emotional vocalisation prediction. The core contribution is an 'enrolment' encoder which utilises two unlabelled samples of the target speaker to adjust the output of the emotion encoder; the adjustment is based on dot-product attention, thus effectively functioning as a form of 'soft' feature selection. The emotion and enrolment encoders are based on two standard audio architectures: CNN14 and CNN10. The two encoders are further guided to forget or learn auxiliary emotion and/or speaker information. Our best approach achieves a CCC of .650 on the ExVo Few-Shot dev set, a 2.5% increase over our baseline CNN14 CCC of .634.
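For readers unfamiliar with the metric, the concordance correlation coefficient (CCC) reported above can be computed as in the following minimal sketch. The function names are illustrative, and population (biased) variance is assumed. The second helper reproduces the relative increase quoted in the abstract from the two CCC values it gives.

```python
def ccc(pred, gold):
    """Concordance correlation coefficient:
    2 * cov / (var_pred + var_gold + (mean_pred - mean_gold)^2)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    return 2 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)

def relative_increase(new, old):
    """Relative change of `new` over `old`, in percent."""
    return (new - old) / old * 100

# (0.650 - 0.634) / 0.634 is roughly 2.5 percent
print(round(relative_increase(0.650, 0.634), 1))
```

Unlike Pearson correlation, CCC penalises a constant offset between predictions and labels, so a perfectly correlated but shifted prediction scores below 1.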
CL | Apr 16
Robust Reward Modeling for Large Language Models via Causal Decomposition
Yunsheng Lu, Zijiang Yang, Licheng Pan et al.
Reward models are central to aligning large language models, yet they often overfit to spurious cues such as response length and overly agreeable tone. Most prior work weakens these cues directly by penalizing or controlling specific artifacts, but it does not explicitly encourage the model to ground preferences in the prompt's intent. We learn a decoder that maps a candidate answer to the latent intent embedding of the input. The reconstruction error is used as a signal to regularize the reward model training. We provide theoretical evidence that this signal emphasizes prompt-dependent information while suppressing prompt-independent shortcuts. Across math, helpfulness, and safety benchmarks, the decoder selects shorter and less sycophantic candidates with 0.877 accuracy. Incorporating this signal into RM training in Gemma-2-2B-it and Gemma-2-9B-it increases RewardBench accuracy from 0.832 to 0.868. For Best-of-N selection, our framework increases length-controlled win rates while producing shorter outputs, and remains robust to lengthening and mild off-topic drift in controlled rewrite tests.
CV | Mar 4 | Code
DeNuC: Decoupling Nuclei Detection and Classification in Histopathology
Zijiang Yang, Chen Kuang, Dongmei Fu
Pathology Foundation Models (FMs) have shown strong performance across a wide range of pathology image representation and diagnostic tasks. However, FMs do not exhibit the expected performance advantage over traditional specialized models in Nuclei Detection and Classification (NDC). In this work, we reveal that jointly optimizing nuclei detection and classification leads to severe representation degradation in FMs. Moreover, we identify that the substantial intrinsic disparity in task difficulty between nuclei detection and nuclei classification renders joint NDC optimization unnecessarily computationally burdensome for the detection stage. To address these challenges, we propose DeNuC, a simple yet effective method designed to break through existing bottlenecks by Decoupling Nuclei detection and Classification. DeNuC employs a lightweight model for accurate nuclei localization, subsequently leveraging a pathology FM to encode input images and query nucleus-specific features based on the detected coordinates for classification. Extensive experiments on three widely used benchmarks demonstrate that DeNuC effectively unlocks the representational potential of FMs for NDC and significantly outperforms state-of-the-art methods. Notably, DeNuC improves F1 scores by 4.2% and 3.6% (or higher) on the BRCAM2C and PUMA datasets, respectively, while using only 16% (or fewer) trainable parameters compared to other methods. Code is available at https://github.com/ZijiangY1116/DeNuC.
CV | Jun 23, 2023
An Overview about Emerging Technologies of Autonomous Driving
Yu Huang, Yue Chen, Zijiang Yang
Since DARPA started Grand Challenges in 2004 and Urban Challenges in 2007, autonomous driving has been the most active field of AI applications. This paper gives an overview of the technical aspects of autonomous driving technologies and open problems. We investigate the major fields of self-driving systems, such as perception, mapping and localization, prediction, planning and control, simulation, V2X, and safety. In particular, we elaborate on all these issues in the framework of a data closed loop, a popular platform for solving long-tailed autonomous driving problems.
CV | Nov 7, 2025
MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification
Zijiang Yang, Hanqing Chao, Bokai Zhao et al.
Nucleus detection and classification (NDC) in histopathology analysis is a fundamental task that underpins a wide range of high-level pathology applications. However, existing methods heavily rely on labor-intensive nucleus-level annotations and struggle to fully exploit large-scale unlabeled data for learning discriminative nucleus representations. In this work, we propose MUSE (MUlti-scale denSE self-distillation), a novel self-supervised learning method tailored for NDC. At its core is NuLo (Nucleus-based Local self-distillation), a coordinate-guided mechanism that enables flexible local self-distillation based on predicted nucleus positions. By removing the need for strict spatial alignment between augmented views, NuLo allows critical cross-scale alignment, thus unlocking the capacity of models for fine-grained nucleus-level representation. To support MUSE, we design a simple yet effective encoder-decoder architecture and a large field-of-view semi-supervised fine-tuning strategy that together maximize the value of unlabeled pathology images. Extensive experiments on three widely used benchmarks demonstrate that MUSE effectively addresses the core challenges of histopathological NDC. The resulting models not only surpass state-of-the-art supervised baselines but also outperform generic pathology foundation models.
LG | Nov 18, 2023
Bridging Data-Driven and Knowledge-Driven Approaches for Safety-Critical Scenario Generation in Automated Vehicle Validation
Kunkun Hao, Lu Liu, Wen Cui et al.
Automated driving vehicles (ADV) promise to enhance driving efficiency and safety, yet they face intricate challenges in safety-critical scenarios. As a result, validating ADV within generated safety-critical scenarios is essential for both development and performance evaluations. This paper investigates the complexities of employing two major scenario-generation solutions: data-driven and knowledge-driven methods. Data-driven methods derive scenarios from recorded datasets, efficiently generating scenarios by altering the existing behavior or trajectories of traffic participants but often falling short in considering ADV perception; knowledge-driven methods provide effective coverage through expert-designed rules, but they may lead to inefficiency in generating safety-critical scenarios within that coverage. To overcome these challenges, we introduce BridgeGen, a safety-critical scenario generation framework, designed to bridge the benefits of both methodologies. Specifically, by utilizing ontology-based techniques, BridgeGen models the five scenario layers in the operational design domain (ODD) from knowledge-driven methods, ensuring broad coverage, and incorporating data-driven strategies to efficiently generate safety-critical scenarios. An optimized scenario generation toolkit is developed within BridgeGen. This expedites the crafting of safety-critical scenarios through a combination of traditional optimization and reinforcement learning schemes. Extensive experiments conducted using the Carla simulator demonstrate the effectiveness of BridgeGen in generating diverse safety-critical scenarios.
RO | Mar 10
Provably Safe Trajectory Generation for Manipulators Under Motion and Environmental Uncertainties
Fei Meng, Zijiang Yang, Xinyu Mao et al.
Robot manipulators operating in uncertain and non-convex environments present significant challenges for safe and optimal motion planning. Existing methods often struggle to provide efficient and formally certified collision risk guarantees, particularly when dealing with complex geometries and non-Gaussian uncertainties. This article proposes a novel risk-bounded motion planning framework to address this unmet need. Our approach integrates a rigid manipulator deep stochastic Koopman operator (RM-DeSKO) model to robustly predict the robot's state distribution under motion uncertainty. We then introduce an efficient, hierarchical verification method that combines parallelizable physics simulations with sum-of-squares (SOS) programming as a filter for fine-grained, formal certification of collision risk. This method is embedded within a Model Predictive Path Integral (MPPI) controller that uniquely utilizes binary collision information from SOS decomposition to improve its policy. The effectiveness of the proposed framework is validated on two typical robot manipulators through extensive simulations and real-world experiments, including a challenging human-robot collaboration scenario, demonstrating sim-to-real transfer of the learned model and its ability to generate safe and efficient trajectories in complex, uncertain settings.
QM | Aug 24, 2025 | Code
Neural Proteomics Fields for Super-resolved Spatial Proteomics Prediction
Bokai Zhao, Weiyang Shi, Hanqing Chao et al.
Spatial proteomics maps protein distributions in tissues, providing transformative insights for life sciences. However, current sequencing-based technologies suffer from low spatial resolution, and substantial inter-tissue variability in protein expression further compromises the performance of existing molecular data prediction methods. In this work, we introduce the novel task of spatial super-resolution for sequencing-based spatial proteomics (seq-SP) and, to the best of our knowledge, propose the first deep learning model for this task--Neural Proteomics Fields (NPF). NPF formulates seq-SP as a protein reconstruction problem in continuous space by training a dedicated network for each tissue. The model comprises a Spatial Modeling Module, which learns tissue-specific protein spatial distributions, and a Morphology Modeling Module, which extracts tissue-specific morphological features. Furthermore, to facilitate rigorous evaluation, we establish an open-source benchmark dataset, Pseudo-Visium SP, for this task. Experimental results demonstrate that NPF achieves state-of-the-art performance with fewer learnable parameters, underscoring its potential for advancing spatial proteomics research. Our code and dataset are publicly available at https://github.com/Bokai-Zhao/NPF.
LG | Jul 28, 2019 | Code
A real-time iterative machine learning approach for temperature profile prediction in additive manufacturing processes
Arindam Paul, Mojtaba Mozaffar, Zijiang Yang et al.
Additive Manufacturing (AM) is a manufacturing paradigm that builds three-dimensional objects from a computer-aided design model by successively adding material layer by layer. AM has become very popular in the past decade due to its utility for fast prototyping such as 3D printing as well as manufacturing functional parts with complex geometries using processes such as laser metal deposition that would be difficult to create using traditional machining. As the process for creating an intricate part for an expensive metal such as Titanium is prohibitive with respect to cost, computational models are used to simulate the behavior of AM processes before the experimental run. However, as the simulations are computationally costly and time-consuming for predicting multiscale multi-physics phenomena in AM, physics-informed data-driven machine-learning systems for predicting the behavior of AM processes are immensely beneficial. Such models accelerate not only multiscale simulation tools but also empower real-time control systems using in-situ data. In this paper, we design and develop essential components of a scientific framework for developing a data-driven model-based real-time control system. Finite element methods are employed for solving time-dependent heat equations and developing the database. The proposed framework uses extremely randomized trees - an ensemble of bagged decision trees as the regression algorithm iteratively using temperatures of prior voxels and laser information as inputs to predict temperatures of subsequent voxels. The models achieve mean absolute percentage errors below 1% for predicting temperature profiles for AM processes. The code is made available for the research community at https://github.com/paularindam/ml-iter-additive.
SE | Mar 19, 2018 | Code
Automated Localization for Unreproducible Builds
Zhilei Ren, He Jiang, Jifeng Xuan et al.
Reproducibility is the ability to recreate identical binaries under pre-defined build environments. Due to the need for quality assurance and the benefit of better detecting attacks against build environments, the practice of reproducible builds has gained popularity in many open-source software repositories such as Debian and Bitcoin. However, identifying unreproducible issues remains a labour-intensive and time-consuming challenge, because of the lack of information to guide the search and the diversity of causes that may lead to unreproducible binaries. In this paper we propose an automated framework called RepLoc to localize the problematic files for unreproducible builds. RepLoc features a query augmentation component that utilizes the information extracted from the build logs, and a heuristic rule-based filtering component that narrows the search scope. By integrating the two components with a weighted file ranking module, RepLoc is able to automatically produce a ranked list of files that are helpful in locating the problematic files for the unreproducible builds. We have implemented a prototype and conducted extensive experiments over 671 real-world unreproducible Debian packages in four different categories. By considering the topmost ranked file only, RepLoc achieves an accuracy rate of 47.09%. If we expand our examination to the top ten ranked files in the list produced by RepLoc, the accuracy rate becomes 79.28%. Considering that there are hundreds of source files, scripts, Makefiles, etc., in a package, RepLoc significantly reduces the scope of localizing problematic files. Moreover, with the help of RepLoc, we successfully identified and fixed six new unreproducible packages from Debian and Guix.
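The top-1 and top-10 accuracy figures above can be read as a top-k hit rate over ranked file lists. The sketch below assumes one plausible definition (a package counts as a hit if at least one truly problematic file appears among its k highest-ranked files); the abstract does not spell out the exact definition, and the file names are made up for illustration.

```python
def top_k_accuracy(ranked_lists, truth_sets, k):
    """Fraction of packages whose top-k ranked files contain
    at least one file from that package's ground-truth set."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_lists, truth_sets)
        if any(f in truth for f in ranked[:k])
    )
    return hits / len(ranked_lists)

# Hypothetical rankings for two packages
ranked = [["a.c", "b.mk", "c.sh"], ["x.py", "y.c"]]
truth = [{"b.mk"}, {"z.h"}]
print(top_k_accuracy(ranked, truth, k=1))  # 0.0
print(top_k_accuracy(ranked, truth, k=2))  # 0.5
```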
LG | May 5
Enhance the after-discharge mortality rate prediction via learning from the medical notes
Zijiang Yang
With the growth of Electronic Health Records (EHR) data, more and more researchers are developing machine learning models to learn from medical notes. These unstructured text data pose significant challenges to the learning process, as the quality of the data is low: they are often messy, repetitive and redundant. We show that these notes are informative by conducting the after-discharge mortality rate prediction task. The AUC-ROC for models using the medical note information is generally 0.1 higher than for those without the medical notes. Furthermore, we propose a Deep Neural Network (DNN) model with a 'pooling' mechanism to enhance the mortality prediction. Based on the experimental results, we demonstrate that the proposed model outperforms traditional machine learning models such as tree-based models. The proposed method learns from the most informative medical notes and improves the prediction accuracy significantly. The AUC-ROC for the proposed model is 2% to 14% higher than the traditional ones in the 15-day, 30-day, 60-day, and 365-day after-discharge mortality prediction tasks. Moreover, we can discover some interesting knowledge through the traditional and proposed models. This knowledge is inspiring but also consistent with previous findings. The models are able to reveal the relationships between the informative keywords and documents from the medical notes and the severity of the patients' conditions.
SD | Jan 8, 2025
MADUV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge
Zijiang Yang, Meishu Song, Xin Jing et al.
The Mice Autism Detection via Ultrasound Vocalization (MADUV) Challenge introduces the first INTERSPEECH challenge focused on detecting autism spectrum disorder (ASD) in mice through their vocalizations. Participants are tasked with developing models to automatically classify mice as either wild-type or ASD models based on recordings with a high sampling rate. Our baseline system employs a simple CNN-based classification using three different spectrogram features. Results demonstrate the feasibility of automated ASD detection, with the considered audible-range features achieving the best performance (UAR of 0.600 for segment-level and 0.625 for subject-level classification). This challenge bridges speech technology and biomedical research, offering opportunities to advance our understanding of ASD models through machine learning approaches. The findings suggest promising directions for vocalization analysis and highlight the potential value of audible and ultrasound vocalizations in ASD detection.
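The UAR (unweighted average recall) metric quoted above is the mean of per-class recalls, which keeps a majority-class predictor from looking good on imbalanced data. A minimal sketch follows; the label names and the toy predictions are made up and are not challenge data.

```python
from collections import defaultdict

def uar(y_true, y_pred):
    """Unweighted average recall: mean over classes of
    (correct predictions for the class) / (samples of the class)."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

# Predicting the majority class everywhere: accuracy is 0.75, UAR is 0.5
y_true = ["wild-type", "wild-type", "wild-type", "asd"]
y_pred = ["wild-type"] * 4
print(uar(y_true, y_pred))  # 0.5
```

For a two-class task, chance level is therefore a UAR of 0.5 regardless of class balance.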
RO | Mar 7, 2024
LitSim: A Conflict-aware Policy for Long-term Interactive Traffic Simulation
Haojie Xin, Xiaodong Zhang, Renzhi Tang et al.
Simulation is pivotal in evaluating the performance of autonomous driving systems due to the advantages of high efficiency and low cost compared to on-road testing. Bridging the gap between simulation and the real world requires realistic agent behaviors. However, the existing works have the following shortcomings in achieving this goal: (1) log replay offers realistic scenarios but often leads to collisions due to the absence of dynamic interactions, and (2) both heuristic-based and data-based solutions, which are parameterized and trained on real-world datasets, encourage interactions but often deviate from real-world data over long horizons. In this work, we propose LitSim, a long-term interactive simulation approach that maximizes realism by minimizing the interventions in the log. Specifically, our approach primarily uses log replay to ensure realism and intervenes only when necessary to prevent potential conflicts. We then encourage interactions among the agents and resolve the conflicts, thereby reducing the risk of unrealistic behaviors. We train and validate our model on the real-world dataset NGSIM, and the experimental results demonstrate that LitSim outperforms the currently popular approaches in terms of realism and reactivity.
LG | Apr 17, 2024
Virtual Foundry Graphnet for Metal Sintering Deformation Prediction
Rachel, Chen, Juheon Lee et al.
Metal sintering is a necessary step for Metal Injection Molded parts and for binder jet processes such as HP's metal 3D printer. The metal sintering process introduces large deformation, varying from 25 to 50% depending on the green part porosity. In this paper, we use a graph-based deep learning approach to predict the part deformation, which can speed up the deformation simulation substantially at the voxel level. Running a well-trained Metal Sintering inferencing engine takes only seconds to obtain the final sintering deformation value. The tested accuracy on an example complex geometry achieves 0.7 µm mean deviation for a 63 mm testing part.
LG | Dec 15, 2023
Bridging the Semantic-Numerical Gap: A Numerical Reasoning Method of Cross-modal Knowledge Graph for Material Property Prediction
Guangxuan Song, Dongmei Fu, Zhongwei Qiu et al.
Using machine learning (ML) techniques to predict material properties is a crucial research topic. These properties depend on numerical data and semantic factors. Due to the limitations of small-sample datasets, existing methods typically adopt ML algorithms to regress numerical properties or transfer other pre-trained knowledge graphs (KGs) to the material. However, these methods cannot simultaneously handle semantic and numerical information. In this paper, we propose a numerical reasoning method for material KGs (NR-KG), which constructs a cross-modal KG using semantic nodes and numerical proxy nodes. It captures both types of information by projecting KG into a canonical KG and utilizes a graph neural network to predict material properties. In this process, a novel projection prediction loss is proposed to extract semantic features from numerical information. NR-KG facilitates end-to-end processing of cross-modal data, mining relationships and cross-modal information in small-sample datasets, and fully utilizes valuable experimental data to enhance material prediction. We further propose two new High-Entropy Alloys (HEA) property datasets with semantic descriptions. NR-KG outperforms state-of-the-art (SOTA) methods, achieving relative improvements of 25.9% and 16.1% on two material datasets. Besides, NR-KG surpasses SOTA methods on two public physical chemistry molecular datasets, showing improvements of 22.2% and 54.3%, highlighting its potential application and generalizability. We hope the proposed datasets, algorithms, and pre-trained models can facilitate the communities of KG and AI for materials.
CV | Dec 21, 2024
From Pixels to Gigapixels: Bridging Local Inductive Bias and Long-Range Dependencies with Pixel-Mamba
Zhongwei Qiu, Hanqing Chao, Tiancheng Lin et al.
Histopathology plays a critical role in medical diagnostics, with whole slide images (WSIs) offering valuable insights that directly influence clinical decision-making. However, the large size and complexity of WSIs may pose significant challenges for deep learning models, in both computational efficiency and effective representation learning. In this work, we introduce Pixel-Mamba, a novel deep learning architecture designed to efficiently handle gigapixel WSIs. Pixel-Mamba leverages the Mamba module, a state-space model (SSM) with linear memory complexity, and incorporates local inductive biases through progressively expanding tokens, akin to convolutional neural networks. This enables Pixel-Mamba to hierarchically combine both local and global information while efficiently addressing computational challenges. Remarkably, Pixel-Mamba achieves or even surpasses the quantitative performance of state-of-the-art (SOTA) foundation models that were pretrained on millions of WSIs or WSI-text pairs, in a range of tumor staging and survival analysis tasks, even without requiring any pathology-specific pretraining. Extensive experiments demonstrate the efficacy of Pixel-Mamba as a powerful and efficient framework for end-to-end WSI analysis.
CL | Dec 10, 2024
Improving the Natural Language Inference robustness to hard dataset by data augmentation and preprocessing
Zijiang Yang
Natural Language Inference (NLI) is the task of inferring whether the hypothesis can be justified by the given premise. In essence, we classify the hypothesis into one of three labels (entailment, neutrality and contradiction) given the premise. NLI has been well studied by previous researchers. A number of models, especially transformer-based ones, have achieved significant improvements on these tasks. However, it is reported that these models struggle when dealing with hard datasets. In particular, they perform much worse on unseen, out-of-distribution premises and hypotheses. They may not understand the semantic content but instead learn spurious correlations. In this work, we propose data augmentation and preprocessing methods to solve the word overlap, numerical reasoning and length mismatch problems. These are general methods that do not rely on the distribution of the testing data, and they help improve the robustness of the models.
CVMar 7
NuNext: Reframing Nucleus Detection as Next-Point DetectionZhongyi Shui, Honglin Li, Xiaozhong Ji et al.
Nucleus detection in histopathology is pivotal for a wide range of clinical applications. Existing approaches either regress nuclear proxy maps that require complex post-processing, or employ dense anchors or queries that introduce severe foreground-background imbalance. In this work, we reformulate nucleus detection as next-point prediction, wherein a multimodal large language model is developed to directly output foreground nucleus centroids from the input image. The model is trained in two stages. In the supervised learning stage, we propose spatial-aware soft supervision to relax strict centroid matching and a chain-of-visual-thought strategy to incorporate visual priors that facilitate coordinate prediction. In the reinforcement fine-tuning stage, we design a distribution matching reward, low-variance group filtering, and fine-grained advantage shaping to further improve the model's detection quality. Extensive experiments on nine widely used benchmarks demonstrate the superiority of our method. Code will be released soon.
LGDec 9, 2025
Long-only cryptocurrency portfolio management by ranking the assets: a neural network approachZijiang Yang
This paper proposes a novel machine learning based portfolio management method in the context of the cryptocurrency market. Previous researchers mainly focus on predicting the price movement of a specific cryptocurrency such as Bitcoin (BTC) and then trade according to the prediction. In contrast to previous work that treats the cryptocurrencies independently, this paper manages a group of cryptocurrencies by analyzing their relative relationship. Specifically, at each time step, we use a neural network to predict the rank of the future returns of the managed cryptocurrencies and place weights accordingly. By incorporating such cross-sectional information, the proposed method is shown to be profitable in backtesting experiments on real daily cryptocurrency market data from May 2020 to Nov 2023. During these 3.5 years, the market experienced a full cycle of bullish, bearish and stagnant conditions. Even under such complex market conditions, the proposed method outperforms the existing methods and achieves a Sharpe ratio of 1.01 and an annualized return of 64.26%. Additionally, the proposed method is shown to be robust to increases in transaction fees.
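As a rough illustration of the ranking idea described in this abstract, the sketch below turns predicted scores into long-only weights and computes an annualized Sharpe ratio. The weighting rule (linear in rank, restricted to the top k assets), the function names, and the 365-day annualization factor are assumptions made for illustration; the abstract does not specify them.

```python
import math
from statistics import mean, stdev


def rank_to_weights(predicted_scores, top_k):
    """Long-only weights from predicted scores (higher score = better expected return).

    Assumed rule: the top_k assets get raw weights top_k, top_k - 1, ..., 1 and
    every other asset gets zero. The result is non-negative and sums to 1.
    """
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    raw = [0.0] * len(predicted_scores)
    for position, asset in enumerate(order[:top_k]):
        raw[asset] = float(top_k - position)
    total = sum(raw)
    return [w / total for w in raw]


def annualized_sharpe(daily_returns, periods_per_year=365, risk_free_daily=0.0):
    """Mean excess return over its sample standard deviation, scaled to a year."""
    excess = [r - risk_free_daily for r in daily_returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


# Four hypothetical assets; only the two best-ranked ones receive capital.
weights = rank_to_weights([0.02, -0.01, 0.05, 0.00], top_k=2)
```

Because the weights depend only on the ordering of the scores, the sketch uses cross-sectional information rather than each asset's predicted return in isolation.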
CRAug 12, 2025
Enhance the machine learning algorithm performance in phishing detection with keyword featuresZijiang Yang
Recently, there has been a significant increase in phishing attacks on the Internet. In a typical phishing attack, the attacker sets up a malicious website that looks similar to a legitimate website in order to obtain the end-users' information. This may lead to leakage of sensitive information and financial loss for the end-users. To avoid such attacks, early detection of these websites' URLs is vital. Previous researchers have proposed many machine learning algorithms to distinguish phishing URLs from legitimate ones. In this paper, we enhance these machine learning algorithms from the perspective of feature selection. We propose a novel method to combine keyword features with the traditional features. This method is applied to multiple traditional machine learning algorithms, and the experimental results show that it is useful and effective. On average, this method reduces the classification error by 30% on the large dataset. Moreover, its enhancement is more significant on the small dataset. In addition, this method extracts its information from the URL alone and does not rely on additional information provided by third-party services. The best result for a machine learning algorithm using our proposed method is an accuracy of 99.68%.
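The combination of keyword features with traditional URL features could look roughly like the sketch below. The keyword list, the feature names, and the choice of lexical features are hypothetical; the abstract does not list the features the paper actually uses.

```python
from urllib.parse import urlparse

# Hypothetical keyword list chosen for illustration only.
KEYWORDS = ("login", "secure", "verify", "account", "update")


def url_features(url):
    """Return a feature dict combining simple lexical features with keyword flags.

    Everything is computed from the URL string itself, so no third-party
    lookup (WHOIS, blacklists, page content) is needed.
    """
    parsed = urlparse(url)
    lowered = url.lower()
    features = {
        "length": len(url),
        "num_dots": parsed.netloc.count("."),
        "has_at_sign": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
    }
    # One binary feature per keyword: 1 if the keyword occurs anywhere in the URL.
    for word in KEYWORDS:
        features[f"kw_{word}"] = int(word in lowered)
    return features


example = url_features("http://secure-login.example.com/verify?id=1")
```

The resulting dictionary can be turned into a fixed-order vector and passed to any standard classifier.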
AIJun 15, 2025
The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language ModelsPeiyuan Tang, Haojie Xin, Xiaodong Zhang et al.
As Vision-Language Models (VLMs) demonstrate increasing capabilities across real-world applications such as code generation and chatbot assistance, ensuring their safety has become paramount. Unlike traditional Large Language Models (LLMs), VLMs face unique vulnerabilities due to their multimodal nature, allowing adversaries to modify visual or textual inputs to bypass safety guardrails and trigger the generation of harmful content. Through systematic analysis of VLM behavior under attack, we identify a novel phenomenon termed "delayed safety awareness". Specifically, we observe that safety-aligned VLMs may initially be compromised to produce harmful content, but eventually recognize the associated risks and attempt to self-correct. This pattern suggests that VLMs retain their underlying safety awareness but experience a temporal delay in its activation. Building on this insight, we hypothesize that VLMs' safety awareness can be proactively reactivated through carefully designed prompts. To this end, we introduce "The Safety Reminder", a soft prompt tuning approach that optimizes learnable prompt tokens, which are periodically injected during the text generation process to enhance safety awareness, effectively preventing harmful content generation. Additionally, our safety reminder only activates when harmful content is detected, leaving normal conversations unaffected and preserving the model's performance on benign tasks. Through comprehensive evaluation across three established safety benchmarks and one adversarial attack, we demonstrate that our approach significantly reduces attack success rates while maintaining model utility, offering a practical solution for deploying safer VLMs in real-world applications.
CVDec 21, 2024
From Histopathology Images to Cell Clouds: Learning Slide Representations with Hierarchical Cell TransformerZijiang Yang, Zhongwei Qiu, Tiancheng Lin et al.
It is clinically crucial and potentially very beneficial to be able to analyze and model directly the spatial distributions of cells in histopathology whole slide images (WSI). However, most existing WSI datasets lack cell-level annotations, owing to the extremely high cost over giga-pixel images. Thus, it remains an open question whether deep learning models can directly and effectively analyze WSIs from the semantic aspect of cell distributions. In this work, we construct a large-scale WSI dataset with more than 5 billion cell-level annotations, termed WSI-Cell5B, and a novel hierarchical Cell Cloud Transformer (CCFormer) to tackle these challenges. WSI-Cell5B is based on 6,998 WSIs of 11 cancers from The Cancer Genome Atlas Program, and all WSIs are annotated per cell by coordinates and types. To the best of our knowledge, WSI-Cell5B is the first WSI-level large-scale dataset integrating cell-level annotations. On the other hand, CCFormer formulates the collection of cells in each WSI as a cell cloud and models cell spatial distribution. Specifically, Neighboring Information Embedding (NIE) is proposed to characterize the distribution of cells within the neighborhood of each cell, and a novel Hierarchical Spatial Perception (HSP) module is proposed to learn the spatial relationship among cells in a bottom-up manner. The clinical analysis indicates that WSI-Cell5B can be used to design clinical evaluation metrics based on counting cells that effectively assess the survival risk of patients. Extensive experiments on survival prediction and cancer staging show that learning from cell spatial distribution alone can already achieve state-of-the-art (SOTA) performance, i.e., CCFormer strongly outperforms other competing methods.
SDMar 31, 2022
A Temporal-oriented Broadcast ResNet for COVID-19 DetectionXin Jing, Shuo Liu, Emilia Parada-Cabaleiro et al.
Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce virus transmission. Due to the promising results of deep learning networks in modelling time sequences, and since applications to rapidly identify COVID in-the-wild should require low computational effort, we present a temporal-oriented broadcasting residual learning method that achieves efficient computation and high accuracy with a small model size. Based on the EfficientNet architecture, our novel network, named Temporal-oriented ResNet (TorNet), consists of a broadcasting learning block, i.e. the Alternating Broadcast (AB) Block, which contains several Broadcast Residual Blocks (BC ResBlocks) and a convolution layer. With the AB Block, the network obtains useful audio-temporal features and higher level embeddings effectively with much less computation than Recurrent Neural Networks (RNNs), which are typically used to model temporal information. TorNet achieves 72.2% Unweighted Average Recall (UAR) on the INTERSPEECH 2021 Computational Paralinguistics Challenge COVID-19 Cough Sub-Challenge, thereby showing competitive results with a higher computational efficiency than other state-of-the-art alternatives.
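For readers unfamiliar with the metric, Unweighted Average Recall is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. A minimal sketch follows; the function name and the toy labels are illustrative and not taken from the challenge data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes that occur in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Imbalanced toy example: recall is 0.5 for "pos" and 1.0 for "neg".
y_true = ["pos", "pos", "neg", "neg", "neg", "neg"]
y_pred = ["pos", "neg", "neg", "neg", "neg", "neg"]
uar = unweighted_average_recall(y_true, y_pred)
```

In this toy example the plain accuracy is 5/6, while the UAR is 0.75, which shows why UAR is preferred when classes are imbalanced.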
SDMar 29, 2022
An Overview & Analysis of Sequence-to-Sequence Emotional Voice ConversionZijiang Yang, Xin Jing, Andreas Triantafyllopoulos et al.
Emotional voice conversion (EVC) focuses on converting a speech utterance from a source to a target emotion; it can thus be a key enabling technology for human-computer interaction applications and beyond. However, EVC remains an unsolved research problem with several challenges. In particular, as speech rate and rhythm are two key factors of emotional conversion, models have to generate output sequences of differing length. Sequence-to-sequence modelling is recently emerging as a competitive paradigm for models that can overcome those challenges. In an attempt to stimulate further research in this promising new direction, recent sequence-to-sequence EVC papers were systematically investigated and reviewed from six perspectives: their motivation, training strategies, model architectures, datasets, model inputs, and evaluation methods. This information is organised to provide the research community with an easily digestible overview of the current state-of-the-art. Finally, we discuss existing challenges of sequence-to-sequence EVC.
SEMar 16, 2021
From Innovations to Prospects: What Is Hidden Behind Cryptocurrencies?Ang Jia, Ming Fan, Xi Xu et al.
The great influence of Bitcoin has promoted the rapid development of blockchain-based digital currencies, especially the altcoins, since 2013. However, most altcoins share similar source codes, resulting in concerns about code innovations. In this paper, an empirical study on existing altcoins is carried out to offer a thorough understanding of various aspects associated with altcoin innovations. Firstly, we construct the dataset of altcoins, including source code repositories, GitHub fork relations, and market capitalizations (cap). Then, we analyze the altcoin innovations from the perspective of source code similarities. The results demonstrate that more than 85% of altcoin repositories present high code similarities. Next, a temporal clustering algorithm is proposed to mine the inheritance relationship among various altcoins. The family pedigrees of altcoin are constructed, in which the altcoin presents similar evolution features as biology, such as power-law in family size, variety in family evolution, etc. Finally, we investigate the correlation between code innovations and market capitalization. Although we fail to predict the price of altcoins based on their code similarities, the results show that altcoins with higher innovations reflect better market prospects.
LGJan 26, 2021
A General Framework Combining Generative Adversarial Networks and Mixture Density Networks for Inverse Modeling in Microstructural Materials DesignZijiang Yang, Dipendra Jha, Arindam Paul et al.
Microstructural materials design is one of the most important applications of inverse modeling in materials science. Generally speaking, there are two broad modeling paradigms in scientific applications: forward and inverse. While the forward modeling estimates the observations based on known parameters, the inverse modeling attempts to infer the parameters given the observations. Inverse problems are usually more critical as well as difficult in scientific applications as they seek to explore the parameters that cannot be directly observed. Inverse problems are used extensively in various scientific fields, such as geophysics, healthcare and materials science. However, it is challenging to solve inverse problems, because they usually need to learn a one-to-many non-linear mapping, and also require significant computing time, especially for high-dimensional parameter space. Further, inverse problems become even more difficult to solve when the dimension of input (i.e. observation) is much lower than that of output (i.e. parameters). In this work, we propose a framework consisting of generative adversarial networks and mixture density networks for inverse modeling, and it is evaluated on a materials science dataset for microstructural materials design. Compared with baseline methods, the results demonstrate that the proposed framework can overcome the above-mentioned challenges and produce multiple promising solutions in an efficient manner.
CVNov 11, 2020
GRCNN: Graph Recognition Convolutional Neural Network for Synthesizing Programs from Flow ChartsLin Cheng, Zijiang Yang
Program synthesis is the task of automatically generating programs based on a user specification. In this paper, we present a framework that synthesizes programs from flow charts that serve as accurate and intuitive specifications. To do so, we propose a deep neural network called GRCNN that recognizes graph structure from its image. GRCNN is trained end-to-end and can predict edge and node information of the flow chart simultaneously. Experiments show that the accuracy rate to synthesize a program is 66.4%, and the accuracy rates to recognize edges and nodes are 94.1% and 67.9%, respectively. On average, it takes about 60 milliseconds to synthesize a program.
ASApr 30, 2020
An Early Study on Intelligent Analysis of Speech under COVID-19: Severity, Sleep Quality, Fatigue, and AnxietyJing Han, Kun Qian, Meishu Song et al.
The COVID-19 outbreak was announced as a global pandemic by the World Health Organisation in March 2020 and has affected a growing number of people in the past few weeks. In this context, advanced artificial intelligence techniques are brought to the fore to fight against and reduce the impact of this global health crisis. In this study, we focus on developing some potential use-cases of intelligent speech analysis for COVID-19 diagnosed patients. In particular, by analysing speech recordings from these patients, we construct audio-only-based models to automatically categorise the health state of patients from four aspects, including the severity of illness, sleep quality, fatigue, and anxiety. For this purpose, two established acoustic feature sets and support vector machines are utilised. Our experiments show that an average accuracy of .69 is obtained when estimating the severity of illness, which is derived from the number of days in hospitalisation. We hope that this study can foster an extremely fast, low-cost, and convenient way to automatically detect the COVID-19 disease.
CRApr 22, 2020
Scalable Attack on Graph Data by Injecting Vicious NodesJihong Wang, Minnan Luo, Fnu Suya et al.
Recent studies have shown that graph convolution networks (GCNs) are vulnerable to carefully designed attacks, which aim to cause misclassification of a specific node on the graph with unnoticeable perturbations. However, a vast majority of existing works cannot handle large-scale graphs because of their high time complexity. Additionally, existing works mainly focus on manipulating existing nodes on the graph, while in practice, attackers usually do not have the privilege to modify information of existing nodes. In this paper, we develop a more scalable framework named Approximate Fast Gradient Sign Method (AFGSM) which considers a more practical attack scenario where adversaries can only inject new vicious nodes to the graph while having no control over the original graph. Methodologically, we provide an approximation strategy to linearize the model we attack and then derive an approximate closed-form solution with a lower time cost. To have a fair comparison with existing attack methods that manipulate the original graph, we adapt them to the new attack scenario by injecting vicious nodes. Empirical experimental results show that our proposed attack method can significantly reduce the classification accuracy of GCNs and is much faster than existing methods without jeopardizing the attack performance.
LGJul 21, 2019
Characterizing Attacks on Deep Reinforcement LearningXinlei Pan, Chaowei Xiao, Warren He et al.
Recent studies show that Deep Reinforcement Learning (DRL) models are vulnerable to adversarial attacks, which attack DRL models by adding small perturbations to the observations. However, some attacks assume full availability of the victim model, and some require a huge amount of computation, making them less feasible for real world applications. In this work, we make further explorations of the vulnerabilities of DRL by studying other aspects of attacks on DRL using realistic and efficient attacks. First, we adapt and propose efficient black-box attacks when we do not have access to DRL model parameters. Second, to address the high computational demands of existing attacks, we introduce efficient online sequential attacks that exploit temporal consistency across consecutive steps. Third, we explore the possibility of an attacker perturbing other aspects in the DRL setting, such as the environment dynamics. Finally, to account for imperfections in how an attacker would inject perturbations in the physical world, we devise a method for generating robust physical perturbations to be printed. The attack is evaluated on a real-world robot under various conditions. We conduct extensive experiments both in simulation, such as Atari games, robotics and autonomous driving, and on real-world robotics, to compare the effectiveness of the proposed attacks with baseline approaches. To the best of our knowledge, we are the first to apply adversarial attacks on DRL systems to physical robots.
COMP-PHJul 7, 2019
IRNet: A General Purpose Deep Residual Regression Framework for Materials DiscoveryDipendra Jha, Logan Ward, Zijiang Yang et al.
Materials discovery is crucial for making scientific advances in many domains. Collections of data from experiments and first-principle computations have spurred interest in applying machine learning methods to create predictive models capable of mapping from composition and crystal structures to materials properties. Generally, these are regression problems with the input being a 1D vector composed of numerical attributes representing the material composition and/or crystal structure. While neural networks consisting of fully connected layers have been applied to such problems, their performance often suffers from the vanishing gradient problem when network depth is increased. In this paper, we study and propose design principles for building deep regression networks composed of fully connected layers with numerical vectors as input. We introduce a novel deep regression network with individual residual learning, IRNet, that places shortcut connections after each layer so that each layer learns the residual mapping between its output and input. We use the problem of learning properties of inorganic materials from numerical attributes derived from material composition and/or crystal structure to compare IRNet's performance against that of other machine learning techniques. Using multiple datasets from the Open Quantum Materials Database (OQMD) and Materials Project for training and evaluation, we show that IRNet provides significantly better prediction performance than the state-of-the-art machine learning approaches currently used by domain scientists. We also show that IRNet's use of individual residual learning leads to better convergence during the training phase than when shortcut connections are between multi-layer stacks while maintaining the same number of parameters.
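The individual residual learning described in this abstract, a shortcut connection after each fully connected layer, can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: every layer keeps the same width, the nonlinearity is ReLU, and any normalization or other details the actual IRNet may use are omitted. The helper names are hypothetical.

```python
import random


def dense_relu(x, weights, bias):
    """Fully connected layer followed by ReLU; weights is a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def individual_residual_forward(x, layers):
    """Apply each layer with its own shortcut: h <- layer(h) + h.

    Assumes every layer preserves the vector width so the addition is defined.
    """
    h = list(x)
    for weights, bias in layers:
        out = dense_relu(h, weights, bias)
        h = [o + s for o, s in zip(out, h)]
    return h


def make_layer(width, rng):
    """Small random square layer with zero bias (illustrative initialization)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(width)]
    return weights, [0.0] * width


rng = random.Random(0)
layers = [make_layer(4, rng) for _ in range(3)]
output = individual_residual_forward([1.0, 2.0, 3.0, 4.0], layers)
```

With the shortcut placed after every layer, a layer whose weights are all zero simply passes its input through, which is one intuition for why such networks remain trainable as depth grows.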
CRAug 2, 2018
sCompile: Critical Path Identification and Analysis for Smart ContractsJialiang Chang, Bo Gao, Hao Xiao et al.
Ethereum smart contracts are an innovation built on top of the blockchain technology, which provides a platform for automatically executing contracts in an anonymous, distributed, and trusted way. It is important for smart contracts to be checked against potential vulnerabilities, a problem magnified by the fact that smart contracts, unlike ordinary programs, cannot be patched easily once deployed. In this work, we propose an alternative approach to automatically identify critical program paths (with multiple function calls including inter-contract function calls) in a smart contract, rank the paths according to their criticalness, discard them if they are infeasible or otherwise present them with user-friendly warnings for user inspection. We identify paths which involve monetary transactions as critical paths, and prioritize those which potentially violate important properties. For scalability, symbolic execution techniques are only applied to top-ranked critical paths. Our approach has been implemented in a tool called sCompile, which has been applied to 36,099 smart contracts. The experimental results show that sCompile is efficient, i.e., 5 seconds on average for one smart contract. Furthermore, we show that many known vulnerabilities can be captured if the user inspects as few as 10 program paths generated by sCompile. Lastly, sCompile discovered 224 unknown vulnerabilities with a false positive rate of 15.4% before user inspection.
SEMar 2, 2017
What Causes My Test Alarm? Automatic Cause Analysis for Test Alarms in System and Integration TestingHe Jiang, Xiaochen Li, Zijiang Yang et al.
Driven by new software development processes and testing in clouds, system and integration testing nowadays tends to produce an enormous number of alarms. Such test alarms lay an almost unbearable burden on software testing engineers who have to manually analyze the causes of these alarms. The causes are critical because they decide which stakeholders are responsible for fixing the bugs detected during testing. In this paper, we present a novel approach that aims to relieve the burden by automating the procedure. Our approach, called Cause Analysis Model, exploits information retrieval techniques to efficiently infer test alarm causes based on test logs. We have developed a prototype and evaluated our tool on two industrial datasets with more than 14,000 test alarms. Experiments on the two datasets show that our tool achieves an accuracy of 58.3% and 65.8%, respectively, which outperforms the baseline algorithms by up to 13.3%. Our algorithm is also extremely efficient, spending about 0.1s per cause analysis. Due to the attractive experimental results, our industrial partner, a leading information and communication technology company in the world, has deployed the tool and it achieves an average accuracy of 72% after two months of running, nearly three times more accurate than a previous strategy based on regular expressions.
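One simple way to apply information retrieval to this problem is to treat each historical test log as a document and assign a new alarm the cause of its most similar past log. The sketch below uses cosine similarity over raw term counts; this is an assumed stand-in rather than the paper's Cause Analysis Model, and the log texts and cause labels are invented for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def predict_cause(new_log, history):
    """Return the cause label of the most similar historical log.

    history is a list of (log_text, cause_label) pairs.
    """
    query = Counter(new_log.lower().split())
    best = max(history,
               key=lambda item: cosine(query, Counter(item[0].lower().split())))
    return best[1]


# Invented historical logs with hypothetical cause labels.
history = [
    ("connection timeout while reaching database host", "environment issue"),
    ("assertion failed expected 3 got 4 in checkout test", "product code defect"),
]
cause = predict_cause("database connection timeout on host db1", history)
```

A production system would likely weight terms (for example with TF-IDF) and aggregate over several nearest logs, but the retrieval step has the same shape.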