Zhao Liu

AI
h-index17
23papers
1,005citations
Novelty59%
AI Score60

23 Papers

IRJun 4
OneReason Technical Report

OneRec Team, Biao Yang, Boyang Ding et al.

Generative recommendation models in the OneRec family have been widely deployed in many real-world services, such as short-video, live-streaming, advertising, and e-commerce. However, these generative models can only benefit from the scaling advantage, while their reasoning ability is hard to activate, since we cannot construct meaningful Chain-of-Thought (CoT) sequences consisting of itemic tokens only. Inspired by the success of the reasoning-style ``think before answer'' paradigm in the LLM field, we conduct preliminary studies (i.e., OneRec-Think, OpenOneRec) to explore reasoning capability in generative recommendation. Nevertheless, we notice an unexpected phenomenon: the thinking mode does not show advantages over the non-thinking mode. Drawing insights from recent findings on CoT robustness in multi-modal language models, we argue that effective reasoning in recommendation rests on two factors: perception, the ability to ground itemic tokens in their underlying language semantics, and cognition, the ability to reorganize a user's behavior sequence into coherent latent interest points. We therefore propose OneReason, which includes: (1) strong itemic token perception in pre-training, (2) a three-level cognition-enhanced CoT format for recommendation tasks in SFT, and (3) a specialize-then-unify training recipe in RL to enhance the thinking ability.

IVJun 24, 2022
Segmentation-free PVC for Cardiac SPECT using a Densely-connected Multi-dimensional Dynamic Network

Huidong Xie, Zhao Liu, Luyao Shi et al.

In nuclear imaging, limited resolution causes partial volume effects (PVEs) that affect image sharpness and quantitative accuracy. Partial volume correction (PVC) methods incorporating high-resolution anatomical information from CT or MRI have been demonstrated to be effective. However, such anatomical-guided methods typically require tedious image registration and segmentation steps. Accurately segmented organ templates are also hard to obtain, particularly in cardiac SPECT imaging, due to the lack of hybrid SPECT/CT scanners with high-end CT and associated motion artifacts. Slight mis-registration/mis-segmentation would result in severe degradation in image quality after PVC. In this work, we develop a deep-learning-based method for fast cardiac SPECT PVC without anatomical information and associated organ segmentation. The proposed network involves a densely-connected multi-dimensional dynamic mechanism, allowing the convolutional kernels to be adapted based on the input images, even after the network is fully trained. Intramyocardial blood volume (IMBV) is introduced as an additional clinical-relevant loss function for network optimization. The proposed network demonstrated promising performance on 28 canine studies acquired on a GE Discovery NM/CT 570c dedicated cardiac SPECT scanner with a 64-slice CT using Technetium-99m-labeled red blood cells. This work showed that the proposed network with densely-connected dynamic mechanism produced superior results compared with the same network without such mechanism. Results also showed that the proposed network without anatomical information could produce images with statistically comparable IMBV measurements to the images generated by anatomical-guided PVC methods, which could be helpful in clinical translation.

IRMay 18Code
DADF: A Distribution-Aware Debiasing Framework for Watch-Time Regression in Recommender Systems

Yiqing Yang, Xinlong Zhao, Zhao Liu et al.

Watch-time prediction is a central regression task in short-video recommender systems, where labels are highly long-tailed and residual errors vary systematically across observed watch-time regions. In practice, a model may appear globally calibrated while still overestimating short views and underestimating long views, because opposite errors cancel out in aggregate. Existing methods mainly improve the first-stage watch-time predictor, but often leave such residual distributional bias insufficiently corrected. We propose DADF, a distribution-aware debiasing framework for watch-time regression. Instead of replacing a deployed predictor, DADF performs second-stage multiplicative residual correction on top of it. DADF combines three complementary designs: a dynamic distribution-aware transformation for stabilizing long-tailed correction targets, a debias-factor-aware module for modeling heterogeneous residual patterns using inference-time observable factors, especially video duration, and a multi-label-aware module that exploits auxiliary prediction signals from engagement heads. We evaluate DADF on public short-video benchmarks and a large-scale industrial ranking system. DADF consistently improves both pointwise accuracy and ranking quality across datasets and backbones. In the industrial setting, it achieves a 1.88 percentage-point WUAUC gain over the production baseline, reduces MAE by 12.57%, and yields a statistically significant 0.347% lift in average time spent per device in online A/B testing. These results demonstrate that DADF effectively mitigates local calibration bias and provides a practical plug-in solution for debiasing long-tailed continuous targets. The source code is available at https://github.com/liuzhao09/DADF.

CRMar 10Code
Reasoning-Oriented Programming: Chaining Semantic Gadgets to Jailbreak Large Vision Language Models

Quanchen Zou, Moyang Chen, Zonghao Ying et al.

Large Vision-Language Models (LVLMs) undergo safety alignment to suppress harmful content. However, current defenses predominantly target explicit malicious patterns in the input representation, often overlooking the vulnerabilities inherent in compositional reasoning. In this paper, we identify a systemic flaw where LVLMs can be induced to synthesize harmful logic from benign premises. We formalize this attack paradigm as \textit{Reasoning-Oriented Programming}, drawing a structural analogy to Return-Oriented Programming in systems security. Just as ROP circumvents memory protections by chaining benign instruction sequences, our approach exploits the model's instruction-following capability to orchestrate a semantic collision of orthogonal benign inputs. We instantiate this paradigm via \tool{}, an automated framework that optimizes for \textit{semantic orthogonality} and \textit{spatial isolation}. By generating visual gadgets that are semantically decoupled from the harmful intent and arranging them to prevent premature feature fusion, \tool{} forces the malicious logic to emerge only during the late-stage reasoning process. This effectively bypasses perception-level alignment. We evaluate \tool{} on SafeBench and MM-SafetyBench across 7 state-of-the-art 0.LVLMs, including GPT-4o and Claude 3.7 Sonnet. Our results demonstrate that \tool{} consistently circumvents safety alignment, outperforming the strongest existing baseline by an average of 4.67\% on open-source models and 9.50\% on commercial models.

CVJul 27, 2022
Mid-level Representation Enhancement and Graph Embedded Uncertainty Suppressing for Facial Expression Recognition

Jie Lei, Zhao Liu, Zeyu Zou et al.

Facial expression is an essential factor in conveying human emotional states and intentions. Although remarkable advancement has been made in facial expression recognition (FER) task, challenges due to large variations of expression patterns and unavoidable data uncertainties still remain. In this paper, we propose mid-level representation enhancement (MRE) and graph embedded uncertainty suppressing (GUS) addressing these issues. On one hand, MRE is introduced to avoid expression representation learning being dominated by a limited number of highly discriminative patterns. On the other hand, GUS is introduced to suppress the feature ambiguity in the representation space. The proposed method not only has stronger generalization capability to handle different variations of expression patterns but also more robustness to capture expression representations. Experimental evaluation on Aff-Wild2 have verified the effectiveness of the proposed method.

SEJul 11, 2023
ConFL: Constraint-guided Fuzzing for Machine Learning Framework

Zhao Liu, Quanchen Zou, Tian Yu et al.

As machine learning gains prominence in various sectors of society for automated decision-making, concerns have risen regarding potential vulnerabilities in machine learning (ML) frameworks. Nevertheless, testing these frameworks is a daunting task due to their intricate implementation. Previous research on fuzzing ML frameworks has struggled to effectively extract input constraints and generate valid inputs, leading to extended fuzzing durations for deep execution or revealing the target crash. In this paper, we propose ConFL, a constraint-guided fuzzer for ML frameworks. ConFL automatically extracting constraints from kernel codes without the need for any prior knowledge. Guided by the constraints, ConFL is able to generate valid inputs that can pass the verification and explore deeper paths of kernel codes. In addition, we design a grouping technique to boost the fuzzing efficiency. To demonstrate the effectiveness of ConFL, we evaluated its performance mainly on Tensorflow. We find that ConFL is able to cover more code lines, and generate more valid inputs than state-of-the-art (SOTA) fuzzers. More importantly, ConFL found 84 previously unknown vulnerabilities in different versions of Tensorflow, all of which were assigned with new CVE ids, of which 3 were critical-severity and 13 were high-severity. We also extended ConFL to test PyTorch and Paddle, 7 vulnerabilities are found to date.

IROct 21, 2025Code
DiffGRM: Diffusion-based Generative Recommendation Model

Zhao Liu, Yichen Zhu, Yiqing Yang et al.

Generative recommendation (GR) is an emerging paradigm that represents each item via a tokenizer as an n-digit semantic ID (SID) and predicts the next item by autoregressively generating its SID conditioned on the user's history. However, two structural properties of SIDs make ARMs ill-suited. First, intra-item consistency: the n digits jointly specify one item, yet the left-to-right causality trains each digit only under its prefix and blocks bidirectional cross-digit evidence, collapsing supervision to a single causal path. Second, inter-digit heterogeneity: digits differ in semantic granularity and predictability, while the uniform next-token objective assigns equal weight to all digits, overtraining easy digits and undertraining hard digits. To address these two issues, we propose DiffGRM, a diffusion-based GR model that replaces the autoregressive decoder with a masked discrete diffusion model (MDM), thereby enabling bidirectional context and any-order parallel generation of SID digits for recommendation. Specifically, we tailor DiffGRM in three aspects: (1) tokenization with Parallel Semantic Encoding (PSE) to decouple digits and balance per-digit information; (2) training with On-policy Coherent Noising (OCN) that prioritizes uncertain digits via coherent masking to concentrate supervision on high-value signals; and (3) inference with Confidence-guided Parallel Denoising (CPD) that fills higher-confidence digits first and generates diverse Top-K candidates. Experiments show consistent gains over strong generative and discriminative recommendation baselines on multiple datasets, improving NDCG@10 by 6.9%-15.5%. Code is available at https://github.com/liuzhao09/DiffGRM.

IVApr 29, 2025Code
LymphAtlas- A Unified Multimodal Lymphoma Imaging Repository Delivering AI-Enhanced Diagnostic Insight

Jiajun Ding, Beiyao Zhu, Xiaosheng Liu et al.

This study integrates PET metabolic information with CT anatomical structures to establish a 3D multimodal segmentation dataset for lymphoma based on whole-body FDG PET/CT examinations, which bridges the gap of the lack of standardised multimodal segmentation datasets in the field of haematological malignancies. We retrospectively collected 483 examination datasets acquired between March 2011 and May 2024, involving 220 patients (106 non-Hodgkin lymphoma, 42 Hodgkin lymphoma); all data underwent ethical review and were rigorously de-identified. Complete 3D structural information was preserved during data acquisition, preprocessing and annotation, and a high-quality dataset was constructed based on the nnUNet format. By systematic technical validation and evaluation of the preprocessing process, annotation quality and automatic segmentation algorithm, the deep learning model trained based on this dataset is verified to achieve accurate segmentation of lymphoma lesions in PET/CT images with high accuracy, good robustness and reproducibility, which proves the applicability and stability of this dataset in accurate segmentation and quantitative analysis. The deep fusion of PET/CT images achieved with this dataset not only significantly improves the accurate portrayal of the morphology, location and metabolic features of tumour lesions, but also provides solid data support for early diagnosis, clinical staging and personalized treatment, and promotes the development of automated image segmentation and precision medicine based on deep learning. The dataset and related resources are available at https://github.com/SuperD0122/LymphAtlas-.

CRMay 8
Demystifying and Detecting Agentic Workflow Injection Vulnerabilities in GitHub Actions

Shenao Wang, Xinyi Hou, Zhao Liu et al.

GitHub Actions is increasingly used to deploy LLM-based agents for repository-centric tasks such as issue triage, pull-request review, code modification, and release assistance. These agentic workflows extend traditional CI/CD automation with agentic capabilities but also create a new injection surface. In this paper, we introduce Agentic Workflow Injection (AWI), a workflow-level injection flaw where untrusted GitHub event context, such as issue bodies, pull-request descriptions, or comments, is incorporated into agent prompts or agent-consumed inputs and converted into attacker-influenced behavior through agent tools or downstream workflow logic. We identify two core AWI patterns: Prompt-to-Agent (P2A), where untrusted content reaches an agent prompt boundary, and Prompt-to-Script (P2S), where attacker influence propagates through model- or agent-derived outputs into later scripts. We present the first systematic study of AWI in GitHub Actions. We characterize 1,033 real-world AI-assisted actions and extract AWI-specific taint specifications, including prompt boundaries, derived outputs, agentic capabilities, and access-control interfaces. Based on these specifications, we design TaintAWI, a taint-analysis tool that tracks flows from untrusted event context to agent prompt inputs and security-sensitive workflow sinks. Applying TaintAWI to 13,392 real-world agentic workflows from 10,792 repositories, we report 519 potential AWI vulnerabilities, of which 496 are confirmed exploitable under our threat model, yielding a precision of 95.6%. Among them, 343 are previously unknown zero-day vulnerabilities. We prioritized disclosure for 187 zero-day cases, received 26 maintainer responses, and 24 cases have been accepted or fixed at the time of writing.

CRJul 29, 2025
PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking

Quanchen Zou, Zonghao Ying, Moyang Chen et al.

The increasing sophistication of large vision-language models (LVLMs) has been accompanied by advances in safety alignment mechanisms designed to prevent harmful content generation. However, these defenses remain vulnerable to sophisticated adversarial attacks. Existing jailbreak methods typically rely on direct and semantically explicit prompts, overlooking subtle vulnerabilities in how LVLMs compose information over multiple reasoning steps. In this paper, we propose a novel and effective jailbreak framework inspired by Return-Oriented Programming (ROP) techniques from software security. Our approach decomposes a harmful instruction into a sequence of individually benign visual gadgets. A carefully engineered textual prompt directs the sequence of inputs, prompting the model to integrate the benign visual gadgets through its reasoning process to produce a coherent and harmful output. This makes the malicious intent emergent and difficult to detect from any single component. We validate our method through extensive experiments on established benchmarks including SafeBench and MM-SafetyBench, targeting popular LVLMs. Results show that our approach consistently and substantially outperforms existing baselines on state-of-the-art models, achieving near-perfect attack success rates (over 0.90 on SafeBench) and improving ASR by up to 0.39. Our findings reveal a critical and underexplored vulnerability that exploits the compositional reasoning abilities of LVLMs, highlighting the urgent need for defenses that secure the entire reasoning process.

CLDec 9, 2024
Evaluating and Mitigating Social Bias for Large Language Models in Open-ended Settings

Zhao Liu, Tian Xie, Xueru Zhang

Current social bias benchmarks for Large Language Models (LLMs) primarily rely on predefined question formats like multiple-choice, limiting their ability to reflect the complexity and open-ended nature of real-world interactions. To close this gap, we extend an existing dataset BBQ (Parrish et al., 2022) to Open-BBQ, a comprehensive framework to evaluate the social bias of LLMs in open-ended settings by incorporating two additional question categories: fill-in-the-blank and short-answer. Since our new Open-BBQ dataset contains a lot of open-ended responses like sentences and paragraphs, we developed an evaluation process to detect biases from open-ended content by labeling sentences and paragraphs. In addition to this, we also found that existing debiasing methods, such as self-debiasing (Gallegos et al., 2024), have over-correction issues, which make the original correct answers incorrect. In order to solve this issue, we propose Composite Prompting, an In-context Learning (ICL) method combining structured examples with explicit chain-of-thought reasoning to form a unified instruction template for LLMs to explicitly identify content that needs debiasing. Experimental results show that the proposed method significantly reduces the bias for both GPT-3.5 and GPT-4o while maintaining high accuracy.

MAMar 13
Collaborative Multi-Agent Optimization for Personalized Memory System

Wenyu Mao, Haoyang Liu, Zhao Liu et al.

Memory systems are crucial to personalized LLMs by mitigating the context window limitation in capturing long-term user-LLM conversations. Typically, such systems leverage multiple agents to handle multi-granular memory construction and personalized memory retrieval tasks. To optimize the system, existing methods focus on specializing agents on their local tasks independently via prompt engineering or fine-tuning. However, they overlook cross-agent collaboration, where independent optimization on local agents hardly guarantees the global system performance. To address this issue, we propose a Collaborative Reinforcement Learning Framework for Multi-Agent Memory Systems (CoMAM), jointly optimizing local agents to facilitate collaboration. Specifically, we regularize agents' execution as a sequential Markov decision process (MDP) to embed inter-agent dependencies into the state transition, yielding both local task rewards (e.g., information coverage for memory construction) and global rewards (i.e., query-answer accuracy). Then, we quantify each agent's contribution via group-level ranking consistency between local and global rewards, treating them as adaptive weights to assign global credit and integrate local-global rewards. Each agent is optimized by these integrated rewards, aligning local improvements with the global performance. Experiments show CoMAM outperforms leading memory systems, validating the efficacy of our proposed collaborative reinforcement learning for joint optimization.

CROct 16, 2025
Sequential Comics for Jailbreaking Multimodal Large Language Models via Structured Visual Storytelling

Deyue Zhang, Dongdong Yang, Junjie Mu et al.

Multimodal large language models (MLLMs) exhibit remarkable capabilities but remain susceptible to jailbreak attacks exploiting cross-modal vulnerabilities. In this work, we introduce a novel method that leverages sequential comic-style visual narratives to circumvent safety alignments in state-of-the-art MLLMs. Our method decomposes malicious queries into visually innocuous storytelling elements using an auxiliary LLM, generates corresponding image sequences through diffusion models, and exploits the models' reliance on narrative coherence to elicit harmful outputs. Extensive experiments on harmful textual queries from established safety benchmarks show that our approach achieves an average attack success rate of 83.5\%, surpassing prior state-of-the-art by 46\%. Compared with existing visual jailbreak methods, our sequential narrative strategy demonstrates superior effectiveness across diverse categories of harmful content. We further analyze attack patterns, uncover key vulnerability factors in multimodal safety mechanisms, and evaluate the limitations of current defense strategies against narrative-driven attacks, revealing significant gaps in existing protections.

IRMay 18, 2025
Geography-Aware Large Language Models for Next POI Recommendation

Zhao Liu, Wei Liu, Huajie Zhu et al.

The next Point-of-Interest (POI) recommendation task aims to predict users' next destinations based on their historical movement data and plays a key role in location-based services and personalized applications. Accurate next POI recommendation depends on effectively modeling geographic information and POI transition relations, which are crucial for capturing spatial dependencies and user movement patterns. While Large Language Models (LLMs) exhibit strong capabilities in semantic understanding and contextual reasoning, applying them to spatial tasks like next POI recommendation remains challenging. First, the infrequent nature of specific GPS coordinates makes it difficult for LLMs to model precise spatial contexts. Second, the lack of knowledge about POI transitions limits their ability to capture potential POI-POI relationships. To address these issues, we propose GA-LLM (Geography-Aware Large Language Model), a novel framework that enhances LLMs with two specialized components. The Geographic Coordinate Injection Module (GCIM) transforms GPS coordinates into spatial representations using hierarchical and Fourier-based positional encoding, enabling the model to understand geographic features from multiple perspectives. The POI Alignment Module (PAM) incorporates POI transition relations into the LLM's semantic space, allowing it to infer global POI relationships and generalize to unseen POIs. Experiments on three real-world datasets demonstrate the state-of-the-art performance of GA-LLM.

AIMay 17, 2024
Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: A Systematic Scoping Review

Hongyi Yang, Fangyuan Chang, Dian Zhu et al.

This systematic review assessed the current state and future prospects of artificial intelligence (AI) in schizophrenia rehabilitation management. We reviewed 61 studies on AI-related data types, feature engineering methods, algorithmic models, and evaluation metrics published from 2012-2024. The review categorizes AI applications into the following key application areas: symptom monitoring, medication management, risk management, functional training, and psychosocial support. Findings indicate that supervised machine learning techniques, particularly for symptom monitoring and relapse risk management, remain the predominant approaches, effectively leveraging structured data while incorporating interpretable algorithms. This study underscores the potential of AI in transforming long-term management strategies for schizophrenia, offering valuable insights into improving the quality of life of patients. Future research should focus on expanding data sources through multimodal data integration, exploring deep learning models, and integrating AI-driven interventions into training tasks to fully capitalize on AI's potential in schizophrenia rehabilitation.

AIAug 11, 2021
Frequency-based tension assessment of an inclined cable with complex boundary conditions using the PSO algorithm

Wen-ming Zhang, Zhi-wei Wang, Dan-dian Feng et al.

The frequency-based method is the most commonly used method for measuring cable tension. However, the calculation formulas for the conventional frequency-based method are generally based on the ideally hinged or fixed boundary conditions without a comprehensive consideration of the inclination angle, sag-extensibility, and flexural stiffness of cables, leading to a significant error in cable tension identification. This study aimed to propose a frequency-based method of cable tension identification considering the complex boundary conditions at the two ends of cables using the particle swarm optimization (PSO) algorithm. First, the refined stay cable model was established considering the inclination angle, flexural stiffness, and sag-extensibility, as well as the rotational constraint stiffness and lateral support stiffness for the unknown boundaries of cables. The vibration mode equation of the stay cable model was discretized and solved using the finite difference method. Then, a multiparameter identification method based on the PSO algorithm was proposed. This method was able to identify the tension, flexural stiffness, axial stiffness, boundary rotational constraint stiffness, and boundary lateral support stiffness according to the measured multiorder frequencies in a synchronous manner. The feasibility and accuracy of this method were validated through numerical cases. Finally, the proposed approach was applied to the tension identification of the anchor span strands of a suspension bridge (Jindong Bridge) in China. The results of cable tension identification using the proposed method and the existing methods discussed in previous studies were compared with the on-site pressure ring measurement results. The comparison showed that the proposed approach had a high accuracy in cable tension identification.

LGJun 19, 2021
Boosting Offline Reinforcement Learning with Residual Generative Modeling

Hua Wei, Deheng Ye, Zhao Liu et al.

Offline reinforcement learning (RL) tries to learn the near-optimal policy with recorded offline experience without online exploration. Current offline RL research includes: 1) generative modeling, i.e., approximating a policy using fixed data; and 2) learning the state-action value function. While most research focuses on the state-action function part through reducing the bootstrapping error in value function approximation induced by the distribution shift of training data, the effects of error propagation in generative modeling have been neglected. In this paper, we analyze the error in generative modeling. We propose AQL (action-conditioned Q-learning), a residual generative model to reduce policy approximation error for offline RL. We show that our method can learn more accurate policy approximations in different benchmark datasets. In addition, we show that the proposed offline RL method can learn more competitive AI agents in complex control tasks under the multiplayer online battle arena (MOBA) game Honor of Kings.

AINov 25, 2020
Towards Playing Full MOBA Games with Deep Reinforcement Learning

Deheng Ye, Guibin Chen, Wen Zhang et al.

MOBA games, e.g., Honor of Kings, League of Legends, and Dota 2, pose grand challenges to AI systems such as multi-agent, enormous state-action space, complex action control, etc. Developing AI for playing MOBA games has raised much attention accordingly. However, existing work falls short in handling the raw game complexity caused by the explosion of agent combinations, i.e., lineups, when expanding the hero pool in case that OpenAI's Dota AI limits the play to a pool of only 17 heroes. As a result, full MOBA games without restrictions are far from being mastered by any existing AI system. In this paper, we propose a MOBA AI learning paradigm that methodologically enables playing full MOBA games with deep reinforcement learning. Specifically, we develop a combination of novel and existing learning techniques, including curriculum self-play learning, policy distillation, off-policy adaption, multi-head value estimation, and Monte-Carlo tree-search, in training and playing a large pool of heroes, meanwhile addressing the scalability issue skillfully. Tested on Honor of Kings, a popular MOBA game, we show how to build superhuman AI agents that can defeat top esports players. The superiority of our AI is demonstrated by the first large-scale performance test of MOBA AI agent in the literature.

CEJun 27, 2020
Deep Generative Modeling for Mechanistic-based Learning and Design of Metamaterial Systems

Liwei Wang, Yu-Chin Chan, Faez Ahmed et al.

Metamaterials are emerging as a new paradigmatic material system to render unprecedented and tailorable properties for a wide variety of engineering applications. However, the inverse design of metamaterial and its multiscale system is challenging due to high-dimensional topological design space, multiple local optima, and high computational cost. To address these hurdles, we propose a novel data-driven metamaterial design framework based on deep generative modeling. A variational autoencoder (VAE) and a regressor for property prediction are simultaneously trained on a large metamaterial database to map complex microstructures into a low-dimensional, continuous, and organized latent space. We show in this study that the latent space of VAE provides a distance metric to measure shape similarity, enable interpolation between microstructures and encode meaningful patterns of variation in geometries and properties. Based on these insights, systematic data-driven methods are proposed for the design of microstructure, graded family, and multiscale system. For microstructure design, the tuning of mechanical properties and complex manipulations of microstructures are easily achieved by simple vector operations in the latent space. The vector operation is further extended to generate metamaterial families with a controlled gradation of mechanical properties by searching on a constructed graph model. For multiscale metamaterial systems design, a diverse set of microstructures can be rapidly generated using VAE for target properties at different locations and then assembled by an efficient graph-based optimization method to ensure compatibility between adjacent microstructures. We demonstrate our framework by designing both functionally graded and heterogeneous metamaterial systems that achieve desired distortion behaviors.

AIDec 20, 2019
Mastering Complex Control in MOBA Games with Deep Reinforcement Learning

Deheng Ye, Zhao Liu, Mingfei Sun et al.

We study the reinforcement learning problem of complex action control in the Multi-player Online Battle Arena (MOBA) 1v1 games. This problem involves far more complicated state and action spaces than those of traditional 1v1 games, such as Go and Atari series, which makes it very difficult to search any policies with human-level performance. In this paper, we present a deep reinforcement learning framework to tackle this problem from the perspectives of both system and algorithm. Our system is of low coupling and high scalability, which enables efficient explorations at large scale. Our algorithm includes several novel strategies, including control dependency decoupling, action mask, target attention, and dual-clip PPO, with which our proposed actor-critic network can be effectively trained in our system. Tested on the MOBA game Honor of Kings, our AI agent, called Tencent Solo, can defeat top professional human players in full 1v1 games.

CEMay 13, 2019
Similarity Grouping-Guided Neural Network Modeling for Maritime Time Series Prediction

Yan Li, Ryan Wen Liu, Zhao Liu et al.

Reliable and accurate prediction of time series plays a crucial role in maritime industry, such as economic investment, transportation planning, port planning and design, etc. The dynamic growth of maritime time series has the predominantly complex, nonlinear and non-stationary properties. To guarantee high-quality prediction performance, we propose to first adopt the empirical mode decomposition (EMD) and ensemble EMD (EEMD) methods to decompose the original time series into high- and low-frequency components. The low-frequency components can be easily predicted directly through traditional neural network (NN) methods. It is more difficult to predict high-frequency components due to their properties of weak mathematical regularity. To take advantage of the inherent self-similarities within high-frequency components, these components will be divided into several continuous small (overlapping) segments. The grouped segments with high similarities are then selected to form more proper training datasets for traditional NN methods. This regrouping strategy can assist in enhancing the prediction accuracy of high-frequency components. The final prediction result is obtained by integrating the predicted high- and low-frequency components. Our proposed three-step prediction frameworks benefit from the time series decomposition and similar segments grouping. Experiments on both port cargo throughput and vessel traffic flow have illustrated its superior performance in terms of prediction accuracy and robustness.

CVJul 12, 2018
Learning-based Regularization for Cardiac Strain Analysis with Ability for Domain Adaptation

Allen Lu, Nripesh Parajuli, Maria Zontak et al.

Reliable motion estimation and strain analysis using 3D+time echocardiography (4DE) for localization and characterization of myocardial injury is valuable for early detection and targeted interventions. However, motion estimation is difficult due to the low-SNR that stems from the inherent image properties of 4DE, and intelligent regularization is critical for producing reliable motion estimates. In this work, we incorporated the notion of domain adaptation into a supervised neural network regularization framework. We first propose an unsupervised autoencoder network with biomechanical constraints for learning a latent representation that is shown to have more physiologically plausible displacements. We extended this framework to include a supervised loss term on synthetic data and showed the effects of biomechanical constraints on the network's ability for domain adaptation. We validated both the autoencoder and semi-supervised regularization method on in vivo data with implanted sonomicrometers. Finally, we showed the ability of our semi-supervised learning regularization approach to identify infarcted regions using estimated regional strain maps with good agreement to manually traced infarct regions from postmortem excised hearts.

MMJul 25, 2017
MVP2P: Layer-Dependency-Aware Live MVC Video Streaming over Peer-to-Peer Networks

Zhao Liu, Niall Murray, Brian Lee et al.

Multiview video supports observing a scene from different viewpoints. The Joint Video Team (JVT) developed H.264/MVC to enhance the compression efficiency for multiview video, however, MVC encoded multiview video (MVC video) still requires high bitrates for transmission. This paper investigates live MVC video streaming over Peer-to-Peer (P2P) networks. The goal is to minimize the server bandwidth costs whist ensuring high streaming quality to peers. MVC employs intra-view and inter-view prediction structures, which leads to a complicated layer dependency relationship. As the peers' outbound bandwidth is shared while supplying all the MVC video layers, the bandwidth allocation to one MVC layer affects the available outbound bandwidth of the other layers. To optimise the utilisation of the peers' outbound bandwidth for providing video layers, a maximum flow based model is proposed which considers the MVC video layer dependency and the layer supplying relationship between peers. Based on the model, a layer dependency aware live MVC video streaming method over a BitTorrent-like P2P network is proposed, named MVP2P. The key components of MVP2P include a chunk scheduling strategy and a peer selection strategy for receiving peers, and a bandwidth scheduling algorithm for supplying peers. To evaluate the efficiency of the proposed solution, MVP2P is compared with existing methods considering the constraints of peer bandwidth, peer numbers, view switching rates, and peer churns. The test results show that MVP2P significantly outperforms the existing methods.