CLMay 21, 2025
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-ThoughtTencent Hunyuan Team, Ao Liu, Botong Zhou et al. · tencent-ai
As Large Language Models (LLMs) rapidly advance, we introduce Hunyuan-TurboS, a novel large hybrid Transformer-Mamba Mixture of Experts (MoE) model. It synergistically combines Mamba's long-sequence processing efficiency with Transformer's superior contextual understanding. Hunyuan-TurboS features an adaptive long-short chain-of-thought (CoT) mechanism, dynamically switching between rapid responses for simple queries and deep "thinking" modes for complex problems, optimizing computational resources. Architecturally, this 56B activated (560B total) parameter model employs 128 layers (Mamba2, Attention, FFN) with an innovative AMF/MF block pattern. Faster Mamba2 ensures linear complexity, Grouped-Query Attention minimizes KV cache, and FFNs use an MoE structure. Pre-trained on 16T high-quality tokens, it supports a 256K context length and is the first industry-deployed large-scale Mamba model. Our comprehensive post-training strategy enhances capabilities via Supervised Fine-Tuning (3M instructions), a novel Adaptive Long-short CoT Fusion method, Multi-round Deliberation Learning for iterative improvement, and a two-stage Large-scale Reinforcement Learning process targeting STEM and general instruction-following. Evaluations show strong performance: overall top 7 rank on LMSYS Chatbot Arena with a score of 1356, outperforming leading models like Gemini-2.0-Flash-001 (1352) and o4-mini-2025-04-16 (1345). TurboS also achieves an average of 77.9% across 23 automated benchmarks. Hunyuan-TurboS balances high performance and efficiency, offering substantial capabilities at lower inference costs than many reasoning models, establishing a new paradigm for efficient large-scale pre-trained models.
ITApr 10
A New Class of Geometric Analog Error Correction Codes for Crossbar Based In-Memory ComputingZiyuan Zhu, Changcheng Yuan, Ron M. Roth et al.
Analog error correction codes have been proposed for analog in-memory computing on resistive crossbars, which can accelerate vector-matrix multiplication for machine learning. Unlike traditional communication or storage channels, this setting involves a mixed noise model with small perturbations and outlier errors. A number of analog codes have been proposed for handling a single outlier, and several constructions have also been developed to address multiple outliers. However, the set of available code families remains limited, covering only a narrow range of code lengths and dimensions. In this paper, we study a recently proposed family of geometric codes capable of handling multiple outliers, and develop a geometric analysis that characterizes their m-height profiles.
LGMay 21
Physics-Informed Generative Solver: Bridging Data-Driven Priors and Conservation Laws for Stable Spatiotemporal Field ReconstructionZiyuan Zhu, Keyu Hu, Zhifei Chen et al.
Reconstructing continuous physical fields from sparse measurements is a central inverse problem, but data-driven generative models can produce states that violate governing dynamics. We introduce a physics-informed generative solver that separates stable prior learning from inference-time enforcement of conservation laws. Martingale-Regularized Score Matching regularizes score pretraining with a Score Fokker-Planck constraint, yielding a dynamically stable prior. Physics-Informed Implicit Score Sampling then guides denoising trajectories by gradients of physical residuals, projecting samples toward admissible manifolds without retraining. In acoustics, the method co-generates pressure and particle velocity from sparse sensors, enabling dense virtual arrays that suppress spatial aliasing. The same framework generalizes to real-world ERA5 meteorological fields under extreme sparsity. Together, this work establishes a rigorous and generalizable paradigm for solving high-dimensional inverse problems, bridging the gap between generative artificial intelligence and first-principles science.
CROct 26, 2019
DDM: A Demand-based Dynamic Mitigation for SMT Transient ChannelsYue Zhang, Ziyuan Zhu, Dan Meng
Different from the traditional software vulnerability, the microarchitecture side channel has three characteristics: extensive influence, potent threat, and tough defense. The main reason for the micro-architecture side channel is resource sharing. There are many reasons for resource sharing, one of which is SMT (Simultaneous Multi-Threading) technology. In this paper, we define the SMT Transient Channel, which uses the transient state of shared resources between threads to steal information. To mitigate it, we designed a security demand-based dynamic mitigation (DDM) to Mitigate the SMT transient channels. The DDM writes the processes' security requirements to the CPU register sets, and the operating system calls the HLT instruction to dynamically turn on and off the hyper-threading according to the register values to avoid the side channels caused by execution resource sharing. During the implementation of the scheme, we modified the Linux kernel and used the MSR register groups of Intel processor. The evaluation results show that DDM can effectively protect against the transient side-channel attacks such as PortsMash that rely on SMT, and the performance loss of DDM is less than 8%.