SPJun 4, 2018
A New Wireless Communication Paradigm through Software-controlled MetasurfacesChristos Liaskos, Shuai Nie, Ageliki Tsioliaridou et al.
Electromagnetic waves undergo multiple uncontrollable alterations as they propagate within a wireless environment. Free space path loss, signal absorption, as well as reflections, refractions and diffractions caused by physical objects within the environment highly affect the performance of wireless communications. Currently, such effects are intractable to account for and are treated as probabilistic factors. The paper proposes a radically different approach, enabling deterministic, programmable control over the behavior of the wireless environments. The key-enabler is the so-called HyperSurface tile, a novel class of planar meta-materials which can interact with impinging electromagnetic waves in a controlled manner. The HyperSurface tiles can effectively re-engineer electromagnetic waves, including steering towards any desired direction, full absorption, polarization manipulation and more. Multiple tiles are employed to coat objects such as walls, furniture, overall, any objects in the indoor and outdoor environments. An external software service calculates and deploys the optimal interaction types per tile, to best fit the needs of communicating devices. Evaluation via simulations highlights the potential of the new concept.
ETMay 17, 2018
Realizing Wireless Communication through Software-defined HyperSurface EnvironmentsChristos Liaskos, Shuai Nie, Ageliki Tsioliaridou et al.
Wireless communication environments are unaware of the ongoing data exchange efforts within them. Moreover, their effect on the communication quality is intractable in all but the simplest cases. The present work proposes a new paradigm, where indoor scattering becomes software-defined and, subsequently, optimizable across wide frequency ranges. Moreover, the controlled scattering can surpass natural behavior, exemplary overriding Snell's law, reflecting waves towards any custom angle (including negative ones). Thus, path loss and multi-path fading effects can be controlled and mitigated. The core technology of this new paradigm are metasurfaces, planar artificial structures whose effect on impinging electromagnetic waves is fully defined by their macro-structure. The present study contributes the software-programmable wireless environment model, consisting of several HyperSurface tiles controlled by a central, environment configuration server. HyperSurfaces are a novel class of metasurfaces whose structure and, hence, electromagnetic behavior can be altered and controlled via a software interface. Multiple networked tiles coat indoor objects, allowing fine-grained, customizable reflection, absorption or polarization overall. A central server calculates and deploys the optimal electromagnetic interaction per tile, to the benefit of communicating devices. Realistic simulations using full 3D ray-tracing demonstrate the groundbreaking potential of the proposed approach in 2.4 GHz and 60 GHz frequencies.
CLJan 30
SSL: Sweet Spot Learning for Differentiated Guidance in Agentic OptimizationJinyang Wu, Changpeng Yang, Yuhao Shen et al.
Reinforcement learning with verifiable rewards has emerged as a powerful paradigm for training intelligent agents. However, existing methods typically employ binary rewards that fail to capture quality differences among trajectories achieving identical outcomes, thereby overlooking potential diversity within the solution space. Inspired by the ``sweet spot'' concept in tennis-the racket's core region that produces optimal hitting effects, we introduce \textbf{S}weet \textbf{S}pot \textbf{L}earning (\textbf{SSL}), a novel framework that provides differentiated guidance for agent optimization. SSL follows a simple yet effective principle: progressively amplified, tiered rewards guide policies toward the sweet-spot region of the solution space. This principle naturally adapts across diverse tasks: visual perception tasks leverage distance-tiered modeling to reward proximity, while complex reasoning tasks reward incremental progress toward promising solutions. We theoretically demonstrate that SSL preserves optimal solution ordering and enhances the gradient signal-to-noise ratio, thereby fostering more directed optimization. Extensive experiments across GUI perception, short/long-term planning, and complex reasoning tasks show consistent improvements over strong baselines on 12 benchmarks, achieving up to 2.5X sample efficiency gains and effective cross-task transferability. Our work establishes SSL as a general principle for training capable and robust agents.
CLDec 2, 2025
From Imitation to Discrimination: Toward A Generalized Curriculum Advantage Mechanism Enhancing Cross-Domain Reasoning TasksChangpeng Yang, Jinyang Wu, Yuchen Liu et al.
Reinforcement learning has emerged as a paradigm for post-training large language models, boosting their reasoning capabilities. Such approaches compute an advantage value for each sample, reflecting better or worse performance than expected, thereby yielding both positive and negative signals for training. However, the indiscriminate mixing of the two signals in existing methods, especially from the early stages, may lead to ambiguous guidance and limited gains. To address this issue, we propose **CAPO** (**C**urriculum **A**dvantage **P**olicy **O**ptimization), an adaptive curriculum mechanism based on advantage signals. The proposed mechanism bootstraps imitation learning with positive-only advantage samples to establish robust foundations, and subsequently introduces negative signals to cultivate discriminative capabilities, thereby improving generalization across complex scenarios. Compatible with diverse optimization methods including GRPO, PPO, RLOO, and Reinforce++, our method consistently achieves stable and significant improvements in mathematical reasoning tasks, and further generalizes effectively to multimodal Graphical User Interface (GUI) reasoning scenarios, establishing itself as a versatile and robust optimization framework.
28.0CLApr 12
ProUIE: A Macro-to-Micro Progressive Learning Method for LLM-based Universal Information ExtractionWenda Liu, Zhigang Song, Shuai Nie et al.
LLM-based universal information extraction (UIE) methods often rely on additional information beyond the original training data, which increases training complexity yet often yields limited gains. To address this, we propose ProUIE, a Macro-to-Micro progressive learning approach that improves UIE without introducing any external information. ProUIE consists of three stages: (i) macro-level Complete Modeling (CM), which learns NER, RE, and EE along their intrinsic difficulty order on the full training data to build a unified extraction foundation, (ii) meso-level Streamlined Alignment (SA), which operates on sampled data with simplified target formats, streamlining and regularizing structured outputs to make them more concise and controllable, and (iii) micro-level Deep Exploration (DE), which applies GRPO with stepwise fine-grained rewards (SFR) over structural units to guide exploration and improve performance. Experiments on 36 public datasets show that ProUIE consistently improves unified extraction, outperforming strong instruction-tuned baselines on average for NER and RE while using a smaller backbone, and it further demonstrates clear gains in large-scale production-oriented information extraction.
CLMay 9, 2024Code
Can large language models understand uncommon meanings of common words?Jinyang Wu, Feihu Che, Xinxin Zheng et al.
Large language models (LLMs) like ChatGPT have shown significant advancements across diverse natural language understanding (NLU) tasks, including intelligent dialogue and autonomous agents. Yet, lacking widely acknowledged testing mechanisms, answering `whether LLMs are stochastic parrots or genuinely comprehend the world' remains unclear, fostering numerous studies and sparking heated debates. Prevailing research mainly focuses on surface-level NLU, neglecting fine-grained explorations. However, such explorations are crucial for understanding their unique comprehension mechanisms, aligning with human cognition, and finally enhancing LLMs' general NLU capacities. To address this gap, our study delves into LLMs' nuanced semantic comprehension capabilities, particularly regarding common words with uncommon meanings. The idea stems from foundational principles of human communication within psychology, which underscore accurate shared understandings of word semantics. Specifically, this paper presents the innovative construction of a Lexical Semantic Comprehension (LeSC) dataset with novel evaluation metrics, the first benchmark encompassing both fine-grained and cross-lingual dimensions. Introducing models of both open-source and closed-source, varied scales and architectures, our extensive empirical experiments demonstrate the inferior performance of existing models in this basic lexical-meaning understanding task. Notably, even the state-of-the-art LLMs GPT-4 and GPT-3.5 lag behind 16-year-old humans by 3.9% and 22.3%, respectively. Additionally, multiple advanced prompting techniques and retrieval-augmented generation are also introduced to help alleviate this trouble, yet limitations persist. By highlighting the above critical shortcomings, this research motivates further investigation and offers novel insights for developing more intelligent LLMs.
AIFeb 25
ProactiveMobile: A Comprehensive Benchmark for Boosting Proactive Intelligence on Mobile DevicesDezhi Kong, Zhengzhao Feng, Qiliang Liang et al.
Multimodal large language models (MLLMs) have made significant progress in mobile agent development, yet their capabilities are predominantly confined to a reactive paradigm, where they merely execute explicit user commands. The emerging paradigm of proactive intelligence, where agents autonomously anticipate needs and initiate actions, represents the next frontier for mobile agents. However, its development is critically bottlenecked by the lack of benchmarks that can address real-world complexity and enable objective, executable evaluation. To overcome these challenges, we introduce ProactiveMobile, a comprehensive benchmark designed to systematically advance research in this domain. ProactiveMobile formalizes the proactive task as inferring latent user intent across four dimensions of on-device contextual signals and generating an executable function sequence from a comprehensive function pool of 63 APIs. The benchmark features over 3,660 instances of 14 scenarios that embrace real-world complexity through multi-answer annotations. To ensure quality, a team of 30 experts conducts a final audit of the benchmark, verifying factual accuracy, logical consistency, and action feasibility, and correcting any non-compliant entries. Extensive experiments demonstrate that our fine-tuned Qwen2.5-VL-7B-Instruct achieves a success rate of 19.15%, outperforming o1 (15.71%) and GPT-5 (7.39%). This result indicates that proactivity is a critical competency widely lacking in current MLLMs, yet it is learnable, emphasizing the importance of the proposed benchmark for proactivity evaluation.
CVDec 16, 2025
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge DevicesHyperAI Team, Yuchen Liu, Kaiyang Han et al.
Current multimodal large lanauge models possess strong perceptual and reasoning capabilities, however high computational and memory requirements make them difficult to deploy directly on on-device environments. While small-parameter models are progressively endowed with strong general capabilities, standard Vision Transformer (ViT) encoders remain a critical bottleneck, suffering from excessive latency and memory consumption when processing high-resolution inputs.To address these challenges, we introduce HyperVL, an efficient multimodal large language model tailored for on-device inference. HyperVL adopts an image-tiling strategy to cap peak memory usage and incorporates two novel techniques: (1) a Visual Resolution Compressor (VRC) that adaptively predicts optimal encoding resolutions to eliminate redundant computation, and (2) Dual Consistency Learning (DCL), which aligns multi-scale ViT encoders within a unified framework, enabling dynamic switching between visual branches under a shared LLM. Extensive experiments demonstrate that HyperVL achieves state-of-the-art performance among models of comparable size across multiple benchmarks. Furthermore, it significantly significantly reduces latency and power consumption on real mobile devices, demonstrating its practicality for on-device multimodal inference.
CLApr 24, 2024
KS-LLM: Knowledge Selection of Large Language Models with Evidence Document for Question AnsweringXinxin Zheng, Feihu Che, Jinyang Wu et al.
Large language models (LLMs) suffer from the hallucination problem and face significant challenges when applied to knowledge-intensive tasks. A promising approach is to leverage evidence documents as extra supporting knowledge, which can be obtained through retrieval or generation. However, existing methods directly leverage the entire contents of the evidence document, which may introduce noise information and impair the performance of large language models. To tackle this problem, we propose a novel Knowledge Selection of Large Language Models (KS-LLM) method, aiming to identify valuable information from evidence documents. The KS-LLM approach utilizes triples to effectively select knowledge snippets from evidence documents that are beneficial to answering questions. Specifically, we first generate triples based on the input question, then select the evidence sentences most similar to triples from the evidence document, and finally combine the evidence sentences and triples to assist large language models in generating answers. Experimental comparisons on several question answering datasets, such as TriviaQA, WebQ, and NQ, demonstrate that the proposed method surpasses the baselines and achieves the best results.
SDFeb 17, 2022
ADD 2022: the First Audio Deep Synthesis Detection ChallengeJiangyan Yi, Ruibo Fu, Jianhua Tao et al.
Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021. However, the recent shared tasks have not covered many real-life and challenging scenarios. The first Audio Deep synthesis Detection challenge (ADD) was motivated to fill in the gap. The ADD 2022 includes three tracks: low-quality fake audio detection (LF), partially fake audio detection (PF) and audio fake game (FG). The LF track focuses on dealing with bona fide and fully fake utterances with various real-world noises etc. The PF track aims to distinguish the partially fake audio from the real. The FG track is a rivalry game, which includes two tasks: an audio generation task and an audio fake detection task. In this paper, we describe the datasets, evaluation metrics, and protocols. We also report major findings that reflect the recent advances in audio deepfake detection tasks.
ETMay 7, 2019
An Interpretable Neural Network for Configuring Programmable Wireless EnvironmentsChristos Liaskos, Ageliki Tsioliaridou, Shuai Nie et al.
Software-defined metasurfaces (SDMs) comprise a dense topology of basic elements called meta-atoms, exerting the highest degree of control over surface currents among intelligent panel technologies. As such, they can transform impinging electromagnetic (EM) waves in complex ways, modifying their direction, power, frequency spectrum, polarity and phase. A well-defined software interface allows for applying such functionalities to waves and inter-networking SDMs, while abstracting the underlying physics. A network of SDMs deployed over objects within an area, such as a floorplan walls, creates programmable wireless environments (PWEs) with fully customizable propagation of waves within them. This work studies the use of machine learning for configuring such environments to the benefit of users within. The methodology consists of modeling wireless propagation as a custom, interpretable, back-propagating neural network, with SDM elements as nodes and their cross-interactions as links. Following a training period the network learns the propagation basics of SDMs and configures them to facilitate the communication of users within their vicinity.
ASNov 1, 2018
Deep Segment Attentive Embedding for Duration Robust Speaker VerificationBin Liu, Shuai Nie, Yaping Zhang et al.
LSTM-based speaker verification usually uses a fixed-length local segment randomly truncated from an utterance to learn the utterance-level speaker embedding, while using the average embedding of all segments of a test utterance to verify the speaker, which results in a critical mismatch between testing and training. This mismatch degrades the performance of speaker verification, especially when the durations of training and testing utterances are very different. To alleviate this issue, we propose the deep segment attentive embedding method to learn the unified speaker embeddings for utterances of variable duration. Each utterance is segmented by a sliding window and LSTM is used to extract the embedding of each segment. Instead of only using one local segment, we use the whole utterance to learn the utterance-level embedding by applying an attentive pooling to the embeddings of all segments. Moreover, the similarity loss of segment-level embeddings is introduced to guide the segment attention to focus on the segments with more speaker discriminations, and jointly optimized with the similarity loss of utterance-level embeddings. Systematic experiments on Tongdun and VoxCeleb show that the proposed method significantly improves robustness of duration variant and achieves the relative Equal Error Rate reduction of 50% and 11.54% , respectively.
SDMay 2, 2018
Boosting Noise Robustness of Acoustic Model via Deep Adversarial TrainingBin Liu, Shuai Nie, Yaping Zhang et al.
In realistic environments, speech is usually interfered by various noise and reverberation, which dramatically degrades the performance of automatic speech recognition (ASR) systems. To alleviate this issue, the commonest way is to use a well-designed speech enhancement approach as the front-end of ASR. However, more complex pipelines, more computations and even higher hardware costs (microphone array) are additionally consumed for this kind of methods. In addition, speech enhancement would result in speech distortions and mismatches to training. In this paper, we propose an adversarial training method to directly boost noise robustness of acoustic model. Specifically, a jointly compositional scheme of generative adversarial net (GAN) and neural network-based acoustic model (AM) is used in the training phase. GAN is used to generate clean feature representations from noisy features by the guidance of a discriminator that tries to distinguish between the true clean signals and generated signals. The joint optimization of generator, discriminator and AM concentrates the strengths of both GAN and AM for speech recognition. Systematic experiments on CHiME-4 show that the proposed method significantly improves the noise robustness of AM and achieves the average relative error rate reduction of 23.38% and 11.54% on the development and test set, respectively.