98.9 · CL · Mar 16
Aligning Paralinguistic Understanding and Generation in Speech LLMs via Multi-Task Reinforcement Learning
Jingxiang Chen, Minseok Kim, Seong-Gyun Leem et al.
Speech large language models (LLMs) observe paralinguistic cues such as prosody, emotion, and non-verbal sounds, which are crucial for intent understanding. However, leveraging these cues faces challenges: limited training data, annotation difficulty, and models exploiting lexical shortcuts over paralinguistic signals. We propose multi-task reinforcement learning (RL) with chain-of-thought prompting that elicits explicit affective reasoning. To address data scarcity, we introduce a paralinguistics-aware speech LLM (PALLM) that jointly optimizes sentiment classification from audio and paralinguistics-aware response generation via a two-stage pipeline. Experiments demonstrate that our approach improves paralinguistic understanding over both supervised baselines and strong proprietary models (Gemini-2.5-Pro, GPT-4o-audio) by 8-12% on Expresso, IEMOCAP, and RAVDESS. The results show that modeling paralinguistic reasoning with multi-task RL is crucial for building emotionally intelligent speech LLMs.
OC · Dec 11, 2023
Amazon Locker Capacity Management
Samyukta Sethuraman, Ankur Bansal, Setareh Mardan et al.
Amazon Locker is a self-service delivery or pickup location where customers can pick up packages and drop off returns. A basic first-come-first-served policy for accepting package delivery requests causes lockers to fill with standard-speed (3-5 day shipping) packages, leaving no space for expedited packages, which are mostly Next-Day or Two-Day shipping. This paper proposes a solution to the problem of determining how much locker capacity to reserve for packages of each ship option. Yield management is a much-researched field with popular applications in the airline, car rental, and hotel industries. However, Amazon Locker poses a unique challenge in this field because the number of days a package will wait in a locker (package dwell time) is, in general, unknown. The proposed solution combines machine learning techniques, which predict locker demand and package dwell time, with linear programming, which maximizes locker throughput. The decision variables from this optimization provide optimal capacity reservation values for the different ship options. The approach resulted in a 9% year-over-year increase in Locker throughput worldwide during the 2018 holiday season, impacting millions of customers.
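To make the capacity-reservation idea concrete, here is a minimal sketch of a toy version of such a program. It is not the paper's formulation, which the abstract does not give. It assumes a single locker, a fixed planning horizon, and that the predicted demand and predicted dwell time per ship option are already available as plain numbers. The names `ShipOption`, `reserve_capacity`, `slots`, and `horizon_days` are hypothetical. The toy LP maximizes the number of accepted packages subject to a slot-day budget; because every package has the same value, it reduces to a fractional knapsack, so a greedy pass in ascending dwell time gives the LP optimum without a solver.

```python
from dataclasses import dataclass


@dataclass
class ShipOption:
    name: str
    demand: float      # predicted requests over the horizon (assumed ML output)
    dwell_days: float  # predicted days a package occupies a slot (assumed ML output)


def reserve_capacity(options, slots, horizon_days):
    """Toy LP: maximize sum(x_i) subject to 0 <= x_i <= demand_i and
    sum(x_i * dwell_i) <= slots * horizon_days.

    With unit value per package this is a fractional knapsack, so filling
    in ascending dwell time is optimal. Returns packages accepted per option.
    """
    budget = slots * horizon_days  # total slot-days available
    plan = {}
    for opt in sorted(options, key=lambda o: o.dwell_days):
        # Accept as much demand as the remaining slot-days allow.
        accepted = min(opt.demand, budget / opt.dwell_days)
        plan[opt.name] = accepted
        budget -= accepted * opt.dwell_days
    return plan


# Illustrative numbers only; they are not taken from the paper.
options = [
    ShipOption("standard", demand=40, dwell_days=2.5),
    ShipOption("next_day", demand=30, dwell_days=1.0),
    ShipOption("two_day", demand=20, dwell_days=1.5),
]
plan = reserve_capacity(options, slots=10, horizon_days=7)
print(plan)
```

With these made-up inputs, the 70 slot-days cover all predicted Next-Day and Two-Day demand, and only part of the standard-speed demand is accepted. A production model would need per-day constraints and uncertainty in dwell time, which this sketch ignores.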
AI · Sep 22, 2025
Memory-QA: Answering Recall Questions Based on Multimodal Memories
Hongda Jiang, Xinyuan Zhang, Siddhant Garg et al. · amazon-science
We introduce Memory-QA, a novel real-world task that involves answering recall questions about visual content from previously stored multimodal memories. This task poses unique challenges, including the creation of task-oriented memories, the effective utilization of temporal and location information within memories, and the ability to draw upon multiple memories to answer a recall question. To address these challenges, we propose a comprehensive pipeline, Pensieve, integrating memory-specific augmentation, time- and location-aware multi-signal retrieval, and multi-memory QA fine-tuning. We created a multimodal benchmark to illustrate various real challenges in this task, and show the superior performance of Pensieve over state-of-the-art solutions (up to 14% on QA accuracy).