CV · Jul 21, 2022
NSNet: Non-saliency Suppression Sampler for Efficient Video Recognition
Boyang Xia, Wenhao Wu, Haoran Wang et al. · amazon-science
It is challenging for artificial intelligence systems to achieve accurate video recognition at low computation cost. Adaptive-inference-based efficient video recognition methods typically preview videos and focus on salient parts to reduce computation costs. Most existing works focus on learning complex networks with video-classification-based objectives. Because they take all frames as positive samples, few of them pay attention to the discrimination between positive samples (salient frames) and negative samples (non-salient frames) in supervision. To fill this gap, we propose a novel Non-saliency Suppression Network (NSNet), which effectively suppresses the responses of non-salient frames. Specifically, on the frame level, effective pseudo labels that can distinguish between salient and non-salient frames are generated to guide the frame saliency learning. On the video level, a temporal attention module is learned under dual video-level supervisions on both the salient and the non-salient representations. Saliency measurements from both levels are combined to exploit multi-granularity complementary information. Extensive experiments conducted on four well-known benchmarks verify that our NSNet not only achieves the state-of-the-art accuracy-efficiency trade-off but also presents a significantly faster (2.4~4.3x) practical inference speed than state-of-the-art methods. Our project page is at https://lawrencexia2008.github.io/projects/nsnet .
AI · Apr 6, 2023
FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead
Kang Chen, Tao Han, Junchao Gong et al.
We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25° latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 $m^2/s^2$. In addition, the inference cost of each iteration is merely 600 ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.
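The two skill metrics quoted in this abstract, RMSE and the anomaly correlation coefficient (ACC), can be sketched in a few lines. This is a minimal, unweighted illustration only: evaluations on global grids usually apply latitude weighting, and the function names, the flat-list field layout, and the toy numbers below are assumptions made for illustration, not FengWu's evaluation code or data.

```python
import math

def rmse(forecast, truth):
    """Root mean square error over a flat list of grid values (no latitude weighting)."""
    return math.sqrt(sum((f - t) ** 2 for f, t in zip(forecast, truth)) / len(truth))

def acc(forecast, truth, climatology):
    """Anomaly correlation coefficient: correlation of forecast and observed
    anomalies, where an anomaly is the departure from climatology."""
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    t_anom = [t - c for t, c in zip(truth, climatology)]
    num = sum(f * t for f, t in zip(f_anom, t_anom))
    den = math.sqrt(sum(f * f for f in f_anom) * sum(t * t for t in t_anom))
    return num / den

# Toy values for four grid points (hypothetical, not real reanalysis data).
climatology = [5500.0, 5600.0, 5700.0, 5800.0]
truth = [5520.0, 5590.0, 5730.0, 5790.0]
forecast = [5510.0, 5595.0, 5720.0, 5805.0]

print("RMSE:", rmse(forecast, truth))
print("ACC :", acc(forecast, truth, climatology))
```

Under the criterion stated in the abstract, a forecast counts as skillful while its z500 ACC stays above 0.6, so the lead time at which this value first drops below 0.6 is the quantity being extended.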
CV · Nov 21, 2022
Slow Motion Matters: A Slow Motion Enhanced Network for Weakly Supervised Temporal Action Localization
Weiqi Sun, Rui Su, Qian Yu et al. · amazon-science, princeton
Weakly supervised temporal action localization (WTAL) aims to localize actions in untrimmed videos with only weak supervision information (e.g., video-level labels). Most existing models handle all input videos with a fixed temporal scale. However, such models are not sensitive to actions whose pace differs from the "normal" speed, especially slow-motion action instances, which complete their movements much more slowly than their normal-speed counterparts. Here arises the slow-motion blurred issue: it is hard to explore salient slow-motion information from videos at "normal" speed. In this paper, we propose a novel framework termed Slow Motion Enhanced Network (SMEN) to improve a WTAL network by compensating for its low sensitivity to slow-motion action segments. The proposed SMEN comprises a Mining module and a Localization module. The Mining module generates a mask to mine slow-motion-related features by utilizing the relationships between normal motion and slow motion, while the Localization module leverages the mined slow-motion features as complementary information to improve the temporal action localization results. Our proposed framework can be easily adopted by existing WTAL networks and enables them to be more sensitive to slow-motion actions. Extensive experiments on three benchmarks are conducted, which demonstrate the high performance of our proposed framework.
CV · Nov 17, 2022
3D-QueryIS: A Query-based Framework for 3D Instance Segmentation
Jiaheng Liu, Tong He, Honghui Yang et al.
Previous top-performing methods for 3D instance segmentation often maintain inter-task dependencies and tend to lack robustness. Besides, inevitable variations across datasets make these methods particularly sensitive to hyper-parameter values and lead to poor generalization capability. In this paper, we address the aforementioned challenges by proposing a novel query-based method, termed 3D-QueryIS, which is detector-free, semantic segmentation-free, and cluster-free. Specifically, we propose to generate representative points in an implicit manner, and use them together with the initial queries to generate the informative instance queries. Then, the class and binary instance mask predictions can be produced by simply applying MLP layers on top of the instance queries and the extracted point cloud embeddings. Thus, our 3D-QueryIS is free from the accumulated errors caused by the inter-task dependencies. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness and efficiency of our proposed 3D-QueryIS method.
AI · Dec 18, 2025
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
Wanghan Xu, Yuhao Zhou, Yifan Zhou et al.
Despite advances in scientific AI, a coherent framework for Scientific General Intelligence (SGI), the ability to autonomously conceive, investigate, and reason across scientific domains, remains lacking. We present an operational SGI definition grounded in the Practical Inquiry Model (PIM: Deliberation, Conception, Action, Perception) and operationalize it via four scientist-aligned tasks: deep research, idea generation, dry/wet experiments, and experimental reasoning. SGI-Bench comprises over 1,000 expert-curated, cross-disciplinary samples inspired by Science's 125 Big Questions, enabling systematic evaluation of state-of-the-art LLMs. Results reveal gaps: low exact match (10-20%) in deep research despite step-level alignment; ideas lacking feasibility and detail; high code executability but low execution result accuracy in dry experiments; low sequence fidelity in wet protocols; and persistent multimodal comparative-reasoning challenges. We further introduce Test-Time Reinforcement Learning (TTRL), which optimizes retrieval-augmented novelty rewards at inference, enhancing hypothesis novelty without reference answers. Together, our PIM-grounded definition, workflow-centric benchmark, and empirical insights establish a foundation for AI systems that genuinely participate in scientific discovery.
CV · Aug 13, 2024
Fast Information Streaming Handler (FisH): A Unified Seismic Neural Network for Single Station Real-Time Earthquake Early Warning
Tianning Zhang, Feng Liu, Yuming Yuan et al.
Existing earthquake early warning (EEW) approaches often treat phase picking, location estimation, and magnitude estimation as separate tasks, lacking a unified framework. Additionally, most deep learning models in seismology rely on full three-component waveforms and are not suitable for real-time streaming data. To address these limitations, we propose a novel unified seismic neural network called Fast Information Streaming Handler (FisH). FisH is designed to process real-time streaming seismic data and generate simultaneous results for phase picking, location estimation, and magnitude estimation in an end-to-end fashion. By integrating these tasks within a single model, FisH simplifies the overall process and leverages the nonlinear relationships between tasks for improved performance. The FisH model utilizes RetNet as its backbone, enabling parallel processing during training and recurrent handling during inference. This capability makes FisH suitable for real-time applications, reducing latency in EEW systems. Extensive experiments conducted on the STEAD benchmark dataset provide strong validation for the effectiveness of our proposed FisH model. The results demonstrate that FisH achieves impressive performance across multiple seismic event detection and characterization tasks. Specifically, it achieves an F1 score of 0.99/0.96. Also, FisH demonstrates precise earthquake location estimation, with a location error of only 6.0 km, a distance error of 2.6 km, and a back-azimuth error of 19°. The model also exhibits accurate earthquake magnitude estimation, with a magnitude error of just 0.14. Additionally, FisH is capable of generating real-time estimations, providing location and magnitude estimations with a location error of 8.06 km and a magnitude error of 0.18 within a mere 3 seconds after the P-wave arrives.
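The "parallel during training, recurrent during inference" property of a RetNet backbone comes from its retention operator, which has two mathematically equivalent forms. The sketch below shows that equivalence for a single head in plain Python. It is a simplified illustration, not FisH's code: it omits multi-head structure, positional rotation, scaling, and normalization, and the function names and the decay value `gamma` are assumptions for the example.

```python
import random

def retention_parallel(q, k, v, gamma):
    """Parallel form: out[n] = sum over m <= n of gamma**(n-m) * (q[n] . k[m]) * v[m].
    All positions can be computed at once, which suits training."""
    out = []
    for n in range(len(q)):
        acc = [0.0] * len(v[0])
        for m in range(n + 1):
            weight = gamma ** (n - m) * sum(a * b for a, b in zip(q[n], k[m]))
            acc = [x + weight * y for x, y in zip(acc, v[m])]
        out.append(acc)
    return out

def retention_recurrent(q, k, v, gamma):
    """Recurrent form: S_n = gamma * S_{n-1} + k_n^T v_n, out[n] = q_n S_n.
    Only a fixed-size state is carried between steps, which suits streaming."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_n, k_n, v_n in zip(q, k, v):
        state = [[gamma * state[i][j] + k_n[i] * v_n[j] for j in range(d_v)]
                 for i in range(d_k)]
        out.append([sum(q_n[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return out

# Random toy sequence: 6 time steps, key/query width 3, value width 2.
rng = random.Random(0)
q = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
k = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
v = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(6)]

parallel = retention_parallel(q, k, v, gamma=0.9)
recurrent = retention_recurrent(q, k, v, gamma=0.9)
max_gap = max(abs(a - b) for pa, pr in zip(parallel, recurrent) for a, b in zip(pa, pr))
print("largest difference between the two forms:", max_gap)
```

In a streaming setting, the recurrent form means each newly arrived waveform sample updates a fixed-size state rather than reprocessing the whole history.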
77.8 · CL · Apr 14
PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
Han Bao, Penghao Zhang, Yue Huang et al.
Large Language Models (LLMs) are increasingly integrated into real-world decision-making, including in the domain of public policy. Yet, their ability to comprehend and reason about policy-related content remains underexplored. To fill this gap, we present PolicyBench, the first large-scale cross-system benchmark (US-China) evaluating policy comprehension, comprising 21K cases across a broad spectrum of policy areas, capturing the diversity and complexity of real-world governance. Following Bloom's taxonomy, the benchmark assesses three core capabilities: (1) Memorization: factual recall of policy knowledge, (2) Understanding: conceptual and contextual reasoning, and (3) Application: problem-solving in real-life policy scenarios. Building on this benchmark, we further propose PolicyMoE, a domain-specialized Mixture-of-Experts (MoE) model with expert modules aligned to each cognitive level. The proposed model demonstrates stronger performance on application-oriented policy tasks than on memorization or conceptual understanding, and yields the highest accuracy on structured reasoning tasks. Our results reveal key limitations of current LLMs in policy understanding and suggest paths toward more reliable, policy-focused models.
56.4 · LG · May 19
D$^3$-Subsidy: Online and Sequential Driver Subsidy Decision-Making for Large-Scale Ride-Hailing Market
Taijie Chen, Rui Su, Siyuan Feng et al.
Ride-hailing platforms like DiDi Chuxing operate in highly dynamic environments where balancing driver supply and passenger demand is critical. Although driver-side subsidies serve as a primary lever to align these forces and improve KPIs like completed rides (Rides) and gross merchandise value (GMV), optimizing them in production requires simultaneously meeting three constraints: (i) responsiveness to stochastic shocks, (ii) strict subsidy-rate caps, and (iii) low-latency execution at city scale. These requirements rule out expensive per-order optimization, calling for a forward-looking, constraint-aware city-level controller for online sequential decision making. To meet these requirements, we introduce D$^3$-Subsidy (Dynamic Driver-side Diffusion-based Subsidy), a hierarchical diffusion-based framework for deployable city-wide subsidy control. To bridge the train-inference gap, D$^3$-Subsidy employs a prefix-conditioned diffusion model that samples plausible future trajectories from immutable historical observations, ensuring the training protocol aligns with the fixed-history nature of online deployment. These generated plans are then decoded by a context-conditioned inverse module into low-dimensional city-level control signals. For scalable execution, we bridge the gap between city-level planning and fine-grained dispatch via a Lagrangian-dual-derived mapping, which embeds subsidy-rate caps directly into order-driver incentives without iterative optimization. Additionally, a multi-city pretraining strategy with parameter-efficient fine-tuning enables robust transfer across heterogeneous cities. Extensive offline evaluations demonstrate that D$^3$-Subsidy improves Rides and GMV while enhancing cap compliance, and a real-world A/B test confirms significant uplift while keeping budget-related violation metrics within operational thresholds.
96.7 · LG · May 7
Tyche: One Step Flow for Efficient Probabilistic Weather Forecasting
Fan Xu, Yuan Gao, Kun Wang et al.
Probabilistic weather forecasting requires not only accurate trajectories, but also calibrated distributions over plausible atmospheric futures. Recent data-driven systems have achieved remarkable deterministic skill, and diffusion-based ensemble forecasters have substantially improved sample realism and uncertainty quantification. However, their inference cost scales with forecast horizon, ensemble size, and the number of denoising steps required for each transition, making large operational ensembles expensive. To address this, we present Tyche, a one-step conditional flow model for efficient probabilistic weather forecasting. Tyche models the conditional forecast distribution with a destination-aware average-velocity flow that maps Gaussian noise directly to future weather states in a single function evaluation (1-NFE). To make this one-step transport learnable in high-dimensional geophysical fields, we derive a JVP-regularized rectification objective that enforces temporal self-consistency across source and destination flow timesteps without explicitly forming Jacobians. The transport field is parameterized by an isotropic Swin-style transformer that preserves fine-scale spatial structure while remaining scalable on global grids. To improve ensemble reliability under autoregressive forecasting, we further introduce a rollout-based finetuning stage with curriculum CRPS calibration supervision. Experiments on ERA5 at 1.5° and 6-hour resolution show that Tyche, using merely a single NFE, matches or exceeds the forecast skill and calibration of state-of-the-art multi-step generative baselines and the operational ECMWF IFS ensemble.
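The CRPS used for calibration supervision can be estimated directly from ensemble members. The sketch below uses the standard ensemble estimator for a single scalar observation; the function name and the toy ensemble are assumptions for illustration, and the abstract does not say which CRPS estimator or curriculum Tyche actually uses.

```python
def ensemble_crps(members, observation):
    """Ensemble CRPS estimate for one scalar observation:
    mean |x_i - y|  -  0.5 * mean over all pairs (i, j) of |x_i - x_j|.
    The first term rewards accuracy; the second rewards ensemble spread,
    so both overconfident and overly diffuse ensembles are penalized."""
    n = len(members)
    accuracy = sum(abs(x - observation) for x in members) / n
    spread = sum(abs(a - b) for a in members for b in members) / (n * n)
    return accuracy - 0.5 * spread

# Toy ensemble of three forecasts for one grid point (hypothetical values).
print(ensemble_crps([1.0, 2.0, 3.0], observation=2.0))

# With a single member the spread term vanishes and CRPS is the absolute error.
print(ensemble_crps([5.0], observation=3.0))
```

Lower is better, and because the score depends on the whole ensemble, it is typically evaluated over many generated members per forecast.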
LG · Nov 12, 2025
Hierarchical Schedule Optimization for Fast and Robust Diffusion Model Sampling
Aihua Zhu, Rui Su, Qinglin Zhao et al.
Diffusion probabilistic models have set a new standard for generative fidelity but are hindered by a slow iterative sampling process. A powerful training-free strategy to accelerate this process is Schedule Optimization, which aims to find an optimal distribution of timesteps for a fixed and small Number of Function Evaluations (NFE) to maximize sample quality. To this end, a successful schedule optimization method must adhere to four core principles: effectiveness, adaptivity, practical robustness, and computational efficiency. However, existing paradigms struggle to satisfy these principles simultaneously, motivating the need for a more advanced solution. To overcome these limitations, we propose the Hierarchical-Schedule-Optimizer (HSO), a novel and efficient bi-level optimization framework. HSO reframes the search for a globally optimal schedule into a more tractable problem by iteratively alternating between two synergistic levels: an upper-level global search for an optimal initialization strategy and a lower-level local optimization for schedule refinement. This process is guided by two key innovations: the Midpoint Error Proxy (MEP), a solver-agnostic and numerically stable objective for effective local optimization, and the Spacing-Penalized Fitness (SPF) function, which ensures practical robustness by penalizing pathologically close timesteps. Extensive experiments show that HSO sets a new state-of-the-art for training-free sampling in the extremely low-NFE regime. For instance, with an NFE of just 5, HSO achieves a remarkable FID of 11.94 on LAION-Aesthetics with Stable Diffusion v2.1. Crucially, this level of performance is attained not through costly retraining, but with a one-time optimization cost of less than 8 seconds, presenting a highly practical and efficient paradigm for diffusion model acceleration.
CV · Jun 23, 2025
Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-resolution Information in Temporal Domain
Rui Su, Dong Xu, Luping Zhou et al.
Weakly supervised temporal action localization is a challenging task as only the video-level annotation is available during the training process. To address this problem, we propose a two-stage approach to fully exploit multi-resolution information in the temporal domain and generate high quality frame-level pseudo labels based on both appearance and motion streams. Specifically, in the first stage, we generate reliable initial frame-level pseudo labels, and in the second stage, we iteratively refine the pseudo labels and use a set of selected frames with highly confident pseudo labels to train neural networks and better predict action class scores at each frame. We fully exploit temporal information at multiple scales to improve temporal action localization performance. Specifically, in order to obtain reliable initial frame-level pseudo labels, in the first stage, we propose an Initial Label Generation (ILG) module, which leverages temporal multi-resolution consistency to generate high quality class activation sequences (CASs), which consist of a number of sequences with each sequence measuring how likely it is that each video frame belongs to one specific action class. In the second stage, we propose a Progressive Temporal Label Refinement (PTLR) framework. In our PTLR framework, two networks called Network-OTS and Network-RTS, which are respectively used to generate CASs for the original temporal scale and the reduced temporal scales, are used as two streams (i.e., the OTS stream and the RTS stream) to refine the pseudo labels in turn. In this way, the multi-resolution information in the temporal domain is exchanged at the pseudo label level, and our work can help improve each stream (i.e., the OTS/RTS stream) by exploiting the refined pseudo labels from another stream (i.e., the RTS/OTS stream).
LG · Feb 27, 2025
SeisMoLLM: Advancing Seismic Monitoring via Cross-modal Transfer with Pre-trained Large Language Model
Xinghao Wang, Feng Liu, Rui Su et al.
Recent advances in deep learning have revolutionized seismic monitoring, yet developing a foundation model that performs well across multiple complex tasks remains challenging, particularly when dealing with degraded signals or data scarcity. This work presents SeisMoLLM, the first foundation model that utilizes cross-modal transfer for seismic monitoring, to unleash the power of large-scale pre-training from a large language model without requiring direct pre-training on seismic datasets. Through elaborate waveform tokenization and fine-tuning of a pre-trained GPT-2 model, SeisMoLLM achieves state-of-the-art performance on the DiTing and STEAD datasets across five critical tasks: back-azimuth estimation, epicentral distance estimation, magnitude estimation, phase picking, and first-motion polarity classification. It attains 36 best results out of 43 task metrics and 12 top scores out of 16 few-shot generalization metrics, with many relative improvements ranging from 10% to 50%. In addition to its superior performance, SeisMoLLM maintains efficiency comparable to or even better than lightweight models in both training and inference. These findings establish SeisMoLLM as a promising foundation model for practical seismic monitoring and highlight cross-modal transfer as an exciting new direction for earthquake studies, showcasing the potential of advanced deep learning techniques to propel seismology research forward.
GR · Aug 11, 2025
Vertex Features for Neural Global Illumination
Rui Su, Honghao Dong, Haojie Jin et al.
Learnable neural representations have been widely adopted in 3D scene reconstruction and neural rendering applications. However, traditional feature grid representations often suffer from a substantial memory footprint, posing a significant bottleneck for modern parallel computing hardware. In this paper, we present neural vertex features, a generalized formulation of learnable representation for neural rendering tasks involving explicit mesh surfaces. Instead of uniformly distributing neural features throughout 3D space, our method stores learnable features directly at mesh vertices, leveraging the underlying geometry as a compact and structured representation for neural processing. This not only optimizes memory efficiency, but also improves feature representation by aligning compactly with the surface using task-specific geometric priors. We validate our neural representation across diverse neural rendering tasks, with a specific emphasis on neural radiosity. Experimental results demonstrate that our method reduces memory consumption to only one-fifth (or even less) of grid-based representations, while maintaining comparable rendering quality and lowering inference overhead.
MA · May 21, 2025
Swarm Intelligence Enhanced Reasoning: A Density-Driven Framework for LLM-Based Multi-Agent Optimization
Ying Zhu, Heng Zhou, Rui Su et al.
Recently, many approaches, such as Chain-of-Thought (CoT) prompting and Multi-Agent Debate (MAD), have been proposed to further enrich Large Language Models' (LLMs) complex problem-solving capacities in reasoning scenarios. However, these methods may fail to solve complex problems due to the lack of ability to find optimal solutions. Swarm Intelligence has been serving as a powerful tool for finding optima in the field of traditional optimization problems. To this end, we propose integrating swarm intelligence into the reasoning process by introducing a novel Agent-based Swarm Intelligence (ASI) paradigm. In this paradigm, we formulate LLM reasoning as an optimization problem and use a swarm intelligence scheme to guide a group of LLM-based agents in collaboratively searching for optimal solutions. To avoid swarm intelligence getting trapped in local optima, we further develop a Swarm Intelligence Enhancing Reasoning (SIER) framework, which develops a density-driven strategy to enhance the reasoning ability. To be specific, we propose to perform kernel density estimation and non-dominated sorting to optimize both solution quality and diversity simultaneously. In this case, SIER efficiently enhances solution space exploration through expanding the diversity of the reasoning path. Besides, a step-level quality evaluation is used to help agents improve solution quality by correcting low-quality intermediate steps. Then, we use quality thresholds to dynamically control the termination of exploration and the selection of candidate steps, enabling a more flexible and efficient reasoning process. Extensive experiments are ...
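The combination of kernel density estimation and non-dominated sorting described above can be illustrated with a small two-objective example: prefer candidates with high quality and low density (that is, candidates in sparsely explored regions). This is a minimal sketch under stated assumptions, not the SIER implementation: the 1-D "embeddings", the quality scores, the bandwidth, and both function names are hypothetical.

```python
import math

def gaussian_kde_density(points, bandwidth=1.0):
    """Unnormalized Gaussian kernel density at each point (the constant
    normalizing factor is dropped because only the ranking matters here)."""
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            sq_dist = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        densities.append(total / len(points))
    return densities

def non_dominated_front(quality, density):
    """Indices of candidates not dominated by any other candidate, where
    j dominates i if j is at least as good on both objectives (higher
    quality, lower density) and strictly better on at least one."""
    front = []
    for i in range(len(quality)):
        dominated = any(
            quality[j] >= quality[i] and density[j] <= density[i]
            and (quality[j] > quality[i] or density[j] < density[i])
            for j in range(len(quality)) if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Three candidates clustered together and one isolated candidate (toy data).
embeddings = [(0.0,), (0.1,), (0.2,), (5.0,)]
quality = [0.9, 0.8, 0.5, 0.6]

density = gaussian_kde_density(embeddings, bandwidth=1.0)
print(non_dominated_front(quality, density))  # → [0, 3]
```

The best-scoring candidate in the crowded cluster survives, and so does the isolated candidate despite its lower quality, which is how a density term keeps diverse reasoning paths alive.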
GEO-PH · Jan 8, 2025
DispFormer: A Pretrained Transformer Incorporating Physical Constraints for Dispersion Curve Inversion
Feng Liu, Bao Deng, Rui Su et al.
Surface wave dispersion curve inversion is crucial for estimating subsurface shear-wave velocity ($v_s$), yet traditional methods often face challenges related to computational cost, non-uniqueness, and sensitivity to initial models. While deep learning approaches show promise, many require large labeled datasets and struggle with real-world datasets, which often exhibit varying period ranges, missing values, and low signal-to-noise ratios. To address these limitations, this study introduces DispFormer, a transformer-based neural network for $v_s$ profile inversion from Rayleigh-wave phase and group dispersion curves. DispFormer processes dispersion data independently at each period, allowing it to handle varying lengths without requiring network modifications or strict alignment between training and testing datasets. A depth-aware training strategy is also introduced, incorporating physical constraints derived from the depth sensitivity of dispersion data. DispFormer is pre-trained on a global synthetic dataset and evaluated on two regional synthetic datasets using zero-shot and few-shot strategies. Results show that even without labeled data, the zero-shot DispFormer generates inversion profiles that outperform the interpolated reference model used as the pretraining target, providing a deployable initial model generator to assist traditional workflows. When partial labeled data are available, the few-shot trained DispFormer surpasses traditional global search methods. Real-world tests further confirm that DispFormer generalizes well to dispersion data with varying lengths and achieves lower data residuals than reference models. These findings underscore the potential of DispFormer as a foundation model for dispersion curve inversion and demonstrate the advantages of integrating physics-informed deep learning into geophysical applications.
CV · Jun 14, 2021
SGE net: Video object detection with squeezed GRU and information entropy map
Rui Su, Wenjing Huang, Haoyu Ma et al.
Recently, deep learning based video object detection has attracted more and more attention. Compared with object detection in static images, video object detection is more challenging due to the motion of objects, while providing rich temporal information. The RNN-based algorithm is an effective way to enhance detection performance in videos with temporal information. However, most studies in this area only focus on accuracy while ignoring the computation cost and the number of parameters. In this paper, we propose an efficient method that combines a channel-reduced convolutional GRU (Squeezed GRU) and an Information Entropy map for video object detection (SGE-Net). The experimental results validate the accuracy improvement, the computational savings of the Squeezed GRU, and the superiority of the information entropy attention mechanism on the classification performance. The mAP has increased by 3.7 compared with the baseline, and the number of parameters has decreased from 6.33 million to 0.67 million compared with the standard GRU.
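To see why reducing channels shrinks a convolutional GRU so sharply, it helps to count parameters. The sketch below counts weights for a textbook ConvGRU cell (three convolutions over the concatenated input and hidden state: update gate, reset gate, candidate state). The channel widths and kernel size are assumptions chosen for illustration; the abstract does not give SGE-Net's layer configuration, so these numbers are not expected to reproduce the reported 6.33 million and 0.67 million.

```python
def conv_gru_params(in_channels, hidden_channels, kernel_size=3, bias=True):
    """Parameter count of a standard ConvGRU cell: three convolutions
    (update gate, reset gate, candidate state), each mapping
    in_channels + hidden_channels inputs to hidden_channels outputs."""
    per_gate = kernel_size * kernel_size * (in_channels + hidden_channels) * hidden_channels
    if bias:
        per_gate += hidden_channels
    return 3 * per_gate

# Hypothetical widths: a full-width cell versus one with 4x fewer channels.
full = conv_gru_params(in_channels=256, hidden_channels=256)
squeezed = conv_gru_params(in_channels=64, hidden_channels=64)

print("full-width cell:", full)
print("channel-reduced cell:", squeezed)
print("reduction factor:", full / squeezed)
```

Because the weight count scales with the product of input and output channels, cutting both by a factor of 4 cuts the convolution weights by roughly a factor of 16.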
SP · May 27, 2021
Deep Learning for Depression Recognition with Audiovisual Cues: A Review
Lang He, Mingyue Niu, Prayag Tiwari et al.
With the acceleration of the pace of work and life, people have to face more and more pressure, which increases the possibility of suffering from depression. However, many patients may fail to get a timely diagnosis due to the serious imbalance in the doctor-patient ratio in the world. Promisingly, physiological and psychological studies have indicated some differences in speech and facial expression between patients with depression and healthy individuals. Consequently, to improve current medical care, many scholars have used deep learning to extract a representation of depression cues in audio and video for automatic depression detection. To sort out and summarize these works, this review introduces the databases and describes objective markers for automatic depression estimation (ADE). Furthermore, we review the deep learning methods for automatic depression detection to extract the representation of depression from audio and video. Finally, this paper discusses challenges and promising directions related to automatic diagnosing of depression using deep learning technologies.
CV · May 28, 2019
Progressive Cross-Stream Cooperation in Spatial and Temporal Domain for Action Localization
Rui Su, Dong Xu, Luping Zhou et al.
Spatio-temporal action localization consists of three levels of tasks: spatial localization, action classification, and temporal localization. In this work, we propose a new progressive cross-stream cooperation (PCSC) framework that improves all three tasks above. The basic idea is to utilize both spatial region proposals (resp., temporal segment proposals) and features from one stream (i.e., the Flow/RGB stream) to help another stream (i.e., the RGB/Flow stream) to iteratively generate better bounding boxes in the spatial domain (resp., temporal segments in the temporal domain). In this way, not only can the actions be more accurately localized both spatially and temporally, but the action classes can also be predicted more precisely. Specifically, we first combine the latest region proposals (for spatial detection) or segment proposals (for temporal localization) from both streams to form a larger set of labelled training samples to help learn better action detection or segment detection models. Second, to learn better representations, we also propose a new message passing approach to pass information from one stream to another stream, which also leads to better action detection and segment detection models. By first using our newly proposed PCSC framework for spatial localization at the frame level and then applying our temporal PCSC framework for temporal localization at the tube level, the action localization results are progressively improved at both the frame level and the video level. Comprehensive experiments on two benchmark datasets, UCF-101-24 and J-HMDB, demonstrate the effectiveness of our newly proposed approaches for spatio-temporal action localization in realistic scenarios.
LG · Mar 4, 2019
Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space
Zhou Fan, Rui Su, Weinan Zhang et al.
In this paper we propose a hybrid architecture of actor-critic algorithms for reinforcement learning in parameterized action space, which consists of multiple parallel sub-actor networks to decompose the structured action space into simpler action spaces, along with a critic network to guide the training of all sub-actor networks. While this paper is mainly focused on parameterized action space, the proposed architecture, which we call hybrid actor-critic, can be extended to more general action spaces that have a hierarchical structure. We present an instance of the hybrid actor-critic architecture based on proximal policy optimization (PPO), which we refer to as hybrid proximal policy optimization (H-PPO). Our experiments test H-PPO on a collection of tasks with parameterized action space, where H-PPO demonstrates superior performance over previous methods of parameterized action reinforcement learning.
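A parameterized action pairs a discrete choice with continuous parameters for that choice, which is what makes decomposing it across sub-actors natural. The sketch below shows only the sampling step, assuming one sub-actor emits logits over the discrete actions and another emits Gaussian means for every action's parameters. The function name, the fixed standard deviation, and the toy numbers are assumptions for illustration; network definitions, the critic, and the PPO update are omitted, and this is not the paper's code.

```python
import math
import random

def sample_hybrid_action(discrete_logits, param_means, param_std, rng):
    """Sample a parameterized action (k, x_k).

    discrete_logits: one logit per discrete action (discrete sub-actor output).
    param_means: per-action lists of Gaussian means (continuous sub-actor output).
    Parameters are sampled for every discrete action, and the set belonging
    to the chosen action k is the one that would be executed.
    """
    # Softmax weights, shifted by the max logit for numerical stability.
    top = max(discrete_logits)
    weights = [math.exp(logit - top) for logit in discrete_logits]
    k = rng.choices(range(len(discrete_logits)), weights=weights)[0]
    all_params = [[rng.gauss(mu, param_std) for mu in means] for means in param_means]
    return k, all_params[k]

# Toy action space: action 0 takes one parameter, action 1 takes two.
rng = random.Random(0)
logits = [0.2, 1.1]
means = [[0.5], [0.1, -0.2]]
k, x_k = sample_hybrid_action(logits, means, param_std=0.1, rng=rng)
print("discrete action:", k, "parameters:", x_k)
```

Keeping the two outputs separate lets each sub-actor be trained with its own policy loss while a single critic evaluates the state.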