Zhe Su

CL
h-index49
25papers
1,055citations
Novelty54%
AI Score58

25 Papers

ASJun 5, 2022Code
Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

Ziyue Jiang, Zhe Su, Zhou Zhao et al. · cmu

Polyphone disambiguation aims to capture accurate pronunciation knowledge from natural text sequences for reliable Text-to-speech (TTS) systems. However, previous approaches require substantial annotated training data and additional efforts from language experts, making it difficult to extend high-quality neural TTS systems to out-of-domain daily conversations and countless languages worldwide. This paper tackles the polyphone disambiguation problem from a concise and novel perspective: we propose Dict-TTS, a semantic-aware generative text-to-speech model with an online website dictionary (the existing prior information in the natural language). Specifically, we design a semantics-to-pronunciation attention (S2PA) module to match the semantic patterns between the input text sequence and the prior semantics in the dictionary and obtain the corresponding pronunciations; The S2PA module can be easily trained with the end-to-end TTS model without any annotated phoneme labels. Experimental results in three languages show that our model outperforms several strong baseline models in terms of pronunciation accuracy and improves the prosody modeling of TTS systems. Further extensive analyses demonstrate that each design in Dict-TTS is effective. The code is available at \url{https://github.com/Zain-Jiang/Dict-TTS}.

LGOct 11, 2022
Knowledge-Driven New Drug Recommendation

Zhenbang Wu, Huaxiu Yao, Zhe Su et al. · cmu

Drug recommendation assists doctors in prescribing personalized medications to patients based on their health conditions. Existing drug recommendation solutions adopt the supervised multi-label classification setup and only work with existing drugs with sufficient prescription data from many patients. However, newly approved drugs do not have much historical prescription data and cannot leverage existing drug recommendation methods. To address this, we formulate the new drug recommendation as a few-shot learning problem. Yet, directly applying existing few-shot learning algorithms faces two challenges: (1) complex relations among diseases and drugs and (2) numerous false-negative patients who were eligible but did not yet use the new drugs. To tackle these challenges, we propose EDGE, which can quickly adapt to the recommendation for a new drug with limited prescription data from a few support patients. EDGE maintains a drug-dependent multi-phenotype few-shot learner to bridge the gap between existing and new drugs. Specifically, EDGE leverages the drug ontology to link new drugs to existing drugs with similar treatment effects and learns ontology-based drug representations. Such drug representations are used to customize the metric space of the phenotype-driven patient representations, which are composed of a set of phenotypes capturing complex patient health status. Lastly, EDGE eliminates the false-negative supervision signal using an external drug-disease knowledge base. We evaluate EDGE on two real-world datasets: the public EHR data (MIMIC-IV) and private industrial claims data. Results show that EDGE achieves 7.3% improvement on the ROC-AUC score over the best baseline.

AISep 13, 2024
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents

Zhe Su, Xuhui Zhou, Sanketh Rangreji et al. · allen-ai, cmu

Truthfulness (adherence to factual accuracy) and utility (satisfying human needs and instructions) are both fundamental aspects of Large Language Models, yet these goals often conflict (e.g., sell a car with known flaws), which makes it challenging to achieve both in real-world deployments. We propose AI-LieDar, a framework to study how LLM-based agents navigate these scenarios in an multi-turn interactive setting. We design a set of real-world scenarios where language agents are instructed to achieve goals that are in conflict with being truthful during a multi-turn conversation with simulated human agents. To evaluate the truthfulness at large scale, we develop a truthfulness detector inspired by psychological literature to assess the agents' responses. Our experiment demonstrates that all models are truthful less than 50% of the time, though truthfulness and goal achievement (utility) rates vary across models. We further test the steerability of LLMs towards truthfulness, finding that models can be directed to be truthful or deceptive, and even truth-steered models still lie. These findings reveal the complex nature of truthfulness in LLMs and underscore the importance of further research to ensure the safe and reliable deployment of LLMs and LLM-based agents.

99.2CLMar 11
GLM-OCR Technical Report

Shuaiqi Duan, Yadong Xue, Weihan Wang et al. · tsinghua

GLM-OCR is an efficient 0.9B-parameter compact multimodal model designed for real-world document understanding. It combines a 0.4B-parameter CogViT visual encoder with a 0.5B-parameter GLM language decoder, achieving a strong balance between computational efficiency and recognition performance. To address the inefficiency of standard autoregressive decoding in deterministic OCR tasks, GLM-OCR introduces a Multi-Token Prediction (MTP) mechanism that predicts multiple tokens per step, significantly improving decoding throughput while keeping memory overhead low through shared parameters. At the system level, a two-stage pipeline is adopted: PP-DocLayout-V3 first performs layout analysis, followed by parallel region-level recognition. Extensive evaluations on public benchmarks and industrial scenarios show that GLM-OCR achieves competitive or state-of-the-art performance in document parsing, text and formula transcription, table structure recovery, and key information extraction. Its compact architecture and structured generation make it suitable for both resource-constrained edge deployment and large-scale production systems.

53.2NEMay 2
Algorithm-hardware co-design of neuromorphic networks with dual memory pathways

Pengfei Sun, Zhe Su, Jascha Achterberg et al. · cambridge

Spiking neural networks excel at event-driven sensing. Yet, maintaining task-relevant context over long timescales both algorithmically and in hardware, while respecting both tight energy and memory budgets, remains a core challenge in the field. We address this challenge through an algorithm-hardware co-design effort. At the algorithm level, inspired by the cortical fast-slow organization in the brain, we introduce a neural network with an explicit slow memory pathway that, combined with fast spiking activity, enables a dual memory pathway (DMP) architecture in which each layer maintains a compact low-dimensional state that summarizes recent activity and modulates spiking dynamics. This explicit memory stabilizes learning while preserving event-driven sparsity, achieving competitive accuracy on long-sequence benchmarks with 40-60% fewer parameters than equivalent state-of-the-art spiking neural networks. At the hardware level, we introduce a near-memory-compute architecture that fully leverages the advantages of the DMP architecture by retaining its compact shared state while optimizing dataflow, across heterogeneous sparse-spike and dense-memory pathways. We show experimental results that demonstrate more than a 4X increase in throughput and over a 5X improvement in energy efficiency compared with state-of-the-art implementations. Together, these contributions demonstrate that biological principles can guide functional abstractions that are both algorithmically effective and hardware-efficient, establishing a scalable co-design framework for real-time neuromorphic computation and learning.

CVJul 1, 2025Code
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

GLM-V Team, Wenyi Hong, Wenmeng Yu et al.

We present GLM-4.1V-Thinking and GLM-4.5V, a family of vision-language models (VLMs) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document interpretation. In a comprehensive evaluation across 42 public benchmarks, GLM-4.5V achieves state-of-the-art performance on nearly all tasks among open-source models of similar size, and demonstrates competitive or even superior results compared to closed-source models such as Gemini-2.5-Flash on challenging tasks including Coding and GUI Agents. Meanwhile, the smaller GLM-4.1V-9B-Thinking remains highly competitive-achieving superior results to the much larger Qwen2.5-VL-72B on 29 benchmarks. We open-source both GLM-4.1V-9B-Thinking and GLM-4.5V. Code, models and more information are released at https://github.com/zai-org/GLM-V.

DGAug 1, 2024
Persistent de Rham-Hodge Laplacians in Eulerian representation for manifold topological learning

Zhe Su, Yiying Tong, Guo-Wei Wei

Recently, topological data analysis has become a trending topic in data science and engineering. However, the key technique of topological data analysis, i.e., persistent homology, is defined on point cloud data, which does not work directly for data on manifolds. Although earlier evolutionary de Rham-Hodge theory deals with data on manifolds, it is inconvenient for machine learning applications because of the numerical inconsistency caused by remeshing the involving manifolds in the Lagrangian representation. In this work, we introduce persistent de Rham-Hodge Laplacian, or persistent Hodge Laplacian (PHL) as an abbreviation, for manifold topological learning. Our PHLs are constructed in the Eulerian representation via structure-persevering Cartesian grids, avoiding the numerical inconsistency over the multiscale manifolds. To facilitate the manifold topological learning, we propose a persistent Hodge Laplacian learning algorithm for data on manifolds or volumetric data. As a proof-of-principle application of the proposed manifold topological learning model, we consider the prediction of protein-ligand binding affinities with two benchmark datasets. Our numerical experiments highlight the power and promise of the proposed method.

CVOct 20, 2025Code
Glyph: Scaling Context Windows via Visual-Text Compression

Jiale Cheng, Yusen Liu, Xinyu Zhang et al.

Large language models (LLMs) increasingly rely on long-context modeling for tasks such as document understanding, code analysis, and multi-step reasoning. However, scaling context windows to the million-token level brings prohibitive computational and memory costs, limiting the practicality of long-context LLMs. In this work, we take a different perspective-visual context scaling-to tackle this challenge. Instead of extending token-based sequences, we propose Glyph, a framework that renders long texts into images and processes them with vision-language models (VLMs). This approach substantially compresses textual input while preserving semantic information, and we further design an LLM-driven genetic search to identify optimal visual rendering configurations for balancing accuracy and compression. Through extensive experiments, we demonstrate that our method achieves 3-4x token compression while maintaining accuracy comparable to leading LLMs such as Qwen3-8B on various long-context benchmarks. This compression also leads to around 4x faster prefilling and decoding, and approximately 2x faster SFT training. Furthermore, under extreme compression, a 128K-context VLM could scale to handle 1M-token-level text tasks. In addition, the rendered text data benefits real-world multimodal tasks, such as document understanding. Our code and model are released at https://github.com/thu-coai/Glyph.

40.6LGMar 10
DendroNN: Dendrocentric Neural Networks for Energy-Efficient Classification of Event-Based Data

Jann Krausse, Zhe Su, Kyrus Mama et al.

Spatiotemporal information is at the core of diverse sensory processing and computational tasks. Feed-forward spiking neural networks can be used to solve these tasks while offering potential benefits in terms of energy efficiency by computing event-based. However, they have trouble decoding temporal information with high accuracy. Thus, they commonly resort to recurrence or delays to enhance their temporal computing ability which, however, bring downsides in terms of hardware-efficiency. In the brain, dendrites are computational powerhouses that just recently started to be acknowledged in such machine learning systems. In this work, we focus on a sequence detection mechanism present in branches of dendrites and translate it into a novel type of neural network by introducing a dendrocentric neural network, DendroNN. DendroNNs identify unique incoming spike sequences as spatiotemporal features. This work further introduces a rewiring phase to train the non-differentiable spike sequences without the use of gradients. During the rewiring, the network memorizes frequently occurring sequences and additionally discards those that do not contribute any discriminative information. The networks display competitive accuracies across various event-based time series datasets. We also propose an asynchronous digital hardware architecture using a time-wheel mechanism that builds on the event-driven design of DendroNNs, eliminating per-step global updates typical of delay- or recurrence-based models. By leveraging a DendroNN's dynamic and static sparsity along with intrinsic quantization, it achieves up to 4x higher efficiency than state-of-the-art neuromorphic hardware at comparable accuracy on the same audio classification task, demonstrating its suitability for spatiotemporal event-based computing. This work offers a novel approach to low-power spatiotemporal processing on event-driven hardware.

CLDec 18, 2024
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Frank F. Xu, Yufan Song, Boxuan Li et al. · cmu

We interact with computers on an everyday basis, be it in everyday life or work, and many aspects of work can be done entirely with access to a computer and the Internet. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. But how performant are AI agents at accelerating or even autonomously performing work-related tasks? The answer to this question has important implications both for industry looking to adopt AI into their workflows and for economic policy to understand the effects that adoption of AI may have on the labor market. To measure the progress of these LLM agents' performance on performing real-world professional tasks, in this paper we introduce TheAgentCompany, an extensible benchmark for evaluating AI agents that interact with the world in similar ways to those of a digital worker: by browsing the Web, writing code, running programs, and communicating with other coworkers. We build a self-contained environment with internal web sites and data that mimics a small software company environment, and create a variety of tasks that may be performed by workers in such a company. We test baseline agents powered by both closed API-based and open-weights language models (LMs), and find that the most competitive agent can complete 30% of tasks autonomously. This paints a nuanced picture on task automation with LM agents--in a setting simulating a real workplace, a good portion of simpler tasks could be solved autonomously, but more difficult long-horizon tasks are still beyond the reach of current systems. We release code, data, environment, and experiments on https://the-agent-company.com.

CLJan 26, 2025
Baichuan-Omni-1.5 Technical Report

Yadong Li, Jun Liu, Tao Zhang et al.

We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data (text, audio, and vision). Second, an audio-tokenizer (Baichuan-Audio-Tokenizer) has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLM. Lastly, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.

CLMar 8, 2024
Is this the real life? Is this just fantasy? The Misleading Success of Simulating Social Interactions With LLMs

Xuhui Zhou, Zhe Su, Tiwalayo Eisape et al. · allen-ai, cmu

Recent advances in large language models (LLM) have enabled richer social simulations, allowing for the study of various social phenomena. However, most recent work has used a more omniscient perspective on these simulations (e.g., single LLM to generate all interlocutors), which is fundamentally at odds with the non-omniscient, information asymmetric interactions that involve humans and AI agents in the real world. To examine these differences, we develop an evaluation framework to simulate social interactions with LLMs in various settings (omniscient, non-omniscient). Our experiments show that LLMs perform better in unrealistic, omniscient simulation settings but struggle in ones that more accurately reflect real-world conditions with information asymmetry. Our findings indicate that addressing information asymmetry remains a fundamental challenge for LLM-based agents.

ARDec 24, 2025
ElfCore: A 28nm Neural Processor Enabling Dynamic Structured Sparse Training and Online Self-Supervised Learning with Activity-Dependent Weight Update

Zhe Su, Giacomo Indiveri

In this paper, we present ElfCore, a 28nm digital spiking neural network processor tailored for event-driven sensory signal processing. ElfCore is the first to efficiently integrate: (1) a local online self-supervised learning engine that enables multi-layer temporal learning without labeled inputs; (2) a dynamic structured sparse training engine that supports high-accuracy sparse-to-sparse learning; and (3) an activity-dependent sparse weight update mechanism that selectively updates weights based solely on input activity and network dynamics. Demonstrated on tasks including gesture recognition, speech, and biomedical signal processing, ElfCore outperforms state-of-the-art solutions with up to 16X lower power consumption, 3.8X reduced on-chip memory requirements, and 5.9X greater network capacity efficiency.

65.6LGApr 29
Unsupervised Graph Modeling for Anomaly Detection in Accounting Subject Relationships

Yuhan Wang, Ruobing Yan, Zhe Su et al.

This paper addresses the problem of anomaly detection in accounting subject association structures, proposing a structured modeling and unsupervised discriminant framework based on graph neural networks. This framework is used to mine stable correspondences between subjects and identify structural deviations from general ledger details and voucher entries. The method first abstracts accounting subjects as graph nodes, and the co-occurrence and debit/credit correspondence of subjects in the same business record are abstracted as weighted edges. The edge weights are characterized by statistical measures such as co-occurrence frequency or amount aggregation, thus forming a period-level accounting subject association graph. In the representation learning stage, a message passing mechanism is used to fuse the node's own attributes and neighborhood context to obtain node embeddings containing structural information. In the anomaly detection stage, the rationality of subject pair connections is estimated through a relation reconstruction decoder, and edge-level anomaly scores are defined based on the degree of deviation in reconstruction probabilities. These scores are then aggregated to obtain node-level risk ranking and local anomaly localization. This framework can simultaneously capture local substructure anomalies and cross-community anomaly connections without relying on anomaly labeling, outputting traceable subject pair risk clues. Comparative experiments demonstrate more stable comprehensive discriminant capabilities and higher top-ranking accuracy.

CYApr 19, 2025
SOTOPIA-S4: a user-friendly system for flexible, customizable, and large-scale social simulation

Xuhui Zhou, Zhe Su, Sophie Feng et al. · allen-ai, cmu

Social simulation through large language model (LLM) agents is a promising approach to explore and validate hypotheses related to social science questions and LLM agents behavior. We present SOTOPIA-S4, a fast, flexible, and scalable social simulation system that addresses the technical barriers of current frameworks while enabling practitioners to generate multi-turn and multi-party LLM-based interactions with customizable evaluation metrics for hypothesis testing. SOTOPIA-S4 comes as a pip package that contains a simulation engine, an API server with flexible RESTful APIs for simulation management, and a web interface that enables both technical and non-technical users to design, run, and analyze simulations without programming. We demonstrate the usefulness of SOTOPIA-S4 with two use cases involving dyadic hiring negotiation and multi-party planning scenarios.

IVFeb 28, 2025
Manifold Topological Deep Learning for Biomedical Data

Xiang Liu, Zhe Su, Yongyi Shi et al.

Recently, topological deep learning (TDL), which integrates algebraic topology with deep neural networks, has achieved tremendous success in processing point-cloud data, emerging as a promising paradigm in data science. However, TDL has not been developed for data on differentiable manifolds, including images, due to the challenges posed by differential topology. We address this challenge by introducing manifold topological deep learning (MTDL) for the first time. To highlight the power of Hodge theory rooted in differential topology, we consider a simple convolutional neural network (CNN) in MTDL. In this novel framework, original images are represented as smooth manifolds with vector fields that are decomposed into three orthogonal components based on Hodge theory. These components are then concatenated to form an input image for the CNN architecture. The performance of MTDL is evaluated using the MedMNIST v2 benchmark database, which comprises 717,287 biomedical images from eleven 2D and six 3D datasets. MTDL significantly outperforms other competing methods, extending TDL to a wide range of data on smooth manifolds.

NEMay 22, 2024
EchoSpike Predictive Plasticity: An Online Local Learning Rule for Spiking Neural Networks

Lars Graf, Zhe Su, Giacomo Indiveri

The drive to develop artificial neural networks that efficiently utilize resources has generated significant interest in bio-inspired Spiking Neural Networks (SNNs). These networks are particularly attractive due to their potential in applications requiring low power and memory. This potential is further enhanced by the ability to perform online local learning, enabling them to adapt to dynamic environments. This requires the model to be adaptive in a self-supervised manner. While self-supervised learning has seen great success in many deep learning domains, its application for online local learning in multi-layer SNNs remains underexplored. In this paper, we introduce the "EchoSpike Predictive Plasticity" (ESPP) learning rule, a pioneering online local learning rule designed to leverage hierarchical temporal dynamics in SNNs through predictive and contrastive coding. We validate the effectiveness of this approach using benchmark datasets, demonstrating that it performs on par with current state-of-the-art supervised learning rules. The temporal and spatial locality of ESPP makes it particularly well-suited for low-cost neuromorphic processors, representing a significant advancement in developing biologically plausible self-supervised learning models for neuromorphic computing at the edge.

AIJun 19, 2025
Exploring Big Five Personality and AI Capability Effects in LLM-Simulated Negotiation Dialogues

Myke C. Cohen, Zhe Su, Hsien-Te Kao et al. · allen-ai, cmu

This paper presents an evaluation framework for agentic AI systems in mission-critical negotiation contexts, addressing the need for AI agents that can adapt to diverse human operators and stakeholders. Using Sotopia as a simulation testbed, we present two experiments that systematically evaluated how personality traits and AI agent characteristics influence LLM-simulated social negotiation outcomes--a capability essential for a variety of applications involving cross-team coordination and civil-military interactions. Experiment 1 employs causal discovery methods to measure how personality traits impact price bargaining negotiations, through which we found that Agreeableness and Extraversion significantly affect believability, goal achievement, and knowledge acquisition outcomes. Sociocognitive lexical measures extracted from team communications detected fine-grained differences in agents' empathic communication, moral foundations, and opinion patterns, providing actionable insights for agentic AI systems that must operate reliably in high-stakes operational scenarios. Experiment 2 evaluates human-AI job negotiations by manipulating both simulated human personality and AI system characteristics, specifically transparency, competence, adaptability, demonstrating how AI agent trustworthiness impact mission effectiveness. These findings establish a repeatable evaluation methodology for experimenting with AI agent reliability across diverse operator personalities and human-agent team dynamics, directly supporting operational requirements for reliable AI systems. Our work advances the evaluation of agentic AI workflows by moving beyond standard performance metrics to incorporate social dynamics essential for mission success in complex operations.

CLMay 25, 2023
Uncovering and Categorizing Social Biases in Text-to-SQL

Yan Liu, Yan Gao, Zhe Su et al.

Content Warning: This work contains examples that potentially implicate stereotypes, associations, and other harms that could be offensive to individuals in certain social groups.} Large pre-trained language models are acknowledged to carry social biases towards different demographics, which can further amplify existing stereotypes in our society and cause even more harm. Text-to-SQL is an important task, models of which are mainly adopted by administrative industries, where unfair decisions may lead to catastrophic consequences. However, existing Text-to-SQL models are trained on clean, neutral datasets, such as Spider and WikiSQL. This, to some extent, cover up social bias in models under ideal conditions, which nevertheless may emerge in real application scenarios. In this work, we aim to uncover and categorize social biases in Text-to-SQL models. We summarize the categories of social biases that may occur in structured data for Text-to-SQL models. We build test benchmarks and reveal that models with similar task accuracy can contain social biases at very different rates. We show how to take advantage of our methodology to uncover and assess social biases in the downstream Text-to-SQL task. We will release our code and data.

CLMay 24, 2023
Uncovering and Quantifying Social Biases in Code Generation

Yan Liu, Xiaokang Chen, Yan Gao et al.

With the popularity of automatic code generation tools, such as Copilot, the study of the potential hazards of these tools is gaining importance. In this work, we explore the social bias problem in pre-trained code generation models. We propose a new paradigm to construct code prompts and successfully uncover social biases in code generation models. To quantify the severity of social biases in generated code, we develop a dataset along with three metrics to evaluate the overall social bias and fine-grained unfairness across different demographics. Experimental results on three pre-trained code generation models (Codex, InCoder, and CodeGen) with varying sizes, reveal severe social biases. Moreover, we conduct analysis to provide useful insights for further choice of code generation models with low social bias. (This work contains examples that potentially implicate stereotypes, associations, and other harms that could be offensive to individuals in certain social groups.)

CVSep 20, 2021
Integrated Construction of Multimodal Atlases with Structural Connectomes in the Space of Riemannian Metrics

Kristen M. Campbell, Haocheng Dai, Zhe Su et al.

The structural network of the brain, or structural connectome, can be represented by fiber bundles generated by a variety of tractography methods. While such methods give qualitative insights into brain structure, there is controversy over whether they can provide quantitative information, especially at the population level. In order to enable population-level statistical analysis of the structural connectome, we propose representing a connectome as a Riemannian metric, which is a point on an infinite-dimensional manifold. We equip this manifold with the Ebin metric, a natural metric structure for this space, to get a Riemannian manifold along with its associated geometric properties. We then use this Riemannian framework to apply object-oriented statistical analysis to define an atlas as the Fréchet mean of a population of Riemannian metrics. This formulation ties into the existing framework for diffeomorphic construction of image atlases, allowing us to construct a multimodal atlas by simultaneously integrating complementary white matter structure details from DWMRI and cortical details from T1-weighted MRI. We illustrate our framework with 2D data examples of connectome registration and atlas formation. Finally, we build an example 3D multimodal atlas using T1 images and connectomes derived from diffusion tensors estimated from a subset of subjects from the Human Connectome Project.

CVMar 9, 2021
Structural Connectome Atlas Construction in the Space of Riemannian Metrics

Kristen M. Campbell, Haocheng Dai, Zhe Su et al.

The structural connectome is often represented by fiber bundles generated from various types of tractography. We propose a method of analyzing connectomes by representing them as a Riemannian metric, thereby viewing them as points in an infinite-dimensional manifold. After equipping this space with a natural metric structure, the Ebin metric, we apply object-oriented statistical analysis to define an atlas as the Fréchet mean of a population of Riemannian metrics. We demonstrate connectome registration and atlas formation using connectomes derived from diffusion tensors estimated from a subset of subjects from the Human Connectome Project.

ROJun 29, 2020
Supervised Learning and Reinforcement Learning of Feedback Models for Reactive Behaviors: Tactile Feedback Testbed

Giovanni Sutanto, Katharina Rombach, Yevgen Chebotar et al.

Robots need to be able to adapt to unexpected changes in the environment such that they can autonomously succeed in their tasks. However, hand-designing feedback models for adaptation is tedious, if at all possible, making data-driven methods a promising alternative. In this paper we introduce a full framework for learning feedback models for reactive motion planning. Our pipeline starts by segmenting demonstrations of a complete task into motion primitives via a semi-automated segmentation algorithm. Then, given additional demonstrations of successful adaptation behaviors, we learn initial feedback models through learning from demonstrations. In the final phase, a sample-efficient reinforcement learning algorithm fine-tunes these feedback models for novel task settings through few real system interactions. We evaluate our approach on a real anthropomorphic robot in learning a tactile feedback task.

RONov 8, 2018
Learning Latent Space Dynamics for Tactile Servoing

Giovanni Sutanto, Nathan Ratliff, Balakumar Sundaralingam et al.

To achieve a dexterous robotic manipulation, we need to endow our robot with tactile feedback capability, i.e. the ability to drive action based on tactile sensing. In this paper, we specifically address the challenge of tactile servoing, i.e. given the current tactile sensing and a target/goal tactile sensing --memorized from a successful task execution in the past-- what is the action that will bring the current tactile sensing to move closer towards the target tactile sensing at the next time step. We develop a data-driven approach to acquire a dynamics model for tactile servoing by learning from demonstration. Moreover, our method represents the tactile sensing information as to lie on a surface --or a 2D manifold-- and perform a manifold learning, making it applicable to any tactile skin geometry. We evaluate our method on a contact point tracking task using a robot equipped with a tactile finger. A video demonstrating our approach can be seen in https://youtu.be/0QK0-Vx7WkI

ROOct 24, 2017
Learning Sensor Feedback Models from Demonstrations via Phase-Modulated Neural Networks

Giovanni Sutanto, Zhe Su, Stefan Schaal et al.

In order to robustly execute a task under environmental uncertainty, a robot needs to be able to reactively adapt to changes arising in its environment. The environment changes are usually reflected in deviation from expected sensory traces. These deviations in sensory traces can be used to drive the motion adaptation, and for this purpose, a feedback model is required. The feedback model maps the deviations in sensory traces to the motion plan adaptation. In this paper, we develop a general data-driven framework for learning a feedback model from demonstrations. We utilize a variant of a radial basis function network structure --with movement phases as kernel centers-- which can generally be applied to represent any feedback models for movement primitives. To demonstrate the effectiveness of our framework, we test it on the task of scraping on a tilt board. In this task, we are learning a reactive policy in the form of orientation adaptation, based on deviations of tactile sensor traces. As a proof of concept of our method, we provide evaluations on an anthropomorphic robot. A video demonstrating our approach and its results can be seen in https://youtu.be/7Dx5imy1Kcw