CVMar 21, 2023
BigSmall: Efficient Multi-Task Learning for Disparate Spatial and Temporal Physiological MeasurementsGirish Narayanswamy, Yujia Liu, Yuzhe Yang et al.
Understanding of human visual perception has historically inspired the design of computer vision architectures. As an example, perception occurs at different scales both spatially and temporally, suggesting that the extraction of salient visual information may be made more effective by paying attention to specific features at varying scales. Visual changes in the body due to physiological processes also occur at different scales and with modality-specific characteristic properties. Inspired by this, we present BigSmall, an efficient architecture for physiological and behavioral measurement. We present the first joint camera-based facial action, cardiac, and pulmonary measurement model. We propose a multi-branch network with wrapping temporal shift modules that yields both accuracy and efficiency gains. We observe that fusing low-level features leads to suboptimal performance, but that fusing high level features enables efficiency gains with negligible loss in accuracy. Experimental results demonstrate that BigSmall significantly reduces the computational costs. Furthermore, compared to existing task-specific models, BigSmall achieves comparable or better results on multiple physiological measurement tasks simultaneously with a unified model.
CLSep 28, 2023
MindShift: Leveraging Large Language Models for Mental-States-Based Problematic Smartphone Use InterventionRuolan Wu, Chun Yu, Xiaole Pan et al.
Problematic smartphone use negatively affects physical and mental health. Despite the wide range of prior research, existing persuasive techniques are not flexible enough to provide dynamic persuasion content based on users' physical contexts and mental states. We first conducted a Wizard-of-Oz study (N=12) and an interview study (N=10) to summarize the mental states behind problematic smartphone use: boredom, stress, and inertia. This informs our design of four persuasion strategies: understanding, comforting, evoking, and scaffolding habits. We leveraged large language models (LLMs) to enable the automatic and dynamic generation of effective persuasion content. We developed MindShift, a novel LLM-powered problematic smartphone use intervention technique. MindShift takes users' in-the-moment app usage behaviors, physical contexts, mental states, goals \& habits as input, and generates personalized and dynamic persuasive content with appropriate persuasion strategies. We conducted a 5-week field experiment (N=25) to compare MindShift with its simplified version (remove mental states) and baseline techniques (fixed reminder). The results show that MindShift improves intervention acceptance rates by 4.7-22.5% and reduces smartphone usage duration by 7.4-9.8%. Moreover, users have a significant drop in smartphone addiction scale scores and a rise in self-efficacy scale scores. Our study sheds light on the potential of leveraging LLMs for context-aware persuasion in other behavior change domains.
NEFeb 1, 2024Code
LM-HT SNN: Enhancing the Performance of SNN to ANN Counterpart through Learnable Multi-hierarchical Threshold ModelZecheng Hao, Xinyu Shi, Yujia Liu et al. · pku
Compared to traditional Artificial Neural Network (ANN), Spiking Neural Network (SNN) has garnered widespread academic interest for its intrinsic ability to transmit information in a more energy-efficient manner. However, despite previous efforts to optimize the learning algorithm of SNNs through various methods, SNNs still lag behind ANNs in terms of performance. The recently proposed multi-threshold model provides more possibilities for further enhancing the learning capability of SNNs. In this paper, we rigorously analyze the relationship among the multi-threshold model, vanilla spiking model and quantized ANNs from a mathematical perspective, then propose a novel LM-HT model, which is an equidistant multi-threshold model that can dynamically regulate the global input current and membrane potential leakage on the time dimension. The LM-HT model can also be transformed into a vanilla single threshold model through reparameterization, thereby achieving more flexible hardware deployment. In addition, we note that the LM-HT model can seamlessly integrate with ANN-SNN Conversion framework under special initialization. This novel hybrid learning framework can effectively improve the relatively poor performance of converted SNNs under low time latency. Extensive experimental results have demonstrated that our model can outperform previous state-of-the-art works on various types of datasets, which promote SNNs to achieve a brand-new level of performance comparable to quantized ANNs. Code is available at https://github.com/hzc1208/LMHT_SNN.
55.4CRMay 13
EBCC: Enclave-Backed Confidential Containers via OCI-Compatible Runtime IntegrationDi Lu, Qingwen Zhang, Yujia Liu et al.
Container runtimes provide a stable operational interface for deploying, monitoring, and controlling modern workloads, while trusted execution environments (TEEs) provide hardware-enforced isolation for sensitive computation. Existing confidential-container systems often rely on VM-backed deployment stacks or TEE-specific execution substrates, which can separate confidential execution from the conventional OCI runtime lifecycle. This paper presents EBCC (Enclave-Backed Confidential Containers), an OCI-compatible runtime architecture for managing composite confidential-computing workloads. EBCC treats the REE-side anchor and TEE-side confidential stages as a single containerized confidential-computing composite, preserves standard OCI lifecycle operations, and keeps TEE-specific execution behind a backend adapter. It also maintains persistent per-instance state and per-stage artifacts for request handling, response generation, logging, and evidence binding. We implement EBCC on a Keystone backend and evaluate its correctness, performance, footprint, and concurrent execution behavior. The results show that EBCC introduces additional latency over native Keystone execution, mainly due to lifecycle mediation, request validation, EID allocation, backend dispatch, and artifact persistence, while keeping the added footprint concentrated on host-side management state. Cross-TEE case studies on SGX, TDX, and OP-TEE show that the same lifecycle and stage abstraction can be mapped to enclave-style, VM-style, and embedded-style TEEs. These results indicate that EBCC can make TEE-backed execution manageable through an OCI-style lifecycle without materially enlarging the protected-side TCB.
CLFeb 26
TCM-DiffRAG: Personalized Syndrome Differentiation Reasoning Method for Traditional Chinese Medicine based on Knowledge Graph and Chain of ThoughtJianmin Li, Ying Chang, Su-Kit Tang et al.
Background: Retrieval augmented generation (RAG) technology can empower large language models (LLMs) to generate more accurate, professional, and timely responses without fine tuning. However, due to the complex reasoning processes and substantial individual differences involved in traditional Chinese medicine (TCM) clinical diagnosis and treatment, traditional RAG methods often exhibit poor performance in this domain. Objective: To address the limitations of conventional RAG approaches in TCM applications, this study aims to develop an improved RAG framework tailored to the characteristics of TCM reasoning. Methods: We developed TCM-DiffRAG, an innovative RAG framework that integrates knowledge graphs (KG) with chains of thought (CoT). TCM-DiffRAG was evaluated on three distinctive TCM test datasets. Results: The experimental results demonstrated that TCM-DiffRAG achieved significant performance improvements over native LLMs. For example, the qwen-plus model achieved scores of 0.927, 0.361, and 0.038, which were significantly enhanced to 0.952, 0.788, and 0.356 with TCM-DiffRAG. The improvements were even more pronounced for non-Chinese LLMs. Additionally, TCM-DiffRAG outperformed directly supervised fine-tuned (SFT) LLMs and other benchmark RAG methods. Conclusions: TCM-DiffRAG shows that integrating structured TCM knowledge graphs with Chain of Thought based reasoning substantially improves performance in individualized diagnostic tasks. The joint use of universal and personalized knowledge graphs enables effective alignment between general knowledge and clinical reasoning. These results highlight the potential of reasoning-aware RAG frameworks for advancing LLM applications in traditional Chinese medicine.
CYFeb 22, 2024
Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical ImagingYuzhe Yang, Yujia Liu, Xin Liu et al.
Advances in artificial intelligence (AI) have achieved expert-level performance in medical imaging applications. Notably, self-supervised vision-language foundation models can detect a broad spectrum of pathologies without relying on explicit training annotations. However, it is crucial to ensure that these AI models do not mirror or amplify human biases, thereby disadvantaging historically marginalized groups such as females or Black patients. The manifestation of such biases could systematically delay essential medical care for certain patient subgroups. In this study, we investigate the algorithmic fairness of state-of-the-art vision-language foundation models in chest X-ray diagnosis across five globally-sourced datasets. Our findings reveal that compared to board-certified radiologists, these foundation models consistently underdiagnose marginalized groups, with even higher rates seen in intersectional subgroups, such as Black female patients. Such demographic biases present over a wide range of pathologies and demographic attributes. Further analysis of the model embedding uncovers its significant encoding of demographic information. Deploying AI systems with these biases in medical imaging can intensify pre-existing care disparities, posing potential challenges to equitable healthcare access and raising ethical questions about their clinical application.
CVDec 7, 2023
Point2CAD: Reverse Engineering CAD Models from 3D Point CloudsYujia Liu, Anton Obukhov, Jan Dirk Wegner et al.
Computer-Aided Design (CAD) model reconstruction from point clouds is an important problem at the intersection of computer vision, graphics, and machine learning; it saves the designer significant time when iterating on in-the-wild objects. Recent advancements in this direction achieve relatively reliable semantic segmentation but still struggle to produce an adequate topology of the CAD model. In this work, we analyze the current state of the art for that ill-posed task and identify shortcomings of existing methods. We propose a hybrid analytic-neural reconstruction scheme that bridges the gap between segmented point clouds and structured CAD models and can be readily combined with different segmentation backbones. Moreover, to power the surface fitting stage, we propose a novel implicit neural representation of freeform surfaces, driving up the performance of our overall CAD reconstruction scheme. We extensively evaluate our method on the popular ABC benchmark of CAD models and set a new state-of-the-art for that dataset. Project page: https://www.obukhov.ai/point2cad}{https://www.obukhov.ai/point2cad.
CVJan 10, 2024
Exploring Vulnerabilities of No-Reference Image Quality Assessment Models: A Query-Based Black-Box MethodChenxi Yang, Yujia Liu, Dingquan Li et al.
No-Reference Image Quality Assessment (NR-IQA) aims to predict image quality scores consistent with human perception without relying on pristine reference images, serving as a crucial component in various visual tasks. Ensuring the robustness of NR-IQA methods is vital for reliable comparisons of different image processing techniques and consistent user experiences in recommendations. The attack methods for NR-IQA provide a powerful instrument to test the robustness of NR-IQA. However, current attack methods of NR-IQA heavily rely on the gradient of the NR-IQA model, leading to limitations when the gradient information is unavailable. In this paper, we present a pioneering query-based black box attack against NR-IQA methods. We propose the concept of score boundary and leverage an adaptive iterative approach with multiple score boundaries. Meanwhile, the initial attack directions are also designed to leverage the characteristics of the Human Visual System (HVS). Experiments show our method outperforms all compared state-of-the-art attack methods and is far ahead of previous black-box methods. The effective NR-IQA model DBCNN suffers a Spearman's rank-order correlation coefficient (SROCC) decline of 0.6381 attacked by our method, revealing the vulnerability of NR-IQA models to black-box attacks. The proposed attack method also provides a potent tool for further exploration into NR-IQA robustness.
CVMar 18, 2024
Defense Against Adversarial Attacks on No-Reference Image Quality Models with Gradient Norm RegularizationYujia Liu, Chenxi Yang, Dingquan Li et al.
The task of No-Reference Image Quality Assessment (NR-IQA) is to estimate the quality score of an input image without additional information. NR-IQA models play a crucial role in the media industry, aiding in performance evaluation and optimization guidance. However, these models are found to be vulnerable to adversarial attacks, which introduce imperceptible perturbations to input images, resulting in significant changes in predicted scores. In this paper, we propose a defense method to improve the stability in predicted scores when attacked by small perturbations, thus enhancing the adversarial robustness of NR-IQA models. To be specific, we present theoretical evidence showing that the magnitude of score changes is related to the $\ell_1$ norm of the model's gradient with respect to the input image. Building upon this theoretical foundation, we propose a norm regularization training strategy aimed at reducing the $\ell_1$ norm of the gradient, thereby boosting the robustness of NR-IQA models. Experiments conducted on four NR-IQA baseline models demonstrate the effectiveness of our strategy in reducing score changes in the presence of adversarial attacks. To the best of our knowledge, this work marks the first attempt to defend against adversarial attacks on NR-IQA models. Our study offers valuable insights into the adversarial robustness of NR-IQA models and provides a foundation for future research in this area.
CVMar 4, 2024
Point2Building: Reconstructing Buildings from Airborne LiDAR Point CloudsYujia Liu, Anton Obukhov, Jan Dirk Wegner et al.
We present a learning-based approach to reconstruct buildings as 3D polygonal meshes from airborne LiDAR point clouds. What makes 3D building reconstruction from airborne LiDAR hard is the large diversity of building designs and especially roof shapes, the low and varying point density across the scene, and the often incomplete coverage of building facades due to occlusions by vegetation or to the viewing angle of the sensor. To cope with the diversity of shapes and inhomogeneous and incomplete object coverage, we introduce a generative model that directly predicts 3D polygonal meshes from input point clouds. Our autoregressive model, called Point2Building, iteratively builds up the mesh by generating sequences of vertices and faces. This approach enables our model to adapt flexibly to diverse geometries and building structures. Unlike many existing methods that rely heavily on pre-processing steps like exhaustive plane detection, our model learns directly from the point cloud data, thereby reducing error propagation and increasing the fidelity of the reconstruction. We experimentally validate our method on a collection of airborne LiDAR data of Zurich, Berlin and Tallinn. Our method shows good generalization to diverse urban styles.
IVApr 20, 2024
Beyond Score Changes: Adversarial Attack on No-Reference Image Quality Assessment from Two PerspectivesChenxi Yang, Yujia Liu, Dingquan Li et al.
Deep neural networks have demonstrated impressive success in No-Reference Image Quality Assessment (NR-IQA). However, recent researches highlight the vulnerability of NR-IQA models to subtle adversarial perturbations, leading to inconsistencies between model predictions and subjective ratings. Current adversarial attacks, however, focus on perturbing predicted scores of individual images, neglecting the crucial aspect of inter-score correlation relationships within an entire image set. Meanwhile, it is important to note that the correlation, like ranking correlation, plays a significant role in NR-IQA tasks. To comprehensively explore the robustness of NR-IQA models, we introduce a new framework of correlation-error-based attacks that perturb both the correlation within an image set and score changes on individual images. Our research primarily focuses on ranking-related correlation metrics like Spearman's Rank-Order Correlation Coefficient (SROCC) and prediction error-related metrics like Mean Squared Error (MSE). As an instantiation, we propose a practical two-stage SROCC-MSE-Attack (SMA) that initially optimizes target attack scores for the entire image set and then generates adversarial examples guided by these scores. Experimental results demonstrate that our SMA method not only significantly disrupts the SROCC to negative values but also maintains a considerable change in the scores of individual images. Meanwhile, it exhibits state-of-the-art performance across metrics with different categories. Our method provides a new perspective on the robustness of NR-IQA models.
83.1HCMar 13
How GenAI Mentor Configurations Shape Early Collaborative Dynamics: A Classroom Comparison of Individual and Shared AgentsSiyu Zha, Weijing Liu, Fei Qin et al.
Generative artificial intelligence (GenAI) is increasingly embedded in computer-supported collaborative learning (CSCL), yet little empirical research has unpacked how different configurations of AI participation reshape collaborative processes. This study investigates how GenAI configuration shapes collaborative regulation in authentic classroom settings. Two eighth-grade classes engaged in small-group creative problem-solving under two conditions: a shared-AI configuration, in which each group interacted with a single AI mentor, and an individual-AI configuration, in which each student accessed a personal AI instance. Using multi-layer discourse coding combined with lag sequential analysis (LSA) and ordered network analysis (ONA), we examined interaction distribution, AI-student coupling, shared regulation processes, and teacher orchestration. Results reveal distinct regulatory dynamics across configurations. Shared AI access promoted convergence-oriented collaboration, with stronger alignment of shared regulatory states and more coordinated group-level reasoning. In contrast, individual AI access distributed support across learners, producing more exploratory and evaluative cycles but also more fragmented interaction patterns, accompanied by increased teacher intervention to manage divergence. These findings suggest that AI configuration functions as a structural design variable that reorganizes the regulatory ecology of classroom collaboration.
AINov 16, 2025
Multi-agent Self-triage System with Medical FlowchartsYujia Liu, Sophia Yu, Hongyue Jin et al.
Online health resources and large language models (LLMs) are increasingly used as a first point of contact for medical decision-making, yet their reliability in healthcare remains limited by low accuracy, lack of transparency, and susceptibility to unverified information. We introduce a proof-of-concept conversational self-triage system that guides LLMs with 100 clinically validated flowcharts from the American Medical Association, providing a structured and auditable framework for patient decision support. The system leverages a multi-agent framework consisting of a retrieval agent, a decision agent, and a chat agent to identify the most relevant flowchart, interpret patient responses, and deliver personalized, patient-friendly recommendations, respectively. Performance was evaluated at scale using synthetic datasets of simulated conversations. The system achieved 95.29% top-3 accuracy in flowchart retrieval (N=2,000) and 99.10% accuracy in flowchart navigation across varied conversational styles and conditions (N=37,200). By combining the flexibility of free-text interaction with the rigor of standardized clinical protocols, this approach demonstrates the feasibility of transparent, accurate, and generalizable AI-assisted self-triage, with potential to support informed patient decision-making while improving healthcare resource utilization.
CVSep 23, 2025
SEGA: A Transferable Signed Ensemble Gaussian Black-Box Attack against No-Reference Image Quality Assessment ModelsYujia Liu, Dingquan Li, Tiejun Huang
No-Reference Image Quality Assessment (NR-IQA) models play an important role in various real-world applications. Recently, adversarial attacks against NR-IQA models have attracted increasing attention, as they provide valuable insights for revealing model vulnerabilities and guiding robust system design. Some effective attacks have been proposed against NR-IQA models in white-box settings, where the attacker has full access to the target model. However, these attacks often suffer from poor transferability to unknown target models in more realistic black-box scenarios, where the target model is inaccessible. This work makes the first attempt to address the challenge of low transferability in attacking NR-IQA models by proposing a transferable Signed Ensemble Gaussian black-box Attack (SEGA). The main idea is to approximate the gradient of the target model by applying Gaussian smoothing to source models and ensembling their smoothed gradients. To ensure the imperceptibility of adversarial perturbations, SEGA further removes inappropriate perturbations using a specially designed perturbation filter mask. Experimental results on the CLIVE dataset demonstrate the superior transferability of SEGA, validating its effectiveness in enabling successful transfer-based black-box attacks against NR-IQA models.
CVAug 3, 2025
Single Point, Full Mask: Velocity-Guided Level Set Evolution for End-to-End Amodal SegmentationZhixuan Li, Yujia Liu, Chen Hui et al.
Amodal segmentation aims to recover complete object shapes, including occluded regions with no visual appearance, whereas conventional segmentation focuses solely on visible areas. Existing methods typically rely on strong prompts, such as visible masks or bounding boxes, which are costly or impractical to obtain in real-world settings. While recent approaches such as the Segment Anything Model (SAM) support point-based prompts for guidance, they often perform direct mask regression without explicitly modeling shape evolution, limiting generalization in complex occlusion scenarios. Moreover, most existing methods suffer from a black-box nature, lacking geometric interpretability and offering limited insight into how occluded shapes are inferred. To deal with these limitations, we propose VELA, an end-to-end VElocity-driven Level-set Amodal segmentation method that performs explicit contour evolution from point-based prompts. VELA first constructs an initial level set function from image features and the point input, which then progressively evolves into the final amodal mask under the guidance of a shape-specific motion field predicted by a fully differentiable network. This network learns to generate evolution dynamics at each step, enabling geometrically grounded and topologically flexible contour modeling. Extensive experiments on COCOA-cls, D2SA, and KINS benchmarks demonstrate that VELA outperforms existing strongly prompted methods while requiring only a single-point prompt, validating the effectiveness of interpretable geometric modeling under weak guidance. The code will be publicly released.
CVAug 3, 2025
Shape Distribution Matters: Shape-specific Mixture-of-Experts for Amodal Segmentation under Diverse OcclusionsZhixuan Li, Yujia Liu, Chen Hui et al.
Amodal segmentation targets to predict complete object masks, covering both visible and occluded regions. This task poses significant challenges due to complex occlusions and extreme shape variation, from rigid furniture to highly deformable clothing. Existing one-size-fits-all approaches rely on a single model to handle all shape types, struggling to capture and reason about diverse amodal shapes due to limited representation capacity. A natural solution is to adopt a Mixture-of-Experts (MoE) framework, assigning experts to different shape patterns. However, naively applying MoE without considering the object's underlying shape distribution can lead to mismatched expert routing and insufficient expert specialization, resulting in redundant or underutilized experts. To deal with these issues, we introduce ShapeMoE, a shape-specific sparse Mixture-of-Experts framework for amodal segmentation. The key idea is to learn a latent shape distribution space and dynamically route each object to a lightweight expert tailored to its shape characteristics. Specifically, ShapeMoE encodes each object into a compact Gaussian embedding that captures key shape characteristics. A Shape-Aware Sparse Router then maps the object to the most suitable expert, enabling precise and efficient shape-aware expert routing. Each expert is designed as lightweight and specialized in predicting occluded regions for specific shape patterns. ShapeMoE offers well interpretability via clear shape-to-expert correspondence, while maintaining high capacity and efficiency. Experiments on COCOA-cls, KINS, and D2SA show that ShapeMoE consistently outperforms state-of-the-art methods, especially in occluded region segmentation. The code will be released.
CLAug 9, 2021
The HW-TSC's Offline Speech Translation Systems for IWSLT 2021 EvaluationMinghan Wang, Yuxia Wang, Chang Su et al.
This paper describes our work in participation of the IWSLT-2021 offline speech translation task. Our system was built in a cascade form, including a speaker diarization module, an Automatic Speech Recognition (ASR) module and a Machine Translation (MT) module. We directly use the LIUM SpkDiarization tool as the diarization module. The ASR module is trained with three ASR datasets from different sources, by multi-source training, using a modified Transformer encoder. The MT module is pretrained on the large-scale WMT news translation dataset and fine-tuned on the TED corpus. Our method achieves 24.6 BLEU score on the 2021 test set.
CVMar 4, 2021
PC2WF: 3D Wireframe Reconstruction from Raw Point CloudsYujia Liu, Stefano D'Aronco, Konrad Schindler et al.
We introduce PC2WF, the first end-to-end trainable deep network architecture to convert a 3D point cloud into a wireframe model. The network takes as input an unordered set of 3D points sampled from the surface of some object, and outputs a wireframe of that object, i.e., a sparse set of corner points linked by line segments. Recovering the wireframe is a challenging task, where the numbers of both vertices and edges are different for every instance, and a-priori unknown. Our architecture gradually builds up the model: It starts by encoding the points into feature vectors. Based on those features, it identifies a pool of candidate vertices, then prunes those candidates to a final set of corner vertices and refines their locations. Next, the corners are linked with an exhaustive set of candidate edges, which is again pruned to obtain the final wireframe. All steps are trainable, and errors can be backpropagated through the entire sequence. We validate the proposed model on a publicly available synthetic dataset, for which the ground truth wireframes are accessible, as well as on a new real-world dataset. Our model produces wireframe abstractions of good quality and outperforms several baselines.
ROFeb 29, 2020
Rigid-Soft Interactive Learning for Robust GraspingLinhan Yang, Fang Wan, Haokun Wang et al.
Inspired by widely used soft fingers on grasping, we propose a method of rigid-soft interactive learning, aiming at reducing the time of data collection. In this paper, we classify the interaction categories into Rigid-Rigid, Rigid-Soft, Soft-Rigid according to the interaction surface between grippers and target objects. We find experimental evidence that the interaction types between grippers and target objects play an essential role in the learning methods. We use soft, stuffed toys for training, instead of everyday objects, to reduce the integration complexity and computational burden and exploit such rigid-soft interaction by changing the gripper fingers to the soft ones when dealing with rigid, daily-life items such as the Yale-CMU-Berkeley (YCB) objects. With a small data collection of 5K picking attempts in total, our results suggest that such Rigid-Soft and Soft-Rigid interactions are transferable. Moreover, the combination of different grasp types shows better performance on the grasping test. We achieve the best grasping performance at 97.5\% for easy YCB objects and 81.3\% for difficult YCB objects while using a precise grasp with a two-soft-finger gripper to collect training data and power grasp with a four-soft-finger gripper to test.
ROFeb 29, 2020
Scalable Tactile Sensing for an Omni-adaptive Soft Robot FingerZeyi Yang, Sheng Ge, Fang Wan et al.
Robotic fingers made of soft material and compliant structures usually lead to superior adaptation when interacting with the unstructured physical environment. In this paper, we present an embedded sensing solution using optical fibers for an omni-adaptive soft robotic finger with exceptional adaptation in all directions. In particular, we managed to insert a pair of optical fibers inside the finger's structural cavity without interfering with its adaptive performance. The resultant integration is scalable as a versatile, low-cost, and moisture-proof solution for physically safe human-robot interaction. In addition, we experimented with our finger design for an object sorting task and identified sectional diameters of 94\% objects within the $\pm$6mm error and measured 80\% of the structural strains within $\pm$0.1mm/mm error. The proposed sensor design opens many doors in future applications of soft robotics for scalable and adaptive physical interactions in the unstructured environment.
ROFeb 29, 2020
Reconfigurable Design for Omni-adaptive Grasp LearningFang Wan, Haokun Wang, Jiyuan Wu et al.
The engineering design of robotic grippers presents an ample design space for optimization towards robust grasping. In this paper, we adopt the reconfigurable design of the robotic gripper using a novel soft finger structure with omni-directional adaptation, which generates a large number of possible gripper configurations by rearranging these fingers. Such reconfigurable design with these omni-adaptive fingers enables us to systematically investigate the optimal arrangement of the fingers towards robust grasping. Furthermore, we adopt a learning-based method as the baseline to benchmark the effectiveness of each design configuration. As a result, we found that a 3-finger and 4-finger radial configuration is the most effective one achieving an average 96\% grasp success rate on seen and novel objects selected from the YCB dataset. We also discussed the influence of the frictional surface on the finger to improve the grasp robustness.
LGMar 26, 2019
A geometry-inspired decision-based attackYujia Liu, Seyed-Mohsen Moosavi-Dezfooli, Pascal Frossard
Deep neural networks have recently achieved tremendous success in image classification. Recent studies have however shown that they are easily misled into incorrect classification decisions by adversarial examples. Adversaries can even craft attacks by querying the model in black-box settings, where no information about the model is released except its final decision. Such decision-based attacks usually require lots of queries, while real-world image recognition systems might actually restrict the number of queries. In this paper, we propose qFool, a novel decision-based attack algorithm that can generate adversarial examples using a small number of queries. The qFool method can drastically reduce the number of queries compared to previous decision-based attacks while reaching the same quality of adversarial examples. We also enhance our method by constraining adversarial perturbations in low-frequency subspace, which can make qFool even more computationally efficient. Altogether, we manage to fool commercial image recognition systems with a small number of queries, which demonstrates the actual effectiveness of our new algorithm in practice.
CVJun 21, 2018
Detection based Defense against Adversarial Examples from the Steganalysis Point of ViewJiayang Liu, Weiming Zhang, Yiwei Zhang et al.
Deep Neural Networks (DNNs) have recently led to significant improvements in many fields. However, DNNs are vulnerable to adversarial examples which are samples with imperceptible perturbations while dramatically misleading the DNNs. Moreover, adversarial examples can be used to perform an attack on various kinds of DNN based systems, even if the adversary has no access to the underlying model. Many defense methods have been proposed, such as obfuscating gradients of the networks or detecting adversarial examples. However it is proved out that these defense methods are not effective or cannot resist secondary adversarial attacks. In this paper, we point out that steganalysis can be applied to adversarial examples detection, and propose a method to enhance steganalysis features by estimating the probability of modifications caused by adversarial attacks. Experimental results show that the proposed method can accurately detect adversarial examples. Moreover, secondary adversarial attacks cannot be directly performed to our method because our method is not based on a neural network but based on high-dimensional artificial features and FLD (Fisher Linear Discriminant) ensemble.
CVNov 16, 2017
Enhanced Attacks on Defensively Distilled Deep Neural NetworksYujia Liu, Weiming Zhang, Shaohua Li et al.
Deep neural networks (DNNs) have achieved tremendous success in many tasks of machine learning, such as the image classification. Unfortunately, researchers have shown that DNNs are easily attacked by adversarial examples, slightly perturbed images which can mislead DNNs to give incorrect classification results. Such attack has seriously hampered the deployment of DNN systems in areas where security or safety requirements are strict, such as autonomous cars, face recognition, malware detection. Defensive distillation is a mechanism aimed at training a robust DNN which significantly reduces the effectiveness of adversarial examples generation. However, the state-of-the-art attack can be successful on distilled networks with 100% probability. But it is a white-box attack which needs to know the inner information of DNN. Whereas, the black-box scenario is more general. In this paper, we first propose the epsilon-neighborhood attack, which can fool the defensively distilled networks with 100% success rate in the white-box setting, and it is fast to generate adversarial examples with good visual quality. On the basis of this attack, we further propose the region-based attack against defensively distilled DNNs in the black-box setting. And we also perform the bypass attack to indirectly break the distillation defense as a complementary method. The experimental results show that our black-box attacks have a considerable success rate on defensively distilled networks.
MLMay 25, 2015
Clustering via Content-Augmented Stochastic BlockmodelsJ. Massey Cashore, Xiaoting Zhao, Alexander A. Alemi et al.
Much of the data being created on the web contains interactions between users and items. Stochastic blockmodels, and other methods for community detection and clustering of bipartite graphs, can infer latent user communities and latent item clusters from this interaction data. These methods, however, typically ignore the items' contents and the information they provide about item clusters, despite the tendency of items in the same latent cluster to share commonalities in content. We introduce content-augmented stochastic blockmodels (CASB), which use item content together with user-item interaction data to enhance the user communities and item clusters learned. Comparisons to several state-of-the-art benchmark methods, on datasets arising from scientists interacting with scientific articles, show that content-augmented stochastic blockmodels provide highly accurate clusters with respect to metrics representative of the underlying community structure.