Ying Zhao

CV
h-index: 95
25 papers
708 citations
Novelty: 36%
AI Score: 51

25 Papers

LG Jun 2
HiSE: A Lightweight Hierarchical Semantic Explainer for Heterogeneous Graph Neural Networks

Zongrui Li, Yuhang Zhao, Ying Zhao et al.

Heterogeneous graph neural networks (HGNNs) have demonstrated remarkable performance in modeling complex relational data; however, their interpretability in high-stakes applications remains a critical challenge. Existing explanation methods suffer from two major limitations. First, the generated explanations fail to reflect the inherent semantic hierarchy of HGNNs, resulting in a lack of fidelity to the model's internal decision-making mechanism. Second, feature explanations often rely on complex search or perturbation mechanisms, leading to excessive computational complexity and poor efficiency. To address these issues, we propose HiSE, a lightweight feature-oriented interpretable model for HGNNs. HiSE achieves semantically aware feature explanations through hierarchical semantic modeling: at the semantic level, local surrogate models based on the Least Absolute Shrinkage and Selection Operator (LASSO) are employed to learn sparse feature representations under each semantic view; at the cross-semantic level, the contributions of different semantic views are adaptively characterized via KL divergence to produce a unified explanation. Extensive experiments demonstrate that HiSE outperforms existing methods in terms of fidelity, robustness, and cross-semantic explanation capability, while its lightweight framework incurs low computational overhead, enabling efficient application to large-scale, complex real-world heterogeneous graphs.
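The abstract does not say how KL divergence is turned into per-view contributions, so the following is only an illustrative sketch under a stated assumption: a semantic view whose prediction distribution is closer (in KL divergence) to the full model's prediction receives a larger normalized weight. The function names, the exp(-KL) weighting rule, and the view names are hypothetical and are not taken from HiSE.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def view_weights(full_pred, view_preds):
    """Assumed rule: weight each view by exp(-KL(full || view)), then normalize."""
    scores = {
        name: math.exp(-kl_divergence(full_pred, pred))
        for name, pred in view_preds.items()
    }
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


# Hypothetical class distributions for one node under two semantic views.
full_pred = [0.7, 0.2, 0.1]
view_preds = {
    "author-paper": [0.7, 0.2, 0.1],  # agrees with the full model
    "paper-venue": [0.1, 0.2, 0.7],   # disagrees with the full model
}
weights = view_weights(full_pred, view_preds)
```

In this toy setup the view that reproduces the full model's prediction ends up with the larger weight; any monotone decreasing function of the divergence would serve the same illustrative purpose.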

AI Jun 2
Uncertainty-Aware Clarification in LLM Agents with Information Gain

Mengyi Deng, Zhiwei Li, Xin Li et al.

Large Language Model (LLM) agents often operate under underspecified user instructions, where latent uncertainty over user intent leads to erroneous tool actions. To address this challenge, we propose a goal-oriented clarification framework that aligns clarification behavior with ambiguity resolution. Central to our approach is the Information Gain Reward, a metric that quantifies the utility of clarification questions by measuring the Bayesian belief update towards the ground-truth goal induced by the clarification exchange. We train the clarifier (LLM) using this reward to optimize for high information gain, ensuring that clarifications effectively reduce uncertainty and improve task completion within the agent-tool-user environment. We validate our framework within a clarification-enhanced τ-Bench environment, conducting cross-agent evaluations across five heterogeneous backbones. Empirical results demonstrate that our method consistently improves the success rate by 3.7% over the no-clarification baseline, while adding only 0.3 total interaction steps on average.
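One way to read the Information Gain Reward described above is as the change in log-probability that the agent's belief assigns to the ground-truth goal before and after a clarification exchange. The sketch below works under that assumption; the goal names, the likelihood values, and the exact reward formula are hypothetical and may differ from the paper's definition.

```python
import math


def bayes_update(prior, likelihood):
    """Posterior over goals given the likelihood of the user's answer under each goal."""
    unnormalized = {goal: prior[goal] * likelihood.get(goal, 0.0) for goal in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("the observed answer has zero probability under every goal")
    return {goal: value / total for goal, value in unnormalized.items()}


def information_gain_reward(prior, posterior, true_goal):
    """Assumed reward: log-probability gained on the ground-truth goal."""
    return math.log(posterior[true_goal]) - math.log(prior[true_goal])


# Hypothetical belief over what the user wants before asking a question.
prior = {"refund": 0.5, "exchange": 0.25, "cancel": 0.25}
# Hypothetical likelihood of the user's answer under each candidate goal.
likelihood = {"refund": 0.9, "exchange": 0.1, "cancel": 0.1}

posterior = bayes_update(prior, likelihood)
reward = information_gain_reward(prior, posterior, "refund")
```

A question whose answer shifts belief toward the true goal yields a positive reward, while an uninformative question (equal likelihood under every goal) leaves the belief unchanged and yields zero.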

RO Jul 15, 2024 Code
GRUtopia: Dream General Robots in a City at Scale

Hanqing Wang, Jiahe Chen, Wensi Huang et al.

Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements: (a) The scene dataset, GRScenes, includes 100k interactive, finely annotated scenes, which can be freely combined into city-scale environments. In contrast to previous works mainly focusing on home, GRScenes covers 89 diverse scene categories, bridging the gap of service-oriented environments where general robots would be initially deployed. (b) GRResidents, a Large Language Model (LLM) driven Non-Player Character (NPC) system that is responsible for social interaction, task generation, and task assignment, thus simulating social scenarios for embodied AI applications. (c) The benchmark, GRBench, supports various robots but focuses on legged robots as primary agents and poses moderately challenging tasks involving Object Loco-Navigation, Social Loco-Navigation, and Loco-Manipulation. We hope that this work can alleviate the scarcity of high-quality data in this field and provide a more comprehensive assessment of Embodied AI research. The project is available at https://github.com/OpenRobotLab/GRUtopia.

CV Oct 19, 2022 Code
Using deep convolutional neural networks to classify poisonous and edible mushrooms found in China

Baiming Zhang, Ying Zhao, Zhixiang Li

Because of their abundance of amino acids, polysaccharides, and many other nutrients that benefit human beings, mushrooms are deservedly popular as food both worldwide and in China. However, if people eat poisonous fungi by mistake, they may suffer from nausea, vomiting, mental disorder, acute anemia, or even death. Each year in China, around 8000 people become sick and 70 die as a result of eating toxic mushrooms by mistake. It is estimated that there are thousands of kinds of mushrooms, among which only around 900 types are edible; thus, without specialized knowledge, the probability of eating toxic mushrooms by mistake is very high. Most people believe that the only characteristic of poisonous mushrooms is a bright colour; however, some poisonous kinds do not show this trait. In order to prevent people from eating these poisonous mushrooms, we propose to use deep learning methods to indicate whether a mushroom is toxic by analyzing hundreds of smartphone pictures of edible and toxic mushrooms. We crowdsource a mushroom image dataset that contains 250 images of poisonous mushrooms and 200 images of edible mushrooms. The Convolutional Neural Network (CNN) is a specialized type of artificial neural network that uses a mathematical operation called convolution in place of general matrix multiplication in at least one of its layers; it can generate a relatively precise result by analyzing a large number of images and is thus well suited to our research. The experimental results demonstrate that the proposed model has high credibility and can provide a decision-making basis for the selection of edible fungi, so as to reduce the morbidity and mortality caused by eating poisonous mushrooms. We also open-source our hand-collected mushroom image dataset so that peer researchers can deploy their own models to advance poisonous mushroom identification.

DC Jul 10, 2023
FedDCT: A Dynamic Cross-Tier Federated Learning Framework in Wireless Networks

Youquan Xian, Xiaoyun Gan, Chuanjian Yao et al.

Federated Learning (FL), as a privacy-preserving machine learning paradigm, trains a global model across devices without exposing local data. However, resource heterogeneity and inevitable stragglers in wireless networks severely impact the efficiency and accuracy of FL training. In this paper, we propose a novel Dynamic Cross-Tier Federated Learning framework (FedDCT). First, we design a dynamic tiering strategy that partitions devices into different tiers based on their response times and assigns specific timeout thresholds to each tier to reduce single-round training time. Then, we propose a cross-tier device selection algorithm that selects devices that respond quickly and are conducive to model convergence to improve convergence efficiency and accuracy. Experimental results demonstrate that the proposed approach under wireless networks outperforms the baseline approach, with an average reduction of 54.7% in convergence time and an average improvement of 1.83% in convergence accuracy.
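The tiering step described above can be illustrated with a minimal sketch. It assumes devices are sorted by their most recent response time, split into equally sized tiers, and given a per-tier timeout equal to the slowest member's response time multiplied by a slack factor. The function name, the equal-size split, and the `slack` parameter are assumptions made for illustration, not details from FedDCT.

```python
def assign_tiers(response_times, num_tiers, slack=1.2):
    """Group devices into tiers by response time and give each tier a timeout."""
    if not response_times or num_tiers < 1:
        return []
    # Fastest devices first.
    ordered = sorted(response_times, key=response_times.get)
    tier_size = -(-len(ordered) // num_tiers)  # ceiling division
    tiers = []
    for start in range(0, len(ordered), tier_size):
        members = ordered[start:start + tier_size]
        # Assumed rule: timeout = slowest member's response time times a slack factor.
        timeout = slack * max(response_times[device] for device in members)
        tiers.append({"devices": members, "timeout": timeout})
    return tiers


# Hypothetical response times in seconds for six devices.
times = {"a": 1.0, "b": 9.0, "c": 2.0, "d": 4.0, "e": 3.0, "f": 10.0}
tiers = assign_tiers(times, num_tiers=3)
```

Re-running the assignment each round with fresh response times is what would make such a scheme dynamic: a device that slows down drifts into a slower tier instead of stalling a fast one.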

CV Aug 18, 2024
AnomalyFactory: Regard Anomaly Generation as Unsupervised Anomaly Localization

Ying Zhao

Recent advances in anomaly generation approaches alleviate the effect of data insufficiency on the task of anomaly localization. While effective, most of them learn multiple large generative models on different datasets and cumbersome anomaly prediction models for different classes. To address these limitations, we propose a novel scalable framework, named AnomalyFactory, that unifies unsupervised anomaly generation and localization with the same network architecture. It starts with a BootGenerator that combines the structure of a target edge map and the appearance of a reference color image with the guidance of a learned heatmap. Then, it proceeds with a FlareGenerator that receives supervision signals from the BootGenerator and reforms the heatmap to indicate anomaly locations in the generated image. Finally, it easily transforms the same network architecture into a BlazeDetector that localizes anomaly pixels with the learned heatmap by converting the anomaly images generated by the FlareGenerator to normal images. By manipulating the target edge maps and combining them with various reference images, AnomalyFactory generates authentic and diverse samples across domains. Comprehensive experiments carried out on 5 datasets, including MVTecAD, VisA, MVTecLOCO, MADSim and RealIAD, demonstrate that our approach is superior to competitors in generation capability and scalability.

CV Aug 4, 2020 Code
Simultaneous Semantic Alignment Network for Heterogeneous Domain Adaptation

Shuang Li, Binhui Xie, Jiashu Wu et al.

Heterogeneous domain adaptation (HDA) transfers knowledge across source and target domains that present heterogeneities, e.g., distinct domain distributions and differences in feature type or dimension. Most previous HDA methods tackle this problem by learning a domain-invariant feature subspace to reduce the discrepancy between domains. However, the intrinsic semantic properties contained in data, which are also indispensable for achieving promising adaptability, are under-explored in such alignment strategies. In this paper, we propose a Simultaneous Semantic Alignment Network (SSAN) to simultaneously exploit correlations among categories and align the centroids for each category across domains. In particular, we propose an implicit semantic correlation loss to transfer the correlation knowledge of source categorical prediction distributions to the target domain. Meanwhile, by leveraging target pseudo-labels, a robust triplet-centroid alignment mechanism is explicitly applied to align feature representations for each category. Notably, a pseudo-label refinement procedure that involves geometric similarity is introduced to enhance the target pseudo-label assignment accuracy. Comprehensive experiments on various HDA tasks across text-to-image, image-to-image and text-to-text settings validate the superiority of our SSAN against state-of-the-art HDA methods. The code is publicly available at https://github.com/BIT-DA/SSAN.

SE Nov 29, 2019 Code
Pythia: AI-assisted Code Completion System

Alexey Svyatkovskiy, Ying Zhao, Shengyu Fu et al.

In this paper, we propose a novel end-to-end approach for AI-assisted code completion called Pythia. It generates ranked lists of method and API recommendations which can be used by software developers at edit time. The system is currently deployed as part of the Intellicode extension in the Visual Studio Code IDE. Pythia exploits state-of-the-art large-scale deep learning models trained on code contexts extracted from abstract syntax trees. It is designed to work at a high throughput, predicting the best matching code completions on the order of 100 ms. We describe the architecture of the system, perform comparisons to a frequency-based approach and an invocation-based Markov Chain language model, and discuss challenges in serving Pythia models on lightweight client devices. The offline evaluation results obtained on 2700 Python open source software GitHub repositories show a top-5 accuracy of 92%, surpassing the baseline models by 20% averaged over classes, for both intra- and cross-project settings.

CV May 11, 2024
LogicAL: Towards logical anomaly synthesis for unsupervised anomaly localization

Ying Zhao

Anomaly localization is a practical technology for improving industrial production line efficiency. Because anomalies are manifold and hard to collect, existing unsupervised methods are usually equipped with anomaly synthesis methods. However, most of them are biased towards structural defect synthesis while ignoring the underlying logical constraints. To fill the gap and boost anomaly localization performance, we propose an edge manipulation based anomaly synthesis framework, named LogicAL, that produces photo-realistic logical and structural anomalies. We introduce a logical anomaly generation strategy that is adept at breaking logical constraints and a structural anomaly generation strategy that complements structural defect synthesis. We further improve the anomaly localization performance by introducing edge reconstruction into the network structure. Extensive experiments on the challenging MVTecLOCO, MVTecAD, VisA and MADsim datasets verify the advantage of the proposed LogicAL on both logical and structural anomaly localization.

LG Mar 3, 2025
Building Machine Learning Challenges for Anomaly Detection in Science

Elizabeth G. Campolongo, Yuan-Tang Chou, Ekaterina Govorkova et al.

Scientific discoveries are often made by finding a pattern or object that was not predicted by the known rules of science. Oftentimes, these anomalous events or objects that do not conform to the norms are an indication that the rules of science governing the data are incomplete, and something new needs to be present to explain these unexpected outliers. The challenge of finding anomalies can be confounding since it requires codifying a complete knowledge of the known scientific behaviors and then projecting these known behaviors on the data to look for deviations. When utilizing machine learning, this presents a particular challenge since we require that the model not only understands scientific data perfectly but also recognizes when the data is inconsistent and out of the scope of its trained behavior. In this paper, we present three datasets aimed at developing machine learning-based anomaly detection for disparate scientific domains covering astrophysics, genomics, and polar science. We present the different datasets along with a scheme to make machine learning challenges around the three datasets findable, accessible, interoperable, and reusable (FAIR). Furthermore, we present an approach that generalizes to future machine learning challenges, enabling the possibility of large, more compute-intensive challenges that can ultimately lead to scientific discovery.

ET Apr 4
Kill Webs by Collaborative & Self-organizing Agents (CSOAs)

Ying Zhao, Charles C. Zhou

A single agent represents a single system capable of ingesting local data, indexing, cataloging information, performing knowledge pattern discovery, and separating patterns and anomalies from data. Multiple agents work collaboratively in a peer-to-peer network. Each agent has a peer list. Such multiple agents' collaboration can be modeled as cooperative games. Each agent optimizes its own objective locally. We show that each agent self-organizes or converges to its best value and the whole agent network achieves the best social welfare based on both the quantum adiabatic evolution transformation (QAET), and quantum intelligence game (QIG) or the QAET-QIG framework. We apply the QAET-QIG framework to the kill web concept that can potentially improve the traditional kill chain process or the find, fix, track, target, engage, and assess (F2T2EA) process. The improvement is measured in the values of powerful global optimization, distributed lethality, and load balancing. We show a use case of the QAET-QIG frame in a potential application of mixed sensors, platforms, weapons, and effects.

CV Jun 29, 2025
PCLVis: Visual Analytics of Process Communication Latency in Large-Scale Simulation

Chongke Bi, Xin Gao, Baofeng Fu et al.

Large-scale simulations on supercomputers have become important tools for users. However, their scalability remains a problem due to the huge communication cost among parallel processes. Most of the existing communication latency analysis methods rely on the physical link layer information, which is only available to administrators. In this paper, a framework called PCLVis is proposed to help general users analyze process communication latency (PCL) events. Instead of the physical link layer information, the PCLVis uses the MPI process communication data for the analysis. First, a spatial PCL event locating method is developed. All processes with high correlation are classified into a single cluster by constructing a process-correlation tree. Second, the propagation path of PCL events is analyzed by constructing a communication-dependency-based directed acyclic graph (DAG), which can help users interactively explore a PCL event from the temporal evolution of a located PCL event cluster. In this graph, a sliding window algorithm is designed to generate the PCL events abstraction. Meanwhile, a new glyph called the communication state glyph (CS-Glyph) is designed for each process to show its communication states, including its in/out messages and load balance. Each leaf node can be further unfolded to view additional information. Third, a PCL event attribution strategy is formulated to help users optimize their simulations. The effectiveness of the PCLVis framework is demonstrated by analyzing the PCL events of several simulations running on the TH-1A supercomputer. By using the proposed framework, users can greatly improve the efficiency of their simulations.

SOFT May 15, 2025
Polymer Data Challenges in the AI Era: Bridging Gaps for Next-Generation Energy Materials

Ying Zhao, Guanhua Chen, Jie Liu

The pursuit of advanced polymers for energy technologies, spanning photovoltaics, solid-state batteries, and hydrogen storage, is hindered by fragmented data ecosystems that fail to capture the hierarchical complexity of these materials. Polymer science lacks interoperable databases, forcing reliance on disconnected literature and legacy records riddled with unstructured formats and irreproducible testing protocols. This fragmentation stifles machine learning (ML) applications and delays the discovery of materials critical for global decarbonization. Three systemic barriers compound the challenge. First, academic-industrial data silos restrict access to proprietary industrial datasets, while academic publications often omit critical synthesis details. Second, inconsistent testing methods undermine cross-study comparability. Third, incomplete metadata in existing databases limits their utility for training reliable ML models. Emerging solutions address these gaps through technological and collaborative innovation. Natural language processing (NLP) tools extract structured polymer data from decades of literature, while high-throughput robotic platforms generate self-consistent datasets via autonomous experimentation. Central to these advances is the adoption of FAIR (Findable, Accessible, Interoperable, Reusable) principles, adapted to polymer-specific ontologies, ensuring machine-readability and reproducibility. Future breakthroughs hinge on cultural shifts toward open science, accelerated by decentralized data markets and autonomous laboratories that merge robotic experimentation with real-time ML validation. By addressing data fragmentation through technological innovation, collaborative governance, and ethical stewardship, the polymer community can transform bottlenecks into accelerants.

CV Apr 6, 2025
AnomalyHybrid: A Domain-agnostic Generative Framework for General Anomaly Detection

Ying Zhao

Anomaly generation is an effective way to mitigate data scarcity for the anomaly detection task. Most existing works shine at industrial anomaly generation with multiple specialists or large generative models, rarely generalizing to anomalies in other applications. In this paper, we present AnomalyHybrid, a domain-agnostic framework designed to generate authentic and diverse anomalies simply by combining the reference and target images. AnomalyHybrid is a Generative Adversarial Network (GAN)-based framework with two decoders that integrate the appearance of the reference image into the depth and edge structures of the target image, respectively. With the help of depth decoders, AnomalyHybrid achieves authentic generation, especially for anomalies with changing depth values, such as protrusions and dents. Moreover, it relaxes the fine-grained structural control of the edge decoder and brings more diversity. Without using annotations, AnomalyHybrid is easily trained with sets of color, depth and edge maps of the same images under different augmentations. Extensive experiments carried out on the HeliconiusButterfly, MVTecAD and MVTec3D datasets demonstrate that AnomalyHybrid surpasses the GAN-based state-of-the-art on anomaly generation and its downstream anomaly classification, detection and segmentation tasks. On the MVTecAD dataset, AnomalyHybrid achieves 2.06/0.32 IS/LPIPS for anomaly generation, 52.6 Acc for anomaly classification with ResNet34, and 97.3/72.9 AP for image/pixel-level anomaly detection with a simple UNet.

CV Apr 18, 2025
Zebrafish Counting Using Event Stream Data

Qianghua Chen, Huiyu Wang, Li Ming et al.

Zebrafish genes share a high degree of homology with human genes, and zebrafish are commonly used as a model organism in biomedical research. For medical laboratories, counting zebrafish is a daily task. Due to the tiny size of zebrafish, manual visual counting is challenging. Existing counting methods are either not applicable to small fish or have too many limitations. This paper proposes a zebrafish counting algorithm based on event stream data. First, an event camera was used for data acquisition. Second, camera calibration and image fusion were performed successively. Then, the trajectory information was used to improve the counting accuracy. Finally, the counting results were averaged over an empirical period and rounded up to get the final results. To evaluate the accuracy of the algorithm, 20 zebrafish were put in a four-liter breeding tank. Over 100 counting trials, the average accuracy reached 97.95%. Compared with traditional algorithms, the proposed one offers a simpler implementation and achieves higher accuracy.
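The final aggregation step in the abstract above (average the counts over a period, then round up) can be sketched in a few lines. This assumes "rounded up" means taking the ceiling of the mean; the function name and the sample counts are hypothetical, and the event-camera, calibration, fusion, and trajectory steps are not modeled here.

```python
import math


def final_count(window_counts):
    """Average per-window fish counts over a period and round the mean up."""
    if not window_counts:
        raise ValueError("at least one count is required")
    return math.ceil(sum(window_counts) / len(window_counts))


# Hypothetical counts from five consecutive windows; occlusion makes some too low.
counts = [19, 20, 20, 19, 20]
result = final_count(counts)
```

Rounding up rather than to the nearest integer fits a setting where fish overlapping one another make undercounting more likely than overcounting, although the abstract does not state this rationale.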

CL Feb 15, 2024
A Dataset of Open-Domain Question Answering with Multiple-Span Answers

Zhiyi Luo, Yingying Zhang, Shuyun Luo et al.

Multi-span answer extraction, also known as the task of multi-span question answering (MSQA), is critical for real-world applications, as it requires extracting multiple pieces of information from a text to answer complex questions. Despite active study and rapid progress in English MSQA research, there is a notable lack of publicly available MSQA benchmarks in Chinese. Previous efforts to construct MSQA datasets predominantly emphasized entity-centric contextualization, resulting in a bias towards collecting factoid questions and potentially overlooking questions requiring more detailed descriptive responses. To overcome these limitations, we present CLEAN, a comprehensive Chinese multi-span question answering dataset that involves a wide range of open-domain subjects with a substantial number of instances requiring descriptive answers. Additionally, we provide established models from relevant literature as baselines for CLEAN. Experimental results and analysis show the characteristics and challenges of the newly proposed CLEAN dataset for the community. Our dataset, CLEAN, will be publicly released at zhiyiluo.site/misc/clean_v1.0_ sample.json.

HC Aug 25, 2021
Evaluating Effects of Background Stories on Graph Perception

Ying Zhao, Jingcheng Shi, Jiawei Liu et al.

A graph is an abstract model that represents relations among entities, for example, the interactions between characters in a novel. A background story endows entities and relations with real-world meanings and describes the semantics and context of the abstract model, for example, the actual story that the novel presents. Considering practical experience and prior research, human viewers who are familiar with the background story of a graph and those who do not know the background story may perceive the same graph differently. However, no previous research has adequately addressed this problem. This research paper thus presents an evaluation that investigated the effects of background stories on graph perception. Three hypotheses that focused on the role of visual focus areas, graph structure identification, and mental model formation on graph perception were formulated and guided three controlled experiments that evaluated the hypotheses using real-world graphs with background stories. An analysis of the resulting experimental data, which compared the performance of participants who read and did not read the background stories, obtained a set of instructive findings. First, having knowledge about a graph's background story influences participants' focus areas during interactive graph explorations. Second, such knowledge significantly affects one's ability to identify community structures but not high degree and bridge structures. Third, this knowledge influences graph recognition under blurred visual conditions. These findings can bring new considerations to the design of storytelling visualizations and interactive graph explorations.

CL Oct 25, 2020
Transgender Community Sentiment Analysis from Social Media Data: A Natural Language Processing Approach

Yuqiao Liu, Yudan Wang, Ying Zhao et al.

The transgender community is experiencing a huge disparity in mental health conditions compared with the general population. Interpreting the social media data posted by transgender people may help us better understand the sentiments of these sexual minority groups and apply early interventions. In this study, we manually categorize 300 social media comments posted by transgender people into negative, positive, and neutral sentiments. Five machine learning algorithms and two deep neural networks are adopted to build sentiment analysis classifiers based on the annotated data. Results show that our annotations are reliable, with a high Cohen's Kappa score over 0.8 across all three classes. The LSTM model yields the best performance, with accuracy over 0.85 and AUC of 0.876. Our next step will focus on using advanced natural language processing algorithms on a larger annotated dataset.
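For readers unfamiliar with the agreement statistic cited above, Cohen's kappa compares the observed agreement between two annotators with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for two label lists; the six example labels are made up for illustration and are not from the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies, summed over labels.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six comments by two annotators.
annotator_1 = ["pos", "pos", "neg", "neg", "neu", "neu"]
annotator_2 = ["pos", "pos", "neg", "neu", "neu", "neu"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 5 of 6 items (p_o = 5/6) and chance agreement is p_e = 1/3, giving kappa = 0.75.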

LG Sep 17, 2020
An early prediction of covid-19 associated hospitalization surge using deep learning approach

Yuqi Meng, Qiancheng Sun, Suning Hong et al.

The global pandemic caused by COVID-19 affects our lives in all aspects. As of September 11, more than 28 million people have tested positive for COVID-19 infection, and more than 911,000 people have lost their lives in this virus battle. Some patients cannot receive appropriate medical treatment due to the limits of hospitalization volume and the shortage of ICU beds. An estimate of future hospitalization is critical so that medical resources can be allocated as needed. In this study, we propose to use 4 recurrent neural networks to infer the hospitalization change for the following week compared with the current week. Results show that the sequence-to-sequence model with attention achieves a high accuracy of 0.938 and AUC of 0.850 in the hospitalization prediction. Our work has the potential to predict the hospitalization need and send a warning to medical providers and other stakeholders when a re-surge begins.

CY Sep 6, 2020
SilkViser: A Visual Explorer of Blockchain-based Cryptocurrency Transaction Data

Zengsheng Zhong, Shuirun Wei, Yeting Xu et al.

Many blockchain-based cryptocurrencies provide users with online blockchain explorers for viewing online transaction data. However, traditional blockchain explorers mostly present transaction information in textual and tabular forms. Such forms make understanding cryptocurrency transaction mechanisms difficult for novice users (NUsers). They are also insufficiently informative for experienced users (EUsers) to recognize advanced transaction information. This study introduces a new online cryptocurrency transaction data viewing tool called SilkViser. Guided by detailed scenario and requirement analyses, we create a series of appreciating visualization designs, such as paper ledger-inspired block and blockchain visualizations and ancient copper coin-inspired transaction visualizations, to help users understand cryptocurrency transaction mechanisms and recognize advanced transaction information. We also provide a set of lightweight interactions to facilitate easy and free data exploration. Moreover, a controlled user study is conducted to quantitatively evaluate the usability and effectiveness of SilkViser. Results indicate that SilkViser can satisfy the requirements of NUsers and EUsers. Our visualization designs can compensate for the inexperience of NUsers in data viewing and attract potential users to participate in cryptocurrency transactions.

CV Sep 5, 2020
Reverse-engineering Bar Charts Using Neural Networks

Fangfang Zhou, Yong Zhao, Wenjiang Chen et al.

Reverse-engineering bar charts extracts textual and numeric information from the visual representations of bar charts to support application scenarios that require the underlying information. In this paper, we propose a neural network-based method for reverse-engineering bar charts. We adopt a neural network-based object detection model to simultaneously localize and classify textual information. This approach improves the efficiency of textual information extraction. We design an encoder-decoder framework that integrates convolutional and recurrent neural networks to extract numeric information. We further introduce an attention mechanism into the framework to achieve high accuracy and robustness. Synthetic and real-world datasets are used to evaluate the effectiveness of the method. To the best of our knowledge, this work takes the lead in constructing a complete neural network-based method of reverse-engineering bar charts.

LG Oct 6, 2019
Using Deep Learning and Machine Learning to Detect Epileptic Seizure with Electroencephalography (EEG) Data

Haotian Liu, Lin Xi, Ying Zhao et al.

The prediction of epileptic seizures has always been extremely challenging in the medical domain. However, with the development of computer technology, the application of machine learning has introduced new ideas for seizure forecasting. Applying machine learning models to the prediction of epileptic seizures could help us obtain better results, and many scientists have been doing such work, so sufficient medical data are available for researchers to train machine learning models.

DB Aug 6, 2019
RSATree: Distribution-Aware Data Representation of Large-Scale Tabular Datasets for Flexible Visual Query

Honghui Mei, Wei Chen, Yating Wei et al.

Analysts commonly investigate the data distributions derived from statistical aggregations of data that are represented by charts, such as histograms and binned scatterplots, to visualize and analyze a large-scale dataset. Aggregate queries are implicitly executed through such a process. Datasets are often extremely large; thus, the response time should be reduced by precomputing predefined data cubes. However, the queries are limited to the predefined binning schema of preprocessed data cubes. Such a limitation hinders analysts' flexible adjustment of visual specifications to investigate the implicit patterns in the data effectively. Particularly, RSATree enables arbitrary queries and flexible binning strategies by leveraging three schemes, namely, an R-tree-based space partitioning scheme to catch the data distribution, a locality-sensitive hashing technique to achieve locality-preserving random access to data items, and a summed area table scheme to support interactive query of aggregated values with a linear computational complexity. This study presents and implements a web-based visual query system that supports visual specification, query, and exploration of large-scale tabular data with user-adjustable granularities. We demonstrate the efficiency and utility of our approach by performing various experiments on real-world datasets and analyzing time and space complexity.
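Of the three schemes named above, the summed area table is the easiest to illustrate in isolation. The sketch below shows the textbook two-dimensional version on a small grid of bin counts: after one pass to build the table, the aggregate over any axis-aligned rectangle of bins is obtained from four table lookups. It is a generic illustration, not RSATree's implementation, and it does not model the R-tree partitioning or the hashing scheme.

```python
def build_sat(grid):
    """Build a summed area table with one extra zero row and column of padding."""
    rows = len(grid)
    cols = len(grid[0]) if rows else 0
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            # Inclusion-exclusion over the cells above and to the left.
            sat[r + 1][c + 1] = grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
    return sat


def range_sum(sat, r0, c0, r1, c1):
    """Sum of grid cells in rows r0..r1-1 and columns c0..c1-1 (half-open ranges)."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


# Hypothetical 3x3 grid of bin counts from a binned scatterplot.
bin_counts = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
sat = build_sat(bin_counts)
total = range_sum(sat, 0, 0, 3, 3)         # whole grid
lower_right = range_sum(sat, 1, 1, 3, 3)   # cells 5, 6, 8, 9
```

Because a coarser binning is just a set of larger rectangles over the same table, re-binning at a different granularity needs no second pass over the raw data.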

HC Aug 1, 2019
Evaluating Perceptual Bias During Geometric Scaling of Scatterplots

Yating Wei, Honghui Mei, Ying Zhao et al.

Scatterplots are frequently scaled to fit display areas in multi-view and multi-device data analysis environments. A common method used for scaling is to enlarge or shrink the entire scatterplot together with the inside points synchronously and proportionally. This process is called geometric scaling. However, geometric scaling of scatterplots may cause a perceptual bias, that is, the perceived and physical values of visual features may be dissociated with respect to geometric scaling. For example, if a scatterplot is projected from a laptop to a large projector screen, then observers may feel that the scatterplot shown on the projector has fewer points than that viewed on the laptop. This paper presents an evaluation study on the perceptual bias of visual features in scatterplots caused by geometric scaling. The study focuses on three fundamental visual features (i.e., numerosity, correlation, and cluster separation) and three hypotheses that are formulated on the basis of our experience. We carefully design three controlled experiments by using well-prepared synthetic data and recruit participants to complete the experiments on the basis of their subjective experience. With a detailed analysis of the experimental results, we obtain a set of instructive findings. First, geometric scaling causes a bias that has a linear relationship with the scale ratio. Second, no significant difference exists between the biases measured from normally and uniformly distributed scatterplots. Third, changing the point radius can correct the bias to a certain extent. These findings can be used to inspire the design decisions of scatterplots in various scenarios.

DC Jun 16, 2018
EdgeChain: An Edge-IoT Framework and Prototype Based on Blockchain and Smart Contracts

Jianli Pan, Jianyu Wang, Austin Hester et al.

The emerging Internet of Things (IoT) is facing significant scalability and security challenges. On the one hand, IoT devices are "weak" and need external assistance. Edge computing provides a promising direction for addressing the deficiency of centralized cloud computing in scaling to a massive number of devices. On the other hand, IoT devices are also relatively "vulnerable" to malicious hackers due to resource constraints. The emerging blockchain and smart contract technologies bring a series of new security features for IoT and edge computing. In this paper, to address these challenges, we design and prototype an edge-IoT framework named "EdgeChain" based on blockchain and smart contracts. The core idea is to integrate a permissioned blockchain and an internal currency or "coin" system to link the edge cloud resource pool with each IoT device's account and resource usage, and hence the behavior of the IoT devices. EdgeChain uses a credit-based resource management system to control how much resource IoT devices can obtain from edge servers, based on pre-defined rules on priority, application types and past behaviors. Smart contracts are used to enforce the rules and policies to regulate IoT device behavior in a non-deniable and automated manner. All the IoT activities and transactions are recorded in the blockchain for secure data logging and auditing. We implement an EdgeChain prototype and conduct extensive experiments to evaluate the ideas. The results show that, while gaining the security benefits of blockchain and smart contracts, the cost of integrating them into EdgeChain is within a reasonable and acceptable range.