OPTICSJul 17, 2023
Nonlinear optical encoding enabled by recurrent linear scatteringFei Xia, Kyungduk Kim, Yaniv Eliezer et al.
Optical information processing and computing can potentially offer enhanced performance, scalability and energy efficiency. However, achieving nonlinearity-a critical component of computation-remains challenging in the optical domain. Here we introduce a design that leverages a multiple-scattering cavity to passively induce optical nonlinear random mapping with a continuous-wave laser at a low power. Each scattering event effectively mixes information from different areas of a spatial light modulator, resulting in a highly nonlinear mapping between the input data and output pattern. We demonstrate that our design retains vital information even when the readout dimensionality is reduced, thereby enabling optical data compression. This capability allows our optical platforms to offer efficient optical information processing solutions across applications. We demonstrate our design's efficacy across tasks, including classification, image reconstruction, keypoint detection and object detection, all of which are achieved through optical data compression combined with a digital decoder. In particular, high performance at extreme compression ratios is observed in real-time pedestrian detection. Our findings open pathways for novel algorithms and unconventional architectural designs for optical computing.
CVDec 12, 2022
Joint Counting, Detection and Re-Identification for Multi-Object TrackingWeihong Ren, Denglu Wu, Hui Cao et al.
The recent trend in 2D multiple object tracking (MOT) is jointly solving detection and tracking, where object detection and appearance feature (or motion) are learned simultaneously. Despite competitive performance, in crowded scenes, joint detection and tracking usually fail to find accurate object associations due to missed or false detections. In this paper, we jointly model counting, detection and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, the CountingMOT tries to find a balance between object detection and crowd density map estimation, which can help it to recover missed detections or reject false detections. Our approach is an attempt to bridge the gap of object detection, counting, and re-Identification. This is in contrast to prior MOT methods that either ignore the crowd density and thus are prone to failure in crowded scenes,or depend on local correlations to build a graphical relationship for matching targets. The proposed MOT tracker can perform online and real-time tracking, and achieves the state-of-the-art results on public benchmarks MOT16 (MOTA of 79.7), MOT17 (MOTA of 81.3%) and MOT20 (MOTA of 78.9%).
CROct 19, 2022
Emerging Threats in Deep Learning-Based Autonomous Driving: A Comprehensive SurveyHui Cao, Wenlong Zou, Yinkun Wang et al.
Since the 2004 DARPA Grand Challenge, the autonomous driving technology has witnessed nearly two decades of rapid development. Particularly, in recent years, with the application of new sensors and deep learning technologies extending to the autonomous field, the development of autonomous driving technology has continued to make breakthroughs. Thus, many carmakers and high-tech giants dedicated to research and system development of autonomous driving. However, as the foundation of autonomous driving, the deep learning technology faces many new security risks. The academic community has proposed deep learning countermeasures against the adversarial examples and AI backdoor, and has introduced them into the autonomous driving field for verification. Deep learning security matters to autonomous driving system security, and then matters to personal safety, which is an issue that deserves attention and research.This paper provides an summary of the concepts, developments and recent research in deep learning security technologies in autonomous driving. Firstly, we briefly introduce the deep learning framework and pipeline in the autonomous driving system, which mainly include the deep learning technologies and algorithms commonly used in this field. Moreover, we focus on the potential security threats of the deep learning based autonomous driving system in each functional layer in turn. We reviews the development of deep learning attack technologies to autonomous driving, investigates the State-of-the-Art algorithms, and reveals the potential risks. At last, we provides an outlook on deep learning security in the autonomous driving field and proposes recommendations for building a safe and trustworthy autonomous driving system.
CLAug 1, 2023
JIANG: Chinese Open Foundation Language ModelQinhua Duan, Wenchao Gu, Yujia Chen et al.
With the advancements in large language model technology, it has showcased capabilities that come close to those of human beings across various tasks. This achievement has garnered significant interest from companies and scientific research institutions, leading to substantial investments in the research and development of these models. While numerous large models have emerged during this period, the majority of them have been trained primarily on English data. Although they exhibit decent performance in other languages, such as Chinese, their potential remains limited due to factors like vocabulary design and training corpus. Consequently, their ability to fully express their capabilities in Chinese falls short. To address this issue, we introduce the model named JIANG (Chinese pinyin of ginger) specifically designed for the Chinese language. We have gathered a substantial amount of Chinese corpus to train the model and have also optimized its structure. The extensive experimental results demonstrate the excellent performance of our model.
CRFeb 16, 2022
Contextualize differential privacy in image database: a lightweight image differential privacy approach based on principle component analysis inverseShiliang Zhang, Xuehui Ma, Hui Cao et al.
Differential privacy (DP) has been the de-facto standard to preserve privacy-sensitive information in database. Nevertheless, there lacks a clear and convincing contextualization of DP in image database, where individual images' indistinguishable contribution to a certain analysis can be achieved and observed when DP is exerted. As a result, the privacy-accuracy trade-off due to integrating DP is insufficiently demonstrated in the context of differentially-private image database. This work aims at contextualizing DP in image database by an explicit and intuitive demonstration of integrating conceptional differential privacy with images. To this end, we design a lightweight approach dedicating to privatizing image database as a whole and preserving the statistical semantics of the image database to an adjustable level, while making individual images' contribution to such statistics indistinguishable. The designed approach leverages principle component analysis (PCA) to reduce the raw image with large amount of attributes to a lower dimensional space whereby DP is performed, so as to decrease the DP load of calculating sensitivity attribute-by-attribute. The DP-exerted image data, which is not visible in its privatized format, is visualized through PCA inverse such that both a human and machine inspector can evaluate the privatization and quantify the privacy-accuracy trade-off in an analysis on the privatized image database. Using the devised approach, we demonstrate the contextualization of DP in images by two use cases based on deep learning models, where we show the indistinguishability of individual images induced by DP and the privatized images' retention of statistical semantics in deep learning tasks, which is elaborated by quantitative analyses on the privacy-accuracy trade-off under different privatization settings.
CLFeb 14, 2022
Research on Dual Channel News Headline Classification Based on ERNIE Pre-training ModelJunjie Li, Hui Cao
The classification of news headlines is an important direction in the field of NLP, and its data has the characteristics of compactness, uniqueness and various forms. Aiming at the problem that the traditional neural network model cannot adequately capture the underlying feature information of the data and cannot jointly extract key global features and deep local features, a dual-channel network model DC-EBAD based on the ERNIE pre-training model is proposed. Use ERNIE to extract the lexical, semantic and contextual feature information at the bottom of the text, generate dynamic word vector representations fused with context, and then use the BiLSTM-AT network channel to secondary extract the global features of the data and use the attention mechanism to give key parts higher The weight of the DPCNN channel is used to overcome the long-distance text dependence problem and obtain deep local features. The local and global feature vectors are spliced, and finally passed to the fully connected layer, and the final classification result is output through Softmax. The experimental results show that the proposed model improves the accuracy, precision and F1-score of news headline classification compared with the traditional neural network model and the single-channel model under the same conditions. It can be seen that it can perform well in the multi-classification application of news headline text under large data volume.
CVMay 31, 2021
SN-Graph: a Minimalist 3D Object Representation for ClassificationSiyu Zhang, Hui Cao, Yuqi Liu et al.
Using deep learning techniques to process 3D objects has achieved many successes. However, few methods focus on the representation of 3D objects, which could be more effective for specific tasks than traditional representations, such as point clouds, voxels, and multi-view images. In this paper, we propose a Sphere Node Graph (SN-Graph) to represent 3D objects. Specifically, we extract a certain number of internal spheres (as nodes) from the signed distance field (SDF), and then establish connections (as edges) among the sphere nodes to construct a graph, which is seamlessly suitable for 3D analysis using graph neural network (GNN). Experiments conducted on the ModelNet40 dataset show that when there are fewer nodes in the graph or the tested objects are rotated arbitrarily, the classification accuracy of SN-Graph is significantly higher than the state-of-the-art methods.
CRApr 7, 2021
An Object Detection based Solver for Google's Image reCAPTCHA v2Md Imran Hossen, Yazhou Tu, Md Fazle Rabby et al.
Previous work showed that reCAPTCHA v2's image challenges could be solved by automated programs armed with Deep Neural Network (DNN) image classifiers and vision APIs provided by off-the-shelf image recognition services. In response to emerging threats, Google has made significant updates to its image reCAPTCHA v2 challenges that can render the prior approaches ineffective to a great extent. In this paper, we investigate the robustness of the latest version of reCAPTCHA v2 against advanced object detection based solvers. We propose a fully automated object detection based system that breaks the most advanced challenges of reCAPTCHA v2 with an online success rate of 83.25%, the highest success rate to date, and it takes only 19.93 seconds (including network delays) on average to crack a challenge. We also study the updated security features of reCAPTCHA v2, such as anti-recognition mechanisms, improved anti-bot detection techniques, and adjustable security preferences. Our extensive experiments show that while these security features can provide some resistance against automated attacks, adversaries can still bypass most of them. Our experimental findings indicate that the recent advances in object detection technologies pose a severe threat to the security of image captcha designs relying on simple object detection as their underlying AI problem.
CVNov 9, 2020
A Fast Hybrid Cascade Network for Voxel-based 3D Object ClassificationJi Luo, Hui Cao, Jie Wang et al.
Voxel-based 3D object classification has been thoroughly studied in recent years. Most previous methods convert the classic 2D convolution into a 3D form that will be further applied to objects with binary voxel representation for classification. However, the binary voxel representation is not very effective for 3D convolution in many cases. In this paper, we propose a hybrid cascade architecture for voxel-based 3D object classification. It consists of three stages composed of fully connected and convolutional layers, dealing with easy, moderate, and hard 3D models respectively. Both accuracy and speed can be balanced in our proposed method. By giving each voxel a signed distance value, an obvious gain regarding the accuracy can be observed. Besides, the mean inference time can be speeded up hugely compared with the state-of-the-art point cloud and voxel based methods.
CVDec 25, 2019
InSphereNet: a Concise Representation and Classification Method for 3D ObjectHui Cao, Haikuan Du, Siyu Zhang et al.
In this paper, we present an InSphereNet method for the problem of 3D object classification. Unlike previous methods that use points, voxels, or multi-view images as inputs of deep neural network (DNN), the proposed method constructs a class of more representative features named infilling spheres from signed distance field (SDF). Because of the admirable spatial representation of infilling spheres, we can not only utilize very fewer number of spheres to accomplish classification task, but also design a lightweight InSphereNet with less layers and parameters than previous methods. Experiments on ModelNet40 show that the proposed method leads to superior performance than PointNet and PointNet++ in accuracy. In particular, if there are only a few dozen sphere inputs or about 100000 DNN parameters, the accuracy of our method remains at a very high level (over 88%). This further validates the conciseness and effectiveness of the proposed InSphere 3D representation. Keywords: 3D object classification , signed distance field , deep learning , infilling sphere
CRApr 9, 2018
An Efficient Privacy-Preserving Algorithm based on Randomized Response in IoT-based Smart GridHui Cao, Shubo Liu, Zhitao Guan et al.
Among existing privacy-preserving approaches, Differential Privacy (DP) is a powerful tool that can provide privacy-preserving noisy query answers over statistical databases and has been widely adopted in many practical fields. In particular, as a privacy machine of DP, Randomized Aggregable Privacy-Preserving Ordinal Response (RAPPOR) enables strong privacy, efficient, and high-utility guarantees for each client string in data crowdsourcing. However, as for Internet of Things(IoT), such as smart gird, data are often processed in batches. Therefore, developing a new random response algorithm that can support batch-processing tend to make it more efficient and suitable for IoT applications than existing random response algorithms. In this paper, we propose a new randomized response algorithm that can achieve differential-privacy and utility guar-antees for consumer's behaviors, and process a batch of data at each time. Firstly, by applying sparse coding in this algorithm, a behavior signature dictionary is created from the aggregated energy consumption data in fog. Then, we add noise into the behavior signature dictionary by classical randomized response techniques and achieve the differential privacy after data re-aggregation. Through the security analysis with the principle of differential privacy and experimental results verification, we find that our Algorithm can preserve consumer's privacy with-out comprising utility.
CRApr 5, 2018
Achieving Differential Privacy against Non-Intrusive Load Monitoring in Smart Grid: a Fog Computing approachHui Cao, Shubo Liu, Longfei Wu et al.
Fog computing, a non-trivial extension of cloud computing to the edge of the network, has great advantage in providing services with a lower latency. In smart grid, the application of fog computing can greatly facilitate the collection of consumer's fine-grained energy consumption data, which can then be used to draw the load curve and develop a plan or model for power generation. However, such data may also reveal customer's daily activities. Non-intrusive load monitoring (NILM) can monitor an electrical circuit that powers a number of appliances switching on and off independently. If an adversary analyzes the meter readings together with the data measured by an NILM device, the customer's privacy will be disclosed. In this paper, we propose an effective privacy-preserving scheme for electric load monitoring, which can guarantee differential privacy of data disclosure in smart grid. In the proposed scheme, an energy consumption behavior model based on Factorial Hidden Markov Model (FHMM) is established. In addition, noise is added to the behavior parameter, which is different from the traditional methods that usually add noise to the energy consumption data. The analysis shows that the proposed scheme can get a better trade-off between utility and privacy compared with other popular methods.