Yue Li

h-index15

4papers

103citations

Novelty43%

AI Score32

Ranked #123,168 of 194,257 authors (top 63%)#3,049 in CR (top 45%)

4 Papers

32.6CRApr 19, 2024Code

CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models

Manish Bhatt, Sahana Chennabasappa, Yue Li et al.

Large language models (LLMs) introduce new security risks, but there are few comprehensive evaluation suites to measure and reduce these risks. We present BenchmarkName, a novel benchmark to quantify LLM security risks and capabilities. We introduce two new areas for testing: prompt injection and code interpreter abuse. We evaluated multiple state-of-the-art (SOTA) LLMs, including GPT-4, Mistral, Meta Llama 3 70B-Instruct, and Code Llama. Our results show that conditioning away risk of attack remains an unsolved problem; for example, all tested models showed between 26% and 41% successful prompt injection tests. We further introduce the safety-utility tradeoff: conditioning an LLM to reject unsafe prompts can cause the LLM to falsely reject answering benign prompts, which lowers utility. We propose quantifying this tradeoff using False Refusal Rate (FRR). As an illustration, we introduce a novel test set to quantify FRR for cyberattack helpfulness risk. We find many LLMs able to successfully comply with "borderline" benign requests while still rejecting most unsafe requests. Finally, we quantify the utility of LLMs for automating a core cybersecurity task, that of exploiting software vulnerabilities. This is important because the offensive capabilities of LLMs are of intense interest; we quantify this by creating novel test sets for four representative problems. We find that models with coding capabilities perform better than those without, but that further work is needed for LLMs to become proficient at exploit generation. Our code is open source and can be used to evaluate other LLMs.

7.6CVDec 3, 2023Code

Deeper into Self-Supervised Monocular Indoor Depth Estimation

Chao Fan, Zhenyu Yin, Yue Li et al.

Monocular depth estimation using Convolutional Neural Networks (CNNs) has shown impressive performance in outdoor driving scenes. However, self-supervised learning of indoor depth from monocular sequences is quite challenging for researchers because of the following two main reasons. One is the large areas of low-texture regions and the other is the complex ego-motion on indoor training datasets. In this work, our proposed method, named IndoorDepth, consists of two innovations. In particular, we first propose a novel photometric loss with improved structural similarity (SSIM) function to tackle the challenge from low-texture regions. Moreover, in order to further mitigate the issue of inaccurate ego-motion prediction, multiple photometric losses at different stages are used to train a deeper pose network with two residual pose blocks. Subsequent ablation study can validate the effectiveness of each new idea. Experiments on the NYUv2 benchmark demonstrate that our IndoorDepth outperforms the previous state-of-the-art methods by a large margin. In addition, we also validate the generalization ability of our method on ScanNet dataset. Code is availabe at https://github.com/fcntes/IndoorDepth.

4.9SDJan 7, 2024

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

He Wang, Pengcheng Guo, Yue Li et al.

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, including automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR) are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, first-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, showing an absolute improvement of 13.08% and 51.4% compared to our challenge baseline, respectively.

3.8CRJun 25, 2021

CLOAK: A Framework For Development of Confidential Blockchain Smart Contracts

Qian Ren, Han Liu, Yue Li et al.

In recent years, as blockchain adoption has been expanding across a wide range of domains, e.g., digital asset, supply chain finance, etc., the confidentiality of smart contracts is now a fundamental demand for practical applications. However, while new privacy protection techniques keep coming out, how existing ones can best fit development settings is little studied. Suffering from limited architectural support in terms of programming interfaces, state-of-the-art solutions can hardly reach general developers. In this paper, we proposed the CLOAK framework for developing confidential smart contracts. The key capability of CLOAK is allowing developers to implement and deploy practical solutions to multi-party transaction (MPT) problems, i.e., transact with secret inputs and states owned by different parties by simply specifying it. To this end, CLOAK introduced a domain-specific annotation language for declaring privacy specifications and further automatically generating confidential smart contracts to be deployed with trusted execution environment (TEE) on blockchain. In our evaluation on both simple and real-world applications, developers managed to deploy business services on blockchain in a concise manner by only developing CLOAK smart contracts whose size is less than 30% of the deployed ones.