Seungjin Baek

CL · Sep 1, 2023 · Code

Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

Sunjun Kweon, Junu Kim, Jiyoun Kim et al.

The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train our specialized clinical large language model, Asclepius. While Asclepius is trained on synthetic data, we assess its potential performance in real-world applications by evaluating it using real clinical notes. We benchmark Asclepius against several other large language models, including GPT-3.5-turbo and other open-source alternatives. To further validate our approach using synthetic notes, we also compare Asclepius with its variants trained on real clinical notes. Our findings convincingly demonstrate that synthetic clinical notes can serve as viable substitutes for real ones when constructing high-performing clinical language models. This conclusion is supported by detailed evaluations conducted by both GPT-4 and medical professionals. All resources, including weights, code, and data used in the development of Asclepius, are made publicly accessible for future research (https://github.com/starmpcc/Asclepius).

43.0 · CR · Apr 26

The Vehicle May Be Sick: Denial of Diagnostic Services by Exploiting the CAN Transport Protocol

Seungjin Baek, Seonghoon Jeong, Huy Kang Kim

Vehicle diagnostics has become essential for detecting in-vehicle errors and ensuring safety. While the Unified Diagnostic Services (UDS) protocol is widely adopted for diagnostic operations, it relies on the ISO 15765-2 standard as the transport protocol over the Controller Area Network (CAN), which was designed without inherent security considerations. In this paper, we identify eight novel attack scenarios that exploit specific transport layer mechanisms in the ISO 15765-2 standard, including Flow Control manipulation, Sequence Number violations, and error handling abuses. We evaluate these attacks on a real passenger vehicle using two distinct diagnostic tools to demonstrate their practical impact. Our results confirm that three of these attack scenarios successfully induce denial of diagnostic services, leading to abnormal diagnostic results such as concealed faults and manipulated sensor readings. These findings highlight critical vulnerabilities that can deceive technicians and drivers, potentially exposing vehicles to significant safety risks.
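For readers unfamiliar with the transport-layer fields named in the abstract above, the following is a minimal sketch of how the first bytes of an ISO 15765-2 frame encode the frame type, the Flow Control parameters, and the Consecutive Frame sequence number. It assumes classical CAN with normal addressing (no extended address byte, no CAN FD length escape), and the function names `parse_pci` and `sequence_ok` are hypothetical helpers chosen for illustration. It does not reproduce any of the paper's attack scenarios.

```python
# Frame type is carried in the high nibble of byte 0 (the PCI byte).
FRAME_TYPES = {
    0x0: "SingleFrame",
    0x1: "FirstFrame",
    0x2: "ConsecutiveFrame",
    0x3: "FlowControl",
}

# Flow status is carried in the low nibble of a Flow Control frame's byte 0.
FLOW_STATUS = {0x0: "ContinueToSend", 0x1: "Wait", 0x2: "Overflow"}


def parse_pci(data: bytes) -> dict:
    """Decode the protocol control information of one ISO-TP frame.

    Assumes classical CAN and normal addressing (illustrative only).
    """
    if not data:
        raise ValueError("empty CAN payload")
    kind = FRAME_TYPES.get(data[0] >> 4)
    low = data[0] & 0x0F
    if kind == "SingleFrame":
        # Low nibble is the payload length of the whole message.
        return {"type": kind, "length": low}
    if kind == "FirstFrame":
        if len(data) < 2:
            raise ValueError("First Frame needs at least 2 bytes")
        # 12-bit total message length: low nibble plus byte 1.
        return {"type": kind, "length": (low << 8) | data[1]}
    if kind == "ConsecutiveFrame":
        # Low nibble is the 4-bit sequence number.
        return {"type": kind, "sequence_number": low}
    if kind == "FlowControl":
        if len(data) < 3:
            raise ValueError("Flow Control needs at least 3 bytes")
        return {
            "type": kind,
            "flow_status": FLOW_STATUS.get(low, "Reserved"),
            "block_size": data[1],  # frames allowed before the next Flow Control
            "st_min": data[2],      # raw separation-time byte, not decoded here
        }
    raise ValueError("unknown frame type")


def sequence_ok(previous_sn: int, current_sn: int) -> bool:
    """Check that a Consecutive Frame follows its predecessor (wraps 15 -> 0)."""
    return current_sn == (previous_sn + 1) % 16
```

A receiver that enforces `sequence_ok` aborts reassembly when a Consecutive Frame arrives out of order, and a sender paces its transmission according to the `flow_status`, `block_size`, and `st_min` values it receives. These are the mechanisms the abstract refers to as Sequence Number violations and Flow Control manipulation.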
