Scaling Sentence Embeddings with Large Language Models
Ting Jiang, Shaohan Huang, Zhongzhi Luan et al.
Large language models (LLMs) have recently garnered significant interest. With in-context learning, LLMs achieve impressive results in various natural language tasks. However, the application of LLMs to sentence embeddings remains an area of ongoing research. In this work, we propose an in-context learning-based method aimed at improving sentence embedding performance. Our approach involves adapting the previous prompt-based representation method for autoregressive models, constructing a demonstration set that enables LLMs to perform in-context learning, and scaling up the LLMs to different model sizes. Extensive experiments show that in-context learning enables LLMs to generate high-quality sentence embeddings without any fine-tuning, helping them achieve performance comparable to current contrastive learning methods. By scaling model size, we find that scaling to more than tens of billions of parameters harms performance on semantic textual similarity (STS) tasks. However, the largest model outperforms its counterparts and achieves a new state-of-the-art result on transfer tasks. We also fine-tune LLMs with the current contrastive learning approach, and the 2.7B OPT model, incorporating our prompt-based method, surpasses the performance of the 4.8B ST5, achieving new state-of-the-art results on STS tasks. Our code is available at https://github.com/kongds/scaling_sentemb.
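The prompt-plus-demonstration idea in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the template wording, the `build_prompt` and `cosine` helpers, and the example demonstration are all assumptions, and no model is called. In prompt-based methods of this kind the embedding is typically read from the model's hidden state at the last prompt position; that step is left out here.

```python
import math

# Hypothetical template: the abstract does not give the exact wording.
TEMPLATE = 'This sentence: "{text}" means in one word: "'


def build_prompt(sentence, demonstrations=()):
    """Prefix the query with (sentence, word) demonstrations for in-context learning."""
    parts = [TEMPLATE.format(text=s) + word + '"' for s, word in demonstrations]
    # The query is left open-ended so the model would continue after the quote.
    parts.append(TEMPLATE.format(text=sentence))
    return " ".join(parts)


def cosine(u, v):
    """Cosine similarity, the usual score for comparing two sentence embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm
```

Given two embedding vectors obtained from such prompts, `cosine` would produce the similarity score used on STS-style tasks.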
PARNESS: A Paper Harness for End-to-End Automated Scientific Research with Dynamic Workflows, Full-Text Indexing, and Cross-Run Knowledge Accumulation
Yuchen Wang, Zhongzhi Luan
Recent autonomous research systems -- AI-Scientist, PaperOrchestra, AutoSOTA, DeepResearch, InternAgent, ResearchAgent and others -- show LLM agents can ideate, run experiments and write papers, but each fixes a particular control-flow shape (linear pipeline, state machine, single-agent loop, or fixed-recipe skill pack) at the framework level. We argue this rigidity has five roots: (1) workflows are dynamic and discipline-specific (lab work, surveys, simulations, theory all loop differently); (2) ideation is bounded by LLM context and cross-domain ideation needs knowledge a single context cannot hold; (3) summary-only views miss the paper body, yet full-text access is uneven, so the cumulative corpus must do the work; (4) a paper's open-source repository is often the only complete specification of its experimental scheme, but the paper-to-code link is neglected; (5) no tool persists cross-run knowledge retrievably into a finite LLM context. We present PARNESS, an open-source framework built on four design moves. (i) A thin DAG kernel with a four-field Agent contract decouples scheduling from domain semantics, so any discipline's loop is expressible as user-editable YAML. (ii) A full-text PDF-parsing and literature-library subsystem indexes paper bodies, figures and tables as typed objects, with graceful abstract-only fall-back. (iii) A knowledge-graph index over papers, ideas, experiments and code repositories, with scenario-typed retrieval (similar / contradictory / cross-domain / counter-intuitive), surfaces a focused slice into each LLM call. (iv) A small extension surface lets any modern coding agent (Claude Code, Cursor, Copilot, OpenCode) add or replace any module. To our knowledge PARNESS is the first open-source system combining declarative pipelines, full-PDF and code-repository indexing, and cross-run knowledge. Source: https://github.com/gtrhythm/PARNESS
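To make the "thin DAG kernel" idea concrete, here is a small sketch of a scheduler that knows nothing about what its agents do. It is not PARNESS code: the four `Agent` fields shown (`name`, `inputs`, `outputs`, `run`) are hypothetical, since the abstract does not name the real contract, and the pipeline is built in Python rather than loaded from YAML.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


@dataclass
class Agent:
    # Hypothetical four fields; the abstract does not name the real ones.
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[[Dict[str, object]], Dict[str, object]]


def run_pipeline(agents, state=None):
    """Run agents in dependency order; inputs no agent produces must be in `state`."""
    state = dict(state or {})
    producer = {key: a.name for a in agents for key in a.outputs}
    by_name = {a.name: a for a in agents}
    # Map each agent to the agents that produce its inputs (its predecessors).
    graph = {a.name: {producer[k] for k in a.inputs if k in producer} for a in agents}
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        agent = by_name[name]
        state.update(agent.run({k: state[k] for k in agent.inputs}))
    return order, state
```

The scheduler only inspects `inputs` and `outputs`, so swapping in a different discipline's steps means changing the agent list, not the kernel.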
8.2 | CL | Nov 29, 2024 | Code
On Domain-Adaptive Post-Training for Multimodal Large Language Models
Daixuan Cheng, Shaohan Huang, Ziyu Zhu et al.
Adapting general multimodal large language models (MLLMs) to specific domains, such as scientific and industrial fields, is highly significant in promoting their practical applications. This paper systematically investigates domain adaptation of MLLMs via post-training, focusing on data synthesis, training pipeline, and task evaluation. (1) Data Synthesis: Using only open-source models, we develop a generate-then-filter pipeline that curates diverse visual instruction tasks based on domain-specific image-caption pairs. The resulting data surpass the data synthesized by manual rules or strong closed-source models in enhancing domain-specific performance. (2) Training Pipeline: Unlike general MLLMs that typically adopt a two-stage training paradigm, we find that a single-stage approach is more effective for domain adaptation. (3) Task Evaluation: We conduct extensive experiments in high-impact domains such as biomedicine, food, and remote sensing, by post-training a variety of MLLMs and then evaluating MLLM performance on various domain-specific tasks. Finally, we fully open-source our models, code, and data to encourage future research in this area.
CompactNet: Platform-Aware Automatic Optimization for Convolutional Neural Networks
Weicheng Li, Rui Wang, Zhongzhi Luan et al.
Convolutional Neural Network (CNN) based Deep Learning (DL) has achieved great progress in many real-life applications. Meanwhile, because complex model structures run up against strict latency and memory restrictions, deploying CNN models on resource-limited platforms is becoming more challenging. This work proposes a solution, called CompactNet (project URL: https://github.com/CompactNet/CompactNet), which automatically optimizes a pre-trained CNN model on a specific resource-limited platform given a specific target of inference speedup. Guided by a simulator of the target platform, CompactNet progressively trims a pre-trained network by removing certain redundant filters until the target speedup is reached, and generates an optimal platform-specific model while maintaining the accuracy. We evaluate our work on two platforms, a mobile ARM CPU and a machine learning accelerator NPU (Cambricon-1A ISA), both on a Huawei Mate10 smartphone. For MobileNetV2, the state-of-the-art slim CNN model made for embedded platforms, CompactNet achieves up to a 1.8x kernel computation speedup with equal or even higher accuracy for image classification tasks on the CIFAR-10 dataset.
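The simulator-guided trimming loop described above can be illustrated with a toy version. Everything here is an assumption made for illustration: the linear latency model stands in for a real platform simulator, the "trim the most expensive layer" rule stands in for the paper's redundancy criterion, and accuracy is not modelled at all.

```python
def simulated_latency(filters, cost_per_filter):
    """Toy stand-in for a platform simulator: latency is linear in filter count."""
    return sum(n * c for n, c in zip(filters, cost_per_filter))


def trim_until_speedup(filters, cost_per_filter, target_speedup, min_filters=1):
    """Remove one filter at a time until the simulated speedup reaches the target."""
    filters = list(filters)
    baseline = simulated_latency(filters, cost_per_filter)
    while baseline / simulated_latency(filters, cost_per_filter) < target_speedup:
        # Only layers that still have more than `min_filters` filters can be trimmed.
        candidates = [i for i, n in enumerate(filters) if n > min_filters]
        if not candidates:
            raise ValueError("target speedup unreachable")
        # Hypothetical rule: trim the layer whose filters cost the most latency.
        layer = max(candidates, key=lambda i: cost_per_filter[i])
        filters[layer] -= 1
    return filters
```

A real system would also check accuracy after each trimming step; this sketch shows only the stop-at-target control flow.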
2.3 | OT | Feb 21, 2025
Strategic priorities for transformative progress in advancing biology with proteomics and artificial intelligence
Yingying Sun, Jun A, Zhiwei Liu et al.
Artificial intelligence (AI) is transforming scientific research, including proteomics. Advances in mass spectrometry (MS)-based proteomics data quality, diversity, and scale, combined with groundbreaking AI techniques, are unlocking new challenges and opportunities in biological discovery. Here, we highlight key areas where AI is driving innovation, from data analysis to new biological insights. These include developing an AI-friendly ecosystem for proteomics data generation, sharing, and analysis; improving peptide and protein identification and quantification; characterizing protein-protein interactions and protein complexes; advancing spatial and perturbation proteomics; integrating multi-omics data; and ultimately enabling AI-empowered virtual cells.
3.4 | SE | Jan 21, 2025
Beyond Window-Based Detection: A Graph-Centric Framework for Discrete Log Anomaly Detection
Jiaxing Qi, Chang Zeng, Zhongzhi Luan et al.
Detecting anomalies in discrete event logs is critical for ensuring system reliability, security, and efficiency. Traditional window-based methods for log anomaly detection often suffer from context bias and fuzzy localization, which hinder their ability to precisely and efficiently identify anomalies. To address these challenges, we propose a graph-centric framework, TempoLog, which leverages multi-scale temporal graph networks for discrete log anomaly detection. Unlike conventional methods, TempoLog constructs continuous-time dynamic graphs directly from event logs, eliminating the need for fixed-size window grouping. By representing log templates as nodes and their temporal relationships as edges, the framework dynamically captures both local and global dependencies across multiple temporal scales. Additionally, a semantic-aware model enhances detection by incorporating rich contextual information. Extensive experiments on public datasets demonstrate that our method achieves state-of-the-art performance in event-level anomaly detection, significantly outperforming existing approaches in both accuracy and efficiency.
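As a rough picture of building a continuous-time graph directly from event logs without fixed-size windows, the sketch below turns timestamped log templates into timestamped edges. It is not TempoLog's construction: linking each event to the next one in time, and the `build_temporal_graph` helper itself, are assumptions chosen for simplicity.

```python
from collections import defaultdict


def build_temporal_graph(events):
    """Build a timestamped edge list from (timestamp, template_id) pairs.

    Returns a dict mapping a source template to a list of
    (destination template, edge timestamp, time gap) tuples.
    """
    edges = defaultdict(list)
    ordered = sorted(events)  # order by timestamp, no window grouping
    for (t_prev, src), (t_next, dst) in zip(ordered, ordered[1:]):
        # Assumption: an edge links each event's template to the next event's.
        edges[src].append((dst, t_next, t_next - t_prev))
    return dict(edges)
```

Because every edge keeps its own timestamp and gap, a downstream model can look at short and long time scales without the logs ever being cut into windows.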
2.6 | LG | Dec 18, 2024
Quantum Machine Learning in Log-based Anomaly Detection: Challenges and Opportunities
Jiaxing Qi, Chang Zeng, Zhongzhi Luan et al.
Log-based anomaly detection (LogAD) is the main component of Artificial Intelligence for IT Operations (AIOps), detecting anomalies on the fly as they occur while the system is running. Existing methods commonly extract log sequence features using classical machine learning techniques to identify whether a new sequence is an anomaly or not. However, these classical approaches often require trade-offs between efficiency and accuracy. The advent of quantum machine learning (QML) offers a promising alternative. By transforming parts of classical machine learning computations into parameterized quantum circuits (PQCs), QML can significantly reduce the number of trainable parameters while maintaining accuracy comparable to classical counterparts. In this work, we introduce a unified framework, \ourframework{}, for evaluating QML models in the context of LogAD. This framework incorporates diverse log data, integrated QML models, and comprehensive evaluation metrics. State-of-the-art methods such as DeepLog, LogAnomaly, and LogRobust, along with their quantum-transformed counterparts, are included in our framework. Beyond standard metrics like F1 score, precision, and recall, our evaluation extends to factors critical to QML performance, such as specificity, the number of circuits, circuit design, and quantum state encoding. Using \ourframework{}, we conduct extensive experiments to assess the performance of these models and their quantum counterparts, uncovering valuable insights and paving the way for future research in QML model selection and design for LogAD.
1.9 | CL | Mar 14, 2024
Exploring the Comprehension of ChatGPT in Traditional Chinese Medicine Knowledge
Li Yizhen, Huang Shaohan, Qi Jiaxing et al.
No previous work has studied the performance of Large Language Models (LLMs) in the context of Traditional Chinese Medicine (TCM), an essential and distinct branch of medical knowledge with a rich history. To bridge this gap, we present a TCM question dataset named TCM-QA, which comprises three question types: single choice, multiple choice, and true or false, to examine the LLM's capacity for knowledge recall and comprehensive reasoning within the TCM domain. In our study, we evaluate the LLM in two settings, zero-shot and few-shot, and also discuss the differences between English and Chinese prompts. Our results indicate that ChatGPT performs best on true or false questions, achieving its highest precision of 0.688, and worst on multiple-choice questions, where its precision is lowest at 0.241. Furthermore, we observed that Chinese prompts outperformed English prompts in our evaluations. Additionally, we assess the quality of explanations generated by ChatGPT and their potential contribution to TCM knowledge comprehension. This paper offers valuable insights into the applicability of LLMs in specialized domains and paves the way for future research in leveraging these powerful models to advance TCM.
22.3 | LG | Sep 3, 2023
LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection
Jiaxing Qi, Shaohan Huang, Zhongzhi Luan et al.
The increasing volume of log data produced by software-intensive systems makes it impractical to analyze them manually. Many deep learning-based methods have been proposed for log-based anomaly detection. These methods face several challenges such as high-dimensional and noisy log data, class imbalance, generalization, and model interpretability. Recently, ChatGPT has shown promising results in various domains. However, there is still a lack of study on the application of ChatGPT for log-based anomaly detection. In this work, we propose LogGPT, a log-based anomaly detection framework based on ChatGPT. By leveraging ChatGPT's language interpretation capabilities, LogGPT aims to explore the transferability of knowledge from large-scale corpora to log-based anomaly detection. We conduct experiments to evaluate the performance of LogGPT and compare it with three deep learning-based methods on the BGL and Spirit datasets. LogGPT shows promising results and has good interpretability. This study provides preliminary insights into prompt-based models, such as ChatGPT, for the log-based anomaly detection task.
1.8 | LG | Jan 1, 2022
FamilySeer: Towards Optimized Tensor Codes by Exploiting Computation Subgraph Similarity
Shanjun Zhang, Mingzhen Li, Hailong Yang et al.
The need to deploy various deep learning (DL) models efficiently has boosted research on DL compilers. The difficulty of generating optimized tensor codes pushes DL compilers toward auto-tuning approaches, and growing demand requires ever better auto-tuning efficiency and quality. Currently, DL compilers partition the input DL models into several subgraphs and leverage auto-tuning to find the optimal tensor codes of these subgraphs. However, existing auto-tuning approaches usually treat subgraphs as independent and overlook the similarities across them, and thus fail to find better tensor codes under limited time budgets. We propose FamilySeer, an auto-tuning framework for DL compilers that can generate better tensor codes even with limited time budgets. FamilySeer exploits the similarities and differences among subgraphs to organize them into subgraph families, where the tuning of one subgraph can also improve other subgraphs within the same family. The cost model of each family receives cleaner training samples generated within the family and becomes more accurate, so that costly measurements on real hardware can be replaced with lightweight estimation through the cost model. Our experiments show that FamilySeer can generate model codes with the same code performance more efficiently than state-of-the-art auto-tuning frameworks.
7.2 | CR | Dec 31, 2019
Privacy for Rescue: A New Testimony Why Privacy is Vulnerable In Deep Models
Ruiyuan Gao, Ming Dun, Hailong Yang et al.
The huge computation demand of deep learning models and the limited computation resources on edge devices call for cooperation between edge device and cloud service by splitting the deep models into two halves. However, transferring the intermediate results from the partial models between edge device and cloud service makes user privacy vulnerable, since an attacker can intercept the intermediate results and extract private information from them. Existing research works rely on metrics that are either impractical or insufficient to measure the effectiveness of privacy protection methods in the above scenario, especially from the aspect of a single user. In this paper, we first present a formal definition of the privacy protection problem in the edge-cloud system running DNN models. Then, we analyze the state-of-the-art methods and point out their drawbacks, especially in evaluation metrics such as Mutual Information (MI). In addition, we perform several experiments to demonstrate that although existing methods perform well under MI, they are not effective enough to protect the privacy of a single user. To address the drawbacks of the evaluation metrics, we propose two new metrics that measure the effectiveness of privacy protection methods more accurately. Finally, we highlight several potential research directions to encourage future efforts addressing the privacy protection problem.
1.8 | LG | Apr 16, 2019
swTVM: Towards Optimized Tensor Code Generation for Deep Learning on Sunway Many-Core Processor
Mingzhen Li, Changxi Liu, Jianjin Liao et al.
The flourishing of deep learning frameworks and hardware platforms demands an efficient compiler that can shield the diversity in both software and hardware in order to provide application portability. Among the existing deep learning compilers, TVM is well known for its efficiency in code generation and optimization across diverse hardware devices. Meanwhile, the Sunway many-core processor presents itself as a competitive candidate for its attractive computational power in both scientific computing and deep learning workloads. This paper combines the trends in these two directions. Specifically, we propose swTVM, which extends the original TVM to support ahead-of-time compilation for architectures requiring cross-compilation, such as Sunway. In addition, we leverage architecture features during compilation, such as the core group for massive parallelism, DMA for high-bandwidth memory transfer, and local device memory for data locality, in order to generate efficient code for deep learning workloads on Sunway. The experimental results show that the code generated by swTVM achieves a 1.79x improvement on average compared to the state-of-the-art deep learning framework on Sunway, across six representative benchmarks. This work is the first attempt from the compiler perspective to bridge the gap between deep learning and the Sunway processor, particularly with productivity and efficiency in mind. We believe this work will encourage more people to embrace the power of deep learning and the Sunway many-core processor.