Yuriy Hulovatyy

h-index: 26
Papers: 2

2 Papers

DC | Nov 18, 2024 | Code
Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations

Igor Fedorov, Kate Plawiak, Lemeng Wu et al.

This paper presents Llama Guard 3-1B-INT4, a compact and efficient Llama Guard model, which has been open-sourced to the community during Meta Connect 2024. We demonstrate that Llama Guard 3-1B-INT4 can be deployed on resource-constrained devices, achieving a throughput of at least 30 tokens per second and a time-to-first-token of 2.5 seconds or less on a commodity Android mobile CPU. Notably, our experiments show that Llama Guard 3-1B-INT4 attains comparable or superior safety moderation scores to its larger counterpart, Llama Guard 3-1B, despite being approximately 7 times smaller in size (440MB).

CL | Sep 30, 2024
Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG

Chenhao Fang, Derek Larson, Shitong Zhu et al.

This paper presents new methods with the potential to improve the efficiency of privacy processes using LLMs and RAG. To reduce hallucination, we continually pre-train the base LLM on a privacy-specific knowledge base and then augment it with a semantic RAG layer. Our evaluations demonstrate that this approach improves model performance on privacy-related queries (as much as doubling metrics compared to an out-of-the-box LLM) by grounding responses in factual information, which reduces inaccuracies.