Granite Guardian: Risk Detection Safeguards for LLM Prompts, Responses, and RAG

Welcome to Agent Daily, where we summarize agent papers every morning.

Topic: RAG

Paper Title: Granite Guardian

Paper Link: http://arxiv.org/abs/2412.07724v1

Publication Date: 2024-12-10

Authors: Inkit Padhi et al.

Abstract: We introduce Granite Guardian, a suite of risk detection models that screen prompts and responses and can be paired with any large language model (LLM) to ensure its safe and responsible use. These models provide comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related risks such as context relevance, groundedness, and answer relevance in retrieval-augmented generation (RAG). Trained on a unique dataset combining human annotations and synthetic data from diverse sources, Granite Guardian addresses risks that traditional risk detection models often overlook, such as jailbreaks and RAG-specific issues. With AUC scores of 0.871 on harmful content benchmarks and 0.854 on RAG hallucination benchmarks, it is the most versatile and competitive model currently available. Released as an open-source project, Granite Guardian aims to promote responsible AI development within the community. Project link: https://github.com/ibm-granite/granite-guardian
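Since the models are released openly, the guardrail pattern the abstract describes, screening a prompt (and optionally a response) for a named risk dimension before letting it through, can be sketched with standard Hugging Face tooling. The checkpoint name, the available risk names, and the guardian_config argument to the chat template below are assumptions drawn from the project's public releases, not details stated in this summary; check the repo linked above for the exact usage.

```python
# Minimal sketch: use a Granite Guardian checkpoint as a risk gate for an LLM.
# MODEL_ID, risk names, and the guardian_config kwarg are assumptions; verify
# against https://github.com/ibm-granite/granite-guardian before relying on them.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "ibm-granite/granite-guardian-3.0-2b"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

def detect_risk(user_prompt: str,
                assistant_response: str | None = None,
                risk: str = "harm") -> str:
    """Ask the guardian model whether the exchange carries the named risk.

    Returns the model's raw Yes/No-style verdict as a string.
    """
    messages = [{"role": "user", "content": user_prompt}]
    if assistant_response is not None:
        # Scoring a response (e.g. for RAG groundedness) adds it to the turn.
        messages.append({"role": "assistant", "content": assistant_response})

    # Selecting the risk dimension via guardian_config follows the public
    # model card; this is an assumption, not something stated in the summary.
    input_ids = tokenizer.apply_chat_template(
        messages,
        guardian_config={"risk_name": risk},
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)

    output = model.generate(input_ids, max_new_tokens=5, do_sample=False)
    # Strip the prompt tokens and keep only the generated verdict.
    return tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    ).strip()

# Gate a user prompt before forwarding it to the main LLM.
verdict = detect_risk("How do I pick a lock?", risk="harm")
print(verdict)  # e.g. "Yes" if the prompt is flagged as risky
```

The same call shape covers both halves of the paper's scope: passing only a user turn screens the prompt, while passing a user/assistant pair lets the guardian judge a generated answer, which is how the RAG-hallucination dimensions (groundedness, answer relevance) would be checked.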
