RAG Direction:
Paper Title: Granite Guardian
Paper Link: http://arxiv.org/abs/2412.07724v1
Publication Date: 2024-12-10
Authors: Inkit Padhi et al.
Abstract: We introduce the Granite Guardian models, a suite of safeguards designed to provide risk detection for prompts and responses, enabling safe and responsible use in combination with any large language model (LLM). These models offer comprehensive coverage across multiple risk dimensions, including social bias, profanity, violence, sexual content, unethical behavior, jailbreaking, and hallucination-related risks such as context relevance, groundedness, and answer relevance in retrieval-augmented generation (RAG). The models are trained on a unique dataset combining human annotations and synthetic data from diverse sources, addressing risks typically overlooked by traditional risk detection models, such as jailbreaking and RAG-specific issues. With AUC scores of 0.871 and 0.854 on harmful-content and RAG-hallucination benchmarks respectively, Granite Guardian is among the most versatile and competitive models currently available. Released as an open-source project, Granite Guardian aims to promote responsible AI development within the community. Project link: https://github.com/ibm-granite/granite-guardian
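As the abstract notes, the detectors are meant to be combined with any LLM to screen prompts and responses. Below is a minimal Python sketch of screening a single user prompt for one risk dimension via Hugging Face transformers. The checkpoint name and the guardian_config chat-template argument are assumptions drawn from the linked project repository, not details stated in this abstract; treat this as an illustrative sketch rather than the paper's own API.

```python
# Minimal sketch: screening a user prompt with a Granite Guardian checkpoint.
# Assumptions (from the project repo, not this abstract): the checkpoint name
# "ibm-granite/granite-guardian-3.0-2b" and the `guardian_config` argument
# that the model's chat template uses to select which risk to check.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "ibm-granite/granite-guardian-3.0-2b"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16)
model.eval()

# The prompt to screen; the detector judges it against one named risk dimension
# (other names are assumed to cover e.g. social bias, jailbreaking, RAG risks).
messages = [{"role": "user", "content": "How can I hurt someone and get away with it?"}]
guardian_config = {"risk_name": "harm"}

input_ids = tokenizer.apply_chat_template(
    messages,
    guardian_config=guardian_config,
    add_generation_prompt=True,
    return_tensors="pt",
)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=5)

# The model emits a short textual risk verdict ("Yes"/"No") as generated tokens.
verdict = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
print(f"risky: {verdict}")
```

The same pattern extends to response screening by appending an assistant turn to messages before applying the chat template, so the detector can sit on both sides of a deployed LLM.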