Evaluation reference
The evaluation methods in RAGET run a RAG agent against a test set and score its answers against the reference answers, producing an evaluation report.
- giskard.rag.evaluate(answer_fn: Callable | Sequence[AgentAnswer | str], testset: QATestset | None = None, knowledge_base: KnowledgeBase | None = None, llm_client: LLMClient | None = None, agent_description: str = 'This agent is a chatbot that answers question from users.', metrics: Sequence[Callable] | None = None) → RAGReport
Evaluate an agent by comparing its answers to the reference answers of a QATestset.
- Parameters:
answer_fn (Union[Callable, Sequence[Union[AgentAnswer, str]]]) – The prediction function of the agent to evaluate, or a list of precomputed answers to the testset questions.
testset (QATestset, optional) – The test set to evaluate the agent on. If not provided, a knowledge base must be provided, and a default testset will be generated from it. Note that if answer_fn is a list of answers, the testset is required.
knowledge_base (KnowledgeBase, optional) – The knowledge base of the agent to evaluate. If not provided, a testset must be provided.
llm_client (LLMClient, optional) – The LLM client to use for the evaluation. If not provided, a default OpenAI client will be used.
agent_description (str, optional) – Description of the agent to be tested.
metrics (Optional[Sequence[Callable]], optional) – Metrics to compute on the test set.
- Returns:
The report of the evaluation.
- Return type:
RAGReport
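A minimal usage sketch follows. The testset path, the `my_rag_agent` object, and the report filename are illustrative assumptions, not part of the API; the optional `history` argument on the answer function is also shown only as a common pattern.

```python
from giskard.rag import QATestset, evaluate

# Load a previously generated test set (path is illustrative).
testset = QATestset.load("my_testset.jsonl")

def answer_fn(question: str, history=None) -> str:
    # `my_rag_agent` stands in for your own RAG pipeline (assumption).
    return my_rag_agent.ask(question)

# Run the evaluation; returns a RAGReport.
report = evaluate(answer_fn, testset=testset)
report.to_html("rag_eval_report.html")
```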
- class giskard.llm.evaluators.CorrectnessEvaluator(answer_col='reference_answer', *args, **kwargs)
Assess the correctness of a model's answers given questions and associated reference answers.
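A hedged sketch of calling this evaluator directly: it assumes the evaluator exposes the evaluate(model, dataset) interface shared by Giskard's LLM-based evaluators, and the toy model, dataset, and column names below are placeholders for this sketch only.

```python
import pandas as pd
from giskard import Dataset, Model
from giskard.llm.evaluators import CorrectnessEvaluator

# Toy dataset holding questions and reference answers; answer_col
# defaults to "reference_answer".
df = pd.DataFrame(
    {
        "question": ["What is the capital of France?"],
        "reference_answer": ["Paris"],
    }
)
dataset = Dataset(df, target=None)

def predict_fn(df: pd.DataFrame):
    # Placeholder agent that always gives the same answer (illustrative only).
    return ["Paris"] * len(df)

model = Model(
    model=predict_fn,
    model_type="text_generation",
    name="demo-agent",
    description="Toy agent used only in this sketch.",
    feature_names=["question"],
)

evaluator = CorrectnessEvaluator(answer_col="reference_answer")
result = evaluator.evaluate(model, dataset)  # assumed LLM-evaluator interface
```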