Evaluation reference

Evaluation methods in RAGET score a RAG agent's answers against a test set and compile the results into a report.

giskard.rag.evaluate(answer_fn: Callable | Sequence[AgentAnswer | str], testset: QATestset | None = None, knowledge_base: KnowledgeBase | None = None, llm_client: LLMClient | None = None, agent_description: str = 'This agent is a chatbot that answers question from users.', metrics: Sequence[Callable] | None = None) → RAGReport

Evaluate an agent by comparing its answers on a QATestset.

Parameters:
  • answer_fn (Union[Callable, Sequence[Union[AgentAnswer, str]]]) – The prediction function of the agent to evaluate, or a list of precomputed answers on the testset.

  • testset (QATestset, optional) – The test set to evaluate the agent on. If not provided, a knowledge base must be provided and a default testset will be created from the knowledge base. Note that if answer_fn is a list of answers, the testset is required.

  • knowledge_base (KnowledgeBase, optional) – The knowledge base of the agent to evaluate. If not provided, a testset must be provided.

  • llm_client (LLMClient, optional) – The LLM client to use for the evaluation. If not provided, a default OpenAI client will be used.

  • agent_description (str, optional) – Description of the agent to be tested.

  • metrics (Optional[Sequence[Callable]], optional) – Metrics to compute on the test set.

Returns:

The report of the evaluation.

Return type:

RAGReport
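
A minimal usage sketch follows, wiring a prediction function to a saved test set. The file paths and the my_rag_agent pipeline are illustrative placeholders, not part of the API:

    from giskard.rag import QATestset, evaluate

    # Load a previously generated and saved test set (the path is illustrative).
    testset = QATestset.load("my_testset.jsonl")

    # The prediction function takes a question (and, optionally, a conversation
    # history) and must return the agent's answer as a string.
    def answer_fn(question: str, history=None) -> str:
        # `my_rag_agent` is a placeholder for your own RAG pipeline.
        return my_rag_agent.ask(question)

    # Run the evaluation and export the resulting report.
    report = evaluate(answer_fn, testset=testset)
    report.to_html("rag_eval_report.html")

Alternatively, answer_fn can be a list of precomputed answers; in that case the testset argument is mandatory so the answers can be matched to their questions.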

class giskard.llm.evaluators.CorrectnessEvaluator(answer_col='reference_answer', *args, **kwargs)

Assess the correctness of a model's answers, given questions and associated reference answers.
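
A minimal sketch of using the evaluator directly, assuming it follows the standard evaluate(model, dataset) interface of giskard.llm evaluators; the toy data and the my_qa_agent pipeline are illustrative:

    import pandas as pd

    import giskard
    from giskard.llm.evaluators import CorrectnessEvaluator

    # Toy data: the ground-truth column must match `answer_col`
    # (default: "reference_answer").
    df = pd.DataFrame(
        {
            "question": ["What is the capital of France?"],
            "reference_answer": ["Paris"],
        }
    )
    dataset = giskard.Dataset(df, target=None)

    # Wrap the QA pipeline as a Giskard model; `my_qa_agent` is a placeholder.
    def predict_fn(df: pd.DataFrame):
        return [my_qa_agent.ask(q) for q in df["question"]]

    model = giskard.Model(
        predict_fn,
        model_type="text_generation",
        name="qa-agent",
        description="Answers factual questions from users.",
        feature_names=["question"],
    )

    evaluator = CorrectnessEvaluator(answer_col="reference_answer")
    # The result aggregates the per-sample pass/fail judgments made by the LLM.
    result = evaluator.evaluate(model, dataset)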