Evaluation metrics reference
Evaluation metrics in RAGET provide quantitative measures of RAG system performance across different dimensions.
Correctness
Using an LLM-as-a-judge strategy, the correctness metric checks whether an answer is correct with respect to the reference answer.
RAGAS Metrics
We provide wrappers for some RAGAS metrics. You can implement other RAGAS metrics using the RAGASMetric class.
- giskard.rag.metrics.ragas_metrics.ragas_context_precision(question_sample: dict, answer: AgentAnswer) → dict
- giskard.rag.metrics.ragas_metrics.ragas_faithfulness(question_sample: dict, answer: AgentAnswer) → dict
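To build intuition for what context precision measures, here is a simplified, self-contained illustration. It averages precision@k over the positions of relevant retrieved chunks, following the general shape of the RAGAS definition; the real RAGAS metric uses an LLM judge to produce the per-chunk relevance verdicts, which we replace here with a hand-supplied relevance list. The function name `toy_context_precision` is illustrative, not part of the library.

```python
def toy_context_precision(relevance: list[int]) -> float:
    """Simplified context precision.

    relevance[i] is 1 if the i-th retrieved chunk is judged relevant
    to the question, 0 otherwise. The score averages precision@k over
    the ranks k at which a relevant chunk appears, so relevant chunks
    ranked earlier yield a higher score. This is a sketch of the idea;
    RAGAS obtains the relevance verdicts from an LLM judge.
    """
    numerator = sum(
        rel * (sum(relevance[: k + 1]) / (k + 1))
        for k, rel in enumerate(relevance)
    )
    total_relevant = sum(relevance)
    return numerator / total_relevant if total_relevant else 0.0

# Relevant chunks at ranks 1 and 3: (1/1 + 2/3) / 2 ≈ 0.833
score = toy_context_precision([1, 0, 1])
```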
Base Metric
- class giskard.rag.metrics.Metric(name: str, llm_client: LLMClient = None)
Base class for metrics. All metrics should inherit from this class and implement the __call__ method. Instances of this class can be passed to the evaluate method.
- abstract __call__(question_sample: dict, answer: AgentAnswer)
Compute the metric on a single question and its associated answer.
- Parameters:
  - question_sample (dict) – A question sample from a QATestset.
  - answer (AgentAnswer) – The agent's answer to that question.
- Returns: The result of the metric computation. The keys should be the names of the computed metrics.
- Return type: dict
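As a sketch of the interface, here is a toy metric that follows the __call__ contract described above: it takes a question sample dict and an agent answer and returns a dict keyed by metric name. The `ExactMatchMetric` class and the minimal `AgentAnswer` stand-in (with only a `message` field) are illustrative assumptions so the example runs standalone; in practice you would subclass giskard.rag.metrics.Metric and use the library's own AgentAnswer type.

```python
from dataclasses import dataclass


@dataclass
class AgentAnswer:
    """Minimal stand-in for giskard's AgentAnswer, for illustration only."""
    message: str


class ExactMatchMetric:
    """Toy metric implementing the Metric protocol described above.

    __call__ receives one question sample and the agent's answer, and
    returns a dict whose keys name the computed metrics.
    """

    def __init__(self, name: str):
        self.name = name

    def __call__(self, question_sample: dict, answer: AgentAnswer) -> dict:
        expected = question_sample["reference_answer"]
        is_match = answer.message.strip().lower() == expected.strip().lower()
        return {self.name: float(is_match)}


metric = ExactMatchMetric("exact_match")
result = metric(
    {"question": "What is 2+2?", "reference_answer": "4"},
    AgentAnswer(message="4"),
)
# result == {"exact_match": 1.0}
```

A real subclass would typically also use the llm_client passed to Metric's constructor to obtain a judgment rather than comparing strings directly.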