Knowledge base reference

Knowledge base management in RAGET allows you to create and manage the information sources used by RAG systems.

class giskard.rag.knowledge_base.KnowledgeBase(data: DataFrame, columns: Sequence[str] | None = None, seed: int = None, llm_client: LLMClient | None = None, embedding_model: BaseEmbedding | None = None, min_topic_size: int | None = None, chunk_size: int = 2048)

A class to handle the knowledge base and the associated vector store.

Parameters:
  • data (pd.DataFrame) – A dataframe containing the whole knowledge base.

  • columns (Sequence[str], optional) – The columns of the knowledge base dataframe to use. If not specified, all columns are concatenated to produce a single document. Example: if your knowledge base consists of FAQ data with columns “Q” and “A”, each row is formatted into a single document “Q: [question]\nA: [answer]” that is used to generate questions.

  • seed (int, optional) – The seed to use for random number generation.

  • llm_client (LLMClient, optional) – The LLM client to use for question generation. If not specified, a default OpenAI client is used.

  • embedding_model (BaseEmbedding, optional) – The Giskard embedding model to use for the knowledge base. If not specified, the Giskard default model is used, which is OpenAI “text-embedding-3-small”.

  • min_topic_size (int, optional) – The minimum number of documents required to form a topic inside the knowledge base.

  • chunk_size (int, optional) – The number of documents to embed in a single batch. Defaults to 2048.
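
The columns and chunk_size parameters above can be pictured with a small pure-Python sketch. It is an illustration only, not Giskard's implementation: format_row and batched are hypothetical helpers, the rows are made-up sample data, plain dictionaries stand in for dataframe rows, and the "name: value" layout used when columns is omitted is an assumption.

```python
from typing import Dict, Iterator, List, Optional, Sequence


def format_row(row: Dict[str, str], columns: Optional[Sequence[str]] = None) -> str:
    """Join the selected columns of one row into a single document string.

    Hypothetical helper: mirrors the "Q: [question]\\nA: [answer]" example
    from the parameter description, not Giskard's actual code.
    """
    # Assumption: when no columns are given, every column is used, in row order.
    selected = list(columns) if columns is not None else list(row)
    return "\n".join(f"{name}: {row[name]}" for name in selected)


def batched(documents: Sequence[str], chunk_size: int = 2048) -> Iterator[List[str]]:
    """Yield successive batches of at most chunk_size documents."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(documents), chunk_size):
        yield list(documents[start:start + chunk_size])


# Made-up FAQ rows standing in for a knowledge base dataframe.
rows = [
    {"Q": "How do I reset my password?", "A": "Use the account settings page."},
    {"Q": "Where can I find invoices?", "A": "Open the billing tab."},
    {"Q": "Is there a free plan?", "A": "Yes, with limited usage."},
]

documents = [format_row(row, columns=["Q", "A"]) for row in rows]
batches = list(batched(documents, chunk_size=2))

print(documents[0])
print([len(batch) for batch in batches])  # [2, 1]
```

With three documents and chunk_size=2, the documents would be embedded in two batches. The default of 2048 means that small knowledge bases are embedded in a single batch.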

classmethod from_pandas(df: DataFrame, columns: Sequence[str] | None = None, **kwargs) → KnowledgeBase

Create a KnowledgeBase from a pandas DataFrame.

Parameters:
  • df (pd.DataFrame) – The DataFrame containing the knowledge base.

  • columns (Sequence[str], optional) – The columns of the knowledge base dataframe to use. If not specified, all columns are concatenated to produce a single document. Example: if your knowledge base consists of FAQ data with columns “Q” and “A”, each row is formatted into a single document “Q: [question]\nA: [answer]” that is used to generate questions.

  • kwargs – Additional settings for the knowledge base, passed as keyword arguments (see __init__).
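
A minimal usage sketch of from_pandas, based only on the signature documented here. The function name build_faq_knowledge_base, the FAQ rows, and the seed and chunk_size values are illustrative assumptions. Because a default OpenAI client and embedding model are used when none are passed, running it will likely require OpenAI credentials to be configured.

```python
def build_faq_knowledge_base():
    """Build a KnowledgeBase from a small, made-up FAQ DataFrame."""
    # Imports are deferred so that defining this function does not require
    # pandas or giskard to be installed.
    import pandas as pd
    from giskard.rag.knowledge_base import KnowledgeBase

    # Hypothetical FAQ data; replace with your own documents.
    df = pd.DataFrame(
        {
            "Q": ["How do I reset my password?", "Where can I find invoices?"],
            "A": ["Use the account settings page.", "Open the billing tab."],
        }
    )

    # Only the "Q" and "A" columns are used to build each document.
    # seed and chunk_size are examples of the additional keyword settings
    # described for the constructor; the values are arbitrary.
    return KnowledgeBase.from_pandas(df, columns=["Q", "A"], seed=42, chunk_size=512)
```

Calling build_faq_knowledge_base() returns the KnowledgeBase instance. Omitting columns would instead use every column of the DataFrame.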

class giskard.rag.knowledge_base.Document(document: Dict[str, str], doc_id: str = None, features: Sequence | None = None, topic_id: int = None)

A class to wrap the elements of the knowledge base into a unified format.