retrieval

Classes

CollectionNameForm

Bases: BaseModel

Form for specifying a collection name.

Attributes

collection_name
collection_name: Optional[str] = None

The name of the collection.

ProcessUrlForm

Bases: CollectionNameForm

Form for processing a URL.

Attributes

url
url: str

The URL to process.
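
Because ProcessUrlForm extends CollectionNameForm, a request body carries the required url plus the optional, inherited collection_name. The sketch below builds such a body as a plain dictionary. The helper name process_url_payload and the example URL are illustrative assumptions, not part of the module.

```python
import json
from typing import Any, Dict, Optional


def process_url_payload(url: str, collection_name: Optional[str] = None) -> Dict[str, Any]:
    """Build a JSON-serializable body with the fields of ProcessUrlForm.

    Hypothetical helper: it only mirrors the documented field names.
    """
    payload: Dict[str, Any] = {"url": url}
    # collection_name is inherited from CollectionNameForm and is optional,
    # so it is only included when the caller provides one.
    if collection_name is not None:
        payload["collection_name"] = collection_name
    return payload


body = process_url_payload("https://example.com/docs", collection_name="docs")
print(json.dumps(body))
```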

SearchForm

Bases: BaseModel

Form for search queries.

Attributes

queries
queries: List[str]

List of search queries.

OpenAIConfigForm

Bases: BaseModel

Configuration for OpenAI embedding model.

Attributes

url
url: str

The base URL for the OpenAI API.

key
key: str

The API key for the OpenAI API.

OllamaConfigForm

Bases: BaseModel

Configuration for Ollama embedding model.

Attributes

url
url: str

The base URL for the Ollama API.

key
key: str

The API key for the Ollama API.

AzureOpenAIConfigForm

Bases: BaseModel

Configuration for Azure OpenAI embedding model.

Attributes

url
url: str

The base URL for the Azure OpenAI API.

key
key: str

The API key for the Azure OpenAI API.

version
version: str

The API version for the Azure OpenAI API.

EmbeddingModelUpdateForm

Bases: BaseModel

Form for updating the embedding model configuration.

Attributes

openai_config
openai_config: Optional[OpenAIConfigForm] = None

Configuration for OpenAI embedding model.

ollama_config
ollama_config: Optional[OllamaConfigForm] = None

Configuration for Ollama embedding model.

azure_openai_config
azure_openai_config: Optional[AzureOpenAIConfigForm] = None

Configuration for Azure OpenAI embedding model.

RAG_EMBEDDING_ENGINE
RAG_EMBEDDING_ENGINE: str

The embedding engine to use (e.g., 'ollama', 'openai').

RAG_EMBEDDING_MODEL
RAG_EMBEDDING_MODEL: str

The embedding model to use.

RAG_EMBEDDING_BATCH_SIZE
RAG_EMBEDDING_BATCH_SIZE: Optional[int] = 1

The batch size for embedding generation.

ENABLE_ASYNC_EMBEDDING
ENABLE_ASYNC_EMBEDDING: Optional[bool] = True

Whether to enable asynchronous embedding generation.
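
The nesting of the provider configurations inside EmbeddingModelUpdateForm can be sketched as a plain dictionary plus a small completeness check. This is a minimal sketch, not the module's validation logic: the helper missing_fields, the model name, the local URL, and the pairing of the 'ollama' engine with ollama_config are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Fields documented without a default value are treated as required here.
REQUIRED_TOP_LEVEL = ("RAG_EMBEDDING_ENGINE", "RAG_EMBEDDING_MODEL")
REQUIRED_NESTED = {
    "openai_config": ("url", "key"),
    "ollama_config": ("url", "key"),
    "azure_openai_config": ("url", "key", "version"),
}


def missing_fields(payload: Dict[str, Any]) -> List[str]:
    """Return the documented required fields that the payload lacks."""
    missing = [name for name in REQUIRED_TOP_LEVEL if name not in payload]
    for section, keys in REQUIRED_NESTED.items():
        config = payload.get(section)
        if config is None:
            continue  # each nested config is optional
        missing += [f"{section}.{key}" for key in keys if key not in config]
    return missing


embedding_update = {
    "RAG_EMBEDDING_ENGINE": "ollama",
    "RAG_EMBEDDING_MODEL": "example-embedding-model",  # placeholder name
    "RAG_EMBEDDING_BATCH_SIZE": 8,
    "ENABLE_ASYNC_EMBEDDING": True,
    "ollama_config": {"url": "http://localhost:11434", "key": ""},  # assumed URL
}

print(missing_fields(embedding_update))
```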

WebConfig

Bases: BaseModel

Configuration for web search and retrieval.

Attributes

ENABLE_WEB_SEARCH
ENABLE_WEB_SEARCH: Optional[bool] = None

Whether to enable web search.

WEB_SEARCH_ENGINE
WEB_SEARCH_ENGINE: Optional[str] = None

The web search engine to use.

WEB_SEARCH_TRUST_ENV
WEB_SEARCH_TRUST_ENV: Optional[bool] = None

Whether to trust the environment variables for web search.

WEB_SEARCH_RESULT_COUNT
WEB_SEARCH_RESULT_COUNT: Optional[int] = None

The number of web search results to retrieve.

WEB_SEARCH_CONCURRENT_REQUESTS
WEB_SEARCH_CONCURRENT_REQUESTS: Optional[int] = None

The number of concurrent web search requests.

WEB_LOADER_CONCURRENT_REQUESTS
WEB_LOADER_CONCURRENT_REQUESTS: Optional[int] = None

The number of concurrent web loader requests.

WEB_SEARCH_DOMAIN_FILTER_LIST
WEB_SEARCH_DOMAIN_FILTER_LIST: Optional[List[str]] = []

List of domains to filter from web search results.

BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL
BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL: Optional[bool] = None

Whether to bypass embedding and retrieval for web search results.

BYPASS_WEB_SEARCH_WEB_LOADER
BYPASS_WEB_SEARCH_WEB_LOADER: Optional[bool] = None

Whether to bypass the web loader for web search results.

OLLAMA_CLOUD_WEB_SEARCH_API_KEY
OLLAMA_CLOUD_WEB_SEARCH_API_KEY: Optional[str] = None

API key for Ollama Cloud web search.

SEARXNG_QUERY_URL
SEARXNG_QUERY_URL: Optional[str] = None

The query URL for SearXNG.

YACY_QUERY_URL
YACY_QUERY_URL: Optional[str] = None

The query URL for YaCy.

YACY_USERNAME
YACY_USERNAME: Optional[str] = None

The username for YaCy.

YACY_PASSWORD
YACY_PASSWORD: Optional[str] = None

The password for YaCy.

GOOGLE_PSE_API_KEY
GOOGLE_PSE_API_KEY: Optional[str] = None

API key for Google Programmable Search Engine.

GOOGLE_PSE_ENGINE_ID
GOOGLE_PSE_ENGINE_ID: Optional[str] = None

Engine ID for Google Programmable Search Engine.

BRAVE_SEARCH_API_KEY
BRAVE_SEARCH_API_KEY: Optional[str] = None

API key for Brave Search.

KAGI_SEARCH_API_KEY
KAGI_SEARCH_API_KEY: Optional[str] = None

API key for Kagi Search.

MOJEEK_SEARCH_API_KEY
MOJEEK_SEARCH_API_KEY: Optional[str] = None

API key for Mojeek Search.

BOCHA_SEARCH_API_KEY
BOCHA_SEARCH_API_KEY: Optional[str] = None

API key for Bocha Search.

SERPSTACK_API_KEY
SERPSTACK_API_KEY: Optional[str] = None

API key for Serpstack.

SERPSTACK_HTTPS
SERPSTACK_HTTPS: Optional[bool] = None

Whether to use HTTPS for Serpstack.

SERPER_API_KEY
SERPER_API_KEY: Optional[str] = None

API key for Serper.

SERPLY_API_KEY
SERPLY_API_KEY: Optional[str] = None

API key for Serply.

TAVILY_API_KEY
TAVILY_API_KEY: Optional[str] = None

API key for Tavily.

SEARCHAPI_API_KEY
SEARCHAPI_API_KEY: Optional[str] = None

API key for SearchAPI.

SEARCHAPI_ENGINE
SEARCHAPI_ENGINE: Optional[str] = None

The engine to use for SearchAPI.

SERPAPI_API_KEY
SERPAPI_API_KEY: Optional[str] = None

API key for SerpAPI.

SERPAPI_ENGINE
SERPAPI_ENGINE: Optional[str] = None

The engine to use for SerpAPI.

JINA_API_KEY
JINA_API_KEY: Optional[str] = None

API key for Jina.

BING_SEARCH_V7_ENDPOINT
BING_SEARCH_V7_ENDPOINT: Optional[str] = None

The endpoint for Bing Search V7.

BING_SEARCH_V7_SUBSCRIPTION_KEY
BING_SEARCH_V7_SUBSCRIPTION_KEY: Optional[str] = None

The subscription key for Bing Search V7.

EXA_API_KEY
EXA_API_KEY: Optional[str] = None

API key for Exa.

PERPLEXITY_API_KEY
PERPLEXITY_API_KEY: Optional[str] = None

API key for Perplexity.

PERPLEXITY_MODEL
PERPLEXITY_MODEL: Optional[str] = None

The model to use for Perplexity.

PERPLEXITY_SEARCH_CONTEXT_USAGE
PERPLEXITY_SEARCH_CONTEXT_USAGE: Optional[str] = None

The search context usage for Perplexity.

PERPLEXITY_SEARCH_API_URL
PERPLEXITY_SEARCH_API_URL: Optional[str] = None

The search API URL for Perplexity.

SOUGOU_API_SID
SOUGOU_API_SID: Optional[str] = None

The SID for the Sogou API.

SOUGOU_API_SK
SOUGOU_API_SK: Optional[str] = None

The SK for the Sogou API.

WEB_LOADER_ENGINE
WEB_LOADER_ENGINE: Optional[str] = None

The web loader engine to use.

ENABLE_WEB_LOADER_SSL_VERIFICATION
ENABLE_WEB_LOADER_SSL_VERIFICATION: Optional[bool] = None

Whether to enable SSL verification for the web loader.

PLAYWRIGHT_WS_URL
PLAYWRIGHT_WS_URL: Optional[str] = None

The WebSocket URL for Playwright.

PLAYWRIGHT_TIMEOUT
PLAYWRIGHT_TIMEOUT: Optional[int] = None

The timeout for Playwright.

FIRECRAWL_API_KEY
FIRECRAWL_API_KEY: Optional[str] = None

API key for Firecrawl.

FIRECRAWL_API_BASE_URL
FIRECRAWL_API_BASE_URL: Optional[str] = None

The base URL for Firecrawl.

TAVILY_EXTRACT_DEPTH
TAVILY_EXTRACT_DEPTH: Optional[str] = None

The extract depth for Tavily.

EXTERNAL_WEB_SEARCH_URL
EXTERNAL_WEB_SEARCH_URL: Optional[str] = None

The URL for external web search.

EXTERNAL_WEB_SEARCH_API_KEY
EXTERNAL_WEB_SEARCH_API_KEY: Optional[str] = None

The API key for external web search.

EXTERNAL_WEB_LOADER_URL
EXTERNAL_WEB_LOADER_URL: Optional[str] = None

The URL for external web loader.

EXTERNAL_WEB_LOADER_API_KEY
EXTERNAL_WEB_LOADER_API_KEY: Optional[str] = None

The API key for external web loader.

YOUTUBE_LOADER_LANGUAGE
YOUTUBE_LOADER_LANGUAGE: Optional[List[str]] = None

List of languages for YouTube loader.

YOUTUBE_LOADER_PROXY_URL
YOUTUBE_LOADER_PROXY_URL: Optional[str] = None

The proxy URL for YouTube loader.

YOUTUBE_LOADER_TRANSLATION
YOUTUBE_LOADER_TRANSLATION: Optional[str] = None

The translation language for YouTube loader.

ConfigForm

Bases: BaseModel

Configuration form for retrieval settings.

Attributes

RAG_TEMPLATE
RAG_TEMPLATE: Optional[str] = None

Template for RAG.

TOP_K
TOP_K: Optional[int] = None

Top K results to retrieve.

BYPASS_EMBEDDING_AND_RETRIEVAL
BYPASS_EMBEDDING_AND_RETRIEVAL: Optional[bool] = None

Whether to bypass embedding and retrieval.

RAG_FULL_CONTEXT
RAG_FULL_CONTEXT: Optional[bool] = None

Whether to use full context for RAG.

ENABLE_RAG_HYBRID_SEARCH
ENABLE_RAG_HYBRID_SEARCH: Optional[bool] = None

Whether to enable hybrid search.

ENABLE_RAG_HYBRID_SEARCH_ENRICHED_TEXTS
ENABLE_RAG_HYBRID_SEARCH_ENRICHED_TEXTS: Optional[bool] = None

Whether to enable enriched texts for hybrid search.

TOP_K_RERANKER
TOP_K_RERANKER: Optional[int] = None

Top K results for reranker.

RELEVANCE_THRESHOLD
RELEVANCE_THRESHOLD: Optional[float] = None

Relevance threshold for search results.

HYBRID_BM25_WEIGHT
HYBRID_BM25_WEIGHT: Optional[float] = None

Weight for BM25 in hybrid search.

CONTENT_EXTRACTION_ENGINE
CONTENT_EXTRACTION_ENGINE: Optional[str] = None

Engine for content extraction.

PDF_EXTRACT_IMAGES
PDF_EXTRACT_IMAGES: Optional[bool] = None

Whether to extract images from PDFs.

DATALAB_MARKER_API_KEY
DATALAB_MARKER_API_KEY: Optional[str] = None

API key for DataLab Marker.

DATALAB_MARKER_API_BASE_URL
DATALAB_MARKER_API_BASE_URL: Optional[str] = None

Base URL for DataLab Marker API.

DATALAB_MARKER_ADDITIONAL_CONFIG
DATALAB_MARKER_ADDITIONAL_CONFIG: Optional[str] = None

Additional configuration for DataLab Marker.

DATALAB_MARKER_SKIP_CACHE
DATALAB_MARKER_SKIP_CACHE: Optional[bool] = None

Whether to skip cache for DataLab Marker.

DATALAB_MARKER_FORCE_OCR
DATALAB_MARKER_FORCE_OCR: Optional[bool] = None

Whether to force OCR for DataLab Marker.

DATALAB_MARKER_PAGINATE
DATALAB_MARKER_PAGINATE: Optional[bool] = None

Whether to paginate results for DataLab Marker.

DATALAB_MARKER_STRIP_EXISTING_OCR
DATALAB_MARKER_STRIP_EXISTING_OCR: Optional[bool] = None

Whether to strip existing OCR for DataLab Marker.

DATALAB_MARKER_DISABLE_IMAGE_EXTRACTION
DATALAB_MARKER_DISABLE_IMAGE_EXTRACTION: Optional[bool] = None

Whether to disable image extraction for DataLab Marker.

DATALAB_MARKER_FORMAT_LINES
DATALAB_MARKER_FORMAT_LINES: Optional[bool] = None

Whether to format lines for DataLab Marker.

DATALAB_MARKER_USE_LLM
DATALAB_MARKER_USE_LLM: Optional[bool] = None

Whether to use LLM for DataLab Marker.

DATALAB_MARKER_OUTPUT_FORMAT
DATALAB_MARKER_OUTPUT_FORMAT: Optional[str] = None

Output format for DataLab Marker.

EXTERNAL_DOCUMENT_LOADER_URL
EXTERNAL_DOCUMENT_LOADER_URL: Optional[str] = None

URL for external document loader.

EXTERNAL_DOCUMENT_LOADER_API_KEY
EXTERNAL_DOCUMENT_LOADER_API_KEY: Optional[str] = None

API key for external document loader.

TIKA_SERVER_URL
TIKA_SERVER_URL: Optional[str] = None

URL for Tika server.

DOCLING_SERVER_URL
DOCLING_SERVER_URL: Optional[str] = None

URL for Docling server.

DOCLING_API_KEY
DOCLING_API_KEY: Optional[str] = None

API key for Docling.

DOCLING_PARAMS
DOCLING_PARAMS: Optional[Dict] = None

Parameters for Docling.

Dict Fields
  • image_export_mode (str, optional): How images should be exported. Defaults to "placeholder" if not specified.
  • Additional VLM (Vision Language Model) pipeline parameters may be supported by the Docling API.

This dictionary is passed directly to the Docling API's /v1/convert/file endpoint. See the Docling API documentation for additional supported parameters.
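
A DOCLING_PARAMS value might be prepared as follows. This sketch only mirrors the documented default for image_export_mode. How the application itself applies that default is not shown in this reference, and both the helper name and the "embedded" value are assumptions.

```python
from typing import Any, Dict, Optional


def docling_params_with_default(params: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a copy of params with the documented image_export_mode default.

    Hypothetical helper; unknown keys are passed through untouched because
    the dictionary is forwarded to the Docling API as-is.
    """
    merged = dict(params or {})
    merged.setdefault("image_export_mode", "placeholder")
    return merged


print(docling_params_with_default(None))
print(docling_params_with_default({"image_export_mode": "embedded"}))  # assumed value
```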

DOCUMENT_INTELLIGENCE_ENDPOINT
DOCUMENT_INTELLIGENCE_ENDPOINT: Optional[str] = None

Endpoint for Document Intelligence.

DOCUMENT_INTELLIGENCE_KEY
DOCUMENT_INTELLIGENCE_KEY: Optional[str] = None

Key for Document Intelligence.

DOCUMENT_INTELLIGENCE_MODEL
DOCUMENT_INTELLIGENCE_MODEL: Optional[str] = None

Model for Document Intelligence.

MISTRAL_OCR_API_BASE_URL
MISTRAL_OCR_API_BASE_URL: Optional[str] = None

Base URL for Mistral OCR API.

MISTRAL_OCR_API_KEY
MISTRAL_OCR_API_KEY: Optional[str] = None

API key for Mistral OCR.

MINERU_API_MODE
MINERU_API_MODE: Optional[str] = None

API mode for MinerU.

MINERU_API_URL
MINERU_API_URL: Optional[str] = None

URL for MinerU API.

MINERU_API_KEY
MINERU_API_KEY: Optional[str] = None

API key for MinerU.

MINERU_PARAMS
MINERU_PARAMS: Optional[Dict] = None

Parameters for MinerU.

Dict Fields
  • enable_ocr (bool, optional): Enable OCR processing. Defaults to False.
  • enable_formula (bool, optional): Enable formula processing. Defaults to True.
  • enable_table (bool, optional): Enable table processing. Defaults to True.
  • language (str, optional): Language code for processing. Defaults to "en".
  • model_version (str, optional): Model version to use. Defaults to "pipeline".
  • page_ranges (str, optional): Page ranges to process. Defaults to empty string.

This dictionary is passed directly to the MinerU API for document parsing configuration.
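
The documented defaults can be collected into one dictionary and combined with user overrides. This is a sketch of the defaults listed above, not the code the application runs. The names MINERU_DEFAULTS and resolve_mineru_params, and the "1-3" page range format, are hypothetical.

```python
from typing import Any, Dict, Optional

# Defaults as listed under "Dict Fields" for MINERU_PARAMS.
MINERU_DEFAULTS: Dict[str, Any] = {
    "enable_ocr": False,
    "enable_formula": True,
    "enable_table": True,
    "language": "en",
    "model_version": "pipeline",
    "page_ranges": "",
}


def resolve_mineru_params(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay user-supplied values on top of the documented defaults."""
    return {**MINERU_DEFAULTS, **(overrides or {})}


# "1-3" is an assumed page range format, used only for illustration.
params = resolve_mineru_params({"enable_ocr": True, "page_ranges": "1-3"})
print(params)
```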

RAG_RERANKING_MODEL
RAG_RERANKING_MODEL: Optional[str] = None

Model for RAG reranking.

RAG_RERANKING_ENGINE
RAG_RERANKING_ENGINE: Optional[str] = None

Engine for RAG reranking.

RAG_EXTERNAL_RERANKER_URL
RAG_EXTERNAL_RERANKER_URL: Optional[str] = None

URL for external reranker.

RAG_EXTERNAL_RERANKER_API_KEY
RAG_EXTERNAL_RERANKER_API_KEY: Optional[str] = None

API key for external reranker.

TEXT_SPLITTER
TEXT_SPLITTER: Optional[str] = None

Text splitter to use.

CHUNK_SIZE
CHUNK_SIZE: Optional[int] = None

Size of text chunks.

CHUNK_OVERLAP
CHUNK_OVERLAP: Optional[int] = None

Overlap between text chunks.
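
To illustrate how CHUNK_SIZE and CHUNK_OVERLAP interact, the sketch below splits a string into fixed-size windows that share a given number of characters. It assumes both values are measured in characters. The actual unit and splitting strategy depend on TEXT_SPLITTER and are not specified in this reference, and split_with_overlap is a hypothetical function.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Each chunk after the first repeats the last chunk_overlap characters of the previous one, which is why the overlap must stay smaller than the chunk size.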

FILE_MAX_SIZE
FILE_MAX_SIZE: Optional[int] = None

Maximum size of uploaded files.

FILE_MAX_COUNT
FILE_MAX_COUNT: Optional[int] = None

Maximum count of uploaded files.

FILE_IMAGE_COMPRESSION_WIDTH
FILE_IMAGE_COMPRESSION_WIDTH: Optional[int] = None

Width for image compression.

FILE_IMAGE_COMPRESSION_HEIGHT
FILE_IMAGE_COMPRESSION_HEIGHT: Optional[int] = None

Height for image compression.

ALLOWED_FILE_EXTENSIONS
ALLOWED_FILE_EXTENSIONS: Optional[List[str]] = None

List of allowed file extensions.

ENABLE_GOOGLE_DRIVE_INTEGRATION
ENABLE_GOOGLE_DRIVE_INTEGRATION: Optional[bool] = None

Whether to enable Google Drive integration.

ENABLE_ONEDRIVE_INTEGRATION
ENABLE_ONEDRIVE_INTEGRATION: Optional[bool] = None

Whether to enable OneDrive integration.

web
web: Optional[WebConfig] = None

Web search configuration.

ProcessFileForm

Bases: BaseModel

Form for processing a file.

Attributes

file_id
file_id: str

The ID of the file to process.

content
content: Optional[str] = None

The content of the file.

collection_name
collection_name: Optional[str] = None

The name of the collection.

ProcessTextForm

Bases: BaseModel

Form for processing text.

Attributes

name
name: str

The name of the text.

content
content: str

The text content.

collection_name
collection_name: Optional[str] = None

The name of the collection.

QueryDocForm

Bases: BaseModel

Form for querying a document.

Attributes

collection_name
collection_name: str

The name of the collection to query.

query
query: str

The search query.

k
k: Optional[int] = None

Number of results to retrieve.

k_reranker
k_reranker: Optional[int] = None

Number of results to rerank.

r
r: Optional[float] = None

Relevance threshold.

hybrid
hybrid: Optional[bool] = None

Whether to use hybrid search.

hybrid_bm25_weight
hybrid_bm25_weight: Optional[float] = None

Weight for BM25 in hybrid search.
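
A QueryDocForm body needs only collection_name and query; the remaining fields are optional. The sketch below builds such a body and omits options left as None. The helper query_doc_payload is hypothetical, and the idea that omitted options fall back to server-side settings is an assumption rather than something this reference states.

```python
from typing import Any, Dict

# Optional fields documented for QueryDocForm.
QUERY_DOC_OPTIONS = {"k", "k_reranker", "r", "hybrid", "hybrid_bm25_weight"}


def query_doc_payload(collection_name: str, query: str, **options: Any) -> Dict[str, Any]:
    """Build a request body using the QueryDocForm field names."""
    unknown = set(options) - QUERY_DOC_OPTIONS
    if unknown:
        raise TypeError(f"unknown QueryDocForm fields: {sorted(unknown)}")
    payload: Dict[str, Any] = {"collection_name": collection_name, "query": query}
    # Leave out options that were not set, so only explicit values are sent.
    payload.update({name: value for name, value in options.items() if value is not None})
    return payload


doc_query = query_doc_payload("docs", "what is rag?", k=5, hybrid=True, r=None)
print(doc_query)
```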

QueryCollectionsForm

Bases: BaseModel

Form for querying multiple collections.

Attributes

collection_names
collection_names: List[str]

List of collection names to query.

query
query: str

The search query.

k
k: Optional[int] = None

Number of results to retrieve.

k_reranker
k_reranker: Optional[int] = None

Number of results to rerank.

r
r: Optional[float] = None

Relevance threshold.

hybrid
hybrid: Optional[bool] = None

Whether to use hybrid search.

hybrid_bm25_weight
hybrid_bm25_weight: Optional[float] = None

Weight for BM25 in hybrid search.

enable_enriched_texts
enable_enriched_texts: Optional[bool] = None

Whether to enable enriched texts.
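
QueryCollectionsForm differs from QueryDocForm in two fields: collection_names is a list, and enable_enriched_texts is available. A single-collection query can therefore be widened with a small conversion, sketched below with a hypothetical helper and placeholder collection names.

```python
from typing import Any, Dict, Iterable, Optional


def to_collections_query(
    doc_query: Dict[str, Any],
    extra_collections: Iterable[str] = (),
    enable_enriched_texts: Optional[bool] = None,
) -> Dict[str, Any]:
    """Turn a QueryDocForm-style body into a QueryCollectionsForm-style body."""
    # Copy every shared field (query, k, k_reranker, r, hybrid, ...).
    payload = {key: value for key, value in doc_query.items() if key != "collection_name"}
    # The single collection name becomes the first entry of the list.
    payload["collection_names"] = [doc_query["collection_name"], *extra_collections]
    if enable_enriched_texts is not None:
        payload["enable_enriched_texts"] = enable_enriched_texts
    return payload


single = {"collection_name": "docs", "query": "what is rag?", "k": 5}
multi = to_collections_query(single, extra_collections=["wiki"], enable_enriched_texts=True)
print(multi)
```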

DeleteForm

Bases: BaseModel

Form for deleting a file from a collection.

Attributes

collection_name
collection_name: str

The name of the collection.

file_id
file_id: str

The ID of the file to delete.

BatchProcessFilesForm

Bases: BaseModel

Form for batch processing files.

Attributes

files
files: List[FileModel]

List of files to process.

collection_name
collection_name: str

The name of the collection.

BatchProcessFilesResult

Bases: BaseModel

Result of a batch file processing operation.

Attributes

file_id
file_id: str

The ID of the file.

status
status: str

The status of the processing.

error
error: Optional[str] = None

The error message if processing failed.

BatchProcessFilesResponse

Bases: BaseModel

Response for batch process files request.

Attributes

results
results: List[BatchProcessFilesResult]

List of successful results.

errors
errors: List[BatchProcessFilesResult]

List of failed results.
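
A caller might separate the two lists of a BatchProcessFilesResponse as sketched below. The response is represented as an already-decoded dictionary. The status strings "completed" and "failed", the file IDs, and the error text are placeholder assumptions, and summarize_batch is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_batch(response: Dict[str, Any]) -> Tuple[List[str], Dict[str, Optional[str]]]:
    """Return the processed file IDs and a map of failed file IDs to errors."""
    processed = [item["file_id"] for item in response["results"]]
    # error is optional on BatchProcessFilesResult, so use .get().
    failed = {item["file_id"]: item.get("error") for item in response["errors"]}
    return processed, failed


sample_response = {
    "results": [{"file_id": "file-1", "status": "completed", "error": None}],
    "errors": [{"file_id": "file-2", "status": "failed", "error": "empty content"}],
}

processed, failed = summarize_batch(sample_response)
print(processed, failed)
```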