retrieval
Retrieval, RAG configuration, and web search models.
Classes
CollectionNameForm
Bases: BaseModel
Form for specifying a collection name.
- Code Reference models retrieval Classes ProcessUrlForm
ProcessUrlForm
Bases: CollectionNameForm
Form for processing a URL.
- Code Reference routers retrieval Classes RetrievalClient Functions
SearchForm
Bases: BaseModel
Form for search queries.
- Code Reference routers retrieval Classes RetrievalClient Functions process_web_search
OpenAIConfigForm
Bases: BaseModel
Configuration for OpenAI embedding model.
- Code Reference models retrieval Classes EmbeddingModelUpdateForm Attributes openai_config
OllamaConfigForm
Bases: BaseModel
Configuration for Ollama embedding model.
- Code Reference models retrieval Classes EmbeddingModelUpdateForm Attributes ollama_config
AzureOpenAIConfigForm
Bases: BaseModel
Configuration for Azure OpenAI embedding model.
- Code Reference models retrieval Classes EmbeddingModelUpdateForm Attributes azure_openai_config
EmbeddingModelUpdateForm
Bases: BaseModel
Form for updating the embedding model configuration.
- Code Reference routers retrieval Classes RetrievalClient Functions update_embedding_config
Attributes
openai_config
openai_config: Optional[OpenAIConfigForm] = None
Configuration for OpenAI embedding model.
ollama_config
ollama_config: Optional[OllamaConfigForm] = None
Configuration for Ollama embedding model.
azure_openai_config
azure_openai_config: Optional[AzureOpenAIConfigForm] = None
Configuration for Azure OpenAI embedding model.
RAG_EMBEDDING_ENGINE
The embedding engine to use (e.g., 'ollama', 'openai').
RAG_EMBEDDING_BATCH_SIZE
The batch size for embedding generation.
ENABLE_ASYNC_EMBEDDING
Whether to enable asynchronous embedding generation.
WebConfig
Bases: BaseModel
Configuration for web search and retrieval.
- Code Reference models retrieval Classes ConfigForm Attributes web
Attributes
ENABLE_WEB_SEARCH_CONFIRMATION
Whether users must confirm before a web search runs. When enabled, the client
UI shows the WEB_SEARCH_CONFIRMATION_CONTENT message and requires acknowledgement
before the search proceeds.
WEB_SEARCH_CONFIRMATION_CONTENT
Confirmation message shown to users before a web search runs, when
ENABLE_WEB_SEARCH_CONFIRMATION is enabled. Defaults to
'Your query will be sent to the configured web search provider.'.
WEB_SEARCH_TRUST_ENV
Whether web search requests honor proxy settings taken from environment variables (e.g., http_proxy, https_proxy).
WEB_SEARCH_RESULT_COUNT
The number of web search results to retrieve.
WEB_SEARCH_CONCURRENT_REQUESTS
The number of concurrent web search requests.
WEB_LOADER_CONCURRENT_REQUESTS
The number of concurrent web loader requests.
WEB_SEARCH_DOMAIN_FILTER_LIST
List of domains to filter from web search results.
WEB_FETCH_MAX_CONTENT_LENGTH
Maximum content length in characters for web fetch results. Content exceeding this is truncated.
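The truncation rule can be illustrated with a minimal sketch. The helper name truncate_content is hypothetical, and treating None or a non-positive value as "no limit" is an assumption, not confirmed backend behavior.

```python
from typing import Optional


def truncate_content(text: str, max_length: Optional[int]) -> str:
    """Cut text to at most max_length characters.

    Assumption: None or a non-positive value disables the limit.
    """
    if max_length is None or max_length <= 0:
        return text
    # Slicing counts characters (code points), not bytes.
    return text[:max_length]


page = "x" * 50_000
print(len(truncate_content(page, 10_000)))
```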
BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL
Whether to bypass embedding and retrieval for web search results.
BYPASS_WEB_SEARCH_WEB_LOADER
Whether to bypass the web loader for web search results.
OLLAMA_CLOUD_WEB_SEARCH_API_KEY
API key for Ollama Cloud web search.
GOOGLE_PSE_API_KEY
API key for Google Programmable Search Engine.
GOOGLE_PSE_ENGINE_ID
Engine ID for Google Programmable Search Engine.
BRAVE_SEARCH_CONTEXT_TOKENS
Maximum number of context tokens returned by Brave Search. Defaults to 8192.
SERPHOUSE_API_KEY
API key for SERPHouse Search (used when WEB_SEARCH_ENGINE == 'serphouse').
SERPHOUSE_DOMAIN
Search domain passed to SERPHouse as the domain query parameter, e.g.
'google.com' or 'bing.com'. Defaults to 'google.com' if empty.
BING_SEARCH_V7_ENDPOINT
The endpoint for Bing Search V7.
BING_SEARCH_V7_SUBSCRIPTION_KEY
The subscription key for Bing Search V7.
PERPLEXITY_SEARCH_CONTEXT_USAGE
The search context usage for Perplexity.
PERPLEXITY_SEARCH_API_URL
The search API URL for Perplexity.
MICROSOFT_WEB_IQ_API_BASE_URL
Base URL for the Microsoft Web IQ API (used for both search and page browsing),
selected when WEB_SEARCH_ENGINE == 'microsoft_web_iq' or
WEB_LOADER_ENGINE == 'microsoft_web_iq'. Defaults to 'https://api.microsoft.ai/v3'.
MICROSOFT_WEB_IQ_API_KEY
API key for the Microsoft Web IQ API (sent as the x-apikey header).
MICROSOFT_WEB_IQ_LANGUAGE
Language code forwarded to the Microsoft Web IQ API as the language field, e.g.
'en'. Defaults to 'en'.
ENABLE_WEB_LOADER_SSL_VERIFICATION
Whether to enable SSL verification for the web loader.
EXTERNAL_WEB_SEARCH_URL
The URL for external web search.
EXTERNAL_WEB_SEARCH_API_KEY
The API key for external web search.
EXTERNAL_WEB_LOADER_URL
The URL for external web loader.
EXTERNAL_WEB_LOADER_API_KEY
The API key for external web loader.
YOUTUBE_LOADER_LANGUAGE
List of languages for YouTube loader.
YOUTUBE_LOADER_PROXY_URL
The proxy URL for YouTube loader.
YOUTUBE_LOADER_TRANSLATION
The translation language for YouTube loader.
YANDEX_WEB_SEARCH_API_KEY
API key for Yandex Search.
YANDEX_WEB_SEARCH_CONFIG
JSON configuration string for Yandex search.
Dict Fields (when parsed as JSON):
- query (dict, optional): Query configuration options.
  - searchType (str, optional): Search type, nested under query, e.g., 'SEARCH_TYPE_COM'.
- Additional Yandex API parameters may be included.
Defaults to '{"query": {"searchType": "SEARCH_TYPE_COM"}}' if not specified.
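As a rough sketch of how the JSON string might be consumed, the snippet below parses it and falls back to the documented default when the value is empty. The function name parse_yandex_config is hypothetical, and rejecting non-object JSON is an assumption.

```python
import json
from typing import Any, Dict, Optional

# Documented default for YANDEX_WEB_SEARCH_CONFIG.
DEFAULT_YANDEX_CONFIG = '{"query": {"searchType": "SEARCH_TYPE_COM"}}'


def parse_yandex_config(raw: Optional[str]) -> Dict[str, Any]:
    """Decode the config string, using the default when it is empty or unset."""
    config = json.loads(raw or DEFAULT_YANDEX_CONFIG)
    # Assumption: only a JSON object is a usable configuration.
    if not isinstance(config, dict):
        raise ValueError("YANDEX_WEB_SEARCH_CONFIG must decode to a JSON object")
    return config


print(parse_yandex_config("")["query"]["searchType"])
```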
LINKUP_SEARCH_PARAMS
Parameters for Linkup search.
Dict Fields
- url (str, optional): Override endpoint URL. Defaults to 'https://api.linkup.so/v1/search'.
- depth (str, optional): Search depth. Typical values: 'standard', 'deep'. Defaults to 'standard'.
- outputType (str, optional): Output type. Typical values: 'sourcedAnswer', 'searchResults'. Defaults to 'sourcedAnswer'.
- Additional Linkup API parameters may be included.
The dictionary is forwarded to the Linkup Search API as the JSON body (with q and maxResults
injected automatically). The special url key, if present, is popped and used as the request
endpoint instead of the default. See the Linkup API documentation for additional supported parameters.
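The forwarding behavior described above can be sketched as follows. The function build_linkup_request is hypothetical, and applying the depth and outputType defaults on the client side is an assumption; the sketch only builds the request and sends nothing.

```python
from typing import Any, Dict, Optional, Tuple

DEFAULT_LINKUP_URL = "https://api.linkup.so/v1/search"


def build_linkup_request(
    query: str, count: int, params: Optional[Dict[str, Any]] = None
) -> Tuple[str, Dict[str, Any]]:
    """Return (endpoint, JSON body) for a Linkup search."""
    body = dict(params or {})  # copy so the caller's dict is not mutated
    # The special url key is popped and never sent in the body.
    url = body.pop("url", None) or DEFAULT_LINKUP_URL
    # Assumption: documented defaults are filled in when absent.
    body.setdefault("depth", "standard")
    body.setdefault("outputType", "sourcedAnswer")
    # q and maxResults are injected automatically.
    body["q"] = query
    body["maxResults"] = count
    return url, body


url, body = build_linkup_request("vector databases", 5, {"depth": "deep"})
print(url)
print(body)
```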
ConfigForm
Bases: BaseModel
Configuration form for retrieval settings.
- Code Reference routers retrieval Classes RetrievalClient Functions update_config
Attributes
BYPASS_EMBEDDING_AND_RETRIEVAL
Whether to bypass embedding and retrieval.
ENABLE_RAG_HYBRID_SEARCH
Whether to enable hybrid search.
ENABLE_RAG_HYBRID_SEARCH_ENRICHED_TEXTS
Whether to enable enriched texts for hybrid search.
RELEVANCE_THRESHOLD
Relevance threshold for search results.
CONTENT_EXTRACTION_ENGINE
Engine for content extraction.
PDF_LOADER_MODE
Mode for PDF loading. 'page' creates one document per page, 'single' combines all pages into one document for better chunking across page boundaries.
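The difference between the two modes can be shown with a small sketch that works on already-extracted page texts. The helper load_pdf_documents and the blank-line separator are illustrative assumptions, not the loader's actual implementation.

```python
from typing import List


def load_pdf_documents(page_texts: List[str], mode: str) -> List[str]:
    """Group per-page text into documents according to PDF_LOADER_MODE."""
    if mode == "page":
        # One document per page: chunks never span a page boundary.
        return list(page_texts)
    if mode == "single":
        # One combined document: a chunker can split across page boundaries.
        # Assumption: pages are joined with a blank line.
        return ["\n\n".join(page_texts)]
    raise ValueError(f"unsupported PDF_LOADER_MODE: {mode!r}")


pages = ["Intro ends mid-sentence and", "continues on the next page."]
print(len(load_pdf_documents(pages, "page")))
print(len(load_pdf_documents(pages, "single")))
```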
DATALAB_MARKER_API_BASE_URL
Base URL for DataLab Marker API.
DATALAB_MARKER_ADDITIONAL_CONFIG
Additional configuration for DataLab Marker.
DATALAB_MARKER_SKIP_CACHE
Whether to skip cache for DataLab Marker.
DATALAB_MARKER_FORCE_OCR
Whether to force OCR for DataLab Marker.
DATALAB_MARKER_PAGINATE
Whether to paginate results for DataLab Marker.
DATALAB_MARKER_STRIP_EXISTING_OCR
Whether to strip existing OCR for DataLab Marker.
DATALAB_MARKER_DISABLE_IMAGE_EXTRACTION
Whether to disable image extraction for DataLab Marker.
DATALAB_MARKER_FORMAT_LINES
Whether to format lines for DataLab Marker.
DATALAB_MARKER_USE_LLM
Whether to use an LLM for DataLab Marker.
DATALAB_MARKER_OUTPUT_FORMAT
Output format for DataLab Marker.
EXTERNAL_DOCUMENT_LOADER_URL
URL for external document loader.
EXTERNAL_DOCUMENT_LOADER_API_KEY
API key for external document loader.
EXTERNAL_DOCUMENT_LOADER_HEADERS
Extra HTTP headers appended to requests sent to the external document loader
server, in addition to the auto-injected Content-Type, Authorization (built
from EXTERNAL_DOCUMENT_LOADER_API_KEY), and X-Filename. Only used when
CONTENT_EXTRACTION_ENGINE == 'external'.
Values are strings (any non-string value is coerced to a string at request time) and may contain template tokens that are substituted per uploaded file before the request is sent.
Dict Fields
- <header-name> (str, optional): Any HTTP header name mapped to a string value. Values support case-sensitive template tokens that are replaced at request time, including {{FILE_ID}}, {{FILE_NAME}}, {{FILE_CONTENT_TYPE}}, {{CHAT_ID}}, {{MESSAGE_ID}}, {{USER_MESSAGE_ID}}, {{USER_MESSAGE_PARENT_ID}}, {{USER_ID}}, {{USER_NAME}}, {{USER_EMAIL}}, {{USER_ROLE}}, {{USER_AGENT}}, and {{TASK}}.
Defaults to {} when unset. Example:
{"X-OpenWebUI-File-Id": "{{FILE_ID}}"}.
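A minimal sketch of the per-file token substitution is shown below. The function render_headers and the context dictionary are hypothetical; leaving unknown tokens untouched is an assumption, since the source does not say how they are handled.

```python
import re
from typing import Any, Dict, Optional

# Matches tokens such as {{FILE_ID}}; uppercase only, so matching is case-sensitive.
TOKEN = re.compile(r"\{\{([A-Z_]+)\}\}")


def render_headers(
    headers: Optional[Dict[str, Any]], context: Dict[str, Any]
) -> Dict[str, str]:
    """Coerce header values to strings and substitute known template tokens."""

    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        # Assumption: unknown tokens are left as written.
        return str(context[key]) if key in context else match.group(0)

    return {
        name: TOKEN.sub(substitute, str(value))
        for name, value in (headers or {}).items()
    }


rendered = render_headers(
    {"X-OpenWebUI-File-Id": "{{FILE_ID}}", "X-Attempt": 1},
    {"FILE_ID": "file-123", "USER_ROLE": "admin"},
)
print(rendered)
```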
DOCLING_PARAMS
Parameters for Docling.
Dict Fields
- image_export_mode (str, optional): How images should be exported. Defaults to "placeholder" if not specified.
- Additional VLM (Vision Language Model) pipeline parameters may be supported by the Docling API.
This dictionary is passed directly to the Docling API's /v1/convert/file endpoint. See the Docling API documentation for additional supported parameters.
DOCUMENT_INTELLIGENCE_ENDPOINT
Endpoint for Document Intelligence.
DOCUMENT_INTELLIGENCE_KEY
Key for Document Intelligence.
DOCUMENT_INTELLIGENCE_MODEL
Model for Document Intelligence.
MISTRAL_OCR_API_BASE_URL
Base URL for Mistral OCR API.
MISTRAL_OCR_USE_BASE64
When True (and CONTENT_EXTRACTION_ENGINE == 'mistral_ocr'), send the PDF as a
base64 data URL inline instead of first uploading it to Mistral and referencing
the uploaded file. Defaults to False.
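For illustration, a base64 data URL for a PDF can be built as in the sketch below. The helper pdf_to_data_url is hypothetical, and the exact string the backend sends to Mistral is not confirmed by the source; the format shown is the standard data URL scheme.

```python
import base64


def pdf_to_data_url(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as an inline base64 data URL."""
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return f"data:application/pdf;base64,{encoded}"


data_url = pdf_to_data_url(b"%PDF-1.4 minimal example bytes")
print(data_url[:40])
```

Inlining avoids the separate upload step, at the cost of a request body roughly one third larger than the original file because of base64 overhead.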
PADDLEOCR_VL_BASE_URL
Base URL for PaddleOCR VL service. Defaults to 'http://localhost:8080'.
PADDLEOCR_VL_TOKEN
Authentication token for PaddleOCR VL service.
MINERU_PARAMS
Parameters for MinerU.
Dict Fields
- enable_ocr (bool, optional): Enable OCR processing. Defaults to False.
- enable_formula (bool, optional): Enable formula processing. Defaults to True.
- enable_table (bool, optional): Enable table processing. Defaults to True.
- language (str, optional): Language code for processing. Defaults to "en".
- model_version (str, optional): Model version to use. Defaults to "pipeline".
- page_ranges (str, optional): Page ranges to process. Defaults to empty string.
This dictionary is passed directly to the MinerU API for document parsing configuration.
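The documented defaults can be combined with user-supplied values as in this sketch. The helper resolve_mineru_params is hypothetical; whether defaults are merged by the client or applied by the MinerU service itself is an assumption.

```python
from typing import Any, Dict, Optional

# Defaults as listed for MINERU_PARAMS.
MINERU_DEFAULTS: Dict[str, Any] = {
    "enable_ocr": False,
    "enable_formula": True,
    "enable_table": True,
    "language": "en",
    "model_version": "pipeline",
    "page_ranges": "",
}


def resolve_mineru_params(params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay user-supplied values on top of the documented defaults."""
    return {**MINERU_DEFAULTS, **(params or {})}


print(resolve_mineru_params({"enable_ocr": True, "page_ranges": "1-3"}))
```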
MINERU_FILE_EXTENSIONS
List of file extensions that MinerU is allowed to process (e.g., ['pdf']).
Files uploaded to the system with extensions in this list are routed through the MinerU
content extraction engine when CONTENT_EXTRACTION_ENGINE is set to 'mineru'. The frontend
typically accepts a comma-separated string (e.g., 'pdf') and splits it into a list.
Defaults to ['pdf'] if not specified.
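The string-to-list conversion and the routing check might look like the sketch below. Both function names are hypothetical, and the normalization (lowercasing, stripping a leading dot, dropping empty entries) is an assumption rather than documented behavior.

```python
from typing import List, Optional


def parse_extensions(raw: str) -> List[str]:
    """Split a comma-separated string such as 'pdf, docx' into a clean list."""
    cleaned = (part.strip().lower().lstrip(".") for part in raw.split(","))
    return [ext for ext in cleaned if ext]


def routed_to_mineru(
    filename: str, engine: str, extensions: Optional[List[str]] = None
) -> bool:
    """Check whether a file would go through MinerU under these settings."""
    allowed = extensions if extensions is not None else ["pdf"]  # documented default
    _, dot, ext = filename.rpartition(".")
    # A name without a dot has no extension and is never routed.
    return engine == "mineru" and bool(dot) and ext.lower() in allowed


print(parse_extensions("pdf, .DOCX ,"))
print(routed_to_mineru("report.PDF", "mineru"))
```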
RAG_RERANKING_BATCH_SIZE
Batch size for reranking operations. Defaults to 32.
RAG_EXTERNAL_RERANKER_URL
URL for external reranker.
RAG_EXTERNAL_RERANKER_API_KEY
API key for external reranker.
RAG_EXTERNAL_RERANKER_TIMEOUT
The timeout for the external reranker.
RAG_TOKENIZER_MODEL
HuggingFace tokenizer model id (or local path) used by the 'token_transformers'
text splitter (TEXT_SPLITTER == 'token_transformers') for token-based chunking.
A bare name with no path/slash is prefixed with 'sentence-transformers/'. When
empty, the backend falls back to the configured embedding model's tokenizer and
raises if none is available.
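The name resolution rule can be sketched as follows. The function resolve_tokenizer_model is hypothetical, and returning the embedding model id stands in for "the embedding model's tokenizer", which is an assumption about how the fallback is wired; the sketch loads no tokenizer.

```python
def resolve_tokenizer_model(tokenizer_model: str, embedding_model: str = "") -> str:
    """Return the model id or path that the token-based splitter would load."""
    name = (tokenizer_model or "").strip()
    if not name:
        # Empty setting: fall back to the embedding model, or fail.
        if not embedding_model:
            raise ValueError(
                "RAG_TOKENIZER_MODEL is empty and no embedding model is configured"
            )
        return embedding_model
    # Assumption: "bare" means no forward slash or backslash in the value.
    if "/" not in name and "\\" not in name:
        return "sentence-transformers/" + name
    return name


print(resolve_tokenizer_model("all-MiniLM-L6-v2"))
```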
ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER
Whether to enable markdown header text splitter.
CHUNK_MIN_SIZE_TARGET
Minimum target size for text chunks.
FILE_IMAGE_COMPRESSION_WIDTH
Width for image compression.
FILE_IMAGE_COMPRESSION_HEIGHT
Height for image compression.
ALLOWED_FILE_EXTENSIONS
List of allowed file extensions.
ENABLE_GOOGLE_DRIVE_INTEGRATION
Whether to enable Google Drive integration.
ENABLE_ONEDRIVE_INTEGRATION
Whether to enable OneDrive integration.
ProcessFileForm
Bases: BaseModel
Form for processing a file.
- Code Reference routers retrieval Classes RetrievalClient Functions process_file
ProcessTextForm
Bases: BaseModel
Form for processing text.
- Code Reference routers retrieval Classes RetrievalClient Functions process_text
QueryDocForm
Bases: BaseModel
Form for querying a document.
- Code Reference routers retrieval Classes RetrievalClient Functions query_doc
QueryCollectionsForm
Bases: BaseModel
Form for querying multiple collections.
- Code Reference routers retrieval Classes RetrievalClient Functions query_collection
DeleteForm
Bases: BaseModel
Form for deleting a file from a collection.
- Code Reference routers retrieval Classes RetrievalClient Functions delete
BatchProcessFilesForm
Bases: BaseModel
Form for batch processing files.
- Code Reference routers retrieval Classes RetrievalClient Functions process_files_batch
BatchProcessFilesResult
Bases: BaseModel
Result of a batch file processing operation.
- Code Reference models retrieval Classes BatchProcessFilesResponse Attributes
BatchProcessFilesResponse
Bases: BaseModel
Response for batch process files request.
- Code Reference routers retrieval Classes RetrievalClient Functions process_files_batch