retrieval
Classes
CollectionNameForm
Bases: BaseModel
Form for specifying a collection name.
- Referenced by: Code Reference > models > retrieval > Classes > ProcessUrlForm
ProcessUrlForm
Bases: CollectionNameForm
Form for processing a URL.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions
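The inheritance described above (ProcessUrlForm extends CollectionNameForm, which extends BaseModel) can be sketched as follows. This is a minimal illustration, not the library's definition: the field names `collection_name` and `url` and their types are assumptions, since this reference does not list the fields of either form.

```python
from typing import Optional

from pydantic import BaseModel


# Stand-in definitions for illustration only. The field names and types
# below are assumptions; the reference above does not list them.
class CollectionNameForm(BaseModel):
    collection_name: Optional[str] = None  # hypothetical field


class ProcessUrlForm(CollectionNameForm):
    url: str  # hypothetical field


# The subclass accepts both its own field and the inherited one.
form = ProcessUrlForm(url="https://example.com/article", collection_name="docs")
print(form.collection_name, form.url)
```

Because ProcessUrlForm is a subclass, any code that accepts a CollectionNameForm would also accept it.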
SearchForm
Bases: BaseModel
Form for search queries.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > process_web_search
OpenAIConfigForm
Bases: BaseModel
Configuration for OpenAI embedding model.
- Referenced by: Code Reference > models > retrieval > Classes > EmbeddingModelUpdateForm > Attributes > openai_config
OllamaConfigForm
Bases: BaseModel
Configuration for Ollama embedding model.
- Referenced by: Code Reference > models > retrieval > Classes > EmbeddingModelUpdateForm > Attributes > ollama_config
AzureOpenAIConfigForm
Bases: BaseModel
Configuration for Azure OpenAI embedding model.
- Referenced by: Code Reference > models > retrieval > Classes > EmbeddingModelUpdateForm > Attributes > azure_openai_config
EmbeddingModelUpdateForm
Bases: BaseModel
Form for updating the embedding model configuration.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > update_embedding_config
Attributes
openai_config
openai_config: Optional[OpenAIConfigForm] = None
Configuration for OpenAI embedding model.
ollama_config
ollama_config: Optional[OllamaConfigForm] = None
Configuration for Ollama embedding model.
azure_openai_config
azure_openai_config: Optional[AzureOpenAIConfigForm] = None
Configuration for Azure OpenAI embedding model.
RAG_EMBEDDING_ENGINE
The embedding engine to use (e.g., 'ollama', 'openai').
RAG_EMBEDDING_BATCH_SIZE
The batch size for embedding generation.
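A minimal sketch of how the attributes above fit together: the three provider configurations are optional and default to None, so a caller would typically fill in only the one that matches the chosen engine. The class bodies below are stand-ins. The `url` and `key` fields of the provider forms, the `str` and `int` types of RAG_EMBEDDING_ENGINE and RAG_EMBEDDING_BATCH_SIZE, and the batch-size default are all assumptions not stated in this reference.

```python
from typing import Optional

from pydantic import BaseModel


# Stand-in provider forms; the `url` and `key` fields are hypothetical.
class OpenAIConfigForm(BaseModel):
    url: str
    key: str


class OllamaConfigForm(BaseModel):
    url: str
    key: str


class AzureOpenAIConfigForm(BaseModel):
    url: str
    key: str


class EmbeddingModelUpdateForm(BaseModel):
    # These three signatures match the reference above.
    openai_config: Optional[OpenAIConfigForm] = None
    ollama_config: Optional[OllamaConfigForm] = None
    azure_openai_config: Optional[AzureOpenAIConfigForm] = None
    # Types and default below are assumptions.
    RAG_EMBEDDING_ENGINE: str
    RAG_EMBEDDING_BATCH_SIZE: int = 1


# Only the configuration for the selected engine is supplied;
# the nested dict is validated into an OllamaConfigForm.
update = EmbeddingModelUpdateForm(
    RAG_EMBEDDING_ENGINE="ollama",
    RAG_EMBEDDING_BATCH_SIZE=8,
    ollama_config={"url": "http://localhost:11434", "key": ""},
)
print(update.RAG_EMBEDDING_ENGINE, update.ollama_config.url)
```

The unused provider configurations stay None, which lets a receiver distinguish "not provided" from "provided but empty".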
WebConfig
Bases: BaseModel
Configuration for web search and retrieval.
- Referenced by: Code Reference > models > retrieval > Classes > ConfigForm > Attributes > web
Attributes
WEB_SEARCH_TRUST_ENV
Whether web search requests honor proxy settings taken from environment variables (for example http_proxy and https_proxy).
WEB_SEARCH_RESULT_COUNT
The number of web search results to retrieve.
WEB_SEARCH_CONCURRENT_REQUESTS
The number of concurrent web search requests.
WEB_LOADER_CONCURRENT_REQUESTS
The number of concurrent web loader requests.
WEB_SEARCH_DOMAIN_FILTER_LIST
List of domains to filter from web search results.
BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL
Whether to bypass embedding and retrieval for web search results.
BYPASS_WEB_SEARCH_WEB_LOADER
Whether to bypass the web loader for web search results.
OLLAMA_CLOUD_WEB_SEARCH_API_KEY
API key for Ollama Cloud web search.
GOOGLE_PSE_API_KEY
API key for Google Programmable Search Engine.
GOOGLE_PSE_ENGINE_ID
Engine ID for Google Programmable Search Engine.
BING_SEARCH_V7_ENDPOINT
The endpoint for Bing Search V7.
BING_SEARCH_V7_SUBSCRIPTION_KEY
The subscription key for Bing Search V7.
PERPLEXITY_SEARCH_CONTEXT_USAGE
The search context usage for Perplexity.
PERPLEXITY_SEARCH_API_URL
The search API URL for Perplexity.
ENABLE_WEB_LOADER_SSL_VERIFICATION
Whether to enable SSL verification for the web loader.
EXTERNAL_WEB_SEARCH_URL
The URL for external web search.
EXTERNAL_WEB_SEARCH_API_KEY
The API key for external web search.
EXTERNAL_WEB_LOADER_URL
The URL for external web loader.
EXTERNAL_WEB_LOADER_API_KEY
The API key for external web loader.
YOUTUBE_LOADER_LANGUAGE
List of language codes the YouTube loader tries when fetching transcripts.
YOUTUBE_LOADER_PROXY_URL
The proxy URL for YouTube loader.
ConfigForm
Bases: BaseModel
Configuration form for retrieval settings.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > update_config
Attributes
BYPASS_EMBEDDING_AND_RETRIEVAL
Whether to bypass embedding and retrieval.
ENABLE_RAG_HYBRID_SEARCH
Whether to enable hybrid search.
ENABLE_RAG_HYBRID_SEARCH_ENRICHED_TEXTS
Whether to enable enriched texts for hybrid search.
RELEVANCE_THRESHOLD
Relevance threshold for search results.
CONTENT_EXTRACTION_ENGINE
Engine for content extraction.
DATALAB_MARKER_API_BASE_URL
Base URL for DataLab Marker API.
DATALAB_MARKER_ADDITIONAL_CONFIG
Additional configuration for DataLab Marker.
DATALAB_MARKER_SKIP_CACHE
Whether to skip cache for DataLab Marker.
DATALAB_MARKER_FORCE_OCR
Whether to force OCR for DataLab Marker.
DATALAB_MARKER_PAGINATE
Whether to paginate results for DataLab Marker.
DATALAB_MARKER_STRIP_EXISTING_OCR
Whether to strip existing OCR for DataLab Marker.
DATALAB_MARKER_DISABLE_IMAGE_EXTRACTION
Whether to disable image extraction for DataLab Marker.
DATALAB_MARKER_FORMAT_LINES
Whether to format lines for DataLab Marker.
DATALAB_MARKER_USE_LLM
Whether to use LLM for DataLab Marker.
DATALAB_MARKER_OUTPUT_FORMAT
Output format for DataLab Marker.
EXTERNAL_DOCUMENT_LOADER_URL
URL for external document loader.
EXTERNAL_DOCUMENT_LOADER_API_KEY
API key for external document loader.
DOCLING_PARAMS
Parameters for Docling.
Dict Fields
- image_export_mode (str, optional): How images should be exported. Defaults to "placeholder" if not specified.
- Additional VLM (Vision Language Model) pipeline parameters may be supported by the Docling API.
This dictionary is passed directly to the Docling API's /v1/convert/file endpoint. See the Docling API documentation for additional supported parameters.
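As a rough sketch of the dictionary described above, the helper below applies the documented "placeholder" default for image_export_mode and lets the caller override it or add further keys. The helper name `build_docling_params` is hypothetical, and "embedded" is used only as an example override value; check the Docling API documentation for the values it accepts.

```python
from typing import Any, Dict, Optional


def build_docling_params(overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a DOCLING_PARAMS-style dict (hypothetical helper)."""
    # Documented default: image_export_mode falls back to "placeholder".
    params: Dict[str, Any] = {"image_export_mode": "placeholder"}
    # Caller-supplied keys win over the default and pass through unchanged.
    params.update(overrides or {})
    return params


default_params = build_docling_params()
custom_params = build_docling_params({"image_export_mode": "embedded"})
print(default_params)
print(custom_params)
```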
DOCUMENT_INTELLIGENCE_ENDPOINT
Endpoint for Document Intelligence.
DOCUMENT_INTELLIGENCE_KEY
Key for Document Intelligence.
DOCUMENT_INTELLIGENCE_MODEL
Model for Document Intelligence.
MISTRAL_OCR_API_BASE_URL
Base URL for Mistral OCR API.
MINERU_PARAMS
Parameters for MinerU.
Dict Fields
- enable_ocr (bool, optional): Enable OCR processing. Defaults to False.
- enable_formula (bool, optional): Enable formula processing. Defaults to True.
- enable_table (bool, optional): Enable table processing. Defaults to True.
- language (str, optional): Language code for processing. Defaults to "en".
- model_version (str, optional): Model version to use. Defaults to "pipeline".
- page_ranges (str, optional): Page ranges to process. Defaults to empty string.
This dictionary is passed directly to the MinerU API for document parsing configuration.
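The documented defaults can be collected into one dictionary and merged with caller overrides, as in the sketch below. The names `MINERU_DEFAULTS` and `build_mineru_params` are hypothetical, and the page range string "1-3" is an assumed format used only for illustration.

```python
from typing import Any, Dict

# Defaults as listed under Dict Fields above.
MINERU_DEFAULTS: Dict[str, Any] = {
    "enable_ocr": False,
    "enable_formula": True,
    "enable_table": True,
    "language": "en",
    "model_version": "pipeline",
    "page_ranges": "",
}


def build_mineru_params(**overrides: Any) -> Dict[str, Any]:
    """Return a MINERU_PARAMS-style dict (hypothetical helper)."""
    # Build a new dict so the shared defaults are never mutated.
    return {**MINERU_DEFAULTS, **overrides}


# Turn on OCR and limit processing to an assumed page range format.
ocr_params = build_mineru_params(enable_ocr=True, page_ranges="1-3")
print(ocr_params)
```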
RAG_EXTERNAL_RERANKER_URL
URL for external reranker.
RAG_EXTERNAL_RERANKER_API_KEY
API key for external reranker.
FILE_IMAGE_COMPRESSION_WIDTH
Width for image compression.
FILE_IMAGE_COMPRESSION_HEIGHT
Height for image compression.
ALLOWED_FILE_EXTENSIONS
List of allowed file extensions.
ENABLE_GOOGLE_DRIVE_INTEGRATION
Whether to enable Google Drive integration.
ENABLE_ONEDRIVE_INTEGRATION
Whether to enable OneDrive integration.
ProcessFileForm
Bases: BaseModel
Form for processing a file.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > process_file
ProcessTextForm
Bases: BaseModel
Form for processing text.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > process_text
QueryDocForm
Bases: BaseModel
Form for querying a document.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > query_doc
QueryCollectionsForm
Bases: BaseModel
Form for querying multiple collections.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > query_collection
DeleteForm
Bases: BaseModel
Form for deleting a file from a collection.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > delete
BatchProcessFilesForm
Bases: BaseModel
Form for batch processing files.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > process_files_batch
BatchProcessFilesResult
Bases: BaseModel
Result of a batch file processing operation.
- Referenced by: Code Reference > models > retrieval > Classes > BatchProcessFilesResponse > Attributes
BatchProcessFilesResponse
Bases: BaseModel
Response for batch process files request.
- Referenced by: Code Reference > routers > retrieval > Classes > RetrievalClient > Functions > process_files_batch