audio

Classes

TTSConfigForm

Bases: BaseModel

Configuration for Text-to-Speech (TTS).

Attributes

OPENAI_API_BASE_URL
OPENAI_API_BASE_URL: str

Base URL for OpenAI-compatible TTS API.

OPENAI_API_KEY
OPENAI_API_KEY: str

API Key for OpenAI-compatible TTS API.

OPENAI_PARAMS
OPENAI_PARAMS: Optional[Dict] = None

Additional parameters for OpenAI TTS requests.

Dict Fields
  • model (str, optional): ID of the model to use, e.g. tts-1 or tts-1-hd
  • voice (str, optional): The voice to use for speech. Options: alloy, echo, fable, nova, onyx, shimmer
  • response_format (str, optional): Format of the returned audio. Options: mp3, opus, aac, flac, wav, pcm
  • speed (float, optional): The speed of the generated audio. Must be between 0.25 and 4.0. Default is 1.0
  • Any other parameters supported by the OpenAI TTS API can be included
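
As a rough illustration of the fields listed above, the following sketch builds an OPENAI_PARAMS dictionary and checks the documented format options and speed range. The helper name build_openai_params and the validation step are assumptions made for this example; the source does not say whether the application validates these values itself.

```python
from typing import Any, Dict, Optional

# Response formats listed in the field description above.
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def build_openai_params(
    voice: Optional[str] = None,
    response_format: Optional[str] = None,
    speed: Optional[float] = None,
    **extra: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: collect only the parameters that were set."""
    params: Dict[str, Any] = dict(extra)
    if voice is not None:
        params["voice"] = voice
    if response_format is not None:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format}")
        params["response_format"] = response_format
    if speed is not None:
        if not 0.25 <= speed <= 4.0:
            raise ValueError("speed must be between 0.25 and 4.0")
        params["speed"] = speed
    return params


# A dictionary suitable for the OPENAI_PARAMS attribute.
params = build_openai_params(voice="nova", response_format="opus", speed=1.25)
```

Unset fields are left out of the dictionary rather than filled with defaults, so the receiving API can apply its own.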

API_KEY
API_KEY: str

API Key for other TTS engines (e.g. ElevenLabs, Azure).

ENGINE
ENGINE: str

The TTS engine to use (e.g. 'openai', 'elevenlabs', 'azure', 'transformers').

MODEL
MODEL: str

The model identifier to use (e.g. 'tts-1', 'eleven_multilingual_v2').

VOICE
VOICE: str

The voice identifier to use.

SPLIT_ON
SPLIT_ON: str

Character or pattern to split text on (e.g. punctuation).
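
To make the idea of splitting on a pattern concrete, here is a minimal sketch that treats the SPLIT_ON value as a regular expression. This interpretation, and the function name split_text, are assumptions for illustration only; the source does not specify how the value is parsed or which values are accepted.

```python
import re
from typing import List


def split_text(text: str, split_on: str) -> List[str]:
    """Split text on a regex pattern and drop empty chunks (illustrative only)."""
    if not split_on:
        # No pattern configured: keep the text as a single chunk.
        return [text]
    chunks = re.split(split_on, text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Split on whitespace that follows sentence-ending punctuation.
sentences = split_text("Hello there. How are you? Fine!", r"(?<=[.!?])\s+")
```

Splitting long text into sentence-sized chunks lets each chunk be synthesized separately, which is the usual motivation for a setting like this.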

AZURE_SPEECH_REGION
AZURE_SPEECH_REGION: str

Azure Speech region (if using Azure engine).

AZURE_SPEECH_BASE_URL
AZURE_SPEECH_BASE_URL: str

Azure Speech base URL (optional override).

AZURE_SPEECH_OUTPUT_FORMAT
AZURE_SPEECH_OUTPUT_FORMAT: str

Azure Speech output format.
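
The three Azure attributes above could be combined roughly as follows when calling the Azure Speech text-to-speech REST API. This is a sketch under stated assumptions: the function names are hypothetical, the region-based host pattern and header names follow Azure's public REST API as I understand it, and the source does not show how the application actually builds its requests.

```python
from typing import Dict


def azure_tts_endpoint(region: str, base_url: str = "") -> str:
    """Return the base URL override if set, else a region-derived URL."""
    if base_url:
        return base_url.rstrip("/")
    # Assumed default host pattern for the Azure Speech TTS REST API.
    return f"https://{region}.tts.speech.microsoft.com"


def azure_tts_headers(api_key: str, output_format: str) -> Dict[str, str]:
    """Build request headers; AZURE_SPEECH_OUTPUT_FORMAT selects the audio format."""
    return {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,
    }


endpoint = azure_tts_endpoint("eastus")
headers = azure_tts_headers("YOUR_KEY", "audio-24khz-160kbitrate-mono-mp3")
```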

STTConfigForm

Bases: BaseModel

Configuration for Speech-to-Text (STT).

Attributes

OPENAI_API_BASE_URL
OPENAI_API_BASE_URL: str

Base URL for OpenAI-compatible STT API.

OPENAI_API_KEY
OPENAI_API_KEY: str

API Key for OpenAI-compatible STT API.

ENGINE
ENGINE: str

The STT engine to use (e.g. 'openai', 'deepgram', 'azure', 'mistral', or empty for local Whisper).
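
Because one form carries settings for several engines, it can help to see which attributes plausibly belong to which ENGINE value. The grouping below is inferred from the attribute names on this form and is not confirmed by the source; STT_ENGINE_FIELDS and fields_for_engine are hypothetical names.

```python
from typing import Dict, List

# Assumed mapping from ENGINE value to the attributes that engine relies on.
STT_ENGINE_FIELDS: Dict[str, List[str]] = {
    "": ["WHISPER_MODEL"],  # empty string selects local Whisper
    "openai": ["OPENAI_API_BASE_URL", "OPENAI_API_KEY", "MODEL"],
    "deepgram": ["DEEPGRAM_API_KEY", "MODEL"],
    "azure": [
        "AZURE_API_KEY",
        "AZURE_REGION",
        "AZURE_LOCALES",
        "AZURE_BASE_URL",
        "AZURE_MAX_SPEAKERS",
    ],
    "mistral": [
        "MISTRAL_API_KEY",
        "MISTRAL_API_BASE_URL",
        "MISTRAL_USE_CHAT_COMPLETIONS",
        "MODEL",
    ],
}


def fields_for_engine(engine: str) -> List[str]:
    """Return the attribute names assumed relevant for the given engine."""
    try:
        return STT_ENGINE_FIELDS[engine]
    except KeyError:
        raise ValueError(f"unknown STT engine: {engine!r}") from None


deepgram_fields = fields_for_engine("deepgram")
```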

MODEL
MODEL: str

The model identifier to use.

SUPPORTED_CONTENT_TYPES
SUPPORTED_CONTENT_TYPES: List[str] = []

List of supported content types (MIME types) for uploads.
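
A minimal sketch of how an upload's MIME type might be checked against this list. Wildcard matching (for example audio/*) and the treatment of an empty list as "no restriction" are both assumptions made here; the source only states that the attribute holds a list of MIME types.

```python
from fnmatch import fnmatch
from typing import List


def is_supported(content_type: str, supported: List[str]) -> bool:
    """Check a MIME type against the configured list (illustrative only)."""
    # Assumption: an empty list means no restriction has been configured.
    if not supported:
        return True
    # Assumption: entries may use shell-style wildcards such as "audio/*".
    return any(fnmatch(content_type, pattern) for pattern in supported)


allowed = ["audio/*", "video/webm"]
wav_ok = is_supported("audio/wav", allowed)
mp4_ok = is_supported("video/mp4", allowed)
```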

WHISPER_MODEL
WHISPER_MODEL: str

Local Whisper model name (e.g. 'base', 'small').

DEEPGRAM_API_KEY
DEEPGRAM_API_KEY: str

Deepgram API Key.

AZURE_API_KEY
AZURE_API_KEY: str

Azure Speech API Key for STT.

AZURE_REGION
AZURE_REGION: str

Azure Speech region for STT.

AZURE_LOCALES
AZURE_LOCALES: str

Comma-separated list of Azure locales.
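
Since AZURE_LOCALES is a single string, a consumer has to split it before use. The following sketch shows one tolerant way to do that; parse_locales is a hypothetical helper and the locale codes are example values.

```python
from typing import List


def parse_locales(value: str) -> List[str]:
    """Split a comma-separated locale string, ignoring blanks and stray spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


locales = parse_locales("en-US, de-DE,,fr-FR ")
```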

AZURE_BASE_URL
AZURE_BASE_URL: str

Azure Speech base URL for STT.

AZURE_MAX_SPEAKERS
AZURE_MAX_SPEAKERS: str

Maximum number of speakers for Azure diarization.
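
AZURE_MAX_SPEAKERS is typed as str even though it represents a count, so it needs converting before numeric use. A small sketch, assuming an empty string means "use a default"; both the helper name and the default of 1 are hypothetical.

```python
def parse_max_speakers(value: str, default: int = 1) -> int:
    """Convert the string setting to a positive integer (illustrative only)."""
    value = value.strip()
    if not value:
        # Assumption: an empty setting falls back to a caller-chosen default.
        return default
    count = int(value)  # raises ValueError for non-numeric input
    if count < 1:
        raise ValueError("AZURE_MAX_SPEAKERS must be at least 1")
    return count


max_speakers = parse_max_speakers("3")
```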

MISTRAL_API_KEY
MISTRAL_API_KEY: str

Mistral API Key.

MISTRAL_API_BASE_URL
MISTRAL_API_BASE_URL: str

Mistral API Base URL.

MISTRAL_USE_CHAT_COMPLETIONS
MISTRAL_USE_CHAT_COMPLETIONS: bool

Whether to use the Mistral Chat Completions API (with audio input) instead of the Transcription API.
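
The flag above effectively selects between two endpoints. The sketch below shows that choice; the function name is hypothetical, and the paths audio/transcriptions and chat/completions are assumed to be appended to a base URL that already ends in the API version, such as https://api.mistral.ai/v1.

```python
def mistral_stt_url(base_url: str, use_chat_completions: bool) -> str:
    """Pick the endpoint implied by MISTRAL_USE_CHAT_COMPLETIONS (illustrative)."""
    path = "chat/completions" if use_chat_completions else "audio/transcriptions"
    return f"{base_url.rstrip('/')}/{path}"


transcription_url = mistral_stt_url("https://api.mistral.ai/v1", False)
chat_url = mistral_stt_url("https://api.mistral.ai/v1/", True)
```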

AudioConfigUpdateForm

Bases: BaseModel

Form for updating audio configuration (TTS and STT).

Attributes

tts

TTS configuration, with the fields described under TTSConfigForm.

stt

STT configuration, with the fields described under STTConfigForm.
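
To show how the two parts fit together, here is a sketch of an update payload as plain JSON, using only the attribute names documented on this page. All values are placeholders, and the nesting of the TTS and STT fields under tts and stt is an assumption based on the attribute descriptions; the endpoint that accepts this form is not given in the source.

```python
import json

# Placeholder values throughout; attribute names are taken from this page.
payload = {
    "tts": {
        "OPENAI_API_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_API_KEY": "YOUR_KEY",
        "OPENAI_PARAMS": {"speed": 1.0},
        "API_KEY": "",
        "ENGINE": "openai",
        "MODEL": "tts-1",
        "VOICE": "alloy",
        "SPLIT_ON": "punctuation",
        "AZURE_SPEECH_REGION": "",
        "AZURE_SPEECH_BASE_URL": "",
        "AZURE_SPEECH_OUTPUT_FORMAT": "",
    },
    "stt": {
        "OPENAI_API_BASE_URL": "https://api.openai.com/v1",
        "OPENAI_API_KEY": "YOUR_KEY",
        "ENGINE": "",  # empty selects local Whisper
        "MODEL": "",
        "SUPPORTED_CONTENT_TYPES": [],
        "WHISPER_MODEL": "base",
        "DEEPGRAM_API_KEY": "",
        "AZURE_API_KEY": "",
        "AZURE_REGION": "",
        "AZURE_LOCALES": "",
        "AZURE_BASE_URL": "",
        "AZURE_MAX_SPEAKERS": "",
        "MISTRAL_API_KEY": "",
        "MISTRAL_API_BASE_URL": "",
        "MISTRAL_USE_CHAT_COMPLETIONS": False,
    },
}

# Serialize for use as a request body.
body = json.dumps(payload)
```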