
Vocals SDK Python


A Python SDK for voice processing and real-time audio communication with AI assistants. Stream microphone input or audio files to receive live transcription, AI responses, and text-to-speech audio.

Features

  • 🎤 Real-time microphone streaming with voice activity detection
  • 📁 Audio file playback support (WAV format)
  • 📝 Live transcription with partial and final results
  • 🤖 Streaming AI responses with real-time text display
  • 🔊 Text-to-speech playback with automatic audio queueing
  • 📊 Conversation tracking and session statistics
  • 🚀 Easy setup with minimal configuration required
  • 🔄 Auto-reconnection and robust error handling

Prerequisites

Before using the SDK, make sure you have:

  • Python 3.8 or higher
  • A Vocals API key (set as VOCALS_DEV_API_KEY environment variable)
  • Working microphone and audio output
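
The Python version and API key requirements can be checked up front; a minimal self-contained sketch (the version floor and variable name come from the list above — checking the microphone is hardware-specific and omitted here):

```python
import os
import sys

def check_prerequisites():
    """Return a list of problems; an empty list means the basics are in place."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or higher is required")
    if not os.environ.get("VOCALS_DEV_API_KEY"):
        problems.append("VOCALS_DEV_API_KEY is not set")
    return problems

for problem in check_prerequisites():
    print(f"Missing prerequisite: {problem}")
```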

Getting Started? Check out the Installation Guide and Quick Start first.

Class-Based API

The Vocals SDK now uses a class-based API as the primary interface. This provides better resource management, cleaner code organization, and context manager support.

Basic Usage

import asyncio
from vocals import VocalsClient

async def main():
    # Create client instance
    client = VocalsClient()

    # Stream microphone
    await client.stream_microphone(duration=10.0)

    # Clean up
    await client.disconnect()
    client.cleanup()

asyncio.run(main())

The client supports context managers for automatic resource cleanup:

import asyncio
from vocals import VocalsClient

async def main():
    async with VocalsClient() as client:
        await client.stream_microphone(duration=10.0)
        # Automatic cleanup when exiting context

asyncio.run(main())

API Reference

SDK Modes

The Vocals SDK supports two usage patterns:

Default Experience (No Modes)

# Full experience with automatic handlers, playback, and console output
client = VocalsClient()

Controlled Experience (With Modes)

# Controlled experience - you handle all logic
client = VocalsClient(modes=['transcription', 'voice_assistant'])

Available Modes:

  • 'transcription': Enables transcription-related processing
  • 'voice_assistant': Enables AI response handling
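
In the controlled experience the SDK emits events but leaves display and playback logic to you. A sketch of the dispatch logic you might register via client.on_message — using types.SimpleNamespace here only to stand in for the SDK's message objects, so the routing can be shown on its own:

```python
from types import SimpleNamespace

def handle_message(message):
    """Route SDK events to your own logic; in controlled mode the SDK prints nothing."""
    if message.type == "transcription":
        return f"You said: {message.data.get('text')}"
    if message.type == "llm_response":
        return f"AI: {message.data.get('response')}"
    return None  # ignore message types you don't handle

# Stand-in for a message the SDK would deliver to your registered handler
msg = SimpleNamespace(type="transcription", data={"text": "hello"})
print(handle_message(msg))  # You said: hello
```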

Core Functions

VocalsClient(modes=None)

Creates a new Vocals SDK client instance.

Parameters:

  • modes (list, optional): List of modes to enable. Available modes:
    • 'transcription': Enables transcription processing
    • 'voice_assistant': Enables AI response handling

Returns: VocalsClient instance

Example:

# Default experience (full auto-contained)
client = VocalsClient()

# Controlled experience (manual handling)
client = VocalsClient(modes=['transcription', 'voice_assistant'])

client.stream_microphone(...)

Stream microphone input for real-time processing.

Parameters:

  • duration (float): Recording duration in seconds (0 for infinite)
  • auto_connect (bool): Auto-connect to service if needed
  • auto_playback (bool): Auto-play received audio
  • verbose (bool): Enable verbose output
  • stats_tracking (bool): Track session statistics
  • amplitude_threshold (float): Voice activity detection threshold

Returns: Dictionary with session statistics

Example:

stats = await client.stream_microphone(
    duration=30.0,
    auto_connect=True,
    auto_playback=True,
    stats_tracking=True,
)
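
The amplitude_threshold parameter gates frames on their signal level. A minimal sketch of amplitude-based voice activity detection to illustrate what the threshold is compared against — the RMS formulation and the 0.01 default are illustrative assumptions, not the SDK's internals:

```python
import math

def is_voice(samples, threshold=0.01):
    """Amplitude-based VAD: a frame counts as speech if its RMS level exceeds the threshold."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms >= threshold

# A 10 ms frame of silence vs. a 440 Hz tone at 16 kHz
silence = [0.0] * 160
tone = [0.1 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
print(is_voice(silence))  # False
print(is_voice(tone))     # True
```

Raising the threshold makes the detector ignore quiet background noise at the cost of clipping soft speech.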

client.stream_audio_file(filepath)

Stream an audio file for processing.

Parameters:

  • filepath (str): Path to audio file (WAV format)

Example:

await client.stream_audio_file("path/to/audio.wav")

client.on_message(handler)

Register a message handler for real-time events.

Parameters:

  • handler (function): Function to handle incoming messages

Message Types:

  • transcription: Speech-to-text results
  • llm_response: AI assistant responses
  • tts_audio: Text-to-speech audio data
  • speech_interruption: Speech interruption events

Example:

def handle_message(message):
    if message.type == "transcription":
        print(f"Transcription: {message.data.get('text')}")
    elif message.type == "llm_response":
        print(f"AI Response: {message.data.get('response')}")

client.on_message(handle_message)

client.on_connection_change(handler)

Register a connection state handler.

Parameters:

  • handler (function): Function to handle connection state changes

States:

  • CONNECTING: Attempting to connect
  • CONNECTED: Successfully connected
  • DISCONNECTED: Disconnected from service

Example:

def handle_connection(state):
    print(f"Connection state: {state.name}")

client.on_connection_change(handle_connection)

client.play_audio()

Manually play queued audio segments.

Example:

await client.play_audio()

client.stop_recording()

Stop the current recording session.

Example:

await client.stop_recording()

client.disconnect()

Disconnect from the Vocals service.

Example:

await client.disconnect()

client.cleanup()

Clean up resources and close connections.

Example:

client.cleanup()

Helper Functions

create_conversation_tracker()

Create a conversation tracker for logging and statistics.

Returns: Conversation tracker instance

Example:

tracker = create_conversation_tracker()
tracker["add_transcription"]("Hello world", False)
tracker["add_response"]("Hi there!")
tracker["print_conversation"]()
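
The tracker exposes its operations as a dict of callables, as the example above shows. A minimal self-contained sketch of that pattern — this is an illustration of the interface shape, not the SDK's actual implementation, which presumably also records timestamps and statistics:

```python
def make_tracker():
    """Minimal dict-of-callables tracker mirroring the SDK tracker's interface shape."""
    state = {"transcriptions": [], "responses": []}

    def add_transcription(text, is_partial):
        # Only keep final transcriptions; partial results are superseded
        if not is_partial:
            state["transcriptions"].append(text)

    def add_response(text):
        state["responses"].append(text)

    def print_conversation():
        for text in state["transcriptions"]:
            print(f"You: {text}")
        for text in state["responses"]:
            print(f"AI: {text}")

    return {
        "add_transcription": add_transcription,
        "add_response": add_response,
        "print_conversation": print_conversation,
    }

tracker = make_tracker()
tracker["add_transcription"]("Hello world", False)
tracker["add_response"]("Hi there!")
tracker["print_conversation"]()
```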

create_enhanced_message_handler()

Create an enhanced message handler with automatic formatting.

Returns: Enhanced message handler function

create_default_connection_handler()

Create a default connection state handler.

Returns: Default connection handler function

create_default_error_handler()

Create a default error handler.

Returns: Default error handler function

Command Line Interface

The SDK includes command-line tools for quick testing:

# Run setup wizard
vocals setup

# Test installation
vocals test

# Run demo
vocals demo

Error Handling

The SDK includes robust error handling and auto-reconnection:

client = VocalsClient()

try:
    await client.stream_microphone(duration=30.0)
except Exception as e:
    print(f"Error: {e}")
finally:
    await client.disconnect()
    client.cleanup()

Advanced Features

For advanced usage patterns, see:

Troubleshooting

Common Issues

  1. Audio device not found

    • Ensure microphone is connected and working
    • Check system audio settings
  2. API key not set

    • Set VOCALS_DEV_API_KEY environment variable
    • Or create .env file with your API key
  3. Connection failed

    • Check internet connection
    • Verify API key is valid
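
For issue 2, either export the variable for the current shell session or persist it in a .env file in your project root (replace your-api-key with your real key):

```shell
# Export for the current shell session
export VOCALS_DEV_API_KEY="your-api-key"

# Or persist it in a .env file in your project root
echo 'VOCALS_DEV_API_KEY=your-api-key' > .env
```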

Getting Help

License

This project is licensed under the MIT License - see the LICENSE file for details.