Converse API

Type: RESTful API
Framework: FastAPI (Python)
Authentication: Firebase Auth (JWT)
Database: Cloud Firestore
AI Provider: Pollinations AI
TTS Engine: Edge TTS
OCR Service: OCR.space

The Converse API is a comprehensive backend service that provides conversational AI capabilities, real-time messaging, optical character recognition, text-to-speech synthesis, and image generation. Built on FastAPI with Firebase integration, it serves as the backbone for multi-modal communication applications.

Overview

Converse API integrates multiple AI services behind a single interface. The system is organized by functional domain (AI chat, OCR, TTS, messaging): each domain has its own service boundary, and all of them share common authentication and storage infrastructure.

The API follows RESTful principles with JSON as the primary data interchange format. Endpoints that require user context authenticate with a Firebase ID token sent as a bearer token, which keeps request validation secure and stateless. Cross-Origin Resource Sharing (CORS) currently allows all origins, which is convenient for public-facing clients; the CORS Configuration section under Security Considerations describes how to restrict it for production.

Key Features

Architecture

System Design

The architecture follows a layered design pattern consisting of presentation (API endpoints), business logic (service functions), and data access (Firebase SDK) layers. The system maintains separation of concerns through dedicated modules for authentication, AI processing, messaging, and media handling.

Technology Stack

Component         Technology          Purpose
Web Framework     FastAPI 0.x         Asynchronous HTTP server with automatic OpenAPI documentation
Authentication    Firebase Admin SDK  JWT verification and user identity management
Database          Cloud Firestore     NoSQL document store with real-time capabilities
HTTP Client       HTTPX               Async HTTP requests to external services
Image Processing  Pillow (PIL)        Image format conversion and optimization
TTS Engine        edge-tts            Microsoft Edge Text-to-Speech synthesis

Data Flow

Request processing follows this sequence:

  1. CORS middleware validates the origin
  2. The endpoint handler receives the request
  3. Authentication middleware verifies the Firebase token
  4. Business logic processes the data, calling external services as needed
  5. Firestore transactions update state
  6. Response serialization returns JSON or streaming media

Authentication

The API implements Firebase Authentication using ID tokens as bearer credentials. All protected endpoints expect an Authorization header containing a Firebase-issued JWT.

Token Verification Process

The verify_user() function extracts the bearer token from the Authorization header and validates it against Firebase's public keys. Upon successful verification, the function returns the authenticated user's UID (User Identifier), which serves as the primary key for all user-scoped operations.

Authorization: Bearer <FIREBASE_ID_TOKEN>

Token Lifecycle: Firebase ID tokens expire after 1 hour. Client applications must send a fresh token once the old one expires, for example by relying on the Firebase Auth SDK's automatic refresh mechanism.

Authentication Flow

  1. Client authenticates with Firebase Authentication (email/password, OAuth, etc.)
  2. Firebase returns an ID token to the client
  3. Client includes token in Authorization header for all API requests
  4. API validates token signature and expiration using Firebase Admin SDK
  5. API extracts UID from validated token for user-specific operations
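
Steps 4 and 5 of the flow above can be sketched as follows. This is a minimal illustration, not the service's actual code: the signature of verify_user() shown here, the AuthError class, and the injected verify_id_token callable are assumptions made so the example runs without Firebase credentials. In a deployment, the callable would typically be firebase_admin.auth.verify_id_token.

```python
from typing import Callable, Mapping, Optional


class AuthError(Exception):
    """Raised when the header or token is unusable (would map to HTTP 401)."""


def extract_bearer_token(authorization: Optional[str]) -> str:
    """Return the token from an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        raise AuthError("Missing Authorization header")
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        raise AuthError("Expected 'Bearer <token>'")
    return token.strip()


def verify_user(authorization: Optional[str],
                verify_id_token: Callable[[str], Mapping[str, object]]) -> str:
    """Validate the bearer token and return the caller's UID.

    `verify_id_token` is injected (hypothetical design choice); the Firebase
    Admin SDK's auth.verify_id_token checks signature and expiry and returns
    the decoded claims, including "uid".
    """
    token = extract_bearer_token(authorization)
    try:
        claims = verify_id_token(token)
    except Exception as exc:  # the verifier raises on invalid or expired tokens
        raise AuthError("Invalid Firebase token") from exc
    return str(claims["uid"])
```

Injecting the verifier keeps the header parsing testable on its own; a FastAPI handler could wrap AuthError in a 401 response.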

API Endpoints

GET /

Health Check & Documentation

Returns this HTML documentation page. Used for service monitoring and as a developer reference.

Response

Content-Type: text/html
Status: 200 OK

[This documentation page]

POST /user/init

User Initialization

Creates or updates user profile information in Firestore. This endpoint should be called after initial authentication to establish user presence in the system and register FCM tokens for push notifications.

Authentication

Required. Firebase ID token in Authorization header.

Request Body

Field      Type    Required  Description
name       string  No        User's display name
email      string  No        User's email address
fcm_token  string  Yes       Firebase Cloud Messaging device token

Example Request

POST /user/init
Authorization: Bearer eyJhbGc...
Content-Type: application/json

{
  "name": "John Doe",
  "email": "john@example.com",
  "fcm_token": "fX9Y2Z..."
}

Response

{ "status": "ok" }

Implementation Detail: This endpoint uses Firestore's set() with merge=True, which performs an upsert operation. Existing user data is preserved unless explicitly overwritten by new values.

POST /ask

AI Conversation

Processes natural language input through the AI system, generating a textual response and corresponding visual representation. The conversation history is persisted to Firestore for retrieval via the history endpoint.

Authentication

Required. Firebase ID token in Authorization header.

Request Body

Field    Type    Required  Description
message  string  Yes       User's natural language input

Processing Pipeline

  1. User Verification: Validates Firebase token and ensures user document exists
  2. AI Processing: Sends message to Pollinations AI with system prompt context
  3. Image Generation: Uses AI response as prompt for image generation API
  4. Image Optimization: Converts generated image to JPEG format at 85% quality
  5. Data Persistence: Stores conversation turn (user message, AI reply, image) in Firestore
  6. Response Serialization: Returns AI text and base64-encoded image

Example Request

POST /ask
Authorization: Bearer eyJhbGc...
Content-Type: application/json

{ "message": "Explain quantum entanglement" }

Response

{
  "reply": "Quantum entanglement is a phenomenon...",
  "image_b64": "/9j/4AAQSkZJRgABAQAA..."
}

Timeout Consideration: Image generation can take 30-60 seconds. Clients should set a request timeout above that range and show a loading state while waiting.

Firestore Storage Schema

Chat turns are stored in: users/{uid}/chats/{auto_id}

{
  "user": "string",       // User's message
  "reply": "string",      // AI's response
  "image_b64": "string",  // Base64 JPEG
  "ts": float             // Unix timestamp
}

POST /ocr

Optical Character Recognition

Extracts text content from uploaded images using OCR.space API, then provides an AI-powered explanation of what the image contains. Returns a natural language description rather than raw OCR text.

Authentication

Required. Firebase ID token in Authorization header.

Request

Multipart form data with file upload:

Field  Type        Description
file   UploadFile  Image file containing text to extract

OCR Configuration

  • Language: English (eng) - configurable via language parameter
  • Overlay: Disabled - returns only text without positional metadata
  • Processing: Executed in thread pool to avoid blocking event loop
  • AI Enhancement: Extracted text is sent to AI for natural explanation

Example Request (cURL)

curl -X POST https://api.example.com/ocr \
  -H "Authorization: Bearer eyJhbGc..." \
  -F "file=@document.jpg"

Response

{ "text": "This appears to be a restaurant receipt showing a dinner purchase..." }

AI Enhancement: Instead of returning raw OCR text, the system asks the AI to explain in conversational language what the image appears to show, which makes the result easier to understand.
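
The "executed in thread pool" point from the OCR configuration can be illustrated with asyncio.to_thread (Python 3.9+). This is a sketch under assumptions: blocking_ocr is a hypothetical stand-in for the synchronous OCR.space call, and the fallback string is the one documented under Error Handling for this endpoint.

```python
import asyncio
import time

FALLBACK_TEXT = "I couldn't clearly understand what the picture was showing."


def blocking_ocr(image_bytes: bytes) -> str:
    """Hypothetical stand-in for the synchronous OCR.space HTTP call."""
    time.sleep(0.05)  # simulate network latency
    return image_bytes.decode("utf-8", errors="ignore").strip()


async def extract_text(image_bytes: bytes) -> str:
    """Run the blocking call in a worker thread so the event loop stays free."""
    try:
        text = await asyncio.to_thread(blocking_ocr, image_bytes)
    except Exception:
        text = ""  # treat OCR failures like "no text detected"
    return text or FALLBACK_TEXT
```

While the worker thread waits on the OCR call, the event loop can keep serving other requests.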

Error Handling

If OCR fails or no text is detected, returns:

{ "text": "I couldn't clearly understand what the picture was showing." }

POST /speech

Text-to-Speech Synthesis

Converts text input to natural-sounding speech using Microsoft Edge's neural TTS engine. Returns streaming MP3 audio with prosody enhancements for improved naturalness.

Authentication

Required. Firebase ID token in Authorization header.

Request Body

Field  Type    Required  Description
text   string  Yes       Text content to synthesize
voice  string  No        Voice profile identifier (default: en-US-JennyNeural)

Speech Processing

The for_speech() function applies prosody modifications to improve speech naturalness:

  • Breathing Pauses: Adds line breaks after sentences (periods, question marks)
  • Thinking Pauses: Replaces em-dashes and semicolons with ellipses
  • Pacing Control: Prepends ellipsis to short responses to prevent rushed delivery
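
The three rules above might be implemented along these lines. This is only a sketch of the described behavior, not the actual for_speech() source: the regular expressions and the four-word threshold for a "short" response are assumptions.

```python
import re

SHORT_RESPONSE_WORDS = 4  # assumed threshold for a "short" response


def for_speech(text: str) -> str:
    """Apply pause and pacing tweaks to text before synthesis."""
    # Thinking pauses: em-dashes (U+2014) and semicolons become ellipses.
    text = re.sub(r"\s*[\u2014;]\s*", "... ", text.strip())
    # Breathing pauses: new line after a sentence-ending period or question
    # mark; the lookbehind skips the dots inside an ellipsis.
    text = re.sub(r"(?<!\.)([.?])\s+", r"\1\n", text)
    # Pacing control: lead short responses with an ellipsis.
    if len(text.split()) <= SHORT_RESPONSE_WORDS:
        text = "... " + text
    return text
```

For example, for_speech("Sure; one moment") yields "... Sure... one moment" under these assumptions.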

Example Request

POST /speech
Authorization: Bearer eyJhbGc...
Content-Type: application/json

{
  "text": "Welcome to Converse API. How can I assist you today?",
  "voice": "en-US-GuyNeural"
}

Response

Streaming audio response:

Content-Type: audio/mpeg
Content-Disposition: inline; filename=speech.mp3
Cache-Control: no-store

[MP3 audio stream]

Available Voices

The system supports all Microsoft Edge TTS voices. Common options include:

  • en-US-JennyNeural - Female, American English (default)
  • en-US-GuyNeural - Male, American English
  • en-GB-SoniaNeural - Female, British English
  • en-AU-NatashaNeural - Female, Australian English

Streaming Architecture: Audio is generated and transmitted in chunks using async iteration, enabling low-latency playback for long texts without buffering the entire audio file.

GET /image

Image Generation Proxy

Generates images from text prompts using Pollinations AI's image synthesis API. Acts as a proxy with image optimization and caching headers.

Query Parameters

Parameter  Type    Required  Description
prompt     string  Yes       Text description of desired image

Processing Steps

  1. URL-encodes the prompt parameter
  2. Requests image from Pollinations API (60-second timeout)
  3. Converts image to RGB color space
  4. Re-encodes as JPEG at 90% quality
  5. Returns optimized image with cache headers

Example Request

GET /image?prompt=serene%20mountain%20landscape%20at%20sunset
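
A client can produce the percent-encoded query string shown above with the standard library. The helper name image_url is hypothetical, and the host is the placeholder used in this document's cURL example, not a real deployment.

```python
from urllib.parse import quote

BASE_URL = "https://api.example.com"  # placeholder host, not a real deployment


def image_url(prompt: str) -> str:
    """Build a GET /image URL with the prompt percent-encoded."""
    # safe="" also encodes "/" and "&" so the prompt stays one query value.
    return f"{BASE_URL}/image?prompt={quote(prompt, safe='')}"
```

Encoding every reserved character keeps prompts such as "cats & dogs" from being split into separate query parameters.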

Response

Content-Type: image/jpeg
Cache-Control: public, max-age=3600

[JPEG image data]

Error Handling: If image generation fails or produces invalid data, the endpoint returns 204 No Content rather than an error status, so clients should treat an empty response as "no image available".

GET /history

Chat History Retrieval

Fetches the authenticated user's complete AI conversation history, ordered chronologically by timestamp.

Authentication

Required. Firebase ID token in Authorization header.

Response

{
  "chats": [
    {
      "user": "User's message",
      "reply": "AI's response",
      "image_b64": "base64_encoded_image_data",
      "ts": 1704398765.123
    },
    ...
  ]
}

Persistence: All conversations are stored in the users/{uid}/chats collection in Firestore, enabling full history retrieval and cross-device synchronization.

POST /chat/send

Send Chat Message

Sends a message to another user, creating or updating a conversation. Implements WhatsApp-style chat architecture with real-time WebSocket delivery and FCM fallback notifications.

Authentication

Required. Firebase ID token in Authorization header.

Request Body

Field   Type    Required  Description
to_uid  string  Yes       Recipient's user ID
text    string  Yes       Message content

Processing Flow

  1. User Validation: Verifies sender authentication and prevents self-messaging
  2. Recipient Verification: Ensures target user exists in Firestore
  3. Conversation Management: Creates or updates conversation using deterministic ID (sorted UIDs)
  4. Message Storage: Persists message in conversations/{convo_id}/messages
  5. Chat Index Update: Updates both sender and recipient's chat index with unread counters
  6. Real-time Delivery: Attempts WebSocket delivery to active connections
  7. Push Notification: Sends FCM notification as fallback for offline users

Example Request

POST /chat/send
Authorization: Bearer eyJhbGc...
Content-Type: application/json

{
  "to_uid": "abc123xyz",
  "text": "Hey! How are you doing?"
}

Response

{
  "status": "sent",
  "convo_id": "abc123xyz_def456uvw",
  "message_id": "msg_auto_generated_id",
  "ts": 1704398765.123
}

Conversation ID Structure

Conversation IDs are deterministic: the two participant UIDs are sorted lexicographically and joined with an underscore, so both participants always derive the same ID:

convo_id = "_".join(sorted([uid1, uid2]))  # Example: "alice123_bob456"

Delivery Guarantees: The system attempts real-time WebSocket delivery first. If the recipient is offline or WebSocket delivery fails, an FCM push notification serves as a fallback. Messages are always persisted to Firestore, whether or not delivery succeeds.
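
As a small runnable illustration of the ID rule, the helper below wraps the documented expression. The function name make_convo_id and the ValueError for self-messaging are assumptions; the API itself reports self-messaging as a 400 error.

```python
def make_convo_id(uid_a: str, uid_b: str) -> str:
    """Return the same conversation ID regardless of argument order."""
    if uid_a == uid_b:
        # The API rejects self-messaging; mirrored here as a ValueError.
        raise ValueError("cannot create a conversation with yourself")
    return "_".join(sorted([uid_a, uid_b]))
```

Note that Python's sorted() compares strings by code point, so uppercase letters sort before lowercase ones.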

Firestore Data Structure

Conversation Document: conversations/{convo_id}

{
  "participants": ["uid1", "uid2"],
  "last_message": "Latest message text",
  "updated_at": 1704398765.123
}

Message Document: conversations/{convo_id}/messages/{message_id}

{
  "id": "message_id",
  "sender": "uid1",
  "text": "Message content",
  "ts": 1704398765.123,
  "seen": false,
  "edited": false
}

Chat Index Document: users/{uid}/chat_index/{convo_id}

{
  "buddy_uid": "other_user_uid",
  "buddy_name": "Display Name",
  "buddy_photo": "photo_url",
  "last_message": "Latest message preview",
  "updated_at": 1704398765.123,
  "unread_count": 3
}

GET /chat/list

List User Conversations

Retrieves all conversations the authenticated user is participating in, ordered by most recent activity.

Authentication

Required. Firebase ID token in Authorization header.

Query Strategy

Uses Firestore's array-contains query to find conversations where the user is a participant:

# conversations is the Firestore collection reference for "conversations"
query = (
    conversations
    .where("participants", "array_contains", uid)
    .order_by("updated_at", direction="DESCENDING")
)

Example Request

GET /chat/list
Authorization: Bearer eyJhbGc...

Response

[
  {
    "convo_id": "alice123_bob456",
    "participants": ["alice123", "bob456"],
    "last_message": "See you tomorrow!",
    "updated_at": 1704398765.123
  },
  {
    "convo_id": "alice123_charlie789",
    "participants": ["alice123", "charlie789"],
    "last_message": "Thanks for the help",
    "updated_at": 1704395000.000
  }
]

Performance Consideration: This endpoint queries the global conversations collection. For applications with high user counts, consider using the /chat/index endpoint which reads from a user-specific subcollection for better performance.

GET /chat/messages/{convo_id}

Get Conversation Messages

Retrieves all messages from a specific conversation in chronological order. Implements permission checking to ensure users can only access conversations they participate in.

Authentication

Required. Firebase ID token in Authorization header.

Path Parameters

Parameter  Type    Description
convo_id   string  Conversation identifier (format: uid1_uid2)

Authorization Check

The endpoint verifies:

  1. Conversation exists in Firestore
  2. Authenticated user is in the conversation's participants array

Example Request

GET /chat/messages/alice123_bob456
Authorization: Bearer eyJhbGc...

Response

[
  {
    "id": "msg_001",
    "sender": "alice123",
    "text": "Hi Bob!",
    "ts": 1704390000.000,
    "seen": true,
    "edited": false
  },
  {
    "id": "msg_002",
    "sender": "bob456",
    "text": "Hey Alice, how are you?",
    "ts": 1704390100.000,
    "seen": true,
    "edited": false
  },
  {
    "id": "msg_003",
    "sender": "alice123",
    "text": "Doing great! Thanks for asking.",
    "ts": 1704390200.000,
    "seen": false,
    "edited": false
  }
]

Error Responses

Status Code  Condition               Description
404          Conversation not found  Specified convo_id does not exist
403          Not allowed             User is not a participant in this conversation

Message Ordering: Messages are always returned in chronological order (oldest first) using Firestore's order_by("ts") to maintain conversation flow consistency.
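
The authorization check and ordering rule for this endpoint can be sketched in plain Python. The function names are hypothetical, and conversations and messages are modeled as dictionaries rather than Firestore snapshots.

```python
from typing import Any, Dict, List, Mapping, Optional


def check_conversation_access(convo: Optional[Mapping[str, Any]], uid: str) -> int:
    """Return the HTTP status the endpoint would answer with."""
    if convo is None:
        return 404  # conversation not found
    if uid not in convo.get("participants", []):
        return 403  # caller is not a participant
    return 200


def chronological(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Oldest first, mirroring order_by("ts")."""
    return sorted(messages, key=lambda m: m["ts"])
```

Checking existence before membership matches the 404 and 403 rows in the table above.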

GET /chat/index

Get User Chat Index

Retrieves the authenticated user's personalized chat index, providing a WhatsApp-style conversation list with buddy information, message previews, and unread counts. This is the recommended endpoint for building chat list UIs.

Authentication

Required. Firebase ID token in Authorization header.

Architecture Advantage

Unlike /chat/list which queries the global conversations collection, this endpoint reads from a user-specific subcollection (users/{uid}/chat_index), providing:

  • Read cost that depends only on the user's own conversation count, not on the total number of conversations in the system
  • Denormalized buddy information (name, photo) for instant display
  • Per-user unread counters
  • Optimized for client-side rendering

Example Request

GET /chat/index
Authorization: Bearer eyJhbGc...

Response

[
  {
    "convo_id": "alice123_bob456",
    "buddy_uid": "bob456",
    "buddy_name": "Bob Smith",
    "buddy_photo": "https://example.com/bob.jpg",
    "last_message": "See you tomorrow!",
    "updated_at": 1704398765.123,
    "unread_count": 2
  },
  {
    "convo_id": "alice123_charlie789",
    "buddy_uid": "charlie789",
    "buddy_name": "Charlie Brown",
    "buddy_photo": null,
    "last_message": "Thanks!",
    "updated_at": 1704395000.000,
    "unread_count": 0
  }
]

Performance Best Practice: Use this endpoint for chat list UI instead of /chat/list. The denormalized structure eliminates the need for additional queries to fetch participant details.

Unread Counter Management

The unread_count field is automatically managed:

  • Incremented: When a message is received (via /chat/send)
  • Reset to 0: When the user sends a message in that conversation
  • Manual Reset: Client can implement mark-as-read by updating the counter directly
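
The counter rules above can be modeled in memory as follows. This is an illustrative sketch only: the nested dictionary stands in for users/{uid}/chat_index/{convo_id}, and the function names are hypothetical. Against Firestore itself, the increment would more likely use the SDK's atomic Increment transform than a read-modify-write.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict

# index[uid][convo_id] stands in for users/{uid}/chat_index/{convo_id}
ChatIndex = DefaultDict[str, Dict[str, Dict[str, Any]]]


def record_message(index: ChatIndex, sender: str, recipient: str,
                   convo_id: str, text: str, ts: float) -> None:
    """Update both participants' index entries after a message is sent."""
    sender_entry = index[sender].setdefault(convo_id, {"unread_count": 0})
    # Sending a message resets the sender's own unread counter.
    sender_entry.update(buddy_uid=recipient, last_message=text,
                        updated_at=ts, unread_count=0)

    recipient_entry = index[recipient].setdefault(convo_id, {"unread_count": 0})
    # Receiving a message increments the recipient's counter.
    recipient_entry.update(buddy_uid=sender, last_message=text, updated_at=ts,
                           unread_count=recipient_entry["unread_count"] + 1)


def mark_read(index: ChatIndex, uid: str, convo_id: str) -> None:
    """Manual reset, e.g. when the user opens the conversation."""
    if convo_id in index[uid]:
        index[uid][convo_id]["unread_count"] = 0
```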

GET /notify-ui

Admin Notification Interface

Web-based admin panel for sending push notifications to users. Protected by admin key authentication.

Authentication

The key query parameter must match the ADMIN_NOTIFY_KEY environment variable.

Query Parameters

Parameter  Type    Required  Description
key        string  Yes       Admin authentication key

Example Request

GET /notify-ui?key=your_admin_key_here

Features

  • User selection with "Select All" checkbox
  • Custom notification title and body
  • Real-time delivery status feedback
  • Material Design UI

Security Warning: The admin key provides full access to send notifications to all users. Store it securely and never expose it in client-side code or public repositories.

POST /notify-ui/send

Send Admin Notifications

Programmatic endpoint for sending bulk notifications to selected users. Used by the admin UI but can also be called directly via API.

Authentication

Required. Admin key in admin-key header.

Request Headers

admin-key: your_admin_key_here
Content-Type: application/json

Request Body

Field  Type           Required  Description
uids   array[string]  Yes       List of user IDs to notify
title  string         Yes       Notification title
body   string         Yes       Notification message

Example Request

POST /notify-ui/send
admin-key: your_admin_key_here
Content-Type: application/json

{
  "uids": ["alice123", "bob456", "charlie789"],
  "title": "System Maintenance",
  "body": "Scheduled maintenance tonight at 10 PM UTC"
}

Response

Sent to 3 user(s) ✅

Delivery Logic

  1. Validates admin key
  2. For each UID:
    • Checks if user document exists
    • Retrieves FCM token from user document
    • Sends FCM notification if token exists
    • Silently skips users without tokens
  3. Returns count of successful deliveries

Notification Data Payload: Admin notifications include the metadata {"type": "admin", "ts": timestamp}, which clients can use to distinguish them from chat notifications.
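
The delivery logic can be sketched with the Firestore lookup and the FCM call injected as plain callables. All names here (admin_key_ok, send_bulk, get_user, send_fcm) are hypothetical, and the constant-time key comparison is a suggested practice, not a confirmed detail of the implementation.

```python
import hmac
from typing import Any, Callable, Dict, Iterable, Mapping, Optional


def admin_key_ok(supplied: Optional[str], expected: Optional[str]) -> bool:
    """Compare keys in constant time; reject when either side is missing."""
    if not supplied or not expected:
        return False
    return hmac.compare_digest(supplied.encode(), expected.encode())


def send_bulk(uids: Iterable[str], title: str, body: str,
              get_user: Callable[[str], Optional[Mapping[str, Any]]],
              send_fcm: Callable[[str, str, str, Dict[str, str]], None],
              now: float) -> int:
    """Notify each user that has an FCM token; return the delivery count."""
    sent = 0
    for uid in uids:
        user = get_user(uid)
        if not user:
            continue  # user document does not exist
        token = user.get("fcm_token")
        if not token:
            continue  # silently skip users without tokens
        try:
            send_fcm(token, title, body, {"type": "admin", "ts": str(now)})
        except Exception:
            continue  # a failed send is not counted
        sent += 1
    return sent
```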

WS /ws

Real-time WebSocket Connection

Establishes a persistent WebSocket connection for receiving real-time chat messages. Enables instant message delivery without polling.

Authentication

Required. Firebase ID token as query parameter.

Connection URL

wss://api.example.com/ws?token=FIREBASE_ID_TOKEN

Connection Lifecycle

  1. Handshake: Client connects with Firebase token in query params
  2. Verification: Server validates token and extracts UID
  3. Registration: Connection added to ConnectionManager for user's UID
  4. Keep-alive: Client sends periodic pings to maintain connection
  5. Message Delivery: Server pushes JSON messages when events occur
  6. Disconnect: Connection removed from manager on close/error

Message Format

Chat messages received via WebSocket:

{
  "type": "chat",
  "convo_id": "alice123_bob456",
  "from_uid": "bob456",
  "message_id": "msg_auto_id",
  "text": "Message content",
  "ts": 1704398765.123
}

JavaScript Client Example

const token = await firebase.auth().currentUser.getIdToken();
const ws = new WebSocket(`wss://api.example.com/ws?token=${token}`);

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'chat') {
    displayMessage(data);
  }
};

// Keep-alive ping every 30 seconds, only while the socket is open
const keepAlive = setInterval(() => {
  if (ws.readyState === WebSocket.OPEN) ws.send('ping');
}, 30000);

// Stop pinging once the connection closes
ws.onclose = () => clearInterval(keepAlive);

Concurrent Connections: The system supports multiple connections per user (e.g., mobile app + web browser). All active connections receive message broadcasts simultaneously.
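
A connection manager with this multi-connection behavior might look like the sketch below. The document names ConnectionManager but does not show it, so the methods here are assumptions; the socket objects only need an async send_text method, which Starlette's WebSocket class provides.

```python
import asyncio
import json
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List


class ConnectionManager:
    """Tracks every open socket per UID and fans messages out to all of them."""

    def __init__(self) -> None:
        self._connections: DefaultDict[str, List[Any]] = defaultdict(list)

    def connect(self, uid: str, ws: Any) -> None:
        self._connections[uid].append(ws)

    def disconnect(self, uid: str, ws: Any) -> None:
        sockets = self._connections.get(uid, [])
        if ws in sockets:
            sockets.remove(ws)
        if not sockets:
            self._connections.pop(uid, None)

    async def send_to_user(self, uid: str, payload: Dict[str, Any]) -> int:
        """Send to all of the user's sockets; return how many succeeded."""
        delivered = 0
        for ws in list(self._connections.get(uid, [])):
            try:
                await ws.send_text(json.dumps(payload))
                delivered += 1
            except Exception:
                self.disconnect(uid, ws)  # drop sockets that fail
        return delivered
```

A return value of 0 tells the caller that no live connection received the message, which is the point at which the FCM fallback would apply.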

Error Handling

Close Code  Reason            Description
1008        Policy Violation  Missing or invalid authentication token

Data Models

User Document

Location: users/{uid}

{
  "name": "string",       // Display name
  "email": "string",      // Email address
  "fcm_token": "string",  // Firebase Cloud Messaging token
  "photo_url": "string",  // Profile picture URL (optional)
  "created_at": float,    // Unix timestamp
  "updated_at": float     // Unix timestamp
}

Chat Turn Document

Location: users/{uid}/chats/{chat_id}

{
  "user": "string",       // User's message
  "reply": "string",      // AI's response
  "image_b64": "string",  // Base64-encoded JPEG
  "ts": float             // Unix timestamp
}
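
A chat turn in this shape could be assembled as shown below. The helper name build_chat_turn is hypothetical; the field names follow the schema above.

```python
import base64
import time
from typing import Any, Dict, Optional


def build_chat_turn(user_message: str, reply: str, jpeg_bytes: bytes,
                    ts: Optional[float] = None) -> Dict[str, Any]:
    """Return a dict matching the users/{uid}/chats/{chat_id} schema."""
    return {
        "user": user_message,
        "reply": reply,
        # Firestore stores the image as text, so the JPEG is base64-encoded.
        "image_b64": base64.b64encode(jpeg_bytes).decode("ascii"),
        "ts": time.time() if ts is None else ts,
    }
```

JPEG data begins with the bytes FF D8 FF, which is why encoded images start with "/9j/" as in the example responses. Because Firestore limits a document to 1 MiB, large images may need to be downscaled before they are stored this way.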

Conversation Document

Location: conversations/{convo_id}

{
  "participants": ["uid1", "uid2"],  // Array of participant UIDs
  "last_message": "string",          // Preview of most recent message
  "updated_at": float                // Unix timestamp of last activity
}

Message Document

Location: conversations/{convo_id}/messages/{message_id}

{
  "id": "string",      // Message ID
  "sender": "string",  // Sender's UID
  "text": "string",    // Message content
  "ts": float,         // Unix timestamp
  "seen": boolean,     // Read receipt status
  "edited": boolean    // Edit status
}

Chat Index Document

Location: users/{uid}/chat_index/{convo_id}

{
  "buddy_uid": "string",     // Other participant's UID
  "buddy_name": "string",    // Cached display name
  "buddy_photo": "string",   // Cached profile picture URL
  "last_message": "string",  // Message preview
  "updated_at": float,       // Unix timestamp
  "unread_count": integer    // Number of unread messages
}

Error Handling

The API uses standard HTTP status codes and returns JSON error responses:

Common Error Responses

Status Code  Meaning                Common Causes
400          Bad Request            Invalid request body, missing required fields, self-messaging attempt
401          Unauthorized           Missing or invalid Firebase token, expired token
403          Forbidden              Accessing conversation without participation, admin key mismatch
404          Not Found              User or conversation doesn't exist
500          Internal Server Error  External service failure (AI, image generation, OCR), Firestore errors

Error Response Format

{ "detail": "Error description message" }

Example Error Response

HTTP/1.1 401 Unauthorized
Content-Type: application/json

{ "detail": "Invalid Firebase token" }

Firebase Integration

Service Account Configuration

The API requires a Firebase Admin SDK service account, stored as a JSON string in the FIREBASE_SERVICE_ACCOUNT environment variable:

{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "...",
  "client_email": "...",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://oauth2.googleapis.com/token",
  "auth_provider_x509_cert_url": "...",
  "client_x509_cert_url": "..."
}
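
Loading and sanity-checking that variable might look like this. The helper name and the choice of required fields are assumptions; the final SDK calls appear only as comments because they need real credentials.

```python
import json
import os
from typing import Any, Dict, Mapping

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Parse FIREBASE_SERVICE_ACCOUNT and check a few essential fields."""
    raw = env.get("FIREBASE_SERVICE_ACCOUNT")
    if not raw:
        raise RuntimeError("FIREBASE_SERVICE_ACCOUNT is not set")
    info = json.loads(raw)
    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise RuntimeError(f"Service account is missing: {', '.join(missing)}")
    return info


# With the Admin SDK installed, the parsed dict can be handed over like so:
#   cred = firebase_admin.credentials.Certificate(load_service_account())
#   firebase_admin.initialize_app(cred)
```

Failing at startup with a clear message is usually easier to diagnose than an authentication error on the first request.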

Firebase Services Used

FCM Notification Structure

Notifications carry both a notification block, which the system displays, and a data payload for the client application:

{
  "message": {
    "token": "device_fcm_token",
    "notification": {
      "title": "Notification Title",
      "body": "Notification Body"
    },
    "data": {
      "type": "chat|admin",
      "convo_id": "...",
      "from_uid": "...",
      "ts": "1704398765.123"
    },
    "android": { "priority": "HIGH" }
  }
}
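
A payload in this shape can be built with a small helper. The function name build_fcm_message is hypothetical. FCM requires every value in the data map to be a string, which is why the timestamp is quoted in the example above and converted below.

```python
from typing import Any, Dict, Mapping


def build_fcm_message(token: str, title: str, body: str,
                      data: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a message body in the structure shown above."""
    return {
        "message": {
            "token": token,
            "notification": {"title": title, "body": body},
            # FCM data values must be strings, so floats are converted.
            "data": {key: str(value) for key, value in data.items()},
            "android": {"priority": "HIGH"},
        }
    }
```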

Security Considerations

Authentication Best Practices

Authorization Model

The API implements resource-level authorization:

API Key Management

Key Type                  Environment Variable      Purpose
Pollinations AI           POLLINATIONS_API_KEY      AI chat and image generation
OCR.space                 OCR_SPACE_API_KEY         Text extraction from images
Admin Notify              ADMIN_NOTIFY_KEY          Broadcast notification access
Firebase Service Account  FIREBASE_SERVICE_ACCOUNT  Authentication and Firestore access

Secrets Management: Never commit API keys or service account credentials to version control. Use environment variables or secret management services (AWS Secrets Manager, Google Secret Manager, etc.).

CORS Configuration

The API currently allows all origins (allow_origins=["*"]). For production deployments, restrict to specific domains:

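from fastapi.middleware.cors import CORSMiddleware

# app is the existing FastAPI application instance
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)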

Rate Limiting

Consider implementing rate limiting for public endpoints to prevent abuse:
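
One possible approach is an in-memory sliding-window limiter keyed by client IP or UID. This is a sketch under assumptions: the class name, limits, and key choice are illustrative, and nothing here reflects an existing feature of the API. A multi-instance deployment would need shared storage instead of process memory.

```python
import time
from collections import defaultdict, deque
from typing import Deque, DefaultDict, Optional


class SlidingWindowLimiter:
    """Allow at most max_requests per key within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # caller would typically answer 429 Too Many Requests
        hits.append(now)
        return True
```

A handler could call limiter.allow(client_ip) before doing any work and reject the request when it returns False.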


Converse API Documentation
For support or feature requests, contact Vinaycharyvelpula@gmail.com