Signal.broadcast

Thoughts on Voice AI, Machine Learning, and Engineering

Jan 30, 2025 Voice AI Daily

Optimizing WebSocket Latency for Real-Time Voice Streams

Today I tackled a latency bottleneck in our real-time voice pipeline. The key insight was batching audio frames at the transport layer while maintaining per-frame processing at the inference layer. Here's the approach and benchmarks...
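
As a rough illustration of that idea (not the code from the post), here is a minimal sketch. The batch size, the length-prefixed wire format, and the names `pack_batch`, `unpack_batch`, and `batch_frames` are all assumptions made for this example. Several frames travel in one transport message, and the receiver hands them back one at a time so per-frame inference is unchanged.

```python
import struct
from typing import Iterable, Iterator, List

FRAMES_PER_BATCH = 4  # assumed batch size, not a figure from the post


def pack_batch(frames: List[bytes]) -> bytes:
    """Pack frames into one message: 2-byte count, then 2-byte length + payload per frame."""
    parts = [struct.pack("!H", len(frames))]
    for frame in frames:
        parts.append(struct.pack("!H", len(frame)))
        parts.append(frame)
    return b"".join(parts)


def unpack_batch(message: bytes) -> Iterator[bytes]:
    """Yield the original frames one by one, ready for per-frame processing."""
    (count,) = struct.unpack_from("!H", message, 0)
    offset = 2
    for _ in range(count):
        (size,) = struct.unpack_from("!H", message, offset)
        offset += 2
        yield message[offset:offset + size]
        offset += size


def batch_frames(frames: Iterable[bytes], batch_size: int = FRAMES_PER_BATCH) -> Iterator[bytes]:
    """Group a frame stream into transport messages of up to batch_size frames."""
    pending: List[bytes] = []
    for frame in frames:
        pending.append(frame)
        if len(pending) == batch_size:
            yield pack_batch(pending)
            pending = []
    if pending:  # flush the final partial batch
        yield pack_batch(pending)
```

Each message returned by `batch_frames` would be sent as a single binary WebSocket message. Batching cuts the number of messages sent but adds buffering delay of up to `batch_size - 1` frames, so the batch size is a latency trade-off to measure rather than a fixed value.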

WebSocket Audio Streaming Latency

Jan 28, 2025 Voice AI Weekly

Building Low-Latency Voice Pipelines for Real-Time AI Agents

A deep dive into the architecture behind sub-200ms voice processing pipelines. Covers VAD integration, streaming ASR, LLM inference chains, and TTS synthesis in a real-time telephony context. Lessons from building production voice agents at Voicing AI...

VAD ASR TTS Telephony

Jan 25, 2025 LLM Daily

Debugging Hallucinations in RAG Pipelines: A Practical Checklist

When your RAG system starts generating confident but wrong answers, where do you look? A quick checklist covering retrieval quality, chunk sizing, embedding drift, and prompt anchoring techniques...

RAG LLM Debugging

Jan 20, 2025 LLM Weekly

RAG vs Fine-Tuning: When to Use Which for Production LLMs

A practical comparison drawn from real-world experience at Pixis and Voicing AI. When does retrieval augmentation win over fine-tuning? Cost analysis, accuracy trade-offs, and a decision framework for enterprise LLM deployments...

RAG Fine-Tuning LLM Production

Jan 15, 2025 Architecture Monthly

Designing Scalable AI Telephony Systems: From Prototype to Production

A comprehensive guide on architecting B2B AI telephony platforms. Covers SIP integration, media servers, voice pipeline orchestration, scaling strategies, and the engineering decisions that matter when handling thousands of concurrent voice AI sessions...

Telephony Architecture Scaling SIP

Jan 10, 2025 MLOps Daily

Quick Tip: Monitoring LLM Token Usage in Production

A short note on instrumenting your LLM calls to track token consumption, latency percentiles, and cost per request. Using OpenTelemetry spans with custom attributes for ML inference observability...

MLOps Monitoring OpenTelemetry

Jan 5, 2025 Backend Weekly

LangGraph for Multi-Agent Orchestration: Patterns and Pitfalls

Using LangGraph to build multi-agent systems that coordinate across voice, retrieval, and action-taking agents. Covers state management, conditional routing, human-in-the-loop patterns, and error recovery in production agent graphs...

LangGraph Multi-Agent LangChain

Dec 20, 2024 Infrastructure Monthly

Scaling Kafka for ML Inference Workloads: Lessons from Migration

A detailed retrospective on migrating from Google Pub/Sub to Apache Kafka for high-throughput ML inference pipelines at FireCompass. Covers Dead Letter Queue implementation, consumer group tuning, exactly-once semantics, and the cost savings achieved...

Kafka Pub/Sub ML Inference DLQ

Nov 15, 2024 Computer Vision Monthly

Generative AI for Ad Creatives: Building a Production CV Pipeline

How we built a generative AI platform at Pixis that creates ad creatives using computer vision. Covers the model architecture, training data curation, evaluation metrics, and the feedback loop that made the system progressively better...

Computer Vision Generative AI Ad Tech
