Jan 30, 2025 · Voice AI · Daily
Optimizing WebSocket Latency for Real-Time Voice Streams
Today I tackled a latency bottleneck in our real-time voice pipeline. The key insight was batching audio frames at the transport layer while maintaining per-frame processing at the inference layer. Here's the approach and benchmarks...
Tags: WebSocket, Audio Streaming, Latency
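The following is a minimal sketch of the transport-side half of that idea. It assumes fixed-size 20 ms frames of 16 kHz, 16-bit mono PCM and uses a hypothetical wire format: a 2-byte big-endian frame count followed by the concatenated frames. The post's actual framing and batch size are not stated here. Several frames travel in one WebSocket message, and the receiver splits them back out so that downstream inference still sees one frame at a time.

```python
import struct
from typing import Iterator, List

# Hypothetical framing: 20 ms of 16 kHz, 16-bit mono PCM = 320 samples * 2 bytes.
FRAME_BYTES = 640
HEADER = struct.Struct(">H")  # big-endian unsigned 16-bit frame count


def pack_frames(frames: List[bytes]) -> bytes:
    """Batch several fixed-size audio frames into one WebSocket payload."""
    for f in frames:
        if len(f) != FRAME_BYTES:
            raise ValueError(f"expected {FRAME_BYTES}-byte frames, got {len(f)}")
    return HEADER.pack(len(frames)) + b"".join(frames)


def unpack_frames(payload: bytes) -> Iterator[bytes]:
    """Split a batched payload back into individual frames for per-frame processing."""
    (count,) = HEADER.unpack_from(payload, 0)
    body = payload[HEADER.size:]
    if len(body) != count * FRAME_BYTES:
        raise ValueError("payload length does not match frame count")
    for i in range(count):
        yield body[i * FRAME_BYTES:(i + 1) * FRAME_BYTES]


def batch_stream(frames: Iterator[bytes], batch_size: int) -> Iterator[bytes]:
    """Group an incoming frame stream into payloads of up to batch_size frames."""
    batch: List[bytes] = []
    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            yield pack_frames(batch)
            batch = []
    if batch:  # flush the remainder so trailing audio is not held back
        yield pack_frames(batch)
```

With a batch size of 4, the number of WebSocket messages drops to roughly a quarter. The cost is that the first frame in each batch waits up to 3 frames (60 ms at 20 ms per frame) before it is sent. That buffering is the trade-off any batch size has to balance against the latency budget.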
Jan 28, 2025 · Voice AI · Weekly
Building Low-Latency Voice Pipelines for Real-Time AI Agents
A deep dive into the architecture behind sub-200ms voice processing pipelines. Covers VAD integration, streaming ASR, LLM inference chains, and TTS synthesis in a real-time telephony context. Lessons from building production voice agents at Voicing AI...
Tags: VAD, ASR, TTS, Telephony
Jan 25, 2025 · LLM · Daily
Debugging Hallucinations in RAG Pipelines: A Practical Checklist
When your RAG system starts generating confident but wrong answers, where do you look? A quick checklist covering retrieval quality, chunk sizing, embedding drift, and prompt anchoring techniques...
Tags: RAG, LLM, Debugging
Jan 20, 2025 · LLM · Weekly
RAG vs Fine-Tuning: When to Use Which for Production LLMs
A practical comparison drawn from real-world experience at Pixis and Voicing AI. When does retrieval augmentation win over fine-tuning? Cost analysis, accuracy trade-offs, and a decision framework for enterprise LLM deployments...
Tags: RAG, Fine-Tuning, LLM, Production
Jan 15, 2025 · Architecture · Monthly
Designing Scalable AI Telephony Systems: From Prototype to Production
A comprehensive guide to architecting B2B AI telephony platforms. Covers SIP integration, media servers, voice pipeline orchestration, scaling strategies, and the engineering decisions that matter when handling thousands of concurrent voice AI sessions...
Tags: Telephony, Architecture, Scaling, SIP
Jan 10, 2025 · MLOps · Daily
Quick Tip: Monitoring LLM Token Usage in Production
A short note on instrumenting your LLM calls to track token consumption, latency percentiles, and cost per request. Using OpenTelemetry spans with custom attributes for ML inference observability...
Tags: MLOps, Monitoring, OpenTelemetry
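The post records these metrics as OpenTelemetry span attributes. This sketch uses plain Python instead and shows only the per-request bookkeeping those attributes would carry: token counts, latency percentiles and cost per request. The per-1K-token prices, the `LLMCallRecord` type and the field names are hypothetical placeholders, not any provider's real rates or schema.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical USD prices per 1K tokens; substitute your provider's real rates.
PROMPT_PRICE_PER_1K = 0.5
COMPLETION_PRICE_PER_1K = 1.5


@dataclass
class LLMCallRecord:
    """One instrumented LLM call (hypothetical record type)."""
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float

    @property
    def cost_usd(self) -> float:
        return (self.prompt_tokens * PROMPT_PRICE_PER_1K
                + self.completion_tokens * COMPLETION_PRICE_PER_1K) / 1000


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(records: List[LLMCallRecord]) -> Dict[str, float]:
    """Aggregate token usage, latency percentiles and cost per request."""
    latencies = [r.latency_ms for r in records]
    total_cost = sum(r.cost_usd for r in records)
    return {
        "requests": len(records),
        "total_tokens": sum(r.prompt_tokens + r.completion_tokens for r in records),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "cost_per_request_usd": total_cost / len(records),
    }
```

In an OpenTelemetry setup, the same three per-call numbers would typically be attached to each span, and the percentiles would be computed by the observability backend rather than in-process.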
Jan 5, 2025 · Backend · Weekly
LangGraph for Multi-Agent Orchestration: Patterns and Pitfalls
Using LangGraph to build multi-agent systems that coordinate across voice, retrieval, and action-taking agents. Covers state management, conditional routing, human-in-the-loop patterns, and error recovery in production agent graphs...
Tags: LangGraph, Multi-Agent, LangChain
Dec 20, 2024 · Infrastructure · Monthly
Scaling Kafka for ML Inference Workloads: Lessons from Migration
A detailed retrospective on migrating from Google Pub/Sub to Apache Kafka for high-throughput ML inference pipelines at FireCompass. Covers Dead Letter Queue implementation, consumer group tuning, exactly-once semantics, and the cost savings achieved...
Tags: Kafka, Pub/Sub, ML Inference, DLQ
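This is a minimal sketch of the retry-then-dead-letter pattern, written without any Kafka client. A message that fails processing a bounded number of times is set aside in a DLQ so that it does not block the rest of its partition. The `DLQConsumer` class, `max_attempts`, the handler and the in-memory lists standing in for topics are all hypothetical. In a real deployment the DLQ would typically be a separate Kafka topic, and the consumer offset would be committed only after the message was either processed or published to it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DeadLetter:
    payload: bytes
    error: str      # repr of the last exception seen
    attempts: int   # how many times processing was tried


@dataclass
class DLQConsumer:
    """Hypothetical consumer wrapper; lists stand in for output and DLQ topics."""
    handler: Callable[[bytes], None]
    max_attempts: int = 3
    processed: List[bytes] = field(default_factory=list)
    dead_letters: List[DeadLetter] = field(default_factory=list)

    def consume(self, payload: bytes) -> None:
        last_error = ""
        for _ in range(self.max_attempts):
            try:
                self.handler(payload)
            except Exception as exc:  # any handler failure counts as one attempt
                last_error = repr(exc)
                continue
            self.processed.append(payload)
            return
        # Retries exhausted: park the message so the partition keeps moving.
        self.dead_letters.append(DeadLetter(payload, last_error, self.max_attempts))
```

A small `max_attempts` limits how long one poison message can stall its partition. Each dead letter keeps the last error, so it can be inspected and replayed later.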
Nov 15, 2024 · Computer Vision · Monthly
Generative AI for Ad Creatives: Building a Production CV Pipeline
How we built a generative AI platform at Pixis that creates ad creatives using computer vision. Covers the model architecture, training data curation, evaluation metrics, and the feedback loop that made the system progressively better...
Tags: Computer Vision, Generative AI, Ad Tech