GPT-5.1 Unlocked: How I Brought 200-Page Document Generation Down to Under 2 Minutes
March 15, 2026 • 9 min read
The first version of Gov Co-Author took almost 20 minutes to write a 200-page compliance
document. Authors hated waiting. We hated shipping it. So I went back in, moved to GPT-5.1,
and rebuilt the pipeline piece by piece. It now finishes in under 2 minutes. The model
helped, but the real wins were smaller: writing sections in parallel, streaming results
as they came, and caching the bits we kept asking for again and again. In this post I
walk through what I changed, why, and the numbers behind each call.
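As a rough illustration of the three ideas named above (parallel sections, streaming, caching), here is a minimal sketch. It is not the Gov Co-Author pipeline: `generate_sections`, the `generate` callable, the `on_ready` callback, and the plain-dict cache are all hypothetical names chosen for the example, and the model call is left as an injected function so nothing here assumes a specific SDK.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature: takes a section prompt, returns the generated text.
GenerateFn = Callable[[str], Awaitable[str]]


async def generate_sections(
    generate: GenerateFn,
    prompts: List[str],
    cache: Dict[str, str],
    on_ready: Callable[[int, str], None],
) -> List[str]:
    """Write all sections concurrently and report each one as it finishes."""

    async def one(index: int, prompt: str):
        # Cache: reuse earlier output for a prompt we have already answered.
        # (Two identical prompts in the same run may still both miss.)
        if prompt not in cache:
            cache[prompt] = await generate(prompt)
        return index, cache[prompt]

    # Parallelism: start every section at once instead of one after another.
    tasks = [asyncio.create_task(one(i, p)) for i, p in enumerate(prompts)]

    sections = [""] * len(prompts)
    # Streaming: hand each section to the caller in completion order.
    for finished in asyncio.as_completed(tasks):
        index, text = await finished
        sections[index] = text
        on_ready(index, text)
    return sections
```

The returned list keeps document order even though `on_ready` fires in completion order, so a UI can show progress early while the final document is still assembled correctly.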
GPT-5.1
LLM Engineering
Latency Optimization
Azure OpenAI
Production
Building a Citation Engine from Scratch: Taking RAG Source Attribution to Production
July 22, 2025 • 12 min read
Most RAG tools treat citations like a footnote: here is the answer, here are some file
names, good luck. For a compliance product, that does not fly. Auditors want to know
the exact section, paragraph, even the sentence an answer came from. So I built our own
citation engine. It tracks where every chunk lives in the source, links each claim back
to its evidence after retrieval, and ships a confidence score with the answer. I will
share the design choices I made, the dead ends I ran into, and what it really took to
get this running in a regulated setup.
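To make the idea of position tracking plus claim-to-evidence linking concrete, here is a small sketch. It is an assumption-heavy stand-in, not the engine described in the post: the `Chunk` fields, `split_paragraphs`, and `attribute` are hypothetical names, and the confidence score is plain token overlap, chosen only because it is easy to follow.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    section: str
    paragraph: int  # 1-based paragraph number within the section
    start: int      # character offset of the chunk in the source text
    end: int
    text: str


def split_paragraphs(doc_id: str, section: str, source: str) -> List[Chunk]:
    """Split on line breaks and remember where each paragraph lives."""
    return [
        Chunk(doc_id, section, n, m.start(), m.end(), m.group())
        for n, m in enumerate(re.finditer(r"[^\n]+", source), start=1)
    ]


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def attribute(claim: str, chunks: List[Chunk]) -> Optional[Tuple[Chunk, float]]:
    """Link a claim to its best-matching chunk with a naive confidence score.

    The score is the share of claim tokens found in the chunk (0.0 to 1.0).
    """
    claim_tokens = _tokens(claim)
    if not claim_tokens or not chunks:
        return None
    best = max(chunks, key=lambda c: len(claim_tokens & _tokens(c.text)))
    score = len(claim_tokens & _tokens(best.text)) / len(claim_tokens)
    return best, score
```

Because every chunk carries its own offsets, a citation can point an auditor to the exact span in the source (`source[chunk.start:chunk.end]`) rather than to a file name.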
RAG
Citation Extraction
LangChain
Production
Compliance AI
How I Cut Document Ingestion Latency by 75% for a Production RAG System
March 8, 2025 • 10 min read
Gov Assistant reads thousands of governing and external regulation documents. Our first
ingestion pipeline ran one step at a time, and a full re-index took over 110 minutes.
Updating a single document? Same wait. I sat down with a profiler and the answer was
clear: embedding API calls
were blocking everything. I switched to an async client, added smart batching with retries,
moved to delta updates so we only re-process what changed, and cleaned up the chunking.
Full re-index now takes under 16 minutes. Small updates finish in seconds. Here is how,
with the numbers to back it up.
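The changes listed above (async calls, batching with retries, delta updates) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Gov Assistant code: `changed_docs`, `embed_with_retry`, `embed_all`, the batch size, and the concurrency limit are all hypothetical, and the embedding call is an injected async function rather than any particular client.

```python
import asyncio
import hashlib
import random
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature: takes a batch of texts, returns one vector per text.
EmbedFn = Callable[[List[str]], Awaitable[List[List[float]]]]


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_docs(docs: Dict[str, str], seen_hashes: Dict[str, str]) -> Dict[str, str]:
    """Delta update: keep only documents whose content hash is new or different."""
    return {
        doc_id: text
        for doc_id, text in docs.items()
        if seen_hashes.get(doc_id) != content_hash(text)
    }


async def embed_with_retry(
    embed: EmbedFn, batch: List[str], retries: int = 3, base_delay: float = 0.01
) -> List[List[float]]:
    """Retry a failed batch with exponential backoff plus a little jitter."""
    for attempt in range(retries):
        try:
            return await embed(batch)
        except Exception:
            if attempt == retries - 1:
                raise
            await asyncio.sleep(base_delay * 2**attempt + random.uniform(0, base_delay))
    return []


async def embed_all(
    embed: EmbedFn, texts: List[str], batch_size: int = 16, max_concurrency: int = 4
) -> List[List[float]]:
    """Embed texts in concurrent batches while capping in-flight requests."""
    sem = asyncio.Semaphore(max_concurrency)
    batches = [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]

    async def run(batch: List[str]) -> List[List[float]]:
        async with sem:
            return await embed_with_retry(embed, batch)

    results = await asyncio.gather(*(run(b) for b in batches))  # keeps batch order
    return [vec for batch in results for vec in batch]
```

In this sketch the caller would store the new hashes only after `embed_all` succeeds, so a failed run does not mark documents as processed.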
RAG
Ingestion Pipeline
Async Python
Embeddings
Performance
RAG vs Fine-Tuning for Enterprise Compliance: Lessons Learned in Production
November 10, 2025 • 8 min read
After shipping two GenAI products in BFSI compliance, the question I keep getting is
simple: "Should we use RAG or fine-tune?" My honest answer: it depends, and most blog
posts make it sound easier than it is. RAG wins when your data keeps changing, when you
need real citations, and when you cannot send sensitive data off for training. Fine-tuning
wins when you need a steady tone, a fixed output shape, or reasoning that prompts alone
cannot pull off. In this post I share the simple checklist I now run through on every
project, with real examples from Gov Assistant and Gov Co-Author.
RAG
Fine-Tuning
LLM Strategy
BFSI
Architecture