In this talk, presented at NewCrafts 2024, Patrick shares hard-won lessons from more than a year and a half of building and shipping GenAI features in production at Showpad, a content management platform for sales and marketing. Rather than focusing on hype, the talk is structured around three phases: building GenAI applications, operating them in production, and scaling the capability across multiple engineering teams. He explicitly distinguishes this work from traditional MLOps, focusing entirely on the generative AI side built on large language models.
On the build side, Patrick walks through the full stack of a GenAI application: prompt engineering, model selection, data integration via RAG, and orchestration frameworks. He shares practical lessons about how prompts are never singular but form chains with retries and format enforcement, how switching models forces complete prompt rewrites, and how the quality gap between open-source and closed-source models is rapidly narrowing. He discusses quantization for running models locally, the “lost in the middle” problem with large context windows, and the importance of checking model cards for bias and data provenance. A particularly candid observation is that customers trained by Google’s keyword-based search struggle to formulate proper questions for LLMs, requiring hidden prompt reformulation under the hood.
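To make the build-side lessons concrete, here is a minimal sketch of a prompt chain with retries, output-format enforcement, and hidden query reformulation as described above. This is not Showpad's implementation; `call_llm`, `reformulate_query`, and `generate_summary` are hypothetical names, and `call_llm` stands in for whatever model client is actually used.

```python
import json


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model client (hosted API or local model)."""
    raise NotImplementedError


def reformulate_query(raw_user_input: str) -> str:
    # Hidden step: turn a keyword-style query ("q3 pricing deck") into a
    # proper question before the main prompt ever sees it.
    return call_llm(
        "Rewrite the following search keywords as a complete question:\n" + raw_user_input
    )


def generate_summary(raw_user_input: str, max_retries: int = 3) -> dict:
    question = reformulate_query(raw_user_input)
    prompt = (
        "Answer the question below and respond ONLY with JSON of the form "
        '{"answer": "...", "sources": ["..."]}.\n\nQuestion: ' + question
    )
    for _ in range(max_retries):
        raw = call_llm(prompt)
        try:
            parsed = json.loads(raw)  # format enforcement: must be valid JSON
            if isinstance(parsed, dict) and "answer" in parsed:
                return parsed
        except json.JSONDecodeError:
            pass
        # Retry with a corrective follow-up, forming another link in the chain.
        prompt = "Your previous reply was not valid JSON. " + prompt
    raise ValueError("Model never returned valid JSON after retries")
```

Even this toy version shows why a "single prompt" quickly becomes a pipeline, and why swapping the underlying model typically means revisiting every prompt in the chain.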
The operations section covers the new monitoring and observability requirements that GenAI applications introduce. Patrick describes metrics like token throughput, time to first token, and cost tracking alongside traditional API metrics. He emphasizes the need for continuous quality monitoring in production using LLM-as-a-judge patterns, PII checks, and prompt tracing across multi-step flows. He shares a nuanced approach to user feedback, explaining that thumbs up/down signals are unreliable and that more useful signals come from observing whether users copy results, hit retry, or edit generated content in an embedded editor.
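As a rough illustration of the operational metrics mentioned (time to first token, token throughput, and the token counts that feed cost tracking), the sketch below wraps a streaming call and records them. The `stream_tokens` generator is a hypothetical placeholder for an actual streaming client, not an API from the talk.

```python
import time
from typing import Iterator


def stream_tokens(prompt: str) -> Iterator[str]:
    """Hypothetical streaming client that yields tokens as the model produces them."""
    raise NotImplementedError


def measure_generation(prompt: str) -> dict:
    start = time.monotonic()
    first_token_at = None
    chunks: list[str] = []

    for token in stream_tokens(prompt):
        if first_token_at is None:
            first_token_at = time.monotonic()  # time to first token
        chunks.append(token)

    elapsed = time.monotonic() - start
    return {
        "time_to_first_token_s": (first_token_at - start) if first_token_at else None,
        "tokens_per_second": len(chunks) / elapsed if elapsed > 0 else 0.0,
        "completion_tokens": len(chunks),  # feeds cost tracking (tokens x unit price)
        "text": "".join(chunks),
    }
```

In a real deployment these figures would be emitted to the same observability stack as the traditional API metrics, alongside the quality and PII checks described above.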
The final section addresses scaling GenAI across the organization. Patrick frames this as a journey from a single pioneering team to a platform team model, drawing directly on the Team Topologies pattern. He argues that the shared infrastructure (model access, vector databases, evaluation frameworks, governance) naturally belongs in a platform team that also handles enablement and governance. He notes the emergence of the “AI engineer” role as a bridge between data science and production engineering, and warns about the innovation tax of being an early adopter: everything built today is instant legacy. The talk closes with observations about desktop-recording AI tools, dynamically generated web content, and the inevitability of over-engineering prompt pipelines as the field matures.
Watch on YouTube — available on the jedi4ever channel
This summary was generated using AI based on the auto-generated transcript.