Session Details: SIGNAL San Francisco 2025

Want more confidence in your AI outputs?

This session dives into how Twilio uses evaluations (automated benchmarks, human assessments, and LLMs acting as evaluators) to measure understanding, performance, and error avoidance in large language models. You'll learn how our Emerging Tech and Innovation team speeds up development with evals, and what we've built to help customers run their own.

Walk away with practical guidance on integrating evaluations into your LLM workflows using diverse metrics, contextual relevance, and transparent practices—so you can ship smarter, faster, and with fewer surprises.