Holden Karau

Google, Open Source Big Data Developer Advocate

Holden is a transgender Canadian open source developer advocate at Google with a focus on Apache Spark, Airflow, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer and PMC member on Apache Spark and a committer on the SystemML and Mahout projects. She was tricked into the world of big data while trying to improve search and recommendation systems and has long since forgotten her original goal.

AI and Authentic Comms

October 17, 4:30pm PT, C3PO Room

End to End ML with Kubeflow: Scaling with Big & Tiny Data (+ deep learning of course)

There are many great tools for training machine learning models, ranging from scikit-learn to Apache Spark and TensorFlow. However, many of these systems largely leave open the question of how to use our models outside of the batch world (like in a reactive application). Different options exist for persisting the results and using them for live training, and we will explore the trade-offs of the different formats and their corresponding serving/prediction layers. From there we will choose two formats and illustrate how to build an auto-scaling reactive serving layer. We'll tie this all together with Kubeflow and show how it can be used with non-Spark systems. Just building an end-to-end pipeline like this isn’t enough to take your model to production, so we will wrap up with important pointers for things like updating models, validation, and other little details that often get overlooked since they don’t fit nicely on a slide with a cat picture (not that we won’t try). Time permitting, we’ll talk about designing clients for graceful degradation when everything eventually catches on fire.