Now on-demand: Data + AI Summit sessions for data architects, engineers, and scientists (2024)

Thousands of data architects, engineers, and scientists met at Data + AI Summit in San Francisco to hear from industry luminaries like Fei-Fei Li and Yejin Choi, attend sessions on everything from building a custom LLM to preparing for Apache Spark™ 4, explore the latest in Databricks, and ultimately learn how to accelerate efforts to deploy data intelligence across their businesses.

Every day provided opportunities to sharpen existing skills, discover something new, and gain the knowledge your business needs to thrive in the GenAI era. In fact, for many attendees, the challenge was making time for all the sessions they wanted to attend.

Whether you missed sessions in person or are just now attending virtually, the great news is that you can now watch all 500+ sessions (and the full keynote) on-demand! Below, I’m calling out some specific sessions for data architects, data engineers, and data scientists that I think are worth a watch!

Data Architect

Today, analytics and AI workloads are split across too many different environments, making it nearly impossible for data architects to properly manage the underlying infrastructure. It's one reason why so many companies are looking to consolidate. These sessions showcase why the Lakehouse is the unified platform enterprises need to unleash data intelligence across their businesses while ensuring the right security and governance throughout their data landscape.

Delta Lake Meets DuckDB via Delta Kernel

Speakers: Nick Lanham

Over the past few years, the Delta-rs project has grown rapidly. And now, with delta-kernel-rs, it's even easier for Rust and Python developers to build Delta Lake connectors. This session will cover how Delta support was brought to DuckDB, the open source analytical database, including how the support works, the architecture of the integration, and lessons learned along the way.
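
For a flavor of what this integration enables, here is a minimal sketch, assuming the duckdb Python package with its delta extension (which builds on delta-kernel-rs) and a placeholder table path, of querying a Delta table directly from DuckDB:

import duckdb

# Connect to an in-memory DuckDB database and load the delta extension,
# which uses delta-kernel-rs under the hood to read the Delta transaction log.
con = duckdb.connect()
con.execute("INSTALL delta")
con.execute("LOAD delta")

# Query a Delta table in place; the path below is purely illustrative.
result = con.execute(
    "SELECT COUNT(*) FROM delta_scan('/tmp/my_delta_table')"
).fetchall()
print(result)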

Deep Dive into Delta Lake and UniForm on Databricks

Speakers: Joe Widen, Michelle Leon

This is a beginner’s guide to everything Delta Lake, a powerful open-source storage layer that brings reliability, performance, governance, and quality to existing data lakes. This session will provide an overview of Delta Lake, including how it’s built for both streaming and batch use cases, explain the power of Delta Lake and Unity Catalog together, and highlight innovative use cases of Delta Lake across different sectors. Attendees will also learn about Delta UniForm, a tool that makes it easy for developers to work across other lakehouse formats including Apache Iceberg and Apache Hudi.
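
As a taste of UniForm, here is a minimal sketch, run against a Spark session with Delta Lake and Unity Catalog; the table name is a placeholder and the table properties follow the documented UniForm setup, but treat the details as an assumption to verify against current docs:

# Assumes an existing SparkSession named `spark` with Delta Lake configured.
spark.sql("""
    CREATE TABLE main.default.uniform_demo (id BIGINT, name STRING)
    USING DELTA
    TBLPROPERTIES (
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")

# Writes go through Delta as usual; Iceberg metadata is generated alongside,
# so Iceberg readers can query the same table without copying any data.
spark.sql("INSERT INTO main.default.uniform_demo VALUES (1, 'uniform')")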

Dependency Management in Spark Connect: Simple, Isolated, Powerful

Speakers: Hyukjin Kwon, Akhil Gudesa

Managing an application hosted in a distributed computing environment can be challenging. Ensuring that all nodes have the necessary environment to execute code and determining the actual location of the user's code are complex tasks, significantly more so when dynamic support is required. This session will cover how Spark Connect can simplify the management of a distributed computing environment. Through practical and comprehensive examples, attendees will learn how to create, package, utilize and update custom isolated environments ensuring flexible and seamless execution for both Python and Scala applications.
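
To make that concrete, here is a minimal sketch, assuming PySpark 3.5+ with Spark Connect, a placeholder remote endpoint, and a hypothetical my_package.zip, of attaching session-scoped dependencies so executors can resolve the user's code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

# Connect to a remote Spark cluster over Spark Connect; the endpoint is a placeholder.
spark = SparkSession.builder.remote("sc://spark-connect-host:15002").getOrCreate()

# Ship session-scoped artifacts to the cluster: a zipped Python package and a data
# file. They are isolated to this session rather than installed cluster-wide.
spark.addArtifacts("dist/my_package.zip", pyfile=True)
spark.addArtifacts("config/lookup.json", file=True)

@udf("string")
def enrich(value):
    # This import resolves on the executors from the shipped my_package.zip
    # (my_package is a hypothetical module used for illustration).
    from my_package.transforms import label
    return label(value)

spark.range(5).select(enrich("id")).show()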

Fast, Cheap, and Easy Data Ingestion with AWS Lambda and Delta Lake

Speakers: R. Tyler Croy

Join R. Tyler Croy, one of the creators of Delta Rust, to learn how to work with Delta tables from AWS Lambda. Using the native Python or Rust libraries for Delta Lake, you'll explore the transaction log, write updates, perform table maintenance, and even query Delta tables in milliseconds from AWS Lambda.
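
As a hedged illustration of the pattern (not the speaker's exact code), here is a minimal AWS Lambda handler sketch using the deltalake Python package (the delta-rs bindings); the S3 path is a placeholder:

import json
from deltalake import DeltaTable

# Placeholder URI; in practice the Lambda role needs read access to the bucket.
TABLE_URI = "s3://my-bucket/events_delta"

def handler(event, context):
    # Open the Delta table by reading only its transaction log.
    dt = DeltaTable(TABLE_URI)

    # Pull a small slice into pandas; column projection keeps the read cheap.
    df = dt.to_pandas(columns=["event_type"])
    counts = df["event_type"].value_counts().to_dict()

    return {
        "statusCode": 200,
        "body": json.dumps({"version": dt.version(), "counts": counts}),
    }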

Let's Do Some Data Engineering With Rust and Delta Lake!

Speakers: R. Tyler Croy

The future of data engineering is looking increasingly Rust-y. By adopting the foundational crates of Delta Lake, DataFusion, and Arrow, developers can write high-performance, low-cost ingestion pipelines, transformation jobs, and data query applications. Don't know Rust? No problem. You'll review fundamental concepts of the language as they pertain to the data engineering domain with a co-creator of Delta Rust and leave with a basis for applying Rust to real-world data problems.

What's Wrong with the Medallion Architecture?

Speakers: Simon Whiteley

While enterprises are reaping the benefits of the lakehouse architecture, many have one regret: layering their zones. No one really knows what terms like “silver” vs. “gold” mean. The reality is that Medallion architecture may not always be the best option. Using real-world examples, this session will dive into when and how to use it.

Data Engineer

In businesses today, speed is paramount. Leaders want access to information immediately. That’s putting more pressure on the individuals tasked with managing and optimizing streaming ETL pipelines. These sessions help data engineers deliver on the promise of real-time analytics and AI.

Delta Live Tables in Depth: Best Practices for Intelligent Data Pipelines

Speakers: Michael Armbrust, Paul Lappas

Learn how to master Delta Live Tables from one of the people who knows it best. Michael Armbrust, the original creator of Spark SQL, Structured Streaming, and Delta, will get attendees up to speed on what's new with DLT and what's coming. (Spoiler alert: some BIG news.)
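
If you haven't tried DLT yet, here is a minimal pipeline sketch in Python showing the declarative table-plus-expectations style the session builds on; the source path and the expectation rule are illustrative placeholders:

import dlt
from pyspark.sql.functions import col

# Bronze: incrementally ingest raw JSON files with Auto Loader.
@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/raw_orders")  # placeholder path
    )

# Silver: declare a data quality expectation; failing rows are dropped and tracked.
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_silver():
    return dlt.read_stream("orders_bronze").select("order_id", col("amount").cast("double"))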

Effective Lakehouse Streaming with Delta Lake and Friends

Speakers: Scott Haines, Ashok Singamaneni

In this session, attendees discover the true power of the streaming lakehouse architecture, how to achieve success at scale, and, more importantly, why Delta Lake is the key to unlocking a consistent data foundation and empowering a "stress-free" data ecosystem.
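
For context, a basic end-to-end Delta streaming hop looks like the sketch below (paths are placeholders); exactly-once delivery comes from the combination of Delta's transactional commits and the streaming checkpoint:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-streaming").getOrCreate()

# Read incrementally from an upstream Delta table...
bronze = spark.readStream.format("delta").load("/data/bronze/events")

# ...apply a transformation, then append to a downstream Delta table.
# The checkpoint location is what lets the query restart without duplicates.
silver = bronze.where("event_type IS NOT NULL")

query = (
    silver.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/chk/silver_events")
    .start("/data/silver/events")
)
query.awaitTermination()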

Speakers: Holden Karau, Robert Merck

Apache Spark™ 4 is on the horizon. So what's involved in upgrading to the latest and greatest Spark? Learn how Netflix automated large parts of its upgrade and how you can apply the same techniques to your own data platform. In this session, you will learn how to upgrade your Spark pipelines without crying and how to validate them even when you don't trust the tests.
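
Approaches differ, but one hedged sketch of "validating without trusting the tests" is to run the same job on the old and new Spark versions and diff the outputs; the paths and checks below are illustrative, not Netflix's actual tooling:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("upgrade-validation").getOrCreate()

# Outputs of the same pipeline produced by the old and new Spark versions.
old_out = spark.read.parquet("/validation/run_spark3/output")  # placeholder
new_out = spark.read.parquet("/validation/run_spark4/output")  # placeholder

# Cheap checks first: row counts and schema equality.
assert old_out.count() == new_out.count(), "row counts diverged"
assert old_out.schema == new_out.schema, "schemas diverged"

# Then a full set difference in both directions; any surviving rows are drift.
only_in_old = old_out.exceptAll(new_out)
only_in_new = new_out.exceptAll(old_out)
assert only_in_old.isEmpty() and only_in_new.isEmpty(), "row-level differences found"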

Introducing the New Python Data Source API for Apache Spark™

Speakers: Allison Wang, Ryan Nienhuis

Traditionally, integrating custom data sources into Spark required understanding Scala, posing a challenge for the vast Python community. Our new API simplifies this process, allowing developers to implement custom data sources directly in Python without the complexities of existing APIs. This session will explore the motivations and the code behind how we’ve made reading and writing operations for Python developers much easier.
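
The new API boils down to subclassing two small Python classes. Here is a hedged sketch of a toy source based on the Spark 4.0 preview API, so treat the exact names as subject to change:

from pyspark.sql.datasource import DataSource, DataSourceReader
from pyspark.sql.types import StructType

class FibonacciDataSource(DataSource):
    """A toy batch data source that emits Fibonacci numbers."""

    @classmethod
    def name(cls):
        return "fibonacci"

    def schema(self):
        return "n INT, value BIGINT"

    def reader(self, schema: StructType):
        return FibonacciReader(self.options)

class FibonacciReader(DataSourceReader):
    def __init__(self, options):
        self.count = int(options.get("count", 10))

    def read(self, partition):
        a, b = 0, 1
        for n in range(self.count):
            yield (n, a)  # one tuple per output row
            a, b = b, a + b

# Assumes an existing SparkSession named `spark` (Spark 4.0 preview or DBR).
# Register the source, then use it like any built-in format.
spark.dataSource.register(FibonacciDataSource)
spark.read.format("fibonacci").option("count", "15").load().show()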

Incremental Change Data Capture: A Data-Informed Journey

Speakers: Christina Taylor

Learn how to iterate on incremental ingestion from SaaS applications, relational databases, and event streams into a centralized data lake, the role of change data capture (CDC), and how to streamline maintenance and improve reliability with Delta Lake. Attendees will walk away with a data-informed mentality for designing architecture that promotes long-term stewardship and developer happiness.
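
A common building block for the Delta Lake side of this story is MERGE. The hedged sketch below (table name, key, and _op column are illustrative placeholders) applies a batch of change records as upserts and deletes:

from delta.tables import DeltaTable

# Assumes an existing SparkSession `spark` and a DataFrame `changes` of CDC
# records that carries an _op column of 'insert' / 'update' / 'delete'.
target = DeltaTable.forName(spark, "main.default.customers")

(
    target.alias("t")
    .merge(changes.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s._op = 'delete'")
    .whenMatchedUpdateAll(condition="s._op <> 'delete'")
    .whenNotMatchedInsertAll(condition="s._op <> 'delete'")
    .execute()
)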

What’s next for the upcoming Apache Spark™ 4.0

Speakers: Xiao Li, Wenchen Fan

The upcoming release of Apache Spark 4.0 delivers substantial enhancements that refine the functionality of the unified analytics engine and improve the developer experience. This is your chance to ask the experts what's coming and how to prepare.

Data Scientist

GenAI is inescapable. Every business is figuring out how to develop and deploy LLMs. If you're one of the people actually making AI and ML a reality, these sessions will help keep you up to date on the latest techniques for improving and accelerating your GenAI strategy.

Software 2.0: Shipping LLMs with New Knowledge

Speakers: Sharon Zhou

Increasingly, companies want to take existing LLMs and teach them new knowledge to differentiate the technology. This process goes beyond just prompting or retrieving; it also involves instruction-finetuning, content-finetuning, pretraining, and more. In this session, you'll learn about Lamini, an all-in-one LLM stack that makes LLMs less picky about the data they can learn from, making it easy for them to take in billions of new documents.

Exploring MLOps and LLMOps: Architectures and Best Practices

Speakers: Joseph Bradley, Yinxi Zhang and Arpit Jasapara

This session offers a detailed look at the architectures involved in Machine Learning Operations (MLOps) and Large Language Model Operations (LLMOps). Attendees will learn about the technical specifics and practical applications of MLOps and LLMOps, including the key components and workflows that define these fields. And they’ll walk away with strategies for implementing effective MLOps and LLMOps in their own projects.
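
As one small, concrete piece of an MLOps workflow, the sketch below (model and names are illustrative) logs a trained model to MLflow and registers it so downstream stages can promote it:

import mlflow
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=1.0).fit(X, y)

with mlflow.start_run():
    # Track parameters and metrics alongside the model artifact.
    mlflow.log_param("alpha", 1.0)
    mlflow.log_metric("train_r2", model.score(X, y))

    # Log and register in one step; the registry entry is what CI/CD and
    # serving layers promote between environments.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="diabetes_ridge",
    )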

In the Trenches with DBRX: Building a State-of-the-Art Open-Source Model

Speakers: Jonathan Frankle, Abhinav Venigalla

Want the behind-the-scenes story on how we built DBRX, a cutting-edge, open-source foundation model trained in-house by Databricks? Hear from the people who built it about the tools, methods, and lessons learned during the development process. Attendees will get an inside look at what it takes to train a high-quality LLM, hear why we chose a Mixture of Experts architecture, and learn how they can use the same tools and techniques to build their own custom models.

Introduction to DBRX and other Databricks Foundation Models

Speakers: Margaret Qian, Hagay Lupesko

This session offers a comprehensive introduction to DBRX and other foundation models available on Databricks. Attendees will get practical guidance on how to leverage these models to enhance data analytics and machine learning projects. And they'll leave with a clear understanding of how to effectively utilize Databricks' foundation models to drive innovation and efficiency in their data-driven initiatives.
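
For example, Databricks Foundation Model APIs serve models such as DBRX behind OpenAI-compatible endpoints, so a first query can look like the hedged sketch below; the workspace URL and token are placeholders, and the endpoint name should be verified against your workspace:

from openai import OpenAI

# Point the OpenAI client at your workspace's serving endpoints.
client = OpenAI(
    api_key="<DATABRICKS_TOKEN>",                            # placeholder
    base_url="https://<workspace-host>/serving-endpoints",   # placeholder
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",
    messages=[
        {"role": "system", "content": "You are a concise data assistant."},
        {"role": "user", "content": "Summarize what Delta Lake UniForm does."},
    ],
    max_tokens=200,
)
print(response.choices[0].message.content)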

Layered Intelligence: Generative AI Meets Classical Decision Sciences

Speakers: Danielle Heymann

The session will explore how Generative AI, especially LLMs, integrates into classical decision science methodologies. Attendees will learn how LLMs extend beyond chatbots to enhance optimization algorithms, statistical models, and graph analytics—breathing new life into decision sciences and advancing strategic analytics and decision-making. This layered approach brings a new edge to traditional methods, allowing for complex problem-solving, nuanced data interaction, and improved interpretability.

Building Production RAG Over Complex Documents

Speakers: Jerry Liu

RAG is a powerful technique that enables enterprises to further customize existing LLMs on their own data. However, building production RAG is very challenging, especially as users scale to larger and more complex data sources. RAG is only as good as your data, and developers must carefully consider how to parse, ingest, and retrieve their data to successfully build RAG over complex documents. This session provides an in-depth exploration of this entire process.
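
As a starting point, a minimal parse-ingest-retrieve loop can be sketched with an open source framework such as LlamaIndex (one option among several; the directory path is a placeholder and an OpenAI API key is assumed for the default embedding and LLM models):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Parse and chunk everything under ./docs (PDFs, HTML, text, etc.).
documents = SimpleDirectoryReader("./docs").load_data()

# Embed the chunks and build an in-memory vector index.
index = VectorStoreIndex.from_documents(documents)

# Retrieve the most relevant chunks and synthesize an answer with an LLM.
query_engine = index.as_query_engine(similarity_top_k=4)
answer = query_engine.query("What are the termination clauses in these contracts?")
print(answer)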

SEA-LION: Representing the Diverse Languages of Southeast Asia with LLMs

Speakers: Jeanne Choo, Ngee Chia Tai

Southeast Asia is one of the world's most culturally diverse regions, covering countries such as Singapore, Vietnam, Thailand, and Indonesia. People speak multiple languages and draw cultural influences from China, India and the West. Learn how, working with Databricks MosaicML, the Singapore government built SEA-LION, an open-sourced large language model trained on local languages such as Thai, Indonesian and Tamil.

State-Of-The-Art Retrieval Augmented Generation At Scale In Spark NLP

Speakers: David Talby, Veysel Kocaman

Get a crash course in scaling and building RAG LLM pipelines for production. Current systems struggle to efficiently handle the jump from proof of concept to production. This session will show how to address scaling issues with the open source Spark NLP library.
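
To give a sense of the building blocks, here is a hedged sketch that computes sentence embeddings for a corpus with Spark NLP, the retrieval side of a RAG pipeline; component names follow my understanding of the Spark NLP API, so verify against current docs:

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BertSentenceEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

# A toy corpus; in practice this would be millions of document chunks.
df = spark.createDataFrame(
    [("Delta Lake adds ACID transactions to data lakes.",),
     ("RAG retrieves relevant context before generation.",)],
    ["text"],
)

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
embeddings = (
    BertSentenceEmbeddings.pretrained()  # downloads a default pretrained model
    .setInputCols(["document"])
    .setOutputCol("sentence_embeddings")
)

pipeline = Pipeline(stages=[document, embeddings])
embedded = pipeline.fit(df).transform(df)
embedded.select("sentence_embeddings.embeddings").show(truncate=False)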

Check out all the Data + AI Summit sessions and keynotes here!

