Now on-demand: Data + AI Summit sessions for data architects, engineers, and scientists (2024)

Thousands of data architects, engineers, and scientists met at Data + AI Summit in San Francisco to hear from industry luminaries like Fei-Fei Li and Yejin Choi, attend sessions on everything from building a custom LLM to preparing for Apache Spark™ 4, explore the latest in Databricks, and ultimately learn how to accelerate efforts to deploy data intelligence across their businesses.

Every day provided opportunities to sharpen existing skills, discover something new, and gain the knowledge your business needs to thrive in the GenAI era. In fact, for many attendees, the challenge was making time for all the sessions they wanted to attend.

Whether you missed sessions in person or are just now attending virtually, the great news is that you can now watch all 500+ sessions (and the full keynote) on-demand! Below, I’m calling out some specific sessions for data architects, data engineers, and data scientists that I think are worth a watch!

Data Architect

Today, analytics and AI workloads are split across too many different environments, making it nearly impossible for data architects to properly manage the underlying infrastructure. It's one reason why so many companies are looking to consolidate. These sessions showcase why the Lakehouse is the unified platform enterprises need to unleash data intelligence across their businesses while ensuring the right security and governance throughout their data landscape.

Delta Lake Meets DuckDB via Delta Kernel

Speakers: Nick Lanham

Over the past few years, the Delta-rs project has grown rapidly. And now, with delta-kernel-rs, it's even easier for Rust and Python developers to build Delta Lake connectors. This session will cover how Delta support was brought to DuckDB, the open source analytical database, including how the support works, the architecture of the integration, and lessons learned along the way.
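
For a flavor of what this integration enables, here is a minimal sketch, assuming the duckdb Python package with its delta extension (which builds on delta-kernel-rs) and a placeholder table path, of querying a Delta table directly from DuckDB:

import duckdb

# Connect to an in-memory DuckDB database and load the delta extension,
# which uses delta-kernel-rs under the hood to read the Delta transaction log.
con = duckdb.connect()
con.execute("INSTALL delta")
con.execute("LOAD delta")

# Query a Delta table in place; the path below is purely illustrative.
result = con.execute(
    "SELECT COUNT(*) FROM delta_scan('/tmp/my_delta_table')"
).fetchall()
print(result)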

Deep Dive into Delta Lake and UniForm on Databricks

Speakers: Joe Widen, Michelle Leon

This is a beginner’s guide to everything Delta Lake, a powerful open-source storage layer that brings reliability, performance, governance, and quality to existing data lakes. This session will provide an overview of Delta Lake, including how it’s built for both streaming and batch use cases, explain the power of Delta Lake and Unity Catalog together, and highlight innovative use cases of Delta Lake across different sectors. Attendees will also learn about Delta UniForm, a tool that makes it easy for developers to work across other lakehouse formats including Apache Iceberg and Apache Hudi.
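
As a taste of UniForm, here is a minimal sketch, run against a Spark session with Delta Lake and Unity Catalog; the table name is a placeholder and the table properties follow the documented UniForm setup, but treat the details as an assumption to verify against current docs:

# Assumes an existing SparkSession named `spark` with Delta Lake configured.
spark.sql("""
    CREATE TABLE main.default.uniform_demo (id BIGINT, name STRING)
    USING DELTA
    TBLPROPERTIES (
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg'
    )
""")

# Writes go through Delta as usual; Iceberg metadata is generated alongside,
# so Iceberg readers can query the same table without copying any data.
spark.sql("INSERT INTO main.default.uniform_demo VALUES (1, 'uniform')")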

Dependency Management in Spark Connect: Simple, Isolated, Powerful

Speakers: Hyukjin Kwon, Akhil Gudesa

Managing an application hosted in a distributed computing environment can be challenging. Ensuring that all nodes have the necessary environment to execute code and determining the actual location of the user's code are complex tasks, significantly more so when dynamic support is required. This session will cover how Spark Connect can simplify the management of a distributed computing environment. Through practical and comprehensive examples, attendees will learn how to create, package, utilize and update custom isolated environments ensuring flexible and seamless execution for both Python and Scala applications.
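
To make that concrete, here is a minimal sketch, assuming PySpark 3.5+ with Spark Connect, a placeholder remote endpoint, and a hypothetical my_package.zip, of attaching session-scoped dependencies so executors can resolve the user's code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

# Connect to a remote Spark cluster over Spark Connect; the endpoint is a placeholder.
spark = SparkSession.builder.remote("sc://spark-connect-host:15002").getOrCreate()

# Ship session-scoped artifacts to the cluster: a zipped Python package and a data
# file. They are isolated to this session rather than installed cluster-wide.
spark.addArtifacts("dist/my_package.zip", pyfile=True)
spark.addArtifacts("config/lookup.json", file=True)

@udf("string")
def enrich(value):
    # This import resolves on the executors from the shipped my_package.zip
    # (my_package is a hypothetical module used for illustration).
    from my_package.transforms import label
    return label(value)

spark.range(5).select(enrich("id")).show()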

Fast, Cheap, and Easy Data Ingestion with AWS Lambda and Delta Lake

Speakers: R. Tyler Croy

Join R. Tyler Croy, one of the creators of Delta Rust, to learn how to work with Delta tables from AWS Lambda. Using the native Python or Rust libraries for Delta Lake, you'll explore the transaction log, write updates, perform table maintenance, and even query Delta tables in milliseconds from AWS Lambda.
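
As a hedged illustration of the pattern (not the speaker's exact code), here is a minimal AWS Lambda handler sketch using the deltalake Python package (the delta-rs bindings); the S3 path is a placeholder:

import json
from deltalake import DeltaTable

# Placeholder URI; in practice the Lambda role needs read access to the bucket.
TABLE_URI = "s3://my-bucket/events_delta"

def handler(event, context):
    # Open the Delta table by reading only its transaction log.
    dt = DeltaTable(TABLE_URI)

    # Pull a small slice into pandas; column projection keeps the read cheap.
    df = dt.to_pandas(columns=["event_type"])
    counts = df["event_type"].value_counts().to_dict()

    return {
        "statusCode": 200,
        "body": json.dumps({"version": dt.version(), "counts": counts}),
    }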

Let's Do Some Data Engineering With Rust and Delta Lake!

Speakers: R. Tyler Croy

The future of data engineering is looking increasingly Rust-y. By adopting the foundational crates of Delta Lake, DataFusion, and Arrow, developers can write high-performance, low-cost ingestion pipelines, transformation jobs, and data query applications. Don't know Rust? No problem. You'll review fundamental concepts of the language as they pertain to the data engineering domain with a co-creator of Delta Rust and leave with a basis for applying Rust to real-world data problems.

What's Wrong with the Medallion Architecture?

Speakers: Simon Whiteley

While enterprises are reaping the benefits of the lakehouse architecture, many have one regret: layering their zones. No one really knows what terms like “silver” vs. “gold” mean. The reality is that Medallion architecture may not always be the best option. Using real-world examples, this session will dive into when and how to use it.

Data Engineer

In businesses today, speed is paramount. Leaders want access to information immediately. That’s putting more pressure on the individuals tasked with managing and optimizing streaming ETL pipelines. These sessions help data engineers deliver on the promise of real-time analytics and AI.

Delta Live Tables in Depth: Best Practices for Intelligent Data Pipelines

Speakers: Michael Armbrust, Paul Lappas

Learn how to master Delta Live Tables from one of the people who knows it best. Michael Armbrust, the original creator of Spark SQL, Structured Streaming, and Delta, will get attendees up to speed on what's new with DLT and what's coming. (Spoiler alert: some BIG news.)
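
If you haven't tried DLT yet, here is a minimal pipeline sketch in Python showing the declarative table-plus-expectations style the session builds on; the source path and the expectation rule are illustrative placeholders:

import dlt
from pyspark.sql.functions import col

# Bronze: incrementally ingest raw JSON files with Auto Loader.
@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/default/raw_orders")  # placeholder path
    )

# Silver: declare a data quality expectation; failing rows are dropped and tracked.
@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_silver():
    return dlt.read_stream("orders_bronze").select("order_id", col("amount").cast("double"))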

Effective Lakehouse Streaming with Delta Lake and Friends

Speakers: Scott Haines, Ashok Singamaneni

In this session, attendees discover the true power of the streaming lakehouse architecture, how to achieve success at scale, and, more importantly, why Delta Lake is the key to unlocking a consistent data foundation and empowering a "stress-free" data ecosystem.
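
For context, a basic end-to-end Delta streaming hop looks like the sketch below (paths are placeholders); exactly-once delivery comes from the combination of Delta's transactional commits and the streaming checkpoint:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-streaming").getOrCreate()

# Read incrementally from an upstream Delta table...
bronze = spark.readStream.format("delta").load("/data/bronze/events")

# ...apply a transformation, then append to a downstream Delta table.
# The checkpoint location is what lets the query restart without duplicates.
silver = bronze.where("event_type IS NOT NULL")

query = (
    silver.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/chk/silver_events")
    .start("/data/silver/events")
)
query.awaitTermination()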

Speakers: Holden Karau, Robert Merck

Apache Spark™ 4 is on the horizon. So what's involved in upgrading to the latest and greatest Spark? Learn how Netflix automated large parts of its upgrade and how you can apply the same techniques to your own data platform. In this session, you will learn how to upgrade your Spark pipelines without crying and how to validate them even when you don't trust the tests.
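
Approaches differ, but one hedged sketch of "validating without trusting the tests" is to run the same job on the old and new Spark versions and diff the outputs; the paths and checks below are illustrative, not Netflix's actual tooling:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("upgrade-validation").getOrCreate()

# Outputs of the same pipeline produced by the old and new Spark versions.
old_out = spark.read.parquet("/validation/run_spark3/output")  # placeholder
new_out = spark.read.parquet("/validation/run_spark4/output")  # placeholder

# Cheap checks first: row counts and schema equality.
assert old_out.count() == new_out.count(), "row counts diverged"
assert old_out.schema == new_out.schema, "schemas diverged"

# Then a full set difference in both directions; any surviving rows are drift.
only_in_old = old_out.exceptAll(new_out)
only_in_new = new_out.exceptAll(old_out)
assert only_in_old.isEmpty() and only_in_new.isEmpty(), "row-level differences found"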

Introducing the New Python Data Source API for Apache Spark™

Speakers: Allison Wang, Ryan Nienhuis

Traditionally, integrating custom data sources into Spark required understanding Scala, posing a challenge for the vast Python community. Our new API simplifies this process, allowing developers to implement custom data sources directly in Python without the complexities of existing APIs. This session will explore the motivations and the code behind how we’ve made reading and writing operations for Python developers much easier.
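
The new API boils down to subclassing two small Python classes. Here is a hedged sketch of a toy source based on the Spark 4.0 preview API, so treat the exact names as subject to change:

from pyspark.sql.datasource import DataSource, DataSourceReader
from pyspark.sql.types import StructType

class FibonacciDataSource(DataSource):
    """A toy batch data source that emits Fibonacci numbers."""

    @classmethod
    def name(cls):
        return "fibonacci"

    def schema(self):
        return "n INT, value BIGINT"

    def reader(self, schema: StructType):
        return FibonacciReader(self.options)

class FibonacciReader(DataSourceReader):
    def __init__(self, options):
        self.count = int(options.get("count", 10))

    def read(self, partition):
        a, b = 0, 1
        for n in range(self.count):
            yield (n, a)  # one tuple per output row
            a, b = b, a + b

# Assumes an existing SparkSession named `spark` (Spark 4.0 preview or DBR).
# Register the source, then use it like any built-in format.
spark.dataSource.register(FibonacciDataSource)
spark.read.format("fibonacci").option("count", "15").load().show()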

Incremental Change Data Capture: A Data-Informed Journey

Speakers: Christina Taylor

Learn how to iterate on incremental ingestion from SaaS applications, relational databases, and event streams into a centralized data lake, the role of change data capture (CDC), and how to streamline maintenance and improve reliability with Delta Lake. Attendees will walk away with a data-informed mentality for designing architecture that promotes long-term stewardship and developer happiness.
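
A common building block for the Delta Lake side of this story is MERGE. The hedged sketch below (table name, key, and _op column are illustrative placeholders) applies a batch of change records as upserts and deletes:

from delta.tables import DeltaTable

# Assumes an existing SparkSession `spark` and a DataFrame `changes` of CDC
# records that carries an _op column of 'insert' / 'update' / 'delete'.
target = DeltaTable.forName(spark, "main.default.customers")

(
    target.alias("t")
    .merge(changes.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s._op = 'delete'")
    .whenMatchedUpdateAll(condition="s._op <> 'delete'")
    .whenNotMatchedInsertAll(condition="s._op <> 'delete'")
    .execute()
)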

What’s next for the upcoming Apache Spark™ 4.0

Speakers: Xiao Li, Wenchen Fan

The upcoming release of Apache Spark 4.0 delivers substantial enhancements that refine the functionality of the unified analytics engine and improve the developer experience. This is your chance to ask the experts what's coming and how to prepare.

Data Scientist

GenAI is inescapable. Every business is figuring out how to develop and deploy LLMs. If you're one of the people actually making AI and ML a reality, these sessions will help keep you up to date on the latest techniques for improving and accelerating your GenAI strategy.

Software 2.0: Shipping LLMs with New Knowledge

Speakers: Sharon Zhou

Increasingly, companies want to take existing LLMs and teach them new knowledge to differentiate the technology. This process goes beyond just prompting or retrieving; it also involves instruction-finetuning, content-finetuning, pretraining, and more. In this session, you'll learn about Lamini, an all-in-one LLM stack that makes LLMs less picky about the data they can learn from, making it easy for them to take in billions of new documents.

Exploring MLOps and LLMOps: Architectures and Best Practices

Speakers: Joseph Bradley, Yinxi Zhang and Arpit Jasapara

This session offers a detailed look at the architectures involved in Machine Learning Operations (MLOps) and Large Language Model Operations (LLMOps). Attendees will learn about the technical specifics and practical applications of MLOps and LLMOps, including the key components and workflows that define these fields. And they’ll walk away with strategies for implementing effective MLOps and LLMOps in their own projects.
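
As one small, concrete piece of an MLOps workflow, the sketch below (model and names are illustrative) logs a trained model to MLflow and registers it so downstream stages can promote it:

import mlflow
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
model = Ridge(alpha=1.0).fit(X, y)

with mlflow.start_run():
    # Track parameters and metrics alongside the model artifact.
    mlflow.log_param("alpha", 1.0)
    mlflow.log_metric("train_r2", model.score(X, y))

    # Log and register in one step; the registry entry is what CI/CD and
    # serving layers promote between environments.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="diabetes_ridge",
    )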

In the Trenches with DBRX: Building a State-of-the-Art Open-Source Model

Speakers: Jonathan Frankle, Abhinav Venigalla

Want the behind-the-scenes story on how we built DBRX, a cutting-edge, open-source foundation model trained in-house by Databricks? Hear from the people who built it about the tools, methods, and lessons learned during the development process. Attendees will get an inside look at what it takes to train a high-quality LLM, hear why we chose a Mixture of Experts architecture, and learn how they can use the same tools and techniques to build their own custom models.

Introduction to DBRX and other Databricks Foundation Models

Speakers: Margaret Qian, Hagay Lupesko

This session offers a comprehensive introduction to DBRX and other foundation models available on Databricks. Attendees will get practical guidance on how to leverage these models to enhance data analytics and machine learning projects. And they'll leave with a clear understanding of how to effectively utilize Databricks' foundation models to drive innovation and efficiency in their data-driven initiatives.
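
For example, Databricks Foundation Model APIs serve models such as DBRX behind OpenAI-compatible endpoints, so a first query can look like the hedged sketch below; the workspace URL and token are placeholders, and the endpoint name should be verified against your workspace:

from openai import OpenAI

# Point the OpenAI client at your workspace's serving endpoints.
client = OpenAI(
    api_key="<DATABRICKS_TOKEN>",                            # placeholder
    base_url="https://<workspace-host>/serving-endpoints",   # placeholder
)

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",
    messages=[
        {"role": "system", "content": "You are a concise data assistant."},
        {"role": "user", "content": "Summarize what Delta Lake UniForm does."},
    ],
    max_tokens=200,
)
print(response.choices[0].message.content)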

Layered Intelligence: Generative AI Meets Classical Decision Sciences

Speakers: Danielle Heymann

The session will explore how Generative AI, especially LLMs, integrates into classical decision science methodologies. Attendees will learn how LLMs extend beyond chatbots to enhance optimization algorithms, statistical models, and graph analytics—breathing new life into decision sciences and advancing strategic analytics and decision-making. This layered approach brings a new edge to traditional methods, allowing for complex problem-solving, nuanced data interaction, and improved interpretability.

Building Production RAG Over Complex Documents

Speakers: Jerry Liu

RAG is a powerful technique that enables enterprises to further customize existing LLMs on their own data. However, building production RAG is very challenging, especially as users scale to larger and more complex data sources. RAG is only as good as your data, and developers must carefully consider how to parse, ingest, and retrieve their data to successfully build RAG over complex documents. This session provides an in-depth exploration of this entire process.
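
As a starting point, a minimal parse-ingest-retrieve loop can be sketched with an open source framework such as LlamaIndex (one option among several; the directory path is a placeholder and an OpenAI API key is assumed for the default embedding and LLM models):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Parse and chunk everything under ./docs (PDFs, HTML, text, etc.).
documents = SimpleDirectoryReader("./docs").load_data()

# Embed the chunks and build an in-memory vector index.
index = VectorStoreIndex.from_documents(documents)

# Retrieve the most relevant chunks and synthesize an answer with an LLM.
query_engine = index.as_query_engine(similarity_top_k=4)
answer = query_engine.query("What are the termination clauses in these contracts?")
print(answer)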

SEA-LION: Representing the Diverse Languages of Southeast Asia with LLMs

Speakers: Jeanne Choo, Ngee Chia Tai

Southeast Asia is one of the world's most culturally diverse regions, covering countries such as Singapore, Vietnam, Thailand, and Indonesia. People speak multiple languages and draw cultural influences from China, India and the West. Learn how, working with Databricks MosaicML, the Singapore government built SEA-LION, an open-sourced large language model trained on local languages such as Thai, Indonesian and Tamil.

State-Of-The-Art Retrieval Augmented Generation At Scale In Spark NLP

Speakers: David Talby, Veysel Kocaman

Get a crash course in scaling and building RAG LLM pipelines for production. Current systems struggle to efficiently handle the jump from proof of concept to production. This session will show how to address scaling issues with the open source Spark NLP library.
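
To give a sense of the building blocks, here is a hedged sketch that computes sentence embeddings for a corpus with Spark NLP, the retrieval side of a RAG pipeline; component names follow my understanding of the Spark NLP API, so verify against current docs:

import sparknlp
from sparknlp.base import DocumentAssembler
from sparknlp.annotator import BertSentenceEmbeddings
from pyspark.ml import Pipeline

spark = sparknlp.start()

# A toy corpus; in practice this would be millions of document chunks.
df = spark.createDataFrame(
    [("Delta Lake adds ACID transactions to data lakes.",),
     ("RAG retrieves relevant context before generation.",)],
    ["text"],
)

document = DocumentAssembler().setInputCol("text").setOutputCol("document")
embeddings = (
    BertSentenceEmbeddings.pretrained()  # downloads a default pretrained model
    .setInputCols(["document"])
    .setOutputCol("sentence_embeddings")
)

pipeline = Pipeline(stages=[document, embeddings])
embedded = pipeline.fit(df).transform(df)
embedded.select("sentence_embeddings.embeddings").show(truncate=False)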

Check out all the Data + AI Summit sessions and keynotes here!

