Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

The podcast by and for AI Engineers! In 2023, over 1 million visitors came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

Fri Feb 16 2024

Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal

data, AI, developer productivity, cloud infrastructure

This episode covers Erik Bernhardsson's background and contributions, Modal's mission and focus, improving developer productivity with Modal, AWS and serverless infrastructure, inference capabilities and challenges, Modal's versatility and comparison to competitors, challenges in AI infrastructure and market dynamics, Modal's features and success stories, network restrictions and market dynamics, benefits and considerations of Oracle Cloud, developer productivity and infrastructure investments, and building strong foundations for startups.
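
For readers new to Modal, a minimal sketch of the programming model discussed in the episode follows. It is not code from the episode: the app name and the `square` function are made up for illustration, and it assumes the `modal` package with its Stub-based API as of early 2024.

```python
# Hypothetical example: a plain Python function that Modal runs serverlessly
# in its cloud rather than on the local machine.
import modal

stub = modal.Stub("example-app")  # made-up app name

@stub.function()
def square(x: int) -> int:
    # This body executes inside a Modal container, not locally.
    return x * x

@stub.local_entrypoint()
def main():
    # .remote() ships the call to Modal's infrastructure and returns the result.
    print(square.remote(42))
```

Invoked with `modal run` against this file, `main` executes locally while `square` runs in Modal's cloud, which is the "truly serverless" workflow the episode title refers to.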

Wed Nov 29 2023

Notebooks = Chat++ and RAG = RecSys! — with Bryan Bischof of Hex Magic

data science, Hex platform, GPT-4, general models, notebook structure

This episode covers Bryan's background and his work at Hex, the use of GPT-4 and general models in the Hex platform, notebook structure and user experience, exploring AI applications and ML systems, evaluating models and AI applications, finding value in AI tools, and personal interests in AI applications.

Fri Nov 03 2023

Beating GPT-4 with Open Source LLMs — with Michael Royzen of Phind

Entrepreneurship, Computer Vision, Natural Language Processing, Question Answering, Code Assistance

The episode covers Michael Royzen's journey with SmartLens, the development of Phind, the improvements in question answering models, the future of language models like GPT-4, and OpenAI's focus on product and research. It also discusses the importance of GPUs, optimizing performance, and building a startup with passion and community.

Thu Oct 19 2023

The End of Finetuning — with Jeremy Howard of Fast.ai

Deep Learning, Machine Learning, NLP, Transfer Learning, Research

This episode covers Jeremy Howard's background, the challenges of getting started in machine learning, the shift in NLP approaches, making an impact with fast.ai, research achievements, limitations of language models, fine-tuning models, engaging with AI communities, Swift for TensorFlow, small models, advancements in language models, and understanding language models and distributed power.

Thu Aug 10 2023

LLMs Everywhere: Running 70B models in browsers and iPhones using MLC — with Tianqi Chen of CMU / OctoML

Machine Learning, Compilation, TVM, MXNet, Collaborative Projects

Tianqi Chen, an assistant professor at CMU and chief technologist of OctoML, created Apache TVM, XGBoost, and MXNet. The MXNet and TVM projects are collaborative efforts driven by a community of contributors. Chen's motivation behind TVM was to build a more generic and automatic machine learning engine. Compilation is an emerging field in machine learning, with projects like MLC aiming for universal compilation. Model compilation involves optimization at the code level and leverages TVM's memory planning. Compiler optimization can improve performance through dynamic shape tracing and loop transformation. Running models in the browser using WebGPU is an area of focus, with the development of WebLLM for easy integration. Improving WebGPU support and enabling easier universal deployment are ongoing efforts. The team is exploring new opportunities in AI and ML, including personalized AI assistants and new architectures post-transformers.
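
As a rough illustration of the loop-transformation point above (not code from the episode), the sketch below uses TVM's classic tensor-expression API to define a trivial operator and split its loop into tiles before compiling for the local CPU. It assumes an older TVM release where `te.create_schedule` is still available; newer releases move this kind of scheduling to TIR.

```python
# Minimal sketch: declare a computation, apply a loop transformation (split),
# then compile it with TVM and run it.
import numpy as np
import tvm
from tvm import te

n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

s = te.create_schedule(B.op)
outer, inner = s[B].split(B.op.axis[0], factor=32)  # tile the loop into chunks of 32

func = tvm.build(s, [A, B], target="llvm")

a = tvm.nd.array(np.arange(n, dtype="float32"))
b = tvm.nd.array(np.zeros(n, dtype="float32"))
func(a, b)
assert np.allclose(b.numpy(), a.numpy() * 2.0)
```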

Fri Aug 04 2023

[AI Breakdown] Summer AI Technical Roundup: a Latent Space x AI Breakdown crossover pod!

Transformers, AI Deployment, Trends in AI, GPT-4.5, AI Safety

The episode covers various topics including the future of Transformers and AI deployment, trends in AI, key events in July, improvements in GPT-4.5, the shift in the AI safety conversation, custom instructions for ChatGPT, the Llama 2 release, the impact of open source and safeguarding of data, AI friends and the Claude 2 API, Claude and Bard updates, evaluation companies and the AI engineer profession, and the growth of the AI profession. The episode also mentions upcoming events like Facebook Connect and hackathons in the Bay Area.

Wed Jul 26 2023

FlashAttention 2: making Transformers 800% faster w/o approximation - with Tri Dao of Together AI

Flash Attention, Kernel Fusion, Tiling, Memory Bandwidth Optimization, HBM

FlashAttention improves on standard attention by making memory usage linear instead of quadratic in sequence length, resulting in faster and more memory-efficient models. It offers a wall-clock speedup of 2 to 4 times, allowing training with longer sequence lengths without approximation. FlashAttention incorporates ideas such as kernel fusion and tiling from systems-side techniques. The future of FlashAttention depends on the evolution of HBM and SRAM technologies.
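
To make the tiling idea concrete, here is a minimal NumPy sketch (not Tri Dao's fused CUDA kernels) of block-wise attention with an online softmax: keys and values are processed one tile at a time and previously accumulated results are rescaled, so the full N×N score matrix is never materialized.

```python
# Minimal sketch of tiled attention with an online (running) softmax.
import numpy as np

def tiled_attention(Q, K, V, block=64):
    n, d = Q.shape
    out = np.zeros((n, V.shape[1]))
    row_max = np.full(n, -np.inf)   # running max of scores for each query row
    row_sum = np.zeros(n)           # running softmax denominator for each query row
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        scores = Q @ Kb.T / np.sqrt(d)              # scores for this K/V tile only
        new_max = np.maximum(row_max, scores.max(axis=1))
        scale = np.exp(row_max - new_max)           # rescale earlier partial results
        p = np.exp(scores - new_max[:, None])
        out = out * scale[:, None] + p @ Vb
        row_sum = row_sum * scale + p.sum(axis=1)
        row_max = new_max
    return out / row_sum[:, None]

# Sanity check against the naive implementation that materializes the full matrix.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
S = Q @ K.T / np.sqrt(32)
P = np.exp(S - S.max(axis=1, keepdims=True))
reference = (P / P.sum(axis=1, keepdims=True)) @ V
assert np.allclose(tiled_attention(Q, K, V), reference)
```

The real kernels additionally fuse these steps so each tile stays in on-chip SRAM rather than round-tripping through HBM, which is where the memory-bandwidth savings come from.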

Wed Jul 19 2023

Llama 2: The New Open LLM SOTA (ft. Nathan Lambert, Matt Bornstein, Anton Troynikov, Russell Kaplan, Whole Mars Catalog et al.)

AI, Llama 2, Open Source, Startups, Data Quality

The episode discusses the release of Llama 2 and its place in the AI landscape, its impact on innovation, licensing debates, data quality considerations, the drama surrounding Llama 2, Meta's commitment to open source and startups, insights from the chapters, working with Llama 2 and hardware configurations, benchmarking and potential applications of Llama 2, AI development and future possibilities, the release of Llama and its implications, and preparing for potential negative scenarios.
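
For readers who want to follow along with the "working with Llama 2 and hardware configurations" discussion, a minimal sketch of loading the 7B chat weights with Hugging Face transformers follows. It is not from the episode, and it assumes access to the gated meta-llama repo has been granted, plus the accelerate and bitsandbytes packages for 4-bit loading.

```python
# Minimal sketch: load Llama 2 7B Chat in 4-bit so it fits on a single consumer GPU.
# Requires accepting Meta's license on Hugging Face and being logged in (huggingface-cli login).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers on available GPU(s), spilling to CPU if needed
    load_in_4bit=True,   # quantize weights with bitsandbytes to reduce VRAM
)

prompt = "Explain the Llama 2 license in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```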

Mon Jul 17 2023

AI Fundamentals: Datasets 101

datasets, benchmarks, tokens, tokenization, model sizes

This episode covers datasets, benchmarks, tokens and tokenization, model sizes, training, data compression, training data sources such as Common Crawl and other datasets, and the legal and ethical considerations around licensing and data usage. It explores the differences between datasets and benchmarks, the importance of understanding datasets when building specific models, the role of tokens in deep learning models, the compression of large datasets into smaller parameter sizes, challenges with dataset imbalance and language-specific tokenization, and the impact of different languages on language modeling.
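
As a small, made-up illustration of the tokenization and language-imbalance points above (not from the episode), the sketch below uses the tiktoken library to count how many tokens the same sentence costs in English versus German with a GPT-style tokenizer.

```python
# Minimal sketch: the same sentence can cost noticeably more tokens in some
# languages than others, one source of dataset and cost imbalance.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-era models
sentences = {
    "English": "The quick brown fox jumps over the lazy dog.",
    "German": "Der schnelle braune Fuchs springt über den faulen Hund.",
}
for language, text in sentences.items():
    tokens = enc.encode(text)
    print(f"{language}: {len(text.split())} words -> {len(tokens)} tokens")
```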

Mon Jul 10 2023

Code Interpreter == GPT 4.5 (w/ Simon Willison, Alex Volkov, Aravind Srinivas, Alex Graveley, et al. — AUDIO FIXED)

code interpreter, AI model, timeouts, disconnections, containerized environment

The episode covers ChatGPT's new Code Interpreter feature, the capabilities of the AI model used in the podcast, timeouts and disconnections in Code Interpreter, running a containerized environment, Python libraries for generating files, data analysis capabilities, exploits and system performance, using Code Interpreter as a fellow debugger, uploading code to Code Interpreter, different behaviors observed in the model, interfacing with external plugins, and the potential use of Code Interpreter for business analysts.
