Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and al on Scribbler

How to train a Million Context LLM — with Mark Huang of Gradient.ai

AI, Machine Learning, Model Training, Context Length, Model Quality

In this episode, Mark Huang of Gradient discusses his transition from finance to tech and the founding story of Gradient. Key themes include out-of-domain problems in machine learning and out-of-domain generalization in AI. The conversation explores attention mechanisms, positional encodings, and extending context length in models. It also covers Gemini's million-token context, recent papers on model training, manipulating models, benchmarking, scaling token context size, Google's focus on model quality and handling evolving context, building technology with early fusion models, and staying updated on AI research. The importance of context length, theta scaling, and prioritizing valuable tasks is also discussed.

ICLR 2024 — Best Papers & Talks (ImageGen, Vision, Transformers, State Space Models, and other Learning Representations) ft. Christian Szegedy, Ilya Sutskever, Durk Kingma

Variational Autoencoders, VAEs, Deep Learning, Probabilistic Models, Diffusion Models

This episode opens with Variational Autoencoders (VAEs): their introduction and evolution, the advantages and challenges of latent variable models, applications of VAEs, and diffusion models and the scalability of VAEs. It then covers interpreting diffusion models through concept decomposition, and concept manipulation and adversarial examples in generative models. On unsupervised learning, it discusses distribution matching, compression and prediction, and training with a compression objective and adversarial learning. Further topics include adversarial examples and the vulnerability of neural networks, outliers and attention maps in vision transformers, pause tokens, data selection and sub-selection schemes for model training, and Dora and self-supervised learning methods. The episode closes with efficient training, inference, and computation in large language models, including adaptive KV cache compression, and with state space models and diffusion models.

Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and al

Tue Jul 23 2024

Llama 2, 3 & 4: Synthetic Data, RLHF, Agents on the path to Open Source AGI

Fri Jul 12 2024

Benchmarks 201: Why Leaderboards > Arenas >> LLM-as-Judge

Fri Jul 05 2024

The 10,000x Yolo Researcher Metagame — with Yi Tay of Reka

Tue Jun 25 2024

State of the Art: Training >70B LLMs on 10,000 H100 clusters

Tue Jun 25 2024

[High Agency] AI Engineer World's Fair Preview

Fri Jun 21 2024

How To Hire AI Engineers — with James Brady & Adam Wiggins of Elicit

Tue Jun 11 2024

How AI is eating Finance — with Mike Conover of Brightwave

Mon Jun 10 2024

ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt

Thu May 30 2024

How to train a Million Context LLM — with Mark Huang of Gradient.ai

Mon May 27 2024

ICLR 2024 — Best Papers & Talks (ImageGen, Vision, Transformers, State Space Models, and other Learning Representations) ft. Christian Szegedy, Ilya Sutskever, Durk Kingma