Deep Papers

Deep Papers is a podcast series featuring deep dives on today’s seminal AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. 

Mon May 13 2024

Breaking Down EvalGen: Who Validates the Validators?

evaluation criteria, LLM judges, language models, workflow, metrics

The episode covers aligning evaluation criteria with user needs, including the use of LLM judges, a workflow for setting criteria, balancing assertions in language models, an email-generation workflow, the open-source evaluation library Phoenix, managing metrics in LLM evaluation, and streamlining the evaluation process.

Fri Apr 26 2024

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

ReAct Prompting Technique, Language Models, Interpretability, Model Performance, Fact Hallucinations

The podcast explores the ReAct prompting technique, which interleaves reasoning traces with actionable outputs for language models. It discusses the importance of interpretability in language models and compares different prompting methods. The episode also covers enhancing performance with ReAct and chain-of-thought prompting, minimizing fact hallucinations, recalibrating approaches with the reflection technique, improving success rates with self-reflection, and building understandable agents with easy-to-follow code examples.
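The thought→action→observation loop the episode describes can be sketched in a few lines of Python. Here `llm_step` is a stand-in for a real language-model call and `tools` is a hypothetical map of action names to plain functions; both are illustrative, not part of the paper's code.

```python
def react_loop(question, llm_step, tools, max_steps=5):
    """Minimal ReAct loop: alternate reasoning steps with tool calls.

    llm_step(transcript) -> (thought, action, argument) stands in for a
    real language-model call; tools maps action names to plain functions.
    """
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, action, arg = llm_step(transcript)
        transcript += f"Thought: {thought}\nAction: {action}[{arg}]\n"
        if action == "finish":  # the model decides it has the answer
            return arg
        observation = tools[action](arg)  # ground the next thought in a tool result
        transcript += f"Observation: {observation}\n"
    return None  # gave up within the step budget

# Usage with a scripted stand-in for the model:
tools = {"lookup": lambda key: {"eiffel tower": "It is located in Paris."}.get(key, "unknown")}
script = iter([
    ("I should look up the Eiffel Tower.", "lookup", "eiffel tower"),
    ("The observation says Paris.", "finish", "Paris"),
])
print(react_loop("Where is the Eiffel Tower?", lambda t: next(script), tools))
```

Feeding each observation back into the transcript is what distinguishes ReAct from plain chain-of-thought, where the model reasons without ever checking against a tool.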

Thu Apr 04 2024

Demystifying Chronos: Learning the Language of Time Series

time series forecasting, Chronos framework, language models, deep learning

The episode discusses Amazon's Chronos paper on time series forecasting. It provides an overview of classical and deep learning models used in time series analysis. The Chronos framework adapts traditional language model frameworks by exploiting sequential similarities in time series data. It also explores the challenges associated with LLM-based forecasters and the impact of ignoring temporal structure. The episode evaluates Chronos's performance against actual data and highlights community feedback on interpretability, cost, and performance trade-offs of using large language models. Additionally, it delves into the use of language models for time series analysis and the potential for improvement through additional methods and features. The episode concludes by discussing the skepticism and experimentation surrounding new models in the field of time series forecasting.
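The core idea the episode describes is turning real-valued observations into a discrete vocabulary a language model can consume. A minimal sketch of that scaling-and-quantization step follows; the bin count and value range here are illustrative choices, not the paper's exact settings.

```python
def tokenize_series(values, num_bins=10, low=-3.0, high=3.0):
    # Chronos-style idea (simplified): mean-scale the series, then quantize
    # each value into one of num_bins discrete bins so a language model can
    # treat the series as a token sequence.
    mean_abs = sum(abs(v) for v in values) / len(values) or 1.0
    scaled = [v / mean_abs for v in values]
    width = (high - low) / num_bins
    tokens = []
    for v in scaled:
        clipped = min(max(v, low), high - 1e-9)  # keep values inside the bin range
        tokens.append(int((clipped - low) / width))
    return tokens

print(tokenize_series([1.0, 2.0, 3.0, 4.0]))
```

Once a series is a token sequence, forecasting reduces to next-token prediction, which is exactly the "sequential similarity" between time series and language that the framework exploits.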

Mon Mar 25 2024

Anthropic Claude 3

AI models, Multimodal models, Model performance, Transparency, Refusal rate

The episode discusses the introduction of Anthropic's new models Haiku, Sonnet, and Opus, highlighting Opus as the highest-performance model. It explores advancements in vision and language understanding, the performance of Gemini Ultra and Claude 2, API formats and reasoning capabilities, the performance of GPT-4 and Claude 3, reactions to Claude 3, and visualization tools for language models. The episode raises concerns about transparency, hidden datasets, reproducibility, and human intervention in model usage.

Fri Mar 15 2024

Reinforcement Learning in the Era of LLMs

reinforcement learning, large language models, alignment problem, fundamentals, imitation learning

The episode explores the application of reinforcement learning to large language models (LLMs) and its importance in specific use cases. It discusses the alignment problem with LLMs, the fundamentals of reinforcement learning, imitation learning, inverse reinforcement learning, training agents with human feedback, optimizing language models at the token level, prompt optimization, and challenges in reinforcement learning.

Fri Mar 01 2024

Sora: OpenAI’s Text-to-Video Generation Model

Sora, video generation, transformer architecture, simulation capabilities, training data

The episode discusses the Sora technical report on video generation. It covers topics such as the transformer architecture, simulation capabilities, training data, physics representation, evaluation metrics, text-to-video consistency, motion quality, and video generation evaluation. The episode explores the potential impact of Sora on the industry and compares it to Google's work.

Thu Feb 08 2024

RAG vs Fine-Tuning

RAG, Search and Retrieval Methods, PDF Extraction, Evaluation Metrics, Relevance Measurement

The episode discusses retrieval augmented generation (RAG) and search and retrieval methods, PDF extraction and evaluation metrics, relevance measurement in ChatGPT models, word choice and relevance in answer generation, RAG vs. fine-tuning and future trends, and effective prompt engineering and cost considerations.
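The retrieve-then-prompt shape that distinguishes RAG from fine-tuning can be sketched in a few lines. The lexical scorer and prompt template below are illustrative stand-ins: a real pipeline would use BM25 or dense embeddings for retrieval, but the overall shape is the same.

```python
def _tokens(text):
    # Crude tokenizer: lowercase and strip basic punctuation.
    return set(text.lower().replace(".", " ").replace(",", " ").replace("?", " ").split())

def retrieve(query, corpus, top_k=2):
    # Naive lexical retriever: rank documents by word overlap with the query.
    q = _tokens(query)
    ranked = sorted(corpus, key=lambda d: len(q & _tokens(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query, corpus):
    # RAG: augment the prompt with retrieved context instead of baking the
    # new knowledge into the model's weights via fine-tuning.
    context = "\n".join(retrieve(query, corpus))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "Phoenix is an open-source library for LLM evaluation.",
    "Retrieval augmented generation adds retrieved context to the prompt.",
]
print(build_prompt("What does retrieval augmented generation add to the prompt?", corpus))
```

Because the knowledge lives in the corpus rather than the weights, updating a RAG system means re-indexing documents, which is one of the cost trade-offs versus fine-tuning discussed in the episode.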

Fri Feb 02 2024

Phi-2 Model

Small Language Models, Large Language Models, Phi-2, Coding Tasks, High-Quality Data

This episode explores the differences between Small Language Models (SLMs) and Large Language Models (LLMs), focusing on Phi-2, an open-source model released by Microsoft. It discusses the use of high-quality datasets like the "code textbook" to train SLMs for coding tasks, highlighting their state-of-the-art performance. The episode presents Phi-2 as a highly performant model that surpasses others across various tasks. It explores the advantages and deployment options of SLMs, along with practical tips for interacting with them locally. The episode concludes by discussing features and future possibilities of SLMs.

Fri Feb 02 2024

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels

dense retrieval, sparse retrieval, embedding, relevance modeling, language models

The podcast discusses the HyDE paper on precise zero-shot dense retrieval without relevance labels, the difference between dense and sparse retrieval approaches, unsupervised embeddings and generating multiple hypothetical documents, using the same encoder for queries and documents, and search and retrieval debugging and troubleshooting. The paper proposes generating hypothetical documents, using Contriever encoders, and offloading relevance modeling to NLG models. The episode highlights the importance of document structure and formatting in retrieval systems and explores the challenges and trade-offs in dense retrieval. The speakers also discuss the impact of language models on performance and suggest experimenting with different approaches to improve retrieval quality. A workshop on search and retrieval is mentioned for further exploration and troubleshooting.
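The hypothetical-document trick at the heart of HyDE can be sketched as follows. The `embed` function here is a toy hashed bag-of-words encoder standing in for a real dense encoder (the paper uses an unsupervised Contriever), and `fake_llm` is a stub standing in for the generator; both are illustrative, not the paper's implementation.

```python
import hashlib
import math

def embed(text, dim=64):
    # Toy hashed bag-of-words encoder returning a unit vector; a real HyDE
    # setup would use a dense encoder such as Contriever here.
    vec = [0.0] * dim
    for tok in text.lower().split():
        tok = tok.strip(".,!?")
        if tok:
            h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
            vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def hyde_search(query, corpus, generate):
    # HyDE: embed an LLM-generated *hypothetical* answer document instead of
    # the query itself, then retrieve the real document closest to it.
    hypothetical = generate(query)
    q_vec = embed(hypothetical)
    scored = [(sum(a * b for a, b in zip(q_vec, embed(doc))), doc) for doc in corpus]
    return max(scored)[1]

# Usage with a stubbed generator standing in for the LLM:
corpus = [
    "Paris is the capital of France.",
    "The mitochondria is the powerhouse of the cell.",
]
fake_llm = lambda q: "The capital of France is Paris, a city in Europe."
print(hyde_search("what is the capital of france", corpus, fake_llm))
```

The key design point is that relevance modeling is offloaded to the generator: the hypothetical document may contain factual errors, but it lives in the same "answer-shaped" region of embedding space as the real documents, which is what makes zero-shot retrieval work without relevance labels.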

Wed Dec 27 2023

A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I

Mistral, Mixtral, power laws, large language models, dense models

The podcast is a presentation by members of the Arize team discussing Mistral and Mixtral. The presenters introduce themselves as Dat, Aman, and Aparna. They initially planned to cover both Mixtral and Gemini but decided to focus on Mixtral for this discussion. The agenda includes an overview of power laws in large language models, the evolution of LLMs, recent improvements in the Mistral and Mixtral architectures, dense model architectures, mixture-of-experts architectures, and the relationship between compute, model size, and training data.
