
Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and all things Software 3.0

[Practical AI] AI Trends: a Latent Space x Practical AI crossover pod!

Sun Jul 02 2023
Practical AI · AI trends · Language models · Prompt engineering · Evaluation · Data-centric AI · Model performance
  1. Introduction
  2. Favorite Episodes and Topics
  3. Shift in Focus and Popular Episodes
  4. Evaluation of Language Models
  5. Language Models and Technology
  6. AI Engineering and Data Science Challenges
  7. Engineering and Ops around Large Language Models
  8. Tooling and Data-centric AI
  9. Improving Model Performance and Language Diversity

This summary covers the topics discussed in this Latent Space x Practical AI crossover episode. The hosts share their favorite episodes, cover trends in AI and prompt engineering, and highlight the shift in focus from image generation to NLP and language models. They discuss evaluating language models, using LLMs in different contexts, and the challenges engineers and data scientists face when working with large language models. The summary also explores tooling for data-centric AI, improving model performance, and the importance of exploring different modalities of language. Overall, it provides a practical look at the current state of applied AI.

Introduction

00:01 - 07:22

  • This is a crossover episode with the Practical AI podcast.
  • The hosts discuss their favorite episodes and overall takeaways from their shows.
  • They cover trends in AI and prompt engineering.
  • Dan Whitenack, the host of Practical AI, has a background in mathematical and computational physics.
  • He worked as a data scientist for SIL International, a nonprofit organization focused on language-related work.
  • Dan recently left SIL to work on Prediction Guard and has also been involved with Changelog's Go Time podcast.
  • He lives in Indiana and enjoys playing folk music.
  • Practical AI started as an idea pitched by Adam Stacoviak from the Changelog podcast at a Go conference in 2016.
  • Chris Benson, Dan's co-host, reached out to him later to collaborate on the podcast.
  • They wanted Practical AI to be a practical resource for people interested in AI.

Favorite Episodes and Topics

06:55 - 13:37

  • The podcast 'Practical AI' was created with the goal of providing practical and useful information about AI to listeners.
  • Prediction Guard is a platform focused on compliance and control: running state-of-the-art AI models in a compliant way and structuring and validating their output (a minimal sketch of output validation follows this list).
  • The hosts of 'Practical AI' have done 'Fully Connected' episodes where they discuss subjects in depth without a guest, covering topics such as ChatGPT, instruction-tuned models, Stable Diffusion, and AlphaFold.
  • 'Practical AI' has also featured episodes about AI in Africa, highlighting grassroots communities that develop models for their specific use cases.
  • One of the hosts' personal favorite episodes was with Mike Conover from Databricks, discussing the Dolly efforts and the RedPajama dataset release.
  • Another highlight for the hosts was an episode about ChatGPT plugins, which attracted a large audience and captured the excitement at that time.
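
The "structuring and validating output" part of the Prediction Guard description above usually means asking the model for JSON and checking it against a schema before anything downstream consumes it. Below is a minimal sketch of that pattern using pydantic; the `Ticket` schema and the `call_llm` helper are hypothetical placeholders, not Prediction Guard's actual API.

```python
import json
from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    # Hypothetical schema for a support-ticket extraction task
    summary: str
    priority: str
    customer_email: str

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client is in use (OpenAI, Cohere, a local model)."""
    raise NotImplementedError

def extract_ticket(text: str) -> Ticket:
    prompt = (
        "Extract a support ticket from the text below and reply only with JSON "
        'containing the keys "summary", "priority", and "customer_email".\n\n' + text
    )
    raw = call_llm(prompt)
    try:
        # Validate structure and types before the result is used downstream
        return Ticket(**json.loads(raw))
    except (json.JSONDecodeError, ValidationError) as err:
        # A real system might re-prompt, repair the output, or fall back here
        raise ValueError(f"Model output failed validation: {err}")
```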

Shift in Focus and Popular Episodes

13:10 - 20:23

  • The podcast discusses the shift in focus from image generation to NLP and language models.
  • Metaflow, a Python package for full stack data science modeling work, was a crowd favorite episode.
  • The episode titled 'From notebooks to production' resonated with listeners due to its discussion on model life cycle and practical usage of models.
  • LLMOps is emerging as a new trend in machine learning operations, focusing on pre-trained models and prompt engineering.
  • Model-based episodes are more popular than infrastructure-based episodes.
  • Evaluating language models is a challenge, especially when it comes to open-ended text generation questions.

Evaluation of Language Models

19:54 - 27:27

  • Evaluation of language models (LLMs) is a trending topic
  • Benchmarks and model-based evaluation are two approaches being explored
  • Unbabel uses a popular machine translation evaluation metric called COMET (see the sketch after this list)
  • Benchmarks used to stay relevant for a few years, but now models catch up to them within about six months
  • There is a race between benchmark creators and model developers to push the boundaries
  • HellaSwag's adversarially generated benchmark examples were surprising
  • Roboflow's paper revealed that less than 1% of the data was human-generated
  • Models evaluating models and the use of simulated data are becoming more common at large scale
  • Concerns about mode collapse arise from stacking models on top of each other
  • Increasing linguistic diversity in LLMs can benefit low-resource languages and scenarios
  • Grassroots organizations like Masakhane work on creating technology for specific language communities
  • Different contexts require different applications of NLP technology, such as agriculture use cases
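
The COMET metric mentioned above ships as an open-source Python package, so this kind of model-based evaluation of machine translation can be a few lines of code. A minimal sketch, assuming the `unbabel-comet` package and the `Unbabel/wmt22-comet-da` checkpoint; the example sentences are made up.

```python
# pip install unbabel-comet
from comet import download_model, load_from_checkpoint

# Download and load a reference-based COMET checkpoint
model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

# Each item pairs a source sentence, a machine translation, and a human reference
data = [
    {
        "src": "O gato está a dormir no sofá.",
        "mt": "The cat is sleeping on the couch.",
        "ref": "The cat is asleep on the sofa.",
    },
]

# Returns per-segment scores and a corpus-level system score
output = model.predict(data, batch_size=8, gpus=0)  # gpus=0 runs on CPU
print(output.scores, output.system_score)
```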

Language Models and Technology

27:05 - 34:01

  • Masakhane and other groups are producing technology for disease identification, drought identification, disaster relief, etc.
  • Rasha from Hugging Face explains the capabilities of LLMs and the different features they have
  • Language models can be situated on various axes, like closed or open, available for commercial use or not, and task specificity
  • Content creators stay updated through podcasts, Twitter, LinkedIn, and Hugging Face statistics
  • Hugging Face is a good place to find useful models with high download numbers (see the sketch after this list)
  • There are many people doing cool stuff on Hugging Face who aren't recognized at a higher level
  • Meta released a six-modality model incorporating data from grassroots organizations found on Hugging Face
  • The market for language models has expanded with new releases from companies like OpenAI and Stability AI
  • MPT-7B by MosaicML aims to keep the space as open as possible for anyone to create their own models
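
If "high download numbers" is the filter, the Hub can be queried for it directly. A small sketch using the `huggingface_hub` client; the task filter and limit are just example values.

```python
# pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi()

# List the most-downloaded text-generation models on the Hub
for model in api.list_models(
    filter="text-generation",  # example task tag
    sort="downloads",
    direction=-1,              # descending
    limit=10,
):
    print(model.modelId, model.downloads)
```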

AI Engineering and Data Science Challenges

33:39 - 40:47

  • Jonathan is passionate about open source and emphasizes the importance of keeping models open.
  • The hosts discuss integrating soundboard effects into the podcast to spice it up.
  • They talk about the different tasks involved in ML operations at mid-sized organizations and the need to time box and prioritize them.
  • The guest runs a website focused on workshops and advising for companies using ML models, and shares insights from these workshops.
  • Most enterprises still don't have LLMs integrated across their technology stack, which may be surprising given the demos on Twitter.
  • It takes time for new technologies to trickle down into enterprise adoption.
  • Using generative text models requires prompt engineering, guardrails, fine-tuning, and other techniques that go beyond simple prompts (see the sketch after this list).
  • Enterprise users often struggle with connecting these models to a practical workflow for problem-solving.
  • Prompt engineering as a term is hyped, but there is a real need for engineering and ops around large language models.
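
To make the "beyond simple prompts" point above concrete, here is a small sketch of an engineered prompt: an explicit instruction, a couple of few-shot examples, and a guardrail that only accepts answers from a known label set. The task, labels, and `call_llm` client are hypothetical.

```python
# Hypothetical few-shot examples for a support-ticket classifier
FEW_SHOT_EXAMPLES = [
    ("The invoice total was charged twice.", "billing"),
    ("The app crashes when I upload a photo.", "bug"),
]
ALLOWED_LABELS = {"billing", "bug", "other"}

def build_prompt(message: str) -> str:
    """Turn a raw user message into an engineered classification prompt."""
    lines = [
        "You are a support-ticket classifier.",
        "Answer with exactly one word: billing, bug, or other.",
        "",
    ]
    for text, label in FEW_SHOT_EXAMPLES:
        lines += [f"Message: {text}", f"Label: {label}", ""]
    lines += [f"Message: {message}", "Label:"]
    return "\n".join(lines)

def classify(message: str, call_llm) -> str:
    # Guardrail: only accept answers from the allowed label set, otherwise fall back
    answer = call_llm(build_prompt(message)).strip().lower()
    return answer if answer in ALLOWED_LABELS else "other"
```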

Engineering and Ops around Large Language Models

40:20 - 47:09

  • Prompt engineering as a term is overhyped, but the engineering and ops around large language models are a real need
  • There's a marketplace of prompts called PromptBase
  • There are unique challenges for engineers transitioning into AI engineering and data scientists transitioning into this field
  • Software engineers are dealing with non-deterministic systems for the first time and lack control over model drift
  • The latent space of capabilities in language models is not fully explored yet
  • Data scientists may have a knee-jerk reaction to jump into fine-tuning or training their own models, but there's value in using pre-trained models with prompting, chaining, and data augmentation techniques (see the chaining sketch after this list)
  • AI UX, the last mile of making AI output consumable and usable by people, is as valuable as training the model itself
  • The reception to AI output depends on both the innovation under the hood and the user experience (UX)
  • GitHub spent six months tuning the UX of Copilot to integrate it seamlessly into code writing
  • Engineers generally think about UI/UX more than data scientists do
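
Chaining, as mentioned in the bullet on pre-trained models above, just means feeding one model call's output into the next prompt instead of training anything. A minimal sketch with the same hypothetical `call_llm` client:

```python
def answer_from_document(document: str, question: str, call_llm) -> str:
    """Two-step chain: condense the document, then answer against the summary."""
    summary = call_llm(
        "Summarize the key facts in the following document as short bullet points:\n\n"
        + document
    )
    return call_llm(
        "Using only the facts below, answer the question. "
        "If the facts are not sufficient, say so.\n\n"
        f"Facts:\n{summary}\n\nQuestion: {question}"
    )
```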

Tooling and Data-centric AI

46:42 - 53:52

  • Engineers generally have a natural inclination towards thinking about UI and UX, especially if they are backend systems engineers.
  • Label Studio is a popular open-source framework for data labeling that has released new tools for fine-tuning generative AI models.
  • Augmented tooling is becoming a trend in NLP, focusing on an approachable way to fine-tune models with human feedback or customized data (a rough fine-tuning sketch follows this list).
  • The concept of reinforcement learning from human feedback can be confusing, but tooling like Label Studio aims to make it more understandable and user-friendly.
  • There are various companies in the labeling space, such as Scale, Snorkel, Labelbox, and Label Studio, each offering different approaches to data-centric AI.
  • Enterprises often think in terms of model-centric approaches when creating custom models using their own data rather than building a whole platform for AI.
  • APIs like Cohere and OpenAI offer fine-tuning as part of their services, appealing to those who want to enhance existing state-of-the-art models with their own data.
  • There is still uncertainty regarding the best practices and data mix for unsupervised or self-supervised learning datasets.
  • The lack of clarity around the data mix used by popular models drives researchers to experiment with different combinations of public datasets.
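
For a rough picture of what "fine-tuning a generative model on your own labeled data" involves once the labels exist, here is a minimal causal-LM fine-tuning sketch with the Hugging Face transformers Trainer. The base model, file path, and hyperparameters are placeholders, not values from the episode.

```python
# pip install transformers datasets
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base_model = "gpt2"  # placeholder; swap in any causal LM you have access to
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(base_model)

# One JSON-lines file with a "text" field per labeled example (placeholder path)
dataset = load_dataset("json", data_files="my_labeled_examples.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned-model",
        num_train_epochs=1,
        per_device_train_batch_size=4,
    ),
    train_dataset=tokenized,
    # Collator pads batches and sets up next-token-prediction labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```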

Improving Model Performance and Language Diversity

53:27 - 1:00:08

  • Mixing public datasets and filtering them in unique ways can improve model performance (see the sketch at the end of this list)
  • Existing datasets with data quality issues can be filtered and combined to create a special mix for training
  • Augmenting existing datasets with simulated or augmented data can also enhance model performance
  • Experimenting with different models is important as each model has its own strengths and weaknesses based on the data it was trained on
  • Large language models have the potential to go beyond traditional NLP tasks and perform well in other domains like fraud detection
  • There is still a focus on English and Mandarin in large language models, but there are many other languages that need improvement
  • Exploring different modalities of language, such as sign language, presents new challenges and opportunities for research
  • Getting hands-on with these models and exploring the available tooling is crucial for building intuition and understanding their capabilities
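
A small sketch of the "mix and filter public datasets" idea using the Hugging Face datasets library; the dataset names, the length filter, and the mixing ratio are arbitrary examples, not the recipe discussed in the episode.

```python
# pip install datasets
from datasets import load_dataset, interleave_datasets

# Two public text datasets as stand-in ingredients (streaming keeps memory flat)
wiki = load_dataset("wikitext", "wikitext-103-raw-v1", split="train", streaming=True)
books = load_dataset("bookcorpus", split="train", streaming=True)

# Simple quality filter: drop very short lines
def long_enough(example):
    return len(example["text"].split()) > 20

wiki = wiki.filter(long_enough)
books = books.filter(long_enough)

# Interleave with an explicit mixing ratio to build the training mix
mix = interleave_datasets([wiki, books], probabilities=[0.7, 0.3], seed=42)

for example in mix.take(3):
    print(example["text"][:80])
```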