The Data Exchange with Ben Lorica

A series of informal conversations with thought leaders, researchers, practitioners, and writers on a wide range of topics in technology, science, and of course big data, data science, artificial intelligence, and related applications. Anchored by Ben Lorica (@BigData), the Data Exchange also features a roundup of the most important stories from the worlds of data, machine learning, and AI. Detailed show notes for each episode can be found at https://thedataexchange.media/. The Data Exchange podcast is a production of Gradient Flow (https://gradientflow.com/).

Thu May 30 2024

Machine Unlearning: Techniques, Challenges, and Future Directions

Machine Learning, Unlearning, Privacy, Model Development

This episode explores the concept of machine unlearning, which involves removing influences of certain training data points from machine learning models. It discusses the history and influences behind unlearning, challenges in comparing unlearning methods, collecting representative data, controlling pre-trained models, and evaluating unlearning success. The episode also delves into the practical implementation of unlearning, privacy concerns, future outlook, and its role in the model development life cycle.

Thu May 23 2024

Unleashing the Power of AI Agents

AI agents, automation, AI models, Agentic AI, fine-tuning models

This episode discusses the rise of AI agents in the tech space and their ability to perform tasks autonomously. It explores key considerations for automating tasks, integrating AI models into day-to-day applications, designing agents for different applications, fine-tuning models, multi-agent use cases, verification of content sources, and upcoming features in the AI project. The episode highlights the challenges and benefits of automation and emphasizes the importance of trust, research integration, and adaptability in the AI industry.

Thu May 16 2024

Monthly Roundup: Llama 3, Agents, Evaluation Metrics, Cyc, TikTok, and more

AI, Machine Learning, Partnerships, Agents, Ensembles

This episode covers various topics in AI, including new foundation models, partnerships and AI model development, agents in the tech industry, ensembles of models and AI approaches, Doug Lenat's work on machine learning and Cyc, the implications of foreign-owned social media companies, AI model evaluation, and entity resolution and data quality for graph applications.

Thu May 09 2024

LLMs for Data Access: Unlocking Insights with Text-to-SQL

Text to SQL, Database Performance, Technology Adoption, SQL Code, Query Optimization

Gunther Hagleitner, co-founder of Waii (waii.ai), discusses the use of text-to-SQL technology to simplify data access and democratize data usage. The market for this technology is broad, spanning data-driven businesses across industries and company sizes. Understanding the database schema is crucial for accurate query generation, and the technology also aids in database migrations and cost reduction. Optimizing database performance involves accessing column-level statistics and integrating semantic information. Technology adoption and challenges vary among companies, with some conducting bake-offs to compare systems. Handling SQL code and optimizing queries are important aspects, as is treating SQL code differently based on context. Efforts are under way to enhance text-to-SQL systems through benchmarking and knowledge graph integration. Future developments include standards for sharing information and automation of visualization.

Thu May 02 2024

2024 Artificial Intelligence Index

AI, Artificial Intelligence, Benchmarking, Evaluation, Multimodal Models

This episode covers key insights from the 2024 Artificial Intelligence Index Report, advancements in AI models and benchmarking, the importance of human evaluation, responsible AI practices, and the potential of AI in solving scientific problems. It also discusses the challenges faced by academic institutions in keeping up with industry developments and the differences in public opinion on AI between China and the US.

Thu Apr 25 2024

DBRX and the Future of Open LLMs

Databricks, DBRX, LLMs, Mixture of Experts Models, DBRX Models

Databricks Mosaic AI developed DBRX to fill gaps in the landscape of LLMs and enable customers to build general applications efficiently. DBRX aims to bridge the gap between open and closed LLMs, offering a state-of-the-art open model with benefits like fine-tuning and control over data serving. Using a mixture-of-experts architecture allows for faster training and inference compared to dense models of the same quality. The choice of hyperparameters, such as the number of experts activated and the total number of experts, impacts the model's quality. Two versions of DBRX were open-sourced: a base model and one fine-tuned for instruction following, suitable for multi-turn conversation. As models increase in size, the incremental improvements in quality become smaller. The importance of ongoing commitment to open source models by teams like those behind DBRX and Llama is highlighted. Databricks is enhancing its platform with AI capabilities to assist users in writing SQL queries, running Spark, and building data warehouses. Ease of use is crucial for the adoption of new technologies by developers.

Thu Apr 18 2024

Monthly Roundup: New LLMs, GTC 2024, Constraint-Driven Innovation, Model Safety, and GraphRAG

AI, Data Processing, Hardware, Security, Generative AI

This monthly news roundup covers the shift towards open models, advancements in hardware and software development, security vulnerabilities and mitigation, generative AI and model deployment, and the growing interest in knowledge graphs.

Thu Apr 11 2024

Automating Software Upgrades: How to Combine AI and Expert Developers

software upgrades, open source, dependencies, Infield, community contributions

This episode discusses the problem of keeping open source software up to date and presents two approaches to solving it: outsourcing and tooling. Infield is introduced as a solution that combines software and human expertise to turn software upgrades into a data problem. The importance of community contributions, formalizing issue reporting, and managing dependencies is highlighted. The episode also explores psychological barriers to software upgrades, the role of AI in automation, and Infield's plans for expansion. Managing dependencies and addressing security risks are discussed, along with Infield's target customers and plans for contributing back to the open source community. The episode concludes with insights on addressing technical debt and legacy systems.

Thu Apr 04 2024

Generative AI in the Industrial Sphere

generative AI, industrial AI, AI integration, efficiency gains, knowledge graphs

This episode explores the application of generative AI in industrial settings. It covers key insights on industrial AI challenges, integration strategies, efficiency gains, innovative approaches, knowledge graphs, fault tree automation, reliable AI, and process transformation. The episode emphasizes the importance of reliable AI and provides recommendations for getting started with generative AI in industrial processes.

Thu Mar 28 2024

The Intersection of LLMs, Knowledge Graphs, and Query Generation

knowledge graphs, retrieval, question answering, LLMs, data modeling

The episode discusses using knowledge graphs for retrieval and question answering. Challenges in using LLMs to generate complex queries include limitations in handling recursions, subqueries, and unions of joins. Benchmarks like VQC, Cool, and Spider are commonly used to test LLM performance but may not be realistic for enterprise scenarios. LLMs perform well on simple data models like star schemas with fewer tables and simpler join types. Full automation of designing data warehouses and ETL pipelines with LLMs for large-scale databases is considered a distant possibility.
