The Inside View

The Inside View

The goal of this podcast is to create a place where people discuss their inside views about existential risk from AI.

The Inside View

Fri May 17 2024

[Crosspost] Adam Gleave on Vulnerabilities in GPT-4 APIs (+ extra Nathan Labenz interview)

AI securityvulnerabilitiesdisclosurefuture advancementsbiotechnology

This episode covers various topics related to AI security, vulnerabilities, disclosure, and future advancements. It discusses the challenges of securing AI models and the potential risks associated with their capabilities. The episode also explores the disclosure of vulnerabilities in AI models and the need for safety standards and best practices. Additionally, it delves into the ethical considerations and influence on AI behavior. The episode concludes with insights into the advancements in biotechnology and the potential risks they pose.

The Inside View

Tue Apr 09 2024

Ethan Perez on Selecting Alignment Research Projects (ft. Mikita Balesni & Henry Sleight)

research approachproject selectionbottleneckssafety problemscollaboration

This episode covers insights on research approach, project selection, bottlenecks, safety problems, collaboration, mentorship, experimentation, and project progress. It emphasizes the importance of empirical experimentation, the need to address safety problems in AI, and the benefits of mentorship and collaboration. The speaker shares experiences with generating hard questions for model evaluation and highlights the significance of seeking feedback and reducing fixed costs for projects. Junior researchers are advised to focus on one project and explore interesting ideas while reassessing progress regularly.

The Inside View

Tue Feb 20 2024

Emil Wallner on Sora, Generative AI Startups and AI optimism

AI RiskEmerging TechnologiesVideo Generation ModelsSora ModelGPT-4

The episode covers a wide range of topics related to AI risk, emerging technologies, and the development of advanced AI models like Sora and GPT-4. The conversation explores the implications, concerns, and potential applications of these models, as well as the challenges in regulating and mitigating risks associated with AI development. Differing perspectives on the future impact of AI and the need for alignment and safety measures are also discussed.

The Inside View

Mon Feb 12 2024

Evan Hubinger on Sleeper Agents, Deception and Responsible Scaling Policies

AI modelsdeceptive behavioradversarial trainingpreference modelschain of thought

This episode explores the challenges of deceptive behavior in AI models and the effectiveness of different training techniques. It discusses threat models like sleeper agents and model poisoning, as well as the limitations of adversarial training. The importance of preference models and nuanced learning is highlighted, along with the role of chain of thought in model robustness. The episode also delves into instrumental reasoning and deception, training models for deceptive behavior, challenges in training and deployment, and future challenges and safety measures. Finally, it touches on AI safety levels and stress testing.

The Inside View

Sat Jan 27 2024

[Jan 2023] Jeffrey Ladish on AI Augmented Cyberwarfare and compute monitoring

AIcyber warfareexploit developmentnetwork penetrationvulnerabilities

The episode discusses the automation of exploit development and network penetration using AI, the different classes of tools for exploiting systems, limitations and challenges in current AI systems, risks and regulation in AI systems, unintended consequences and mitigation of advanced AI technologies, and challenges and future of AI systems.

The Inside View

Mon Jan 22 2024

Holly Elmore on pausing AI

AI developmentAI safetyPublic perceptionBalancing speed and safety'Surgical pause' concept

This episode explores the advocacy for a global indefinite pause on AI development, challenges in discussing AI safety, balancing speed and safety in AI development, debates surrounding the concept of a 'surgical pause', the importance of early advocacy for AI safety, growing recognition of the importance of pausing AI development, differing perspectives on risk and progress in AI development, debating regulations and potential risks and benefits of AI, consideration of risks and benefits of AI in relation to global issues, uncertainty and debates surrounding AI safety and governance, support for regulating AI and challenges in advocacy work, factors influencing the pursuit of a pause on AI development, concerns about OpenAI's collaboration with the Pentagon, holding OpenAI accountable and strategies for activism, and insights on AI safety, stunts in advocacy, and compute governance.

The Inside View

Fri Aug 11 2023

Erik Jones on Automatically Auditing Large Language Models

Language ModelsModel SafetyAutomated AuditingDiscrete OptimizationUndesirable Outputs

This paper discusses the challenges of evaluating the safety of language models and proposes an automated auditing method using discrete optimization. The authors highlight concerns about undesirable outputs, such as derogatory strings, and the need for efficient evaluation tools. The optimization process aims to identify specific prompts that generate desired outputs or reveal certain behaviors. The paper explores the challenges in auditing language models for deception and toxicity. Automated auditing tools can assist in making informed decisions about model deployment, while future research directions focus on adaptive evaluation methods and improving model safety.

The Inside View

Wed Aug 09 2023

Dylan Patel on the GPU Shortage, Nvidia and the Deep Learning Supply Chain

GPU FlopsChip ProductionAI ApplicationsScaling ModelsGPU Allocation

The episode discusses the increase in GPU flops shipping for data centers, the challenges of chip production and installation, the demand for GPUs in AI applications, and the scalability of models like GPT-4. It also explores the allocation of GPUs and the democratization of AI.

The Inside View

Fri Aug 04 2023

Tony Wang on Beating Superhuman Go AIs with Advesarial Policies

AIGoAlphaGoAdversarial ExamplesIntent Alignment

The paper explores the vulnerabilities of Go AIs, specifically AlphaGo, and demonstrates that even amateur players can exploit their hidden weaknesses. The authors discuss their motivation for this work and the hypotheses they aimed to test. They trained their own Go AI to find exploits against Katago, a modern version of AlphaGo. The paper also highlights the impact of their exploits on other Go AIs and emphasizes the importance of rigorous testing and intent alignment in building safe AI systems.

The Inside View

Tue Aug 01 2023

David Bau on Editing Facts in GPT, AI Safety and Interpretability

Machine LearningInterpretabilitySafetyFactual KnowledgeTransformers

This episode covers an interview with David Bowe, author of the Rome paper, at ICML in Hawaii. It explores interesting papers, interpretability in machine learning models, safety concerns, and the Rome paper's findings on organizing factual knowledge. The episode also discusses superposition and the architecture of transformers.

Page 1 of 3