"The Cognitive Revolution" | AI Builders, Researchers, and Live Player Analysis

E45: The AI Copilot Revolution with Div Garg of MULTI·ON

Thu Jul 13 2023

AI agents, MultiOn, Chrome extension, Task delegation, User interaction, Technical details, Task success, User safety, Skill building, Safety considerations, Future developments

Description

This episode explores the development and potential of AI agents, focusing on MultiOn, an AI agent wrapped in a Chrome extension. The chapters cover topics such as the impact of AI agents, challenges in development, interacting with users, technical details of operation, ensuring task success and user safety, features and future plans of MultiOn, building skills and ensuring safety, safety considerations and future developments, and the vision for AI agents in the future.

Insights

AI agents have the potential to unlock parallelism for humanity

Coordinating the work of many agents running in parallel can significantly increase productivity and efficiency.

MultiOn is an AI agent wrapped in a Chrome extension

MultiOn lets users delegate tasks to the agent, supervise it, and assist it in real time, and it takes a natural language approach to skills.

The development of AI agents faces challenges in reliability and predictability

Ensuring that AI agents are reliable and predictable is a key challenge that developers are focusing on.

Interacting with users in the same space is a smart approach

Using a browser extension that acts in the same space as the user allows for seamless user experience and easy distribution.

The technical details of AI agent operation involve a planner model and an action grammar

AI agents use a planner model to determine actions and an action grammar for type checking and safety verification.

Ensuring task success and user safety is a complex task

Determining success or failure for tasks and ensuring user safety are ongoing challenges in AI agent development.

MultiOn aims to provide reliable task delivery and learn user preferences

MultiOn focuses on delivering tasks in the background with notifications and actively learning user preferences over time.

Building skills and ensuring safety are key aspects of AI agent development

AI agents dynamically generate skills at runtime based on high-level natural language instructions, while also implementing safety measures to prevent malicious code.

Safety considerations are important in AI agent development

AI agents implement safety measures such as type checking and simulation of actions to detect malicious code or privacy violations.

The vision for AI agents is to simplify and automate mundane tasks

AI agents aim to intrinsically understand users' needs and automate delegated tasks, ultimately unlocking parallelism for humanity.

Chapters

AI agents and their impact
Challenges and strategies in AI agent development
Interacting with the user and integrating with existing solutions
Technical details of the agent's operation
Ensuring task success and user safety
Features and future plans of MultiOn
Building skills and ensuring safety
Safety considerations and future developments
The vision for AI agents and their role in the future

Transcript

AI agents and their impact

00:00 - 07:03

AI agents can unlock parallelism for humanity by coordinating the work of multiple agents
To prevent unpredictable behavior, agents should not be able to modify their own source code
The AI agent space has different product and rollout strategies, each with its own strengths and weaknesses
MultiOn is an AI agent wrapped in a Chrome extension that allows users to delegate tasks, supervise, and assist in real time
MultiOn's product strategy includes a natural language approach to skills
AI agents will impact human work and require steps for user safety and reliability
Div Garg left his PhD program at Stanford to build MultiOn as a startup focused on AI agents
This is an early stage for AI agents, but it's the right time to start building them

Challenges and strategies in AI agent development

06:38 - 14:12

AI is starting to get applied in everyday life
The pace of research makes it hard for product builders to keep up
Researchers starting companies is a good idea
The company is in the middle of a seed raise and will make an announcement soon
They are looking to hire journalists and people passionate about AI and agents
They want to solve AI research problems as well as product problems
Making the agent reliable and predictable is a challenge they are focusing on
The browser extension approach was chosen for seamless user experience and easy distribution
If something goes wrong, the agent can ask the user for help

Interacting with the user and integrating with existing solutions

13:44 - 20:43

Using a browser extension that acts in the same space as the user is a smart approach
Authentication can be a barrier when connecting different platforms
The AI can navigate and perform actions in the same space as the user
The model follows a simple loop where actions change the state space
The ability to interact with other extensions is being explored
Feedback and demonstration can help the AI learn and improve performance over time
Integration with existing browser solutions, like password managers, is possible
The core architecture involves taking snapshots of the current state and performing actions based on it
Layering a skills component on top of this core architecture is an important aspect of development
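The core loop described above (snapshot the current state, choose an action, apply it, repeat) can be sketched as follows. This is a minimal illustration, not MultiOn's code: `FakeBrowser`, `Snapshot`, `plan_next`, and `run_agent` are hypothetical names, and the planner is a trivial stand-in for the LLM that plays this role in the real system.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical text snapshot of what the agent can see on the page."""
    url: str
    elements: tuple[str, ...]


@dataclass
class FakeBrowser:
    """Toy stand-in for a browser tab (an assumption, not a real browser API)."""
    pages: dict[str, tuple[str, ...]]   # url -> clickable element labels
    links: dict[tuple[str, str], str]   # (url, element) -> destination url
    url: str
    history: list[str] = field(default_factory=list)

    def snapshot(self) -> Snapshot:
        return Snapshot(self.url, self.pages[self.url])

    def click(self, element: str) -> None:
        # Each action changes the state that the next snapshot will observe.
        self.history.append(element)
        self.url = self.links[(self.url, element)]


def plan_next(snap: Snapshot, goal_url: str) -> str | None:
    """Placeholder planner; in the episode this role is played by an LLM."""
    if snap.url == goal_url or not snap.elements:
        return None  # either done or nothing left to try
    return snap.elements[0]


def run_agent(browser: FakeBrowser, goal_url: str, max_steps: int = 10) -> bool:
    """Observe -> plan -> act until the planner stops or the step cap is hit."""
    for _ in range(max_steps):
        snap = browser.snapshot()
        action = plan_next(snap, goal_url)
        if action is None:
            return snap.url == goal_url
        browser.click(action)
    return False  # step cap reached: a natural point to ask the user for help
```

Separating the snapshot (state) from the planner (decision) is what later allows a skills layer or a better model to be swapped in without changing the loop itself.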

Technical details of the agent's operation

20:15 - 27:29

The system uses a planner model such as GPT-4 to determine the right actions for a given task.
A ReAct (reason + act) paradigm is used, in which the agent decides the appropriate action based on the objective.
An intermediate action grammar is used instead of directly generating JavaScript code.
The action grammar allows for type checking and verification of safety.
The grammar is compiled into actual DOM level events for the website.
The agent is trained to always try to come as close as possible to the original objective and not give up easily.
Positive self-talk and grit are important components in training the agent.
There are three modules: representation, planner, and action.
Initially, everything except the planner was heuristic code, but now it's transitioning to models.
A combination of screenshots and the DOM is used for the representation, with embeddings from both sources combined.
The representation is text-based, which makes it easy to analyze and understand what's happening.
Each element in the semantic representation is assigned an ID, so the model can act by predicting which ID to choose.
Coordinates can also be predicted if needed for moving the cursor or making choices at specific locations.
GPT-4 or similar models are used for central planning tasks due to their long-term reasoning and planning capabilities.
The team has also experimented with fine-tuning its own model on collected interaction data.
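The action-grammar idea can be sketched like this. The planner emits commands in a small grammar that is type checked against the ID-labelled elements and then compiled into DOM-level events, rather than emitting raw JavaScript. The two-rule grammar, the `data-agent-id` attribute, and all function names are assumptions made for illustration; the episode does not spell out MultiOn's actual grammar.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    id: int    # ID assigned in the semantic representation
    tag: str   # e.g. "button", "input", "a"
    text: str  # visible label


class ActionError(ValueError):
    """Raised when a command is outside the grammar or fails type checking."""


# Hypothetical two-rule grammar:  CLICK <id>  |  TYPE <id> "<text>"
CLICK = re.compile(r'^CLICK (\d+)$')
TYPE = re.compile(r'^TYPE (\d+) "([^"]*)"$')
TEXT_FIELDS = {"input", "textarea"}


def render(elements: list[Element]) -> str:
    """Text representation handed to the planner: one line per element."""
    return "\n".join(f"[{e.id}] <{e.tag}> {e.text}" for e in elements)


def _selector(el: Element) -> str:
    # Assumes each element was tagged with a data-agent-id attribute beforehand.
    return f'document.querySelector(\'[data-agent-id="{el.id}"]\')'


def compile_action(command: str, elements: list[Element]) -> str:
    """Parse, type-check, and compile one command into a JavaScript snippet."""
    by_id = {e.id: e for e in elements}

    def lookup(raw_id: str) -> Element:
        el = by_id.get(int(raw_id))
        if el is None:
            raise ActionError(f"no element with id {raw_id}")
        return el

    if m := CLICK.match(command):
        return f"{_selector(lookup(m.group(1)))}.click();"
    if m := TYPE.match(command):
        el = lookup(m.group(1))
        if el.tag not in TEXT_FIELDS:
            raise ActionError(f"cannot TYPE into <{el.tag}> (id {el.id})")
        # json.dumps yields a quoted, escaped string literal that JS can parse.
        return (
            f"{_selector(el)}.value = {json.dumps(m.group(2))}; "
            f"{_selector(el)}.dispatchEvent(new Event('input', {{bubbles: true}}));"
        )
    raise ActionError(f"not in grammar: {command!r}")
```

Because anything that fails to parse is rejected outright, an injected command such as `CLICK 1; alert(1)` never reaches the page.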

Ensuring task success and user safety

27:09 - 33:59

Recursive delegation was used to build a product, but it may not be the right approach.
Determining success or failure for tasks is challenging.
Automatic detection of success or failure is being experimented with, along with a separate critic module.
Booking a flight can be determined as successful if a confirmation code is received, but there are still questions about whether it met the user's requirements.
Per-user memory systems are being explored for the future.
Tasks like finding URLs for liked LinkedIn posts can be difficult to ground and determine success.
Having a separate critic agent or validation agent can help verify if the task was completed correctly.
User acknowledgement flows can be implemented to ensure safe completion of tasks.
Recording manual interventions and using them for learning and improvement is planned.
The vision is to have tasks run in the background and notify the user when they are done.
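A separate critic and a user-acknowledgement flow can be expressed roughly as below. In the flight example, the check goes beyond "a confirmation code appeared" and compares the result against what the user asked for. `Booking`, `Request`, `critic`, `submit_with_ack`, and the confirmation-code regex are all hypothetical. A real critic would more plausibly be another model call than hand-written rules.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical pattern for a 6-character confirmation code on the final page.
CONF_CODE = re.compile(r"confirmation (?:code|number)[:\s]+([A-Z0-9]{6})\b", re.IGNORECASE)


@dataclass
class Booking:
    page_text: str  # text of the final page the agent landed on
    date: str
    price: float


@dataclass
class Request:
    date: str
    max_price: float


def critic(result: Booking, req: Request) -> tuple[bool, list[str]]:
    """Validation step: a code alone does not prove the user's intent was met."""
    reasons = []
    if not CONF_CODE.search(result.page_text):
        reasons.append("no confirmation code found")
    if result.date != req.date:
        reasons.append(f"booked {result.date}, user asked for {req.date}")
    if result.price > req.max_price:
        reasons.append(f"price {result.price} exceeds limit {req.max_price}")
    return (not reasons, reasons)


def submit_with_ack(
    summary: str,
    ask_user: Callable[[str], bool],
    submit: Callable[[], Booking],
) -> Booking | None:
    """Acknowledgement flow: the irreversible step runs only after approval."""
    if not ask_user(f"About to submit: {summary}. Proceed?"):
        return None
    return submit()
```

The list of reasons doubles as a record of what went wrong, which is the kind of signal that could feed the planned learning from manual interventions.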

Features and future plans of MultiOn

33:30 - 40:03

Reliable delivery of tasks in the background with notifications
Cloud mode allows for running tasks remotely and receiving notifications
Memory feature includes a scratchpad for personal details and preferences
Agent actively learns user preferences over time
Skills are high-level actions or functions that can be customized
Combination of scripting and natural language in current skills
Plans to develop an ecosystem for users to create and share skills
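A scratchpad that holds stated details and gradually picks up preferences might look roughly like this. The class name and the promotion rule are assumptions chosen for simplicity: a choice becomes a preference once it recurs `promote_after` times. The episode does not say how MultiOn decides that something counts as a preference.

```python
from __future__ import annotations

from collections import Counter, defaultdict


class Scratchpad:
    """Hypothetical per-user memory: explicit facts plus preferences learned from choices."""

    def __init__(self, promote_after: int = 3) -> None:
        self.facts: dict[str, str] = {}      # details the user stated directly
        self.learned: dict[str, str] = {}    # preferences inferred over time
        self._seen: defaultdict[str, Counter] = defaultdict(Counter)
        self.promote_after = promote_after

    def remember(self, key: str, value: str) -> None:
        self.facts[key] = value

    def observe_choice(self, slot: str, choice: str) -> None:
        """Record what the user picked; promote it once it recurs often enough."""
        counts = self._seen[slot]
        counts[choice] += 1
        value, count = counts.most_common(1)[0]
        if count >= self.promote_after:
            self.learned[slot] = value

    def as_context(self) -> str:
        """Render memory as text that could be prepended to the planner prompt."""
        merged = {**self.learned, **self.facts}  # explicit facts override learned ones
        return "\n".join(f"- {k}: {v}" for k, v in sorted(merged.items()))
```

Letting explicit facts override learned ones means a user can always correct a preference the agent picked up on its own.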

Building skills and ensuring safety

39:34 - 46:11

The agent can pull in the best tool automatically for each website, allowing it to improve quickly and receive contributions from users.
High-level instructions have been effective in guiding the agent's behavior and improving its performance.
Skills are not stored as actual procedures but are dynamically generated at runtime.
Natural language descriptions of skills can be used to generate the corresponding action code at runtime and cache it for future use.
Keeping everything as high-level natural language instructions allows for easy adaptation to changes in websites or flows.
The platform aims to remove technical barriers by enabling non-technical users to modify and build on top of existing skills.
To address malicious skills, the platform is currently in a closed beta phase with safety checks in place. A Stanford launch is planned for the next month before wider release.
Phased launches help ensure security and allow for improvements based on user feedback.
Protective measures can be implemented, such as making certain skills unmodifiable or restricting skill modifications to specific users.
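The approach of keeping skills as natural-language descriptions, generating action code for them at runtime, caching it, and protecting some skills from edits can be sketched as below. `Skill`, `SkillRegistry`, and the `generate` callback are hypothetical. In practice `generate` would be an LLM call, and the cache key (skill name, site, hash of the description) is one plausible choice rather than something the episode specifies.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    name: str
    description: str   # natural-language instructions are the source of truth
    owner: str
    protected: bool = False


@dataclass
class SkillRegistry:
    # Stand-in for the model call that turns instructions into actions for a site.
    generate: Callable[[str, str], list[str]]
    skills: dict[str, Skill] = field(default_factory=dict)
    _cache: dict[tuple[str, str, str], list[str]] = field(default_factory=dict, init=False)
    generations: int = field(default=0, init=False)

    def add(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def edit(self, name: str, description: str, user: str) -> None:
        skill = self.skills[name]
        if skill.protected and user != skill.owner:
            raise PermissionError(f"skill {name!r} is protected")
        skill.description = description

    def actions_for(self, name: str, site: str) -> list[str]:
        """Generate action code at runtime, then reuse it until the text changes."""
        skill = self.skills[name]
        digest = hashlib.sha256(skill.description.encode()).hexdigest()
        key = (name, site, digest)  # editing the description invalidates the cache
        if key not in self._cache:
            self._cache[key] = self.generate(skill.description, site)
            self.generations += 1
        return self._cache[key]
```

Since only the description is stored, a non-technical user edits plain text, and the next run regenerates the actions for the site as it currently looks.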

Safety considerations and future developments

45:50 - 53:02

Skills can be made protected to prevent modification by unauthorized users.
Prompt injection attacks are a concern, and precautions are taken to ensure safety.
Actions in MultiOn are type checked and simulated before execution to detect any malicious code or privacy violations.
The level of safety required depends on the risk involved, with consumer settings having lower risks compared to enterprise settings.
Blacklisted websites, such as banking sites, are considered high-risk and precautions are taken accordingly.
MultiOn is being continuously improved to provide a better user experience and address potential issues.
For simple tasks, MultiOn is expected to be ready for non-technical users within the next three months.
More complex tasks may require breakthroughs in foundation models for advanced reasoning capabilities.
Initially, AI assistants like MultiOn are seen as a supplementary aid for work, but they may evolve to rival human employees over time.
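A pre-execution dry run can combine a blocklist of high-risk domains with a check for sensitive data being entered on unapproved sites. It might look like this. The verbs (`NAVIGATE`, `FILL`, `CLICK`), the blocklist entry `examplebank.com`, the sensitive field names, and the approval map are illustrative assumptions. The episode only says that actions are type checked and simulated before execution, with precautions that scale with risk.

```python
from __future__ import annotations

from urllib.parse import urlparse

HIGH_RISK = {"examplebank.com"}  # hypothetical blocklist (e.g. banking sites)
SENSITIVE = {"card_number", "password", "ssn"}


def _matches(host: str, domain: str) -> bool:
    return host == domain or host.endswith("." + domain)


def dry_run(plan: list[tuple[str, str]], start_url: str,
            allowed: dict[str, set[str]]) -> list[str]:
    """Simulate a plan without touching the browser and report problems.

    `plan` holds (verb, argument) steps; `allowed` maps each sensitive field
    to the domains where the user has approved entering it.
    """
    problems: list[str] = []
    host = urlparse(start_url).hostname or ""
    for verb, arg in plan:
        if verb == "NAVIGATE":
            host = urlparse(arg).hostname or ""
            if any(_matches(host, d) for d in HIGH_RISK):
                problems.append(f"high-risk site {host}: needs explicit user approval")
        elif verb == "FILL":
            approved = allowed.get(arg, set())
            if arg in SENSITIVE and not any(_matches(host, d) for d in approved):
                problems.append(f"would enter {arg} on unapproved site {host}")
        elif verb != "CLICK":
            problems.append(f"unknown verb {verb!r}")
    return problems
```

An empty result means the plan may run. Anything else can be surfaced to the user, and how strict the lists are can vary between consumer and enterprise settings.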

The vision for AI agents and their role in the future

52:34 - 59:19

Initially, the technology is seen as a supplementary aid that helps professionals and their assistants simplify tasks such as scheduling.
The evolution of AI is moving towards humans becoming more like organizers and coordinators, while AI handles the actual interactions and web navigation.
Safety measures in AI agents include having a kill switch and preventing agents from modifying their own source code.
The current kill switch for AI agents is running out of budget or disabling interaction with certain systems.
The goal of MultiOn is to simplify and automate mundane tasks in life by intrinsically understanding users' needs and carrying out the tasks they delegate.
The vision for the future is to unlock parallelism for humanity by coordinating multiple AI agents to work simultaneously on different tasks.
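A budget used as a kill switch for several agents working in parallel can be sketched with a shared, thread-safe spending cap. `Budget`, `run_task`, and `coordinate` are hypothetical names. Each "step" stands in for a paid model call, and the costs are arbitrary units.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor


class BudgetExceeded(RuntimeError):
    pass


class Budget:
    """Shared spending cap; exhausting it (or calling kill) halts every agent."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.spent = 0
        self.killed = False
        self._lock = threading.Lock()

    def charge(self, cost: int) -> None:
        with self._lock:
            if self.killed or self.spent + cost > self.limit:
                self.killed = True  # once tripped, it stays tripped for all agents
                raise BudgetExceeded(f"spent {self.spent} of {self.limit}")
            self.spent += cost

    def kill(self) -> None:
        with self._lock:
            self.killed = True


def run_task(name: str, steps: int, cost: int, budget: Budget) -> tuple[str, int, str]:
    """One agent: every step (e.g. a model call) must be paid for up front."""
    done = 0
    try:
        for _ in range(steps):
            budget.charge(cost)
            done += 1
    except BudgetExceeded:
        return name, done, "halted"
    return name, done, "finished"


def coordinate(tasks: list[tuple[str, int, int]], budget: Budget,
               workers: int = 4) -> list[tuple[str, int, str]]:
    """Run several agents in parallel against one shared budget."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_task, n, s, c, budget) for n, s, c in tasks]
        return [f.result() for f in futures]
```

This covers only the budgetary kill switch. Disabling interaction with certain systems, the other mechanism mentioned, would be a separate check.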