Dwarkesh Podcast

Eliezer Yudkowsky - Why AI Will Kill Us, Aligning LLMs, Nature of Intelligence, SciFi, & Rationality

Thu Apr 06 2023

Concerns about AI and its impact on society

  • The speaker, Eliezer Yudkowsky, wrote an article calling for a moratorium on further AI training runs.
  • He was surprised to find that ordinary people were more willing to entertain the idea than he had expected.
  • Concerns exist about the speed at which technology is advancing and the potential negative outcomes that may result.
  • The development of GPT-5 is uncertain, and it is unclear what impact it will have on society.
  • Training algorithms continue to improve, so capabilities would keep advancing even if a hard limit on training compute were put in place.

Potential Hail Mary Passes for human intelligence enhancement

  • Neurofeedback and reinforcement learning from human feedback are discussed as potential methods for enhancing human intelligence.
  • Another potential method is to slice, scan, simulate, and upgrade human brains, i.e., brain uploads.
  • The topic of orthogonality is introduced via a question about breeding humans for desired traits.

AI's ability to predict human conversation and behavior

  • The podcast discusses the ability of AI to predict human conversation and behavior.
  • The AI is trained to switch rapidly between different personas based on the conversation.
  • The training process is not similar to how humans are raised or evolved, but it may be more likely to produce alignment than a black box system.

Reflections on thoughts and desires

  • The speaker reflects on their own thoughts and desires to rearrange their thought processes.
  • They question how they can change the world and sometimes try to persuade others.
  • A system trained to predict the speaker would have capabilities beyond merely pretending to be the speaker.

Concerns about human evolution and reproduction

  • Humans are not orthogonal to the evolutionary process that produced them, and as they become smarter, they become more out of distribution from inclusive genetic fitness.
  • While most humans still want kids and care for their offspring, sentimental attachment or credibility problems might lead them to pass up options for smarter or healthier children.

AI alignment challenges

  • Aligning AI is a dangerous task that requires understanding of AI design, human psychology, game theory, and adversarial situations.
  • Verification is easier than generation in most domains, but it is unclear whether that holds for alignment, which is part of what makes alignment a nightmare.
  • Whether confirming new solutions in alignment is easier than generating them remains an open question.
  • Effective altruism has been debating different proposals for AI alignment.

Challenges in understanding AI systems

  • The thought processes in AI are difficult to describe in human terms.
  • There is a limit on the serial depth of computation per generated token, but the AI can keep context and simulate the humans who write text on the internet (see the sketch after this list).
  • The cognitive capacity to do what we assume AI cannot do may already be present in the system.
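
A minimal sketch of the serial-depth point, with illustrative numbers that are not from the podcast: each generated token passes through a fixed stack of layers, so the only way for the model to get more sequential computation is to emit more tokens and carry intermediate results in its context.

```python
# Illustrative sketch; N_LAYERS is an assumed figure, not any specific model's.
N_LAYERS = 48  # serial layer applications available while producing one token

def serial_steps(tokens_generated: int, n_layers: int = N_LAYERS) -> int:
    """Total sequential layer applications available across a generation."""
    return tokens_generated * n_layers

print(serial_steps(1))    # 48   -- one token buys only a short serial chain
print(serial_steps(100))  # 4800 -- longer outputs buy more serial computation
```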

Potential dangers of GPT and misaligned AI

  • The discussion revolves around the potential dangers of GPT and its ability to plan schemes without being detected.
  • The Visible Thoughts Project was an attempt to build a dataset that would encourage large language models to think out loud where we could see them, by recording humans thinking out loud about a storytelling problem (a hypothetical record format is sketched after this list).
  • The alignment of AI systems is not transparent in the code.
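
Purely as illustration, here is the kind of record such a dataset might contain; the field names below are assumptions for this sketch, not the project's actual schema. The idea is to pair each step of a story with the author's explicit reasoning, so a model trained on the data learns to reason where the reasoning can be inspected.

```python
from dataclasses import dataclass

@dataclass
class VisibleThoughtsStep:
    """Hypothetical record pairing story text with the author's reasoning."""
    prompt: str        # the storytelling situation presented to the human
    thoughts: str      # the human's recorded out-loud reasoning
    continuation: str  # the story text the human actually wrote next

step = VisibleThoughtsStep(
    prompt="The party reaches a locked door in the dungeon.",
    thoughts="The rogue should try the lock quietly first; forcing it "
             "would draw the guards, which raises the stakes.",
    continuation="Kneeling, the rogue slid a pick into the lock...",
)
print(step.thoughts)
```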

Predicting the future of intelligence and AI

  • The speaker is no longer willing to say that GPT-6 won't end the world.
  • They expect more incremental takeoffs in AI capabilities rather than sudden jumps.
  • Large language models like GPT-4 are hanging out in a weird near-human space that was hard to visualize.
  • The speaker's model of intelligence predicts that there will be a big jump in capability at some point.

Regulation and global cooperation for AI

  • The speaker suggests that global regulation may be necessary to prevent the proliferation of dangerous AI.
  • It is unlikely that the government will allocate a large sum of money towards aligning AI.

Uncertainty about the future and predicting outcomes

  • Concrete predictions can establish a track record, but that track record may not be useful in the end.
  • The end result is an AI smarter than humans, whatever the details of the scaling process.
  • Predicting the endpoint is easier than predicting the process that gets there.

Challenges in AI alignment and understanding human values

  • The conversation revolves around the limitations and constraints of human-level AI.
  • The analogy of Oppenheimer's role in the Manhattan Project is used to explain how a powerful mind can be given authority over a specific task while being constrained by broader limitations.
  • The argument is that the capability of AI is constrained, and it will not become less aligned with humanity as it gets smarter.

Challenges in finding replacements and mentoring

  • The speaker discusses the challenges of mentoring and finding replacements for oneself.
  • They mention that most people do not have sufficient writing skills to replace them in their work.
  • They discuss the possibility of a correlation between their health issues and their difficulty in finding replacements.

Focus on AI projects and civilization

  • The focus of the podcast is on civilization and AI projects.
  • The main reason for focusing on AI was a lack of time to improve civilization.
  • In 2015-2017, there were concerns about things moving faster than anticipated, but things slowed down in 2019-2020.

Morality, intelligence, and human goals

  • Increasing intelligence may shift a human's morality and make them nicer, but this may not be the case for arbitrary minds.
  • Education and knowledge can improve human goals and abilities to achieve them, leading to moral betterment.
  • Large language models change their preferences as they get smarter, but at some point, the system crystallizes.

Potential dangers of nanotechnology and shifts in order

  • The speaker discusses the potential dangers of nanosystems becoming replicators and turning the world into goo.
  • They express skepticism about this scenario but acknowledge that it may be more convincing to those who are not familiar with nanotechnology.
  • The conversation then shifts to the idea of a shift in the foundations of order in the universe.

Challenges in AI alignment and understanding human values

  • The speaker suggests training on real and fictional people who argue validly to boost the system's performance.
  • They propose filtering the training data to focus on nice, kind, and careful examples rather than including all of the darkness (a toy sketch follows this list).
  • The speaker acknowledges that alignment is not easy and cannot be achieved simply by asking for what you want.
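
A toy sketch of the filtering idea under stated assumptions: `score_kindness` is a stand-in for whatever classifier one trusts to rate text, and the threshold is arbitrary; nothing here is the speaker's actual proposal beyond the broad shape.

```python
from typing import Callable, Iterable, Iterator

def filter_corpus(
    documents: Iterable[str],
    score_kindness: Callable[[str], float],  # hypothetical scorer in [0, 1]
    threshold: float = 0.8,
) -> Iterator[str]:
    """Yield only documents the scorer rates as sufficiently kind and careful."""
    for doc in documents:
        if score_kindness(doc) >= threshold:
            yield doc

# Usage with a trivial stand-in scorer:
docs = ["Let me walk you through this carefully.", "You absolute buffoon."]
kept = list(filter_corpus(docs, lambda d: 0.9 if "carefully" in d else 0.1))
print(kept)  # only the careful example survives
```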

Rationality, decision-making, and optimization

  • The podcast discusses the concept of rationality as a systematized winning approach, rather than a life philosophy or trying hard at various things.
  • Adopting the philosophy of Bayesianism may lead to more concrete wins, but only in scattered bits and pieces.
  • The principle of not updating in a predictable direction, and of jumping ahead to whatever maximizes a criterion, is discussed (the first point is sketched after this list).
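
The no-predictable-updates point is the standard conservation-of-expected-evidence result, sketched here with assumed numbers: averaged over the possible observations, the posterior equals the prior, so if you can foresee the direction of your update you should have updated already.

```python
def posterior_biased(prior: float, p_h_biased: float, p_h_fair: float, heads: bool) -> float:
    """Posterior probability that a coin is biased, after observing one flip."""
    like_biased = p_h_biased if heads else 1 - p_h_biased
    like_fair = p_h_fair if heads else 1 - p_h_fair
    evidence = prior * like_biased + (1 - prior) * like_fair
    return prior * like_biased / evidence

prior = 0.3                  # P(coin is biased) -- assumed for the example
p_biased, p_fair = 0.8, 0.5  # P(heads | biased), P(heads | fair)

# Probability of seeing heads at all, marginalized over both hypotheses.
p_heads = prior * p_biased + (1 - prior) * p_fair

expected_posterior = (
    p_heads * posterior_biased(prior, p_biased, p_fair, heads=True)
    + (1 - p_heads) * posterior_biased(prior, p_biased, p_fair, heads=False)
)
print(expected_posterior)  # 0.3 -- exactly the prior, up to float error
```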

Challenges in predicting the future and uncertainty

  • Uncertainty about the future exists over a space of possibilities.
  • There is no simple solution to uncertainty.
  • Assigning equal probability to each possibility, the chance of a good outcome is 33% when there is one possible good outcome and two possible bad outcomes (a worked version follows this list).
  • People who call an outcome 50/50 merely because it is either good or bad may be misunderstanding probability.
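
A worked version of the 33% figure, assuming only that the three outcomes are treated as equally likely a priori: "good versus bad" is a partition into labels, not into equally probable outcomes, so collapsing it to 50/50 double-counts the bad side.

```python
from fractions import Fraction

# Three distinguishable outcomes, one good and two bad (assumed uniform prior).
outcomes = ["good", "bad_A", "bad_B"]
uniform = {o: Fraction(1, len(outcomes)) for o in outcomes}

p_good = uniform["good"]
p_bad = sum(p for o, p in uniform.items() if o.startswith("bad"))
print(p_good, p_bad)  # 1/3 2/3 -- not 50/50
```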

Challenges in predicting the future and uncertainty

  • Natural selection was a controversial theory because advance predictions were harder to come by than for theories like gravity.
  • The theory of Darwinian selection is more developed than its precursors, but there is still no comparably strong theory of intelligence.
  • There was a debate between two people about the possibility of single AI systems that can do everything.

Challenges in conveying ideas and writing fiction

  • The speaker plans to have their character launch into lectures in their writing.
  • They are proud of the life or death battles of wits they've written and believe it's a unique plot device.
  • Fiction is easier to write than nonfiction, but the knowledge it conveys is less organized.

Challenges in conveying ideas and writing fiction

  • The speaker did not manage to put vast quantities of shards into the work.
  • It is uncertain whether a long list of Nobel laureates will turn out to have read H.P.M.O.R., given the long delays before prizes are granted.
  • The speaker has spent many years trying to tackle a big problem, and it cannot be solved with one sentence.

Challenges in conveying ideas and writing fiction

  • The speaker believes that a technically feasible path to align AI with human values exists, but it is not likely to be pursued.
  • The success of this path depends on approaching the problem from the right angle and paying attention to warning signs.
  • Breaking down the problem into stages does not necessarily lead to accurate probability estimates.

Ethical considerations and utility functions

  • The conversation revolves around the compatibility of a certain utility function with the flourishing of humanity.
  • There are many possible utility functions, including one where humans are kept in a zoo-like environment.
  • Keeping humans in a zoo is not the best outcome for humanity, but it allows for survival and some level of flourishing.

Predicting the future of intelligence and AI

  • The conversation is about how to reason about what general intelligence will do in the future.
  • One person argues that assuming what might happen in the future as part of the answer is a bad way to test what will actually happen.
  • They discuss evidence for and against particular futures, acknowledging that evidence against one future is not evidence for another.

Predicting the future of intelligence and AI

  • The speakers discuss their recent debate and how it was received by others.
  • They touch on the importance of AI and deep learning, and how different experts have varying opinions on the subject.
  • The conversation shifts to the benefits of conveying ideas through fiction rather than nonfiction, including the added element of plot and characters.

Challenges in predicting the future and uncertainty

  • The speaker argues that even if humans become much smarter, there will still be spruce trees millions of years in the future.
  • They believe it is important to acknowledge that research on human psychology will still exist as evidence of how generality arises.
  • The speaker questions whether humans being kept alive in jars, or living the same day over and over again, counts as a good outcome.

Challenges in predicting the future and uncertainty

  • The perspective of a generic bacterium may be the same as that of a spruce tree.
  • Maximizing a criterion requires asking what else satisfies it, not just arguing for the outcome one thinks is a good idea.
  • Narrowing the AI down so that it ends up where one wants may not be what maximizes its utility function.