The Daily

The Sunday Read: ‘Wikipedia’s Moment of Truth’

Sun Sep 10 2023
AI, Wikipedia, knowledge, chatbots, information, technology

Description

The episode explores the relationship between artificial intelligence (AI) and Wikipedia. It highlights the importance of Wikipedia in building AI models, the role of AI in Wikipedia's evolution, challenges faced by Wikipedia due to AI, improvements in AI systems, the role of the Wikimedia plugin, and insights from a dedicated Wikipedia editor. The future of AI and Wikipedia is also discussed, including potential regulatory frameworks and the need for alignment between AI systems and humans.

Insights

Wikipedia's Crucial Role in AI Development

Wikipedia provides vast knowledge banks for large language models and has become a crucial source of training data for search engines, virtual assistants, and AI models.

Concerns About AI's Impact on Wikipedia

AI chatbots can simplify the world, hallucinate, and provide factually unreliable information. The death of Wikipedia is possible if it is outflanked by AI that cannibalizes its data.

The Evolution of Wikipedia with AI

Wikipedia has used AI tools since 2002 for content review and translation. ChatGPT gained millions of users after its rollout, but there are concerns about users choosing ChatGPT over Wikipedia entries.

Challenges Faced by Wikipedia Due to AI

The close relationship between search engines and Wikipedia raises existential questions for Wikipedia. AI-generated summaries lack citations and grounding in literature compared to search engine summaries.

Improvements in AI Systems

Higher quality data and retrieval elements can improve accuracy and reduce biased answers in AI systems. OpenAI's newer model, GPT-4, has made significant improvements in factual content.

The Role of the Wikimedia Plugin

The Wikimedia plugin allows ChatGPT to access up-to-date information from Wikipedia, improving the factuality and currency of its answers. It also helps protect Wikipedia's future.

The Future of AI and Wikipedia

The future will likely involve a range of information sources, including ChatGPT, Wikipedia, Reddit, and TikTok. Regulatory frameworks may force tech companies to label AI-generated content and disclose more information about AI training data.

Insights from a Dedicated Wikipedia Editor

Jade, a dedicated Wikipedia editor, believes in sharing knowledge and is passionate about her work. She acknowledges the need for ethical discussions regarding AI but remains optimistic about the future of volunteer work on Wikipedia.

Chapters

  1. The Importance of Wikipedia in AI Development
  2. The Role of AI in Wikipedia's Evolution
  3. Challenges and Concerns with AI and Wikipedia
  4. Accuracy and Improvements in AI Systems
  5. The Role of Wikimedia Plugin and Future Developments
  6. The Future of AI and Wikipedia
  7. The Perspective of a Wikipedia Editor

The Importance of Wikipedia in AI Development

00:04 - 09:05

  • Wikipedia is central to building artificial intelligence because it provides vast knowledge banks for large language models.
  • Large-language models ingest a trillion words from public sources like Wikipedia, Reddit, and Google's patent database.
  • Wikipedia is highly formatted and contains a tremendous amount of factual information maintained by a community of active editors.
  • AI chatbots aim to converse fluently but often simplify the world and can hallucinate or conjure falsehoods.
  • If AI becomes the primary source of information, our knowledge could become factually unreliable.
  • A Wikipedia editor expressed concern about AI's impact on Wikipedia and the future of human-created knowledge.
  • GPT-3, a precursor to new chatbots, has potential but mixes fictional elements with factual answers.
  • Wikipedia remains a throwback to the Internet's early days with open collaboration and free access to human knowledge.
  • Wikipedia does not run ads and its contributors are unpaid, making its success surprising in capitalist terms.
  • Wikipedia has become a crucial source of training data for search engines, virtual assistants, and AI models.
  • The death of Wikipedia is a possible outcome if it is outflanked by AI that cannibalizes its data.

The Role of AI in Wikipedia's Evolution

08:36 - 16:53

  • A computer intelligence, plugged into the web, can summarize source materials and news articles instantly.
  • Some participants in a conference call expressed confidence that AI tools would expand Wikipedia's articles and global reach.
  • Others worried about users choosing ChatGPT over Wikipedia entries.
  • Wikipedia wants knowledge to be created by humans.
  • The Wikimedia Foundation explored how Wikipedia could evolve by 2030 to protect and share information.
  • Trends like online misinformation require more vigilance from Wikipedia.
  • Artificial intelligence is improving at a rate that could change knowledge gathering and synthesis.
  • ChatGPT gained an estimated 100 million users within two months of its release in late 2022.
  • Wikipedia has used AI tools and bots since 2002 for content review and translation.
  • Fledgling AI systems were trained on Wikipedia's articles in the past decade, but now they take in larger amounts of information from various sources.
  • Wikipedia remains one of the largest single sources for large language models (LLMs).
  • The quality of data a model trains on affects the accuracy and coherence of its responses.
  • Wikipedia's goal is to spread knowledge broadly and freely, even if it is being repurposed by companies with different objectives.
  • There is an interdependence between Google and Wikipedia, where Google search results benefit from Wikipedia's contributions, while Wikipedia receives most of its traffic from Google.
  • Wikimedia Enterprise was created to sell access to APIs that provide accelerated updates to Wikipedia articles.

Challenges and Concerns with AI and Wikipedia

16:32 - 24:56

  • One upshot from the collision with Google and others who repurposed Wikipedia's content was the creation two years ago of Wikimedia Enterprise, a separate business unit that sells access to a series of application programming interfaces that provide accelerated updates to Wikipedia articles.
  • The enterprise unit is either a more formalized way for tech companies to direct the equivalent of large charitable donations to Wikipedia or a way for Wikipedia to recoup some of the financial value it creates for the digital world and thus help fund its future operation.
  • Wikipedia's openness allows any tech company to access Wikipedia at any time, but the APIs make new Wikipedia entries almost instantly readable, speeding up connection.
  • The close relationship between search engines and Wikipedia has raised existential questions for Wikipedia as reduced traffic may oversimplify our understanding of the world, make it difficult to recruit new contributors, and result in fewer donations.
  • AI-generated summaries lack citations and grounding in literature, whereas search engine summaries link back to Wikipedia; this makes AI summaries more problematic and potentially harmful from Wikipedia's perspective.
  • These technologies are highly self-destructive as they threaten to obliterate the very content they depend upon for training.
  • Wikipedia has flaws such as gender and racial gaps in coverage, short articles that can be inaccurate or vandalized, but it acts as a consensus truth and reality check in an era where facts are contested.
  • Wikipedia's transparency through footnotes, source materials, previous edits, experienced editors' intervention when needed, NPOV guidelines, and self-examination contribute to its truthfulness quotient.
  • Generative AI poses concerns at Wikipedia regarding health information accuracy and oversimplification of complex issues like medical advice or origins of conflicts.

Accuracy and Improvements in AI Systems

24:30 - 33:13

  • One worry about generative AI at Wikipedia is related to health information.
  • People might ask this technology for medical advice, which may be wrong and potentially harmful.
  • Stanford University scientists evaluated four AI-powered search engines and found that only about half of the sentences generated in response to a query could be fully supported by factual sources.
  • The low accuracy of chatbots is due to their probabilistic nature when choosing the next word in a sentence.
  • Accuracy should be a fundamental priority for AI, but big companies prioritize introducing AI products over reliability.
  • Improvements in accuracy and reducing biased answers can be achieved through higher quality data and retrieval elements for fact-checking in real time.
  • Market competition can also drive improvement in AI systems' truthfulness and accuracy.
  • OpenAI's newer AI model, GPT-4, has made significant improvements in factual content compared to earlier models.
  • GPT-4 still needs improvement in fixing hallucinations and providing complex, accurate answers to historical questions.
  • In the future, AI systems might differentiate between rigorous factual answers and more creative responses based on user queries.
  • A plugin developed by the Wikimedia Foundation allows ChatGPT to access up-to-date information from Wikipedia, improving the factuality and currency of its answers.
  • The plugin directs a search for relevant Wikipedia articles that answer a chatbot query, improving the combinatoric experience of users.

The Role of Wikimedia Plugin and Future Developments

32:45 - 41:19

  • After the plugin found the relevant Wikipedia articles, it sent them to the bot, which in turn read and summarized them, then spit out its answer.
  • The plugin always forced ChatGPT to append a note with links to Wikipedia entries, saying that its information was derived from Wikipedia.
  • The plugin allows users to engage with Wikipedia without actually being on the website.
  • Chatbots can be deceived by how a question is worded, resulting in false answers. Wikipedia helps by offering accurate information and linking to relevant articles.
  • The Wikimedia plugin is a significant move toward protecting Wikipedia's future.
  • AI models are being adapted for use by Wikipedia editors to aid new volunteers and predict article outcomes.
  • Tools are being developed to help maintain neutral point of view, detect AI-generated text, and improve content scrutiny on Wikipedia.
  • The future will likely involve a range of information sources, including ChatGPT, Wikipedia, Reddit, and TikTok.

The Future of AI and Wikipedia

40:53 - 49:37

  • The future of information includes options like ChatGPT, Wikipedia, Reddit, and TikTok.
  • A dedicated plugin could improve chatbot answers for health, weather, and history questions.
  • Big tech companies are betting on new technologies despite their shortcomings or risks.
  • Wikipedia may need to adapt to the future created by AI rather than exert influence over it.
  • Regulatory frameworks in the EU and Congress may force tech companies to label AI-generated content and disclose more information about AI training data.
  • Legal scrutiny is increasing with lawsuits challenging the use of copyrighted images and personal data scraping.
  • Using Wikipedia's corpus without proper attribution may violate its terms of use.
  • The Wikimedia Foundation could argue for fair compensation from tech companies for API access and prominent attribution in chatbot answers.
  • Building a global encyclopedia without using Wikipedia's knowledge would be difficult for AI companies.
  • AI models trained solely on synthetic data can lead to chaos and misperception of reality.
  • Data from genuine human interactions will be increasingly valuable for future large language models (LLMs).
  • Alignment between AI systems and humans is crucial to prevent damage or compromise of reliable knowledge systems like Wikipedia.
  • Human editors provide a basic level of alignment by default in summarizing information on Wikipedia.

The Perspective of a Wikipedia Editor

49:13 - 51:52

  • Jade, a dedicated Wikipedia editor, spends 10 to 20 hours a week editing Wikipedia because she believes in sharing knowledge and is passionate about it.
  • Jade works on various topics, including nature and birds, as well as the American Civil War entry on Wikipedia.
  • Her goal is to improve the completeness and accuracy of the Civil War article so that it achieves featured status on Wikipedia.
  • Jade's work receives millions of views annually, and she considers it an honor to have people reading her contributions.
  • She acknowledges the need for ethical discussions regarding using AI for creating Wikipedia articles but doesn't believe robots will fully replace humans on Wikipedia anytime soon.
  • Despite factual shortcomings, chatbot conversations are captivating and enchanting for millions of people.
  • Jade remains optimistic about the future of volunteer work on Wikipedia.