Papers Read on AI on Scribbler

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

voice interactions, speech recognition, voice generation, multilingual, emotion recognition

FunAudioLLM introduces two models, SenseVoice and CosyVoice, aimed at more natural voice interaction between humans and large language models (LLMs). SenseVoice provides multilingual speech recognition, emotion recognition, and audio event detection, with low-latency automatic speech recognition (ASR) for several languages. CosyVoice focuses on multilingual voice generation and supports zero-shot learning, cross-lingual voice cloning, and instruction following. Models related to SenseVoice and CosyVoice have been open-sourced on ModelScope and Hugging Face. By integrating these models with LLMs, FunAudioLLM enables applications such as speech-to-speech translation, emotional voice chat, interactive podcasts, and expressive audiobook narration.

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

large language models, LATS framework, Monte Carlo tree search, decision-making, reasoning abilities

Large language models (LLMs) still fall short when they must act in an environment, which motivates LATS (Language Agent Tree Search), a framework that unifies planning, acting, and reasoning. LATS uses Monte Carlo tree search to improve decision-making, repurposing the strengths of LLMs for diverse domains such as programming and web browsing. Existing methods that augment LLMs with external feedback lack the deliberate decision-making that characterizes human problem solving, which prompted the development of LATS for autonomous reasoning. LATS outperforms previous methods on tasks such as HotPotQA question answering and WebShop navigation, in some cases doubling performance and substantially raising average scores. The framework solves problems adaptively by guiding the search with LLM-derived heuristics while incorporating external feedback to make more sensible decisions.
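
To make the search component more concrete, here is a minimal, hypothetical sketch of Monte Carlo tree search with UCT selection, the general technique LATS builds on. It is not the LATS implementation: `propose_actions`, `heuristic_value`, and `env_feedback` are illustrative stand-ins for sampling candidate actions from an LLM, scoring states with an LLM, and receiving external feedback from the environment. The toy task (reach a positive target integer using the moves +1, +2, or *2) is an assumption chosen only so the loop runs end to end; the paper's method includes further details not shown here.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy environment: reach `target` from `start` using these moves.
ACTIONS: Dict[str, Callable[[int], int]] = {
    "+1": lambda s: s + 1,
    "+2": lambda s: s + 2,
    "*2": lambda s: s * 2,
}


@dataclass
class Node:
    state: int
    path: Tuple[str, ...] = ()
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def uct(self, c: float) -> float:
        """Upper Confidence bound for Trees: mean value plus exploration bonus."""
        if self.visits == 0:
            return math.inf  # always try unvisited children first
        assert self.parent is not None
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def propose_actions(state: int) -> List[Tuple[str, Callable[[int], int]]]:
    """Hypothetical stand-in for an LLM sampling candidate actions."""
    return list(ACTIONS.items())


def heuristic_value(state: int, target: int) -> float:
    """Hypothetical stand-in for an LLM scoring how promising a state looks."""
    return max(0.0, 1.0 - abs(target - state) / target)


def env_feedback(state: int, target: int) -> float:
    """Hypothetical stand-in for external feedback (e.g. a task reward)."""
    return 1.0 if state == target else 0.0


def mcts(start: int, target: int, iterations: int = 500,
         max_depth: int = 6, c: float = 1.0) -> Optional[List[str]]:
    root = Node(start)
    for _ in range(iterations):
        # 1. Selection: follow the highest-UCT child down to a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ch.uct(c))
        # 2. Expansion: grow a leaf that was already evaluated and is not solved.
        if (node.visits > 0 and env_feedback(node.state, target) == 0.0
                and len(node.path) < max_depth):
            node.children = [Node(fn(node.state), node.path + (name,), node)
                             for name, fn in propose_actions(node.state)]
            node = node.children[0]
        # 3. Evaluation: external feedback first, heuristic value otherwise.
        if env_feedback(node.state, target) == 1.0:
            return list(node.path)
        value = heuristic_value(node.state, target)
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    return None  # no solution found within the iteration budget
```

For example, `mcts(1, 4)` returns `['+2', '+1']` (1 → 3 → 4): the UCT rule first visits every child of the root once, then expands the child whose heuristic value is highest. Swapping the stand-ins for real LLM calls and environment feedback is where a framework like LATS would differ from this toy.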

Papers Read on AI

Fri Jul 26 2024

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Thu Jul 25 2024

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Wed Jul 24 2024

Patch-Level Training for Large Language Models

Tue Jul 23 2024

Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

Mon Jul 22 2024

IMAGDressing-v1: Customizable Virtual Dressing

Fri Jul 19 2024

A Comprehensive Survey on Human Video Generation: Challenges, Methods, and Insights

Thu Jul 18 2024

Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence

Tue Jul 16 2024

SEED-Story: Multimodal Long Story Generation with Large Language Model

Mon Jul 15 2024

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Fri Jul 12 2024

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control