Microsoft has made a significant leap forward in AI speech generation with its VALL-E 2 text-to-speech (TTS) system. According to Microsoft, VALL-E 2 achieves human parity, meaning it can produce speech that listeners cannot reliably distinguish from real human recordings. The system needs only a few seconds of audio to learn and mimic a speaker's voice.
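In evaluations on the LibriSpeech and VCTK speech datasets, VALL-E 2's output matched or even surpassed human recordings in quality. Two techniques underpin these results. 'Repetition Aware Sampling' accounts for token repetition during decoding, which stabilizes generation and prevents it from getting stuck in loops on repetitive phrases. 'Grouped Code Modeling' organizes the audio codes into groups to shorten the sequence the model must generate, which speeds up inference and helps with long, complex sentences. Together they produce smooth, realistic speech.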
Although Microsoft has published audio samples, it considers VALL-E 2 too advanced for public release because of potential misuse such as voice spoofing. This cautious approach reflects broader industry concerns, as seen in OpenAI's restrictions on its own voice technology.
While VALL-E 2 represents a significant breakthrough, it remains a research project for now. The development of AI continues apace, with companies striving to balance innovation with ethical considerations.