Technology

Meta’s AudioCraft – A New Text-to-Music Tool Powered by Artificial Intelligence

A new audio tool powered by Artificial Intelligence (AI), called AudioCraft, was recently released by Meta Platforms, the parent company of Facebook, Messenger, Instagram and WhatsApp.

AudioCraft can generate music from a text prompt that simply describes the kind of audio the user needs. Imagine an entrepreneur who wants to add a soundtrack to a new video advertisement before posting it on Instagram. They can do so with ease using AudioCraft, which handles not only music creation but also sound-effect generation and audio compression, all within a single code base. Moreover, AudioCraft lets users build on and reuse what others have previously created.

Overview of AudioCraft’s Three AI Models

Meta’s AudioCraft consists of three AI models: AudioGen, MusicGen and EnCodec.

AudioGen – a text-guided model trained to generate environmental and everyday sound effects from text descriptions, such as birds chirping, a river flowing, wind blowing, footsteps, dogs barking or cars honking.
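
For readers who want to try this, below is a minimal sketch of generating sound effects with the audiocraft Python package that Meta published; the checkpoint name 'facebook/audiogen-medium', the prompts and the output file names are illustrative assumptions.

from audiocraft.models import AudioGen
from audiocraft.data.audio import audio_write

# Load a pretrained AudioGen checkpoint (name assumed; check the
# audiocraft repository for the checkpoints actually released).
model = AudioGen.get_pretrained('facebook/audiogen-medium')
model.set_generation_params(duration=5)  # seconds of audio per prompt

# Plain-text descriptions of the sounds we want.
descriptions = ['dog barking in the distance', 'cars honking in heavy traffic']
wav = model.generate(descriptions)  # one waveform tensor per description

# Write each waveform to disk with loudness normalization.
for idx, one_wav in enumerate(wav):
    audio_write(f'sound_effect_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")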

MusicGen – a text-to-music generator: a transformer model trained on Meta-owned and licensed music that produces high-quality audio from text prompts. The text descriptions are passed through a frozen text encoder to extract a sequence of hidden-state representations, which then condition the music generation.
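
A similar sketch for text-to-music generation, again assuming the audiocraft package; the checkpoint name 'facebook/musicgen-small' and the prompt are illustrative.

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# Load a pretrained MusicGen checkpoint (name assumed; larger variants exist).
model = MusicGen.get_pretrained('facebook/musicgen-small')
model.set_generation_params(duration=8)  # length of the generated clip in seconds

# The text prompt is what gets passed through the frozen text encoder described above.
descriptions = ['upbeat acoustic track for a short product advertisement']
wav = model.generate(descriptions)

for idx, one_wav in enumerate(wav):
    audio_write(f'music_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness")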

EnCodec – a streaming encoder-decoder architecture with three major components: 1) an encoder network, 2) a vector quantization layer and 3) a decoder network. It compresses audio into compact discrete representations and reconstructs it on the other end.
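
To illustrate the compression side, here is a minimal sketch using the standalone encodec Python package that implements this architecture; the input file name and the 6 kbps bandwidth setting are assumptions.

import torch
import torchaudio
from encodec import EncodecModel
from encodec.utils import convert_audio

# Load the pretrained 24 kHz EnCodec model and pick a target bitrate.
model = EncodecModel.encodec_model_24khz()
model.set_target_bandwidth(6.0)  # kbps (assumed setting)

# Load and resample the input audio to the model's expected format.
wav, sr = torchaudio.load('input.wav')  # file name is illustrative
wav = convert_audio(wav, sr, model.sample_rate, model.channels).unsqueeze(0)

with torch.no_grad():
    # Encoder + quantization layer: compress the waveform into discrete codes.
    encoded_frames = model.encode(wav)
    # Decoder network: reconstruct the waveform from those codes.
    reconstructed = model.decode(encoded_frames)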

Now here’s the good news: Meta has open-sourced all three AI models for research purposes. Open-source sharing gives practitioners and researchers the ability to train their own models on their own datasets, which can help advance AI-generated sound and music technology.