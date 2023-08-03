On Wednesday, Meta announced that it is making AudioCraft, a collection of generative AI tools for producing music and audio from text prompts, available to the public. Content creators can use the tools to build elaborate soundscapes, compose songs, and even simulate entire virtual orchestras by entering simple text descriptions.

AudioCraft consists of three core components: AudioGen, a tool that can generate various audio effects and soundscapes; MusicGen, which can create musical compositions and melodies from descriptions; and EnCodec, a neural network-based audio compression codec.

According to Meta, EnCodec, which we first discussed in November, has recently been improved and now enables “higher quality music generation with fewer artifacts.” AudioGen can produce sound effects such as a dog barking, a car horn honking, or footsteps on a wooden floor. MusicGen can create songs from scratch in a variety of genres based on descriptions like “Pop dance track with catchy melodies, tropical percussions, and upbeat rhythms, perfect for the beach.”

Meta has posted a number of audio samples on its website for evaluation. The results seem in line with their cutting-edge billing, but it is debatable whether they are high quality enough to replace professionally produced commercial sound effects or music.

Meta notes that while generative AI models focused on text and still images have received a lot of attention and are relatively easy for people to experiment with online, development of generative audio tools has lagged behind. The company writes that while there is some work available, it is “highly complicated and not very open, so people aren’t able to readily play with it.” Meta hopes that releasing AudioCraft under the MIT License will benefit the broader community by providing accessible tools for audio and musical experimentation.

“The models are available for research purposes and to further people’s understanding of the technology. We’re excited to give researchers and practitioners access so they can train their own models with their own datasets for the first time and help advance the state of the art,” Meta said.

Meta is not the only company experimenting with AI-powered audio and music generators. Among the more notable recent efforts, OpenAI introduced Jukebox in 2020, Google introduced MusicLM in January, and last December an independent research team built a text-to-music generation platform called Riffusion on top of Stable Diffusion.

Although none of these generative audio projects has attracted as much attention as image synthesis models, building them is no less difficult, as Meta observes on its website.

Notably, Meta says that MusicGen was trained on 20,000 hours of music that is either owned by Meta or licensed specifically for this purpose. That claim comes amid controversy over the use of undisclosed and possibly unethical training material in image synthesis models such as Stable Diffusion, DALL-E, and Midjourney. On its face, this looks like a step in a more ethical direction, one that may reassure some of the people who have reservations about generative AI.
