Stability AI Unveils Stable Audio, a Breakthrough in AI-Powered Sound Creation

In Brief

Today, Stability AI officially introduced Stable Audio, its first product dedicated to music and sound generation.

Users can enter text prompts to generate audio tracks of a chosen duration.

The underlying model was trained on music and accompanying metadata from the AudioSparx music library.

Stability AI, the generative AI firm known for Stable Diffusion, has launched Stable Audio, its first AI-driven product focused on sound and music generation. The new offering is aimed at musicians who want to create unique samples and audio tracks. According to the company, users provide a text prompt and receive an audio piece of their desired length.

"For instance, one could input a phrase like 'Post-Rock, Guitars, Drum Kit, Bass, Strings, Euphoric, Up-Lifting, Moody, Flowing, Raw, Epic, Sentimental, 125 BPM' and request a 95-second track," Stability AI stated in a blog post.

The company also shared a video demonstrating how prompt-based music generation works.

"We aspire that Stable Audio will enable music lovers and creators to easily craft new content with AI at their side, paving the way for endless creativity and innovation," remarked Emad Mostaque, CEO of Stability AI.

Stability AI says its base model was trained on music data and metadata from AudioSparx. The firm states that the Stable Audio model can render 95 seconds of stereo audio at a 44.1 kHz sample rate in under a second when run on an NVIDIA A100 GPU.

According to Stability AI, Stable Audio is a latent diffusion model built from several components similar to those in Stable Diffusion: a variational autoencoder (VAE), a text encoder, and a U-Net-based conditioned diffusion model.

In this design, the VAE compresses stereo audio into a compact, noise-resistant, and invertible lossy latent encoding. Working in that latent space allows faster generation and training than operating on raw audio samples directly.
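To give a sense of scale for the raw-audio figures quoted above, the following minimal sketch counts the samples in a 95-second stereo clip at 44.1 kHz and compares that with the number of frames a latent encoder would produce. The `downsampling_factor` parameter is purely hypothetical: the article does not state how strongly Stable Audio's VAE compresses the signal, so any value passed in is an assumption for illustration only.

```python
SAMPLE_RATE_HZ = 44_100   # sample rate quoted by Stability AI
CHANNELS = 2              # stereo
DURATION_SECONDS = 95     # clip length quoted by Stability AI


def raw_sample_count(duration_seconds, sample_rate_hz=SAMPLE_RATE_HZ, channels=CHANNELS):
    """Total number of individual samples across all channels."""
    return duration_seconds * sample_rate_hz * channels


def latent_frame_count(duration_seconds, downsampling_factor, sample_rate_hz=SAMPLE_RATE_HZ):
    """Number of latent time steps if an encoder shortens the time axis.

    downsampling_factor is a hypothetical value chosen by the caller;
    it is not a published property of Stable Audio.
    """
    samples_per_channel = duration_seconds * sample_rate_hz
    # Ceiling division: a partially filled last frame still counts as a frame.
    return -(-samples_per_channel // downsampling_factor)


if __name__ == "__main__":
    print("raw samples:", raw_sample_count(DURATION_SECONDS))
    # 1024 is an arbitrary illustrative factor, not a figure from the article.
    print("latent frames:", latent_frame_count(DURATION_SECONDS, 1024))
```

A 95-second stereo clip contains 8,379,000 raw samples, which is why a diffusion model that operates on a much shorter latent sequence can train and generate faster.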

According to a research report, the latent diffusion model is conditioned on text metadata as well as on each audio file's duration and starting point. This enables control over both the content and the length of the generated audio. To connect text prompts to the model, Stable Audio uses a frozen text encoder from a CLAP model that was trained on the dataset itself.

A free version of Stable Audio is available with limited features, letting users create and download tracks up to 20 seconds long. Alternatively, a 'Pro' subscription extends track lengths to 90 seconds and is aimed at commercial use.
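The duration and starting-point conditioning mentioned earlier can be illustrated with a small sketch. It assumes a model that always processes a fixed-length window: during training the window is cut from a longer track and tagged with where it starts and how long the whole track is, and at generation time the user's requested length is passed in the same fields. The names `TimingCondition`, `seconds_start`, `seconds_total`, and `WINDOW_SECONDS` are illustrative assumptions, not confirmed details of Stability AI's implementation.

```python
import random
from dataclasses import dataclass

# Assumed fixed window length, borrowed from the 95-second figure in the article.
WINDOW_SECONDS = 95.0


@dataclass
class TimingCondition:
    seconds_start: float  # where the window begins within the source track
    seconds_total: float  # full length of the source (or requested) track


def training_condition(track_seconds, rng):
    """Pick a random window from a training track and describe its timing."""
    latest_start = max(0.0, track_seconds - WINDOW_SECONDS)
    start = rng.uniform(0.0, latest_start)
    return TimingCondition(seconds_start=start, seconds_total=track_seconds)


def inference_condition(desired_seconds):
    """Describe a request for a track of the desired length, starting at zero."""
    if not 0 < desired_seconds <= WINDOW_SECONDS:
        raise ValueError("desired length must fit inside the fixed window")
    return TimingCondition(seconds_start=0.0, seconds_total=desired_seconds)


if __name__ == "__main__":
    rng = random.Random(0)
    print(training_condition(200.0, rng))
    print(inference_condition(30.0))
```

Under these assumptions, a request for a 30-second piece would still produce a full window of audio, with the model expected to fill the remainder with silence that can be trimmed afterward.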

Stable Audio is the latest in a series of AI releases from Stability AI this year. In August, the company introduced a Japanese language model and Stable Chat, which aims to rival ChatGPT.

Cindy is a contributing journalist at Metaverse Post, focusing on web3, NFTs, the metaverse, and AI, with an emphasis on conducting interviews with key industry figures in Web3. She has interviewed over 30 C-level executives and shared their insights with readers. Originally from Singapore, Cindy currently resides in Tbilisi, Georgia. She earned a Bachelor's degree in Communications & Media Studies from the University of South Australia and has a decade of experience in journalism and writing.