AI Is Evolving Fast: A New Project Has Been Open-Sourced!


2023-06-13 08:06

Hi everyone, I'm Jack.

In the past couple of days, Meta open-sourced Audiocraft, an AI music generation tool, and it quickly took off.

Let's talk about it today!

I. Audiocraft

Let's look at the results first:

Input text description:

80s pop track with bassy drums and synth

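In plain terms:

a 1980s pop track with bass-heavy drums and synthesizers

Output: (audio clip embedded in the original post)

Input text description: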

90s rock song with loud guitars and heavy drums

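In plain terms:

a 1990s rock song with loud guitars and heavy drums

Output: (audio clip embedded in the original post)

It also supports uploading an existing piece of music as a reference for generation.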

Audiocraft is built mainly around a generative model called MusicGen. MusicGen is a single-stage autoregressive Transformer, trained on top of a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
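To make these numbers concrete, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above (32 kHz audio, 4 codebooks, 50 Hz); the function and constant names are made up for illustration and are not part of Audiocraft.

```python
# Figures quoted in the post; the names below are illustrative, not Audiocraft APIs.
SAMPLE_RATE_HZ = 32_000   # audio sample rate the tokenizer works at
FRAME_RATE_HZ = 50        # each codebook emits 50 tokens per second
NUM_CODEBOOKS = 4         # codebooks per frame


def token_budget(duration_s: float) -> dict:
    """Return rough frame and token counts for a clip of the given length."""
    frames = int(duration_s * FRAME_RATE_HZ)
    return {
        "frames": frames,                                      # time steps
        "tokens": frames * NUM_CODEBOOKS,                      # discrete codes in total
        "samples_per_frame": SAMPLE_RATE_HZ // FRAME_RATE_HZ,  # raw samples per frame
    }


if __name__ == "__main__":
    print(token_budget(8))
```

For an 8-second clip this gives 400 frames and 1,600 tokens, with each frame standing in for 640 raw audio samples. That compression is why a language-model-style Transformer can handle music at all.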

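In total, the Meta team used 20,000 hours of music.

The audio was resampled to 32 kHz, and every track comes with a matching text description.

Besides the training data, there is also an evaluation dataset, MusicCaps.

MusicCaps consists of 5,500 ten-second music clips, each paired with a text description written by professional musicians.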

For the audio encoding stage, the team experimentally compared four different approaches.
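The post does not name the four approaches. As I understand the MusicGen paper, they are different ways of interleaving the codebooks before feeding them to the Transformer, and one of them is a "delay" pattern in which codebook k is shifted right by k steps. The toy sketch below assumes that reading; it is an illustration only, not Audiocraft's implementation, and `delay_interleave` is a hypothetical name.

```python
from typing import List, Optional

PAD = None  # marks positions where a codebook has no token yet


def delay_interleave(codes: List[List[int]]) -> List[List[Optional[int]]]:
    """Shift codebook k right by k steps (toy 'delay' pattern).

    codes[k][t] is the token of codebook k at frame t.
    The result has the same number of rows and T + K - 1 columns.
    """
    num_codebooks = len(codes)
    num_frames = len(codes[0])
    total_steps = num_frames + num_codebooks - 1
    grid: List[List[Optional[int]]] = [
        [PAD] * total_steps for _ in range(num_codebooks)
    ]
    for k, row in enumerate(codes):
        for t, token in enumerate(row):
            grid[k][t + k] = token  # row k starts k steps late
    return grid


if __name__ == "__main__":
    for row in delay_interleave([[1, 2, 3], [4, 5, 6]]):
        print(row)
```

With two codebooks and three frames, the first row keeps its positions and the second row starts one step later. At each step the model can then predict one token per codebook in parallel, while later codebooks still see the earlier codebooks for the same frame.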

For the Transformer itself, three autoregressive models of different sizes were trained: 300M, 1.5B, and 3.3B parameters.
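If you are wondering which size might fit on your GPU, a rough estimate of the weight footprint helps. The sketch below simply multiplies the parameter counts quoted above by the bytes per parameter. It ignores activations, the text encoder, and the EnCodec model, so treat the results as a lower bound; the names are illustrative.

```python
# Parameter counts quoted in the post; everything else is a rough assumption.
PARAM_COUNTS = {"300M": 300e6, "1.5B": 1.5e9, "3.3B": 3.3e9}
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_size_gb(num_params: float, dtype: str = "float16") -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, count in PARAM_COUNTS.items():
        fp16 = weight_size_gb(count, "float16")
        fp32 = weight_size_gb(count, "float32")
        print(f"{name}: ~{fp16:.1f} GB in float16, ~{fp32:.1f} GB in float32")
```

By this estimate the weights alone take roughly 0.6 GB, 3 GB, and 6.6 GB in half precision. Actual memory use during generation will be higher.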

If you're interested, you can try it online directly:

https://huggingface.co/spaces/facebook/MusicGen

Of course, you can also deploy it offline. The project is here:

https://github.com/facebookresearch/audiocraft

Deploying it locally isn't complicated either.

1. First, create a virtual environment:

conda create -n musicgen python=3.9

2. Install the dependencies:

# Best to make sure you have torch installed first, in particular before installing xformers.
# Don't run this if you already have PyTorch installed.
pip install 'torch>=2.0'
# Then proceed to one of the following
pip install -U audiocraft  # stable release
pip install -U git+https://git@github.com/facebookresearch/audiocraft#egg=audiocraft  # bleeding edge
pip install -e .  # or if you cloned the repo locally
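
After installing, a quick sanity check can save some head-scratching. The sketch below uses only the Python standard library to confirm that the interpreter is at least 3.9 (the version used in the conda command above) and that the packages can be found. `check_environment` is a helper name I made up, not something Audiocraft ships.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "audiocraft")) -> dict:
    """Report whether Python is >= 3.9 and whether each package is importable."""
    report = {"python_ok": sys.version_info >= (3, 9)}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key}: {'OK' if ok else 'MISSING'}")
```

Run it inside the musicgen environment; anything reported as MISSING means the corresponding pip step needs another look.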

3. Then download the model weights:

You can find these in the project's README.

4. Use the API to generate music:

import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained('melody')
model.set_generation_params(duration=8)  # generate 8 seconds.
wav = model.generate_unconditional(4)    # generates 4 unconditional audio samples
descriptions = ['happy rock', 'energetic EDM', 'sad jazz']
wav = model.generate(descriptions)  # generates 3 samples.

melody, sr = torchaudio.load('./assets/bach.mp3')
# generates using the melody from the given audio and the provided descriptions.
wav = model.generate_with_chroma(descriptions, melody[None].expand(3, -1, -1), sr)

for idx, one_wav in enumerate(wav):
    # Will save under {idx}.wav, with loudness normalization at -14 db LUFS.
    audio_write(f'{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)

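II. Summary

AI can already generate images, speech, and music. We are one step closer to generating video directly.

OK, that's all for today.

I'm Jack. See you next time!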


Source: qq
