February 10, 2023

Issue 1728: Google AI Tool Creates Music from Written Descriptions

6 minutes

This week, Google researchers published a paper describing results from an artificial intelligence (AI) tool built to create music. The tool, called MusicLM, is not the first AI music tool to launch. But the examples Google provides demonstrate musical creative ability based on a limited set of descriptive words. AI shows how complex computer systems have been trained to behave in human-like ways.

Tools like ChatGPT can quickly produce, or generate, written documents that compare well with the work of humans. ChatGPT and similar systems require powerful computers to operate complex machine-learning models. The San Francisco-based company OpenAI launched ChatGPT late last year. Developers train such systems on huge amounts of data to learn methods for recreating different forms of content. For example, computer-generated content could include written material, design elements, art or music. ChatGPT has recently received a lot of attention for its ability to generate complex writings and other content from just a simple description in natural language.

Google engineers explain the MusicLM system this way: First, a user comes up with a word or words that describe the kind of music they want the tool to create. For example, a user could enter this short phrase into the system: “a continuous calming violin backed by a soft guitar sound.” The descriptions entered can include different music styles, instruments or other existing sounds.

Several different music examples produced by MusicLM were published online. Some of the generated music came from just one- or two-word descriptions, such as “jazz,” “rock” or “techno.” The system created other examples from more detailed descriptions containing whole sentences. In one example, Google researchers include these instructions to MusicLM: “The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds…”

In the resulting recording, the music seems to keep very close to the description. The team said that the more detailed the description is, the better the system can attempt to produce it. The MusicLM model operates similarly to the machine-learning systems used by ChatGPT. Such tools can produce human-like results because they are trained on huge amounts of data. Many different materials are fed into the systems to permit them to learn complex skills to create realistic works. In addition to generating new music from written descriptions, the team said the system can also create examples based on a person’s own singing, humming, whistling or playing an instrument.

The researchers said the tool “produces high-quality music...over several minutes, while being faithful to the text conditioning signal.” At this time, the Google team has not released the MusicLM models for public use. This differs from ChatGPT, which was made available online for users to experiment with in November. However, Google announced it was releasing a “high-quality dataset” called MusicCaps. It contains more than 5,500 pairs of music and written descriptions prepared by professional musicians. The researchers took that step to assist in the development of other AI music generators.

The MusicLM researchers said they believe they have designed a new tool to help anyone quickly and easily create high-quality music selections. However, the team said it also recognizes some risks linked to the machine learning process. One of the biggest issues the researchers identified was “biases present in the training data.” A bias can arise when the data includes too much of one kind of material and not enough of another. The researchers said this raises a question “about appropriateness for music generation for cultures underrepresented in the training data.”

The team said it plans to continue to study any system results that could be considered cultural appropriation. The goal would be to limit biases through more development and testing. In addition, the researchers said they plan to keep improving the system to include lyrics generation, improved text conditioning and better voice and music quality.
