AI in the Artistic Process: Resonating in Harmony, Exploring Boundaries, and Discovering New Possibilities | WISE2024 King of Business
Environments keep shifting and the times keep moving forward. "Kings of Business" ride the currents of their era, persist in creating, and seek new sources of momentum. Against the backdrop of the major transformation under way in the Chinese economy, the WISE2024 King of Business Conference set out to discover truly resilient "Kings of Business" and to explore the "right things" to do in China's business wave.
From November 28 to 29, the two-day 36Kr WISE2024 King of Business Conference was held in Beijing. A flagship event in Chinese business, the WISE Conference has now reached its twelfth edition, bearing witness to the resilience and potential of Chinese business in an ever-changing era.
AI's impact on industries continues to deepen while also opening up new possibilities. In the art world, there are already large-model products built specifically for creative work, and creators have begun learning how to coexist with AI. How does AI assist, and even influence, artistic creation? What problems have come up along the way? How do front-line art practitioners use AI technology, and what concerns do they have about it? Shi Yi, manager of Vibration Studio; Jia Shuo, vice president of Qutoutiao Technology; Yang Sheng, CEO of Bule Technology and a well-known game producer; and Zeng Yixiong, an independent musician and songwriter, discussed these questions in depth. The following is a record of the discussion:
Shi Yi: I'm Shi Yi. I'm very glad to discuss this topic with all of our teachers today. We just heard Teacher Zeng Yixiong perform a song, which was wonderful. Teacher Jia Shuo from Qutoutiao Technology has also come well prepared: he used their product "Tianpule" to generate three AI music pieces based on the song Teacher Zeng Yixiong just sang. Before we officially begin, let's listen to these three pieces together.
These are three pieces in different styles, and I think each is quite distinctive. They capture the typical characteristics of each genre very well. Teacher Zeng Yixiong, how do you feel after hearing these AI pieces adapted from your lyrics?
Zeng Yixiong: When I first heard them, I was very surprised. Setting the lyrics aside and speaking purely about melody, I think they are very mature works. Both the melodies and the motifs are fully developed, and the three styles are very distinct.
Shi Yi: Since this music is AI-generated, it differs from the music you make. What does your usual music-making process look like?
Zeng Yixiong: I write songs fairly casually. When I'm running, taking a walk, or on a plane, a short melody might come to me. It's not a complete song, just a couple of lines of lyrics and a fragment of melody that I want to develop. When I get home, I pick up an instrument and write. Once the whole song is written, I go to the studio and work out with the musicians and the recording engineers how to record it in full.
Shi Yi: Have you used AI music tools to assist you in this process?
Zeng Yixiong: Not yet. I just use voice memos and text memos.
Shi Yi: When these pieces were played just now, I had a particularly strong reaction. I have used many AI music generators to create songs and listened to the results. But one shortcoming I've always found regrettable is the vocals: they sound very electronic and unnatural. The music generated by Tianpule has improved considerably on this point, and the vocals sound much more human. I think that's quite remarkable. I'd like to ask Teacher Jia Shuo: how did Qutoutiao Technology achieve this?
Jia Shuo: Hello everyone! I'm Jia Shuo from Qutoutiao Technology. Shi Yi's question ties in closely with today's theme, which is "the right thing." Every problem has two layers: whether you have the ability to solve it and whether you have the willingness to solve it. Put simply, the answer is that by making effective use of current large-model training techniques, the problem of unnatural vocals can be overcome. But how do ability and willingness come into play here? First, you need the ability. The development team needs a complete large-model architecture design and a full set of training capabilities covering data processing, pre-training, post-processing, and so on, plus the underlying funding and computing resources. That is a relatively high bar.
The second point is willingness. Whether a voice sounds natural is a meaningful question from an artistic perspective. But once it is turned into a research problem, it is not easy to quantify and benchmark. Can a researcher or algorithm engineer solve it and publish a paper on it? I think quite a few teams have the ability, but whether they are truly willing to solve it depends on how much they care about and respect music as an art. Fortunately, our product and engineering team at Qutoutiao happens to be a group of people with an artistic temperament.
Jia Shuo, Vice President of Qutoutiao Technology
Shi Yi: Many people on your team have a music background, right?
Jia Shuo: Yes, half of the team members have a strong music background, for example Grade 10 in piano or Grade 8 in guzheng. Even though they are working on AI, they treat it as their greatest creative work. Take the problem of unnatural vocals mentioned earlier: it is hard to solve, but precisely because it is the right thing to do, our team is willing to tackle it. Judging by the final results, I'd say our current output is competitive in China's AI music field.
Shi Yi: For example, vocals used to sound less natural. Why did that happen, and what specific methods did you use to solve it? Was it only through algorithms, or were there other tuning methods as well?
Jia Shuo: This extends into the broader development of AI music, which has gone through several stages. People are more interested in it this year because over the past two years, AIGC large-model technology has expanded into music. With large models, it has become genuinely possible to capture the patterns and naturalness of many aspects of music very well. In the previous generation of AI, people turned the experience and rules they had summarized into "dogmas," fed those "dogmas" to the AI, and had the AI execute them as it understood them. The result was that unnatural vocals and unreasonable music theory were more likely to appear. The most magical part of large models is that you only need to let them grow freely, like raising a "child." What really needs to be done is to give it the resources to grow and to broaden its horizons. There's no need to lecture it or take such a "fatherly" tone. Once it has seen and learned more, it will learn by itself.
Shi Yi: So it grows and learns on its own, and can solve problems that couldn't be solved before.
Another point I'm quite curious about: besides the vocals, music has many components, including the instrumental parts. When we analyze or experience music, we also sense its style and rhythm. A good work may have a complex musical structure or fuse many different styles. Can AI handle that? For example, different instruments have different parts: how do you organize those parts well and arrange each melodic line sensibly according to the instrument's timbre and playing techniques? Some music also has a strong rhythmic feel, such as Black music, jazz, and Latin music, and those are very complex elements. How does current AI music technology handle this, and can it handle it well?
Jia Shuo: This is a very representative question. With the previous generation of AI technology, feeding all kinds of complex experience and rules into a model would have been quite complicated. Today, these are no longer technical obstacles. But I'm sure the reason Shi Yi asked is that, after trying many products on the market, you find that listeners with some musical training can still spot flaws fairly easily. So I see this as a matter of priorities. Right now, everyone is prioritizing the problems that sound obviously fake at first listen, so that the output doesn't immediately sound like AI. As for the flaws you only notice on a second or third listen, I'm fairly confident they will be solved as the technology keeps developing over time.
Shi Yi: Actually, it's not just music; AI will play a role in many fields, such as film and television. Teacher Yang Sheng, I saw a video you posted online earlier, an AI promotional video for the movie "Venom." It was very distinctive because it was in an ink-wash style, which I didn't expect before watching. I'm curious why your team chose an ink-wash style for this promotional video, and what role AI played in the process.
Yang Sheng: The first reason for using the ink-wash style was to get around certain restrictions in Hollywood. Hollywood has set a very large framework around all AI creations: any AI product you submit must have full authorization from the artist whose work it draws on. Our competitor is SORA. Ink-wash styles, such as that of the artist Qi Baishi, are not currently registered in Hollywood. So we took a relatively clever approach. If there is a gap between the two sides, I think much of the imagery of the East could become our shared intellectual property, and we could create many unique styles. I also think this style may be where we can win visually. While we and SORA race forward side by side, if we stick to Eastern forms such as ink-wash painting, printmaking, and paper-cut art, first, we get around the major restrictions of the PGA, and second, artist authorization becomes very simple, because what we really don't lack is folk artists. That's why I told Teacher Jia offstage that using ink-wash will make our work increasingly Eastern. Conversely, if we follow the same realistic path as the West, we may not get far; instead, we can find a smarter way to solve the problem.
Shi Yi: What exactly did AI do in this process?
Yang Sheng: Let's first look at the hardest problem AI currently faces: stability. If we generate an AI subject, say Teacher Jia, it may change from shot to shot. So we fix the first and last frames of each shot, just like drawing a comic strip. We draw the person before and after throwing a punch, give the AI the start and the end, and let it guess what happens in between. Our shooting ratio reaches 100:1, which is extremely high. It's not that the generated results are bad; it's that we can pick the most suitable one from among them.
Shi Yi: So it handles the in-between work, and we decide which result is most suitable.
Yang Sheng: Yes. In the traditional animation workflow, this is the job of the in-betweening team, which is the most expensive and now the hardest to find.
Shi Yi: Beyond visuals, in the film and television industry, how were soundtracks handled before AI music came along?
Yang Sheng: Honestly, most film soundtracks in the past were made like this. The director sits with the editor. The editor says, "Here are several reference tracks; what do you think?" The director says, "Let's copy this one." Or a big-name composer says, "I have a few new pieces recently; let's listen to them." The director takes the hard drive to the editor and asks, "Can you adjust the tone to match this piece?" The editor finds it close enough. The picture can't be changed, so the tone is adjusted to fit the picture. As a result, film and television soundtracks are often very utilitarian, with commercial considerations coming first. In the film and television industry, a soundtrack brief often amounts to "follow Pacific Rim" or "follow Let the Bullets Fly."
Shi Yi: Will you use AI tools to produce music in the future?
Yang Sheng: That touches on overall film licensing, which I think Teacher Jia will also discuss going forward. As our film and television works enter Hollywood and the global market for sales, including the European market and the Cannes Film Market, we worry that a new regulation will emerge about the proportion of AI in a film: how much of it uses artists and how much is AI. This problem needs to be solved. So we are discussing how to collaborate with outstanding artists on AI creation in a way that can be carried out effectively while staying legally compliant. We divide artists' attitudes toward AI into three schools. Teacher Jia's team communicates with artists very frequently. And you, who are hosting the three of us today, which school do you belong to?
Shi Yi: Do I belong to the Advent School? I'm not sure.
Yang Sheng: Different schools will lead us down completely different paths. I used to be a thorough member of the Resistant School, and then I switched sides.