
From Technological Implementation to Philosophical Speculation: The Key Themes in the Development of AI Agents

Silicon Valley 101 | 2025-06-20 13:29

Perhaps you have already experienced the convenience of an AI assistant at work, or read news about agents automatically handling complex tasks. Agents are among the hottest buzzwords in Silicon Valley's AI scene, and new products are springing up like mushrooms after rain, penetrating our work and lives at unprecedented speed. IDC predicts that 2025 will see a wave of large-scale deployment of AI agents, and their potential to reshape standardized workflows through intelligent task handling is highly anticipated.

However, a series of key questions urgently needs answering: What can current AI agents actually do for us, and where are their limits? How can start-ups stand out while the Silicon Valley giants are laying out their plans? More importantly, when AI turns from a tool into a "team member", how will the human-machine relationship be redefined, and where will humans' unique value show itself?

In this issue, "Silicon Valley 101" has invited seven guests from fields including AI research, business analysis, and psychology. Our guest researcher Sophie, founder of "Entrepreneurs of Life", leads an in-depth discussion of AI agents from four perspectives: user experience, technical challenges, business logic, and social impact.

Here are the highlights of this discussion:

01 User Perspective: The Difference Between Ideal and Reality

Chapter 1.1 What is an Agent? From Tool to Partner

Sophie: Let's start with the most basic question: What exactly is an AI agent? We've found that different people actually have different opinions. First, we'd like to hear the view of a typical geek.

Yage is an AI application scientist at Samsara, a large logistics software company, and an active member of the open-source AI community. An open-source Cursor-related project he modified on GitHub has received more than five thousand stars. In Yage's view, an AI agent must meet three necessary conditions.

Yage: In my view, an agent must meet three necessary conditions. First, it must be able to use tools, such as calling a search engine or a programming language. Second, it must be able to make independent decisions: after receiving a task, it can break it down on its own and call tools with the right parameters in the right order to reach the final goal. Third, its decision-making must be a multi-step, self-iterating dynamic process, meaning it dynamically decides the next action based on the result of the previous step rather than following a static, fixed workflow. For example, it can decide whether to end a search or change the search terms based on the search results.

In short, I think a system can only be called an agent when it meets the three conditions of tool calling, independent decision-making, and multi-step iteration.
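To make these three conditions concrete, here is a minimal sketch of such an agent loop in Python. The helpers `call_llm`, `web_search`, and `run_calculator` are hypothetical placeholders rather than any particular product's API; the point is only the control flow in which the model decides, a tool runs, and the observation feeds back into the next decision.

```python
# Minimal sketch of the three conditions: tool calling, independent
# decision-making, and multi-step self-iteration. All helpers are
# hypothetical placeholders, not a real product's API.
import json

def web_search(query: str) -> str:
    """Hypothetical tool: return raw search results for a query."""
    ...

def run_calculator(expression: str) -> str:
    """Hypothetical tool: evaluate a math expression."""
    ...

TOOLS = {"web_search": web_search, "calculator": run_calculator}

def call_llm(history: list[dict]) -> dict:
    """Hypothetical model call. Returns either
    {"action": "tool", "tool": <name>, "input": <args>} or
    {"action": "finish", "answer": <text>}."""
    ...

def run_agent(task: str, max_steps: int = 10) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):                # multi-step iteration
        decision = call_llm(history)          # independent decision
        if decision["action"] == "finish":
            return decision["answer"]
        tool = TOOLS[decision["tool"]]        # tool calling
        observation = tool(decision["input"])
        # Feed the observation back so the next decision depends on it,
        # e.g. "end the search" vs. "change the search terms".
        history.append({"role": "assistant", "content": json.dumps(decision)})
        history.append({"role": "tool", "content": observation})
    return "Step limit reached without a final answer."
```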

Source: pexels

Sophie: Yage's definition from a technical perspective is very clear. Xinqi, by contrast, a guest without an AI technical background who works as a data strategy manager at a multinational company and hosts a podcast in her spare time, focuses on the collaboration between humans and AI when she defines agents.

Xinqi: From a collaboration perspective, the relationship between a human and an agent should be a true client-contractor relationship, not the relationship you have with a day laborer. With a day laborer, we still have to define the problem, break down the key steps, and review what gets delivered. A true client-contractor relationship means the contractor takes over the entire process end to end: it actively steps in at critical points, offers decision-making suggestions, and, once it has the top-level instructions, executes automatically and delivers a finished product rather than a semi-finished one.

Sophie: Xinqi mentioned in the interview that the agent products she currently uses, while having many strengths, still fall short of her ideal of an agent.

Let's first hear in which situations agent products surprised her, and which features or experiences impressed her most.

Yage, for his part, told us that he relies on three types of agents, from coding at work to putting his child to bed afterwards. Let's listen.

Chapter 1.2 The Diverse Surprises of Agents

Yage: The AI agents I use regularly fall into three main categories: coach-type, secretary-type, and partner-type.

Coach-type: OpenAI's Deep Research and ChatGPT's o3, for example, are mainly for gathering research information and supporting deep thinking. I use them as windows into areas I'm less familiar with.

Secretary-type: Tools like the currently popular Manus and Devin (both subscription products) are suited to relatively simple, non-immersive tasks. When I want to put my child to bed, I ask Manus to adapt the story of "Snow White" and weave in educational points such as "good nutrition", then call a TTS tool to generate and play an audio file. Secretary-type tools handle this kind of task well, so I can put my child to bed with a customized voice (a rough sketch of this pipeline follows below, after the three categories).

Partner-type: For real software development I prefer Cursor, Windsurf, and similar tools, because they support and encourage frequent interaction and let me lead the whole process. We first discuss the design, then I have it build the individual components, and finally I, as the architect, assemble those components and check the result to make sure the development goal is met. That fits a professional, high-quality way of working.
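The secretary-type bedtime-story workflow described above is essentially a two-step pipeline: have a model rewrite the story around the desired educational theme, then hand the text to a TTS tool. A rough sketch, with `generate_text` and `text_to_speech` as hypothetical helpers rather than the tools Manus actually calls:

```python
# Sketch of the "secretary-type" bedtime-story pipeline: rewrite a story
# around an educational theme, then voice it with TTS. The two helpers
# are hypothetical placeholders, not Manus's internal tools.

def generate_text(prompt: str) -> str:
    """Hypothetical LLM call returning the adapted story text."""
    ...

def text_to_speech(text: str, out_path: str) -> None:
    """Hypothetical TTS call writing an audio file to out_path."""
    ...

def bedtime_story(theme: str = "good nutrition") -> str:
    prompt = (
        "Retell the story of Snow White for a young child, "
        f"and gently weave in the theme of {theme}."
    )
    story = generate_text(prompt)
    out_path = "bedtime_story.mp3"
    text_to_speech(story, out_path)  # play this file at bedtime
    return out_path
```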

Source: pixabay

Sophie: CreateWise is an AI tool whose beta test I took part in. If you simply upload an audio track, it can directly output a fully edited audio file. It can even suggest which parts should be cut, clone the voice with AI, and restructure sentences to make them clearer. Changed sentences are highlighted so users can easily compare the edits. After my test I suggested to the development team that I'd like the option to accept edits sentence by sentence, because I like some edits and dislike others. They have since raised the priority of this feature and shipped it.

In addition, based on the user's choices during editing, CreateWise can go straight to its "Text Creation and Advertising" module and generate corresponding copy for different platforms. For example, it can produce show notes, quotes, and title suggestions for audio platforms; for platforms such as YouTube or Instagram, it can also generate directly publishable content based on details like the video aspect ratio.

Source: CreateWise

Sophie: This product, focused on podcast production, impressed Xinqi with its deep understanding of the workflow and its targeted optimization of each phase. General-purpose agent products, meanwhile, earned praise for everyday tasks from Kolento, a third-semester applied psychology student at New York University.

Kolento: Let me describe a few different scenarios.

I've been using Manus a lot recently. In the newly released Genspark, I was particularly impressed by the Super Agent mode, because it can finish tasks I'd rather not do myself. One difference between the two is the user experience: the visual polish of Manus's UI/UX impressed me most. Genspark, for its part, has a feature for combining large numbers of images, pieces of content, and links. So far I've mainly used it for travel planning, but its visual output isn't as strong as Manus's, which has somewhat reduced my motivation to keep using it.

Essentially, both Genspark and Manus let you share a link to an agent run and replay the conversation: others can follow the whole process and even continue the conversation from that context. Both can also call many tools, though I have only limited knowledge of what's behind them. I've heard that Manus may rely not on MCP but on CodeAct, but I don't know exactly which algorithms or tools Genspark uses internally. Both plan and break down tasks well and call many different tools, though the tools they use may differ slightly. Since Genspark has worked quite well for travel planning so far, I suspect it has some preset travel-planning tools.
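The MCP-versus-CodeAct rumor Kolento mentions is unconfirmed, but the contrast between the two styles is easy to illustrate: in an MCP-style setup the model emits a structured tool call that the host executes, while in a CodeAct-style setup the model emits a snippet of executable code as its action. The tool names and fields below are illustrative, not Manus's or Genspark's actual schemas.

```python
# Illustrative contrast between two agent action formats.
# Tool names and fields are made up for the example.

# MCP-style: the model returns a structured call; the host runs the tool.
mcp_style_action = {
    "tool": "search_flights",
    "arguments": {"origin": "SFO", "destination": "JFK", "date": "2025-07-01"},
}

# CodeAct-style: the model returns code; the host executes it in a sandbox,
# so a single action can chain several tools and add its own control flow.
codeact_style_action = """
results = search_flights(origin="SFO", destination="JFK", date="2025-07-01")
cheapest = min(results, key=lambda f: f["price"])
print(cheapest["flight_number"], cheapest["price"])
"""
```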

There is also an interesting difference: Genspark has some features Manus may not have yet. For example, the "Call For Me" function can make phone calls for you and book hotels, which was a pleasant surprise.

Source: Manus

For programming, I like Replit Rapid best. I've used Cursor and Windsurf before, but Replit Rapid feels more like an agent and can take on more roles.

For academic work, I've recently used Elicit, but it doesn't meet my definition of an agent.

Chapter 1.3 What Users Complain About

Sophie: Of course, AI has not only bright spots but also pain points. Before we get into the specific problems, I'd like to share an interesting observation from Yage. As more and more agent products are released, his view of the problems has been changing rapidly. Many things he used to complain about, such as the inability to call tools in complex tasks, the overly strong "AI flavor" in generated text, or context windows that were too short, have improved greatly in newly released products. The problems and weaknesses users run into today will likewise be the next focus for agent developers. After hearing the complaints, we'll also hear some developers' thoughts and answers. Let's start with Yage's problems.

Yage: The ability of current AI models to follow instructions has improved significantly, but it is still not good enough. Take GPT-4.1 as an example. If I ask it to write the first three chapters of a five-chapter outline and then continue with the last two, and explicitly state that the first three chapters must not end with "To be continued", the model still closes with lines like "To be continued" or "We'll continue next time. Is there anything else you'd like to write?" I tried various prompt-engineering approaches, but the problem persisted. In the end I solved it with a bit of reverse thinking: I programmatically strip the "To be continued" string that the model keeps appending, which solves the problem perfectly. If models could follow instructions perfectly, such workarounds wouldn't be necessary.
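Yage's workaround amounts to a simple post-processing step: instead of fighting the model through prompts, strip the trailing filler from the output programmatically. Something along these lines, where the phrases to match are illustrative rather than his exact list:

```python
import re

# Remove trailing "to be continued"-style filler that the model keeps
# appending, rather than trying to prompt it away. The phrase list is
# illustrative, not Yage's actual patterns.
FILLER_PATTERNS = [
    r"to be continued\.?",
    r"we'?ll continue next time\..*",
    r"is there anything else you'?d like to write\??",
]

def strip_trailing_filler(text: str) -> str:
    cleaned = text.rstrip()
    changed = True
    while changed:  # keep stripping in case several filler lines are stacked
        changed = False
        for pattern in FILLER_PATTERNS:
            new = re.sub(pattern + r"\s*$", "", cleaned, flags=re.IGNORECASE)
            if new != cleaned:
                cleaned = new.rstrip()
                changed = True
    return cleaned
```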

Source: pexels

Another problem is that many AI products still use AI "just for the sake of AI". For example, Claude's Computer Use and OpenAI's Operator always demo how they can book a flight for you by entering your credit card number and other details and then clicking the booking button. But the time users actually spend on booking a flight isn't spent entering that information; it goes into the decision-making process of...