After letting Manus work as an intern at 36Kr for a day, we'd like to give it a little extra bonus.
Text | Deng Yongyi
Editor | Su Jianxun
(In view of the hype and controversy caused by Manus, 36Kr would like to specifically state that this article is by no means a promotion. In fact, it took us quite a lot of effort to even get an invitation code...)
There's no need to elaborate on the sensation Manus has caused. By now, everyone has seen video clips of Manus diligently searching for information, building PowerPoint decks, and coding small browser games. Its replay-style sharing feature shows at a glance how much time the agent saves, and that has helped Manus break into the mainstream so quickly.
After we finally got an invitation code, 36Kr's editorial team held a discussion. To get a better sense of what Manus can actually do, we decided to hire it as an intern, assign it tasks following our normal workflow, and see whether it could handle them.
OK, the invitation code is in, and Manus, 36Kr's newest intern, is on the job!
Source: Manus
First, anyone who wants to hire this "intern" may have to accept one reality up front: this "colleague" crashes a lot.
Manus's service is currently very unstable. During our hands-on test over the weekend, 36Kr's first impression was sheer frustration... Tasks frequently got stuck. Manus runs on a virtual machine in the cloud, and that machine often had to be reset manually before work could continue.
This test was carried out during a period when Manus was crashing frequently.
The interface kept showing "Connection lost" or "Encountered a serious problem", and we had to keep resetting or starting new sessions...
The occasional hallucinations were also very real, although we couldn't always tell whether they were hallucinations or genuine service notices. At one point Manus said it needed two hours for upgrades and maintenance. As soon as we nudged it, though, it went straight back to work...
The unpredictable Manus
Manus bills itself as "the first general-purpose Agent". This means it does not take the path of a vertical, domain-specific expert. Its strength lies in handling a broad range of general tasks. Manus's official website lists several categories:
Manus official website. Source: Manus
Agents differ from large models. A large model on its own is just a single dialogue window: information goes in and information comes out. An agent gives that model the ability to act, so it can flexibly call various tools to complete a task.
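The following minimal Python sketch shows the loop that turns a model into an agent. It is an illustration under stated assumptions and does not show how Manus is implemented. `fake_model` is a hard-coded stand-in for a large model, and the tools `search` and `write_file` are hypothetical placeholders. The point is the control flow. The model chooses an action, the program runs the matching tool, and the tool's result goes back into the history for the next decision.

```python
from typing import Callable


# Hypothetical tools: a real agent would wrap a search engine, a browser, a file system, etc.
def search(query: str) -> str:
    return f"results for '{query}'"


def write_file(text: str) -> str:
    return f"saved {len(text)} characters"


# Registry the agent can choose from; the keys are the "actions" the model may name.
TOOLS: dict[str, Callable[[str], str]] = {"search": search, "write_file": write_file}


def fake_model(task: str, history: list[tuple[str, str]]) -> tuple[str, str]:
    """Stand-in for a large model: returns (action, argument).

    A real agent would ask an LLM to pick the next step based on the task and history.
    """
    if not history:
        return "search", task                  # step 1: gather information
    if len(history) == 1:
        return "write_file", history[-1][1]    # step 2: act on what was found
    return "finish", history[-1][1]            # step 3: stop


def run_agent(task: str, max_steps: int = 10) -> list[tuple[str, str]]:
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):                 # hard cap so a confused model cannot loop forever
        action, arg = fake_model(task, history)
        if action == "finish":
            break
        observation = TOOLS[action](arg)       # the "ability to act": call a tool
        history.append((action, observation))  # feed the result back for the next decision
    return history
```

Calling `run_agent("AI news")` first runs a search. It then passes the search result to `write_file` and stops, so the returned history contains two tool calls. A real agent would replace `fake_model` with a call to an LLM. It would still need a step cap like `max_steps`.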
36Kr drew on our editorial team's everyday work and assigned Manus tasks in order of increasing difficulty.
Please note that every result below comes from a single run. Apart from resetting the machine after crashes mid-task, 36Kr did not repeat any test.
Proofreading and Organizing
First, we asked Manus to complete relatively basic proofreading and organizing work.
We gave Manus the transcript of an earlier interview recording (about 28,000 words) to clean up. It had three core requirements. It had to "organize the transcript word for word without condensing it", remove verbal tics, and correct passages whose meaning was unclear.
Previously, this took at least a dozen rounds of back-and-forth with a model. We manually corrected errors in the transcript and fed it to the model segment by segment. Then we fed the output back to the model to check for factual errors.
Manus clearly collapsed those steps into one. Assigning a task and then simply waiting to review the result felt more than ten times better than working with a chatbot.
Source: Manus
Manus's flaws are just as obvious, however. Its context window is too short, and it still hallucinates. Many complex tasks were aborted because they used up too many tokens before they could be completed.
In the proofreading and polishing task, the final document was heavily condensed. Manus essentially output only the last part of the interview, a little over 3,800 words in total, and the earlier parts were largely lost. In the portion it did output, however, the tone and completeness of information were quite good.
Manus is performing a long-text task
This is probably because its reasoning and collaboration mechanisms are still immature. The model can only produce its output in one pass, which forces it to condense. Its memory mechanism may also be poorly implemented. Memory can be thought of as a "warehouse" where the model temporarily stores information. It is what lets a chatbot remember what you said earlier.
Earlier research has pointed out that memory fades as time passes or as the number of task steps grows. On top of that, an agent consumes at least two orders of magnitude more tokens than a single chatbot. One agent practitioner told 36Kr they estimated that a complex Manus task could consume on the order of one million tokens. Hard technical problems such as hierarchical management and compression of memory still leave much room for improvement.
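Both problems can be illustrated with a toy example. The Python sketch below is hypothetical and is not Manus's memory design. `LayeredMemory` keeps the most recent items verbatim and squeezes older ones into a lossy summary. Plain truncation stands in for the model-written summaries a real system might use. This is one simple way early details can "fade". `cumulative_input_tokens` assumes that every step re-sends the entire history to the model, and shows why agents burn through tokens so fast under that assumption.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredMemory:
    """Toy two-level memory: recent items kept verbatim, older ones compressed.

    Hypothetical design for illustration; truncation stands in for LLM-written summaries.
    """
    window: int = 3          # how many recent items stay verbatim
    keep_chars: int = 10     # how much of an evicted item survives compression
    recent: list[str] = field(default_factory=list)
    summary: list[str] = field(default_factory=list)

    def add(self, item: str) -> None:
        self.recent.append(item)
        if len(self.recent) > self.window:
            oldest = self.recent.pop(0)
            # Lossy compression: detail from early steps is discarded here.
            self.summary.append(oldest[: self.keep_chars])

    def context(self) -> str:
        # What the model would actually see: compressed past plus verbatim recent items.
        return " | ".join(self.summary + self.recent)


def cumulative_input_tokens(steps: int, tokens_per_step: int) -> int:
    """Total tokens read if every step re-sends the whole history so far.

    Step k reads k * tokens_per_step tokens, so the total grows quadratically with steps.
    """
    return sum(k * tokens_per_step for k in range(1, steps + 1))
```

Take the made-up figures of 50 steps, each adding 800 tokens. `cumulative_input_tokens(50, 800)` returns 1,020,000, which is in the same million-token range the practitioner described. The numbers are purely illustrative. The takeaway is that re-reading a growing context makes total consumption grow much faster than the number of steps.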
News Follow-up and Writing
For ordinary chatbots, output length has always been a major problem. In 36Kr's earlier tests, a model with a 128K context window could generally produce only about 1,000 to 2,000 words in a single output. Beyond that, it struggled to keep the information intact without condensing it excessively.
36Kr first asked Manus to do the most basic news follow-up work. This tests several abilities: monitoring the daily news, screening for reliable sources, judging which stories matter most, and finding related material to supplement and follow up on them.
Source: Manus
Manus started by studying the examples and searching for relevant news. When it tried to access Reuters, however, a CAPTCHA stopped it and it asked a human to take over. When 36Kr took over, we found that Reuters had identified Manus as a bot and blocked it.
Source: Manus
Manus took about 9 minutes to complete the task and output the 5 AI news items most worth following, all from reliable, authoritative sources. In the end, Manus chose to write a news item about itself... Ha.
Manus writes a news item about itself
We would rate Manus's news copy at about 70 out of 100. The text is fluent and covers all the main points. Compared with the reference template, however, this draft is softer and has a stronger AI flavor.
After we suggested some revisions, the second version was much better.
Basically, it can be published directly after minor adjustments
Taking it up a notch, we also gave Manus a prompt asking it to write a long article modeled on 36Kr's in-depth reporting column "Shen Ke":
This week, the robot company Zhiyuan, founded by "Zhi Hui Jun" (real name Peng Zhihui), announced that it will launch new products. Please search for the historical process of Peng Zhihui and Zhiyuan Robot, and write an article in 36Kr's style. The theme is to look back on the history of Zhiyuan Robot and reflect on the growth of this company and its significance in the technology industry. The length should be about 5,000 words. You can refer to the style of the in-depth reporting column "Shen Ke".
Please note that the sentences should be easy to understand for ordinary people and avoid piling up professional terms.
Manus collected the material on its own. During the writing stage, it wrote the article in segments and then combined them, successfully completing the long piece. Here is the output:
Writing an in-depth long article about Zhiyuan Robot
Manus's performance at in-depth writing was average. The resulting article read more like a compilation of data. Its word choice and sentence construction were acceptable, although the style still resembled an advertorial. Manus still needs to develop its taste for high-quality content.
Data Analysis and Visualization
Research-type tasks are another of Manus's strengths.
In essence, Manus uses a multi-agent architecture. Simply put, it can break a complex task into subtasks, such as data cleaning, feature engineering, and model training. Different agents then process these subtasks in parallel, which significantly speeds up data analysis.
However, consistency across agents has to be managed well. If it is not, the local decisions made by individual agents can lead to fairly serious deviations in the overall result.
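The fan-out and the consistency risk can both be sketched in a few lines of Python. This is a hypothetical illustration, not Manus's architecture. `research_subagent` stands in for a sub-agent, the subtask names are placeholders, and the prices are dummy values rather than real API prices. The sub-agents run concurrently via `concurrent.futures.ThreadPoolExecutor`, and each makes its own local choice of unit. The coordinator's normalization step keeps those local decisions from skewing the merged table.

```python
from concurrent.futures import ThreadPoolExecutor

# Conversion factors to a shared convention: price per 1M tokens.
TO_PER_1M = {"per_1M_tokens": 1.0, "per_1K_tokens": 1000.0}


def research_subagent(subtask: str) -> dict:
    """Hypothetical sub-agent. Returns dummy values, not real prices.

    Each sub-agent picks its own unit: the kind of local decision
    that distorts the overall result if nobody reconciles it.
    """
    if subtask == "provider_a":
        return {"subtask": subtask, "price": 2.0, "unit": "per_1M_tokens"}
    return {"subtask": subtask, "price": 0.003, "unit": "per_1K_tokens"}


def coordinator(subtasks: list[str]) -> list[dict]:
    # Fan out: independent subtasks run concurrently, one worker each.
    with ThreadPoolExecutor(max_workers=max(1, len(subtasks))) as pool:
        results = list(pool.map(research_subagent, subtasks))  # map preserves input order

    # Consistency step: convert every result to the shared unit before merging.
    # An unknown unit raises KeyError instead of slipping silently into the table.
    return [
        {"subtask": r["subtask"], "price_per_1M": r["price"] * TO_PER_1M[r["unit"]]}
        for r in results
    ]
```

Without the conversion, the merged table would put 2.0 next to 0.003 and suggest one price is hundreds of times lower than the other. With it, `coordinator(["provider_a", "provider_b"])` reports both as prices per million tokens: 2.0 and 3.0.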
36Kr asked both Manus and OpenAI's Deep Research to create a "table showing the price trends of large-model APIs over the past two years".
OpenAI's Deep Research, by contrast, is a single agent trained end to end. One centralized agent is responsible for every task, and both decision-making and execution are centralized. Its advantage is tight module integration: it is easy to manage, and its output quality is relatively reliable.
Source: Manus
Manus took a relatively long time, about three hours, to generate an interactive web page. The interactivity and table styling were quite good. Its data accuracy still fell short of Deep Research, which is built specifically for research, though the gap was not large.
Source: Deep Research
Deep Research can't output charts for now, but in terms of content quality, Manus can't yet match it.
Creative Tasks: It Can Do Them, but Its Aesthetic Sense Is Questionable
We also gave Manus some more challenging tasks.