AI is stealing white-collar jobs. OpenAI is investing $1 billion to teach AI to work. Your perfect successor is about to take over.
Tech giants like Anthropic and OpenAI are planning to invest $1 billion a year to teach AI to work like humans. They not only give AI reinforcement learning environments (RL environments, often called "gyms"), but also let it "learn from" experts in various fields. An OpenAI executive has predicted that, in the future, "the entire economy" will to some extent turn into an "RL machine."
Is AI taking over white-collar jobs?
In May this year, Anthropic CEO Dario Amodei said that AI could eliminate half of entry-level white-collar jobs within one to five years and push the US unemployment rate to 10% to 20%.
This unprecedented job replacement has sparked widespread concern.
Some commenters believe white-collar jobs could disappear on a large scale within the next three years.
Others think it is not just junior or entry-level positions that AI threatens, but management roles as well.
Anthropic, OpenAI, and top global AI labs are accelerating the arrival of this "AI storm."
They bring large models into the office and let them learn various applications, such as:
Salesforce's customer relationship management software;
Zendesk's customer support software;
Cerner's health records application, etc.
The goal is to teach AI to handle some tedious and complex tasks in white-collar jobs.
They have prepared billions of dollars in investment for this AI training.
Investing $1 billion a year to teach AI to work like humans
It is reported that tech giants like Anthropic and OpenAI have set aside on the order of $1 billion a year as a "special fund" to teach AI to work like humans:
Anthropic's leadership plans to invest $1 billion over the next year in cloned enterprise applications known as reinforcement learning environments (RL environments).
OpenAI also plans to invest $1 billion in data-related areas this year, including payments to human experts and RL environment costs, and predicts that this figure will increase to $8 billion by 2030.
These AI "education costs" are still rising.
If these methods succeed, they could help OpenAI and Anthropic break through bottlenecks they have recently hit with traditional training techniques.
They could also open up new monetization paths, such as selling workplace software, offering AI agents that take over users' computers and operate applications on their behalf, and building AI-powered versions of popular enterprise applications.
Dario Amodei, the CEO of Anthropic, once called such products "virtual collaboration partners," saying that they can work side by side with humans and use the same applications as humans.
However, it is still very difficult to achieve this.
Anshul Bhagi, who heads Turing's frontier data work, pointed out many of the complexities involved.
For example, teaching AI to handle customer relationship management means not only teaching it to search Salesforce for prospects, identify the most promising leads, and send follow-up emails to schedule initial meetings, but also teaching it how to use applications such as LinkedIn, Calendly, and Gmail.
To verify that tasks are completed, Turing breaks each overall task into smaller steps and writes a rubric (a set of evaluation criteria) to check whether the AI model executed each step correctly.
Taking the Salesforce application as an example, the checkpoints that this set of evaluation criteria may include are:
Whether the model has filtered the Salesforce database by the last contact date;
Whether an email with a Calendly link has been sent;
Whether the lead status of potential customers has been updated to "re-engaged," etc.
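To make the rubric idea concrete, here is a minimal Python sketch of how such checkpoints could be scored against a snapshot of the simulated app's final state. The field names and functions are illustrative assumptions, not Turing's actual implementation.

```python
from typing import Callable

# All state field names below are hypothetical, chosen only for illustration.

def filtered_by_last_contact(state: dict) -> bool:
    """Did the model filter the lead database by last contact date?"""
    return state.get("applied_filter") == "last_contact_date"

def calendly_email_sent(state: dict) -> bool:
    """Was an email containing a Calendly link sent?"""
    return any("calendly.com" in e.get("body", "") for e in state.get("sent_emails", []))

def lead_marked_reengaged(state: dict) -> bool:
    """Was the lead's status updated to 're-engaged'?"""
    return state.get("lead_status") == "re-engaged"

RUBRIC: list[tuple[str, Callable[[dict], bool]]] = [
    ("filtered by last contact date", filtered_by_last_contact),
    ("sent email with Calendly link", calendly_email_sent),
    ("lead status set to re-engaged", lead_marked_reengaged),
]

def score_episode(final_state: dict) -> float:
    """Return the fraction of rubric checkpoints the model satisfied."""
    return sum(check(final_state) for _, check in RUBRIC) / len(RUBRIC)

# Example: a run that sent the email and updated the status but skipped the filter.
example_state = {
    "applied_filter": None,
    "sent_emails": [{"body": "Let's meet: https://calendly.com/acme/intro"}],
    "lead_status": "re-engaged",
}
print(score_episode(example_state))  # ~0.67
```

A partial score like this can serve as a graded reward signal rather than a simple pass/fail.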
Although the above work is still in its early stages, various AI labs seem ready to invest a large amount of money in it.
According to analysts, RL environments currently account for less than 10% of Anthropic's post-training budget (the fine-tuning and optimization that follows a model's initial training).
Some investors say that, at the current pace, RL environments could take up a markedly larger share of post-training budgets as early as next year.
One of the factors is the increasing cost of hiring human experts.
Labelbox, which provides expert services to companies like OpenAI, said in July that about 20% of its expert contract hours are billed at more than $90 per hour, and nearly 10% at more than $120 per hour.
Labelbox expects that within the next year and a half, rates for these experts will rise to $150 to $250 per hour.
"RL environments," building the "real world" for AI to learn
According to Turing CEO Jonathan Siddharth, the company has built more than 1,000 RL environments, including replicas of Airbnb, Zendesk, and Microsoft Excel.
Turing plans to sell these RL environments to customers, along with 100 to 500 example tasks for AI models to attempt in the simulated applications and methods for verifying whether the models completed those tasks correctly.
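As a rough illustration, here is a hedged sketch of what a gym-style interface around a cloned app might look like, assuming a simple dict-based state and a rubric-style verifier like the one sketched earlier. The class and method names are assumptions made for this example, not Turing's product.

```python
from typing import Callable

class EnterpriseAppEnv:
    """Illustrative gym-style wrapper around a cloned enterprise application."""

    def __init__(self, app_snapshot: dict, task_prompt: str,
                 verifier: Callable[[dict], float]):
        self.initial_snapshot = dict(app_snapshot)  # starting state of the cloned app
        self.task_prompt = task_prompt              # e.g. "re-engage stale leads"
        self.verifier = verifier                    # rubric score in [0, 1]
        self.state = dict(app_snapshot)

    def reset(self) -> dict:
        """Restore the cloned app to its initial state and return the first observation."""
        self.state = dict(self.initial_snapshot)
        return {"prompt": self.task_prompt, "screen": self.state}

    def step(self, action: dict) -> tuple[dict, float, bool]:
        """Record one UI action (click, type, send) and return (observation, reward, done)."""
        self.state.setdefault("action_log", []).append(action)
        done = action.get("type") == "finish"
        reward = self.verifier(self.state) if done else 0.0
        return {"screen": self.state}, reward, done
```

A policy model would call reset(), issue step() actions until it declares the task finished, and receive the rubric score as its reward.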
In recent months, competitors such as Scale, Surge, Mercor, and Invisible Technologies have entered the RL environment business, and some startups now specialize in selling RL environments to large AI developers.
Edwin Chen, the founder and CEO of Surge, believes that the methods used by OpenAI and Anthropic to improve models "mirror the way humans learn."
For a model, an RL environment is like being placed in the real world.
Beyond RL environments, AI developers also use reinforcement learning to teach models new skills or domain knowledge by having them study carefully curated solutions to hard problems, such as competitive programming problems or doctoral-level biology questions.
Training AI by "learning from" experts in every field
As the capabilities of AI models improve, the people hired by data annotation companies have shifted from master's and doctoral students to working professionals with many years of experience in niche fields.
Take a look at a recent list of experts hired by Turing:
A data scientist from NASA
A chemist working on a project for the Department of Energy
A radiology resident
A vice president working in private equity
Their job is to complete real-world tasks in specific applications so that AI can observe and learn from them.
Bhagi gave an example: an AI company may want to teach a model how changing the tax-rate assumption in an Excel file affects the rest of a discounted cash flow (DCF) analysis.
To teach this, Turing first asks its contractors to solve the DCF problem and produce a single answer that can be used to check accuracy, such as a stock price.
Developers can then have the model attempt the same DCF task dozens of times, keep the attempts that arrive at the same stock price as the human expert, and use those as correct examples to train the model.
In this way, model developers can quickly obtain more correct examples of this task to train AI.
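A minimal sketch of that filtering loop, under the assumption of a placeholder model call: sample many attempts at the DCF task and keep only those whose final stock price matches the contractor's answer. Here run_model_on_dcf_task is a stand-in for a real model call, and the expert price is made up.

```python
import random

EXPERT_STOCK_PRICE = 42.17   # single answer produced by the human contractor (made up)
TOLERANCE = 0.01             # allow small rounding differences

def run_model_on_dcf_task(seed: int) -> tuple[str, float]:
    """Placeholder for a model call: returns a worked trace and its final stock price."""
    rng = random.Random(seed)
    price = round(EXPERT_STOCK_PRICE + rng.choice([0.0, 0.0, 1.5, -2.3]), 2)
    return (f"trace for attempt {seed}", price)

def collect_correct_examples(n_attempts: int = 50) -> list[str]:
    """Keep only attempts whose answer matches the expert's, for use as training examples."""
    correct = []
    for seed in range(n_attempts):
        trace, price = run_model_on_dcf_task(seed)
        if abs(price - EXPERT_STOCK_PRICE) <= TOLERANCE:
            correct.append(trace)
    return correct

print(f"kept {len(collect_correct_examples())} correct traces out of 50 attempts")
```

The expert supplies one verified answer; the matching traces then become cheap additional training data without further human labeling.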
Now, top AI companies like OpenAI are actively collecting similar examples from all walks of life, including medicine and law.
A senior OpenAI executive has privately said that, as AI absorbs professional knowledge across fields, they expect the "entire economy" to become, to some extent, an "RL machine."
AI may then be trained on records of how professionals in various fields handle their daily work on their devices.
Once AI learns professional knowledge in various fields and how to use workplace applications, the next step may be to gradually take over human jobs in all walks of life.
Are you ready?
Reference:
https://www.theinformation.com/articles/anthropic-openai-developing-ai-co-workers?rc=epv9gi
This article is from the WeChat official account "New Intelligence Yuan," author: New Intelligence Yuan, published by 36Kr with authorization.