
Shen Xiangyang has released a large model that can recognize everything.

Yongyi Deng · 2024-11-23 14:06
Talking about cards hurts feelings; without cards, there are no feelings.

Author | Yongyi Deng

Editor | Jianxun Su

"Talking about cards [GPUs] hurts feelings; without cards, there are no feelings."

At the 2024 IDEA Conference, Shen Xiangyang, founding president of IDEA and a foreign member of the US National Academy of Engineering, made this remark, at once down-to-earth and humorous.

At the same time, the remark reflects his distinctly optimistic view of the coming AI era.

Large models are no longer advancing as rapidly as they did after the release of ChatGPT. In the second year of humanity's push toward AGI (artificial general intelligence), the iteration of large language models has slowed, and AI applications and real-world deployment have moved to the center of the global conversation.

Shen Xiangyang nonetheless believes that, although GPT-5 has yet to appear, the growth of computing power remains on an encouraging trajectory: according to statistics from Epoch AI, the computing power demanded by large models is growing more than fourfold every year.

At this rate, Moore's Law, the rule of thumb that computing power doubles every 18 months, no longer describes the trend. Shen Xiangyang focused instead on Jensen Huang's (Huang Renxun's) "Huang's Law," which measures the growth of computing power by the compute used to train models. If computing power keeps growing fourfold a year, demand for it could rise roughly a million-fold within ten years. This law, however, still needs time to be verified.
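The ten-year projection is plain compounding. The short Python sketch below is illustrative, not a forecast (the helper name `growth_factor` is hypothetical). It compares ten years of fourfold annual growth with ten years of doubling every 18 months, the two rates quoted above.

```python
# Illustrative comparison of two compounding growth rules over ten years.
# Assumption: both rates compound cleanly per period; real-world growth is noisier.

def growth_factor(rate_per_period: float, periods: float) -> float:
    """Total multiplier after compounding `rate_per_period` for `periods` periods."""
    return rate_per_period ** periods

YEARS = 10

# Compute demand growing 4x per year, the rate cited from Epoch AI.
compute_demand = growth_factor(4.0, YEARS)

# Moore's Law as stated in the article: doubling every 18 months,
# i.e. 120 months / 18 months per doubling periods over ten years.
moore = growth_factor(2.0, YEARS * 12 / 18)

print(f"4x per year for {YEARS} years: {compute_demand:,.0f}x")
print(f"2x every 18 months for {YEARS} years: {moore:,.0f}x")
```

The first rule gives a factor of 4^10 = 1,048,576, and the second gives only about 102. That gap is why a fourfold annual pace leaves the 18-month doubling rule far behind.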

"For large models to progress, it is not only that the parameters expand and training becomes more demanding as the models grow; the amount of data must also increase. In a sense, the demand for computing power is quadratic in the number of parameters, which is an astonishing amount of compute," he said.
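The quadratic relationship can be seen with a standard approximation from the scaling-law literature. This approximation is not stated in the talk. It says training a dense model costs roughly six floating-point operations per parameter per training token. If training data is also scaled in proportion to parameter count, then compute grows with the square of model size. The proportionality constant k is an assumption made here for illustration.

```latex
C \approx 6\,N\,D \qquad (N = \text{parameters},\; D = \text{training tokens},\; C = \text{training FLOPs})
D = k\,N \;\Longrightarrow\; C \approx 6k\,N^{2}
N \to 2N \;\Longrightarrow\; C \to 6k\,(2N)^{2} = 4 \cdot 6k\,N^{2}
```

Under this assumption, doubling the parameter count roughly quadruples the training compute.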

Shen Xiangyang (photo: taken by the author)

"In the past few years, everyone has been talking about the 'three pillars of artificial intelligence,' and in fact they all come down to three things: computing power, algorithms, and data." At this conference, Shen Xiangyang tied the program together, spending three hours presenting IDEA's latest progress in these same three directions: algorithms, computing power, and data.

Vision models remain a research focus for IDEA. IDEA officially released DINO-X, its latest general-purpose large vision model, which offers genuine object-level understanding.

Unlike ordinary vision models, which are limited by their training data, DINO-X can perform open-world object detection. Without any user prompt, it can directly detect every object in an image, including rare long-tail objects (categories that each appear infrequently but are, taken together, highly varied).

This greatly expands the range of scenarios in which the model can be deployed.

Take "embodied intelligence," one of the AI industry's hottest topics throughout 2024. Demanding scenarios such as services for the visually impaired and service robots have until now relied heavily on large volumes of high-quality annotated data, which consume enormous manpower. With DINO-X, annotation companies can quickly produce large numbers of high-quality image annotations. Alternatively, they can give annotators automatically generated preliminary results, reducing the manual workload.

Image source: IDEA

Some fields already use traditional vision models widely, such as autonomous driving, intelligent security, and industrial inspection. There, DINO-X can serve as a powerful complement, coping with complex scenes and identifying objects that traditional models struggle to detect.

The IDEA team also launched an industry platform architecture. By combining a large-model foundation with general-purpose recognition technology, the model can learn as it is used, without retraining, supporting a wide variety of enterprise (B2B) application needs.

"Using one model to solve a million problems" is the core idea behind this release.

The mainstream approach relies on whole-image understanding. IDEA instead adds a language module on top of object-level understanding to reduce hallucination in large models. Combined with its in-house "Visual Prompt Optimization" method, this allows the model to be customized for a specific scenario from only a small number of samples, without changing the model structure or retraining the model.
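The article does not say how Visual Prompt Optimization works internally. The Python sketch below gives a rough intuition for customizing a model from a few samples without retraining. It uses a generic few-shot prototype technique, which is a stand-in and not IDEA's method. A few example embeddings are averaged into a "prompt" vector, and detected objects are matched against it by cosine similarity, while the model's weights are never touched. The toy embeddings, function names, and the 0.8 threshold are all hypothetical.

```python
import math

# Hypothetical illustration of a generic few-shot "prototype" technique.
# This is NOT IDEA's Visual Prompt Optimization. No model weights change;
# a handful of example embeddings act as the prompt for a new category.

Vector = list[float]

def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def build_prompt(examples: list[Vector]) -> Vector:
    """Average a few example embeddings into one unit-length prototype."""
    dim = len(examples[0])
    mean = [sum(e[i] for e in examples) / len(examples) for i in range(dim)]
    return normalize(mean)

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))

def match(objects: dict[str, Vector], prompt: Vector, threshold: float = 0.8) -> list[str]:
    """Return IDs of detected objects whose embeddings are close to the prompt."""
    return [oid for oid, emb in objects.items() if cosine(emb, prompt) >= threshold]

# Toy 3-dimensional embeddings standing in for object-level features.
few_shot_examples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
detected = {
    "obj_1": [0.85, 0.15, 0.05],  # resembles the examples
    "obj_2": [0.0, 0.2, 0.95],    # a different kind of object
}

prompt = build_prompt(few_shot_examples)
print(match(detected, prompt))
```

In this toy run only `obj_1` is matched. Supporting a new category means supplying new examples, not updating weights, which is the property described above.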

Image source: IDEA Research Institute

However, as models keep growing, high-quality data has become a bottleneck. "The development of artificial intelligence has now exhausted all the high-quality data in human society," Shen Xiangyang said.

Synthetic data has emerged in response. This time, the IDEA team also released its in-house context graph technology, which addresses the lack of diversity in earlier approaches to synthesizing text data. The technology works like a "guidance manual" for synthetic data: the graph serves as an outline that guides which contexts are sampled for synthesis.
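The article describes the graph only as an outline that guides context sampling. Below is a minimal Python sketch of that general idea, assuming a simple topic graph and random walks over it. The graph contents, function names, and prompt template are invented for illustration and are not IDEA's implementation. Each walk follows edges, so a sampled context stays coherent, while different walks reach different branches. This is one way a graph can add diversity to synthesized text.

```python
import random

# Hypothetical sketch of graph-guided context sampling for synthetic data.
# Graph, node names and prompt template are invented for illustration;
# this is not IDEA's context graph technology.

CONTEXT_GRAPH: dict[str, list[str]] = {
    "chemistry": ["reaction kinetics", "molecular properties"],
    "reaction kinetics": ["catalysts", "temperature effects"],
    "molecular properties": ["solubility", "toxicity"],
    "catalysts": [],
    "temperature effects": [],
    "solubility": [],
    "toxicity": [],
}

def sample_context(graph: dict[str, list[str]], start: str,
                   max_steps: int, rng: random.Random) -> list[str]:
    """Random walk from `start`; each step follows an edge, so topics stay related."""
    path = [start]
    node = start
    for _ in range(max_steps):
        neighbours = graph.get(node, [])
        if not neighbours:
            break  # reached a leaf topic, stop the walk
        node = rng.choice(neighbours)
        path.append(node)
    return path

def build_synthesis_prompt(path: list[str]) -> str:
    """Turn a sampled path into an instruction for a generator model (hypothetical template)."""
    return "Write a question and answer that connects: " + " -> ".join(path)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
for _ in range(3):
    path = sample_context(CONTEXT_GRAPH, "chemistry", max_steps=3, rng=rng)
    print(build_synthesis_prompt(path))
```

Using a seeded `random.Random` instance keeps runs reproducible. A real system would also need to deduplicate walks and weight edges, which this sketch leaves out.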

According to the team's experimental results, the approach continuously improves the large model's capabilities and outperforms the current state of the art (SOTA). It also cuts token consumption costs by 85.7% on average. A closed beta platform for the technology is now open, with services provided through an API.

In 2024, IDEA significantly accelerated the deployment of AI applications. Last year it showed a research-paper reading platform; this year it announced more application explorations in vertical fields.

On the prediction side, IDEA has developed several expert large models for chemistry. These models reach industry-leading levels in molecular property prediction and chemical reaction prediction.

AI models can also be applied to scientific research data to speed up its processing. IDEA recently released a multimodal large model for chemical literature. Building on it, IDEA and XtalPi jointly launched PatSight, a patent data mining platform. PatSight shortens the time needed to mine patented compounds in the pharmaceutical field from several weeks to one hour.

AI programming, another hot direction this year, has become the next major application area precisely because the intelligence of models keeps improving. The MoonBit team at the IDEA Research Institute demonstrated MoonBit, the programming module of its development platform. This cloud-native AI programming tool offers complete multi-backend support and cross-platform capabilities, can run directly on hardware, and supports the RISC-V architecture. It will officially open in December.

Models are also moving from software into hardware, creating more value in the real world.

Located in the Greater Bay Area, IDEA benefits from a distinctive hardware industry base. At this conference, IDEA announced three partnerships:

- With Tencent, it will build the Futian Laboratory in Futian District, Shenzhen, and in the Hetao Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone. The lab will focus on embodied intelligence for living environments.
- With Meituan, it will explore visual intelligence technology for drones.
- With BYD, it will expand intelligent applications of industrial robots.

The "low-altitude economy" is another area IDEA emphasized. IDEA released the "White Paper on the Development of the Low-Altitude Economy 3.0." It also initiated the OpenSILAS Innovation Consortium together with 17 founding members, aiming to build an open, shared, technologically advanced, and continuously iterating system and platform.

This year's event is already the fourth IDEA Conference. Over these four years, AI has moved from a 1.0 era dominated by computer vision (CV) to a 2.0 era dominated by generative AI. It has also raised important questions for the next era, such as AI governance. Society today may need to think harder about one question: how can we coexist better with AI?

"Can AI be turned from the greatest driver of economic growth into the greatest source of human well-being? Colleagues at the IDEA Research Institute, both in technology R&D and in industrial deployment, must consider this question as artificial intelligence develops," Shen Xiangyang said.
