
Sensational report in Nature: AI enters the "optics" era, painting a color Van Gogh-style image for the first time

Academic Headlines · 2025-08-28 14:44
Optical generation, full of new opportunities.

Fast, energy-efficient, and scalable inference for generative AI models is one of the most pressing challenges currently facing the AI industry.

Today, the AI industry has taken another step forward by introducing "light" into the field of AIGC: for the first time, new, never-before-seen images with specific features have been generated based entirely on the physical laws governing the system's hardware.

A research team from the University of California, Los Angeles, has successfully achieved the optical generation of monochromatic and multicolor images of handwritten digits, fashion items, butterflies, faces, and artworks (e.g., in the style of Vincent van Gogh). The overall performance is comparable to that of generative models based on digital neural networks.

The corresponding paper, titled "Optical generative models", was published in the journal Nature.

Link to the article: https://www.nature.com/articles/s41586-025-09446-5

The research team stated that this optical generative model could open new avenues for energy-efficient and scalable inference tasks and further tap the potential of optics and photonics for AIGC. This fusion of optical systems and machine-learning methods could also find applications in areas such as Augmented Reality (AR) and Virtual Reality (VR).

In a concurrently published News & Views article, Daniel Brunner, a researcher at the FEMTO-ST Institute of the French National Centre for Scientific Research (CNRS), noted that this result is of great technical and scientific significance and is "an important step towards the use of unconventional physical systems for the construction of generative computing models."

Link to the article: https://www.nature.com/articles/d41586-025-02523-9

According to Brunner, thanks to the natural advantages photons offer for information processing, such as processing data across an entire three-dimensional volume simultaneously, the optical generative model "also has the potential to generate three-dimensional images."

Brunner stated that it may be necessary to "construct a model with an optical encoder and an optical decoder" and to develop a multi-layer decoding system that exploits more complex optical phenomena, so that future optical generative models can become more powerful and flexible.

However, as Brunner put it, "the future value of this research depends on whether it can actually be implemented," and there is still a long way to go.

"Ideally, one should use scalable integrated-circuit technologies while avoiding the time- and energy-consuming data pre-processing that encoding requires on current digital hardware. Even after decades of research in electronics, optical physical computing, and their fusion, this remains a very challenging task."

Optical generative model: Let the light "paint"

In recent years, digital generative models have advanced to the point where they can design diverse high-quality images, human-like natural language, new musical works, and even new proteins. These generative AI technologies play an important role in applications such as Large Language Models (LLMs), embodied intelligence, and AIGC.

However, as generative models have been applied successfully, their size has also grown rapidly, consuming ever more power and storage and significantly lengthening inference times. Their scalability and carbon footprint are therefore coming under increasing scrutiny.

Although various methods already exist to reduce model size and energy consumption and to increase inference speed, new approaches to energy-efficient and scalable generative AI models are still urgently needed.

Against this background, the research team proposed an optical generative model inspired by diffusion models: the encoder is implemented digitally, while the decoder consists of optical components.

In this architecture, a shallow, fast digital encoder first maps random noise to phase patterns. These patterns serve as optical generation seeds for the target data distribution. A jointly trained, reconfigurable decoder based on free-space propagation then processes these seeds entirely optically to generate new images that match the target data distribution.

Notably, this optical generative model consumes almost no computing resources during image generation, apart from the illumination power and the shallow digital encoder that produces the optical generation seeds from random noise.

Figure: Schematic representation of the optical generative model.
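The encoder half of this split can be sketched in a few lines. The layer sizes, random weights, and tanh non-linearity below are illustrative assumptions, not the paper's actual encoder; the point is simply that a shallow digital network maps a noise vector to a phase-only pattern that an SLM could display.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; the trained encoder in the paper differs.
NOISE_DIM, SEED_SIZE = 64, 28

# A shallow encoder (one hidden layer) with random weights,
# standing in for the trained digital encoder.
W1 = rng.normal(0.0, 0.1, (NOISE_DIM, 256))
W2 = rng.normal(0.0, 0.1, (256, SEED_SIZE * SEED_SIZE))

def encode(noise):
    """Map a random noise vector to a phase-only optical generation seed."""
    h = np.tanh(noise @ W1)           # single non-linear hidden layer
    raw = h @ W2
    phase = np.mod(raw, 2 * np.pi)    # wrap to a valid SLM phase in [0, 2*pi)
    return phase.reshape(SEED_SIZE, SEED_SIZE)

seed = encode(rng.normal(size=NOISE_DIM))
```

Because the encoder is this shallow, computing a seed is cheap; the heavy lifting of image synthesis happens in the optics.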

The research team proposed two optical image-generation paths: snapshot and iterative optical generation.

In the snapshot optical generative model, the optical generation of an image or an output snapshot can occur at any time by accessing one of the previously calculated optical generation seeds. The desired image synthesis is based entirely on the propagation of light in free space and is performed by an optimized, fixed state of the diffraction decoder.

Figure: Snapshot optical generative model
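The free-space propagation that performs the snapshot decoding can be simulated numerically with the angular spectrum method, a standard scalar-diffraction model. In the toy sketch below, the wavelength matches the 520 nm laser described later in the article, but the pixel pitch, distances, and random phase patterns are illustrative stand-ins for the optimized seed and decoder state.

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a complex optical field a distance z through free space
    using the angular spectrum method (scalar diffraction)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)              # spatial frequencies (cycles/m)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1.0 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = (2.0 * np.pi / wavelength) * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * z) * (arg > 0)       # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Toy snapshot decode: seed phase -> propagate -> static decoder -> propagate -> intensity.
wavelength, dx, z = 520e-9, 8e-6, 0.05        # 520 nm; pitch and distance are illustrative
rng = np.random.default_rng(1)
seed_phase = rng.uniform(0.0, 2.0 * np.pi, (128, 128))     # optical generation seed
decoder_phase = rng.uniform(0.0, 2.0 * np.pi, (128, 128))  # fixed decoder state

field = angular_spectrum(np.exp(1j * seed_phase), wavelength, dx, z)
field = angular_spectrum(field * np.exp(1j * decoder_phase), wavelength, dx, z)
image = np.abs(field) ** 2                    # intensity recorded by the sensor
```

With random phases this produces a speckle pattern; in the real system, the seed and decoder phases are jointly optimized so that the recorded intensity forms a meaningful image.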

In the iterative optical generative model, at each time step the previously generated image, with noise added, is fed into the optical system. After wave propagation, the multicolor information is recorded and, with a preset amount of noise added, used for the next optical iteration. In the final time step, the output intensity is recorded by an image sensor array to produce the final image. Once trained, the iterative optical generative model reconstructs the target data distribution step by step from a Gaussian noise distribution during blind inference.

Figure: Iterative optical generative model
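The iterative scheme is essentially a reverse-diffusion loop run through the optics. In the toy sketch below, the optical pass is replaced by a fixed random linear map followed by an intensity (absolute-value) readout, and the linear noise schedule is an illustrative choice; neither stands in for the trained physical system.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 28, 8                     # image size and number of iterations (illustrative)

# Stand-in for one optical pass (display, propagate, record intensity):
# a fixed random linear map plus |.|, not a physical model.
W = rng.normal(0.0, 1.0 / N, (N * N, N * N))

def optical_pass(x):
    return np.abs(W @ x.ravel()).reshape(N, N)

x = rng.normal(size=(N, N))      # start from Gaussian noise
for t in range(T, 0, -1):
    y = optical_pass(x)          # record the intensity output for this step
    if t > 1:
        sigma = t / T            # preset, decreasing noise schedule (illustrative)
        x = y + sigma * rng.normal(size=(N, N))
    else:
        x = y                    # final step: the sensor reads the clean intensity

final_image = x
```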

In addition, the research team showed that complex-domain mappings can be realized with the iterative optical generative model by implementing the intensity-to-phase conversion directly on a spatial light modulator (SLM) and combining it with the photoelectric conversion at the image sensor, although performance and image diversity are lower than with an iterative optical generative model that uses a digital encoder.

The light actually "painted" numbers and Van Gogh

To demonstrate the snapshot and multicolor optical generative models, the researchers built a free-space hardware system operating in the visible spectrum. A laser with a wavelength of 520 nm is collimated and used to illuminate the SLM uniformly. The SLM displays the pre-computed phase patterns produced by the shallow digital encoder, i.e., the optical generation seeds.

These encoded phase patterns modulate the light field, which passes through a beam splitter and is then processed by a second SLM acting as a fixed, static decoder. For each optical generative model, an optimized state of the decoder surface is determined, and the same optical system can generate images for different target distributions simply by switching that state. At the output of the snapshot optical generative model, the intensity of the generated image is captured by an image sensor.

Depending on the training dataset, this optical generative model can output images of people, buildings, or plants in the style of Vincent van Gogh, and can also generate handwritten digits from 0 to 9 or images of fashion items. The digit and fashion images are monochrome, while the Van Gogh-style images are in color. Directly generating new images with specific features using a machine-learning model based entirely on the physical laws of the hardware had not been possible before.

Figure: Numerical and experimental results of the multicolor optical generative model for creating elaborate artworks in the style of Van Gogh, compared with a digital teacher diffusion model running 1,000 iterations.

When random seeds are input into the model, although the generated images are different, they still belong to the same category as the training data. For example, a model trained with portraits in the style of Van Gogh outputs a series of portrait images in the style of Van Gogh, and different random seeds can generate people with or without hats.

The researchers compared their experimental results with simulations and with fully digital generative models. They found that, given the same random seed, the images produced by the optical generative model are essentially consistent in quality with those of the digital models.

Optical generation: Many new opportunities

The research team demonstrated snapshot optical image generation from noise patterns using a diffractive network architecture. Their framework can optically generate diverse images from noise and exhibits a highly desirable "creative" snapshot image-generation capability that goes beyond the scope of previous research.

In addition, optical generation for different data distributions can be realized by simply reconfiguring the diffraction decoder into a new optimized state without changing the architecture or the physical hardware. This flexibility of the optical generative model is of great significance for areas such as Edge Computing, Augmented Reality, Virtual Reality displays, and various entertainment applications.

The research results also show that knowledge about the target distribution can be distilled under the guidance of a teacher denoising diffusion probabilistic model (DDPM). By simulating the diffusion process, the iterative optical generative model can learn the target distribution in a self-supervised manner, avoid mode collapse, and generate results more diverse than the original dataset. The iterative optical generative model also has the potential to drop the digital encoder and produce different outputs for different data distributions.
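Teacher-student distillation of this kind can be illustrated with a minimal stand-in: below, a simple function plays the role of the pretrained DDPM teacher, and a single linear map stands in for the encoder plus simulated optics. The student is trained to reproduce the teacher's output for each noise seed. Everything here is a schematic assumption, not the paper's training code.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 16

def teacher_sample(z):
    """Stand-in for a pretrained DDPM teacher mapping a noise seed to an
    image in [0, 1]. (A real teacher runs ~1000 denoising steps.)"""
    return 0.5 + 0.5 * np.tanh(z)

# Student: a single linear map standing in for encoder + simulated optics.
W = rng.normal(0.0, 0.1, (N * N, N * N))

def student(z):
    return (W @ z.ravel()).reshape(N, N)

z_test = rng.normal(size=(N, N))
loss_before = np.mean((student(z_test) - teacher_sample(z_test)) ** 2)

lr = 0.5
for _ in range(1000):
    z = rng.normal(size=(N, N))
    diff = student(z) - teacher_sample(z)           # match the teacher per seed
    W -= lr * np.outer(diff.ravel(), z.ravel()) / (N * N)  # SGD on the MSE

loss_after = np.mean((student(z_test) - teacher_sample(z_test)) ** 2)
```

After training, the student's distillation loss on a held-out seed drops well below its initial value, mirroring how the optical model inherits the teacher's target distribution without ever seeing the raw dataset directly.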

Naturally, the optical generative model still faces some general challenges. One is possible misalignment and physical defects in the optical hardware or the system configuration. Another is the limited phase bit depth that can be realized on the light modulators or surfaces used to physically represent the optical generation seeds and the decoding layer.

To address these challenges, the relevant limitations can be incorporated directly into the training process, so that the numerical optimization better matches the constraints and performance of the physical hardware. This strategy yields a significant performance improvement over training methods that ignore the bit-depth limitation.

An important finding of this analysis is that a relatively simple decoder with only three discrete phase levels is already sufficient to generate images. This opens up the possibility of replacing the decoder with a passive, thin surface.
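Folding the bit-depth limitation into training amounts to quantizing the phase in the forward pass. Below is a minimal sketch of snapping a continuous phase to three evenly spaced discrete levels, the number the analysis found sufficient; during quantization-aware training, gradients would be passed through the rounding unchanged (a straight-through estimator), which plain NumPy without autodiff cannot show.

```python
import numpy as np

def quantize_phase(phase, levels=3):
    """Snap a continuous phase in [0, 2*pi) to `levels` evenly spaced
    values, modeling a modulator surface with limited phase bit depth."""
    step = 2.0 * np.pi / levels
    return (np.round(phase / step) * step) % (2.0 * np.pi)

phase = np.linspace(0.0, 2.0 * np.pi, 9, endpoint=False)  # sample phases
q = quantize_phase(phase, levels=3)  # every value lands on 0, 2pi/3, or 4pi/3
```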

Based on this method, one can also design spatially or spectrally multiplexed optical generative models. The optical generative model can also enable the volumetric generation of three - dimensional images and thus offer new opportunities for applications such as Augmented Reality, Virtual Reality, and entertainment.

This article is from the WeChat account "Academic Headlines" (ID: SciTouTiao) and was...