
Nature's Blockbuster: AI Enters the "Optics" Era, Painting a Colorful Van Gogh for the First Time

Academic Headlines · 2025-08-28 14:44
Optical generation is full of new opportunities.

Achieving fast, energy-efficient, and scalable inference for generative AI is one of the most pressing challenges facing the AI industry.

Now the field has taken another step forward by bringing "light" into AI-generated content (AIGC): relying entirely on the physical behavior of the system's hardware, researchers have for the first time generated new (previously unseen) images with specified characteristics.

A research team from the University of California, Los Angeles (UCLA) has achieved the optical generation of monochromatic and polychromatic images of handwritten digits, fashion products, butterflies, human faces, and artworks (for example, in the style of Van Gogh), with overall performance comparable to that of generative models based on digital neural networks.

The relevant research paper titled "Optical generative models" has been published in the authoritative scientific journal Nature.

Paper link: https://www.nature.com/articles/s41586-025-09446-5

The research team says this optical generative model could open new paths toward energy-efficient, scalable inference and further unlock the potential of optics and photonics for AIGC. Combining the optical system with machine-learning methods could also find applications in augmented reality (AR) and virtual reality (VR).

In an accompanying News & Views article, Daniel Brunner, a researcher at the FEMTO-ST Institute of the French National Centre for Scientific Research (CNRS), argues that the work is technologically and scientifically significant and marks an important step toward building generative computing models from unconventional physical systems.

Article link: https://www.nature.com/articles/d41586-025-02523-9

In Brunner's view, because photons have natural advantages for information processing, such as the ability to process data throughout an entire three-dimensional volume simultaneously, "the optical generative model is also expected to have the potential to generate three-dimensional images."

Brunner said that making future optical generative models more powerful and flexible may require building "models that use both optical encoders and optical decoders," constructing multi-layer decoding systems, and exploiting more complex optical phenomena.

As Brunner notes, however, the future value of this research hinges on whether it can be fully realized in practice, and there is still a long way to go.

"Ideally, this would rely on scalable integrated-circuit technology while avoiding the time- and energy-consuming data pre-processing steps that current digital hardware encoding requires. Even after decades of research in electronics, optical physical computing, and their integration, this remains a very challenging task."

Optical generative model: Let light "draw pictures"

In recent years, generative digital models have developed to the point where they can synthesize diverse high-quality images, human-like natural language, new musical works, and even new protein designs. These emerging generative AI technologies play an important role in applications including large language models (LLMs), embodied intelligence, and AIGC.

However, as generative models have succeeded, they have also grown rapidly in scale, consuming ever more power and memory and lengthening inference times; their scalability and carbon footprint are growing concerns.

Although various methods already exist to reduce model size and energy consumption and to speed up inference, new paths toward energy-efficient, scalable generative AI models are still urgently needed.

Against this backdrop, the research team proposed an optical generative model inspired by diffusion models: its encoder is implemented digitally in the conventional way, while its decoder is built from optical components.

In this architecture, a shallow, fast digital encoder first maps random noise to phase patterns that serve as optical generation seeds for the target data distribution. A jointly trained, reconfigurable decoder based on free-space propagation then processes these seeds all-optically to generate unseen images that follow the desired data distribution.

Notably, aside from the illumination power and the shallow digital encoding of the random seeds, this optical generative model consumes almost no computing resources during image generation.
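The encoder-decoder split described above can be sketched as a digital simulation. Everything below is a hypothetical stand-in, not the paper's implementation: the sizes, the single-layer encoder, and the use of a 2-D FFT as a toy model of free-space diffraction are all illustrative assumptions.

```python
import numpy as np

# Hypothetical sizes; the paper's actual resolutions differ.
N = 64          # SLM grid (N x N pixels)
LATENT = 32     # length of the random-noise seed

rng = np.random.default_rng(0)

# Shallow digital encoder: one linear layer mapping Gaussian noise
# to an N x N phase pattern in [0, 2*pi) -- the "optical generation seed".
W_enc = rng.normal(scale=0.1, size=(N * N, LATENT))

def encode(z):
    phase = W_enc @ z
    return np.mod(phase.reshape(N, N), 2 * np.pi)

# Fixed diffractive decoder: modeled here as a single phase mask
# followed by free-space propagation (toy stand-in: a 2-D FFT).
decoder_phase = rng.uniform(0, 2 * np.pi, size=(N, N))

def decode(seed_phase):
    field = np.exp(1j * (seed_phase + decoder_phase))   # phase-only modulation
    out = np.fft.fftshift(np.fft.fft2(field))           # far-field propagation
    return np.abs(out) ** 2                             # sensor records intensity

z = rng.normal(size=LATENT)   # random seed
image = decode(encode(z))     # after encoding, all "computation" is optics
print(image.shape)            # (64, 64)
```

In the real system the decoder phases are jointly trained with the encoder; here both are random, so the sketch only illustrates where the digital/optical boundary sits.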

Figure | Schematic diagram of the optical generative model.

The research team proposed two optical image generation paths: snapshot and iterative.

In the snapshot optical generative model, each image can be generated in a single shot by randomly accessing one of the pre-computed optical generation seeds on demand. Image synthesis then depends entirely on the propagation of light through free space and is completed by an optimized diffractive decoder held in a fixed state.

Figure | Snapshot optical generative model

In the iterative optical generative model, at each time step the noisy image generated in the previous step is fed into the optical system. After wave propagation, the polychromatic information is recorded and used for the next optical iteration, with a preset amount of noise added. At the final time step, the image-sensor array records the output intensity to produce the final image. Once trained, during blind inference the iterative optical generative model gradually reconstructs the target data distribution from a Gaussian noise distribution.
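The iterative loop just described mirrors reverse diffusion: propagate, record, re-inject noise, repeat. The sketch below is a purely illustrative digital analogue; the step count, noise schedule, and the FFT stand-in for one optical pass are assumptions, and the "decoder" here is untrained.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 32, 5   # image size and number of optical iterations (illustrative)

def optical_step(intensity):
    # Stand-in for one optical pass: phase-encode the recorded intensity,
    # propagate (FFT as a toy free-space model), record intensity again.
    field = np.exp(1j * 2 * np.pi * intensity / (intensity.max() + 1e-9))
    return np.abs(np.fft.fft2(field)) ** 2

# Preset per-step noise levels, decreasing toward the final step,
# loosely mirroring a reverse-diffusion schedule.
noise_sched = np.linspace(0.5, 0.0, T)

x = rng.normal(size=(N, N)) ** 2   # start from (squared) Gaussian noise
for t in range(T):
    x = optical_step(x)                                # wave propagation
    x = x / x.max()                                    # normalize the readout
    x = x + noise_sched[t] * rng.normal(size=(N, N))   # inject preset noise

# x is the final recorded output; in the trained system it would follow
# the target data distribution.
print(x.shape)
```

The key structural point is that only the noise injection and normalization are digital; each denoising-like transformation is performed by the (here simulated) optics.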

Figure | Iterative optical generative model

In addition, the research team demonstrated how to implement the intensity-to-phase conversion directly on a spatial light modulator (SLM) and combine it with optoelectronic conversion at the image-sensor plane. This allowed the iterative optical generative model to realize complex-domain mappings, although its performance and image diversity are lower than those of the version that uses a digital encoder.

Light really "drew" digits and Van Gogh

To demonstrate the snapshot and polychromatic optical generative models, the researchers built a free-space hardware system operating in the visible band. A laser with a wavelength of 520 nm is collimated to uniformly illuminate an SLM, which displays the phase patterns pre-computed by the shallow digital encoder, i.e. the optical generation seeds.

These encoded phase patterns modulate the light field, pass through a beam splitter, and are then processed by a second SLM that serves as a fixed (static) decoder. For each optical generative model, the optimized decoder surface is held fixed, and the same optical architecture can generate images matching different target distributions simply by switching decoder states. At the output of the snapshot model, an image sensor captures the intensity of the generated image.
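The free-space propagation between the SLMs and the sensor is commonly modeled numerically with the angular-spectrum method. The sketch below uses the article's 520 nm wavelength, but the pixel pitch, grid size, and propagation distance are hypothetical, and this is a generic textbook propagator rather than the team's simulation code.

```python
import numpy as np

def angular_spectrum(field, wavelength, dx, z):
    """Propagate a complex field a distance z via the angular-spectrum method."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=dx)          # spatial frequencies (1/m)
    FX, FY = np.meshgrid(fx, fx)
    arg = 1 - (wavelength * FX) ** 2 - (wavelength * FY) ** 2
    kz = 2 * np.pi / wavelength * np.sqrt(np.maximum(arg, 0))  # cut evanescent waves
    H = np.exp(1j * kz * z)               # free-space transfer function
    return np.fft.ifft2(np.fft.fft2(field) * H)

# Illustrative numbers only: 520 nm laser, 8 um SLM pixels, 5 cm propagation.
wl, dx, z = 520e-9, 8e-6, 5e-2
phase = np.random.default_rng(2).uniform(0, 2 * np.pi, (128, 128))
out = angular_spectrum(np.exp(1j * phase), wl, dx, z)
intensity = np.abs(out) ** 2   # what the image sensor would record
print(intensity.shape)
```

Because the transfer function is phase-only at these spatial frequencies, the propagation is unitary and total power is conserved, which is a quick sanity check for such simulations.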

Depending on the training dataset, this optical generative model can output Van Gogh-style images of people, buildings, or plants, and can also generate handwritten digits from 0 to 9 or images of fashion items. The digit and fashion images are monochrome, while the Van Gogh-style images are in color. Generating new images with specified characteristics directly from a machine-learning model implemented entirely in physical hardware had never been achieved before.

Figure | Numerical and experimental results of the polychromatic optical generative model creating Van Gogh-style artworks, compared with a teacher digital diffusion model using 1,000 iteration steps.

When a random seed is fed into the model, the generated images differ but still belong to the same category as the training data. For example, a model trained on Van Gogh-style portraits will output a series of Van Gogh-style images of people, and different random seeds can yield people with or without hats.

The researchers compared their experimental results with numerical simulations and with fully digital generative models, and found that, given the same random seed, the images produced were of essentially comparable quality to those of the optical generative model.

Optical generation is full of new opportunities

The research team demonstrated snapshot optical image generation from noise patterns using a diffractive network architecture. Their framework can optically generate diverse images from noise, exhibiting a highly desirable "creative" snapshot generation capability that goes beyond previous work.

Moreover, without changing the architecture or physical hardware, the system can adapt to different data distributions simply by reconfiguring the diffractive decoder to a new optimized state. This flexibility is significant for edge computing, augmented-reality and virtual-reality displays, and a range of entertainment applications.

The results also show that, guided by a teacher denoising diffusion probabilistic model (DDPM), knowledge of the target distribution can be distilled. By simulating the diffusion process, the iterative optical generative model can learn the target distribution in a self-supervised manner, avoid mode collapse, and generate results more diverse than the original dataset. It also has the potential to drop the digital encoder entirely and generate diverse outputs for different data distributions.

Of course, optical generative models still face some general challenges. One is misalignment and physical imperfections in the optical hardware or system configuration; another is the limited phase bit depth achievable by the light-modulator devices or surfaces used to physically present the generated optical seeds and the decoding layers.

To address these challenges, the relevant constraints can be introduced directly during training so that the numerically optimized system better conforms to physical limitations and the performance of the local hardware. This strategy yields significantly better performance than training that ignores bit-depth limits.

A key finding of this analysis is that a relatively simple decoder surface with only three discrete phase levels suffices for image generation. This opens the possibility of replacing the decoder with a passive, thin-layer surface.
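Constraining a decoder to a few discrete phase levels amounts to quantizing a continuous phase mask during training. The sketch below shows the quantization itself plus, in a comment, the straight-through trick commonly used for quantization-aware training; the paper's exact constraint scheme may differ, and all names here are illustrative.

```python
import numpy as np

LEVELS = 3  # the paper reports three discrete phase levels suffice

def quantize_phase(phase, levels=LEVELS):
    """Snap continuous phases in [0, 2*pi) to the nearest of `levels` values."""
    step = 2 * np.pi / levels
    return np.round(phase / step) % levels * step

# During training, a straight-through estimator can let gradients flow
# through the continuous phase while the forward pass sees the quantized
# one (a common quantization-aware-training pattern):
#   phase_q = phase + stop_gradient(quantize_phase(phase) - phase)

phase = np.random.default_rng(3).uniform(0, 2 * np.pi, (8, 8))
q = quantize_phase(phase)
print(sorted(set(np.round(q, 6).ravel())))   # at most 3 distinct levels
```

Training against the quantized forward pass is what lets the optimized mask respect the hardware's bit-depth limit instead of discovering it only at deployment.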

Building on this approach, spatially or spectrally multiplexed optical generative models can also be designed, and optical generative models could achieve volumetric generation of three-dimensional images, opening new opportunities in augmented reality, virtual reality, and entertainment.

This article is from the WeChat official account "Academic Headlines" (ID: SciTouTiao). Author: Xiaoyang. Republished by 36Kr with authorization.