AI-driven de novo design of diverse small molecule-binding proteins: A Korean team discovers proteins that can selectively recognize stress hormones

HyperAI (超神经) · 2026-04-15 16:25
Using AI to design, from scratch, proteins that recognize specific compounds.

A research team from the Department of Biological Sciences at the Korea Advanced Institute of Science and Technology (KAIST) used deep learning-driven protein structure generation and sequence design to create diverse small molecule-binding proteins de novo, with the NTF2-like fold as the core "universal backbone". They then converted these binders into sensors that work like chemically induced dimerization (CID) systems. One of the designed proteins selectively recognizes the stress hormone cortisol, and the team built a biosensor on it.

In life science and synthetic biology, designing small molecule-binding proteins that combine high affinity with high specificity remains a key obstacle to building biosensors and molecular switches. Earlier work relied mainly on screening and modifying natural proteins, or on physics-based design using existing protein backbones, and both approaches were limited in generality and scalability.

In response, the KAIST team combined deep learning-driven structure generation and sequence design around the NTF2-like fold, as summarized above, and converted the resulting binders into CID-style sensors. The cortisol-selective protein and the biosensor built on it take the work beyond protein design itself and toward practical sensor technology, addressing a long-standing difficulty in the field: small molecule recognition.

The relevant research results, titled "Small-molecule binding and sensing with a designed protein family", have been published in Nature Communications.

Research Highlights:

* Use artificial intelligence to de novo design proteins that recognize specific compounds and apply them to functional biosensors

* Traditional methods mainly involve finding natural proteins or modifying some of their functions, while this research "customizes" proteins with the desired functions through AI-based design

* The research results can be widely applied in fields such as disease diagnosis, new drug development, and environmental monitoring

Paper Link: https://www.nature.com/articles/s41467-026-70953-8

Dataset: Building the NTF2 Backbone

To achieve the design goal, the researchers first generated a set of NTF2 structures (Set 1: 1,615 backbones) with a family-level "hallucination" method. They then used ProteinMPNN to redesign the sequences of these backbones and used AlphaFold to select the proteins predicted to fold into the designed structures (Set 2: 3,230 backbones). In addition, they generated backbone structures parametrically with Rosetta, again using ProteinMPNN for sequence design and AlphaFold for structure verification (Set 3: 6,838 backbones). See the figure below:

NTF2 Backbone Generation

Finally, after screening, the researchers obtained oligonucleotides encoding the selected designs for experimental characterization: 630 HCY-binding proteins, 1,661 ROC-binding proteins, 16,276 WRF-binding proteins, 9,024 APX-binding proteins, 19,390 IRI-binding proteins, and 7,573 OHP-binding proteins.
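As a quick sanity check on the scale of the library, the counts quoted above can be tallied in a few lines of Python. This is only an illustrative sketch: the variable names are not from the paper, and only the numbers come from the text.

```python
# Backbone counts per set, as quoted in the text above.
backbone_sets = {"Set 1": 1615, "Set 2": 3230, "Set 3": 6838}

# Designs encoded as oligonucleotides, per target ligand, as quoted above.
designs_per_ligand = {
    "HCY": 630,
    "ROC": 1661,
    "WRF": 16276,
    "APX": 9024,
    "IRI": 19390,
    "OHP": 7573,
}

total_backbones = sum(backbone_sets.values())
total_designs = sum(designs_per_ligand.values())

print(f"Total backbones: {total_backbones:,}")
print(f"Total encoded designs: {total_designs:,}")

# Show each ligand's share of the pool, largest first.
for ligand, count in sorted(
    designs_per_ligand.items(), key=lambda item: item[1], reverse=True
):
    print(f"{ligand}: {count:>6,} ({count / total_designs:.1%})")
```

The tally gives 11,683 backbones across the three sets and 54,554 encoded designs across the six ligands.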

Designing an NTF2 Protein Family with Diverse Pocket Geometries

The NTF2 fold is composed of three α-helices and a curved six-stranded β-sheet. These structures together form the large internal binding pocket unique to this fold family, as shown in the figure below:

The NTF2 Fold Has a Designable Structural Framework

The diversity of this fold in nature mainly comes from the long and irregular loop regions and the unique quaternary structure, both of which affect the geometry and function of the binding pocket. The goal of this study is to design a family of NTF2 proteins with diverse pocket geometries to accommodate a wide range of small molecules while minimizing the loop regions to maintain their modularity and designability. The overall design process is shown in the figure below:

Schematic Diagram of the Design Process for Small Molecule-Binding Proteins Based on the NTF2 Fold

After obtaining more than 10,000 designed NTF2 proteins with diverse pocket geometries, the researchers used RIFdock to place six small molecules with different chemical properties and structures into the central pockets of these backbones. The six are the stress hormone cortisol (HCY), the anticoagulant warfarin (WRF), the muscle relaxant rocuronium bromide (ROC), the anticoagulant apixaban (APX), SN-38 (IRI), the active anti-tumor metabolite of the anticancer drug irinotecan, and the hormone 17-α-hydroxyprogesterone (OHP).

In protein design, the construction of a polar interface is an important challenge. Especially for small molecule-binding proteins, it is necessary to introduce polar residues into the internal pocket to interact with the polar functional groups of the ligand without destroying the overall stability of the protein. To this end, the researchers adopted two strategies:

Method 1 (RIFdock to HBNets)

The researchers docked HCY, WRF, ROC, APX, and IRI into the Set 1 backbones and required at least one protein-small molecule interaction mediated by residues of a pre-built hydrogen bond network (HBNet). They then optimized the designs with Rosetta, biasing the sequence design toward natural sequences with a position-specific scoring matrix derived from NTF2 family proteins.

Method 2 (Unrestricted RIFdock)

Unrestricted RIFdock was used to place OHP, APX, and IRI into the backbones of Set 2 and Set 3, and LigandMPNN was used for sequence design. LigandMPNN is a variant of ProteinMPNN, specifically trained on protein-small molecule complex data, which can explicitly consider the presence of the ligand during the design process.
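The two strategies can be summarized as a small lookup structure. The sketch below is only a way of organizing what the text states; the class and field names are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignRoute:
    """One docking-plus-sequence-design strategy, as described in the text."""
    docking: str
    backbone_sets: tuple
    ligands: tuple
    sequence_design: str


ROUTES = (
    DesignRoute(
        docking="RIFdock to HBNets",
        backbone_sets=("Set 1",),
        ligands=("HCY", "WRF", "ROC", "APX", "IRI"),
        sequence_design="Rosetta design biased by an NTF2-family PSSM",
    ),
    DesignRoute(
        docking="Unrestricted RIFdock",
        backbone_sets=("Set 2", "Set 3"),
        ligands=("OHP", "APX", "IRI"),
        sequence_design="LigandMPNN",
    ),
)


def routes_for(ligand: str) -> list:
    """Return the docking strategies that were applied to a given ligand."""
    return [route.docking for route in ROUTES if ligand in route.ligands]


# Every ligand that appears in at least one route.
all_ligands = sorted({lig for route in ROUTES for lig in route.ligands})

print(all_ligands)
print("APX:", routes_for("APX"))
print("OHP:", routes_for("OHP"))
```

Laid out this way, it is easy to see that APX and IRI were attempted with both methods, while OHP was designed only with Method 2.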

When screening the design results, the researchers used Rosetta to calculate the number of hydrogen bonds, the binding energy (ddG), and the contact molecular surface area (CMS) between the protein and the ligand. For Method 2, they also used single-sequence AlphaFold predictions to select designs whose predicted structure reproduces both the target fold and the binding site (see the figure below).

Design Evaluation Metrics
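A metric-based screen of this kind can be sketched as a simple filter. The article does not give the cutoffs used, so every threshold below is a hypothetical placeholder. The example designs are made up, and using a Cα RMSD as the AlphaFold agreement check is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoffs for illustration only; NOT values from the paper.
MIN_HBONDS = 1          # protein-ligand hydrogen bonds
MAX_DDG = -30.0         # Rosetta binding energy; more negative is better
MIN_CMS = 300.0         # contact molecular surface area
MAX_AF_CA_RMSD = 2.0    # AlphaFold prediction vs. design model, in Å


@dataclass
class DesignMetrics:
    name: str
    n_hbonds: int
    ddg: float
    cms: float
    af_ca_rmsd: Optional[float] = None  # only available for Method 2 designs


def passes_filters(design: DesignMetrics, require_alphafold: bool = False) -> bool:
    """Keep a design only if it clears every interface metric."""
    if design.n_hbonds < MIN_HBONDS:
        return False
    if design.ddg > MAX_DDG:
        return False
    if design.cms < MIN_CMS:
        return False
    if require_alphafold:
        # Method 2 additionally requires agreement with the predicted fold.
        return design.af_ca_rmsd is not None and design.af_ca_rmsd <= MAX_AF_CA_RMSD
    return True


# Made-up candidates to exercise the filter.
candidates = [
    DesignMetrics("design_a", n_hbonds=2, ddg=-35.2, cms=410.0, af_ca_rmsd=0.9),
    DesignMetrics("design_b", n_hbonds=0, ddg=-38.0, cms=450.0, af_ca_rmsd=0.8),
    DesignMetrics("design_c", n_hbonds=3, ddg=-33.0, cms=380.0, af_ca_rmsd=3.1),
]

selected_method1 = [d.name for d in candidates if passes_filters(d)]
selected_method2 = [d.name for d in candidates if passes_filters(d, require_alphafold=True)]

print("Method 1 style filter:", selected_method1)
print("Method 2 style filter:", selected_method2)
```

Here `design_b` fails for lacking a hydrogen bond, and `design_c` passes the interface metrics but is dropped once the structure-prediction check is switched on.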

Results: NTF2-Based Small Molecule-Binding Proteins Can Be Applied to Biosensors

The researchers designed a series of experiments to verify the effectiveness of the design strategy proposed in this study:

Structural Characterization of the Designed Binding Proteins

To verify the accuracy of the designed small molecule-binding proteins, the researchers solved the crystal structures of two protein-ligand complexes: the cortisol-binding protein hcy129 and the apixaban-binding protein apx1049. The surface of hcy129 was redesigned with ProteinMPNN to make it easier to crystallize, which yielded a 1.5 Å structure of its complex with cortisol. Structural alignment showed that the overall fold closely matched the designed model, with a Cα RMSD of 1.1 Å (Figure A below), and the key hydrogen-bonding residues and ligand conformation also matched precisely (Figure B below). This indicates that the pre-constructed hydrogen bond network (HBNet) delivered the intended polar interactions.

Structural Analysis of the Designed Cortisol and Apixaban-Binding Proteins

The crystal structure of the apx1049-apixaban complex, solved at 2.1 Å resolution, agreed even more closely with the designed model, with a Cα RMSD of only 0.6 Å over 113 residues (Figure C below). Its protein-ligand interactions reproduced the design almost completely, including the key hydrogen bonds and the π-π stacking interactions with aromatic residues (Figure D below), which stabilize the ligand conformation and form a highly shape-complementary binding pocket. These results show that the design strategy can build protein-ligand interfaces with atomic-level accuracy.
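For readers unfamiliar with the metric, a Cα RMSD is the root-mean-square distance between matched Cα atoms in two structures. The sketch below assumes the two coordinate sets have already been superimposed and paired residue by residue. It does not perform the alignment itself, and the coordinates are toy values, not data from the paper.

```python
import math


def ca_rmsd(model, crystal):
    """RMSD in Å between paired, pre-superimposed Cα coordinates.

    Each argument is a sequence of (x, y, z) tuples, one per residue.
    """
    if len(model) != len(crystal) or len(model) == 0:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, crystal):
        squared_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(squared_sum / len(model))


# Toy example: every atom in the "crystal" set is shifted 0.6 Å along x.
design_model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal_structure = [(x + 0.6, y, z) for (x, y, z) in design_model]

print(f"Cα RMSD: {ca_rmsd(design_model, crystal_structure):.2f} Å")
```

In practice the superposition step (for example, a Kabsch alignment) is done by a structural biology package before the distances are averaged.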

Specificity Evaluation of the Designed Binding Proteins

To evaluate the specificity of the designed proteins, the researchers systematically tested six binding proteins against six ligands, using albumin with non-specific binding ability as a control. The results showed that high-affinity proteins such as hcy129.1, iri807.1, and apx1049 showed good specificity when binding to their respective targets, while albumin hardly bound to most ligands, verifying the effectiveness of the design strategy.

The warfarin (WRF) system was an exception: albumin bound warfarin (KD ≈ 5.0 μM) with an affinity within about fivefold of the designed protein wrf1071 (KD ≈ 1.1 μM), indicating that non-specific binding remains a challenge for highly hydrophobic ligands.
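To put the two warfarin KD values in perspective, the sketch below computes their ratio and the fraction of each protein that would be occupied at a given free ligand concentration. It assumes a simple one-site binding model, which is an assumption made here for illustration. The 1 μM test concentration is also arbitrary.

```python
def fraction_bound(ligand_um: float, kd_um: float) -> float:
    """Fraction of protein occupied under a one-site model: [L] / ([L] + KD)."""
    if ligand_um < 0 or kd_um <= 0:
        raise ValueError("ligand must be >= 0 and KD must be > 0")
    return ligand_um / (ligand_um + kd_um)


KD_WRF1071_UM = 1.1   # designed binder, from the text
KD_ALBUMIN_UM = 5.0   # albumin control, from the text

# How many times tighter the designed protein binds than albumin.
fold_selectivity = KD_ALBUMIN_UM / KD_WRF1071_UM
print(f"wrf1071 binds about {fold_selectivity:.1f}-fold tighter than albumin")

free_warfarin_um = 1.0  # arbitrary illustrative concentration
for name, kd in (("wrf1071", KD_WRF1071_UM), ("albumin", KD_ALBUMIN_UM)):
    occupied = fraction_bound(free_warfarin_um, kd)
    print(f"{name}: {occupied:.0%} occupied at {free_warfarin_um} μM free warfarin")
```

A ratio of roughly 4.5 means the designed pocket offers only a modest advantage over albumin's non-specific binding for this ligand.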

Overall, the method achieves specific recognition for several targets, but there is still room to improve how well it distinguishes structurally similar molecules and how selective it is for hydrophobic ligands.

Biosensor Construction (Design and Characterization of Cortisol-Induced Heterodimers)

Cortisol is usually present at low nanomolar concentrations in physiological samples, and a plasma cortisol level above 38 nM can serve as a diagnostic indicator for diseases such as Cushing's syndrome. To raise the affinity of hcy129 for cortisol to a level suitable for biosensing, the researchers built a combinatorial mutant library from the favorable mutations identified in site saturation mutagenesis (SSM) experiments and screened it by yeast display. Binding affinity increased significantly, as shown in the figure below:

Optimization of the Cortisol-Binding Protein hcy129

The researchers then selected the best variant from this library, expressed it in E. coli, and characterized it by isothermal titration calorimetry (ITC). This variant, hcy129.1, had a KD of 68 nM, a 31-fold improvement in affinity over the original design (Figure C below). Structural analysis showed that the enhanced affinity came mainly from stronger hydrophobic interactions with cortisol (Figure D below).

Design and Characterization of the Cortisol-Sensing Chemically Induced Heterodimer
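The 31-fold affinity gain reported for hcy129.1 can be translated into a binding free energy difference with the standard relation ΔG = RT ln(KD). The sketch below assumes a temperature of 298.15 K and a 1 M standard state. The original design's KD is not quoted in the article, so the value used here is only the one implied by multiplying 68 nM by 31.

```python
import math

GAS_CONSTANT_KCAL = 1.987204e-3   # kcal/(mol·K)
TEMPERATURE_K = 298.15            # assumed; the article does not state it


def binding_free_energy(kd_molar: float, temperature_k: float = TEMPERATURE_K) -> float:
    """Standard binding free energy in kcal/mol: ΔG = RT ln(KD / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("KD must be positive")
    return GAS_CONSTANT_KCAL * temperature_k * math.log(kd_molar)


kd_optimized = 68e-9              # hcy129.1, from the text
fold_improvement = 31             # from the text
kd_original_implied = kd_optimized * fold_improvement  # derived, not quoted

ddg = binding_free_energy(kd_optimized) - binding_free_energy(kd_original_implied)

print(f"Implied original KD: {kd_original_implied * 1e6:.1f} μM")
print(f"ΔG (hcy129.1): {binding_free_energy(kd_optimized):.1f} kcal/mol")
print(f"ΔΔG from optimization: {ddg:.2f} kcal/mol")
```

Under these assumptions, the optimization is worth roughly 2 kcal/mol of additional binding energy, which fits the report that a handful of hydrophobic contacts were strengthened.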

On this basis, the researchers further designed a cortisol-dependent heterodimer system. By modifying the structure of hcy129.1 and introducing a small protein backbone, and using methods such as RIFdock, Rosetta, and ProteinMPNN for computational design and screening, they finally obtained a small protein, miniH11, that can form a ternary complex with hcy129.1 and cortisol.

Experiments showed that this system forms a stable complex only in the presence of cortisol. The researchers then fused it to the NanoBiT split-luciferase system to turn it into a cortisol sensor and measured an EC50 of approximately 72 nM (Figure H below), consistent with the binding affinity of hcy129.1. In the absence of cortisol, the affinity between the two proteins dropped significantly, indicating that dimerization depends strongly on the ligand.

Cortisol-Dependent Luminescence Response Curve in the Equimolar (200 nM) hcy129.1_CID-SmBiT and miniH11-LgBiT System
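A sensor with a reported EC50 is commonly described by a Hill-type dose-response curve. The sketch below uses the 72 nM EC50 from the text and assumes a Hill coefficient of 1 and a response normalized between 0 and 1. Neither assumption comes from the article, so the curve is illustrative rather than a fit to the published data.

```python
def hill_response(
    cortisol_nm: float,
    ec50_nm: float = 72.0,   # from the text
    hill: float = 1.0,       # assumed; not reported in the article
    bottom: float = 0.0,     # normalized signal with no cortisol
    top: float = 1.0,        # normalized signal at saturation
) -> float:
    """Normalized luminescence predicted by a Hill dose-response model."""
    if cortisol_nm < 0:
        raise ValueError("concentration must be non-negative")
    if cortisol_nm == 0:
        return bottom
    return bottom + (top - bottom) / (1.0 + (ec50_nm / cortisol_nm) ** hill)


# Evaluate the model across a range that brackets the EC50.
for concentration in (1, 10, 38, 72, 200, 1000):
    signal = hill_response(concentration)
    print(f"{concentration:>5} nM cortisol -> {signal:.2f} of maximal signal")
```

By construction the model returns half of the maximal signal at the EC50, and the loop shows how much of the dynamic range falls in the low nanomolar region where physiological cortisol is found.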

Overall, this work demonstrates that NTF2-based small molecule-binding proteins can be further engineered into functional biosensors.

Conclusion

Overall, this study opens a new path for the de novo design of small molecule-binding proteins. By using artificial intelligence models to capture protein-ligand interactions at the atomic level, it moves from "discovering or modifying natural proteins" to "customizing functional proteins on demand", and it backs the approach with experimental validation.

This not only marks a