
AI discovers 118 new exoplanets. A team from the University of Warwick proposes RAVEN to compare each planetary scenario with every false-positive scenario one by one.

HyperAI, 2026-03-31 16:25
Achieved an overall accuracy of 91%.

A research team from the University of Warwick has proposed a brand-new screening and validation pipeline for TESS candidates, called RAVEN. The pipeline introduces a synthetic training dataset and no longer relies solely on the Threshold Crossing Event (TCE) data generated by the mission itself, an improvement that significantly expands the parameter space of planetary and false-positive scenarios covered by the machine-learning model. On an independent external test set of 1,361 pre-classified TESS candidates, the pipeline achieved an overall accuracy of 91%, demonstrating its effectiveness in automatically ranking TESS candidates.

As astronomical research has deepened, exoplanet discovery has entered a period of rapid growth. In particular, the light-curve data provided by NASA's Transiting Exoplanet Survey Satellite (TESS) mission allows scientists to obtain large numbers of transit-signal candidates every day.

However, confirming or refuting the planetary nature of candidates is a long and challenging process. To date, 7,658 TESS Objects of Interest (TOIs) are listed in the Exoplanet Archive, of which 5,152 are still marked as candidates. Only 666 have been confirmed as real exoplanets, and another 558 are previously confirmed planets that were also detected by TESS. Meanwhile, 1,185 TESS candidates have been identified as False Positives (FPs), and another 97 are classified as False Alarms (FAs). These numbers highlight the difficulty of confirming exoplanet candidates.

Beyond candidate screening lie the "validation pipelines", whose goal is to confirm candidates as real planets through statistical methods. Traditional validation mainly relies on manual analysis and follow-up observations, including Radial Velocity (RV) measurements and ground-based telescope tracking. These methods are reliable but time-consuming and costly.

In response, building on the Kepler pipeline proposed by David J. Armstrong and others, the University of Warwick team has developed a new screening and validation pipeline for TESS candidates: RAVEN (RAnking and Validation of ExoplaNets). Its most important change is the introduction of a synthetic training dataset, so the pipeline no longer depends solely on the TCE data generated by the mission itself. This improvement greatly expands the parameter space of planetary and false-positive scenarios covered by the machine-learning model.

The results show that the pipeline achieved an AUC score above 97% in all false-positive scenarios, exceeding 99% in all but one. On an independent external test set of 1,361 pre-classified TESS candidates, it achieved an overall accuracy of 91%, demonstrating its effectiveness in automatically ranking TESS candidates.

The researchers also used the pipeline to confirm 118 new exoplanets and identified over 2,000 high-quality planet candidates, nearly 1,000 of which were previously undiscovered.

The research, titled "RAVEN: RAnking and Validation of ExoplaNets", has been published as a preprint on arXiv.

Research Highlights:

* With the help of synthetic datasets, RAVEN can compare the planetary scenario with each false-positive scenario one by one, an ability that previously existed only in validation frameworks relying on model fitting.

* The new pipeline introduces a synthetic training dataset and no longer relies solely on the TCE data generated by the mission itself.

* The new pipeline is efficient: it takes only about one minute to process a typical candidate, and it scales well through multi-process support.

Paper URL: https://arxiv.org/abs/2509.17645

Dataset: The Complete Construction Path from Input Data to Training Samples

Input Data: Multi-source Information Fusion Centered on Light Curves

The RAVEN pipeline currently uses light curves generated from the TESS Full Frame Images (FFIs) released by the TESS Science Processing Operations Center (SPOC). These light curves are extracted from the FFI data of each observation sector through aperture photometry. The cadence is 30 minutes for sectors 1-27 and 10 minutes for sectors 28-55; the FFIs released in TESS's second extended mission (from sector 56 onward) have a 200-second cadence. The light curves used in this study extend through sector 55.

Training Data: Systematic Modeling of Planets and False Positives

The RAVEN pipeline introduces synthetic light-curve data to train the machine-learning model, rather than relying on the mission's existing classified candidate light curves.

The initial set of synthetic events uses simulated transits or eclipses injected into the SPOC light curves. The simulated events are generated with the researchers' modified version of the PASTIS software, initially covering scenarios such as transiting planets (Planet), eclipsing binaries (EB), hierarchical eclipsing binaries (HEB), hierarchical transiting planets (HTP), background eclipsing binaries (BEB), and background transiting planets (BTP). To keep the synthetic data as close as possible to the actual TESS observation population, the primary star in each scenario is randomly drawn from a well-characterized TESS Input Catalog (TIC) sample; the target sample ultimately contains 1,200,520 SPOC FFI stars.

On this basis, the construction of false-positive data is more complex and crucial. For Nearby False Positives (NFPs), the researchers consider the following scenarios:

* Nearby Transiting Planets (NTP): a planet transits a diluted nearby host;

* Nearby Eclipsing Binaries (NEB): a nearby diluted source is an eclipsing binary;

* Nearby Hierarchical Eclipsing Binaries (NHEB): a nearby diluted source is a hierarchical eclipsing binary.

Test Data: Real Application Scenarios Centered on TOIs

The pipeline's performance was ultimately tested on a set of TOIs (TESS Objects of Interest) with prior classifications. The TOI list and classification information used for testing come from the NASA Exoplanet Archive, dated February 3, 2025. At that time there were 2,134 pre-classified TOIs, of which 548 were classified as Known Planets (KP), 485 as planets Confirmed (CP) by TESS, 1,113 as FPs, and 96 as FAs. However, only 1,918 TOIs had associated published SPOC FFI light curves. After applying depth and period constraints to the remaining sample, the total number of TOIs to be processed was 1,589.

All TOIs went through the complete processing steps of the pipeline, except for one FP TOI whose target star was marked as "DUPLICATE" in the TIC. In the final results, 68 TOIs were excluded because the target star's stellar radius was missing from the TIC; another 87 were excluded because their TESS magnitude exceeded 13.5, and 22 because their Gaia magnitude exceeded 14.

The training set of this study does not include events whose target star is fainter than 13.5 in Tmag or 14 in Gmag. In addition, 28 TOIs were excluded because the MES computed during feature generation was less than 0.8, and 2 TOIs were discarded due to failed feature generation. Finally, 21 TOIs could not produce position probabilities because of centroid-data problems, so no posterior probabilities were generated for them and they were also excluded from further analysis.

Therefore, the final number of pre - classified TOIs in this test was 1,361, among which 705 were known or confirmed planets, 630 were FPs, and 26 were FAs.

Combining Two Machine-learning Models: GBDT + GP

The RAVEN pipeline is based on the statistical validation framework proposed by David J. Armstrong and others in 2021 for Kepler mission candidates (hereafter A21). The framework has been adapted to Transiting Exoplanet Survey Satellite data and also extended and upgraded. The full pipeline is relatively complex and involves multiple steps, summarized in the figure below:

Flowchart

Machine-learning Training

The core of RAVEN is the combination of two machine-learning models: a Gradient Boosted Decision Tree (GBDT) and a Gaussian Process (GP). The pipeline generates posterior probabilities for 8 false-positive scenarios for each candidate; taking the minimum of these yields the RAVEN probability, which represents the lowest confidence in the candidate's authenticity.
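The minimum-posterior rule can be sketched in a few lines. This is an illustration only: the function name and the example probability values are hypothetical, and the scenario labels are the eight false-positive scenarios described in this article.

```python
# Sketch of how a RAVEN-style probability could be computed from the
# eight scenario-specific posteriors. Scenario names come from the
# article; the function and the example values are illustrative.

SCENARIOS = ["EB", "HEB", "HTP", "BEB", "BTP", "NTP", "NEB", "NHEB"]

def raven_probability(posteriors: dict) -> float:
    """Return the minimum planet posterior across all FP scenarios.

    `posteriors` maps each false-positive scenario to the posterior
    probability that the candidate is a planet rather than that scenario.
    """
    return min(posteriors[s] for s in SCENARIOS)

# Example: a candidate that looks planetary against every scenario
# except a marginal eclipsing-binary comparison.
probs = {s: 0.999 for s in SCENARIOS}
probs["EB"] = 0.42
print(raven_probability(probs))  # the weakest comparison dominates
```

Taking the minimum is deliberately conservative: a candidate is only as trustworthy as its worst planet-vs-FP comparison.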

① Gradient Boosted Decision Tree (GBDT)

Decision trees are a simple but powerful class of machine-learning models, and strong interpretability is one of their main advantages. However, a single decision tree has limited robustness and is prone to overfitting when grown too deep. To address these problems, an ensemble of multiple "weak" trees is usually adopted; the Gradient Boosted Decision Tree (GBDT) is one such ensemble method, forming a stronger final model by sequentially constructing multiple decision trees.

The core feature of GBDT is that each newly added tree is not trained directly on the original labels but learns from the residual errors of the previous round's predictions. In other words, the goal of each new model is to reduce the overall loss function, which is essentially analogous to gradient descent. During ensembling, the output of each sub-model is scaled by the learning rate and then accumulated to obtain the final prediction.

The model loss is computed through a preset loss function, and the residuals are determined by the gradient of that loss function. In this study, the GBDT classifier uses the XGBoost implementation of Chen and Guestrin.
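The residual-fitting idea behind GBDT can be shown with a toy implementation. This is a minimal sketch of the general principle using depth-1 "stumps" and squared loss, not the XGBoost implementation the pipeline actually uses; all names and data here are illustrative.

```python
import numpy as np

# Toy gradient boosting: each round fits a depth-1 stump to the
# residuals (the negative gradient of squared loss), and predictions
# accumulate scaled by the learning rate.

def fit_stump(x, residual):
    """Find the threshold split on x that best fits the residual."""
    best = None
    for t in np.unique(x):
        left, right = residual[x <= t], residual[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        loss = ((residual - pred) ** 2).sum()
        if best is None or loss < best[0]:
            best = (loss, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda z, t=t, lv=lv, rv=rv: np.where(z <= t, lv, rv)

def gbdt_fit(x, y, n_trees=50, lr=0.1):
    """Sequentially boost stumps on residuals; return a predictor."""
    pred = np.full_like(y, y.mean(), dtype=float)
    trees = []
    for _ in range(n_trees):
        residual = y - pred            # negative gradient of squared loss
        tree = fit_stump(x, residual)
        pred = pred + lr * tree(x)     # scale by the learning rate
        trees.append(tree)
    return lambda z: y.mean() + lr * sum(t(z) for t in trees)

# A sharp step function is recovered after enough boosting rounds.
x = np.linspace(0, 1, 100)
y = (x > 0.5).astype(float)
model = gbdt_fit(x, y)
```

Each round shrinks the remaining residual by roughly a factor of (1 - learning rate), which is why many small steps beat one large one.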

② Gaussian Process Classifier

A Gaussian Process (GP) is a stochastic process that extends the Gaussian probability distribution from a distribution over random variables to a distribution over functions. In GP classification, the goal is to output discrete class labels or class probabilities between 0 and 1. To this end, a response function is applied to the GP's output to map it into the interval (0, 1), and the result is then combined with a probabilistic likelihood (such as a Bernoulli likelihood).

This study uses the variational approximation method proposed by James Hensman and others. This method relies on a set of "inducing points", which are a representative subset of the data, to reduce the computational complexity and improve the scalability of the model.

Training and Calibration

To train and optimize the two classifiers, an iterative approach is adopted: the synthetic training set is trained under different hyperparameter combinations, and performance is evaluated on the validation set to select the optimal parameters. Parameter tuning mainly focuses on three key FP scenarios, EB, NEB, and NSFP, as these are the most common false-positive events. At the same time, to avoid over-optimizing for a single scenario and causing overfitting, parameters are kept as consistent as possible across scenarios.

All models have an "Early Stopping" mechanism enabled: when the loss function on the validation set does not decrease by at least 0.0001 over 20 consecutive iterations, training stops, and the model state at the last improvement of the loss function is restored.
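The stopping rule above is easy to state in code. This is a minimal sketch of the described behavior (patience of 20 iterations, minimum improvement of 0.0001); the function name and example loss curve are hypothetical.

```python
# Early stopping: halt when validation loss has not improved by at
# least `min_delta` for `patience` consecutive iterations, and keep
# the best model state seen so far.

def train_with_early_stopping(losses, patience=20, min_delta=1e-4):
    """Return (best_iteration, best_loss) under the stopping rule.

    `losses` is an iterable of per-iteration validation losses.
    """
    best_loss = float("inf")
    best_iter = -1
    stalled = 0
    for i, loss in enumerate(losses):
        if loss < best_loss - min_delta:
            best_loss, best_iter, stalled = loss, i, 0
        else:
            stalled += 1
            if stalled >= patience:
                break  # restore the checkpoint from `best_iter` here
    return best_iter, best_loss

# A loss curve that improves, then plateaus: training halts on the
# plateau and the pre-plateau state is kept.
curve = [1.0, 0.5, 0.3, 0.2999] + [0.2999] * 30
print(train_with_early_stopping(curve))
```

Note that the final 0.0001-sized dip does not count as an improvement, precisely because `min_delta` filters out noise-level changes.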

Statistical Validation

The final component of the pipeline derives the posterior probability of the planetary hypothesis by combining the machine-learning probability from each planet-vs-FP classification with its corresponding scenario-specific prior. Each such posterior only represents the probability that the candidate is a planet rather than that specific FP scenario. The researchers' statistical validation therefore requires the planet posterior to exceed 0.99 in each of the candidate's 8 planet-vs-FP classifications for the candidate to be considered validated.
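A minimal sketch of this step, assuming a simple two-hypothesis Bayes update per scenario: the classifier score is weighted by scenario priors, and validation demands every comparison clear the 0.99 bar. Function names, prior values, and probabilities below are hypothetical, not taken from the paper.

```python
# Combine a classifier's planet probability with scenario-specific
# priors (Bayes' rule over two hypotheses), then require every
# planet-vs-FP posterior to exceed the validation threshold.

def planet_posterior(ml_prob_planet, prior_planet, prior_fp):
    """Posterior that the candidate is a planet vs. one FP scenario."""
    num = ml_prob_planet * prior_planet
    den = num + (1.0 - ml_prob_planet) * prior_fp
    return num / den

def is_validated(posteriors, threshold=0.99):
    """All 8 planet-vs-FP posteriors must clear the threshold."""
    return all(p > threshold for p in posteriors)

# Example: a strong classifier score combined with a rare FP scenario
# pushes the posterior above the validation threshold.
post = planet_posterior(ml_prob_planet=0.98, prior_planet=0.5, prior_fp=0.01)
print(round(post, 4))
```

The scenario prior matters: the same classifier score against a common FP scenario (a larger `prior_fp`) can fall short of 0.99 even when it clears the bar against a rare one.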

RAVEN Performs Well in Screening, Ranking, and Validating Real Planet Candidates

To evaluate the performance of RAVEN, the researchers conducted the following validations on the training set and the test set respectively:

The researchers first tested its performance on a subset of the training set that was not seen during training. This subset consists of 10% of the events randomly selected from each scenario, held out independently before training. Model performance was evaluated with four key indicators: accuracy, Area Under the ROC Curve (AUC), precision, and recall. The results are shown in the table below:

Performance Indicators of the GBDT and GP Binary Classifiers Trained for Planet-False Positive Pairs on the Test Set

The results show that both classifiers perform excellently in all FP scenarios, especially in precision. Since the main goal of the RAVEN pipeline is to screen and validate real planet candidates, precision is the most important indicator: it reflects the pipeline's ability to identify FPs correctly without misjudgment. Combining the results of the two classifiers, precision reaches nearly 99% in all scenarios.
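Three of the four indicators follow directly from a binary confusion matrix (AUC is omitted here since it requires ranked scores rather than counts). A small sketch with illustrative counts, where "positive" means the FP class being flagged:

```python
# Accuracy, precision, and recall from binary confusion counts.
# The counts below are illustrative, not results from the paper.

def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # flagged FPs that really are FPs
    recall = tp / (tp + fn)     # true FPs that get flagged
    return accuracy, precision, recall

acc, prec, rec = metrics(tp=95, fp=1, fn=5, tn=99)
```

High precision is what protects real planets: every false alarm in `fp` is a genuine planet wrongly discarded, which is why the pipeline optimizes for it.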

The performance of the RAVEN process was finally tested on a set of TOIs with prior classifications. For all 1,361 TOIs in the sample, their RAVEN probabilities are shown in the figure below:

Stacked Histogram Showing the Minimum Posterior Probabilities of Pre-classified Planetary, FP, and FA TOIs

The histogram shows clear probability differences among the three categories, with well-separated distributions and pronounced extremes, reflecting RAVEN's effectiveness in identifying FP events and assigning them low planetary posterior probabilities. Specifically, 93.8% of FP events have a minimum posterior probability below 0.5, and 69.7% fall below 0.01; the mean probability of FP events is 0.076 and the median is 0.00022.

Similarly, of the 26 FA TOIs, 23 have probabilities below 0.5, and the median for the category is 0.016. Overall, the FP and FA results confirm the pipeline's efficiency in screening TESS candidates, showing it can be used to remove most FP events.

Next, the researchers examined RAVEN's handling of false positives (FPs). The following table lists the 12 FP events with probabilities greater than 0.9: