Abstract DGP2026-60
Clustering the Exoplanet Database: Unraveling Hidden Patterns in Exoplanet Populations using Unsupervised Machine Learning Techniques
The growing capabilities of exoplanet detection methods have steadily increased the number of known planets. With several thousand planets confirmed by early 2026, exoplanet catalogs now enable increasingly powerful statistical studies of planetary populations. Most demographic analyses have relied on classical statistical approaches, while data-driven, unsupervised machine-learning methods have been used less often for exploratory population studies. Here, we present an exploration of the Extrasolar Planets Encyclopaedia dataset using a range of unsupervised learning algorithms. We first select a subset of system features and fill in missing entries via a feature-prediction approach, then apply outlier-detection methods to identify objects whose parameter combinations are inconsistent with the bulk of the sample. Finally, to search for latent structure, we apply multiple clustering algorithms to the dataset, both in its unweighted form and in a weighted variant intended to mitigate observational selection effects. We hope that the resulting pipeline yields: (i) a feature-prediction engine that can infer previously missing system parameters with feature-dependent precision, (ii) a set of catalog entries whose reported parameters may warrant re-examination, and (iii) clusters that reproduce known demographic patterns while also suggesting potential additional structure in exoplanet populations.
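The idea of a weighted clustering variant can be illustrated with a minimal sketch. The abstract does not state which clustering algorithms or weighting scheme are used, so everything below is an assumption made for illustration: k-means is chosen only because it is simple, the function name `weighted_kmeans` is hypothetical, and the toy points (log10 orbital period, log10 planet mass) and weights are made-up values, not catalog data. Each point contributes to its cluster centroid in proportion to its weight, so upweighting hard-to-detect planets pulls centroids toward them.

```python
import math


def weighted_kmeans(points, weights, k, n_iter=50):
    """Lloyd-style k-means in which centroids are weighted means of their members."""
    # Deterministic farthest-point initialisation (an illustrative choice,
    # not something taken from the abstract).
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))

    dim = len(points[0])
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: weighted mean of each cluster's members.
        for j in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == j]
            total = sum(w for _, w in members)
            if total > 0:
                centers[j] = [
                    sum(w * p[d] for p, w in members) / total for d in range(dim)
                ]
    return labels, centers


# Hypothetical toy sample: (log10 period [days], log10 mass [Jupiter masses]).
toy_points = [(0.5, 0.0), (0.6, 0.1), (0.4, -0.1), (3.0, 0.3), (3.2, 0.5), (2.9, 0.4)]
# Hypothetical weights: the last planet is treated as strongly under-sampled.
toy_weights = [1.0, 1.0, 1.0, 1.0, 1.0, 4.0]

unweighted_labels, unweighted_centers = weighted_kmeans(toy_points, [1.0] * 6, k=2)
weighted_labels, weighted_centers = weighted_kmeans(toy_points, toy_weights, k=2)
```

In this toy case the cluster assignments are the same in both runs, but the weighted centroid of the long-period group moves toward the upweighted planet. In a real analysis the weights would have to come from a model of detection completeness, which this sketch does not attempt.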