Teen’s AI Model Cracks NASA’s Toughest NEOWISE Dataset in Microseconds

Credit: Jeremy Thomas/Unsplash

When NASA launched its infrared space telescope NEOWISE (Near-Earth Object Wide-field Infrared Survey Explorer), few imagined that a teenager, more than a decade later, would be the one to unlock some of its most elusive secrets. In the classrooms of Pasadena High School in California, Matteo (Matthew) Paz was known as a stargazer, and not in the poetic sense. Like many budding astronomers, he was fascinated by the night sky.

Through a summer internship, he got a glimpse of the real data used by space scientists and realized just how messy it can be. During his research internship at Caltech’s Infrared Processing and Analysis Center (IPAC), Paz was introduced to the NEOWISE database, a collection of data so noisy and jumbled that most scientists had set it aside as too difficult to work with. Under the mentorship of astronomer and IPAC senior scientist Davy Kirkpatrick, Paz took on a project called VarWISE, developing a machine-learning model called VARnet to decode the NEOWISE data.

NEOWISE

NEOWISE is a space telescope operated by NASA that scans the sky in infrared light, meaning it can see heat and not just visible starlight. It was originally launched as the WISE telescope in 2009, designed to capture detailed images of the entire sky, revealing everything from nearby asteroids to distant galaxies. After its coolant ran out, the telescope was reactivated in 2013 under the name NEOWISE, with a new focus on spotting Near-Earth Objects (NEOs) such as asteroids and comets.

When NEOWISE looked into the sky, it did not just see stars; it saw them over and over again. Over more than ten years, the telescope collected nearly 200 billion snapshots of light sources. These snapshots are called apparitions, and each one is like a single heartbeat reading from a star or galaxy: when it was seen, where it was, and how bright it appeared. 

NEOWISE’s Database: A Locked Treasure 

At first glance, the data from NEOWISE looks like a goldmine: billions of observations of stars, galaxies, and asteroids, all glowing in infrared light. However, the telescope had not followed a neat, predictable pattern. It would catch a star a dozen times in one night and not see it again for months. Some measurements were sharp and clear; others were blurred by cosmic noise or background interference. The data turned out to be a vast sea of numbers, scattered with no labels or clear connections.

For many scientists, analyzing this database is like straining to hear a whisper in a stadium packed with screaming fans. Traditionally, a method called phase folding is used. This method identifies repeating patterns by overlapping sections of a light curve (a star’s brightness over time), but phase folding is computationally intensive and struggles with uneven data.
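To make phase folding concrete, here is a minimal sketch in Python. It is not taken from Paz's work: the function names, the ten-bin scatter score, and the synthetic 2.5-day sine wave are all assumptions chosen for illustration. Each trial period is used to wrap the observation times onto a single cycle, and the period that makes the folded points line up most tightly is chosen.

```python
import math

def phase_fold(times, period):
    """Map each observation time to its phase in [0, 1) for a trial period."""
    return [(t % period) / period for t in times]

def folded_scatter(times, mags, period, n_bins=10):
    """Mean within-bin variance of the folded light curve.
    A lower score means the trial period lines the cycles up better."""
    bins = [[] for _ in range(n_bins)]
    for phase, mag in zip(phase_fold(times, period), mags):
        bins[min(int(phase * n_bins), n_bins - 1)].append(mag)
    variances = []
    for b in bins:
        if len(b) > 1:
            mean = sum(b) / len(b)
            variances.append(sum((x - mean) ** 2 for x in b) / (len(b) - 1))
    return sum(variances) / len(variances) if variances else float("inf")

def best_period(times, mags, trial_periods):
    """Brute-force search: try every period and keep the tightest fold."""
    return min(trial_periods, key=lambda p: folded_scatter(times, mags, p))

# Hypothetical data: unevenly spaced times and a noiseless 2.5-day sine wave.
times = [0.37 * i + 0.05 * ((7 * i) % 5) for i in range(200)]
mags = [math.sin(2 * math.pi * t / 2.5) for t in times]
trial_periods = [1.0, 1.7, 2.5, 3.3, 4.1]

recovered = best_period(times, mags, trial_periods)
print("best trial period:", recovered)
```

The cost is easy to see here: every trial period requires a full pass over the light curve, so a fine grid of periods applied to billions of observations quickly becomes expensive.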

Matteo Paz’s Key to NEOWISE 

Faced with this mountain of tangled data, Paz asked whether an AI built for messy, real-world data could do better. His approach was to ‘clean up’ the database, understand it, and then train an AI to recognize meaningful patterns. Because each light source was scattered across the database in pieces, Paz first needed to find out which observations belonged to the same star.

This is where a spatial clustering algorithm called DBSCAN (Density-Based Spatial Clustering of Applications with Noise) came into use. DBSCAN groups points (apparitions) close together in sky coordinates and discards the rest as noise. This allows the algorithm to recover full light curves for individual stars or galaxies. The clustered data for each star was then reformatted into a consistent structure. This ensured Paz’s AI model, VARnet, would receive reliable, standardized input for every star it analyzed.
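The grouping step can be sketched with a small, self-contained version of DBSCAN. This is an illustration only, not Paz's pipeline: the coordinates, the 3-arcsecond matching radius, and the minimum of three points per source are made-up values, and the distance function uses a small-angle approximation that ignores the wrap-around of right ascension at 0/360 degrees.

```python
import math

def angular_sep_deg(p, q):
    """Small-angle separation in degrees between two (ra, dec) points in degrees."""
    dra = (p[0] - q[0]) * math.cos(math.radians(0.5 * (p[1] + q[1])))
    ddec = p[1] - q[1]
    return math.hypot(dra, ddec)

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, 2, ... for clusters, -1 for noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        return [j for j in range(len(points))
                if angular_sep_deg(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may be re-labelled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nearby = neighbours(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)   # j is a core point, keep expanding
    return labels

# Hypothetical apparitions (ra, dec) in degrees: two sources plus one stray detection.
apparitions = [
    (10.0000, 20.0000), (10.0002, 20.0001), (9.9999, 19.9998), (10.0001, 20.0002),
    (10.0100, 20.0000), (10.0101, 20.0002), (10.0099, 19.9999), (10.0102, 20.0001),
    (10.5000, 20.5000),
]
labels = dbscan(apparitions, eps=3.0 / 3600.0, min_pts=3)

# Collect the apparitions of each source; noise (-1) is dropped.
light_curves = {}
for label, point in zip(labels, apparitions):
    if label != -1:
        light_curves.setdefault(label, []).append(point)
print(labels)
```

This brute-force version compares every pair of points, which is fine for nine apparitions but not for billions; a real catalog would need a spatial index to find neighbours.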

Building VARnet, An AI Stargazer

With the cleaned-up dataset in hand, Paz built a machine learning model called VARnet. Unlike traditional models that rely on smooth, complete data, VARnet was designed from the ground up to thrive in NEOWISE’s noisy, sparse environment. VARnet uses two key signal-processing techniques, Wavelet Clarification and Custom Fourier Transform.

Wavelet Clarification breaks a signal into pieces at different scales, letting the model zoom in on both long-term trends and brief fluctuations. It is like having multiple lenses for examining a star’s behaviour. The second technique starts from the Fourier Transform, which normally identifies repeating cycles in data. Paz modified this method to work better with the uneven, gappy NEOWISE light curves, allowing VARnet to pick out rhythmic patterns that traditional tools miss. Together, these techniques help VARnet understand both periodic stars (like pulsators) and unpredictable events (like supernovae), even when the light curves are noisy or incomplete.
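The two ideas can be illustrated with textbook stand-ins. The sketch below is not VARnet's actual code, and the source does not describe Paz's modified transform in detail. It assumes a plain Haar wavelet for the multi-scale view and a direct Fourier sum evaluated at the real, irregular observation times for the cycle search. All names and test values are hypothetical.

```python
import cmath
import math

def haar_decompose(signal):
    """Multi-level Haar transform of a list whose length is a power of two.
    Returns (coarsest approximation, [detail levels from finest to coarsest])."""
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details

def uneven_dft_power(times, values, freqs):
    """Power of a direct Fourier sum taken at arbitrary sample times,
    so no resampling onto an even grid is needed."""
    mean = sum(values) / len(values)
    power = []
    for f in freqs:
        s = sum((v - mean) * cmath.exp(-2j * math.pi * f * t)
                for t, v in zip(times, values))
        power.append(abs(s) ** 2 / len(values))
    return power

# A step in brightness: the fine-scale details are zero and the change
# only shows up at the coarsest scale.
approx, details = haar_decompose([4, 4, 4, 4, 0, 0, 0, 0])

# A 0.4 cycles-per-day sine wave observed at uneven times.
times = [0.37 * i + 0.05 * ((7 * i) % 5) for i in range(200)]
values = [math.sin(2 * math.pi * 0.4 * t) for t in times]
freqs = [k / 10 for k in range(1, 11)]
power = uneven_dft_power(times, values, freqs)
peak_freq = max(zip(power, freqs))[1]
print("strongest trial frequency:", peak_freq)
```

The wavelet view answers "at what timescale did something change?", while the Fourier view answers "does anything repeat, and how often?", which is why the two complement each other for a mix of transients and pulsators.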

VARnet understands both periodic stars and unpredictable events. Credit: Society for Science

Training VARnet

To train any machine learning model, you need examples with known answers. But NEOWISE’s archive lacked labelled light curves, so Paz built a simulator. Using physical models of how different stars behave, he generated synthetic light curves in four categories:

Null – Stars that appear constant in brightness.
Transients – Sudden, short-lived bursts of light from explosive events like supernovae.
Pulsators – Stars that rhythmically expand and contract.
Transits – Dimming caused by objects like planets passing in front of a star.
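A toy version of such a simulator might look like the sketch below. It is only an illustration of the idea of labelled synthetic light curves: the shapes (a sine wave, a box-shaped dip, an exponentially fading burst), the parameter ranges, the noise level, and the bursty observing cadence are all assumptions, not the physical models Paz used.

```python
import math
import random

CLASSES = ("null", "transient", "pulsator", "transit")

def sample_times(rng, n_visits=8, per_visit=12, gap_days=180.0):
    """Bursty cadence: a cluster of exposures, then months of nothing."""
    return sorted(v * gap_days + rng.uniform(0.0, 1.5)
                  for v in range(n_visits) for _ in range(per_visit))

def model_flux(kind, t, p):
    """Noise-free relative brightness of one class at time t (days)."""
    if kind == "null":
        return 1.0
    if kind == "pulsator":
        return 1.0 + p["amp"] * math.sin(2 * math.pi * t / p["period"])
    if kind == "transit":
        phase = (t % p["period"]) / p["period"]
        return 1.0 - p["amp"] if phase < p["duration"] else 1.0
    if kind == "transient":
        dt = t - p["t0"]
        return 1.0 + (p["amp"] * math.exp(-dt / p["decay"]) if dt >= 0 else 0.0)
    raise ValueError(f"unknown class: {kind}")

def simulate(kind, rng, noise=0.02):
    """Return (times, noisy fluxes, integer label) for one synthetic source."""
    times = sample_times(rng)
    params = {
        "amp": rng.uniform(0.2, 0.5),
        "period": rng.uniform(0.2, 5.0),
        "duration": 0.1,                       # fraction of the orbit spent in transit
        "t0": rng.uniform(0.0, times[-1]),     # start of the transient burst
        "decay": rng.uniform(20.0, 100.0),     # fading timescale in days
    }
    flux = [model_flux(kind, t, params) + rng.gauss(0.0, noise) for t in times]
    return times, flux, CLASSES.index(kind)

rng = random.Random(0)  # fixed seed so the toy data set is reproducible
dataset = [simulate(kind, rng) for kind in CLASSES for _ in range(25)]
print(len(dataset), "labelled light curves")
```

Because every curve is generated from a known class, the label comes for free, which is exactly what the real archive lacks.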

This simulated data became VARnet’s training ground, teaching it what each type of star ‘looks like’ in infrared. Once trained, VARnet could classify a star in just 53 microseconds (a thousand times faster than the blink of an eye) when run on a GPU. 

From Simulation to Discovery 

VARnet was tested on a 25-square-degree region of the sky, a small patch in the grand scheme of the universe. From that one area alone, it produced fascinating results in under five minutes. Along with confidently identifying a known system, V1403 Ori, it spotted a new eclipsing binary (a system where two stars take turns eclipsing each other as they orbit). It also detected cosmic events in distant galaxies, such as a potential supernova in LEDA 358365 and a feeding supermassive black hole in LEDA 340305.

After proving VARnet’s accuracy on a small patch of sky, Paz scaled up. Using the full NEOWISE dataset, VARnet analyzed over 450 million light sources across the infrared sky. From this massive trove, the model identified around 1.5 million variable sources. Out of those, more than 540,000 were discoveries never catalogued before in any existing database of variable stars.

The Future of VARnet 

Matteo hopes to expand VARnet to analyze the entire NEOWISE archive, not just small chunks. That would mean scanning data from every point in the sky, possibly uncovering millions of new variable stars, mysterious objects, and galactic events. He is also exploring how to adapt VARnet for other space missions, like the James Webb Space Telescope or ESA’s Euclid mission, which collect enormous volumes of time-based data. 

Matteo Paz was named the national winner of the prestigious Regeneron Science Talent Search program. Credit: Society for Science

Based on his groundbreaking project, Paz submitted a paper to The Astronomical Journal as the sole author. His paper was peer-reviewed and published, an astronomical achievement for a high school student. In March 2025, he was named the national winner of the prestigious Regeneron Science Talent Search program, winning 250,000 USD and beating out thousands of other talented high school researchers.
