Machine Learning-Powered Autonomous Data Cleaning for LEGEND-200

ORAL

Abstract

The Large Enriched Germanium Experiment for Neutrinoless Double-Beta Decay (LEGEND) will operate in two phases to search for neutrinoless double-beta decay (0νββ). The first (second) stage will employ 200 (1000) kg of ⁷⁶Ge semiconductor detectors to achieve a half-life sensitivity of 10²⁷ (10²⁸) years. In this study, we present a data-driven approach, powered by a novel artificial intelligence algorithm, to remove electronic noise, cross-talk events, and events recorded during recovery from injected test pulses in the data collected by the ⁷⁶Ge detectors in LEGEND. We first de-noise the waveforms and extract their shape information using a Discrete Wavelet Transform (DWT). We then apply an unsupervised clustering algorithm called Affinity Propagation (AP) to obtain a representative waveform basis for a given dataset. We demonstrate that our model classifies events efficiently in low-background datasets and can serve as a preliminary data-cleaning filter for both low-background and calibration datasets. This method will enable the automatic detection of background events whose identification requires significant time and human effort in traditional data cleaning.
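The two-stage pipeline described above (DWT de-noising, then AP clustering) can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation. It assumes a Haar wavelet, soft thresholding of the detail coefficients with a fixed threshold, the de-noised waveform itself as the shape representation, negative squared Euclidean distance as the AP similarity, and the median pairwise similarity as the AP preference. The function names, toy waveforms, and parameter values are hypothetical.

```python
import math
import random

# Hypothetical sketch: Haar-wavelet de-noising followed by plain affinity
# propagation. Wavelet choice, threshold, similarity, and preference are
# illustrative assumptions, not the analysis settings.


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT -> (approximation, detail)."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of one Haar DWT level."""
    s = 1 / math.sqrt(2)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


def denoise(signal, levels=3, threshold=0.1):
    """Soft-threshold the detail coefficients of a multi-level Haar DWT.

    Assumes len(signal) is divisible by 2**levels.
    """
    approx, details = list(signal), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        # Shrink small (noise-dominated) detail coefficients toward zero.
        details.append([math.copysign(max(abs(c) - threshold, 0.0), c) for c in d])
    for d in reversed(details):
        approx = haar_idwt(approx, d)
    return approx


def affinity_propagation(points, damping=0.7, iterations=300):
    """Damped affinity propagation; returns (exemplar indices, labels)."""
    n = len(points)
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    off_diag = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
    preference = off_diag[len(off_diag) // 2]  # median similarity (assumed)
    for i in range(n):
        S[i][i] = preference
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        for i in range(n):
            # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
            vals = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != best)
            for k in range(n):
                new = S[i][k] - (second if k == best else vals[best])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i' != k of max(0, r(i',k))
            for i in range(n):
                # a(k,k) = total; a(i,k) = min(0, r(k,k) + total - max(0, r(i,k)))
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if A[k][k] + R[k][k] > 0]
    if not exemplars:
        return [], [-1] * n  # did not converge to any exemplar
    # Assign each point to its most similar exemplar (exemplars label themselves).
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k])
              for i in range(n)]
    return exemplars, labels


if __name__ == "__main__":
    rng = random.Random(0)
    step = [0.0] * 32 + [1.0] * 32                 # toy step-like pulse
    spike = [0.0] * 30 + [1.0] * 4 + [0.0] * 30    # toy short spike
    waves = [[x + rng.gauss(0, 0.05) for x in step] for _ in range(5)]
    waves += [[x + rng.gauss(0, 0.05) for x in spike] for _ in range(5)]
    exemplars, labels = affinity_propagation([denoise(w) for w in waves])
    print(exemplars, labels)
```

In this toy setting, each exemplar plays the role of one representative waveform in the basis, and every event inherits the label of its closest exemplar; a real analysis would tune the wavelet, threshold, and preference to the detector data.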

*This work is supported by the U.S. DOE and NSF; the LANL, ORNL, and LBNL LDRD programs; the European ERC and Horizon programs; the German DFG, BMBF, and MPG; the Italian INFN; the Polish NCN and MNiSW; the Czech MEYS; the Slovak SRDA; the Swiss SNF; the UK STFC; the Russian RFBR; the Canadian NSERC and CFI; and the LNGS and SURF facilities.

Presenters

  • Esteban A León

Authors

  • Esteban A León

  • Julieta Gruszko

    • University of North Carolina at Chapel Hill
  • Aobo Li

    • University of North Carolina at Chapel Hill