On May 26, Computer Science graduate student Aalok Ahluwalia presented the paper “Snoring: A Noise in Defect Prediction Datasets” at the 16th International Conference on Mining Software Repositories (MSR). The paper was written by Aalok, Cal Poly Engineering Professor Davide Falessi, and Professor Massimiliano Di Penta of the University of Sannio, Italy.
MSR is ranked as a CORE A conference, had an acceptance rate of 27%, and is currently ranked 9th across all software engineering (SE) publication venues, including journals.
Abstract
In order to develop and train defect prediction models, researchers rely on datasets in which a defect is often attributed to the release where the defect itself is discovered. However, in many circumstances, a defect is only discovered several releases after its introduction. This might introduce a bias in the dataset, i.e., treating the intermediate releases as defect-free and the latter as defect-prone. We call such defects “sleeping defects”. We call “snoring” the phenomenon where classes affected only by sleeping defects are treated as defect-free until the defect is discovered.
In this paper we analyze, on data from 282 releases of six open source projects from the Apache ecosystem, the magnitude of the sleeping defects and of the snoring classes. Our results indicate that 1) on all projects, most of the defects in a project slept for more than 20% of the existing releases, and 2) in the majority of the projects the missing rate is more than 25% even if we remove the last 50% of releases.
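To make the two definitions in the abstract concrete, here is a minimal Python sketch. It is not the authors' tooling: the `Defect` record, the integer release indices, the class names, and the simplification that a defect affects every release from its introduction through its discovery are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Defect:
    cls: str         # affected class (hypothetical name)
    introduced: int  # index of the release where the defect was introduced
    discovered: int  # index of the release where the defect was discovered


def is_sleeping(defect: Defect, release: int) -> bool:
    """True if the defect exists in `release` but has not been discovered yet."""
    return defect.introduced <= release < defect.discovered


def snoring_classes(defects: list[Defect], release: int) -> set[str]:
    """Classes that, at `release`, are affected by sleeping defects only.

    Assumption: a defect counts as known in the release where it is discovered.
    """
    sleeping = {d.cls for d in defects if is_sleeping(d, release)}
    known = {d.cls for d in defects if d.discovered == release}
    return sleeping - known


def slept_fraction(defect: Defect, total_releases: int) -> float:
    """Share of the project's releases during which the defect was sleeping."""
    return (defect.discovered - defect.introduced) / total_releases


# Hypothetical data: two defects in Parser, one in Cache.
defects = [
    Defect("Parser", introduced=1, discovered=4),
    Defect("Parser", introduced=3, discovered=3),
    Defect("Cache", introduced=2, discovered=5),
]
```

With this made-up data, both `Parser` and `Cache` are snoring at release 2, so a dataset built at that point would label them defect-free. At release 3 only `Cache` is snoring, because a second `Parser` defect is discovered in that release.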
For more information on Mining Software Repositories 2019, visit HERE.