It’s not a Bug, it’s a Feature: On the Data Quality of Bug Databases @ICSE2013

In a manual examination of more than 7,000 issue reports from the bug databases of five open-source projects, we found 33.8% of all issue reports to be misclassified, that is, rather than referring to a code fix, they resulted in a new feature, an update to documentation, or an internal refactoring. This misclassification introduces bias in bug prediction models, confusing bugs and features: On average, 39% of files marked as defective actually never had a bug. We estimate the impact of this misclassification on earlier studies and recommend manual data validation for future studies.
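The two figures quoted above can be read as simple ratios over a set of manually validated issue reports. The following is a minimal sketch of that arithmetic on invented toy data, not the study's data or tooling. The `Issue` record, its field names, and the label strings are hypothetical, and the paper's actual procedure for mapping issue reports to files may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    """Hypothetical record for one issue report (field names are assumptions)."""
    issue_id: str
    tracker_label: str    # label recorded in the bug database
    validated_label: str  # label assigned by manual inspection
    fixed_files: tuple    # files touched by the change that resolved the issue


def misclassification_rate(issues):
    """Fraction of issues whose tracker label differs from the validated label."""
    if not issues:
        return 0.0
    wrong = sum(1 for i in issues if i.tracker_label != i.validated_label)
    return wrong / len(issues)


def falsely_defective_share(issues):
    """Share of files marked defective by tracker labels that have no validated bug."""
    marked = {f for i in issues if i.tracker_label == "bug" for f in i.fixed_files}
    actual = {f for i in issues if i.validated_label == "bug" for f in i.fixed_files}
    if not marked:
        return 0.0
    return len(marked - actual) / len(marked)


# Toy data, invented purely for illustration.
issues = [
    Issue("T-1", "bug", "bug", ("a.py",)),
    Issue("T-2", "bug", "feature", ("b.py",)),
    Issue("T-3", "bug", "documentation", ("a.py", "c.py")),
    Issue("T-4", "feature", "feature", ("d.py",)),
]

print(f"misclassified issues: {misclassification_rate(issues):.1%}")
print(f"files wrongly marked defective: {falsely_defective_share(issues):.1%}")
```

In this toy set, `b.py` and `c.py` are counted as defective only because a feature request and a documentation update were filed as bugs. This is the kind of bias the abstract describes for bug prediction models trained on unvalidated labels.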

Further Details

For more information, additional results, and the data sets containing the manually classified issue reports, please visit the paper's website.

  • K. Herzig, S. Just, and A. Zeller, “It’s not a Bug, It’s a Feature: How Misclassification Impacts Bug Prediction,” in Proceedings of the 2013 International Conference on Software Engineering (ICSE ’13), Piscataway, NJ, USA: IEEE Press, 2013, pp. 392–401.
    title = {{It's not a Bug, It's a Feature: How Misclassification Impacts Bug Prediction}},
    author = {Kim Herzig and Sascha Just and Andreas Zeller},
    booktitle = {Proceedings of the 2013 International Conference on Software Engineering},
    series = {ICSE '13},
    year = {2013},
    isbn = {978-1-4673-3076-3},
    location = {San Francisco, CA, USA},
    pages = {392--401},
    numpages = {10},
    acmid = {2486840},
    publisher = {IEEE Press},
    address = {Piscataway, NJ, USA},