Mining and Untangling Change Genealogies – PhD Thesis

Developers change source code to add new functionality, fix bugs, or refactor their code. Many of these changes have immediate impact on quality or stability. However, some impact of changes may become evident only in the long term. This thesis makes use of change genealogy dependency graphs modeling dependencies between code changes capturing how earlier changes enable and cause later ones. Using change genealogies, it is possible to:
(a) apply formal methods like model checking on version archives to reveal temporal process patterns. Such patterns encode key features of the software process and can be validated automatically: In an evaluation of four open source histories, our prototype would recommend pending activities with a precision of 60—72%.
(b) classify the purpose of code changes. Analyzing the change dependencies on change genealogies shows that change genealogy network metrics can be used to automatically separate bug fixing from feature implementing code changes.
(c) build competitive defect prediction models. Defect prediction models based on change genealogy network metrics show competitive prediction accuracy when compared to state-of-the-art defect prediction models.
As many other approaches mining version archives, change genealogies and their applications rely on two basic assumptions: code changes are considered to be atomic and bug reports are considered to refer to corrective maintenance tasks. In a manual examination of more than 7,000 issue reports and code changes from bug databases and version control systems of open- source projects, we found 34% of all issue reports to be misclassified and that up to 15% of all applied issue fixes consist of multiple combined code changes serving multiple developer maintenance tasks. This introduces bias in bug prediction models confusing bugs and features. To partially solve these issues we present an approach to untangle such combined changes with a mean success rate of 58—90% after the fact.

  • [PDF] K. Herzig, “Mining and Untangling Change Genealogies,” PhD Thesis, 2012.
    title = {{Mining and Untangling Change Genealogies}},
    author = {Kim Herzig},
    year = {2012},
    month = {dez},
    location = {D--66123 Saarland, Germany},
    school = {Universität des Saarlandes},
    link = {},
    pdf = {}