Public data sets for “Untangling Code Changes” moved


The public data sets for our paper “Untangling Code Changes”, published at the Working Conference on Mining Software Repositories (MSR) in 2013, have moved. Please find the data sets in our GitHub repository. The Git repository snapshots of the individual projects used in our analyses can be found in the download section below.


Mining version archives has become a popular research field. Most approaches analyze version archives to map bug reports or other external documents to committed changes. This is mostly done by scanning the commit messages of the applied changes for bug report IDs or other document IDs. But these approaches rely on developers writing (correct) commit messages. Even if a commit message contains a bug report ID, there is no evidence that the change applied only the bug fix without any extra code changes. In empirical studies, researchers found that many applied code changes are non-atomic, meaning that applied bug fixes also touch code artifacts that were refactored or reformatted without any hint in the commit message. This introduces noise and bias into prediction and recommendation tools and their evaluation. With this work, we aim to build an algorithm that is capable of automatically untangling code changes into code change partitions. Each partition corresponds to an individual development task and could have been applied separately.
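To illustrate the commit-message scanning described above, here is a minimal sketch. It is not the tooling used in the paper: the regular expression, the function name `extract_bug_ids`, and the sample message are assumptions made for illustration, and real projects follow different conventions for referencing bug reports.

```python
import re

# Assumed pattern: matches references such as "bug 1234", "issue #1234",
# or "fixes #1234". Illustrative only; conventions differ between projects.
BUG_ID_PATTERN = re.compile(
    r"\b(?:bug|issue|fix(?:es|ed)?)\s*#?\s*(\d+)", re.IGNORECASE
)


def extract_bug_ids(commit_message):
    """Return the bug report IDs mentioned in a commit message, in order."""
    return [int(bug_id) for bug_id in BUG_ID_PATTERN.findall(commit_message)]


# Hypothetical commit message: it names one bug report but also mentions
# an unrelated reformatting, i.e. the change is tangled.
message = "Fixes #4711: null check in parser; also reformatted Lexer.java"
print(extract_bug_ids(message))
```

The sketch links the whole commit to the one bug report it finds, even though the message itself admits to an unrelated reformatting. This is the kind of noise that untangling is meant to reduce.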

CSV files

The different data sets are provided as CSV files for five open-source projects: ArgoUML, Google Web Toolkit, Jaxen, JRuby, and XStream. These files can be accessed through our Git repository. Please note that transactions are identified by Git hashes and not by SVN IDs. We also provide a mapping file between the original SVN IDs and Git hashes for the individual project repositories.
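A mapping file of this kind could be loaded as sketched below. The column names `svn_id` and `git_hash` and the sample rows are hypothetical placeholders, not taken from the actual files, so check the header of the real mapping file before adapting this.

```python
import csv
import io

# Hypothetical mapping file contents. The real file's column names,
# delimiter, and values may differ.
MAPPING_CSV = """svn_id,git_hash
1021,3f2a9c1
1022,b77e04d
"""


def load_svn_to_git(handle):
    """Read a mapping CSV and return a dict from SVN ID to Git hash."""
    reader = csv.DictReader(handle)
    return {row["svn_id"]: row["git_hash"] for row in reader}


svn_to_git = load_svn_to_git(io.StringIO(MAPPING_CSV))

# Invert the mapping to look up the original SVN ID of a transaction
# that the data sets identify by its Git hash.
git_to_svn = {git_hash: svn_id for svn_id, git_hash in svn_to_git.items()}

print(svn_to_git["1021"])
print(git_to_svn["b77e04d"])
```

For a real file, replace the `io.StringIO` object with an opened file handle, for example `open(path, newline="")`.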

Git repositories

The corresponding project repositories were converted to Git repositories before analysis. The Git repository snapshots used in the experiments were too big to include in the Git repository that holds the data sets. Instead, you can find download links on the original paper website or on this site, just below this text.



Google Web Toolkit