Predicting defects for code clusters

Software products and projects can become very large and keep growing over time. Building a single prediction model for a whole software product may be easy, but it may also limit prediction accuracy.

Different parts of a software product have different duties (GUI, database, kernel,…). We found that different code characteristics matter for defect prediction in each of these code zones. The main idea is therefore to cluster code entities by their software duty. We are currently investigating whether defect prediction models that take advantage of such clustering achieve higher prediction accuracy than models that use no clustering.
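The comparison between one global model and per-cluster models could be sketched roughly as follows. This is a minimal illustration under loudly stated assumptions, not our actual setup: the software duty of an entity is approximated by its top-level directory (a hypothetical heuristic), a one-metric threshold rule (a decision stump) stands in for a real prediction model, and the metric names `loc` and `churn`, the helper names, and all data are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Entity:
    path: str                  # hypothetical, e.g. "gui/Button.java"
    metrics: dict[str, float]  # hypothetical metric names, e.g. "loc", "churn"
    defective: bool            # ground-truth label from the defect history


def duty_of(entity: Entity) -> str:
    # ASSUMPTION: approximate the software duty by the top-level directory.
    return entity.path.split("/", 1)[0]


def fit_stump(entities: list[Entity]) -> tuple[str, float]:
    """Pick the (metric, threshold) whose rule 'metric > threshold => defective'
    has the highest training accuracy; ties keep the first candidate found."""
    best, best_acc = None, -1.0
    for metric in sorted(entities[0].metrics):
        values = sorted({e.metrics[metric] for e in entities})
        # A threshold below the minimum predicts every entity as defective.
        for threshold in [values[0] - 1] + values:
            correct = sum((e.metrics[metric] > threshold) == e.defective
                          for e in entities)
            acc = correct / len(entities)
            if acc > best_acc:
                best, best_acc = (metric, threshold), acc
    return best


def predict(stump: tuple[str, float], entity: Entity) -> bool:
    metric, threshold = stump
    return entity.metrics[metric] > threshold


def accuracy(predictor, entities: list[Entity]) -> float:
    # Fraction of entities whose predicted label matches the true label.
    return sum(predictor(e) == e.defective for e in entities) / len(entities)


def evaluate(train: list[Entity], test: list[Entity]) -> tuple[float, float]:
    """Return (accuracy of one global model, accuracy of per-duty models)."""
    global_stump = fit_stump(train)

    # Cluster the training entities by (approximated) software duty.
    clusters: dict[str, list[Entity]] = defaultdict(list)
    for e in train:
        clusters[duty_of(e)].append(e)
    stumps = {duty: fit_stump(members) for duty, members in clusters.items()}

    def cluster_predict(e: Entity) -> bool:
        # Entities from a duty unseen during training fall back to the global model.
        return predict(stumps.get(duty_of(e), global_stump), e)

    return (accuracy(lambda e: predict(global_stump, e), test),
            accuracy(cluster_predict, test))


def make_entity(path: str, loc: float, churn: float, defective: bool) -> Entity:
    return Entity(path, {"loc": loc, "churn": churn}, defective)


# Made-up toy data: in "gui" defects follow size, in "db" they follow churn.
TRAIN = [
    make_entity("gui/A", 100, 50, False), make_entity("gui/B", 500, 5, True),
    make_entity("gui/C", 120, 40, False), make_entity("gui/D", 600, 10, True),
    make_entity("db/A", 550, 2, False), make_entity("db/B", 90, 60, True),
    make_entity("db/C", 650, 3, False), make_entity("db/D", 80, 70, True),
]
TEST = [
    make_entity("gui/E", 110, 45, False), make_entity("gui/F", 550, 8, True),
    make_entity("db/E", 600, 1, False), make_entity("db/F", 85, 65, True),
]

if __name__ == "__main__":
    print(evaluate(TRAIN, TEST))  # (global accuracy, per-cluster accuracy)
```

On this invented data the single model picks a churn rule and misclassifies one GUI entity, while the per-duty models pick a size rule for `gui` and a churn rule for `db`; this only illustrates the hypothesis and says nothing about real projects.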