Defect prediction is an active research area. A lot of work has been done in this field, including in recent months. But even though there are many approaches and techniques to predict defects and bugs accurately, many of the static approaches suffer from a fundamental flaw.
Most static approaches rely solely on source code metrics (such as McCabe's cyclomatic complexity). The assumption is that the more complex a piece of source code is, the more difficulties the developer will encounter when changing that source code entity. And indeed, there are many static prediction approaches that work with high accuracy. But for how long?
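To make the metric assumption concrete, here is a minimal sketch of a rough McCabe-style measure for Python code. This is my own illustration, not any particular tool: it simply counts decision points on top of a base path, whereas real metric tools define the decision set more carefully.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Rough McCabe estimate: 1 plus the number of decision points.
    Assumption: if/for/while statements, exception handlers, and
    boolean operators each count as one decision; real tools are
    more precise about this."""
    tree = ast.parse(source)
    decision_nodes = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)
    return 1 + sum(isinstance(node, decision_nodes) for node in ast.walk(tree))

snippet = """
def classify(x):
    if x < 0:
        return "negative"
    if x == 0:
        return "zero"
    return "positive"
"""
print(cyclomatic_complexity(snippet))  # 3: two ifs plus the base path
```

A static predictor built on such a metric ranks entities once and, as long as the metric values do not move, keeps producing the same ranking.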
Let’s assume that we have a very accurate prediction model. Now let’s use it. We determine the top 20% of source code entities with the highest defect likelihood. After intense testing, debugging, and fixing, we are quite sure that there are no defects left. Now we ask the prediction model again for the new top 20%. And here it happens: the prediction model will most likely return the same set of source code entities.
The reason for this is that fixing a defect will not significantly decrease the source code complexity. Even worse, many fixes will increase the complexity. What now?
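The feedback loop above can be sketched in a few lines. The entity names and metric values here are made up for illustration: because the fix leaves the complexity metric unchanged, or even slightly higher, the ranking, and therefore the predicted top 20%, stays exactly the same.

```python
def top_20_percent(complexity: dict) -> set:
    """Rank entities by a static metric (e.g. McCabe) and return the top 20%."""
    ranked = sorted(complexity, key=complexity.get, reverse=True)
    return set(ranked[:max(1, len(ranked) // 5)])

# Hypothetical cyclomatic complexities before the test-and-fix cycle.
before = {"Parser": 48, "Cache": 35, "Config": 12, "Logger": 9, "Utils": 7}
flagged = top_20_percent(before)  # {"Parser"}

# After fixing the defects in Parser: the fix adds a guard clause,
# so the complexity metric goes slightly *up*, not down.
after = dict(before, Parser=50)

print(top_20_percent(after) == flagged)  # True: the same entity is flagged again
```

Nothing in the model's inputs reflects the fact that the flagged entities were just tested and fixed, so asking the model again cannot yield new information.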
This is one of the problems I’m currently working on. The solution is not simple, but we are working on it.