Project 2: Comparing Coverage
In this project, you write a tool that collects and compares the
coverage of multiple program runs.
Your Task
Obtain and compare coverage information to detect anomalies.
Proceed in three steps:
Step 1: Obtain coverage.
To obtain coverage, you must know which statements have been
executed. The Python function sys.settrace(func) sets the
global trace function to func; it is invoked
whenever a new function is entered. func can return a
local trace function, which is then invoked for every line
of the current function. The Python documentation gives a full
description of sys.settrace(); you may also wish to look up internal types such as code and frame objects.
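As an illustration, here is a minimal sketch of such a pair of trace functions; the names global_trace and local_trace and the in-memory coverage set are illustrative choices, not a prescribed design:
import sys

coverage = set()    # (filename, line number) pairs executed so far

def local_trace(frame, event, arg):
    """Record every line executed in the current function."""
    if event == "line":
        coverage.add((frame.f_code.co_filename, frame.f_lineno))
    return local_trace

def global_trace(frame, event, arg):
    """Invoked whenever a new function is entered."""
    if event == "call":
        print("entering", frame.f_code.co_name)
        return local_trace    # also trace the individual lines of this function
    return None

sys.settrace(global_trace)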
- Start by extending an existing program (such as
XMLProc from Project 1)
with a tracing function. Have your tracing function first
report the called functions; then extend it to report the
covered lines. See the lecture slides on comparing
coverage for an example.
- Store the coverage thus obtained in memory and write it
to a coverage file when the program exits. (Hint: use the atexit
module to define a function to be called at program exit.)
- Generalize your approach to form a separate module which
only needs to be imported and activated to provide coverage
for the current program (see the sketch after this list); for instance, as
import coverage
...
if __name__ == "__main__":
coverage.start()
# remainder of execution
...
- Generalize further: Make a stand-alone program (say,
coverage.py) that takes another Python program
p as argument (as well as p's arguments).
Your tool should invoke p and determine its coverage.
Here's a hint on how to invoke p from your program.
The following piece of code sets up the arguments for
p; its file name is stored in args[0], and
args[1]... contain p's arguments.
import sys, os
import __main__

sys.argv = args                                  # p sees its own arguments
sys.path[0] = os.path.dirname(sys.argv[0])       # let p find modules next to it
execfile(sys.argv[0], __main__.__dict__)         # Python 2; in Python 3, use
                                                 # exec(open(sys.argv[0]).read(), __main__.__dict__)
- Be sure to provide docstrings
for every function, describing its purpose. Provide a README file
(or other appropriate help) that describes how to use the coverage
module and/or how to invoke the coverage tool.
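To illustrate the items above on storing coverage and packaging it as a module, here is a minimal sketch of such a coverage module; the file name coverage.out and the file format are assumptions, not requirements:
import atexit
import sys

_coverage = set()    # executed (filename, line number) pairs

def _trace(frame, event, arg):
    if event == "line":
        _coverage.add((frame.f_code.co_filename, frame.f_lineno))
    return _trace

def _save(filename="coverage.out"):
    """Write the collected coverage, one 'filename:line' entry per row."""
    with open(filename, "w") as out:
        for source, line in sorted(_coverage):
            out.write("%s:%d\n" % (source, line))

def start():
    """Activate coverage collection for the rest of the program run."""
    sys.settrace(_trace)
    atexit.register(_save)    # write the coverage file at program exit
With such a module in place, the import/start pattern shown above suffices to obtain a coverage file for a program.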
Step 2: Compare coverage.
Collect the coverage information of multiple test runs.
- Present the following information as a plain text file, giving the
percentages as numbers next to each source line:
- The percentage of failing test cases among the test cases
that executed that line
- The percentage of all test cases
that executed that line
- Present your information as an HTML file, using hue and
brightness to highlight individual lines, in the style of the
Tarantula tool:
- bright(s) = max(%passed(s), %failed(s))
- hue(s) = red hue + %passed(s) / (%passed(s) + %failed(s)) × hue range
(Hint: use HTML tags <font
color="#rrggbb">...</font> to set the
color of a piece of text.) A sketch of the color computation follows after this list.
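Here is one way the formulas above might translate into an HTML color; it assumes %passed(s) and %failed(s) are given as fractions between 0 and 1, that the red hue is 0, and that the hue range extends to green (1/3 in colorsys terms). The function name is illustrative only:
import colorsys

def tarantula_color(passed, failed):
    """Map %passed(s) and %failed(s) (fractions in 0..1) to an 'rrggbb' string."""
    if passed + failed == 0:
        return "ffffff"    # line executed by no test: leave it white
    brightness = max(passed, failed)
    hue = 0.0 + passed / (passed + failed) * (1.0 / 3.0)    # red hue = 0, hue range = 1/3
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, brightness)
    return "%02x%02x%02x" % (int(r * 255), int(g * 255), int(b * 255))

print('<font color="#%s">some line</font>' % tarantula_color(0.1, 0.9))    # mostly failing: reddish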
Step 3 (optional): Implement an advanced method.
Implement one of the following extensions to coverage comparison:
Nearest Neighbor
Rather than comparing against a combination
of all passing runs, it may be wiser to compare only against one
passing run, the so-called "nearest neighbor". Extend your tool
such that it picks the passing run with the most similar coverage
and compares only against this run.
For details, see
Renieris and Reiss,
Fault Localization with Nearest Neighbor Queries (ASE 2002).
Note: You do not need to implement the exact method of Renieris and
Reiss; it suffices if your tool picks a "nearest neighbor". Which
measure of similarity do you find most effective?
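One straightforward measure, assuming each run's coverage is represented as a set of (filename, line) pairs, is the size of the symmetric difference; this is just one plausible choice, not the measure from the paper:
def nearest_neighbor(failing, passing_runs):
    """Return the passing coverage most similar to the failing coverage.

    All coverages are sets of (filename, line) pairs; the smaller the
    symmetric difference, the more similar two coverages are.
    """
    return min(passing_runs, key=lambda passing: len(passing ^ failing))
Lines in failing - nearest_neighbor(failing, passing_runs) are then the prime suspects.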
Call sequences
Some failures arise only through a particular sequence
of locations. Extend your tool
such that it determines which sequences of locations (either
statements or functions) occur only in the failing run.
For details, see
Dallmeier et al.,
Lightweight Defect Localization for Java (ECOOP 2005).
Note: You do not need to implement the exact method of Dallmeier et
al.; it suffices if your tool compares sequences of locations. Which
kind of location and which length do you find most effective?
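As a starting point, one might extract all location sequences of a fixed length k with a sliding window over the execution trace; the window length k is an assumption to experiment with:
def sequences(trace, k=3):
    """Return all k-long windows of a trace.

    trace is the list of locations (statements or functions) in the
    order in which they were executed.
    """
    return set(tuple(trace[i:i + k]) for i in range(len(trace) - k + 1))
Sequences that occur in sequences(failing_trace) but in none of the passing traces are then the candidates to report.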
Test cases
Demonstrate your techniques on two programs:
- The middle program and its
test runs as described in the book.
You do not need to create unit tests for this example; it
suffices to contrast the failing run with the passing runs.
- The XMLProc parser from Project 1.
The XMLdata archive contains a number of
passing and failing test inputs. For each of the three failing
test inputs, demonstrate how its coverage differs from that of the
passing test inputs.