Wednesday 15 January 2014

Using a Relational Database for Functional Coverage Collection Part Six

Test Set Profiling and Optimisation

Test Set Profiling

Test set profiling is the analysis of the coverage data to determine which classes of tests hit which points/buckets. The first, naive, method of increasing coverage is simply to run more of the tests that are already hitting until the cumulative coverage ceases to improve. Profiling also gives feedback on whether each test class is hitting the type of coverage it was expected to hit.
Some of this feedback can be gathered interactively within the web GUI, e.g. via the bucket coverage details. The next level would be analysis of partially hit buckets to suggest which tests to run more of - most likely tests similar to the ones that have already hit those buckets.
Unhit buckets are harder: tests which hit neighbouring buckets may not be able to reach a particular bucket, or the bucket may be impossible to hit (in which case it should be marked as such). But simply running more of the tests which have hit other buckets in the coverpoint, in 'soak' style, may eventually yield some hits with little engineering effort. Having the optimization mechanism described below makes it easy to discard the soaks which did not contribute new coverage. It must be noted here how vitally important it is to have test generators that do not routinely produce erroneous stimulus causing tests to fail; otherwise the soak will produce errors that have to be triaged and debugged, and any coverage gained is no longer free of engineering effort.
Knowing which tests hit which buckets is also useful for locating tests that exercise a particular behaviour, e.g. for waveform viewing.
No tool is provided in the example code for profiling beyond the interactive bucket coverage. However, an interesting visualisation might be a 'heat map', particularly useful in the matrix view, where individual buckets could be coloured and shaded according to which type of test hit them, either individually or in combination.
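
As a rough illustration, the raw data for such a profile could be pulled with a query along the lines of the sketch below. The database file name and the table/column names (hits, run, bucket_id, log_id, test) are assumptions made for the sketch and will not match the actual schema.

  import collections, sqlite3

  # sketch only : the database file and the table/column names are assumed
  db = sqlite3.connect('cvg.db')
  profile = collections.defaultdict(collections.Counter)
  for bucket, test, hits in db.execute(
      'SELECT hits.bucket_id, run.test, SUM(hits.hits) '
      'FROM hits JOIN run USING (log_id) '
      'GROUP BY hits.bucket_id, run.test'):
    profile[bucket][test] += hits

  # e.g. shade each bucket by the test class contributing the most hits
  for bucket, tests in sorted(profile.items()):
    dominant, count = tests.most_common(1)[0]
    print(bucket, dominant, count)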

Test Set Optimisation

If we're going to run a set of tests again as a regression, we may wish to optimize the test set to hit the highest coverage possible with the smallest number of tests, in order to reduce compute requirements - or at least to run that subset first, in the hope of discovering issues as quickly as possible.

An example sequence of commands is given in coverage.readme; note that the regression ids will be different if you repeat the sequence. First a baseline set of tests is run that does not close coverage. We then run some more and optimize the combined coverage. It's still not 100%, so we run even more and optimize again. This time we're at 100% and are given a list of tests that in combination hit 100%. Finally we rerun with just this subset to check that it really does reach 100% coverage.

  cd test
  # execute base line regression
  ./coverage test_coverage_multiple.py +cvr_seed+0xdead \
    +tst_seed+1 +master=1 +instances=20 +children=12
  # add some more tests with different seed
  ./coverage test_coverage_multiple.py +cvr_seed+0xdead \
    +tst_seed+2 +master=1 +instances=20 +children=14
  # see where that got us
  ./optimize -r 1484 -r 1497
  # and a few more to hopefully close coverage
  ./coverage test_coverage_multiple.py +cvr_seed+0xdead \
    +tst_seed+3 +master=1 +instances=20 +children=18
  # cherry pick 'best' runs using default order
  ./optimize -r 1484 -r 1497 -r 1515
  # replay the selected seeds to check we're still at 100%
  ./coverage test_coverage_multiple.py +cvr_seed+0xdead \
    +master=1 +instances=20 +test_xml=optimize_1484.xml
  # rerun using multiple iterations, the second and third 
  # random orders may reduce the set size slightly.
  ./optimize -r 1484 -r 1497 -r 1515 \
    --order cvg \
    --order rand \
    --order rand

The problem of finding the minimum number of tests that together provide 100% coverage is a variant of the Set Cover Problem, which is NP-hard. This means no cunning shortcut to the exact answer is known; the computation required grows exponentially as the number of tests increases. For a large number of tests with significant coverage this is not practical, as we'd have to wait too long for the output.
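
To make the scaling concrete, here is a toy illustration (not part of the example code) of the underlying Set Cover problem: the exact answer is only found by trying combinations of tests, and the number of combinations explodes as tests are added.

  import itertools

  # toy Set Cover instance : which smallest group of tests hits every bucket?
  buckets = {0, 1, 2, 3, 4}
  tests = {'a': {0, 1}, 'b': {1, 2, 3}, 'c': {3, 4}, 'd': {0, 4}}

  def exact_minimum(tests, buckets):
    # try every combination of tests, smallest first - the number of
    # combinations grows exponentially with the number of tests
    for size in range(1, len(tests) + 1):
      for combo in itertools.combinations(tests, size):
        if set().union(*(tests[t] for t in combo)) >= buckets:
          return combo
    return None

  print(exact_minimum(tests, buckets))  # ('b', 'd')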

Fortunately we don't need the absolute smallest optimized set; something approaching it will do. The method currently used to create the optimized test set is very simple, although there are several variants. The core of each is to add tests one by one, accumulating coverage into a table each time. When the coverage reaches 100% we stop. We then have a list of tests and how much coverage each contributed, and we can omit those which added no coverage, reducing the list to a small set of tests that together hit 100%. This may not be the true minimum, but it will normally be fewer than the tests we started with (sometimes significantly fewer). Even if the coverage is not 100% we can still eliminate the tests that do not contribute to the overall coverage.
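
In outline the accumulate-and-discard loop looks something like the following sketch. This is an illustration rather than the actual code in database.py, and the data structures - a dict of per-bucket goals and a per-test dict of bucket hits - are assumptions.

  def accumulate(tests, goals):
    # tests : [(name, {bucket : hits}), ...] in the chosen introduction order
    # goals : {bucket : hits required}
    covered = dict.fromkeys(goals, 0)   # cumulative hits per bucket
    keep = []
    for name, hits in tests:
      before = sum(min(covered[b], goals[b]) for b in goals)
      for bucket, count in hits.items():
        if bucket in covered:
          covered[bucket] += count
      after = sum(min(covered[b], goals[b]) for b in goals)
      if after > before:
        keep.append(name)               # this test contributed new coverage
      if all(covered[b] >= goals[b] for b in goals):
        break                           # 100% reached, stop introducing tests
    return keep                         # non-contributing tests are omitted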

The variants introduce the tests in differing orders:

  • pos uses the order the tests are given in (the order of execution when a regression id is given)
  • rand uses a random order
  • cvg orders by absolute coverage
  • incr starts by adding the test with the highest absolute coverage, then recomputes each outstanding test's incremental coverage and adds the test with the highest value, repeating until done. It is probably not suitable for large test sets with significant coverage because of the computation required, and it may produce only a marginally better test set than cvg while taking far longer to complete.

The optimize command allows multiple passes, so something like --order cvg --order rand can sometimes deliver a minor improvement by changing the starting point on the second pass and eliminating a few more tests. The incr option is an implementation of the greedy algorithm for the set cover problem described in the literature: it adds the test with the highest incremental coverage (the most as-yet-uncovered buckets) each time. This requires recomputing the incremental coverage of every outstanding test at each iteration, which can be expensive; it gets quicker as the number of hit buckets increases and the number of outstanding tests decreases, but it is still computationally intensive. A hybrid approach that starts like cvg but switches to incr at a predefined point may still yield better results than cvg at a much lower compute cost. When the incr order is specified, use the --threshold argument to give the coverage percentage at which to switch from cvg to incr.

  # use "incr" ordering; but only switch from "cvg" at 95%
  ./optimize -r 1484 -r 1497 -r 1515 --order incr --threshold 95
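
A sketch of what this hybrid selection might look like is given below; again this illustrates the algorithm rather than reproducing the code in database.py, and the helper and data structure names are invented.

  def greedy_incr(tests, goals, threshold=95.0):
    # tests : {name : {bucket : hits}}, goals : {bucket : hits required}
    covered = dict.fromkeys(goals, 0)
    chosen, remaining = [], dict(tests)

    def capped(b):
      return min(goals[b], covered[b])

    def percent():
      return 100.0 * sum(capped(b) for b in goals) / sum(goals.values())

    def gain(hits):
      # coverage a test would add on top of the current cumulative state
      return sum(min(goals[b], covered[b] + h) - capped(b)
                 for b, h in hits.items() if b in goals)

    while remaining and percent() < 100.0:
      if percent() < threshold:
        # 'cvg' phase : pick by absolute coverage, cheap to compute
        name = max(remaining, key=lambda t: sum(remaining[t].values()))
      else:
        # 'incr' phase : recompute every outstanding test's incremental gain
        name = max(remaining, key=lambda t: gain(remaining[t]))
        if gain(remaining[name]) == 0:
          break                         # nothing left can add coverage
      for bucket, count in remaining.pop(name).items():
        if bucket in covered:
          covered[bucket] += count
      chosen.append(name)
    return chosen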

Creating a Robust Test Set

We can attempt to offset subsequent functional changes, bug fixes, system instabilities that lose tests, or other factors perturbing the coverage by over-provisioning each bucket beyond its goal. This would make the test set robust to losing a previously contributing test, at the expense of making the optimization process more involved and slower than it currently is, but could be desirable in some circumstances. Possible mechanisms include:

  • Attempt to hit double the goal. As any single test can only contribute hits up to the goal, at least two tests will be required per bucket. The obvious pathological case is where each test contributes only one hit - in this case we effectively double the number of tests required.
  • If we were to store, with each bucket, the maximum number of hits from any of the contributing tests, and then attempt to hit the goal plus this maximum, we would be immune to the loss of any one of those tests (see the sketch below).
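
A sketch of the second mechanism, assuming the same per-test hit data as the earlier sketches, might look like the following. For simplicity the maximum is taken over all candidate tests here, whereas the example code scopes it out from the contributing tests of a first pass.

  def robust_goals(tests, goals):
    # tests : {name : {bucket : hits}}, goals : {bucket : hits required}
    # raise each bucket's target by the largest number of hits any single
    # test supplies, so losing any one test still leaves the goal met
    max_hits = dict.fromkeys(goals, 0)
    for hits in tests.values():
      for bucket, count in hits.items():
        if bucket in max_hits and count > max_hits[bucket]:
          max_hits[bucket] = count
    return {bucket: goals[bucket] + max_hits[bucket] for bucket in goals}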

This functionality is now available in the example; use the --robust switch to enable it. It is of the second type and works best with two passes, where the first pass uses a simple algorithm to scope out the maximum hits from any one test per bucket. Use something like

  # build a coverage optimized set robust to losing a test
  ./optimize -r 36,595 \
    --order rand \
    --order incr \
    --robust \
    --threshold 99

It has been observed that adding several rand orders at the end can crunch the number of tests down by a few with little compute overhead.

The code can be found in the optimize.py wrapper around the classes at the end of the database.py module. All the optimization variants inherit from the optimize base class, changing only the order in which tests are introduced.
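
Purely as an illustration of that structure (the class and method names here are invented; the real ones are in database.py), the pattern is along these lines:

  import random

  class optimize(object):
    'base class : runs the accumulate-and-discard loop over self.order(tests)'
    def order(self, tests):
      return list(tests)                # 'pos' : the order the tests were given

  class optimize_rand(optimize):
    def order(self, tests):
      tests = list(tests)
      random.shuffle(tests)             # 'rand' : a random introduction order
      return tests

  class optimize_cvg(optimize):
    def order(self, tests):
      # 'cvg' : tests with the highest absolute coverage first
      return sorted(tests, key=lambda t: -sum(t[1].values()))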

