Thursday 30 May 2013

Test Bench Control Part One

Test benches and other infrastructure should be architected and implemented in the same way as any 'real' piece of IP. By this I mean a specification should be written, from which an architecture can be created, and these documents and the resulting code reviewed in very much the same way as for 'real' design or production code. We should apply as much ingenuity to the testbench and infrastructure as we would to the IP itself. IP has been viewed as a reusable commodity for many years and we should take the same stance with its attendant infrastructure. After all, if we wish to service the IP itself (maintenance or adding new functionality) we want to be able to reuse as much of the infrastructure as possible, in much the same way we want to reuse as much of the IP as possible.

We should view the IP as both the design code and the testbench code (it seems that this is not such a novel idea - see here); it is just that only the design code part actually makes it into transistors.

I have never seen this done. (I would be pleased to hear from persons who have).

Infrastructure always seems to be a morass of unarchitected, unstructured and unloved code that is littered with structures unused and forgotten. It is treated as an afterthought, a poor relation, as it is only the design code which is deemed significant.

There are a number of opportunities here for those with vision.

Verification Co-design


At the point of conception there are a myriad of possible outcomes. Some of these will offer easier paths for the verification process than others. There is scope here to "Architect for Verification", i.e. to introduce features that will ease the task of verification. These additional features are subject to all the other existing 'normal' constraints, e.g. functionality, area/resource, timing, complexity and so on. But by evaluating a potential architecture alongside a potential verification strategy there is an opportunity to optimize the whole - to architect something that lends itself to being verified by a particular strategy. The goal is to reduce the overall effort of design and verification, or at least to reduce the risk of overrun, or both.

The simplest of these features may be the removal of read-only registers or reads that have side effects. Removing undefined state or undefined behaviour makes pseudo-random test generation easier and quicker - extra constraints slow test generation and can prevent desirable sequences from appearing. This does not preclude the end-user documentation having 'set as zero', 'undefined' or 'reserved' fields and behaviour - it just makes more sense when creating internal models that need to converge. More advanced is the introduction of features that are specific to verification, enabling verification tests to be run across all platforms, from HDL simulation through emulation and FPGA to the real silicon itself. This could take the form of registers containing useful state or some other behaviour/functionality.
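
As a loose illustration of the point about constraints - a Python sketch using a made-up register layout rather than a real constraint solver - compare drawing a fully defined 32-bit write with one that must steer clear of a reserved field:

    import random

    # Hypothetical 32-bit control register: bits [7:4] are 'reserved, write as zero'.
    RESERVED_MASK = 0x000000F0

    def random_write_fully_defined():
        # Every 32-bit value is legal - a single draw always succeeds.
        return random.getrandbits(32)

    def random_write_with_reserved(max_tries=1000):
        # Reject candidates that touch the reserved field. Each constraint
        # like this throws draws away, so generation is slower and some
        # otherwise interesting sequences may never appear within the budget.
        for _ in range(max_tries):
            candidate = random.getrandbits(32)
            if candidate & RESERVED_MASK == 0:
                return candidate
        raise RuntimeError("no legal value found within retry budget")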

Historically such an approach may have been vetoed because of increased gate count, but microprocessors have shipped with debug features for decades even though those features are exercised on only a vanishingly small number of parts (and today are most likely inhibited from working by a fuse). DFT structures are used just once. Both use silicon real estate and nobody questions their usefulness, so why not add features to expedite verification? The cost in square microns can be small in comparison to the effort of debugging silicon and the corresponding re-spins, which can take months and millions of dollars - even when those square microns are multiplied across millions and millions of produced devices.

It is the reduction in risk that is the compelling reason to adopt Verification Co-design, investing some extra time up front to reduce the risk of project overrun or debugging silicon - which can end up costing significantly more.

It is therefore vitally important to include verification input in the architecture process to allow such optimizations to be made.

Reusable Design, Reusable Test Bench


Taking a similar approach to creating a reusable design will improve the longevity of the test bench, making it easier to run legacy tests as well as to add new ones (old tests should be seen as just as valuable as new ones). Build continuous integration in from early on. Adopt a test-driven-development style of flow: you cannot write any design code until you have a test bench, a test framework and a test to stimulate it with. It is simply not possible to create any of this infrastructure efficiently until you have evaluated and defined the requirements.

I would also argue that the first step should be to code an engineering mock-up: a throw-away first pass that allows some aspects of the problem to be explored. It is important that this effort is limited by time and scope, so that only some aspects are targeted (preferably the hard or unknown parts) within a given time box. Valuable lessons will be learned and insights gained even though the resulting code may be thrown away entirely.

It is only by properly exploring the problem at hand that the breadth and depth of the requirements will be understood. I do not believe that many people can explore this fully in their minds alone and write a specification or architecture without some hands-on experience. It also allows an approach to be reviewed - you cannot do a code review when you have no code written. Whilst it would not be a proper review, it allows feedback at a very early stage and beats the last-minute code review, where it is far too late to change anything. The worst thing is getting feedback or advice that would have made a problem much easier once the extra effort has already been made and it is too late to act on it.


Improved Test Setup and Error Reporting


Core to reusability, it is vitally important that:

  • Tests should be easy to run.
  • Test results should be easy to interpret.
  • Test results should be deterministic.
Additionally, it is important that:
  • Tests should execute fast.
  • Test results should be reported incrementally and as quickly as possible.
  • Test results can be automatically grouped by failure mode ('triage') - see the sketch below.
  • A subset of tests can easily be run.
  • New tests can be written quickly and easily.
With the term 'test' here I implicitly include the execution of any framework and any compilation (of the testbench or of a test).
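
To make the 'triage' bullet concrete, here is a minimal sketch - with hypothetical test names and error messages - of grouping failures by a normalised failure signature:

    import re
    from collections import defaultdict

    # Hypothetical results: (test name, first error message, or None if it passed).
    results = [
        ("smoke_basic",  None),
        ("dma_burst_17", "ERROR: scoreboard mismatch at address 0x1f00"),
        ("dma_burst_23", "ERROR: scoreboard mismatch at address 0x2b80"),
        ("irq_storm_3",  "ERROR: timeout waiting for interrupt ack after 5000 cycles"),
    ]

    def failure_signature(message):
        # Collapse run-specific detail (hex values, decimals) so failures
        # with the same root cause land in the same bucket.
        return re.sub(r"0x[0-9a-fA-F]+|\d+", "N", message)

    triage = defaultdict(list)
    for test, error in results:
        if error is not None:
            triage[failure_signature(error)].append(test)

    for signature, tests in sorted(triage.items()):
        print(f"{len(tests)} x {signature}: {', '.join(tests)}")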

The motivation is increased engineer productivity. It is vital that at any time an individual can assess the quality of their work and get the results quickly, clearly and reliably. From running a single test to a full regression suite, it must be quick and easy to run a test. For both a design engineer changing the design code and a verification engineer writing a new test, it is highly advantageous to have a performant test bench. More performance yields more iterations and less dead time waiting for tests and regressions to complete.

It is subsequently important that this can be repeated automatically by a continuous integration server at each commit and every night. Ultimately it is also desirable that the final sign-off run completes using as little CPU resource as possible so that its results are available in the shortest elapsed time; this in turn frees resources for other purposes - be they more tests or back-end work - so they finish quicker too.

Making tests easy to run could involve the use of a single command that compiles the test bench and design code and then runs a number of tests as specified by the command.
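
One possible shape for such a command, sketched in Python with placeholder 'make' targets standing in for the simulator-specific compile and run steps:

    #!/usr/bin/env python3
    # Minimal sketch of a single-command test runner. The 'make' targets are
    # placeholders for whatever compiles the test bench and invokes the simulator.
    import argparse
    import subprocess
    import sys

    def compile_everything():
        subprocess.run(["make", "compile"], check=True)

    def run_test(name):
        result = subprocess.run(["make", "run", f"TEST={name}"])
        return result.returncode == 0

    def main():
        parser = argparse.ArgumentParser(description="compile then run the named tests")
        parser.add_argument("tests", nargs="+", help="test names to run")
        args = parser.parse_args()

        compile_everything()
        failures = [t for t in args.tests if not run_test(t)]
        for t in failures:
            print(f"FAIL {t}")
        sys.exit(1 if failures else 0)

    if __name__ == "__main__":
        main()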

Much existing pass/fail reporting that I have seen seems to centre around grep -i error. There is plenty of scope for losing tests when no log files are created, for missing error messages (combined stdout/stderr, inserted newlines), or for interpreting non-errors as real errors - followed by head-scratching as to what actually happened. This can be most costly at the end of a project when running large regression sets, as it can be impossible to get hundreds or thousands of tests to run reliably. It is often possible to paper over spurious errors by rerunning tests, but pathological cases (e.g. a real, serious regression) will just eat resources instead. Lost tests and false negatives are bad - false positives are even worse.

We want to be able to examine test results as they finish, and not have to wait for a final summarising phase at the end of the run, because we want to be able to act on the results immediately, e.g. if there is a catastrophic error. I believe we also want to be able to see the entire team's test results, so we don't have to disturb an engineer to ask what the latest results are. These requirements make a web browser an excellent choice for viewing the results, and enable us to make the reports more interactive too.

Running a test case will usually mean invoking a script. This script will do the following (a condensed sketch follows the list):
  1. Preprocessing. Compiling, assembling, file munging. Hex dumping to generate files suitable for e.g. $readmemh or equivalent. Preparation of arguments, e.g. +plusarg=value.
  2. Simulation invocation. Most likely in a subshell of the test case script. The hex files may be unpacked by test bench Verilog. The test bench may also attempt to parse the +plusarg=value arguments. Many test benches will use a unified logger - it may be in a C++ foreign model or similar, but might be in Verilog or VHDL. It will provide methods to emit INFO, WARNING, ERROR, etc. and keep a tally which is displayed at the end. The standard output of this process will be captured for postprocessing, and e.g. memory may be dumped into hex files.
  3. Postprocessing. Probably a continuation of the same script as stage 1, but possibly not. The captured standard output/error of the simulation will be parsed, hex files examined, results munged and PASS/FAIL printed.
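
A condensed sketch of those three stages as a single Python script - the file names, plusargs and simulator command line are all placeholders:

    #!/usr/bin/env python3
    # Sketch of the three stages as one script. File names, plusargs and the
    # simulator command line are placeholders, not a real tool invocation.
    import re
    import subprocess

    def preprocess():
        # Stage 1: build a hex image for the test bench to load, prepare plusargs.
        subprocess.run(["make", "program.hex"], check=True)
        return ["+image=program.hex", "+timeout=100000"]

    def simulate(plusargs):
        # Stage 2: invoke the simulator in a child process and capture its output.
        cmd = ["simulator", "tb_top"] + plusargs
        proc = subprocess.run(cmd, capture_output=True, text=True)
        return proc.stdout + proc.stderr

    def postprocess(log):
        # Stage 3: parse the captured output for the logger's tally, print PASS/FAIL.
        errors = len(re.findall(r"\bERROR\b", log))
        print("PASS" if errors == 0 else f"FAIL ({errors} errors)")
        return errors == 0

    if __name__ == "__main__":
        postprocess(simulate(preprocess()))
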
There are advantages to integrating all three of these into one:
  1. Reuse of that logging API. Warnings and errors in preprocessing scripts will be treated consistently with those in the simulation, instead of different things happening (or, in the absolute worst case, slightly different things). No regular expression parsing of the textual output is required to determine pass/fail status, and there is less scope to lose a test as the number of processes and log files is reduced. A minimal sketch of such a logger follows this list.
  2. Removal of script-type work from the test bench - argument parsing, for example. Use the right language for the problem: argument parsing is best done in a scripting language.
  3. Simplification - stop munging data into an intermediate format so that it can be easily processed by the simulation language. The scripting language now has the ability to poke about in the simulation data structures, so it can put the required values straight into the required registers instead of going via plusargs or hex dumps or both.
  4. Recompile less often. By gluing APIs together in a scripting language, a code change will not always necessitate a recompile if only interpreted code is changed.
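
A minimal sketch of such a unified logger, with illustrative names: the same calls can be made from preprocessing scripts, from code embedded in the simulation and from postprocessing, and the pass/fail decision comes from the tally rather than from grepping log files.

    from collections import Counter

    class Log:
        LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "FATAL")

        def __init__(self):
            self.tally = Counter()

        def log(self, level, message):
            # Count every message by severity and echo it to stdout.
            self.tally[level] += 1
            print(f"{level:8s} {message}")

        def info(self, message):    self.log("INFO", message)
        def warning(self, message): self.log("WARNING", message)
        def error(self, message):   self.log("ERROR", message)

        def passed(self):
            return self.tally["ERROR"] == 0 and self.tally["FATAL"] == 0

        def summary(self):
            counts = ", ".join(f"{level}: {self.tally[level]}" for level in self.LEVELS)
            print(("PASS" if self.passed() else "FAIL") + f" ({counts})")

    # Every stage of a test shares one logger instance.
    log = Log()
    log.info("loaded program.hex")
    log.error("scoreboard mismatch at 0x1f00")
    log.summary()   # FAIL (DEBUG: 0, INFO: 1, WARNING: 0, ERROR: 1, FATAL: 0)
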
It has always been possible to do something like this in any of ncsim, vcs or msim, but to do so would mean using a simulator-specific HDL API (forcing, breakpointing etc.), which would increase vendor lock-in. Additionally, TCL may not be the best choice of scripting language, and the logging would still need to be bridged between the TCL and the HDL.

I want to explore how this might be done: to demonstrate how to integrate Python into a Verilog simulator, and to show how this approach can be used across multiple simulators (including Verilator) to achieve consistent results with all of them. I then want to go on to show how the logging library's API can be used to dump results to a database, from which we can generate web-based reports, including regression results and regression triage.