Thursday 30 January 2014

Using a Relational Database for Functional Coverage Collection Part Seven

Conclusion

These posts and the corresponding code have shown how to code, extract, view and analyze functional coverage using a relational database as a repository for that coverage. They have also demonstrated the use of some open source tools in silicon verification.
  • Verilator, Icarus Verilog
  • Python & libraries (sqlite3, bottle.py)
  • SQLite
  • jQuery & plug-ins
Yes, most of those aren't silicon design specific. In fact most are for the creation of the web based single page application for simulation data visualization. One of the biggest current challenges in silicon functional verification is managing the sheer volume of data produced and then meaningfully making sense of that data - not only for the individual engineer who has produced some of that data, but for the development team as a whole. One aspect of the web based approach is to make the data available and meaningful to the whole team, through a common interface - the web browser. Using a web interface is the best way to ensure that everyone has access to the data (but it's up to you to provide data in a format that is relevant to each class of visitor).
The advantages of this approach include
  • Cross simulator. We are not bound to any vendor specific infrastructure so we can move to any simulator without massive upheaval. It also allows us to use a cycle based Verilator approach in addition to a four state event driven simulator, or assertion results instead of functional coverage from a simulation.
  • Open format. There is nothing hidden. Anything can read or write to the database. There is no vendor lockout of any pathway.
  • Extensible. Due to the open nature we can add whatever we want on top.
  • Mining. With the history of all activity kept - all blocks and all users - we can mine that data to gain insights into how the project is progressing. We can examine the trends of error rates and coverage to determine if something is likely to hit its metrics on time or not.
  • Custom reports. As the format is open we can mine all the data to produce block and project specific reports, textually or graphically.
  • Centralised database gives access for all.
    • Dissemination. Information available for all to see across all blocks, all engineers and all managers.
    • Dashboard block-wise & project-wise. This can be customized to the level of detail that any viewer may wish to see. A manager may want less detail than the front end or verification engineer, the CEO would want even less.

Unified Coverage Interoperability Standard (UCIS)

UCIS is an open, cross format coverage API from Accellera. Tools that support it can export their coverage, e.g. from your Verilog simulator into a custom application written by you or somebody else. As the data format is no longer proprietary and there is a standard, documented API to use, you have full access to your generated coverage data. This would allow you to import coverage data generated by such a tool into a database similar to the one presented here.
However, so far it appears the EDA companies' databases remain read only, so we cannot import third party results into them. There remains a requirement for an aggregating application (such as this example) until these databases are opened up for writing.

Wednesday 15 January 2014

Using a Relational Database for Functional Coverage Collection Part Six

Test Set Profiling and Optimisation

Test Set Profiling

This is the analysis of the coverage data to determine which classes of tests hit which points/buckets. The first naive method of trying to increase coverage is to simply run more of these tests until the cumulative coverage ceases to improve. It also gives feedback on whether the expected test class hits the type of coverage expected.
Some of this feedback can be done interactively within the web GUI using e.g. bucket coverage details. The next level would be analysis of partially hit buckets to suggest which tests to run more of - most likely similar to the ones that already hit the buckets.
Unhit buckets are harder: tests which hit neighbours may not be able to reach the particular bucket, or it may be impossible to hit the bucket (in which case it should be marked as such). But simply running more of the tests which have hit other buckets in the coverpoint in 'soak' style may eventually yield some hits with little engineering effort. Having the optimization mechanism described below makes it easy to discard the soaks which did not contribute to new coverage. It must be noted here how vitally important it is to have test generators that do not routinely produce erroneous stimulus causing tests to fail; in that case the soak will produce errors that have to be triaged and debugged, and so any coverage produced is not without engineering effort.
Knowing which tests hit which buckets can also be useful for locating tests that exercise a particular behaviour, e.g. for waveform viewing.
No tool is provided in the example code for profiling beyond the interactive bucket coverage. However an interesting visualisation might be a 'heat map', particularly useful in the matrix view. Individual buckets could be coloured and shaded according to which type of test hit them either in combination or individually.

Test Set Optimisation

If we're going to run a set of tests again as a regression we may wish to optimize the test set to hit the highest coverage possible with the smallest number of tests in order to reduce compute requirements. Or at least run that subset first hoping to discover issues as quickly as possible.

An example sequence of commands is given in coverage.readme; note that the regression ids given will be different if you repeat this sequence of commands. First a baseline set of tests is run that does not close coverage. We then run some more and optimize the combined coverage. It's still not 100% so we run even more and then optimize again. This time we're at 100% and are given a list of tests that in combination hit 100%. Finally we rerun with just this subset to check that it really does reach 100% coverage.

  cd test
  # execute base line regression
  ./coverage test_coverage_multiple.py +cvr_seed+0xdead \
    +tst_seed+1 +master=1 +instances=20 +children=12
  # add some more tests with different seed
  ./coverage test_coverage_multiple.py +cvr_seed+0xdead \
    +tst_seed+2 +master=1 +instances=20 +children=14
  # see where that got us
  ./optimize -r 1484 -r 1497
  # and a few more to hopefully close coverage
  ./coverage test_coverage_multiple.py +cvr_seed+0xdead \
    +tst_seed+3 +master=1 +instances=20 +children=18
  # cherry pick 'best' runs using default order
  ./optimize -r 1484 -r 1497 -r 1515
  # replay the selected seeds to check we're still at 100%
  ./coverage test_coverage_multiple.py +cvr_seed+0xdead \
    +master=1 +instances=20 +test_xml=optimize_1484.xml
  # rerun using multiple iterations, the second and third 
  # random orders may reduce the set size slightly.
  ./optimize -r 1484 -r 1497 -r 1515 \
    --order cvg \
    --order rand \
    --order rand

The problem of finding the minimum number of tests that together provide 100% coverage is a variant of the Set Cover Problem, which is NP hard. This means there is no known shortcut to the exact answer: the computation required grows exponentially as the dataset increases. For a large number of tests with significant coverage an exact solution is not practical, as we'd have to wait too long for the output.

Fortunately we don't need the absolute smallest optimized set; something approaching it will do. The method currently used to create the optimized test set is very simple, although there are several variants. The core of each is to add tests one by one, accumulating coverage into a table each time. When the coverage is 100% we stop. We then have a list of tests and how much coverage each contributed. We can omit those which did not add any coverage, reducing the list to a smaller set of tests that still hits 100%. This may not be the true minimum, but it will normally be fewer than the number of tests we started with (sometimes significantly fewer). Even if the coverage is not 100% we can still eliminate tests that do not contribute to the overall coverage.
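As a rough sketch of that core loop - simplified data structures are assumed here (each test is a name plus a bucket-to-hits mapping), and this is not the code in optimize.py :

  def select_contributors(tests, goals) :
    'accumulate tests in the given order, keeping only those that add coverage'
    cumulative = dict.fromkeys(goals, 0)  # saturating hit count per bucket
    selected   = []
    for name, hits in tests :
      added = False
      for bucket, count in hits.items() :
        # a test only contributes if the bucket is still short of its goal
        if count > 0 and cumulative[bucket] < goals[bucket] :
          added = True
        cumulative[bucket] = min(goals[bucket], cumulative[bucket] + count)
      if added :
        selected.append(name)
      if all(cumulative[b] >= goals[b] for b in goals) :
        break  # coverage closed, no need to look at the remaining tests
    return selected, cumulative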

The variants introduce the tests in differing orders :

  • pos uses the given order (order of execution in the first case if a regression id is given)
  • rand uses a random order
  • cvg orders by absolute coverage
  • incr starts by first adding the test with the highest absolute coverage, but then recomputes each outstanding test's incremental coverage and adds the test with the next highest. It is probably not suitable for large test sets with significant coverage due to computational requirements, and it may produce only a marginally better test set than cvg but take far longer to complete.

The optimize command allows multiple passes, so something like --order cvg --order rand can sometimes deliver a minor improvement by changing the starting point on the second pass and eliminating a few more tests. The incr option is an implementation of the greedy algorithm for the set cover problem described in the literature: it adds the test with the highest incremental coverage (the most uncovered buckets) each time. This requires recomputing the incremental coverage at each iteration, which can be expensive; it gets quicker as the number of hit buckets increases and the number of outstanding tests decreases, but it is still computationally intensive. A hybrid approach that starts like cvg but switches to incr at a predefined point may still yield better results than cvg at a much lower compute cost. Use the --threshold argument to specify a coverage percentage at which to switch from cvg to incr when the incr order is specified.

  # use "incr" ordering; but only switch from "cvg" at 95%
  ./optimize -r 1484 -r 1497 -r 1515 --order incr --threshold 95
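
For illustration only, a sketch of that cvg-then-incr hybrid using the same simplified structures as the earlier sketch (again, not the actual optimize.py classes) :

  def greedy_hybrid(tests, goals, threshold=95.0) :
    'order by absolute coverage, switching to greedy incremental selection at threshold'
    def pct(cumulative) :
      return 100.0 * sum(1 for b in goals if cumulative[b] >= goals[b]) / len(goals)
    def incremental(hits, cumulative) :
      # buckets this test would move towards their goal
      return sum(1 for b, c in hits.items() if c > 0 and cumulative[b] < goals[b])
    cumulative = dict.fromkeys(goals, 0)
    remaining  = sorted(tests, key=lambda t : len(t[1]), reverse=True)  # cvg order
    selected   = []
    while remaining and pct(cumulative) < 100.0 :
      if pct(cumulative) >= threshold :
        # incr : re-rank candidates by incremental contribution each iteration
        remaining.sort(key=lambda t : incremental(t[1], cumulative), reverse=True)
      name, hits = remaining.pop(0)
      if incremental(hits, cumulative) == 0 :
        continue  # adds nothing new, discard
      selected.append(name)
      for b, c in hits.items() :
        cumulative[b] = min(goals[b], cumulative[b] + c)
    return selected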

Creating a Robust Test Set

We can attempt to offset the loss of tests to subsequent functional changes, bug fixes, system instabilities or other factors perturbing the coverage by over-provisioning each bucket beyond its goal. This would make the test set robust to losing a previously contributing test, at the expense of making the optimization process more involved and slower than it currently is, but could be desirable in some circumstances. Possible mechanisms include :

  • Attempt to hit double the goal. As any single test can only contribute hits up to the goal, at least two tests will be required. The obvious pathological case is where each test only contributes one hit - we will effectively double the number of tests required in this case.
  • If we were to store the maximum number of hits from any of the contributing tests with each bucket and then attempt to hit the goal plus this maximum we would be immune to the loss of any one of the tests.
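A minimal sketch of that second mechanism, assuming we already have per-test hit counts for each bucket (hypothetical data shapes, not the example's internals) :

  def robust_goals(goals, per_test_hits) :
    'raise each goal by the largest single-test contribution to that bucket'
    robust = {}
    for bucket, goal in goals.items() :
      max_single = max([hits.get(bucket, 0) for hits in per_test_hits.values()] + [0])
      # losing the biggest contributor still leaves the original goal met
      robust[bucket] = goal + max_single
    return robust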

This functionality is now available in the example; use the --robust switch to enable it. It is of the second type and works best with two passes, where the first pass is a simple ordering used to scope out the maximum hits from any single test per bucket. Use something like

  # build a coverage optimized set robust to losing a test
  ./optimize -r 36,595 \
    --order rand \
    --order incr \
    --robust \
    --threshold 99

It has been observed that adding several rand orders at the end can crunch the number of tests down by a few with little compute overhead.

The code can be found in the optimize.py wrapper for the classes at the end of the database.py module. All the optimization variants inherit from the optimize base class, changing only the order of test introduction.


Saturday 28 December 2013

Using a Relational Database for Functional Coverage Collection Part Five

Web Based Coverage Data Visualisation

As I've stated previously the web browser is the new GUI, and has been for some time. So forget Tk, Qt or wxWidgets - in order to visualise the collected coverage data we'll need to display it in a web browser. We'll extend the existing web interface to annotate the test listing tables with coverage details, and so display coverage data alongside the log of the test that created it. If a test has no data there will be no extra coverage tab. Also added is an option to show whether each listed test has coverage, and if so whether it is goal or hit data (see the coloured coverage column and how the 'show coverage' checkbox is checked).

Test list table with coverage column shown

As the coverage data itself is potentially large we want to avoid recalculating it as much as possible to reduce load on the database server. Computing the cumulative coverage is potentially costly in CPU time and we do not want to continually recompute it as the user clicks through different coverage points. To reduce the number of times the cumulative coverage is requested we compute it all once, send it in its entirety to the browser as JSON, and then let the browser render the appropriate parts to HTML as the user clicks through the displayed page. It does mean that if the page is lost, by navigating away or losing the browser, the data is lost and will need to be recalculated when the page is reloaded; a more serious application might use something like memcached. This is one reason why it is useful to run the optimizer immediately at the end of a regression run with coverage: it can store the computed coverage result against the regression root, which requires far less effort to retrieve as it effectively caches the cumulative coverage.

The browser can also perform some post processing such as individual axis coverage (collating coverage by axis) and axis flattening (removing an axis from the coverage and recalculating the hits for the remaining axes). Modern browsers are quite capable of storing this much data and processing the workload as long as the client machine isn't ancient (in which case you should be using something more contemporary anyway). Unfortunately it does mean having to use JavaScript for this code (it is possible to use CoffeeScript, Dart or a toolchain targeting asm.js, but I've gone for bare metal in this example).

Whenever we display the log of an invocation we'll check to see if there's any coverage associated with it (via an Ajax call - the exact same Ajax call that created the coloured coverage columns in the browser image above). If there is, we'll wrap the log in another set of tabs and display the coverage alongside. We do seem to be generating a lot of tabs, but I think it's a fairly natural way to browse this kind of data and most users are comfortable with the idea due to the tabs' universal use within desktop browsers.

Coverage tab showing selected coverpoint and menu

The coverage tab is then divided into two panes: one on the left with a hierarchical view of the coverage points and one on the right into which the coverpoint coverage tables are rendered. Clicking on a coverpoint in the hierarchical view causes it to be rendered in the coverpoint table pane. A single click on an axis column header will select or deselect that column. Clicking and holding will pop up a menu containing several options (all with tooltips); those of note include :
  • Hide : The selected column(s) will be collapsed and their hits redistributed. Hiding all but one column (via the hide others option) gives individual axis coverage, and this can help when looking for systematic holes across multiple columns. For example, in a 3 axis coverpoint it may not be obvious that a particular combination of values of columns 1 and 2 is never hit when crossed with the third column; removing that third column will make it obvious.
  • Unhide all : Reset the table to its original state with all columns showing.
  • Hide Hits | Illegals | Don't Cares : Collapse rows that are hit, illegal or marked as don't care, thus reducing the table rows to the more interesting ones.
  • Matrix  : This option is greyed out unless only two columns are displayed. It converts the table so that one axis is displayed across the top and the other down the right hand side. This is useful for looking for patterns in coverage values.
When an axis is hidden, multiple buckets belonging to the enumerations of that axis are aggregated into new buckets that are combinations of the remaining axes. This screenshot shows which original bucket indices have been coalesced into the new bucket.

Buckets merged when axis hidden
Coverpoints with two axes may be viewed as a matrix. Goal is available as a label. Bucket coverage is also available for cumulative coverage.
Two axis coverpoint shown in matrix format

For regressions an extra tab containing the cumulative coverage will be produced: the coverage of each child test is summed and the total cumulative result presented in a pane. This will differ from the coverage in the regression log pane, which is the result of the post regression optimization and so contains the minimum of the goal and the total sum of all the hits, rather than the sum over all of the tests as the cumulative result does. Also available in the cumulative coverage tables is individual bucket coverage. A summary can be obtained by simply hovering over the hits cell; this currently defaults to showing just the ten biggest hits.

Showing individual bucket coverage popup from cumulative table

Clicking will convert the popup into a dialog box containing a table of all the hits with additional information such as test name. Clicking on a table row will open a new tab in the regression tab set containing the test log and individual coverage.

Bucket coverage dialog box

Interactive coverage analysis

To summarise - there are several features here to aid in the analysis of the recorded coverage.

  1. Axis hiding. We can reduce the number of axes the coverpoint has and recalculate the effective coverage across the remaining ones, with multiple buckets coalesced into a new one (a minimal sketch of this coalescing follows the list). This can make spotting a pattern of e.g. a missed combination easier.
  2. Matrix view. If there are only two axes the coverage data can be presented as a table with the two axes laid out horizontally and vertically. It is a good way to view any patterns that may be present in the data. You can of course take a three or more axis coverpoint and reduce it to two to produce a matrix.
  3. Individual bucket coverage. Cumulative coverage hits are decomposed into individual test hits, useful for seeing where the coverage has come from. Note that this still works for the matrix view and after one or more axes have been hidden. In the latter case the coverage shown is the sum over the individual buckets that have been coalesced to create the new bucket.
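
The example performs this coalescing in JavaScript in the browser; conceptually it is just a summation over the hidden axis, as in this Python sketch (bucket keys are assumed to be tuples of axis values, and goals are summed along with hits) :

  def hide_axis(buckets, axis_index) :
    'collapse one axis : buckets map axis-value tuples to (hits, goal) pairs'
    collapsed = {}
    for key, (hits, goal) in buckets.items() :
      new_key = key[:axis_index] + key[axis_index + 1:]  # drop the hidden axis value
      old_hits, old_goal = collapsed.get(new_key, (0, 0))
      collapsed[new_key] = (old_hits + hits, old_goal + goal)
    return collapsed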

There are a myriad of other ways to look at the coverage data to gain important insights. Having the flexibility to create your own views is a big win over proprietary closed systems.

Note that there is also non interactive analysis available with profiling and optimizing - the subject of the next post.

Wednesday 18 December 2013

Using a Relational Database for Functional Coverage Collection Part Four

Coverage Collection using Python

We will require a method to extract the functional coverage from our DUV. As one of our target simulators will be Verilator we are limited to the subset of SystemVerilog that it understands, and this does not include the SystemVerilog coverage constructs. We would be somewhat handicapped by using those anyway: not all paid-for event driven simulators have open APIs into them that could be used to extract the data, and the SystemVerilog standard does not define such an API. UCIS has come along recently and does define a standard API that can be used to extract the coverage, but I don't yet know how well supported it is at the current time. Either way, if we wish to use Verilator we'll need another approach.

We will use a simple Python based approach that builds upon the SWIG'ed VPI library I described in a previous blog post. As I have said multiple times before, executing too much interpreted script will result in significant simulation slowdown, and that is exactly what happens when using this. However it suffices as an example. I believe that a suitable library could be constructed in C++ and integrated into both Verilog and Python through the use of SWIG, so we could declare our coverage points and the instrumentation in Verilog through the DPI. Alternatively we could keep just the instrumentation in Verilog and declare the coverage points with XML, YAML or even Python, but this has the disadvantage that the declaration and instrumentation are not in the same place. Either way we can use the interpreted language to upload the results to the database or perform whatever post processing is required, utilising the power of the scripting language where the task is not significant in terms of processing requirement or frequency of execution.

However in this example it is all in Python. The code is contained in python/coverage.py, an example of usage can be found in test/test_coverage_vpi.py and is shown below.

  class coverpoint_sig(coverage.coverpoint) :
    'bits toggle'
    def __init__(self, signal, name=None, parent=None) :
      self.size   = signal.size
      self.bit    = self.add_axis('bit', values=range(0, self.size))
      self.sense  = self.add_axis('sense', true=1, false=0)
      self.fmt    = self.add_axis('fmt',
        vpiBinStr=0, vpiOctStr=1, vpiHexStr=2, vpiDecStr=3)
      coverage.coverpoint.__init__(self, name, parent=parent)

    def define(self, bucket) :
      'set goal'
      # no dont cares or illegals
      bucket.default(goal=10)

Each new coverpoint inherits from the coverpoint class. The class documentation (here 'bits toggle') also serves as the default description of the coverpoint. We add the axes, here showing how enumerations may be added (note that there is currently no mechanism to group values within enumerations, sorry). We can also optionally pass a model parameter so that a context for the coverpoint can be saved (not shown in this example); we could then create a method that remembers the scope and can be called to perform the coverage collection when required. However, in this example test we do this in another callback.

The define method allows the goal of each bucket to be defined as well as illegal and dont_care boolean flags. It is executed once for each bucket at initialisation and the bucket value enumerations are passed in as part of the bucket argument object as this variation of the define method shows.

  def define(self, bucket) :
    'An example of more involved goal setting'
    bucket.default(goal=10, dont_care=bucket.axis.fmt=='vpiBinStr')
    # bit will be an integer
    if bucket.axis.fmt == 'vpiDecStr' : 
      bucket.goal += bucket.axis.bit * 10
    # sense will be a string value
    if bucket.axis.sense == 'true' :
      bucket.goal += 1

The following snippet shows the creation of a hierarchical node to contain some coverage and an instantiation of the above coverpoint within that scope. We also create a cursor to use when collecting coverage. The cursor is a stateful object which allows us to set different dimensions at different times; it will remember the dimension values.

    self.container = coverage.hierarchy(scope.fullname, 'Scope')
    self.cvr_pt0   = coverpoint_sig(self.scope.sig0, name='sig0',
                                    parent=self.container)
    self.cursor0   = self.cvr_pt0.cursor()

The cursor has two methods, incr() which will increment the hit count of the bucket defined by the cursor state and hit() which increments the hit count to one if there have not been any hits to date. This allows us to define two types of coverage behaviour, with incr() a test invocation may hit a bucket any number of times whereas with hit() a test can only contribute one hit to any bucket. The incr() method also optionally takes an argument of the number of hits to increment by, the default being 1. Any number of cursors can be created each with its own internal state. Here we see the coverage being collected

  self.cursor0(fmt=sig0.__class__.__name__)
  for i in self.cvr_pt0.bit.get_values() :
    self.cursor0(bit=i, sense='true' if sig0[i] else 'false').incr()

First the type of the format is set, and then each bit of the vector is crossed with the sense of that bit and the bucket incremented. Of course this doesn't need to happen at the same time, with multiple cursors it's possible to set axis values at different times/cycles.

The value set on each axis is available as an attribute on a cursor, for example

  if self.cursor0.fmt == 'vpiDecStr' :
    # do something

This allows the cursor to be used to store state of previous axis values for conditional processing of subsequent axes, e.g.

  # use value1 if axis0 and axis1 have same value, else use value2
  if cursor.axis0 == cursor.axis1 :
    cursor(axis2=value1)
  else :
    cursor(axis2=value2)

Executing Coverage Collection

This is achieved by using a Python VPI callback from the verilog library. We can execute some Python when a particular event occurs within the simulation, be that a clock edge or other event. We can use a cursor to remember state between these events before a final event causes the bucket to be hit. This can be useful when tracing over multiple events: create a cursor for each initial event and hand it on, so that subsequent events can do more processing on it.

Coverage collection is not limited to a simulation environment either, it will also run in pure python - see test_coverage_capacity.py or test_coverage_multiple.py neither of which operate in a simulation environment. This can be useful for prototyping outside a simulation environment if the coverpoint declarations are split from the simulation specific trigger code.

It would also be possible to collect coverage post mortem from a VCD dump, given a suitable Python VCD API. In a similar vein one could collect coverage post mortem on custom transactions created during simulation, perhaps by a C++ library. A Python script could then walk over these transactions if the transaction library's API were SWIG'ed and made available to Python. Although these methods might be slower than some alternatives, they would not require any license during execution as they run outside the simulation environment.

Conclusion

Using a scripting language is an interesting possibility for collecting functional coverage, and can be summarised in the following points

  • Dynamic, no recompilation necessary - faster turn around between edits.
  • Expressive, great support for concise coding plus dynamic typing. Good library availability, including regular expressions for text fields which may be more awkward and less forgiving to use in Verilog or C++.
  • Post mortem collection options.
  • Slower execution compared to a compiled language.

Next we will look at how to visualize this coverage data in a web browser.

Tuesday 17 December 2013

Using a Relational Database for Functional Coverage Collection Part Three

Pushing Data In

This time we are collecting two different classes of instrumentation.
  1. Test status from generated messages as before, and
  2. Functional coverage from coverage generating instrumentation.
The mechanics of collecting the coverage from the DUV via some Python instrumentation are dealt with in the next post. But once the test has been executed and the coverage calculated we need to store it in the database.
To capture the coverpoint information we store three different types of data.
  1. The coverpoints themselves in their hierarchy, and with axis/dimension information including enumerations.
  2. The individual bucket goal information. The number of target hits expected bucketwise.
  3. The individual bucket hits from each simulation.
The schema for these was described in the previous post.
We use a class (database.insert.sql) to walk over the coverpoint hierarchy and create the necessary table entries to allow the hierarchy to be recreated later. This data is normally very, very much smaller than the coverage bin data and only needs to be stored once for the master invocation which also stores the individual bucket goals.
The individual bucket data (goal or hit) is collected by iterating over each bucket in each coverage point and then inserting these values into the relevant table. The code for this is contained in the database.insert class: the sql subclass handles the coverpoints, and the insert class itself handles the goal or hit data, differentiated by the reference parameter (true for goal and false for hits).
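
As an illustration of that insertion step, a hedged sqlite3 sketch (the hits table and its columns follow the schema outlined in the previous post, but the exact names are assumptions, as is the use of executemany for batching) :

  import sqlite3

  def store_hits(db_path, log_id, buckets) :
    'buckets is an iterable of (bucket_id, hits) pairs for this invocation'
    with sqlite3.connect(db_path) as db :
      # append only : one row per non-zero bucket for this test invocation
      db.executemany(
        'INSERT INTO hits (log_id, bucket_id, hits) VALUES (?, ?, ?)',
        ((log_id, bucket_id, hits) for bucket_id, hits in buckets if hits))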

Using MySQL

In this example we are limited to the sqlite API and, as Python is being used, the sqlite3 module. With this we can only use INSERT as there is no other method for bulk insert, although even here there is scope for multiple approaches delivering wildly different performance. We will try to keep the number of INSERTs to a minimum, but do nothing beyond this.
Multiple methods exist when using MySQL. Insert rates will vary wildly depending on insert method and your choice of database engine and its configuration parameters.
  • mysqlimport or LOAD DATA LOCAL INFILE - touted as the fastest insert method. You don't need to dump to an actual file - you can use a pipe on Linux.
  • SQL INSERT. It is best to use as few commands as possible, grouping together multiple rows in a single statement. Use INSERT DELAYED if concurrent coverage queries are running and your engine choice supports it.
  • HandlerSocket. A NoSQL plugin that allows a very direct connection to the storage engine. Included by default in recent MariaDB (since 5.3) and Percona distributions. It may only work with a subset of engines, however.
If you have a large number of client simulations attempting to insert coverage data you will need to tune your installation to prevent everything grinding to a halt as all the data is processed and written to disk. Whilst the capabilities of the underlying hardware are important, the biggest gains come from using the right insert method as described above and configuring your MySQL server correctly. Unfortunately getting this right can take time and experience.

Pulling Data Out

We have two applications that query coverage data from the database
  • A web server
  • Command line optimizer
Here we will use SQL queries via the Python sqlite3 module again to get the data we require in a suitable format for further processing. 
For the web server we use a thin layer of Python around the SQL queries to generate the data in JSON format suitable for further processing within a web browser in Javascript. For example the bkt class, which queries individual or aggregated bucket coverage, is just a single query that returns the data in a JSON structure. Indeed the web application only generates JSON data dynamically, but does serve other formats statically (e.g. the JavaScript and index HTML page).
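A hedged sketch of what such a thin layer can look like with bottle.py and sqlite3 (the route, table and column names here are illustrative, not the example's actual API) :

  import json, sqlite3
  from bottle import Bottle, response

  app = Bottle()

  @app.route('/cvg/bkt/<bucket_id:int>')
  def bucket_coverage(bucket_id) :
    'which invocations hit this bucket, and how often, as JSON for the browser'
    db   = sqlite3.connect('coverage.db')
    rows = db.execute(
      'SELECT log_id, SUM(hits) FROM hits WHERE bucket_id = ? '
      'GROUP BY log_id ORDER BY SUM(hits) DESC', (bucket_id,)).fetchall()
    db.close()
    response.content_type = 'application/json'
    return json.dumps([{'log_id' : log_id, 'hits' : hits} for log_id, hits in rows])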
For the optimizer everything is done in Python, in optimize.py. It takes arguments consisting of
  • regression ids
  • log ids
  • log ids in an xml format
or any mixture thereof. It first collates a list of individual test log_ids and finds the coverage master and its MD5 sums for each individual test. Should multiple MD5 sums exist it will cowardly fail as it can't merge the coverage. Methods are provided to add tests one by one and record any increase in coverage. Subclasses allow the tests to be added in different order: cvgOrderedOptimize, posOrderedOptimize and randOrderedOptimize. The result is a list of tests that achieve the headline coverage of all the tests, and may be a subset of the input test list, with those tests which do not add any headline coverage being filtered out. More details are contained in a following post.
Both the web server and the optimizer use database.cvg to recreate the coverpoint hierarchy and annotate coverage data. The query in the hierarchy class method fetches all the coverpoint information; a groupby function then finds the coverpoints and the recursive build function pieces the hierarchy back together. The single and cumulative classes are available to extract coverage for a single leaf test or a whole regression.

Using MySQL

Configuration is also important to get good performance from queries, e.g. ensuring that buffers are correctly sized. But the first optimization should be to make use of the explain command. It is one thing to get a query returning the correct data; the performance of that query is another. I am not advocating premature optimization, but it is vitally important to be able to use the explain command and understand its output if you wish your coverage queries to complete in non geological time. With large datasets it is imperative that the correct indices exist and are correctly used by queries to return data in a reasonable time. It is quite possible to see a 10-100x difference in execution time with a poorly constructed query. Slow queries will become all too apparent when using the web interface, but you can also look at the slow query log. Either way, use explain on these queries to ensure that the right indices are being used, so that query results are returned as quickly as possible, reducing server load and user waiting times (which otherwise become a vicious circle as more client sessions are opened and more queries started whilst engineers wait for the slow ones to finish).
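
The same habit can be practised on the sqlite example from Python - EXPLAIN QUERY PLAN is sqlite's rough equivalent of MySQL's explain (the table and column names below are the illustrative ones used earlier, not necessarily the example's) :

  import sqlite3

  db = sqlite3.connect('coverage.db')
  plan = db.execute(
    'EXPLAIN QUERY PLAN '
    'SELECT bucket_id, SUM(hits) FROM hits WHERE log_id = ? GROUP BY bucket_id',
    (1484,)).fetchall()
  for row in plan :
    print(row)  # look for index usage rather than a full table scan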

Next Post

The next post looks at creating the data to push in.

Friday 13 December 2013

Using a Relational Database for Functional Coverage Collection Part Two

Schema for Functional Coverage

Selection criteria include
  1. Performance. I have an expectation of rapid query results. I don't want to have to wait. I don't want other engineers having to wait. To a lesser extent I want the upload to be fast too, I don't want the simulation to have to wait [to finish, so the next one can start].
  2. Compact. I will conceivably create a large amount of data. I wish to keep it all for later analysis, at the project level and possibly beyond.
  3. Easy to understand. If I do use an Object Relational Mapper I still want to be able to hand code queries in the event that generated ones aren't fast enough.
  4. Ease of merging. Across regressions and also when part of a coverage point has changed. If a coverpoint has changed it is unlikely we can still merge the coverage of that point with any meaning, but we can merge the unchanged portions of the data set.
As we are building upon the previous blog series we will start with the schema used there for describing simulation runs. As a starting point we will reuse the log_id field to reference the simulation invocation.

We must store three pieces of information.
  1. The hierarchy and description of the coverage points, including the axis/dimensions of each point and the enumerations of each axis/dimension.
  2. The goal or target of each bucket defined. We can also encode whether any particular bucket cannot be hit or is uninteresting, or where it would be illegal to do so.
  3. The actual hits from each invocation. These are tallied to provide the merged coverage.
We will look at each in turn.

Hierarchy and Description of the Coverage Points

The first item could be done in a variety of ways. It does not necessarily require database tables either: the information is relatively small and compact, so a file could be dropped somewhere. It does need to be persistent, else the stored coverage data is meaningless because it can't be deserialized back into coverage points. In a similar vein it could be stored as a blob in the database, for example in JSON format. Using JSON would make it very easy to serve the data to a browser but perhaps harder to post process with another scripting language.
We could encode the coverpoint tree in tables. There is plenty of material on the web detailing how to encode a tree in an RDBMS - the Adjacency List Model is just one such method. Note that we'll not be updating the tree, so we don't need a method that allows the tree to be edited. For this schema we will use three tables
  1. One for the points themselves. Name, description, parent, MD5 hashes for congruency checking.
  2. One for the axes/dimensions of the cross. Parent coverpoint and name. Multiple dimensions per point.
  3. One for the enumerations of the dimensions. Text enumeration and integer value. Multiple enumerations per dimension.
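A minimal sqlite sketch of what those three tables might look like (the point_id, axis_id and log_id columns match the query below; the remaining column names are assumptions) :

  import sqlite3

  schema = '''
    CREATE TABLE point (
      point_id    INTEGER PRIMARY KEY,
      log_id      INTEGER,  -- master invocation owning the goal data
      parent      INTEGER,  -- parent point_id, NULL at the root
      name        TEXT,
      description TEXT,
      md5_self    TEXT,
      md5_axes    TEXT,
      md5_goal    TEXT
    );
    CREATE TABLE axis (
      axis_id     INTEGER PRIMARY KEY,
      point_id    INTEGER REFERENCES point(point_id),
      name        TEXT
    );
    CREATE TABLE enum (
      enum_id     INTEGER PRIMARY KEY,
      axis_id     INTEGER REFERENCES axis(axis_id),
      label       TEXT,     -- text enumeration
      value       INTEGER   -- integer value
    );
  '''
  sqlite3.connect('coverage.db').executescript(schema)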
To regenerate coverage point hierarchy and axes/dimensions with enumerations we can use the following query.

  SELECT * FROM point
    LEFT OUTER JOIN axis USING (point_id)
    LEFT OUTER JOIN enum USING (axis_id)
    WHERE log_id = @master;

If we then collate by point, axis and enum we can then recreate the hierarchy.
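A sketch of that collation using itertools.groupby, assuming the rows come back ordered by point then axis as the join above would return them (the row layout is an assumption, and rebuilding the parent/child tree from the parent column is left out) :

  from itertools import groupby

  def collate(rows) :
    'rows are (point_id, point_name, axis_id, axis_name, enum_label, enum_value) tuples'
    points = {}
    for point_id, point_rows in groupby(rows, key=lambda r : r[0]) :
      point_rows = list(point_rows)
      axes = {}
      for axis_id, axis_rows in groupby(point_rows, key=lambda r : r[2]) :
        if axis_id is None :
          continue  # a pure hierarchy node with no axes of its own
        axis_rows = list(axis_rows)
        axes[axis_rows[0][3]] = [(r[4], r[5]) for r in axis_rows]
      points[point_id] = {'name' : point_rows[0][1], 'axes' : axes}
    return points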
We could choose a schema where points are not bound to any invocation, but for this example I'm going to use one where the log_id of a master invocation is used to reference the goal data, so this column is added to the point table. This schema won't naturally merge runs of different coverage as they'll not share bucket identifiers; it could be made to work, rather like merging edited coverage, but both are beyond the scope of these articles. You can still merge identical coverage points that are congruent but do not share the same master id.
The point table will create a hierarchy in the same way as the log table, using a log_id and a parent column. This means every node can optionally have coverage or not, and children or not.
It will also use separate tables for the dimensions and their individual enumerations. We also add MD5 sums for congruency checking. In the event that a coverage merge is attempted on differing goal data (perhaps because of an edit) we can refuse to do it, or invoke a special merging algorithm that can cope (not implemented here*). We have three MD5 sums
  1. md5_self for congruency checking of name and description
  2. md5_axes for congruency checking of axis and enumeration data
  3. md5_goal for congruency checking of goal data 
If all sums are identical then the coverage points and goals are exactly the same. If only the goal data has changed the merge is still straightforward. If the axes have changed but the number of buckets hasn't, it is also straightforward, but may produce strange results if the axis information is completely different. Of course there is always the case where the previous coverage collection was just plain wrong and the hit data wrong or meaningless, in which case we wouldn't want that data anyway. The MD5 hashes are there to provide information on the validity of the coverage merge.
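
One way such sums might be computed - an illustration only, not the example's exact hashing scheme :

  import hashlib

  def md5_of(*parts) :
    'a stable MD5 over a sequence of printable parts'
    m = hashlib.md5()
    for part in parts :
      m.update(repr(part).encode('utf-8'))
    return m.hexdigest()

  # e.g. for a hypothetical two axis coverpoint with a goal of 10 per bucket
  md5_self = md5_of('sig0', 'bits toggle')
  md5_axes = md5_of(('bit', tuple(range(8))), ('sense', (('true', 1), ('false', 0))))
  md5_goal = md5_of(*(10 for _ in range(16)))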

Goal & Hits of each Bucket

There are two very different ways of collating the hit counters. The simplest approach would be to have a combined goal and hit table that is updated by each coverage run, essentially keeping a running tally of hits. The upload of coverage data becomes an SQL UPDATE command, but may require a temporary table to hold the imported coverage data from the test run prior to the update, which is dropped afterwards. This effectively merges the coverage on the fly and the RDBMS acts to queue the update requests and keep the table coherent.
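A hedged sketch of that first approach in sqlite terms (the example ultimately uses the append-only alternative below; the tally table and its columns are illustrative) :

  import sqlite3

  def merge_run(db_path, run_buckets) :
    'fold one run, an iterable of (bucket_id, hits), into the single running tally'
    with sqlite3.connect(db_path) as db :
      db.execute('CREATE TEMP TABLE run (bucket_id INTEGER PRIMARY KEY, hits INTEGER)')
      db.executemany('INSERT INTO run VALUES (?, ?)', run_buckets)
      db.execute('''UPDATE tally
                       SET hits = hits + (SELECT hits FROM run
                                           WHERE run.bucket_id = tally.bucket_id)
                     WHERE bucket_id IN (SELECT bucket_id FROM run)''')
      db.execute('DROP TABLE run')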
The second method is to append all coverage data from each and every run to a table and collate the total cumulative coverage on demand, or at the end of testing when no more coverage is to be generated.
Obviously the second method will create much, much larger tables (of course, we wouldn't write any buckets with zero hits, we just miss them out) and adds a potentially costly post processing step. However it does preserve valuable information that can be used
  • To generate the coverage of the individual test.
    • For inspection.
    • In combination with pass/fail statistics to determine any correlation between coverage and test failure.
  • To determine which tests hit which buckets.
    • Useful as an indicator of where to focus to improve coverage.
    • To locate examples of particular behaviour. (Which tests do exactly this? I have found this useful on numerous occasions.)
  • To determine the subset of tests that would need to be run to achieve the maximum coverage. 
    • Ultimately some tests will not be contributing to the headline coverage because they are only hitting buckets that other tests have already hit, and so are not exercising any previously untested features. We may wish not to run these tests, to reduce the overall test set size and execution time. Or we may choose to run the contributing subset first, to get an early indication that coverage is still as expected.
When using separate goal and hit tables we can merge the coverage of a regression, write it to the hit table and create a cached copy against the log of the root invocation. This can save the processing time associated with subsequent merges, and there's no reason it could not be overwritten later.

I want to consider the following two schemas for the goal and hit tables. Both require a log_id key and a goal/hit column, but we can index a bucket using one or two columns.
  1. Use a single column that has a unique bucket index in the whole coverage. It is harder to work back to a point from the index (you have to subtract an offset) but that is not an issue as the normal working mode is the other way around - from a point to a bucket which is much easier to query.
  2. Use two columns. One to index the bucket in a point, one to index the coverage point. Here (point_id, bucket_id) is unique. This makes merging changed coverage easier as the nth bucket in the nth point is the same even if another point has lost or gained some buckets. The downside is that this uses more space to store the data.

Selection for this example

I want to keep the example as simple as possible but also as instructive as possible. So we'll use
  1. The three table model for coverage points, encoding hierarchy with a simple parent id hierarchy model. Note that this doesn't necessarily fit well with SystemVerilog coverage points as there can only be 1 cross per hierarchy container and each cross is the cross of all the dimensions declared in the hierarchy container.
  2. Single column model for goal and hit tables. We have MD5 sums to warn that coverage declarations may have changed and I'm going to use the single column schema in this example as I'm not a fan of merging mismatching coverage. I'd rather spend time ensuring that it is easier and quicker to rerun a whole load of coverage simulations than try to fudge together mismatching data.
I personally am not frightened about table size. 1TB of coverage data is a massive amount, yet still fits on a contemporary consumer grade SATAIII SSD of modest cost (to an enterprise if not to an individual). A consumer grade SSD will suffice as our schema is append only so we are only writing more data. So even if the storage media is three level flash with very low write endurance we are not continually deleting and rewriting to the SSD, just appending.
How much coverage data will be generated depends on the size of your coverage points and the number of tests you run with coverage enabled. More buckets and more tests will mean more data, obviously. This can be partially mitigated by gradually turning off coverage points as they are hit. We can periodically check the coverage status and turn off any hit points. We can move older coverage data to cheaper mass storage such as HDDs or in the limit delete the individual test coverage data if we have stored the cumulative coverage.

Using MySQL

The example will again use sqlite due to the ultra simple database set up. Whilst I haven't done any performance evaluations I strongly suspect that PostgreSQL, MySQL or one of their forks will perform substantially better than sqlite under higher loading with multiple clients. I have no doubt that using sqlite for large regressions will yield very poor performance.
If using a non bleeding edge MySQL version you have a choice of database engines, basically between InnoDB and MyISAM (or XtraDB and Aria depending on the MySQL fork).
For this type of application the transactional features that InnoDB/XtraDB support are not required. If some data is lost it's not a big problem, it's probably the cause of the loss that is going to be the problem. If the server loses power or crashes that's bad and is going to cause issues, regardless of the state of the tables when the server is rebooted. In fact we need to disable most of the transactional features in order to increase baseline performance to acceptable levels when using InnoDB/XtraDB. Also take note of the hardware requirements, especially regarding local disk. As mentioned above consumer grade SSDs are viable in this space as the schema is append only and we are not continually overwriting existing data. But do remember that PCIe flash will yield the best results and is getting cheaper all the time, and is becoming available in a 2.5" form factor too. To justify the extra expense simply integrate engineer waiting time multiplied by their salary and it will become blindingly obvious that the 1000's of $s price tag is more than worthwhile.
It is also interesting to note that the schema selected is append only. We don't update any of the coverage data once written, we only ever add new data to the end of the table. This negates any advantage that the row level locking InnoDB/XtraDB provides. This should also work to our advantage if using SSDs for database storage.
I would advocate using the MyISAM/Aria engine for this reason. What matters in this application is insert speed, and then query times. As long as the data is not frequently corrupted (in my experience never - but that's not to say it won't happen eventually) then we don't need the transactional and data integrity features of InnoDB/XtraDB.
Note that you don't have to use the same engine for all the tables, but I haven't experimented with this or generated any performance data.
If, on the other hand, you prefer to use a more recent version of MariaDB and are not frightened to custom install or build from source then MariaDB 10.0.6 has the option of using TokuDB. This claims improved performance over the traditional engines, notably insert times as it uses a different indexing algorithm. I have no experience of this engine as yet so am unable to comment further.
It is also possible to use a mix of SSD and HDD. Disable binary logging if there is no replication, or write the binary log to HDD instead of SSD. Don't forget you'll still need a backup strategy, which will be complicated by having very large tables.
The first schema, which uses on-the-fly coverage aggregation, will best be served by tables using the MEMORY engine, as this is the fastest database engine as long as there is sufficient memory to host the entire table.

Next

We will examine data IO and the database.

* Merging non congruent coverage sets could be implemented by first grouping together individual test runs into congruent groups and then merging coverage groupwise. Overload the __add__ operator on the hierarchy class to only merge congruent coverpoints (or use the newer goal value on coverpoints whose md5_goal is different) and insert non matching coverpoints to leave two copies in the final hierarchy. The final step is to sum() the different groups to yield the merged coverage.

Wednesday 11 December 2013

Using a Relational Database for Functional Coverage Collection Part One

Motivation

Why would one want to use a relational database in silicon verification? My previous blog posts advocated the use of a database for recording the status of tests. This next series of posts advocates using an RDBMS to store and collate the results of functional coverage instrumentation. Most of the reasons are the same or similar
  • Asynchronous upload from multiple client simulations. The database will ensure data is stored quickly and coherently with minimal effort required from the client.
  • Asynchronous read during uploads. We can look (through the web interface or otherwise) at partial results on the fly whilst the test set is still running and uploading data. This also allows a periodic check to take place that "turns off" coverage instrumentation of points that have already hit their target metrics reducing execution time and keeping stored data to a minimum (if either are an issue).
  • Performance. Using an RDBMS will outperform a home brew solution. I have been continually surprised at the speed of MySQL queries, even with large coverage data sets. You will not be able to write anything faster yourself in an acceptable time frame (the units may be in lifetimes).
  • Merging coverage does not become the tree of file merges that a naive file system based approach would require. Don't worry about how the SQL is executed; let the tool optimize any query (caveat - you still need to use explain and understand its output). We can merge coverage dynamically part way through. The web interface can also give bucket coverage in real time too (i.e. which tests hit this combination of dimensions?).
  • Profiling and optimization of test set - what tests hit which buckets? Which types of test should I run more of to increase coverage? And which tests should I not bother running? If we store the data simulation-wise we can mine out this sort of information to better optimize the test set and reduce run time.
I personally am also interested in using highly scalable simulators such as Verilator when running with functional coverage. It is highly scalable because I am not limited in the number of simulations I can run in parallel: I can run as many simulations as I have CPUs. This is not true for a paid-for simulator, unless I am very canny or rich and can negotiate a very large number of licenses. But I also want this infrastructure to run on a paid-for simulator, to demonstrate that an event driven, four state simulator does exactly the same thing as Verilator.
So although paid-for simulators may ship with similar utilities I cannot use these with Verilator, nor can I add custom reports or export the coverage data for further analysis.
I may also still want to use Verilator if it is faster than an event driven simulator; it is always an advantage if I can simulate quicker, be it a single test or a whole regression set. Additionally, cross simulator infrastructure allows easy porting of the code from one simulator to another, reducing your vendor lock in.

Why a RDMS?

The NoSQL movement has produced a plethora of non relational databases that offer the same advantages listed above. They often claim to have superior performance to e.g. MySQL. Whilst this may be true for "big data" and "web scale" applications, I think that an RDBMS should still perform suitably for applications at the silicon project scale. I hope to have a look at some NoSQL databases in the near future and evaluate their performance versus an RDBMS in this application space, but in the meantime I'm best versed in RDBMS and SQL so will use these for the purposes of this example.
Moving data to and from the database is only one part of this example, and it is also possible to abstract this activity so it is conceivable we could produce code that could optionally use any type of storage back end.
See also SQLAlchemy for Python, which can be used to abstract the database layer, making the same code usable with SQLite, PostgreSQL, MySQL and others. I haven't used it here though, and I can't vouch for the performance of the generated queries versus hand coded queries.

Outline of post series

So how do we achieve our aim of storing functional coverage in a relational database? We'll need to design a database schema to hold the coverage data, which will be the subject of the next post. But we'll also require a methodology to extract functional coverage from a DUV. Here I'll show an example building on the VPI abstraction from the previous blog posts; although the implementation knowingly suffers from performance issues and fails to address the requirements of all coverage queries, it will suffice as an example. Once we can upload meaningful data to the database we can view it with a web based coverage viewing tool, which will present coverage data in a browser and allow some interactive visualisation. Following on from this is a profiling and optimising tool to give feedback on which tests did what, coverage-wise.

Example Code

As with the previous blog post series I will release the code as a working example, this time as an addition to the original code. The sample code is available in the verilog_integration repository on GitHub.

The example requires the Boost and Python development libraries to be installed and you'll need a simulator for most of the test examples (but not all - see later). Verilator is the primary target, where you'll need 3.851 or later (because it has the required VPI support, although 3.851 is still missing some VPI functionality that will affect at least one of the functional coverage tests). However everything should run fine on Icarus (indeed part of the motivation here is to make everything run the same on all simulation platforms). Set VERILATOR_ROOT and IVERILOG_ROOT in the environment or in test/make.inc.
If you're just interested in running multiple coverage runs and then optimizing the result, you can do this without a simulator - see coverage.readme.

  % git clone https://github.com/rporter/verilog_integration
  % cd verilog_integration
  % cat README
  % make -C test
  % test/regress

Please also note that this is proof of concept code. It's not meant to be used in anger as it has not been tested for completeness, correctness or scalability. It does also have a number of shortcomings as presented, including the performance of the functional coverage collection code. It does however show what can be done, and how.
The next post will describe the database schema.