Saturday, 28 December 2013

Using a Relational Database for Functional Coverage Collection Part Five

Web Based Coverage Data Visualisation

As I've stated previously, the web browser is the new GUI, and has been for some time. So forget Tk, Qt or wxWidgets - to visualise the collected coverage data we'll display it in a web browser. We'll extend the existing web interface to annotate the test listing tables with coverage details, so that coverage data is displayed alongside the log of the test that created it. If a test has no data there will be no extra coverage tab. Also added is an option to show whether each listed test has coverage, and if so whether it is goal or hit data (see the coloured coverage column and how the 'show coverage' checkbox is checked).

Test list table with coverage column shown

As the coverage data itself is potentially large we want to avoid recalculating it as much as possible, to reduce load on the database server. Computing the cumulative coverage is potentially costly in CPU time and we do not want to recompute it continually as the user clicks through different coverage points. To reduce the number of times the cumulative coverage is requested we compute it once, send it in its entirety to the browser as JSON, and let the browser render the appropriate parts to HTML as the user clicks through the displayed page. It does mean that if the page is lost, by navigating away or closing the browser, the data is lost and will need to be recalculated when the page is reloaded; a more serious application might use something like memcached. This is one reason why it is useful to run the optimizer immediately at the end of a regression run with coverage: it can store the computed coverage result against the regression root, which effectively caches the cumulative coverage and requires far less effort to retrieve.

The browser can also perform some post processing such as individual axis coverage (collating coverage by axis) and axis flattening (removing an axis from the coverage and recalculating the hits for the remaining axes). Modern browsers are more than capable of storing this much data and processing the workload as long as the client machine isn't ancient (in which case you should be using something more contemporary anyway - see here for why). Unfortunately it does mean having to use JavaScript for this code (it is possible to use CoffeeScript, Dart or a toolchain targeting asm.js, but I've gone for bare metal in this example).

Whenever we display the log of an invocation we'll check to see if there's any coverage associated with it (via an Ajax call - the same call that created the coloured coverage column in the browser image above). If there is, we'll wrap the log in another set of tabs and display the coverage alongside. We do seem to be generating a lot of tabs, but I think it's a fairly natural way to browse this kind of data and most users are comfortable with the idea given the universal use of tabs within desktop browsers.

Coverage tab showing selected coverpoint and menu

The coverage tab is divided into two panes: one on the left with a hierarchical view of the coverage points, and one on the right into which the coverpoint coverage tables are rendered. Clicking on a coverpoint in the hierarchical view causes it to be rendered in the coverpoint table pane. A single click on an axis column header will select or deselect that column. Clicking and holding will pop up a menu containing several options (all with tooltips); those of note include:
  • Hide : The selected column(s) will be collapsed and their hits redistributed. Together with the 'hide others' option this gives individual axis coverage, and it can help when looking for systematic holes across multiple columns. For example, in a 3 axis coverpoint it may not be obvious that a particular combination of values of columns 1 and 2 is never hit when crossed with the third column; removing that third column makes it obvious.
  • Unhide all : Reset the table to its original state with all columns showing.
  • Hide Hits | Illegals | Don't Cares : Collapse rows that are hit, illegal or marked as don't care, thus reducing the table rows to the more interesting ones.
  • Matrix  : This option is greyed out unless only two columns are displayed. It converts the table so that one axis is displayed across the top and the other down the right hand side. This is useful for looking for patterns in coverage values.
When an axis is hidden, the buckets belonging to the enumerations of that axis are aggregated into new buckets formed from the combinations of the remaining axes. This screenshot shows which original bucket indices have been coalesced into each new bucket.
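Conceptually, hiding an axis just re-groups the buckets by the values of the remaining axes and sums their hits. A minimal Python sketch of the idea follows (the example code actually does this in JavaScript in the browser; the bucket representation here is purely illustrative).

  from collections import defaultdict

  def hide_axis(buckets, axis) :
    '''
    buckets : dict keyed by a tuple of (axis_name, value) pairs, value is the hit count
              e.g. {(('bit', 0), ('sense', 'true'), ('fmt', 'vpiBinStr')) : 3, ...}
    axis    : name of the axis to hide, e.g. 'fmt'
    returns a new dict keyed by the remaining axes, with the hits summed
    '''
    merged = defaultdict(int)
    for key, hits in buckets.items() :
      remaining = tuple((name, value) for name, value in key if name != axis)
      merged[remaining] += hits
    return dict(merged)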

Buckets merged when axis hidden
Coverpoints with two axes may be viewed as a matrix. Goal is available as a label. Bucket coverage is also available for cumulative coverage.
Two axis coverpoint shown in matrix format

For regressions an extra tab containing the cumulative coverage will be produced: the coverage of each child test is summed and the total cumulative result presented in a pane. This will differ from the coverage shown in the regression log pane, which is the result of the post regression optimization and so contains the minimum of the goal and the total sum of hits, rather than the plain sum over all tests that the cumulative result shows. The cumulative coverage tables also provide individual bucket coverage; a summary can be obtained by simply hovering over the hits cell, which currently defaults to showing just the ten biggest hits.
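Because every test's hits are stored row by row, the data behind that popup amounts to a single query. A rough sketch using a Python sqlite3 cursor, with the hit table and column names assumed purely for illustration:

  def biggest_hits(cursor, bucket_id, limit=10) :
    'which test invocations hit this bucket the most?'
    return cursor.execute('''
      SELECT log_id, hits
        FROM hits
       WHERE bucket_id = ?
       ORDER BY hits DESC
       LIMIT ?''', (bucket_id, limit)).fetchall()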

Showing individual bucket coverage popup from cumulative table

Clicking will convert the popup into a dialog box containing a table of all the hits with additional information such as test name. Clicking on a table row will open a new tab in the regression tab set containing the test log and individual coverage.

Bucket coverage dialog box

Interactive coverage analysis

To summarise - there are several features here to aid in the analysis of the recorded coverage.

  1. Axis hiding. We can reduce the number of axes the coverpoint has and recalculate the effective coverage across the remaining ones, with multiple buckets coalesced into a new one. This can make spotting a pattern of e.g. a missed combination easier.
  2. Matrix view. If there are only two axes the coverage data can be presented as a table with one axis horizontal and the other vertical. It is a good way to spot any patterns that may be present in the data. You can of course take a coverpoint with three or more axes and reduce it to two to produce a matrix.
  3. Individual bucket coverage. Cumulative coverage hits are decomposed into individual test hits, which is useful for seeing where the coverage has come from. Note that this still works in the matrix view and after one or more axes have been hidden. In the latter case the coverage given is the sum over the individual buckets that were coalesced to create the new bucket.

There are a myriad of other ways to look at the coverage data to gain important insights. Having the flexibility to create your own views is a big win over proprietary closed systems.

Note that non-interactive analysis is also available, via profiling and optimizing - the subject of the next post.

Wednesday, 18 December 2013

Using a Relational Database for Functional Coverage Collection Part Four

Coverage Collection using Python

We will require a method to extract the functional coverage from our DUV. As one of our target simulators will be Verilator we are limited to the subset of SystemVerilog that it understands, and this does not include the SystemVerilog coverage constructs. We would be handicapped by using these anyway: not all paid-for event driven simulators have open APIs through which the data could be extracted, and the SystemVerilog standard does not define such an API. UCIS does now define a standard API that can be used to extract coverage, but I don't yet know how well supported it is. In any case, if we wish to use Verilator we'll need another way, as it doesn't support any SystemVerilog coverage constructs.

We will use a simple Python based approach that builds upon the SWIG'ed VPI library I described in a previous blog post. As I have said before, executing too much interpreted script results in significant simulation slowdown, and that is exactly what happens here; however it suffices as an example. A suitable library could be constructed in C++ and integrated into both Verilog and Python through the use of SWIG, so that we could declare our coverage points and the instrumentation in Verilog through the DPI. Alternatively we could keep just the instrumentation in Verilog and declare the coverage points in XML, YAML or even Python, although this has the disadvantage that the declaration and instrumentation are not in the same place. We can then use the interpreted language to upload the results to the database or perform whatever post processing is required, exploiting the power of the scripting language where the task is not significant in terms of processing requirement or frequency of execution.

However in this example it is all in Python. The code is contained in python/coverage.py; an example of usage can be found in test/test_coverage_vpi.py and is shown below.

  class coverpoint_sig(coverage.coverpoint) :
    'bits toggle'
    def __init__(self, signal, name=None, parent=None) :
      self.size   = signal.size
      self.bit    = self.add_axis('bit', values=range(0, self.size))
      self.sense  = self.add_axis('sense', true=1, false=0)
      self.fmt    = self.add_axis('fmt',
        vpiBinStr=0, vpiOctStr=1, vpiHexStr=2, vpiDecStr=3)
      coverage.coverpoint.__init__(self, name, parent=parent)

    def define(self, bucket) :
      'set goal'
      # no dont cares or illegals
      bucket.default(goal=10)

Each new coverpoint inherits from the coverpoint class. The class documentation (here 'bits toggle') also serves as the default description of the coverpoint. We add the axes, showing how enumerations may be added (note that there is currently no mechanism to group values within enumerations, sorry). We can also optionally pass a model parameter so that a context for the coverpoint can be saved (not shown in this example); we could then create a method that remembers the scope and can be called to perform the coverage collection when required. In this example test, however, we do that in another callback.

The define method allows the goal of each bucket to be defined as well as illegal and dont_care boolean flags. It is executed once for each bucket at initialisation and the bucket value enumerations are passed in as part of the bucket argument object as this variation of the define method shows.

  def define(self, bucket) :
    'An example of more involved goal setting'
    bucket.default(goal=10, dont_care=bucket.axis.fmt=='vpiBinStr')
    # bit will be an integer
    if bucket.axis.fmt == 'vpiDecStr' : 
      bucket.goal += bucket.axis.bit * 10
    # sense will be a string value
    if bucket.axis.sense == 'true' :
      bucket.goal += 1

The following snippet shows the creation of a hierarchical node to contain some coverage and an instantiation of the above coverpoint within that scope. We also create a cursor to use when collecting coverage. The cursor is a stateful object which allows us to set different dimensions at different times and the cursor will remember the dimension values.

    self.container = coverage.hierarchy(scope.fullname, 'Scope')
    self.cvr_pt0   = coverpoint_sig(self.scope.sig0, name='sig0',
                                    parent=self.container)
    self.cursor0   = self.cvr_pt0.cursor()

The cursor has two methods: incr(), which increments the hit count of the bucket defined by the cursor state, and hit(), which sets the hit count to one if there have not been any hits to date. This allows us to define two types of coverage behaviour: with incr() a test invocation may hit a bucket any number of times, whereas with hit() a test can only contribute one hit to any bucket. The incr() method also optionally takes the number of hits to increment by as an argument, the default being 1. Any number of cursors can be created, each with its own internal state. Here we see the coverage being collected:

  self.cursor0(fmt=sig0.__class__.__name__)
  for i in self.cvr_pt0.bit.get_values() :
    self.cursor0(bit=i, sense='true' if sig0[i] else 'false').incr()

First the format axis is set to the type of the signal value, and then each bit of the vector is crossed with the sense of that bit and the corresponding bucket incremented. Of course this doesn't all need to happen at the same time; with multiple cursors it's possible to set axis values at different times/cycles.

The value set on each axis is available as an attribute on a cursor, for example

  if self.cursor0.fmt == 'vpiDecStr' :
    # do something

This allows the cursor to be used to store state of previous axis values for conditional processing of subsequent axes, e.g.

  # use value1 if axis0 and axis1 have same value, else use value2
  if cursor.axis0 == cursor.axis1 :
    cursor(axis2=value1)
  else :
    cursor(axis2=value2)

Executing Coverage Collection

This is achieved by using a Python VPI callback from the verilog library. We can execute some Python when a particular event occurs within the simulation, be that a clock edge or some other event. We can use a cursor to remember state between these events before a final event causes the bucket to be hit; this is useful when tracing over multiple events, creating a cursor for each initial event and handing it on so that subsequent events can do more processing on it.

Coverage collection is not limited to a simulation environment either; it will also run in pure Python - see test_coverage_capacity.py or test_coverage_multiple.py, neither of which operates in a simulation environment. This can be useful for prototyping if the coverpoint declarations are split from the simulation specific trigger code.
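For instance, a coverpoint such as the coverpoint_sig class above can be exercised entirely outside a simulator by driving it with ordinary Python values. The sketch below assumes the python/coverage.py module is importable and that a simple object with a size attribute is an acceptable stand-in for a VPI signal handle; it is illustrative rather than a transcript of the repository tests.

  import random
  import coverage  # the python/coverage.py module from the repository

  class fake_signal :
    'stand-in for a VPI signal handle; only a size attribute is needed here'
    def __init__(self, size) :
      self.size = size

  root = coverage.hierarchy('pure_python', 'no simulator required')
  pt   = coverpoint_sig(fake_signal(8), name='sig0', parent=root)
  cur  = pt.cursor()

  for _ in range(1000) :
    value = random.getrandbits(8)
    cur(fmt='vpiBinStr')  # no VPI value to inspect here, so just pick an enumeration
    for i in range(8) :
      cur(bit=i, sense='true' if (value >> i) & 1 else 'false').incr()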

It would also be possible to collect coverage post mortem from a VCD dump, given a suitable Python VCD API. In a similar vein one could collect coverage post mortem from custom transactions created during simulation, perhaps by a C++ library; a Python script could then walk over these transactions if the transaction library's API was SWIG'ed and made available to Python. Although these methods might be slower than some alternatives, they would not require a simulator license during execution as they run outside the simulation environment.

Conclusion

Using a scripting language is an interesting possibility for collecting functional coverage, and can be summarised in the following points

  • Dynamic, no recompilation necessary - faster turn around between edits.
  • Expressive: great support for concise coding plus dynamic typing, and good library availability, including regular expressions for text fields, which can be more awkward and less forgiving in Verilog or C++.
  • Post mortem collection options.
  • Slower execution compared to a compiled language.

Next we will look at how to visualize this coverage data in a web browser.

Tuesday, 17 December 2013

Using a Relational Database for Functional Coverage Collection Part Three

Pushing Data In

This time we are collecting two different classes of instrumentation.
  1. Test status from generated messages as before, and
  2. Functional coverage from coverage generating instrumentation.
The mechanics of collecting the coverage from the DUV via some Python instrumentation are dealt with in the next post. But once a test has been executed and its coverage calculated we need to store it in the database.
To store the coverpoint information we record three different types of data.
  1. The coverpoints themselves in their hierarchy, and with axis/dimension information including enumerations.
  2. The individual bucket goal information. The number of target hits expected bucketwise.
  3. The individual bucket hits from each simulation.
The schema for these were described in the previous post.
We use a class (database.insert.sql) to walk over the coverpoint hierarchy and create the necessary table entries so that the hierarchy can be recreated later. This data is normally far smaller than the coverage bucket data and only needs to be stored once, for the master invocation, which also stores the individual bucket goals.
The individual bucket data (goal or hit) is collected by iterating over each bucket in each coverage point and then inserting these values into the relevant table. The code for this is contained in the database.insert class: the sql subclass handles the coverpoints, and the insert class itself handles the goal or hit data, which is differentiated by the reference parameter (true for goal, false for hits).
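As a rough sketch of what the bucket upload amounts to, the following uses the Python sqlite3 module to insert one row per non-zero bucket in a single transaction. The table and column names (hits with log_id, bucket_id and hits columns) are placeholders for illustration, not necessarily those of the example repository.

  import sqlite3

  def insert_hits(db_file, log_id, buckets) :
    'buckets is an iterable of (bucket_id, hit_count) pairs'
    conn = sqlite3.connect(db_file)
    with conn :  # one transaction for the whole upload
      conn.executemany(
        'INSERT INTO hits (log_id, bucket_id, hits) VALUES (?, ?, ?)',
        ((log_id, bucket_id, count) for bucket_id, count in buckets if count))
    conn.close()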

Using MySQL

In this example we are limited to the sqlite API and, as Python is being used, the sqlite3 module. With this we can only use INSERT statements, as there is no other bulk insert method, but even here different approaches deliver wildly different performance. We will try to keep the number of INSERTs to a minimum, but nothing beyond that.
Multiple methods exist when using MySQL. Insert rates will vary wildly depending on insert method and your choice of database engine and its configuration parameters.
  • mysqlimport or LOAD DATA LOCAL INFILE - touted as the fastest insert method. You don't need to dump to an actual file - you can use a pipe on Linux.
  • SQL INSERT. It is best to use as few commands as possible, grouping together multiple rows in a single statement. Use INSERT DELAYED if concurrent coverage queries are running and your engine choice supports it.
  • HandlerSocket. A NoSQL plugin that allows a very direct connection to the storage engine. Included by default in recent MariaDB (since 5.3) and Percona distributions. It may only work with a subset of engines, however.
If you have a large number of client simulations attempting to insert coverage data you will need to tune your installation to prevent everything grinding to a halt as the data is processed and written to disk. Whilst the capabilities of the underlying hardware are important, the biggest gains come from using the right insert method as described above and from configuring your MySQL server correctly.
Unfortunately getting this right can take time and experience.
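To illustrate the row grouping point above, the sketch below batches bucket hits into multi-row INSERT statements. It is only a sketch with assumed table and column names; the values are all integers, so plain string formatting is safe here.

  def grouped_insert(cursor, log_id, buckets, batch=1000) :
    'emit one multi-row INSERT per batch rather than one statement per bucket'
    rows = [(log_id, bucket_id, hits) for bucket_id, hits in buckets if hits]
    for i in range(0, len(rows), batch) :
      values = ', '.join('(%d, %d, %d)' % row for row in rows[i:i+batch])
      cursor.execute('INSERT INTO hits (log_id, bucket_id, hits) VALUES ' + values)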

Pulling Data Out

We have two applications that query coverage data from the database
  • A web server
  • Command line optimizer
Here we will use SQL queries via the Python sqlite3 module again to get the data we require in a suitable format for further processing. 
For the web server we use a thin layer of Python around the SQL queries to generate the data in JSON format, suitable for further processing within a web browser in JavaScript. For example the bkt class, which queries individual or aggregated bucket coverage, is just a single query that returns the data in a JSON structure. Indeed the web application only generates JSON data dynamically; other formats are served statically (e.g. the JavaScript and the index HTML page).
For the optimizer everything is done in Python, and is here. It takes arguments consisting of
  • regression ids
  • log ids
  • log ids in an xml format
or any mixture thereof. It first collates a list of individual test log_ids and finds the coverage master and its MD5 sums for each individual test. Should multiple MD5 sums exist it will cowardly fail, as it can't merge the coverage. Methods are provided to add tests one by one and record any increase in coverage. Subclasses allow the tests to be added in different orders: cvgOrderedOptimize, posOrderedOptimize and randOrderedOptimize. The result is a list of tests that achieves the headline coverage of all the tests, and may be a subset of the input test list, with tests that do not add any headline coverage filtered out. More details are contained in a following post.
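The core of such an optimizer is a simple greedy loop. The helper below is purely illustrative (not the repository code); it assumes each test's coverage is available as a dict of bucket id to hits, and the goals as a dict of bucket id to goal.

  def greedy_optimize(goals, tests) :
    '''
    goals : {bucket_id : goal}
    tests : [(log_id, {bucket_id : hits}), ...] in the order they should be tried
    returns the log_ids that add to the headline coverage, in order
    '''
    cumulative = dict.fromkeys(goals, 0)
    keep = []
    for log_id, hits in tests :
      improved = False
      for bucket, count in hits.items() :
        goal = goals.get(bucket)
        if not goal or not count :
          continue  # unknown bucket, zero goal (dont_care/illegal) or no hits
        if cumulative[bucket] < goal :
          cumulative[bucket] = min(goal, cumulative[bucket] + count)
          improved = True
      if improved :
        keep.append(log_id)
    return keep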
Both web server and optimizer use database.cvg to recreate the coverpoint hierarchy and annotate coverage data. The query in the hierarchy class method fetches all the coverpoint information, a groupby function is then used to separate the coverpoints, and the recursive build function pieces the hierarchy back together. The single and cumulative classes are available to extract coverage for a single leaf test or for a whole regression.
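Rebuilding a tree from flat parent-id rows is straightforward. A small sketch of the idea, assuming each row is a dict with point_id and parent keys (the actual column names in the example schema may differ):

  def build_tree(rows) :
    'rows : iterable of dicts, each with point_id and parent keys'
    nodes = {row['point_id'] : dict(row, children=[]) for row in rows}
    roots = []
    for node in nodes.values() :
      parent = nodes.get(node['parent'])
      if parent is None :
        roots.append(node)  # parent not in this set, so treat as a root
      else :
        parent['children'].append(node)
    return roots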

Using MySQL

Configuration is also important to get good performance from queries, e.g. ensuring that buffers are correctly sized. But the first optimization should be to make use of the EXPLAIN command. It is one thing to get a query returning the correct data; the performance of that query is another matter. I am not advocating premature optimization, but it is vitally important to be able to use EXPLAIN and understand its output if you wish your coverage queries to complete in non-geological time. With large datasets it is imperative that the correct indices exist and are actually used by queries, so that data is returned in a reasonable time; a poorly constructed query can easily be 10-100x slower. Slow queries will become all too apparent when using the web interface, but you can also look at the slow query log. Either way, use EXPLAIN on these queries to ensure that the right indices are being used so that results are returned as quickly as possible, reducing server load and user waiting times (which otherwise become a vicious circle as more client sessions are opened and more queries started whilst engineers wait for the slow ones to finish).
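Checking a query is as simple as prefixing it with EXPLAIN in MySQL, or EXPLAIN QUERY PLAN when using the sqlite3 module as the example does. A rough sketch, with the table and column names assumed as elsewhere in these posts:

  def explain(cursor, regression_id) :
    'sanity check the aggregation query; look for index use, not a full table scan'
    query = '''
      SELECT bucket_id, SUM(hits) AS total
        FROM hits
       WHERE log_id IN (SELECT log_id FROM log WHERE parent = ?)
       GROUP BY bucket_id'''
    for row in cursor.execute('EXPLAIN QUERY PLAN ' + query, (regression_id,)) :
      print(row)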

Next Post

The next post looks at creating the data to push in.

Friday, 13 December 2013

Using a Relational Database for Functional Coverage Collection Part Two

Schema for Functional Coverage

Selection criteria include
  1. Performance. I have an expectation of rapid query results. I don't want to have to wait. I don't want other engineers having to wait. To a lesser extent I want the upload to be fast too, I don't want the simulation to have to wait [to finish, so the next one can start].
  2. Compact. I will conceivably create a large amount of data. I wish to keep it all for later analysis, at the project level and possibly beyond.
  3. Easy to understand. If I do use an Object Relational Mapper I still want to be able to hand code queries in the event that generated ones aren't fast enough.
  4. Ease of merging. Across regressions and also when part of a coverage point has changed. If a coverpoint has changed it is unlikely we can still merge the coverage of that point with any meaning, but we can merge the unchanged portions of the data set.
As we are building upon the previous blog series we will start with the schema used there for describing simulation runs. As a starting point we will reuse the log_id field to reference the simulation invocation.

We must store three pieces of information.
  1. The hierarchy and description of the coverage points, including the axis/dimensions of each point and the enumerations of each axis/dimension.
  2. The goal or target of each bucket defined. We can also encode whether any particular bucket cannot be hit or is uninteresting, or where it would be illegal to do so.
  3. The actual hits from each invocation. These are tallied to provide the merged coverage.
We will look at each in turn.

Hierarchy and Description of the Coverage Points

The first item could be done in a variety of ways. It does not necessarily require database tables: the information is relatively small and compact, so a file could be dropped somewhere. It does need to be persistent, else the stored coverage data is meaningless because it cannot be deserialized back into coverage points. In a similar vein it could be stored as a blob in the database, for example in JSON format. Using JSON would make it very easy to serve the data to a browser but perhaps harder to post process with another scripting language.
We could encode the coverpoint tree in tables. There is plenty of material on the web detailing how to encode a tree in a RDMS - The Adjacency List Model is just one of them detailed here. Note that we'll not be updating the tree so don't need to use a method that allows the tree to be edited. For this schema we will use three tables
  1. One for the points themselves. Name, description, parent, MD5 hashes for congruency checking.
  2. One for the axes/dimensions of the cross. Parent coverpoint and name. Multiple dimensions per point.
  3. One for the enumerations of the dimensions. Text enumeration and integer value. Multiple enumerations per dimension.
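A minimal sqlite sketch of these three tables might look like the following; the exact column names and indices are assumptions for illustration, and the repository schema may differ in detail.

  import sqlite3

  schema = '''
  CREATE TABLE point (
    point_id    INTEGER PRIMARY KEY,
    log_id      INTEGER,   -- master invocation owning the goal data
    parent      INTEGER,   -- parent point_id, NULL at the root
    name        TEXT,
    description TEXT,
    md5_self    TEXT,
    md5_axes    TEXT,
    md5_goal    TEXT
  );
  CREATE TABLE axis (
    axis_id  INTEGER PRIMARY KEY,
    point_id INTEGER,      -- owning coverpoint
    name     TEXT
  );
  CREATE TABLE enum (
    enum_id INTEGER PRIMARY KEY,
    axis_id INTEGER,       -- owning axis
    text    TEXT,
    value   INTEGER
  );
  CREATE INDEX idx_point_log  ON point (log_id);
  CREATE INDEX idx_axis_point ON axis (point_id);
  CREATE INDEX idx_enum_axis  ON enum (axis_id);
  '''

  sqlite3.connect('coverage.db').executescript(schema)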
To regenerate coverage point hierarchy and axes/dimensions with enumerations we can use the following query.

  SELECT * FROM point
    LEFT OUTER JOIN axis USING (point_id)
    LEFT OUTER JOIN enum USING (axis_id)
    WHERE log_id = @master;

If we then collate by point, axis and enum we can recreate the hierarchy.
We could choose a schema where points are not bound to any invocation, but for this example I'm going to use one where the log_id of a master invocation is used to reference the goal data, so this column is added to the point table. This schema won't naturally merge runs of different coverage as they won't share bucket identifiers; it could be made to work, rather like merging edited coverage, but both are beyond the scope of these articles. You can still merge identical coverage points that are congruent but do not share the same master id.
The point table will create a hierarchy in the same way as the log table, using a log_id and a parent column. This means every node can optionally have coverage or not, and children or not.
It will also use separate tables for the dimensions and their individual enumerations. We also add MD5 sums for congruency checking. In the event that a coverage merge is attempted on mismatching goal data (perhaps because of an edit) we can refuse to do it, or invoke a special merging algorithm that can cope (not implemented here*). We have three MD5 sums
  1. md5_self for congruency checking of name and description
  2. md5_axes for congruency checking of axis and enumeration data
  3. md5_goal for congruency checking of goal data 
If all three sums are identical then the coverage points and goals are exactly the same. If only the goal data has changed the merge is still straightforward. If the axes have changed but the number of buckets hasn't, it is also straightforward, though it may produce strange results if the axis information is completely different. And of course there is always the case where the previous coverage collection was just plain wrong and the hit data meaningless, in which case we wouldn't want that data at all. The MD5 hashes are there to provide information on the validity of a coverage merge.
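A sketch of how such sums might be calculated follows; exactly which fields feed each hash is an assumption here, the point being only that each sum covers a different part of the declaration.

  import hashlib

  def md5_of(*parts) :
    'stable MD5 over a sequence of stringified fields'
    m = hashlib.md5()
    for part in parts :
      m.update(str(part).encode('utf-8'))
    return m.hexdigest()

  def congruency_sums(name, description, axes, buckets) :
    '''
    axes    : list of (axis_name, ((enum_text, enum_value), ...)) in declaration order
    buckets : list of (goal, illegal, dont_care) in bucket order
    '''
    md5_self = md5_of(name, description)   # name and description only
    md5_axes = md5_of(*axes)               # axis and enumeration data
    md5_goal = md5_of(*buckets)            # goal data
    return md5_self, md5_axes, md5_goal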

Goal & Hits of each Bucket

There are two very different ways of collating the hit counters. The simplest approach would be to have a combined goal and hit table that is updated by each coverage run, essentially keeping a running tally of hits. The upload of coverage data becomes an SQL UPDATE command, but may require a temporary table to hold the imported coverage data from the test run prior to the update, which is dropped afterwards. This effectively merges the coverage on the fly and the RDMS acts to queue the update requests and keep the table coherent.
The second method is to append all coverage data from each and every run to a table and collate the total cumulative coverage on demand, or at the end of testing when no more coverage is to be generated.
Obviously the second method will create much, much larger tables (of course, we wouldn't write any buckets with zero hits, we just miss them out) and adds a potentially costly post processing step. However it does preserve valuable information that can be used
  • To generate the coverage of the individual test.
    • For inspection.
    • In combination with pass/fail statistics to determine any correlation between coverage and test failure.
  • To determine which tests hit which buckets.
    • Useful as an indicator of where to focus to improve coverage.
    • To locate examples of particular behaviour. (Which tests do exactly this? I have found this useful on numerous occasions.)
  • To determine the subset of tests that would need to be run to achieve the maximum coverage. 
    • Ultimately some tests will not be contributing to the headline coverage because they are only hitting buckets that other tests have already hit, and so are not exercising any previously untested features. We may wish not to run these tests to reduce the overall test set size and execution time. Or we may choose to run these first to get an early indication that coverage is still as expected.
When using separate goal and hit tables we can merge the coverage of a regression, write it to the hit table and so create a cached copy against the log of the root invocation. This can save the processing time associated with subsequent merges, and there's no reason it could not be overwritten later.

I want to consider the use of the following schema for the goal and hit tables. They both require a log_id key and both require a goal/hit column, but we can index a bucket using 1 or 2 columns.
  1. Use a single column that has a unique bucket index in the whole coverage. It is harder to work back to a point from the index (you have to subtract an offset) but that is not an issue as the normal working mode is the other way around - from a point to a bucket which is much easier to query.
  2. Use two columns. One to index the bucket in a point, one to index the coverage point. Here (point_id, bucket_id) is unique. This makes merging changed coverage easier as the nth bucket in the nth point is the same even if another point has lost or gained some buckets. The downside is that this uses more space to store the data.
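Under the single column option the goal and hit tables reduce to something like the sketch below, together with a query that sums the hits of a regression's tests per bucket; the headline figure then caps each bucket at its goal. Table and column names are again assumptions for illustration.

  schema = '''
  CREATE TABLE goal (
    log_id    INTEGER,   -- the master invocation
    bucket_id INTEGER,   -- unique index across the whole coverage
    goal      INTEGER
  );
  CREATE TABLE hits (
    log_id    INTEGER,   -- the test invocation that scored the hits
    bucket_id INTEGER,
    hits      INTEGER
  );
  CREATE INDEX idx_goal ON goal (log_id, bucket_id);
  CREATE INDEX idx_hits ON hits (log_id, bucket_id);
  '''

  merge_query = '''
  SELECT g.bucket_id, g.goal, IFNULL(SUM(h.hits), 0) AS total
    FROM goal g
    LEFT OUTER JOIN hits h
      ON h.bucket_id = g.bucket_id
     AND h.log_id IN (SELECT log_id FROM log WHERE parent = :regression)
   WHERE g.log_id = :master
   GROUP BY g.bucket_id, g.goal
  '''
  # cumulative coverage then caps each bucket at its goal :
  #   covered = sum(min(goal, total) for bucket_id, goal, total in rows)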

Selection for this example

I want to keep the example as simple as possible but also as instructive as possible. So we'll use
  1. The three table model for coverage points, encoding hierarchy with a simple parent id hierarchy model. Note that this doesn't necessarily fit well with SystemVerilog coverage points as there can only be 1 cross per hierarchy container and each cross is the cross of all the dimensions declared in the hierarchy container.
  2. Single column model for goal and hit tables. We have MD5 sums to warn that coverage declarations may have changed and I'm going to use the single column schema in this example as I'm not a fan of merging mismatching coverage. I'd rather spend time ensuring that it is easier and quicker to rerun a whole load of coverage simulations than try to fudge together mismatching data.
I personally am not frightened by table size. 1TB of coverage data is a massive amount, yet still fits on a contemporary consumer grade SATA III SSD of modest cost (to an enterprise if not to an individual). A consumer grade SSD will suffice because our schema is append only, so we are only ever writing more data. Even if the storage medium is triple-level-cell flash with very low write endurance we are not continually deleting and rewriting, just appending.
How much coverage data will be generated depends on the size of your coverage points and the number of tests you run with coverage enabled. More buckets and more tests will mean more data, obviously. This can be partially mitigated by gradually turning off coverage points as they are hit. We can periodically check the coverage status and turn off any hit points. We can move older coverage data to cheaper mass storage such as HDDs or in the limit delete the individual test coverage data if we have stored the cumulative coverage.

Using MySQL

The example will again use sqlite due to its ultra simple database set up. Whilst I haven't done any performance evaluations I strongly suspect that PostgreSQL, MySQL or one of the MySQL forks will perform substantially better than sqlite under higher loading with multiple clients; using sqlite for large regressions will most likely yield very poor performance.
If using a non bleeding edge MySQL version you have a choice of database engines, basically between InnoDB and MyISAM (or XtraDB and Aria depending on the MySQL fork).
For this type of application the transactional features that InnoDB/XtraDB provide are not required. If some data is lost it's not a big problem; the cause of the loss is going to be the real problem. If the server loses power or crashes that's bad and will cause issues regardless of the state of the tables when the server is rebooted. In fact we need to disable most of the transactional features in order to raise baseline performance to acceptable levels when using InnoDB/XtraDB. Also take note of the hardware requirements, especially regarding local disk. As mentioned above, consumer grade SSDs are viable in this space as the schema is append only and we are not continually overwriting existing data. But do remember that PCIe flash will yield the best results, is getting cheaper all the time, and is becoming available in a 2.5" form factor too. To justify the extra expense, simply integrate engineer waiting time multiplied by salary and it becomes blindingly obvious that a price tag in the thousands of dollars is more than worthwhile.
It is also interesting to note that the schema selected is append only. We don't update any of the coverage data once written, we only ever add new data to the end of the table. This negates any advantage that the row level locking InnoDB/XtraDB provides. This should also work to our advantage if using SSDs for database storage.
I would advocate using the MyISAM/Aria engine for this reason. What matters in this application is insert speed, and then query times. As long as the data is not frequently corrupted (in my experience never - but that's not to say it won't happen eventually) then we don't need the transactional and data integrity features of InnoDB/XtraDB.
Note that you don't have to use the same engine for all the tables, but I haven't experimented with this or generated any performance data.
If, on the other hand, you prefer to use a more recent version of MariaDB and are not frightened to custom install or build from source then MariaDB 10.0.6 has the option of using TokuDB. This claims improved performance over the traditional engines, notably insert times as it uses a different indexing algorithm. I have no experience of this engine as yet so am unable to comment further.
It is also possible to mix SSD and HDD. Disable binary logging if there is no replication, or write the binary log to HDD instead of SSD. Don't forget you'll still need a backup strategy, which will be complicated by having very large tables.
The first schema, which uses on-the-fly coverage aggregation, will be best served by tables using the MEMORY engine, as this is the fastest database engine as long as there is sufficient memory to host the entire table.

Next

We will examine data IO and the database.

* Merging non congruent coverage sets could be implemented by first grouping together individual test runs into congruent groups and then merging coverage groupwise. Overload the __add__ operator on the hierarchy class to only merge congruent coverpoints (or use the newer goal value on coverpoints whose md5_goal is different) and insert non matching coverpoints to leave two copies in the final hierarchy. The final step is to sum() the different groups to yield the merged coverage.

Wednesday, 11 December 2013

Using a Relational Database for Functional Coverage Collection Part One

Motivation

Why would one want to use a relational database in silicon verification? My previous blog posts advocated the use of a database in recording the status of tests. This next series of posts advocates using a RDMS to store and collate the results of functional coverage instrumentation. Most of the reasons are the same or similar
  • Asynchronous upload from multiple client simulations. The database will ensure data is stored quickly and coherently with minimal effort required from the client.
  • Asynchronous read during uploads. We can look (through the web interface or otherwise) at partial results on the fly whilst the test set is still running and uploading data. This also allows a periodic check to take place that "turns off" coverage instrumentation of points that have already hit their target metrics reducing execution time and keeping stored data to a minimum (if either are an issue).
  • Performance. Using a RDMS will outperform a home brew solution. I have been continually surprised at the speed of MySQL queries, even with large coverage data sets. You will not be able to write anything faster yourself in an acceptable time frame (the units may be in lifetimes).
  • Merging coverage is not the tree of file merges that a naive file system based approach would require. Don't worry about how the SQL is executed; let the tool optimize any query (caveat - you still need to use explain and understand its output). We can merge coverage dynamically part way through a regression, and the web interface can give bucket coverage in real time too (i.e. which tests hit this combination of dimensions?).
  • Profiling and optimization of test set - what tests hit which buckets? Which types of test should I run more of to increase coverage? And which tests should I not bother running? If we store the data simulation-wise we can mine out this sort of information to better optimize the test set and reduce run time.
I personally am also interested in using highly scalable simulators such as Verilator when running with functional coverage. It is highly scalable because I am not limited by the number I can run in parallel. I can run as many simulations as I have CPUs. This is not true for a paid-for simulator, unless I am very canny or rich and can negotiate a very large number of licenses. But I also want this infrastructure to run on a paid-for simulator too to demonstrate that an event driven, four state simulator does exactly the same thing as Verilator.
So although paid-for simulators may ship with similar utilities I cannot use these with Verilator, nor can I add custom reports or export the coverage data for further analysis.
I may also still want to use Verilator simply because it is faster than an event driven simulator; it is always an advantage to simulate quicker, be it a single test or a whole regression set. Additionally, cross simulator infrastructure allows easy porting of the code from one simulator to another, reducing your vendor lock-in.

Why a RDMS?

The NoSQL movement has produced a plethora of non relational databases that offer the same advantages listed above. They often claim to have superior performance to e.g. MySQL. Whilst this may be true for "big data" and "web scale" applications I think that a RDMS should still perform suitably for applications on the silicon project scale. I hope to have a look at some NoSQL databases in the near future and evaluate their performance versus RDMS in this application space, but in the meantime I'm best versed in RDMS and SQL so will use this for the purposes of this example.
Moving data to and from the database is only one part of this example, and it is also possible to abstract this activity so it is conceivable we could produce code that could optionally use any type of storage back end.
See also SQLAlchemy for Python, which can be used to abstract the database layer making the same code usable with SQLite, Postgresql, MySQL and others. I haven't used it here though and I can't vouch for the performance of the generated queries versus hand coded queries.

Outline of post series

So how do we achieve our aim of storing functional coverage in a relational database? We'll need to design a database schema to hold the coverage data, which will be the subject of the next post. But we'll also require a methodology to extract functional coverage from a DUV. Here I'll show an example building on the VPI abstraction from the previous blog posts; although the implementation knowingly suffers from performance issues and fails to address the requirements of all coverage queries, it will suffice as an example. Once we can upload meaningful data to the database we can view it with a web based coverage viewing tool, which will present coverage data in a browser and allow some interactive visualisation. Following on from this is a profiling and optimising tool to give feedback on which tests did what, coverage-wise.

Example Code

As with the previous blog post series I will release the code in a working example, this time as an addition to the original code. The sample code is available in the verilog_integration repository on github.

The example requires the boost and python development libraries to be installed, and you'll need a simulator for most of the test examples (but not all - see later). Verilator is the primary target, where you'll need 3.851 or later (because it has the required VPI support, although 3.851 is still missing some VPI functionality that will affect at least one of the functional coverage tests). However everything should run fine on Icarus (indeed part of the motivation here is to make everything run the same on all simulation platforms). Set VERILATOR_ROOT and IVERILOG_ROOT in the environment or in test/make.inc.
If you're just interested in running multiple coverage runs and then optimizing the result, you can do this without a simulator. See coverage.readme, no simulator required.

  % git clone https://github.com/rporter/verilog_integration
  % cd verilog_integration
  % cat README
  % make -C test
  % test/regress

Please also note that this is proof of concept code. It's not meant to be used in anger as it has not been tested for completeness, correctness or scalability. It does also have a number of shortcomings as presented, including the performance of the functional coverage collection code. It does however show what can be done, and how.
The next post will describe the database schema.

Monday, 22 July 2013

Test Bench Control Part Eight

Summary

Putting it all together


The github repository contains a complete example that is available to pull, fork, examine and run. You can also peruse the source code online.
There are several prerequisites. You will need either verilator or icarus verilog installed (preferably both), as well as the boost development libraries (format and function at least).

Incomplete Example


This is an example - an experiment really. No specification or architecture as I advocated, just a first, exploratory, system. Amongst the missing features would be
  • Show older tests. A big omission as it currently stands. All tests are resident in the database but the UI doesn't give access currently. Need to add navigation buttons to see older log entries.
  • Allow filtering of all tests. Need to add UI function to filter by user/block.
  • More mining (history of test failures, what/who causes breaks, source repository version, growth of test suite versus ...)
  • Filtering messages by ident, scope, filename. To separate out messages from a particular source e.g. a monitor. This might be particularly useful to view the output from two monitors with interesting interaction.
  • Arbitrary Attributes on messages (retaining rich semantics of original message context) to allow python style string formatting i.e. %(attribute_name)s. Also allows arbitrary data to be stored that can be easily retrieved when used in conjunction with a message identifier. I have used this elsewhere to successfully store functional coverage information of hierarchy, dimensions and enumerations.
  • Functional Coverage. Hopefully I will post more on this in future.
  • Accounting for tests expected to fail (test_fail.py, I always include a test that just passes and one that just fails in any regression set to test mechanisms and test driven development deems that these be the first two to write!)
  • Build in profiling and performance regression testing infrastructure.
  • Recoding database commit code in C++ for speed.
  • Extensions using e.g. MySQL and MongoDB.
  • Examination of other uses including integrating synthesis and P&R to record results as part of a regression. (Yes I think running synthesis and P&R is a useful addition to function regression testing).
  • Block level dashboard - regression,  coverage, synthesis and layout view of block summarised in one place showing trends and current status.
I hope someone, somewhere has found these posts useful.

Thursday, 27 June 2013

Test Bench Control Part Seven

Data Presentation

The web browser became the de facto GUI platform a number of years ago. It has the performance and features required, augmented by a rich set of 3rd party libraries to aid implementation.
When running a regression we have shown how the results of individual simulations can be asynchronously uploaded to the database as they run, and now we can read out the results in real time in another process - also asynchronously.
A huge amount of work has gone into web technologies, many of which are available as open source. Most of this tooling is of high quality and with good documentation. The web also provides a large resource of tutorials, examples and solutions to previously asked queries.
We can pick up some of these tools and build a lightweight web server to display the results of simulations and regressions. We will do this in Python (again, for the same reasons as before), using bottle as a WSGI enabled server. As with other parts of this example it is not necessarily optimized for scale, since it serves static as well as dynamic content and does so in a single thread (check out e.g. uWSGI, which can simply be layered on top of bottle, for more parallel workloads), but it suits very well here because it is simple and easy to get started with.

Technologies


In this example we use a small set of tools and libraries: bottle as the server side web framework, the Python json library for serialisation, and jQuery in the client browser.
The layout of the pertinent parts of the example repository
  • www/ - root of web server
Data is mainly served in JSON format. Previous experience has shown me that this is a useful alternative to serving pre-rendered HTML, as we can reuse the JSON data to build interactive pages that do not require further data requests from the server. Note how the tree browser and table views in the regression hierarchy viewer of the example are generated from the initially requested JSON of the entire regression. Additionally it is trivial to serialize SQL query results to JSON. Modern browser performance is more than sufficient to render JSON to HTML and store potentially large data structures in the client. If however you wish to serve pre-rendered HTML, be sure to use a server side templating engine instead of emitting ad hoc HTML with print or write commands.
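Serializing query results really is trivial once the rows come back as mappings. A minimal bottle route might look like the following; the route path and the query are illustrative rather than lifted from www/report.py.

  import json
  import sqlite3
  import bottle

  @bottle.get('/api/logs')
  def logs() :
    conn = sqlite3.connect('regression.db')
    conn.row_factory = sqlite3.Row   # rows behave like mappings
    rows = conn.execute('SELECT log_id, uid, description FROM log').fetchall()
    conn.close()
    bottle.response.content_type = 'application/json'
    return json.dumps([dict(row) for row in rows])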

Starting the Server


The micro web server can be executed from the command line thus

  % www/report

and serves on localhost:8080 by default (check out the options with -h for details on how to change).
Point your browser at localhost:8080 to view the dashboard landing page.


How it Works


www/report.py uses bottle.py in its capacity as a micro web framework. Requests come in from a client browser and are served. Bottle decorators are used to associate URLs with Python functions; for example static files in www/static are routed straight through.

  @bottle.get('/static/:filename#.*#')
  def server_static(filename):
    return bottle.static_file(filename, root=static)

We route the other URLs using the bottle.route function, but not as a decorator as is the mode usually shown in the documentation. Instead we pass in a function that creates an instance of a handler class and executes the GET method with any arguments passed to it. This allows us to define the URLs to be served as a list of paths and associated classes.

  urls = (
    ('/index/:variant', index,),
    ('/msgs/:log_id', msgs,),
    ('/rgr/:log_id', rgr,),
  )

  for path, cls in urls:
    def serve(_cls) : 
      def fn(**args) : 
        return _cls().GET(**args)
      return fn
    bottle.route(path, name='route_'+cls.__name__)(serve(cls))

The classes that handle the requests are simple wrappers around library functions that execute the database queries themselves. Serialisation to JSON is done by the Python json library. We ensure by design that the query result is serialisable to JSON and that's it on the server side; rendering to HTML is done by JavaScript in the client browser using the JSON data.
The query containing classes are declared in database.py and make heavy use of itertools style groupby. However I've borrowed some code from the Python documentation and updated it to provide custom factory functions for the objects returned. The motivation is to be able to perform an SQL JOIN where there may be multiple matches in the second table, so that the columns from the first table are repeated across those rows. The groupby function allows this to be returned as a data structure of, for example, the repeated columns just once, followed by a list of the multiple matches. The objects returned are a class that inherits from dict but with an attribute getter based upon the column names returned from the query, so object.log_id will return the log_id column if it exists - this is superior to object[0] or object['log_id'].
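The attribute getter part is simply a dict subclass overriding __getattr__; a minimal sketch (not the repository code) is shown below.

  class attrdict(dict) :
    'dict whose keys are also readable as attributes, e.g. row.log_id'
    def __getattr__(self, name) :
      try :
        return self[name]
      except KeyError :
        raise AttributeError(name)

  row = attrdict(log_id=668, msg='should be success')
  assert row.log_id == row['log_id']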
For example the following SELECT result

  log_id  msg_id  msg
  1       10      alpha
  1       11      beta
  2       20      gamma
  2       21      delta
  3       30      epsilon
  3       31      zeta

When grouped by log_id it could return thus

 [
  {log_id:1, msgs:[{msg_id:10, msg:alpha}, {msg_id:11, msg:beta}]},
  {log_id:2, msgs:[{msg_id:20, msg:gamma}, {msg_id:21, msg:delta}]},
  {log_id:3, msgs:[{msg_id:30, msg:epsilon}, {msg_id:31, msg:zeta}]}
 ]

Such that result[0].msgs[0].msg is alpha or result[-1].msgs[-1].msg_id is 31.
From here mapping to JSON is easy and a table can be readily created dynamically in the client with some simple JavaScript; plus we retain this data structure in the browser for creating other abstractions of the data. For example in the log tab we have a dynamic verbosity adjusting slider and also a severity browser, both created from the same JSON structure that created the message listing itself.

All the rendering to HTML is done in the client browser. As already mentioned I find that the data can be reused client side so we don't need to keep querying the database, use a cache (memcached) or session management middleware.
Cross browser compatibility is provided by jQuery, although I developed it using Chromium (so try that first if anything seems to be broken).
JavaScript can be an interesting learning exercise for those with time. It treats functions as first class objects, so there is a lot of passing of function references. Scoping is interesting too; I seem to find myself using closures a lot.
The code is here - I'm not going to go through it in any detail.

Layout


The initial state of the dashboard is three tabs. These each contain a table with respectively
  1. all the log invocations
  2. all regression log invocations
  3. all singleton log invocations (those without children)
Clicking on a row will open a new tab
  • If the log invocation was a singleton, a new tab will be opened with the log file
  • If the log invocation was a regression, a new tab will be opened containing an embedded set of two tabs,
    • The first containing two panes
      • A tree like hierarchy browser
      • A table of log invocations relating the activated tree branch in the hierarchy browser
    • The second contains the log file of the top level regression log invocation

The log files are colourized to highlight the severity of each message and each message also has a tooltip associated with it that provides further details of the message (identifier if applicable, full date and time, filename and line number).


As part of this demonstration the logs are presented with some controls that hover in the top right of the window. These allow the conditional display of the emit time of the message, and the identifier of the message if any is given. They also allow the message verbosity to be altered by changing the severity threshold of displayed messages. Additionally there is a message index that allows the first few instances of each message severity to be easily located - reducing the time required to find the first error message in the log file. It can even be useful when the log file is trivially short as the message is highlighted when the mouse is run over the severity entry in the hierarchy browser.

Further Functionality


The given code is just a demonstration and there is much more useful functionality that could easily be added.
  • Test result history : mine the database for all previous test occurrences and their status
  • Regression triage : group failing tests by failure mode (error message, source filename, line)
  • Regression history : graph regression status history. Filter by user, scheduled regression, continuous integration runs.
  • Block orientated dashboard : A collection of graphs with click throughs detailing information pertaining to a particular block. Think regressions, coverage, synthesis, layout.

Command Line


We may also want to get the same data in text format from the command line, especially at the termination of a regression command. We can reuse the libraries from the web presentation to gather the information before serializing it to text.

  % db/summary 668
  (        NOTE) [668, PASS] should be success
  ( INFORMATION) 1 test, 1 pass, 0 fail

We could also generate trees of sub-nodes by recursively selecting the children of logs until no more were added, or generate the whole tree by determining the root node and then cutting the required sub tree out. This is left as an exercise for the reader.
It would also be possible to request the JSON from a running web server. We could allow a client to request data in a particular format with the Accept HTTP header, e.g. Accept : text/XML

  % wget --header "Accept: text/XML" \
      localhost:8080/rgr/530 -O 530.xml

But many libraries are available to use JSON, so this is also left as an exercise for the reader.

Friday, 14 June 2013

Test Bench Control Part Six

Storing Simulation Results in a Database

This has been in use in some Digital Verification flows for quite some time. Why would you want to do it? It turns out that database applications can be very fast, reliable and asynchronous - much faster and more reliable than writing your own code that is e.g. layered on top of a file system. The reliability comes from advanced locking mechanisms that ensure no data is lost as many clients attempt to concurrently read and write. Compare this to the mangled, interleaved and partially buffered output streams from multiple processes, and the races corrupting files and pipes, in a home grown regression framework. We can reduce the framework interaction to reads and writes through the database's API and let the database application schedule writing and keep the data free of corruption. There is no need for any local locking mechanisms to prevent file corruption. Advantages include
  • Concurrent, fast and asynchronous upload, management of conflicts via locks (data not lost or corrupted)
  • Concurrent and coherent reading
  • Indexing for fast retrieval (much faster than walking over a directory structure and examining files)
  • Archiving of results for later mining (trend analysis)
We can couple this with another process serving out the results via HTTP (see next post), to give us a feature rich GUI in a browser to analyse the results in real time, as each simulation executes. For those stuck in the past we can also produce a text summary and emails. The latter along with perhaps a text message can also be useful in the notification of a serious regression committed to the repository.
However there is one large barrier to using this technology and that is understanding how databases work and learning how to use them.

Which database?


Is probably the hardest question. There are a plethora of them, but for this application they can be subdivided into two categories
  1. Traditional SQL based: MySQL (MariaDB, Percona, Drizzle), PostgreSQL, sqlite, Firebird
  2. Newer, more fashionable NoSQL: MongoDB, Cassandra, Voldemort, CouchDB


SQL


Of the traditional SQL databases listed, sqlite is the odd one out as it is an embedded database that does not require a separate daemon process like the others. We will use it in this example as it has a low overhead and is easy to use; most importantly it requires zero effort to set up. However its performance in scaling to a large number of clients (tens to hundreds) is unknown to me, though it may still perform well in this use case.
The others require more effort in set up and hardware. Ideally they should have dedicated local disk resource (performance will be terrible if NAS storage is used). They also require tuning of configuration parameters for best results; for example, this type of application tends to be append only (we don't update or delete rows) and we can be fairly relaxed about what happens in a catastrophic failure (we don't care if we lose some data when a machine dies). See below for some pointers in the case of MySQL and its forks; perhaps there is scope for yet another blog post on this subject.

NoSQL


The NoSQL databases all have similar requirements, both to each other and to the conventional SQL databases - a daemon process and fast local disk for best results. They require less configuration, but you will still need to learn how to use them. I have not done any performance comparison against SQL databases, so I don't know how they compare when inserting and querying. Elsewhere I have mentioned functional coverage - I have an inkling that an SQL type database will be better suited to this, so in the event of otherwise similar performance the SQL type may win out.

Schema Design


Schema design is not so much of an issue with NoSQL (otherwise known as schema-free) databases, although you'll still need to index them. In an SQL database, however, we'll need to design the tables carefully and index them.
Having said that, the tables in this simple example are, well, simple. We have a log table with an entry for each invocation, which may or may not be a simulation. There is a field in this table to describe what the invocation is (regression, simulation, synthesis or anything else), plus who is running it (uid), where it is being hosted, and some parent information to allow us to put a tree of activities (e.g. a regression) back together.
We also add some pertinent indexes on this table to allow us to search it more efficiently.
We have a second table, message, that holds individual messages. Each has a field keyed to a unique log table entry, and this is how we put logs back together again by joining the tables on the log_id field.
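Something along these lines, then - a sketch rather than the exact schema. Only log_id, ident and subident appear in the queries in this post; the remaining column names are illustrative :

  # Sketch of the two tables and their indexes; column names other than
  # log_id, ident and subident are illustrative.
  import sqlite3

  db = sqlite3.connect('regression.db')
  db.executescript('''
    CREATE TABLE IF NOT EXISTS log (
      log_id    INTEGER PRIMARY KEY,
      parent    INTEGER,     -- parent invocation, to rebuild regression trees
      activity  TEXT,        -- regression, simulation, synthesis, ...
      uid       TEXT,        -- who is running it
      hostname  TEXT         -- where it is being hosted
    );
    CREATE TABLE IF NOT EXISTS message (
      log_id    INTEGER REFERENCES log(log_id),   -- join key back to log
      severity  TEXT,
      ident     TEXT,        -- e.g. PYTN
      subident  TEXT,        -- e.g. 0
      filename  TEXT,
      line      INTEGER,
      date      REAL,
      msg       TEXT
    );
    CREATE INDEX IF NOT EXISTS idx_log_parent ON log(parent);
    CREATE INDEX IF NOT EXISTS idx_msg_log    ON message(log_id);
    CREATE INDEX IF NOT EXISTS idx_msg_ident  ON message(ident, subident);
  ''')
  db.commit()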

Updating the database


In order to develop this example in the shortest possible time, a Python callback is placed in the message emit callback list which enqueues each emitted message to be committed to the database. Even though this runs in a separate thread, it's probably not the best implementation performance-wise; a C++ version would make more sense. However, being able to use Python demonstrates the flexibility of this approach for speedy prototyping (and the rapid turnaround in iterations that an interpreted language allows).
The Python library is here.
When the messaging instance is created an entry is made in the log table, and the key id of this row is stored for use in all subsequent message table entries.
Each message is placed in a queue, and the queue is flushed and committed to the database at each tick of a timer. The commit action runs in a separate thread so that it is asynchronous to the simulation; the simulation does not normally stall when messages are sent to the database. However, if a message of high enough severity is encountered, the threads are synchronized and the message committed immediately, halting the simulation while this happens. This is an attempt to prevent the message being lost in any subsequent program crash - at least that message will have been captured. The severity threshold is programmable so this behaviour can be changed.
There is also a programmable filter function so that only messages of selected severities, or matching some other function of the message parameters, are committed.
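A stripped down sketch of that arrangement is below. The real library does more; names such as flush_period and sync_severity, the severity ranking and the message columns used are mine rather than the library's :

  # Sketch of the batched, asynchronous commit: a background thread drains the
  # queue and commits roughly once per flush period; a message at or above
  # sync_severity blocks the caller until it has been committed.
  import queue
  import sqlite3
  import threading

  SEVERITY_RANK = {'DEBUG': 0, 'NOTE': 1, 'WARNING': 2, 'ERROR': 3, 'FATAL': 4}

  class DbLogger:
      def __init__(self, db_file, log_id, flush_period=1.0, sync_severity='ERROR'):
          self.db_file = db_file
          self.log_id = log_id
          self.flush_period = flush_period
          self.sync_severity = sync_severity
          self.pending = queue.Queue()
          threading.Thread(target=self._run, daemon=True).start()

      def emit(self, severity, text):
          "Callback placed in the message emit callback list."
          evt = None
          if SEVERITY_RANK.get(severity, 0) >= SEVERITY_RANK[self.sync_severity]:
              evt = threading.Event()          # synchronize: wait for the commit
          self.pending.put((severity, text, evt))
          if evt is not None:
              evt.wait()                       # halt until safely committed

      def _run(self):
          db = sqlite3.connect(self.db_file)   # connection owned by this thread
          while True:
              batch = [self.pending.get()]     # block until a message arrives
              # drain until the queue has been quiet for one flush period,
              # unless the batch already holds a message needing immediate commit
              while not any(evt is not None for _, _, evt in batch):
                  try:
                      batch.append(self.pending.get(timeout=self.flush_period))
                  except queue.Empty:
                      break
              db.executemany(
                  'INSERT INTO message (log_id, severity, msg) VALUES (?, ?, ?)',
                  [(self.log_id, sev, txt) for sev, txt, _ in batch])
              db.commit()
              for _, _, evt in batch:
                  if evt is not None:
                      evt.set()                # release the waiting emit()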
Each commit to the database is an insert, with the message severity, text, time, filename and line number together with any associated identifier if given (e.g. IDENT-1). The motivation with the identifier is to make it easier to search for specific messages across multiple simulations when mining some specific data or grouping for triage. For example, to find all Python PYTN-0 messages :

  sqlite> select * from log natural join message 
    where message.ident = 'PYTN' and message.subident = '0';

This was one of the original goals; we can easily find messages of a particular type without resorting to regular expression searches over a number of files. And not just of a particular type, but from a particular file or line, or of a particular severity.
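For example, every warning or worse emitted from one source file across a regression is a single parameterised query (column names as in the illustrative schema above, and the file name is made up) :

  # Sketch: all warnings and errors emitted from one source file.
  import sqlite3

  db = sqlite3.connect('regression.db')
  rows = db.execute(
      '''SELECT log_id, message.line, message.msg
           FROM log NATURAL JOIN message
          WHERE message.severity IN ('WARNING', 'ERROR')
            AND message.filename = ?''',
      ('top_tb.py',)).fetchall()       # illustrative file name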

Results


When simulations are run in this environment all interesting log messages are logged to a database. We can then retrieve these messages post mortem to examine status, from an individual test up to a whole regression.
The next post will look at how we can do this in a web browser.

Installing and running MySQL


A brief note on getting a database up and running.
  • Which flavour? MySQL, MariaDB or Percona? I'd probably lean toward MariaDB and the Aria engine for this type of application, or perhaps Percona if I wanted to use XtraDB (note it's also available in MariaDB). I'd pick the latest non-alpha version - if you're building from source (which I would also recommend) it's easy to have multiple versions installed anyway.
  • It is possible to have multiple instances of multiple versions running on the same machine; building from source seems to be straightforward, which makes this easy. You'll need to place each instance on a separate port, of course.
  • You do not need to run the daemon as a dedicated admin user: most compute farms are fire-walled off from the internet, so we do not need to worry about hackers. Whilst you may want to consider running your production server under a special user login (and adding some basic table permissions to perhaps prevent row deletion/alteration), it is possible to prototype under your own login - if you're changing configuration parameters you'll be taking the daemon up and down frequently.
  • Hardware is important. You will need fast local storage (10k+ SAS, SSD or PCIe flash) and plenty of memory for best results. Avoid networked drives at all costs. A recent processor will help too; avoid hand-me-downs if this is going to be critical to you.
  • You will gain most by tuning your configuration, especially if you choose to use InnoDB/XtraDB. Recent releases use it by default, so unless you've chosen MyISAM or Aria make sure the daemon configuration file has the following set (see the example excerpt below) :
    • innodb_flush_log_at_trx_commit is 2
    • innodb_flush_method is O_DIRECT
    • innodb_buffer_pool_size is sized to the installed memory (70% of available RAM)
    • innodb_log_file_size is small; alternatively send the logs to cheaper disks (if the data lives on SSDs) or turn off binary logging for the results database with binlog_ignore_db, so as not to fill the primary (expensive?) storage.
    • transaction-isolation is READ-UNCOMMITTED
  • If you are using MyISAM or Aria 
    • key_buffer_size to 20% of available RAM.
    • innodb_buffer_pool_size is 0
Finally, running something like mysqltuner can give you some feedback on how well your configuration is working.
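Pulled together, the InnoDB/XtraDB suggestions above might look something like this in the daemon configuration file; the sizes are examples only and need adjusting to the machine :

  # Illustrative [mysqld] excerpt collecting the settings listed above.
  [mysqld]
  innodb_flush_log_at_trx_commit = 2
  innodb_flush_method            = O_DIRECT
  innodb_buffer_pool_size        = 22G    # roughly 70% of a 32G machine
  innodb_log_file_size           = 64M    # keep the redo log small
  transaction-isolation          = READ-UNCOMMITTED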

Functional Coverage


This post has only covered storing simulation messages and results. It is also possible to store coverage data, with many of the same advantages. Storing data on a per simulation basis and merging it when required yields good results when a performant schema is used. Merging coverage data from many thousands of simulations, each with thousands of buckets, can be done surprisingly quickly (in the order of seconds), and ranking individual buckets by test even more quickly. In fact it is possible to generate 'which tests hit which buckets' data in real time as the mouse is rolled over a table.
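As a flavour of why a database copes well with this: with per simulation bucket hits stored in a table, merging is a single aggregation. The hits table and its columns here are entirely made up, just to show the shape of the query :

  # Hypothetical sketch: merge coverage by summing each bucket's hits over
  # every test in a regression. Table and column names are illustrative.
  import sqlite3

  db = sqlite3.connect('regression.db')
  merged = db.execute(
      '''SELECT bucket_id, SUM(hits) AS total_hits
           FROM hits
          WHERE log_id IN (SELECT log_id FROM log WHERE parent = ?)
          GROUP BY bucket_id''',
      (668,)).fetchall()               # 668 : illustrative regression root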
Perhaps this is for another blog post series.