Thursday, 27 June 2013

Test Bench Control Part Seven

Data Presentation

The web browser became the de facto GUI platform a number of years ago. It has the performance and features required, augmented by a rich set of 3rd party libraries to aid implementation.
We have shown how, during a regression, the results of individual simulations can be uploaded asynchronously to the database as they run; now we can read those results back out in real time in another process - also asynchronously.
A huge amount of work has gone into web technologies, many of which are available as open source. Most of this tooling is of high quality and with good documentation. The web also provides a large resource of tutorials, examples and solutions to previously asked queries.
We can pick up some of these tools and build a lightweight web server to display the results of simulations and regressions. We will do this in Python (again, for the same reasons as before), using bottle as a WSGI enabled server. As with the other parts of this example it is not necessarily optimized for scale, since here it serves static as well as dynamic content in a single thread (check out e.g. uWSGI, which can be layered on top of bottle, for more parallel workloads), but it suits very well here because it is simple and easy to get started with.

Technologies


In this example we use the following tools and libraries
The layout of the pertinent parts of the example repository
  • www/ - root of web server
Data is mainly served in JSON format. Previous experience has shown me that this is a useful alternative to serving pre-rendered HTML, as the JSON data can be reused to build interactive pages that do not require further requests to the server. Note how the tree browser and table views in the example's regression hierarchy viewer are generated from the initially requested JSON of the entire regression. Additionally, it is trivial to serialize SQL query results to JSON. Modern browser performance is more than sufficient to render JSON to HTML and to store potentially large data structures in the client. If, however, you wish to serve pre-rendered HTML, be sure to use a server side templating engine instead of emitting ad hoc HTML with print or write commands.
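As an illustration, serving a query result as JSON from a bottle route needs only a few lines. This is a minimal sketch rather than the example code itself; the database filename and the column selection are made up, and the real example wraps its queries in handler classes as described below.

  import json
  import sqlite3

  import bottle

  @bottle.get('/logs')
  def logs():
    # hypothetical query; the real handlers live in database.py
    db = sqlite3.connect('results.db')
    db.row_factory = sqlite3.Row  # rows behave like dictionaries
    rows = db.execute('SELECT log_id, uid, hostname FROM log').fetchall()
    db.close()
    bottle.response.content_type = 'application/json'
    # convert each sqlite3.Row to a plain dict so json can serialize it
    return json.dumps([dict(row) for row in rows])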

Starting the Server


The micro web server can be executed from the command line thus

  % www/report

and serves on localhost:8080 by default (check the options with -h for details on how to change this).
Point your browser at localhost:8080 to view the dashboard landing page.


How it Works


www/report.py uses bottle.py in its capacity as a micro web framework. Requests come in from a client browser and are served. Bottle decorators are used to associate URLs with Python functions, e.g. static files in www/static are routed straight through.

  @bottle.get('/static/:filename#.*#')
  def server_static(filename):
    return bottle.static_file(filename, root=static)

We route the other URLs using the bottle.route function, but not as a decorator as usually shown in the documentation. Instead we pass in a function that creates an instance of a handler class and executes its GET method with any arguments passed to it. This allows us to define the URLs to be served as a list of paths and associated classes.

  urls = (
    ('/index/:variant', index,),
    ('/msgs/:log_id', msgs,),
    ('/rgr/:log_id', rgr,),
  )

  for path, cls in urls:
    def serve(_cls) : 
      def fn(**args) : 
        return _cls().GET(**args)
      return fn
    bottle.route(path, name='route_'+cls.__name__)(serve(cls))

The classes that handle the requests are simple wrappers around library functions that execute the database queries themselves. Serialisation to JSON is done by the Python json library. We ensure by design that each query result is serialisable to JSON, and that is all the server does; rendering to HTML is done by javascript in the client browser using the JSON data.
The query-handling classes are declared in database.py and make heavy use of itertools-style groupby. I've borrowed some code from the Python documentation and updated it to provide custom factory functions for the objects returned. The motivation is to be able to perform an SQL JOIN where there may be multiple matches in the second table, so that the columns from the first table are repeated across those rows. The groupby function allows this to be returned as a data structure containing, for each group, the repeated columns just once followed by a list of the multiple matches. The objects returned are instances of a class that inherits from dict but adds an attribute getter based upon the column names returned by the query, so object.log_id will return the log_id column if it exists - this is superior to object[0] or object['log_id'].
For example, given the following SELECT result

  log_id  msg_id  msg
  1       10      alpha
  1       11      beta
  2       20      gamma
  2       21      delta
  3       30      epsilon
  3       31      zeta

When grouped by log_id it could return thus

 [
  {log_id:1, msgs:[{msg_id:10, msg:alpha}, {msg_id:11, msg:beta}]},
  {log_id:2, msgs:[{msg_id:20, msg:gamma}, {msg_id:21, msg:delta}]},
  {log_id:3, msgs:[{msg_id:30, msg:epsilon}, {msg_id:31, msg:zeta}]}
 ]

Such that result[0].msgs[0].msg is alpha or result[-1].msgs[-1].msg_id is 31.
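A minimal sketch of that grouping, assuming the column names from the table above (the real code in database.py uses more general factory functions borrowed from the Python documentation):

  import itertools

  class Row(dict) :
    'dict with attribute access, so row.log_id works as well as row["log_id"]'
    def __getattr__(self, name) :
      try :
        return self[name]
      except KeyError :
        raise AttributeError(name)

  # flattened JOIN result - the log columns are repeated for each matching message
  joined = [
    Row(log_id=1, msg_id=10, msg='alpha'),   Row(log_id=1, msg_id=11, msg='beta'),
    Row(log_id=2, msg_id=20, msg='gamma'),   Row(log_id=2, msg_id=21, msg='delta'),
    Row(log_id=3, msg_id=30, msg='epsilon'), Row(log_id=3, msg_id=31, msg='zeta'),
  ]

  # group on log_id (the rows arrive ordered by log_id, as groupby requires)
  result = [
    Row(log_id=log_id, msgs=[Row(msg_id=r.msg_id, msg=r.msg) for r in rows])
    for log_id, rows in itertools.groupby(joined, key=lambda r : r.log_id)
  ]

  assert result[0].msgs[0].msg == 'alpha'
  assert result[-1].msgs[-1].msg_id == 31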
From here, mapping to JSON is easy and a table can readily be created dynamically in the client with some simple javascript; plus we retain this data structure in the browser for creating other abstractions of the data. For example, in the log tab the dynamic verbosity-adjusting slider and the severity browser are created from the same JSON structure that created the message listing itself.

All the rendering to HTML is done in the client browser. As already mentioned, I find that the data can be reused client side, so we don't need to keep querying the database, use a cache (e.g. memcached) or add session management middleware.
Cross browser compatibility is provided by JQuery, although I developed it using Chromium (so try that first if anything seems to be broken).
Javascript can be an interesting learning exercise for those with time. It has strong functional leanings, with functions as first class objects, so there is a lot of passing of function references. Scoping is interesting too; I seem to find myself using closures a lot.
The code is here - I'm not going to go through it in any detail.

Layout


The initial state of the dashboard is three tabs. These each contain a table with, respectively
  1. all the log invocations
  2. all regression log invocations
  3. all singleton log invocations (those without children)
Clicking on a row will open a new tab
  • If the log invocation was a singleton, a new tab is opened with the log file
  • If the log invocation was a regression, a new tab is opened containing an embedded set of two tabs,
    • The first contains two panes
      • A tree-like hierarchy browser
      • A table of log invocations relating to the activated tree branch in the hierarchy browser
    • The second contains the log file of the top level regression log invocation

The log files are colourized to highlight the severity of each message and each message also has a tooltip associated with it that provides further details of the message (identifier if applicable, full date and time, filename and line number).


As part of this demonstration the logs are presented with some controls that hover in the top right of the window. These allow the conditional display of the emit time of each message and of its identifier, if one is given. They also allow the message verbosity to be altered by changing the severity threshold of displayed messages. Additionally there is a message index that allows the first few instances of each message severity to be located easily, reducing the time required to find the first error message in the log file. It can even be useful when the log file is trivially short, as the message is highlighted when the mouse is run over the severity entry in the hierarchy browser.

Further Functionality


The given code is just a demonstration, and there is much more useful functionality that could easily be added.
  • Test result history : mine the database for all previous test occurrences and their status
  • Regression triage : group failing tests by failure mode (error message, source filename, line)
  • Regression history : graph regression status history. Filter by user, scheduled regression, continuous integration runs.
  • Block orientated dashboard : A collection of graphs with click throughs detailing information pertaining to a particular block. Think regressions, coverage, synthesis, layout.

Command Line


We may also want to get the same data in text format from the command line, especially at the termination of a regression command. We can reuse the libraries from the web presentation to gather the information before serializing it to text.

  % db/summary 668
  (        NOTE) [668, PASS] should be success
  ( INFORMATION) 1 test, 1 pass, 0 fail

We could also generate trees of sub-nodes by recursively selecting the children of logs until no more are added, or generate the whole tree by determining the root node and then cutting out the required sub tree. This is left as an exercise for the reader.
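As a starting point, recursively selecting children might look something like the sketch below; it assumes the log table records each invocation's parent id in a column named parent, which may not match the example schema exactly.

  import sqlite3

  def subtree(db, log_id) :
    'return the given log row with its children attached recursively'
    node = dict(db.execute('SELECT * FROM log WHERE log_id = ?', (log_id, )).fetchone())
    node['children'] = [subtree(db, row['log_id']) for row in
      db.execute('SELECT log_id FROM log WHERE parent = ?', (log_id, ))]
    return node

  db = sqlite3.connect('results.db')  # hypothetical database file
  db.row_factory = sqlite3.Row
  tree = subtree(db, 668)             # build the tree below a given log invocation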
It would also be possible to request the JSON from a running web server. We could allow a client to request data in a particular format with the Accept HTTP header, e.g. Accept: text/XML

  % wget --header "Accept: text/XML" \
      localhost:8080/rgr/530 -O 530.xml
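On the server side, a handler might inspect the header along these lines; this is a sketch only, and serving formats other than JSON is left out here.

  import json

  import bottle

  def render(rows) :
    'serialize a list of row dicts according to the client Accept header'
    accept = bottle.request.headers.get('Accept', '')
    if not accept or 'json' in accept or '*/*' in accept :
      bottle.response.content_type = 'application/json'
      return json.dumps(rows)
    # other formats (e.g. text/XML) would be dispatched from here
    raise bottle.HTTPError(406, 'unsupported format requested: ' + accept)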

But many libraries are available to use JSON, so this is also left as an exercise for the reader.

Friday, 14 June 2013

Test Bench Control Part Six

Storing Simulation Results in a Database

This has been in use in some Digital Verification flows for quite some time. Why would you want to do it? It turns out that database applications can be very fast, reliable and asynchronous - much faster and more reliable than writing your own code layered on top of, for example, a file system. The reliability comes from advanced locking mechanisms that ensure no data is lost as many clients attempt to read and write concurrently. Compare this to the mangled, interleaved and partially buffered output streams from multiple processes, and the races corrupting files and pipes, in a home grown regression framework. We can reduce the framework interaction to reads and writes through the database's API and let the database application schedule the writes and keep the data free of corruption. There is no need for any local locking mechanisms to prevent file corruption. Advantages include
  • Concurrent, fast and asynchronous upload, management of conflicts via locks (data not lost or corrupted)
  • Concurrent and coherent reading
  • Indexing for fast retrieval (much faster than walking over a directory structure and examining files)
  • Archiving of results for later mining (trend analysis)
We can couple this with another process serving out the results via HTTP (see the next post), to give us a feature rich GUI in a browser for analysing the results in real time, as each simulation executes. For those stuck in the past we can also produce text summaries and emails. The latter, along with perhaps a text message, can also be useful for notification when a serious regression is committed to the repository.
However there is one large barrier to using this technology and that is understanding how databases work and learning how to use them.

Which database?


Is probably the hardest question. There are a plethora of them, but for this application they can be subdivided into two categories
  1. Traditional SQL based: MySQL (and forks MariaDB, Percona, Drizzle), PostgreSQL, sqlite, Firebird
  2. Newer, more fashionable NoSQL: mongoDB, cassandra, voldemort, couchDB


SQL


Of the traditional SQL databases listed, sqlite is the odd one out as it is an embedded database that does not require a separate daemon process like the others. We will use it in this example as it has a low overhead and is easy to use. Most importantly, it requires zero effort to set up. However, its performance in scaling to a large number of clients (tens to hundreds) is unknown to me, though it may still perform well in this use case.
The others require more effort in set up and hardware. Ideally they should have dedicated local disk resource (performance will be terrible if NAS storage is used). They also require tuning of configuration parameters for best results; for example, this type of application tends to be append only (we don't update or delete rows) and we can be fairly relaxed about what happens in a catastrophic failure (we don't care if we lose some data if the machine dies). See below for some pointers in the case of MySQL and its forks; perhaps there is scope for yet another blog post on this subject.

NoSQL


The NoSQL databases all have similar requirements, both to each other and to the conventional SQL databases - a daemon process and fast local disk for best results. They require less configuration, but you will still need to learn how to use them. I have not done any performance comparison against SQL databases, so I don't know how they compare when inserting and querying. Elsewhere I have mentioned functional coverage - I have an inkling that an SQL type database will be better suited to this, so in the event of otherwise similar performance the SQL type may win out.

Schema Design


Schema design is not so much of an issue with NoSQL (otherwise known as schema-free) databases, although you'll still need to index them. In an SQL database, however, we'll need to design the tables carefully and index them.
Having said that, the tables in this simple example are, well, simple. We have a log table with a row for each invocation, which may or may not be a simulation. There is a field in this table for a description of what it is - it could be a regression, simulation, synthesis or anything else. There are also fields for who (uid) is running it and where it is being hosted, plus some parent information to allow us to put a tree of activities (e.g. a regression) back together.
We also add some pertinent indexes to this table to allow us to search it more efficiently.
We have a second table, message, that holds the individual messages. Each row has a field keyed to a unique log table entry, and this is how we put logs back together again: by joining the tables on the log_id field.
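For illustration, the two tables might be created as below; the exact column names and types in the example's schema may differ.

  import sqlite3

  db = sqlite3.connect('results.db')  # hypothetical database file
  db.executescript('''
    CREATE TABLE IF NOT EXISTS log (
      log_id      INTEGER PRIMARY KEY AUTOINCREMENT,
      parent      INTEGER,   -- parent log_id, NULL for a root invocation
      description TEXT,      -- regression, simulation, synthesis, ...
      uid         TEXT,      -- who is running it
      hostname    TEXT,      -- where it is being hosted
      start       TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE IF NOT EXISTS message (
      msg_id   INTEGER PRIMARY KEY AUTOINCREMENT,
      log_id   INTEGER REFERENCES log(log_id),
      severity TEXT,
      msg      TEXT,
      date     TIMESTAMP,
      filename TEXT,
      line     INTEGER,
      ident    TEXT,
      subident INTEGER
    );
    CREATE INDEX IF NOT EXISTS log_parent_idx ON log(parent);
    CREATE INDEX IF NOT EXISTS message_log_idx ON message(log_id);
    CREATE INDEX IF NOT EXISTS message_ident_idx ON message(ident, subident);
  ''')
  db.commit()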

Updating the database


In order to develop this example in the shortest possible time, a Python callback is placed in the message emit callback list which enqueues each emitted message to be committed to the database. Even though this runs in a separate thread it's probably not the best implementation performance-wise; a C++ version would make more sense. However, being able to use Python demonstrates the flexibility of this approach for speedy prototyping (and the rapid turnaround in iterations that an interpreted language allows).
The Python library is here.
When the messaging instance is created an entry is made in the log table, and the key id of this row is stored for use in all subsequent message table entries.
Each message is placed in a queue, and the queue is flushed and committed to the database at each tick of a timer. The commit action runs in a separate thread so that it is asynchronous to the simulation; thus the simulation does not normally stop while messages are sent to the database. However, if a message of high enough severity is encountered the threads are synchronized and the message is committed immediately, halting the simulation. This is an attempt to prevent the message being lost in any subsequent program crash - at least that message will have been captured. The severity at which this happens is programmable, so the behaviour can be changed.
There is also a programmable filter function, so that only messages of selected severities (or satisfying some other function of the message parameters) are committed.
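A conceptual sketch of that behaviour is shown below; it is not the example's implementation, the severity ordering is illustrative and only a few of the message fields are carried for brevity.

  import sqlite3
  import threading

  SEVERITIES = ['DEBUG', 'NOTE', 'WARNING', 'SUCCESS', 'ERROR', 'INTERNAL', 'FATAL']

  class dbWriter(object) :
    def __init__(self, db_file, log_id, interval=1.0, halt_at='ERROR', filter_fn=None) :
      self.db_file, self.log_id = db_file, log_id
      self.halt_at = halt_at
      self.filter_fn = filter_fn or (lambda severity, text : True)
      self.queue, self.lock, self.interval = [], threading.Lock(), interval
      self.tick()

    def emit(self, severity, text) :
      'message emission callback : enqueue, flushing immediately if serious'
      if not self.filter_fn(severity, text) :
        return
      with self.lock :
        self.queue.append((self.log_id, severity, text))
      if SEVERITIES.index(severity) >= SEVERITIES.index(self.halt_at) :
        self.flush()  # synchronous, so the message is not lost in a later crash

    def flush(self) :
      with self.lock :
        batch, self.queue = self.queue, []
      if batch :
        db = sqlite3.connect(self.db_file)
        db.executemany('INSERT INTO message (log_id, severity, msg) VALUES (?, ?, ?)', batch)
        db.commit()
        db.close()

    def tick(self) :
      'flush the queue at each tick of a timer running in a separate thread'
      self.flush()
      self.timer = threading.Timer(self.interval, self.tick)
      self.timer.daemon = True
      self.timer.start()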
Each commit to the database is an insert of the message severity, text, time, filename and line number, together with any associated identifier if given (e.g. IDENT-1). The motivation for the identifier is to make it easier to search for specific messages across multiple simulations when mining data or grouping for triage. For example, to find all Python PYTN-0 messages:

  sqlite> select * from log natural join message 
    where message.ident = 'PYTN' and message.subident = '0';

This was one of the original goals; we can easily find messages of a particular type without resorting to regular expression searches over a number of files. And not just of a particular type, but from a particular file or line, or of a particular severity.

Results


When simulations are run using this environment all interesting log messages are logged to a database. We can then retrieve these messages post mortem to examine status, from a single test up to a whole regression.
The next post will look at how we can do this in a web browser.

Installing and running MySQL


A brief note on getting a database up and running.
  • Which flavour? MySQL, MariaDB or Percona? I'd probably lean toward MariaDB and the Aria engine for this type of application, or perhaps Percona if I wanted to use XtraDB (note it's also available in MariaDB). I'd pick the latest non-alpha version - if you're building from source (which I would also recommend) it's easy to have multiple versions installed anyway.
  • It is possible to have multiple instances of multiple versions running on the same machine. Building from source seems to be straightforward and this makes it easy to have multiple versions installed. You'll need to place each instance on a separate port, of course.
  • You do not need to run as the mysql admin user; most compute farms are firewalled off from the internet, so we do not need to worry about hackers. Whilst you may want to consider running your production server under a special user login (and adding some basic table permissions, perhaps to prevent row deletion or alteration), it is possible to prototype under your own login. If you're changing configuration parameters you'll be taking the daemon up and down frequently.
  • Hardware is important. You will need fast local storage (10k+ RPM SAS, SSD or PCIe flash) and plenty of memory for best results. Avoid a networked drive at all costs. A recent processor will help too; avoid using hand-me-downs if this is going to be critical to you.
  • You will gain most by tuning your configuration, especially if you choose to use InnoDB/XtraDB. Most recent releases use it by default, so if you're not using MyISAM or Aria make sure that the daemon configuration file has the following set :
    • innodb_flush_log_at_trx_commit is 2
    • innodb_flush_method is O_DIRECT
    • innodb_buffer_pool_size is sized to the installed memory (70% of available RAM)
    • innodb_log_file_size is small, or send the logs to separate disks (if using SSDs for primary storage), or turn off binary logging altogether with binlog_ignore_db, so as not to fill the primary (expensive?) storage.
    • transaction-isolation is READ-UNCOMMITTED
  • If you are using MyISAM or Aria 
    • key_buffer_size to 20% of available RAM.
    • innodb_buffer_pool_size is 0
Finally using something like mysqltuner can give you some feedback on how well your configuration is working.

Functional Coverage


This post has only covered storing simulation messages and results. It is also possible to store coverage data, with many of the same advantages. Storing data on a per simulation basis and merging it when required yields good results when a performant schema is used. Merging coverage data from many thousands of simulations, each with thousands of buckets, can be done surprisingly quickly (in the order of seconds), and individual buckets can be ranked by test even more quickly. In fact it is possible to generate which-tests-hit-which-buckets data in real time as a mouse is rolled over a table.
Perhaps this is for another blog post series.

Wednesday, 12 June 2013

Test Bench Control Part Five

Unified Reporting

Previous posts have detailed the integration of Python into the simulation to allow greater interaction between a script and the simulation; now we also import the logging API into Python. We reuse the same process as for the VPI: SWIG the API and then wrap it thinly in Python.
Our motivation here is to make the test process more robust by using the same reporting mechanisms everywhere, so that we are as unlikely as possible to miss an ERROR and thus report a failing test as passing. Traditionally the standard error and standard output of the invoking simulation script are concatenated and grep'ed (or parsed with some regular expressions) for ERROR. This is error prone, with multiple streams being interleaved, carriage returns and other noise in the resulting stream.
We also lose a great deal of useful information when a logging item is serialized to text. Sometimes a regular expression is used to extract some of that information (e.g. file name or line number), but this is error prone for the aforementioned reasons. When a format string like "%s error, file %s, line %d" is used we are losing the structure of the information (the error string itself, file name and line number) that we once had, and then need to kludge a mechanism to get it back later. What if this information was always kept with the log message so we could later analyse this data? If we had other output streams in addition to text, with richer semantics, we could store this data and mine it to produce e.g. triage tables to aid regression debugging.

Messaging API


Most teams seem to have a generic messaging/logging library for their test benches. I have seen them written in both Verilog and C++. This example is written in C++ and has some fixed severity levels with programmable verbosity, accounting and callbacks. Some macros are available that provide a convenient interface and record some attributes (filename, line) automatically when used.
This API is also available in Verilog (as above).
 `EXM_INFORMATION("python input set to %s", sim_ctrl_python_filename_r);
We also SWIG this API and make it available in Python thus
  import message

  # adjust output verbosity by filtering echo level
  message.message.instance.verbosity(0)
  # adjust echo control of particular severity
  message.control[message.NOTE].echo = 0
  # as above using artefact of wrapper
  message.control.DEBUG.echo = 1
  # adjust number of messages allowed prior to exit
  # [here set to unlimited]
  message.control.FATAL.threshold = -1

  # emit a warning with format
  message.warning('a warning %(c)d', c=666)
  # emit a string message
  message.note('a note')
  message.success('should be success')
  message.internal('whoops')
It's a bit more complicated this time as it allows us to add Python callbacks (more on that follows), but the result is output like
  (     WARNING 14:51:17) a warning 666
  (     SUCCESS 14:51:17) should be success
  (    INTERNAL 14:51:17) whoops
  (       FATAL 14:51:17) Too many INTERNAL
  ( INFORMATION 14:51:17)        FATAL : 1
  ( INFORMATION 14:51:17)     INTERNAL : 1
  ( INFORMATION 14:51:17)        ERROR : 0
  ( INFORMATION 14:51:17)      WARNING : 1
  ( INFORMATION 14:51:17)      SUCCESS : 1
  ( INFORMATION 14:51:17)         NOTE : 1
  ( INFORMATION 14:51:17)  INFORMATION : 7
  ( INFORMATION 14:51:17)        DEBUG : 1
  ( INFORMATION 14:51:17)    INT_DEBUG : 0
But we have control in Python over what is displayed and how, and we can use Python mechanisms to control what happens for all sources of messages, in the same manner as if using the C++ API.

Message Callbacks


The messaging library has an architecture based on callbacks. We can register an arbitrary number of callbacks on two events
  1. Message emission - when a message is logged
  2. Termination - when the messaging library is closed as the program exits.
Display to the standard output is the default installed callback for the emission event, along with an accounting function that keeps track of the number of messages of each severity. The summary table function (as above) is installed by default for the termination event. The callbacks are managed as hashed lists, so they can be deleted or replaced, and new arbitrary functions can also be added.
These callback lists are also available from Python, where we can add callbacks that run in the embedded Python interpreter (see the next post regarding test result databases). The example contains no demonstration of using this library from Verilog. It should be possible to provide some access via the DPI, perhaps with an exported function, but an elegant solution may be hindered by the lack of a mechanism to pass a task by reference in SystemVerilog.
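For example, adding an extra emission callback from Python might look something like this; the registration calls shown are hypothetical and the actual names in the example's message library will differ.

  import sys

  import message

  def mirror(msg) :
    # hypothetical callback : msg carries the structured fields, not flattened text
    sys.stdout.write('%s %s (%s:%d)\n' % (msg.severity, msg.text, msg.filename, msg.line))

  # hypothetical registration and removal - callbacks are kept in named (hashed) lists
  message.emit_callbacks.add('mirror', mirror)
  message.emit_callbacks.remove('mirror')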

Putting together with VPI


When I first came to write this, Verilator did not have a rich enough implementation of vpi_put_value and vpi_get_value, so I added them and the patch has now been folded into the mainline Verilator code. The patch did include a test coded in C++, but I expanded it further in Python and it is available here. It does violate the "do not code test bench logic in Python" rule, but is an interesting example as it uses Python's int(), bin(), oct() and hex() as a reference model - whilst using the verilog and message libraries described above.
The test is also created using a simple framework that registers two separate prologue and epilogue code fragments. Note that the epilogue will only run if the end of simulation is reached; any error causing the test to exit prematurely will execute the fatal method instead. This is the body of the test_pass.py test

  # Copyright (c) 2012, 2013 Rich Porter - see LICENSE for further details

  import message
  import test

  ############################################################################

  class thistest(test.test) :
    name='test mdb pass'
    def prologue(self) :
      message.message.verbosity(message.INT_DEBUG)
      message.warning('a warning %(c)d', c=666)
      message.note('a note')
    def epilogue(self) :
      message.success('should be success')
    def fatal(self) :
      'Should not be executed'
      message.fatal('Should not be executed')

  ############################################################################

  testing = thistest()

There are a number of other tests in this directory that exercise the VPI/verilog library further. Please peruse at your leisure.

Conclusion


It is possible to fold a level of test scripting into a Verilog test bench simulation that provides easy access to the simulation hierarchy whilst providing a consistent logging interface. This removes any requirement for regular expression parsing of simulation output to determine pass/fail status.
Next we will look at storing messages in a database for later retrieval and analysis.

Monday, 10 June 2013

Test Bench Control Part Four

Importing the Verilog VPI into Python

This has been done before. And at least several times. And in Ruby too.
So why have I not reused one of these? Well I wanted a self contained example and I wanted to demonstrate the use of SWIG too.
This blog post is also advocating a different usage model. I don't think it's a good idea to use Python code as a test bench, i.e. generating stimulus and checking the functionality of a DUV. Python is just too slow compared to say C++ or even plain Verilog (when compiled). Frequently executing script code will seriously degrade your effective simulation clock frequency in all but the most trivial of examples.
Instead, the embedded Python interpreter can be used to pull in script code that would normally execute before and after a simulation. When used in this mode the script has access to all the Verilog structures and can manipulate them directly, instead of creating hex dumps and arguments to pass to the simulator. Likewise, at the end of the simulation, state can be examined directly instead of being dumped and checked by a script post mortem. This method should simplify these scripts and make things more robust.
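A sketch of that usage model, written against the example's verilog, message and test libraries that are described below; the scope path, memory and signal names, value accessors and golden value are all made up for illustration.

  import message
  import test
  import verilog

  class warm_and_check(test.test) :
    name = 'warm memory, check result'
    def prologue(self) :
      # runs before the simulation : warm a memory directly, no hex dump files
      self.duv = verilog.scope('example.duv_0_u')      # hypothetical scope
      for idx, value in enumerate([0xdead, 0xbeef, 0xcafe]) :
        self.duv.ram[idx] = value                      # hypothetical memory
    def epilogue(self) :
      # runs at the end of the simulation : inspect state directly, no post mortem script
      if int(self.duv.result_r) == 0xdead + 0xbeef + 0xcafe :  # hypothetical signal
        message.success('result matches golden value')
      else :
        message.error('result does not match golden value')

  testing = warm_and_check()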

How it's done


The SWIG config file is here. It's fairly minimal, just a wrapper around the callback to enable a Python function to be called and some weak references for simulators that do not implement all of the VPI functions.
To generate the Python library we invoke SWIG with the python/vpi.i file to produce a C wrapper and a Python library that imports the wrapper. In python/Makefile
  vpi_wrap.c : vpi.i
         swig -python -I$(VERILATOR_INC) $<
The same makefile compiles the C into a shared object suitable for Python. The VPI is now available to Python embedded within a simulator by importing a library called vpi (you'll need to adjust LD_LIBRARY_PATH and PYTHONPATH - see the example).
The Python embedding code is here, and is imported as a DPI function into the Verilog code thus
  import "DPI-C" context task `EXM_PYTHON (input string filename);
  /*
   * invoke python shell
   */
  initial if ($test$plusargs("python") > 0) 
    begin : sim_ctrl_python_l
      reg [`std_char_sz_c*128-1:0] sim_ctrl_python_filename_r;
      if ($value$plusargs("python+%s", sim_ctrl_python_filename_r) == 0)
        begin
          sim_ctrl_python_filename_r = "stdin";
        end
      `EXM_INFORMATION("python input set to %s", sim_ctrl_python_filename_r);
      `EXM_PYTHON(sim_ctrl_python_filename_r);
    end : sim_ctrl_python_l
We can now pass a script to the Python interpreter with a +python+script argument. To access signals within the test bench we can use the Python verilog library, which is the wrapper around the SWIG wrapper.
  import verilog
  # open scope
  simctrl = verilog.scope('example.simctrl_0_u')
  # set timeout
  simctrl.direct.sim_ctrl_timeout_i = verilog.vpiInt(1000)
So here we have the richness of Python to help configure the test bench, as opposed to just $value$plusargs and loadmemh. Also, unlike the DPI, we don't have to explicitly pass signal names through functions; we can just reference a scope and have access to all the signals that are available through the VPI.
Of course you have to make these signals visible, which might require passing options to your simulator to reduce the level of optimization in the containing scope. Partitioning your test bench and reducing the number of scopes to keep open will minimize any performance degradation this may cause.

Versus the DPI


Some test benches may already have similar functionality written in C/C++, setting values through DPI calls. The disadvantages there are that specific signals must be explicitly passed to the DPI call, and any delta requires recompilation of the Verilog and the C/C++. A delta in a script requires no recompilation - just rerun the simulation.
It is also possible to enter the script interactively (+python) in this example, which allows interactive development and debug. pdb can be used too; use test_pdb.py as an example.
  [...]
  (        NOTE 12:38:00) entering pdb command line
  > /enalvo/work/rporter/20121220-example/verilog_integration/python/test.py(53)pdb()
  -> pass
  (Pdb) import verilog
  (Pdb) scope = verilog.scope('example.simctrl_0_u')
  (     WARNING 12:38:19) vpi_iterate: Unsupported type vpiNet, nothing will be returned
  (     WARNING 12:38:19) vpi_iterate: Unsupported type vpiIntegerVar, nothing will be returned
  (Pdb) scope
  <verilog.scope object at 0x135b750>
  (Pdb) scope._signals
  {'sim_ctrl_cycles_freq_i': <signal sim_ctrl_cycles_freq_i>, 'sim_ctrl_timeout_i': <signal sim_ctrl_timeout_i>, 'sim_ctrl_rst_op': <signal sim_ctrl_rst_op>, 'sim_ctrl_clk_op': 
  <signal sim_ctrl_clk_op>, 'sim_ctrl_cycles_i': <signal sim_ctrl_cycles_i>, 'sim_ctrl_rst_i': <signal sim_ctrl_rst_i>, 'sim_ctrl_finish_r': <signal sim_ctrl_finish_r>}
  (Pdb) scope._signals.items()
  [('sim_ctrl_cycles_freq_i', <signal sim_ctrl_cycles_freq_i>), ('sim_ctrl_timeout_i', <signal sim_ctrl_timeout_i>), ('sim_ctrl_rst_op', <signal sim_ctrl_rst_op>), 
  ('sim_ctrl_clk_op', <signal sim_ctrl_clk_op>), ('sim_ctrl_cycles_i', <signal sim_ctrl_cycles_i>), ('sim_ctrl_rst_i', <signal sim_ctrl_rst_i>), ('sim_ctrl_finish_r', <signal   
  sim_ctrl_finish_r>)]
  (Pdb) scope._signals.keys()
  ['sim_ctrl_cycles_freq_i', 'sim_ctrl_timeout_i', 'sim_ctrl_rst_op', 'sim_ctrl_clk_op', 'sim_ctrl_cycles_i', 'sim_ctrl_rst_i', 'sim_ctrl_finish_r']
  (Pdb) scope.direct.sim_ctrl_cycles_i = verilog.vpiInt(9999)
  (     WARNING 12:39:11) Ignoring vpi_put_value to signal marked read-only, use public_flat_rw instead: 
  (Pdb) scope.direct.sim_ctrl_clk_op = verilog.vpiInt(9999)
  (     WARNING 12:39:32) Ignoring vpi_put_value to signal marked read-only, use public_flat_rw instead: 
  (Pdb) scope.direct.sim_ctrl_finish_r = verilog.vpiInt(9999)
  (Pdb) c
  (        NOTE 12:39:49) leaving pdb command line
  [...]
So we can poke about and see what signals are available and what happens when they are manipulated interactively.

Memories

Note:
  1. verilator 3.850 and before has issues with vpi memory word iterators.
  2. verilator and the example verilog library have no support for vpi arrays (multidimensional memories).
The verilog library has an abstraction for memories that allows them to be accessed as a list of signals. For example :
  mem[0] = 99 # assign to a single memory location
  for idx, i in enumerate(mem) :
    mem[idx] = idx # assign its address index to each location in memory
This allows for programmatic assignment to any memory instead of relying on a stepped process of script to hex dump to loadmemh.

Callbacks


Also of note is the ability to register VPI callbacks. The raw interface is available in the wrapped vpi library, but the verilog library wrapper gives it a more Pythonic idiom (I hope).
For example to execute some Python when the reset signal changes value see test_reset.py. This example first creates a new callback class inheriting from one in the Verilog library
  # build reset callback on top of verilog abstraction
  class rstCallback(verilog.callback) :
    def __init__(self, obj) :
      verilog.callback.__init__(self, name='reset callback', obj=obj,
        reason=verilog.callback.cbValueChange, func=self.execute)
    def execute(self) :
      message.note('Reset == %(rst)d', rst=self.obj)
And then instantiates it onto the reset signal
  class thistest(test.test) :
    name='test reset callback'
    def prologue(self) :
      # register simulation controller scope
      self.simctrl = verilog.scope('example.simctrl_0_u')
      # register reset callback to reset signal in simulation controller scope
      self.rstCallback = rstCallback(self.simctrl.sim_ctrl_rst_op)
The verilog callback class also provides a filter function that allows e.g. the simulation state to be examined and the callback executed conditionally. Don't rely too heavily on this sort of function, though, as running a callback too frequently (e.g. every clock cycle) can slow the simulation excessively because of the repeated Python execution. The best solution is to create an expression in Verilog in the test bench that does some prefiltering to reduce the number of callbacks executed. They can, however, be used on infrequently changing signals, e.g. to catch error conditions.

  class errorCallback(verilog.callback) :
    def __init__(self, obj, scope) :
      verilog.callback.__init__(self, name='error callback', obj=obj,
         scope=scope, reason=verilog.callback.cbValueChange,
         func=self.execute)
    def cb_filter(self) :
      'filter the callback so only executed on error'
      return str(self.scope.what_is_it) != 'error'
    def execute(self) :
      message.error("that's an error")
 
  cb = errorCallback(scope.error, scope)
The callback code body then has the DUV in scope to do any peeking about to help root cause a failure. It is even possible to fall back to a prompt as detailed above.

Functional Coverage


I did attempt to prototype some functional coverage code using this method, but ended up disappointed as its performance was very poor (for the same reason that using Python as a test bench gives poor performance). It did make a very expressive coding environment for coverage points, though. It may be that another C++ library could be created to perform the most frequently executed work more efficiently, so that the construction and association of coverage points could still be done in script (along with dumping to e.g. a database). To repeat myself again, I feel that this is a great architecture for this type of work: a fast compiled library imported into and configured by script.

Next


We will look at importing a common messaging library into Python so that it can be used across Python, Verilog and any C/C++ test bench code.

Wednesday, 5 June 2013

Test Bench Control Part Three

Why Integrate Python?

Or Ruby or Lua ...

I chose Python as it is the most modern scripting language that I am familiar with; it is powerful and popular, with good library support. It could equally be Ruby, Lua or Javascript (Javascript is an interesting choice - perhaps that's for another blog post!).
The motivation here is threefold
  1. To integrate as much of the test framework scripting as possible as close to the simulation as possible. Executing script code within the scope of the test bench gives the script direct access to the structure of the test bench and the ability to control values within. Warming memories and registers can now be performed directly without the requirement of creating hex dump files and reams of plusargs.
  2. To integrate common message/logging infrastructure across the simulator and script to retain as much of the message semantics as possible (think file name, line number, scope, time/cycles). There is then no difference in an error generated by the script or the simulation as they both pass through the same message logging library. We do not need to regular expression parse text files to find messages and extract information we had before it was all flattened into text.
  3. To make scripting within the simulation environment useful again. Built-in Tcl interpreters and simulator specific commands have been holding this back since, well - forever.
On the first point, we can import the Verilog VPI into our scripting language and add a test prologue and epilogue to be run before and after the simulation (we can use the VPI callbacks cbStartOfSimulation and cbEndOfSimulation). In these routines we can poke values into the simulation and read them out afterwards; a use case might be pulling relevant sections from an ELF file and then warming caches and data memory. Similarly, at the end of a simulation, memories can be inspected for expected golden values. Should these values not match, an error can be raised via the second point - common messaging infrastructure. So should anything go awry during simulation, wherever it is detected - be it script, Verilog or C++ - all messages will be treated equally and reported accurately without any kludgey regular expression parsing of output streams. There is another important advantage here too: if we have test bench monitors that are reporting messages we can filter these as well, keeping just the ones we're interested in to see what's going on in one or a few of them. The advantage here over having multiple output files for each instance is that the messages will be automatically sequentially interleaved. (Filtering by scope is not currently part of the example, but could easily be added.)
Finally, I believe there are many advantages to using Python (or Ruby or Lua) over Tcl (or indeed Perl) that make this infrastructure architecture compelling. Religion and arguments about relative performance aside, the newer languages are built on the foundations of Tcl and Perl, and as such they are everything Tcl and Perl are and more.
There are also advantages in gluing the performant pieces of the test bench together with a scripting language, the foremost being that recompilation is not required when an edit is made to those portions of the code. As long as scripts are only called at the beginning and end of the simulation we can trade off recompilation against interpretation to increase the edit-debug iteration frequency. Scripting languages are often more concise when dealing with option parsing or regular expressions, and can be much more lightweight than C++ when used to e.g. interface with databases or the file system. If high performance is not required then dynamically typed script code will almost always be more concise, as well as quicker to write and debug.

Importing APIs with SWIG

SWIG is the Simplified Wrapper and Interface Generator. Put simply, it takes a C or C++ header file and generates a wrapper around it so it can be used in the scripting language of your choice. I find that this wrapped API is still a little rough around the edges, so I then wrap it again with a thin veneer to make it match the scripting language's idioms more closely. This process does introduce some compute overhead, but the idea is that we execute this code very infrequently, just to set things up, so the performance is of no real consequence in the scope of the entire application.
The next two blog posts will look at importing the Verilog VPI and then a simple messaging library into Python using SWIG. After we have done this we can use both together.

Note on ctypes and pyrex

A quick mention of the Python library ctypes as a replacement for SWIG: if a dynamic library is available this can be a very quick way to get going with it in Python, but it's not relevant here for the VPI. Pyrex could also be used instead of SWIG in this instance.

Monday, 3 June 2013

Test Bench Control Part Two

Commercial Verilog simulators tend to have a built-in Tcl interpreter. This has a few extra specific commands allowing signals to be probed, forced and breakpointed. Whilst this may look useful at the outset, actually using these commands will limit your simulator choice later on, as you are locking yourself into vendor specific functionality. For this reason, many teams choose to avoid using the Tcl interpreter altogether and use +plusargs and loadmem within their Verilog code instead.

Likewise with waveform viewer text enumerations: some simulators allow enumerations to be specified for certain values. Whilst this is very useful it also promotes lock-in, and it is not the best place to implement such functionality. I have always advocated adding signal text enumerations in test bench code, as this is portable across simulators and, importantly, can be reused in monitors when displaying text reports (e.g. "register field updated to value 0x5 [A_TEXT_ENUMERATION]") as well as in functional coverage. Reusing a single point of decoding for multiple outputs also aids maintainability.
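By way of illustration only (the field and its values are made up, and the same idea applies equally in Verilog or C++ test bench code), a single decode point might be shared like this:

  # single point of decoding for a hypothetical register field
  MODE = {0x0 : 'IDLE', 0x1 : 'RUN', 0x5 : 'A_TEXT_ENUMERATION'}

  def decode(value) :
    return MODE.get(value, 'UNKNOWN_0x%x' % value)

  # reused by a monitor for text reports ...
  def report(value) :
    return 'register field updated to value 0x%x [%s]' % (value, decode(value))

  # ... and by functional coverage for bucket names
  def coverage_bucket(value) :
    return 'mode.%s' % decode(value)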

Despite having been originally designed as a scripting language for IC design, Tcl is now a bit long in the tooth, even with many recent updates and a rich set of libraries. Python, Ruby or Lua might be better choices today. One of the aspects of this example is integrating Python into a simulator and importing the VPI as an API into the test bench and design hierarchy. The raw VPI API is not very Pythonic at all, so I've introduced a small amount of wrapping to make it appear a little more so. I'm familiar with Python, so that's the main reason I chose to use it in this example. There are other libraries available on the web that have integrated the VPI with Ruby and Python too, so this is certainly not a novel approach. Whilst conceptually it is only doing the same thing as the commands the simulator provides in its Tcl shell, the advantage here is that we have full use of the VPI to give richer access than most simulators' Tcl would allow and, most importantly, the commands are now under end user control and can be used consistently across different simulators. Using a script in this environment is now an aid to portability and not a hindrance to it.

The import is done with SWIG, the Simplified Wrapper and Interface Generator. Adding our own scripting shell also means that simulators without a Tcl interpreter now have a scripting interface too, and the same one as all the others. So any code written for this shell can be used across all simulators, increasing portability between vendors and removing lock-in. I have been a Verilator user and advocate for a number of years and have successfully used it many times. It does not come packaged with a shell, so by default any option parsing needs to be done in C++ or Verilog. Whilst it is possible to do this in either, I believe that a script is the best way to handle parsing options, checking them and executing actions based on their values. Of course, options don't need to be in --option=value format on the command line: XML, YAML, libconfig or a script fragment can be used as well, with a reference to the required test(s) passed in. CSV is not so flexibly extensible and therefore does not lend itself to wider reuse. Text formats are best as they can be concurrently edited; do not be tempted to use a binary format such as a spreadsheet, as these tend to require exclusive locks when being edited (merging binary files is not easy in most version control systems), and file revision history is not visible inline via the version control system, i.e. you cannot use git annotate.
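To return to the option handling point above, a sketch of doing it in the wrapper script might look like the following; the option names and defaults are made up.

  import argparse

  parser = argparse.ArgumentParser(description='run a simulation')
  parser.add_argument('-s', '--simulator', default='verilator', help='simulator to use')
  parser.add_argument('--seed', type=int, default=1, help='random seed')
  parser.add_argument('--python', help='test script to pass to the simulation as +python+<script>')
  parser.add_argument('test', nargs='*', help='test(s) to run')
  options = parser.parse_args()

  # the checked, typed values can now be turned into plusargs and a simulator invocation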

Whilst the performance of the scripting shell is much lower than that of compiled C++, it is important to remember that it should only be executed for a small portion of the simulation (the beginning, the end, and possibly some infrequent callbacks in between), and that some of this work would be performed in a script anyway. At this point I would also advocate the use of a profiler (e.g. google perftools) to keep a check that this really is the case. Integrate this instrumentation and add it as a test in your regression set to check for performance regressions. This will alert you if any change in your test bench or design code has a large effect on performance. It is otherwise easy to leak effective simulation Hertz as poorly performing code is added to the code base, be it in the script, Verilog or C++.

The example also has a simple logging library that allows messages of varying severities to be recorded through an API, with the logging calls available to the Verilog RTL through DPI wrappers. This API is also SWIG'ed and wrapped to be available in Python. So now we have a unified approach to logging for the entire test: whether within a script, Verilog or C++, the same logging code will be used. An error or warning within the test prologue/epilogue, RTL or test bench checking code will therefore be treated and reported in the same way.

In python :

import sys

import message

try :
  dosomething()
except :
  message.error('dosomething() raised ' + str(sys.exc_info()))

In verilog :

`include "example.h"
if (error) `ERROR("error flag set");

In C++ :

#include "example.h"
if (error) ERROR("error flag set");

The API also facilitates adding callbacks to message events, and so we can add a function that logs these messages into a database. This is the subject of a later post.
In order to prevent further behaviour causing e.g. a segfault and a crash, we can stop simulation and flush all pending messages when a message of severity ERROR or higher is emitted. This should prevent any eventual crash from losing error messages. Additionally we add a SUCCESS severity. In order to pass, a test must have one and only one SUCCESS message and no messages of severity ERROR or higher.
Additional functionality allows us to run a test beyond the first ERROR (to see how many more there are) and adjust the verbosity of the output by silencing less important messages. We can also add callbacks to adjust the verbosity at events within the simulation.

Example Code

The sample code that accompanies these blog posts is verilog integration at github.

The example requires boost and python development libraries to be installed and you'll need a simulator. Verilator is the primary target, where you'll need 3.845 or later (because it has the required VPI support). Set VERILATOR_ROOT in the environment or in test/make.inc.

  % git clone https://github.com/rporter/verilog_integration
  % cd verilog_integration
  % cat README
  % test/regress -s verilator

Please also note that this is proof of concept code. It's not meant to be used in anger as it has not been tested for completeness, correctness or scalability. It does also have a number of shortcomings as presented (some of which I hope to fix shortly). It does however show what can be done, and how.