Open Source Tools
for Verification
Rich Porter DVClub January 2013
This is a short presentation on using open source tools in chip verification flows, given at DVClub Bristol in January 2013. These tools do not primarily target silicon verification but can do useful work nonetheless.
My experience is all digital, but most of this applies to analogue and mixed-signal development too.
Aims of this presentation
- Remind you that you already use open source tooling
- Introduce some new tools
- Encourage you to keep current with technology
- Pique interest sufficiently so that you might consider using them in future
- Hope to introduce everyone to at least one new tool!
I need to start by pointing out that you probably already use lots of open source tooling and that using new tools is thus not an issue in itself.
So today I'd like to introduce you to some new tools. I believe it's desperately important to keep current with technology, and I don't mean using it promiscuously in production - if you do your repositories will be littered with fragments of code in fleetingly fashionable languages - I mean keeping up to date and knowing what's available so you can use it when you need it.
By piquing your interest now with some recent but mature tooling you will, hopefully, become aware of it and be able to use it in the right circumstance at some point in the future.
If I introduce everyone to one new tool today, then this presentation's job is done.
Why Open Source?
- Free?
- High Quality Software
- Debugged
- Contribute
- Fix
- Customize
It may be free-as-in-freedom but it certainly isn't free-as-in-beer. There are costs associated with installation, education and maintenance even if there is no cost in licensing, but those costs exist with any tool.
There is a lot of abandonware, half-finished projects and multiple projects that aim to do the same thing. But a core of very high quality software exists, produced by very talented engineers and shared with us by them and their employers.
A large community of users means the software has been used in many places and debugged by another set of talented engineers who contribute fixes and features. You are unlikely to find a new bug in a large, mature project like Python or git. But if you do, you can fix it yourself and contribute that fix, as well as new features.
You can also just change the code and are not bound to contribute those changes if you only use them internally.
I haven't added a "why not open source" slide. I'm not going to discuss this today beyond stating that the GPL is not viral: if you use these tools internally you're not distributing any executable, so you do not need to make any source publicly available. Some open source tools are, however, not fee-free when used commercially, so you do need to check each licence in that respect.
What you knew already
These are probably used globally by everyone here, although I have come across several ASIC firms that develop on Windows.
GNU/Linux; some people may use OpenSolaris or FreeBSD.
The venerable emacs and vi; some may use the Nirvana editor (NEdit). And Eclipse exists for those who like IDEs.
Revision control: rcs and cvs. rcs still has its place - look in users' bin directories and you will see that the more savvy engineers keep their scripts under revision control, most likely with rcs.
Mozilla web and mail clients.
Old-school scripting like Tcl and Perl; bash and awk come under GNU/Linux.
X Window System libraries and window managers.
So what's new?
Version Control
- cvs first released in 1990
- Subversion
- introduces version as unit of change
- still need for a central repository
- makes branching easier
- DVCS
- new wave of tools all from 2005
- no central repository as such, local commits, easy branches, push/pull
- bzr, hg, git
- Tool versioning
Concurrent Versions System (CVS) was first released more than 20 years ago, in 1990, and is now past its best-before date.
Subversion dates from 2000 and seems to have been gradually replacing CVS. The repository revision is now the unit of change, not the file as in CVS, so meaningful file-wise changes get grouped together in a single revision. But you still need to be connected to the central repository to make commits and see older versions and log comments. Subversion does make branching and merging less fraught than CVS.
However, the development process has changed a lot since the introduction of distributed version control systems. These three were all initially released within days of each other in early 2005.
With these distributed VCSs you can make many local commits to your local checkout and push them back to the master at a later date. Branching and merging is the normal working mode, and revisions are identified by SHA-1 hashes rather than incrementing version numbers. A local fork or checkout contains all the history, so a connection to the master is only required to push local commits.
Bazaar is Python and C based, and originated internally at Canonical, the company that also distributes the Ubuntu flavour of Linux.
Mercurial is Python and C based too.
Git is written in C and was famously created by Linus Torvalds; massive uptake has been fuelled by the likes of github.com.
There is local traction in the Bristol DV community with Mercurial and git.
I don't think it matters which one you use - as long as it isn't CVS. They all do the same thing. Mercurial has a Subversion plugin enabling it to act as a Subversion client, and GitHub has a Mercurial interface.
On a slightly different versioning note, there is also a simple program for tool versioning. If you haven't heard of it, I suggest you look at modules - it lets you load and unload different tool versions into your shell environment. Everybody needs to use a tool like this; it is especially useful when you have multiple EDA tool versions installed.
Bug tracking & wiki
- Cross referencing of call #, version
- trac
- redmine
- Both have wide variety of plugins
- including multiple VCS support
Gnats, Bugzilla and Mantis all work.
As do TWiki, MoinMoin and other wiki packages, but an integrated call tracker, wiki and version control system can do much more.
It allows markup in the wiki content, as well as in call descriptions and commit comments.
So in a call you can reference a commit number, and in a commit you can reference a call number. This allows click-through between call, commit, source code and wiki via the web interface.
These integrations are still a bit immature: they don't force a number to be given, or check that the commit and the call actually reference each other.
But it is useful for determining why code exists, by clicking through from the code browser to check-in comments to call history and back again.
Also, it is easier to set up and maintain one integrated tool than both a call tracker and a wiki.
[I was contacted after this presentation by someone who mentioned fossil, a distributed version control system, bug tracking system and wiki. Worth a look at in addition to trac and redmine.]
Compilation and Profiling
- gcc is widely used
- llvm/clang is now production ready
- FreeBSD now uses clang by default
- faster compilation times
- 'better' output
- valgrind (val-grinned)
- more than just memcheck
- google perftools
- tcmalloc, profiler
- nice graphing output with graphviz
- cpu profiling
- memory profiling/leak detection
gcc, the GNU Compiler Collection, supports languages including Ada, C, C++, Fortran, Java, Objective-C and Objective-C++. The GNU toolchain also provides gdb and gprof, the debugger and profiler.
LLVM (formerly Low Level Virtual Machine) has been ready for a number of years and is now gaining traction; the FreeBSD distribution is now built with it by default. clang, the compiler built upon LLVM, promises a reduced memory footprint and increased compilation speed.
Profiling is key for everything: simulations and infrastructure (see "Speed Comes from Profilers" in Paul Graham's writing on language design). If you don't keep tabs on where time is spent, performance will slowly degrade as new features are added. I'm not advocating continual optimisation, but watching for large step decreases in performance avoids later exclamations of 'it never used to be this slow!' and the realisation that the cause could have been any check-in since the start.
valgrind's memcheck option catches segmentation violations and carries on if it can, while memory leaks, dangling pointers and other memory-related programming errors can also be caught.
valgrind also has cachegrind and callgrind options for performance profiling. The tool works by intercepting malloc and free calls as well as load and store instructions, which does slow execution down, but it is invaluable for finding memory-related problems.
Google's perftools provides leak detection with a thread-caching malloc (tcmalloc) and a separate profiling library. The profiler uses a sampling approach so the slowdown is minimised. Nice graphed output is produced in a variety of formats via graphviz.
I always leave some form of profiling on in at least some tests by default to check for performance regressions; it makes finding the cause easier, because the regression will have occurred recently.
I have not mentioned full Java here. I know of a company that wrote their verification drivers and checkers in Java, but it was a while ago. Java does have excellent tooling and performance.
Build & Continuous Integration
- make
- Probably still the best for source code compilation
- ccache could help
- scons
- python based build tool
- python expressiveness and extensibility
- tinderbox3, hudson forked to jenkins
- lots of plugins
- good support for VCS
- probably still need a test framework underneath
Make is probably still the best option for source code compilation, but don't try to use makefiles for anything else - beware of make regression or make gds2. Remember that sometimes just recompiling a single file might be quicker, and without dependency bugs.
ccache caches object files by spotting identical commands, options and files, which can improve speed a lot. It's also trivial to install with the modules tool I mentioned earlier, and it can slash compile and link times if you make clean or build from a clean checkout frequently.
scons is a Python-based build tool with Python's expressiveness and extensibility. It also has some built-in ccache-like functionality.
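To give a flavour of what a build description looks like, here is a minimal SConstruct sketch; the source file names and flags are invented for illustration, and the file is executed by running the scons command in the same directory.

```python
# Minimal SConstruct sketch (SCons injects Environment and friends
# when it runs this file via the 'scons' command).
# The C++ sources named here are hypothetical.
env = Environment(CXXFLAGS=['-O2', '-Wall'])
env.Program(target='testbench',
            source=['tb_top.cpp', 'scoreboard.cpp', 'driver.cpp'])
```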
Hudson/Jenkins is a continuous integration tool; it can provide very early feedback on a broken repository by compiling and running tests at every commit ('when did it stop working?' is a common question without continuous integration).
Jenkins is easy to get started with, and even if you just compile and run a simple hello-world test it is light years better than not checking at all. Don't forget a scheduled nightly build too, to check for other sources of bit rot outside the repository (e.g. machine configuration).
Farm Management
- Oracle grid engine (was SGE, was codine)
- open lava fork of platform lsf
- Support for DRMAA
- c++/perl/python/ruby library
- scheduler agnostic?
The majority of companies will have a compute farm, although some may now call it a private cloud. A batch queuing system is often used to allocate jobs to machines.
Sun Grid Engine has been widely used; multiple forks now exist, especially since Oracle's acquisition of Sun.
OpenLava is a fork of Platform LSF Lite. I have not used this one, so can't really comment.
Both support the Distributed Resource Management Application API (DRMAA), which is worth knowing about. Bindings exist for popular languages, compiled and scripted. This forms a convenient abstraction for driving your queue from scripts, enabling an easier transition between queueing systems should it be required.
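As an illustration, here is a minimal sketch using the Python drmaa binding; it assumes a DRMAA-capable scheduler is configured, and the simulation command and arguments are invented.

```python
# Minimal sketch: submit one simulation job via DRMAA and wait for it.
# Requires the python 'drmaa' package and a configured DRMAA library.
import drmaa

with drmaa.Session() as session:
    jt = session.createJobTemplate()
    jt.remoteCommand = '/tools/bin/run_sim'             # hypothetical wrapper script
    jt.args = ['--test', 'random_soak', '--seed', '42']
    job_id = session.runJob(jt)
    info = session.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
    print('job %s finished with exit status %s' % (job_id, info.exitStatus))
    session.deleteJobTemplate(jt)
```

The same script should run against any queue whose DRMAA library is installed, which is the point of the abstraction.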
There is lots of server health monitoring software: web-based status tools that show the health of farm nodes, usually with a traffic-light system to highlight any issues. It is more likely to be used by sysadmins, but it is helpful for debugging occasional strange test failures. If you commonly experience issues due to rogue processes consuming CPU and memory, or temporary filesystems filling up, this type of tool is for you. Zabbix is newish and popular, but many others exist.
Scripting
- Python
- Ruby
- OCaml
- Go
- Lua
- Fast with small memory footprint
- swig works with all the above
- 'Go' via cgo
- Javascript (ECMA script)
There are lots of choices for scripting, certainly not all listed here. Don't try to use them all at once! Littering a repository with different languages is not a good idea. Learning new languages and idioms is a great idea, however: it will help with programming in other languages, and with learning new patterns and using them.
These are all modern languages with a scripting feel, dynamically typed apart from OCaml and Go. Some are Just-In-Time (JIT) compiled and some have JIT support. All are performant, but much depends on the application. It's about using the right tool for the right job.
Python is probably the best supported, with libraries for most conceivable applications.
Debuggers are available as well as profilers, e.g. cProfile for Python.
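For example, a minimal cProfile sketch might look like the following; decode_packets() is an invented stand-in for whatever hot infrastructure code you want to measure.

```python
# Minimal sketch of in-process profiling with the bundled cProfile
# and pstats modules.
import cProfile
import pstats

def decode_packets():
    total = 0
    for i in range(100000):
        total += i % 7          # stand-in for real work
    return total

cProfile.run('decode_packets()', 'profile.out')    # write raw stats to a file
stats = pstats.Stats('profile.out')
stats.sort_stats('cumulative').print_stats(10)     # top ten by cumulative time
```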
Python also has a JIT-compiled implementation called PyPy that can deliver large performance increases in certain applications.
Ruby also has comprehensive library support, including profilers and debuggers, and it has JIT implementations too.
If functional languages are your thing, try OCaml, which does have support for imperative features too. Jane Street Capital, a finance firm, is famous for using OCaml because they believe they develop code faster and with fewer bugs. I find that learning functional languages really improves general programming skills too.
Lua is the most famous scripting language to come out of Brazil. It's used extensively in the games industry because of its small memory footprint, and it's also reputed to be one of the fastest dynamically typed scripting languages.
I have found SWIG to be an excellent tool: it ports a C/C++ API into a scripting language automagically. I find that a little wrapping is then required to match the scripting language's idioms, but you can glue application libraries together with a top-level script - script-like work stays in the scripting environment while performance-critical work stays in the faster C implementation. Plus you can often turn around top-level changes without recompilation.
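To show what I mean by a little wrapping, here is a hypothetical sketch: assume 'dut' is a SWIG-generated Python module exposing flat C calls dut_read and dut_write (invented names), and a small class adapts them to more idiomatic Python.

```python
# Hypothetical sketch: thin Python wrapper over a SWIG-generated module.
# 'dut', dut_read and dut_write are invented for illustration.
import dut

class RegisterBank(object):
    """Idiomatic indexed access on top of the flat C API."""
    def __init__(self, base):
        self.base = base

    def __getitem__(self, offset):
        return dut.dut_read(self.base + offset)        # raw SWIG'ed C call

    def __setitem__(self, offset, value):
        dut.dut_write(self.base + offset, value)

regs = RegisterBank(0x4000)
regs[0x10] = 0xdeadbeef        # reads and writes go straight to the C model
print(hex(regs[0x10]))
```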
I use a SWIG'ed Verilog VPI in Python as a command shell for Verilator; the same code can then be used unchanged in a four-state event-driven simulator. Verification code written in C++ exposes an API that can be SWIG'ed and driven from Python too. The resulting framework removes the need to text-parse simulation output, because pre/post-simulation scripts are called from within the simulator executable with direct access to the Verilog and test C++ structures.
Finally, a quick mention of JavaScript as it is being widely talked about: it is JIT compiled and very fast. Library support is not currently extensive but is improving. I hope to integrate it into a Verilog simulator but have not yet done so.
Data Visualisation
- was gnuplot or a spreadsheet
- chromium, firefox, opera
- chromium fastest?
- but need a webserver
- apache an old, overpowered monster
- micro web frameworks
- HTML template engines
- Javascript helper libraries
Not so long ago visualisation meant gnuplot or a spreadsheet, perhaps even plain text. OpenOffice does have a scripting plugin.
But everyone is used to browsing with a web browser, so I see this as the new GUI - it is probably the best current way to share data like regression and coverage results. The whole team can see and interact with it using hyperlinks, table sorting, drag and drop and so on.
I have found Chromium from Google to be the fastest at rendering large datasets, although others are improving all the time.
But as it is a client you will need a server.
Apache is far too big and overpowered - micro web frameworks are ideal for the traffic this sort of application will see, yet they are scalable and can be beefed up if required. They can be run in user space so anyone can prototype, and they are very lightweight.
Bottle is a single Python file; Unicorn is a Ruby option. There are lots of examples across the web to help craft a solution.
Template engines can help replace endless seas of print or write commands that output HTML.
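As a sketch of both points, a minimal Bottle application serving a regression table might look like this; the data and URL are invented, and Bottle's built-in SimpleTemplate engine does the HTML generation.

```python
# Minimal sketch: a user-space micro web app serving regression results.
# Requires the 'bottle' package; the result data is a stand-in.
from bottle import route, run, template

RESULTS = [('smoke_test', 'PASS'), ('random_soak', 'FAIL')]

PAGE = """
<table>
% for name, status in rows:
  <tr><td>{{name}}</td><td>{{status}}</td></tr>
% end
</table>
"""

@route('/regression')
def regression():
    # The template replaces a sea of print statements emitting HTML
    return template(PAGE, rows=RESULTS)

run(host='localhost', port=8080)    # no root access or Apache required
```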
JavaScript libraries help with client-side scripting, creating interactive pages and reducing page requests and server load - modern browsers are very fast at dynamic rendering, so doing some work within the client works well. jQuery is just one of those available, but it is probably the most widely used and hence has the best third-party support in the form of plugins, whether graphing like flot or micro-templating like jQote2.
Also available are MooTools, the Dojo Toolkit and Google Web Toolkit if you're a Java programmer.
Databases
- Filesystem?
- Relational (SQL)
- sqlite (start here)
- MySQL, Percona, MariaDB
- PostgreSQL
- Drizzle
- Web management apps
- phpMyAdmin, ...
- NoSQL
- XML
You already use the filesystem to store data, but the lack of indices can make finding the data you're looking for slow in larger datasets. And how do you avoid race conditions with multiple writers, or synchronise readers? Stop faffing about with that and use a tool that was made for the job; let it worry about sorting out the races and how best to aggregate data. Although not all strictly ACID compliant, these relational databases work well enough - we are not running banking applications where losing track of money hurts.
sqlite is an embedded database with bindings for scripting languages and C++. It is widely used, but I don't know much about its performance with lots of clients.
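Here is a minimal sketch using Python's bundled sqlite3 module; the database file and table layout are invented for illustration.

```python
# Minimal sketch: store and query regression results with sqlite3.
import sqlite3

db = sqlite3.connect('results.db')                      # single-file database
db.execute("""CREATE TABLE IF NOT EXISTS regression
              (test TEXT, seed INTEGER, status TEXT)""")
db.executemany("INSERT INTO regression VALUES (?, ?, ?)",
               [('smoke_test', 1, 'PASS'), ('random_soak', 42, 'FAIL')])
db.commit()

# An index keeps the common "show me the failures" query fast
db.execute("CREATE INDEX IF NOT EXISTS idx_status ON regression(status)")
for test, seed in db.execute(
        "SELECT test, seed FROM regression WHERE status = 'FAIL'"):
    print(test, seed)
db.close()
```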
Using MySQL or one of its forks means you get high performance with fast asynchronous data upload. It has great support for third-party tools - but you do have to learn SQL. You can keep all your data over a project's lifetime and mine it for triage, trends and historical events. Don't worry about data size and performance; I have always been surprised at the speed of queries on large datasets.
Also available are PostgreSQL and Drizzle; Drizzle is a fork of MySQL for 'the cloud', designed to be fast and scalable.
I've used mysql for coverage databases; I think this has been independently invented a number of times, e.g. CovVise.
If you are already using them, then one of the numerous web management applications may help you, bring other engineers up to speed, or assist those afraid of an unfamiliar command line: phpMyAdmin, HeidiSQL, MySQL Workbench. Pick one.
NoSQL has been big in recent years: schema-free document stores with indices to aid searching. MongoDB is one I've used; it has a JavaScript and JSON/BSON interface. My application was storing simulation log files for triage, fail history and other mining purposes.
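A minimal sketch with the pymongo driver (a recent version providing insert_one); the database, collection and field names are invented.

```python
# Minimal sketch: schema-free storage of simulation results in MongoDB.
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
logs = client.simdata.logs                 # database and collection created lazily

# Each document can carry whatever fields that particular run produced
logs.insert_one({'test': 'random_soak', 'seed': 42,
                 'status': 'FAIL', 'errors': ['scoreboard mismatch']})

logs.create_index('status')                # index to aid triage queries
for doc in logs.find({'status': 'FAIL'}):
    print(doc['test'], doc['seed'])
```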
Sedna development seems to have gone quiet - the web seems to be moving away from XML - but it uses XQuery, which is an interesting query and templating language. You'll have to learn XQuery though, which is a steep curve if you're unfamiliar with XPath/XSLT. But it works well and is faster than libxslt if you have lots of XML data. I've successfully used Sedna with a large architectural coverage database several gigabytes in size, querying for tests that exercise specific functionality.
Other Tools
- That I don't really know much about
- GNU Octave
- Ceph, Lustre
- To build scalable, redundant "private cloud"
- Using cheap commodity hardware
- Ceph has been in the news recently
- Hadoop
- As above but replacing queue for scheduling
- Android
- An app for your phones
- As more and more TVs become android powered, an app to deliver CI status
- Put a TV in your coffee room, on the office wall
Other tools that I don't know much about but I like the idea of.
A quick mention for GNU Octave: I've not used it, but it can be a replacement for MATLAB, and the lack of licence restrictions makes it very scalable.
I am really interested in the concept of a scalable private cloud (i.e. compute farm) composed of redundant commodity hardware. 3TB SATA disks are cheap - how do we use them instead of expensive NAS? We could add new hardware as required to fit tape-out schedules, and plan obsolescence to coincide with the release of new technology such as higher-capacity disks and faster, denser processors.
Hadoop is a reimplementation of Google's MapReduce. It could be used for regression runs and comes with its own distributed filesystem. A replacement for the batch scheduler?
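To give a feel for how regression results might map onto MapReduce, here is a sketch of a Hadoop Streaming mapper in Python; the 'RESULT &lt;test&gt; &lt;status&gt;' log line format is invented, and a companion reducer would sum the emitted counts per key.

```python
# Minimal sketch of a Hadoop Streaming mapper: log lines arrive on
# stdin, "key<TAB>count" pairs go to stdout for the reducer to sum.
import sys

for line in sys.stdin:
    fields = line.split()
    # Hypothetical log format: "RESULT <testname> <PASS|FAIL>"
    if len(fields) == 3 and fields[0] == 'RESULT':
        test, status = fields[1], fields[2]
        print('%s_%s\t1' % (test, status))
```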
Finally, the Hudson Android app has been around since 2009, but a higher-resolution version would work well on a TV screen. Place it in key areas where it will get seen, so everyone can see the current project status.
In conclusion - there is lots of freely available tooling, just a Google search and a click away, that can help you with your current project. Start trying it out now so you're able to use it when it can help you.