I've been meaning to post this for literally years.
It is targeted at Digital Chip Design & Verification Engineers but may be more widely applicable.
I may come back to update it, but for now this is as I found it. In reality it is not this simple, of course, and individual teams & projects will be blurred across the levels, but I think it makes a good discussion piece.
I've never worked anywhere past level 3, and I don't think any such place exists.
Table of Contents

- Level 0
- Level 1
- Level 2
- Level 3
- Level 4
- Level 5

Level 0

Symptoms:

- Code almost always fails to compile.
- Code is unreadable: no comments, short identifiers.
- Code functionality unknown or hard to quantify.
- Always 'progressing' without passing hard metrics.
- Little intra-group communication.
- Debug performed in the final device (probably FPGA).
- Engineering performed on an old PC with a small monitor; probably no backups taken.
- Inevitable occurrence of show-stopping bugs or a dead device.
Level 1

Symptoms:

- Code frequently fails to compile.
- Code contains virtually no comments.
- Code functionality unknown or hard to quantify.
- Any specification (oral or otherwise) is devoid of debug- and verification-driven features: write-only registers, multiple pervasive asynchronous clocks, etc.
- Project always 'progressing' without passing hard metrics (probably because there aren't any hard metrics).
- Debug performed by eye on waveforms; no reference model.
- Engineers use low-resolution monitors or laptop displays.
- Frequent occurrence of show-stopping silicon bugs or dead devices.
Level 2

Symptoms:

- Code frequently fails to compile.
- Code functionality tested infrequently; testcases regress as new bugs are introduced, but these are not found until some time after being committed to the repository.
- Multiple testbenches for different functionality, supported by different people/groups, in different formats.
- Simulation performance slow: old computers, tier-2 simulators.
- Engineers spend time updating workareas from the repository only to find something else broken, hindering their commits.
- Incomplete guidelines on the use of HDL structures (e.g. X states) cause issues from testbench through to gate level.
- Wiki, if it exists, is a morass of out-of-date, write-only pages.
- Compute infrastructure outdated, exemplified by an overfull filer causing iterations to be lost to "no space left on device".
- Frequent occurrence of show-stopping silicon bugs; many other bugs worked around or requiring spins.
Infrastructure & process:

- Specification document.
- Source code repository, but head seldom compiles.
- Call tracking system (but seldom used).
- HDL coding guidelines.
- Self-checking testbench(es): post-mortem value dump comparisons, maybe using golden dumps from eyeballed simulations (see the sketch after this list).
- Testsuite(s), but incomplete; they cannot be run as a single testsuite, and no script may exist to automate running all tests.
- Reference model, if it exists, is not bit accurate; a script is required to test whether results are 'within acceptable parameters'.
- Attempt at peer review of design code prior to tapeout.
- Engineers might send out weekly reports in a common format.
- Mostly quiet environment conducive to work.
- Coffee room with quality filter coffee.
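To make the post-mortem comparison style concrete, here is a minimal sketch of the kind of script this level implies. The one-sample-per-line "time value" dump format and the file names are my own assumptions, not anything prescribed above:

```python
#!/usr/bin/env python3
"""Post-mortem dump comparison: diff a simulation value dump against a
golden dump from a previously eyeballed run. The 'time value' line
format is a hypothetical example."""
import sys

def load_dump(path):
    samples = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            time, value = line.split()
            samples.append((int(time), value))
    return samples

def main(golden_path, dump_path):
    golden = load_dump(golden_path)
    dump = load_dump(dump_path)
    status = 0
    for i, (g, d) in enumerate(zip(golden, dump)):
        if g != d:
            print(f"MISMATCH at sample {i}: golden={g} got={d}")
            status = 1
    if len(golden) != len(dump):
        print(f"LENGTH MISMATCH: golden={len(golden)} got={len(dump)}")
        status = 1
    print("PASS" if status == 0 else "FAIL")
    return status

if __name__ == "__main__":
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Note the weakness baked into this level: the check happens after the simulation, against a dump that was itself only ever eyeballed.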
Level 3

Symptoms:

- Occasional code compilation errors.
- Repository head functionality often dead.
- Nightly regression incomplete, hiding issues, and often failing due to a broken repository head.
- Checkout-edit-update-commit loop slowed by compromised repository head functionality.
- Ad hoc test infrastructure incomplete, unwieldy and non-performant.
- Ill-informed use of verification methodologies causes hard-to-reproduce scenarios: works in testbench A but not in B; testcases initialised with random seeds that are not recorded (see the sketch after this list).
- Frequent silicon bugs worked around or requiring spins.
- Engineers have 24" monitors on their desks.
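The unrecorded-seed problem is cheap to fix, which is part of what separates this level from the next. A minimal sketch, assuming a hypothetical run_sim wrapper that accepts a +seed plusarg: choose and log the seed before the run, so any failure can be replayed exactly.

```python
#!/usr/bin/env python3
"""Record the random seed with every test invocation so any failure
can be reproduced. The run_sim command line is a hypothetical example."""
import random
import subprocess
import sys
import time

def run_test(testname, seed=None):
    # Choose and *record* the seed before the run, never after.
    if seed is None:
        seed = random.randrange(2**31)
    cmd = ["run_sim", testname, f"+seed={seed}"]  # hypothetical wrapper
    result = subprocess.run(cmd)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open("results.log", "a") as log:
        log.write(f"{stamp} {testname} seed={seed} "
                  f"{'PASS' if result.returncode == 0 else 'FAIL'}\n")
    return result.returncode

if __name__ == "__main__":
    # Replay a recorded failure with: run_test.py <test> <seed>
    test = sys.argv[1]
    seed = int(sys.argv[2]) if len(sys.argv) > 2 else None
    sys.exit(run_test(test, seed))
```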
Infrastructure & process:

- Specification and implementation document.
- Feature- or fix-driven code check-ins, against clear calls.
- Regular check-ins.
- Nightly regression run, with results presented via a web-based application.
- Queueing system of machines to allow parallel testcase simulations.
- Call tracking system (and widely used).
- List of permitted languages (HDL, scripting, test etc.) produced.
- Dynamically self-checking testbench (see the sketch after this list).
- Test architecture allows tests to be rerun on the silicon platform.
- Culture of an always-working repository, though not always successful.
- Weekly reports are not spammed to every engineer, but collated and resent in a single format.
- Coffee room environment that encourages ad hoc conversations between disciplines.
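"Dynamically self-checking" means the testbench checks results against a reference model as the simulation runs, rather than diffing dumps afterwards. A minimal sketch using cocotb (my choice of framework for illustration; nothing above mandates one), assuming a hypothetical registered adder DUT with clk, a, b and sum ports:

```python
import random

import cocotb
from cocotb.clock import Clock
from cocotb.triggers import RisingEdge

def ref_model(a, b):
    """Bit-accurate reference: what the DUT is supposed to compute."""
    return (a + b) & 0xFF

@cocotb.test()
async def adder_random_check(dut):
    """Drive random stimulus and check every result on the fly."""
    cocotb.start_soon(Clock(dut.clk, 10, units="ns").start())
    for _ in range(1000):
        a, b = random.randrange(256), random.randrange(256)
        dut.a.value = a
        dut.b.value = b
        await RisingEdge(dut.clk)
        await RisingEdge(dut.clk)  # assume one cycle of register latency
        got = int(dut.sum.value)
        want = ref_model(a, b)
        # Fail immediately at the offending cycle, not post-mortem.
        assert got == want, f"a={a} b={b}: got {got}, expected {want}"
```

The contrast with the level 2 sketch is the point: the failure is reported at the cycle it occurs, with the stimulus that caused it in hand.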
Level 4

Symptoms:

- Rare code compilation errors or broken repository head functionality; those that occur are escapes from pre-commit testing.
- Waiting for regressions and simulations to complete has become the limiting factor, particularly as project deadlines near. This is caused by a low effective simulation speed (Hz), which is in turn caused by non-performant testbenches and testcode, in addition to queue resource constraints (licensing & compute nodes). (See the sketch after this list for putting a number on it.)
- Some legacy blocks containing arcane code are problematic (bug density) because code is not routinely refactored.
- Few silicon bugs.
- Engineers have two 24" monitors on their desks.
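Effective simulation speed is easy to put a number on: simulated cycles divided by wall-clock seconds. A minimal sketch, assuming the same hypothetical run_sim wrapper as before, a hypothetical log line reporting the final simulated time, and an assumed 10 ns testbench clock:

```python
#!/usr/bin/env python3
"""Measure effective testbench speed: simulated cycles per wall-clock
second. The run_sim wrapper, the log line format and the 10 ns clock
period are all hypothetical examples."""
import re
import subprocess
import time

CLOCK_PERIOD_NS = 10  # assumed testbench clock period

def effective_khz(cmd, logfile="sim.log"):
    t0 = time.monotonic()
    subprocess.run(cmd, check=True)
    wall_s = time.monotonic() - t0
    # Hypothetical simulator summary line: "Simulation complete at 123456 ns"
    with open(logfile) as f:
        m = re.search(r"Simulation complete at (\d+) ns", f.read())
    if m is None:
        raise RuntimeError(f"no simulated-time summary found in {logfile}")
    cycles = int(m.group(1)) / CLOCK_PERIOD_NS
    return cycles / wall_s / 1e3  # effective simulation speed in kHz

if __name__ == "__main__":
    rate = effective_khz(["run_sim", "smoke_test"])
    print(f"effective testbench speed: {rate:.1f} kHz")
```

Tracked per nightly regression, that one number tells you the testbench is getting slower before the engineers start complaining.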
Infrastructure & process:

- Specification and implementation documents reviewed by all disciplines (incl. verification) and including features specifically for debug & verification.
- Integrated call tracking and source code repository.
- Single dynamically self-checking testbench.
- HDL coding guidelines extend to the testbench.
- Testcase coding guidelines (e.g. for testcases in C).
- Permitted language list enforced.
- Large queueing systems, though licensing may not stretch to all nodes at peak periods.
- Monitors/higher-level abstractions.
- Interface assertions.
- Full regression is a single command.
- Automated regression triage (see the sketch after this list).
- Regression infrastructure may rerun a failing test (e.g. if it has historically passed) to overcome queue instabilities.
- Feature-driven test subsets to aid initial bring-up and bug localisation.
- Mature web-based application for test analysis, including metrics regarding stability & project maturity, such as code & functional coverage (CEO/high-level management use it).
- Culture of an always-working repository, supported by pre-commit regression sets.
- Frequent peer review of all HDL and test code.
- Wiki is gardened.
- Quiet environment conducive to work, with distractions removed.
- Culture of on-time meetings.
- Weekly meetings focus on continuous review and are not simply round-the-table reiterations of weekly reports.
- Separate meeting rooms to reduce at-desk noise levels.
- Good coffee, sofas in the coffee room.
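A minimal sketch of the triage-and-rerun policy above, assuming the same hypothetical run_sim wrapper and log conventions as earlier. Failures are bucketed by a scrubbed error signature so one underlying bug doesn't appear as fifty separate debug jobs, and a test that has historically passed gets a single retry to absorb queue instability:

```python
"""Automated regression triage with a retry policy for historically
passing tests. run_sim and the log patterns are hypothetical examples."""
import collections
import re
import subprocess

def signature(log_text):
    """Bucket failures by their first error line, with addresses and
    simulation times scrubbed so one bug maps to one bucket."""
    for line in log_text.splitlines():
        if "ERROR" in line or "FATAL" in line:
            return re.sub(r"0x[0-9a-fA-F]+|\d+ ns", "<n>", line)
    return "<no error line found>"

def run(test, seed):
    """Run one test; returns (passed, log_text)."""
    p = subprocess.run(["run_sim", test, f"+seed={seed}"],
                       capture_output=True, text=True)
    return p.returncode == 0, p.stdout

def triage(failures, history):
    """failures: list of (test, seed); history: test name -> pass count."""
    buckets = collections.defaultdict(list)
    for test, seed in failures:
        ok, log = run(test, seed)
        # One retry if this test has historically passed: a single
        # failure is more likely queue instability than a new bug.
        if not ok and history.get(test, 0) > 0:
            ok, log = run(test, seed)
        if not ok:
            buckets[signature(log)].append((test, seed))
    # Biggest bucket first: the best debug candidate for a human.
    for sig, tests in sorted(buckets.items(), key=lambda kv: -len(kv[1])):
        print(f"{len(tests):4d} failure(s)  {sig}")
```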
Level 5

Symptoms:

- This level is characterised by only hard problems remaining.
- Engineers spend all their time debugging, but on real and deeply difficult issues!
- Performant architecture, infrastructure, testbench, testcase code and design.
- Engineers have three 24" monitors on their desks, and Aeron chairs.
Infrastructure & process:

- Specification and implementation optimised for debug & verification.
- Each check-in is referenced to a call (see the sketch after this list).
- Very large queue: more than 20 queue slots per engineer, with licenses (unlimited) or infrastructure (e.g. Verilator) to support this.
- Machine replacement programme exists: queue machines are retired (to the bin) after 3 years and replaced with state of the art.
- Testbench and testcase code coverage.
- Testbench assertions.
- Continuous integration.
- Spontaneous peer review and code refactoring.
- Regressions test simulation performance (effective testbench MHz) to ensure tests and testbench remain performant.
- Regressions run all the way to place and route (via synthesis) from a single command; code changes can be evaluated against all metrics in parallel via a simple invocation.
- Automated regression triage produces waveforms for the best debug candidate.
- Coding guidelines for all permitted languages.
- In-depth web-based project analysis (CEO/high-level management & office secretary have pages to view).
- Frequent peer review of all code, including all scripts.
- Organic, fairtrade coffee from a 'barista'-style machine.
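"Each check-in is referenced to a call" is mechanically enforceable rather than a matter of culture. A minimal sketch as a git commit-msg hook (git and the CALL-1234 ticket format are my assumptions; the post doesn't name a particular repository or call tracking system):

```python
#!/usr/bin/env python3
"""git commit-msg hook: reject commits that do not reference a call.
Install as .git/hooks/commit-msg (executable). The CALL-1234 ticket
format is a hypothetical example."""
import re
import sys

def main(msg_file):
    with open(msg_file) as f:
        msg = f.read()
    if re.search(r"\bCALL-\d+\b", msg):
        return 0  # accept: a call is referenced
    sys.stderr.write("commit rejected: message must reference a call, "
                     "e.g. 'CALL-1234: fix FIFO overflow'\n")
    return 1

if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

With the integrated call tracking and repository of level 4, the same check can run server-side so it cannot be skipped locally.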