This Verification Crisis
This is not the first time that the industry has been in this position. Those with memories lasting more than a few years will remember the issues that plagued us when gate-level simulation was necessary for sign-off. Simulation runs that took days and regression runs that took weeks were normal in the industry. The fix then was to raise the level of abstraction, and the fix today is exactly the same, although with a few more complications. By moving from the gate level to the register transfer level (RTL), simulation performance and capacity improved so much that most of the problems just disappeared. A discussion about the enablers for that wholesale change in abstraction would take a complete article in itself.

RTL simulators may be wonderful tools, but they are only wonderful at doing RTL simulation, and RTL simulation is good for verifying implementations. RTL simulation is not good at performing functional verification or performance verification, because there is just too much unnecessary detail, which slows everything down. So this is not a problem with the tools; it is a problem with the models.

Simulation

For years now it has been accepted that when creating a testbench, one of the first steps is to write a Bus Functional Model for each of the interfaces of the design. This enables the abstraction to be raised for the testbench. In addition, nobody tries to include a full functional model of a processor, instead replacing it with an Instruction Set Simulator and a Bus Functional Model. In both of these cases, the speedup obtained in the piece that is replaced is a hundredfold or more, which makes both the writing of the models and the execution of tests far more efficient. But for some reason, the industry has not applied these techniques in a more general sense within the design itself.
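To make the Bus Functional Model idea concrete, here is a minimal sketch in Python. The bus protocol (an address cycle, a data cycle, then an idle cycle), the pin names and the class names are all hypothetical, chosen only for illustration. The point is that the test is written as a handful of transactions, while the model expands each one into the cycle-by-cycle pin activity that the design actually sees.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class WriteTransaction:
    """Transaction-level view: one object per bus write."""
    address: int
    data: int


@dataclass(frozen=True)
class PinCycle:
    """Signal-level view: the value of every bus pin during one clock cycle."""
    valid: int
    write: int
    addr: int
    wdata: int


class SimpleBusBfm:
    """Hypothetical bus protocol: address cycle, data cycle, idle cycle."""

    def drive(self, txn: WriteTransaction) -> Iterator[PinCycle]:
        # Address phase: address is presented, write data not yet driven.
        yield PinCycle(valid=1, write=1, addr=txn.address, wdata=0)
        # Data phase: address is held and the write data is driven.
        yield PinCycle(valid=1, write=1, addr=txn.address, wdata=txn.data)
        # Idle cycle: the bus is released before the next transfer.
        yield PinCycle(valid=0, write=0, addr=0, wdata=0)


def expand(transactions: List[WriteTransaction]) -> List[PinCycle]:
    """Expand a transaction-level test into its pin-level cycle sequence."""
    bfm = SimpleBusBfm()
    cycles: List[PinCycle] = []
    for txn in transactions:
        cycles.extend(bfm.drive(txn))
    return cycles


if __name__ == "__main__":
    test = [WriteTransaction(0x10, 0xAB), WriteTransaction(0x14, 0xCD)]
    cycles = expand(test)
    print(f"{len(test)} transactions -> {len(cycles)} pin-level cycles")
```

Even in this toy protocol every transaction costs three pin-level cycles. A real bus with arbitration, wait states and bursts widens that gap considerably, which is where the testbench-side speedup comes from.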
Why don't we use the highest possible level of abstraction for every element of the design, such that there is only just sufficient detail to confirm the purpose of the test? A recent study conducted by Cadence, in which a design was modeled at the transaction level rather than at the RTL level, showed a 450X speedup. They also pointed out that they had done a very poor job of integrating their processor model, which was accounting for a 2 to 4X overall slowdown, so in other words they were expecting it to be about 1000 times faster at the end of the day. To put this in perspective, a 1000X speedup reduces a 1-day simulation (86,400 seconds) to under 90 seconds. Clearly this kind of simulation will not tell you that the implementation is correct, but it will tell you whether the design is working in the way that you expect, and it would enable much longer and more sophisticated tests to be put together. This in itself would find more of the problems that are escaping onto silicon, but perhaps more importantly, it would enable some of these bugs to be found much earlier in the design process, when something can be done about them, rather than applying band-aids when they are found at the last minute.

While this higher-abstraction simulation is possible today and is being used within some companies, there are some significant practical problems that need to be addressed before the technique becomes widespread. There are four primary areas that need to be improved to make this strategy ubiquitous. The first is that you cannot perform all of the functional verification on the transaction-level models. At some point, you have to ensure that the implementation matches those transaction-level models. Secondly, designs today are made up of a number of IP blocks, and abstract models for these are required from the block vendors.
A third problem is that there is little to no agreement in the industry on the level or levels of abstraction that should be used for these models, and lastly there is a lack of standard interfaces to connect them together. Each of these problems is solvable, and I believe it should be a priority in the industry to put together such solutions.

Multi-Abstraction Simulation

A general multi-abstraction approach would allow a simulation run to be performed in which the abstraction of each block is defined in a configuration file. The tools would bring in the desired model for each component and also bring in the appropriate abstraction converters to enable all of the models to talk to each other. Knowing the abstractions available for each model and the converters available would enable the tool to tell the user which combinations are legal. This kind of information should be easy to incorporate into the developing SPIRIT standard, which would also allow all simulator vendors to work from the same metadata about the models.

One of the important aspects of this technique is that it allows the implementation of each block to be proven within the context of the system and subjected to system-level tests, even before the implementation of the other blocks is complete. In this way, the engineer can be assured that he has understood the specification correctly and that the necessary functionality has been proven to exist within his block. While there will always be a need to do some complete simulations at the RTL level, it should not be necessary to run them after every change has been made. Long RTL runs can be left until the design has become more stable and each of the blocks has been verified within the context of the system. The bugs found at this stage should not be the usual integration bugs, but timing-specific difficulties only.

IP Blocks

Today, when a piece of IP is delivered, it may come as an RTL file or as an encrypted model.
It is the primary responsibility of the IP developer to ensure that the block fulfils all of the requirements it is specified to meet. If this is true, then there should be no need for the integrator to run all of the simulations with an implementation model for the block. Instead, a much higher performance behavioral model should be provided and used for functional verification. It is the role of the integrator to ensure that when multiple blocks are connected together, they perform the desired function. However, for most designers the necessary level of trust in their IP provider does not yet exist, and so for some simulations it is considered prudent to replace the behavioral model with the implementation model and ensure that the system function does not change. To do this, the IP developer needs to ship the implementation model, the behavioral model and the abstraction converters between the supplied model levels. In many cases the necessary models have already been developed and supplied in the corresponding verification IP. These need to be repackaged so that they are shipped with the design IP instead. The processor vendors are ahead of the pack here in that they do provide models at multiple levels of abstraction.

Abstraction Levels

Abstraction is not a finely defined term. An industry taxonomy of models has recently been published (Taxonomies for the development and verification of digital systems: Bailey, Martin & Anderson, Springer 2005) that should help to clarify some of the terms, but the industry must still decide on the levels of abstraction to use. Without this agreement it will be difficult to build the necessary abstraction converters. Companies like Spiratech in the U.K. have been taking the lead in this area. The SystemC Transaction Level Modeling group has just this week announced the availability of their 1.0 standard.
This standard provides a common interface for multiple levels of abstraction, ranging from untimed transactions, through timed transactions, down to a cycle-accurate level of abstraction that is still above the RTL level. In their recent press releases many companies are promising support for this standard, and I hope they follow through. In addition, the IP suppliers need to start providing models that conform to these standards as quickly as possible. At least one piece of the complete puzzle has now been put in place, and the solution has become a little closer.

Interfaces

Without consistent interfaces, an extra burden is placed on the user community in that they have to work out how to talk to every model individually, each possibly written in a different language. In addition, it makes it more difficult for tool developers to create the automation tools that would allow easy model replacement. Recent advances made with SystemVerilog have provided a possible base to build on with the new Direct Programming Interface (DPI). While not every simulator yet supports it, and it has yet to be propagated to all of the necessary languages, it is fast and flexible and may be the place to start. I can also do my part to help the industry solve this problem, as I chair the Accellera Interfaces Technical Committee, which would have the responsibility for building such an interface. I would ask any people or companies willing to help in the development of such an interface to contact me, or just let me know if you think it is important even if you cannot afford to spend the time on its development.

None of the steps outlined above is difficult, but they do require the cooperation of the whole community. EDA tool companies, IP providers, standards groups and users each have a part to play here, and they should not hope that someone else will do the work for them. With these changes made, this verification crisis will go away, at least for several years.
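As a rough illustration of the kind of automation that easy model replacement implies, the following Python sketch checks a per-block abstraction configuration against the models and converters that are available. Everything in it is an assumption made for the example: the block names, the abstraction level names, the converter table and the connection list are hypothetical and are not taken from SPIRIT, SystemC TLM or any vendor's tools.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical metadata: which abstraction levels ship with each block.
AVAILABLE_MODELS: Dict[str, Set[str]] = {
    "cpu": {"untimed", "cycle_accurate"},
    "dma": {"untimed", "rtl"},
    "uart": {"rtl"},
}

# Hypothetical converter library: pairs of levels that can be bridged.
CONVERTERS: Set[FrozenSet[str]] = {
    frozenset({"untimed", "cycle_accurate"}),
    frozenset({"cycle_accurate", "rtl"}),
}

# Hypothetical system topology: which blocks talk to each other.
CONNECTIONS: List[Tuple[str, str]] = [("cpu", "dma"), ("dma", "uart")]


def check_configuration(config: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the mix is legal."""
    problems: List[str] = []

    # Every block needs a model at the requested abstraction level.
    for block, levels in AVAILABLE_MODELS.items():
        level = config.get(block)
        if level not in levels:
            problems.append(f"{block}: no '{level}' model available")

    # Connected blocks at different levels need a converter between them.
    for a, b in CONNECTIONS:
        level_a, level_b = config.get(a), config.get(b)
        if level_a is None or level_b is None or level_a == level_b:
            continue
        if frozenset({level_a, level_b}) not in CONVERTERS:
            problems.append(
                f"{a}<->{b}: no converter between '{level_a}' and '{level_b}'"
            )
    return problems


if __name__ == "__main__":
    mixed = {"cpu": "cycle_accurate", "dma": "rtl", "uart": "rtl"}
    print(check_configuration(mixed))  # legal mix, so the list is empty

    too_far = {"cpu": "untimed", "dma": "untimed", "uart": "rtl"}
    for problem in check_configuration(too_far):
        print(problem)
```

With shared metadata of this sort, a tool could reject an illegal mix before simulation starts, instead of leaving each user to discover by hand which models can talk to each other.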
June 7, 2005 Brian Bailey is a widely recognized Verification consultant. He can be reached at brian_bailey@acm.org