# Feasibility of a Pass Transistor Logic Library for General Purpose ASIC Design

Preston Thomson and Travis Johnson
University of Utah
Departments of Electrical Engineering and Computer Science

#### Abstract

CMOS has dominated the digital design space almost since its discovery. Alternative design styles such as domino logic, complementary pass gate logic, and others have been proposed and implemented, but none achieved the penetration and reach of the standard CMOS library. These other designs often have benefits for certain applications but none have been able to unseat CMOS as the de facto design style. This project examines the feasibility of moving one of these design styles, a custom pass transistor logic library (PTL) with restoring logic, into the same general purpose environment that its CMOS counterpart currently dominates.

This study was conducted with the Cadence EDA suite using the venerable 0.5 µm AMI design rules. This process is used by the University of Utah for most student design projects, and fabrication is still available through services such as MOSIS. These reasons, together with the authors' prior experience with this process, decided the choice of technology node and design suite.

The authors believe this study could be reproduced in present and future technology nodes, such as the 90 nm and 65 nm processes and smaller, given access to those technologies' SPICE models and design rules. This topic is revisited in the future work section of this paper.

This paper presents comparative results for PTL and CMOS versions of an AOI, XOR, inverting MUX, NAND, NOR and NOR4. More gates could have been tested, but many of their results could readily be inferred from similar cells and from the trends that emerged across these six gates. The PTL cells were almost universally smaller than their CMOS counterparts, but the performance and power results were less clear-cut.

## 1. Introduction & Motivation

In the highly competitive field of very-large- and ultra-large-scale integrated circuit design, every decision must be weighed carefully. Even in relatively small ASIC designs there is typically a premium on area, speed and power consumption. As designs have continued to grow, the need for functional cell libraries for design synthesis has become ever more pronounced. CMOS libraries have been in wide use for many years: libraries of varying sizes and complexities are used to synthesize most circuits today, and static CMOS cells lie at the heart of most of them.

There are many alternatives to static CMOS, but it has remained the leader in synthesized design because it is robust, power friendly and very well understood. CMOS cells also typically have full-swing output levels and good noise margins. Many other circuit styles show some advantage over CMOS in power, area or switching speed, but they often suffer from degraded output voltages or an inability to cascade several gates together.

This paper examines a group of custom cells that the authors developed to compete with their CMOS counterparts. The library was built using pass transistor logic (PTL). PTL can be very efficient in speed, power and area, but it also has several drawbacks. The main one is that area-efficient PTL often relies on unorthodox techniques, such as passing signals through NMOS and PMOS transistors in the direction each passes poorly, which degrades the signal (a logical 1 through an N-type device, a logical 0 through a P-type device). This can erode the advantages of PTL. For example, the authors built a 2-transistor logical AND in PTL, but in a 5 V, 0.5 µm process the passed logical 1 was only about 3.8 V and the passed 0 about 1.3 V. These levels are still within the switching thresholds, but the gate is barely functional when several ANDs are cascaded. The gate is nevertheless used internally to build larger cells such as the AOI. To counter this degradation, each cell's output is buffered by a standard CMOS inverter. This output buffering allows the cells to be cascaded and makes them practical in an environment where synthesis tools decide which gate is used in a given design.

This paper begins with a brief look at related uses of PTL. Next, the PTL cells are presented, followed by a description of the test setup. The paper then examines the test results and concludes with observations and possible future work the authors would like to see examined.

## 2. Related and Prior Work

PTL has been gaining ground in areas such as logic synthesis and verification for some time. Most related work, such as [3], studies applications of PTL with BDDs and other decision and mapping techniques for synthesis and formal verification. A few papers relate more directly to this work. [1] motivates a very small PTL library, built using LEAP and custom software, to compete with a large CMOS library. Interestingly, the initial assumptions in [1] are partly contradicted by the findings of this paper: [1] posits that PTL function cells are always larger, slower and more power hungry than their CMOS equivalents. In our study this was not the case for several cells, especially with regard to area.

The information in [2] gave the authors carte blanche to experiment with tweaking each cell and to use the non-traditional methodologies described in the introduction. It treats functionality and performance (power, speed, etc.) as the only requirements in custom circuit design. The authors adopted this principle and applied it to the individual cells to exploit potential gains.

## 3. Gate Level Design

The reference CMOS gates were obtained from the University of Utah (UofU) Digital CMOS library, a library designed and refined by students and faculty at the University, which served as the baseline for comparison. It is a relatively small library that contains many commonly used gates as well as a few more exotic ones. A moderate sample of the more standard gates was selected; each is described below.

The pass transistor library was hand designed, gate by gate, by the authors, and each cell is functionally equivalent to its UofU library counterpart to ensure an apples-to-apples comparison. Other possible performance-testing strategies are discussed as future work in the final section. This section shows the PTL realization of each cell used in the test; the CMOS cells are entirely standard and are not shown at the transistor level.

The first figure shows the 2-transistor AND, which is the basis of the NAND gate. The OR gate is nearly identical, except that the P and N transistors are swapped and it is tied to Vdd instead of GND. These gates, together with an inverter, make up the other 2-input gates, and they are also used inside larger gates such as the AOI and NOR4. Because of these close similarities, only the AND is pictured below.



2 Transistor AND Gate
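A switch-level Python sketch can make the level degradation described in the introduction concrete. The transistor assignment is an assumption reconstructed from the text, since the schematic is not reproduced here: in the AND, an NMOS gated by A passes B and a PMOS gated by A passes GND; the OR swaps the device types and passes Vdd. Levels are tracked only as full-swing or degraded, so the model shows which input combinations produce the roughly 3.8 V and 1.3 V levels, not the voltages themselves. All names are hypothetical.

```python
from itertools import product
from typing import NamedTuple, Optional


class Level(NamedTuple):
    bit: int      # logical value, 0 or 1
    strong: bool  # False when a threshold drop has degraded the level


VDD = Level(1, True)
GND = Level(0, True)


def nmos_pass(gate: int, src: Level) -> Optional[Level]:
    """An on NMOS passes a strong 0 but only a weak (degraded) 1."""
    if gate != 1:
        return None  # transistor off
    return Level(src.bit, src.strong and src.bit == 0)


def pmos_pass(gate: int, src: Level) -> Optional[Level]:
    """An on PMOS passes a strong 1 but only a weak (degraded) 0."""
    if gate != 0:
        return None  # transistor off
    return Level(src.bit, src.strong and src.bit == 1)


def resolve(*paths: Optional[Level]) -> Level:
    """In these cells exactly one pass path conducts for any input."""
    on = [p for p in paths if p is not None]
    assert len(on) == 1, "expected exactly one conducting path"
    return on[0]


def ptl_and(a: int, b: int) -> Level:
    # Assumed topology: NMOS(gate=A) passes B, PMOS(gate=A) passes GND.
    return resolve(nmos_pass(a, Level(b, True)), pmos_pass(a, GND))


def ptl_or(a: int, b: int) -> Level:
    # Assumed topology: NMOS(gate=A) passes VDD, PMOS(gate=A) passes B.
    return resolve(nmos_pass(a, VDD), pmos_pass(a, Level(b, True)))


def cmos_inverter(x: Level) -> Level:
    """The restoring output buffer always drives a full-swing level."""
    return Level(1 - x.bit, True)


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        core = ptl_and(a, b)
        print(f"A={a} B={b}  AND core={core}  NAND out={cmos_inverter(core)}")
```

Under these assumptions the AND core is degraded whenever A = 0 (weak 0) and for A = B = 1 (weak 1), while the inverter output is always full swing, which is the purpose of the output buffering.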

The follow-up gates are shown next: the AOI, which uses both the AND and the OR, and the NOR4, which uses a combination of ORs and a standard CMOS NAND gate.



PTL Realization of AOI



PTL/CMOS Hybrid NOR4 Using 2 PTL ORs and a CMOS NAND and CMOS NOR4

Next is the XOR, which takes a slightly different approach. Its NMOS transistors pass or block the opposite input signal, and a pull-up network through the PMOS transistors completes the function. This yields a functional XNOR, which then drives the CMOS output inverter. The design can easily be changed into a restoring XNOR by swapping the P and N transistors and turning the N network into a pull-down. The same basic design can also be extended to make an XOR3, which was briefly tested with results similar to the PTL XOR/CMOS XOR comparison. The XOR3 is not shown in this report, but it would be an excellent candidate for inclusion in the library.



Restoring PTL XOR
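The same switch-level idea can illustrate the XOR. The topology below is one reading of the description, not the authors' schematic: an NMOS gated by A passes B, an NMOS gated by B passes A, and two series PMOS gated by A and B pull the internal node to Vdd when both inputs are 0. The internal node is then the XNOR, and the output inverter turns it into a full-swing XOR. The helper names are hypothetical.

```python
from itertools import product
from typing import NamedTuple, Optional


class Level(NamedTuple):
    bit: int      # logical value, 0 or 1
    strong: bool  # False when a threshold drop has degraded the level


VDD = Level(1, True)


def nmos_pass(gate: int, src: Level) -> Optional[Level]:
    """An on NMOS passes a strong 0 but only a weak 1."""
    if gate != 1:
        return None
    return Level(src.bit, src.strong and src.bit == 0)


def pmos_series_pullup(a: int, b: int) -> Optional[Level]:
    """Two series PMOS to VDD conduct only when both gates are low."""
    return VDD if a == 0 and b == 0 else None


def resolve(*paths: Optional[Level]) -> Level:
    """Combine conducting paths; they must agree, or the node fights or floats."""
    on = [p for p in paths if p is not None]
    assert on, "internal node would float"
    assert len({p.bit for p in on}) == 1, "conducting paths disagree"
    return Level(on[0].bit, any(p.strong for p in on))


def xnor_core(a: int, b: int) -> Level:
    # Each NMOS passes the *opposite* input: gate A passes B, gate B passes A.
    return resolve(
        nmos_pass(a, Level(b, True)),
        nmos_pass(b, Level(a, True)),
        pmos_series_pullup(a, b),
    )


def ptl_xor(a: int, b: int) -> Level:
    core = xnor_core(a, b)
    return Level(1 - core.bit, True)  # restoring CMOS output inverter


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        print(f"A={a} B={b}  XNOR node={xnor_core(a, b)}  XOR out={ptl_xor(a, b)}")
```

In this reading, only the A = B = 1 case leaves a degraded high on the XNOR node, and the output inverter restores it.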

Finally, the inverting MUX is a very different design of only 4 transistors. The select signal drives the gates of an NMOS and a PMOS transistor at the same time, which works well because only one of the two can be fully 'on' at any time. The output is then tied to an inverter. Because the select signal drives transistor gates rather than being passed, it is not degraded, so these gates can be cascaded without inverters between the MUX stages, using a single inverter at the output.



4 Transistor Inverting MUX
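A third switch-level sketch models the inverting MUX and the cascading claim. The assignment of data inputs is an assumption: the NMOS gated by S passes D1 and the PMOS gated by S passes D0, so the cell computes NOT(S ? D1 : D0). The hypothetical `mux4_inverting` chains three pass-transistor stages and applies one inverter at the end. Because strength is only a true/false flag here, the model confirms the logic but cannot show how far a level degrades across stages; that still needs circuit simulation.

```python
from itertools import product
from typing import NamedTuple, Optional, Sequence


class Level(NamedTuple):
    bit: int      # logical value, 0 or 1
    strong: bool  # False when a threshold drop has degraded the level


def nmos_pass(gate: int, src: Level) -> Optional[Level]:
    """An on NMOS passes a strong 0 but only a weak 1."""
    return Level(src.bit, src.strong and src.bit == 0) if gate == 1 else None


def pmos_pass(gate: int, src: Level) -> Optional[Level]:
    """An on PMOS passes a strong 1 but only a weak 0."""
    return Level(src.bit, src.strong and src.bit == 1) if gate == 0 else None


def mux_core(s: int, d0: Level, d1: Level) -> Level:
    """The full-swing select drives both gates, so exactly one device is on."""
    on = [p for p in (pmos_pass(s, d0), nmos_pass(s, d1)) if p is not None]
    assert len(on) == 1
    return on[0]


def inverter(x: Level) -> Level:
    return Level(1 - x.bit, True)


def mux2_inverting(s: int, d0: int, d1: int) -> Level:
    return inverter(mux_core(s, Level(d0, True), Level(d1, True)))


def mux4_inverting(s1: int, s0: int, d: Sequence[int]) -> Level:
    """Two levels of pass-transistor MUX cores, one inverter at the output."""
    lo = mux_core(s0, Level(d[0], True), Level(d[1], True))
    hi = mux_core(s0, Level(d[2], True), Level(d[3], True))
    return inverter(mux_core(s1, lo, hi))


if __name__ == "__main__":
    data = (0, 1, 1, 0)
    for s1, s0 in product((0, 1), repeat=2):
        print(f"S1={s1} S0={s0}  out={mux4_inverting(s1, s0, data)}")
```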

## 4. Testing Setup

The figure in this section shows the basic test structure. Each cell input is driven by a square-wave source with 50 ps rise and fall times, buffered through a standard 1x inverter; the sources have different periods so that the cells see a variety of input sequences. The periods were chosen to give each gate time to settle. The same inputs are applied to both the PTL and the CMOS gate, and each output drives an FO4-sized inverter. The outputs are monitored at the input of that inverter for the voltage comparison.



Example of Test Setup with XOR
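The stimulus can be sketched as follows. The paper gives only the 50 ps edges and says the periods vary; the choice here of doubling the period for each successive input (so that N inputs step through all 2^N combinations, like a binary counter), the 4 ns base period and the node names are assumptions. The helper emits SPICE-style `PULSE(V1 V2 TD TR TF PW PER)` sources; a Spectre netlist would use its own pulse-source syntax instead.

```python
def pulse_source(name: str, node: str, period_ns: float, vdd: float = 5.0,
                 edge_ps: float = 50.0, delay_ns: float = 0.0) -> str:
    """SPICE PULSE(V1 V2 TD TR TF PW PER) square wave with a 50% duty cycle."""
    edge_ns = edge_ps / 1000.0
    # PW is the flat top only, so subtract one edge time for a 50% duty cycle.
    width_ns = period_ns / 2 - edge_ns
    return (f"V{name} {node} 0 PULSE(0 {vdd:g} {delay_ns:g}n {edge_ns:g}n "
            f"{edge_ns:g}n {width_ns:g}n {period_ns:g}n)")


def stimulus(inputs: list[str], base_period_ns: float = 4.0) -> list[str]:
    """Input k gets period base * 2**k (assumed period scheme)."""
    return [pulse_source(n, f"in_{n.lower()}", base_period_ns * 2 ** k)
            for k, n in enumerate(inputs)]


def sampled_inputs(n_inputs: int, base_period_ns: float = 4.0) -> list[tuple[int, ...]]:
    """Ideal input values at the middle of each base half-period."""
    half = base_period_ns / 2
    combos = []
    for j in range(2 ** n_inputs):
        t = (j + 0.5) * half
        # With TD = 0 each source rises at t = 0, so input k is high during
        # the even-numbered half-periods of its own waveform.
        combos.append(tuple(int((t // (half * 2 ** k)) % 2 == 0)
                            for k in range(n_inputs)))
    return combos


if __name__ == "__main__":
    print("\n".join(stimulus(["A", "B"])))
    print(sampled_inputs(2))
```

Doubling the periods is only one way to guarantee that every input combination appears; it also means the slowest input sets the total simulation time, which grows as 2^N.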

The power setup was more difficult, especially for the PTL gates, because power could not be measured from the main Vdd alone: that supply also powers the inverters. To work around this, the interior of each cell was isolated, and its Vdd connections were instead tied to separate supplies at the same 5 V level. For both CMOS and PTL, the current was taken as the full current drawn from that supply. For the PTL cells it was also necessary to monitor the current flowing through the pass transistors. This method should give a value close to the true power for the PTL cells; any small discrepancy arises because the source-to-drain voltage of a pass transistor may be less than 5 V in the degraded-level cases cited in the introduction. There did not appear to be a power-specific tool in the Cadence analog environment.

Transient analyses were then performed, monitoring voltage and current. The absolute value of each contributing current was integrated with the Cadence calculator. For the CMOS gates there is always a single current, but some of the PTL gates have several currents that contribute to the total. The integrated currents were multiplied by Vdd to obtain the total energy of each simulation, and the average power of each workload was found by dividing that energy by the length of the integration interval. A standalone power analysis would need a more sophisticated method with better heuristics, but because this study compares two gates, it is the ratio, not the raw average power of each gate, that matters.
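The energy and average-power calculation can be written out as E = Vdd × Σₖ ∫|iₖ(t)| dt and P_avg = E / T, where the sum runs over the single supply current of a CMOS cell or the several contributing currents of a PTL cell. The sketch below applies the trapezoidal rule to sampled waveforms as a stand-in for the Cadence calculator's integral; the function names and the synthetic constant currents in the example are illustrative, not the authors' scripts. Like the original method, it charges every current at the full 5 V, which is the source of the small discrepancy noted above for currents through the pass transistors.

```python
from typing import Sequence


def trapezoid(y: Sequence[float], t: Sequence[float]) -> float:
    """Trapezoidal-rule integral of samples y over time points t."""
    return sum((t[k + 1] - t[k]) * (y[k] + y[k + 1]) / 2 for k in range(len(t) - 1))


def average_power(t: Sequence[float], currents: Sequence[Sequence[float]],
                  vdd: float = 5.0) -> float:
    """Average power (W) from one or more current waveforms (A) sampled at t (s).

    Each current's absolute value is integrated to a charge, the charges are
    summed, multiplied by VDD to give energy, and divided by the interval length.
    """
    charge = sum(trapezoid([abs(i) for i in wave], t) for wave in currents)
    energy = vdd * charge
    return energy / (t[-1] - t[0])


if __name__ == "__main__":
    t = [k * 1e-10 for k in range(11)]        # 0 to 1 ns, synthetic
    cmos = [[1e-3] * 11]                      # one 1 mA supply current
    ptl = [[1e-3] * 11, [-0.5e-3] * 11]       # supply + pass-transistor current
    ratio = average_power(t, ptl) / average_power(t, cmos)
    print(f"PTL/CMOS power ratio: {ratio:.2f}")
```

Taking the absolute value before integrating means current flowing back into a supply is still counted as dissipation, which is conservative but matches the procedure described above.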

Performance was also very important for this study. It was measured as the time the gates took to reach 4 different levels: the rise time from GND to Vdd/2 and to 3.5 V, and the fall time from Vdd to Vdd/2 and to 1.5 V. These are reasonable thresholds for a 5 V process.
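The four delay measurements reduce to finding threshold crossings in a sampled output waveform. The sketch below locates the first crossing after a reference time by linear interpolation between samples. Measuring each delay from a reference edge time (for example the input transition) is an assumption, since the paper states the thresholds but not the starting reference; the function names are hypothetical.

```python
from typing import Sequence


def crossing_time(t: Sequence[float], v: Sequence[float], level: float,
                  rising: bool, t_ref: float = 0.0) -> float:
    """First time at or after t_ref where v crosses level, linearly interpolated."""
    for k in range(len(t) - 1):
        if t[k] < t_ref:
            continue
        v0, v1 = v[k], v[k + 1]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            return t[k] + (level - v0) * (t[k + 1] - t[k]) / (v1 - v0)
    raise ValueError(f"no {'rising' if rising else 'falling'} crossing of {level} V")


def transition_metrics(t: Sequence[float], v: Sequence[float], rise_ref: float,
                       fall_ref: float, vdd: float = 5.0) -> dict[str, float]:
    """The four measurements used in this study, relative to the reference edges."""
    return {
        "rise to VDD/2": crossing_time(t, v, vdd / 2, True, rise_ref) - rise_ref,
        "rise to 3.5 V": crossing_time(t, v, 3.5, True, rise_ref) - rise_ref,
        "fall to VDD/2": crossing_time(t, v, vdd / 2, False, fall_ref) - fall_ref,
        "fall to 1.5 V": crossing_time(t, v, 1.5, False, fall_ref) - fall_ref,
    }


if __name__ == "__main__":
    # Synthetic trapezoidal output (times in ns): 1 ns ramps up and down.
    t = [0.0, 1.0, 2.0, 3.0]
    v = [0.0, 5.0, 5.0, 0.0]
    print(transition_metrics(t, v, rise_ref=0.0, fall_ref=2.0))
```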

Area was estimated as the total transistor area, found by summing the area (width × length) of all the transistors in each cell. Ideally the cell layout areas would be compared instead, but time constraints prevented completing layouts for the PTL cells. Area was a tertiary consideration, but the device sizes the different gates required are interesting, and the PTL-to-CMOS area ratios motivate continuing this study in the future. The test results for each gate are examined in detail in the next section.
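Transistor area in this sense is the sum of width × length over every device in the cell. A minimal sketch follows; the 0.6 µm drawn channel length is an assumption about the AMI 0.5 µm rules rather than a figure from this paper, and the inverter widths in the example are hypothetical. Under that assumption, the second function converts a reported area back into the total device width it implies. Every transistor area in the Section 5 tables happens to be a multiple of 0.6 µm², which is consistent with, though not proof of, that assumption.

```python
from typing import Iterable, Tuple

DRAWN_LENGTH_UM = 0.6  # assumed minimum drawn channel length for AMI 0.5 um


def transistor_area(devices: Iterable[Tuple[float, float]]) -> float:
    """Total gate area in um^2 from (width_um, length_um) pairs."""
    return sum(w * length for w, length in devices)


def implied_total_width(area_um2: float, length_um: float = DRAWN_LENGTH_UM) -> float:
    """Total device width implied by an area, if every device has the same length."""
    return area_um2 / length_um


if __name__ == "__main__":
    # Hypothetical 1x inverter: 3 um NMOS and 6 um PMOS at minimum length.
    inverter = [(3.0, DRAWN_LENGTH_UM), (6.0, DRAWN_LENGTH_UM)]
    print(f"inverter area: {transistor_area(inverter):.2f} um^2")
    # Reported PTL inverting MUX transistor area from the results tables.
    print(f"implied total width: {implied_total_width(10.8):.1f} um")
```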

## 5. Results and Comparisons

Tabulated results for each gate are reported here. Each table lists the gate under test in its CMOS and PTL versions, followed by a short comparison. The parameters are those described in Section 4, Testing Setup: the delay rows give the time for the output to rise to 3.5 V or to VDD/2, or to fall to 1.5 V or to VDD/2.

| Inverting MUX         | CMOS            | PTL             |
|-----------------------|-----------------|-----------------|
| Rise to 3.5 V         | 584 ps          | 353 ps          |
| Fall to 1.5 V         | 421 ps          | 364 ps          |
| Rise / fall to VDD/2  | 389 ps / 276 ps | 253 ps / 257 ps |
| Transistor area       | 41.4 µm²        | 10.8 µm²        |
| Average power         | 1.54 mW         | 1.20 mW         |

Here the PTL gate beats the CMOS equivalent on every measure: it is faster at every threshold, draws about 22% less average power, and needs roughly a quarter of the transistor area. The speed comes from the short path to the output.

| XOR                   | CMOS            | PTL             |
|-----------------------|-----------------|-----------------|
| Rise to 3.5 V         | 552 ps          | 422 ps          |
| Fall to 1.5 V         | 572 ps          | 441 ps          |
| Rise / fall to VDD/2  | 361 ps / 411 ps | 301 ps / 351 ps |
| Transistor area       | 46.8 µm²        | 28.2 µm²        |
| Average power         | 1.30 mW         | 1.72 mW         |

Here the PTL XOR is about 15 to 24% faster and uses about 40% less transistor area, but it draws about 32% more average power, so the speed gain is roughly offset by the higher power. The extra power is attributed to the additional current paths to the output.

| AOI                   | CMOS            | PTL             |
|-----------------------|-----------------|-----------------|
| Rise to 3.5 V         | 276 ps          | 570 ps          |
| Fall to 1.5 V         | 399 ps          | 771 ps          |
| Rise / fall to VDD/2  | 265 ps / 382 ps | 380 ps / 683 ps |
| Transistor area       | 30.6 µm²        | 27 µm²          |
| Average power         | 0.762 mW        | 0.416 mW        |

The AOI shows the opposite tradeoff to the XOR, and a more pronounced one: the PTL version draws about 45% less power but is roughly 1.4 to 2.1 times slower. In the authors' opinion this tradeoff is too steep unless the application demands the absolute lowest power.

| NOR4                  | CMOS            | PTL              |
|-----------------------|-----------------|------------------|
| Rise to 3.5 V         | 604 ps          | 873 ps           |
| Fall to 1.5 V         | 1350 ps         | 1400 ps          |
| Rise / fall to VDD/2  | 413 ps / 970 ps | 623 ps / 1040 ps |
| Transistor area       | 64.8 µm²        | 34.2 µm²         |
| Average power         | 0.420 mW        | 0.509 mW         |

Here again, the CMOS NOR4 (which is not a practical standard gate in its own right) beats the PTL version on everything except area, where the PTL cell needs about half the transistor area. There is no real reason to keep either of these gates in a sensible library. The PTL results make sense given the gate's structure: ORing the inputs through pass transistors produces a degraded voltage that causes the output gate to perform suboptimally.

| NOR                   | CMOS            | PTL             |
|-----------------------|-----------------|-----------------|
| Rise to 3.5 V         | 420 ps          | 820 ps          |
| Fall to 1.5 V         | 558 ps          | 610 ps          |
| Rise / fall to VDD/2  | 290 ps / 397 ps | 620 ps / 442 ps |
| Transistor area       | 18 µm²          | 12.6 µm²        |
| Average power         | 0.752 mW        | 0.786 mW        |

| NAND                  | CMOS            | PTL             |
|-----------------------|-----------------|-----------------|
| Rise to 3.5 V         | 215 ps          | 763 ps          |
| Fall to 1.5 V         | 365 ps          | 793 ps          |
| Rise / fall to VDD/2  | 162 ps / 254 ps | 553 ps / 542 ps |
| Transistor area       | 14.4 µm²        | 12.6 µm²        |
| Average power         | 0.546 mW        | 1.14 mW         |

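For both NOR and NAND, replacing the CMOS versions with the PTL equivalents makes no real sense. The PTL NAND is roughly 2 to 3.5 times slower and draws about twice the power of the CMOS NAND while saving only 12.5% of the transistor area; the PTL NOR saves about 30% of the area but is up to about twice as slow and draws slightly more power.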
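The comparisons above can be summarized as PTL-to-CMOS ratios, where a value below 1 favors the PTL cell for every metric (delay, area and power alike). The numbers below are transcribed from the six tables; only the short script around them is new.

```python
METRICS = ("rise 3.5 V", "fall 1.5 V", "rise VDD/2", "fall VDD/2", "area", "power")

# Delays in ps, transistor area in um^2, average power in mW, from the tables above.
CMOS = {
    "Inverting MUX": (584, 421, 389, 276, 41.4, 1.54),
    "XOR":           (552, 572, 361, 411, 46.8, 1.30),
    "AOI":           (276, 399, 265, 382, 30.6, 0.762),
    "NOR4":          (604, 1350, 413, 970, 64.8, 0.420),
    "NOR":           (420, 558, 290, 397, 18.0, 0.752),
    "NAND":          (215, 365, 162, 254, 14.4, 0.546),
}
PTL = {
    "Inverting MUX": (353, 364, 253, 257, 10.8, 1.20),
    "XOR":           (422, 441, 301, 351, 28.2, 1.72),
    "AOI":           (570, 771, 380, 683, 27.0, 0.416),
    "NOR4":          (873, 1400, 623, 1040, 34.2, 0.509),
    "NOR":           (820, 610, 620, 442, 12.6, 0.786),
    "NAND":          (763, 793, 553, 542, 12.6, 1.14),
}


def ptl_to_cmos_ratios() -> dict[str, dict[str, float]]:
    """PTL value divided by CMOS value for every gate and metric (< 1 favors PTL)."""
    return {
        gate: {m: round(p / c, 2) for m, c, p in zip(METRICS, CMOS[gate], PTL[gate])}
        for gate in CMOS
    }


if __name__ == "__main__":
    for gate, ratios in ptl_to_cmos_ratios().items():
        wins = [m for m, r in ratios.items() if r < 1]
        print(f"{gate:14s} {ratios}  PTL better on: {', '.join(wins) or 'none'}")
```

With these values, only the inverting MUX is below 1 on every metric; the XOR is below 1 everywhere except power, the AOI on area and power, and the NOR4, NOR and NAND on area alone.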

## 6. Conclusion and Possible Future Work

This study yielded mixed results for the PTL library. All the PTL gates were smaller in transistor area, which should translate into reduced layout area, though perhaps not in proportion to transistor area. In several gates, however, the area savings cannot make up for the performance gap. The authors' overall conclusion is that a hybrid library would be best for general-purpose use. There are a few clear wins in this test, such as the XOR and MUX, that could displace their CMOS counterparts and improve designs that make heavy use of them. From the tests so far, the same appears to hold for their derivative gates such as the XOR3 and MUX4. The standard NAND and NOR handily beat their PTL rivals because of the degraded voltages in the PTL cells, and the area difference is too small to justify the switch. The AOI and NOR4 can be implemented with reduced area, but the savings do not appear to make up for the difference in performance.

It would be interesting to design larger functional blocks for use in a larger-scale project. Restoring PTL is not without merit, as a few of the cells were clearly better than the alternatives, and it seems a good candidate technology for building arithmetic circuits.

In the future, it would be interesting to study how well this approach scales to more current technology nodes such as the 90 nm and 65 nm processes. Another extension would be to generate layouts, add the cells to a standard library, and then use different synthesis tools to determine whether and when the tools choose one type of gate over the other. Overall, this project was very intellectually satisfying and has led the authors to embrace creative ways of thinking about library design and circuit tweaking.

#### **References:**

[1] K. Yano, Y. Sasaki, K. Rikino, and K. Seki, "Top-down pass-transistor logic design," IEEE JSSC, vol. 31, no. 6, June 1996.

[2] W. J. Grundmann, D. Dobberpuhl, R. Allmon, and N. Rethman, "Designing High Performance CMOS Microprocessors Using Full Custom Techniques," Digital Semiconductor, Digital Equipment Corporation.

[3] P. Buch, A. Narayan, R. Newton, and A. Sangiovanni-Vincentelli, "Logic Synthesis for Large Pass Transistor Circuits," Department of Electrical Engineering & Computer Sciences, University of California, Berkeley.