

# Low Power Area Efficient Flip Flop Design

<sup>[1]</sup> S Prabaharan, <sup>[2]</sup> A Manikandan
Department of Electronics and Communication Engineering,
Sree Sastha Institute of Engineering and Technology, Chennai (India)
<sup>[1]</sup> prabatel@gmail.com, <sup>[2]</sup> Manikandan.1182@gmail.com

*Abstract*: Chip density and operating frequency are increasing steadily to perform complex computations at a faster rate, which increases the power dissipation of digital circuits. This paper presents a low-power flip-flop design featuring an explicit-type pulse-triggered structure and a modified true single-phase clock (TSPC) latch based on a signal feed-through scheme. The proposed design addresses the long discharging path problem found in most explicit pulse-triggered flip-flops and achieves better speed and power performance. It outperforms the conventional pulse-triggered flip-flop design in data-to-Q delay. The charge keeper circuit for the internal node X is eliminated, and a pass transistor controlled by the pulse clock is included so that the input data can drive node Q directly. Along with the pull-up transistor, this pass transistor facilitates signal driving from the input source to node Q, so node Q can be quickly pulled up to shorten the data transition delay.

Keywords: pass transistor, long discharging path, signal feed-through scheme, charge keeper.

## I. INTRODUCTION

Interest in flip-flop design has grown with the trend in high-performance systems toward higher clock frequencies and more transistors per chip. These trends make it difficult to control both edges of the clock and increase crosstalk and substrate coupling. Higher power consumption calls for expensive packages and better cooling systems, and it also limits performance. The clock network consumes up to 40% of the total power and the flip-flops up to 20%. The requirements for a flip-flop design are a small clock-to-output delay and a narrow sampling window. Low power consumption and reduced area are the major aims of flip-flop improvement techniques. The clock load should be reduced while keeping a high driving capability, which means increased levels of parallelism and integration of logic into the flip-flop. Flip-flops should also be insensitive to crosstalk; otherwise dynamic and high-impedance nodes are affected.

The content of a flip-flop changes only at the rising edge or the falling edge of the enable signal, which is usually the controlling clock signal. After the rising or falling edge of the clock, the flip-flop content remains constant even if the input changes. The main difference between latches and flip-flops is that the outputs of latches are continuously affected by their inputs as long as the enable signal is asserted. With respect to timing, the goal is the ability to withstand process and temperature variations between the maximum and minimum corners, which depends on how much of the clock period is taken up by the setup and hold times.

The pulse-triggered flip-flop (P-FF), because of its single-latch structure, is more popular than the conventional transmission-gate and master-slave based flip-flop (FF) in high-speed applications. Besides the speed advantage, its circuit simplicity lowers the power consumption of the clock tree system. A P-FF consists of a pulse generator for strobe signals and a latch for data storage. If the triggering pulses are sufficiently narrow, the latch acts like an edge-triggered FF. Since only one latch is needed, as opposed to two in the conventional master-slave configuration, a P-FF is simpler in circuit complexity. This leads to a higher toggle rate for high-speed operations. P-FFs also allow time borrowing across clock cycle boundaries and feature a zero or even negative setup time. Despite these advantages, the pulse generation circuitry requires delicate pulse width control to cope with possible variations in process technology and in the signal distribution network. A statistical design framework has been developed to take these factors into account. Design space exploration is also a widely used technique for obtaining balanced performance among power, delay, and area.

Pulse generation can be classified as implicit or explicit. In an implicit-type P-FF, the pulse generator is part of the latch design and no explicit pulse signals are generated. In an explicit-type P-FF, the pulse generator and the latch are separate. Without generating pulse signals explicitly, implicit-type P-FFs are in general more power-economical. However, they suffer from a longer discharging path, which leads to inferior timing characteristics. Explicit pulse generation, on the contrary, incurs more power consumption, but the separation of the pulse logic from the latch design gives the FF design a unique speed advantage. Its power consumption and circuit complexity can be effectively reduced if one pulse generator is shared by a group of FFs (e.g., an n-bit register). This paper therefore focuses on explicit-type P-FF designs only.
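The difference between a latch and a flip-flop described in this section, and the way a narrow pulse makes a single latch behave like an edge-triggered flip-flop, can be illustrated with a small behavioral sketch. This is a discrete-time illustration only, not a circuit simulation; the function names and the sample waveforms are assumptions made for this example.

```python
def level_latch(d, en, q0=0):
    """Transparent latch: Q follows D while EN is 1, otherwise Q holds."""
    q, out = q0, []
    for di, ei in zip(d, en):
        if ei:
            q = di
        out.append(q)
    return out


def edge_flip_flop(d, clk, q0=0):
    """Rising-edge flip-flop: Q samples D only when CLK goes from 0 to 1."""
    q, prev, out = q0, 0, []
    for di, ci in zip(d, clk):
        if ci and not prev:
            q = di
        prev = ci
        out.append(q)
    return out


def narrow_pulse(clk):
    """Idealized pulse generator: a one-sample pulse at each rising edge."""
    prev, out = 0, []
    for ci in clk:
        out.append(1 if ci and not prev else 0)
        prev = ci
    return out


# Illustrative waveforms (hypothetical, one entry per time step).
clk = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
d = [1, 1, 0, 1, 1, 0, 0, 1, 1, 1]

latch_q = level_latch(d, clk)                  # follows D while CLK is high
ff_q = edge_flip_flop(d, clk)                  # changes only at rising edges
pulsed_q = level_latch(d, narrow_pulse(clk))   # latch strobed by a narrow pulse

print("latch :", latch_q)
print("FF    :", ff_q)
print("P-FF  :", pulsed_q)
```

In this sketch the latch driven directly by the clock passes input glitches to the output while the clock is high, whereas the same latch strobed by the narrow pulse produces the same output as the edge-triggered flip-flop.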

## II. ANALYSIS OF FLIP-FLOP ARCHITECTURES

A large number of flip-flops and latches have been published in the past few decades. They can be grouped under the static and dynamic design styles. The former includes the master-slave designs, such as the transmission-gate based master-slave flip-flop. They dissipate comparatively low power and have a low clock-to-output (CLK-Q) delay. In a synchronous system, however, the delay overhead associated with the latching elements is expressed by the data-to-output (D-Q) delay rather than the CLK-Q delay. Here, the D-Q delay refers to the sum of the CLK-Q delay and the setup time of the flip-flop. The static designs mentioned above lack a low D-Q delay because of their large positive setup time. Also, most of them are susceptible to flow-through resulting from CLK overlap. One such design has the advantages of a low-power keeper structure and a low-latency direct path. As mentioned earlier, the large D-Q delay resulting from the positive setup time is one of the disadvantages of this design. Also, the large data and CLK node capacitances make the design inferior in performance. Despite all these shortcomings, static designs still remain the low-power solution when speed is not a primary concern.

The second category, the dynamic flip-flops, includes the modern high-performance flip-flops. There are purely dynamic designs as well as pseudo-dynamic structures. The latter, which have an internal precharge structure and a static output, deserve special attention because of their distinctive performance improvements. They are called semi-dynamic or hybrid structures because they consist of a dynamic front end and a static output. They use the CLK overlap to perform the latching operation. The semi-dynamic flip-flop (SDFF) is the fastest classic hybrid structure, but it is not efficient in terms of power consumption because of its large CLK load and large precharge capacitance. The hybrid latch flip-flop (HLFF) is not the fastest but has lower power consumption than the SDFF. The longer stack of nMOS transistors at its output node makes it slower than the SDFF and causes a large hold time requirement. This large positive hold time requirement makes the integration of the HLFF into complex circuits difficult. It is also inefficient in embedding logic.
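The D-Q delay defined in this section (CLK-Q delay plus setup time) determines how much of each clock period is left for logic. The following minimal sketch shows the arithmetic; all timing values are hypothetical numbers chosen for illustration and are not measurements of any design discussed here.

```python
def d_to_q_delay(clk_to_q_ps, setup_ps):
    """D-Q delay = CLK-Q delay + setup time (setup may be negative)."""
    return clk_to_q_ps + setup_ps


def logic_budget(period_ps, clk_to_q_ps, setup_ps):
    """Time left in one clock period for combinational logic."""
    return period_ps - d_to_q_delay(clk_to_q_ps, setup_ps)


# Hypothetical values: a static master-slave FF with a large positive
# setup time versus a pulsed/hybrid FF with a negative setup time.
static_dq = d_to_q_delay(clk_to_q_ps=100, setup_ps=80)
pulsed_dq = d_to_q_delay(clk_to_q_ps=120, setup_ps=-30)

print("static FF D-Q delay:", static_dq, "ps")
print("pulsed FF D-Q delay:", pulsed_dq, "ps")
print("logic budget at 1 ns period:",
      logic_budget(1000, 100, 80), "ps vs",
      logic_budget(1000, 120, -30), "ps")
```

Even with a longer CLK-Q delay, the design with the negative setup time has the smaller D-Q delay in this example, which is why the D-Q delay rather than the CLK-Q delay is the relevant figure of merit.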

The major sources of power dissipation in the conventional semi-dynamic designs are the redundant data transitions and the large precharge capacitance. Many attempts have been made to reduce the redundant data transitions in flip-flops. The conditional data mapping flip-flop (CDMFF) is one of the most efficient among them. It uses an output feedback structure to conditionally feed the data to the flip-flop. This reduces overall power dissipation by eliminating unwanted transitions when a redundant event is predicted. Since no transistors are added to the pull-down nMOS stack, the speed performance is not greatly affected. However, the three stacked nMOS transistors at the output node and the conditional structures in the critical path increase the hold time requirement and the D-Q delay of the flip-flop. Also, the additional transistors added for the conditional circuitry make the flip-flop bulky and increase power dissipation at higher data activities.

The large precharge capacitance in a wide variety of designs results from the fact that both the output pull-up and pull-down transistors are driven by the precharge node. Because these transistors drive large output loads, they contribute most of the capacitance at this node. This common drawback of many conventional designs was considered in the design of XCFF. It reduces the power dissipation by splitting the dynamic node into two, each one separately driving the output pull-up and pull-down transistors. Since only one of the two dynamic nodes is switched during one CLK cycle, the total power consumption is considerably reduced without any degradation in speed. XCFF also has a comparatively low CLK driving load. One of the major drawbacks of this design is the redundant precharge at nodes X2 and X1 for data patterns containing more 0s and 1s, respectively. In addition to the large hold time requirement resulting from the conditional shutoff mechanism, a low-to-high transition of the CLK while the data is held low can cause charge sharing at node X1. This can trigger an erroneous transition at the output unless the inverter pair INV1-2 is carefully skewed. This charge sharing effect becomes uncontrollably large when complex functions are embedded into the design.



Figure 2. Semi-dynamic FF





Figure 4. DDFF

In DDFF, node X1 is pseudo-dynamic, with a weak inverter acting as a keeper, whereas node X2 is purely dynamic, in contrast to the XCFF. An unconditional shutoff mechanism is provided at the front end instead of the conditional one in XCFF. The operation of the flip-flop can be divided into two phases: 1) the evaluation phase, when CLK is high, and 2) the precharge phase, when CLK is low.

To provide a comparison, some existing P-FF designs are reviewed first. Figure 5 shows a classic explicit P-FF design, named data-close-to-output (ep-DCO). It contains a NAND-logic-based pulse generator and a semi-dynamic true single-phase clock (TSPC) structured latch. In this P-FF design, inverters I3 and I4 are used to latch data, and inverters I1 and I2 are used to hold the internal node X. The pulse width is determined by the delay of three inverters. This design suffers from a serious drawback: the internal node X is discharged on every rising edge of the clock even when the input stays at a static "1". This gives rise to large switching power dissipation. To overcome this problem, many remedial measures such as conditional capture, conditional precharge, conditional discharge, and conditional pulse enhancement schemes have been proposed. In the conditional discharge approach, an extra nMOS transistor MN3 controlled by the output feedback signal Q\_fdbk is employed so that no discharge occurs if the input data remains "1".

In addition, the keeper logic for the internal node X is simplified and consists of only an inverter plus a pull-up pMOS transistor. A related design with a static conditional discharge scheme (SCDFF) differs from the CDFF design in using a static latch structure, so node X is exempted from periodical precharges. It exhibits a longer data-to-Q (D-to-Q) delay than the CDFF design. Both designs face a worst-case delay caused by a discharging path consisting of three stacked transistors, i.e., MN1-MN3. To overcome this delay for better speed performance, a powerful pull-down circuitry is needed, which costs extra layout area and power consumption. The modified hybrid latch flip-flop (MHLFF) also uses a static latch. The keeper logic at node X is removed. A weak pull-up transistor MP1 controlled by the output signal Q maintains the level of node X when Q equals 0. Despite its circuit simplicity, the MHLFF design has two drawbacks. First, since node X is not predischarged, a prolonged "0" to "1" delay is expected. The delay deteriorates further because a level-degraded clock pulse (reduced by one VT) is applied to the discharging transistor MN3. Second, node X becomes floating in certain cases and its value may drift, causing extra dc power.
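The switching penalty of unconditional discharge in ep-DCO, and the saving obtained from a conditional discharge scheme, can be illustrated by counting how often node X would be discharged for a given data stream. This is a simplified behavioral sketch, assuming one data sample per rising clock edge and an ideal feedback signal; the function name and the data pattern are hypothetical.

```python
def count_x_discharges(data, conditional, q0=0):
    """Count discharges of internal node X over a data stream.

    data        -- input value sampled at each rising clock edge
    conditional -- False: X discharges whenever D is 1 (ep-DCO style)
                   True:  X discharges only if D is 1 and Q was 0
                          (conditional discharge via output feedback)
    """
    q, discharges = q0, 0
    for d in data:
        if d == 1 and (not conditional or q == 0):
            discharges += 1
        q = d  # the latch captures D at every clock pulse
    return discharges


# Hypothetical data pattern with long runs of "1".
data = [1, 1, 1, 1, 0, 1, 1, 0, 0, 1]

unconditional = count_x_discharges(data, conditional=False)
conditional = count_x_discharges(data, conditional=True)
print("unconditional discharge:", unconditional, "events")
print("conditional discharge:  ", conditional, "events")
```

For this pattern the unconditional scheme discharges node X on every clock edge where the input is "1", while the conditional scheme discharges it only on "0" to "1" transitions.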



Figure 5. ep-DCOFF
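The NAND-based pulse generator used with the ep-DCO design, whose pulse width is set by the delay of three inverters, can be sketched in discrete time as follows. This is an idealized model under stated assumptions: each inverter contributes a fixed transport delay of a whole number of time steps, the clock is assumed low before the first sample, and the NAND output is inverted to obtain an active-high pulse. The function name and parameters are hypothetical.

```python
def pulse_generator(clk, inv_delay_steps=1, n_inverters=3):
    """Active-high pulse = NOT NAND(CLK, inverted and delayed CLK)."""
    total_delay = inv_delay_steps * n_inverters
    pulse = []
    for t, c in enumerate(clk):
        # Clock value seen at the end of the inverter chain (assumed 0 before t=0).
        delayed = clk[t - total_delay] if t >= total_delay else 0
        # An odd number of inverters inverts the delayed clock.
        chain_out = 1 - delayed if n_inverters % 2 else delayed
        nand_out = 1 - (c & chain_out)
        pulse.append(1 - nand_out)
    return pulse


# Two clock cycles: low for 4 steps, high for 8, low for 8, high for 8.
clk = [0] * 4 + [1] * 8 + [0] * 8 + [1] * 8

pulse = pulse_generator(clk)                      # width = 3 steps per edge
wide = pulse_generator(clk, inv_delay_steps=2)    # width = 6 steps per edge
print("pulse:", pulse)
print("pulse width with slower inverters:", sum(wide) // 2, "steps")
```

The pulse appears only after each rising edge and lasts for the total inverter chain delay, which is why the inverter sizing controls the pulse width.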





## III. PROPOSED D FLIP FLOP DESIGN

All the flip-flops mentioned above encounter the same worst-case timing, which occurs at "0" to "1" data transitions. Referring to Fig. 2(a), the proposed design adopts a signal feed-through technique to improve this delay. Similar to the SCDFF design, the proposed design also employs a static latch structure and a conditional discharge scheme to avoid superfluous switching at an internal node. However, there are three major differences that lead to a unique TSPC latch structure and make the proposed design distinct from the previous one. First, a weak pull-up pMOS transistor MP1 with its gate connected to ground is used in the first stage of the TSPC latch. This gives rise to a pseudo-nMOS logic style design, and the charge keeper circuit for the internal node X can be eliminated. In addition to the circuit simplicity, this approach also reduces the load capacitance of node X [20], [21]. Second, a pass transistor MNx controlled by the pulse clock is included so that the input data can drive node Q of the latch directly (the signal feed-through scheme). Along with the pull-up transistor MP2 at the second-stage inverter of the TSPC latch, this extra passage facilitates auxiliary signal driving from the input source to node Q. The node level can thus be quickly pulled up to shorten the data transition delay. Third, the pull-down network of the second-stage inverter is completely removed. Instead, the newly employed pass transistor MNx provides a discharging path. The role played by MNx is thus twofold: providing extra driving to node Q during "0" to "1" data transitions, and discharging node Q during "1" to "0" data transitions.

Compared with the latch structure used in the SCDFF design, the circuit savings of the proposed design include a charge keeper (two inverters), a pull-down network (two nMOS transistors), and a control inverter. The only extra component introduced is an nMOS pass transistor to support signal feed-through. This scheme improves the "0" to "1" delay and thus reduces the disparity between the rise time and fall time delays. In comparison with other P-FF designs such as ep-DCO, CDFF, and SCDFF, the proposed design shows the most balanced delay behavior.

The principles of FF operation of the proposed design are explained as follows. When a clock pulse arrives and no data transition occurs, i.e., the input data and node Q are at the same level, no current passes through the pass transistor MNx, which keeps the input stage of the FF from any driving effort. At the same time, the input data and the output feedback Q\_fdbk assume complementary signal levels and the pull-down path of node X is off. Therefore, no signal switching occurs at any internal node. On the other hand, if a "0" to "1" data transition occurs, node X is discharged to turn on transistor MP2, which then pulls node Q high. Referring to Fig. 2(b), this corresponds to the worst-case timing of the FF operations, as the discharging path conducts only for the pulse duration. However, with the signal feed-through scheme, a boost can be obtained from the input source via the pass transistor MNx and the delay can be greatly shortened. Although this seems to burden the input source with direct charging/discharging responsibility, which is a common pitfall of all pass transistor logic, the scenario is different in this case because MNx conducts only for a very short period. Referring to Fig. 2(c), when a "1" to "0" data transition occurs, transistor MNx is likewise turned on by the clock pulse and node Q is discharged by the input stage through this route. Unlike the case of a "0" to "1" data transition, the input source bears the sole discharging responsibility. Since MNx is turned on for only a short time slot, the loading effect on the input source is not significant. In particular, this discharging does not correspond to the critical path delay and calls for no transistor size tweaking to enhance the speed. In addition, since a keeper logic is placed at node Q, the discharging duty of the input source is lifted once the state of the keeper logic is inverted.
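The three operating cases described above (no transition, "0" to "1", and "1" to "0") can be summarized in a small behavioral model. This is a logic-level sketch only, assuming ideal devices and one data sample per clock pulse; it does not model delays, voltage levels, or transistor sizing, and the function names are hypothetical.

```python
def proposed_pff_pulse(d, q):
    """Model one clock pulse; return (new_q, list of internal events)."""
    if d == q:
        # No data transition: MNx carries no current and the
        # pull-down path of node X is off, so nothing switches.
        return q, []
    if d == 1:
        # "0" to "1": X is discharged, MP2 pulls Q high,
        # and MNx adds drive from the input source.
        return 1, ["X discharged", "MP2 pulls Q high", "MNx boosts Q from D"]
    # "1" to "0": the input source discharges Q through MNx; X stays high.
    return 0, ["MNx discharges Q"]


def run(data, q0=0):
    """Apply one clock pulse per data sample and record Q and the events."""
    q, trace, log = q0, [], []
    for d in data:
        q, events = proposed_pff_pulse(d, q)
        trace.append(q)
        log.append(events)
    return trace, log


data = [0, 1, 1, 0, 0, 1]   # hypothetical input pattern
q_trace, event_log = run(data)
x_discharges = sum("X discharged" in events for events in event_log)
idle_pulses = sum(not events for events in event_log)

print("Q trace:", q_trace)
print("X discharges:", x_discharges, "| pulses with no switching:", idle_pulses)
```

In this model node X switches only on "0" to "1" transitions, and clock pulses that arrive without a data transition cause no internal activity.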



## International Journal of Engineering Research in Electronic and Communication Engineering (IJERECE) Vol 1, Issue 1, December 2014



Figure 8. Proposed D flip-flop design

## IV. SIMULATION RESULTS

The performance of the proposed P-FF design is evaluated against existing designs through post-layout simulations. The compared designs include four explicit-type P-FF designs. A conventional CMOS NAND-logic-based pulse generator with a three-stage inverter chain is used for all P-FF designs except the MHLFF design, which employs its own pulse generation circuitry. Since the pulse width is crucial to the correctness of data capture as well as to the power consumption, the transistors of the pulse generator logic are sized for a design specification of 120 ps in pulse width in the typical-typical (TT) case. The sizing also ensures that the pulse generators function properly in all process corners. With regard to the latch structures, each P-FF design is individually optimized with respect to the product of power and D-to-Q delay. To mimic the signal rise and fall time delays, input signals are generated through buffers. Since the proposed design requires direct output driving from the input source, the power consumption of the data input buffer (an inverter) is included for fair comparison. In terms of circuit features, although the proposed design does not use the fewest transistors, it has the smallest layout area. This is mainly attributed to the signal feed-through scheme, which largely reduces the transistor sizes on the discharging path. In terms of power behavior, the proposed design is the most efficient in five out of the six test patterns. The savings vary with the combination of test pattern and FF design.



Figure 9. Output of CDMFF



Figure 10. Output of semi dynamic FF



Figure 11. Output of XCFF






Figure 12. Output of DDFF

The leakage powers of all FF designs were evaluated under different combinations of clock and input signals. A possible concern about the proposed design arises from the pseudo-nMOS logic in the first stage. Although the always-on MP1 prevents node X from reaching a full voltage swing, it does not result in any dc power consumption problem. A full voltage swing can be expected at node Q because of the charge keeper with two inverters employed at node Q. A degraded "0" signal at node X may affect the transition delay of node Q but not its voltage level. The voltage level of node Q remains at an intact value of VDD. Referring to Table II, the leakage power consumption of the proposed design is very close to that of other P-FF designs. The MHLFF design is the one that suffers from large dc power consumption because of a non-full-swing internal node. Its dc (leakage) power consumption is much higher than that of the others and it is thus excluded from the comparison [18]. Since the proposed signal feed-through scheme requires occasional signal driving from the input node directly to the output node, we also calculate the power drawn by the pass transistor MNx (the extra power consumption caused by the signal feed-through scheme).



Figure 12. Output of ep-DCOFF



Figure 13. Output of MHLFF

The simulation results are also expressed as power-delay product (PDP) curves versus setup time. The PDP values of the proposed design are smaller than those of the other designs in almost all setup time settings. For most P-FF designs, the minimum PDP values occur at negative setup times. This is because the pulse generator introduces extra delay, so input data can be applied after the triggering edge of the clock. Integrating the pulse generation logic with the latch structure gives the SCDFF. All but one of the P-FF designs under comparison exhibit similar timing parameters. The exception is the MHLFF design, which has a slightly positive setup time and a shorter hold time than its counterparts because of a simpler pulse generator. A longer hold time mainly affects the design of the driving logic. If P-FFs are adopted in the entire design, the hold time constraint can be easily satisfied because of the prolonged clock-to-Q delay of P-FF designs. Introducing an input delay buffer is also a simple measure to alleviate the hold time requirement.
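The hold time argument above can be made concrete with the standard hold check: data launched by one flip-flop must not reach the next flip-flop before its hold window closes. The sketch below uses that textbook relation; all timing values are hypothetical and are not taken from the simulations reported here.

```python
def hold_slack(clk_to_q_ps, min_logic_ps, hold_ps, buffer_ps=0.0):
    """Hold slack = earliest data arrival minus required hold time.

    A negative result means a hold violation at the capturing flip-flop.
    """
    return clk_to_q_ps + min_logic_ps + buffer_ps - hold_ps


HOLD_PS = 90  # hypothetical hold time of the capturing P-FF

# Driving stage with a short clock-to-Q delay: hold violation.
fast_driver = hold_slack(clk_to_q_ps=40, min_logic_ps=10, hold_ps=HOLD_PS)

# Driving stage that is itself a P-FF with a prolonged clock-to-Q delay.
pff_driver = hold_slack(clk_to_q_ps=110, min_logic_ps=10, hold_ps=HOLD_PS)

# Same fast driver, with an input delay buffer added in the data path.
buffered = hold_slack(clk_to_q_ps=40, min_logic_ps=10, hold_ps=HOLD_PS,
                      buffer_ps=50)

print("fast driver slack:", fast_driver, "ps")
print("P-FF driver slack:", pff_driver, "ps")
print("buffered slack:   ", buffered, "ps")
```

Both remedies named in the text, a longer clock-to-Q delay in the driving stage and an added input delay buffer, turn the negative slack into a positive one in this example.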



Figure 13. Output of SCDFF






Figure 14. Output of PTLFF

TABLE 1. POWER COMPARISON OF D FLIP-FLOPS

| Flip-flop       | Power consumed (µW) | Number of transistors | Switching delay (ps) |
|-----------------|---------------------|-----------------------|----------------------|
| CDMFF           | 65.364              | 22                    | 125                  |
| ep-DCO          | 14.327              | 18                    | 26-27                |
| MHLFF           | 3.839               | 8                     | 67                   |
| SCDFF           | 10.059              | 13                    | 16                   |
| Semi-dynamic FF | 15.462              | 23                    | 28                   |
| XCFF            | 16.58               | 21                    | 33                   |
| Existing system | 27.607              | 16                    | 78                   |
| Proposed system | 2.229               | 12                    | 60                   |
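As a rough cross-check of the table, a power-delay product can be derived from its two measured columns. This is only an illustrative calculation under stated assumptions: the "switching delay" column is treated as the delay term of the product, and the ep-DCO delay range of 26-27 ps is represented by its midpoint. Neither choice is stated in the paper, and the resulting figures are derived values, not reported results.

```python
# (power in microwatts, switching delay in picoseconds) from Table 1.
table = {
    "CDMFF": (65.364, 125.0),
    "ep-DCO": (14.327, 26.5),  # midpoint of the 26-27 ps range (assumption)
    "MHLFF": (3.839, 67.0),
    "SCDFF": (10.059, 16.0),
    "Semi-dynamic FF": (15.462, 28.0),
    "XCFF": (16.58, 33.0),
    "Existing system": (27.607, 78.0),
    "Proposed system": (2.229, 60.0),
}


def pdp_fj(power_uw, delay_ps):
    """Power-delay product in femtojoules (1 uW x 1 ps = 1e-3 fJ)."""
    return power_uw * delay_ps * 1e-3


pdp = {name: pdp_fj(p, d) for name, (p, d) in table.items()}
best = min(pdp, key=pdp.get)

for name, value in sorted(pdp.items(), key=lambda item: item[1]):
    print(f"{name:16s} {value:7.4f} fJ")
print("lowest derived PDP:", best)
```

Under these assumptions the proposed design has the lowest derived product, even though SCDFF has the shortest switching delay in the table.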

## V. CONCLUSION

This paper presented a novel P-FF design that employs a modified TSPC latch structure incorporating a mixed design style consisting of a pass transistor and pseudo-nMOS logic. The key idea is to provide a signal feed-through from the input source to the internal node of the latch, which provides extra driving to shorten the transition time and enhances both power and speed performance. The design is achieved by employing a simple pass transistor. Extensive simulations were conducted, and the results support the claims of the proposed design in various performance aspects. Various flip-flop designs, namely ep-DCO, MHLFF, SCDFF, CDMFF, SDFF, XCFF, the TSPC-based P-FF, and the proposed new P-FF, were discussed. The pass transistor logic flip-flop design reduces the number of transistors stacked along the discharging path. The designs were implemented in the Tanner and Microwind tools, and the resulting waveforms were also discussed.

## REFERENCES

- [1] Jindal K., Renu, and Pandey V.K. (2013) "Design of conditional data mapping flip flop for low power application" International Journal of Science and Modern Engineering (IJISME), ISSN: 2319-6386, Vol. 1, No. 5, pp. 72-75.
- [2] Mahmoodi H., Tirumalashetty V., Cooke M., and Roy K. (2009) "Ultra low power clocking scheme using energy recovery and clock gating" IEEE Transactions on Very Large Scale Integration Systems, Vol. 17, No. 1, pp. 33-44.
- [3] Nikolic B., Oklobdzija V.G., Stojanovic V., Jia W., Chiu J.K., and Leung M.M. (2000) "Improved sense-amplifier-based flip-flop: design measurements" IEEE Journal of Solid-State Circuits, Vol. 35, No. 6, pp. 876-884.
- [4] Shyu Y., Lin Y.M., Huang C.P., Lin C., Lin Y., and Chang Y. (2013) "Effective and efficient approach for power reduction by using multi-bit flip flops" IEEE Transactions on Very Large Scale Integration Systems, Vol. 21, No. 4, pp. 624-635.
- [5] Stojanovic V. and Oklobdzija V.G. (1999) "Comparative analysis of master-slave latches and flip flops for high-performance and low-power systems" IEEE Journal of Solid-State Circuits, Vol. 34, No. 4, pp. 536-548.
- [6] Wimer S. and Koren I. (2014) "Design flow for flip-flop grouping in data-driven clock gating" IEEE Transactions on Very Large Scale Integration Systems, Vol. 22, No. 4, pp. 771-778.