# Hybrid Reconfigurable FPGA Architecture Based on Autonomous Fine-Grain Power-Gating

Sathyendran<sup>1</sup>, V.J.K. Kishore Sonti<sup>2</sup>

Department of Electronics and Communication Engineering, Sathyabama University, Chennai, Tamil Nadu, India Email: <sup>1</sup>sathya6045@gmail.com, <sup>2</sup>jayakrishna adc@yahoo.com

Abstract— Field-Programmable Gate Arrays (FPGAs) are a special type of processor that the end user can configure directly. This paper investigates the design of low-power reconfigurable asynchronous FPGA cells. The proposed design combines four-phase dual-rail encoding and Level-Encoded Dual-Rail (LEDR) encoding with a sleep controller. Four-phase dual-rail encoding is used to achieve small area and low power in the logic blocks, LEDR encoding is used to achieve high throughput and low power in data transfer, and the sleep controller is used to reduce the standby power consumed by the configurable logic block (CLB). The circuit is simulated using the Xilinx tool.

**Keywords:** Field Programmable Gate Array (FPGA), Level Encoded Dual Rail (LEDR) Encoding, Logic Block, Lookup Table, Sleep Controller.

# I. INTRODUCTION

Field-Programmable Gate Arrays (FPGAs) are widely used to implement special-purpose processors. FPGAs are cost-effective for small-lot production because the functions and interconnections of their logic resources can be programmed directly by end users. In commercial FPGAs, clock distribution power is a serious problem because an FPGA has far more registers than a custom VLSI. To solve the problems caused by the clock, asynchronous FPGAs have been proposed. Instead of a clock, asynchronous FPGAs use a handshake protocol between their components to perform the necessary synchronization and communication. Asynchronous FPGAs are therefore low power, because inactive circuits consume no dynamic power. The major disadvantage of FPGAs is their low performance, for the following reasons.

- The area and delay of a switch block become large since a switch block consists of many programmable switches.
- The time for data transfer between logic blocks becomes large since data from one logic block usually traverse through many switch blocks to reach the other logic block.

This paper presents a low-power FPGA that uses a LUT-level power-gating technique called autonomous fine-grain power gating. To reduce dynamic power consumption, we introduce a Level-Encoded Dual-Rail (LEDR) based architecture. A time-multiplexed LEDR-based architecture is a further encoding technique, proposed with a view to reducing both the area and the propagation delay of the entire circuit. The sleep controller is used to reduce the standby power consumed by the lookup table. The overall circuit occupies less area than existing methods; these techniques are therefore integrated to reduce the power consumption of our FPGA.

# II. RELATED WORK

### A. Asynchronous FPGA

An asynchronous, or self-timed, circuit is a digital logic circuit that has no global clock signal, unlike a synchronous circuit. In a synchronous circuit, changes to the signal values are triggered by repetitive pulses called the clock signal. Almost all digital devices use synchronous circuits, but asynchronous circuits have the potential to be faster and offer better modularity in large systems. Asynchronous encoding schemes are classified into (1) bundled-data encoding and (2) delay-insensitive encoding (usually dual-rail encoding) [2].



Figure 1. Dual-Rail Encoding for N Bits

### B. Four Phase Dual Rail Encoding

Four-phase dual-rail encoding [5] is the dual-rail encoding most commonly used by asynchronous FPGAs because of its relatively small hardware cost. Figure 2(a) shows an example where data values 0, 0 and 1 are transferred. The sender sends a spacer (0, 0) after each data value. The receiver detects the arrival of a data value by detecting the change of either bit from 0 to 1. The drawback of four-phase dual-rail encoding is its low throughput, which is caused by the insertion of spacers.
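The spacer insertion described above can be illustrated with a minimal Python sketch. The function names and the codeword assignment (data 0 as (0, 1), data 1 as (1, 0), written as (true rail, false rail)) are assumptions made for illustration; the text only fixes the spacer as (0, 0).

```python
# Minimal sketch of four-phase dual-rail encoding.
# Assumption: codewords are (true_rail, false_rail), with 0 -> (0, 1) and 1 -> (1, 0).
SPACER = (0, 0)


def encode_four_phase(bits):
    """Return the codeword stream for `bits`, with a spacer after every data value."""
    stream = []
    for bit in bits:
        stream.append((1, 0) if bit else (0, 1))  # valid data codeword
        stream.append(SPACER)                     # return-to-zero spacer
    return stream


def decode_four_phase(stream):
    """Recover the data values by skipping spacers and reading valid codewords."""
    bits = []
    for true_rail, false_rail in stream:
        if (true_rail, false_rail) == SPACER:
            continue
        if true_rail == false_rail:
            raise ValueError("(1, 1) is not a valid dual-rail codeword")
        bits.append(true_rail)
    return bits


stream = encode_four_phase([0, 0, 1])
print(stream)
print(decode_four_phase(stream))
```

Transferring three data values takes six codewords on the wires, which is the throughput cost of the spacers.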

### C. Level Encoded Dual Rail Encoding

LEDR encoding requires no spacer, which enhances the throughput of delay-insensitive encoding. Figure 2(b) shows an example where data values 0, 0 and 1 are transferred. The sender sends data values alternately in phase 0 and phase 1 [8]. The receiver detects the arrival of a data value by detecting the change of phase. The drawback of LEDR encoding is that it requires slightly more complex hardware.
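A minimal Python sketch of LEDR transfer follows. It assumes the common LEDR convention in which a codeword is a pair (value rail, redundancy rail) and the phase is the XOR of the two rails; this rail assignment, the starting phase and the function names are illustrative assumptions, not details given in the text.

```python
# Minimal sketch of LEDR encoding.
# Assumption: codeword = (value_rail, redundancy_rail), phase = value XOR redundancy.

def phase_of(codeword):
    """Phase of a LEDR codeword under the assumed convention."""
    value, redundancy = codeword
    return value ^ redundancy


def encode_ledr(bits, first_phase=0):
    """Encode `bits`, alternating between phase 0 and phase 1 with no spacers."""
    stream = []
    phase = first_phase
    for bit in bits:
        stream.append((bit, bit ^ phase))  # redundancy rail chosen to set the phase
        phase ^= 1                         # next data value uses the other phase
    return stream


def decode_ledr(stream, idle_phase=1):
    """Accept a codeword as new data only when its phase differs from the last one."""
    bits = []
    last_phase = idle_phase
    for codeword in stream:
        if phase_of(codeword) != last_phase:
            bits.append(codeword[0])
            last_phase = phase_of(codeword)
    return bits


stream = encode_ledr([0, 0, 1])
print(stream)
print(decode_ledr(stream))
```

Under this convention exactly one rail changes per transfer, and three data values need only three codewords.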



Figure 2. (a) Four-Phase Dual-Rail Encoding and (b) LEDR Encoding

### D. Power Gating Technique

To overcome the problems of coarse-grain power gating, we introduce a fine-grain power-gating technique. In fine-grain power gating, each lookup table has its own sleep controller and an associated sleep transistor, so while any one lookup table is active, all idle lookup tables can go to the sleep state. This reduces both standby power and dynamic power [2].



Figure 3. Control Strategy of the Proposed Power Gating

The control strategy of the proposed autonomous fine-grain power gating is shown in Figure 3. It relies on a standby state of the logic block (LB), which is used to do the following:

- Wake up the LB before the data arrives.
- Power off the LB only when no data arrives for quite a while.

The use of the standby state has two major advantages. First, the wake-up time can be hidden, since the LB has already been woken up when the data arrives. Second, dynamic power can be saved, since the number of unnecessary switching events of the sleep transistor is reduced.
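The role of the standby state can be sketched as a small state machine in Python. This is a simplified reading of the strategy, not the circuit itself: the three state names, the discrete time steps, the `wake_up` input and the `threshold` parameter are all illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    SLEEP = "sleep"      # power gated off
    STANDBY = "standby"  # powered, waiting for data
    ACTIVE = "active"    # powered, processing data


def next_state(state, idle_steps, data_arrived, wake_up, threshold):
    """Advance one (hypothetical) time step and return (new_state, new_idle_steps)."""
    if data_arrived:
        # Ideally the LB is already in STANDBY here, so the wake-up time is hidden.
        return State.ACTIVE, 0
    if wake_up:
        # Early wake-up request, issued before the data arrives.
        return State.STANDBY, 0
    if state is State.SLEEP:
        return State.SLEEP, 0
    idle_steps += 1
    if idle_steps >= threshold:
        # No data for `threshold` steps: only now is the LB powered off.
        return State.SLEEP, 0
    return State.STANDBY, idle_steps


def simulate(events, threshold):
    """Run a list of (data_arrived, wake_up) events starting from SLEEP."""
    state, idle_steps = State.SLEEP, 0
    trace = []
    for data_arrived, wake_up in events:
        state, idle_steps = next_state(state, idle_steps, data_arrived, wake_up, threshold)
        trace.append(state)
    return trace


events = [(False, True), (True, False), (False, False),
          (False, False), (False, False), (False, False)]
print([s.value for s in simulate(events, threshold=3)])
```

In this model a short gap between data values leaves the LB in standby, so the sleep transistor does not switch; it switches only after the idle threshold is reached.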

# III. ARCHITECTURE DESIGN

Figure 4 shows the overall structure of the proposed FPGA. The FPGA consists of a mesh-connected cellular array. A carry chain is used to perform high-speed addition. The data in the carry chain are always encoded in four-phase dual-rail encoding and are not converted to LEDR encoding, since the power and delay overheads of the protocol converters are much larger than those of the carry chain. Three wires (two for the data encoded using dual-rail encoding and one for the acknowledge) are required for a single data bit. Since the proposed FPGA is based on the Quasi-Delay-Insensitive (QDI) model, correct operation is theoretically guaranteed under any gate delays or wire delays [2], [5].



Figure 4. Proposed Logic Block Diagram

The proposed logic block consists of a lookup table, a sleep controller, registers and programmable delay elements. Each part of the logic block is described below.

### A. Lookup Table Design

The lookup table architecture consists of three sub-modules. Each sub-module consists of a decoder, a multiplexer and a memory register. The decoder is built from two eight-input AND gates, and its output is fed to the multiplexer. The decoders exclude invalid input patterns for each phase, so only valid data are fed to the multiplexer. As a result, the multiplexer is smaller and the transistor count is reduced compared to a multiplexer-type lookup table. If the input combination is invalid (i.e., if the two inputs have different phases), all pass transistors turn OFF according to the output of the decoder. The proposed lookup table is shown in Figure 5.



Figure 5. Proposed Lookup Table

The detailed structure of the decoder- and multiplexer-based lookup table is shown in Figure 6. If the input patterns are valid (i.e., if the two inputs have the same phase), the corresponding switching operation selects the value of a memory bit as the output, and the output is stored in the latches. Otherwise, the latches retain the previous outputs.
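The valid/invalid behaviour of the lookup table can be modelled at a purely behavioural level in Python. The class name `LedrLut`, the (value, redundancy) codeword convention with phase equal to the XOR of the rails, and the latch reset value of 0 are assumptions for illustration; the sketch does not model the decoder gates or pass transistors themselves.

```python
class LedrLut:
    """Behavioural sketch of a lookup table with LEDR inputs (hypothetical model)."""

    def __init__(self, memory_bits):
        self.memory_bits = list(memory_bits)  # configuration memory, indexed by input values
        self.latched = 0                      # output latch (assumed to reset to 0)

    def evaluate(self, inputs):
        """`inputs` is a list of (value, redundancy) codewords, most significant first."""
        phases = {value ^ redundancy for value, redundancy in inputs}
        if len(phases) != 1:
            # Invalid pattern: inputs are in different phases, so nothing is
            # selected and the latch keeps the previous output.
            return self.latched
        index = 0
        for value, _ in inputs:
            index = (index << 1) | value
        self.latched = self.memory_bits[index]  # selected memory bit is latched
        return self.latched


# Two-input AND function stored in the configuration memory.
lut = LedrLut([0, 0, 0, 1])
print(lut.evaluate([(1, 1), (1, 1)]))  # both inputs in phase 0, values 1 and 1
print(lut.evaluate([(0, 1), (1, 1)]))  # phases differ: previous output is held
print(lut.evaluate([(0, 1), (1, 0)]))  # both inputs in phase 1, values 0 and 1
```

The middle call shows the hold behaviour: the second input has not yet switched phase, so the output stays at the previously latched value.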



Figure 6. Detailed Structure of Lookup Table

### B. Sleep Controller

The sleep controller contains a phase comparator, a latch and a programmable delay, as shown in Figure 7. The sleep controller design is based on a reference design.



Figure 7. Sleep Controller

Each block in the sleep controller is described below.

*a) Phase Comparator Design:*

The phase comparator described here is for a four-input, one-output logic block. It is used to detect data arrival. The phase of each data item is extracted by an XOR gate. If PHASE-A, PHASE-B, PHASE-C and PHASE-D all differ from PHASEOUT, the logic block is active and the output is '1', which means that new data has arrived at the logic block. Otherwise, some data has not yet arrived, the logic block is not active, and the output is '0'. The phase comparator output and the wake-up signal of the previous logic block are fed to the programmable delay.

*b) Latch Design:*

Once the wake-up signal goes to '1', the latch retains the signal until all data arrive at the logic block. When all data have arrived at the logic block and no data arrives at the previous logic blocks, the output of the latch is reset to '0'.

*c) Programmable Delay Design:*

The function of the programmable delay is to delay the sleep signal by the predetermined threshold time. The programmable delay consists of a series of OR gates and several memory bits. The memory bits are used to program the delay time.
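The phase comparator and the programmable delay can be sketched behaviourally in Python. Several points are assumptions made for illustration: codewords are (value, redundancy) pairs whose phase is the XOR of the rails, the OR-gate chain is modelled as the OR of the last few samples of its input, time is discrete, and the latch is left out.

```python
from collections import deque


def phase(codeword):
    """Phase extracted by XOR of the two rails (assumed LEDR convention)."""
    value, redundancy = codeword
    return value ^ redundancy


def phase_comparator(inputs, output):
    """Return 1 when every input phase differs from the output phase, else 0."""
    phase_out = phase(output)
    return int(all(phase(cw) != phase_out for cw in inputs))


class ProgrammableDelay:
    """Holds its output high for `delay_steps` steps after the input falls.

    `delay_steps` stands in for the memory bits that program the delay time.
    """

    def __init__(self, delay_steps):
        self.history = deque([0] * (delay_steps + 1), maxlen=delay_steps + 1)

    def step(self, keep_awake):
        self.history.append(int(keep_awake))
        return int(any(self.history))  # OR over the current and previous samples


inputs = [(0, 0), (1, 1), (0, 0), (1, 1)]          # four inputs, all in phase 0
print(phase_comparator(inputs, output=(0, 1)))      # output still in phase 1

delay = ProgrammableDelay(delay_steps=3)
print([delay.step(x) for x in [1, 0, 0, 0, 0]])
```

In this model a rising input passes through immediately, while a falling input is delayed by `delay_steps` steps, so the sleep signal is asserted only after the block has been idle for the programmed time.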

# IV. RESULTS AND DISCUSSION

The developed models can be analysed for functional correctness using a top-down design methodology, starting from a high-level description. More detailed models are generated by progressively adding description detail that reflects hardware implementation aspects. The RTL (Register Transfer Level) code is expected to provide a better model for synthesis. The functional description of the architecture can then be simulated for verification and synthesized into actual hardware.

Various software tools support the design of individual components and their integration into a system, which is then verified by simulation. Synthesis involves analysing the Verilog code, synthesizing it for the target architecture, optimizing the design under constraints such as placement directives or delay specifications, and generating an optimized FPGA netlist. Placement and routing tools generate an optimal placement subject to delay constraints and then interconnect the logic using the routing resources available on the particular FPGA. The simulation and synthesis of the model have been carried out using Xilinx ISE Foundation Series 14.5, as per the design environment discussed.

### A. Simulation Outcome

The simulation results in Figure 8 show that the proposed logic block is in the active state when the data arrives. When the data arrives, the comparator output changes from "0" to "1", which marks the activation triggered by the data arrival. This has been repeated for various combinations of data, and the simulated results coincide with the expected results. When the sleep controller signal is high, the next logic block goes to the active state. When it is low, the next logic block goes to the standby state until the next data arrives at the logic block.



Figure 8. Simulation Results of LUT

### B. Power Analysis

The power required to implement the logic block on the target device, after incorporating the developed fine-grain power-gating LUT, has been estimated and is tabulated in Table I.

Table I. Power Report

| POWER SUMMARY | CONVENTIONAL LOGIC BLOCK | PROPOSED LOGIC BLOCK |
|---------------|--------------------------|----------------------|
| Power (mW)    | 30                       | 26                   |
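As a quick check on Table I, the relative power saving of the proposed logic block follows directly from the two tabulated values:

$$
\frac{P_{\text{conventional}} - P_{\text{proposed}}}{P_{\text{conventional}}} = \frac{30\ \text{mW} - 26\ \text{mW}}{30\ \text{mW}} \approx 13.3\%
$$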

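### C. Area Analysis

The logic utilization and delay are shown in Table II. The number of LUTs was found to be the same in both designs. According to Table II, the number of bonded Input Output Blocks (IOBs) rose from 10 in the conventional logic block to 41 in the proposed architecture, while the delay fell from 4.910 ns to 3.987 ns.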

Table II. Area and Time Analysis

| LOGIC UTILIZATION     | CONVENTIONAL LOGIC BLOCK | PROPOSED LOGIC BLOCK |
|-----------------------|--------------------------|----------------------|
| Number of bonded IOBs | 10                       | 41                   |
| Delay (ns)            | 4.910                    | 3.987                |

# V. CONCLUSION

This paper proposes an asynchronous FPGA that combines four-phase dual-rail encoding and LEDR encoding with a sleep controller to achieve small area, high throughput, low dynamic power and reduced standby power. The proposed architecture is also useful for large-scale.

# REFERENCES

- [1] J. Teifel and R. Manohar, "An asynchronous dataflow FPGA architecture," IEEE Transactions on Computers, vol. 53, no. 11, pp. 1376-1392, 2004.
- [2] S. Ishihara and M. Hariyama, "A low power FPGA based on fine grain power gating," IEEE Trans. (VLSI) Systems, vol. 19, no. 8, Aug. 2011.
- [3] M. Hariyama, S. Ishihara, and M. Kameyama, "A low-power field-programmable VLSI based on a fine-grained power gating scheme," in Proc. IEEE Int. Midw. Symp. Circuits Syst. (MWSCAS), Knoxville, Aug. 2008, pp. 430-433.
- [4] M. Hariyama, S. Ishihara, C. C. Wei, and M. Kameyama, "A field programmable VLSI based on an asynchronous bit-serial architecture," in Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC), Jeju, Korea, Nov. 2007, pp. 380-383.
- [5] Z. Xia, S. Ishihara, M. Hariyama, and M. Kameyama, "An asynchronous FPGA based on dual/single-rail hybrid architecture," Int'l Conf. Reconfigurable Systems and Algorithms (ERSA'12).
- [6] Xilinx Inc., San Jose, CA, "Spartan-3 FPGA family datasheet," 2009. [Online]. Available: http://www.xilinx.com