Low Power Implementation Of Ternary Content Addressable Memory (TCAM)

Prithwiraj Das, Ria Pathak, P. Augusta Sophy Beulet

Abstract: In network routers, Ternary Content Addressable Memory (TCAM) based search engines take an important role. One of the improved versions of Content Addressable Memory (CAM) is TCAM. For high speed and broader searching operation TCAM is used. Unlike normal CAM, TCAM has 3 logic states: 0, 1, ‘X’. In TCAM within one single clock cycle, search operation can be performed. That is why it is called special type of memory. Also, quick search ability is one of the popular features of TCAM. To compare the search and stored data, TCAM array acts parallel in every location. But high power dissipation is the main disadvantage of TCAM. To overcome this power dissipation in this paper we proposed a low power TCAM implementation by using Reversible logic.[2] Reversible logic has less heat dissipating characteristics property with respect to irreversible gate. Also, Reversible logic has ultra-low power characteristics feature. In recent past it has been proved that reversible gates can implement any Boolean function.

Keywords: TCAM, CAM, Reversible Logic, Fredkin Gate, Feynman Gate, Peres Gate, Taffoli Gate

I. INTRODUCTION

In today’s life, we need high-performance memory for network system. CAM is one such type of high performance memory. Functionality of TCAM is different from Random Access Memory (RAM). RAM is used to store information that it needed very often while the computer is running. But the drawback is it cannot store any data permanently. CAM is a hardware search engine. It is much faster than any algorithmic operation for search intensive application. CAM is basically constructed of SRAM with an additional comparison circuitry. This kind of circuitry can complete search operation in one clock cycle. CAM mainly uses two search intensive techniques namely, packet forwarding and packet classification to improve the speed. One of the improved versions of Content Addressable Memory is Ternary Content Addressable Memory (TCAM). Based on contents, Ternary Addressable Memory (TCAM) elects a word among stored ternary data. In one clock cycle TCAM compares the search key with the entire stored words in parallel and gives the address of the matched output with matching word. 0, 1, “X”[3] (don’t care state) are 3 states of a TCAM. TCAM architecture is more complex than the conventional CAM architecture. Binary CAM utilizes information search words comprising altogether 1s and 0s. In any case, in ternary CAM a third coordinating state "X" or "couldn't care less" put away at least one bit in the information word as shown in Fig. 1. This additional bits comes at an extra cost over the paired CAM as the inside memory cell should now encode three potential states rather than the two of binary CAM. By a mask bit, this additional state is implemented to every memory cell.

In various fields we can find the application of TCAM. Network routers are the main application of TCAM. In the network switches each address has 2 sections. One is system prefix, which can change in size contingent upon the subnet setup and the host address which involves the rest of the bits. Each subnet has organized musk. This system musk indicates which bits of the location are system prefix and which bits is the host address. Steering is finished by the counseling of a directing table which is kept up by the switch. This switch contains each realized system prefix, the related system cover, and the data expected to course bundles to that goal. Ternary CAM makes the lookup process very efficient for the routing table. By using “don’t care” the addresses are stored for the host part of the address.

Intrusion detection, pattern searching in genes, image processing, data compression, artificial intelligence, radar applications, signal tracking, pattern matching in virus detection [4] are the other applications of TCAM.

II. REVERSIBLE LOGIC

Everyday new faster, smaller and complex technology is developing. For achieving higher speed the clock frequency is increased. Increase in number of transistors into a chip makes that chip architecture more complex and the power consumption also increases. Mostly, the gates used to perform logical operations are irreversible. It means every time when some information about the input is deleted or lost and that lost information is
dissipated as heat. In digital design one of the important performance parameter is energy loss. Over the last decades this heat loss is reduced dramatically by higher level of integration and also by new fabrication processes. Reversible logic can also reduce power dissipation of a circuit.

By definition logic function is called reversible if the multiple output Boolean function of ‘n’ variables F(x1, x2 …. xn) [5] has

- The number of outputs equivalent to the number of inputs.
- Any output pattern with an exceptional pre-image.

In a reversible gate, one to one mapping is coordinated between vectors of input and output; so the vector of input states can be constantly remade. The output states vector does the reconstruction. A computation is called reversible if from output it is possible to recover the input.

Main properties of reversible logic is

- Feedback is not allowed.
- Fan-out is not allowed.

Swap, inverter and buffer gates are naturally reversible because they have same no of input and output.

Reversible logic came from the 2nd law of thermodynamics. It states that “A state transition from one to another equilibrium state is thermodynamically reversible, if and only if the final state can be restored to the initial state, without remaining any effect on the outside world.”

Reversible logic conserves power and hence reduces the power dissipation in a circuit design. This is the main difference of reversible logic and irreversible logic. In irreversible logic there is energy dissipation because of information loss.

The amount of energy dissipation for every bit operation (which are irreversible by nature) is minimum KTln2 joules, where K= Boltzmann constant and T is temperature at which the performance is performed. In room temperature because of the bit loss heat generated. This heat is very small at room temperature. But for higher computation, when the number of bits lost is more than the heat dissipation became larger. It affects the performance of that circuit. Both forward and backward process can be run by reversible logic. Reversible rationale can contribute to more yield. This yield can go back and forth back to any point in the computation history. Four types of reversible gates are used to implement our proposed work.

a. Fredkin gate
b. Feynman gate
c. Peres gate
d. Taffoli gate

A. Fredkin Gate

3*3 Fredkin gate structure is explained in Fig. 2. Here the input vectors are A, B, C and the output vectors are A, ~AC XOR AB, ~AB XOR AC. The Quantum cost of this gate is 5.

B. Feynman Gate

Fig. 3. 2*2 Feynman Gate

Fig. 3. is a 2*2 Feynman gate with input vectors A, B and the output vectors are A and A XOR B with a Quantum cost of 1. This gate is used as a copying gate. As the fan-out is not allowed in a reversible gate, this Feynman gate can be used as replacement for duplication of the required output.

C. Peres Gate

Fig. 4. 3*3 Peres Gate in both email address.

3*3 Peres gate structure is described in Fig. 4. Input vectors are A, B, C, and the output vectors are P=A, Q=A XOR B, R=AB XOR C [5]. The Quantum cost of this gate is 4.

D. Taffoli Gate

Fig. 5. Taffoli Gate

3*3 Taffoli gate is mentioned in Fig. 5. Input vectors are A, B, C and output vectors are P=A, Q=B, R=AB XOR C . The Quantum cost [6] of a 3*3 Taffoli gate is 5.

III. SRAM CELL USING REVERSIBLE GATE

The Reversible SRAM cell is explained in Fig. 6. In this proposed design we use 2 gates to implement SRAM cell. Only a single bit data can be stored in this SRAM cell. This cell has 2 states, i.e. HOLD and READ/WRITE. Working principle of this reversible SRAM cell depends on the WL. When WL=0, SRAM cell
is in HOLD state and when WL=1 SRAM cell operates in READ/WRITE mode. Output of Reversible SRAM cell is shown in Fig. 7. Whereas Fig. 8. Describes the schematic of Read Write SRAM CELL.

![Schematic diagram of SRAM CELL](image)

**Fig. 6. Schematic diagram of SRAM CELL**

<table>
<thead>
<tr>
<th>Write line</th>
<th>q(n)</th>
<th>Data</th>
<th>Q(n+1)</th>
<th>Bit</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

**Table 1: Truth Table of SRAM CELL**

In this work, we have implemented the TCAM Cell [8] using reversible SRAM. TCAM Cell can be constructed by using PERES gate as well as TAFFOLI gate. At first write (WL) signal and input data is given to the SRAM as inputs. The output of the SRAM cell is given to 2*2 Feynman gate as an input which is connected to another 2*2 Feynman gate, followed by a 3*3 Fredkin gate. And at the end, this XORed result is given as input to the Peres gate. From this gate the Match line (ML) output is obtained. This match line (ML) will be HIGH (1) when the search data bit (sl) and stored data bit (b) matches for match bit (mb) be 0.

![Simulation results of SRAM CELL](image)

**Fig. 7. Simulation results of SRAM CELL**

**IV. READ WRITE SRAM CELL**

Fig. 8. illustrates the structure of read/ write SRAM cell using reversible gates. Here the Row Decoder (WL) signal controls the entire row. Quantum cost of this cell is 16 [7].

This TCAM cell can be implemented using Taffoli also. It can be constructed by replacing the PERES gate with TAFFOLI gate as in Fig. 11. But the Quantum cost is large in the second case which also affects the worst case delay of the circuit. But the garbage output is same for both designs.

**VI. TCAM CELL WITH TAFFOLI GATE**

The TCAM cell can be implemented with Taffoli gates as well. In this structure we can replace the Peres gate with Taffoli gate. After that we can calculate the delay and quantum cost of both TCAM. For Taffoli gate delay and quantum cost is more than Peres gate. But the garbage value [9] is same for both cases. Table 2 is the comparison between Taffoli gate based
TCAM and Peres gate based TCAM. Here we have compared both TCAM cell using Taffoli gate and Peres gate. We can say from this table that for a single bit TCAM with Peres gate Quantum cost and delay case is better than single bit TCAM using Taffoli gate. The RTL view is shown in Fig. 12. Simulation resultS of TCAM cell is shown in Fig. 13.

<table>
<thead>
<tr>
<th>Model</th>
<th>Quantum Cost</th>
<th>Worst Case Delay</th>
<th>Garbage Output</th>
</tr>
</thead>
<tbody>
<tr>
<td>Single bit TCAM with Taffoli gate</td>
<td>18</td>
<td>18</td>
<td>6</td>
</tr>
<tr>
<td>Single bit TCAM with Peres gate</td>
<td>17</td>
<td>17</td>
<td>6</td>
</tr>
</tbody>
</table>

Table 3: Truth Table of TCAM CELL

<table>
<thead>
<tr>
<th>Stored Data bit (sl)</th>
<th>Search data bit (b)</th>
<th>Match Data bit (mb)</th>
<th>Match Data line (ml)</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>0</td>
<td>1</td>
<td>1</td>
</tr>
<tr>
<td>0</td>
<td>1</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

VII. 4*3 TCAM ARRAY ARCHITECTURE

In 4*3 array 4 numbers of each 3 bits can be stored in 4 rows [10]. From this array we can check the number where it is saved in this array. Basically we can be informed where it is saved in this array. In this array every TCAM will check one bit. When we try to search a number, we give every bit in the column for searching. When one bit is matched, match line shows ‘1’ (HIGH), like this if every bit matched with stored bit every match line became one. But this case will happen for one memory location only. But if any one of the bits is not matched it becomes 0 (LOW). For this reason we give every match line to the AND gate. If all the match line is 1 then only the AND gate output becomes 1. All the AND gate outputs are sent to the 4:2 Encoder. Between those four locations which location matches the searched data, the decoder will give that matched location in single clock cycle. Fig. 14. describes the working of 4*4 TCAM array.
In the case of 4*4 TCAM array, we need sixteen TCAM unit cells, four AND gates and one 4:2 encoder. At first, we store four 4 bits data into the memory. Then we search for the memory location of the searched data. Each bit of searched data (s) is given to each of the columns to search in parallel. Then for each row, the match line (ML) outputs of all the columns are sent to one AND gate. Whenever all the match lines are one then only the AND gate output becomes one (HIGH). This indicates the matched location in the array. The encoder will provide the matched location of the TCAM array memory. We can use this TCAM array as virtual memory. Fig. 15 describes the simulation result of 4*4 TCAM array.

For this reason we give every match line of a row to AND gate as inputs. If all the match line is 1 then only the AND gate output became 1. We are using 5:3 encoder to locate the address of the matched data. Fig. 16 describes the architecture of 5*4 TCAM array and Fig. 17 describes the simulation result of 5*4 TCAM array.

VIII. PROPOSED 5*4 TCAM ARRAY ARCHITECTURE

In 5*4 array store five numbers in 5 row, each number having 4 bits. From this array we can check where the search data is stored in this array. In this array, every TCAM can check the matching of one bit. When we try to search a number, we give every bit in the column for searching. When one bit is matched with stored data (b), match line shows one. Like this if every searched data match with stored bit, then every match line became one. But if any of them is not matched then ML becomes 0.
IX. CONCLUSION

90nm based Spartan-3 FPGA is proved to work efficiently with TCAM using reversible logic. In this paper, we implemented different types of TCAM array and checked there working process. We made this TCAM cells with reversible logic i.e. Fredkin Gate, Feynman Gate, Taffoli Gate and Peres gate to reduce the heat dissipation in TCAM. Also we noticed that Peres gate is better than Taffoli gate when it comes to quantum cost and delay case. We have also proposed a new architecture of TCAM array using AND gates instead of parity Encoder. It also reduces the complexity of the circuit.

REFERENCES


AUTHORS PROFILE

Mr. Prithviraj Das has completed his Bachelor of Engineering in the stream of Electronics and communication Engineering from RCC Institute of Information Technology, Kolkata. Now he is pursuing his Masters of Technology in VLSI Design from Vellore Institute of Technology University, Chennai. His area of interest is Digital IC Design, DFT. Currently, he is an Technical graduate intern at Intel Corporation.

Ms. Ria Pathak completed her Bachelor of Engineering in the stream of Electronics and communication Engineering from Dr. Sudhir Chandra Sur Degree Engineering College, Kolkata, West Bengal and pursuing her Masters in the field of VLSI Design from Vellore Institute of Technology University, Chennai. Her area of interests includes VLSI, IoT and Artificial Intelligence.

Being a lecturer for more than 25 years, Dr. P. Augusta Sophy, has developed a passion towards teaching Engineering subjects and guiding projects. The passion has naturally been extended towards developing the students too. Have handled more than 20 different subjects in Electronics and Communication Engineering and guided more than 50 projects in UG and PG levels. She has published 24 research papers in International Conferences and Journals. She did her research work in the field of VLSI Signal Processing in Anna University, in 2015. Received her M.E degree from Anna University and finished her B.E degree in Government College of Engineering, Tirunelveli. Now she is working as Associate Professor in VIT University, Chennai.