Vol. 06, No. 4 (2024) 1765-1770, doi: 10.24874/PES.SI.24.03.013



# **Proceedings on Engineering Sciences**



www.pesjournal.net

# **AREA AND POWER EFFICIENT LEAST MEAN SQUARE ADAPTIVE FILTER USING APPROXIMATE ARITHMETIC**

G. R. L. V. N. Srinivasa Raju<sup>1</sup>, Doondi Kumar Janapala, J. Naga Vishnu Vardhan, Chinnaiah M C, Praveen Kumar Nalli, Jannu Teja Sri

Received 25.10.2023. Received in revised form 26.11.2023. Accepted 04.01.2024 UDC - 004.383.3

### Keywords:

Finite Impulse Response (FIR), Rounding Based Approximate Multiplier (ROBA), Least Mean Square (LMS), Adaptive Filter (AF), Weight Update Block (WUB), Digital Signal Processing (DSP), Multiply and Accumulate (MAC)

ABSTRACT

The efficiency of a digital signal processing system heavily relies on the performance of multipliers, which are crucial arithmetic functional units. Approximate arithmetic techniques have emerged as a promising approach to significantly reduce circuit complexity, latency, and energy consumption. This paper presents a rounding-based approximate multiplier, grounded in approximate arithmetic principles, to execute a Least Mean Square (LMS) adaptive filter. Within the LMS adaptive filter, conventional multipliers are replaced with approximate arithmetic-based multipliers. These approximations simplify the multiplication operations, resulting in reduced area and power consumption. The LMS adaptive filter adjusts filter coefficients based on the LMS algorithm. This proposed system is realized using the Verilog hardware description language, and its performance is validated through simulation and synthesis using Xilinx ISE 14.7 simulator and Vivado design suite. Simulation results showed that implementing the LMS adaptive filter algorithm with rounding-based approximate multipliers yields a substantial reduction in area, latency, and power consumption.

© 2024 Published by Faculty of Engineering

# **1. INTRODUCTION**

Digital filters play a pivotal role in modern digital signal processing (DSP) applications. These filters are essential devices employed to shape and manipulate the spectral characteristics of a signal while rejecting unwanted or undesirable components. In DSP, one innovative category of filters is Adaptive Filters (AF), which holds a crucial position due to their ability to automatically adjust their coefficients based on adaptive algorithms, thereby enhancing their performance. Adaptive filters, in contrast to conventional linear filters, are nonlinear in nature, allowing them to adapt dynamically to changing input signals. One widely used algorithm for adapting filter coefficients is the Least Mean Squares (LMS) algorithm. LMS adaptive filters iteratively adjust their coefficients to minimize the error between the desired and actual filter outputs. This adaptation enables them to approximate and track time-varying signals accurately, making them indispensable in applications like noise cancellation, echo cancellation, and adaptive beamforming. Multipliers constitute a primary source of power consumption in digital signal processing circuits. To meet stringent power budgets in VLSI circuits, strategies are employed to reduce power consumption. One effective approach is to optimize the use of multipliers and adders within digital filters. By minimizing unnecessary data transitions and employing efficient arithmetic operations, power consumption can be significantly lowered.

<sup>1</sup> Corresponding author: G.R.L.V.N. Srinivasa Raju Email: grlvnsr@svecw.edu.in

In this context, approximate arithmetic designs have gained prominence. These designs offer a trade-off between accuracy and power efficiency. They employ techniques such as reduced-precision arithmetic and approximate algorithms to achieve computational savings while still delivering acceptable signal processing performance. This approach is particularly valuable in battery-powered and energy-efficient devices, where minimizing power consumption is paramount. Previous studies show that approximate computing has emerged as a promising paradigm for improving circuit performance, i.e., speed and power dissipation (Han & Orshansky, 2013). Earlier work on reducing the hardware complexity of the LMS AF assumed error-free arithmetic (Allred et al., 2005).

Khan et al. (2017) present the conditions for replacing the MAC unit with a LUT. It has also been shown that approximate multipliers based on logic compression reduce the complexity of the design (Qiqieh et al., 2017). In other work, Meher and Park (2014) present an efficient architecture for the delayed LMS AF and propose a strategy for optimized, balanced pipelining across the time-consuming combinational blocks of the structure. Khan and Ahamed (2017) propose an AF that uses the Offset Binary Coding (OBC) technique and removes the two oldest samples, which permits decomposition of the LUT. The ROBA approximate multiplier rounds each operand to the nearest power of two and is applicable to both signed and unsigned multiplication (Zendegani et al., 2017). An FIR filter designed using ROBA multipliers, with operands rounded to the nearest power of two, optimizes the multiplication process to reduce area and enhance processing speed. This approach restructures the multiplication operation to minimize computational complexity and maximize efficiency (Begum & Kumar, 2017). More recently, Esposito et al. (2019) evaluated the effect of approximate multipliers on the performance of the LMS algorithm and presented an approximate multiplier whose adjustable precision results in low hardware complexity.

The research gap addressed by this study is the underexplored use of approximate arithmetic methods to enhance area and power efficiency in adaptive filters. A more thorough investigation is needed to develop techniques that improve performance while minimizing computational complexity and resource utilization. With these motivations, this paper presents an LMS adaptive filter design that uses the ROBA multiplier to lower power consumption and to reduce area and delay.

The paper is structured as follows. Section 2 outlines the LMS adaptive filter algorithm. The design methodology is presented in Section 3. Section 4 presents the results and discussions. Section 5 concludes the paper and discusses the implications of the proposed design.

# **2. LMS ADAPTIVE FILTER**

The LMS AF adjusts the filter coefficients to adapt to the input signal and acts as negative feedback to lower the error between the FIR filter output and the desired signal (Riaz et al., 2018). Compared with other algorithms used to implement AFs, the LMS algorithm performs very well in terms of simplicity (Mohanty & Meher, 2013; Krishnamurthy et al., 2017; Sadeghi et al., 2019). An FIR filter is linear: its output is a linear function of the input signal, and it operates only on current and past input values. FIR filters are used to remove unwanted components from discrete input signals (Haykin, 2003; Prakash & Ahamed, 2013; Venkatachalam & Ko, 2017).



Figure 1. LMS Adaptive Filter

Figure 1 shows the input signal x(n), the filter output y(n), the desired signal d(n), and the error signal e(n). The structure shows how the output signal of the filter is determined from the input signal. The LMS AF varies the filter transfer function in accordance with the adaptive algorithm to enhance performance, and it requires a number of iterations equal to the length of the input signal (Farshchi et al., 2013). The LMS adaptive algorithm specifies how the parameters are modified from one time instant to the next. An input, a desired response, and the error between the desired response and the filter output are used to adjust the filter parameters (Haykin, 1996; Bhardwaj et al., 2014). The WUB adjusts the FIR filter coefficients using the LMS algorithm. The FIR filter uses the continually updated coefficients supplied by the WUB to calculate the output signal.

Filter output:

$$y(n) = \sum_{k=0}^{N-1} w_k(n)\, x(n-k) \tag{1}$$

where  $w_k(n)$  is the  $k$ -th tap weight at time  $n$  and  $N$  is the number of taps.

Estimation error:

$$e(n) = d(n) - y(n) \tag{2}$$

This measures the difference between the output of the adaptive filter and the output of the unknown system. On the basis of this measure, the adaptive filter changes its coefficients in an attempt to reduce the error.

Tap weight adaptation:

$$w(n+1) = w(n) + \mu\, e(n)\, x(n) \tag{3}$$

where  $w(n)$  and  $x(n)$  denote the vectors of the  $N$  tap weights and the  $N$  most recent input samples, and  $\mu$  is the step size.
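The three relations above (filter output, estimation error, tap weight adaptation) can be sketched in a few lines of Python. This is a floating-point illustration only, not the Verilog design described in this paper: the function names, the three-tap "unknown system" `h`, the step size `mu = 0.05`, and the random test input are all assumptions chosen for the example.

```python
import random


def fir(x, h):
    """Reference FIR filter, used here as the 'unknown system' that produces d(n)."""
    taps = [0.0] * len(h)            # taps[k] holds x(n - k)
    out = []
    for x_n in x:
        taps = [x_n] + taps[:-1]     # shift the newest sample into the delay line
        out.append(sum(hk * xk for hk, xk in zip(h, taps)))
    return out


def lms_identify(x, d, num_taps, mu):
    """Run the LMS recursion over x and d; return the final weights and the error history."""
    w = [0.0] * num_taps             # tap weights, initialised to zero
    taps = [0.0] * num_taps          # delay line of the most recent inputs
    errors = []
    for x_n, d_n in zip(x, d):
        taps = [x_n] + taps[:-1]
        y_n = sum(wk * xk for wk, xk in zip(w, taps))        # filter output
        e_n = d_n - y_n                                      # estimation error
        w = [wk + mu * e_n * xk for wk, xk in zip(w, taps)]  # tap weight adaptation
        errors.append(e_n)
    return w, errors


# Hypothetical test case: identify a 3-tap system from 2000 noise-free samples.
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
h = [0.5, -0.3, 0.2]
d = fir(x, h)
w, errors = lms_identify(x, d, num_taps=3, mu=0.05)
```

In this noise-free setting the weights `w` approach `h` and the error shrinks toward zero as the iterations proceed, which is the convergence behaviour the WUB is meant to provide. In hardware, the multiplications inside the output sum and inside the weight update are the operations that the multiplier block performs.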

# **3. DESIGN METHODOLOGY**

This section introduces the structure of the ROBA multiplier design. It is an efficient multiplier that lowers the power dissipation of circuits (Vasudeva Reddy et al., 2023). The sign detector detects the sign of each input variable; if an input is negative, its two's complement is taken so that the absolute value of each input is generated.
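The sign handling described above can be sketched as follows. This is a minimal illustration that assumes fixed-width two's complement words; the function names and the 8-bit and 16-bit widths are hypothetical and are not taken from the paper's Verilog code.

```python
from typing import Tuple


def detect_sign(word: int, width: int = 8) -> Tuple[bool, int]:
    """Return (is_negative, magnitude) for a `width`-bit two's complement word."""
    mask = (1 << width) - 1
    word &= mask
    negative = (word >> (width - 1)) == 1                 # the MSB is the sign bit
    magnitude = (~word + 1) & mask if negative else word  # two's complement if negative
    return negative, magnitude


def set_sign(magnitude: int, negative: bool, width: int = 16) -> int:
    """Re-apply the sign to an unsigned result, returning a `width`-bit word."""
    mask = (1 << width) - 1
    return (~magnitude + 1) & mask if negative else magnitude & mask
```

For instance, the 8-bit pattern `0xF3` represents -13, so `detect_sign(0xF3)` yields a negative flag and the magnitude 13; the unsigned multiplier core then works on magnitudes only, and `set_sign` restores the sign of the product.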



Figure 2. Block diagram of ROBA multiplier

The block diagram of the ROBA multiplier is shown in Figure 2; it is applicable to both unsigned and signed multiplication. The inputs are applied to the sign detector block, which examines the MSB of each input and passes the result to the sign set block when signed multiplication is selected. For unsigned multiplication, the sign detector and sign set blocks are disabled, which speeds up the multiplication process. The rounding block rounds each input to the nearest value of the form  $2^n$ . Denoting the rounded values of the inputs A and B by  $A_r$  and  $B_r$ , the product of A and B can be written as

$$AB = (A_r - A)(B_r - B) + A_r B + B_r A - A_r B_r \tag{4}$$
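The identity behind this decomposition comes from expanding the product of the two rounding errors and solving for  $AB$ :

$$(A_r - A)(B_r - B) = A_r B_r - A_r B - B_r A + AB \;\;\Longrightarrow\;\; AB = (A_r - A)(B_r - B) + A_r B + B_r A - A_r B_r$$

The relation is exact for any choice of  $A_r$  and  $B_r$ ; the approximation enters only when the first term on the right-hand side is dropped.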

The shifter blocks implement the product terms. Because  $A_r$  and  $B_r$  are powers of two, the terms  $A_rB_r$ ,  $A_rB$  and  $B_rA$  in equation (4) reduce to shift operations. The term  $(A_r - A)(B_r - B)$  is difficult to implement, so it is omitted, which simplifies the multiplication to the form shown in equation (5).

$$AB \approx A_r B + B_r A - A_r B_r \tag{5}$$
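A minimal Python sketch of equation (5) is given below, using only shifts, one addition and one subtraction. It is an illustration under stated assumptions, not the RTL of this paper: the function names are hypothetical, operands are treated as plain integers, and exact ties between two powers of two are rounded up, which is a choice made for this example.

```python
def round_pow2(a: int) -> int:
    """Round a positive integer to the nearest power of two (ties round up: an assumption)."""
    if a <= 0:
        raise ValueError("round_pow2 expects a positive integer")
    low = 1 << (a.bit_length() - 1)   # largest power of two not exceeding a
    high = low << 1                   # next power of two above it
    return low if (a - low) < (high - a) else high


def roba_unsigned(a: int, b: int) -> int:
    """Approximate a * b as A_r*B + B_r*A - A_r*B_r for non-negative integers."""
    if a == 0 or b == 0:
        return 0
    sa = round_pow2(a).bit_length() - 1   # log2(A_r)
    sb = round_pow2(b).bit_length() - 1   # log2(B_r)
    # Each product term has a power-of-two factor, so it is a left shift.
    return (b << sa) + (a << sb) - (1 << (sa + sb))


def roba_signed(a: int, b: int) -> int:
    """Signed wrapper: multiply the magnitudes, then set the sign of the result."""
    negative = (a < 0) != (b < 0)
    magnitude = roba_unsigned(abs(a), abs(b))
    return -magnitude if negative else magnitude
```

With A = 13 and B = 27, the sketch rounds to  $A_r = 16$  and  $B_r = 32$  and returns 432 + 416 - 512 = 336, against an exact product of 351; the difference of 15 is the omitted term (16 - 13)(32 - 27). With this rounding rule each operand moves by at most one third of its value, so the relative error of the sketch stays within 1/9 (about 11.1%).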

The adder block sums the shifted product terms, and the subtraction block removes the  $A_rB_r$  term to give the final result. Finally, the sign of the result is set according to the signs of the input variables (Maddela et al., 2021; Parvathi & Chinnaaiah, 2023). In the proposed design, the ROBA multiplier is used to implement the LMS AF. The structure continuously updates with the filter input and output. Figure 3 presents the structure of the LMS AF that uses the multiplier.



Figure 3. Structure of LMS AF using ROBA multiplier

In the proposed design, the multiplier is replaced with the ROBA multiplier, which gives lower area, delay, and power consumption than other approximate multipliers. When the filter output equals the desired signal, the error between the two is zero and the weights of the LMS AF match the weights of the FIR filter, which indicates good convergence. Each delayed input is multiplied by its corresponding coefficient, and the products are added to form the filter output. The proposed LMS AF, built on this multiplier, therefore achieves lower power consumption, area, and delay.

# **4. RESULTS & DISCUSSIONS**

The LMS Adaptive Filter, employing a ROBA multiplier, is designed using the Verilog hardware description language. The design was simulated and synthesized using both the Xilinx ISE 14.7 simulator and the Vivado design suite. The integration of the ROBA multiplier into the LMS Adaptive Filter yielded several notable advantages. Firstly, it demonstrated superior resource efficiency, consuming fewer FPGA resources. Specifically, the LMS AF using the ROBA multiplier occupied 540 slice Look-Up Tables (LUTs) and 576 slice registers. In contrast, traditional LMS AF implementations typically utilize more FPGA resources. Moreover, the design exhibited reduced power consumption, making it an attractive choice for power-sensitive applications. Lower power requirements contribute to enhanced energy efficiency and longer battery life in portable devices.

#### 4.1 RTL Schematic of Approximate Multiplier

Figure 4 depicts the RTL schematic of the proposed approximate multiplier used in the LMS AF algorithm. Testing was conducted to evaluate its functionality and performance. During simulation, its behaviour under different input conditions was examined and its accuracy was compared with that of traditional multipliers. These investigations indicated that the approximate multiplier can be implemented efficiently and used in real-world applications while striking a balance between accuracy and resource utilization.



Figure 4. RTL Schematic of Approximate Multiplier

#### 4.2 RTL Schematic of LMS AF using ROBA Multiplier

Figure 5 presents the RTL schematic of the LMS AF incorporating the ROBA multiplier. This design performs adaptive filtering while optimizing resource utilization and performance. To evaluate and implement the design, a two-step process involving simulation and synthesis was carried out using the Vivado design suite. In the simulation phase, the behaviour of the LMS AF with the ROBA multiplier was tested under various input conditions, allowing for an assessment of its accuracy and efficiency compared to traditional implementations. Following successful simulation, the synthesis step within Vivado converted the RTL description into a hardware configuration that can be deployed on FPGA devices. This process ensured that the LMS AF with the ROBA multiplier could be efficiently realized, offering a balance between computational performance and resource utilization. Ultimately, this design holds promise for applications requiring adaptive filtering with enhanced area efficiency and reduced power consumption.



Figure 5. RTL Schematic of LMS filter using ROBA Multiplier

#### 4.3 Simulation Waveform of LMS AF using ROBA multiplier

Figure 6 shows the simulation waveform for the LMS AF utilizing the ROBA multiplier, implemented and analyzed within the Vivado design suite. This waveform offers a visual representation of the filter's performance and behavior, enabling engineers to assess its effectiveness in adapting coefficients and achieving desired signal processing outcomes.



Figure 6. Simulation waveform of LMS Adaptive filters using ROBA multiplier

#### 4.4 Performance Comparison

The comparison presented in Table 1 shows the advantage of the ROBA multiplier in the execution of the LMS AF. Relative to the existing approximate multiplier, it achieves a 99.86% reduction in power consumption. The ROBA multiplier also occupies less FPGA area, utilizing about 6% fewer resources, which is important for optimizing chip real estate. Moreover, the ROBA multiplier contributes to a 10.04% reduction in signal propagation delay, making it valuable for applications demanding low-latency signal processing. These results underscore the ROBA multiplier's ability to enhance the performance and efficiency of LMS AF implementations.

**Table 1.** Performance Comparison of Approximate and ROBA Multipliers

| Parameter | (11) | (12) | Approximate Multiplier (10) | ROBA Multiplier (proposed) |
|-----------|------|------|-----------------------------|----------------------------|
| Delay (ns) | 595 | 517.5 | 11.228 | 10.1 |
| Power (mW) | 16.004 | 125.76 | 57877 | 81 |
| Area (slice LUTs + slice registers) | 23265 | 21105 | 1195 | 1116 |
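The percentage reductions quoted for Table 1 can be reproduced from the last two columns (approximate multiplier versus ROBA multiplier). The short Python check below is only an arithmetic illustration; the dictionary keys and function name are hypothetical, and the values are copied from the table.

```python
def percent_reduction(baseline: float, proposed: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# (approximate multiplier, ROBA multiplier) pairs taken from Table 1
table1 = {
    "delay_ns": (11.228, 10.1),
    "power_mw": (57877, 81),
    "area": (1195, 1116),
}

reductions = {name: percent_reduction(*pair) for name, pair in table1.items()}

# The ROBA area entry is the sum of the slice LUTs and slice registers reported in Section 4.
roba_area = 540 + 576
```

Rounded to two decimals, this gives about 10.05% for delay, 99.86% for power, and 6.61% for area, consistent with the 10.04%, 99.86%, and 6% figures quoted in the text up to rounding.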

From the investigations it is observed that the ROBA multiplier within the LMS AF design showed reduced signal propagation delay. This reduction in delay can be crucial in real-time signal processing applications, where minimizing latency is essential. Overall, the integration of the ROBA multiplier into the LMS Adaptive Filter represents a significant advancement in hardware design. It optimizes FPGA resource utilization, lowers power consumption, and reduces signal propagation delays, making it a compelling choice for various applications, such as communications, image processing, and embedded systems, where efficient and low-latency signal processing is paramount. This demonstrates the ongoing drive in the field of digital hardware design to strike a balance between performance and resource efficiency, offering solutions that meet the demands of modern technology.

# **5. CONCLUSION**

Adaptive filtering stands as a cornerstone in the realm of digital signal processing, enabling the refinement of signal characteristics. This paper presents an approach, realized in Verilog, aimed at enhancing both area efficiency and power conservation in the operation of the LMS AF by leveraging the ROBA multiplier. The design underwent comprehensive validation through simulation utilizing Xilinx ISE 14.7 and Vivado, two prominent FPGA development tools. The adoption of the ROBA multiplier led to power and delay improvements. Specifically, this design achieved a 99.86% reduction in power consumption, significantly contributing to energy-efficient signal processing. Furthermore, it occupied about 6% less FPGA area (slice LUTs and slice registers), demonstrating an efficient use of hardware resources. Additionally, a 10.04% reduction in signal propagation delay was observed, which is crucial for applications demanding minimal latency. As for future directions, further exploration can focus on refining the ROBA multiplier's design and exploring its applicability in other adaptive filtering techniques. Research can also extend to optimizing the trade-off between power savings and signal processing accuracy, ensuring that the system remains adaptable to varying requirements. Moreover, investigating the scalability of this approach for more complex and demanding applications could pave the way for broader adoption. In summary, the proposed design not only presents immediate benefits in power efficiency, area utilization, and signal latency but also offers a foundation for future research and applications in the field of adaptive filtering.

#### **References:**

- Allred, D., Yoo, H., Krishnan, V., Huang, W., & Anderson, D. V. (2005). LMS adaptive filters using distributed arithmetic for high throughput. *IEEE Transactions on Circuits and Systems I: Regular Papers*, 52(7), 1327–1337. doi: 10.1109/tcsi.2005.851731.
- Begum, S., & Kumar, M. V. (2017). Design of FIR filter using rounding based approximate (ROBA) multiplier. *International Journal of Scientific Engineering and Technology Research*, 6(22), 4483–4489.
- Bhardwaj, K., Mane, P., & Henkel, J. (2014). Power- and area-efficient Approximate Wallace Tree Multiplier for error-resilient systems. *Fifteenth International Symposium on Quality Electronic Design*, Santa Clara, CA, USA, 263–269. doi: 10.1109/isqed.2014.6783335.
- Esposito, D., De Caro, D., Di Meo, G., Napoli, E., & Strollo, A. (2019). Low-Power hardware implementation of Least-Mean-Square adaptive filters using approximate arithmetic. *Circuits, Systems, and Signal Processing*, 38(12), 5606–5622. doi: 10.1007/s00034-019-01132-y.
- Farshchi, F., Abrishami, M. S., & Fakhraie, S. M. (2013). New approximate multiplier for low power digital signal processing. *Proc. 17th Int. Symp. Comput. Archit. Digit. Syst.*, 25–30.
- Han, J., & Orshansky, M. (2013). Approximate computing: An emerging paradigm for energy-efficient design. *18<sup>th</sup> IEEE European Test Symposium (ETS)*, Avignon, France, 1–6. doi: 10.1109/ETS.2013.6569370.
- Haykin, S. (1996). Adaptive filter theory (3rd ed.). Prentice Hall.
- Haykin, S., & Widrow, B. (Eds.). (2003). Least-Mean-Square Adaptive Filters. Wiley. ISBN 0-471-21570-8.
- Khan, M. T., & Ahamed, S. R. (2017). VLSI implementation of throughput efficient distributed arithmetic based LMS adaptive filter. In B. Kaushik, S. Dasgupta, & V. Singh (Eds.), *VLSI design and test* (Vol. 711, pp. 19–26). Springer. https://doi.org/10.1007/978-981-10-7470-7\_3
- Khan, M. T., Ahamed, S. R., & Brewer, F. (2017). Low Complexity and Critical Path Based VLSI Architecture for LMS Adaptive Filter Using Distributed Arithmetic. 2017 30<sup>th</sup> International Conference on VLSI Design and 2017 16<sup>th</sup> International Conference on Embedded Systems (VLSID), Hyderabad, India, 127–132. doi: 10.1109/vlsid.2017.16.
- Krishnamurthy, S., Kannan, R., Yahya, E. A., & Bingi, K. (2017). Design of FIR filter using novel pipelined bypass multiplier. 2017 IEEE 3<sup>rd</sup> International Symposium in Robotics and Manufacturing Automation (ROMA), Kuala Lumpur, Malaysia, 1-6. doi: 10.1109/roma.2017.8231838.
- Maddela, V., Sinha, S. K., & Parvathi, M. (2021). Extraction of undetectable faults in 6T-SRAM cell. 2021 International Conference on Communication, Control and Information Sciences (ICCISc). doi: 10.1109/iccisc52257.2021.9484987.
- Meher, P. K., & Park, S. (2014). Area-Delay-Power Efficient Fixed-Point LMS Adaptive Filter with Low Adaptation-Delay. *IEEE Transactions on Very Large Scale Integration Systems*, 22(2), 362–371. doi: 10.1109/tvlsi.2013.2239321.
- Mohanty, B. K., & Meher, P. K. (2013). A high-performance energy-efficient architecture for FIR adaptive filter based on new distributed arithmetic formulation of block LMS algorithm. *IEEE Transactions on Signal Processing*, 61(4), 921–932. https://doi.org/10.1109/TSP.2012.2226453
- Parvathi, M., & Chinnaaiah, M. C. (2022). Implementation and performance evaluation of hybrid SRAM architectures using 6T and 7T for Low-Power applications. In *Lecture notes in networks and systems*, 235–245. doi: 10.1007/978-981-19-4990-6\_22.
- Prakash, M. S., & Ahamed, S. R. (2013). Low-Area and High-Throughput architecture for an adaptive filter using distributed arithmetic. *IEEE Transactions on Circuits and Systems II: Express Briefs*, 60(11), 781–785. doi: 10.1109/tcsii.2013.2281747.
- Qiqieh, I., Shafik, R., Tarawneh, G., Sokolov, D., & Yakovlev, A. (2017). Energy-efficient approximate multiplier design using bit significance-driven logic compression. *Design, Automation & Test in Europe Conference & Exhibition (DATE)*, 7–12. doi: 10.23919/date.2017.7926950.
- Riaz, M., Ahmed, S. A., Javaid, Q., & Kamal, T. (2018). Low power 4×4 bit multiplier design using dadda algorithm and optimized full adder. 2018 15<sup>th</sup> International Bhurban Conference on Applied Sciences and Technology (IBCAST), 392-396. doi: 10.1109/ibcast.2018.8312254.
- Sadeghi, M., Zahedi, M., & Ali, M. (2019). The Cascade Carry Array Multiplier a novel structure of digital unsigned multipliers for Low-Power consumption and Ultra-Fast applications. *Annals of Emerging Technologies in Computing*. doi: 10.33166/aetic.2019.03.003.
- Vasudeva Reddy, T., Madhava Rao, K., Santhosh Kumar, V., & Hindumathi, V. (2023). Energy Efficient Memory Architecture for High Performance and Low Power Applications Under Sub-threshold Regime. *Communication, Software and Networks. Lecture Notes in Networks and Systems*, vol 493. Springer, Singapore. doi: 10.1007/978-981-19-4990-6\_21.
- Venkatachalam, S., & Ko, S. (2017). Design of power and area efficient approximate multipliers. *IEEE Transactions on Very Large Scale Integration Systems*, 25(5), 1782–1786. doi: 10.1109/tvlsi.2016.2643639.
- Zendegani, R., Kamal, M., Bahadori, M., Afzali-Kusha, A., & Pedram, M. (2017). ROBA Multiplier: a Rounding-Based approximate multiplier for High-Speed yet Energy-Efficient digital signal processing. *IEEE Transactions on Very Large Scale Integration Systems*, 25(2), 393–401. doi: 10.1109/tvlsi.2016.2587696.

**G. R. L. V. N. S. Raju** Department of ECE, Shri Vishnu Engineering College for Women, Bhimavaram, India <u>grlvnsr@svecw.edu.in</u> ORCID 0000-0003-1157-1856

**J Naga Vishnu Vardhan** Department of ECE, BVRIT Hyderabad College of Engineering for Women, India <u>vishnu.j@bvrithyderabad.edu.in</u> ORCID 0000-0002-6257-7644

**M. Venkata Subbarao** Department of ECE, Shri Vishnu Engineering College for Women, Bhimavaram, India <u>mandava.decs@gmail.com</u> ORCID 0000-0001-5840-2190

**Chinnaiah M C** Department of ECE, B V Raju Institute of Technology, Medak 502313, India <u>chinnaaiah.mc@bvrit.ac.in</u> ORCID 0000-0002-1489-7686

**Doondi Kumar Janapala** Department of ECE, Vishnu Institute of Technology, Bhimavaram, India <u>jdoondikumarece@gmail.com</u> ORCID 0000-0001-5346-0877

**Jannu Teja Sri** Department of ECE, Shri Vishnu Engineering College for Women, Bhimavaram, India <u>jannutejasri@gmail.com</u> ORCID 0009-0005-5522-2043