Abstract
With the increasing demand for the deployment of machine learning
models on energy-efficient and low-latency devices, TinyML stands out
as an efficient solution for enabling intelligence on edge-constrained
devices. TinyML workloads often need energy-efficient hardware for
reliable deployment of machine learning models, yet existing hardware
often lacks resources suited to efficient computation. The
multiply-accumulate (MAC) unit plays a key role in determining the
energy efficiency of edge-constrained TinyML hardware. To bridge this
gap, this work
presents a novel architecture: a low-power, dynamic, bit-width-adaptive
8-bit multiply-accumulate unit for TinyML accelerators. The
architecture provides dynamic, multi-precision, bit-width-adaptive
computation, supporting mixed-precision modes such as 2 × 2,
2 × 4, 2 × 8, 4 × 4, 4 × 8 and 8 × 8 with signed × unsigned support,
making it highly scalable for TinyML accelerators. In addition,
zero-aware gating and clock gating are implemented using a
shift-and-add multiplier that enables partial-product elimination and a
hybrid carry-lookahead adder (CLA) based accumulator that enables
dynamic segment-wise activation, targeting energy efficiency in
TinyML accelerators. The proposed architecture is simulated and verified
in the eSim EDA tool and synthesized at the 130 nm technology node
using Google SkyWater’s SKY130 PDK and the open-source EDA
toolchain OpenLANE. The proposed multiply-accumulate unit
reduces power by 59.36%, 68.78%, 74% and 80% compared to
PS4MAC, a state-of-the-art (SotA) mixed-precision MAC, the Synopsys
DesignWare (DW) MAC and an approximate MAC unit, respectively.
Relative to these prior works, the proposed architecture offers a
lower-power MAC building block for energy-efficient TinyML
accelerators.
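
The zero-aware shift-and-add multiplication and the bit-width modes described in the abstract can be illustrated with a small behavioural model. The sketch below is not the authors' RTL. It assumes that the first operand is the signed one and the second is unsigned, that each mode is written as (signed width, unsigned width), and that "partial product elimination" means skipping the add for every zero multiplier bit. The names `MODES`, `to_signed`, `shift_add_multiply` and `mac` are hypothetical, and the accumulator is an unbounded Python integer rather than a segmented CLA.

```python
# Behavioural sketch only; not the authors' RTL. All names are hypothetical.

# Mode list taken from the abstract, written as (signed_bits, unsigned_bits).
# Which operand carries which width is an assumption made for illustration.
MODES = {(2, 2), (2, 4), (2, 8), (4, 4), (4, 8), (8, 8)}


def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def shift_add_multiply(a, b, a_bits, b_bits):
    """Signed x unsigned shift-and-add multiply.

    Returns (product, partial_products_added) so the effect of skipping
    zero partial products is visible.
    """
    if (a_bits, b_bits) not in MODES:
        raise ValueError(f"unsupported mode {a_bits} x {b_bits}")
    a = to_signed(a, a_bits)      # signed multiplicand
    b &= (1 << b_bits) - 1        # unsigned multiplier
    if a == 0 or b == 0:
        return 0, 0               # zero operand: the whole multiply is skipped
    product, added = 0, 0
    for i in range(b_bits):
        if (b >> i) & 1:          # add only for multiplier bits that are 1
            product += a << i
            added += 1
    return product, added


def mac(pairs, a_bits, b_bits):
    """Accumulate products of (a, b) pairs; returns (sum, partial products added)."""
    acc, added = 0, 0
    for a, b in pairs:
        p, n = shift_add_multiply(a, b, a_bits, b_bits)
        acc += p
        added += n
    return acc, added


if __name__ == "__main__":
    # 3*5 + 0*7 + (-2)*4 = 7, using 3 partial-product additions in total.
    print(mac([(3, 5), (0, 7), (-2, 4)], 4, 4))  # (7, 3)
```

In this model a narrower mode shortens the loop over multiplier bits, and zero operands or zero bits reduce the number of additions, which is the kind of switching activity that gating is meant to suppress in hardware. The count of additions is only a proxy and says nothing about the power figures reported above.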
Authors
Shyam Perika (1), Boddu Ajay (2), Sumanto Kar (3)
(1, 2) Rajiv Gandhi University of Knowledge Technologies, India; (3) Indian Institute of Technology Bombay, India
Keywords
TinyML Accelerators, Ultra-low Power, Dynamic bit width adaptive, MAC Architecture