Abstract
Edge AI accelerators have emerged as a critical component for real-time
inference under strict power and latency constraints. Conventional
accelerator architectures focus on exact computation, which limits the
achievable energy efficiency in resource-constrained edge environments.
Approximate computing has gained attention as a promising paradigm that
trades controlled accuracy loss for significant gains in power and
performance. However, most existing approximate designs remain static
and application-specific, which reduces their adaptability across
diverse AI workloads. The primary challenge lies in designing an
architecture that supports dynamic accuracy–energy trade-offs while
maintaining acceptable inference quality. Fixed approximation levels
fail to respond to varying workload sensitivities, data distributions,
and quality-of-service requirements. As a result, edge AI systems
suffer from either unnecessary energy consumption or unacceptable
accuracy degradation. This work proposes a reconfigurable approximate
computing architecture that enables runtime adaptation of approximation
levels within an edge AI accelerator. The architecture integrates
configurable approximate arithmetic units, adaptive precision control,
and a lightweight reconfiguration controller that monitors workload
characteristics. Approximation modes targeting multipliers, adders, and
accumulation paths are selectively activated based on layer-wise
sensitivity analysis. A design framework supporting rapid switching
between accuracy modes is implemented and evaluated on representative
convolutional and transformer-based inference workloads.
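The layer-wise, sensitivity-driven mode selection described above can be sketched as follows. This is a minimal illustration only: the sensitivity scores, threshold values, mode names, and `select_mode` function are hypothetical assumptions, not the paper's actual controller parameters.

```python
# Illustrative sketch of sensitivity-aware approximation mode selection.
# All names, thresholds, and sensitivity values below are hypothetical.

def select_mode(sensitivity, tau_low=0.1, tau_high=0.3):
    """Map a layer's accuracy sensitivity to an approximation mode."""
    if sensitivity < tau_low:
        return "aggressive"   # approximate multipliers, adders, accumulators
    elif sensitivity < tau_high:
        return "moderate"     # approximate multipliers only
    return "exact"            # sensitive layer: keep full precision

# Example: per-layer sensitivities from an offline analysis (hypothetical)
layer_sensitivity = {"conv1": 0.35, "conv2": 0.18, "fc": 0.05}
modes = {name: select_mode(s) for name, s in layer_sensitivity.items()}
print(modes)  # {'conv1': 'exact', 'conv2': 'moderate', 'fc': 'aggressive'}
```

A runtime controller could re-run this selection whenever monitored workload characteristics change, which is what enables rapid switching between accuracy modes.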
Experimental evaluation demonstrates that the proposed architecture
reduces energy consumption from 3.3 mJ to 3.05 mJ across thresholds
(τ₁ = 0.1 to τ₃ = 0.3) while maintaining inference accuracy within
1.9% deviation of the exact baseline. Compared with the exact baseline
accelerator, energy savings reach up to 36%, and latency decreases
from 16.2 ms to 15.4 ms. Energy–accuracy efficiency (η) achieves 0.75,
outperforming static and learning-based approximate accelerators.
These results indicate that sensitivity-aware reconfigurable
approximation effectively balances energy efficiency and output
quality, providing a practical solution for diverse edge AI workloads.
Authors
Pitty Nagarjuna1, Shaik Mohammed Rizwan2
Indian Institute of Science, Bengaluru, India1, Jazan University, Saudi Arabia2
Keywords
Approximate Computing, Edge AI Accelerators, Reconfigurable Architecture, Energy Efficiency, Adaptive Precision