HW/SW Collaborative Techniques for Accelerating TinyML Inference Time at No Cost

Abstract

With the unprecedented boom in TinyML development, optimizing Artificial Intelligence (AI) inference on resource-constrained microcontrollers (MCUs) is of paramount importance. Most existing works focus on reducing peak memory or computation, partitioning tasks in a patch-based or device-based manner during execution; however, this comes at the price of latency and communication overhead. In this paper, we propose several techniques to accelerate the Convolutional Neural Network (CNN) inference process. These techniques are both architecture- and application-aware. From the application perspective, we 1) maximize computation reuse through instruction reordering, 2) fuse several linear layers together to improve computation patterns, and 3) enable memory reuse of intermediate buffers to improve memory behavior. From the architecture perspective, we propose techniques that take into account knowledge of the MCU's underlying architecture, including 1) cache-aware and 2) multi-core parallelism-aware techniques. These solutions rely only on general MCU features and therefore generalize broadly across various networks and devices. They come at no additional cost: they improve inference latency without compromising model accuracy or model size. Our evaluation on a use case from the healthcare domain with a real data set for four CNNs - LeNet, AlexNet, ResNet20, and SqueezeNet - shows that we achieve up to a 71% reduction in inference latency.
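
As a concrete illustration of the layer-fusion technique mentioned in the abstract, the sketch below folds a batch-normalization layer into the weights and bias of the preceding convolution, so the fused pair runs as a single convolution at inference time. This is a minimal example in C under assumed conventions, not the paper's implementation: the function name fuse_conv_bn, the flat per-output-channel weight layout, and all parameter names are hypothetical.

#include <math.h>
#include <stddef.h>

/* Hypothetical sketch: fold BatchNorm parameters (gamma, beta, mean, var)
 * into the weights and bias of the preceding convolution, so that
 *   BN(Conv(x)) == Conv'(x)
 * where Conv' uses the fused weights and bias. Assumed layout: weights are
 * flattened per output channel, with one BN parameter set per channel. */
void fuse_conv_bn(float *weights, float *bias,
                  size_t out_ch, size_t weights_per_ch,
                  const float *gamma, const float *beta,
                  const float *mean, const float *var, float eps)
{
    for (size_t oc = 0; oc < out_ch; ++oc) {
        /* Per-channel scale applied by the BN layer. */
        float scale = gamma[oc] / sqrtf(var[oc] + eps);

        /* Scale every weight feeding this output channel. */
        for (size_t i = 0; i < weights_per_ch; ++i)
            weights[oc * weights_per_ch + i] *= scale;

        /* Fold the BN shift into the convolution bias. */
        bias[oc] = scale * (bias[oc] - mean[oc]) + beta[oc];
    }
}

Because the folding is purely algebraic, it leaves both accuracy and model size untouched; only the run-time computation pattern changes, consistent with the paper's "no cost" claim.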

Authors

Sun B; Hassan M

Volume

00

Pagination

pp. 512-520

Publisher

Institute of Electrical and Electronics Engineers (IEEE)

Publication Date

August 30, 2024

DOI

10.1109/dsd64264.2024.00074

Name of conference

2024 27th Euromicro Conference on Digital System Design (DSD)