Power-Efficient, Accelerated, Exponential-Based Activation Functions

Abstract

Machine learning promises great economic, social, and scientific benefits, but these come at the cost of energy consumption, which makes it harder to meet the goal of preventing runaway climate change. IT infrastructure is predicted to account for over 20% of global power consumption by 2030. In this paper, we look at one of two major cost centers for neural network computations, namely activation functions. We present alternative piecewise functions based on the exponential function. They have shapes similar to those of common activation functions, including similar asymptotic behavior. Despite being piecewise, they have continuous first derivatives, which is important for optimization, including training. The proposed functions eliminate costly divide operations and simplify multiplicative constants. To test the feasibility of replacing conventional activation functions with the proposed ones, we ran benchmark learning and inference tasks with both types of activation functions. In all cases, the power-efficient activation functions achieved equivalent accuracy and loss curves. For the sigmoid function, we also verified that the new function can be used as a drop-in replacement, even without retraining. Because the expensive operations are the same in both branches of each piecewise function, the functions lend themselves to hardware acceleration and to software acceleration using vector instructions, with selection replacing branching to facilitate loop scheduling. As a first step, we have implemented sigmoid and hyperbolic tangent using POWER vector instructions and measured speedups of 21% and 20%, respectively, over similarly optimized implementations of the standard activation functions.
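To make the structural idea concrete, the following is a minimal sketch of one possible pair of piecewise, exponential-based functions with the properties listed above: no division, a single shared exponential in both branches, the same asymptotes as sigmoid and hyperbolic tangent, and a continuous first derivative at the join. The names `exp_sigmoid` and `exp_tanh` and the specific formulas are illustrative assumptions, not necessarily the functions proposed in this paper, and their slopes at the origin differ from those of the standard functions.

```python
import math


def exp_sigmoid(x: float) -> float:
    """Hypothetical sigmoid-like function (an assumption, not the paper's definition).

    Both pieces share the one expensive operation, exp(-|x|); only a cheap
    selection differs between them, and no division is required.
    """
    half_e = 0.5 * math.exp(-abs(x))  # shared by both pieces, lies in [0, 0.5]
    return 1.0 - half_e if x >= 0.0 else half_e


def exp_tanh(x: float) -> float:
    """Hypothetical tanh-like function (an assumption): odd, asymptotes at -1 and +1."""
    t = 1.0 - math.exp(-abs(x))  # shared by both pieces, lies in [0, 1]
    return math.copysign(t, x)   # selecting the sign replaces a branch


def one_sided_slopes(f, h: float = 1e-6):
    """Finite-difference slopes just right and just left of x = 0."""
    right = (f(h) - f(0.0)) / h
    left = (f(0.0) - f(-h)) / h
    return right, left


if __name__ == "__main__":
    for f in (exp_sigmoid, exp_tanh):
        right, left = one_sided_slopes(f)
        print(f.__name__, "slopes at 0:", right, left)
```

In a vectorized implementation, the conditional expression and the sign copy would map to select-style instructions, so every lane executes the same exponential regardless of the sign of its input.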