The Keras framework (which runs on top of machine learning back-ends such as TensorFlow) provides a set of built-in activation functions offering alternative behaviours for different machine learning scenarios.
elu - Exponential Linear Unit
keras.activations.elu(x, alpha=1.0)
Arguments
- x: Input tensor
- alpha: A scalar, slope of negative section
Returns
The exponential linear activation:
\[\text{elu}(x) = \begin{cases} x &\text{ if x > 0 } \\ \alpha (e^{x} - 1) &\text{ if x <= 0 } \end{cases}\]
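As a quick illustration, here is the same piecewise rule written as a NumPy sketch (illustrative only, not the Keras implementation):
import numpy as np

def elu(x, alpha=1.0):
    # illustrative helper: positive inputs pass through unchanged,
    # negative inputs saturate smoothly towards -alpha
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

elu([-2.0, 0.0, 2.0])  # approximately [-0.865, 0.0, 2.0]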
softmax
keras.activations.softmax(x, axis=-1)
Arguments
x: Input tensor.
axis: Integer, axis along which the softmax normalization is applied.
Returns
Tensor, output of softmax transformation.
\[\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \quad \textrm{for } i = 1, \ldots, K \textrm{ and } z = (z_1, \ldots, z_K) \in \mathbb{R}^K\]
In mathematics, the softmax function, also known as softargmax or normalized exponential function, is a function that takes as input a vector of K real numbers, and normalizes it into a probability distribution consisting of K probabilities proportional to the exponentials of the input numbers. That is, prior to applying softmax, some vector components could be negative, or greater than one; and might not sum to 1; but after applying softmax, each component will be in the interval (0,1), and the components will add up to 1, so that they can be interpreted as probabilities. Furthermore, the larger input components will correspond to larger probabilities. Softmax is often used in neural networks, to map the non-normalized output of a network to a probability distribution over predicted output classes.
If we take an input of [1, 2, 3, 4, 1, 2, 3], the softmax of that is [0.024, 0.064, 0.175, 0.475, 0.024, 0.064, 0.175]. The output has most of its weight where the ‘4’ was in the original input. This is what the function is normally used for: to highlight the largest values and suppress values which are significantly below the maximum value. But note: softmax is not scale invariant, so if the input were [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3] (which sums to 1.6) the softmax would be [0.125, 0.138, 0.153, 0.169, 0.125, 0.138, 0.153]. This shows that for values between 0 and 1, softmax in fact de-emphasizes the maximum value (note that 0.169 is not only less than 0.475, it is also less than the initial proportion of 0.4/1.6 = 0.25).
>>> import numpy as np
>>> a = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0]
>>> np.exp(a) / np.sum(np.exp(a))
array([0.02364054, 0.06426166, 0.1746813, 0.474833, 0.02364054,
0.06426166, 0.1746813])
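The axis argument matters once the input is batched. A minimal NumPy sketch (an illustrative stand-in for the Keras function, with the usual max-subtraction trick for numerical stability) that normalizes each row of a 2-D array independently:
import numpy as np

def softmax(x, axis=-1):
    # illustrative helper: subtract the per-axis max for numerical stability,
    # exponentiate, then normalize along the chosen axis
    x = np.asarray(x, dtype=float)
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

softmax([[1.0, 2.0, 3.0],
         [3.0, 2.0, 1.0]])  # each row sums to 1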
selu - Scaled Exponential Linear Unit
keras.activations.selu(x)
Arguments
x: A tensor or variable to compute the activation function for
Returns
The scaled exponential unit activation: scale * elu(x, alpha)
SELU is equal to: scale * elu(x, alpha), where alpha and scale are predefined constants. The values of alpha and scale are chosen so that the mean and variance of the inputs are preserved between two consecutive layers as long as the weights are initialized correctly (see lecun_normal initialization) and the number of inputs is “large enough” (see references for more information).
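A NumPy sketch of the same expression, using the approximate alpha and scale constants reported in the SELU paper (Klambauer et al., 2017); this is illustrative only, and the real Keras implementation should be preferred in practice:
import numpy as np

# approximate predefined constants from the SELU paper
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def selu(x):
    # illustrative helper: scale * elu(x, alpha)
    x = np.asarray(x, dtype=float)
    return SELU_SCALE * np.where(x > 0, x, SELU_ALPHA * (np.exp(x) - 1.0))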
softplus
keras.activations.softplus(x)
Arguments
x: Input tensor
Returns
The softplus activation: $\log(e^{x} + 1)$
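A one-line NumPy sketch of the same formula (illustrative only; log1p is used here as it is a touch more accurate when exp(x) is small):
import numpy as np

def softplus(x):
    # illustrative helper: log(exp(x) + 1), a smooth approximation of ReLU
    return np.log1p(np.exp(np.asarray(x, dtype=float)))

softplus(0.0)  # log(2) ≈ 0.693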
Softsign
keras.activations.softsign(x)
Arguments
x: Input tensor.
Returns
The softsign activation: $\frac{x}{\lvert{x}\rvert + 1}$
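A NumPy sketch of the same formula (illustrative only):
import numpy as np

def softsign(x):
    # illustrative helper: squashes any input into the open interval (-1, 1),
    # approaching the limits more slowly than tanh does
    x = np.asarray(x, dtype=float)
    return x / (np.abs(x) + 1.0)

softsign([-10.0, 0.0, 10.0])  # approximately [-0.909, 0.0, 0.909]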
ReLU - Rectified Linear Unit
This popular activation function (sketched below) is:
- typically found in
  - convolutional neural networks
  - multi-layer networks
- passes positive sum evaluations through unaltered
  - small positive values contribute only a small amount to the next layer
- removes negative sum evaluations from contributing to the next layer
  - small and large negative values are treated equally
  - this effectively diminishes the impact of the activating neuron's ancestors on the next layer
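A minimal NumPy sketch of the behaviour described in the list above (Keras exposes the real thing as keras.activations.relu):
import numpy as np

def relu(x):
    # illustrative helper: positive sums pass through unaltered,
    # negative sums are clamped to zero regardless of magnitude
    return np.maximum(np.asarray(x, dtype=float), 0.0)

relu([-3.0, -0.1, 0.0, 0.1, 3.0])  # [0.0, 0.0, 0.0, 0.1, 3.0]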
tanh - Hyperbolic tangent
keras.activations.tanh(x)
Arguments
x: Input tensor.
Returns
The hyperbolic tangent activation: $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$.
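A quick NumPy check that the explicit exponential form matches the built-in (illustrative only):
import numpy as np

x = np.linspace(-3.0, 3.0, 7)
# the explicit exponential form agrees with NumPy's built-in tanh
manual = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
assert np.allclose(manual, np.tanh(x))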
Sigmoid
keras.activations.sigmoid(x)
Arguments
x: Input tensor.
Returns
The sigmoid activation: $\frac{1}{1 + e^{-x}}$.
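A NumPy sketch of the same formula (illustrative only):
import numpy as np

def sigmoid(x):
    # illustrative helper: maps any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))

sigmoid(0.0)  # 0.5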
Hard Sigmoid
keras.activations.hard_sigmoid(x)
Faster to compute than the sigmoid activation.
Arguments
x: Input tensor.
Returns
Hard sigmoid activation:
\[y = \begin{cases} 0 &\text{ if x < -2.5 } \\ 1 &\text{ if x > 2.5 } \\ 0.2 * x + 0.5 &\text{ if -2.5 <= x <= 2.5 } \end{cases}\]
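The same piecewise rule can be written as a single clamp, as in this NumPy sketch (illustrative only); the linear segment is what makes it cheaper to compute than the true sigmoid:
import numpy as np

def hard_sigmoid(x):
    # illustrative helper: clamp the linear segment 0.2*x + 0.5 into [0, 1],
    # which reproduces the three cases above
    return np.clip(0.2 * np.asarray(x, dtype=float) + 0.5, 0.0, 1.0)

hard_sigmoid([-3.0, 0.0, 1.0, 3.0])  # [0.0, 0.5, 0.7, 1.0]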