> [!abstract]
> The **softmax** function squishes a vector $\mathbf{o} \in \mathbb{R}^{k}$ to $\mathrm{softmax}(\mathbf{o}) \in [0,1]^{k}$, with entries proportional to $\exp \mathbf{o}$ and summing to $1$. It is given by $\mathrm{softmax}(\mathbf{o}) := \frac{\exp\mathbf{o}}{\| \exp\mathbf{o} \,\| _{1}},$ where $\exp$ is applied entry-wise. It is used as an activation/link function in probabilistic models whose raw output is not in $[0,1]$, for example neural networks and logistic regression.
>
> For example, the inverse function of the logit $\mu \mapsto \eta$ is the sigmoid $\eta \mapsto \frac{\exp(\eta)}{1 + \exp(\eta)}$, which can be interpreted as the first entry of softmax applied to $\mathbf{o}=(\eta, 0)$.
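
A minimal NumPy sketch of the definition above (the max-shift is the standard numerical-stability trick, valid since $\mathrm{softmax}(\mathbf{o}) = \mathrm{softmax}(\mathbf{o} + c)$ for any constant $c$; the names `softmax` and `eta` are illustrative):

```python
import numpy as np

def softmax(o):
    """Softmax of a vector o: exp(o) divided by its 1-norm.

    Subtracting max(o) before exponentiating does not change the
    result but avoids overflow for large entries.
    """
    z = np.exp(o - np.max(o))
    return z / z.sum()

# The sigmoid (inverse logit) equals the first entry of softmax((eta, 0)):
eta = 1.7
sigmoid = np.exp(eta) / (1 + np.exp(eta))
print(np.allclose(softmax(np.array([eta, 0.0]))[0], sigmoid))  # True
```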