
Derivations


The homepage of this blog series and related posts can be found here.


Linear regression

\[\hat{y} = x_1 w_1 + x_2 w_2 + b\] \[l(\hat{y}, y) = \frac{1}{2} (\hat{y} - y)^2\] \[w_1 \leftarrow w_1 - \frac{\eta}{N} \frac{\partial l}{\partial w_1} = w_1 - \frac{\eta}{N} x_1(x_1 w_1 + x_2 w_2 + b - y)\]
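
As a quick sanity check, here is a minimal NumPy sketch of this update rule on toy data; the data, learning rate, and variable names are assumptions for illustration, not part of the derivation.

```python
import numpy as np

# Toy data with two features (assumed for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -3.0]) + 1.0      # ground-truth weights and bias

w = np.zeros(2)
b = 0.0
eta = 0.1                                 # learning rate

for _ in range(200):
    y_hat = X @ w + b                     # \hat{y} = x_1 w_1 + x_2 w_2 + b
    err = y_hat - y                       # \hat{y} - y
    # averaged gradients of the squared loss, matching the update rule above
    w -= eta / len(X) * (X.T @ err)
    b -= eta / len(X) * err.sum()

print(w, b)                               # approaches [2, -3] and 1
```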

softmax

\[\hat{y}_i = \frac{\exp (o_i)}{\sum_j \exp (o_j)}\]
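
A short NumPy sketch of this formula; subtracting the maximum is a standard numerical-stability trick and does not change the result.

```python
import numpy as np

def softmax(o):
    """exp(o_i) / sum_j exp(o_j), computed over the last axis."""
    o = o - o.max(axis=-1, keepdims=True)   # stability shift, result unchanged
    e = np.exp(o)
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([1.0, 2.0, 3.0])))   # sums to 1
```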

Cross-entropy

\[H(y, \hat{y}) = -\sum y \log \hat{y}\]
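
The corresponding one-liner in NumPy; the small `eps` guards against `log(0)` and is an implementation detail, not part of the definition.

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    """H(y, y_hat) = -sum_i y_i * log(y_hat_i)."""
    return -np.sum(y * np.log(y_hat + eps))

# one-hot label vs. a predicted distribution (toy values)
print(cross_entropy(np.array([0, 1, 0]), np.array([0.1, 0.7, 0.2])))
```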

Initialization

Xavier initialization

Let the input be $x$ and the output be $y$, assuming $w$ and $x$ are independent and zero-mean:

\[Var(y) = Var(wx) = Var(w)Var(x)\]

If the layer has $n_i$ input neurons, keeping the forward variance unchanged requires $n_i Var(w) = 1$; likewise, backpropagation requires $n_o Var(w) = 1$. Compromising between the two:

\[Var(w) = 2 / (n_i + n_o)\]
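
A quick numerical check of the forward-pass relation $Var(y) = n_i Var(w) Var(x)$ behind this result, assuming zero-mean Gaussian inputs and weights (all values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_i = 256                                             # fan-in (assumed)
x = rng.normal(0.0, 1.0, size=(10000, n_i))           # Var(x) = 1
w = rng.normal(0.0, np.sqrt(1.0 / n_i), size=n_i)     # Var(w) = 1 / n_i

y = x @ w
print(y.var())    # close to n_i * Var(w) * Var(x) = 1
```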

For a uniform distribution:

\[\frac{(b-a)^2}{12} = 2 / (n_i + n_o)\] \[a = -b = -\sqrt{6/(n_i + n_o)}\]
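
A minimal sketch of Xavier (Glorot) uniform initialization using the bound above; the function name and shapes are assumptions for illustration.

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=np.random.default_rng()):
    """Sample W ~ U(-b, b) with b = sqrt(6 / (n_in + n_out))."""
    bound = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-bound, bound, size=(n_in, n_out))

W = xavier_uniform(256, 128)
print(W.var(), 2.0 / (256 + 128))   # empirical variance vs. 2 / (n_i + n_o)
```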

He initialization

For ReLU, roughly half of the inputs are not activated, so $Var(w)$ is multiplied by 2, e.g. $Var(w) = 2 / n_i$.
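
The same idea with the ReLU factor of 2, as a sketch of a fan-in He initialization (names assumed):

```python
import numpy as np

def he_normal(n_in, n_out, rng=np.random.default_rng()):
    """Sample W ~ N(0, 2 / n_in), the ReLU-adjusted variance."""
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

W = he_normal(256, 128)
print(W.var(), 2.0 / 256)   # empirical variance vs. 2 / n_i
```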

Multilayer perceptron backpropagation

Squared loss

\[J = \frac{1}{N} \sum \frac{1}{2} (y - \hat{y})^2 + \frac{\lambda}{2} \|W\|^2\] The error term at the output layer is \[\delta = -(y - a) \frac{\partial a}{\partial z}\]

Cross-entropy

\[J = -\frac{1}{N} \sum y\log \hat{y} + \frac{\lambda}{2} \|W\|^2\] \[\delta = -\frac{y}{\hat{y}} \frac{\partial a}{\partial z}\]

With a softmax output layer, this simplifies to $\delta = a - y$.
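
A small sketch that checks $\delta = a - y$ for a softmax + cross-entropy output layer against a finite-difference gradient (the toy logits and label are assumptions):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def loss(z, y):
    return -np.sum(y * np.log(softmax(z)))   # cross-entropy on softmax outputs

z = np.array([0.5, -1.0, 2.0])               # toy logits
y = np.array([0.0, 1.0, 0.0])                # one-hot label

delta = softmax(z) - y                        # the claimed gradient a - y

# finite-difference estimate of dL/dz for comparison
eps = 1e-6
num = np.zeros_like(z)
for i in range(len(z)):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    num[i] = (loss(zp, y) - loss(zm, y)) / (2 * eps)

print(delta)
print(num)    # the two should agree closely
```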

Activation functions

\(\sigma(x) = \frac{1}{1 + \exp(-x)}\) \(\sigma' = \sigma (1 - \sigma)\)

\(\tanh(x) = 2\sigma(2x) - 1\) \(\tanh' = 1 - \tanh^2\)
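
A sketch of these activations and their derivatives, including a check of the identity $\tanh(x) = 2\sigma(2x) - 1$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)                 # sigma * (1 - sigma)

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2         # 1 - tanh^2

x = np.linspace(-3.0, 3.0, 7)
print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))   # identity check: True
print(d_sigmoid(0.0), d_tanh(0.0))                        # 0.25 and 1.0
```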