Derivations
Contact me
- Blog -> https://cugtyt.github.io/blog/index
- Email -> cugtyt@qq.com
- GitHub -> Cugtyt@GitHub
See here for the homepage of this blog series and related posts.
Linear Regression
\[\hat{y} = x_1 w_1 + x_2 w_2 + b\]
\[l(\hat{y}, y) = \frac{1}{2} (\hat{y} - y)^2\]
\[w_1 \leftarrow w_1 - \frac{\eta}{N} \frac{\partial l}{\partial w_1} = w_1 - \frac{\eta}{N} x_1(x_1 w_1 + x_2 w_2 + b - y)\]
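A minimal NumPy sketch of this update rule on a toy two-feature batch (the data, learning rate, and iteration count are illustrative assumptions, not part of the original derivation):

```python
import numpy as np

# Toy batch: N samples, 2 features (made-up data).
X = np.array([[1.0, 2.0], [2.0, 0.5], [0.0, 1.5]])   # shape (N, 2)
y = np.array([3.0, 2.5, 1.0])
w = np.zeros(2)
b = 0.0
eta = 0.1
N = X.shape[0]

for _ in range(100):
    y_hat = X @ w + b              # y_hat = x1*w1 + x2*w2 + b
    err = y_hat - y                # (y_hat - y)
    w -= (eta / N) * X.T @ err     # w <- w - (eta/N) * sum_x x * err
    b -= (eta / N) * err.sum()

print(w, b)
```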
Softmax
\[\hat{y}_i = \frac{\exp (o_i)}{\sum_j \exp (o_j)}\]
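A small NumPy sketch of this formula; subtracting the maximum logit is a standard numerical-stability trick that leaves the result unchanged (the example logits are made up):

```python
import numpy as np

def softmax(o):
    # Subtract max(o) for numerical stability; the softmax value is unchanged.
    e = np.exp(o - np.max(o))
    return e / e.sum()

o = np.array([2.0, 1.0, 0.1])   # example logits
print(softmax(o))                # sums to 1
```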
Cross-Entropy
\[H(y, \hat{y}) = -\sum y \log \hat{y}\]
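And the matching cross-entropy for a one-hot label (both vectors below are illustrative):

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # H(y, y_hat) = -sum y * log(y_hat); eps avoids log(0).
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([0.0, 1.0, 0.0])       # one-hot label
y_hat = np.array([0.2, 0.7, 0.1])   # predicted distribution
print(cross_entropy(y, y_hat))       # = -log(0.7)
```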
Initialization
Xavier Initialization
Let the input be $x$ and the output be $y$:
\[Var(y) = Var(wx) = Var(w)Var(x)\]
This equality assumes $w$ and $x$ are independent with zero mean. For a full layer, $y$ is a sum of $n_i$ such terms ($n_i$ being the number of input units), so $Var(y) = n_i Var(w) Var(x)$; keeping the forward variance unchanged requires $n_i Var(w) = 1$. By the same argument, backpropagation requires $n_o Var(w) = 1$, where $n_o$ is the number of output units. Compromising between the two:
\[Var(w) = \frac{2}{n_i + n_o}\]
If a uniform distribution is used:
\(\frac{(b-a)^2}{12} = \frac{2}{n_i + n_o}\) \(a = -b = -\sqrt{6/(n_i + n_o)}\)
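A sketch of drawing Xavier-uniform weights and checking the resulting variance, assuming NumPy (the layer sizes are arbitrary):

```python
import numpy as np

def xavier_uniform(n_i, n_o, rng=np.random.default_rng(0)):
    # Uniform on [-b, b] with b = sqrt(6 / (n_i + n_o)),
    # so that Var(w) = (2b)^2 / 12 = 2 / (n_i + n_o).
    bound = np.sqrt(6.0 / (n_i + n_o))
    return rng.uniform(-bound, bound, size=(n_i, n_o))

W = xavier_uniform(256, 128)
print(W.var(), 2.0 / (256 + 128))   # the two values should be close
```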
He Initialization
For ReLU, roughly half of the inputs are not activated (they are zeroed out), so $Var(w)$ should be doubled; He initialization typically uses only the fan-in, giving $Var(w) = 2/n_i$.
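A corresponding He-initialization sketch using a normal distribution with $Var(w) = 2/n_i$ (again, the layer sizes are arbitrary):

```python
import numpy as np

def he_normal(n_i, n_o, rng=np.random.default_rng(0)):
    # Var(w) = 2 / n_i compensates for ReLU keeping only ~half of the inputs.
    std = np.sqrt(2.0 / n_i)
    return rng.normal(0.0, std, size=(n_i, n_o))

W = he_normal(256, 128)
print(W.var(), 2.0 / 256)   # the two values should be close
```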
Backpropagation in a Multilayer Perceptron
Squared Loss
\[J = \frac{1}{N} \sum \frac{1}{2} (y - \hat{y})^2 + \frac{\lambda}{2} \|W\|^2\]
\[\delta = -(y - a) \frac{\partial a}{\partial z}\]
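A quick finite-difference check of this $\delta$ for a single sigmoid output unit (the sigmoid choice and the numbers are illustrative assumptions):

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

z, y = 0.3, 1.0
a = sigmoid(z)
delta_analytic = -(y - a) * a * (1 - a)          # -(y - a) * da/dz

# Finite-difference derivative of l = 0.5*(y - sigmoid(z))^2 w.r.t. z.
l = lambda z: 0.5 * (y - sigmoid(z)) ** 2
h = 1e-6
delta_numeric = (l(z + h) - l(z - h)) / (2 * h)

print(delta_analytic, delta_numeric)              # should agree closely
```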
Cross-Entropy
\[J = -\frac{1}{N} \sum y \log \hat{y} + \frac{\lambda}{2} \|W\|^2\]
\[\delta = -\frac{y}{\hat{y}} \frac{\partial a}{\partial z}\]
With a softmax output, $\delta = a - y$.
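A numerical check that softmax followed by cross-entropy indeed gives $\delta = a - y$ (the logits and label below are made up):

```python
import numpy as np

def softmax(o):
    e = np.exp(o - np.max(o))
    return e / e.sum()

o = np.array([1.0, 2.0, 0.5])        # logits z
y = np.array([0.0, 1.0, 0.0])        # one-hot label
a = softmax(o)

delta_analytic = a - y

# Finite-difference gradient of l = -sum y*log(softmax(o)) w.r.t. each logit.
l = lambda o: -np.sum(y * np.log(softmax(o)))
h = 1e-6
delta_numeric = np.array([
    (l(o + h * np.eye(3)[k]) - l(o - h * np.eye(3)[k])) / (2 * h)
    for k in range(3)
])

print(delta_analytic)
print(delta_numeric)                  # should match delta_analytic closely
```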
Activation Functions
\(\sigma(x) = \frac{1}{1 + \exp(-x)}\) \(\sigma'(x) = \sigma(x) (1 - \sigma(x))\)
\(\tanh(x) = 2\sigma(2x) - 1\) \(\tanh'(x) = 1 - \tanh^2(x)\)
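A short finite-difference confirmation of both derivative identities (the evaluation point is arbitrary):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
h, x = 1e-6, 0.7

# d(sigmoid)/dx = sigmoid * (1 - sigmoid)
s = sigmoid(x)
print(s * (1 - s), (sigmoid(x + h) - sigmoid(x - h)) / (2 * h))

# d(tanh)/dx = 1 - tanh^2
t = np.tanh(x)
print(1 - t**2, (np.tanh(x + h) - np.tanh(x - h)) / (2 * h))
```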