Statistical Learning Methods

Perceptron

\[f(x) = \mathrm{sign}(w\cdot x + b)\]

The distance from a point $x_0$ to the separating hyperplane is:

\[\frac{1}{\Vert w \Vert} \vert w \cdot x_0 + b \vert\]

For a misclassified point (e.g. $y = 1$ but $w \cdot x + b < 0$):

\[-y(w\cdot x + b) > 0\]

Hence the loss function, summed over the set $M$ of misclassified points:

\[L(w,b) = -\sum_{x_i \in M} y_i(w\cdot x_i + b)\]

The gradients are:

\(\nabla_w L(w, b) = - \sum_{x_i \in M} y_i x_i\) \(\nabla_b L(w, b) = - \sum_{x_i \in M} y_i\)
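A minimal sketch of the resulting SGD training loop, assuming NumPy; the function name `train_perceptron` and hyperparameters are my own, not from the original:

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=100):
    """SGD on L(w, b): for each misclassified point, step along the negative gradient."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) <= 0:  # misclassified (or on the hyperplane)
                w += lr * yi * xi       # w <- w + lr * y_i * x_i
                b += lr * yi            # b <- b + lr * y_i
                errors += 1
        if errors == 0:                 # all points correctly separated: done
            break
    return w, b
```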

K-Nearest Neighbors

Naive Bayes

Maximum-likelihood estimates of the class prior and the class-conditional probabilities:

\(p(y = c_k) = \frac{1}{N} \sum_i I(y_i = c_k)\) \(p(x_j = a_{j,l} \vert y=c_k) = \frac{\sum_i I(x_{i,j} = a_{j,l}, y_i=c_k)} {\sum_i I (y_i=c_k)}\)

Prediction chooses the class maximizing the posterior, which is proportional to the joint probability:

\[p(y=c_k \vert x) \propto p(y=c_k) \prod_j p(x_j \vert y = c_k)\] \[y = \argmax_{c_k} p(y=c_k) \prod_j p(x_j \vert y = c_k)\]

Maximum-likelihood estimation can assign zero probability to a feature value unseen in training. The remedy is Bayesian estimation with a prior count $\lambda$, where $s_j$ is the number of distinct values feature $j$ can take:

\[p_\lambda (x_j = a_{j,l} \vert y = c_k) = \frac{\sum_i I(x_{i,j} = a_{j,l}, y_i=c_k) + \lambda} {\sum_i I (y_i=c_k) + s_j \lambda}\]

Likewise, the Bayesian estimate of the prior $p(y)$, with $K$ the number of classes:

\[p_\lambda(y = c_k) = \frac{\sum_i I(y_i = c_k) + \lambda}{N + K\lambda}\]
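A minimal sketch of these smoothed estimates for integer-coded categorical data, assuming NumPy; the function names are my own, and $s_j$ is approximated by the number of values of feature $j$ observed in training:

```python
import numpy as np

def fit_naive_bayes(X, y, lam=1.0):
    """Bayesian (smoothed) estimates. X: (N, d) integer-coded features, y: (N,) labels."""
    N, d = X.shape
    classes = np.unique(y)
    K = len(classes)
    # Smoothed prior: (count(y=c) + lam) / (N + K*lam)
    prior = {c: (np.sum(y == c) + lam) / (N + K * lam) for c in classes}
    cond = {}  # cond[(j, a, c)] = p_lam(x_j = a | y = c)
    for j in range(d):
        values = np.unique(X[:, j])
        s_j = len(values)                  # assumes every value of feature j appears
        for c in classes:
            Xc = X[y == c, j]
            for a in values:
                cond[(j, a, c)] = (np.sum(Xc == a) + lam) / (len(Xc) + s_j * lam)
    return classes, prior, cond

def predict(x, classes, prior, cond):
    """argmax_c p(y=c) * prod_j p(x_j | y=c); assumes test values were seen in training."""
    scores = {c: prior[c] * np.prod([cond[(j, x[j], c)] for j in range(len(x))])
              for c in classes}
    return max(scores, key=scores.get)
```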

Decision Trees

Empirical entropy, conditional entropy, and information gain:

\(H(D) = -\sum_k \frac{\vert C_k \vert}{\vert D \vert} \log_2 \frac{\vert C_k \vert}{\vert D \vert}\) \(H(D \vert A) = \sum_i \frac{\vert D_i \vert}{\vert D \vert} H(D_i) = -\sum_i \frac{\vert D_i \vert}{\vert D \vert} \sum_k \frac{\vert D_{ik} \vert}{\vert D_i \vert} \log_2 \frac{\vert D_{ik} \vert}{\vert D_i \vert}\) \(g(D, A) = H(D) - H(D \vert A)\)

Information gain is biased toward features with many distinct values; the information gain ratio corrects for this, where $H_A(D) = -\sum_i \frac{\vert D_i \vert}{\vert D \vert} \log_2 \frac{\vert D_i \vert}{\vert D \vert}$:

\[g_R(D, A) = \frac{g(D, A)}{H_A(D)}\]
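A minimal sketch of these quantities for one categorical feature column, assuming NumPy (function names are my own):

```python
import numpy as np

def entropy(labels):
    """Empirical entropy H(D) of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain_ratio(feature, labels):
    """Returns g(D, A) and g_R(D, A) for one categorical feature column."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()                    # |D_i| / |D|
    h_cond = sum(w * entropy(labels[feature == v])     # H(D | A)
                 for v, w in zip(values, weights))
    gain = entropy(labels) - h_cond                    # g(D, A)
    h_a = -np.sum(weights * np.log2(weights))          # H_A(D), the split information
    return gain, (gain / h_a if h_a > 0 else 0.0)
```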

Pruning: for each leaf node $t$ of the tree $T$ (with $N_t$ samples), compute its empirical entropy $H(t)$; the loss to be minimized is the regularized total

\[C_\alpha(T) = \sum_t N_t H(t) + \alpha \vert T \vert\]

Recursively work upward from the leaves: if the loss decreases after pruning a node's subtree, prune it.

CART

Regression trees use squared error; classification trees use the Gini index.

1. Choose the optimal splitting dimension $j$ and split point $s$:

\[(j^*,s^*) = \argmin_{j,s} \left[ \min_{c_1} \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \min_{c_2} \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]\]

2. Partition the regions with the chosen $(j,s)$ and set each region's output to its mean:

\(R_1(j,s) = \{ \mathbf{x} \vert x_j \le s \}\) \(R_2(j,s) = \{ \mathbf{x} \vert x_j \gt s \}\) \(c_1 = \frac{1}{N_1} \sum_{R_1} y_i\) \(c_2 = \frac{1}{N_2} \sum_{R_2} y_i\)

3. Recursively split the subregions $R_1$ and $R_2$ until the stopping condition is met; a sketch of the split search follows.
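A minimal sketch of step 1's exhaustive search, assuming NumPy and thresholds taken from observed feature values (names are my own):

```python
import numpy as np

def best_split(X, y):
    """Find (j*, s*): the dimension and threshold minimizing the summed
    squared error around the two half-region means c1, c2."""
    best_j, best_s, best_err = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue                      # degenerate split, skip
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_j, best_s, best_err = j, s, err
    return best_j, best_s
```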

For classification trees, the Gini index of a class distribution $p$:

\[Gini(p) = 1 - \sum_k p_k^2\]

1. Choose the optimal splitting dimension and split point:

\[(j^*,s^*) = \argmin_{j,s} Gini (D, A_j = s)\]

2. Partition the regions, assigning each leaf its majority class:

\(R_1(j,s) = \{\mathbf{x} \vert x_j \le s \}\) \(R_2(j,s) = \{\mathbf{x} \vert x_j \gt s \}\) \(c_1 = \argmax_{c_m} \sum_{x_i \in R_1} I(c_m = y_i)\) \(c_2 = \argmax_{c_m} \sum_{x_i \in R_2} I(c_m = y_i)\)

3. Recursively split until the stopping condition is met; see the Gini sketch below.
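A minimal sketch of the Gini computations, assuming NumPy (names are my own; assumes both partitions are nonempty):

```python
import numpy as np

def gini(labels):
    """Gini(p) = 1 - sum_k p_k^2 over the empirical class distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_split(feature, labels, s):
    """Gini(D, A_j = s): size-weighted Gini of the partitions x_j <= s and x_j > s."""
    left = feature <= s
    n = len(labels)
    return (left.sum() / n) * gini(labels[left]) + ((~left).sum() / n) * gini(labels[~left])
```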

Logistic Regression

\(p(y=1 \vert \mathbf{x}) = \frac{1}{1 + e^{-z}}\) \(z = \mathbf{w} \cdot \mathbf{x} + b\)

\[\ln \frac{P(y = 1)}{P(y = 0)} = z = \mathbf{w} \cdot \mathbf{x} + b\]
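A minimal sketch of fitting this model by gradient descent on the mean negative log-likelihood, assuming NumPy and labels $y \in \{0, 1\}$ (names and hyperparameters are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=1000):
    """Full-batch gradient descent on the negative log-likelihood."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)           # p(y=1 | x) for every sample
        grad_w = X.T @ (p - y) / len(y)  # gradient of mean NLL w.r.t. w
        grad_b = np.mean(p - y)          # gradient of mean NLL w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```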

SVM

Margin maximization

\(\max_{w,b} \gamma \\ s.t. \quad \frac{y_i(w \cdot x_i + b)}{\Vert w \Vert_2} \ge \gamma\) which is equivalent to \(\max_{w,b} \frac{1}{\Vert w \Vert_2} \\ s.t. \quad y_i(w \cdot x_i + b) \ge 1\) which is equivalent to \(\min_{w,b} \frac{1}{2} \Vert w \Vert_2^2 \\ s.t. \quad y_i(w \cdot x_i + b) \ge 1\)

The Lagrangian:

\[\min_{w,b} \max_\lambda L(w, b, \lambda) = \frac{1}{2} \Vert w \Vert_2^2 + \sum_i \lambda_i [1 - y_i(w \cdot x_i + b)]\\ s.t. \quad \lambda_i \ge 0\]

The dual:

\[\max_\lambda \min_{w,b} L(w, b, \lambda)\]

Setting the partial derivatives to zero:

\(\frac{\partial L}{\partial w} = w - \sum_i \lambda_i x_i y_i = 0\) \(\frac{\partial L}{\partial b} = \sum_i \lambda_i y_i = 0\)

which gives:

\(\sum_i \lambda_i x_i y_i = w\) \(\sum_i \lambda_i y_i = 0\)

Substituting back into $L$:

\[\min_{w,b} L(w, b, \lambda) = \sum_i \lambda_i - \frac{1}{2} \sum_i \sum_j \lambda_i \lambda_j y_i y_j(x_i \cdot x_j)\]
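A standard step this note leaves implicit: once the optimal multipliers $\lambda^*$ are found (e.g. by SMO), the primal solution is recovered from the stationarity conditions above,

\[w^* = \sum_i \lambda_i^* y_i x_i, \qquad b^* = y_j - \sum_i \lambda_i^* y_i (x_i \cdot x_j)\]

for any index $j$ with $\lambda_j^* > 0$ (a support vector).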

TODO SMO

The soft-margin objective function, with slack variables $\xi_i$:

\[\min_{w,b,\xi} \frac{1}{2} \Vert w \Vert_2^2 + C \sum_i \xi_i\\ s.t. \quad y_i(w\cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0\]
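Since SMO is still a TODO above, here is a minimal sketch of an alternative solver: per-sample subgradient descent on the equivalent unconstrained hinge-loss form $\frac{1}{2}\Vert w \Vert_2^2 + C \sum_i \max(0, 1 - y_i(w \cdot x_i + b))$, assuming NumPy and $y \in \{-1, +1\}$ (names and hyperparameters are my own):

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Per-sample subgradient step on 0.5*||w||^2 + C * hinge loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi + b) < 1:    # margin violated: hinge term is active
                w -= lr * (w - C * yi * xi)
                b += lr * C * yi
            else:                        # only the regularizer contributes
                w -= lr * w
    return w, b
```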

Boosting

AdaBoost: