Introduction
This post is written about theorem of SVM.
This post is part 2. Part 1 is written about deriving the objective function. I will write about dual problem. The objective function which deriving in Part 1 is called the main problem. Typically, We optimize not the main problem, but the dual problem. I will write about deriving the dual problem from the main problem.
If you until look up part 1, please look up
Theorem of SVM part 1.
I implement SVM. It post is
Implement linear SVM
Implement kernel SVM
Overview
Main problem
I will review the main problem in Part 1.
\min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i
~~s.t~~ \forall i \in N, y_i (w^T \phi(x_i) + b) \geq 1 - \epsilon_i ,~~~, \forall i \in N~\epsilon \geq 0
This problem is called soft margin. I will handle soft margin of the main problem.
Dual problem is naturally derived from the main problem.
Dual problem
Optimization problem of SVM is a kind of convex quadratic optimization problem. Convex quadratic optimization problem is a kind of optimizatio plroblem. It is easy to solve convex quadratic optimization problem. It is found the global optimal solution, because the objectice function is convex.
I will drive dual problem. Firstly, I change the main problem into
\min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i
~~s.t~~ \forall i \in N, -\{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \}\leq 0,~~~, \forall i \in N~ -\epsilon \leq 0
Lagurange function
Secondly, I will introduce new vector \alpha,\mu which has element of non-negative paremeter to define Laguranju function. Lagrange function is
L(w,b,\epsilon,\alpha,\mu) := \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i -\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i
Here, \alpha = (\alpha_1,\alpha_2,...,\alpha_n) and \mu = (\mu_1,\mu_2,..,\mu_n).
The paremeter w,b,\epsilon is called primal variable and paremeter \alpha,\mu is called dual variable.
maximize L about dual variable, minimize L about primal variable
Now, I define P(w,b,\epsilon) as maximizing \alpha,\mu of L.
P(w,b,\epsilon):= \max_{\alpha \geq 0, \mu \geq 0} L(w,b,\epsilon,\alpha,\mu)
I minimize P about primal variable.
\min_{w,b,\epsilon} P(w,b,\epsilon) = \min_{w,b,\epsilon}\max_{\alpha \geq 0, \mu \geq 0} L(w,b,\epsilon,\alpha,\mu)
Optimizing this problem is same as optimizing the main problem.
I will explain it.
I change \min_{w,b,\epsilon} P(w,b,\epsilon) into
\min_{w,b,\epsilon} P(w,b,\epsilon) = \min_{w,b,\epsilon}\max_{\alpha \geq 0, \mu \geq 0} L(w,b,\epsilon,\alpha,\mu)
= \min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i + \max_{\alpha \geq 0,\mu \geq 0} \{-\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i \}
because, first item and second item is not relate \alpha,\mu.
The main problem is
\min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i
~~s.t~~ \forall i \in N, -\{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \}\leq 0,~~~, \forall i \in N~ -\epsilon \leq 0
Now,
If \exists ~i \in N ~~s.t.~~ -\{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} > 0, or -\epsilon > 0, We can not optimize \min_{w,b,\epsilon} P(w,b,\epsilon), because We can increase P(w,b,\epsilon) to \infty by increase \alpha,\mu.
If \forall ~i \in N, -\{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \}\leq 0 and ~ -\epsilon \leq 0 ,
\max_{\alpha \geq 0,\mu \geq 0} \{-\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i \} = 0
then, \min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i + \max_{\alpha \geq 0,\mu \geq 0} \{-\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i \}
= \min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i
this is the main problem.
minimize L about primal variable, maximize L about dual variable.
Next I define D(\alpha,\mu) as optimize primal variable.
D(\alpha,\mu) := \min_{w,b,\epsilon,\alpha,\mu} L(w,b,\epsilon,\alpha,\mu)
I miximize D(\alpha,\mu) about \alpha,\mu.
\max_{\alpha,\mu} D(\alpha,\mu) = \max_{\alpha,\mu} \min_{w,b,\epsilon,\alpha,\mu} L(w,b,\epsilon,\alpha,\mu)
This problem is called the dual problem.
About \min_{w,b,\epsilon,\alpha,\mu} L(w,b,\epsilon,\alpha,\mu),
Partial differential is
\frac{\partial L}{\partial w} = w - \sum_{i \in {1,2,..,n}} \alpha_i y_i x_i = 0
\frac{\partial L}{\partial b} = - \sum_{i \in {1,2,..,n}} \alpha_i y_i = 0
\frac{\partial L}{\partial \epsilon} = C - \alpha_i - \mu_i = 0
then, L is
\begin{eqnarray*} L &=& \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i -\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i\\ &=& \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i -\sum_{i \in {1,2,..,n}} \alpha_i y_i w^T \phi(x) + \alpha_i y_i b - \alpha_i + \alpha_i \epsilon_i - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i\\ &=& \frac{1}{2}||W||^2 - \sum_{i \in {1,2,..,n}} \alpha_i y_i w^T x_i - b \sum_{i \in {1,2,..,n}} \alpha_i y_i + \sum_{i \in {1,2,..,n}} \alpha_i + \sum_{i \in {1,2,..,n}} (C - \alpha_i - \mu_i) \epsilon_i \end{eqnarray*}
I substitute partial differentials.
-\frac{1}{2} \sum_{i,j \in {1,2,..,n}} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) + \sum_{i \in {1,2,..,n}} \alpha_i
Now,
\begin{eqnarray*} C - \alpha_i - \mu_i &=& 0 \\ C - \alpha_i &=& \mu \geq 0\\ C - \alpha_i & \geq & 0 \end{eqnarray*}
I get this dual problem.
\max_{\alpha} -\frac{1}{2} \sum_{i,j \in {1,2,..,n}} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) + \sum_{i \in {1,2,..,n}} \alpha_i
s.t. \sum_{i \in {1,2,..,n}} \alpha_i y_i = 0 ~~~ , 0 \leq \alpha_i \leq C
\min_{\alpha} \frac{1}{2} \sum_{i,j \in {1,2,..,n}} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) - \sum_{i \in {1,2,..,n}} \alpha_i
s.t. \sum_{i \in {1,2,..,n}} \alpha_i y_i = 0 ~~~ , 0 \leq \alpha_i \leq C
Reference
https://www.amazon.co.jp/%E3%82%B5%E3%83%9D%E3%83%BC%E3%83%88%E3%83%99%E3%82%AF%E3%83%88%E3%83%AB%E3%83%9E%E3%82%B7%E3%83%B3-%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%97%E3%83%AD%E3%83%95%E3%82%A7%E3%83%83%E3%82%B7%E3%83%A7%E3%83%8A%E3%83%AB%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-%E7%AB%B9%E5%86%85-%E4%B8%80%E9%83%8E/dp/4061529064
This post is written about theorem of SVM.
This post is part 2. Part 1 is written about deriving the objective function. I will write about dual problem. The objective function which deriving in Part 1 is called the main problem. Typically, We optimize not the main problem, but the dual problem. I will write about deriving the dual problem from the main problem.
If you until look up part 1, please look up
Theorem of SVM part 1.
I implement SVM. It post is
Implement linear SVM
Implement kernel SVM
Overview
- Main problem
- Dual problem
- Lagurange function
- maximize L about dual variable, minimize L about primal variable
- minimize L about primal variable, maximize L about dual variable.
Main problem
I will review the main problem in Part 1.
\min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i
~~s.t~~ \forall i \in N, y_i (w^T \phi(x_i) + b) \geq 1 - \epsilon_i ,~~~, \forall i \in N~\epsilon \geq 0
This problem is called soft margin. I will handle soft margin of the main problem.
Dual problem is naturally derived from the main problem.
Dual problem
Optimization problem of SVM is a kind of convex quadratic optimization problem. Convex quadratic optimization problem is a kind of optimizatio plroblem. It is easy to solve convex quadratic optimization problem. It is found the global optimal solution, because the objectice function is convex.
I will drive dual problem. Firstly, I change the main problem into
\min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i
~~s.t~~ \forall i \in N, -\{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \}\leq 0,~~~, \forall i \in N~ -\epsilon \leq 0
Lagurange function
Secondly, I will introduce new vector \alpha,\mu which has element of non-negative paremeter to define Laguranju function. Lagrange function is
L(w,b,\epsilon,\alpha,\mu) := \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i -\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i
Here, \alpha = (\alpha_1,\alpha_2,...,\alpha_n) and \mu = (\mu_1,\mu_2,..,\mu_n).
The paremeter w,b,\epsilon is called primal variable and paremeter \alpha,\mu is called dual variable.
maximize L about dual variable, minimize L about primal variable
Now, I define P(w,b,\epsilon) as maximizing \alpha,\mu of L.
P(w,b,\epsilon):= \max_{\alpha \geq 0, \mu \geq 0} L(w,b,\epsilon,\alpha,\mu)
I minimize P about primal variable.
\min_{w,b,\epsilon} P(w,b,\epsilon) = \min_{w,b,\epsilon}\max_{\alpha \geq 0, \mu \geq 0} L(w,b,\epsilon,\alpha,\mu)
Optimizing this problem is same as optimizing the main problem.
I will explain it.
I change \min_{w,b,\epsilon} P(w,b,\epsilon) into
\min_{w,b,\epsilon} P(w,b,\epsilon) = \min_{w,b,\epsilon}\max_{\alpha \geq 0, \mu \geq 0} L(w,b,\epsilon,\alpha,\mu)
= \min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i + \max_{\alpha \geq 0,\mu \geq 0} \{-\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i \}
because, first item and second item is not relate \alpha,\mu.
The main problem is
\min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i
~~s.t~~ \forall i \in N, -\{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \}\leq 0,~~~, \forall i \in N~ -\epsilon \leq 0
Now,
If \exists ~i \in N ~~s.t.~~ -\{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} > 0, or -\epsilon > 0, We can not optimize \min_{w,b,\epsilon} P(w,b,\epsilon), because We can increase P(w,b,\epsilon) to \infty by increase \alpha,\mu.
If \forall ~i \in N, -\{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \}\leq 0 and ~ -\epsilon \leq 0 ,
\max_{\alpha \geq 0,\mu \geq 0} \{-\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i \} = 0
then, \min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i + \max_{\alpha \geq 0,\mu \geq 0} \{-\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i \}
= \min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i
this is the main problem.
minimize L about primal variable, maximize L about dual variable.
Next I define D(\alpha,\mu) as optimize primal variable.
D(\alpha,\mu) := \min_{w,b,\epsilon,\alpha,\mu} L(w,b,\epsilon,\alpha,\mu)
I miximize D(\alpha,\mu) about \alpha,\mu.
\max_{\alpha,\mu} D(\alpha,\mu) = \max_{\alpha,\mu} \min_{w,b,\epsilon,\alpha,\mu} L(w,b,\epsilon,\alpha,\mu)
This problem is called the dual problem.
About \min_{w,b,\epsilon,\alpha,\mu} L(w,b,\epsilon,\alpha,\mu),
Partial differential is
\frac{\partial L}{\partial w} = w - \sum_{i \in {1,2,..,n}} \alpha_i y_i x_i = 0
\frac{\partial L}{\partial b} = - \sum_{i \in {1,2,..,n}} \alpha_i y_i = 0
\frac{\partial L}{\partial \epsilon} = C - \alpha_i - \mu_i = 0
then, L is
\begin{eqnarray*} L &=& \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i -\sum_{i \in {1,2..n}} \alpha_i \{y_i (w^T \phi(x_i) + b) -1 + \epsilon_i \} - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i\\ &=& \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i -\sum_{i \in {1,2,..,n}} \alpha_i y_i w^T \phi(x) + \alpha_i y_i b - \alpha_i + \alpha_i \epsilon_i - \sum_{i \in {1,2,..,n}} \mu_i \epsilon_i\\ &=& \frac{1}{2}||W||^2 - \sum_{i \in {1,2,..,n}} \alpha_i y_i w^T x_i - b \sum_{i \in {1,2,..,n}} \alpha_i y_i + \sum_{i \in {1,2,..,n}} \alpha_i + \sum_{i \in {1,2,..,n}} (C - \alpha_i - \mu_i) \epsilon_i \end{eqnarray*}
I substitute partial differentials.
-\frac{1}{2} \sum_{i,j \in {1,2,..,n}} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) + \sum_{i \in {1,2,..,n}} \alpha_i
Now,
\begin{eqnarray*} C - \alpha_i - \mu_i &=& 0 \\ C - \alpha_i &=& \mu \geq 0\\ C - \alpha_i & \geq & 0 \end{eqnarray*}
I get this dual problem.
\max_{\alpha} -\frac{1}{2} \sum_{i,j \in {1,2,..,n}} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) + \sum_{i \in {1,2,..,n}} \alpha_i
s.t. \sum_{i \in {1,2,..,n}} \alpha_i y_i = 0 ~~~ , 0 \leq \alpha_i \leq C
\min_{\alpha} \frac{1}{2} \sum_{i,j \in {1,2,..,n}} \alpha_i \alpha_j y_i y_j \phi(x_i)^T \phi(x_j) - \sum_{i \in {1,2,..,n}} \alpha_i
s.t. \sum_{i \in {1,2,..,n}} \alpha_i y_i = 0 ~~~ , 0 \leq \alpha_i \leq C
Reference
https://www.amazon.co.jp/%E3%82%B5%E3%83%9D%E3%83%BC%E3%83%88%E3%83%99%E3%82%AF%E3%83%88%E3%83%AB%E3%83%9E%E3%82%B7%E3%83%B3-%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%97%E3%83%AD%E3%83%95%E3%82%A7%E3%83%83%E3%82%B7%E3%83%A7%E3%83%8A%E3%83%AB%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-%E7%AB%B9%E5%86%85-%E4%B8%80%E9%83%8E/dp/4061529064
コメント
コメントを投稿