Theorem of SVM part 1

Introduction

I will explain theorem of SVM.
Please look at my implementation of SVM.
Implement linear SVM
Implement kernel SVM
Today, I will explain about SVM until deriving the objective function.

Overview

Generalized linear model
Explain SVM
hard margin
soft margin

Generalized linear model

SVM is used generalized linear model. Generalized linear model is following function

$f(x) = w^T\phi(x) + b$
b is called bias.

$0 = w^T\phi(x) + b$ is hyper plane. This hyper plane separate two class of

$\phi(x)$ .
hyper plance is n-dimensional plane. if n = 1, hyper plane is line. if n = 2, hyper plane is normal plane.

$\phi(x)$ have effect of converting x to data which can be separated by a line.
image of

$\phi(x)$ is

the left image has nonlinear data.
right image has linear data.

$\phi(x)$ convert from left image to right image.
I will handle

$w^T \phi(x) + b$ as line in feature space.

Next, I will explain the object of SVM

Explain SVM

the label is 1 or -1 in SVM. Let label is y

$\in \{1,-1\}$ . Let dataset is X.
We want to make decisions function which

$\forall x \in X$

$f(x_i) > 0 \implies y_i = 1$

$f(x_i) < 0 \implies y_i = -1$

Let f(x) is

$w^T \phi(x) + b$ , I will optimize w and b of parametor.
However, optimization needs a standard of a good boundary. Its standard is magin. Next, I will explain hard margin.

Hard margin

SVM decide a boundary line by a value called margin.
What is margin? I will explain.

pick up data which exist nearest from

$w^T \phi(x) +b = 0$ . Margin is the distance between the data and

$w^T \phi(x) +b = 0$ .
Look at following image of margin in 2-dimensional.

this distance of green line is margin. SVM decide

$w^T \phi(x) + b= 0$ to depend on only data which exist nearest from

$w^T \phi(x) + b = 0$ . This data called sopport vector.

We decide w and b of the parameter by a maximum margin.

Let dataset is X,

$\forall x_i \in X$ , distance between x and

$w^T \phi(x) + b = 0$ is

$\frac{|w^T \phi(x_i) + b|}{||W||}$

Now, Assume linear hyperplane is enabled to complicately classify.

this image is data which complicately separated hyperplane.

this image is else data.

Thus,

$f(x_i) > 0 \implies y_i = 1$

$f(x_i) < 0 \implies y_i = -1$
is complitely practical.

Therefore,

$\forall i \in N,~~~~~~~y_i(w^T \phi(x_i) + b) > 0$

Therefore

$\frac{|w^T \phi(x_i) + b|}{||W||} = \frac{y(w^T \phi(x_i) + b)}{||W||}$
Next, Let

$i_0$ as follow.

$\forall i_0 \in \arg_{n \in N} \min_{x \in X} \frac{|w^T \phi(x_n) + b|}{||W||}$ ,
Let M is

$M = y_{i_0}(w^T \phi(x_{i_0}) + b)$
Because

$\forall i \in N,~y_i(w^T \phi(x_i) + b) > 0$ ,

$M > 0$ is practical.

M is value of distance between

$w^T \phi(x) + b = 0$ and data which exist nearest

$w^T \phi(x) + b = 0$

The objective function in SVM is expressed as follow.

$\max_{w,b,M} \frac{M}{||W||}$

$~~s.t~~ \forall i \in N ~, y_i(w^T \phi(x_i) + b) \geq M$

Here, when

$w^{\star} = \frac{w}{M}, b^{\star} = \frac{b}{M}$ , the objective function is expressed as follow.

$\max_{w^{\star},b^{\star}} \frac{1}{||W^{\star}||}$

$~~s.t~~ \forall i \in N, y_i (w^{\star} \phi(x_i) + b^{\star}) \geq 1$

I convert this from. because

$||W^{\star}|| > 0$ ,

$\max_{w^{\star},b^{\star}} \frac{1}{||W^{\star}||}$

$\iff \min_{w^{\star},b^{\star}} ||W^{\star}||$

$\iff \min_{w^{\star},b^{\star}} ||W^{\star}||^2$

therefore, the objectice function in SVM is

$\min_{w,b} ||W||^2$

$~~s.t~~ \forall i \in N, y_i (w^T \phi(x_i) + b) \geq 1$

I define

$W^{\star} = W, b^{\star} = b$ again.

We assume data is completely separated by a hyperplane. This method is called hard margin.

However this assumption is strict in the real world, so the soft margin is invented.
Next, I will explain soft margin.

Soft margin

I introduce

$\epsilon_i \geq 0$ the objective function.

I loosen

$\forall i \in N, y_i (w^T \phi(x_i) + b) \geq 1$ . This condition is rewrited as follow.

$\forall i \in N, y_i (w^T \phi(x_i) + b) \geq 1 - \epsilon_i$

if

$x_i$ is beyond

$w^T \phi(x) + b = 0$ ,

$\epsilon_i > 0$ is practical.

$x_5$ and

$x_8$ and

$x_9$ is beyond

$w^T \phi(x) + b = 0$ .
This distance of black line is

$\epsilon_i$

I rewrite the objective function.

$\min_{w,b} \frac{1}{2}||W||^2 + C\sum_{i \in N} \epsilon_i$

$~~s.t~~ \forall i \in N, y_i (w^T \phi(x_i) + b) \geq 1 - \epsilon_i ,~~~~\epsilon \geq 0 , \forall i \in N$

C is called regulation parameter.
This parameter is a hyperparameter, so We decide before computing SVM algorithm.
C has the role which adjusts degree of suppression of misclassification.
The smaller C is, The smaller effect of

$\sum_{i \in N}\epsilon_i$ is. Thus, it is easy to accept misclassification. On the other hand, the bigger C is, The bigger effect of

$\sum_{i \in N}\epsilon_i$ is.
When

$C = \infty$ , It become hard margin.

Reference

https://www.amazon.co.jp/%E3%82%B5%E3%83%9D%E3%83%BC%E3%83%88%E3%83%99%E3%82%AF%E3%83%88%E3%83%AB%E3%83%9E%E3%82%B7%E3%83%B3-%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%83%97%E3%83%AD%E3%83%95%E3%82%A7%E3%83%83%E3%82%B7%E3%83%A7%E3%83%8A%E3%83%AB%E3%82%B7%E3%83%AA%E3%83%BC%E3%82%BA-%E7%AB%B9%E5%86%85-%E4%B8%80%E9%83%8E/dp/4061529064

journey of Froakie (ケロマツの旅路)

このブログを検索

Theorem of SVM part 1

ラベル

コメント

コメントを投稿

このブログの人気の投稿

カーネルK-means 理論編

ダイクストラ法

Visualization of Variational method