Introduction
Today, I will write about the theorem of K-means algorithm. K-means algorithm is a method to do clustering about K of class.
The post of implement K-means is
Implement kernel K-means
This post is written about kernel K-means. The kernel K-means is the method which covers the weak point of K-means. I will explain kernel K-means another post.
Overview
1 of K coding scheme
K-means algorithm is clustering K of class. Now, K-means algorithm expresses that x_n be belong to k like as follows.
Let vector r_n:1 \times K is
r_n := (0,0,..,1,..,0)
this vector have 1 in element of k'th and have 0 in else.
This expression is called 1 of K coding scheme.
Prototype vector
K-means algorithm chooses vector which called a prototype. this vector has represented by the cluster. K-means algorithm is regard mean vector as representative of the cluster. It is naturally derived from the objective function.
Distortion measure
Let prototype vector is \mu_i ~\forall k \in K.
Then, k-means have the objective function like as follows.
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} ||x_n-\mu_k||^2
here, r_{nk} is k'th element of r_n.
I explain about this objective function. because r_{n} have only 1 in k'th element,
J = \sum_{n=1}^{N} ||x_n - \mu_{x_n}||
here, \mu_{k_n} is prototype vector of class which belonged x_n.
Thus,
J = ||x_1 - \mu_{x_1}|| + ||x_2 -\mu_{x_2}|| + ... + ||x_N - \mu_{x_N}||
Firstly,I minimise J about r_n.
Because ||x_n-\mu_k||^2 is distance between x_n and prototype vector of class k,
r_n is decided as follows.
k = \arg \min_{j} || x_n - \mu_{j} || \implies r_{nk} = 1
else \implies r_{n_k} = 0
Secondly, I minimise J about \mu_k when I fix r_{n_k}.
Partial is
2\sum_{n=1} ^{N} r_{n_k} (x_n-\mu_k) = 0
Thus,
2\sum_{n=1} ^{N} \{r_{n_k} x_n\} - \mu_k \sum_{n=1}^{N}r_{n_k} = 0
\mu_k = \frac{\sum_{n} r_{n_k} x_n}{\sum_n r_{n_k}}
We regard this value of \mu_k as mean vector of class k.
As result, we found prototype vector is mean vector.
If you know EM algorithm, minimizing J about r_n is E step and minimizing J about \mu when I fix r_n is M step.
Reference
https://www.amazon.co.jp/パターン認識と機械学習-上-C-M-ビショップ/dp/4621061224
The post of implement K-means is
Implement kernel K-means
This post is written about kernel K-means. The kernel K-means is the method which covers the weak point of K-means. I will explain kernel K-means another post.
Overview
- 1 of K coding scheme
- prototype vector
- Distortion measure
- Computing
1 of K coding scheme
K-means algorithm is clustering K of class. Now, K-means algorithm expresses that x_n be belong to k like as follows.
Let vector r_n:1 \times K is
r_n := (0,0,..,1,..,0)
this vector have 1 in element of k'th and have 0 in else.
This expression is called 1 of K coding scheme.
Prototype vector
K-means algorithm chooses vector which called a prototype. this vector has represented by the cluster. K-means algorithm is regard mean vector as representative of the cluster. It is naturally derived from the objective function.
Distortion measure
Let prototype vector is \mu_i ~\forall k \in K.
Then, k-means have the objective function like as follows.
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} ||x_n-\mu_k||^2
here, r_{nk} is k'th element of r_n.
I explain about this objective function. because r_{n} have only 1 in k'th element,
J = \sum_{n=1}^{N} ||x_n - \mu_{x_n}||
here, \mu_{k_n} is prototype vector of class which belonged x_n.
Thus,
J = ||x_1 - \mu_{x_1}|| + ||x_2 -\mu_{x_2}|| + ... + ||x_N - \mu_{x_N}||
Firstly,I minimise J about r_n.
Because ||x_n-\mu_k||^2 is distance between x_n and prototype vector of class k,
r_n is decided as follows.
k = \arg \min_{j} || x_n - \mu_{j} || \implies r_{nk} = 1
else \implies r_{n_k} = 0
Secondly, I minimise J about \mu_k when I fix r_{n_k}.
Partial is
2\sum_{n=1} ^{N} r_{n_k} (x_n-\mu_k) = 0
Thus,
2\sum_{n=1} ^{N} \{r_{n_k} x_n\} - \mu_k \sum_{n=1}^{N}r_{n_k} = 0
\mu_k = \frac{\sum_{n} r_{n_k} x_n}{\sum_n r_{n_k}}
We regard this value of \mu_k as mean vector of class k.
As result, we found prototype vector is mean vector.
If you know EM algorithm, minimizing J about r_n is E step and minimizing J about \mu when I fix r_n is M step.
Reference
https://www.amazon.co.jp/パターン認識と機械学習-上-C-M-ビショップ/dp/4621061224
コメント
コメントを投稿