Introduction
Today, I will write about the theorem of K-means algorithm. K-means algorithm is a method to do clustering about K of class.
The post of implement K-means is
Implement kernel K-means
This post is written about kernel K-means. The kernel K-means is the method which covers the weak point of K-means. I will explain kernel K-means another post.
Overview
1 of K coding scheme
K-means algorithm is clustering K of class. Now, K-means algorithm expresses that $x_n$ be belong to k like as follows.
Let vector $r_n:1 \times K$ is
$$r_n := (0,0,..,1,..,0)$$
this vector have 1 in element of k'th and have 0 in else.
This expression is called 1 of K coding scheme.
Prototype vector
K-means algorithm chooses vector which called a prototype. this vector has represented by the cluster. K-means algorithm is regard mean vector as representative of the cluster. It is naturally derived from the objective function.
Distortion measure
Let prototype vector is $\mu_i ~\forall k \in K$.
Then, k-means have the objective function like as follows.
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} ||x_n-\mu_k||^2$$
here, $r_{nk}$ is k'th element of $r_n$.
I explain about this objective function. because $r_{n}$ have only 1 in k'th element,
$$J = \sum_{n=1}^{N} ||x_n - \mu_{x_n}||$$
here, $\mu_{k_n}$ is prototype vector of class which belonged $x_n$.
Thus,
$$J = ||x_1 - \mu_{x_1}|| + ||x_2 -\mu_{x_2}|| + ... + ||x_N - \mu_{x_N}||$$
Firstly,I minimise $J$ about $r_n$.
Because $||x_n-\mu_k||^2$ is distance between $x_n$ and prototype vector of class k,
$r_n$ is decided as follows.
$$k = \arg \min_{j} || x_n - \mu_{j} || \implies r_{nk} = 1$$
$$else \implies r_{n_k} = 0$$
Secondly, I minimise $J$ about $\mu_k$ when I fix $r_{n_k}$.
Partial is
$$2\sum_{n=1} ^{N} r_{n_k} (x_n-\mu_k) = 0$$
Thus,
$$2\sum_{n=1} ^{N} \{r_{n_k} x_n\} - \mu_k \sum_{n=1}^{N}r_{n_k} = 0 $$
$$\mu_k = \frac{\sum_{n} r_{n_k} x_n}{\sum_n r_{n_k}}$$
We regard this value of $\mu_k$ as mean vector of class k.
As result, we found prototype vector is mean vector.
If you know EM algorithm, minimizing $J$ about $r_n$ is E step and minimizing $J$ about $\mu$ when I fix $r_n$ is M step.
Reference
https://www.amazon.co.jp/パターン認識と機械学習-上-C-M-ビショップ/dp/4621061224
The post of implement K-means is
Implement kernel K-means
This post is written about kernel K-means. The kernel K-means is the method which covers the weak point of K-means. I will explain kernel K-means another post.
Overview
- 1 of K coding scheme
- prototype vector
- Distortion measure
- Computing
1 of K coding scheme
K-means algorithm is clustering K of class. Now, K-means algorithm expresses that $x_n$ be belong to k like as follows.
Let vector $r_n:1 \times K$ is
$$r_n := (0,0,..,1,..,0)$$
this vector have 1 in element of k'th and have 0 in else.
This expression is called 1 of K coding scheme.
Prototype vector
K-means algorithm chooses vector which called a prototype. this vector has represented by the cluster. K-means algorithm is regard mean vector as representative of the cluster. It is naturally derived from the objective function.
Distortion measure
Let prototype vector is $\mu_i ~\forall k \in K$.
Then, k-means have the objective function like as follows.
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} ||x_n-\mu_k||^2$$
here, $r_{nk}$ is k'th element of $r_n$.
I explain about this objective function. because $r_{n}$ have only 1 in k'th element,
$$J = \sum_{n=1}^{N} ||x_n - \mu_{x_n}||$$
here, $\mu_{k_n}$ is prototype vector of class which belonged $x_n$.
Thus,
$$J = ||x_1 - \mu_{x_1}|| + ||x_2 -\mu_{x_2}|| + ... + ||x_N - \mu_{x_N}||$$
Firstly,I minimise $J$ about $r_n$.
Because $||x_n-\mu_k||^2$ is distance between $x_n$ and prototype vector of class k,
$r_n$ is decided as follows.
$$k = \arg \min_{j} || x_n - \mu_{j} || \implies r_{nk} = 1$$
$$else \implies r_{n_k} = 0$$
Secondly, I minimise $J$ about $\mu_k$ when I fix $r_{n_k}$.
Partial is
$$2\sum_{n=1} ^{N} r_{n_k} (x_n-\mu_k) = 0$$
Thus,
$$2\sum_{n=1} ^{N} \{r_{n_k} x_n\} - \mu_k \sum_{n=1}^{N}r_{n_k} = 0 $$
$$\mu_k = \frac{\sum_{n} r_{n_k} x_n}{\sum_n r_{n_k}}$$
We regard this value of $\mu_k$ as mean vector of class k.
As result, we found prototype vector is mean vector.
If you know EM algorithm, minimizing $J$ about $r_n$ is E step and minimizing $J$ about $\mu$ when I fix $r_n$ is M step.
Reference
https://www.amazon.co.jp/パターン認識と機械学習-上-C-M-ビショップ/dp/4621061224
コメント
コメントを投稿