スキップしてメイン コンテンツに移動

Theorem of K-means

Introduction

Today, I will write about the theorem of K-means algorithm. K-means algorithm is a method to do clustering about K of class.

The post of implement K-means is
Implement kernel K-means
This post is written about kernel K-means. The kernel K-means is the method which covers the weak point of K-means. I will explain kernel K-means another post.

Overview

  • 1 of K coding scheme
  • prototype vector
  • Distortion measure
  • Computing

1 of K coding scheme

K-means algorithm is clustering K of class. Now, K-means algorithm expresses that $x_n$ be belong to k like as follows.
Let vector $r_n:1 \times K$ is
$$r_n := (0,0,..,1,..,0)$$
this vector have 1 in element of k'th and have 0 in else.

This expression is called 1 of K coding scheme.

Prototype vector

K-means algorithm chooses vector which called a prototype. this vector has represented by the cluster. K-means algorithm is regard mean vector as representative of the cluster. It is naturally derived from the objective function.

Distortion measure

Let prototype vector is $\mu_i ~\forall k \in K$.
Then, k-means have the objective function like as follows.

$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} ||x_n-\mu_k||^2$$

here, $r_{nk}$ is k'th element of $r_n$.

I explain about this objective function. because $r_{n}$ have only 1 in k'th element,

$$J = \sum_{n=1}^{N} ||x_n - \mu_{x_n}||$$

here, $\mu_{k_n}$ is prototype vector of class which belonged $x_n$.

Thus,
$$J = ||x_1 - \mu_{x_1}|| + ||x_2 -\mu_{x_2}|| + ... + ||x_N - \mu_{x_N}||$$


Firstly,I minimise $J$ about $r_n$.

Because $||x_n-\mu_k||^2$ is distance between $x_n$ and prototype vector of class k,
$r_n$ is decided as follows.

$$k = \arg \min_{j} || x_n - \mu_{j} || \implies r_{nk} = 1$$
$$else \implies r_{n_k} = 0$$



Secondly, I minimise $J$ about $\mu_k$ when I fix $r_{n_k}$.
Partial is
$$2\sum_{n=1} ^{N} r_{n_k} (x_n-\mu_k) = 0$$
Thus,
$$2\sum_{n=1} ^{N} \{r_{n_k} x_n\} - \mu_k \sum_{n=1}^{N}r_{n_k} = 0 $$
$$\mu_k = \frac{\sum_{n} r_{n_k} x_n}{\sum_n r_{n_k}}$$

We regard this value of $\mu_k$ as mean vector of class k.

As result, we found prototype vector is mean vector.

If you know EM algorithm, minimizing $J$ about $r_n$ is E step and minimizing $J$ about $\mu$ when I fix $r_n$ is M step.

Reference
https://www.amazon.co.jp/パターン認識と機械学習-上-C-M-ビショップ/dp/4621061224

コメント

このブログの人気の投稿

ヘッセ行列

Introduction English ver 今日は、ヘッセ行列を用いたテイラー展開について書こうと思います。 これは最適化を勉強するにあたって、とても大事になってくるので自分でまとめて残しておくことにしました。とくに、機械学習では最適化を必ず行うため、このブログのタイトルにもマッチした内容だと思います。 . 概要 ヘッセ行列の定義 ベクトルを用いたテイラー展開 関数の最適性 ヘッセ行列の定義 仮定 f は次のような条件を満たす関数です。. f はn次元ベクトルから実数値を出力します。 このベクトルは次のように表せます。 \[x = [x_1,x_2,,,,x_n]\] \(\forall x_i , i \in {1,2,,,n}\), f は二回偏微分可能です。 定義 ヘッセ行列は \(\frac{\partial^2}{\partial x_i \partial x_j}を (i,j)要素に持ちます。\) よってヘッセ行列は次のように表せます。 \[ H(f) = \left( \begin{array}{cccc} \frac{\partial^ 2}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & &\ldots \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^ 2 f}{\partial x_1 \partial x_2} & \frac{\partial^ 2 f}{\partial x_2^ 2} & \ldots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^ 2 f}{\partial x_n \partial x_2} & \frac{\partial^ 2 f}{\partial x_n \partial x_2} & \ldo...

Plane in two dimention

Introduction 日本語 ver Today, I prove this theorem. Plane in two dimention is expressed following. \[\{x|<x,v> = 0\}\] however, v is orthogonal vector for plane and not zero vector. Proof \[\forall k \in \{x|<x,v> = 0\},\] k is fulfill this form. \[<k,v> = 0\] Now, because k and v in two dimentinal space, each vector express following. \[k = (k_1,k_2)\] \[v = (v_1,v_2)\] Thus, \(<k,v>=k_1v_1 + k_2v_2=0\) Change this equation. \[k_2 = -\frac{v_1}{v_2} k_1\] This equation is plane that slope is \(-\frac{v_1}{v_2}\). Q.E.D

Discrete Fourier transform

Introduction 日本語 ver I will write about Discrete Fourier transform. Discrete Fourier transform is Abbreviated DFT. I am making pdf about Audio Signal Processing. I publish pdf at  github . However, I write tex in Japanese. I take a lecture about the signal processing. There is lecture at  thie page . I update this pdf.