Implement kernel k-means

Introduction

Today, I implement kernel k-means. The k-means algorithm is a clustering algorithm. I implemented kernel k-means because a friend and I came up with the idea of introducing a kernel into k-means. While looking for papers on kernel k-means, I found [This page](http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_spectral_kernelkmeans.pdf), which made the implementation possible. In this post, I introduce my implementations of normal k-means and kernel k-means.

This post covers only the implementation of kernel k-means. I will write about the theory of kernel k-means separately and publish it when it is finished.

# Update: I have finished it. Theory of K-means

Overview

Dataset
A brief explanation of k-means
k-means
Kernel k-means

Dataset

I used two datasets. The first dataset is for normal k-means, and the second is for kernel k-means.

The first dataset consists of three groups of two-dimensional data, 300 samples in total.
The distribution is as follows.

The second dataset consists of two groups of two-dimensional data, 300 samples in total.
The distribution is as follows.

The code for generating the datasets is published on THIS PAGE.
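The author's dataset code lives on the linked page and is not reproduced here. As a stand-in, the following is a minimal NumPy sketch that produces data of the same size: three 2-D Gaussian groups for the first dataset and two groups for the second. The group centers, spreads, and radii are arbitrary assumptions. The ring shape of the second dataset is also an assumption, and `make_blobs_dataset` and `make_rings_dataset` are hypothetical names.

```python
import numpy as np


def make_blobs_dataset(n_samples=300, seed=0):
    """Three 2-D Gaussian groups (hypothetical centers and spread)."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])  # assumed centers
    labels = np.arange(n_samples) % len(centers)               # 100 points per group
    X = centers[labels] + rng.normal(scale=0.8, size=(n_samples, 2))
    return X, labels


def make_rings_dataset(n_samples=300, seed=0):
    """Two concentric 2-D rings (assumed shape of the second dataset)."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n_samples) % 2                          # 150 points per ring
    radii = np.where(labels == 0, 1.0, 4.0)                    # assumed radii
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
    r = radii + rng.normal(scale=0.1, size=n_samples)          # small radial noise
    X = np.column_stack([r * np.cos(angles), r * np.sin(angles)])
    return X, labels
```

Each function returns the points `X` with shape (300, 2) together with the ground-truth group labels, which can be used to color a scatter plot.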
A brief explanation of k-means
Starting from an initial labeling, the k-means algorithm first computes the mean vector of each of the K classes. Second, it computes the distance between each data point and each mean vector. Third, it assigns each data point a new label: the class whose mean vector is closest to that point. These steps are repeated until the labels stop changing.
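The three steps above can be sketched in NumPy as follows. This is a minimal illustration and not the author's git_Kmeans_def.py. The random initial labels, the stopping rule, and the function name `kmeans` are assumptions.

```python
import numpy as np


def kmeans(X, K, max_iter=100, seed=0):
    """Plain k-means on data X of shape (N, D); returns (labels, means)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    labels = rng.integers(0, K, size=N)  # assumed initialization: random labels
    for _ in range(max_iter):
        # Step 1: mean vector of each class (an empty class is re-seeded with a random point).
        means = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else X[rng.integers(N)]
            for k in range(K)
        ])
        # Step 2: squared Euclidean distance between every point and every mean, shape (N, K).
        dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        # Step 3: the new label is the index of the nearest mean vector.
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # stop when no label changes
            break
        labels = new_labels
    return labels, means
```

Calling `labels, means = kmeans(X, 3)` on the first dataset returns one label per point and the K mean vectors, which are the centroids drawn in the result plot.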
k-means

First, I implemented the normal k-means algorithm and tested my code on the first dataset. The clustering succeeded.

In the figure, the centroids are the mean vectors of the classes.

However, the k-means algorithm has weak points, as the following result shows.

This image shows the result of applying my k-means code to the second dataset.
Normal k-means depends on the Euclidean distance between each mean vector and each data point in the data space, so it failed to cluster this data correctly.
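One way to see this weakness is to compare the k-means objective $J = \sum_{n}\sum_{k} r_{nk}\|x_n - \mu_k\|^2$ for two labelings of ring-shaped data. Here $r_{nk}$ is 1 when point $x_n$ belongs to class $k$ and 0 otherwise, so $J$ is the sum of squared distances from each point to the mean vector of its class. The sketch below assumes two concentric rings of 150 points each (radii 1 and 4 are arbitrary choices), and `kmeans_objective` is a hypothetical helper. Both ring means sit at the origin, and a straight cut through the middle gives a lower $J$ than the correct ring labeling. Minimizing $J$ therefore favors the wrong clustering.

```python
import numpy as np


def kmeans_objective(X, labels):
    """J = sum over points of the squared distance to the mean vector of its own class."""
    return sum(((X[labels == k] - X[labels == k].mean(axis=0)) ** 2).sum()
               for k in np.unique(labels))


# Hypothetical ring data: 150 points on a radius-1 circle and 150 on a radius-4 circle.
angles = np.linspace(0.0, 2.0 * np.pi, 150, endpoint=False)
inner = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([inner, 4.0 * inner])

true_labels = np.repeat([0, 1], 150)      # inner ring = 0, outer ring = 1
split_labels = (X[:, 0] > 0).astype(int)  # straight cut along x = 0

print(kmeans_objective(X, true_labels))   # both ring means sit at the origin
print(kmeans_objective(X, split_labels))  # lower J, so k-means prefers this cut
```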

Kernel k-means
Normal k-means failed to cluster the second dataset.
However, clustering succeeded when I used the kernel trick.
The result is as follows.

This clustering succeeded.
The kernel trick is an effective way to do non-linear clustering.
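The reason the kernel trick works here can be written out. Use a kernel $K(x, y) = \phi(x)^T \phi(y)$ and define the mean of class $C_k$ in feature space as $m_k = \frac{1}{|C_k|}\sum_{j \in C_k} \phi(x_j)$. The squared distance used in the assignment step then expands into kernel values only, so $\phi$ never has to be computed explicitly. This is the unweighted form of the distance in the paper listed under Reference.

$$\|\phi(x_n) - m_k\|^2 = K(x_n, x_n) - \frac{2}{|C_k|}\sum_{j \in C_k} K(x_n, x_j) + \frac{1}{|C_k|^2}\sum_{j \in C_k}\sum_{l \in C_k} K(x_j, x_l)$$

Each data point is then assigned to the class $k$ that minimizes this distance, exactly as in normal k-means.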

CODE
My code for the kernel k-means algorithm is published on this page.

git_Kmeans_def.py contains the functions used in normal k-means.
git_Kmeans_main.py is the main file; it contains the if __name__ == '__main__': block.

git_kernel_Kemans_def.py contains the functions used in kernel k-means.
git_kernel_Kemans_main.py is the main file; it contains the if __name__ == '__main__': block.
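For readers who want a self-contained version, the following is a minimal NumPy sketch of kernel k-means based on the feature-space distance expansion from the paper listed under Reference. It is not the code in the files above. The Gaussian (RBF) kernel, the `gamma` value, the random initial labels, and the names `rbf_kernel` and `kernel_kmeans` are all assumptions.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2); gamma is an assumed value."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))      # clip tiny negative round-off


def kernel_kmeans(K, n_clusters, max_iter=100, seed=0):
    """Kernel k-means on a precomputed N x N kernel matrix; returns one label per point."""
    rng = np.random.default_rng(seed)
    N = K.shape[0]
    labels = rng.integers(0, n_clusters, size=N)  # assumed initialization: random labels
    diag = np.diag(K)
    for _ in range(max_iter):
        dist = np.full((N, n_clusters), np.inf)
        for k in range(n_clusters):
            members = labels == k
            size = members.sum()
            if size == 0:
                continue  # an empty class keeps distance inf and attracts no points
            # ||phi(x_n) - m_k||^2 = K_nn - 2/|C_k| sum_j K_nj + 1/|C_k|^2 sum_jl K_jl
            dist[:, k] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / size
                          + K[np.ix_(members, members)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # stop when no label changes
            break
        labels = new_labels
    return labels
```

For the second dataset one would call `kernel_kmeans(rbf_kernel(X, gamma), 2)` with a tuned `gamma`. Whether the two groups come out as separate clusters depends on `gamma` and on the initial labels, so trying several seeds is advisable.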

Reference

http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_spectral_kernelkmeans.pdf
https://sites.google.com/site/dataclusteringalgorithms/kernel-k-means-clustering-algorithm

journey of Froakie (ケロマツの旅路)