Introduction
Today, I implement kernel k-means. The k-means algorithm is a clustering algorithm. The reason I implement kernel k-means is that my friend and I came up with the idea of introducing a kernel into k-means. I looked into papers on kernel k-means and found [this page](http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_spectral_kernelkmeans.pdf), which allowed me to implement the algorithm. In this post, I introduce my implementations of normal k-means and kernel k-means.
This post covers only the implementation of kernel k-means. I will write up the theory of kernel k-means separately and publish it here once it is finished.
# Finished: Theory of k-means
Overview
- Dataset
- A few words on k-means
- k-means
- Kernel k-means

Dataset
I used two datasets. The first dataset is intended for normal k-means, and the second for kernel k-means.
The first dataset consists of 300 two-dimensional samples divided into three groups.
The distribution is as follows.
The second dataset consists of 300 two-dimensional samples divided into two groups.
The distribution is as follows.
I publish the code that generates these datasets.
THIS PAGE!!
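For reference, here is a minimal sketch of how two datasets with these shapes could be generated. It is only an assumption about the data: I use scikit-learn's `make_blobs` and `make_circles`, which are not necessarily what the linked code uses.

```python
from sklearn.datasets import make_blobs, make_circles

# First dataset: 300 two-dimensional points in three Gaussian groups,
# matching the description above (an assumption about the exact shape).
X1, y1 = make_blobs(n_samples=300, centers=3, n_features=2, random_state=0)

# Second dataset: 300 two-dimensional points in two groups that are not
# linearly separable; two concentric rings are assumed here.
X2, y2 = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
```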
A few words on k-means
First, the k-means algorithm computes the mean vector of each of the K clusters. Second, it computes the distance between each data point and each mean vector. Third, it assigns each data point a new label: the label of the cluster whose mean vector is closest to that point. These three steps are repeated until the labels stop changing.
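The following is a minimal NumPy sketch of that loop. It is my own sketch of the standard algorithm, not the code from the repository linked below.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: X is an (n, d) array, k is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initialize the mean vectors with k randomly chosen data points.
    means = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every mean vector.
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Recompute each mean vector from the points assigned to it.
        for c in range(k):
            if np.any(labels == c):
                means[c] = X[labels == c].mean(axis=0)
    return labels, means
```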
k-means
First, I implemented the normal k-means algorithm and tested it on the first dataset. The clustering completed as expected; the result is shown below.
The centroids are the mean vectors.
However, the k-means algorithm has a weak point, which you can see below.
This image shows the result of applying my k-means algorithm to the second dataset.
Normal k-means relies on the Euclidean distance between the mean vectors and the data points in the original data space. Therefore, the clustering failed.
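To make that dependence explicit, the assignment step of standard k-means can be written as follows (standard textbook notation, not taken from this post):

$$
\text{label}(x_i) = \operatorname*{arg\,min}_{c \in \{1,\dots,K\}} \lVert x_i - \mu_c \rVert^2,
\qquad
\mu_c = \frac{1}{|C_c|} \sum_{x_j \in C_c} x_j
$$

Because the distance is measured in the original input space, groups that are not linearly separable there cannot be recovered.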
Kernel k-means
Normal k-means failed to cluster this dataset.
However, the clustering succeeds when I use the kernel trick.
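Roughly speaking, kernel k-means computes the same distance in a feature space $\phi(\cdot)$ reached through a kernel matrix $K_{ij} = \phi(x_i)^\top \phi(x_j)$. The squared distance to a cluster mean expands into kernel values only; this is the standard formulation from the referenced paper, written here in my own notation:

$$
\lVert \phi(x_i) - m_c \rVert^2
= K_{ii}
- \frac{2}{|C_c|} \sum_{x_j \in C_c} K_{ij}
+ \frac{1}{|C_c|^2} \sum_{x_j,\, x_l \in C_c} K_{jl}
$$

where $m_c$ is the mean of the mapped points in cluster $C_c$. No explicit mapping $\phi$ is ever computed; only kernel evaluations are needed.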
The result is as follows.
The clustering completed successfully.
The kernel trick is a powerful approach to non-linear clustering.
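For illustration, here is a minimal NumPy sketch of kernel k-means with an RBF kernel, built on the distance expansion above. It is my own sketch under those assumptions, not the published code from the repository linked below.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def kernel_kmeans(K, k, n_iter=100, seed=0):
    """Kernel k-means on a precomputed Gram matrix K (n x n)."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(k, size=n)  # random initial assignment
    for _ in range(n_iter):
        dists = np.zeros((n, k))
        for c in range(k):
            mask = labels == c
            size = mask.sum()
            if size == 0:
                dists[:, c] = np.inf
                continue
            # ||phi(x_i) - m_c||^2 = K_ii - 2/|C| sum_j K_ij + 1/|C|^2 sum_jl K_jl
            dists[:, c] = (np.diag(K)
                           - 2.0 * K[:, mask].sum(axis=1) / size
                           + K[np.ix_(mask, mask)].sum() / size**2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
    return labels

# Usage (e.g. on X2 from the dataset sketch above):
# labels = kernel_kmeans(rbf_kernel(X2, gamma=10.0), k=2)
```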
CODE
My code for the kernel k-means algorithm is published on this page.
The git_Kmeans_def.py file contains the functions used in normal k-means.
The git_Kmeans_main.py file is the main file for normal k-means; it contains the `if __name__ == '__main__':` block.
The git_kernel_Kemans_def.py file contains the functions used in kernel k-means.
The git_kernel_Kemans_main.py file is the main file for kernel k-means; it contains the `if __name__ == '__main__':` block.
Reference
http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_spectral_kernelkmeans.pdf
https://sites.google.com/site/dataclusteringalgorithms/kernel-k-means-clustering-algorithm