Implementation of Logistic Regression

Introduction

Today, I implement Logistic Regression.
My computer runs Windows 10.
The implementation is written in Python 3.
I use IRLS (iteratively reweighted least squares) to estimate the optimal parameters.
I introduce the theory of Logistic Regression in another post.
If you are interested, please look at that post.

Overview

I will introduce the dataset I used.
I will introduce my Python code.
I will show you the results on the command line.

Dataset

I use this dataset to implement Logistic Regression.
It is residential area data.
I display the data as a pandas DataFrame in Python 3.
[Image: the dataset displayed as a pandas DataFrame]

These are the first five rows of the dataset.
If people live in the house, occupancy is 1.
If people do not live in the house, occupancy is 0.
The data consists of 8000 samples for training and 2000 samples for testing.
However, I use only 100 samples as training data and 100 samples as test data, because my computer is not designed for heavy computation.
Sorry about that.
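Taking 100 samples from each set can be sketched with pandas as follows. This is only a minimal sketch, not my actual loading code: the data here is synthetic, and the column names `feature_1`, `feature_2` and `occupancy` are hypothetical stand-ins for the real columns.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)


def make_fake_occupancy(n_rows):
    """Build a synthetic stand-in table (hypothetical column names)."""
    return pd.DataFrame({
        "feature_1": rng.normal(size=n_rows),
        "feature_2": rng.normal(size=n_rows),
        "occupancy": rng.integers(0, 2, size=n_rows),  # 1 = occupied, 0 = empty
    })


# Same sizes as described in the post: 8000 training rows, 2000 test rows
train_full = make_fake_occupancy(8000)
test_full = make_fake_occupancy(2000)

# Draw 100 rows from each set without replacement; random_state makes it repeatable
train = train_full.sample(n=100, random_state=0)
test = test_full.sample(n=100, random_state=0)

print(train.head())  # first five rows of the reduced training set
print(len(train), len(test))
```

Sampling at random rather than taking the first 100 rows is a design choice in this sketch. It avoids picking up only one class if the rows happen to be ordered.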

CODE

The code is very long.
Thus, I publish my Logistic Regression code on my GitHub.
My github page
My Logistic Regression code(def file)
My Logistic Regression code(main file)
I have a reason for separating my code: I want to keep the definition file and the main file apart. The main file has

if __name__ == '__main__':

My def file contains the Logistic Regression algorithm, which is defined as a class. I will write about classes in Python.
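To give an idea of what such a class and main guard can look like, here is a minimal sketch. It is not the code from my GitHub: the class name `LogisticRegressionIRLS`, the `ridge` parameter and the synthetic data are all assumptions made for illustration. It uses the standard IRLS (Newton-Raphson) update $w^{new} = w^{old} - (\Phi^T R \Phi)^{-1} \Phi^T (y - t)$, where $R$ is the diagonal matrix with entries $y_n (1 - y_n)$.

```python
import numpy as np


def sigmoid(a):
    """Logistic sigmoid, mapping w^T x to P(C_1 | x)."""
    return 1.0 / (1.0 + np.exp(-a))


class LogisticRegressionIRLS:
    """Hypothetical class name; estimates w by IRLS (Newton-Raphson)."""

    def __init__(self, max_iter=20, ridge=1e-8):
        self.max_iter = max_iter
        self.ridge = ridge  # assumption: tiny term that keeps the Hessian invertible
        self.w = None

    @staticmethod
    def _design(X):
        # Prepend a column of ones so that w[0] acts as the bias
        return np.hstack([np.ones((X.shape[0], 1)), X])

    def fit(self, X, t):
        Phi = self._design(X)
        self.w = np.zeros(Phi.shape[1])
        for _ in range(self.max_iter):
            y = sigmoid(Phi @ self.w)
            r = y * (1.0 - y)                      # diagonal of R
            grad = Phi.T @ (y - t)                 # gradient of the error
            hess = Phi.T @ (Phi * r[:, None])      # Phi^T R Phi
            hess += self.ridge * np.eye(Phi.shape[1])
            step = np.linalg.solve(hess, grad)
            self.w = self.w - step
            if np.linalg.norm(step) < 1e-8:        # stop once w no longer moves
                break
        return self

    def predict_proba(self, X):
        return sigmoid(self._design(X) @ self.w)


if __name__ == '__main__':
    # Synthetic, overlapping classes (not the occupancy data)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    t = (X[:, 0] + X[:, 1] + rng.normal(size=200) > 0).astype(float)
    model = LogisticRegressionIRLS().fit(X, t)
    print(model.w)
```

In a two-file layout like mine, the function and class would live in the def file and the block under the main guard would live in the main file, which imports the class.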

Execution!

w is being estimated…
[Image: command-line output while w is being estimated]

I save a figure of the values of the cross-entropy error function.
[Image: command-line output when the figure is saved]

This is a scatter plot of the cross-entropy error function.
[Image: scatter plot of the cross-entropy error function]
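For reference, the cross-entropy error function for labels $t_n$ and predicted probabilities $y_n$ is $E(w) = -\sum_{n} \{ t_n \ln y_n + (1 - t_n) \ln (1 - y_n) \}$. A minimal sketch of computing it is below. The function name `cross_entropy_error`, the clipping constant `eps` and the small arrays are assumptions for illustration, not values from my run.

```python
import numpy as np


def cross_entropy_error(t, y, eps=1e-12):
    """Cross-entropy error for binary labels t and predicted probabilities y."""
    y = np.clip(y, eps, 1.0 - eps)  # avoid log(0) when a prediction saturates
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))


# Made-up example: the same labels scored with poor and with better predictions
t = np.array([1.0, 0.0, 1.0, 0.0])
poor = np.array([0.5, 0.5, 0.5, 0.5])
better = np.array([0.9, 0.1, 0.8, 0.2])

print(cross_entropy_error(t, poor))
print(cross_entropy_error(t, better))
```

Predictions closer to the correct labels give a smaller error, which is why the value should fall as the estimation of w proceeds.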

I can see that the value of the cross-entropy error function decreases.
The estimation of the optimal parameters is finished.
Next, I test my Logistic Regression model.
[Image: test results on the command line]

I compare the predictions of my Logistic Regression model with the correct classes.
The percentage of correct answers is 98%.
I think this is a high score.
By the way, Logistic Regression finds the probability that each data point belongs to class $C_1$.
Please check the P column.
As far as I can tell, almost none of the P values are near 0.5.
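The comparison can be sketched as follows: threshold the probabilities at 0.5 to get predicted classes, compute the share of correct answers, and count how many probabilities lie near 0.5. The arrays and the margin of 0.1 are made up for illustration and are not my actual results.

```python
import numpy as np

# Made-up P(C_1 | x) values and correct classes (not the results from the post)
p = np.array([0.98, 0.03, 0.91, 0.07, 0.55, 0.99])
t = np.array([1, 0, 1, 0, 0, 1])

# Assign class 1 when the probability is at least 0.5
pred = (p >= 0.5).astype(int)

# Share of predictions that match the correct class
accuracy = np.mean(pred == t)

# Flag uncertain predictions: probabilities within 0.1 of 0.5 (assumed margin)
near_half = np.abs(p - 0.5) < 0.1

print(accuracy)
print(int(near_half.sum()))
```

When only a few probabilities fall near 0.5, the model is confident about most points, which matches what the P column shows.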
