Introduction
Today, I will write about maximum likelihood estimation, a basic method of statistical estimation. I want to explain it with an example. First, I will explain likelihood. Second, I will explain the likelihood function. Third, I will explain maximum likelihood estimation.
Overview
- Likelihood
- Maximum likelihood estimation
- The problem of maximum likelihood estimation
Likelihood
Suppose we obtain observed data under some precondition.
When we estimate the precondition from the observed data, the likelihood is a value that indicates how plausible that estimate is.
This definition may be hard to understand. I could not understand it at first either.
So I will give an example of likelihood.
I throw a coin. The coin lands heads up with probability P and tails up with probability 1-P.
For example, suppose I throw the coin 100 times and every throw is heads. Intuitively, we would then estimate that P is 1.0.
If we let P=0.5, the probability that the coin lands heads 100 times is $0.5^{100} \approx 7.88861 \times 10^{-31}$. This is the likelihood when P=0.5.
If we let P=0.99, the probability that the coin lands heads 100 times is $0.99^{100} \approx 0.3660$. This is the likelihood when P=0.99.
When the observed event is fixed (the coin lands heads 100 times), P(100 times heads|P) regarded as a function of the variable P is called the likelihood function.
In general, the likelihood is P(A|B = b) regarded as a function of b, where the observed event A is fixed and we assume B=b.
We regard the value b that maximizes the likelihood as the most reasonable one.
Take the earlier example again.
When P=0.5, the likelihood is about 7.88861e-31. When P=0.99, the likelihood is about 0.3660.
Thus, it is natural to regard P=0.99 as the more plausible value: P=0.99 is a better estimate than P=0.5.
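The two likelihood values in this comparison can be checked with a few lines of Python. This is only a minimal sketch; the function name `likelihood_all_heads` is my own choice for illustration and does not come from any library.

```python
def likelihood_all_heads(p, n_throws=100):
    """Likelihood of heads probability p, given that all n_throws throws were heads."""
    return p ** n_throws


for p in (0.5, 0.99):
    print(f"P={p}: likelihood = {likelihood_all_heads(p):.5e}")
```

The value for P=0.99 is about 30 orders of magnitude larger than the value for P=0.5, which is why P=0.99 is the more plausible of the two.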
Maximum likelihood estimation
Maximum likelihood estimation is a method for estimating a parameter of a probability distribution from observed data.
It chooses the parameter value that maximizes the likelihood of the observed data.
Let $f$ be a probability distribution function with parameter $\theta$, and let $X_1,X_2,...,X_n$ be an independent sample such that $X_1,X_2,...,X_n \sim f$.
Then, the probability that we get $X_1,X_2,...,X_n$ from $f$ is
$$\prod_{i=1}^{n} P(X_i)$$
because we have to consider the joint probability, and for independent observations it is the product of the individual probabilities.
Thus, I define
$$L(\theta) = f(x_1,x_2,...,x_n|\theta)$$
which is called the likelihood function.
Then,
$$\theta^{\star} \in \arg\max_{\theta} L(\theta)$$
$\theta^{\star}$ is called the maximum likelihood estimator,
and
$$\frac{\partial}{\partial \theta} \log L(\theta) = 0$$
is called the likelihood equation.
I explain the reason for using $\log$ in the following example of maximum likelihood estimation.
Example
Consider $x_1,x_2,...,x_n \in \{0,1\}$. For each $i \in \{1,2,...,n\}$, $x_i = 1$ means the coin lands heads on the $i$-th throw, and $x_i = 0$ means the coin lands tails on the $i$-th throw.
Then, the likelihood function is
$$L(\theta) = P(x_1,x_2,...,x_n|\theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}$$
because $\forall i \in \{1,2,...,n\},\ x_i \sim p(k;\theta) = \theta^k (1-\theta)^{1-k} \quad \textrm{for } k \in \{0,1\}$.
Here, $\theta$ is the probability that the coin lands heads.
I want to maximize $L(\theta)$ with respect to $\theta$, but it is difficult to differentiate because $L(\theta)$ is expressed as a product.
The $\log$ function solves this problem.
Because $\log$ is a monotonically increasing function, $L(\theta)$ and $\log L(\theta)$ have the same optimal solution.
Thus, I maximize $\log L(\theta)$ instead.
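The claim that $L(\theta)$ and $\log L(\theta)$ share the same maximizer can be checked numerically. The sketch below is only an illustration: the sample `xs` (7 heads, 3 tails) and the grid of candidate values are made up, and a simple grid search stands in for the calculus used in this post.

```python
import math

# Hypothetical sample: 1 = heads, 0 = tails (7 heads, 3 tails).
xs = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]


def likelihood(theta, xs):
    """Product of Bernoulli probabilities theta^x * (1 - theta)^(1 - x)."""
    result = 1.0
    for x in xs:
        result *= theta ** x * (1 - theta) ** (1 - x)
    return result


def log_likelihood(theta, xs):
    """Sum of the logarithms of the same Bernoulli probabilities."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)


# Candidate values strictly inside (0, 1), so that log() is always defined.
grid = [i / 1000 for i in range(1, 1000)]

best_by_likelihood = max(grid, key=lambda t: likelihood(t, xs))
best_by_log_likelihood = max(grid, key=lambda t: log_likelihood(t, xs))

print(best_by_likelihood, best_by_log_likelihood)
```

Both searches pick the same grid point. A practical side benefit of the logarithm, separate from the ease of differentiation, is that a sum of logs does not underflow to zero the way a product of many numbers smaller than 1 can in floating point.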
\begin{eqnarray*}
\log L(\theta) &=& \log \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} \\
&=& \sum_{i=1}^{n} \left( \log \theta^{x_i} + \log (1-\theta)^{1-x_i} \right) \\
&=& \sum_{i=1}^{n} \left( x_i \log \theta + (1-x_i)\log(1-\theta) \right)
\end{eqnarray*}
Setting the partial derivative of this with respect to $\theta$ to zero gives
\begin{eqnarray*}
\frac{\partial}{\partial \theta} \log L(\theta) &=& 0 \\
\frac{\partial}{\partial \theta} \sum_{i=1}^{n} \left( x_i \log \theta + (1-x_i) \log (1-\theta) \right) &=& 0 \\
\sum_{i=1}^{n} \left( \frac{x_i}{\theta} - \frac{1-x_i}{1-\theta} \right) &=& 0 \\
\frac{1}{\theta} \sum_{i=1}^{n} x_i - \frac{1}{1-\theta} \sum_{i=1}^{n} (1-x_i) &=& 0 \\
(1-\theta) \sum_{i=1}^{n} x_i - \theta \sum_{i=1}^{n} (1-x_i) &=& 0 \\
\sum_{i=1}^{n} x_i - \theta \sum_{i=1}^{n} x_i - \theta \sum_{i=1}^{n} 1 + \theta \sum_{i=1}^{n} x_i &=& 0 \\
\sum_{i=1}^{n} x_i - n \theta &=& 0 \\
\theta &=& \frac{\sum_{i=1}^{n} x_i}{n}
\end{eqnarray*}
This optimum is the mean of $x_1,x_2,...,x_n$.
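As a check that this stationary point is a maximum and not a minimum (a step added here as a sketch, not part of the derivation above), the second derivative of the log-likelihood is
$$\frac{\partial^2}{\partial \theta^2} \log L(\theta) = -\frac{1}{\theta^2} \sum_{i=1}^{n} x_i - \frac{1}{(1-\theta)^2} \sum_{i=1}^{n} (1-x_i)$$
Both terms are non-positive for $0 < \theta < 1$, and at least one of them is strictly negative when $n \geq 1$, so $\log L(\theta)$ is concave there and the solution of the likelihood equation is its maximum.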
If you observe 100 heads and 0 tails, then $\theta = 1$.
If you observe 50 heads and 50 tails, then $\theta = 0.5$.
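The closed-form result can be checked against these two cases directly. In this minimal sketch, the helper names `mle_bernoulli` and `score` are my own; `score` evaluates the derivative of the log-likelihood, which should be zero at the estimate whenever $0 < \theta < 1$.

```python
def mle_bernoulli(xs):
    """Maximum likelihood estimate of the heads probability: the sample mean."""
    return sum(xs) / len(xs)


def score(theta, xs):
    """Derivative of the log-likelihood; only defined for 0 < theta < 1."""
    return sum(x / theta - (1 - x) / (1 - theta) for x in xs)


all_heads = [1] * 100              # 100 heads, 0 tails
half_heads = [1] * 50 + [0] * 50   # 50 heads, 50 tails

print(mle_bernoulli(all_heads))    # 1.0
print(mle_bernoulli(half_heads))   # 0.5

# At the estimate for the 50/50 sample, the likelihood equation holds.
print(score(mle_bernoulli(half_heads), half_heads))  # 0.0
```

The score is not evaluated for the all-heads sample, because the estimate $\theta = 1$ lies on the boundary where the derivative divides by $1-\theta = 0$.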
Problem of Maximum likelihood estimation
For example, if you observe 100 heads and 0 tails, then $\theta = 1$,
but if you observe 3 heads and 0 tails, then $\theta = 1$ as well.
However, it is dangerous to conclude $\theta = 1$ from only 3 trials.
This is a problem: maximum likelihood estimation is unreliable when the number of trials is small.
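One way to see why 3 trials are too few is to simulate a coin whose true heads probability is known. The sketch below assumes a fair coin (true probability 0.5, my assumption for illustration) and counts how often the maximum likelihood estimate comes out as exactly 1; the function names and the number of simulated experiments are hypothetical choices.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable


def mle_from_throws(n_throws, true_p=0.5):
    """Simulate n_throws coin throws and return the maximum likelihood estimate."""
    heads = sum(1 for _ in range(n_throws) if random.random() < true_p)
    return heads / n_throws


def fraction_estimated_as_one(n_throws, n_experiments=10_000):
    """Fraction of simulated experiments whose estimate is exactly 1."""
    hits = sum(1 for _ in range(n_experiments) if mle_from_throws(n_throws) == 1.0)
    return hits / n_experiments


for n in (3, 100):
    print(f"{n} throws: estimate equals 1 in {fraction_estimated_as_one(n):.3f} of experiments")
```

With a fair coin, about one experiment in eight ($0.5^3 = 0.125$) is expected to give $\theta = 1$ after 3 throws, whereas 100 heads in a row essentially never happens.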
Reference
https://ja.wikipedia.org/wiki/%E5%B0%A4%E5%BA%A6%E9%96%A2%E6%95%B0