Introduction
Today I will write about maximum likelihood estimation, a basic method of statistical estimation. First, I will explain likelihood. Second, I will explain the likelihood function. Third, I will explain maximum likelihood estimation with an example.
Overview
- Likelihood
- Maximum likelihood estimation
- Problem of maximum likelihood estimation
Likelihood
Suppose we obtain observed data under some precondition (for example, a particular parameter value).
When we estimate the precondition from the observed data, the likelihood is a value that indicates how plausible that estimate is.
This definition may be hard to grasp. I could not understand it at first either.
So here is an example of likelihood.
I toss a coin. The coin lands heads with probability P and tails with probability 1-P.
For example, suppose I toss the coin 100 times and every toss comes up heads. Then it is natural to estimate that P is 1.0.
If we let P=0.5, the probability that the coin lands heads 100 times is 0.5^{100} = 7.88860e-31. This is the likelihood when P=0.5.
If we let P=0.99, the probability that the coin lands heads 100 times is 0.99^{100} = 0.3660.... This is the likelihood when P=0.99.
When the observed outcome is fixed (the coin landed heads 100 times), P(100 times heads|P), viewed as a function of the variable P, is called the likelihood function.
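The two likelihood values above can be reproduced with a few lines of Python. This is only a small sketch: the function name `likelihood_all_heads` is my own and not part of any library.

```python
def likelihood_all_heads(p: float, n: int = 100) -> float:
    """Probability of n heads in n independent tosses when P(heads) = p."""
    return p ** n


# Evaluate the likelihood function at the two values of P used in the text.
for p in (0.5, 0.99):
    print(f"P = {p}: likelihood = {likelihood_all_heads(p):.5e}")
```

The observation (100 heads) stays fixed and only P changes, which is exactly what makes this a function of P.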
In general, the likelihood is P(A|B = b), where the observation A is fixed and we assume that B=b holds.
We regard the value b that maximizes the likelihood as the most reasonable one.
Take the earlier example.
When P=0.5, the likelihood is 7.88860e-31. When P=0.99, the likelihood is 0.3660.
Thus, it is natural for us to prefer P=0.99.
In other words, P=0.99 is a better estimate than P=0.5.
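The same comparison can be extended to more than two candidates. The sketch below assumes a hypothetical list of candidate values for P (the list is my choice for illustration) and picks the one with the largest likelihood for 100 heads in 100 tosses.

```python
# Hypothetical candidate values of P; any list of values in [0, 1] would do.
candidates = [0.5, 0.9, 0.99, 1.0]
n_heads = 100

# Likelihood of 100 heads in 100 tosses for each candidate.
likelihoods = {p: p ** n_heads for p in candidates}

# The candidate with the largest likelihood is the most plausible one.
best = max(likelihoods, key=likelihoods.get)
print(best, likelihoods[best])
```

Among these candidates the largest likelihood belongs to P=1.0, which matches the estimate made at the start of the example.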
Maximum likelihood estimation
Maximum likelihood estimation is a method for estimating a parameter of a probability distribution from observed data.
It chooses the parameter value that maximizes the likelihood of all the observations taken together.
Let f be a probability distribution function and let X_1,X_2,...,X_n be a sample drawn independently such that X_1,X_2,...,X_n \sim f.
Then, the probability that we get X_1,X_2,...,X_n from f is
\prod_{i=1}^{n} P(X_i)
, because we have to consider the joint probability of independent observations.
Thus, I define
L(\theta) = f(x_1,x_2,...,x_n|\theta) , which is called the likelihood function.
Then,
\theta^{\star} \in \arg\max_{\theta} L(\theta)
\theta^{\star} is called the maximum likelihood estimator,
and
\frac{\partial}{\partial \theta} \log L(\theta) = 0
is called the likelihood equation.
I explain why I use \log in the following example of maximum likelihood estimation.
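As a rough illustration of these definitions, the likelihood can be written as a product over the sample and maximized over a grid of candidate parameters. This is a sketch under assumptions: the names `likelihood`, `bernoulli_pmf` and `grid_mle`, the sample `xs`, and the grid resolution are all chosen here for illustration, and a grid search is only a crude stand-in for solving the likelihood equation.

```python
from math import prod
from typing import Callable, Sequence


def likelihood(theta: float, xs: Sequence[int],
               pmf: Callable[[int, float], float]) -> float:
    """Joint probability of independent observations xs under parameter theta."""
    return prod(pmf(x, theta) for x in xs)


def bernoulli_pmf(k: int, theta: float) -> float:
    """Coin model: P(k) = theta^k * (1 - theta)^(1 - k) for k in {0, 1}."""
    return theta ** k * (1 - theta) ** (1 - k)


def grid_mle(xs: Sequence[int], pmf: Callable[[int, float], float],
             grid: Sequence[float]) -> float:
    """Return the grid point with the largest likelihood (approximate arg max)."""
    return max(grid, key=lambda t: likelihood(t, xs, pmf))


xs = [1, 1, 0, 1]                      # hypothetical sample: 3 heads, 1 tail
grid = [i / 100 for i in range(101)]   # candidate values 0.00, 0.01, ..., 1.00
theta_star = grid_mle(xs, bernoulli_pmf, grid)
print(theta_star)
```

For this sample the grid search returns 0.75, the fraction of heads.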
Example
Consider x_1,x_2,...,x_n \in \{0,1\}. For every i \in \{1,2,...,n\}, x_i = 1 means that the coin lands heads on the i-th toss, and x_i = 0 means that the coin lands tails on the i-th toss.
Then, the likelihood function is
L(\theta) = P(x_1,x_2,...,x_n|\theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i}
, because \forall i \in \{1,2,...,n\}, x_i \sim p(k;\theta) = \theta^k (1-\theta)^{1-k} ~~~~\textrm{for } k \in \{0,1\}
Here, \theta is the probability that the coin lands heads.
I want to maximize L(\theta) with respect to \theta, but L(\theta) is a product of n factors, which makes it awkward to differentiate.
The \log function solves this problem, because it turns the product into a sum.
Since \log is a monotonically increasing function, L(\theta) and \log L(\theta) attain their maximum at the same \theta.
Thus, I maximize \log L(\theta) instead.
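The claim that L(\theta) and \log L(\theta) share the same maximizer can be checked numerically. The following is a small sketch on a hypothetical sample, searching a grid over the open interval (0, 1) so that the logarithm is always defined.

```python
import math

xs = [1, 0, 1, 1, 0, 1, 1, 1]  # hypothetical sample: 6 heads, 2 tails


def likelihood(theta: float) -> float:
    """Product form of the coin likelihood."""
    return math.prod(theta ** x * (1 - theta) ** (1 - x) for x in xs)


def log_likelihood(theta: float) -> float:
    """Sum form obtained by taking the log of the product."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)


grid = [i / 100 for i in range(1, 100)]   # 0.01, ..., 0.99
best_by_l = max(grid, key=likelihood)
best_by_log = max(grid, key=log_likelihood)
print(best_by_l, best_by_log)
```

Both searches land on the same grid point, 0.75, which is the fraction of heads in the sample.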
\begin{eqnarray*}
\log L(\theta) &=& \log \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} \\
&=& \sum_{i=1}^{n} \left( \log \theta^{x_i} + \log (1-\theta)^{1-x_i} \right) \\
&=& \sum_{i=1}^{n} \left( x_i \log \theta + (1-x_i)\log(1-\theta) \right)
\end{eqnarray*}
Setting the partial derivative with respect to \theta to zero and solving for \theta (the fifth line multiplies both sides by \theta(1-\theta)) gives
\begin{eqnarray*}
\frac{\partial}{\partial \theta} \log L(\theta) &=& 0 \\
\frac{\partial}{\partial \theta} \sum_{i=1}^{n} \left( x_i \log \theta + (1-x_i) \log (1-\theta) \right) &=& 0 \\
\sum_{i=1}^{n} \left( \frac{x_i}{\theta} - \frac{1-x_i}{1-\theta} \right) &=& 0 \\
\frac{1}{\theta} \sum_{i=1}^{n} x_i - \frac{1}{1-\theta} \sum_{i=1}^{n} (1-x_i) &=& 0 \\
(1-\theta) \sum_{i=1}^{n} x_i - \theta \sum_{i=1}^{n} (1-x_i) &=& 0 \\
\sum_{i=1}^{n} x_i - \theta \sum_{i=1}^{n} x_i - \theta \sum_{i=1}^{n} 1 + \theta \sum_{i=1}^{n} x_i &=& 0 \\
\sum_{i=1}^{n} x_i - n \theta &=& 0 \\
\theta &=& \frac{\sum_{i=1}^{n} x_i}{n}
\end{eqnarray*}
This optimum is the mean of x_1,x_2,...,x_n, that is, the fraction of tosses that landed heads.
If you observe 100 heads and 0 tails,
then \theta = 1.
If you observe 50 heads and 50 tails,
then \theta = 0.5.
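The closed-form result can be checked in code by computing the sample mean and confirming that the derivative of the log-likelihood vanishes there. This is a sketch with names of my own choosing (`mle_bernoulli`, `score`); the derivative formula is the third line of the derivation above.

```python
from typing import Sequence


def mle_bernoulli(xs: Sequence[int]) -> float:
    """Maximum likelihood estimate for the coin: the sample mean."""
    return sum(xs) / len(xs)


def score(theta: float, xs: Sequence[int]) -> float:
    """Derivative of log L: sum(x_i)/theta - sum(1 - x_i)/(1 - theta).

    Only defined for 0 < theta < 1.
    """
    heads = sum(xs)
    tails = len(xs) - heads
    return heads / theta - tails / (1 - theta)


xs = [1] * 50 + [0] * 50            # 50 heads, 50 tails
theta_hat = mle_bernoulli(xs)
print(theta_hat, score(theta_hat, xs))

print(mle_bernoulli([1] * 100))     # 100 heads, 0 tails
```

In the all-heads case the derivative is positive for every \theta in (0, 1), so the maximum sits on the boundary \theta = 1 rather than at a zero of the derivative; the formula \theta = \sum x_i / n still returns that value.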
Problem of Maximum likelihood estimation
For example,
if you observe 100 heads and 0 tails, then \theta = 1,
but if you observe only 3 heads and 0 tails, then \theta = 1 as well.
However, it is risky to conclude \theta = 1 from only 3 trials.
This is the problem: maximum likelihood estimation is unreliable when the number of trials is small.
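To see how easily a small sample misleads, consider a hypothetical fair coin (true \theta = 0.5, an assumption made only for this illustration). The estimate equals 1 exactly when every toss is heads, so the chance of getting that estimate is 0.5^n.

```python
def prob_estimate_is_one(theta_true: float, n: int) -> float:
    """Probability that all n tosses are heads, i.e. that the estimate equals 1."""
    return theta_true ** n


# Hypothetical fair coin: how often would we wrongly conclude theta = 1?
for n in (3, 10, 100):
    print(f"n = {n}: P(estimate = 1) = {prob_estimate_is_one(0.5, n):.3e}")
```

With 3 tosses a fair coin produces the estimate \theta = 1 one time in eight, while with 100 tosses this essentially never happens.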
Reference
https://ja.wikipedia.org/wiki/%E5%B0%A4%E5%BA%A6%E9%96%A2%E6%95%B0