Introduction
Today, I will explain MAP estimation (maximum a posteriori estimation).
MAP estimation is based on Bayes' theorem. When we have only a few samples, we cannot trust the value obtained by maximum likelihood estimation. MAP estimation lets us include our prior belief in the estimate.
Overview
- Bayes' theorem
- MAP estimation
- Conjugate distribution
Bayes' theorem
Bayes' theorem is
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$
$P(A|B)$ is the probability of A given that B has occurred.
Please see http://takutori.blogspot.com/2018/04/bayes-theorem.html for the details of Bayes' theorem.
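As a small numeric illustration of the formula above, here is a minimal sketch in Python. The function name `bayes_posterior` and the probabilities are hypothetical and chosen only for this example. $P(B)$ is expanded with the law of total probability.

```python
def bayes_posterior(prior_a, b_given_a, b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = b_given_a * prior_a + b_given_not_a * (1.0 - prior_a)
    # Bayes' theorem: P(A|B) = P(B|A)P(A) / P(B)
    return b_given_a * prior_a / evidence


# Hypothetical numbers: P(A) = 0.01, P(B|A) = 0.9, P(B|not A) = 0.05
posterior = bayes_posterior(0.01, 0.9, 0.05)
print(round(posterior, 4))
```

With these made-up numbers the posterior is about 0.154, which is much larger than the prior 0.01 but still far from 1.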
MAP estimation
MAP estimation uses Bayes' theorem. It estimates a parameter of the population by maximizing the posterior probability.
Now, suppose we get data $x_1,x_2,...,x_n$ from a population which has parameter $\theta$. Then, we want to know $P(\theta|x_1,x_2,...,x_n)$.
Here, we use Bayes' theorem.
$$P(\theta|x_1,x_2,...,x_n) = \frac{P(x_1,x_2,...,x_n | \theta ) P(\theta)}{P(x_1,x_2,...,x_n)}$$
Here, $P(\theta)$ is the prior distribution of $\theta$.
Because $x_1,x_2,...,x_n$ are independent of each other given $\theta$,
$$P(x_1,x_2,...,x_n | \theta ) = \prod_{i=1}^n P(x_i|\theta)$$
Therefore, the MAP estimate is
$$\theta^{\star} = \arg \max_{\theta} \frac{\prod_{i=1}^n P(x_i|\theta) P(\theta)}{P(x_1,x_2,...,x_n)}$$
$P(x_1,x_2,...,x_n)$ does not depend on $\theta$. Therefore, the MAP estimate can be expressed as follows.
$$\theta^{\star} = \arg \max_{\theta}\prod_{i=1}^n P(x_i|\theta) P(\theta)$$
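The last formula can be tried numerically. The following is only a rough sketch: it assumes a Bernoulli likelihood and a beta-shaped prior (both are my choice for illustration), and it searches a grid of $\theta$ values instead of solving the problem exactly. The function names and the data are hypothetical. It maximizes the log of the product, which has the same maximizer and is numerically safer.

```python
import math


def log_likelihood(theta, data):
    # sum_i log P(x_i | theta) for Bernoulli data x_i in {0, 1}
    return sum(math.log(theta) if x == 1 else math.log(1.0 - theta) for x in data)


def log_prior(theta, a, b):
    # log of theta^(a-1) (1-theta)^(b-1); constants that do not
    # depend on theta are dropped because they do not change the argmax
    return (a - 1.0) * math.log(theta) + (b - 1.0) * math.log(1.0 - theta)


def map_estimate_grid(data, a, b, num_points=9999):
    # Candidate values strictly inside (0, 1), so log() is always defined
    grid = [(k + 1) / (num_points + 1) for k in range(num_points)]
    return max(grid, key=lambda t: log_likelihood(t, data) + log_prior(t, a, b))


data = [1, 1, 0, 1]  # hypothetical observations
theta_map = map_estimate_grid(data, a=2.0, b=2.0)
print(theta_map)
```

A grid search is slow and only as precise as the grid spacing, but it makes the meaning of $\arg \max_{\theta}$ concrete.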
Conjugate distribution
A conjugate distribution is a convenient choice of prior distribution. In general, the posterior distribution has a complicated form. However, it is possible to simplify it by using a conjugate distribution. When a conjugate distribution is chosen as the prior distribution, the posterior distribution has the same form as the prior distribution. I will actually calculate this in the next section. Well-known conjugate distributions are
Conjugate distribution | Likelihood | Posterior distribution |
---|---|---|
beta | Bernoulli | beta |
beta | Binomial | beta |
Gaussian | Gaussian (sigma is known) | Gaussian |
inverse gamma | Gaussian (sigma is unknown) | inverse gamma |
gamma | Poisson | gamma |
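To see what "the same form" means for the beta and Bernoulli pair, here is a minimal sketch. The helper names and the data are hypothetical. It compares the beta density with updated parameters against likelihood times prior, normalized by a simple numerical integral.

```python
import math


def beta_pdf(theta, a, b):
    # Beta density; math.gamma gives the normalizing constant
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * theta ** (a - 1.0) * (1.0 - theta) ** (b - 1.0)


def conjugate_update(a, b, data):
    # Beta(a, b) prior + Bernoulli data -> Beta(a + #ones, b + #zeros)
    ones = sum(data)
    return a + ones, b + len(data) - ones


def numeric_posterior(theta, a, b, data, num_points=20000):
    # Likelihood * prior, normalized with a midpoint-rule integral over (0, 1)
    ones = sum(data)
    zeros = len(data) - ones

    def unnormalized(t):
        return t ** ones * (1.0 - t) ** zeros * beta_pdf(t, a, b)

    step = 1.0 / num_points
    evidence = sum(unnormalized((k + 0.5) * step) for k in range(num_points)) * step
    return unnormalized(theta) / evidence


data = [1, 0, 1, 1, 0]  # hypothetical observations
a_post, b_post = conjugate_update(2.0, 3.0, data)
print(a_post, b_post)
print(beta_pdf(0.4, a_post, b_post), numeric_posterior(0.4, 2.0, 3.0, data))
```

The two printed densities should agree up to the integration error, so with a conjugate prior we only need to update two numbers instead of computing an integral.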
Example
$$ Beta(\theta|a,b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1} $$
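This is the beta distribution. When we estimate the parameter $\theta$ of a Bernoulli distribution by MAP estimation, the beta distribution is the conjugate prior distribution. Here, $\Gamma$ is the gamma function.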
$$ \Gamma(x) = \int_0^\infty u^{x-1}e^{-u}du $$
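The posterior distribution is as follows, where $D$ is the data $x_1,x_2,...,x_n$ and the constant $P(D)$ is dropped.
$$P(\theta|D) \propto P(D|\theta)P(\theta)$$
$$=\prod_{i=1}^{n}\theta^{x_i}(1-\theta)^{1-x_i}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}$$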
Because each $x_i$ is $1$ or $0$, for example, for the data $1, 1, 0$,
$$ p(x=1|\theta)p(x=1|\theta)p(x=0|\theta) =\theta\theta(1-\theta) $$
Thus,
$$ \prod_{i=1}^{n}\theta^{x_i}(1-\theta)^{1-x_i} = \theta^{\sum_{i=1}^{n}x_i}(1-\theta)^{\sum_{i=1}^{n}(1-x_i)} $$
$$P(\theta|D) \propto \theta^{\sum_{i=1}^{n}x_i}(1-\theta)^{\sum_{i=1}^{n}(1-x_i)}\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1} $$
$$= \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{(\sum_{i=1}^{n}x_i)+a-1}(1-\theta)^{(\sum_{i=1}^{n}(1-x_i))+b-1}$$
Thus,
$$P(\theta|D) \propto \theta^{(\sum_{i=1}^{n}x_i)+a-1}(1-\theta)^{(\sum_{i=1}^{n}(1-x_i))+b-1}$$
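This optimization problem is solved by taking the $\log$.
$$\log P(\theta|D) = \{(\sum_{i=1}^{n}x_i)+a-1\}\log \theta + \{(\sum_{i=1}^{n}(1-x_i))+b-1\}\log (1-\theta) + \mathrm{const}$$
Setting the derivative with respect to $\theta$ to zero gives
$$ \frac{(\sum_{i=1}^{n}x_i)+a-1}{\theta} - \frac{(\sum_{i=1}^{n}(1-x_i))+b-1}{1-\theta} = 0 $$
Because
$$ \sum_{i=1}^{n}x_i + \sum_{i=1}^{n}(1-x_i) = n $$
the optimal value is
$$ \theta_{MAP} = \frac{(\sum_{i=1}^{n}x_i)+a-1}{n+a+b-2} $$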
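The closed form for $\theta_{MAP}$ is easy to compare with maximum likelihood estimation. This is a minimal sketch with hypothetical data and an assumed prior of $a = b = 2$. The function names are mine, and the formula is used under the assumption that $a > 1$ and $b > 1$ so that the maximum lies inside $(0, 1)$.

```python
def mle_bernoulli(data):
    # Maximum likelihood estimate: the fraction of ones in the data
    return sum(data) / len(data)


def map_bernoulli_beta(data, a, b):
    # Closed-form MAP estimate with a Beta(a, b) prior.
    # Assumes a > 1 and b > 1 so the maximum is inside (0, 1).
    return (sum(data) + a - 1.0) / (len(data) + a + b - 2.0)


few = [1, 1, 1]               # hypothetical: only three samples, all ones
many = [1] * 70 + [0] * 30    # hypothetical: one hundred samples

for name, data in (("few", few), ("many", many)):
    print(name, mle_bernoulli(data), map_bernoulli_beta(data, a=2.0, b=2.0))
```

With only three samples, maximum likelihood returns $1$, while the MAP estimate is pulled toward the prior. With one hundred samples the two estimates are close, because the data dominates the prior.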