Machine Learning: E-M algorithm
Maximum likelihood
- Probabilistic objective function
- i.i.d. assumption of data $x_i$
- Simple case: a single set of model parameters $\theta$
Coordinate ascent

In more complicated cases, although we can solve for one parameter block given the other, we can't solve for both simultaneously.

Similar to K-means.
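The alternating structure can be sketched on a toy two-variable objective (my own example, not from the lecture): each coordinate has a closed-form maximizer given the other, so we alternate between them, mirroring the K-means assignment/update alternation.

```python
# Coordinate ascent on a toy objective f(a, b) = -(a - 1)^2 - (a - b)^2.
# Neither coordinate can be maximized jointly in one shot here (by design
# of the example); instead we hold one fixed and solve for the other.
def coordinate_ascent(steps=50):
    a, b = 5.0, -3.0          # arbitrary starting point
    for _ in range(steps):
        a = (1.0 + b) / 2.0   # argmax_a f(a, b): set df/da = 0
        b = a                 # argmax_b f(a, b): set df/db = 0
    return a, b

a, b = coordinate_ascent()
# the iterates approach the joint maximum at a = b = 1
```

Each step can only improve the objective, so the iterates converge, but (as with K-means and EM) only to a local optimum in general.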
Expectation-Maximization algorithm
Example: $x_i \in \mathbb{R}$ with missing data
:question: How is this related to "the third" situation? (lec 20 p7)
If the missing values $x^m$ were imputed, we could calculate the parameters easily.

For one observation $x_i$, the missing part $x^m$ is treated as unknown but unwanted (nuisance) parameters, i.e. it plays the role of $\theta_2$.
- EM is a method that can learn the parameters and impute the missing values at the same time.
Objective function

We want an iterative method such that:
- $\ln p(x \mid \theta_1^{(t)}) > \ln p(x \mid \theta_1^{(t-1)})$
- Converges at a local maximum
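The monotone-improvement property rests on the evidence decomposition (stated on the slides but not in these notes; reconstructed here, with $\theta_2$ as the nuisance block and $q$ any distribution over it):

```latex
\ln p(x \mid \theta_1)
  = \underbrace{\int q(\theta_2)\,
      \ln\frac{p(x,\theta_2 \mid \theta_1)}{q(\theta_2)}\,d\theta_2}_{L(\theta_1)}
  \;+\;
  \underbrace{\mathrm{KL}\!\left(q(\theta_2)\,\middle\|\,
      p(\theta_2 \mid x,\theta_1)\right)}_{\ge\,0}
```

Since the KL term is nonnegative, $L(\theta_1) \le \ln p(x \mid \theta_1)$, with equality exactly when $q(\theta_2) = p(\theta_2 \mid x, \theta_1)$. The E step tightens this lower bound and the M step raises it, so the log-likelihood can never decrease.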
Algorithm
- E step: with $\theta_1^{(t)}$ fixed, set $q(\theta_2) = p(\theta_2 \mid x, \theta_1^{(t)})$, so the KL term in the evidence decomposition is zero and the lower bound is tight: $L(\theta_1^{(t)}) = \ln p(x \mid \theta_1^{(t)})$.
- M step: $\theta_1^{(t+1)} = \arg\max_{\theta_1} L(\theta_1)$.
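A classic concrete instance of these two steps is a Gaussian mixture, where the per-point cluster responsibilities play the role of $q(\theta_2)$ and the weights/means/variances are $\theta_1$. A minimal 1-D sketch (my own example, not the lecture's):

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data from two Gaussians
x = np.concatenate([rng.normal(-2.0, 1.0, 200),
                    rng.normal(3.0, 1.0, 200)])

# initial theta_1 = (weights pi, means mu, variances var)
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def log_lik(x, pi, mu, var):
    # ln p(x | theta_1), summed over the i.i.d. observations
    comp = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return np.sum(np.log(comp.sum(axis=1)))

lls = []
for _ in range(100):
    # E step: responsibilities r_ik = p(z_i = k | x_i, theta_1^(t))
    comp = pi * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = comp / comp.sum(axis=1, keepdims=True)
    # M step: closed-form argmax of the expected complete-data log-likelihood
    nk = r.sum(axis=0)
    pi = nk / len(x)
    mu = (r * x[:, None]).sum(axis=0) / nk
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    lls.append(log_lik(x, pi, mu, var))
```

Tracking `lls` confirms the monotonicity guarantee from the objective-function section: the log-likelihood never decreases across iterations.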
Back to the missing-values example
- :question: Did we just use the empirical mean to impute the missing data $x_i^m$?
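One way to check the question above is to run EM on a univariate Gaussian with missing entries (my own sketch with made-up numbers, not the lecture's worked example). The E step fills in the expected sufficient statistics of the missing $x^m$ under the current $(\mu, \sigma^2)$; the M step is the usual closed-form Gaussian MLE:

```python
import numpy as np

x_obs = np.array([1.0, 2.0, 4.0, 7.0])   # observed entries (made up)
n_miss = 3                                # number of missing entries

mu, var = 0.0, 1.0                        # arbitrary initialization
for _ in range(200):
    # E step: for a missing entry, E[x^m] = mu and E[(x^m)^2] = mu^2 + var
    s1 = x_obs.sum() + n_miss * mu
    s2 = (x_obs ** 2).sum() + n_miss * (mu ** 2 + var)
    # M step: Gaussian MLE from the expected sufficient statistics
    n = len(x_obs) + n_miss
    mu = s1 / n
    var = s2 / n - mu ** 2
```

So, at least in this setting, the answer appears to be yes: the fixed point of the iteration is $\mu = $ the empirical mean of the observed values (and likewise $\sigma^2$ their empirical variance), so each missing $x_i^m$ is effectively imputed with the observed mean.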