Saturday, March 9, 2013

the center of gravity of the affinity parameters lies at the origin

A lemma I saw on someone's blog; quite interesting. I had not realized that L2 regularization carries this kind of physical meaning.

Modeling

$I$ users, $F$ features and $C$ categories.
Training data: $x_{if}$ is the feature value of user $i$ for feature $f$. $x_{i}$ is the feature vector of user $i$.
Labeling: $y_{ic}$ is the category value of user $i$ for category $c$; for binary categorization it is in $\{0, 1\}$. In the weighted/generalized model it can be any real number.
Take $\beta_{fc}$ as the affinity parameter (to train) between feature $f$ and category $c$. $\beta_{c}$ is the parameter vector of category $c$.

Scoring

$p_{ic}$ is the probability that user $i$ belongs to category $c$.
$p_{ic} \propto e^{x_{i}\beta_{c}}$
Since $\sum_c{p_{ic}}=1$, we have $p_{ic}=\frac{e^{x_{i}\beta_{c}}}{\sum_{c'}{e^{x_{i}\beta_{c'}}}}$
We also take $p_{ic}$ as the score of user $i$ for category $c$.
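
Here is a minimal numpy sketch of the scoring step; the toy sizes, the random data, and the helper name `scores` are my own illustration, not from the original post.

```python
import numpy as np

def scores(x, beta):
    """p[i, c] = exp(x[i] . beta[:, c]) / sum over c' of exp(x[i] . beta[:, c'])."""
    logits = x @ beta                              # shape (I, C)
    logits -= logits.max(axis=1, keepdims=True)    # shift for numerical stability; p is unchanged
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

# toy example: I users, F features, C categories
rng = np.random.default_rng(0)
I, F, C = 5, 3, 4
x = rng.random((I, F))            # x[i, f] = feature value of user i for feature f
beta = rng.normal(size=(F, C))    # beta[f, c] = affinity parameter between feature f and category c
p = scores(x, beta)
print(p.sum(axis=1))              # each row sums to 1
```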

Maximize Likelihood

The likelihood of $\beta$ is
$L(\beta) = \prod_{i,c}{p_{ic}^{y_{ic}}}$
$- \ln L(\beta) = - \sum_{i,c}{y_{ic}\ln(p_{ic})}$
With L2 regularization, we will minimize:
$$\Phi(\beta)= \frac{1}{2}\|\beta\|^2 - \sum_{i,c}{y_{ic}\ln(p_{ic})}$$
Calculate the derivative:
Fix any feature $f_0$ and category $c_0$:
$$ \frac{\partial\Phi(\beta)}{\partial \beta_{f_0 c_0}} =\beta_{f_0 c_0} - \sum_{i,c}\frac{y_{ic}}{p_{ic}}\frac{\partial p_{ic}}{\partial \beta_{f_0 c_0}}, $$
where
$$ \begin{align*} \frac{\partial p_{ic}}{\partial \beta_{f_0 c_0}} &=\frac{1_{c=c_0}\, e^{x_{i}\beta_{c}} x_{if_0} \sum_{c'}{e^{x_{i}\beta_{c'}}} - e^{x_{i}\beta_{c}} e^{x_{i}\beta_{c_0}} x_{if_0}}{(\sum_{c'}{e^{x_{i}\beta_{c'}}})^2} \\ &=1_{c=c_0}\, p_{ic} x_{if_0} - p_{ic} \frac{e^{x_{i}\beta_{c_0}}}{\sum_{c'}{e^{x_{i}\beta_{c'}}}} x_{if_0} \\ &=p_{ic} x_{if_0} (1_{c=c_0} - p_{ic_0}). \end{align*} $$
So
$$ \begin{align*} \frac{\partial\Phi(\beta)}{\partial \beta_{f_0 c_0}} &=\beta_{f_0 c_0} - \sum_{i,c}{y_{ic} x_{if_0} (1_{c=c_0} - p_{ic_0})} \\ &=\beta_{f_0 c_0} - \sum_{i}{x_{if_0} (y_{ic_0} - p_{ic_0} \sum_c{y_{ic}})}. \end{align*} $$
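
As a sanity check of this closed-form derivative, here is a sketch that compares it against a central finite difference on random toy data (the sizes, names, and data are made up for illustration):

```python
import numpy as np

def scores(x, beta):
    logits = x @ beta
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def phi(x, y, beta):
    """Phi(beta) = 0.5 * ||beta||^2 - sum_{i,c} y_ic * ln(p_ic)."""
    return 0.5 * np.sum(beta ** 2) - np.sum(y * np.log(scores(x, beta)))

def grad_phi(x, y, beta):
    """Closed form: beta_{f0 c0} - sum_i x_{i f0} * (y_{i c0} - p_{i c0} * sum_c y_ic)."""
    p = scores(x, beta)
    return beta - x.T @ (y - p * y.sum(axis=1, keepdims=True))

rng = np.random.default_rng(1)
I, F, C = 6, 3, 4
x = rng.random((I, F))
y = rng.random((I, C))            # generalized labels; rows need not sum to 1
beta = rng.normal(size=(F, C))

f0, c0, eps = 1, 2, 1e-6          # check one coordinate by central difference
e = np.zeros_like(beta)
e[f0, c0] = eps
numeric = (phi(x, y, beta + e) - phi(x, y, beta - e)) / (2 * eps)
print(numeric, grad_phi(x, y, beta)[f0, c0])   # the two numbers should match closely
```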

Lemma 1

Take $\sum_c{y_{ic}}=1$ and ignore the regularizer term $\beta_{f_0 c_0}$:
$$\frac{\partial\Phi(\beta)}{\partial \beta_{f_0 c_0}}=- \sum_{i}{x_{if_0} (y_{ic_0} - p_{ic_0})}.$$
If we under-score category $c_0$ for all users, that is,
$$p_{ic_0} < y_{ic_0} \quad \forall i,$$
then, assuming the feature values $x_{if_0}$ are nonnegative (and not all zero),
$$\frac{\partial\Phi(\beta)}{\partial \beta_{f_0 c_0}} < 0,$$
so we should enlarge $\beta_{f_0 c_0}$.
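
A tiny numeric illustration of Lemma 1, on toy data of my own: every user is labeled with the same category $c_0$, so $y_{ic_0}=1>p_{ic_0}$ for all $i$, and the features are nonnegative.

```python
import numpy as np

def scores(x, beta):
    logits = x @ beta
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(2)
I, F, C, c0 = 8, 3, 4, 1
x = rng.random((I, F))                 # nonnegative feature values
y = np.zeros((I, C))
y[:, c0] = 1.0                         # one-hot labels: y[i, c0] = 1 > p[i, c0] for every user
beta = rng.normal(size=(F, C))
p = scores(x, beta)

# unregularized partial derivatives w.r.t. beta[f0, c0], one entry per feature f0
grad_c0 = -x.T @ (y[:, c0] - p[:, c0])
print(grad_c0)                         # all negative, so the update enlarges beta[:, c0]
```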

Lemma 2

If $\beta$ is a local minimum, then
$$\frac{\partial\Phi(\beta)}{\partial \beta_{f_0 c_0}} = 0 , \quad \forall f_0, c_0,$$
that is,
$$\beta_{f_0 c_0} = \sum_{i}{x_{if_0} (y_{ic_0} - p_{ic_0} \sum_c{y_{ic}})} , \quad \forall f_0, c_0.$$
Summing over all $c_0$ and using $\sum_{c_0}{p_{ic_0}}=1$, we have
$$\sum_{c_0}{\beta_{f_0 c_0}} = \sum_{i}{x_{if_0}\Big(\sum_{c_0}{y_{ic_0}} - \sum_{c_0}{p_{ic_0}} \sum_c{y_{ic}}\Big)} = 0, \quad \forall f_0.$$
It means: as long as we have L2 regularization, no matter whether $\sum_c{y_{ic}}=1$ or not, the center of gravity of the affinity parameters lies at the origin: for every feature $f_0$, the parameters $\beta_{f_0 c}$ sum to zero across categories.
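
And a quick numerical check of Lemma 2: minimize $\Phi$ by plain gradient descent on random toy data whose labels do not sum to 1, and verify that $\sum_{c_0}\beta_{f_0 c_0}\approx 0$ for every feature. The data, step size, and iteration count are arbitrary choices for this sketch.

```python
import numpy as np

def scores(x, beta):
    logits = x @ beta
    logits -= logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def grad_phi(x, y, beta):
    p = scores(x, beta)
    return beta - x.T @ (y - p * y.sum(axis=1, keepdims=True))

rng = np.random.default_rng(3)
I, F, C = 10, 3, 3
x = rng.random((I, F))
y = rng.random((I, C))            # weighted labels; rows do NOT sum to 1
beta = np.zeros((F, C))

# Phi is strictly convex thanks to the L2 term, so plain gradient descent converges
for _ in range(20000):
    beta -= 0.01 * grad_phi(x, y, beta)

print(beta.sum(axis=1))           # one value per feature f0; all ~0, as Lemma 2 predicts
```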