Machine Learning

Stanford Univ, Coursera

Multivariate Linear Regression

Linear Regression with Multiple Variables

(Example) We want to predict the price of a house $y$ from four features: the size of the house $x_1$, the number of bedrooms $x_2$, the number of floors $x_3$, and the age of the house $x_4$. Hypothesis: $h_{\theta}(x) = {\theta}_0 + {\theta}_1 x_1 + {\theta}_2 x_2 + {\theta}_3 x_3 + {\theta}_4 x_4 $

$n = $ the number of features
$m = $ the number of training examples
$x^{(i)}_j =$ value of feature $j$ in the $i$th training example
$x^{(i)} =$ the features of the $i$th training example
$\displaystyle \boldsymbol{x} = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix} \quad\quad \mbox{where} \quad x_0 = 1 $
$\displaystyle \boldsymbol{{\theta}} = \begin{pmatrix} {\theta}_0 \\ {\theta}_1 \\ \vdots \\ {\theta}_n \end{pmatrix} $
Hypothesis:
$ \begin{eqnarray} h_{\theta}(x) &=& {\theta}_0 + {\theta}_1 x_1 + {\theta}_2 x_2 + {\theta}_3 x_3 + \cdots + {\theta}_n x_n \\ &=& \begin{pmatrix} {\theta}_0 & {\theta}_1 & \cdots & {\theta}_n \end{pmatrix} \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix} \\ &=& \boldsymbol{{\theta}}^T \boldsymbol{x} \end{eqnarray} $
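
As a small illustration of the vectorized form $\boldsymbol{\theta}^T \boldsymbol{x}$, the following Python sketch evaluates the hypothesis for one house. It assumes the usual convention $x_0 = 1$, so that ${\theta}_0$ acts as the intercept. The function name `hypothesis` and all numeric values are hypothetical, chosen only for this example.

```python
def hypothesis(theta, features):
    """Return theta^T x, where x is `features` with x_0 = 1 prepended."""
    x = [1.0] + list(features)  # x_0 = 1 pairs with the intercept theta_0
    if len(theta) != len(x):
        raise ValueError("theta must have n + 1 entries")
    return sum(t * xj for t, xj in zip(theta, x))


# Hypothetical parameters and one house: size, bedrooms, floors, age.
theta = [80.0, 0.1, 10.0, 5.0, -2.0]
house = [2104.0, 5.0, 1.0, 45.0]
price = hypothesis(theta, house)
```

Prepending $x_0$ inside the function keeps the caller's feature list at length $n$ while ${\theta}$ has $n + 1$ entries.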

Gradient Descent

Repeat
$\displaystyle \quad {\theta}_j := {\theta}_j - \alpha \frac{1}{m} \sum_{i=1}^{m} ( h_{\theta} (x^{(i)}) - y^{(i)} ) x^{(i)}_j $
where ${\theta}_j$ must be updated simultaneously for all $j=0,1,{\cdots},n$.
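
The update rule can be sketched in plain Python as follows. This is a minimal sketch, not a reference implementation: the function name `gradient_descent_step`, the toy data, the value of `alpha`, and the iteration count are all assumptions made for illustration. Each row of `X` is assumed to already start with $x_0 = 1$.

```python
def gradient_descent_step(theta, X, y, alpha):
    """One batch update; each row of X already starts with x_0 = 1."""
    m = len(X)
    # Prediction error h_theta(x^(i)) - y^(i) for every training example.
    errors = [sum(t * xj for t, xj in zip(theta, row)) - yi
              for row, yi in zip(X, y)]
    # Build the new vector from the old theta only, so every theta_j
    # is updated simultaneously.
    return [
        tj - alpha / m * sum(e * row[j] for e, row in zip(errors, X))
        for j, tj in enumerate(theta)
    ]


# Toy data generated from y = 1 + 2 * x_1 (hypothetical values).
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta = [0.0, 0.0]
for _ in range(5000):
    theta = gradient_descent_step(theta, X, y, alpha=0.1)
```

Computing all errors before touching any ${\theta}_j$ is what makes the update simultaneous. Overwriting ${\theta}_0$ first and then using it to compute the update for ${\theta}_1$ would be a different algorithm.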

Feature Scaling

When the ranges of the features $x_j$ differ greatly, gradient descent takes a long time to converge, so the features should be brought onto a similar scale.

$\displaystyle \begin{eqnarray} x^{(i)}_j &:=& \frac{x^{(i)}_j - {\mu}_j}{s_j} \quad\quad\quad \mbox{subtract the mean, then scale to roughly [-0.5, 0.5]}\\ {\mu}_j &=& \frac{1}{m} \sum_{i=1}^{m} x^{(i)}_j \quad\quad\quad \mbox{mean of feature } j\\ s_j &=& {\max}_{i=1,\cdots,m}(x^{(i)}_j) - {\min}_{i=1,\cdots,m}(x^{(i)}_j) \quad\quad\quad \mbox{range of feature } j \end{eqnarray} $
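
The scaling formula can be tried on a single feature column with the short Python sketch below. The function name `mean_normalize` and the sample ages are hypothetical, and the sketch assumes the range (max minus min) is used as the scale $s_j$.

```python
def mean_normalize(column):
    """Scale one feature column to (x - mean) / (max - min)."""
    mu = sum(column) / len(column)
    s = max(column) - min(column)
    if s == 0:
        raise ValueError("feature is constant; range is zero")
    return [(v - mu) / s for v in column]


# Hypothetical house ages in years.
ages = [30.0, 34.0, 38.0, 50.0]
scaled = mean_normalize(ages)
```

The scaled values have mean 0 and span exactly 1 from smallest to largest. They fall inside $[-0.5, 0.5]$ only when the mean sits at the midpoint of the range, which is why that interval is approximate.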

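Note:
The summary for "Week 2: Gradient Descent in Practice I - Feature Scaling" writes $\displaystyle x_i = \frac{x_i - {\mu}_i}{s_i}$. There the subscript $i$ indexes the feature, whereas elsewhere $i$ indexes the training example and $j$ the feature, which I find confusing. The accompanying text does seem to explain it correctly.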

Choosing the Learning Rate

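Note
In the video "Gradient Descent in Practice II - Learning Rate", the case where $\alpha$ is too large is illustrated with a hand-drawn red line showing $J(\theta)$ growing while swinging alternately left and right. I think this is a mistake: a graph in which $J(\theta)$ grows while swinging left and right must have $\theta$ on the horizontal axis.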

If $\alpha$ is too small, convergence takes a long time. If $\alpha$ is too large, $J(\theta)$ may fail to decrease on every iteration and may not converge.
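
The effect of $\alpha$ can be seen on a deliberately simple one-parameter cost. The sketch below assumes $J(\theta) = \theta^2$, whose gradient is $2\theta$. This toy cost, the function name `cost_history`, and the three values of `alpha` are illustrative assumptions, not taken from the course.

```python
def cost_history(alpha, theta=1.0, steps=20):
    """Track J(theta) = theta**2 under gradient descent (gradient 2*theta)."""
    history = [theta ** 2]
    for _ in range(steps):
        theta = theta - alpha * 2.0 * theta
        history.append(theta ** 2)
    return history


slow = cost_history(alpha=0.01)      # decreases, but slowly
fast = cost_history(alpha=0.1)       # decreases quickly
diverging = cost_history(alpha=1.1)  # theta overshoots and J grows
```

With `alpha=1.1` each step multiplies $\theta$ by $-1.2$, so $\theta$ flips sign every iteration while $J(\theta)$ grows. Plotted against the iteration count, the same run is a monotonically increasing curve.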

Features and Polynomial Regression

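Several features are sometimes combined into a single feature. The hypothesis function does not have to be linear in the original features; polynomial terms can also be used. In that case, divide each such term ($x^2$, $\sqrt{x}$, and so on) by the range of values it can take, so that the features stay on a similar scale.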
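
A minimal Python sketch of this idea builds the two features $x$ and $\sqrt{x}$ from a house size and divides each by the largest value it can take. The function name `size_features` and the upper bound of 1000 are assumptions made for the example.

```python
import math


def size_features(size, max_size=1000.0):
    """Return scaled features [size, sqrt(size)] for one house."""
    # Size divided by its largest possible value.
    x1 = size / max_size
    # The sqrt term divided by its own largest possible value.
    x2 = math.sqrt(size) / math.sqrt(max_size)
    return [x1, x2]


features = size_features(250.0)
```

Each derived term gets its own divisor: the square-root term is divided by $\sqrt{1000} \approx 32$, not by 1000, because that is the range the term itself covers.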

2-1 test

[Question] What is $x_{1}^{(4)}$?

[Answer]

     The size of the 1st training example
     The age of the 1st training example
 〆  The size of the 4th training example
     The age of the 4th training example

2-2 test

[Question] With $n$ features, the cost function is defined as $\displaystyle J( \theta ) = \frac{1}{2m} \sum_{i=1}^{m} (h_{\theta} (x^{(i)}) - y^{(i)}) ^2$. For linear regression, which of the following expressions are equivalent to it?

[Answer]
〆 $\displaystyle J( \theta ) = \frac{1}{2m} \sum_{i=1}^{m} (\theta^T x^{(i)} - y^{(i)}) ^2$
〆 $\displaystyle J( \theta ) = \frac{1}{2m} \sum_{i=1}^{m} ((\sum_{j=0}^{n} \theta_j x_j^{(i)}) - y^{(i)}) ^2$
   $\displaystyle J( \theta ) = \frac{1}{2m} \sum_{i=1}^{m} ((\sum_{j=1}^{n} \theta_j x_j^{(i)}) - y^{(i)}) ^2$
   $\displaystyle J( \theta ) = \frac{1}{2m} \sum_{i=1}^{m} ((\sum_{j=0}^{n} \theta_j x_j^{(i)}) - (\sum_{j=0}^{n} y^{(i)})) ^2$

2-3 test

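[Question] We want to predict house prices and choose $x_i$ to capture the age of the house. The age ranges from 30 to 50 years, with a mean of 38 years. Which expression should be used when applying feature scaling and mean normalization?

[Answer]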
   $\displaystyle x_i = \mbox{age of house}$
   $\displaystyle x_i = \frac{\mbox{age of house}}{50}$
   $\displaystyle x_i = \frac{\mbox{age of house}-38}{50}$
〆 $\displaystyle x_i = \frac{\mbox{age of house}-38}{20}$
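
The marked choice follows from the mean-normalization formula, assuming the range (max minus min) is used as the scale:
$\displaystyle \mu = 38, \quad s = 50 - 30 = 20, \quad x_i = \frac{\mbox{age of house} - \mu}{s} = \frac{\mbox{age of house} - 38}{20}$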

2-4 test

[Question] Gradient descent was run three times, with learning rates $\alpha = 0.01$, $\alpha = 0.1$, and $\alpha = 1$. Which $\alpha$ corresponds to each of the three graphs?

Graph A: decreases rapidly and converges.
Graph B: decreases slowly and converges.
Graph C: diverges.

[Answer] A: $\alpha=0.1$, B: $\alpha = 0.01$, C: $\alpha=1$

2-5 test

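[Question] We want to model the price of a house as a function of its size. The model is $h_{\theta} (x) = \theta_0 + \theta_1 (size) + \theta_2 \sqrt{(size)}$.

Assume size lies in the range [1, 1000], and let the model to be fitted be $h_{\theta} (x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$. Which choice of $x_1$ and $x_2$ applies feature scaling? (Note that $\sqrt{1000} \approx 32$.)

[Answer]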
   $\displaystyle x_1 = \mbox{size}, x_2 = 32 \sqrt{(\mbox{size})}$
   $\displaystyle x_1 = 32(\mbox{size}), x_2 = \sqrt{(\mbox{size})}$
〆 $\displaystyle x_1 = \frac{\mbox{size}}{1000}, x_2 = \frac{\sqrt{(\mbox{size})}}{32}$
   $\displaystyle x_1 = \frac{\mbox{size}}{32}, x_2 = \sqrt{(\mbox{size})}$

Yoshihisa Nitta

http://nw.tsuda.ac.jp/