<h2>Chapter 3: Implementing Neural Networks with TensorFlow</h2>

<h3>3.3 Installing TensorFlow</h3>
<p>
    The book describes installation on Ubuntu Linux and macOS, but since I use Windows,
    I used the deep-graph or gpu-graph conda environment set up as described in
    "<a href="/lec/python/anaconda/deep-graph/">Python + Visualization environment for Deep Learning</a>".
</p>
<p>
    The official installation instructions are <a href="https://www.tensorflow.org/install/">here</a>.
</p>

In [1]:
import tensorflow as tf
deep_learning = tf.constant("Deep Learning")
session = tf.Session()
r1 = session.run(deep_learning)
print(r1)

a = tf.constant(2)
b = tf.constant(3)
multiply = tf.multiply(a, b)
r2 = session.run(multiply)
print(r2)

b'Deep Learning'
6


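<h3>3.4 Creating and Manipulating TensorFlow Variables</h3>
<p>
    A TensorFlow Variable is something like an in-memory buffer that holds a tensor.
    An ordinary tensor is instantiated when the graph is run and discarded as soon as the run finishes.
    A Variable is also a tensor, but it persists in memory across multiple runs of the graph.
</p>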

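<ul>
    <li>A Variable must be explicitly initialized before the graph is used for the first time.</li>
    <li>By updating Variables repeatedly with gradient descent, the optimal parameters of the model can be found.</li>
    <li>The values held in Variables can be saved and loaded later when they are needed.</li>
</ul>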

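<p>
    Calling tf.Variable adds the following three operations to the computation graph:
</p>
<ul>
    <li>the operation that produces the tensor used to initialize the Variable</li>
    <li>the tf.assign operation that assigns the initial tensor to the Variable before it is used</li>
    <li>the operation that holds the current value of the Variable</li>
</ul>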

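<p>
    Before a Variable is used, its tf.assign operation must have been run.
</p>
<ul>
    <li>tf.global_variables_initializer(): runs the tf.assign operation of every Variable in the computation graph.</li>
    <li>tf.initialize_variables([var1, var2, ...]): runs the tf.assign operation only for the listed Variables. (This function is deprecated in favor of tf.variables_initializer.)</li>
</ul>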


https://www.tensorflow.org/api_docs/python/tf/Variable <br />
https://www.tensorflow.org/api_docs/python/tf/random_normal <br />
https://www.tensorflow.org/api_docs/python/tf/assign <br />
https://www.tensorflow.org/api_docs/python/tf/initialize_variables <br />

In [2]:
# shape=[300, 200]
# the standard deviation of the normal distribution = 0.5
# trainable (default)

weights = tf.Variable(
    tf.random_normal([300, 200], stddev=0.5),
    name="weights"
)

# not trainable
weights = tf.Variable(
    tf.random_normal([300, 200], stddev=0.5),
    name="weights",
    trainable=False
)

In [3]:
# How to initialize Variable

shape = [300, 200]

tf.zeros(shape, dtype=tf.float32, name=None)
tf.ones(shape, dtype=tf.float32, name=None)
tf.random_normal(
    shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None
    )
tf.truncated_normal(
    shape, mean=0.0, stddev=1.0, dtype=tf.float32, seed=None, name=None
    )
tf.random_uniform(
    shape, minval=0, maxval=None, dtype=tf.float32, seed=None, name=None
    )

<tf.Tensor 'random_uniform:0' shape=(300, 200) dtype=float32>

<h3>3.5 Operations in TensorFlow</h3>

| Category | Examples |
|----------|----------|
| Element-wise arithmetic operations | Add, Sub, Mul, Div, Exp, Log, Greater, Less, Equal, ... |
| Array operations | Concat, Slice, Split, Constant, Rank, Shape, Shuffle, ... |
| Matrix operations | MatMul, MatrixInverse, MatrixDeterminant, ... |
| Stateful operations | Variable, Assign, AssignAdd, ... |
| Neural network layers | SoftMax, Sigmoid, ReLU, Convolution2D, MaxPool, ... |
| Checkpointing | Save, Restore |
| Queues and synchronization | Enqueue, Dequeue, MutexAcquire, MutexRelease, ... |
| Control flow | Merge, Switch, Enter, Leave, NextIteration |


<h3>3.6 Placeholder Tensors</h3>
<p>
    A Variable is initialized only once. A placeholder, by contrast, is given a value every time the graph is run. The value is supplied through the optional feed_dict argument of Session.run(), Tensor.eval(), or Operation.run().
</p>

https://www.tensorflow.org/api_docs/python/tf/placeholder <br />

In [4]:
x = tf.placeholder(tf.float32, name="x", shape=[None, 784])
W = tf.Variable(tf.random_uniform([784, 10], -1, 1), name="W")
multiply = tf.matmul(x, W)

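<h3>3.7 Sessions in TensorFlow</h3>

<p>
    A session is responsible for building the initial computation graph. It is also the object through which Variables are initialized and the graph is run.
</p>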

https://www.tensorflow.org/api_docs/python/tf/Session <br />


In [5]:
# session.py

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("data", one_hot=True)
minibatch_x, minibatch_y = mnist.train.next_batch(32)

x = tf.placeholder(tf.float32, name="x", shape=[None, 784])
W = tf.Variable(tf.random_uniform([784, 10], -1, 1), name="W")
b = tf.Variable(tf.zeros([10]), name="biases")

output = tf.matmul(x, W) + b

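# Note: initialize_all_variables() is deprecated (see the warning in the output);
# tf.global_variables_initializer() is the recommended replacement.
init_op = tf.initialize_all_variables()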
sess = tf.Session()
sess.run(init_op)
feed_dict = {x : minibatch_x}
sess.run(output, feed_dict=feed_dict)

W0822 09:59:49.908180 11828 deprecation.py:323] From <ipython-input-5-be8b68b8381e>:6: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
W0822 09:59:49.908180 11828 deprecation.py:323] From D:\sys\Anaconda3\envs\gpu-graph\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
W0822 09:59:49.911148 11828 deprecation.py:323] From D:\sys\Anaconda3\envs\gpu-graph\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for up

Extracting data\train-images-idx3-ubyte.gz


W0822 09:59:50.541388 11828 deprecation.py:323] From D:\sys\Anaconda3\envs\gpu-graph\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
W0822 09:59:50.570414 11828 deprecation.py:323] From D:\sys\Anaconda3\envs\gpu-graph\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.one_hot on tensors.
W0822 09:59:50.735927 11828 deprecation.py:323] From D:\sys\Anaconda3\envs\gpu-graph\lib\site-packages\tensorflow\contrib\learn\python\learn\datasets\mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be rem

Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz


W0822 09:59:51.122924 11828 deprecation.py:323] From D:\sys\Anaconda3\envs\gpu-graph\lib\site-packages\tensorflow\python\util\tf_should_use.py:193: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.


array([[ 3.07297707e-01,  2.12898684e+00,  2.15093970e+00,
         2.72409248e+00,  2.02692962e+00, -2.97545695e+00,
         9.18175507e+00,  7.79365444e+00, -2.37528229e+00,
         7.19871092e+00],
       [ 1.66653228e+00, -7.28324056e-02,  1.09871101e+00,
         2.06557512e+00, -7.35610962e+00,  1.43941879e-01,
         1.21507454e+01,  4.68716621e+00,  7.46682501e+00,
         4.32856178e+00],
       [ 2.79686856e+00,  5.54188156e+00, -4.20636833e-01,
         8.48845720e-01, -3.96041155e+00,  8.47752810e-01,
         1.32612076e+01,  1.42815566e+00,  1.39339352e+00,
         7.58010864e+00],
       [ 4.96979475e+00,  1.15509367e+00,  1.31926739e+00,
         1.20149307e+01,  9.01111317e+00,  8.03406715e-01,
         7.18732595e+00,  5.85313654e+00, -4.85519648e+00,
         5.51384354e+00],
       [ 3.23182917e+00,  1.52423306e+01,  1.02857990e+01,
         6.07715702e+00,  8.90856934e+00,  3.75592232e-01,
         2.77842569e+00,  1.22090759e+01,  3.12774634e+00,
         1.

<h3>3.8 Variable Scopes and Sharing</h3>


In [6]:
# scope1.py

import tensorflow as tf

def my_network(input):
    W_1 = tf.Variable(tf.random_uniform([784, 100], -1, 1), name="W_1")
    b_1 = tf.Variable(tf.zeros([100]), name="biases_1")
    output_1 = tf.matmul(input, W_1) + b_1

    W_2 = tf.Variable(tf.random_uniform([100, 50], -1, 1), name="W_2")
    b_2 = tf.Variable(tf.zeros([50]), name="biases_2")
    output_2 = tf.matmul(output_1, W_2) + b_2

    W_3 = tf.Variable(tf.random_uniform([50, 10], -1, 1),name="W_3")
    b_3 = tf.Variable(tf.zeros([10]), name="biases_3")
    output_3 = tf.matmul(output_2, W_3) + b_3

    # printing names
    print("Printing names of weight parameters")
    print(W_1.name, W_2.name, W_3.name)
    print("Printing names of bias parameters")
    print(b_1.name, b_2.name, b_3.name)

    return output_3


i_1 = tf.placeholder(tf.float32, [1000, 784], name="i_1")
my_network(i_1)

i_2 = tf.placeholder(tf.float32, [1000, 784], name="i_2")
my_network(i_2)

Printing names of weight parameters
W_1_1:0 W_2:0 W_3:0
Printing names of bias parameters
biases_1:0 biases_2:0 biases_3:0
Printing names of weight parameters
W_1_2:0 W_2_1:0 W_3_1:0
Printing names of bias parameters
biases_1_1:0 biases_2_1:0 biases_3_1:0


<tf.Tensor 'add_6:0' shape=(1000, 10) dtype=float32>

<p>
scope1.py shows that every call to my_network() creates a different set of variables.
For example, the variable named "W_1" becomes W_1_1:0 in the first call and
W_1_2:0 in the second call, and these are two distinct variables.
</p>

<p>
    scope2.py below is written so that the same variables can be shared across calls to the function.
</p>

https://www.tensorflow.org/api_docs/python/tf/get_variable <br />
https://www.tensorflow.org/api_docs/python/tf/variable_scope <br />


In [7]:
# scope2.py

import tensorflow as tf

def layer(input, weight_shape, bias_shape):
    weight_init = tf.random_uniform_initializer(minval=-1, maxval=1)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape, initializer=weight_init)
    b = tf.get_variable("b", bias_shape, initializer=bias_init)
    return tf.matmul(input, W) + b


def my_network(input):
    with tf.variable_scope("layer_1"):
        output_1 = layer(input, [784, 100], [100])

    with tf.variable_scope("layer_2"):
        output_2 = layer(output_1, [100, 50], [50])

    with tf.variable_scope("layer_3"):
        output_3 = layer(output_2, [50, 10], [10])

    return output_3

In [8]:
i_1 = tf.placeholder(tf.float32, [1000, 784], name="i_1")
my_network(i_1)

i_2 = tf.placeholder(tf.float32, [1000, 784], name="i_2")
# my_network(i_2)
# ValueError: Over-sharing: Variable layer_1/W already exists...

<p>
    Declaring tf.variable_scope() gives Variable names a namespace prefix, as in "layer_1/W".
</p>

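<p>
    tf.get_variable() checks whether a Variable with the given name has already been instantiated.
    Sharing is disabled by default, so trying to instantiate the same Variable a second time raises an error.
    To enable sharing, call scope.reuse_variables().
</p>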
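<p>
    The lookup rule described above can be mimicked with a small dictionary-based sketch.
    This is only a toy analogy in plain Python: ToyVariableStore and its reuse flag are hypothetical names
    invented for illustration, not TensorFlow's implementation.
</p>

```python
class ToyVariableStore:
    """Toy stand-in for scoped variable lookup; not TensorFlow's implementation."""

    def __init__(self):
        self.variables = {}
        self.reuse = False  # sharing is disabled by default

    def get_variable(self, scope, name, make_value):
        full_name = scope + "/" + name  # namespaced name such as "layer_1/W"
        if full_name in self.variables:
            if not self.reuse:
                raise ValueError("Variable %s already exists" % full_name)
            return self.variables[full_name]
        if self.reuse:
            raise ValueError("Variable %s does not exist" % full_name)
        self.variables[full_name] = make_value()
        return self.variables[full_name]


store = ToyVariableStore()
w_first = store.get_variable("layer_1", "W", lambda: [0.0] * 4)

# A second creation attempt fails while sharing is disabled.
try:
    store.get_variable("layer_1", "W", lambda: [0.0] * 4)
    second_call_failed = False
except ValueError:
    second_call_failed = True

store.reuse = True  # plays the role of scope.reuse_variables()
w_shared = store.get_variable("layer_1", "W", lambda: [0.0] * 4)

print(second_call_failed)   # True
print(w_shared is w_first)  # True
```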
    

In [9]:
with tf.variable_scope("shared_variables") as scope:
    i_1 = tf.placeholder(tf.float32, [1000, 784], name="i_1")
    my_network(i_1)
    scope.reuse_variables()
    i_2 = tf.placeholder(tf.float32, [1000, 784], name="i_2")
    my_network(i_2)

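<h3>3.9 Managing Models on CPU and GPU</h3>

<p>
    A specific device can be selected with "with tf.device()".
    If the specified device is not available, an error is raised.
    To fall back to an available device and continue the computation, set the allow_soft_placement flag in the Session configuration.
</p>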

| Value | Description |
| ----- | ------- |
| /cpu:0 | The CPU |
| /gpu:0 | The first GPU |
| /gpu:1 | The second GPU |


In [15]:
import tensorflow as tf


with tf.device("/gpu:2"):
    a = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2], name="a")
    b = tf.constant([1.0, 2.0], shape=[2, 1], name="b")
    c = tf.matmul(a, b)

sess = tf.Session(
    config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=True
    )
)

ans = sess.run(c)

print(ans)

[[ 5.]
 [11.]]


<p>
The program computes
$$
\begin{pmatrix}
  1 & 2 \\
  3 & 4
\end{pmatrix}
\begin{pmatrix}
  1 \\ 2
\end{pmatrix}
=
\begin{pmatrix}
  1 \times 1 + 2 \times 2 \\
  3 \times 1 + 4 \times 2
\end{pmatrix}
=
\begin{pmatrix}
  5 \\
  11
\end{pmatrix}
$$
</p>

In [14]:
# device.py

c = []

for d in ["/gpu:0", "/gpu:1"]:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2], name="a")
        b = tf.constant([1.0, 2.0], shape=[2, 1], name="b")
        c.append(tf.matmul(a, b))

with tf.device("/cpu:0"):
    sum = tf.add_n(c)

sess = tf.Session(
    config=tf.ConfigProto(
        allow_soft_placement=True,
        log_device_placement=True
    )
)
ans = sess.run(sum)

print(ans)

[[10.]
 [22.]]



<p>
In the example above, the same computation is carried out on each device, and the resulting matrix
$ \begin{pmatrix} 5 \\ 11 \end{pmatrix} $ is appended to the list each time, giving
$ c = \left[
  \begin{pmatrix} 5 \\ 11 \end{pmatrix},\
  \begin{pmatrix} 5 \\ 11 \end{pmatrix}
\right] $.
tf.add_n(c) then adds the matrices in the list element-wise, which yields
$
  \begin{pmatrix}
  10 \\
  22
  \end{pmatrix}
$.
</p>
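<p>
    The arithmetic can be checked without TensorFlow or a GPU. The sketch below re-computes the result in plain Python;
    the helper names matmul and add_n are defined here for illustration only (add_n merely imitates what tf.add_n does to a list of same-shaped tensors).
</p>

```python
# Pure-Python re-computation of the device.py result (no TensorFlow needed).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add_n(matrices):
    """Element-wise sum of same-shaped matrices, imitating tf.add_n."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) for j in range(cols)]
        for i in range(rows)
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[1.0], [2.0]]

c = [matmul(a, b) for _ in range(2)]  # one product per device in device.py
total = add_n(c)

print(c[0])   # [[5.0], [11.0]]
print(total)  # [[10.0], [22.0]]
```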

<h3>3.10 Specifying the Logistic Regression Model</h3>

<p>
    Logistic regression classifies an input and yields the probability that the input belongs to each class.
</p>

$$
P(y=i \mid x) = \mathrm{softmax}_i (Wx+b) = \frac{e^{W_i x + b_i}}{\sum_{j} e^{W_j x + b_j}}
$$
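<p>
    As a small worked example of the formula, the sketch below evaluates the class probabilities in plain Python.
    The parameters W, b and the input x are hypothetical toy values (3 classes, 2 features), and W is stored one row per class
    to match the $W_i x$ notation of the formula rather than the [784, 10] layout used in the TensorFlow code.
</p>

```python
import math


def softmax(logits):
    """Return softmax probabilities for a list of logits."""
    m = max(logits)  # subtracting the max does not change the result
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logistic_regression_probs(W, b, x):
    """P(y=i|x) for weight rows W[i], biases b[i], and input vector x."""
    logits = [
        sum(w_ij * x_j for w_ij, x_j in zip(W_i, x)) + b_i
        for W_i, b_i in zip(W, b)
    ]
    return softmax(logits)


# Hypothetical toy parameters: 3 classes, 2 input features.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
x = [2.0, 1.0]

probs = logistic_regression_probs(W, b, x)  # logits are [2.0, 1.0, 0.0]
print(probs)
print(sum(probs))
```

<p>
    The probabilities sum to 1 (up to rounding), and the class with the largest logit receives the largest probability.
</p>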

<p>
    First, consider a neural network with no hidden layer.
    (It will be improved step by step.)
</p>

https://www.tensorflow.org/api_docs/python/tf/math/reduce_sum <br />
https://www.tensorflow.org/api_docs/python/tf/nn/softmax <br />
https://www.tensorflow.org/api_docs/python/tf/math/reduce_mean <br />
https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer <br />
https://www.tensorflow.org/api_docs/python/tf/summary <br />
https://www.tensorflow.org/api_docs/python/tf/summary/histogram <br />
https://www.tensorflow.org/api_docs/python/tf/train <br />


In [16]:
# logistic_regression.py

import tensorflow as tf
import time, shutil, os
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("data", one_hot=True)

# Parameters
learning_rate = 0.01
training_epochs = 100
batch_size = 100
display_step = 1


def inference(x):
    init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", [784, 10], initializer=init)
    b = tf.get_variable("b", [10], initializer=init)
    output = tf.nn.softmax(tf.matmul(x, W) + b)
    tf.summary.histogram("weights", W)
    tf.summary.histogram("biases", b)
    tf.summary.histogram("output", output)
    return output


def loss(output, y):
    dot_product = y * tf.log(output)
    # Reduction along axis 0 collapses each column into a single
    # value, whereas reduction along axis 1 collapses each row 
    # into a single value. In general, reduction along axis i 
    # collapses the ith dimension of a tensor to size 1.
    xentropy = -tf.reduce_sum(dot_product, axis=1)
    loss = tf.reduce_mean(xentropy)
    return loss


def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op


def evaluate(output, y):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("validation error", (1.0 - accuracy))
    return accuracy

Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz


In [17]:
if __name__ == "__main__":
    # Remove old summaries and checkpoints
    if os.path.exists("logistic_logs"):
        shutil.rmtree("logistic_logs")

    with tf.Graph().as_default():
        x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
        y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes
        output = inference(x)
        cost = loss(output, y)
        global_step = tf.Variable(0, name="global_step", trainable=False)
        train_op = training(cost, global_step)
        eval_op = evaluate(output, y)
        summary_op = tf.summary.merge_all()
        saver = tf.train.Saver()
        sess = tf.Session()
        summary_writer = tf.summary.FileWriter(
            "logistic_logs",
            graph_def=sess.graph_def
        )
        init_op = tf.global_variables_initializer()
        sess.run(init_op)

        # Training cycle
        for epoch in range(training_epochs):
            avg_cost = 0.
            total_batch = int(mnist.train.num_examples/batch_size)
            # Loop over all batches
            for i in range(total_batch):
                minibatch_x, minibatch_y = mnist.train.next_batch(batch_size)
                # Fit training using batch data
                sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y})
                # Compute average loss
                avg_cost += sess.run(
                    cost, feed_dict={x: minibatch_x, y: minibatch_y}
                ) / total_batch
            # Display logs per epoch step
            if epoch % display_step == 0:
                print("Epoch: {:04d} cost: {:.9f}".format(epoch+1, avg_cost))
                accuracy = sess.run(eval_op, feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
                print("Validation Error: {}".format(1 - accuracy))
                summary_str = sess.run(summary_op, feed_dict={x: minibatch_x, y: minibatch_y})
                summary_writer.add_summary(summary_str, sess.run(global_step))
                saver.save(sess, os.path.join("logistic_logs", "model-checkpoint"), global_step=global_step)

        print("Optimization Finished!")
        accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})
        print("Test Accuracy: {}".format(accuracy))

W0822 10:42:46.021615 11828 writer.py:199] Passing a `GraphDef` to the SummaryWriter is deprecated. Pass a `Graph` object instead, such as `sess.graph`.


Epoch: 0001 cost: 1.176797048
Validation Error: 0.15160000324249268
Epoch: 0002 cost: 0.662675477
Validation Error: 0.1284000277519226
Epoch: 0003 cost: 0.550757095
Validation Error: 0.12059998512268066
Epoch: 0004 cost: 0.496798432
Validation Error: 0.11379998922348022
Epoch: 0005 cost: 0.463822677
Validation Error: 0.11000001430511475


W0822 10:42:58.759492 11828 deprecation.py:323] From D:\sys\Anaconda3\envs\gpu-graph\lib\site-packages\tensorflow\python\training\saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.


Epoch: 0006 cost: 0.440845016
Validation Error: 0.10579997301101685
Epoch: 0007 cost: 0.423962175
Validation Error: 0.10479998588562012
Epoch: 0008 cost: 0.410644458
Validation Error: 0.10259997844696045
Epoch: 0009 cost: 0.399916407
Validation Error: 0.10019999742507935
Epoch: 0010 cost: 0.390971053
Validation Error: 0.09939998388290405
Epoch: 0011 cost: 0.383357659
Validation Error: 0.0974000096321106
Epoch: 0012 cost: 0.376724289
Validation Error: 0.09600001573562622
Epoch: 0013 cost: 0.371043418
Validation Error: 0.09380000829696655
Epoch: 0014 cost: 0.365912697
Validation Error: 0.09280002117156982
Epoch: 0015 cost: 0.361401314
Validation Error: 0.09200000762939453
Epoch: 0016 cost: 0.357294162
Validation Error: 0.0899999737739563
Epoch: 0017 cost: 0.353541035
Validation Error: 0.0899999737739563
Epoch: 0018 cost: 0.350156383
Validation Error: 0.08980000019073486
Epoch: 0019 cost: 0.347043993
Validation Error: 0.08920001983642578
Epoch: 0020 cost: 0.344165125
Validation Error: 0.0

<h3>3.12 Visualizing the Computation Graph and Learning with TensorBoard</h3>

https://www.tensorflow.org/guide/graph_viz <br />

<ul>
    <li>Open a conda prompt, activate the environment, and start tensorboard from inside it.
<pre>
  (base) c:\Users\nitta> g:
  (base) g:\> cd マイドライブ\deeplearning\book3\ch03
  (base) g:\マイドライブ\deeplearning\book3\ch03> conda activate gpu-graph
  (gpu-graph) g:\マイドライブ\deeplearning\book3\ch03> tensorboard --logdir logistic_logs
</pre>
    </li>
    <li>Open a browser and go to http://localhost:6006/ .</li>
</ul>

<br /><img src="graph_logistic_logs.png" width="480" /><br /><br />
<br /><img src="ch03_tensorflow_graph_mnist.png" width="480" /><br /><br />

<h3>3.13 A Multilayer Model for MNIST</h3>

<p>
    Consider a model with two hidden layers, each consisting of 256 ReLU neurons.
</p>

<p>
    For ReLU neurons, the weights are initialized so that their variance is $\frac{2}{n_{in}}$,
    where $n_{in}$ is the number of inputs to the neuron, as described in
    <a href="https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf">
        [He 2015]</a>.
</p>

<p>
    The softmax computation is moved out of inference and is performed when the loss is computed.
</p>
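<p>
    One commonly cited benefit of computing softmax together with the cross-entropy is numerical stability.
    The plain-Python sketch below contrasts a naive "softmax, then log" loss with a fused version based on the log-sum-exp trick.
    Both functions and the example logits are hypothetical illustrations, not the code TensorFlow runs internally.
</p>

```python
import math


def naive_cross_entropy(logits, one_hot):
    """Softmax first, then -sum(y * log(p)); fragile for extreme logits."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs) if y > 0)


def fused_cross_entropy(logits, one_hot):
    """Cross-entropy computed directly from logits via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return -sum(y * (z - log_sum_exp) for y, z in zip(one_hot, logits))


# For moderate logits the two versions agree.
moderate = [2.0, 1.0, 0.0]
label = [1.0, 0.0, 0.0]
naive_loss = naive_cross_entropy(moderate, label)
fused_loss = fused_cross_entropy(moderate, label)
print(naive_loss, fused_loss)

# For extreme logits the naive version breaks down.
extreme = [0.0, -1000.0]
extreme_label = [0.0, 1.0]
try:
    naive_cross_entropy(extreme, extreme_label)
    naive_failed = False
except ValueError:
    naive_failed = True  # exp(-1000) underflows to 0.0, so log(0) fails

extreme_loss = fused_cross_entropy(extreme, extreme_label)
print(naive_failed)   # True
print(extreme_loss)   # 1000.0
```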

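<p>
    With these changes, performance improves considerably. In the logs below, the validation error is already about 4.3% at epoch 14
    and settles at around 1.8% by epoch 250, whereas the logistic regression model above was still at about 8.9% at epoch 19.
</p>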

In [18]:
# multilayer_perceptron.py

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import time, shutil, os

mnist = input_data.read_data_sets("data", one_hot=True)

# Architecture
n_hidden_1 = 256
n_hidden_2 = 256

# Parameters
learning_rate = 0.01
training_epochs = 300
batch_size = 100
display_step = 1


def layer(input, weight_shape, bias_shape):
    weight_init = tf.random_normal_initializer(stddev=(2.0/weight_shape[0])**0.5)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape, initializer=weight_init)
    b = tf.get_variable("b", bias_shape, initializer=bias_init)
    return tf.nn.relu(tf.matmul(input, W) + b)


def inference(x):
    with tf.variable_scope("hidden_1"):
        hidden_1 = layer(x, [784, n_hidden_1], [n_hidden_1])     
    with tf.variable_scope("hidden_2"):
        hidden_2 = layer(hidden_1, [n_hidden_1, n_hidden_2], [n_hidden_2])
    with tf.variable_scope("output"):
        output = layer(hidden_2, [n_hidden_2, 10], [10])
    return output


def loss(output, y):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y)    
    loss = tf.reduce_mean(xentropy)
    return loss


def training(cost, global_step):
    tf.summary.scalar("cost", cost)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost, global_step=global_step)
    return train_op


def evaluate(output, y):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar("validation", accuracy)
    return accuracy

Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz


In [19]:
if __name__ == "__main__":
    # Remove old summaries and checkpoints
    if os.path.exists("mlp_logs"):
        shutil.rmtree("mlp_logs")

    with tf.Graph().as_default():
        with tf.variable_scope("mlp_model"):
            x = tf.placeholder("float", [None, 784]) # mnist data image of shape 28*28=784
            y = tf.placeholder("float", [None, 10]) # 0-9 digits recognition => 10 classes

            output = inference(x)
            cost = loss(output, y)
            global_step = tf.Variable(0, name="global_step", trainable=False)
            train_op = training(cost, global_step)
            eval_op = evaluate(output, y)
            summary_op = tf.summary.merge_all()
            saver = tf.train.Saver()
            sess = tf.Session()
            summary_writer = tf.summary.FileWriter(
                "mlp_logs",
                graph_def=sess.graph_def
            )
            init_op = tf.global_variables_initializer()
            sess.run(init_op)

            # Training cycle
            for epoch in range(training_epochs):
                avg_cost = 0.
                total_batch = int(mnist.train.num_examples/batch_size)
                # Loop over all batches
                for i in range(total_batch):
                    minibatch_x, minibatch_y = mnist.train.next_batch(batch_size)
                    # Fit training using batch data
                    sess.run(train_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    # Compute average loss
                    avg_cost += sess.run(cost, feed_dict={x: minibatch_x, y: minibatch_y})/total_batch
                # Display logs per epoch step
                if epoch % display_step == 0:
                    print("Epoch: {:04d} cost: {:.9f}".format(epoch+1, avg_cost))
                    accuracy = sess.run(eval_op, feed_dict={x: mnist.validation.images, y: mnist.validation.labels})
                    print("Validation Error: {}".format(1 - accuracy))
                    summary_str = sess.run(summary_op, feed_dict={x: minibatch_x, y: minibatch_y})
                    summary_writer.add_summary(summary_str, sess.run(global_step))
                    saver.save(sess, os.path.join("mlp_logs", "model-checkpoint"), global_step=global_step)

            print("Optimization Finished!")
            accuracy = sess.run(eval_op, feed_dict={x: mnist.test.images, y: mnist.test.labels})
            print("Test Accuracy: {}".format(accuracy))


W0822 17:21:57.726389 11828 deprecation.py:323] From <ipython-input-18-77c176d4a5e3>:39: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:

Future major versions of TensorFlow will allow gradients to flow
into the labels input on backprop by default.

See `tf.nn.softmax_cross_entropy_with_logits_v2`.

W0822 17:21:58.263858 11828 writer.py:199] Passing a `GraphDef` to the SummaryWriter is deprecated. Pass a `Graph` object instead, such as `sess.graph`.


Epoch: 0001 cost: 1.155528759
Validation Error: 0.12980002164840698
Epoch: 0002 cost: 0.420003600
Validation Error: 0.09259998798370361
Epoch: 0003 cost: 0.324155134
Validation Error: 0.08319997787475586
Epoch: 0004 cost: 0.284075277
Validation Error: 0.07239997386932373
Epoch: 0005 cost: 0.257920302
Validation Error: 0.0690000057220459
Epoch: 0006 cost: 0.238601945
Validation Error: 0.06319999694824219
Epoch: 0007 cost: 0.222525908
Validation Error: 0.059800028800964355
Epoch: 0008 cost: 0.208743082
Validation Error: 0.05699998140335083
Epoch: 0009 cost: 0.196968288
Validation Error: 0.05320000648498535
Epoch: 0010 cost: 0.186194047
Validation Error: 0.05419999361038208
Epoch: 0011 cost: 0.176901473
Validation Error: 0.0493999719619751
Epoch: 0012 cost: 0.168219248
Validation Error: 0.04780000448226929
Epoch: 0013 cost: 0.160226422
Validation Error: 0.045400023460388184
Epoch: 0014 cost: 0.153010859
Validation Error: 0.04339998960494995
Epoch: 0015 cost: 0.146306994
Validation Error: 

Epoch: 0121 cost: 0.013370951
Validation Error: 0.01940000057220459
Epoch: 0122 cost: 0.013135655
Validation Error: 0.018999993801116943
Epoch: 0123 cost: 0.012975959
Validation Error: 0.018000006675720215
Epoch: 0124 cost: 0.012741968
Validation Error: 0.01759999990463257
Epoch: 0125 cost: 0.012566599
Validation Error: 0.018800020217895508
Epoch: 0126 cost: 0.012395765
Validation Error: 0.01819998025894165
Epoch: 0127 cost: 0.012215430
Validation Error: 0.018599987030029297
Epoch: 0128 cost: 0.012006933
Validation Error: 0.01759999990463257
Epoch: 0129 cost: 0.011852338
Validation Error: 0.01819998025894165
Epoch: 0130 cost: 0.011645928
Validation Error: 0.017799973487854004
Epoch: 0131 cost: 0.011500965
Validation Error: 0.01819998025894165
Epoch: 0132 cost: 0.011368062
Validation Error: 0.01840001344680786
Epoch: 0133 cost: 0.011180046
Validation Error: 0.018599987030029297
Epoch: 0134 cost: 0.011021952
Validation Error: 0.01840001344680786
Epoch: 0135 cost: 0.010884800
Validation E

Epoch: 0241 cost: 0.003757570
Validation Error: 0.01819998025894165
Epoch: 0242 cost: 0.003730570
Validation Error: 0.01840001344680786
Epoch: 0243 cost: 0.003709705
Validation Error: 0.018000006675720215
Epoch: 0244 cost: 0.003682990
Validation Error: 0.018000006675720215
Epoch: 0245 cost: 0.003662317
Validation Error: 0.01819998025894165
Epoch: 0246 cost: 0.003620372
Validation Error: 0.018599987030029297
Epoch: 0247 cost: 0.003605607
Validation Error: 0.01840001344680786
Epoch: 0248 cost: 0.003583322
Validation Error: 0.01819998025894165
Epoch: 0249 cost: 0.003557839
Validation Error: 0.01840001344680786
Epoch: 0250 cost: 0.003533291
Validation Error: 0.01840001344680786
Epoch: 0251 cost: 0.003502273
Validation Error: 0.01819998025894165
Epoch: 0252 cost: 0.003489226
Validation Error: 0.018000006675720215
Epoch: 0253 cost: 0.003467938
Validation Error: 0.017799973487854004
Epoch: 0254 cost: 0.003444141
Validation Error: 0.01840001344680786
Epoch: 0255 cost: 0.003419023
Validation Er