The goal of this assignment is to explore regularization techniques. The original notebook can be found here

## Import libraries

```
# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
from __future__ import print_function
from tqdm import tnrange
import numpy as np
import tensorflow as tf
from six.moves import cPickle as pickle
```

## Load NotMNIST dataset

First reload the data we generated in `1_notmnist.ipynb`.

```
pickle_file = 'datasets/notMNIST.pickle'
with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
X_train = save['train_dataset']
Y_train = save['train_labels']
X_valid = save['valid_dataset']
Y_valid = save['valid_labels']
X_test = save['test_dataset']
Y_test = save['test_labels']
del save # hint to help gc free up memory
print('Training set', X_train.shape, Y_train.shape)
print('Validation set', X_valid.shape, Y_valid.shape)
print('Test set', X_test.shape, Y_test.shape)
```

## Reformat dataset

Reformat into a shape that's more adapted to the models we're going to train:

- data as a flat matrix,
- labels as float 1-hot encodings.

As in the previous notebook, this reformatting differs from the one suggested by the original notebook: the arrays are transposed so that each column holds one sample.

```
image_size = 28
num_labels = 10
def reformat(dataset, labels):
    # Flatten each image and transpose so that samples are stored as columns
    dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32).T
    # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
    labels = (np.arange(num_labels) == labels[:, None]).astype(np.float32).T
    return dataset, labels
X_train, Y_train = reformat(X_train, Y_train)
X_valid, Y_valid = reformat(X_valid, Y_valid)
X_test, Y_test = reformat(X_test, Y_test)
print('Training set', X_train.shape, Y_train.shape)
print('Validation set', X_valid.shape, Y_valid.shape)
print('Test set', X_test.shape, Y_test.shape)
```
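To see what the broadcasting comparison inside `reformat` produces, here is a minimal sketch on three made-up labels. The names `toy_labels` and `num_classes` are illustrative assumptions and are not part of the dataset.

```python
import numpy as np

num_classes = 4                    # hypothetical, smaller than the real 10
toy_labels = np.array([2, 0, 3])   # hypothetical class indices

# Comparing a row of class ids against a column of labels broadcasts to
# shape (samples, classes); transposing gives one column per sample.
one_hot = (np.arange(num_classes) == toy_labels[:, None]).astype(np.float32).T

print(one_hot.shape)
print(one_hot)
```

Each column contains a single 1.0, located at the row given by the label, which is why `np.argmax(..., axis=0)` recovers the class index later on.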

### Using Accuracy as Default Metric

As we saw earlier, the classes in the dataset are balanced, so accuracy alone is sufficient for evaluating our model on this classification task.

```
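def accuracy(predictions, labels):
    # Samples are stored as columns, so the class axis is axis 0.
    # Multiplying by 100.0 first keeps the division in floating point.
    return (
        100.0 * np.sum(np.argmax(predictions, axis=0) == np.argmax(
            labels, axis=0)) / labels.shape[1])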
```
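As a quick sanity check of the column-wise convention, the sketch below applies the same computation to a hand-made example. The function name `accuracy_pct` and the toy arrays are assumptions for illustration only.

```python
import numpy as np

def accuracy_pct(predictions, labels):
    """Percentage of columns whose arg-max class matches the label."""
    hits = np.argmax(predictions, axis=0) == np.argmax(labels, axis=0)
    return 100.0 * np.sum(hits) / labels.shape[1]

# Hypothetical scores for 4 samples and 3 classes (one column per sample).
toy_pred = np.array([[0.7, 0.1, 0.2, 0.3],
                     [0.2, 0.8, 0.3, 0.4],
                     [0.1, 0.1, 0.5, 0.3]])
toy_true = np.array([[1., 0., 0., 1.],
                     [0., 1., 0., 0.],
                     [0., 0., 1., 0.]])

# The first three columns are classified correctly, the last one is not.
print(accuracy_pct(toy_pred, toy_true))
```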

## 3-layer NN as base model

To compare results with and without regularization, we will use a somewhat more complex neural network as our base model: three hidden layers (1024, 512, and 256 units) with ReLU as the activation function.

### Hyperparameters

```
# hyper parameters
learning_rate = 1e-2
lamba = 1e-3
keep_prob = 0.5
batch_size = 128
num_steps = 501
n0 = image_size * image_size # input size
n1 = 1024 # first hidden layer
n2 = 512 # second hidden layer
n3 = 256 # third hidden layer
n4 = num_labels # output size
```

### Build model

```
# Build a model that lets us switch between different regularization and
# optimization settings.
def model(lamba=0, learning_rate=learning_rate,
          keep_prob=1, learning_decay=False,
          batch_size=batch_size, num_steps=num_steps, n1=n1, n2=n2, n3=n3):
    print(
        """
    Train 3-layer NN with following settings:
    Regularization lambda: {}
    Learning rate: {}
    learning_decay: {}
    keep_prob: {}
    Batch_size: {}
    Number of steps: {}
    n1, n2, n3: {}, {}, {}""".format(lamba, learning_rate,
                                     learning_decay, keep_prob,
                                     batch_size, num_steps, n1, n2, n3))
    # construct computation graph
    graph = tf.Graph()
    with graph.as_default():
        # placeholder for mini-batch when training
        X = tf.placeholder(tf.float32, shape=(n0, batch_size))
        Y = tf.placeholder(tf.float32, shape=(num_labels, batch_size))
        global_step = tf.Variable(0)
        # use all valid/test set
        tf_X_valid = tf.constant(X_valid)
        tf_X_test = tf.constant(X_test)
        # initialize weights, biases
        # notice that we have three hidden layers plus the output layer,
        # so we now have W1..W4 and b1..b4
        W1 = tf.Variable(tf.truncated_normal([n1, n0], stddev=np.sqrt(2.0 / n0)))
        W2 = tf.Variable(tf.truncated_normal([n2, n1], stddev=np.sqrt(2.0 / n1)))
        W3 = tf.Variable(tf.truncated_normal([n3, n2], stddev=np.sqrt(2.0 / n2)))
        W4 = tf.Variable(tf.truncated_normal([n4, n3], stddev=np.sqrt(2.0 / n3)))
        b1 = tf.Variable(tf.zeros([n1, 1]))
        b2 = tf.Variable(tf.zeros([n2, 1]))
        b3 = tf.Variable(tf.zeros([n3, 1]))
        b4 = tf.Variable(tf.zeros([n4, 1]))
        # training computation (dropout is applied only on this path)
        Z1 = tf.matmul(W1, X) + b1
        A1 = tf.nn.relu(Z1) if keep_prob == 1 else tf.nn.dropout(tf.nn.relu(Z1), keep_prob)
        Z2 = tf.matmul(W2, A1) + b2
        A2 = tf.nn.relu(Z2) if keep_prob == 1 else tf.nn.dropout(tf.nn.relu(Z2), keep_prob)
        Z3 = tf.matmul(W3, A2) + b3
        A3 = tf.nn.relu(Z3) if keep_prob == 1 else tf.nn.dropout(tf.nn.relu(Z3), keep_prob)
        Z4 = tf.matmul(W4, A3) + b4
        loss = tf.reduce_mean(
            tf.nn.softmax_cross_entropy_with_logits(
                labels=tf.transpose(Y), logits=tf.transpose(Z4)))
        if lamba:
            loss += lamba * (tf.nn.l2_loss(W1) + tf.nn.l2_loss(W2) +
                             tf.nn.l2_loss(W3) + tf.nn.l2_loss(W4))
        # optimizer
        if learning_decay:
            learning_rate = tf.train.exponential_decay(
                0.5, global_step, 5000, 0.80, staircase=True)
            optimizer = tf.train.GradientDescentOptimizer(
                learning_rate).minimize(loss, global_step=global_step)
        else:
            optimizer = tf.train.GradientDescentOptimizer(
                learning_rate).minimize(loss)
        # valid / test prediction (built without dropout)
        Y_pred = tf.nn.softmax(Z4, dim=0)
        Y_valid_pred = tf.nn.softmax(
            tf.matmul(W4, tf.nn.relu(
                tf.matmul(W3, tf.nn.relu(
                    tf.matmul(W2, tf.nn.relu(
                        tf.matmul(W1, tf_X_valid) + b1)) + b2)) + b3)) + b4, dim=0)
        Y_test_pred = tf.nn.softmax(
            tf.matmul(W4, tf.nn.relu(
                tf.matmul(W3, tf.nn.relu(
                    tf.matmul(W2, tf.nn.relu(
                        tf.matmul(W1, tf_X_test) + b1)) + b2)) + b3)) + b4, dim=0)
    # define training
    with tf.Session(graph=graph) as sess:
        # initialize parameters
        tf.global_variables_initializer().run()
        print("Initialized")
        for step in tnrange(num_steps):
            # pick an offset into the training data to take the next mini-batch
            offset = (step * batch_size) % (Y_train.shape[1] - batch_size)
            batch_X = X_train[:, offset:(offset + batch_size)]
            batch_Y = Y_train[:, offset:(offset + batch_size)]
            # train model
            _, l, batch_Y_pred = sess.run(
                [optimizer, loss, Y_pred], feed_dict={X: batch_X, Y: batch_Y})
            if step % 200 == 0:
                print('Minibatch loss at step {}: {:.3f}. batch acc: {:.1f}%, Valid acc: {:.1f}%.'
                      .format(step, l,
                              accuracy(batch_Y_pred, batch_Y),
                              accuracy(Y_valid_pred.eval(), Y_valid)))
        print('Test acc: {:.1f}%'.format(accuracy(Y_test_pred.eval(), Y_test)))
```

### Train model without regularization

```
model(learning_rate=0.5, num_steps=1601)
```

## L2 regularization

Introduce and tune L2 regularization for the models. Remember that L2 amounts to adding a penalty on the norm of the weights to the loss. In TensorFlow, you can compute the L2 loss for a tensor `t` using `nn.l2_loss(t)`. The right amount of regularization should improve your validation / test accuracy.
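The penalty itself is easy to reproduce outside TensorFlow. The sketch below is a NumPy stand-in, assuming the documented definition of `tf.nn.l2_loss` as `sum(t ** 2) / 2`; the weight matrix, the data loss value, and the name `lam` are made up for illustration.

```python
import numpy as np

def l2_loss(t):
    # Same quantity tf.nn.l2_loss computes: sum(t ** 2) / 2.
    return np.sum(np.square(t)) / 2.0

rng = np.random.default_rng(0)
W = rng.normal(scale=0.05, size=(4, 3))  # hypothetical small weight matrix
data_loss = 0.9                          # hypothetical cross-entropy value
lam = 1e-3                               # regularization strength

total_loss = data_loss + lam * l2_loss(W)

# The gradient of the penalty with respect to W is lam * W, so every
# gradient step also pulls the weights toward zero.
penalty_grad = lam * W

print(total_loss)
```

Because the penalty grows with the squared magnitude of the weights, a larger `lam` trades training fit for smaller weights; only the weight matrices are penalized in the model above, not the biases.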

```
# for lamda in [1 / 10 ** i for i in list(np.arange(1, 4))]:
#     model(lamba=lamda)
model(lamba=0.1, learning_rate=0.01)
```

## Case of overfitting

Let's demonstrate an extreme case of overfitting. Restrict your training data to just a few batches. What happens?

```
model(num_steps=10)
```

## Dropout

Introduce Dropout on the hidden layer of the neural network. Remember: Dropout should only be introduced during training, not evaluation, otherwise your evaluation results would be stochastic as well. TensorFlow provides `nn.dropout()` for that, but you have to make sure it's only inserted during training.
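To make the train/evaluation distinction concrete, here is a small NumPy sketch of inverted dropout, the scheme `tf.nn.dropout` uses (kept units are scaled by `1 / keep_prob`). The function signature, including the `training` flag and the `rng` argument, is an assumption for this illustration and not a TensorFlow API.

```python
import numpy as np

def dropout(activations, keep_prob, training, rng):
    # Evaluation (or keep_prob of 1): deterministic pass-through.
    if not training or keep_prob >= 1.0:
        return activations
    # Training: zero each unit with probability 1 - keep_prob and rescale
    # the survivors so the expected activation stays unchanged.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
A = np.ones((1000, 100))                  # hypothetical activations
A_train = dropout(A, 0.5, True, rng)      # random zeros, survivors doubled
A_eval = dropout(A, 0.5, False, rng)      # unchanged

print(A_train.mean(), A_eval.mean())
```

The `model` function above follows the same idea in graph form: dropout appears only in the training activations, while the validation and test predictions are built from separate ops without it.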

What happens to our extreme overfitting case?

```
model(num_steps=10, keep_prob=0.5)
```

## Boost performance by using Multi-layer NN

Try to get the best performance you can using a multi-layer model! The best reported test accuracy using a deep network is 97.1%.

One avenue you can explore is to add multiple layers.

Another one is to use learning rate decay:

```
global_step = tf.Variable(0) # count the number of steps taken.
learning_rate = tf.train.exponential_decay(0.5, global_step, ...)
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
```
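For intuition about the schedule, the following plain-Python sketch mirrors the documented formula of `tf.train.exponential_decay`, `base_lr * decay_rate ** (step / decay_steps)`, with integer division when `staircase=True`. The helper is a stand-in written for this post, and the sample steps are arbitrary; the other arguments are the ones used inside `model`.

```python
def exponential_decay(base_lr, step, decay_steps, decay_rate, staircase=False):
    # With staircase=True the exponent only changes every decay_steps steps.
    exponent = step // decay_steps if staircase else step / decay_steps
    return base_lr * decay_rate ** exponent

# Settings taken from the model: start at 0.5, multiply by 0.80 every 5000 steps.
for step in (0, 1500, 5000, 10000):
    print(step, exponential_decay(0.5, step, 5000, 0.80, staircase=True))
```

Note that with `decay_steps=5000` and `staircase=True`, a run of 1501 steps never reaches the first decay boundary, so the learning rate stays at 0.5 for the whole run; a smaller `decay_steps` would be needed to see any effect at that length.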

```
model(learning_decay=True, num_steps=1501, lamba=0, keep_prob=1)
```
