Day 17: More Deep Learning


Deep Learning

Neural Network

Scientists observed the neural networks in our brains and used the perceptron as a model to create the Artificial Neural Network.

MNIST Revisited

With a deep learning framework, we can classify the MNIST image data with up to 99% accuracy; the simple Keras model below reaches about 97% on the test set.

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

# Load the MNIST data. If it has not been downloaded yet, it is downloaded automatically.
mnist = keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()   

# Preprocess the data to fit the model
x_train_norm, x_test_norm = x_train / 255.0, x_test / 255.0
x_train_reshaped = x_train_norm.reshape(-1, x_train_norm.shape[1]*x_train_norm.shape[2])
x_test_reshaped = x_test_norm.reshape(-1, x_test_norm.shape[1]*x_test_norm.shape[2])

# Build the deep learning model - a 2-layer perceptron
model=keras.models.Sequential()
model.add(keras.layers.Dense(50, activation='sigmoid', input_shape=(784,)))  # input layer d=784, hidden layer H=50
model.add(keras.layers.Dense(10, activation='softmax'))   # output layer K=10
model.summary()

# Compile and train the model
model.compile(optimizer='adam',
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])
model.fit(x_train_reshaped, y_train, epochs=10)

# Evaluate the model on the test set
test_loss, test_accuracy = model.evaluate(x_test_reshaped,y_test, verbose=2)
print("test_loss: {} ".format(test_loss))
print("test_accuracy: {}".format(test_accuracy))
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 50)                39250     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                510       
=================================================================
Total params: 39,760
Trainable params: 39,760
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.8451 - accuracy: 0.7986
Epoch 2/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.2464 - accuracy: 0.9305
Epoch 3/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.1860 - accuracy: 0.9481
Epoch 4/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.1517 - accuracy: 0.9573
Epoch 5/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.1310 - accuracy: 0.9627
Epoch 6/10
1875/1875 [==============================] - 3s 2ms/step - loss: 0.1119 - accuracy: 0.9688
Epoch 7/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0989 - accuracy: 0.9723
Epoch 8/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0919 - accuracy: 0.9749
Epoch 9/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0818 - accuracy: 0.9776
Epoch 10/10
1875/1875 [==============================] - 3s 1ms/step - loss: 0.0721 - accuracy: 0.9806
313/313 - 0s - loss: 0.1025 - accuracy: 0.9708
test_loss: 0.10251332074403763 
test_accuracy: 0.97079998254776

However, today we will build a simpler neural network, a Multi-Layer Perceptron (MLP), from scratch using NumPy.

Multi-Layer Perceptron (MLP)

image

The figure shows a perceptron with three layers, identical to the example code above. The hidden layer has H nodes and the output layer has K nodes.

The +1 represents a bias and has no connection to the previous layer. A bias allows the activation function to be shifted to the left or right, which can be critical for successful learning.

The code above is defined with d=784, H=50, and K=10.

The number of layers is 2, since there are two sets of connections between the nodes.

Networks with two or more layers are called multi-layer perceptrons, and as the number of hidden layers grows, we say the network becomes deep and refer to it as a DNN (Deep Neural Network).

In a Fully Connected Neural Network, each node is connected to every node in the adjacent layers; there are no connections within a layer or between non-adjacent layers.

Parameters/Weights

If the input layer has 100 nodes and the hidden layer has 20 nodes, there is a 100x20 weight matrix between the two layers. If the output layer has 10 nodes, there is a 20x10 matrix between the hidden and output layers.

These matrices are called parameters or weights.

Adjacent layers are related as follows:

\[y = W \cdot X + b\]

If we recreate this MLP with NumPy, it looks like the following.

# Shape of the input layer data
print(x_train_reshaped.shape)

# For testing, take the first 5 samples from x_train_reshaped.
X = x_train_reshaped[:5]
print(X.shape)
weight_init_std = 0.1
input_size = 784
hidden_size=50

# Create the parameter W connecting adjacent layers and initialize it randomly
W1 = weight_init_std * np.random.randn(input_size, hidden_size)  
# Create the bias parameter b and initialize it with zeros
b1 = np.zeros(hidden_size)

a1 = np.dot(X, W1) + b1   # hidden layer output

print(W1.shape)
print(b1.shape)
print(a1.shape)
# Check the hidden layer output for the first sample. Do we get a 50-dimensional vector?
a1[0]
(60000, 784)
(5, 784)
(784, 50)
(50,)
(5, 50)

array([ 0.77876534, -0.26404503,  1.0587616 , -0.3312955 ,  0.09      ,
       -0.35105135, -1.3575146 , -1.36718134,  0.43115337,  0.59146081,
        0.00951019,  0.23436203, -0.0841257 ,  0.34637381,  0.90672424,
       -1.24046068, -0.78228776,  0.02901161,  0.88157429,  0.93928923,
       -0.40195086, -1.44413499, -0.33465078, -2.91534912,  0.53624765,
       -0.92317463,  0.36655854,  1.24648682,  1.00008446,  0.4737703 ,
        2.37733936,  1.47601797,  0.22530449, -0.53381735,  0.53166893,
        0.17096541, -0.22337705, -0.4346925 ,  0.35928353,  0.11571586,
        0.91221893, -0.1596964 ,  0.03699103,  1.04748736, -1.04258888,
       -0.47653046,  0.06307699, -0.48450634,  1.78438264, -0.71859584])

Activation Functions and Loss Functions

Activation Functions

In deep learning, activation functions are a necessity. They are non-linear, which is what lets the network model non-linear relationships; without them, stacking several linear layers would collapse into a single linear transformation.

  1. Sigmoid \(\sigma(x) = \frac{1}{1 + e^{-x}}\)

image

model.add(keras.layers.Dense(50, activation='sigmoid', input_shape=(784,)))

We used sigmoid as the activation function in the Keras example above.

# Implement the sigmoid function from the formula above.
def sigmoid(x):
    return 1 / (1 + np.exp(-x))  


z1 = sigmoid(a1)
print(z1[0])  # every element of the sigmoid output lies between 0 and 1
[0.68541395 0.43436961 0.74245381 0.41792544 0.52248483 0.4131275
 0.20464454 0.20307562 0.60614905 0.64370025 0.50237753 0.5583238
 0.47898097 0.58573796 0.71232937 0.22435581 0.31382703 0.50725239
 0.70714835 0.71895606 0.40084372 0.19090584 0.41710945 0.0514
 0.6309391  0.28431148 0.59062714 0.77669112 0.73107518 0.61627575
 0.91508291 0.81397036 0.55608906 0.36962699 0.62987228 0.54263755
 0.44438679 0.39300636 0.58886699 0.52889673 0.71345401 0.46016053
 0.5092467  0.74029211 0.26065078 0.38307174 0.51576402 0.38118858
 0.85623719 0.32770226]

Traditionally, sigmoid was used the most. Nowadays the ReLU function is the most widely used, for the following reasons:

  • the vanishing gradient problem of sigmoid
  • the high computational cost of the exp function
  2. Tanh
\[\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}\]

image

Tanh centers the output at 0, which relieves the slow-optimization problem, but the vanishing gradient problem still remains.
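A minimal sketch: NumPy provides tanh directly, so the hidden activation computed earlier with sigmoid could equally be computed as follows (z1_tanh is a hypothetical name, not used later in this post).

z1_tanh = np.tanh(a1)   # elements lie between -1 and 1, centered at 0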

  3. ReLU
\[f(x) = \max(0, x)\]

image

  • faster learning than sigmoid or tanh
  • lower computation cost and easier implementation
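As a quick sketch (this relu helper is only for illustration and is not used in the rest of the post), ReLU can be implemented in NumPy just like the sigmoid function above:

# ReLU sketch: element-wise max(0, x)
def relu(x):
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))   # negative values become 0
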
# Function implementing a single affine layer
def affine_layer_forward(X, W, b):
    y = np.dot(X, W) + b
    cache = (X, W, b)
    return y, cache

print('go~')
go~
input_size = 784
hidden_size = 50
output_size = 10

W1 = weight_init_std * np.random.randn(input_size, hidden_size)
b1 = np.zeros(hidden_size)
W2 = weight_init_std * np.random.randn(hidden_size, output_size)
b2 = np.zeros(output_size)

a1, cache1 = affine_layer_forward(X, W1, b1)
z1 = sigmoid(a1)
a2, cache2 = affine_layer_forward(z1, W2, b2)    # z1 becomes the input to the second layer.

print(a2[0])  # the final output is now a vector of length output_size
[ 0.08290471  0.92026821  0.24812314  0.08186972  0.30907202  0.54244163
 -0.48377995  0.08508334 -0.5240482   0.27622424]
def softmax(x):
    if x.ndim == 2:
        x = x.T
        x = x - np.max(x, axis=0)
        y = np.exp(x) / np.sum(np.exp(x), axis=0)
        return y.T 

    x = x - np.max(x) # guard against overflow
    return np.exp(x) / np.sum(np.exp(x))
y_hat = softmax(a2)
y_hat[0]  # now a probability over the 10 digit classes
array([0.08580837, 0.19824034, 0.10122391, 0.08571961, 0.10758529,
       0.13586388, 0.04868797, 0.08599552, 0.04676634, 0.10410876])

Loss Function

The information is relayed to the output layer through the hidden layers with their non-linear activation functions. We then try to minimize the difference between the answer we hope for and the relayed output. The function that measures this difference is called the loss function or cost function.

image

Cross Entropy

Today we will use cross entropy. Cross entropy decreases as the two probability distributions become more similar; the more the model learns, the closer its prediction \(\hat{y}\) gets to the answer.

image
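For reference, the standard cross-entropy loss between a one-hot answer label \(t\) and the predicted distribution \(\hat{y}\) is

\[E = -\sum_{k} t_k \log \hat{y}_k\]

averaged over the samples in a batch, which is what the cross_entropy_error function further below computes.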

# Function that one-hot encodes the answer labels
def _change_ont_hot_label(X, num_category):
    T = np.zeros((X.size, num_category))
    for idx, row in enumerate(T):
        row[X[idx]] = 1
        
    return T

Y_digit = y_train[:5]
t = _change_ont_hot_label(Y_digit, 10)
t     # one-hot encoding of the answer labels
array([[0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
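As a side note, the same one-hot encoding could be produced with a one-line NumPy sketch (assuming the labels are integer class indices; t_alt is a hypothetical name):

t_alt = np.eye(10)[Y_digit]          # rows of the identity matrix selected by label
print(np.array_equal(t, t_alt))      # True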

We want the distribution of our model's final output softmax(a2) to become similar to the one-hot encoding of the answer labels. For now, it is not.

print(y_hat[0])
print(t[0])
[0.08580837 0.19824034 0.10122391 0.08571961 0.10758529 0.13586388
 0.04868797 0.08599552 0.04676634 0.10410876]
[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
def cross_entropy_error(y, t):
    if y.ndim == 1:
        t = t.reshape(1, t.size)
        y = y.reshape(1, y.size)
        
    # If the labels are one-hot vectors, convert them to the indices of the correct classes
    if t.size == y.size:
        t = t.argmax(axis=1)
             
    batch_size = y.shape[0]
    return -np.sum(np.log(y[np.arange(batch_size), t])) / batch_size

Loss = cross_entropy_error(y_hat, t)
Loss
2.2055431327016684
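As a quick sanity check (a sketch, not part of the original notebook), this value is the average of \(-\log\) of the probability assigned to each correct class:

correct_probs = y_hat[np.arange(t.shape[0]), t.argmax(axis=1)]
print(-np.log(correct_probs).mean())   # should match the Loss value above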

Gradient Descent

Now that we have calculated the error, our goal is to minimize it.

Gradient descent lets us move down the slope of the loss at each step.

We introduce a learning rate so that each step moves along the slope only by the value of the slope multiplied by the learning rate.
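Concretely, each parameter is updated by subtracting the learning rate \(\eta\) times its gradient, which is exactly what the update_params function below does:

\[W \leftarrow W - \eta \frac{\partial L}{\partial W}, \qquad b \leftarrow b - \eta \frac{\partial L}{\partial b}\]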

Starting from the output, we calculate the gradients of the loss with respect to each parameter.

batch_num = y_hat.shape[0]
dy = (y_hat - t) / batch_num
dy    # gradient of the Loss with respect to the softmax layer's input a2
array([[ 0.01716167,  0.03964807,  0.02024478,  0.01714392,  0.02151706,
        -0.17282722,  0.00973759,  0.0171991 ,  0.00935327,  0.02082175],
       [-0.18539873,  0.04093421,  0.02077606,  0.01862671,  0.021631  ,
         0.02594169,  0.00926761,  0.01645019,  0.01108381,  0.02068746],
       [ 0.0168999 ,  0.037318  ,  0.02075345,  0.02047094, -0.1759752 ,
         0.02466792,  0.01115949,  0.01751881,  0.01056171,  0.01662498],
       [ 0.01557329, -0.17035937,  0.02484548,  0.0167078 ,  0.0278854 ,
         0.02544151,  0.01072687,  0.02023737,  0.01081457,  0.01812708],
       [ 0.01624499,  0.04328141,  0.02033156,  0.01838969,  0.01958535,
         0.02564359,  0.01071604,  0.01667989,  0.01072816, -0.18160067]])
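The computation of dy above uses the well-known result that, for softmax combined with cross-entropy averaged over a batch of size \(N\), the gradient of the loss with respect to the pre-softmax values \(a_2\) is

\[\frac{\partial L}{\partial a_2} = \frac{\hat{y} - t}{N}\]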
dW2 = np.dot(z1.T, dy)    
dW2
array([[-0.0087704 ,  0.02151481,  0.03283362,  0.02879794, -0.05165479,
        -0.04367198,  0.01626062,  0.02726795,  0.01617291, -0.03875068],
       [-0.01671251, -0.01019857,  0.03125051,  0.0265803 , -0.03218333,
        -0.01929739,  0.01516797,  0.02580006,  0.01526043, -0.03566746],
       [-0.12626712,  0.06589524,  0.05917199,  0.05173953, -0.01737465,
        -0.09530792,  0.02849448,  0.04869652,  0.02951682, -0.04456489],
       [-0.0501188 , -0.01415211,  0.05219379,  0.04446389, -0.06712117,
        -0.05555967,  0.02509608,  0.04313115,  0.02536012, -0.00329327],
       [-0.0590854 ,  0.01762426,  0.03496423,  0.03060376, -0.04241697,
        -0.00747096,  0.01701468,  0.02874038,  0.01748158, -0.03745555],
       [-0.10524715,  0.05562839,  0.06458438,  0.0561122 , -0.0091349 ,
        -0.07787407,  0.03130227,  0.05313233,  0.03221294, -0.10071639],
       [-0.06818816, -0.00569284,  0.06479995,  0.05517566, -0.04623775,
        -0.06375769,  0.03119897,  0.05342983,  0.03170304, -0.05243101],
       [-0.04646031, -0.02007751,  0.05702209,  0.04822336, -0.0343237 ,
        -0.03524371,  0.02749448,  0.04696817,  0.02788089, -0.07148376],
       [-0.1092808 , -0.02809757,  0.04076627,  0.03387356,  0.00367565,
        -0.01707791,  0.01888954,  0.0332529 ,  0.01994858,  0.00404978],
       [-0.05457551, -0.01180297,  0.03043755,  0.02587294, -0.02862433,
        -0.00568166,  0.01449774,  0.02496988,  0.0149636 , -0.01005724],
       [-0.02573734,  0.00456476,  0.04351485,  0.0379773 , -0.07493801,
         0.01022458,  0.0215082 ,  0.03589437,  0.02163671, -0.07464541],
       [-0.10211601,  0.0331065 ,  0.05952131,  0.05166498, -0.05073715,
        -0.06990971,  0.02869188,  0.04904125,  0.0294525 , -0.02871555],
       [-0.07490142,  0.05858593,  0.06601437,  0.05816973, -0.07808721,
        -0.06083812,  0.03246844,  0.05452138,  0.03293294, -0.08886604],
       [ 0.00132975, -0.06010498,  0.03224505,  0.02649465, -0.04695728,
         0.01657227,  0.01547771,  0.02656909,  0.01544537, -0.02707164],
       [-0.00290772, -0.08343838,  0.05250822,  0.04214486,  0.00654395,
        -0.07936847,  0.02470242,  0.04337036,  0.02473772, -0.02829297],
       [-0.04172349, -0.06047071,  0.05064811,  0.04224176, -0.06634163,
        -0.0028673 ,  0.02416151,  0.04169446,  0.02446112, -0.01180381],
       [-0.06760704, -0.03691689,  0.0562881 ,  0.04720074, -0.03399209,
        -0.02730125,  0.0268215 ,  0.0462672 ,  0.02741375, -0.03817401],
       [-0.06734878,  0.0067471 ,  0.04972555,  0.04250293, -0.024966  ,
        -0.05171998,  0.02390476,  0.04094567,  0.02445058, -0.04424183],
       [-0.08780567, -0.00800213,  0.06735207,  0.05794205, -0.09227274,
        -0.0237772 ,  0.03252839,  0.05546697,  0.03315131, -0.03458305],
       [-0.05856239, -0.01878232,  0.03200719,  0.02700299, -0.02118527,
         0.00231423,  0.01519441,  0.0262029 ,  0.01573974, -0.01993146],
       [-0.0366746 ,  0.02855438,  0.04235513,  0.03700121, -0.03816113,
        -0.03515113,  0.02083112,  0.03496705,  0.02107377, -0.0747958 ],
       [-0.10083678,  0.05252122,  0.0696986 ,  0.06127045, -0.07985238,
        -0.04100169,  0.03410885,  0.05741869,  0.03484864, -0.0881756 ],
       [ 0.01230181, -0.0364026 ,  0.03183262,  0.02621166, -0.01288382,
        -0.02310288,  0.01537552,  0.02631013,  0.01526717, -0.0549096 ],
       [-0.02479839,  0.01039202,  0.04304122,  0.03732313, -0.05307137,
        -0.02518468,  0.02114769,  0.03556234,  0.02126288, -0.06567485],
       [-0.06750938,  0.01605131,  0.04288479,  0.03747123, -0.06787493,
        -0.02360512,  0.02081038,  0.03534505,  0.02126206, -0.01483539],
       [-0.06298961,  0.02163471,  0.05648919,  0.04937945, -0.07722306,
        -0.0091172 ,  0.02770992,  0.04653708,  0.02815518, -0.08057567],
       [-0.03335223,  0.00666199,  0.02704515,  0.0234378 , -0.03406125,
        -0.01936937,  0.01312984,  0.02230644,  0.0133512 , -0.01914957],
       [-0.09268417, -0.03924119,  0.06513885,  0.05455855, -0.04441361,
        -0.06844906,  0.03079718,  0.05362559,  0.03152172,  0.00914614],
       [-0.04325866,  0.0365687 ,  0.05518329,  0.04883283, -0.09881565,
        -0.01435063,  0.02738778,  0.04559926,  0.02757459, -0.0847215 ],
       [-0.04278936, -0.0477209 ,  0.05383369,  0.04487736, -0.03360932,
        -0.01977273,  0.02573264,  0.04428944,  0.02611543, -0.05095624],
       [-0.01547073, -0.00922416,  0.04513256,  0.03848104, -0.0453554 ,
        -0.03517656,  0.02199824,  0.03730875,  0.02204269, -0.05973644],
       [ 0.00146555, -0.04176537,  0.02629871,  0.02130144, -0.00154095,
        -0.01255617,  0.01252721,  0.0216611 ,  0.01255805, -0.03994957],
       [-0.10453434, -0.01362696,  0.07939776,  0.06730271, -0.02451201,
        -0.04398817,  0.03809454,  0.06521785,  0.03906722, -0.1024186 ],
       [-0.07729054, -0.03566251,  0.05628412,  0.04727086, -0.04424953,
        -0.03739433,  0.02672318,  0.04629248,  0.02734972, -0.00932344],
       [-0.07973615, -0.02099934,  0.06422812,  0.05502783, -0.08595158,
         0.00212552,  0.03102414,  0.05281802,  0.03164024, -0.05017679],
       [-0.04564011, -0.01555194,  0.06410737,  0.05493145, -0.0755266 ,
        -0.00592705,  0.0312459 ,  0.0528195 ,  0.03157342, -0.09203194],
       [-0.10583464,  0.03749885,  0.06573   ,  0.05712324, -0.03914536,
        -0.03882709,  0.0318999 ,  0.05402014,  0.03280644, -0.09527148],
       [-0.09078278,  0.01555964,  0.04744462,  0.04126338, -0.05999333,
        -0.02346668,  0.02285859,  0.03900905,  0.02355387, -0.01544637],
       [-0.11499188, -0.01484553,  0.08607548,  0.0730962 , -0.0501275 ,
        -0.06469184,  0.0412424 ,  0.07080816,  0.04220517, -0.06877066],
       [-0.08003608,  0.00692929,  0.04374409,  0.03766249, -0.04302017,
        -0.03592405,  0.02094556,  0.03598447,  0.02156468, -0.00785028],
       [-0.06506398, -0.05439849,  0.06722746,  0.05585902, -0.02913479,
        -0.0601022 ,  0.03191328,  0.05535195,  0.03246332, -0.03411558],
       [-0.09069541, -0.01874084,  0.07097526,  0.06030292, -0.04906332,
        -0.03085677,  0.0340878 ,  0.05834618,  0.03486677, -0.06922259],
       [-0.11333197,  0.02134876,  0.05328582,  0.04568863,  0.00049127,
        -0.04470962,  0.02543414,  0.04366572,  0.02648887, -0.05836161],
       [ 0.00619282, -0.02502087,  0.04243515,  0.0364384 , -0.0953646 ,
        -0.00292906,  0.02090066,  0.03517098,  0.02065736, -0.03848085],
       [-0.02359421,  0.00197064,  0.05910832,  0.05072033, -0.05412248,
        -0.0429572 ,  0.02893405,  0.04883837,  0.02904229, -0.09794012],
       [-0.05143172, -0.00826163,  0.03604221,  0.03064888, -0.03421429,
        -0.03314572,  0.01721398,  0.02968984,  0.01759375, -0.0041353 ],
       [-0.08234597, -0.06066986,  0.05025012,  0.04098794,  0.02625945,
        -0.00417512,  0.02348603,  0.04101995,  0.02442987, -0.05924241],
       [-0.05702871, -0.05596953,  0.04676085,  0.03847635, -0.0191159 ,
        -0.0230642 ,  0.02199842,  0.03839584,  0.02252944, -0.01298254],
       [-0.0115076 ,  0.02102698,  0.02727841,  0.02381803, -0.0313888 ,
        -0.05047058,  0.01342561,  0.02265772,  0.01339725, -0.02823703],
       [-0.06289521, -0.02496264,  0.05395556,  0.04562706, -0.03082702,
        -0.00134677,  0.02594326,  0.04428324,  0.02654714, -0.07632463]])
dW2 = np.dot(z1.T, dy)
db2 = np.sum(dy, axis=0)
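# Derivative of the sigmoid: sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))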
def sigmoid_grad(x):
    return (1.0 - sigmoid(x)) * sigmoid(x)
dz1 = np.dot(dy, W2.T)
da1 = sigmoid_grad(a1) * dz1
dW1 = np.dot(X.T, da1)
db1 = np.sum(dz1, axis=0)
learning_rate = 0.1

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate):
    W1 = W1 - learning_rate*dW1
    b1 = b1 - learning_rate*db1
    W2 = W2 - learning_rate*dW2
    b2 = b2 - learning_rate*db2
    return W1, b1, W2, b2

Backpropagation

We take the difference at the output and propagate it backward, updating the parameters of the earlier layers.
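For an affine layer \(y = XW + b\), the backward pass distributes the upstream gradient \(dy\) as follows, which is what affine_layer_backward implements below:

\[dX = dy\,W^{T}, \qquad dW = X^{T} dy, \qquad db = \textstyle\sum_{\text{batch}} dy\]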

def affine_layer_backward(dy, cache):
    X, W, b = cache
    dX = np.dot(dy, W.T)
    dW = np.dot(X.T, dy)
    db = np.sum(dy, axis=0)
    return dX, dW, db
# Initialize parameters
W1 = weight_init_std * np.random.randn(input_size, hidden_size)
b1 = np.zeros(hidden_size)
W2 = weight_init_std * np.random.randn(hidden_size, output_size)
b2 = np.zeros(output_size)

# Forward Propagation
a1, cache1 = affine_layer_forward(X, W1, b1)
z1 = sigmoid(a1)
a2, cache2 = affine_layer_forward(z1, W2, b2)

# Inference and loss calculation
y_hat = softmax(a2)
t = _change_ont_hot_label(Y_digit, 10)   # one-hot encode the answer labels
Loss = cross_entropy_error(y_hat, t)

print(y_hat)
print(t)
print('Loss: ', Loss)
        
dy = (y_hat - t) / X.shape[0]
dz1, dW2, db2 = affine_layer_backward(dy, cache2)
da1 = sigmoid_grad(a1) * dz1
dX, dW1, db1 = affine_layer_backward(da1, cache1)

# Update the parameters via gradient descent
learning_rate = 0.1
W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)
[[0.07659182 0.11714822 0.07840599 0.07923031 0.11888569 0.0824821
  0.08180805 0.09014648 0.07119801 0.20410334]
 [0.09812083 0.13428559 0.0632928  0.07899265 0.09852886 0.06622592
  0.09027599 0.07093442 0.08169125 0.21765168]
 [0.08669859 0.10090836 0.1044299  0.09298888 0.12033938 0.09010087
  0.08500331 0.09918036 0.07538254 0.14496782]
 [0.0685417  0.12360911 0.07430658 0.06698724 0.1015394  0.06093044
  0.1024918  0.08919032 0.07848571 0.23391767]
 [0.08928762 0.10486563 0.08992052 0.08336552 0.0942107  0.08678042
  0.07822181 0.08012738 0.0841347  0.2090857 ]]
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Loss:  2.1179622069193416

Model

W1 = weight_init_std * np.random.randn(input_size, hidden_size)
b1 = np.zeros(hidden_size)
W2 = weight_init_std * np.random.randn(hidden_size, output_size)
b2 = np.zeros(output_size)

def train_step(X, Y, W1, b1, W2, b2, learning_rate=0.1, verbose=False):
    a1, cache1 = affine_layer_forward(X, W1, b1)
    z1 = sigmoid(a1)
    a2, cache2 = affine_layer_forward(z1, W2, b2)
    y_hat = softmax(a2)
    t = _change_ont_hot_label(Y, 10)
    Loss = cross_entropy_error(y_hat, t)

    if verbose:
        print('---------')
        print(y_hat)
        print(t)
        print('Loss: ', Loss)
        
    dy = (y_hat - t) / X.shape[0]
    dz1, dW2, db2 = affine_layer_backward(dy, cache2)
    da1 = sigmoid_grad(a1) * dz1
    dX, dW1, db1 = affine_layer_backward(da1, cache1)
    
    W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, learning_rate)
    
    return W1, b1, W2, b2, Loss
X = x_train_reshaped[:5]
Y = y_train[:5]

# Run train_step five times.
for i in range(5):
    W1, b1, W2, b2, _ = train_step(X, Y, W1, b1, W2, b2, learning_rate=0.1, verbose=True)
---------
[[0.09241339 0.10963326 0.10731715 0.05298096 0.0541422  0.14360273
  0.10872153 0.20848736 0.04338178 0.07931963]
 [0.07727564 0.11789564 0.09868143 0.06294007 0.06535471 0.13826136
  0.11867937 0.16886302 0.04751311 0.10453564]
 [0.06961914 0.10285867 0.14633505 0.05895999 0.05543531 0.12190046
  0.11462721 0.2169294  0.04808109 0.0652537 ]
 [0.07658489 0.09918    0.13042271 0.06196685 0.05522818 0.1421048
  0.1219421  0.17624199 0.05457652 0.08175197]
 [0.06813557 0.1231802  0.10184686 0.05400556 0.05220961 0.15266616
  0.12071466 0.18438725 0.05536374 0.08749039]]
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Loss:  2.428132969130341
---------
[[0.11059285 0.12435913 0.09185671 0.04981241 0.06632601 0.16481088
  0.09354834 0.16175944 0.04131636 0.09561787]
 [0.09499898 0.13100942 0.08433368 0.0585402  0.0783258  0.15062062
  0.10174485 0.13090765 0.04448681 0.12503201]
 [0.08426724 0.12002083 0.12846928 0.05660019 0.07439575 0.13567923
  0.10172586 0.17113133 0.04693154 0.08077873]
 [0.09097146 0.11704487 0.11327528 0.05823013 0.06829544 0.15664691
  0.10645868 0.1381334  0.05239476 0.09854909]
 [0.08232852 0.14022175 0.08628741 0.05020673 0.06474146 0.166875
  0.10358757 0.14141118 0.052426   0.11191438]]
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Loss:  2.21808426564194
---------
[[0.12602521 0.13422091 0.07843037 0.04579152 0.07743871 0.18054952
  0.08013239 0.12995638 0.03831603 0.10913898]
 [0.11169902 0.13937667 0.07226236 0.05350438 0.08992032 0.15753391
  0.08719997 0.10541969 0.04075731 0.14232638]
 [0.09656256 0.13262911 0.11215304 0.05294674 0.09460875 0.14341626
  0.08956221 0.13945361 0.04448315 0.09418456]
 [0.10320412 0.13188181 0.09829385 0.0536509  0.08077933 0.16529178
  0.09268341 0.11212698 0.04915968 0.11292814]
 [0.09456542 0.1517466  0.07304152 0.0456657  0.07638232 0.17385435
  0.08856932 0.11262863 0.04835116 0.13519498]]
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Loss:  2.0577178483841623
---------
[[0.13899028 0.1405526  0.06735546 0.04172188 0.08745038 0.19277301
  0.06894204 0.10717466 0.03511074 0.11992894]
 [0.12740072 0.14439641 0.06243219 0.04859455 0.10016343 0.16111839
  0.07520962 0.08724501 0.03699161 0.15644807]
 [0.10651906 0.14155484 0.09817259 0.04891936 0.11572587 0.14707082
  0.07893754 0.11635469 0.04153112 0.10521412]
 [0.11340659 0.14432363 0.08577997 0.04902682 0.09251833 0.17009661
  0.08104509 0.09336031 0.04563468 0.12480798]
 [0.10493093 0.15916901 0.06223302 0.04116703 0.08698834 0.17629432
  0.07608696 0.09221669 0.04405581 0.15685788]]
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Loss:  1.9302606072444888
---------
[[0.14977186 0.14445821 0.0583395  0.03793066 0.09629526 0.20298697
  0.05976295 0.09023552 0.03203027 0.12818879]
 [0.14215595 0.14717513 0.05446688 0.04410163 0.10903249 0.1628354
  0.06541978 0.07373071 0.03347403 0.16760799]
 [0.1142878  0.14769435 0.08642576 0.04495482 0.13736939 0.1481522
  0.06988919 0.09887023 0.03848648 0.11386979]
 [0.12176171 0.15502661 0.07545179 0.04469794 0.1033437  0.17258304
  0.07136253 0.07928226 0.04218675 0.13430365]
 [0.11357387 0.16372585 0.0534957  0.03702277 0.09641975 0.17610778
  0.06586561 0.07714052 0.03995542 0.17669271]]
[[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
Loss:  1.8256054710935827

You can see that the loss decreases as \(\hat{y}\) and t become closer.

Calculating Accuracy

def predict(W1, b1, W2, b2, X):
    a1 = np.dot(X, W1) + b1
    z1 = sigmoid(a1)
    a2 = np.dot(z1, W2) + b2
    y = softmax(a2)

    return y
# Run model inference on X = x_train[:100].
X = x_train_reshaped[:100]
Y = y_test[:100]
result = predict(W1, b1, W2, b2, X)
result[0]
array([0.15864957, 0.14675823, 0.05099315, 0.03452584, 0.10390606,
       0.21222145, 0.05224419, 0.07727098, 0.02920603, 0.13422451])
def accuracy(W1, b1, W2, b2, x, y):
    y_hat = predict(W1, b1, W2, b2, x)
    y_hat = np.argmax(y_hat, axis=1)

    accuracy = np.sum(y_hat == y) / float(x.shape[0])
    return accuracy
acc = accuracy(W1, b1, W2, b2, X, Y)

t = _change_ont_hot_label(Y, 10)
print(result[0])
print(t[0])
print(acc)
[0.15864957 0.14675823 0.05099315 0.03452584 0.10390606 0.21222145
 0.05224419 0.07727098 0.02920603 0.13422451]
[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
0.09

We can see that with only 5 training steps, we reach about 10% accuracy, no better than random guessing.

Entire Learning Cycle

def init_params(input_size, hidden_size, output_size, weight_init_std=0.01):

    W1 = weight_init_std * np.random.randn(input_size, hidden_size)
    b1 = np.zeros(hidden_size)
    W2 = weight_init_std * np.random.randn(hidden_size, output_size)
    b2 = np.zeros(output_size)

    print(W1.shape)
    print(b1.shape)
    print(W2.shape)
    print(b2.shape)
    
    return W1, b1, W2, b2
# Hyperparameters
iters_num = 50000  # set an appropriate number of iterations
train_size = x_train.shape[0]
batch_size = 100   # mini-batch size
learning_rate = 0.1

train_loss_list = []
train_acc_list = []
test_acc_list = []

# Number of iterations per epoch
iter_per_epoch = max(train_size / batch_size, 1)
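# With train_size = 60000 and batch_size = 100, iter_per_epoch = 600, so 50,000 iterations is roughly 83 epochs.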

W1, b1, W2, b2 = init_params(784, 50, 10)

for i in range(iters_num):
    # Get a mini-batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train_reshaped[batch_mask]
    y_batch = y_train[batch_mask]
    
    W1, b1, W2, b2, Loss = train_step(x_batch, y_batch, W1, b1, W2, b2, learning_rate=0.1, verbose=False)

    # Record the training progress
    train_loss_list.append(Loss)
    
    # Compute accuracy once per epoch
    if i % iter_per_epoch == 0:
        print('Loss: ', Loss)
        train_acc = accuracy(W1, b1, W2, b2, x_train_reshaped, y_train)
        test_acc = accuracy(W1, b1, W2, b2, x_test_reshaped, y_test)
        train_acc_list.append(train_acc)
        test_acc_list.append(test_acc)
        print("train acc, test acc | " + str(train_acc) + ", " + str(test_acc))
(784, 50)
(50,)
(50, 10)
(10,)
Loss:  2.3073640537077202
train acc, test acc | 0.0993, 0.1032
Loss:  0.7630525541792824
train acc, test acc | 0.7936833333333333, 0.7966
Loss:  0.4665767142334493
train acc, test acc | 0.8756166666666667, 0.8808
Loss:  0.4064926710259513
train acc, test acc | 0.8978833333333334, 0.902
Loss:  0.35704740926022766
train acc, test acc | 0.9078, 0.9109
Loss:  0.236257774331139
train acc, test acc | 0.9129666666666667, 0.9159
Loss:  0.3256439325815769
train acc, test acc | 0.9181666666666667, 0.9198
Loss:  0.16467121544443203
train acc, test acc | 0.9228333333333333, 0.9231
Loss:  0.26826626797587433
train acc, test acc | 0.92405, 0.9255
Loss:  0.1372506036109581
train acc, test acc | 0.9288833333333333, 0.9284
Loss:  0.3216081152777485
train acc, test acc | 0.9315333333333333, 0.9324
Loss:  0.30630795850279424
train acc, test acc | 0.9345333333333333, 0.9328
Loss:  0.14893036613453334
train acc, test acc | 0.93675, 0.9366
Loss:  0.2507463541901329
train acc, test acc | 0.9389, 0.9375
Loss:  0.22538441089442138
train acc, test acc | 0.9414333333333333, 0.939
Loss:  0.08888824428445248
train acc, test acc | 0.9438666666666666, 0.9427
Loss:  0.12688944621851359
train acc, test acc | 0.94505, 0.9435
Loss:  0.23726666445409156
train acc, test acc | 0.9467666666666666, 0.9454
Loss:  0.2769398391608577
train acc, test acc | 0.9488166666666666, 0.9467
Loss:  0.19786140158039206
train acc, test acc | 0.9500333333333333, 0.949
Loss:  0.24511097940822615
train acc, test acc | 0.9515666666666667, 0.9497
Loss:  0.2563945606016873
train acc, test acc | 0.9530666666666666, 0.9509
Loss:  0.19114494113687386
train acc, test acc | 0.95485, 0.9521
Loss:  0.14604783330319843
train acc, test acc | 0.9557333333333333, 0.9541
Loss:  0.1436685440568118
train acc, test acc | 0.95685, 0.9543
Loss:  0.175248491056858
train acc, test acc | 0.9580333333333333, 0.9553
Loss:  0.09060562252228319
train acc, test acc | 0.9589, 0.9559
Loss:  0.09897968369633517
train acc, test acc | 0.9597333333333333, 0.9571
Loss:  0.10786787251513041
train acc, test acc | 0.96125, 0.9576
Loss:  0.10137171797295473
train acc, test acc | 0.9622666666666667, 0.9581
Loss:  0.25569400935688325
train acc, test acc | 0.9632833333333334, 0.9591
Loss:  0.11308516351938806
train acc, test acc | 0.9632833333333334, 0.9593
Loss:  0.1555407858576058
train acc, test acc | 0.9646666666666667, 0.96
Loss:  0.13854340985351535
train acc, test acc | 0.9654833333333334, 0.9599
Loss:  0.11091073997859244
train acc, test acc | 0.9662, 0.9621
Loss:  0.145482467504777
train acc, test acc | 0.9667166666666667, 0.9616
Loss:  0.14367753381286547
train acc, test acc | 0.9674666666666667, 0.9629
Loss:  0.1323613789185397
train acc, test acc | 0.9681666666666666, 0.9636
Loss:  0.06667830370131701
train acc, test acc | 0.9685166666666667, 0.964
Loss:  0.06896981481070984
train acc, test acc | 0.9695166666666667, 0.964
Loss:  0.1441483659634111
train acc, test acc | 0.96965, 0.9642
Loss:  0.12881705148969272
train acc, test acc | 0.9702, 0.9641
Loss:  0.22508186270314337
train acc, test acc | 0.97085, 0.9653
Loss:  0.14958397226183037
train acc, test acc | 0.9711666666666666, 0.9646
Loss:  0.05003447358535515
train acc, test acc | 0.9715, 0.9652
Loss:  0.14039404435917335
train acc, test acc | 0.9721333333333333, 0.9666
Loss:  0.06566342268375921
train acc, test acc | 0.9728, 0.9661
Loss:  0.09400113756536471
train acc, test acc | 0.9734833333333334, 0.9658
Loss:  0.10707517221642045
train acc, test acc | 0.97345, 0.9671
Loss:  0.09966235325212497
train acc, test acc | 0.9738833333333333, 0.9672
Loss:  0.08504435781539466
train acc, test acc | 0.9741666666666666, 0.9667
Loss:  0.05851020721191461
train acc, test acc | 0.9744166666666667, 0.9676
Loss:  0.13651832362212038
train acc, test acc | 0.9749, 0.9672
Loss:  0.07201667171698008
train acc, test acc | 0.9750666666666666, 0.9683
Loss:  0.04424765593654628
train acc, test acc | 0.9753833333333334, 0.9675
Loss:  0.04967927550795562
train acc, test acc | 0.9761166666666666, 0.9684
Loss:  0.1381144231055439
train acc, test acc | 0.9763833333333334, 0.9668
Loss:  0.05714574011931622
train acc, test acc | 0.9767, 0.9683
Loss:  0.06303338450149659
train acc, test acc | 0.9770833333333333, 0.9682
Loss:  0.0772183420051194
train acc, test acc | 0.97705, 0.9684
Loss:  0.11318003109804804
train acc, test acc | 0.9774833333333334, 0.9688
Loss:  0.04764694694574954
train acc, test acc | 0.9780833333333333, 0.9681
Loss:  0.09249046240552493
train acc, test acc | 0.9780833333333333, 0.9684
Loss:  0.17676189757164024
train acc, test acc | 0.9782833333333333, 0.9686
Loss:  0.07015592712040573
train acc, test acc | 0.97895, 0.9686
Loss:  0.07856151925746122
train acc, test acc | 0.9791, 0.9697
Loss:  0.046677477907008026
train acc, test acc | 0.9794666666666667, 0.9693
Loss:  0.08114666314966647
train acc, test acc | 0.9796, 0.9694
Loss:  0.07235911222099775
train acc, test acc | 0.9803833333333334, 0.9704
Loss:  0.16875461797955763
train acc, test acc | 0.9801666666666666, 0.9707
Loss:  0.04439383769234401
train acc, test acc | 0.9805666666666667, 0.9702
Loss:  0.03189908674495415
train acc, test acc | 0.98065, 0.97
Loss:  0.0535934109441333
train acc, test acc | 0.9811166666666666, 0.9705
Loss:  0.08680418404715323
train acc, test acc | 0.9811, 0.9709
Loss:  0.09519600902588904
train acc, test acc | 0.9811166666666666, 0.9714
Loss:  0.057116244824369125
train acc, test acc | 0.98165, 0.9707
Loss:  0.053834069648940025
train acc, test acc | 0.9821666666666666, 0.9711
Loss:  0.07452843203001527
train acc, test acc | 0.9822, 0.9708
Loss:  0.12133045891447691
train acc, test acc | 0.9824, 0.9717
Loss:  0.19072261788698883
train acc, test acc | 0.9823833333333334, 0.9718
Loss:  0.0881421675394684
train acc, test acc | 0.9825833333333334, 0.971
Loss:  0.04053340858144673
train acc, test acc | 0.9828833333333333, 0.9707
Loss:  0.043488474672356274
train acc, test acc | 0.9829, 0.9709
Loss:  0.06497416427272214
train acc, test acc | 0.9833, 0.9721
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12, 6 

# Plot the accuracy curves
markers = {'train': 'o', 'test': 's'}
x = np.arange(len(train_acc_list))
plt.plot(x, train_acc_list, label='train acc')
plt.plot(x, test_acc_list, label='test acc', linestyle='--')
plt.xlabel("epochs")
plt.ylabel("accuracy")
plt.ylim(0, 1.0)
plt.legend(loc='lower right')
plt.show()

image: train and test accuracy per epoch

# Plot the loss curve
x = np.arange(len(train_loss_list))
plt.plot(x, train_loss_list, label='train loss')
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.ylim(0, 3.0)
plt.legend(loc='best')
plt.show()

image: training loss per iteration



