Just wanted to see them all at a glance.

Data

%pylab inline
import numpy as np
from sklearn import metrics
from sklearn.preprocessing import minmax_scale
from scipy import stats
from scipy.spatial.distance import cosine as cosine_distance

# Pytorch
import torch
from torch import nn
from torch.nn import functional as F
from torch.autograd import Variable

a = np.random.normal(loc=3, size=1000)
b = np.random.normal(loc=3, size=1000)
c = np.random.gumbel(size=1000)
d = np.random.exponential(size=1000)
f = np.random.uniform(size=1000)

hist(a, bins=80, label='normal a')
hist(b, bins=80, label='normal b')
hist(c, bins=80, label='gumbel c')
hist(d, bins=80, label='exponential d')
hist(f, bins=80, label='uniform f')

grid()
legend()

Mean Squared Error (MSE)

This is one of the cost functions most commonly used for regression.
Its drawback is that, because of the squaring, the cost can become excessively large, and applying it to an unstable model can lead to oscillation.

\[J(\theta) = \frac{1}{N} \sum^{N}_{i=1} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2\]
  • \(J\) : cost function
  • \(\theta\) : parameters (weights)
  • N : number of training examples
  • \(x^{(i)}\) : the \(i^{th}\) training input vector
  • \(y^{(i)}\) : the \(i^{th}\) class label
  • \(h_{\theta} \left( x^{(i)} \right)\) : the prediction for the \(i^{th}\) data point obtained using \(\theta\)

Partial derivative of the weights

\[\begin{eqnarray} \\ \frac{\partial}{\partial\theta} J(\theta) &=& \frac{\partial}{\partial \theta} \left( \frac{1}{N} \sum^N_{i=1} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2 \right) & [0.1] \\ &=& \frac{2}{N} \sum^{N}_{i=1} \left( h_{\theta} (x^{(i)}) - y^{(i)} \right) \frac{\partial}{\partial \theta} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) & [0.2] \\ &=& \frac{2}{N} \sum^{N}_{i=1} \left( h_{\theta} (x^{(i)}) - y^{(i)} \right) \frac{\partial}{\partial \theta} \left( \theta^T \cdot x^{(i)} + b - y^{(i)} \right) & [0.3]\\ &=& \frac{2}{N} \sum^{N}_{i=1} \left( h_{\theta} (x^{(i)}) - y^{(i)} \right) \odot x^{(i)} & [0.4]\\ \end{eqnarray}\]

Going from [0.3] to [0.4], everything except \(\theta\) is a constant with respect to \(\theta\), so the \(b\) and \(y^{(i)}\) terms vanish and only \(x^{(i)}\), the derivative of \(\theta^T \cdot x^{(i)}\), remains.

Partial derivative of the bias variable

\[\begin{eqnarray} \\ \frac{\partial}{\partial b} J(\theta) &=& \frac{\partial}{\partial b} \left( \frac{1}{N} \sum^N_{i=1} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right)^2 \right) & [0.1] \\ &=& \frac{2}{N} \sum^{N}_{i=1} \left( h_{\theta} (x^{(i)}) - y^{(i)} \right) \frac{\partial}{\partial b} \left( h_{\theta}(x^{(i)}) - y^{(i)} \right) & [0.2] \\ &=& \frac{2}{N} \sum^{N}_{i=1} \left( h_{\theta} (x^{(i)}) - y^{(i)} \right) \frac{\partial}{\partial b} \left( \theta^T \cdot x^{(i)} + b - y^{(i)} \right) & [0.3]\\ &=& \frac{2}{N} \sum^{N}_{i=1} \left( h_{\theta} (x^{(i)}) - y^{(i)} \right) & [0.4]\\ \end{eqnarray}\]
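
As a quick numeric check of the two gradients above, here is a small NumPy sketch on made-up data, assuming the linear model \(h_{\theta}(x) = \theta^T \cdot x + b\) from the derivation (w and bias below play the roles of \(\theta\) and \(b\); none of this is from the original post).

# Hypothetical data for checking the MSE gradients
X = np.random.normal(size=(100, 3))               # 100 samples, 3 features
y_lin = X.dot(np.array([1.5, -2.0, 0.5])) + 0.3   # targets from a known linear model

w, bias = np.zeros(3), 0.0
pred = X.dot(w) + bias

grad_w = 2.0 / len(X) * X.T.dot(pred - y_lin)     # [0.4] for the weights
grad_bias = 2.0 / len(X) * (pred - y_lin).sum()   # [0.4] for the bias

# a single gradient-descent step along the negative gradient lowers the cost
w, bias = w - 0.1 * grad_w, bias - 0.1 * grad_bias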

Numpy

>> p = np.array([0.1, 0.1, 0.05, 0.6, 0.3], dtype=np.float32)
>> y = np.array([0, 0, 0, 1, 0], dtype=np.float32)
>>
>> def mean_squared_error(y, p):
>>     return ((y - p)**2).mean()
>>
>> mean_squared_error(y, p)
0.054499995

Sklearn

In sklearn, you can use the sklearn.metrics.mean_squared_error function.

>> metrics.mean_squared_error(y, p)
0.054499995

Visualization

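The Visualization blocks in this post call a compare_distributions helper whose definition is not shown. A minimal sketch that matches the printed output below (an assumption, not the original implementation) could be:

# Assumed helper: applies the given metric between sample a and every sample generated above.
def compare_distributions(metric):
    samples = [('normal_a', a), ('normal_b', b), ('gumbel  ', c),
               ('exponent', d), ('uniform ', f)]
    for name, x in samples:
        print('normal_a, %s\t: %s' % (name, metric(a, x)))
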
>> compare_distributions(mean_squared_error)
normal_a, normal_a	: 0.0
normal_a, normal_b	: 1.94313775689
normal_a, gumbel  	: 8.34935806101
normal_a, exponent	: 5.89498613265
normal_a, uniform 	: 7.26803261167

Mean Absolute Error (MAE)

Where MSE blows up large errors through squaring, MAE produces comparatively small error values.
On the other hand, the absolute value is mathematically more expensive to handle.

\[J(\theta) = \frac{1}{N} \sum^{N}_{i=1} \left| h_{\theta}(x^{(i)}) - y^{(i)} \right|\]

Numpy

>> def mean_absolute_error(y, p):
>>     return np.abs(y - p).mean()
>>
>> mean_absolute_error(y, p)
0.19

Sklearn

In sklearn, use the sklearn.metrics.mean_absolute_error function.

>> metrics.mean_absolute_error(y, p)
0.19

Visualization

>> compare_distributions(mean_absolute_error)
normal_a, normal_a	: 0.0
normal_a, normal_b	: 1.11369204926
normal_a, gumbel  	: 2.57300947501
normal_a, exponent	: 2.13340783974
normal_a, uniform 	: 2.48578624644

Root Mean Squared Logarithmic Error (RMSLE)

\(p\) and \(a\) are probabilities, i.e. values between 0 and 1 (more generally they must be non-negative, since \(\log(x + 1)\) is undefined for \(x \le -1\)).

\[\epsilon = \sqrt{\frac{1}{n} \sum_{i=1}^n (\log(p_i + 1) - \log(a_i+1))^2 }\]

Numpy

>> def root_mean_squared_logarithmic_error(y, p):
>>     # log(x + 1); nan_to_num guards against log of values <= -1, which would be NaN
>>     l = lambda x: np.nan_to_num(np.log(x + 1))
>>     return np.sqrt(((l(p) - l(y)) ** 2).mean())
>>
>> root_mean_squared_logarithmic_error(y, p)
0.16683918
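
If your sklearn version ships metrics.mean_squared_log_error (the same quantity without the square root), it gives a quick cross-check; taking its square root should reproduce the value above:

>> np.sqrt(metrics.mean_squared_log_error(y, p))  # ~0.1668, same as above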

Visualization

>> compare_distributions(root_mean_squared_logarithmic_error)
normal_a, normal_a	: 0.0
normal_a, normal_b	: 0.398201741497
normal_a, gumbel  	: 1.40502234456
normal_a, exponent	: 0.890724788695
normal_a, uniform 	: 1.02307079398

Probabilities between 0 and 1 should go in here, but values like -2 or 2 are being passed, which is why the graph looks a bit odd.

Binary Cross Entropy (a.k.a Logarithmic Loss)

  • https://www.kaggle.com/wiki/LogarithmicLoss
\[\text{logloss} = -\frac{1}{N} \sum^{N}_{i=1} \left( y^{(i)} \cdot \log(p^{(i)}) + (1 - y^{(i)}) \cdot \log(1- p^{(i)}) \right)\]

Numpy

>> y_true = np.array([0, 0, 0, 1, 0], dtype=np.float64)
>> y_pred = np.array([0.1, 0.1, 0.05, 0.6, 0.3], dtype=np.float64)
>>
>> def binary_cross_entropy(y_true, y_pred):
>>     return -(y_true * np.log(y_pred) + (1-y_true) * np.log(1-y_pred)).mean()
>>
>> binary_cross_entropy(y_true, y_pred)
0.22590297868158524

Sklearn

In sklearn, use the sklearn.metrics.log_loss function.

>> # Same value as the NumPy implementation above
>> metrics.log_loss(y_true, y_pred)
0.22590297868158524

Pytorch

F.binary_cross_entropy and nn.BCELoss return the same value.

>> y_torch_pred = Variable(torch.DoubleTensor(y_pred))
>> y_torch_true = Variable(torch.DoubleTensor(y_true))
>>
>> torch_crossentropy = nn.BCELoss()
>> torch_crossentropy(y_torch_pred, y_torch_true).data.numpy()
array([ 0.22590298])
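
As a quick check of the claim above, the functional form can be called on the same Variables and should return the identical value:

>> F.binary_cross_entropy(y_torch_pred, y_torch_true).data.numpy()  # same value as nn.BCELoss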

Visualization

The reason the graph does not show the full x range from -2 to 2 is that those values turn into NaN, so that part is simply not drawn.
\(p\) must be a probability, i.e. a value between 0 and 1.
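
This is simply because np.log returns NaN for negative inputs, and NaN points are not plotted:

>> np.log(np.array([-2.0, 0.5]))  # -> [nan, -0.693...]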

Cross Entropy

  • https://rdipietro.github.io/friendly-intro-to-cross-entropy-loss/
\[H(y, \hat{y}) = - \sum_i y^{(i)} \log \hat{y}^{(i)}\]

Partial derivative of the weights

\[\begin{eqnarray} \frac{\partial}{\partial \theta} H(y, \hat{y}) &=& -\frac{\partial}{\partial \theta} \sum^N_{i=1} y^{(i)} \log \hat{y}^{(i)} \\ &=& - \sum^N_{i=1} \frac{y^{(i)}}{\hat{y}^{(i)}} \frac{\partial}{\partial \theta} \hat{y}^{(i)} \\ &=& - \sum^N_{i=1} \frac{y^{(i)}}{\hat{y}^{(i)}} \frac{\partial}{\partial \theta} \left( \theta^{T} \cdot x^{(i)} + b \right) \\ &=& - \sum^N_{i=1} \frac{y^{(i)}}{\hat{y}^{(i)}} \odot x^{(i)} \end{eqnarray}\]

Partial derivative of the bias

\[\begin{eqnarray} \frac{\partial}{\partial b} H(y, \hat{y}) &=& -\frac{\partial}{\partial b} \sum^N_{i=1} y^{(i)} \log \hat{y}^{(i)} \\ &=& - \sum^N_{i=1} \frac{y^{(i)}}{\hat{y}^{(i)}} \frac{\partial}{\partial b} \hat{y}^{(i)} \\ &=& - \sum^N_{i=1} \frac{y^{(i)}}{\hat{y}^{(i)}} \frac{\partial}{\partial b} \left( \theta^{T} \cdot x^{(i)} + b \right) \\ &=& - \sum^N_{i=1} \frac{y^{(i)}}{\hat{y}^{(i)}} \end{eqnarray}\]

Numpy

>> y_true = np.array([0, 0, 0, 1, 0], dtype=np.float32)
>> y_pred = np.array([0.1, 0.1, 0.05, 0.6, 0.3], dtype=np.float32)
>>
>> def cross_entropy(y_true, y_pred):
>>     return -(y_true * np.log(y_pred)).sum()
>>
>> cross_entropy(y_true, y_pred)
0.51082557

Pytorch - cross entropy

PyTorch's cross entropy is not the plain cross entropy above: nn.CrossEntropyLoss takes raw class scores (logits) and applies log-softmax internally. For a sample whose correct class is \(class\), the loss is:

\[-\hat{y}_{class} + \log\left( \sum_j e^{\hat{y}_j} \right)\]
>> y_true = np.array([3, 1], dtype=np.int64)
>> y_pred = np.array([[0.1, 0.1, 0.05, 0.6, 0.3],
>>                    [0, 0.9, 0.05, 0.001, 0.3]], dtype=np.float64)
>>
>>
>> def torch_cross_entropy(y_pred, labels):
>>     N = y_pred.shape[0]
>>     return (-y_pred[range(N), labels] + np.log(np.sum(np.exp(y_pred), axis=1))).mean()
>>
>> torch_cross_entropy(y_pred, y_true)
1.1437464478328658
>> y_true = np.array([3], dtype=np.int64)
>> y_pred = np.array([[0.1, 0.1, 0.05, 0.6, 0.3]], dtype=np.float32)
>>
>> y_torch_true = Variable(torch.LongTensor(y_true))
>> y_torch_pred = Variable(torch.FloatTensor(y_pred))
>>
>> torch_ce = nn.CrossEntropyLoss()
>> torch_ce(y_torch_pred, y_torch_true).data.numpy()
array([ 1.26153278], dtype=float32)
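
nn.CrossEntropyLoss is equivalent to log-softmax followed by the negative log-likelihood loss, which can be checked directly (this should print the same value as above):

>> torch_nll = nn.NLLLoss()
>> torch_nll(F.log_softmax(y_torch_pred), y_torch_true).data.numpy()  # same value as nn.CrossEntropyLoss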

Pytorch - Custom Cross Entropy

Since the nn.CrossEntropyLoss provided by PyTorch differs from the conventional cross-entropy loss, you need to implement it yourself if you want exactly the same value.

>> def torch_custom_cross_entropy(y_true, y_pred):
>>     return -torch.sum(y_true * torch.log(y_pred))
>>
>> y_true = np.array([0, 0, 0, 1, 0], dtype=np.float32)
>> y_pred = np.array([0.1, 0.1, 0.05, 0.6, 0.3], dtype=np.float32)
>>
>> y_torch_true = Variable(torch.FloatTensor(y_true))
>> y_torch_pred = Variable(torch.FloatTensor(y_pred))
>>
>> torch_custom_cross_entropy(y_torch_true, y_torch_pred).data.numpy()
array([ 0.51082557], dtype=float32)

Hinge Loss

Like logistic regression, hinge loss is used for binary classification.
Libraries usually handle this for you, but fundamentally the labels must be -1 or 1 (a small conversion sketch follows the formula below).

\[J(\theta) = \frac{1}{N} \sum^{N}_{i=1} \max\left(0,\ 1 - y^{(i)} \cdot f(x^{(i)})\right)\]
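
If your labels are 0/1 as in the earlier sections, a simple conversion (a hypothetical one-liner, not from the original post) is:

>> y01 = np.array([0, 0, 0, 1, 0], dtype=np.float32)
>> 2 * y01 - 1  # maps {0, 1} -> {-1, 1}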

Numpy

>> p = np.array([0.1, 0.1, 0.05, 0.6, 0.3], dtype=np.float32)
>> y = np.array([-1, -1, -1, 1, -1], dtype=np.float32)
>>
>> def hinge_loss(y, p):
>>     l = 1-(y*p)
>>     l[l<=0] = 0
>>     return l.mean()
>>
>> hinge_loss(y, p)
0.98999995

Sklearn

>> metrics.hinge_loss(y, p)
0.98999999836087227

KL-Divergence

\(p\) and \(q\) must be probability distributions (values between 0 and 1 that sum to 1).

\[D_{KL}\left(p(x), q(x)\right) = \sum_{x \in X} p(x) \ln\frac{p(x)}{q(x)}\]

For reference, using scipy.stats.entropy(a, b) is essentially the same as computing the KL-Divergence,
but it does not handle NaN (etc.) internally, so with this data no usable value comes out.
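
On proper probability vectors, however, scipy.stats.entropy does agree with the definition above; a quick check with made-up distributions:

>> pk = np.array([0.1, 0.2, 0.3, 0.4])    # hypothetical probability vectors
>> qk = np.array([0.25, 0.25, 0.25, 0.25])
>> stats.entropy(pk, qk)                  # KL(pk || qk)
>> np.sum(pk * np.log(pk / qk))           # same value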

Numpy

>> def kl_divergence(y, p):
>>     return np.sum(y * np.nan_to_num(np.log(y/p)), axis=0)
>>
>> compare_distributions(kl_divergence)
normal_a, normal_a	: 0.0
normal_a, normal_b	: 368.575575809
normal_a, gumbel  	: 2596.54562019
normal_a, exponent	: 4806.30679955
normal_a, uniform 	: 6476.72186957

Cosine Proximity

  • https://en.wikipedia.org/wiki/Cosine_similarity
\[\text{similarity} = cos(\theta) = \frac{A \cdot B}{ \| A \|_2 \| B \|_2}\]

Numpy

>> def cosine_proximity(a, b):
>>     # 1 - cosine similarity, i.e. the cosine distance
>>     norm2 = lambda x: np.sqrt((x**2).sum())
>>     return 1 - (a * b).sum() / (norm2(a) * norm2(b))
>>
>> cosine_proximity(np.array([0.3, 0.4]), np.array([1, 2]))
0.016130089900092459
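
scipy's cosine distance (imported at the top as cosine_distance) is also 1 minus the cosine similarity, so it should return the same number:

>> cosine_distance(np.array([0.3, 0.4]), np.array([1, 2]))  # same value as above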

Visualization

>> compare_distributions(cosine_distance)
normal_a, normal_a	: 2.22044604925e-16
normal_a, normal_b	: 0.0998470768358
normal_a, gumbel  	: 0.604350409384
normal_a, exponent	: 0.331605768793
normal_a, uniform 	: 0.188153360738

Poisson

  • https://github.com/fchollet/keras/pull/479/commits/149d0e8d1871a7864fc2d582d6ce650512de371c

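For reference, the loss implemented below is the mean Poisson negative log-likelihood, dropping the term that does not depend on \(p\):

\[J = \frac{1}{N} \sum^{N}_{i=1} \left( p^{(i)} - y^{(i)} \log p^{(i)} \right)\]
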
Numpy

>> p = np.array([0.1, 0.1, 0.05, 0.6, 0.3], dtype=np.float32)
>> y = np.array([0, 0, 0, 1, 0], dtype=np.float32)
>>
>> def poisson_loss(y, p):
>>     return (p - y * np.log(p)).mean()
>>
>> poisson_loss(y, p)
0.33216509