[Machine Learning] Exam 3 (Week 4)

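This post summarizes Professor Andrew Ng's Machine Learning course (Coursera).

※ Solutions to the assignment follow. If you do not want to see them, do not scroll down.

The Week 4 assignment consists of the following.

lrCostFunction.m - Compute the cost and partial derivatives for Regularized Logistic Regression. In short, the solution is identical to costFunctionReg.m from Ex2.

oneVsAll.m - Train the one-vs-all classifiers for multi-class logistic regression, one classifier per class label.

predictOneVsAll.m - Return predictions from the trained one-vs-all classifiers.

predict.m - Write the prediction function for a Neural network.

https://github.com/junstar92/Coursera/tree/master/MachineLearning/ex3

The code is also available in the GitHub repository above.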

[lrCostFunction.m]

As mentioned at the start, this is identical to costFunctionReg.m from Exam 2. See the earlier post for the explanation.

2020/08/11 - [Machine Learning/Machine Learning - Andrew Ng] - [Machine Learning] Exam 2
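
For quick reference, these are the standard regularized logistic regression formulas from the course that this function computes, with $h_\theta(x) = 1 / (1 + e^{-\theta^T x})$. The bias term $\theta_0$ is not regularized, which is why the code zeroes the first element of a copy of theta.

$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$

$\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j$ for $j \ge 1$; for $j = 0$ the $\frac{\lambda}{m}\theta_j$ term is omitted.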

Code

 function [J, grad] = lrCostFunction(theta, X, y, lambda)
%LRCOSTFUNCTION Compute cost and gradient for logistic regression with 
%regularization
%   J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. to the parameters. 
 
% Initialize some useful values
m = length(y); % number of training examples
 
% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));
 
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Hint: The computation of the cost function and gradients can be
%       efficiently vectorized. For example, consider the computation
%
%           sigmoid(X * theta)
%
%       Each row of the resulting matrix will contain the value of the
%       prediction for that example. You can make use of this to vectorize
%       the cost function and gradient computations. 
%
% Hint: When computing the gradient of the regularized cost function, 
%       there're many possible vectorized solutions, but one solution
%       looks like:
%           grad = (unregularized gradient for logistic regression)
%           temp = theta; 
%           temp(1) = 0;   % because we don't add anything for j = 0  
%           grad = grad + YOUR_CODE_HERE (using the temp variable)
%
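% Copy theta with the bias term zeroed so that theta(1) is not regularized
tempTheta = theta;
tempTheta(1) = 0;
 
% Hypothesis for all m examples at once (m x 1)
h = sigmoid(X*theta);
 
% Regularized cost
J = (-1/m) * sum(y.*log(h) + (1-y).*log(1-h)) ...
        + (lambda/(2*m)) * (tempTheta'*tempTheta);
 
% Regularized gradient; the local is named err to avoid shadowing the built-in error()
err = h - y;
grad = (1/m) * (X' * err) + (lambda/m)*tempTheta;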
 
% =============================================================
 
grad = grad(:);
 
end
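
One way to gain confidence in a cost/gradient pair like this is a numerical gradient check. The sketch below is not part of the assignment: it re-states the same formulas as anonymous functions so that it runs on its own, uses a small made-up data set, and compares the analytic gradient with a central-difference approximation. The names no_bias, cost, grad_fn, and step are chosen here for illustration.

```octave
% Sigmoid and a tiny made-up data set (values are illustrative only)
sigmoid = @(z) 1 ./ (1 + exp(-z));
X = [ones(5, 1) reshape(1:15, 5, 3) / 10];   % 5 examples: bias column + 3 features
y = [1; 0; 1; 0; 1];
theta = [-2; -1; 1; 2];
lambda = 3;
m = length(y);

% Same formulas as lrCostFunction, written as anonymous functions
no_bias = @(t) [0; t(2:end)];                % theta with the bias term zeroed
cost = @(t) (-1/m) * sum(y .* log(sigmoid(X*t)) + (1 - y) .* log(1 - sigmoid(X*t))) ...
            + (lambda/(2*m)) * sum(no_bias(t).^2);
grad_fn = @(t) (1/m) * (X' * (sigmoid(X*t) - y)) + (lambda/m) * no_bias(t);

% Central-difference approximation of each partial derivative
step = 1e-4;
num_grad = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = step;
    num_grad(j) = (cost(theta + e) - cost(theta - e)) / (2 * step);
end

max_diff = max(abs(num_grad - grad_fn(theta)));
fprintf('largest difference: %g\n', max_diff);
```

If the analytic gradient is right, the largest difference should be tiny compared with the gradient values themselves.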

[oneVsAll.m]

This function returns the theta values of the Hypothesis Function for each class label in Multi-Class Logistic Regression.

According to the hints in the code, the initial theta can be all zeros, and one regression is run per class label.

The hints also say to use the fmincg function to obtain the optimized theta. It appears the options are set with optimset and then passed to fmincg; I will look into this function in more detail later.

 % Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%     
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
% 
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost 
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%
% Set Initial theta

That is the hint given in the code.
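
The (y == c) expression in the hint is what turns the multi-class labels into a binary target for class c. A small illustration with made-up labels (the matrix Y is introduced here only to show all classes side by side; the assignment code does not build it):

```octave
y = [1; 3; 2; 3; 1];      % made-up labels for five examples
num_labels = 3;

Y = zeros(numel(y), num_labels);
for c = 1:num_labels
    Y(:, c) = (y == c);   % 1 where the example belongs to class c, else 0
end
disp(Y);
```

Column c of Y is exactly the target vector that the classifier for class c is trained on.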

We return all_theta as shown below. K is num_labels, i.e., the number of classes. Row k holds the n + 1 parameters of the classifier for class k, so all_theta is a K x (n + 1) matrix.

$\Theta = \begin{bmatrix} \Theta_1^{(0)} & \Theta_1^{(1)} & \cdots & \Theta_1^{(n)} \\ \Theta_2^{(0)} & \Theta_2^{(1)} & \cdots & \Theta_2^{(n)} \\ \vdots & \vdots & \ddots & \vdots \\ \Theta_K^{(0)} & \Theta_K^{(1)} & \cdots & \Theta_K^{(n)} \end{bmatrix}$

In code, this looks as follows.

 function [all_theta] = oneVsAll(X, y, num_labels, lambda)
%ONEVSALL trains multiple logistic regression classifiers and returns all
%the classifiers in a matrix all_theta, where the i-th row of all_theta 
%corresponds to the classifier for label i
%   [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels
%   logistic regression classifiers and returns each of these classifiers
%   in a matrix all_theta, where the i-th row of all_theta corresponds 
%   to the classifier for label i
 
% Some useful variables
m = size(X, 1);
n = size(X, 2);
 
% You need to return the following variables correctly 
all_theta = zeros(num_labels, n + 1);
 
% Add ones to the X data matrix
X = [ones(m, 1) X];
 
% ====================== YOUR CODE HERE ======================
% Instructions: You should complete the following code to train num_labels
%               logistic regression classifiers with regularization
%               parameter lambda. 
%
% Hint: theta(:) will return a column vector.
%
% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you
%       whether the ground truth is true/false for this class.
%
% Note: For this assignment, we recommend using fmincg to optimize the cost
%       function. It is okay to use a for-loop (for c = 1:num_labels) to
%       loop over the different classes.
%
%       fmincg works similarly to fminunc, but is more efficient when we
%       are dealing with large number of parameters.
%
% Example Code for fmincg:
%
%     % Set Initial theta
%     initial_theta = zeros(n + 1, 1);
%     
%     % Set options for fminunc
%     options = optimset('GradObj', 'on', 'MaxIter', 50);
% 
%     % Run fmincg to obtain the optimal theta
%     % This function will return theta and the cost 
%     [theta] = ...
%         fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
%                 initial_theta, options);
%
% Set Initial theta
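for c = 1:num_labels
    % Start every classifier from an all-zero parameter vector
    initial_theta = zeros(n + 1, 1);
 
    % Set options for fmincg (same optimset options as in the hint)
    options = optimset('GradObj', 'on', 'MaxIter', 50);
 
    % Run fmincg to obtain the optimal theta for class c;
    % (y == c) is 1 for examples of class c and 0 for all others
    [theta] = ...
        fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...
                initial_theta, options);
 
    % Store the learned parameters as row c of all_theta
    all_theta(c,:) = theta';
end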
 
% =========================================================================
 
 
end
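
Since fmincg and lrCostFunction ship with the course files, the function above cannot run on its own. The following is a minimal self-contained sketch of the same one-vs-all idea that swaps in plain batch gradient descent for fmincg. The toy data, the learning rate alpha, and the iteration count are assumptions made for this illustration, not values from the assignment.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Made-up 2-D data: three well-separated clusters, three points each
X = [0 0; 0.5 0.2; 0.2 0.5; 4 0; 4.5 0.3; 4.2 -0.2; 0 4; 0.3 4.5; -0.2 4.2];
y = [1; 1; 1; 2; 2; 2; 3; 3; 3];
num_labels = 3;
lambda = 0.01;
alpha = 0.2;        % learning rate (assumed value for this toy data)
num_iters = 5000;   % fixed iteration count, used here in place of fmincg

[m, n] = size(X);
Xb = [ones(m, 1) X];                  % add the bias column
all_theta = zeros(num_labels, n + 1);

for c = 1:num_labels
    yc = double(y == c);              % binary target for class c
    theta = zeros(n + 1, 1);
    for iter = 1:num_iters
        h = sigmoid(Xb * theta);
        reg = [0; theta(2:end)];      % do not regularize the bias term
        grad = (1/m) * (Xb' * (h - yc)) + (lambda/m) * reg;
        theta = theta - alpha * grad;
    end
    all_theta(c, :) = theta';         % row c holds the classifier for class c
end

% Predict with the one-vs-all rule: pick the class with the largest probability
[~, p] = max(sigmoid(Xb * all_theta'), [], 2);
fprintf('training accuracy: %.1f%%\n', mean(p == y) * 100);
```

Gradient descent is only a stand-in here; on the real 400-feature data fmincg, as recommended by the hint, is the better choice.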

[predictOneVsAll.m]

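Here we write the code that returns the predictions of one-vs-all multi-class classification.

We know that the Hypothesis Function can be written as follows.

$h_\theta(X) = \frac{1}{1 + e^{-X\theta}}$

Here, however, there is one theta per class label, stacked as the rows of the $\Theta$ matrix (all_theta) returned by oneVsAll.m.

That is, the $X\theta$ part of the Hypothesis Function above can be written in vectorized form as $X\Theta^T$, which is an m x K matrix.

The code is as follows.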

 function p = predictOneVsAll(all_theta, X)
%PREDICT Predict the label for a trained one-vs-all classifier. The labels 
%are in the range 1..K, where K = size(all_theta, 1). 
%  p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions
%  for each example in the matrix X. Note that X contains the examples in
%  rows. all_theta is a matrix where the i-th row is a trained logistic
%  regression theta vector for the i-th class. You should set p to a vector
%  of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2
%  for 4 examples) 
 
m = size(X, 1);
num_labels = size(all_theta, 1);
 
% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);
 
% Add ones to the X data matrix
X = [ones(m, 1) X];
 
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned logistic regression parameters (one-vs-all).
%               You should set p to a vector of predictions (from 1 to
%               num_labels).
%
% Hint: This code can be done all vectorized using the max function.
%       In particular, the max function can also return the index of the 
%       max element, for more information see 'help max'. If your examples 
%       are in rows, then, you can use max(A, [], 2) to obtain the max 
%       for each row.
%       
 
g_h = sigmoid(X * all_theta');  % m x num_labels; column k holds the probability of class k
 
[max_probability, predicted_class] = max(g_h, [], 2);  % row-wise max and its column index
p = predicted_class;
 
% =========================================================================
 
end

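On line 33, g_h = sigmoid(X * all_theta') evaluates the Hypothesis Function for every class at once: in row i (the i-th training example), each column holds the probability of the corresponding class label.

On line 35, max(g_h, [], 2) returns the largest probability in each row together with the column index (class) where it occurs. What we want is the vector of those classes, i.e., the most probable class for each training example.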
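
To make the two-output form of max concrete, here is a small example with made-up probabilities (3 examples, 3 classes). The variable names are chosen for this illustration.

```octave
% Made-up probabilities: 3 examples (rows) x 3 classes (columns)
g_h = [0.10 0.85 0.30;
       0.70 0.20 0.05;
       0.02 0.40 0.90];

[max_prob, p] = max(g_h, [], 2);   % row-wise maximum and its column index
disp([max_prob p]);
```

Here p comes out as [2; 1; 3]: the second class is most probable for the first example, the first class for the second, and the third class for the third.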

[predict.m]

This assignment is to write the code that returns the predictions of a Neural Network.

We implement a network with one Hidden Layer. Theta1 maps the first layer to the second layer, and Theta2 maps the second layer to the Output layer.

Here the input layer has 400 units, the hidden layer 25, and the output layer 10.

That is, $\Theta^{(1)}$ is a 25 x 401 matrix and $\Theta^{(2)}$ is a 10 x 26 matrix. A bias unit ($a_0^{(j)} = 1$) is added each time we move to the next layer, so the column count is the number of units in the previous layer plus 1.

The problem has 3 layers in total: $a^{(1)} = X$, we compute the remaining $a^{(2)}$ and $a^{(3)}$, and $a^{(3)}$ is the Hypothesis Function.

Because the output layer has 10 nodes (i.e., 10 class labels), the final result $a^{(3)}$ is an m x 10 matrix (m is 5000 here).
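
The dimensions above can be checked with a short shape-only sketch. All inputs and weights are zero placeholders with the sizes stated in the text, so only the matrix shapes are meaningful, not the values.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

m = 5000;
X = zeros(m, 400);          % placeholder input, 400 features per example
Theta1 = zeros(25, 401);    % placeholder weights, layer 1 -> layer 2
Theta2 = zeros(10, 26);     % placeholder weights, layer 2 -> output layer

a1 = [ones(m, 1) X];                       % 5000 x 401 (bias column added)
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];   % 5000 x 26  (bias column added)
a3 = sigmoid(a2 * Theta2');                % 5000 x 10

disp(size(a1)); disp(size(a2)); disp(size(a3));
```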

For each row, we use the max function to find the column with the largest probability and return it.

The code for this problem is as follows.

 function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
%   p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
%   trained weights of a neural network (Theta1, Theta2)
 
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
 
% You need to return the following variables correctly 
p = zeros(size(X, 1), 1);
 
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
%               your learned neural network. You should set p to a 
%               vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
%       function can also return the index of the max element, for more
%       information see 'help max'. If your examples are in rows, then, you
%       can use max(A, [], 2) to obtain the max for each row.
%
 
 
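% Input layer: add the bias column to X (m x 401)
a1 = [ones(m, 1) X];
 
% Hidden layer: m x 25 activations, then add the bias column (m x 26)
a2 = sigmoid(a1 * Theta1');
a2 = [ones(m, 1) a2];
 
% Output layer: m x 10, one column per class label
a3 = sigmoid(a2 * Theta2');
 
% Row-wise max; the column index of the max is the predicted label
[max_probability, predicted_class] = max(a3, [], 2);
p = predicted_class;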
 
% =========================================================================
end
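
The trained weights for the assignment come from the course data files, so as a stand-alone illustration of the same forward propagation, the sketch below uses a tiny 2-2-2 network with hand-picked weights (not trained, and not the assignment's Theta1 and Theta2). The hidden units compute AND and NOR of the two binary inputs; output unit 1 fires for XNOR and output unit 2 for XOR.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hand-picked weights (illustrative only): hidden units compute AND and NOR
Theta1 = [-30  20  20;
           10 -20 -20];
% Output unit 1 fires for XNOR, output unit 2 fires for XOR
Theta2 = [-10  20  20;
           10 -20 -20];

X = [0 0; 0 1; 1 0; 1 1];   % all four binary input pairs
m = size(X, 1);

a1 = [ones(m, 1) X];                       % add bias column
a2 = [ones(m, 1) sigmoid(a1 * Theta1')];   % hidden layer plus bias column
a3 = sigmoid(a2 * Theta2');                % one column per output class

[~, p] = max(a3, [], 2);    % predicted class = column of the largest output
disp(p');
```

The predictions are class 1 (XNOR) for the inputs (0,0) and (1,1), and class 2 (XOR) for (0,1) and (1,0).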
