[pytorch] Tutorial - Automatic Differentiation (autograd)

References

Official PyTorch Tutorial (link)

torch.autograd

When training neural networks, the most commonly used algorithm is backpropagation. In this algorithm, the parameters (the model weights) are adjusted according to the gradient of the loss function with respect to each parameter.

To compute these gradients, PyTorch uses a built-in differentiation engine called torch.autograd, which supports automatic computation of gradients for a computational graph.

As a simple case, consider a one-layer network with input x, parameters w and b, and a loss function, defined as follows.

import torch

x = torch.ones(5)   # input tensor
y = torch.zeros(3)  # expected output
w = torch.randn(5, 3, requires_grad=True)  # weights to optimize
b = torch.randn(3, requires_grad=True)     # bias to optimize
z = torch.matmul(x, w) + b                 # logits
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

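The code above defines the following computational graph: x is multiplied by w, b is added to produce z, and z together with y is passed to the cross-entropy function to produce loss.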

In this network, w and b are the parameters that we need to optimize. We therefore need the gradients of the loss function with respect to these variables, and to make that possible we set the requires_grad property on their tensors. requires_grad can be set when a tensor is created, or later with the x.requires_grad_(True) method.
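As a minimal sketch of the two ways to enable tracking described above, the snippet below sets requires_grad once at creation time and once afterwards. The tensor shapes simply mirror the earlier example.

```python
import torch

# Option 1: request gradient tracking when the tensor is created.
w = torch.randn(5, 3, requires_grad=True)

# Option 2: create the tensor first, then enable tracking in place.
b = torch.randn(3)
print(b.requires_grad)   # False: tracking is off by default
b.requires_grad_(True)   # trailing underscore: modifies b in place
print(b.requires_grad)   # True

# A tensor computed from tracked tensors is tracked as well.
x = torch.ones(5)
z = torch.matmul(x, w) + b
print(x.requires_grad, z.requires_grad)  # False True
```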

A function applied to tensors to build the computational graph is an object of the Function class (torch.autograd.Function). This object knows how to compute the function in the forward direction and how to compute its derivative during the backward propagation step. A reference to the backward propagation function is stored in the tensor's grad_fn property. More information on this class is available at link; the details of the Function class are omitted here.

print(f"Gradient function for z = {z.grad_fn}")
print(f"Gradient function for loss = {loss.grad_fn}")
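To make the role of grad_fn a little more concrete, here is a small sketch that compares a leaf tensor (w) with a computed tensor (z). It relies on the is_leaf attribute and on grad_fn.next_functions, which are not mentioned in the text above and are used here purely for illustration.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b

# Tensors created directly by the user are leaves and have no grad_fn.
print(w.is_leaf, w.grad_fn)  # True None

# z was produced by an operation, so it records its backward function.
print(z.is_leaf)
print(type(z.grad_fn).__name__)

# next_functions links this node to the nodes that produced its inputs.
for fn, _ in z.grad_fn.next_functions:
    print(type(fn).__name__)
```

Following next_functions repeatedly walks the graph from the output back toward the leaves, which is the direction the backward pass takes.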

Computing Gradients

To optimize the weights of a neural network, we need the derivatives of the loss function with respect to the parameters, namely $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$ for fixed values of x and y. To compute them, call loss.backward(); the values are then available as w.grad and b.grad.

loss.backward()
print(w.grad)
print(b.grad)
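One way to build confidence in these values is to compare them with a hand-derived gradient. The sketch below assumes the default mean reduction of binary_cross_entropy_with_logits, for which the derivative of the loss with respect to each logit is (sigmoid(z) - y) / N, with N the number of logits. The names dz, expected_w, and expected_b are illustrative only.

```python
import torch

torch.manual_seed(0)  # fixed seed so the run is repeatable
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
loss.backward()

with torch.no_grad():
    # d loss / d z for mean-reduced BCE with logits
    dz = (torch.sigmoid(z) - y) / z.numel()
    # z = x @ w + b, so d loss / d b = dz and d loss / d w = outer(x, dz)
    expected_b = dz
    expected_w = torch.outer(x, dz)

print(torch.allclose(b.grad, expected_b, atol=1e-6))
print(torch.allclose(w.grad, expected_w, atol=1e-6))
```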

The grad property is only available for leaf nodes of the computational graph that have requires_grad set to True. Gradients are not available for the other nodes in the graph.
For performance reasons, backward can only be run once on a given graph. If several backward calls are needed on the same graph, pass retain_graph=True to the backward call.
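The single-use behavior can be observed directly. This sketch reuses the one-layer network from earlier; compute_loss and second_call_failed are hypothetical names introduced only for the example.

```python
import torch

x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

def compute_loss():
    # Each call runs a fresh forward pass and builds a fresh graph.
    z = torch.matmul(x, w) + b
    return torch.nn.functional.binary_cross_entropy_with_logits(z, y)

# Without retain_graph, the values saved for backward are freed
# after the first call, so a second call on the same graph fails.
loss = compute_loss()
loss.backward()
try:
    loss.backward()
    second_call_failed = False
except RuntimeError as err:
    second_call_failed = True
    print("second backward failed:", type(err).__name__)

# With retain_graph=True on the first call, the graph is kept
# and a second call on the same graph succeeds.
loss = compute_loss()
loss.backward(retain_graph=True)
loss.backward()
```

Keeping the graph costs memory, so retain_graph=True is usually reserved for cases that really need repeated backward passes over the same graph.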

Disabling Gradient Tracking

By default, all tensors with requires_grad=True track their computational history and support gradient computation. However, when training is finished and only forward computation is needed, neither gradient computation nor history tracking is required. In that case, tracking can be stopped by wrapping the computation in a torch.no_grad() block.

z = torch.matmul(x, w) + b
print(z.requires_grad)  # True

with torch.no_grad():
    z = torch.matmul(x, w) + b
print(z.requires_grad)  # False

Alternatively, calling the detach() method on a tensor achieves the same result.

z = torch.matmul(x, w) + b
z_det = z.detach()
print(z_det.requires_grad)

The code above prints False.
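A detail worth checking for yourself: detach() returns a new tensor and leaves the original z tracked. The sketch below also compares data_ptr() values to illustrate that the detached tensor shares memory with the original rather than copying it; that comparison is an addition for illustration and is not part of the tutorial.

```python
import torch

x = torch.ones(5)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w) + b
z_det = z.detach()

print(z.requires_grad)      # True: the original is still tracked
print(z_det.requires_grad)  # False
print(z_det.grad_fn)        # None: no link back to the graph

# Both tensors point at the same underlying memory.
print(z_det.data_ptr() == z.data_ptr())  # True
```

Because the memory is shared, an in-place change to z_det is also visible through z, so use z.detach().clone() when an independent copy is needed.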

Gradient tracking is typically disabled in the following cases.

To fix some parameters of a neural network (mark them as frozen parameters). This is common when fine-tuning a pretrained network.
To speed up computations on tensors when only the forward pass is needed.
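Both cases can be sketched with a small model. The two-layer nn.Sequential network below is a hypothetical stand-in for a pretrained model; its layer sizes are arbitrary.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pretrained network.
model = nn.Sequential(nn.Linear(5, 4), nn.ReLU(), nn.Linear(4, 3))

# Case 1: freeze the first layer so that only the last layer is trained.
for p in model[0].parameters():
    p.requires_grad_(False)

loss = model(torch.ones(2, 5)).sum()
loss.backward()
print(model[0].weight.grad)        # None: frozen, no gradient stored
print(model[2].weight.grad.shape)  # torch.Size([3, 4])

# Case 2: forward-only computation without building a graph.
with torch.no_grad():
    out = model(torch.ones(2, 5))
print(out.requires_grad)  # False
```

When an optimizer is used, it is common to pass it only the parameters that still have requires_grad set to True.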

More on Computational Graphs

Conceptually, autograd keeps a record of data (tensors) and all executed operations in a directed acyclic graph (DAG) made up of Function objects. In this DAG, the leaves are the input tensors and the roots are the output tensors. By tracing the graph from the roots to the leaves, gradients can be computed automatically using the chain rule.

In a forward pass, autograd does two things at the same time.

Runs the requested operation to compute the resulting tensor
Maintains the operation's gradient function in the DAG

The backward pass starts when .backward() is called on the DAG root. autograd then does the following.

Computes the gradients from each .grad_fn
Accumulates them in the .grad attribute of the respective tensor
Propagates all the way to the leaf tensors using the chain rule
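The accumulation step has a practical consequence: gradients from successive backward passes are added to .grad rather than replacing it. A minimal sketch, using a toy loss sum(w^2) whose gradient is 2w:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First forward/backward pass: d(sum(w^2))/dw = 2w
(w ** 2).sum().backward()
print(w.grad)  # tensor([2., 4.])

# A second pass over a fresh graph adds to the stored gradient.
(w ** 2).sum().backward()
print(w.grad)  # tensor([4., 8.])

# Reset the gradient before the next pass to get the plain value again.
w.grad.zero_()
(w ** 2).sum().backward()
print(w.grad)  # tensor([2., 4.])
```

This is why training loops normally clear the gradients once per iteration, for example with the optimizer's zero_grad() method.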

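Note that DAGs are dynamic in PyTorch. The important point is that the graph is rebuilt from scratch: after each .backward() call, autograd starts populating a new graph. This is what makes it possible to use control flow statements in a model, and to change the shape, size, and operations at every iteration if needed.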
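As an illustration of a dynamic graph, the sketch below uses an ordinary Python loop whose length changes between calls, so each forward pass records a different number of operations. The function repeated_product is a hypothetical example, not part of PyTorch.

```python
import torch

def repeated_product(x, n):
    # Multiplies by x a data-independent but call-dependent number of times,
    # so the result is x ** (n + 1) and the graph has n multiply nodes.
    y = x
    for _ in range(n):
        y = y * x
    return y

x = torch.tensor(2.0, requires_grad=True)
grads = []
for n in (1, 2, 3):
    x.grad = None                      # discard the previous gradient
    repeated_product(x, n).backward()  # a new graph is built on every call
    grads.append(x.grad.item())

# d/dx x^(n+1) = (n+1) * x^n at x = 2
print(grads)  # [4.0, 12.0, 32.0]
```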


별준
