🌮
Taco

Youngsoo Kim znxkznxk1030

Midterm Exam: Expected Questions

1. (Finite) Markov Decision Process

1. Define reinforcement learning (RL) and explain, with examples, how it differs from supervised learning.

Goal-directed learning from interaction.
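To make "learning from interaction" concrete, here is a minimal epsilon-greedy sketch of an agent that learns only from reward feedback. It assumes a hypothetical two-armed bandit whose arms pay 1 with probabilities 0.3 and 0.7; `ARM_PROBS`, `pull`, and `run_bandit` are illustrative names. A supervised learner (for example, an image classifier) is given the correct label for every training example. This agent is never told which arm is correct and must infer it from the rewards its own choices produce.

```python
import random

# Hypothetical environment: arm a pays reward 1 with probability ARM_PROBS[a].
# The agent never sees these numbers; it only observes the rewards it gets.
ARM_PROBS = [0.3, 0.7]


def pull(arm, rng):
    """One interaction with the environment: return reward 1.0 or 0.0."""
    return 1.0 if rng.random() < ARM_PROBS[arm] else 0.0


def run_bandit(steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent that estimates each arm's value from rewards alone."""
    rng = random.Random(seed)
    values = [0.0] * len(ARM_PROBS)  # estimated mean reward per arm
    counts = [0] * len(ARM_PROBS)    # number of times each arm was tried
    for _ in range(steps):
        if rng.random() < epsilon:   # explore: try a random arm
            arm = rng.randrange(len(ARM_PROBS))
        else:                        # exploit: pick the best current estimate
            arm = max(range(len(ARM_PROBS)), key=values.__getitem__)
        reward = pull(arm, rng)      # evaluative feedback, not a correct label
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental sample average
    return values


if __name__ == "__main__":
    print(run_bandit())  # estimates should land near [0.3, 0.7]
```

Because no step reveals the right answer, the agent has to balance exploring arms it knows little about against exploiting the arm that currently looks best. Supervised learning does not face this trade-off.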

Midterm Exam: Expected Questions

Introduction

1. Explain the relationship among artificial intelligence (AI), machine learning (ML), and deep learning (DL), and give one representative example of each.

$$ \text{Deep Learning} \subset \text{Machine Learning} \subset \text{Artificial Intelligence} $$

Artificial intelligence refers broadly to the technologies for building machines that think and act like humans.

Lecture Summary Assignment

2025451021
Department of Artificial Intelligence
Youngsoo Kim

1. Definition of Entropy

Entropy measures the amount of information in a random variable. Here, information means uncertainty: entropy quantifies how uncertain the outcome of that random variable is.
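For a discrete random variable $X$ with probabilities $p(x)$, the standard (Shannon) definition is $H(X) = -\sum_x p(x)\log_2 p(x)$, measured in bits. The following minimal sketch of that formula uses the illustrative function name `entropy`. It shows that a fair coin is more uncertain than a biased one and that a certain outcome carries no uncertainty.

```python
import math


def entropy(probs, base=2.0):
    """Shannon entropy H(X) = -sum_x p(x) log p(x) of a discrete distribution.

    `probs` lists the probability of each outcome; outcomes with p = 0
    contribute nothing (the limit of p log p as p -> 0 is 0).
    """
    total = sum(probs)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    return sum(-p * math.log(p, base) for p in probs if p > 0)


print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit (maximum for two outcomes)
print(entropy([0.9, 0.1]))    # biased coin: about 0.469 bits (less uncertain)
print(entropy([1.0]))         # certain outcome: 0.0 bits
print(entropy([0.25] * 4))    # uniform over four outcomes: 2.0 bits
```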

import random
import numpy as np
from visualize_train import draw_value_image, draw_policy_image
# left, right, up, down
ACTIONS = [np.array([0, -1]),
           np.array([0, 1]),
           np.array([-1, 0]),
           np.array([1, 0])]
import numpy as np
from numpy.linalg import inv
from visualize_train import draw_value_image, draw_policy_image
# left, right, up, down
ACTIONS = [np.array([0, -1]),
           np.array([0, 1]),
           np.array([-1, 0]),
           np.array([1, 0])]
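The `ACTIONS` offsets above, together with the `numpy.linalg.inv` import, suggest grid-world policy evaluation. The following is a minimal sketch under stated assumptions, not the gist's own code: a hypothetical 4×4 grid with terminal cells in two opposite corners, a uniform random policy, reward −1 per step, no discounting, and off-grid moves leaving the agent in place. `WORLD_SIZE`, `TERMINALS`, `step`, `to_index`, and `evaluate_random_policy` are illustrative names. Under these assumptions the Bellman expectation equation $v = r + \gamma P v$ can be solved directly as $v = (I - \gamma P)^{-1} r$.

```python
import numpy as np
from numpy.linalg import inv

# left, right, up, down as (row, col) offsets, matching the ACTIONS list above
ACTIONS = [np.array([0, -1]),
           np.array([0, 1]),
           np.array([-1, 0]),
           np.array([1, 0])]

# Hypothetical setup: 4x4 grid, terminal cells in two opposite corners,
# reward -1 per step, no discounting, uniform random policy.
WORLD_SIZE = 4
TERMINALS = {(0, 0), (WORLD_SIZE - 1, WORLD_SIZE - 1)}
GAMMA = 1.0
STEP_REWARD = -1.0


def step(state, action):
    """Move by `action`; a move that would leave the grid keeps the agent in place."""
    row, col = np.array(state) + action
    if 0 <= row < WORLD_SIZE and 0 <= col < WORLD_SIZE:
        return (int(row), int(col))
    return state


def to_index(state):
    """Flatten a (row, col) cell into a single state index."""
    return state[0] * WORLD_SIZE + state[1]


def evaluate_random_policy():
    """Solve v = r + gamma * P v exactly for the uniform random policy."""
    n = WORLD_SIZE * WORLD_SIZE
    P = np.zeros((n, n))  # P[s, s']: transition probability under the policy
    r = np.zeros(n)       # expected immediate reward in each state
    for i in range(WORLD_SIZE):
        for j in range(WORLD_SIZE):
            s = (i, j)
            if s in TERMINALS:
                continue  # terminal rows stay zero, so their value is 0
            for a in ACTIONS:
                P[to_index(s), to_index(step(s, a))] += 1.0 / len(ACTIONS)
            r[to_index(s)] = STEP_REWARD
    v = inv(np.eye(n) - GAMMA * P) @ r  # v = (I - gamma P)^{-1} r
    return v.reshape(WORLD_SIZE, WORLD_SIZE)


if __name__ == "__main__":
    print(np.round(evaluate_random_policy(), 1))
```

Leaving the terminal rows of `P` at zero fixes their value at 0. Under the random policy every other cell reaches a terminal cell with probability 1, so $I - \gamma P$ stays invertible even with $\gamma = 1$. The result has −14 next to each terminal corner and −22 in the two remaining corners. For larger state spaces, `np.linalg.solve(np.eye(n) - GAMMA * P, r)` is generally preferred over forming the inverse explicitly.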
"""Showcase of flying arrows that can stick to objects in a somewhat
realistic looking way.
"""
import sys
from typing import List
import pygame
import pymunk
@znxkznxk1030
znxkznxk1030 / rl-001.py
Last active March 24, 2025 09:46
rl-001.py
import torch
from torch import initial_seed
directs = [(1, 0), (-1, 0), (0, 1), (0, -1)] # [down, up, right, left]
inf = int(1e9)
def initialize_policy(width, height, terminals):
    policy = torch.full((height, width, 4), 0.0)
    for y in range(height):
        for x in range(width):
@znxkznxk1030
znxkznxk1030 / pytorch-practice-001.ipynb
Created March 19, 2025 07:41
PyTorch practice

Statistics & Probability

Statistics & Probability Terminology

| Term | Description | Notation |
|---|---|---|
| Random Variable (Stochastic Variable) | A variable whose measured value is subject to chance, with a probability assigned to each possible value | $X$ |
| Probability Distribution | A function that gives the probability that a random variable takes a particular value | |
| Expected Value | The value one can expect as the average outcome if an event with given probabilities were repeated infinitely many times. Here $f$ is the probability mass function (PMF) for a discrete distribution and the probability density function (PDF) for a continuous one | $E(X) = \sum_x x\,f(x)$ (discrete), $E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$ (continuous) |
| Mean | In probability and statistics, the expected value is also called the (population) mean; expected value $\simeq$ mean | $E(X)$ or $\mu$ |
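The "repeat infinitely and average" reading of the expected value can be checked numerically. This minimal sketch assumes a fair six-sided die for the discrete case and the Uniform(0, 1) density for the continuous case, whose PDF is zero outside [0, 1], so the integral only needs that interval. The function names are illustrative. The exact sum $\sum_x x\,f(x)$ gives 7/2. The average of many simulated rolls lands close to it. A midpoint-rule approximation of $\int x\,f(x)\,dx$ recovers the continuous mean.

```python
import random
from fractions import Fraction

# Example PMF (assumed): a fair six-sided die, f(x) = 1/6 for x = 1..6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def expected_value(pmf):
    """Discrete case: E(X) = sum_x x f(x)."""
    return sum(x * p for x, p in pmf.items())


def sample_mean(pmf, n, seed=0):
    """Average of n independent draws from the PMF (approaches E(X) as n grows)."""
    rng = random.Random(seed)
    outcomes = list(pmf)
    weights = [float(p) for p in pmf.values()]
    draws = rng.choices(outcomes, weights=weights, k=n)
    return sum(draws) / n


def expected_value_continuous(pdf, a, b, steps=100_000):
    """Continuous case: midpoint-rule estimate of the integral of x f(x) dx over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        total += x * pdf(x)
    return total * h


print(expected_value(pmf))                                  # 7/2
print(sample_mean(pmf, 100_000))                            # close to 3.5
print(expected_value_continuous(lambda x: 1.0, 0.0, 1.0))   # close to 0.5 for Uniform(0, 1)
```

The sample mean only approaches $E(X)$ as the number of draws grows (the law of large numbers). With a fixed seed the result is reproducible, but it will not be exactly 3.5.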