Sungjin Kim jskDr

@jskDr
jskDr / grpo_demo.py
Created February 11, 2025 05:08 — forked from willccbb/grpo_demo.py
GRPO Llama-1B
# train_grpo.py
import re
import torch
from datasets import load_dataset, Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer
# Load and prep dataset
@jskDr
jskDr / rust_code_refine.md
Created September 23, 2024 04:11
Rust Code Fixing Problem

Find any issues in my Rust code based on the error message and the current code.

[Error message]

Line 40: Char 13: error: cannot assign to tmp because it is borrowed (solution.rs)
   |
40 |             tmp = curr.as_mut().unwrap().next.take();
   |             ^^^ tmp is assigned to here but it was already borrowed
41 |             curr = &mut tmp;
   |                    -------- tmp is borrowed here
@jskDr
jskDr / tictactoe.py
Last active March 15, 2020 05:42
TicTacToe game agent using a kind of reinforcement learning algorithm
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pickle
# TicTacToe game has nine states with nine actions. A user can put his stone on any position on the board except
def set_state_inplace(S, action, P_no):
    ''' S is numpy array.'''
    assert S[action] == 0, 'position should be empty to put a new stone'
@jskDr
jskDr / single_linkedlist.py
Created January 25, 2020 15:31
Single LinkedList - Testing for inserting and deleting
class LinkedList:
    def __init__(self, d):
        self.d = d
        self.r = None
    def append(self, d):
        self.r = LinkedList(d)
def print_list(alist):
@jskDr
jskDr / minimal_AC_RL.ipynb
Last active November 23, 2019 09:32
High-level implementation of ActorCritic with minimal typing (General implementation)
@jskDr
jskDr / pg_both_tf2_torch.ipynb
Last active October 26, 2019 16:20
Comparison of policy gradient code implemented in TF 2.0 and PyTorch, based on https://medium.com/@hamza.emra/reinforcement-learning-with-tensorflow-2-0-cca33fead626
@jskDr
jskDr / Actor-Crtic_no_detached_pytorch.ipynb
Created October 3, 2019 15:11
Actor-Critic without using detach() in PyTorch - it leads to one loss function for both actor and critic networks
@jskDr
jskDr / Actor-Crtic_detached_pytorch.ipynb
Last active October 3, 2019 15:04
Actor-Critic implemented in PyTorch; separate loss formulations are used for the actor and critic agents.
@jskDr
jskDr / policy_gradient_by_pytorch.ipynb
Last active October 3, 2019 14:08
Policy gradient code written in PyTorch where the number of batches is larger than one
@jskDr
jskDr / pg_james.ipynb
Created September 29, 2019 13:36
Policy Gradient with PyTorch and Python Class Structure