🏄‍♂️
Believe in your potential. May the Force be with us.

MinWoo(Daniel) Park dsdanielpark

@dsdanielpark
dsdanielpark / docker.md
Last active June 1, 2023 09:27
making docker file
@dsdanielpark
dsdanielpark / streamlit.md
Last active April 11, 2023 13:39
about streamlit landing page
  1. Install streamlit
pip install streamlit
pip install streamlit-chat
  2. Check status
@dsdanielpark
dsdanielpark / tech_blog.md
Created April 10, 2023 13:15
using remix
@dsdanielpark
dsdanielpark / git_ssh.md
Last active April 11, 2023 11:18
manage multiple git ssh

SSH RSA key generation in local storage

   cd ~/.ssh

   ls -al

   # SSH Keygen
   ssh-keygen -t rsa -C "[email protected]" -f "id_rsa_gitId1"
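
With several keys generated this way, one common way to choose between them is a `Host` alias per key in `~/.ssh/config`. The sketch below is only an illustration under assumptions that are not in the note: the remote host is assumed to be `github.com`, and the second key name `id_rsa_gitId2`, the aliases `github-gitId1` and `github-gitId2`, and the helper `print_host_block` are all hypothetical. It only prints the stanzas and writes nothing to disk.

```shell
# Hypothetical helper (not from the original note): print one ssh_config
# stanza that ties a host alias to a specific private key file.
print_host_block() {
  alias_name="$1"   # alias used in place of the real host in git URLs
  key_file="$2"     # key file name inside ~/.ssh, e.g. id_rsa_gitId1
  printf 'Host %s\n' "$alias_name"
  printf '  HostName github.com\n'        # assumption: the keys are for GitHub
  printf '  User git\n'
  printf '  IdentityFile ~/.ssh/%s\n' "$key_file"
  printf '  IdentitiesOnly yes\n\n'       # offer only this key for the alias
}

print_host_block github-gitId1 id_rsa_gitId1
print_host_block github-gitId2 id_rsa_gitId2   # hypothetical second key
```

If the printed stanzas were appended to `~/.ssh/config`, a remote written as `git@github-gitId1:OWNER/REPO.git` (placeholder owner and repository) would be expected to authenticate with the first key.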
@dsdanielpark
dsdanielpark / github_unstar.md
Last active April 11, 2023 13:27
make star clear
@dsdanielpark
dsdanielpark / shpinx.md
Last active January 23, 2024 17:30
Sphinx auto doc

Sphinx

Installation

pip install Sphinx

Quick start

mkdir docs
cd docs
@dsdanielpark
dsdanielpark / gist:6087c6d8e17d1fce4472d1ffc9ea7176
Created July 25, 2023 06:59 — forked from rxaviers/gist:7360908
Complete list of github markdown emoji markup

People

:bowtie: :bowtie: 😄 :smile: 😆 :laughing:
😊 :blush: 😃 :smiley: ☺️ :relaxed:
😏 :smirk: 😍 :heart_eyes: 😘 :kissing_heart:
😚 :kissing_closed_eyes: 😳 :flushed: 😌 :relieved:
😆 :satisfied: 😁 :grin: 😉 :wink:
😜 :stuck_out_tongue_winking_eye: 😝 :stuck_out_tongue_closed_eyes: 😀 :grinning:
😗 :kissing: 😙 :kissing_smiling_eyes: 😛 :stuck_out_tongue:
@dsdanielpark
dsdanielpark / RLHF.md
Created August 10, 2023 07:55 — forked from JoaoLages/RLHF.md
Reinforcement Learning from Human Feedback (RLHF) - a simplified explanation

Maybe you've heard about this technique but you haven't completely understood it, especially the PPO part. This explanation might help.

We will focus on text-to-text language models 📝, such as GPT-3, BLOOM, and T5. Models like BERT, which are encoder-only, are not addressed.

Reinforcement Learning from Human Feedback (RLHF) has been successfully applied in ChatGPT, hence its major increase in popularity. 📈

RLHF is especially useful in two scenarios 🌟:

  • You can’t create a good loss function
    • Example: how do you calculate a metric to measure if the model’s output was funny?
  • You want to train with production data, but you can’t easily label your production data