@sikang99
Last active November 18, 2024 01:07
Large Behavior Model

Large Behavior Model (LBM)

  • TRI: Toyota Research Institute

Articles

Information

  • OpenVLA (Open Vision-Language-Action Model)
  • VIMA (General Robot Manipulation with Multimodal Prompts)
  • RT-1-X
  • LM-Nav
  • Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
  • Drake - Model-Based Design and Verification for Robotics

Slides

Papers

Open Source

@sikang99
Author

There are currently few projects that make Large Behavior Models (LBM) available as open source, but open-source projects that take a similar approach do exist. In particular, research on large vision-language-action models for robot control is very active.
1. OpenVLA (Open Vision-Language-Action Model):
OpenVLA is an open-source vision-language-action (VLA) model designed for a wide range of robot manipulation tasks. The model has 7 billion parameters and was trained on Open X-Embodiment, a large-scale dataset of more than 970,000 robot demonstrations. OpenVLA is intended for general use across different robot platforms and, in particular, lets a robot adapt to varied visual environments and language commands. The model is released as open source on HuggingFace together with easy-to-use code. It outperforms the existing closed model (RT-2-X) and adapts quickly to new task environments.
2. VIMA (General Robot Manipulation with Multimodal Prompts):
VIMA is an open-source model that performs robot manipulation tasks from multimodal prompts. It accepts several kinds of input (for example, images and text commands) and turns them into actions for the robot to carry out. The open-source code is built on PyTorch, so researchers can access and use it easily.
3. RT-1-X:
The RT-1-X series, developed by Google Robotics, controls robot behavior on the basis of large vision-language models. These models were trained on large-scale datasets such as Open X-Embodiment, but they are not provided as open source. Models that take a similar approach, however, have been released as open source and are being actively studied by the research community.
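
Since item 1 notes that OpenVLA ships on HuggingFace with ready-to-use code, a minimal sketch of what querying it might look like is given here. Nothing in it comes from the original note: the Hub identifier "openvla/openvla-7b", the prompt template, the predict_action method and the unnorm_key value "bridge_orig" are assumptions recalled from the public model card, and should be checked against the current OpenVLA documentation before use. The helper names build_prompt, load_openvla and predict_action are hypothetical.

```python
"""Sketch: querying OpenVLA through Hugging Face transformers (assumptions noted inline)."""

# Assumed Hub identifier; verify against the OpenVLA model card.
MODEL_ID = "openvla/openvla-7b"


def build_prompt(instruction: str) -> str:
    """Wrap a language command in the prompt template assumed for OpenVLA."""
    task = instruction.strip().lower()
    return f"In: What action should the robot take to {task}?\nOut:"


def load_openvla(model_id: str = MODEL_ID, device: str = "cuda:0"):
    """Load the processor and model. Downloads several GB of weights."""
    # Heavy imports are deferred so the module imports without a GPU stack.
    import torch
    from transformers import AutoModelForVision2Seq, AutoProcessor

    # trust_remote_code is needed because the model class lives in the Hub repo.
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForVision2Seq.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        low_cpu_mem_usage=True,
        trust_remote_code=True,
    ).to(device)
    return processor, model


def predict_action(processor, model, image, instruction: str,
                   unnorm_key: str = "bridge_orig", device: str = "cuda:0"):
    """Return one robot action for a PIL image and a language command.

    predict_action and unnorm_key are assumed to be provided by the model's
    remote code; unnorm_key selects the dataset statistics used to
    un-normalize the predicted action.
    """
    import torch

    inputs = processor(build_prompt(instruction), image).to(device, dtype=torch.bfloat16)
    return model.predict_action(**inputs, unnorm_key=unnorm_key, do_sample=False)


# Example usage (requires a CUDA GPU and a camera frame as a PIL image):
#   processor, model = load_openvla()
#   action = predict_action(processor, model, frame, "pick up the cup")
```

The imports of torch and transformers are placed inside the functions so that the prompt helper can be used and tested without the full model stack installed.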

Beyond these, GitHub hosts several open-source projects that combine robots with LLMs (large language models). For example, the "LM-Nav" project uses large pretrained models to support robot navigation, and its code is also available as open source.

For anyone looking to build on an open-source project, OpenVLA currently looks like the most promising option, with strong potential for general use across a variety of robot environments. It is also worth reviewing models such as VIMA to see whether they can be applied to a range of manipulation tasks.
