- 2024/10/09 LLaVA-Llama: An All-Around Text Processor
- 2024/09/18 Multimodal RAG: Chat with Videos and the Future of AI Interaction
- 2024/09/05 Introducing LLaVA V1.5 7B on GroqCloud - note: Qwen2-VL and MiniCPM now perform better
- 2024/07/30 LLaVA Multimodal Image Search
- 2024/07/17 LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models
- 2024/06/18 LLaVA: Large Language and Vision Assistant
- 2024/06/12 Large Language and Vision Assistant (LLaVA) — v1.6 vs. v1.5
- 2024/06/06 LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images
- 2024/05/27 Tutorial: Using Ollama, LLaVA and Gravio to Build a Local Visual Question and Answer AI Assistant
- 2024/04/30 LLaVA-NeXT: A Strong Zero-shot Video Understanding Model
- 2024/04/08 LLaVA - New Standards in AI Accuracy
- 2024/04/01 Interacting with the Open Source Model LLaVA 1.5 on Paperspace Gradient
- 2024/03/31 Introduction to LLaVA: A Multimodal AI Model
- 2024/03/29 OMG-LLaVA: AI Model Integrating Multi-Level Visual Reasoning for Enhanced Scene Understanding
- 2024/02/09 How to Fine-Tune LLaVA on a Custom Dataset
- 2024/02/01 LLaVA 1.5 vs. 1.6
- 2024/01/27 Introducing LLaVA: The Fusion of Visual and Linguistic Intelligence in AI with code
- 2023/12/11 Understanding LLaVA: Large Language and Vision Assistant
- 2023/12/10 Unlocking Multimodal AI: LLaVA and LLaVA-1.5's Evolution in Language and Vision Fusion
- 2023/11/27 Exploring LLaVA-1.5 Technology: A Comprehensive Overview
- 2023/11/17 A Comprehensive First Look at LLaVA-1.5 Technology
- 2023/10/17 LLaVA, LLaVA-1.5, and LLaVA-NeXT(1.6) Explained
- 2024/07/30 Generating captions (descriptions) or table text data with LLaVA 13B
- 2024/07/25 LLaVA & LLaVA 1.5
- 2024/07/01 OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
- 2024/05/03 PLLaVA: a project that extends the vision-language model LLaVA to video
- 2024/02/01 MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
- Multi Modal Transformers
- MarkTechPost: Computer Vision
- What is LLaVA?
- Ollama: llava
- LlamaIndex: LlaVa Demo with LlamaIndex
- vLLM: Llava Example
- Building Next-Gen Multimodal Foundation Models for General-Purpose Assistants
- NVIDIA Jetson AI Lab
- NVIDIA: NeVA (LLaVA)
- LLaVA vs. BakLLaVA
- BakLLaVA - BakLLaVA is an LMM developed by LAION, Ontocord, and Skunkworks AI. BakLLaVA uses a Mistral 7B base augmented with the LLaVA 1.5 architecture.
- LLaVA Blog - LLaVA-NeXT, LLaVA-OneVision, LLaVA-Video
- LLaVa Model Guide
- ROCm blogs: Multimodal (Visual and Language) understanding with LLaVA-NeXT
- LLaVA-Critic: Learning to Evaluate Multimodal Models
- Video Instruction Tuning with Synthetic Data - LLaVA-Video
- MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis
- Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
- LLaVA: Large Language and Vision Assistant - Visual Instruction Tuning - NeurIPS 2023 (Oral)
- LLaVA-NeXT: Tackling Multi-image, Video, and 3D in Large Multimodal Models
- LLaVA-OneVision - Easy Visual Task Transfer
- Yo'LLaVA: Your Personalized Language and Vision Assistant
- Spatial VLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
- OpenVLA: An Open-Source Vision-Language-Action Model - Prismatic-7B VLM, Open X-Embodiment (OpenX) dataset
- Llava Hugging Face - LLaVa-NeXT, LLaVa-1.5, ViP-LLaVA, LLaVa-NeXT-Video, LLaVa-Interleave
- LLaVA WebGPU - A private and powerful multimodal AI chatbot that runs locally in your browser.
- LLaVA-Onevision
- LLaVA-Next-Interleave
- Video Language Models - Video LLaVA
- SpaceVLMs - LLaVA, MobileVLM
- bczhou/TinyLLaVA-1.5B
- lmms-lab: LLaVA-OneVision
- deepinfra: llava-hf/llava-1.5-7b-hf - LLaVA is a multimodal model that combines a vision encoder with a language model; a minimal inference sketch for this checkpoint follows this list.
- remyxai/SpaceLLaVA - SpaceLLaVA uses LoRA to fine-tune LLaVA on a dataset designed with VQASynth to enhance spatial reasoning, as in SpatialVLM (a hedged LoRA setup sketch follows this list)
- remyxai/SpaceLLaVA-lite - SpaceLLaVA-lite fine-tunes MobileVLM on a dataset designed with VQASynth to enhance spatial reasoning as in SpatialVLM
- Fifth Civil Defender - 5CD
- lamm-mit/Cephalo-Llava-v1.6-Mistral-vision-8b-alpha
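The Hugging Face checkpoints listed above (e.g., llava-hf/llava-1.5-7b-hf) can be queried with the Transformers `LlavaForConditionalGeneration` API. A minimal sketch, assuming that checkpoint, a placeholder image URL, and the LLaVA-1.5 `USER: <image> ... ASSISTANT:` prompt template:

```python
# Minimal inference sketch for llava-hf/llava-1.5-7b-hf via Transformers.
# The image URL and prompt are placeholders; device_map="auto" assumes
# accelerate is installed and enough GPU memory is available.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Any RGB image works; this URL is a placeholder.
image = Image.open(requests.get("https://example.com/cat.jpg", stream=True).raw)

# LLaVA-1.5 chat template expected by this checkpoint.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)

outputs = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(outputs[0], skip_special_tokens=True))
```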
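The SpaceLLaVA entries above describe LoRA fine-tuning of LLaVA on a VQASynth-built spatial-reasoning dataset. The sketch below shows how such adapters can be attached with PEFT; the rank, alpha, and target-module choices are illustrative assumptions, not the published SpaceLLaVA recipe.

```python
# Hedged sketch: attaching LoRA adapters to a LLaVA checkpoint with PEFT,
# in the spirit of the SpaceLLaVA entries above. Hyperparameters and
# target modules are illustrative, not the authors' recipe.
import torch
from transformers import LlavaForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", torch_dtype=torch.float16
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # LM attention projections
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
# Training on a VQASynth-style spatial-reasoning dataset would then proceed
# with a standard SFT loop (not shown).
```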
Papers: CatalyzeX for LLaVA
- 2024 LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness
- 2024 LLaVA-Critic: Learning to Evaluate Multimodal Models
- 2024 Leveraging vision-language models for fair facial attribute classification
- 2024 TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
- 2024 LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
- 2024 CarLLaVA: Vision language models for camera-only closed-loop driving
- 2024 LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal!
- 2024 HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments
- 2024 SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities - CVPR 2024, Google DeepMind
- 2024 LLaVA-VSD: Large Language-and-Vision Assistant for Visual Spatial Description
- 2024 LLaVA-OneVision: Easy Visual Task Transfer
- 2024 LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
- 2024 Yo'LLaVA: Your Personalized Language and Vision Assistant
- 2024 LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
- 2024 AgEval: A Benchmark for Zero-Shot and Few-Shot Plant Stress Phenotyping with Multimodal LLMs
- 2024 Power-LLaVA: Large Language and Vision Assistant for Power Transmission Line Inspection
- 2024 INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
- 2024 Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
- 2024 An Introduction to Vision-Language Modeling 🔥🔥
- 2024 PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
- 2024 OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding
- 2024 MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
- 2024 SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models - Apple
- 2024 TinyLLaVA: A Framework of Small-scale Large Multimodal Models
- 2024 ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts - CVPR 2024
- 2023 Visual Instruction Tuning
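Visual Instruction Tuning (and the fine-tuning guides listed earlier) trains LLaVA on image-grounded conversations. The record below sketches the commonly used LLaVA-style training JSON; the field values are hypothetical, and the exact expectations are documented in the haotian-liu/LLaVA repository.

```python
# Sketch of a LLaVA-style visual-instruction-tuning record. Field values are
# hypothetical; the "<image>" token marks where visual features are inserted.
import json

record = {
    "id": "000001",                    # unique sample id (hypothetical)
    "image": "images/000001.jpg",      # path relative to the image folder
    "conversations": [
        {"from": "human", "value": "<image>\nWhat objects are on the table?"},
        {"from": "gpt", "value": "A laptop, a coffee mug, and a notebook."},
        {"from": "human", "value": "Is the mug to the left of the laptop?"},
        {"from": "gpt", "value": "Yes, the mug sits just left of the laptop."},
    ],
}

# The training set is a single JSON list of such records.
with open("custom_train.json", "w") as f:
    json.dump([record], f, indent=2)
```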
- https://github.com/haotian-liu/LLaVA - [NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V level capabilities and beyond.
- https://github.com/tabtoyou/KoLLaVA - KoLLaVA: Korean Large Language-and-Vision Assistant (feat.LLaVA)
- https://github.com/LLaVA-VL/LLaVA-NeXT - LLaVA-NeXT: Open Large Multimodal Models
- https://github.com/TinyLLaVA/TinyLLaVA_Factory - A Framework of Small-scale Large Multimodal Models
- https://github.com/remyxai/VQASynth - Compose multimodal datasets
- https://github.com/hasanar1f/HiRED - HiRED strategically drops visual tokens in the image-encoding stage to improve inference efficiency for high-resolution vision-language models (e.g., LLaVA-NeXT) under a fixed token budget (a toy illustration of the idea follows this list).
- https://github.com/FreedomIntelligence/LongLLaVA - LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
- https://github.com/fangyuan-ksgk/Mini-LLaVA - A minimal implementation of LLaVA-style VLM with interleaved image & text & video processing ability
- https://github.com/NiuTrans/Vision-LLM-Alignment - This repository contains code for SFT, RLHF, and DPO training of vision-based LLMs, including the LLaVA models and the Llama-3.2-Vision models.
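The HiRED repository above drops visual tokens under a fixed budget using attention-derived importance scores. The toy sketch below illustrates that idea by keeping only the top-k patch tokens; it is a conceptual illustration, not the authors' implementation, which operates inside the LLaVA-NeXT encoding pipeline.

```python
# Toy illustration of attention-guided visual-token dropping under a fixed
# token budget (HiRED-style idea). Shapes and scores are synthetic.
import torch

def drop_visual_tokens(visual_tokens: torch.Tensor,
                       cls_attention: torch.Tensor,
                       budget: int) -> torch.Tensor:
    """visual_tokens: (num_patches, hidden_dim) patch embeddings
    cls_attention:  (num_patches,) attention each patch receives from the
                    [CLS] token, used here as an importance score
    budget:         number of visual tokens to keep"""
    keep = torch.topk(cls_attention, k=min(budget, cls_attention.numel())).indices
    keep, _ = torch.sort(keep)  # preserve the original spatial order
    return visual_tokens[keep]

# Example: 576 CLIP patch tokens reduced to a 144-token budget.
tokens = torch.randn(576, 1024)
scores = torch.rand(576)
reduced = drop_visual_tokens(tokens, scores, budget=144)
print(reduced.shape)  # torch.Size([144, 1024])
```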