
@mberman84
Last active June 23, 2024 15:34
Steps to install Textgen WebUI
# this tutorial assumes conda and git are both installed on your computer
conda create -n tg python=3.10.9
conda activate tg
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
git clone https://github.com/oobabooga/text-generation-webui.git
cd text-generation-webui
pip install -r requirements.txt
# GPU only:
pip uninstall -y llama-cpp-python
set "CMAKE_ARGS=-DLLAMA_CUBLAS=on"
set "FORCE_CMAKE=1"
pip install llama-cpp-python --no-cache-dir
# If you get: ERROR: Failed building wheel for llama-cpp-python
set "CMAKE_ARGS=-DLLAMA_OPENBLAS=on"
set "FORCE_CMAKE=1"
pip install llama-cpp-python --no-cache-dir
# Put checker.py in your text-generation-webui folder
python checker.py #Make sure you have cuda and it is enabled
# If you get "CUDA Setup failed despite GPU being available.":
pip install bitsandbytes-windows
# If you get AttributeError: module 'bitsandbytes.nn' has no attribute 'Linear4bit'. Did you mean: 'Linear8bitLt'?
pip install git+https://github.com/huggingface/peft@27af2198225cbb9e049f548440f2bd0fba2204aa --force-reinstall --no-deps
python server.py
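The checker.py referenced above is not included in the gist; per the comments below, it only needs to print torch.version.cuda and torch.cuda.is_available(). A slightly defensive sketch (the cuda_status helper name is my own) that also copes with a missing PyTorch install:

```python
# checker.py -- report which CUDA build PyTorch carries and whether a GPU is usable.

def cuda_status():
    """Return (cuda_build, available). cuda_build is e.g. '11.7' for a CUDA
    wheel, or None for a CPU-only wheel or when torch is not installed."""
    try:
        import torch
    except ImportError:
        return None, False
    return torch.version.cuda, torch.cuda.is_available()

if __name__ == "__main__":
    build, available = cuda_status()
    print("CUDA build:", build)          # expect 11.7 for the cu117 wheel above
    print("CUDA available:", available)  # expect True on a working GPU setup
```

If the second line prints False, the llama-cpp-python CUDA build above will not help; recheck the driver and the cu117 torch install first.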
@CGHoussem

CGHoussem commented Jul 21, 2023

It's missing the git clone https://github.com/oobabooga/text-generation-webui.git before the cd text-generation-webui command

@mberman84
Author

thank you

@that-scientist

this assumes git is installed on the system.

@CGHoussem

CGHoussem commented Jul 21, 2023

> this assumes git is installed on the system.

well then, same goes for conda for the conda commands

@that-scientist

> this assumes git is installed on the system.
>
> well then, same goes for conda for the conda commands

True! Might be improved by a brief comment for beginners.

@mberman84
Author

Will add that! Thank you

@rumbis

rumbis commented Jul 24, 2023

thanks

@japo42

japo42 commented Jul 26, 2023

Thanks for the video.
When running the
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
command on my mac I get the error message
Could not find a version that satisfies the requirement torch
Any idea how to fix it?
Thank you.


@MuazAshraf

'NoneType' object has no attribute 'cadam32bit_grad_fp32'
Running on local URL: http://127.0.0.1:7800

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.

@jonkurishita

jonkurishita commented Aug 10, 2023

I created a step-by-step approach that works. Yours is great, but some people need a full step-by-step walkthrough.

Setup and Installation <<<

STEP #1: Run Anaconda App

Step #2: Select base (root)

Step #3: select Run Terminal (from the GUI)

Step #4: Create a new ENV
(base) > conda create -n textgen python=3.10.9

Step #5: Activate Textgen environment
(base) > conda activate textgen
-> (textgen) C:\Users\minam>

Step #6: Install all the Torch libraries
(textgen) C:\Users\minam> pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
*** Torch is the library that runs all the math behind the LLMs.

Step #7: Install git package
(textgen) C:\Users\minam> conda install -c anaconda git
** Git is a distributed version control system that tracks changes in any set of computer files,
usually used for coordinating work among programmers collaboratively developing source code during software development.

Step #8: Clone the text-generation-webui repository
(textgen) C:\Users\minam> git clone https://github.com/oobabooga/text-generation-webui.git

Step #9: Change into that cloned directory
(textgen) C:\Users\minam> cd text-generation-webui

Step #10: Install all the Python modules
(textgen) C:\Users\minam> pip install -r requirements.txt
*** Because we are using conda, requirements.txt is installed into the environment, so conflicts are unlikely.

--------------------------- GPU Acceleration --------------------------------------
Step #11: Install Microsoft Visual Studio 2022 IDE
Building Windows wheels for Python 3.10 requires Microsoft Visual Studio 2022.
https://visualstudio.microsoft.com/downloads/ <- Select Visual Studio 2022 Community free version

Step #12 Desktop Development with C++
> Be sure that "Desktop development with C++" workload is selected on "Visual Studio Installer".
> Switch to the Individual components tab.
> Make sure that the following C++ components are selected;
MSVC v143 - VS 2022 C++ x64/x86 build tools
Windows 11 SDK
C++ CMake tools for Windows
C++ ATL for the latest build tools
> click on "MODIFY" to apply the updated changes

Step #13: Uninstall llama python module
(textgen) C:\Users\minam> pip uninstall -y llama-cpp-python

Step #14: Set some variables
(textgen) C:\Users\minam> set "CMAKE_ARGS=-DLLAMA_OPENBLAS=on"
(textgen) C:\Users\minam> set "FORCE_CMAKE=1"

Step #15: Install a version of llama-cpp-python
(textgen) C:\Users\minam> pip install llama-cpp-python --no-cache-dir

Step #16: Make sure that CUDA is working
> Create a Python script called checker.py containing:
import torch
print(torch.version.cuda)
print(torch.cuda.is_available())
> Move the newly created checker.py to the textgen env.
[CMD Prompt] move "C:\Users\minam\Desktop\AI\03_Llama2_Setup\checker.py" "C:\Users\minam"
> Run the Python script
(textgen) C:\Users\minam>python checker.py
11.7
True


Step #17: Need to verify the versions of Python, Gradio package, and aiofiles
> python --version => Python 3.10.9
> pip show gradio => Name: gradio
Version: 3.33.1
> pip show aiofiles => Name: aiofiles
Version: 23.2.0

********************************* FYI ***********************
Gradio is an open-source Python library that allows developers to rapidly create user interfaces (UIs) for machine learning models.

aiofiles is a Python library that provides an asynchronous interface to file I/O. It's designed to work with Python's asyncio library,
allowing you to perform file operations without blocking the event loop.
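aiofiles itself is a third-party package, but the pattern it implements can be sketched with just the standard library: hand the blocking file read to a worker thread so the asyncio event loop stays free (read_file_async and demo are illustrative names of my own, not aiofiles APIs):

```python
import asyncio
import tempfile
from pathlib import Path

async def read_file_async(path):
    # Offload the blocking read to a worker thread, much as aiofiles does
    # internally, so other coroutines keep running in the meantime.
    return await asyncio.to_thread(Path(path).read_text)

def demo():
    # Write a temp file, then read it back through the async path.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write("hello")
        name = f.name
    return asyncio.run(read_file_async(name))
```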


Step #18: Update the Libraries
> pip install --upgrade gradio -> version 3.39.0
> pip install --upgrade aiofiles -> version 23.2.1

Step #19: Spin up the Python server
(textgen) C:\Users\minam> cd text-generation-webui
(textgen) C:\Users\minam\text-generation-webui> python server.py
>> The server should be running at http://127.0.0.1:7860.


The directories and some naming will be different but the steps should work.

@BravoVictor27

BravoVictor27 commented Aug 16, 2023

You, dear uchimoriya, are a champion. Thanks!

@RickyGoodlett

You really helped me a lot. I have been looking for how to do this for a long time.

@jonkurishita

jonkurishita commented Aug 17, 2023

Hi guys, Here is some more info for you!
I assume everyone has a PC with a high-end GPU at home.
My PC specs for LLMs are as follows;
AMD Ryzen Threadripper 3960X 24-Core Processor
NVIDIA GeForce RTX 4080 (16.0 GB)
Microsoft Windows 11 Pro

I find it is fine for running different versions of Llama2 models as follows;
#1: Standard Llama2 13b
https://huggingface.co/TheBloke/Llama-2-13B-Chat-fp16

#2: Uncensored version: Llama2 13b
Wizard-Vicuna-13B-Uncensored-GPTQ (4-bit, groupsize 128)
Wizard-Vicuna-13B-Uncensored.ggmlv3.q5_K_M.bin


More details on TextGen WebUI running on the Python server.

cd text-generation-webui
python server.py --threads=16 --gpu-memory=12 --mlock

Threads: Given that you have a 24-core processor (AMD Ryzen Threadripper 3960X), you can utilize a substantial number of threads. Starting with 12 or 16 threads might be a good balance to fully leverage the multi-core architecture without overloading the system.

GPU Memory: Your NVIDIA GeForce RTX 4080 has 16 GB of GPU memory. The allocation of GPU memory will depend on the specific tasks you're running on the server. If the server is mainly utilizing the GPU, you might allocate up to 12-14 GB. However, be mindful of other GPU-intensive applications that might be running on your system.

--mlock: This option is typically used to lock the process's virtual address space into RAM, preventing the system from swapping it to the disk. It can be beneficial for real-time or latency-sensitive applications, as it ensures that the data associated with the process is always in physical memory. However, be cautious, as locking a large amount of memory can affect other system processes.

Summary
--threads=16: Specifies the number of threads to use, presumably for parallel processing.
--gpu-memory=12: Allocates 12 GB of GPU memory.
--mlock: May lock the process's virtual address space into RAM.
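The thread-count advice above can be turned into a small heuristic; suggested_threads is a name of my own invention, not a WebUI option:

```python
import os

def suggested_threads(reserve_fraction=1/3):
    """Heuristic sketch: use about two-thirds of the logical cores for
    inference, leaving the rest free for the OS and other applications."""
    cores = os.cpu_count() or 4  # os.cpu_count() may return None
    return max(1, round(cores * (1 - reserve_fraction)))

# On a 24-core machine like the Threadripper above this yields 16,
# matching the --threads=16 used in the command.
```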


For Lama-2-13b-chat model

Step #1: Launch TextGen WebUI (on the Python server)
>> http://127.0.0.1:7860

Step #2: Select Model Tab

Step #3: Download TheBloke's version of Llama-2-13b-Chat
Paste the model name in TEXT-GEN / Model Tab
TheBloke/Llama-2-13B-Chat-fp16
> DONE

STEP #4: Click on the blue Reload Button

STEP #5: Find the model TheBloke_Llama-2-13B-fp16

STEP #6: Click on "LOAD"
> Successfully loaded TheBloke_Llama-2-13B-fp16

STEP #7: Go to the Session Tab

STEP #8: Select the MODE <- chat

STEP #9: Apply and restart

STEP #10: Go to Parameters tab

STEP #11: Max out the tokens to 4096

STEP #12: Lower the temperature to 0.01

STEP #13: Test the model in chat mode

STEP #14: Chat Style -> cai-chat <- this looks better!!


Uncensored Llama-2 <<<<

Model Name:TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ
https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ/tree/main
TEXTGEN WEBUI for download: TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ

STEP #1: > run Anaconda
> ENV -> TextGen -> Run Terminal
> cd text-generation-webui

STEP #2: Setup Hugging face access
set HF_USER=
set HF_PASS=

STEP #3: Run the textgen Webui Server
(textgen) C:\Users\minam\text-generation-webui> python server.py --threads=16 --gpu-memory=12 --mlock
From the Web Browser: http://127.0.0.1:7860

STEP #4: Download the model in TextGen Webui
TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ

README FILE <<<
https://huggingface.co/TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ

Open the text-generation-webui UI as normal.
Click the Model tab.
Under Download custom model or LoRA, enter TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ.
Click Download.
Wait until it says it's finished downloading.
Click the Refresh icon next to Model in the top left.
In the Model drop-down: choose the model you just downloaded, Wizard-Vicuna-13B-Uncensored-GPTQ.
If you see an error in the bottom right, ignore it - it's temporary.

STEP #3: Fill out the GPTQ parameters on the right:
wBits = 4,
Groupsize = 128,
model_type = Llama

Click Save settings for this model in the top right.
Click Reload the Model in the top right.
Once it says it's loaded, click the Text Generation tab and enter a prompt!

Note: I was wondering why this is working quite fast on my PC (not as fast as ChatGPT, but still); it is due to GPTQ.

About GPTQ <<<<<
GPTQ is a post-training quantization method, so a GPTQ model is a quantized version of a GPT-like model. Quantization is a process used to reduce the numerical precision of the weights in a neural network. It's often used to reduce the memory footprint and computational requirements of a model, making it more suitable for deployment on resource-constrained devices or to speed up inference.

In the context of the details you provided, "GPTQ format quantised 4bit models" means that the model's weights have been quantized to 4-bit precision. This reduces the size of the model and can accelerate its execution.
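The idea of 4-bit quantization can be sketched in a few lines of plain Python. GPTQ itself is considerably more sophisticated (it chooses quantized weights to minimize each layer's output error), so this is only the naive round-to-nearest version, with function names of my own choosing:

```python
def quantize_4bit(weights):
    """Naive symmetric 4-bit quantization: scale the weights into the
    signed 4-bit integer range [-8, 7] and round to the nearest step."""
    scale = max(abs(w) for w in weights) / 7 or 1.0  # guard against all-zero input
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the 4-bit integers back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 3.4, -3.5]
q, scale = quantize_4bit(weights)
approx = dequantize(q, scale)
# Each stored value now costs 4 bits instead of 16 for fp16 -- a 4x size
# reduction, at the price of a small rounding error per weight.
```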


Have a good day!



@RickyGoodlett

> Glad to help! Have a great day Ricky!


thanks, you too

@MuazAshraf

> I created a step by step approach that works. [...] The directories and some naming will be different but the steps should work.

Hi, I have completed all the steps, but I don't have a GPU. I installed all the requirements, but when I run python server.py and click the URL it prints, it says 127.0.0.1 refused to connect; and when I use the IP I'm hosting on instead, it says the IP is taking too long to respond.


@3dstudiodesign

I have worked for quite a while to solve an issue, to no avail. I am working on a Windows 11 machine with an NVIDIA GeForce RTX 3060, 64 GB RAM, a 2TB SSD, and 24 cores. I have read over 50 articles, issues, blogs, videos, etc., and can't find the answer to the problem.

After a lot of tweaks, I was down to the last line of the instructions -- spin up the server: python server.py. And boom... several errors in Gradio, including that gr.Box is deprecated. I have downgraded to lower versions of Gradio (no luck, just different errors) and have tried different versions of Python, Anaconda, Git, etc. (as found in different blogs, or on Hugging Face or GitHub), but I am still getting errors with Gradio. So, I removed the virtual environment, deleted the text-generation-webui directory, and am back to scratch.

Any suggestions on which tools are the best to use in a Windows environment? Or should I take the advice of Bing Co-Pilot and Jeremy Morgan and install WSL to run LLMs locally? Thanks for your advice here.
