Install HF Code Autocomplete VSCode plugin.
We are not going to set an API token. We are going to specify an API endpoint.
We will try to deploy that API ourselves, to use our own GPU to provide the code assistance.
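To make concrete what "deploying the API ourselves" means, here is a minimal sketch of the request/response contract, with a stand-in that echoes a canned completion instead of running the model. The request shape matches the curl smoke test later in this guide; the `generated_text` response field is an assumption of this sketch, and the real huggingface-vscode-endpoint-server serves an actual StarCoder pipeline.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class GenerateHandler(BaseHTTPRequestHandler):
    """Stand-in for the real endpoint: echoes a canned completion."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        prompt = body.get("inputs", "")
        # A real server would run the model here; "generated_text" as the
        # response field name is an assumption of this sketch.
        reply = json.dumps({"generated_text": prompt + "\n    pass"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):
        pass  # keep the sketch quiet

# Port 0 asks the OS for any free port; the real server listens on 8000.
server = HTTPServer(("localhost", 0), GenerateHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
print(f"listening on http://localhost:{server.server_address[1]}/api/generate/")
```

The VSCode plugin simply POSTs the editor's prefix as `inputs` and inserts whatever completion comes back, so anything speaking this shape can sit behind it.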
We will use bigcode/starcoder, a 15.5B param model.
We will use NF4 4-bit quantization to fit this into 10787MiB of VRAM. Unquantized, it would require 23767MiB (which still fits on a 4090's 24564MiB, but with far less headroom)!
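For reference, NF4 4-bit loading is typically requested from transformers + bitsandbytes with a config like the one below. This is a sketch of the usual pattern, not the endpoint server's exact code; the flags the server actually passes are in its own source.

```python
# Sketch: how NF4 4-bit loading is usually configured in transformers/bitsandbytes.
# (Assumption: the endpoint server wires up something equivalent internally.)
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 = 4-bit NormalFloat
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls still run in bf16
)

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder",
    quantization_config=quant_config,
    device_map="auto",  # place layers onto the available GPU(s)
)
```

Weights are stored 4-bit but dequantized to the compute dtype per layer, which is why the VRAM saving is large while quality stays close to bf16.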
All instructions are written assuming your command-line shell is bash.
Clone huggingface-vscode-endpoint-server repository:
git clone https://github.com/Birch-san/huggingface-vscode-endpoint-server.git
cd huggingface-vscode-endpoint-server
Create and activate a fresh Python environment, to avoid interfering with your current one (other Python scripts on your computer might not appreciate it if you update a bunch of packages they were relying on).
Follow the instructions for virtualenv, or conda, or neither (if you don't care what happens to other Python scripts on your computer).
Create environment:
python -m venv venv
Activate environment:
. ./venv/bin/activate
(First-time) update environment's pip:
pip install --upgrade pip
Download conda.
Skip this step if you already have conda.
Install conda:
Skip this step if you already have conda.
Assuming you're using a bash shell:
# Linux installs Anaconda via this shell script. Mac installs by running a .pkg installer.
bash Anaconda-latest-Linux-x86_64.sh
# this step probably works on both Linux and Mac.
eval "$(~/anaconda3/bin/conda shell.bash hook)"
conda config --set auto_activate_base false
conda init
Create environment:
conda create -n p311-code-api python=3.11
Activate environment:
conda activate p311-code-api
Ensure you have activated the environment you created above.
(Optional) treat yourself to latest nightly of PyTorch, with support for Python 3.11 and CUDA 12.1:
# CUDA
pip install --upgrade --pre torch --extra-index-url https://download.pytorch.org/whl/nightly/cu121
Install dependencies:
pip install -r requirements.txt
From root of huggingface-vscode-endpoint-server repository:
python -m main --model_name_or_path bigcode/starcoder --bf16
Error: bigcode/starcoder repository not found / "private repository"
If you get this error:
You'll need to accept the terms on the bigcode/starcoder model card.
If you haven't logged into the huggingface CLI before, you'll also need to do that, so that the server can authenticate as you and check whether you accepted the model card's terms.
Go to your Hugging Face tokens page (https://huggingface.co/settings/tokens) and create a new read-only token.
Copy the new token to your clipboard.
Run huggingface-cli login from your command prompt, and paste the token.
Try running main again.
Check that the API responds first, before we try to get VSCode working.
curl -X POST http://localhost:8000/api/generate/ -d '{"inputs": "", "parameters": {"max_new_tokens": 64}}'
If it works: we're ready to try it in VSCode.
Open the VSCode extension settings for starcoder:
Set your API endpoint as:
http://localhost:8000/api/generate
You may need to Reload Window, to initialize the HF Code Autocomplete, now that you have changed the settings (open command palette with Cmd+Shift+P, and type Reload Window):
Create a new empty text file. Set the language to Python.
Type:
def main():
Whilst it thinks: you should see a spinner in the status bar at the bottom of VSCode:
Starcoder should auto-complete it for you!
Press tab to accept the completion.
Open the Output tab of VSCode's tray, pick the "Hugging Face Code" dropdown option:
You should be able to see anything logged by the VSCode extension.