
June 28 2023. O frabjous day! After much fruitless, occasional faffing around with CMake settings over the course of several months, a new CUDA update for WSL Ubuntu means that I am now able to build the nearest neighbors library Faiss with GPU support for my 1080 laptop graphics card. Here is the setup that worked for me.

Why would you want to build your own Faiss? Well, if you are using pip to install faiss-gpu, you can't get anything more recent than version 1.7.2. Also, with the dependency management changes in newer Pythons (particularly Python 3.11), some packages are going to be playing catch-up for a while, so it may be useful to build from source. I don't remember exactly how bad it was trying to install faiss-gpu on Python 3.11 with pip, but it must have been fairly un-fun, because I gave up and scuttled back to the relatively welcoming environs of Python 3.10 pretty quickly after briefly dipping my toes in its waters. I think I saw this issue. Additionally, you may want a build with e.g. Intel OpenMP, MKL, or AVX2 support.

First you will need to carefully install CUDA for Ubuntu on WSL2: https://gist.github.com/jlmelville/d236b6eafb3067bfdac304274dc5cf83. This puts CUDA in /usr/local/cuda. Hopefully this is not a problem for anyone any more, but until CUDA 12.2 hit nvidia's wsl-ubuntu repos, building Faiss successfully eluded me. I could build Faiss without errors, but many tests would fail. The Python bindings would also build and run, but trying to get exact Euclidean nearest neighbors via IndexFlatL2 would return all zeros for the distances (and, unsurprisingly, the actual IDs of the neighbors were also wrong).

Hopefully those days are in the past. If you run /usr/local/cuda/bin/nvcc --version and get:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

you should be ok.
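It's also worth a quick sanity check that the driver can see the GPU from inside WSL at all:

nvidia-smi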

Also, to get the SWIG Python bindings working, I installed the python3-dev and python3-numpy packages for the system Python.
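On Ubuntu that is just:

sudo apt install python3-dev python3-numpy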

You probably also want the Intel MKL libraries installed; MKL made a much bigger difference for me (versus e.g. OpenBLAS) than having AVX2 on or not. However, getting MKL and AVX2 to work together was a bit of work. If you run into symbol lookup error: /lib/x86_64-linux-gnu/libmkl_intel_thread.so: undefined symbol: omp_get_num_procs, then installing libomp-dev may help (https://www.yodiw.com/setup-numpy-with-oneapi-mkl-in-ubuntu/). After that I got some other missing-symbol errors, which took me to https://bugs.launchpad.net/ubuntu/+source/intel-mkl/+bug/1947626, and I ended up doing things like:

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmkl_def.so:/usr/lib/x86_64-linux-gnu/libmkl_avx2.so:/usr/lib/x86_64-linux-gnu/libmkl_core.so:/usr/lib/x86_64-linux-gnu/libmkl_intel_lp64.so:/usr/lib/x86_64-linux-gnu/libmkl_intel_thread.so:/usr/lib/x86_64-linux-gnu/libiomp5.so

But it's also possible that I just really messed something up along the way and you won't run into that (or you only care about GPU support).
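For reference, the MKL libraries in the paths above come from Ubuntu's own packaging (as does the libomp-dev fix), so the install is something like the following; if you use Intel's oneAPI installers instead, the library paths will differ:

sudo apt install intel-mkl libomp-dev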

I then futzed about reading https://github.com/facebookresearch/faiss/blob/main/INSTALL.md, the CI settings, and also https://github.com/kyamagu/faiss-wheels/blob/main/scripts/build_Linux.sh to find the right settings.

From CMake 3.18 onwards it seems you need to specify the CUDA architecture. I looked at https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/ and https://developer.nvidia.com/cuda-gpus to find the right numbers for my GPU (a 1080, so compute capability 6.1). However, more recently I found that setting CMAKE_CUDA_ARCHITECTURES="native" did the right thing. See also: https://gitlab.kitware.com/cmake/cmake/-/issues/22375.
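If your CMake is too old for "native", you can ask the driver for the compute capability directly instead (this query needs a reasonably recent driver; a 1080 reports 6.1, i.e. sm_61):

nvidia-smi --query-gpu=compute_cap --format=csv,noheader

With the architecture question settled, this is the configure step that worked for me: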

cmake . -B build \
    -DFAISS_ENABLE_GPU=ON \
    -DFAISS_OPT_LEVEL=avx2 \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc \
    -DCMAKE_CUDA_ARCHITECTURES="native" \
    -DBUILD_TESTING=ON \
    -DFAISS_ENABLE_PYTHON=ON \
    -DCMAKE_POLICY_DEFAULT_CMP0135=NEW

You don't need to have testing on if you are sure everything is working, but everything was definitely not working on WSL for the past few months, so I recommend you keep it. You don't need CMAKE_POLICY_DEFAULT_CMP0135 either; it just shuts up a warning that is intended for the Faiss developers (and which may have been fixed by now, but of course I wouldn't know).

Then to build and run tests:

cmake --build build --config Release -j4 && make -C build test

A couple of tests still fail (51 - MEM_LEAK.ivfflat and 165 - TestGpuMemoryException.AddException), but that's still a lot better than the 33 tests that failed before the CUDA update.
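If you want to poke at just the failing tests rather than re-running the whole suite, ctest's name filter works from inside the build directory (the subshell is just to leave you in the repo root afterwards):

(cd build && ctest -R "MEM_LEAK|TestGpuMemoryException" --output-on-failure)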

Next, build the SWIG bindings:

make -C build -j swigfaiss

And then, after activating the virtual environment I wanted to install Faiss into:

cd ~/dev/faiss/build/faiss/python/  # or wherever you built faiss
python setup.py install
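Note that setup.py install prints deprecation warnings on newer Pythons; running pip from the same directory should amount to the same thing, although I haven't verified that myself:

pip install .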

I was then able to confirm that, unlike the faiss-gpu wheel, I had version 1.7.4:

import faiss
print(faiss.__version__)  # 1.7.4

and I got a log message Successfully loaded faiss with AVX2 support., whereas before I got Could not load library with AVX2 support due to: ModuleNotFoundError("No module named 'faiss.swigfaiss_avx2'"). Admittedly, most of the time I want the GPU rather than AVX2, and in the tests I ran I didn't see much difference between the AVX2 and non-AVX2 cases, but it's the principle of the thing.
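You can also check that Faiss can see the GPU at all before running anything heavier (get_num_gpus is part of the GPU build's Python API):

import faiss
print(faiss.get_num_gpus())  # should be >= 1 for a working GPU build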

Here's a basic CPU test, which I definitely took from the Faiss docs somewhere:

import faiss
import numpy as np

# Generate random data for indexing
d = 64     # dimensionality of the vectors
nb = 1000  # number of vectors
np.random.seed(0)
data = np.random.random((nb, d)).astype(np.float32)

index = faiss.IndexFlatL2(d)  # exact search with the L2 distance metric
index.add(data)
# 5 nearest neighbors of each vector; because we search the indexed data
# itself, each vector's nearest neighbor is itself at distance 0
distances, indices = index.search(data, 5)
print(distances[:2])
print(indices[:2])

And here is the equivalent GPU test:

import faiss
import numpy as np

# Generate the same random data as in the CPU test
d = 64     # dimensionality of the vectors
nb = 1000  # number of vectors
np.random.seed(0)
data = np.random.random((nb, d)).astype(np.float32)

res = faiss.StandardGpuResources()  # create GPU resources object
index = faiss.IndexFlatL2(d)  # L2 distance metric, built on the CPU
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)  # copy index to GPU 0
gpu_index.add(data)
# the search runs on the GPU; results come back as ordinary numpy arrays
distances, indices = gpu_index.search(data, 5)
print(distances[:2])
print(indices[:2])

You should get the same results as with the CPU version. If not, something has gone horribly wrong. As noted above, in my case it was a problem with CUDA for WSL itself.
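If you'd rather check the agreement programmatically than by eyeballing, something like this works; a minimal sketch, assuming both snippets above ran in the same session (index_cpu_to_gpu copies the index, so the CPU-side one is still empty):

# Fill the CPU-side index too, then compare against the GPU results.
index.add(data)
cpu_distances, cpu_indices = index.search(data, 5)
np.testing.assert_array_equal(indices, cpu_indices)
np.testing.assert_allclose(distances, cpu_distances, rtol=1e-5)
print("CPU and GPU results agree")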
