What the source says
Background Smart Search jobs encode each asset once and save the embedding:
- server/src/services/smart-info.service.ts
- handleEncodeClip(...) calls machineLearningRepository.encodeImage(...)
- then stores it with searchRepository.upsert(...)
So the library is indexed ahead of time.
A text search does more than query the DB.
- server/src/services/search.service.ts
- searchSmart(...) calls machineLearningRepository.encodeText(dto.query, ...)
- unless that exact query embedding is already in the in-memory embeddingCache
So for a text query, Immich first turns your search text into a CLIP embedding at query time.
- server/src/repositories/machine-learning.repository.ts
- encodeText(...) sends a /predict request with ModelTask.SEARCH + ModelType.TEXTUAL
- machine-learning/immich_ml/main.py
- /predict accepts text
- runs inference via run_inference(...)
- machine-learning/immich_ml/models/clip/textual.py
- the textual CLIP encoder supports ONNX Runtime execution providers for hardware acceleration:
- CUDAExecutionProvider
- ROCMExecutionProvider
- OpenVINOExecutionProvider
So if you have ML hardware acceleration enabled, a live text search can use it for the text encoder.
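To illustrate what "can use it" means here, below is a minimal sketch of picking an execution provider with a CPU fallback. ONNX Runtime itself accepts an ordered list of providers. The helper pickProvider and the preference order are assumptions made for this example and are not taken from the Immich source.

```typescript
// Provider names as listed above, plus ONNX Runtime's CPU provider as the fallback.
// The preference order is an assumption for illustration only.
const PREFERRED_PROVIDERS = [
  "CUDAExecutionProvider",
  "ROCMExecutionProvider",
  "OpenVINOExecutionProvider",
  "CPUExecutionProvider",
] as const;

type Provider = (typeof PREFERRED_PROVIDERS)[number];

// Hypothetical helper: return the first preferred provider that the host reports
// as available, falling back to CPU when no accelerator is present.
function pickProvider(available: readonly string[]): Provider {
  for (const provider of PREFERRED_PROVIDERS) {
    if (available.includes(provider)) {
      return provider;
    }
  }
  return "CPUExecutionProvider";
}

const onGpuHost = pickProvider(["CPUExecutionProvider", "CUDAExecutionProvider"]);
const onCpuOnlyHost = pickProvider(["CPUExecutionProvider"]);
console.log(onGpuHost, onCpuOnlyHost);
```

On a host without any accelerator the text encoder simply runs on the CPU provider, which is why search keeps working without a GPU.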
- server/src/queries/search.repository.sql
- smart search ranks results with:
- order by smart_search.embedding <=> $6
That means the DB compares the query vector against the stored image vectors: <=> is the cosine-distance operator, so the closest matches sort first.
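As a rough illustration of that ordering, here is a brute-force sketch of ranking by cosine distance (1 minus cosine similarity), which is what pgvector-style <=> computes. The toy vectors and ids are made up, and a real database would normally use a vector index rather than scanning every row.

```typescript
// Cosine distance: 1 - (a . b) / (|a| * |b|). Smaller means more similar.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("dimension mismatch");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical toy data standing in for the query embedding and stored image embeddings.
const queryEmbedding = [1, 0];
const storedRows = [
  { id: "b", embedding: [0, 1] },
  { id: "c", embedding: [1, 1] },
  { id: "a", embedding: [1, 0] },
];

// Equivalent in spirit to: ORDER BY embedding <=> query
const ranked = storedRows
  .map((row) => ({ id: row.id, distance: cosineDistance(queryEmbedding, row.embedding) }))
  .sort((x, y) => x.distance - y.distance);

console.log(ranked.map((r) => r.id));
```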
Bottom line
The most accurate description is:
- Indexing phase: ML encodes images once and stores embeddings.
- Live text search phase: ML encodes the search text into an embedding.
- Retrieval phase: Postgres/VectorChord does the vector similarity search.
So the comment saying “smart search just uses the index; there’s little processing once the index is there” is only partly true.
It’s true that image embeddings are precomputed.
It’s not true that live text search is DB-only: Immich still runs the textual CLIP model for text queries, unless that exact query is cached.
Why some people see “zero delay” on CPU
That can still be true in practice because:
- the text model may already be loaded
- the query embedding may be cached
- the chosen CLIP model may be small/fast
- the DB search itself can be quick
There’s even an in-process query embedding cache in:
- server/src/services/search.service.ts
- private embeddingCache = new LRUMap<string, string>(100);
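The effect of such a cache can be sketched as follows. This is not Immich's code: TinyLru is a hand-rolled stand-in for LRUMap, fakeEncodeText is a hypothetical synchronous stub for the ML call, and keying the cache by the raw query string alone is an assumption.

```typescript
// Minimal LRU built on Map, which preserves insertion order.
class TinyLru<K, V> {
  private readonly entries = new Map<K, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.entries.has(key)) {
      return undefined;
    }
    const value = this.entries.get(key) as V;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as K;
      this.entries.delete(oldest);
    }
  }
}

let encoderCalls = 0;

// Hypothetical stand-in for the text encoder; returns a fake serialized embedding.
function fakeEncodeText(query: string): string {
  encoderCalls++;
  return JSON.stringify([query.length, 0, 0]);
}

const embeddingCache = new TinyLru<string, string>(100);

function getQueryEmbedding(query: string): string {
  const cached = embeddingCache.get(query);
  if (cached !== undefined) {
    return cached; // cache hit: no ML call
  }
  const embedding = fakeEncodeText(query);
  embeddingCache.set(query, embedding);
  return embedding;
}

getQueryEmbedding("beach sunset"); // miss, calls the encoder
getQueryEmbedding("beach sunset"); // hit, encoder not called
getQueryEmbedding("dog"); // miss, calls the encoder
console.log(encoderCalls);
```

Repeating the same query skips the encoder entirely, which is one reason a repeated search can feel instant even on CPU.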
One important nuance
If the search uses queryAssetId instead of text, Immich reuses the stored asset embedding from the DB, so that path does not need live text encoding.
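The two paths can be sketched like this. The names resolveQueryEmbedding, storedEmbeddings and fakeEncodeText are hypothetical, the in-memory Map stands in for the database, and giving queryAssetId precedence over query is an assumption of this example.

```typescript
type SmartSearchInput = { query?: string; queryAssetId?: string };

// Hypothetical in-memory stand-in for the stored asset embeddings.
const storedEmbeddings = new Map<string, number[]>([["asset-1", [0.1, 0.9]]]);

let textEncoderCalls = 0;

// Hypothetical stand-in for the ML text encoder.
function fakeEncodeText(query: string): number[] {
  textEncoderCalls++;
  return [query.length, 1];
}

function resolveQueryEmbedding(input: SmartSearchInput): number[] {
  if (input.queryAssetId !== undefined) {
    // Asset-based search: reuse the embedding that indexing already stored.
    const embedding = storedEmbeddings.get(input.queryAssetId);
    if (embedding === undefined) {
      throw new Error(`no embedding stored for ${input.queryAssetId}`);
    }
    return embedding;
  }
  if (input.query !== undefined) {
    // Text search: the query must be encoded at search time.
    return fakeEncodeText(input.query);
  }
  throw new Error("either query or queryAssetId is required");
}

const fromAsset = resolveQueryEmbedding({ queryAssetId: "asset-1" });
const callsAfterAssetSearch = textEncoderCalls;
const fromText = resolveQueryEmbedding({ query: "red car" });
console.log(fromAsset, callsAfterAssetSearch, fromText, textEncoderCalls);
```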
Docs that also confirm this
- docs/docs/features/searching.md
- says larger CLIP models are slower both when indexing and when searching
- docs/docs/administration/system-settings.md
- says a more powerful GPU can help with batch imports or for faster search
- docs/docs/features/ml-hardware-acceleration.md
- explicitly says to check GPU provider logs when a Smart Search job starts or when you search with text in Immich
- same doc also notes:
- ARM NN does not improve search latency
- but Smart Search jobs still use ARM NN
Answer to the original “no GPU / no iGPU” question
Yes, Immich will still work on CPU only. GPU/iGPU is optional acceleration, not a requirement.
What you should expect on CPU-only:
- Smart Search indexing: slower
- Face recognition: slower
- Live text search: still works; latency is usually acceptable but depends on the size of the CLIP model
- Video transcoding: falls back to CPU encoding, which can cause spikes in CPU usage