To adapt the previous implementation so that Qdrant handles hybrid search (vector + keyword) while Neo4j handles Cypher queries, we’ll move the search step to Qdrant and use Neo4j solely for graph-based retrieval. The code below assumes you have a Qdrant instance running and a Neo4j database populated with graph data. We’ll call the Qdrant REST API via clj-http for simplicity, as there’s no official Clojure client for Qdrant (though you could use a gRPC client if preferred).
- Qdrant Instance: Running locally (e.g., http://localhost:6333) or on a remote server, with a collection set up for hybrid search (vector embeddings + text payloads).
- Neo4j Database: Populated and reachable over HTTP (e.g., http://localhost:7474). The code below uses neocons’ REST namespace (clojurewerkz.neocons.rest), which talks to Neo4j’s legacy REST endpoint (available up to Neo4j 3.x) rather than Bolt.
- Clojure Dependencies: Add clj-http and cheshire (for JSON parsing) to your project.clj:

:dependencies [[org.clojure/clojure "1.11.1"]
               [clj-http "3.12.3"]                ;; For HTTP requests to Qdrant
               [cheshire "5.11.0"]                ;; For JSON parsing
               [com.novemberain/neocons "3.3.0"]] ;; Neo4j client (REST namespaces used below)

- Qdrant Setup: You have a Qdrant collection (e.g., chunks) with:
  - Vector embeddings (e.g., 1536 dimensions) indexed for similarity search.
  - A payload field, text, for keyword search.
- Neo4j Schema: Contains nodes like Chunk (with a text property) connected to other nodes (e.g., Entity via relationships).
- Embedding Model: You have a way to generate embeddings for your query (e.g., via an external API like Ollama or OpenAI).
(ns graphrag.core
  (:require [clojurewerkz.neocons.rest :as nr]
            [clojurewerkz.neocons.rest.cypher :as cy]
            [clj-http.client :as http]
            [cheshire.core :as json]
            [clojure.string :as str]))

;; Neo4j connection. clojurewerkz.neocons.rest is an HTTP client, so it needs
;; the REST endpoint (Neo4j 3.x and earlier), not a bolt:// URL.
(def neo4j-conn
  (nr/connect "http://neo4j:password@localhost:7474/db/data/"))

;; Qdrant base URL
(def qdrant-url "http://localhost:6333")
(def qdrant-collection "chunks") ;; Replace with your collection name

Replace the Neo4j credentials and Qdrant URL/collection as needed.
Here, hybrid search is assembled from two Qdrant requests: the /points/search endpoint for vector similarity and the /points/scroll endpoint with a payload filter for keyword matching. We’ll run both and combine the results.
(defn hybrid-search [query top-k]
  (let [;; Placeholder embedding for the query (replace with a real embedding)
        embedding (repeat 1536 0.1) ;; Generate this with an embedding model
        ;; Vector search request
        vector-req {:vector embedding
                    :limit top-k
                    :with_payload true}
        vector-resp (http/post (str qdrant-url "/collections/" qdrant-collection "/points/search")
                               {:body (json/generate-string vector-req)
                                :headers {"Content-Type" "application/json"}
                                :as :json})
        ;; /points/search returns its scored points directly under :result
        vector-results (-> vector-resp :body :result)
        ;; Keyword search request using scroll with a full-text filter.
        ;; {:match {:text ...}} matches on the words of the query (best with a
        ;; full-text index on "text"); {:match {:value ...}} would only match
        ;; payloads equal to the whole query string.
        keyword-req {:filter {:must [{:key "text"
                                      :match {:text query}}]}
                     :limit top-k
                     :with_payload true}
        keyword-resp (http/post (str qdrant-url "/collections/" qdrant-collection "/points/scroll")
                                {:body (json/generate-string keyword-req)
                                 :headers {"Content-Type" "application/json"}
                                 :as :json})
        ;; /points/scroll nests its points under :result :points
        keyword-results (-> keyword-resp :body :result :points)]
    {:vector (map #(hash-map :text (get-in % [:payload :text])
                             :score (:score %))
                  vector-results)
     :keyword (map #(hash-map :text (get-in % [:payload :text])
                              :score 1.0) ;; No score from scroll, assign default
                   keyword-results)}))

Notes:
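- Embedding: The embedding is a dummy vector. Replace it with an actual embedding generated for query (e.g., via an API call to Ollama).
- Vector Search: Uses Qdrant’s /points/search endpoint for semantic similarity.
- Keyword Search: Uses /points/scroll with a match filter on the text payload field. For word-level matching, the filter needs the text variant ({:match {:text ...}}) and ideally a full-text index on the field; the value variant only matches when the payload equals the query string exactly. Multi-word queries match only points containing every word, so long natural-language questions may return few hits. Scroll doesn’t return a score, so we assign a default score of 1.0.
- Payload: Assumes each point in Qdrant has a text field in its payload.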
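Since the two endpoints wrap their hits differently, it can help to isolate the unwrapping in small pure functions that are easy to check at the REPL. This is a minimal sketch, assuming the response bodies were parsed with keyword keys (as :as :json does in clj-http). The names ->hit, search-hits, and scroll-hits are hypothetical helpers, not part of any library, and the sample bodies are made up to mirror the response shapes.

```clojure
(defn ->hit
  "Normalizes a Qdrant point to {:text ... :score ...}.
  default-score is used when the point carries no :score (scroll results)."
  [default-score point]
  {:text  (get-in point [:payload :text])
   :score (get point :score default-score)})

(defn search-hits
  "Hits from a parsed /points/search body: :result is a list of scored points."
  [body]
  (map #(->hit 0.0 %) (:result body)))

(defn scroll-hits
  "Hits from a parsed /points/scroll body: points sit under [:result :points]."
  [body]
  (map #(->hit 1.0 %) (get-in body [:result :points])))

;; Made-up bodies shaped like the two responses.
(def sample-search
  {:result [{:id 1 :score 0.875 :payload {:text "Enzymes speed up reactions."}}]})

(def sample-scroll
  {:result {:points [{:id 2 :payload {:text "Enzymes are proteins."}}]
            :next_page_offset nil}})

(search-hits sample-search)
(scroll-hits sample-scroll)
```

With helpers like these, hybrid-search would only need to pass (:body vector-resp) and (:body keyword-resp) through them.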
The Cypher retrieval step remains largely unchanged, querying Neo4j based on the texts returned by Qdrant:
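(defn cypher-retrieval [conn texts]
  (let [cypher-query (str "MATCH (c:Chunk)-[r]->(e:Entity) "
                          "WHERE c.text IN $texts "
                          "RETURN c.text AS chunk_text, r, e.name AS entity_name")
        ;; tquery returns one map per row, keyed by column-name strings
        results (cy/tquery conn cypher-query {:texts texts})]
    ;; Keywordize the column names so callers can use (:chunk_text row)
    (map #(update-keys % keyword) results)))

Adjust the Cypher query to match your Neo4j schema.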
(defn hybrid-cypher-retrieval [conn query top-k]
  (let [;; Step 1: Perform hybrid search with Qdrant
        hybrid-results (hybrid-search query top-k)
        vector-texts (map :text (:vector hybrid-results))
        keyword-texts (map :text (:keyword hybrid-results))
        all-texts (distinct (concat vector-texts keyword-texts))
        ;; Step 2: Fetch related graph data with Neo4j Cypher
        cypher-results (cypher-retrieval conn all-texts)
        ;; Step 3: Merge and rank results
        merged-results (map (fn [result]
                              (let [text (:chunk_text result)
                                    vector-score (or (some #(when (= (:text %) text) (:score %))
                                                           (:vector hybrid-results))
                                                     0)
                                    keyword-score (or (some #(when (= (:text %) text) (:score %))
                                                            (:keyword hybrid-results))
                                                      0)]
                                {:text text
                                 :entity (:entity_name result)
                                 :combined-score (+ vector-score keyword-score)}))
                            cypher-results)]
    ;; Sort by combined score and take top-k
    (take top-k (sort-by :combined-score > merged-results))))

(defn -main []
  (let [query "What is the role of enzymes in biology?"
        top-k 5
        results (hybrid-cypher-retrieval neo4j-conn query top-k)]
    (doseq [result results]
      (println (str "Text: " (:text result)
                    ", Entity: " (:entity result)
                    ", Score: " (:combined-score result))))))

;; Run the main function
(-main)
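The merge step above scans both hit lists once per Cypher row. As an alternative, the scores can be indexed by text first, which keeps the merge a pure function that can be tried without Qdrant or Neo4j running. This is only a sketch: score-index and merge-scores are hypothetical names, the rows are assumed to carry :chunk_text and :entity_name keyword keys, and unlike the some-based lookup it keeps the highest score when a text appears more than once.

```clojure
(defn score-index
  "Builds a text -> score map from a seq of {:text ... :score ...} hits.
  If a text appears more than once, the highest score wins."
  [hits]
  (reduce (fn [acc {:keys [text score]}]
            (update acc text (fnil max score) score))
          {}
          hits))

(defn merge-scores
  "Attaches a combined score to each Cypher row, then returns the top-k rows."
  [hybrid-results cypher-rows top-k]
  (let [vector-scores  (score-index (:vector hybrid-results))
        keyword-scores (score-index (:keyword hybrid-results))]
    (->> cypher-rows
         (map (fn [{:keys [chunk_text entity_name]}]
                {:text chunk_text
                 :entity entity_name
                 ;; Missing texts contribute 0 to the sum
                 :combined-score (+ (get vector-scores chunk_text 0)
                                    (get keyword-scores chunk_text 0))}))
         (sort-by :combined-score >)
         (take top-k))))

;; Made-up inputs for a quick REPL check.
(def sample-hybrid
  {:vector  [{:text "a" :score 0.75} {:text "b" :score 0.25}]
   :keyword [{:text "b" :score 1.0}]})

(def sample-rows
  [{:chunk_text "a" :entity_name "E1"}
   {:chunk_text "b" :entity_name "E2"}])

(merge-scores sample-hybrid sample-rows 5)
```

In this sample, "b" ranks first because it appears in both lists (0.25 + 1.0), ahead of "a" with 0.75 from the vector list alone.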
- Qdrant for Hybrid Search:
  - Replaced Neo4j vector and full-text index queries with Qdrant’s REST API calls.
  - Vector search uses /points/search for similarity based on embeddings.
  - Keyword search uses /points/scroll with a filter to match the text payload field.
- Neo4j for Cypher Only:
  - Neo4j is now used solely for graph traversal and relationship queries, not for search.
- Result Structure:
  - Qdrant returns results with :payload (containing text) and :score (for vector search). We map these into a consistent format for merging with Cypher results.
- Qdrant Configuration:
  - Ensure your Qdrant collection (chunks) has vectors and a text payload field. Create it if needed:

    curl -X PUT http://localhost:6333/collections/chunks \
      -H "Content-Type: application/json" \
      -d '{"vectors": {"size": 1536, "distance": "Cosine"}}'

  - Populate it with points containing embeddings and text payloads.
- Embedding Generation:
  - Replace the dummy embedding in hybrid-search with a real one. For example, if using Ollama:

    (defn generate-embedding [text]
      (let [resp (http/post "http://localhost:11434/api/embeddings" ;; Ollama API
                            {:body (json/generate-string {:model "nomic-embed-text" :prompt text})
                             :headers {"Content-Type" "application/json"}
                             :as :json})]
        (-> resp :body :embedding)))

    Then update hybrid-search to use (generate-embedding query).
  - The collection’s vector size must equal the model’s output dimension. nomic-embed-text produces 768-dimensional vectors, so either create the collection with "size": 768 or use a model that outputs 1536 dimensions.
- Neo4j Schema:
  - Ensure the text values in Qdrant match the text properties in Neo4j Chunk nodes exactly; the Cypher query joins on string equality (c.text IN $texts), so any difference in whitespace or casing drops the row.
- Scoring:
  - The keyword search score is hardcoded to 1.0. You could enhance this by implementing a custom scoring mechanism (e.g., TF-IDF) or using Qdrant’s experimental hybrid search features if available in your version.
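One way to avoid the hardcoded keyword score altogether is to combine the two lists by rank instead of by raw score. The sketch below uses reciprocal rank fusion (RRF), which is a swapped-in technique and not part of the code above: each text scores the sum of 1/(k + rank) over the lists it appears in. The names rrf-scores and fuse are hypothetical, and k = 60 is just a commonly used default.

```clojure
(defn rrf-scores
  "Reciprocal rank fusion over several ranked lists of texts.
  Returns a map text -> summed 1/(k + rank), with rank starting at 1."
  ([ranked-lists] (rrf-scores ranked-lists 60))
  ([ranked-lists k]
   (apply merge-with +
          (for [texts ranked-lists]
            (into {}
                  ;; i is 0-based, so rank = i + 1
                  (map-indexed (fn [i text] [text (/ 1.0 (+ k i 1))]))
                  (distinct texts))))))

(defn fuse
  "Fuses the :vector and :keyword hit lists of a hybrid-search result by rank
  and returns the top-k texts with their RRF scores, best first."
  [hybrid-results top-k]
  (->> (rrf-scores [(map :text (:vector hybrid-results))
                    (map :text (:keyword hybrid-results))])
       (sort-by val >)
       (take top-k)
       (map (fn [[text score]] {:text text :rrf-score score}))))

;; Made-up result lists, already ordered best-first within each list.
(def sample-results
  {:vector  [{:text "a" :score 0.9} {:text "b" :score 0.4}]
   :keyword [{:text "b" :score 1.0} {:text "c" :score 1.0}]})

(fuse sample-results 3)
```

Because only ranks matter, the constant 1.0 from scroll no longer dominates the cosine scores; a text found by both searches (here "b") rises above texts found by only one.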
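- Start your Qdrant instance (e.g., via Docker: docker run -p 6333:6333 qdrant/qdrant).
- Start your Neo4j instance and make sure its HTTP endpoint is reachable (port 7474 by default); the neocons REST client used above connects over HTTP rather than Bolt.
- Run lein repl and load the namespace. Loading it already runs the query once through the top-level (-main) call.
- Call -main again to rerun the query.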
This setup leverages Qdrant’s strengths for hybrid search and Neo4j’s graph capabilities for Cypher-based retrieval, all within Clojure. Let me know if you need further adjustments!