Last active
October 9, 2023 13:36
Question/Answering w/ Chroma | Embeddings Retrieval
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"provenance": [], | |
"authorship_tag": "ABX9TyMjub/jevXvCqoI4i3BHYUB", | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
}, | |
"language_info": { | |
"name": "python" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/Daethyra/6d38c57fe0bcadfd303fb49f95db2177/langchain-qa-over-local-docs.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"# Question/Answering Over Locally Hosted Documents\n", | |
"\n", | |
"## Core question:\n", | |
" * By the end of this notebook we should be able to answer the following:\n", | |
" * \"What's the best way to use LangChain to load documents and be able to answer questions over them?\"\n", | |
"\n", | |
"Resources:\n", | |
"* [Chat with LangChain's documentation](https://chat.langchain.com/)\n", | |
"* [Question/Answering Documentation](https://python.langchain.com/docs/integrations/document_loaders/unstructured_file)\n", | |
"* [Question/Answering Jupyter notebook by LangChain](https://python.langchain.com/docs/use_cases/question_answering.html)\n", | |
"* [GitHub repository for building chatbots](https://github.com/langchain-ai/chat-langchain)" | |
], | |
"metadata": { | |
"id": "7a9Dm7OC-p0i" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"- I asked LangChain's documentation chatbot how to build this project! With it as my search engine, I was able to build this entire notebook." | |
], | |
"metadata": { | |
"id": "7fkfkVkX_ed-" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"" | |
], | |
"metadata": { | |
"id": "aqrN21a2_Rmz" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"### Let's first focus on answering our objective question.\n", | |
"\n", | |
"#### Answer:\n", | |
"\n", | |
"To use LangChain for QA over local docs, we can use the UnstructuredFileLoader class from the `document_loader` module. With that we can specify the directory of our document files." | |
], | |
"metadata": { | |
"id": "ENJP6WcM_5r6" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# install dependencies\n", | |
"!pip install -qU langchain \"unstructured[all-docs]\" python-dotenv\n", | |
"!pip install -qU openai chromadb tiktoken" | |
], | |
"metadata": { | |
"id": "Dme34TGlAdky" | |
}, | |
"execution_count": 12, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"### Getting Started\n", | |
"\n", | |
"#### To emulate a scenario where the documents are stored locally, let's [click here to download an example document](https://arxiv.org/pdf/2310.03562 \"One click download\") then drag and drop it from your computer's downloads folder into the files tab\n", | |
"\n", | |
"Note: I archived the page I accessed for transparency. [Click to see the arXiv page that I saw.](https://web.archive.org/web/20230930123356/https://arxiv.org/list/astro-ph.IM/recent)" | |
], | |
"metadata": { | |
"id": "shzxvxrUPN7K" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Initialize the file loader\n", | |
"\n", | |
"from langchain.document_loaders import UnstructuredFileLoader\n", | |
"from unstructured.cleaners.core import clean_extra_whitespace\n", | |
"\n", | |
"loader = UnstructuredFileLoader(\n", | |
" # Specify the document(s)\n", | |
" \"./sample_data/2310.03562.pdf\",\n", | |
" mode=\"elements\",\n", | |
" post_processors=[clean_extra_whitespace],\n", | |
")" | |
], | |
"metadata": { | |
"id": "sld9fdIxB73u" | |
}, | |
"execution_count": 34, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Load the data\n", | |
"docs = loader.load()" | |
], | |
"metadata": { | |
"id": "NU-KuD3YE6kp" | |
}, | |
"execution_count": 36, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Check for results\n", | |
"docs[0]" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "eJWINzaqKb4c", | |
"outputId": "5994d72a-0b4a-4246-d23f-22a45f3bf2de" | |
}, | |
"execution_count": 37, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"Document(page_content='Astronomy & Astrophysics manuscript no. seapipe October 6, 2023', metadata={'source': './sample_data/2310.03562.pdf', 'coordinates': {'points': ((40.238, 36.98040000000003), (40.238, 60.073846799999956), (243.85418180000005, 60.073846799999956), (243.85418180000005, 36.98040000000003)), 'system': 'PixelSpace', 'layout_width': 595.276, 'layout_height': 841.89}, 'filename': '2310.03562.pdf', 'file_directory': './sample_data', 'last_modified': '2023-10-07T05:06:23', 'filetype': 'application/pdf', 'page_number': 1, 'links': [], 'category': 'NarrativeText'})" | |
] | |
}, | |
"metadata": {}, | |
"execution_count": 37 | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"### We could instead load data using a URL.\n", | |
"\n", | |
"##### **DO NOT** execute the following cell if you'd like to keep the local data loaded.\n", | |
"\n", | |
"Note: The `OnlinePDFLoader` isn't nearly as robust as the unstructured loader from the `unstructured` library. Check out their [documentation.](https://unstructured-io.github.io/unstructured/)" | |
], | |
"metadata": { | |
"id": "i6IOPN9YFnmO" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"from langchain.document_loaders import OnlinePDFLoader\n", | |
"\n", | |
"# Replace the instantiated UnstructuredFileLoader\n", | |
"loader = OnlinePDFLoader(\"https://arxiv.org/pdf/2302.03803.pdf\") # Same file as before\n", | |
"\n", | |
"# Reinstantiate the loaded docs\n", | |
"docs = loader.load()\n", | |
"\n", | |
"# Check for results\n", | |
"docs[0]" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "Xo_Zecs3Fm5G", | |
"outputId": "a81e0b3c-a68d-4f89-e402-f7a0129719e4" | |
}, | |
"execution_count": 38, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"Document(page_content='A WEAK (k, k)-LEFSCHETZ THEOREM FOR PROJECTIVE TORIC ORBIFOLDS\\n\\nWilliam D. Montoya\\n\\n3 2 0 2\\n\\nb e F 7\\n\\nInstituto de Matem´atica, Estat´ıstica e Computa¸c˜ao Cient´ıfica, Universidade Estadual de Campinas (UNICAMP),\\n\\nRua S´ergio Buarque de Holanda 651, 13083-859, Campinas, SP, Brazil\\n\\n]\\n\\nG A . h t a m\\n\\nFebruary 9, 2023\\n\\nAbstract\\n\\n[\\n\\n1 v 3 0 8 3 0 . 2 0 3 2 : v i X r a\\n\\n1\\n\\nFirstly we show a generalization of the (1, 1)-Lefschetz theorem for projective toric orbifolds and secondly we prove that on 2k-dimensional quasi-smooth hyper- surfaces coming from quasi-smooth intersection surfaces, under the Cayley trick, every rational (k, k)-cohomology class is algebraic, i.e., the Hodge conjecture holds on them.\\n\\nIntroduction\\n\\nIn [3] we proved that, under suitable conditions, on a very general codimension s quasi- smooth intersection subvariety X in a projective toric orbifold Pd Σ with d + s = 2(k + 1) the Hodge conjecture holds, that is, every (p, p)-cohomology class, under the Poincar´e duality is a rational linear combination of fundamental classes of algebraic subvarieties of X. The proof of the above-mentioned result relies, for p ≠ d + 1 − s, on a Lefschetz\\n\\nDate: February 9, 2023 2020 Mathematics Subject Classification: 14C30, 14M10, 14J70, 14M25 Keywords: (1,1)- Lefschetz theorem, Hodge conjecture, toric varieties, complete intersection Email: [email protected]\\n\\n1\\n\\nsions. I also acknowledge support from FAPESP postdoctoral grant No. 2019/23499-7.\\n\\n2 Preliminaries and Notation\\n\\n2.1 Toric varieties\\n\\nLet M be a free abelian group of rank d, let N = Hom(M, Z), and NR = N ⊗Z R.\\n\\nDefinition 2.1.\\n\\nA convex subset σ ⊂ NR is a rational k-dimensional simplicial cone if there exist k linearly independent primitive elements e1, . . . 
, ek ∈ N such that σ = {µ1e1 + ⋯ + µkek}.\\n\\nThe generators ei are integral if for every i and any nonnegative rational number µ the product µei is in N only if µ is an integer.\\n\\nThe generators ei are integral if for every i and any nonnegative rational number µ the product µei is in N only if µ is an integer.\\n\\nGiven two rational simplicial cones σ, σ′ one says that σ′ is a face of σ (σ′ < σ) if the set of integral generators of σ′ is a subset of the set of integral generators of σ.\\n\\nA finite set Σ = {σ1, . . . , σt} of rational simplicial cones is called a rational simplicial complete d-dimensional fan if:\\n\\nA finite set Σ = {σ1, . . . , σt} of rational simplicial cones is called a rational simplicial complete d-dimensional fan if:\\n\\n1. all faces of cones in Σ are in Σ;\\n\\n2. if σ, σ′ ∈ Σ then σ ∩ σ′ < σ and σ ∩ σ′ < σ′;\\n\\n3. NR = σ1 ∪ ⋅ ⋅ ⋅ ∪ σt.\\n\\nA rational simplicial complete d-dimensional fan Σ defines a d-dimensional toric variety Σ having only orbifold singularities which we assume to be projective. Moreover, T ∶= Pd N ⊗Z C∗ ≃ (C∗)d is the torus action on Pd Σ. We denote by Σ(i) the i-dimensional cones\\n\\n2\\n\\nof Σ and each ρ ∈ Σ corresponds to an irreducible T -invariant Weil divisor Dρ on Pd Cl(Σ) be the group of Weil divisors on Pd\\n\\nΣ module rational equivalences.\\n\\nΣ. Let\\n\\nCl(Σ)-grading, a Weil divisor D = ∑ρ∈Σ(1) uρDρ determines the monomial xu ∶= ∏ρ∈Σ(1) xuρ S and conversely deg(xu) = [D] ∈ Cl(Σ).\\n\\nThe total coordinate ring of Pd\\n\\nΣ is the polynomial ring S = C[xρ ∣ ρ ∈ Σ(1)], S has the ρ ∈\\n\\nFor a cone σ ∈ Σ, ˆσ is the set of 1-dimensional cone in Σ that are not contained in σ\\n\\nand xˆσ ∶= ∏ρ∈ˆσ xρ is the associated monomial in S.\\n\\nDefinition 2.2. The irrelevant ideal of Pd the zero locus Z(Σ) ∶= V(BΣ) in the affine space Ad ∶= Spec(S) is the irrelevant locus.\\n\\nΣ is the monomial ideal BΣ ∶=< xˆσ ∣ σ ∈ Σ > and\\n\\nProposition 2.3 (Theorem 5.1.11 [5]). 
The toric variety Pd Σ is a categorical quotient Ad ∖ Z(Σ) by the group Hom(Cl(Σ), C∗) and the group action is induced by the Cl(Σ)- grading of S.\\n\\n2.2 Orbifolds\\n\\nNow we give a brief introduction to complex orbifolds and we mention the needed theorems for the next section. Namely: de Rham theorem and Dolbeault theorem for complex orbifolds.\\n\\nDefinition 2.4. A complex orbifold of complex dimension d is a singular complex space whose singularities are locally isomorphic to quotient singularities Cd/G, for finite sub- groups G ⊂ Gl(d, C).\\n\\nDefinition 2.5. A differential form on a complex orbifold Z is defined locally at z ∈ Z as a G-invariant differential form on Cd where G ⊂ Gl(d, C) and Z is locally isomorphic to Cd/G around z.\\n\\nRoughly speaking the local geometry of orbifolds reduces to local G-invariant geometry. We have a complex of differential forms (A●(Z), d) and a double complex (A●,●(Z), ∂, ¯∂) of bigraded differential forms which define the de Rham and the Dolbeault cohomology groups (for a fixed p ∈ N) respectively:\\n\\nH ●\\n\\ndR(Z, C) ∶=\\n\\nker d im d\\n\\nand H p,●(Z, ¯∂) ∶=\\n\\nker ¯∂ im ¯∂\\n\\nTheorem 2.6 (Theorem 3.4.4 in [4] and Theorem 1.2 in [1] ). Let Z be a compact complex orbifold. There are natural isomorphisms:\\n\\n3\\n\\ndR(Z, C) ≃ H ●(Z, C)\\n\\nH p,●(Z, ¯∂) ≃ H ●(X, Ωp Z )\\n\\nH p,●(Z, ¯∂) ≃ H ●(X, Ωp Z )\\n\\n3\\n\\n(1,1)-Lefschetz theorem for projective toric orbifolds\\n\\nDefinition 3.1. A subvariety X ⊂ Pd Z(Σ).\\n\\nΣ is quasi-smooth if V(IX ) ⊂ A#Σ(1) is smooth outside\\n\\nExample 3.2. Quasi-smooth hypersurfaces or more generally quasi-smooth intersection sub- varieties are quasi-smooth subvarieties (see [2] or [7] for more details).\\n\\n△\\n\\nRemark 3.3. Quasi-smooth subvarieties are suborbifolds of Pd Σ in the sense of Satake in [8]. Intuitively speaking they are subvarieties whose only singularities come from the ambient space.\\n\\n△\\n\\nTheorem 3.4. 
Let X ⊂ Pd class λ ∈ H 1,1(X) ∩ H 2(X, Z) is algebraic\\n\\nΣ be a quasi-smooth subvariety. Then every (1, 1)-cohomology\\n\\nProof. From the exponential short exact sequence\\n\\n0 → Z → OX → O∗ X\\n\\n→ 0\\n\\nwe have a long exact sequence in cohomology\\n\\nH 1(O∗\\n\\nX ) → H 2(X, Z) → H 2(OX ) ≃ H 0,2(X)\\n\\nwhere the last isomorphisms is due to Steenbrink in [9]. Now, it is enough to prove the commutativity of the next diagram\\n\\nH 2(X, Z)\\n\\nH 2(X, OX )\\n\\nH 2(X, C)\\n\\n≃ Dolbeault\\n\\nde Rham ≃\\n\\n(cid:15)\\n\\n(cid:15)\\n\\nH 2\\n\\ndR(X, C)\\n\\n/ H 0,2\\n\\n¯∂ (X)\\n\\n4\\n\\nThe key points are the de Rham and Dolbeault’s isomorphisms for orbifolds. The rest\\n\\nof the proof follows as the (1, 1)-Lefschetz theorem in [6].\\n\\nRemark 3.5. For k = 1 and Pd Lefschetz theorem.\\n\\nΣ as the projective space, we recover the classical (1, 1)-\\n\\n△\\n\\nBy the Hard Lefschetz Theorem for projective orbifolds (see [11] for details) we get an\\n\\nisomorphism of cohomologies :\\n\\nH ●(X, Q) ≃ H 2 dim X−●(X, Q)\\n\\ngiven by the Lefschetz morphism and since it is a morphism of Hodge structures, we have:\\n\\nH 1,1(X, Q) ≃ H dim X−1,dim X−1(X, Q)\\n\\nFor X as before:\\n\\nCorollary 3.6. If the dimension of X is 1, 2 or 3. The Hodge conjecture holds on X.\\n\\nProof. If the dimCX = 1 the result is clear by the Hard Lefschetz theorem for projective orbifolds. The dimension 2 and 3 cases are covered by Theorem 3.5 and the Hard Lefschetz. theorem.\\n\\n4 Cayley trick and Cayley proposition\\n\\nThe Cayley trick is a way to associate to a quasi-smooth intersection subvariety a quasi- smooth hypersurface. Let L1, . . . , Ls be line bundles on Pd Σ be the projective space bundle associated to the vector bundle E = L1 ⊕ ⋯ ⊕ Ls. It is known that P(E) is a (d + s − 1)-dimensional simplicial toric variety whose fan depends on the degrees of the line bundles and the fan Σ. Furthermore, if the Cox ring, without considering the grading, of Pd\\n\\nΣ is C[x1, . . 
. , xm] then the Cox ring of P(E) is\\n\\nΣ and let π ∶ P(E) → Pd\\n\\nC[x1, . . . , xm, y1, . . . , ys]\\n\\nMoreover for X a quasi-smooth intersection subvariety cut off by f1, . . . , fs with deg(fi) = [Li] we relate the hypersurface Y cut off by F = y1f1 + ⋅ ⋅ ⋅ + ysfs which turns out to be quasi-smooth. For more details see Section 2 in [7].\\n\\n5\\n\\nWe will denote P(E) as Pd+s−1\\n\\nΣ,X to keep track of its relation with X and Pd Σ.\\n\\nThe following is a key remark.\\n\\nRemark 4.1. There is a morphism ι ∶ X → Y ⊂ Pd+s−1 with y ≠ 0 has a preimage. Hence for any subvariety W = V(IW ) ⊂ X ⊂ Pd W ′ ⊂ Y ⊂ Pd+s−1 Σ,X such that π(W ′) = W , i.e., W ′ = {z = (x, y) ∣ x ∈ W }.\\n\\nΣ,X . Moreover every point z ∶= (x, y) ∈ Y Σ there exists\\n\\n△\\n\\nFor X ⊂ Pd\\n\\nΣ a quasi-smooth intersection variety the morphism in cohomology induced\\n\\nby the inclusion i∗ ∶ H d−s(Pd\\n\\nΣ, C) → H d−s(X, C) is injective by Proposition 1.4 in [7].\\n\\nDefinition 4.2. The primitive cohomology of H d−s and H d−s prim(X, Q) with rational coefficients.\\n\\nprim(X) is the quotient H d−s(X, C)/i∗(H d−s(Pd\\n\\nΣ, C))\\n\\nH d−s(Pd\\n\\nΣ, C) and H d−s(X, C) have pure Hodge structures, and the morphism i∗ is com-\\n\\npatible with them, so that H d−s\\n\\nprim(X) gets a pure Hodge structure.\\n\\nThe next Proposition is the Cayley proposition.\\n\\nProposition 4.3. [Proposition 2.3 in [3] ] Let X = X1 ∩⋅ ⋅ ⋅∩Xs be a quasi-smooth intersec- , d+s−3 tion subvariety in Pd 2\\n\\nΣ cut off by homogeneous polynomials f1 . . . fs. Then for p ≠ d+s−1\\n\\n2\\n\\nH p−1,d+s−1−p\\n\\nprim\\n\\n(Y ) ≃ H p−s,d−p\\n\\nprim (X).\\n\\nCorollary 4.4. If d + s = 2(k + 1),\\n\\nH k+1−s,k+1−s\\n\\nprim\\n\\n(X) ≃ H k,k\\n\\nprim(Y )\\n\\nRemark 4.5. The above isomorphisms are also true with rational coefficients since H ●(X, C) = H ●(X, Q) ⊗Q C. See the beginning of Section 7.1 in [10] for more details.\\n\\n△\\n\\n5 Main result\\n\\nTheorem 5.1. 
Let Y = {F = y1f1 + ⋯ + ykfk = 0} ⊂ P2k+1 associated to the quasi-smooth intersection surface X = Xf1 ∩ ⋅ ⋅ ⋅ ∩ Xfk ⊂ Pk+2 the Hodge conjecture holds.\\n\\nΣ,X be the quasi-smooth hypersurface Σ . Then on Y\\n\\nProof. If H k,k proposition H k,k\\n\\nprim(X, Q) = 0 we are done. So let us assume H k,k\\n\\nprim(Y, Q) ≃ H 1,1\\n\\nprim(X, Q) ≠ 0. By the Cayley prim(X, Q) and by the (1, 1)-Lefschetz theorem for projective\\n\\n6\\n\\nprim(X, Q), that is, there are n ∶= h1,1\\n\\ntoric orbifolds there is a non-zero algebraic basis λC1, . . . , λCn with rational coefficients of H 1,1 prim(X, Q) algebraic curves C1, . . . , Cn in X such that under the Poincar´e duality the class in homology [Ci] goes to λCi, [Ci] ↦ λCi. Recall that the Cox ring of Pk+2 is contained in the Cox ring of P2k+1 Σ,X without considering the Σ ) then (α, 0) ∈ Cl(P2k+1 grading. Considering the grading we have that if α ∈ Cl(Pk+2 Σ,X ). So the polynomials defining Ci ⊂ Pk+2 X,Σ but with different degree. Moreover, by Remark 4.1 each Ci is contained in Y = {F = y1f1 + ⋯ + ykfk = 0} and furthermore it has codimension k.\\n\\nΣ\\n\\ncan be interpreted in P2k+1\\n\\nClaim: {λCi}n\\n\\ni=1 is a basis of H k,k It is enough to prove that λCi is different from zero in H k,k prim(Y, Q) or equivalently that the cohomology classes {λCi}n i=1 do not come from the ambient space. By contradiction, let us assume that there exists a j and C ⊂ P2k+1 Σ,X , Q) with i∗(λC) = λCj or in terms of homology there exists a (k + 2)-dimensional algebraic subvariety V ⊂ P2k+1 Σ,X such that V ∩ Y = Cj so they are equal as a homology class of P2k+1 Σ,X ,i.e., [V ∩ Y ] = [Cj] . Σ where π ∶ (x, y) ↦ x. Hence It is easy to check that π(V ) ∩ X = Cj as a subvariety of Pk+2 [π(V ) ∩ X] = [Cj] which is equivalent to say that λCj comes from Pk+2 Σ which contradicts the choice of [Cj].\\n\\nprim(Y, Q).\\n\\nΣ,X such that λC ∈ H k,k(P2k+1\\n\\nRemark 5.2. 
Into the proof of the previous theorem, the key fact was that on X the Hodge conjecture holds and we translate it to Y by contradiction. So, using an analogous argument we have:\\n\\n△\\n\\nProposition 5.3. Let Y = {F = y1fs+⋯+ysfs = 0} ⊂ P2k+1 associated to a quasi-smooth intersection subvariety X = Xf1 ∩ ⋅ ⋅ ⋅ ∩ Xfs ⊂ Pd d + s = 2(k + 1). If the Hodge conjecture holds on X then it holds as well on Y .\\n\\nΣ,X be the quasi-smooth hypersurface Σ such that\\n\\nCorollary 5.4. If the dimension of Y is 2s − 1, 2s or 2s + 1 then the Hodge conjecture holds on Y .\\n\\nProof. By Proposition 5.3 and Corollary 3.6.\\n\\n7\\n\\nReferences\\n\\n[1] Angella, D. Cohomologies of certain orbifolds. Journal of Geometry and Physics\\n\\n71 (2013), 117–126.\\n\\n[2] Batyrev, V. V., and Cox, D. A. On the Hodge structure of projective hypersur-\\n\\nfaces in toric varieties. Duke Mathematical Journal 75, 2 (Aug 1994).\\n\\n[3] Bruzzo, U., and Montoya, W. On the Hodge conjecture for quasi-smooth in- tersections in toric varieties. S˜ao Paulo J. Math. Sci. Special Section: Geometry in Algebra and Algebra in Geometry (2021).\\n\\n[4] Caramello Jr, F. C. Introduction to orbifolds. arXiv:1909.08699v6 (2019).\\n\\n[5] Cox, D., Little, J., and Schenck, H. Toric varieties, vol. 124. American Math-\\n\\nematical Soc., 2011.\\n\\n[6] Griffiths, P., and Harris, J. Principles of Algebraic Geometry. John Wiley &\\n\\nSons, Ltd, 1978.\\n\\n[7] Mavlyutov, A. R. Cohomology of complete intersections in toric varieties. Pub-\\n\\nlished in Pacific J. of Math. 191 No. 1 (1999), 133–144.\\n\\n[8] Satake, I. On a Generalization of the Notion of Manifold. Proceedings of the National Academy of Sciences of the United States of America 42, 6 (1956), 359–363.\\n\\n[9] Steenbrink, J. H. M. Intersection form for quasi-homogeneous singularities. Com-\\n\\npositio Mathematica 34, 2 (1977), 211–223.\\n\\n[10] Voisin, C. Hodge Theory and Complex Algebraic Geometry I, vol. 
1 of Cambridge\\n\\nStudies in Advanced Mathematics. Cambridge University Press, 2002.\\n\\n[11] Wang, Z. Z., and Zaffran, D. A remark on the Hard Lefschetz theorem for K¨ahler orbifolds. Proceedings of the American Mathematical Society 137, 08 (Aug 2009).\\n\\n8', metadata={'source': '/tmp/tmpbq6gfb13/tmp.pdf'})" | |
] | |
}, | |
"metadata": {}, | |
"execution_count": 38 | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"### Now let's split the text to make our data more manageable\n", | |
"\n", | |
"We'll use a chunk size of 512 and set `chunk_overlap` to 0." | |
], | |
"metadata": { | |
"id": "abBCCQxvK7YR" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Split documents\n", | |
"from langchain.text_splitter import RecursiveCharacterTextSplitter\n", | |
"from langchain.vectorstores.utils import filter_complex_metadata\n", | |
"\n", | |
"text_splitter = RecursiveCharacterTextSplitter(chunk_size = 512, chunk_overlap = 0)\n", | |
"\n", | |
"splits = text_splitter.split_documents(docs)\n", | |
"\n", | |
"# Filter the splits' complex metadata\n", | |
"filtered_splits = filter_complex_metadata(documents=splits)" | |
], | |
"metadata": { | |
"id": "RFfdX46qENNr" | |
}, | |
"execution_count": 39, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"### We're all set to begin working with embeddings to" | |
], | |
"metadata": { | |
"id": "wZBxkw99VV8g" | |
} | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Before we can retrieve embeddings from OpenAI, we'll need an API key, and to instantiate it as a variable.\n", | |
"\n", | |
"Upload yours to the current working directory, '/content/'." | |
], | |
"metadata": { | |
"id": "mQcwVVQ0LenY" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"from dotenv import load_dotenv\n", | |
"import os\n", | |
"from langchain.embeddings import OpenAIEmbeddings\n", | |
"\n", | |
"load_dotenv(dotenv_path=\"./.env\")\n", | |
"\n", | |
"# Retrieve the OPENAI_API_KEY from environment variables\n", | |
"OPENAI_API_KEY = os.environ.get(\"OPENAI_API_KEY\")\n", | |
"\n", | |
"# Check if the API key is present\n", | |
"if OPENAI_API_KEY is None:\n", | |
" raise ValueError(\"OPENAI_API_KEY not found in the environment.\")\n", | |
"else:\n", | |
" print(\"OPENAI_API_KEY successfully loaded.\")" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/" | |
}, | |
"id": "8yR85v2hLtah", | |
"outputId": "8f11746d-2179-4f70-a54d-705820305582" | |
}, | |
"execution_count": 31, | |
"outputs": [ | |
{ | |
"output_type": "stream", | |
"name": "stdout", | |
"text": [ | |
"OPENAI_API_KEY successfully loaded.\n" | |
] | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"### Setting up a database\n", | |
"\n", | |
"Next we'll retrieve embeddings from `text-embeddings-ada-002` and store them in a local Chroma vectorstore." | |
], | |
"metadata": { | |
"id": "loCrhAacV502" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"# Embed and store splits\n", | |
"from langchain.embeddings import OpenAIEmbeddings\n", | |
"from langchain.llms import OpenAI\n", | |
"from langchain.vectorstores import Chroma\n", | |
"\n", | |
"embeddings = OpenAIEmbeddings()\n", | |
"\n", | |
"vectorstore = Chroma.from_documents(documents=filtered_splits,embedding=embeddings)\n", | |
"\n", | |
"retriever = vectorstore.as_retriever()" | |
], | |
"metadata": { | |
"id": "ebOUDQ3vEgqh" | |
}, | |
"execution_count": 40, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"Finally we can query the document(s)!" | |
], | |
"metadata": { | |
"id": "iqFswCI2Y1i-" | |
} | |
}, | |
{ | |
"cell_type": "code", | |
"source": [ | |
"from langchain.chains import RetrievalQA\n", | |
"\n", | |
"qa = RetrievalQA.from_chain_type(llm=OpenAI(), chain_type=\"stuff\", retriever=retriever)\n", | |
"\n", | |
"query = \"\"\"Does the Cayley trick only apply to quasi-smooth intersection subvarieties, \\\n", | |
"is the (1, 1)-Lefschetz theorem exclusively for projective toric orbifolds, and does \\\n", | |
"the Hodge conjecture hold for all types of orbifolds regardless of their dimensions?\"\"\"\n", | |
"\n", | |
"qa.run(query)" | |
], | |
"metadata": { | |
"colab": { | |
"base_uri": "https://localhost:8080/", | |
"height": 53 | |
}, | |
"id": "6TVbWrlCY58J", | |
"outputId": "979f8d37-91d4-496d-b953-257311156cdb" | |
}, | |
"execution_count": 42, | |
"outputs": [ | |
{ | |
"output_type": "execute_result", | |
"data": { | |
"text/plain": [ | |
"' No, the Cayley trick does not only apply to quasi-smooth intersection subvarieties. The (1, 1)-Lefschetz theorem is exclusively for projective toric orbifolds, and the Hodge conjecture holds for certain types of orbifolds depending on their dimensions. Specifically, if the dimension of the orbifold is 1, 2 or 3, the Hodge conjecture holds on it.'" | |
], | |
"application/vnd.google.colaboratory.intrinsic+json": { | |
"type": "string" | |
} | |
}, | |
"metadata": {}, | |
"execution_count": 42 | |
} | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"source": [ | |
"* While I can't speak to the truth of the AI's response, it definitely made use of our local document. That's awesome!" | |
], | |
"metadata": { | |
"id": "tncfvqM1a0dF" | |
} | |
} | |
] | |
} |
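
To make the split, embed, and retrieve steps from the notebook above a little more concrete, here is a minimal, dependency-free sketch of the same idea. It is not how LangChain, Chroma, or OpenAI implement these steps: `split_text` is a plain fixed-width cut rather than the recursive, separator-aware splitting of `RecursiveCharacterTextSplitter`, `embed` is a toy bag-of-words counter standing in for `text-embedding-ada-002`, and `retrieve` is a brute-force cosine-similarity search standing in for the vectorstore retriever. All function names and the sample chunks are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=512):
    """Cut text into consecutive chunks of at most chunk_size characters (no overlap)."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def embed(text):
    """Toy embedding: lowercase word counts. An assumption for illustration, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query (brute-force search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical chunks, loosely modelled on the kind of text the notebook loads.
chunks = [
    "The Cayley trick associates a quasi-smooth hypersurface to an intersection subvariety.",
    "Chroma stores embedding vectors so similar chunks can be looked up later.",
    "The Hard Lefschetz theorem gives an isomorphism of cohomologies.",
]

top = retrieve("What is the Cayley trick?", chunks, k=1)
print(top[0])
```

In the notebook's pipeline, the chunks returned by the retriever are what the `stuff` chain type places into the prompt alongside the question before the LLM answers.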