@secemp9
Created January 8, 2025 01:13
ollama sliding window retrieval with llm classification
import textwrap
from ollama import chat
# ---------------------------
# 1) Setup: Long text & question
# ---------------------------
# ~ LONG_TEXT = """
# ~ Coffee, one of the world’s most beloved beverages, has a storied history that dates back centuries. Legend has it that an Ethiopian goat herder named Kaldi first noticed his goats becoming particularly energetic after nibbling on certain red berries. This discovery eventually led to the cultivation of coffee beans in the Middle East, where Sufi monks used the brew to stay alert during nighttime prayers. Over time, the practice of preparing and drinking coffee spread across the Islamic world and into Europe, sparking a cultural revolution of cafés and intellectual discourse.
# ~ By the 17th century, coffee houses in places like London, Paris, and Vienna had become epicenters of debate, conversation, and commerce. These early coffee houses attracted scientists, merchants, and artists eager to exchange ideas. Because coffee sharpened focus and kept patrons awake, it was linked with creative thinking and vibrant intellectual communities. In fact, some historians argue that Europe’s coffee house culture contributed to scientific discoveries, political movements, and the birth of newspaper reading among the middle class.
# ~ As demand soared, global trade routes were established to support the coffee industry. Colonists in the Caribbean and South America began cultivating coffee plants, radically altering agricultural practices in regions with suitable climates. Brazil quickly rose to become the world’s largest coffee producer by the mid-19th century, a title it still holds today. Coffee production required large estates, known as fazendas in Brazil, which brought about significant changes to local economies and labor systems.
# ~ Technological advancements in the 20th century brought changes to every stage of coffee’s lifecycle. Mechanized farming methods improved yields, while innovations like vacuum-packaging allowed for coffee’s broader distribution. Instant coffee, patented in 1901, became extremely popular after World War II when convenience was a premium. These developments meant that coffee was no longer a luxury good but a daily fixture for many, especially in countries like the United States.
# ~ The latter half of the 20th century saw the emergence of specialty coffee movements. Coffee connoisseurs began paying closer attention to bean origin, roasting methods, and brewing techniques. Independent cafés offered single-origin selections and artisanal brewing equipment, introducing consumers to latte art and concepts like fair trade. By the early 2000s, coffee culture had become deeply intertwined with global social life, as seen in the proliferation of coffee chains and local roasters alike.
# ~ Today, coffee is grown in more than 50 countries, with Ethiopia, Colombia, and Vietnam among the top exporters alongside Brazil. Each region’s unique climate and soil characteristics produce beans with distinct flavor profiles. For instance, Ethiopian coffees often carry bright, floral notes, while Brazilian beans are known for their nutty, chocolatey undertones. Roasters and baristas worldwide celebrate these differences, offering tasting flights that mirror the style of wine or craft beer events.
# ~ Beyond taste, coffee’s economic and environmental impact has drawn increased scrutiny. The crop is vulnerable to climate change, as higher temperatures and erratic rainfall patterns can reduce yields or force farmers to relocate to higher altitudes. Many small-scale growers struggle to achieve a stable income in the face of fluctuating coffee prices on the global market. Fair trade and direct trade initiatives aim to improve conditions by guaranteeing higher and more stable prices to farmers who practice sustainable agriculture.
# ~ Meanwhile, the coffee shop itself has evolved into a multi-purpose space for remote workers, casual meetings, and personal leisure. Wi-Fi-equipped cafés blur the line between home and office, offering an environment for freelancers and digital nomads. Traditional coffee rituals also endure, such as the Turkish coffee ceremony or the Italian espresso bar culture. Throughout the world, coffee remains a unifying beverage that is both steeped in tradition and continuously evolving—a testament to its enduring appeal and adaptability.
# ~ """.strip()
# ~ QUESTION = "Which country is the world’s largest coffee producer?"
# ~ LONG_TEXT = '''
# ~ import os
# ~ from flask import Flask, jsonify, request, make_response
# ~ app = Flask(__name__)
# ~ # Configuration settings can be defined here or pulled from environment variables
# ~ app.config['SECRET_KEY'] = os.environ.get('SECRET_KEY', 'default_secret_key')
# ~ app.config['DEBUG'] = True
# ~ # -----------------------------------------------------------------------------------
# ~ # Utility functions
# ~ # -----------------------------------------------------------------------------------
# ~ def get_app_info():
# ~ """
# ~ Returns basic application information in a dictionary.
# ~ Typically, this could include version, build date, or other metadata.
# ~ """
# ~ return {
# ~ 'name': 'Sample Flask App',
# ~ 'version': '1.0.2',
# ~ 'description': 'This is a demo Flask application with multiple routes.'
# ~ }
# ~ def validate_user_credentials(username, password):
# ~ """
# ~ This function simulates user credential validation.
# ~ In a production system, you would query a user database,
# ~ compare hashed passwords, etc.
# ~ """
# ~ # For demonstration, we assume a single valid user: 'admin' with password 'secret'
# ~ valid_users = {
# ~ 'admin': 'secret'
# ~ }
# ~ return valid_users.get(username) == password
# ~ # -----------------------------------------------------------------------------------
# ~ # Routes
# ~ # -----------------------------------------------------------------------------------
# ~ @app.route('/healthz', methods=['GET'])
# ~ def health_check():
# ~ """
# ~ Simple route to check application health.
# ~ Returns a JSON indicating that the service is running.
# ~ """
# ~ return jsonify({'status': 'ok'}), 200
# ~ @app.route('/login', methods=['POST'])
# ~ def login():
# ~ """
# ~ Authenticates a user by checking username and password from JSON body.
# ~ Returns a 200 if valid, 401 otherwise.
# ~ """
# ~ data = request.get_json() or {}
# ~ username = data.get('username')
# ~ password = data.get('password')
# ~ if validate_user_credentials(username, password):
# ~ return jsonify({'message': f'Welcome, {username}!'}), 200
# ~ else:
# ~ return make_response(jsonify({'error': 'Invalid credentials'}), 401)
# ~ @app.route('/info', methods=['GET'])
# ~ def app_info():
# ~ """
# ~ Returns meta information about the application, such as name, version, and description.
# ~ """
# ~ info = get_app_info()
# ~ return jsonify(info), 200
# ~ # -----------------------------------------------------------------------------------
# ~ # Main entry point
# ~ # -----------------------------------------------------------------------------------
# ~ if __name__ == '__main__':
# ~ # You can set 'host' and 'port' as needed for your environment.
# ~ app.run(host='0.0.0.0', port=5000)
# ~ '''
# ~ QUESTION = "Which Flask route in this code returns the JSON data containing the application’s name, version, and description, and how does it achieve this?"
# ~ LONG_TEXT = """
# ~ Cryptography has a rich and intricate history spanning multiple centuries and cultures. Some of the earliest records date back to ancient civilizations that used simple substitution ciphers for military communication. By the 9th century, Arab polymath Al-Kindi wrote a groundbreaking treatise on cryptanalysis techniques, notably frequency analysis, which revolutionized code-breaking at the time.
# ~ In the 16th century, Giovan Battista Bellaso and later Blaise de Vigenère expanded upon polyalphabetic ciphers, creating a more resilient system than simple substitution. However, lesser-known records indicate that in 1776, a French cryptographer named Michel Tournier devised an even more complex variant, referred to as the “Tournier Cipher,” while secretly employed by the Austrian monarchy. This cipher combined multiple rotating polyalphabetic tables with a modular exponentiation concept that was ahead of its time, though it remained largely unknown outside Austrian military circles.
# ~ Fast-forward to the early 19th century: mathematicians began exploring the foundations of number theory in greater depth. Building upon Leonhard Euler’s expansions on prime factorization and modular arithmetic, a relatively obscure German mathematician, Ludwig Schumann, published a treatise in 1821 titled ‘Advanced Investigations in Modular Congruences.’ In an 1823 revision of that work, Schumann detailed a novel approach to key exchange—an idea well before its time—that relied on both parties separately computing partial modular exponentiations, then combining them to derive a shared secret. Schumann referred to this as “Reciprocal Residue Exchange,” a method lost to history for nearly a century because it was never widely circulated or translated outside of German.
# ~ In the late 19th and early 20th centuries, cryptographic research gradually advanced. However, the Tournier Cipher lay dormant in archival files until it resurfaced in a cache of documents discovered by Allied forces during World War II. In early 1942, British codebreakers at Bletchley Park stumbled upon intercepted communications from a previously undocumented Enigma variant. Cross-referencing these messages with the Austrian archives revealed that the cryptographic routines in this Enigma offshoot bore a striking resemblance to Tournier’s polyalphabetic and modular design.
# ~ Alan Turing, leading a special sub-team, realized that because the Tournier Cipher’s modular exponentiation aspects mirrored Schumann’s early key-exchange ideas, certain vulnerabilities existed when the cipher was implemented on top of standard Enigma hardware. Turing’s team tracked down Schumann’s 1823 manuscript from a German university library. They discovered that Schumann’s “Reciprocal Residue Exchange” could be inverted under certain conditions, thereby allowing cryptanalysts to break the key-combining procedure used in the Tournier-influenced Enigma machine. In June 1942, leveraging Schumann’s theoretical framework, the British codebreakers successfully deciphered a critical set of enemy communications, an achievement that significantly altered the course of the war in that theater.
# ~ After WWII, the relevance of these historical cryptographic breakthroughs became clearer. Although modern public-key cryptography would not fully emerge until the 1970s with the work of Whitfield Diffie and Martin Hellman (and later Rivest, Shamir, and Adleman), it is fascinating to note that pieces of the puzzle were foreshadowed by Euler, expanded upon by Schumann, and ironically weaponized in a short-lived Enigma adaptation nearly a century later. Today, cryptographers study these historical footnotes as reminders that visionary insights can appear well before the technology needed to implement them on a large scale.
# ~ Meanwhile, Schumann’s treatise remains a historical curiosity. It was largely overlooked in his own era, yet its influence and practical utility proved indispensable in one of the most crucial cryptographic victories of the 20th century. Historians and mathematicians alike continue to debate what might have happened had Schumann’s original monograph been translated and widely shared earlier, and whether the Tournier Cipher might have been much harder—or even impossible—to break in 1942 without his blueprint for reversing modular secrets.
# ~ """
# ~ QUESTION = """
# ~ Which mathematician from the 19th century built upon Euler’s expansions to demonstrate
# ~ an early form of key exchange, and how was it eventually used in the 20th century to break
# ~ the variant of the Enigma machine that incorporated Tournier’s cipher elements?
# ~ Please provide:
# ~ 1. The name of the mathematician,
# ~ 2. The name he gave to his method,
# ~ 3. The approximate date on which it was used to crack the Tournier-influenced Enigma.
# ~ """
# ---------------------------
# 2) Sliding window parameters
# ---------------------------
LONG_TEXT = """
Spanning thousands of years and countless civilizations, the annals of cryptography reflect a perpetual race between secrecy and revelation. From the ancient scytale of Sparta—where strips of parchment were wound around a rod of a specific diameter, creating a transposition puzzle—to the marvels of 20th-century electro-mechanical encryption, each era left behind its own cryptographic fingerprints.
Some evidence suggests that even in ancient Mesopotamia, specialized scribes encoded certain medicinal recipes on clay tablets, though the extent to which this was genuine cryptography remains debated. Egyptian temple walls occasionally show cryptic writings that may have served ceremonial rather than clandestine purposes. In the Roman Republic, Julius Caesar famously used a shift cipher to send military orders, rotating the alphabet by a few letters—child’s play by modern standards, yet a stepping stone for more advanced methods to come.
The real revolution in code-breaking, however, was spurred in the 9th century by the polymath Al-Kindi. In his treatise “A Manuscript on Deciphering Cryptographic Messages,” he formalized frequency analysis, ushering in an era in which monoalphabetic ciphers could be systematically breached. This laid the bedrock upon which future cryptographers and cryptanalysts would stand.
Fast forward to the Renaissance: Giovan Battista Bellaso and Blaise de Vigenère propelled polyalphabetic ciphers to prominence, leading many to believe such methods were effectively impregnable. Less-cited innovators pepper the historical record—like an Italian cryptographer, Giovanni Venturi, who allegedly concocted a so-called “Venturi Box Cipher” in 1592, though no verifiable manuscript remains. Meanwhile, European courts, such as that of Louis XIV in France, cultivated secret cabinets for deciphering enemy dispatches.
The Austrian monarchy, likewise, spent considerable resources in the 17th and 18th centuries to secure its communications. Amid this cloak-and-dagger atmosphere, a French cryptographer named Michel Tournier operated surreptitiously around 1776. Tournier’s “Tournier Cipher,” combining rotating polyalphabetic tables with what we would now call modular exponentiation, was far ahead of mainstream 18th-century math. But few outside Austrian military circles caught wind of it, and official documents on Tournier’s work are both sparse and partially lost.
Then came the 19th century, a time when mathematics advanced at a dazzling pace. Leonhard Euler had earlier paved the way by delving into prime factorizations and modular arithmetic, while luminaries like Gauss and Sophie Germain continued to probe the edges of number theory. Yet in 1821, an obscure German scholar named Ludwig Schumann quietly published a limited-run treatise, “Advanced Investigations in Modular Congruences,” in Munich and Vienna. The work garnered modest attention from a handful of academics, more interested in purely theoretical manipulations of congruences than in cryptographic applications.
Nonetheless, Schumann’s 1823 revision of the same text went further. He introduced a protocol for secure key exchange via partial modular exponentiations, which he dubbed “Reciprocal Residue Exchange.” By modern standards, the idea resembled a discrete-logarithm-based scheme for establishing shared secrets. But Schumann’s treatise, written in dense 19th-century academic German, never achieved wide circulation. No major cryptographic community existed to spread or test his concept, leaving his pioneering method largely forgotten.
Meanwhile, the invention of mechanical cipher devices in the late 19th and early 20th centuries opened new horizons. Arthur Scherbius’s Enigma machine emerged as a pivotal tool for German military encryption. Foreign intelligence agencies, from Poland’s Cipher Bureau to Britain’s Government Code and Cypher School at Bletchley Park, raced to keep pace. Charles Babbage’s analytical proclivities and Friedrich Kasiski’s methodical analysis of polyalphabetic ciphers foreshadowed the rigorous approaches soon to come.
The Tournier Cipher might have remained an intriguing curiosity had Allied forces not uncovered Austrian archives holding references to Tournier’s polyalphabetic-plus-modular routine. In early 1942, British codebreakers intercepted messages hinting at an unusual Enigma variant, tentatively nicknamed “T-Enigma,” believed to incorporate Tournier’s half-forgotten concepts. Alan Turing, central to cryptanalytic efforts, assembled a sub-team versed in advanced mathematics and scoured every lead on Tournier’s design. A passing remark in the Austrian documents alluded to Ludwig Schumann’s theoretical work, prompting a frantic hunt for the 1823 monograph.
In a stroke of luck, scans of Schumann’s text were procured from a German university library—one of the few places that still had a copy. Translators and mathematicians spent days poring over Schumann’s archaic German. They discovered Schumann’s scheme hinged on certain prime moduli that, if misapplied, could inadvertently allow an attacker to reverse the key exchange. Such a structural flaw turned out to parallel Tournier’s approach in the T-Enigma, effectively opening a hidden side-door into the encryption process.
By the final weeks of the year’s second quarter, Turing’s team had crafted a plan to exploit this vulnerability. After extensive trials, they succeeded in neutralizing the extra modular layer attached to the standard Enigma mechanism. By systematically collecting enough T-Enigma intercepts and applying a variant of Schumann’s “Reciprocal Residue Exchange” logic in reverse, the codebreakers pierced the encryption. This revelation triggered immediate strategic advantages: critical enemy communications, which had been presumed secure, were now legible to Allied command.
Although the Allies kept these breakthroughs under the highest classification, word of a “key exchange flaw” in an advanced Enigma variant leaked out in cryptographic circles after the war. Once Bletchley Park’s exploits were declassified decades later, historians were stunned at the confluence of Tournier’s 18th-century seed of modular exponentiation and Schumann’s 19th-century theoretical framework, finally unleashed in a 20th-century conflict. Had either Tournier’s cipher or Schumann’s treatise vanished entirely, T-Enigma might have remained unassailable at a pivotal moment.
Subsequent revelations in the 1950s and 1960s placed Ludwig Schumann on a short list of cryptographic “visionaries before their time,” albeit one overshadowed by Gauss, Euler, Germain, and the more recognized heroes of code-breaking. By the 1970s, a new generation of mathematicians (Diffie, Hellman, Rivest, Shamir, Adleman) formalized public-key cryptography in ways that echoed Schumann’s insights, though the direct lineage from his treatise to modern methods was mostly coincidental and rediscovered independently.
Today, some historians refer to the dramatic synergy between Tournier and Schumann as the “T–S Convergence,” a prime example of how fragments of advanced thinking can languish unnoticed until historical forces demand them. Another lesson is that cryptography’s progress can hinge on centuries-old manuscripts, lost for decades and rediscovered at just the right moment. Bletchley Park’s T-Enigma coup, achieved late in the first half of 1942, stands as an enduring reminder that no cipher is safe from the combined weight of mathematical foresight, archival sleuthing, and urgent necessity.
""".strip()
QUESTION = """
Which mathematician from the 19th century built upon Euler’s expansions to demonstrate
an early form of key exchange, and how was it eventually used in the 20th century to break
the variant of the Enigma machine that incorporated Tournier’s cipher elements?
Please provide:
1. The name of the mathematician,
2. The name he gave to his method,
3. The approximate date on which it was used to crack the Tournier-influenced Enigma.
""".strip()
WINDOW_SIZE = 100 # Characters in each window
STRIDE_SIZE = 10 # How many characters to move the window each iteration
# ---------------------------
# 3) Generate windows of text (moving/sliding)
# ---------------------------
def sliding_windows(text, window_size=200, stride=100):
    """
    Yields overlapping windows of 'window_size' characters,
    moving by 'stride' characters each time.
    """
    start_index = 0
    text_length = len(text)
    while start_index < text_length:
        end_index = start_index + window_size
        # Extract the window from the text
        window_text = text[start_index:end_index]
        yield window_text, start_index, end_index
        # Move the start index by the stride
        start_index += stride
# Collect all windows
windows = list(sliding_windows(LONG_TEXT, WINDOW_SIZE, STRIDE_SIZE))
# ---------------------------
# 4) Explain + classify each window
# ---------------------------
relevant_windows = []
for idx, (window_text, s_idx, e_idx) in enumerate(windows):
    # -----------------------------------------------------
    # (A) First model call: Summarize/Explain the window text
    # -----------------------------------------------------
    explanation_prompt = f"""
You are a helpful assistant. Please provide a concise explanation or summary
of the following text (no more than 2-3 sentences). Do not answer questions,
just explain what it says:
Text window [{s_idx}:{e_idx}]:
{window_text}
"""
    explanation_stream = chat(
        model="llama3.2",  # Example model; one tuned for summarization could be swapped in
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": explanation_prompt}
        ],
        # No num_predict limit: the prompt asks for a short summary, but output length is not capped.
        stream=True
    )
    explanation_output = ""
    for partial_response in explanation_stream:
        explanation_output += partial_response['message']['content']
    explanation_output = explanation_output.strip()
    print(f"\n[Window {idx}] Explanation:\n{explanation_output}\n")
    # -----------------------------------------------------
    # (B) Second model call: Classify the explanation with a single token
    # -----------------------------------------------------
    classification_system_prompt = """
You are a helpful classifier. Respond with a single word:
- "yes" if the explanation is relevant to the user's question,
- "no" if it is not relevant.
No other words.
"""
    classification_user_prompt = f"""
The user's question is: "{QUESTION}"
Explanation of a portion of text:
{explanation_output}
Is this relevant to answering the question?
"""
    classification_stream = chat(
        model="llama3.2",  # Example: the same model, or a different one, with a classifier prompt
        messages=[
            {"role": "system", "content": classification_system_prompt},
            {"role": "user", "content": classification_user_prompt}
        ],
        options={
            "num_predict": 1  # Constrain output to a single token
        },
        stream=True
    )
    classification_output = ""
    for partial_response in classification_stream:
        classification_output += partial_response['message']['content']
    classification_output = classification_output.strip().lower()
    print(f"[Window {idx}] Classification: {classification_output}")
    # If it starts with 'yes', store the original window text
    if classification_output.startswith("yes"):
        relevant_windows.append(window_text)
# ---------------------------
# 5) Generate final answer from relevant windows
# ---------------------------
# Combine all relevant windows into a single context
combined_context = "\n\n".join(relevant_windows)
# Only if we found any relevant windows
if not combined_context.strip():
    print("\nNo relevant text found for the question. Cannot answer.")
else:
    final_prompt = f"""
You have the following relevant text from a larger document:
{combined_context}
The user's question is: "{QUESTION}"
Please provide a concise answer.
"""
    final_answer_stream = chat(
        model="llama3.2",  # A model for final Q&A
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": final_prompt}
        ],
        # No num_predict limit here, let the model give a full answer
        stream=True
    )
    print("\nFinal answer (streamed):")
    final_answer = ""
    for partial_response in final_answer_stream:
        content = partial_response['message']['content']
        final_answer += content
        print(content, end='', flush=True)
    print("\n\nStored final answer:", final_answer)
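
Note on the combined context: because each window advances by only STRIDE_SIZE characters, consecutive relevant windows share most of their text, so joining them repeats the same passage many times in the final prompt. One possible refinement, sketched below, is to keep the (start, end) offsets that sliding_windows already yields and merge overlapping spans before slicing the text. This is only a sketch under that assumption: the script above stores just the window text, and `merge_spans` and `context_from_spans` are hypothetical helper names, not part of the gist or of the ollama package.

```python
def merge_spans(spans):
    """Merge overlapping or touching (start, end) character spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def context_from_spans(text, spans):
    """Slice `text` at the merged spans and join the pieces into one context."""
    return "\n\n".join(text[s:e] for s, e in merge_spans(spans))


if __name__ == "__main__":
    sample = "abcdefghijklmnopqrstuvwxyz"
    # Hypothetical offsets of three windows that were classified as relevant.
    spans = [(0, 10), (5, 15), (20, 30)]
    print(merge_spans(spans))
    print(context_from_spans(sample, spans))
```

To use it in the script, the loop would append `(s_idx, e_idx)` instead of `window_text`, and `combined_context` would come from `context_from_spans(LONG_TEXT, relevant_spans)`. End offsets past the end of the text are harmless because Python slicing clamps them.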