Project: Notebook

Overview

Notebook is a minimal, trust-driven platform for students in grades 8–12 to share study materials, organized by board, grade, and subject. It supports uploads of notes (PDFs, PPTs), question papers, and worksheets. Students can vote, comment, report, and build a trust score similar to Reddit karma. A clean black-and-white interface, realtime updates, and offline access through a PWA keep it lightweight but powerful.

Core Concept

A peer-driven learning space where credibility rises naturally through trust, not algorithms. Every upload, vote, and report shapes the reputation ecosystem.

Architecture

Stack:

  • Frontend: Vite + Tailwind + PWA
  • Backend: Bun.js server
  • Storage: JSON files (state + users) and filesystem for uploaded content

Folder Structure

notebook/
├── server/
│   ├── index.ts                 # Bun server entry
│   ├── routes/
│   │   ├── upload.ts            # file uploads
│   │   ├── feed.ts              # fetch feeds
│   │   ├── vote.ts              # upvote/downvote
│   │   ├── report.ts            # moderation logic
│   │   ├── user.ts              # profiles and leaderboard
│   ├── db/
│   │   ├── recent.json          # ranked list of recent posts
│   │   ├── posts/
│   │   │   └── uuid.json        # stores information for each post
│   │   └── users/
│   │       └── uuid.json        # stores information for each user
│   ├── uploads/
│   │   └── uuid.pdf             # uploaded PDFs/PPTs, organized by board/grade/subject
│   └── utils/
│       ├── trust.ts             # trust logic
│       └── id.ts                # UUID generator
│
├── client/
│   ├── index.html
│   ├── main.tsx
│   ├── App.tsx
│   ├── components/ (Feed, UploadForm, NoteCard, Profile, Leaderboard, CommentSection)
│   ├── pages/ (Home, FeedPage, UploadPage)
│   ├── service-worker.js        # caching logic
│   └── manifest.json            # PWA manifest

Data Models

posts/uuid.json:

{
  "title": "Electromagnetism Notes",
  "board": "CBSE",
  "grade": "11",
  "bloomData": {},
  "subject": "Physics",
  "filename": "uuid.pdf",
  "uploader": "user123",
  "trust": 35,
  "votes": { "up": 12, "down": 2 },
  "reports": 1,
  "timestamp": 1697000000,
  "comments": []
}

bloomData holds a Bloom-filter-style record of the users who have liked the post.

users/uuid.json:

{
  "username": "user123",
  "trust": 85,
  "uploads": 10,
  "reports": 0
}

API Endpoints

Endpoint                      Method  Purpose
/upload                       POST    Upload file + metadata
/feed/:board/:grade/:subject  GET     Fetch feed sorted by trust/time
/vote/:id                     POST    Upvote/downvote a note
/report/:id                   POST    Report content (auto-deleted after N reports)
/user/:username               GET     Get a user profile
/leaderboard                  GET     Fetch top trusted users

Moderation Logic

// Auto-remove a note once reports reach 5 or 10% of its views, whichever is smaller
if (note.reports >= Math.min(5, 0.1 * note.views)) {
  removeFile(note.filename);
  uploader.reports += 1;   // strike recorded against the uploader
  uploader.trust -= 10;    // trust penalty for the uploader
}

Frontend Structure

Pages: Feed, Profile, Leaderboard, Documents

  • Feed: lazy-loads N new posts at a time, ranked by trust score and likes/views
  • Documents: downloaded and starred notes, with a modal to upload new ones
  • Profile: view your own profile and other users' profiles
  • Leaderboard: shows the top 10 users, plus your rank and the three ranks above and below you

Core Components:

  • NoteCard: shows note title, votes, trust
  • UploadForm: handles metadata + file upload
  • CommentSection: threaded discussion
  • Leaderboard: global trust ranking
  • Profile: user uploads and stats

UI / UX

  • Theme: Black and white monochrome, grayscale shadows
  • Typography: Sans (Inter) + Mono (JetBrains Mono)
  • Icons: Thin-line Lucide or Tabler icons
  • Layout: Minimal, 3–4 pages; feed-focused UI

Features

  • Core: upload, download, trust score, vote, report, profile, feed
  • Extras: comments, leaderboard, realtime feed, search bar
  • Advanced: offline mode, export, auto trust decay

Realtime Updates

import { WebSocketServer } from "ws";

const wss = new WebSocketServer({ port: 8081 });

// Push a newly uploaded note to every connected client
function broadcastUpdate(note) {
  wss.clients.forEach(c => {
    if (c.readyState === c.OPEN) c.send(JSON.stringify({ type: "new", note }));
  });
}

Frontend listens for new uploads and injects into the live feed.

PWA & Offline Mode

  • Cache static assets and last-viewed feeds.
  • IndexedDB/localStorage for offline reading.
  • Manifest for installable app experience.
  • Banner indicator for offline state.

Hackathon Build Plan

Day 1:

  • Setup Bun server and routes (upload/feed/vote/report)
  • Create JSON storage system
  • Build core UI (Feed + Upload)
  • Implement trust/voting logic

Day 2:

  • Add comments, profiles, leaderboard
  • Integrate WebSocket realtime feed
  • Add offline caching + final polish
  • Prepare final demo pitch

Pitch Summary

Notebook is a peer-to-peer academic sharing space where trust, not algorithms, decides visibility. Students upload, vote, and collaborate on notes across boards, grades, and subjects. It’s fast, minimal, works offline, and rewards credible contribution.

@triadwoozie commented Oct 23, 2025

RAG

The best approach is to build the AI as a separate Python microservice. Your main Bun.js server will remain the lightweight "frontend" server, which will then make internal HTTP requests (acting as a proxy) to this new Python AI service.

This keeps your stacks separate: Bun.js handles fast I/O and web logic, while Python handles the heavy compute of AI/ML.

Here are the two main AI features we'll design:

  1. AI Q&A Assistant (The RAG Pipeline):

    • Purpose: Allows students to ask questions ("Explain electromagnetism," "What are the key themes in Chapter 3?") and get answers based specifically on the uploaded notes and question papers.
    • Tech:
      • Retrieval: all-mpnet-base-v2 will be used to embed (vectorize) all uploaded documents (PDFs, PPTs) into a vector database (like ChromaDB, which is file-system-based and fits your "minimal" ethos).
      • Generation: A student's question is embedded. We find the most relevant document chunks from the vector DB. These chunks (the "context") and the original question are sent to Gemma 3 12B (via Ollama) to generate a helpful, context-aware answer.
  2. AI Question Paper Predictor (Hybrid ML + LLM Model):

    • Purpose: Generates a "predicted," new practice paper for a specific board, grade, and subject.
    • Tech: This is a hybrid approach.
      • Analyzer (Classic ML): We'll use TF-IDF and K-Means Clustering to analyze the existing corpus of question papers.
      • Generator (LLM): We'll use Gemma 3 12B (via Ollama) to author a new paper based on the ML model's analysis.
    • How it works:
      1. All questions from all papers are extracted and stored.
      2. We apply TF-IDF + K-Means to this corpus to cluster questions into topics (e.g., "Topic 1: Kinematics," "Topic 2: Optics").
      3. When a user requests a "predicted paper," the service:
        • Filters all known questions by board/grade/subject.
        • Uses the ML model to find the most important topics and a set of example questions from those topics.
        • Feeds this analysis (e.g., "Key Topics: Kinematics, Optics. Example Kinematics Question: ...") as rich context to Gemma 3 12B.
        • Prompts the LLM to generate a new, original practice paper (with a specific number of questions) based on these topics and examples. This is an intensive process and can take up to a minute.

📁 AI Service: Filesystem Structure

You will create a new folder next to your client and server folders, called ai_service.

notebook/
├── server/
│   ├── index.ts
│   ├── routes/
│   │   ├── ... (existing routes)
│   │   └── ai.ts          # <--- NEW: Proxy for AI service
│   └── ... (rest of server)
│
├── client/
│   ├── ... (existing client files)
│   ├── components/
│   │   ├── ... (existing components)
│   │   ├── AiChatBot.js   # <--- NEW: UI for Q&A
│   │   └── PaperGen.js    # <--- NEW: UI for paper generator
│
└── ai_service/            # <--- NEW: Python AI Microservice
    ├── main.py            # FastAPI server entry point
    ├── requirements.txt   # Python dependencies
    ├── config.py          # Configuration (model names, paths)
    ├── data/              # Persistent data for AI
    │   ├── vector_store/  # ChromaDB vector database files
    │   └── ml_models/     # Saved TF-IDF/K-Means models (as .pkl)
    │
    ├── rag_pipeline/
    │   ├── __init__.py
    │   ├── embedder.py    # Loads all-mpnet-base-v2
    │   ├── text_extractor.py # Logic to read PDFs and PPTs
    │   ├── vector_db.py   # Manages ChromaDB (add, query)
    │   └── generator.py   # Interfaces with Ollama (Gemma 3 12B)
    │
    ├── ml_models/
    │   ├── __init__.py
    │   ├── topic_modeler.py # TF-IDF & K-Means logic
    │   └── question_db.py   # Manages a simple DB of questions
    │
    └── processing/
        ├── __init__.py
        └── indexer.py     # Main script to read, process, & index files

📚 Incredibly Detailed Documentation: ai_service

1. Overview & Purpose

The ai_service is a self-contained Python microservice built with FastAPI. Its sole purpose is to handle all computationally expensive AI and ML tasks. It communicates only with the main notebook/server (Bun.js) via a local HTTP API. It never talks directly to the end-user's client.

  • Python Stack: FastAPI, Ollama, Sentence-Transformers, ChromaDB, Scikit-learn, pypdf, python-pptx.
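
main.py itself is only named in the filesystem structure, so here is a minimal sketch of how it could expose the two features over the local HTTP API. The route names (/ask, /generate-paper), request models, and port are assumptions rather than a final contract; the helper functions are the ones documented in the sections below.

# main.py -- minimal sketch; route names and request models here are assumptions.
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseModel

from rag_pipeline.vector_db import query_db
from rag_pipeline.generator import get_ai_response, generate_paper_from_context
from ml_models.topic_modeler import get_topic_analysis

app = FastAPI(title="Notebook AI Service")

class AskRequest(BaseModel):
    question: str
    board: Optional[str] = None        # optional metadata filter for retrieval

class PaperRequest(BaseModel):
    board: str
    grade: str
    subject: str
    num_questions: int = 10

@app.post("/ask")
def ask(req: AskRequest):
    # Retrieve the most relevant chunks, then answer from that context.
    filters = {"board": req.board} if req.board else {}
    context = query_db(req.question, n_results=5, filter_dict=filters)
    return {"answer": get_ai_response(req.question, context)}

@app.post("/generate-paper")
def generate_paper(req: PaperRequest):
    # Slow path: ML topic analysis + long LLM generation (can take up to a minute).
    analysis = get_topic_analysis(req.board, req.grade, req.subject)
    return {"paper": generate_paper_from_context(analysis, req.num_questions)}

Run it with uvicorn main:app --port 8000 from ai_service/; the Bun server's ai.ts route then simply forwards requests to these endpoints and relays the JSON back to the client.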

2. Setup & Installation

ai_service/requirements.txt:

fastapi
uvicorn[standard]
ollama
sentence-transformers
chromadb
scikit-learn
pypdf
python-pptx
numpy

Installation Steps:

  1. Install Python & Pip: Ensure you have Python 3.9+ installed.
  2. Install Dependencies:
    cd notebook/ai_service
    pip install -r requirements.txt
  3. Install & Run Ollama:
    • Follow the official Ollama setup instructions.
    • Pull the generation model (all-mpnet-base-v2 is not an Ollama model; sentence-transformers downloads it automatically on first use):
      ollama pull gemma3:12b-it-qat  # or whichever tag you use for the generation model
    • Ensure the Ollama server is running.
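
config.py is listed in the structure but never spelled out; one plausible sketch is below. Every value in it is either a default pulled from elsewhere in this document (chunk sizes, cluster count, paths) or an assumed model tag, to be adjusted to your setup.

# config.py -- illustrative defaults only; adjust paths and model tags to your setup.
from pathlib import Path

BASE_DIR = Path(__file__).parent

# Models
EMBEDDING_MODEL = "all-mpnet-base-v2"       # loaded via sentence-transformers
GENERATION_MODEL = "gemma3:12b-it-qat"      # Ollama tag for the generation model

# Paths into the existing Bun server and the AI service's own data
SERVER_DB_DIR = BASE_DIR / ".." / "server" / "db"
UPLOADS_DIR = BASE_DIR / ".." / "server" / "uploads"
VECTOR_STORE_DIR = BASE_DIR / "data" / "vector_store"
ML_MODELS_DIR = BASE_DIR / "data" / "ml_models"
QUESTION_DB_PATH = BASE_DIR / "ml_models" / "question_db.json"

# RAG chunking parameters (see the indexer workflow below)
CHUNK_SIZE = 500        # characters per chunk
CHUNK_OVERLAP = 100     # characters of overlap between consecutive chunks

# Topic modelling
N_CLUSTERS = 50         # K-Means clusters fit over the full question corpus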

3. Core Component Deep-Dive

processing/indexer.py (The "Data Pipeline")

This is the most critical offline script. It must be run to build the AI's knowledge.

  • Purpose: To find all uploaded documents, extract their text, process them for RAG and ML, and store the results.
  • Workflow:
    1. Scan ../server/db/posts/: Reads all uuid.json files to get a list of all posts.
    2. Filter for Documents: It looks for posts that are "question papers" or "notes" (PDF/PPT).
    3. For each document:
      • Get Metadata: Reads the board, grade, subject, and filename from the JSON.
      • Get File Path: Constructs the path (e.g., ../server/uploads/[filename]).
      • Extract Text (using rag_pipeline/text_extractor.py):
        • If .pdf, use pypdf to read text page by page.
        • If .pptx, use python-pptx to read text from slides.
      • Chunk Text: Splits the full text into smaller, overlapping chunks (e.g., 500 characters per chunk with 100 overlap).
      • Store for RAG (using rag_pipeline/vector_db.py):
        • Each chunk is embedded using all-mpnet-base-v2.
        • The resulting vector is stored in ChromaDB (data/vector_store/) along with its metadata: { "text": chunk_text, "source_file": filename, "board": board, "grade": grade }.
      • Store for ML (using ml_models/question_db.py):
        • A simpler regex/heuristic identifies "questions" (e.g., lines ending in "?", or starting with "Q.").
        • These questions are stored raw in a separate simple JSON file or SQLite DB (ml_models/question_db.json) for the topic modeler.
    4. Train ML Model (using ml_models/topic_modeler.py):
      • After processing all questions, it loads the entire question bank.
      • It fits a TfidfVectorizer on the text and saves the vectorizer to data/ml_models/tfidf.pkl.
      • It then fits a KMeans model (e.g., n_clusters=50) on the TF-IDF vectors and saves the model to data/ml_models/kmeans.pkl.

How to run it:
You must run this script the first time you set up the service.

cd notebook/ai_service
python processing/indexer.py
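
For concreteness, the extraction, chunking, and question-spotting steps from the workflow above could look roughly like the sketch below. Helper names are illustrative rather than the final text_extractor.py API, and the question heuristic is deliberately crude.

# Rough sketch of the indexer's per-document steps; names are illustrative.
import re
from pypdf import PdfReader
from pptx import Presentation

def extract_text(path: str) -> str:
    # Read a PDF page by page, or a PPTX slide by slide.
    if path.lower().endswith(".pdf"):
        reader = PdfReader(path)
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if path.lower().endswith(".pptx"):
        texts = []
        for slide in Presentation(path).slides:
            for shape in slide.shapes:
                if shape.has_text_frame:
                    texts.append(shape.text_frame.text)
        return "\n".join(texts)
    return ""

def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    # Fixed-size character chunks with overlap, as described above.
    chunks = []
    for start in range(0, len(text), size - overlap):
        chunk = text[start:start + size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks

QUESTION_RE = re.compile(r"^(?:Q\d*|\d+)[\.\)]?\s+", re.IGNORECASE)

def extract_questions(text: str) -> list[str]:
    # Heuristic: keep lines ending in "?" or starting like "Q." / "1)".
    questions = []
    for line in text.splitlines():
        line = line.strip()
        if line.endswith("?") or QUESTION_RE.match(line):
            questions.append(line)
    return questions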

rag_pipeline/ (The Q&A System)

  • embedder.py:

    • Loads the all-mpnet-base-v2 model using sentence_transformers.
    • Provides a simple function: def get_embedding(text: str) -> List[float]: ...
  • vector_db.py:

    • Initializes the ChromaDB client: client = chromadb.PersistentClient(path="./data/vector_store").
    • Gets or creates a collection: collection = client.get_or_create_collection("notebook_docs").
    • def add_documents(chunks: List[str], metadatas: List[dict], ids: List[str]): ... (used by indexer.py).
    • def query_db(query_text: str, n_results=5, filter_dict={}) -> List[str]: ...
      1. Gets the embedding for query_text from embedder.py.
      2. Queries the collection: results = collection.query(query_embeddings=[embedding], n_results=n_results, where=filter_dict).
      3. Returns the text from the documents in the results.
  • generator.py:

    • Interfaces with Ollama.
    • def get_ai_response(question: str, context: List[str]) -> str:
      1. Initializes the Ollama client: client = ollama.Client().
      2. Builds the RAG prompt:
        You are a helpful study assistant...
        CONTEXT:
        ---
        [context[0]]
        ---
        QUESTION:
        [question]
        ANSWER:
        
      3. Calls the model: response = client.chat(model='gemma3:12b-it-qat', messages=[...]).
      4. Returns response['message']['content'].
    • def generate_paper_from_context(analysis_data: dict, num_questions: int) -> str:
      1. Initializes Ollama client.
      2. Builds the detailed paper generation prompt:
        You are an expert curriculum developer and high school teacher.
        Your task is to generate a new, high-quality practice question paper.
        
        Use the following analysis of past papers as your guide.
        
        Board: [analysis_data.board]
        Grade: [analysis_data.grade]
        Subject: [analysis_data.subject]
        
        Key Topics & Examples:
        ---
        Topic 1:
        Keywords: [analysis_data.top_topics[0].keywords]
        Example Questions:
        - [analysis_data.top_topics[0].examples[0]]
        - [analysis_data.top_topics[0].examples[1]]
        ---
        ... (repeat for all top topics) ...
        
        INSTRUCTIONS:
        1.  Generate a new practice paper with exactly [num_questions] questions.
        2.  The questions must be *new* and *original*, but inspired by the topics, keywords, and example questions provided.
        3.  Do NOT just copy the example questions.
        4.  Ensure the paper covers the identified topics in a balanced way.
        5.  Format the output clearly (e.g., "Q1: ...", "Q2: ...").
        
        NEW PRACTICE PAPER:
        
      3. Calls the model: response = client.chat(model='gemma3:12b-it-qat', messages=[...]).
        • Note: This call will be slow (up to 1 minute). The Ollama client and FastAPI server must be configured to handle this long timeout.
      4. Returns response['message']['content'] (the full paper as a single string).
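
Pulling these pieces together, the full Q&A path can be sketched in one place. This is only an illustration: the collection name, model tag, and prompt wording are the ones assumed in this document, and a real service would call the embedder/vector_db/generator modules rather than one script.

# End-to-end RAG sketch: embed the question, retrieve chunks, generate an answer.
import chromadb
import ollama
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-mpnet-base-v2")
chroma = chromadb.PersistentClient(path="./data/vector_store")
collection = chroma.get_or_create_collection("notebook_docs")
llm = ollama.Client()

def answer_question(question: str, n_results: int = 5) -> str:
    # 1. Embed the question and pull the closest chunks from ChromaDB.
    embedding = embedder.encode(question).tolist()
    results = collection.query(query_embeddings=[embedding], n_results=n_results)
    context_chunks = results["documents"][0]

    # 2. Build the RAG prompt from the retrieved context.
    prompt = (
        "You are a helpful study assistant. Answer using only the context below.\n"
        "CONTEXT:\n---\n" + "\n---\n".join(context_chunks) + "\n---\n"
        f"QUESTION:\n{question}\nANSWER:"
    )

    # 3. Ask the generation model via Ollama (this call can be slow).
    response = llm.chat(model="gemma3:12b-it-qat",
                        messages=[{"role": "user", "content": prompt}])
    return response["message"]["content"]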

ml_models/ (The Paper Predictor's Analyzer)

  • topic_modeler.py:
    • def load_models(): ...
      • Loads tfidf.pkl and kmeans.pkl from data/ml_models/ using joblib or pickle.
    • def get_topic_analysis(board: str, grade: str, subject: str) -> dict:
      1. Loads the pre-trained TF-IDF and K-Means models.
      2. Loads all questions for the matching board, grade, and subject from question_db.py.
      3. Transforms these filtered questions using the loaded TF-IDF vectorizer.
      4. Uses the loaded K-Means model to predict the topic (cluster label) for each question.
      5. Counts the most frequent topics (e.g., Top 5 topics).
      6. For each top topic, finds 2-3 representative questions (e.g., those closest to the cluster center).
      7. Returns a dictionary structure (the analysis_data for the LLM):
        {
          "board": "CBSE",
          "grade": "12",
          "subject": "Physics",
          "top_topics": [
            { "topic_id": 3, "keywords": "kinematics, velocity", "examples": ["Q: ...", "Q: ..."] },
            { "topic_id": 7, "keywords": "optics, lens, mirror", "examples": ["Q: ..."] }
          ]
        }
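
A condensed sketch of get_topic_analysis as described above, assuming joblib-saved models and a question_db.json whose records carry text/board/grade/subject fields (that schema is an assumption; question_db.py owns the real format):

# Condensed sketch of the topic analysis step; the question-bank schema is assumed.
import json
from collections import Counter

import joblib
import numpy as np

def load_questions(board: str, grade: str, subject: str) -> list[str]:
    # Stand-in for question_db.py: filter the stored question bank by metadata.
    with open("ml_models/question_db.json") as f:
        records = json.load(f)
    return [r["text"] for r in records
            if r["board"] == board and r["grade"] == grade and r["subject"] == subject]

def get_topic_analysis(board: str, grade: str, subject: str,
                       top_n: int = 5, examples_per_topic: int = 3) -> dict:
    # 1. Load the models saved by indexer.py.
    tfidf = joblib.load("data/ml_models/tfidf.pkl")
    kmeans = joblib.load("data/ml_models/kmeans.pkl")

    # 2. Filter the question bank, vectorize, and assign each question to a topic.
    questions = load_questions(board, grade, subject)
    vectors = tfidf.transform(questions)
    labels = kmeans.predict(vectors)

    # 3. Keep the most frequent topics and pick representative questions/keywords.
    terms = tfidf.get_feature_names_out()
    top_topics = []
    for topic_id, _count in Counter(labels).most_common(top_n):
        centre = kmeans.cluster_centers_[topic_id]
        idx = np.where(labels == topic_id)[0]
        # Representative questions: those closest to the cluster centre.
        dists = np.linalg.norm(vectors[idx].toarray() - centre, axis=1)
        best = idx[np.argsort(dists)[:examples_per_topic]]
        # Keywords: the highest-weighted TF-IDF terms at the cluster centre.
        keywords = ", ".join(terms[i] for i in np.argsort(centre)[::-1][:5])
        top_topics.append({
            "topic_id": int(topic_id),
            "keywords": keywords,
            "examples": [questions[i] for i in best],
        })

    return {"board": board, "grade": grade, "subject": subject, "top_topics": top_topics}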
