CS--DDODS--Glossary
======================
Models/Algorithms
======================
Models for Sentence Embeddings
------------------------------
- Cross Encoders (https://www.dailydoseofds.com/bi-encoders-and-cross-encoders-for-sentence-pair-similarity-scoring-part-1/)
- Bi-Encoders (https://www.dailydoseofds.com/bi-encoders-and-cross-encoders-for-sentence-pair-similarity-scoring-part-1/)
- Hybrid (combination of above) (https://www.dailydoseofds.com/augsbert-bi-encoders-cross-encoders-for-sentence-pair-similarity-scoring-part-2/)
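
A minimal sketch of the two approaches, assuming the sentence-transformers library; the checkpoints all-MiniLM-L6-v2 and cross-encoder/ms-marco-MiniLM-L-6-v2 are illustrative choices, not ones the articles prescribe:

    from sentence_transformers import SentenceTransformer, CrossEncoder, util

    s1 = "How do I reset my password?"
    s2 = "What are the steps to recover my account login?"

    # Bi-encoder: embed each sentence independently, then compare embeddings.
    # Fast at scale because embeddings can be precomputed and indexed.
    bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")
    emb1, emb2 = bi_encoder.encode([s1, s2])
    print("bi-encoder cosine:", util.cos_sim(emb1, emb2).item())

    # Cross-encoder: feed the pair jointly through one model for a score.
    # More accurate, but every pair needs a full forward pass.
    cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
    print("cross-encoder score:", cross_encoder.predict([(s1, s2)])[0])

The hybrid setup typically retrieves candidates with the cheap bi-encoder and reranks the top hits with the more accurate cross-encoder.
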
Graph Neural Networks
---------------------
- Type of Tasks
- Node-Level Tasks (e.g., identifying whether a user in a social network is fake)
- Edge-Level Tasks (e.g., recommending someone to follow or add as a friend on social media)
- Graph-Level Tasks (e.g., classifying research papers represented as graphs into categories such as medical, engineering, or computer science)
- Challenges with Graph Modelling
- Irregular shapes
- Interdependence of data points
- Permutation invariance
- Neural Networks
- Graph Convolutional Networks (GCNs)
- Graph Attention Networks
- GraphSAGE (short for Graph Sample and AggregatE)
- Node Level Features
- In-degree
- Out-degree
- Total degree
- Betweenness centrality
- Closeness centrality
- Eigenvector centrality
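
A quick way to compute these node-level features on a toy directed graph, assuming networkx (the edges are made up for illustration):

    import networkx as nx

    G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (2, 4), (4, 3)])

    features = {
        "in_degree": dict(G.in_degree()),
        "out_degree": dict(G.out_degree()),
        "total_degree": dict(G.degree()),
        "betweenness": nx.betweenness_centrality(G),
        "closeness": nx.closeness_centrality(G),
        "eigenvector": nx.eigenvector_centrality(G, max_iter=1000),
    }
    for name, values in features.items():
        print(name, values)
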
======================
Model Training Process
======================
Model Training
--------------
- Mixed Precision Training (https://blog.dailydoseofds.com/p/mixed-precision-training)
- Gradient descent with momentum (https://blog.dailydoseofds.com/p/an-intuitive-and-visual-demonstration)
- Gradient Checkpointing (https://blog.dailydoseofds.com/p/gradient-checkpointing-save-50-60)
- Hyperparameter Tuning (https://www.dailydoseofds.com/tag/hyperparameter-tuning/)
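
A minimal mixed-precision training loop sketch in PyTorch, assuming a CUDA device and a toy linear model; the SGD momentum=0.9 setting also illustrates gradient descent with momentum:

    import torch

    model = torch.nn.Linear(128, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
    scaler = torch.cuda.amp.GradScaler()
    loss_fn = torch.nn.CrossEntropyLoss()

    for _ in range(10):
        x = torch.randn(32, 128, device="cuda")
        y = torch.randint(0, 10, (32,), device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():   # forward pass in reduced precision
            loss = loss_fn(model(x), y)
        scaler.scale(loss).backward()     # scale the loss to avoid fp16 underflow
        scaler.step(optimizer)            # unscales gradients, then steps
        scaler.update()                   # adapts the scale factor over time
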
======================
Understand the Model
======================
Model Interpretability Tools/Methods
------------------------------------
- Conformal Predictions (https://www.dailydoseofds.com/conformal-predictions-build-confidence-in-your-ml-models-predictions/)
- Distribution-free by nature (makes no assumptions about the underlying data distribution)
- Requires no additional training
- Partial dependency plots (PDPs) (https://www.dailydoseofds.com/a-crash-course-on-model-interpretability-part-1/)
- Individual Conditional Expectation (ICE) (https://www.dailydoseofds.com/a-crash-course-on-model-interpretability-part-1/)
- LIME (Local Interpretable Model-agnostic Explanations) (https://www.dailydoseofds.com/a-crash-course-on-model-interpretability-part-2/)
- SHAP (SHapley Additive exPlanations) (https://www.dailydoseofds.com/a-crash-course-on-model-interpretability-part-3/)
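
A minimal PDP/ICE sketch using scikit-learn's inspection module; the random-forest model and synthetic dataset are illustrative:

    import matplotlib.pyplot as plt
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.inspection import PartialDependenceDisplay

    X, y = make_regression(n_samples=500, n_features=5, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X, y)

    # kind="both" overlays ICE curves (one per sample) on the average PDP.
    PartialDependenceDisplay.from_estimator(model, X, features=[0, 1], kind="both")
    plt.show()
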
======================
Model Deployment
======================
Model Compression (https://www.dailydoseofds.com/model-compression-a-critical-step-towards-efficient-machine-learning/)
-----------------
- Knowledge Distillation (a minimal sketch follows this list)
- Pruning
- Neuron Pruning
- Activation Pruning
- Redundancy Pruning (Complex to implement and can be a bit unreliable)
- Weight Pruning
- Zero Pruning
- Low Rank Factorization
- Quantization (https://www.dailydoseofds.com/quantization-optimize-ml-models-to-run-them-on-tiny-hardware/; a numpy sketch follows this list)
- Linear Quantization
- Affine Quantization
- Symmetric Quantization
- LLM Quantization
- LLM.Int8
- SmoothQuant
- What can we quantize?
- Weights
- Activations
- Static (requires calibration/training data to precompute activation ranges)
- Dynamic
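
A minimal knowledge-distillation sketch in PyTorch, as referenced above: the student matches the teacher's softened outputs (KL term) alongside the usual hard-label loss. The tiny linear models, temperature T=4.0, and mixing weight alpha=0.5 are illustrative assumptions, and only a single training step is shown:

    import torch
    import torch.nn.functional as F

    teacher = torch.nn.Linear(64, 10)   # stand-in for a large trained model
    student = torch.nn.Linear(64, 10)   # smaller model being trained
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    T, alpha = 4.0, 0.5                 # temperature and loss mixing weight

    x = torch.randn(32, 64)
    y = torch.randint(0, 10, (32,))
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)

    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                          # standard T^2 scaling of the KD loss
    hard_loss = F.cross_entropy(student_logits, y)

    optimizer.zero_grad()
    loss = alpha * soft_loss + (1 - alpha) * hard_loss
    loss.backward()
    optimizer.step()
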
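
And a minimal numpy sketch of linear quantization to int8, contrasting the symmetric and affine variants listed above; the ranges and rounding follow a common convention rather than any one library:

    import numpy as np

    w = np.random.randn(6).astype(np.float32) * 3   # pretend weights

    # Symmetric: zero-point fixed at 0; map [-max|w|, max|w|] to [-127, 127].
    scale_sym = np.abs(w).max() / 127
    q_sym = np.clip(np.round(w / scale_sym), -127, 127).astype(np.int8)
    deq_sym = q_sym * scale_sym

    # Affine: map [min, max] to [-128, 127] using a scale and a zero-point.
    scale_aff = (w.max() - w.min()) / 255
    zero_point = np.round(-128 - w.min() / scale_aff)
    q_aff = np.clip(np.round(w / scale_aff) + zero_point, -128, 127).astype(np.int8)
    deq_aff = (q_aff.astype(np.float32) - zero_point) * scale_aff

    print("symmetric max error:", np.abs(w - deq_sym).max())
    print("affine max error   :", np.abs(w - deq_aff).max())
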
Post Deployment Considerations
------------------------------
- Version Control
- Data Version Control using DVC (https://www.dailydoseofds.com/you-cannot-build-large-data-projects-until-you-learn-data-version-control/)
- Model Version Control using modelbit (https://www.dailydoseofds.com/deploy-version-control-and-manage-ml-models-right-from-your-jupyter-notebook-with-modelbit/)
- Need
- Collaboration
- Reproducibility
- CI/CD
- Advantages of model registry
- Ongoing Model Improvements
- Model Reusability (while changing the inference code)
- Conditional inference
- Model Logging (Monitoring)
- Need
- Concept Drift
- Data Drift (a quick two-sample drift check is sketched after this list)
- Covariate shift (the distribution of the input features (covariates) changes over time, but the true relationship between the inputs and the target stays the same)
- Non-stationarity (the probability distribution of the samples evolves over time in a non-systematic or unpredictable manner)
- Unrepresentative training data (the training data does not adequately represent the real-world conditions or the diversity of scenarios the model will encounter in production)
- Things to be collected
- System
- Resource Utilization Logs
- Latency Logs
- Model
- Prediction Logs
- Input Data Logs
- Adaptive learning
- Deployment Challenges
- Consistency challenges
- Inadequate Expertise (or Knowledge Gap)
- Pain Points with Traditional Hosting Services
- No Jupyter Notebook Support
- Specialized Expertise
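
The two-sample drift check referenced above, sketched with scipy's Kolmogorov-Smirnov test; the synthetic feature values and the 0.05 threshold are illustrative assumptions:

    import numpy as np
    from scipy.stats import ks_2samp

    train_feature = np.random.normal(0.0, 1.0, size=5000)   # logged at training time
    live_feature = np.random.normal(0.3, 1.0, size=1000)    # recent production inputs

    stat, p_value = ks_2samp(train_feature, live_feature)
    if p_value < 0.05:
        print(f"possible covariate shift (KS={stat:.3f}, p={p_value:.4f})")
    else:
        print("no significant drift detected")
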
=================================
Application/Deployment Patterns
=================================
RAG
----
- Vector Databases
- Indexing and Search (KNN/ANN)
- Flat Index
- Inverted File Index
- Product Quantization
- Hierarchical Navigable Small World (HNSW)
- Part 1 (End-to-End Process)
- Indexing Time
- Chunking
- Fixed Size Chunking
- Semantic Chunking
- Recursive Chunking
- Document Structure Based Chunking
- LLM Based Chunking
- Generation of Embeddings
- Indexing in Vector DB
- Query Time
- Generation of Embedding for Query
- Similarity Search to retrieve relevant chunks (a brute-force retrieval sketch follows this list)
- Rerank the chunks
- Provide retrieved documents to LLM for output generation
- Part 2 (Measurement)
- Faithfulness (how faithful the generated answer a(q) is to the retrieved context c(q))
- Answer relevance
- Context relevance
- Answer correctness
- Context recall
- Context precision
- Part 3 (Optimize retrieval from Vector DB)
- Binary Quantization (a numpy sketch follows this list)
- Parts 4, 5, 6
- Multi-modal RAGs
- Generate summaries of the images using vision models and then embed the summaries
- Train a model like CLIP (Contrastive Language-Image Pretraining) to generate image and text embeddings in a common embedding space
- Part 7
- Graph RAGs
- Limitations of traditional RAG
- LLMs love structured data
- Limited handling of long-term connections across chunks
- Inability to handle complex query reasoning
- Explainability challenges
- Context overlap and redundancy issues
- Solution
- Use a graph database to give structure to the data; the retriever then queries the graph database to retrieve relevant documents
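
The brute-force retrieval sketch referenced above (indexing time, then query time), assuming sentence-transformers; a real system would back this with a vector database, and the chunks and checkpoint are illustrative:

    import numpy as np
    from sentence_transformers import SentenceTransformer

    chunks = [
        "HNSW builds a layered proximity graph for fast approximate search.",
        "Product quantization compresses vectors into compact codes.",
        "A flat index compares the query against every stored vector.",
    ]
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Indexing time: embed the chunks once and keep the matrix around.
    index = model.encode(chunks, normalize_embeddings=True)

    # Query time: embed the query, score every chunk, take the top-k.
    query = model.encode(["How does HNSW speed up search?"], normalize_embeddings=True)
    scores = index @ query[0]           # cosine similarity (unit vectors)
    for i in np.argsort(-scores)[:2]:
        print(f"{scores[i]:.3f}  {chunks[i]}")
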
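And the binary-quantization sketch referenced above: keep only the sign bit per dimension and rank by Hamming distance, which shrinks storage roughly 32x; the data and dimensions are made up:

    import numpy as np

    vectors = np.random.randn(1000, 384)   # pretend these are stored embeddings
    query = np.random.randn(384)

    binary_index = vectors > 0             # one bit per dimension
    binary_query = query > 0

    # Hamming distance = number of differing sign bits; smaller is closer.
    hamming = (binary_index != binary_query).sum(axis=1)
    print("closest ids:", np.argsort(hamming)[:5])
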
Agentic AI
----------
- Agentic AI Design Patterns (https://blog.dailydoseofds.com/p/5-agentic-ai-design-patterns)
- Reflection Pattern
- Tool Use Pattern
- ReAct Pattern (a minimal tool-use/ReAct loop is sketched after this list)
- Planning Pattern
- Multi-Agent Pattern
- Building blocks of Agents:
- Role-playing
- Focus
- Tools
- Cooperation
- Guardrails
- Memory
- Short Term
- Long Term
- Entity
- User
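
The tool-use/ReAct loop referenced above, as a framework-free sketch; call_llm is a hypothetical stand-in for any chat-completion API, and the calculator tool is a toy example:

    def calculator(expression: str) -> str:
        return str(eval(expression))   # toy tool only; never eval untrusted input

    TOOLS = {"calculator": calculator}

    def call_llm(messages: list[dict]) -> dict:
        """Hypothetical LLM call; assume it returns either
        {"tool": name, "input": arg} or {"answer": text}."""
        raise NotImplementedError

    def react_loop(question: str, max_steps: int = 5) -> str:
        messages = [{"role": "user", "content": question}]
        for _ in range(max_steps):
            decision = call_llm(messages)          # reason: act or answer?
            if "answer" in decision:
                return decision["answer"]
            result = TOOLS[decision["tool"]](decision["input"])      # act
            messages.append({"role": "tool", "content": result})     # observe
        return "step limit reached"
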
=================================
Misc
=================================
- Curse of dimensionality (https://www.dailydoseofds.com/a-mathematical-deep-dive-into-the-curse-of-dimensionality/)
- As the dimension of the hypercube increases, the volume gets heavily concentrated near the boundaries of the hypercube.
- As the dimensional space expands, the relative increase in the maximum possible distance between points becomes smaller and smaller (it grows only as O(sqrt(n))).
- As the dimensionality goes to infinity, the nearest and farthest distances almost become equal. In other words, as dimensionality increases, the distance of the query point to the farthest data point approaches its distance to the nearest data point.
- In high dimensions (mostly beyond 50 dimensions), almost the entire space (99%) is needed to find just the 10 nearest neighbors.
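
A quick numpy experiment backing the claims above: as dimensionality grows, the ratio of nearest to farthest distance from a query point approaches 1, so distance-based neighbor search loses contrast:

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 50, 500, 5000):
        points = rng.uniform(size=(1000, d))
        query = rng.uniform(size=d)
        dists = np.linalg.norm(points - query, axis=1)
        print(f"d={d:5d}  nearest/farthest = {dists.min() / dists.max():.3f}")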