Onur Keskin, Ph.D. (keskinonur)

Boost Prompt

A prompt to boost your lazy "do this" prompts. It can be installed in VS Code or VS Code Insiders.

@philschmid
philschmid / GEMINI.md
Last active November 5, 2025 12:11
Gemini CLI Plan Mode prompt

Gemini CLI Plan Mode

You are Gemini CLI, an expert AI assistant operating in a special 'Plan Mode'. Your sole purpose is to research, analyze, and create detailed implementation plans. You must operate in a strict read-only capacity.

Gemini CLI's primary goal is to act like a senior engineer: understand the request, investigate the codebase and relevant resources, formulate a robust strategy, and then present a clear, step-by-step plan for approval. You are forbidden from making any modifications. You are also forbidden from implementing the plan.

Core Principles of Plan Mode

  • Strictly Read-Only: You can inspect files, navigate code repositories, evaluate project structure, search the web, and examine documentation.
  • Absolutely No Modifications: You are prohibited from performing any action that alters the state of the system. This includes:
@awni
awni / mlx_distributed_deepseek.md
Last active November 5, 2025 16:03
Run DeepSeek R1 or V3 with MLX Distributed

Setup

On every machine in the cluster, install OpenMPI and mlx-lm:

conda install conda-forge::openmpi
pip install -U mlx-lm

Next download the pipeline parallel run script. Download it to the same path on every machine:
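The embedded script did not survive the page capture. A minimal sketch of this step, assuming the script is named pipeline_generate.py, fetched from a placeholder URL, and launched over a hosts.txt file listing the machines (script name, URL, and the --prompt flag are assumptions, not confirmed by this preview):

# fetch the run script to the same path on every machine (URL is a placeholder)
curl -LO https://example.com/pipeline_generate.py

# launch one rank per machine listed in hosts.txt
mpirun -np 2 --hostfile hosts.txt python pipeline_generate.py --prompt "hello"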

@catid
catid / gist:533dd0c7d4f3ee8d34a6a905155b72ae
Last active April 22, 2024 04:53
How to quantize 70B model so it will fit on 2x4090 GPUs
I tried EXL2, AutoAWQ, and SqueezeLLM, and they all failed for different reasons (issues opened).
HQQ worked:
I rented a 4x GPU, 1TB RAM ($19/hr) instance on RunPod with a 1024GB container and 1024GB workspace disk.
I think you only need 2x GPUs with 80GB VRAM and 512GB+ system RAM, so I probably overpaid.
Note that you need to fill in the form to get access to the 70B Meta weights.
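A minimal sketch of the environment setup, assuming the hqq package from PyPI and a gated Hugging Face checkpoint (the quantization script itself is not shown in this preview):

# install the quantizer and authenticate for the gated Meta weights
pip install hqq transformers
huggingface-cli login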
@adrienbrault
adrienbrault / llama2-mac-gpu.sh
Last active April 8, 2025 13:49
Run Llama-2-13B-chat locally on your M1/M2 Mac with GPU inference. Uses 10GB RAM. UPDATE: see https://twitter.com/simonw/status/1691495807319674880?s=20
# Clone llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# Build it
make clean
LLAMA_METAL=1 make
# Download model
export MODEL=llama-2-13b-chat.ggmlv3.q4_0.bin
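The preview cuts off before the download and run steps. A minimal sketch of the rest, assuming the GGML file is hosted under TheBloke/Llama-2-13B-chat-GGML on Hugging Face (repo name is an assumption) and using the main binary that llama.cpp produced at the time:

# Download the quantized weights (hosting repo assumed)
curl -L -o $MODEL "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/$MODEL"
# Run with Metal GPU offload enabled
./main -m $MODEL -n 256 -ngl 1 -p "Hello, how are you?"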
@rishi-singh26
rishi-singh26 / main.dart
Last active February 12, 2024 21:27
How to implement end-to-end encryption using PBKDF in Flutter
// You can read the article I wrote for this setup on medium.com at the link below
// https://medium.com/@rishi_singh/how-to-implement-end-to-end-encryption-using-pbkdf-in-flutter-a5508e7ad93e
import 'dart:math';
import 'dart:typed_data';
import 'package:crypton/crypton.dart';
import 'package:pointycastle/block/aes.dart';
import 'package:pointycastle/digests/sha256.dart';
import 'package:pointycastle/key_derivators/pbkdf2.dart';
@jcmvbkbc
jcmvbkbc / gist:316e6da728021c8ff670a24e674a35e6
Last active August 29, 2025 23:28
esp32s3 linux rebuild scripts
The latest versions of these scripts are available in the git repository https://github.com/jcmvbkbc/esp32-linux-build
@rain-1
rain-1 / llama-home.md
Last active June 24, 2025 11:12
How to run Llama 13B with a 6GB graphics card

This worked on 14/May/23. The instructions will probably require updating in the future.

LLaMA is a text-prediction model similar to GPT-2, and to the version of GPT-3 that has not yet been fine-tuned. It is also possible to run fine-tuned versions (such as Alpaca or Vicuna, I think) with this; those versions are more focused on answering questions.

Note: I have been told that this does not support multiple GPUs. It can only use a single GPU.

It is now possible to run LLaMA 13B with a 6GB graphics card (e.g. an RTX 2060), thanks to the amazing work on llama.cpp. The latest change is CUDA/cuBLAS support, which lets you pick an arbitrary number of transformer layers to run on the GPU. This is perfect for low VRAM.

  • Clone llama.cpp from git; I am on commit 08737ef720f0510c7ec2aa84d7f70c691073c35d (see the build-and-run sketch below).
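A minimal sketch of the build and a partially offloaded run, assuming the llama.cpp build option and main flags as they existed around that commit (the layer count and model path are illustrative, not prescribed by the gist):

# Build with CUDA/cuBLAS support
make clean
make LLAMA_CUBLAS=1
# Offload some transformer layers to the 6GB GPU; tune the count to fit your VRAM
./main -m models/13B/ggml-model-q4_0.bin --n-gpu-layers 18 -p "Hello"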
@rain-1
rain-1 / LLM.md
Last active October 20, 2025 07:02
LLM Introduction: Learn Language Models

Purpose

Bootstrap knowledge of LLMs ASAP, with a bias/focus toward GPT.

Avoid being a link dump. Try to provide only valuable, well-tuned information.

Prelude

Neural network links before starting with transformers.