This document provides guidelines for maintaining high-quality Python code. These rules MUST be followed by all AI coding agents and contributors.
All code you write MUST be fully optimized. "Fully optimized" includes:

- maximizing algorithmic big-O efficiency for memory and runtime
- using parallelization and vectorization where appropriate
- following proper style conventions for the code language
- no extra code beyond what is absolutely necessary to solve the problem
```bash
#!/bin/bash
echo "Cleaning up Xcode files…"

# Show current heavy folders
du -sh ~/Library/Developer/Xcode/DerivedData \
       ~/Library/Developer/Xcode/Archives \
       ~/Library/Developer/Xcode/iOS\ DeviceSupport \
       ~/Library/Developer/CoreSimulator/Devices \
       ~/Library/Developer/Xcode/DocumentationCache 2>/dev/null || true
```
```python
from google import genai
from google.genai import types
import typing_extensions as typing
from PIL import Image
import requests
import io
import json
import os
```
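These imports point to a Gemini vision workflow via the google-genai SDK. Below is a minimal sketch of how they might be wired together, continuing from the imports above; the image URL, model name, and prompt are illustrative assumptions, and the API key is read from a `GEMINI_API_KEY` environment variable:

```python
# Hedged sketch; the URL, model name, and prompt are illustrative assumptions.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

# Fetch an image over HTTP and open it with PIL.
resp = requests.get("https://example.com/cat.png")  # hypothetical URL
image = Image.open(io.BytesIO(resp.content))

# Ask the model to describe the image, requesting JSON output.
response = client.models.generate_content(
    model="gemini-2.0-flash",  # assumed model name
    contents=[image, "Describe the image as JSON with keys 'subject' and 'setting'."],
    config=types.GenerateContentConfig(response_mime_type="application/json"),
)
print(json.loads(response.text))
```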
```python
# /// script
# dependencies = [
#   "atproto"
# ]
# ///
from atproto import Client
import getpass
import time
```
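Continuing from those imports, a minimal sketch of logging in and posting with the atproto client; the handle is a placeholder, and the password is read interactively with getpass rather than hard-coded:

```python
# Hedged sketch; the handle is a placeholder, not from the original script.
client = Client()
client.login("alice.bsky.social", getpass.getpass("App password: "))

# Post once, then pause briefly (e.g. to stay well under rate limits).
client.send_post(text="Hello from the atproto SDK!")
time.sleep(1)
```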
```python
import torch
from diffusers import FluxTransformer2DModel
import torch.utils.benchmark as benchmark
from torchao.quantization import quantize_, int8_weight_only
from torchao.utils import unwrap_tensor_subclass
import torch._inductor

torch._inductor.config.mixed_mm_choice = "triton"
```
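These imports are the usual recipe for int8 weight-only quantization of the Flux transformer with torchao. A sketch of how they typically combine, continuing from the imports above; the checkpoint id matches the pipeline loaded below, while the compile flags are assumptions:

```python
# Hedged sketch of the common torchao int8 weight-only flow for Flux.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
quantize_(transformer, int8_weight_only())  # swap Linear weights to int8
unwrap_tensor_subclass(transformer)         # make tensor subclasses compile-friendly
transformer = torch.compile(transformer, mode="max-autotune", fullgraph=True)
```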
```python
# Download the community FluxCFGPipeline implementation
!wget https://raw.githubusercontent.com/linoytsaban/diffusers/refs/heads/dreambooth-lora-flux-exploration/examples/community/pipeline_flux_with_cfg.py

# Load the pipeline
import diffusers
import torch
from pipeline_flux_with_cfg import FluxCFGPipeline

pipe = FluxCFGPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
```
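Once loaded, generation follows the standard diffusers calling convention. A hedged example call; the prompt, negative prompt, and sampler settings are illustrative assumptions (supporting a true negative prompt is the point of this CFG variant):

```python
pipe.to("cuda")

# Hedged example call; prompt text and parameter values are illustrative only.
image = pipe(
    prompt="a photo of a corgi wearing sunglasses",
    negative_prompt="blurry, low quality",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("corgi.png")
```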
Flux: https://blackforestlabs.ai/announcing-black-forest-labs/
torchao: https://github.com/pytorch/ao

The first resource even allows you to run the pipeline in under 16 GB of GPU VRAM.
```python
import argparse
import numpy as np
import torch
import torch.nn as nn
import coremltools as ct
from transformers import AutoTokenizer, AutoModelForCausalLM

# When using float16, all predicted logits are 0. To be debugged.
compute_precision = ct.precision.FLOAT32
compute_units = ct.ComputeUnit.CPU_ONLY
```
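A sketch of how these settings typically feed into a conversion: wrap the model so tracing returns a plain tensor, trace it, then convert. The "gpt2" checkpoint, example prompt, and wrapper class are illustrative assumptions, not the original script:

```python
# Hedged sketch; "gpt2" and the example prompt are assumptions.
class LogitsOnly(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input_ids):
        # return_dict=False so the traced graph returns a tuple, not a ModelOutput
        return self.model(input_ids, return_dict=False)[0]

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

example = tokenizer("Hello there", return_tensors="pt").input_ids
traced = torch.jit.trace(LogitsOnly(model).eval(), example)

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="input_ids", shape=tuple(example.shape), dtype=np.int32)],
    compute_precision=compute_precision,
    compute_units=compute_units,
)
mlmodel.save("model.mlpackage")
```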
Good question! I am collecting human data on how quantization affects outputs. See here for more information: ggml-org/llama.cpp#5962

In the meantime, use the largest quant that fully fits in your GPU's VRAM. If you can comfortably fit Q4_K_S, try using a model with more parameters instead.

See the wiki upstream: https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix
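As a rough rule of thumb in code (a sketch; the headroom reserved for KV cache and activations is an assumption that grows with your context length):

```python
def quant_fits(file_size_gb: float, vram_gb: float, headroom_gb: float = 2.0) -> bool:
    """Rough check: the GGUF file plus KV-cache/activation headroom must fit in VRAM."""
    return file_size_gb + headroom_gb <= vram_gb

# e.g. a hypothetical 20 GB Q4_K_S file on a 24 GB GPU:
print(quant_fits(20.0, 24.0))  # True, but with little margin
```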
```python
# Copyright 2023 Taiga Takano
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```