Skip to content

Instantly share code, notes, and snippets.

@StanAngeloff
Created March 15, 2026 12:13
Show Gist options
  • Select an option

  • Save StanAngeloff/df333e5d5071cbd2a4e66a2ea7e42002 to your computer and use it in GitHub Desktop.

Select an option

Save StanAngeloff/df333e5d5071cbd2a4e66a2ea7e42002 to your computer and use it in GitHub Desktop.
Perplexity redone README

Flemma 🪶

Neovim as an AI workspace — where every conversation is a real file.

Flemma turns Neovim into a first-class AI workspace for writing, research, planning, and optionally coding. It treats every interaction with an LLM as a .chat markdown buffer you can edit, refactor, search, and commit like any other file.

  • Keep long-running conversations as versioned documents under Git.
  • Compose prompts from frontmatter, inline expressions, and includes.
  • Attach local files (code, PDFs, screenshots) directly in messages.
  • Switch between Claude, OpenAI, and Vertex models from a single command tree.
  • See token usage, cost, and even streamed "thinking" traces without leaving Neovim.

Flemma is built for people who already live in the terminal: technical writers, architects, researchers, tinkerers — and developers who want AI woven into their editor without a browser or heavyweight IDE.

Flemma is a workspace, not just a coding assistant. It shines for documents, knowledge work, and experiments as much as small coding tasks.


Why Flemma instead of a browser tab?

If you spend your day in Neovim, AI in the browser is constantly in your way:

  • Tabs go to sleep, pages refresh, and you lose unsaved context.
  • Sharing files means uploading them to someone else’s servers.
  • You can’t drive it with your Neovim muscle memory, motions, and macros.

Femma keeps everything local and text-first:

  • Conversations are .chat files in your project tree — duplicate, split, and branch them like code.
  • You can open ten different .chat buffers, each focused on a separate document, feature, or idea.
  • File references like @./specs/v2.pdf or @./patches/fix.lua;type=text/x-lua are part of the prompt language.
  • Git gives you history, diffs, and blame over your AI-assisted reasoning.

If you can write it down and reason about it, you can probably do it more comfortably inside a .chat buffer.


What you can use Flemma for

Examples taken from real-world use:

  • Turning rough notes into polished technical documents: PRDs, AKMs, architecture docs, release notes, SOWs, and client-facing briefs.
  • Turning transcripts (e.g. Whisper output) into training material, checklists, or knowledge base articles.
  • Distilling weeks of emails, tickets, and meeting minutes into project plans and decision logs.
  • Drafting storyboards and course content from Figma designs and other visual inputs.
  • Generating prompts for image models or external agents — .chat files are great scratchpads.
  • Lightweight coding support: small scripts, one-off jobs, or explaining unfamiliar code.
  • Personal tasks: bedtime stories, decision support, difficult emails, or legal questions (with the usual disclaimers).

Flemma is not a replacement for tools like Copilot or dedicated coding agents; it is a complementary workspace that excels at long-form thinking and documentation.


Core ideas

Flemma’s design is built around a few simple ideas:

  1. Conversations are files
    Every interaction is a .chat buffer. There is no hidden JSON store or opaque session object — what you see in the buffer is the canonical state.

  2. Prompts are programmable
    Frontmatter and inline expressions let you treat prompts like templates and mini-programs.

  3. Attachments are first-class
    You reference local files directly in messages; Flemma figures out MIME types and provider-specific upload details.

  4. The editor is the IDE for AI
    Navigation, folding, search, and motions all work like you expect in Neovim, tuned for chat transcripts and reasoning traces.

  5. Multiple providers, one workflow
    Claude, OpenAI, and Vertex share the same UX; switching model or provider is a single command instead of a different plugin.


Feature tour

.chat buffers

A .chat file is a normal buffer with a special structure:

```lua
release = {
  version = "v25.10-1",
  focus = "command presets and UI polish",
}
notes = [[
- Presets appear first in :Flemma switch completion.
- Thinking tags have dedicated highlights.
- Logging toggles now live under :Flemma logging:*.
]]

@System: You turn engineering notes into concise changelog entries.

@You: Summarise {{release.version}} with emphasis on {{release.focus}} using the points below: {{notes}}

@Assistant:

  • Changelog bullets...
  • Follow-up actions...
Model thoughts stream here and auto-fold. ```
  • Frontmatter on the first line returns variables for the rest of the file. Lua and JSON parsers are built in; you can register more.
  • Messages begin with @System:, @You:, or @Assistant:. The parser is whitespace-tolerant and supports long, multi-line messages.
  • Thinking blocks (<thinking>...</thinking>) are folded automatically and highlighted separately from the answer.

Folding is tuned to keep large chats readable:

  • Level 3: frontmatter
  • Level 2: thinking blocks
  • Level 1: individual messages

You also get message-aware motions and text objects (e.g. ]m, [m, im, am) and buffer-local mappings for send / cancel.


Templates, expressions, and includes

Treat .chat files as living templates instead of static prompts.

  • Frontmatter returns a table of values:

    ```lua
    recipient = "QA team"
    notes = [[
    - Verify presets list before providers.
    - Check spinner no longer triggers spell checking.
    - Confirm logging commands live under :Flemma logging:*.
    ]]
    
    
  • Inline expressions like {{ fnamemodify(vim.fn.bufname('%'), ':t') }} are evaluated in a sandbox with standard Lua libs, vim.fn, vim.fs, and your own variables.

  • include("path") lets you pull in reusable fragments from other templates, with guards against missing files and circular includes.

Before any request is sent, Flemma:

  1. Parses frontmatter and includes.
  2. Evaluates inline expressions.
  3. Validates file references and attachments.

Blocking errors are surfaced as diagnostics and cancel the request before it ever hits a provider.


Referencing local files

Use @./relative/path (or @../up-one/path) to attach local context:

@You: Critique @./patches/fix.lua;type=text/x-lua.
@You: OCR this screenshot @./artifacts/failure.png.
@You: Compare these specs: @./specs/v1.pdf and @./specs/v2.pdf.

Flemma will:

  • Resolve the path relative to the .chat file.
  • Detect the MIME type via the file CLI (with an extension-based fallback).
  • Encode and send the asset in the format expected by each provider.

If a file is missing or unsupported, you get a precise warning (with line number) and the raw @./path is left in the prompt so you can fix it.


Providers and models

Flemma ships a model catalogue for:

  • Anthropic Claude
  • OpenAI (GPT‑5 family, including reasoning effort)
  • Google Vertex AI (Gemini 2.5, streamed thinking output)

You can:

  • Switch with :Flemma switch (interactive picker) or :Flemma switch openai gpt-5 temperature=0.3.
  • Define named presets under presets (e.g. $fast, $deep) and recall them with :Flemma switch $fast.
  • Pass arbitrary key=value overrides that are forwarded to the underlying provider.

Provider-specific features:

  • Claude: text, image, and PDF attachments.
  • OpenAI: reasoning effort levels (low|medium|high); lualine shows the active level.
  • Vertex: thinking_budget switches on streamed <thinking> traces which Flemma folds and highlights.

Token and cost reporting uses the bundled pricing table so you can see both per-request and per-session totals.


Observability and UX niceties

Flemma tries to make AI activity visible and unobtrusive:

  • Floating usage reports after each request: provider, model, input/output tokens, reasoning tokens, and cost.
  • :Flemma notification:recall reopens the last report if you closed it too fast.
  • Optional structured logging with :Flemma logging:enable / :Flemma logging:open showing redacted curl commands and streaming traces.
  • Lualine integration that shows the active model and reasoning effort only in .chat buffers.
  • A spinner line (@Assistant: Thinking...) that is explicitly marked non-spellable so spell checkers stay quiet.

Quick start

  1. Install the plugin (lazy.nvim example):

    {
      "Flemma-Dev/flemma.nvim",
      opts = {}, -- calls require("flemma").setup({}) for you
    }
  2. Set provider credentials using environment variables:

    • ANTHROPIC_API_KEY
    • OPENAI_API_KEY
    • VERTEX_AI_ACCESS_TOKEN or a Vertex service-account (see below)

    Or store them once in your Linux keyring via secret-tool so Flemma can reuse them across sessions.

  3. Create a .chat file in your project:

    :edit notes/product-update.chat
    
  4. Type a message and send:

    @You: Turn the notes below into a short project update.
    - Added Vertex thinking budget support.
    - Refactored :Flemma command routing.
    - Documented presets in the README.
    

    Then press <C-]> in normal or insert mode, or run:

    :Flemma send
  5. Watch the response stream in
    The buffer is temporarily locked while streaming; you’ll see @Assistant: Thinking... and then the reply. A floating window shows tokens and cost for this request and the current session.

Cancel a running request with <C-c> or :Flemma cancel.


Installation and requirements

Flemma supports any plugin manager.

Minimal config

require("flemma").setup({})

Requirements

  • Neovim 0.11+ — for Tree-sitter folding APIs and vim.fs helpers.
  • curl — used for streaming via Server-Sent Events.
  • Markdown Tree-sitter grammar — Flemma reuses it for .chat highlighting and folding.
  • file CLI (optional but recommended) — robust MIME detection for @./path attachments.

Vertex AI with service account (optional)

Flemma supports both direct access tokens and a more ergonomic service-account flow. See the docs for the full steps; in short you:

  1. Create a service account with the Vertex AI user role.
  2. Store its JSON either in VERTEX_SERVICE_ACCOUNT or the system keyring.
  3. Ensure gcloud is on your $PATH so Flemma can refresh tokens.
  4. Configure project_id and location via :Flemma switch or setup().

Commands overview

All commands hang off a single entry point:

:Flemma {subcommand}

Useful ones:

  • :Flemma send [key=value…] — send the current buffer (with optional hooks like on_request_start / on_request_complete).
  • :Flemma cancel — abort the active request.
  • :Flemma switch … — choose provider/model or preset; accepts key=value overrides.
  • :Flemma message:next / :Flemma message:previous — jump between messages.
  • :Flemma logging:enable|disable|open — control structured logging.
  • :Flemma notification:recall — reopen the last usage notification.
  • :Flemma import — convert Claude Workbench TypeScript exports into .chat buffers.

Legacy commands like :FlemmaSend and :FlemmaCancel still work but forward to the new tree with a deprecation note.


Customisation

Everything Flemma does can be tuned from setup():

  • Highlights for roles, expressions, file references, and thinking blocks.
  • Ruler character and highlight between messages.
  • Per-role signs in the sign column.
  • Spinner behaviour while requests run.
  • Text object key (m by default) and buffer-local keymaps.
  • Notification appearance and logging path.
  • Automatic buffer writes on successful requests.

See the configuration section in the docs for a full reference.


Developing and contributing

This repository ships with a Nix dev shell and a headless test suite.

  • nix develop — enter the dev shell.
  • flemma-fmt — run nixfmt, stylua, and prettier across the repo.
  • make test — run the Plenary + Busted specs against a minimal Neovim config.

To test Flemma locally without installing:

nvim --cmd "set runtimepath+=`pwd`" \
  -c 'lua require("flemma").setup({})' \
  -c ':edit scratch.chat'

Contributions are welcome — from small documentation fixes to new providers, better diagnostics, and sharper .chat ergonomics.


License

Flemma is licensed under the AGPL-3.0 license.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment