Neovim as an AI workspace — where every conversation is a real file.
Flemma turns Neovim into a first-class AI workspace for writing, research, planning, and optionally coding.
It treats every interaction with an LLM as a .chat markdown buffer you can edit, refactor, search, and commit like any other file.
- Keep long-running conversations as versioned documents under Git.
- Compose prompts from frontmatter, inline expressions, and includes.
- Attach local files (code, PDFs, screenshots) directly in messages.
- Switch between Claude, OpenAI, and Vertex models from a single command tree.
- See token usage, cost, and even streamed "thinking" traces without leaving Neovim.
Flemma is built for people who already live in the terminal: technical writers, architects, researchers, tinkerers — and developers who want AI woven into their editor without a browser or heavyweight IDE.
Flemma is a workspace, not just a coding assistant. It shines for documents, knowledge work, and experiments as much as small coding tasks.
If you spend your day in Neovim, AI in the browser is constantly in your way:
- Tabs go to sleep, pages refresh, and you lose unsaved context.
- Sharing files means uploading them to someone else’s servers.
- You can’t drive it with your Neovim muscle memory, motions, and macros.
Femma keeps everything local and text-first:
- Conversations are
.chatfiles in your project tree — duplicate, split, and branch them like code. - You can open ten different
.chatbuffers, each focused on a separate document, feature, or idea. - File references like
@./specs/v2.pdfor@./patches/fix.lua;type=text/x-luaare part of the prompt language. - Git gives you history, diffs, and blame over your AI-assisted reasoning.
If you can write it down and reason about it, you can probably do it more comfortably inside a .chat buffer.
Examples taken from real-world use:
- Turning rough notes into polished technical documents: PRDs, AKMs, architecture docs, release notes, SOWs, and client-facing briefs.
- Turning transcripts (e.g. Whisper output) into training material, checklists, or knowledge base articles.
- Distilling weeks of emails, tickets, and meeting minutes into project plans and decision logs.
- Drafting storyboards and course content from Figma designs and other visual inputs.
- Generating prompts for image models or external agents —
.chatfiles are great scratchpads. - Lightweight coding support: small scripts, one-off jobs, or explaining unfamiliar code.
- Personal tasks: bedtime stories, decision support, difficult emails, or legal questions (with the usual disclaimers).
Flemma is not a replacement for tools like Copilot or dedicated coding agents; it is a complementary workspace that excels at long-form thinking and documentation.
Flemma’s design is built around a few simple ideas:
-
Conversations are files
Every interaction is a.chatbuffer. There is no hidden JSON store or opaque session object — what you see in the buffer is the canonical state. -
Prompts are programmable
Frontmatter and inline expressions let you treat prompts like templates and mini-programs. -
Attachments are first-class
You reference local files directly in messages; Flemma figures out MIME types and provider-specific upload details. -
The editor is the IDE for AI
Navigation, folding, search, and motions all work like you expect in Neovim, tuned for chat transcripts and reasoning traces. -
Multiple providers, one workflow
Claude, OpenAI, and Vertex share the same UX; switching model or provider is a single command instead of a different plugin.
A .chat file is a normal buffer with a special structure:
```lua
release = {
version = "v25.10-1",
focus = "command presets and UI polish",
}
notes = [[
- Presets appear first in :Flemma switch completion.
- Thinking tags have dedicated highlights.
- Logging toggles now live under :Flemma logging:*.
]]@System: You turn engineering notes into concise changelog entries.
@You: Summarise {{release.version}} with emphasis on {{release.focus}} using the points below: {{notes}}
@Assistant:
- Changelog bullets...
- Follow-up actions...
- Frontmatter on the first line returns variables for the rest of the file. Lua and JSON parsers are built in; you can register more.
- Messages begin with
@System:,@You:, or@Assistant:. The parser is whitespace-tolerant and supports long, multi-line messages. - Thinking blocks (
<thinking>...</thinking>) are folded automatically and highlighted separately from the answer.
Folding is tuned to keep large chats readable:
- Level 3: frontmatter
- Level 2: thinking blocks
- Level 1: individual messages
You also get message-aware motions and text objects (e.g. ]m, [m, im, am) and buffer-local mappings for send / cancel.
Treat .chat files as living templates instead of static prompts.
-
Frontmatter returns a table of values:
```lua recipient = "QA team" notes = [[ - Verify presets list before providers. - Check spinner no longer triggers spell checking. - Confirm logging commands live under :Flemma logging:*. ]]
-
Inline expressions like
{{ fnamemodify(vim.fn.bufname('%'), ':t') }}are evaluated in a sandbox with standard Lua libs,vim.fn,vim.fs, and your own variables. -
include("path") lets you pull in reusable fragments from other templates, with guards against missing files and circular includes.
Before any request is sent, Flemma:
- Parses frontmatter and includes.
- Evaluates inline expressions.
- Validates file references and attachments.
Blocking errors are surfaced as diagnostics and cancel the request before it ever hits a provider.
Use @./relative/path (or @../up-one/path) to attach local context:
@You: Critique @./patches/fix.lua;type=text/x-lua.
@You: OCR this screenshot @./artifacts/failure.png.
@You: Compare these specs: @./specs/v1.pdf and @./specs/v2.pdf.
Flemma will:
- Resolve the path relative to the
.chatfile. - Detect the MIME type via the
fileCLI (with an extension-based fallback). - Encode and send the asset in the format expected by each provider.
If a file is missing or unsupported, you get a precise warning (with line number) and the raw @./path is left in the prompt so you can fix it.
Flemma ships a model catalogue for:
- Anthropic Claude
- OpenAI (GPT‑5 family, including reasoning effort)
- Google Vertex AI (Gemini 2.5, streamed thinking output)
You can:
- Switch with
:Flemma switch(interactive picker) or:Flemma switch openai gpt-5 temperature=0.3. - Define named presets under
presets(e.g.$fast,$deep) and recall them with:Flemma switch $fast. - Pass arbitrary
key=valueoverrides that are forwarded to the underlying provider.
Provider-specific features:
- Claude: text, image, and PDF attachments.
- OpenAI: reasoning effort levels (
low|medium|high); lualine shows the active level. - Vertex:
thinking_budgetswitches on streamed<thinking>traces which Flemma folds and highlights.
Token and cost reporting uses the bundled pricing table so you can see both per-request and per-session totals.
Flemma tries to make AI activity visible and unobtrusive:
- Floating usage reports after each request: provider, model, input/output tokens, reasoning tokens, and cost.
:Flemma notification:recallreopens the last report if you closed it too fast.- Optional structured logging with
:Flemma logging:enable/:Flemma logging:openshowing redacted curl commands and streaming traces. - Lualine integration that shows the active model and reasoning effort only in
.chatbuffers. - A spinner line (
@Assistant: Thinking...) that is explicitly marked non-spellable so spell checkers stay quiet.
-
Install the plugin (lazy.nvim example):
{ "Flemma-Dev/flemma.nvim", opts = {}, -- calls require("flemma").setup({}) for you } -
Set provider credentials using environment variables:
ANTHROPIC_API_KEYOPENAI_API_KEYVERTEX_AI_ACCESS_TOKENor a Vertex service-account (see below)
Or store them once in your Linux keyring via
secret-toolso Flemma can reuse them across sessions. -
Create a
.chatfile in your project::edit notes/product-update.chat -
Type a message and send:
@You: Turn the notes below into a short project update. - Added Vertex thinking budget support. - Refactored :Flemma command routing. - Documented presets in the README.Then press
<C-]>in normal or insert mode, or run::Flemma send
-
Watch the response stream in
The buffer is temporarily locked while streaming; you’ll see@Assistant: Thinking...and then the reply. A floating window shows tokens and cost for this request and the current session.
Cancel a running request with <C-c> or :Flemma cancel.
Flemma supports any plugin manager.
require("flemma").setup({})- Neovim 0.11+ — for Tree-sitter folding APIs and
vim.fshelpers. curl— used for streaming via Server-Sent Events.- Markdown Tree-sitter grammar — Flemma reuses it for
.chathighlighting and folding. fileCLI (optional but recommended) — robust MIME detection for@./pathattachments.
Flemma supports both direct access tokens and a more ergonomic service-account flow. See the docs for the full steps; in short you:
- Create a service account with the Vertex AI user role.
- Store its JSON either in
VERTEX_SERVICE_ACCOUNTor the system keyring. - Ensure
gcloudis on your$PATHso Flemma can refresh tokens. - Configure
project_idandlocationvia:Flemma switchorsetup().
All commands hang off a single entry point:
:Flemma {subcommand}Useful ones:
:Flemma send [key=value…]— send the current buffer (with optional hooks likeon_request_start/on_request_complete).:Flemma cancel— abort the active request.:Flemma switch …— choose provider/model or preset; acceptskey=valueoverrides.:Flemma message:next/:Flemma message:previous— jump between messages.:Flemma logging:enable|disable|open— control structured logging.:Flemma notification:recall— reopen the last usage notification.:Flemma import— convert Claude Workbench TypeScript exports into.chatbuffers.
Legacy commands like :FlemmaSend and :FlemmaCancel still work but forward to the new tree with a deprecation note.
Everything Flemma does can be tuned from setup():
- Highlights for roles, expressions, file references, and thinking blocks.
- Ruler character and highlight between messages.
- Per-role signs in the sign column.
- Spinner behaviour while requests run.
- Text object key (
mby default) and buffer-local keymaps. - Notification appearance and logging path.
- Automatic buffer writes on successful requests.
See the configuration section in the docs for a full reference.
This repository ships with a Nix dev shell and a headless test suite.
nix develop— enter the dev shell.flemma-fmt— runnixfmt,stylua, andprettieracross the repo.make test— run the Plenary + Busted specs against a minimal Neovim config.
To test Flemma locally without installing:
nvim --cmd "set runtimepath+=`pwd`" \
-c 'lua require("flemma").setup({})' \
-c ':edit scratch.chat'Contributions are welcome — from small documentation fixes to new providers, better diagnostics, and sharper .chat ergonomics.
Flemma is licensed under the AGPL-3.0 license.