@SamSaffron
Created May 29, 2026 06:46

Plan: PostsFilter and workflow Post node

Goal

Make post retrieval a first-class Discourse primitive that can be reused by:

  • Discourse Workflows
  • Discourse AI researcher
  • Discourse AI report automation
  • future reporting / automation features

The end state is:

Core
  └── PostsFilter
        - safe, permission-aware post filtering
        - reusable from Ruby and UI-backed workflows
        - extensible by plugins

Discourse AI
  ├── researcher tool uses PostsFilter
  └── AI report automation uses PostsFilter

Discourse Workflows
  └── action:post
        - create
        - get
        - list, backed by PostsFilter

This avoids HTTP self-calls, Data Explorer workarounds, custom SQL, and duplicated filter logic.

Key product decisions

  1. Name / shape

    • Core already has TopicsFilter.
    • The new core primitive should be named PostsFilter for consistency.
  2. Workflow UI must be usable from day 0

    • A raw filter string alone is not enough.
    • The workflow Post: list node needs an admin-friendly filter UI at launch.
    • A text filter can exist as an advanced escape hatch, but common filters should be configured with controls.
  3. Actor / permission semantics should align with topic retrieval

    • Follow the same actor pattern used by the existing workflow topic retriever/list operations.
    • The node should not invent a separate private-content model.
  4. Ship a complete replacement, not a partial filter

    • PostsFilter should fully replace the relevant Discourse AI researcher filter behavior, not only support a minimal date/category/tag subset.
  5. Plugin filters should work like topic filters

    • assigned_to: should be registered by the assign plugin, analogous to how topic filters are extensible.
    • Core should provide a filter registration extension point.
  6. Migrate all existing consumers

    • Researcher tool should move to PostsFilter.
    • AI report automation should move to PostsFilter.
    • Workflows should use PostsFilter for Post: list.
  7. Memory cap

    • Result size is tricky because post bodies can be large and downstream AI nodes can multiply cost.
    • Start with a memory cap around 50 MB for retrieved/serialized post data.
    • This is in addition to count limits.

Current implementation to extract from

The richer existing implementation lives in Discourse AI:

plugins/discourse-ai/lib/utils/research/filter.rb
plugins/discourse-ai/lib/utils/research/llm_formatter.rb
plugins/discourse-ai/lib/agents/tools/researcher.rb

The reusable part is:

DiscourseAi::Utils::Research::Filter

Current usage pattern:

filter = DiscourseAi::Utils::Research::Filter.new(
  filter_string,
  limit: max_results,
  guardian: guardian,
)

posts = filter.search

Important current behavior:

  • Uses Post.secured(guardian).
  • Uses Topic.secured(guardian).
  • Excludes PMs by constraining to regular topic archetype.
  • Supports AND filters and OR groups.
  • Returns an ActiveRecord::Relation<Post>.
  • Tracks invalid filter fragments.
  • Supports a broad set of useful post filters.
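
To make the AND / OR and invalid-fragment behavior concrete, here is a minimal sketch of the parsing step only. It is not the existing DiscourseAi::Utils::Research::Filter code: the `parse_filter` method, the `ALLOWED_KEYS` list, and the returned hash shape are all assumptions made for illustration, and no database query is built.

```ruby
# Hypothetical sketch: split a filter string into OR groups of key:value
# tokens and collect fragments that do not parse. Names are illustrative.
ALLOWED_KEYS = %w[username usernames category categories tag tags after before status order].freeze

def parse_filter(filter_string)
  invalid = []
  groups =
    filter_string.split(/\s+OR\s+/).map do |group|
      # Tokens inside one group are ANDed together.
      group.split.filter_map do |token|
        key, value = token.split(":", 2)
        if ALLOWED_KEYS.include?(key) && value && !value.empty?
          { key: key, values: value.split(",") }
        else
          invalid << token # tracked, not silently dropped
          nil
        end
      end
    end
  { groups: groups, invalid: invalid }
end

result = parse_filter("category:bugs after:2026-01-01 OR tag:urgent bogus:1")
```

Here `result[:groups]` holds two groups (one per side of the OR) and `result[:invalid]` holds the unrecognized `bogus:1` fragment, which a caller could surface as an error message.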

Existing researcher filters to preserve

The new PostsFilter should preserve the existing researcher filter behavior, including:

username:user1
usernames:user1,user2

group:group1
groups:group1,group2

post_type:first
post_type:reply

keywords:word1,word2
topic_keywords:word1,word2

topic:123
topics:123,456

category:bugs
category:=bugs
category:support/bugs
categories:bugs,feature

tag:urgent
tags:urgent,regression

after:YYYY-MM-DD
before:YYYY-MM-DD

topic_after:YYYY-MM-DD
topic_before:YYYY-MM-DD

status:open
status:closed
status:archived
status:noreplies
status:single_user

max_results:50

order:latest
order:oldest
order:latest_topic
order:oldest_topic
order:likes

It should also preserve OR grouping:

category:bugs OR tag:urgent

Add missing complete-replacement filters

To fully replace AI report automation and make workflows useful, add exclusion support:

-category:bugs
-=category:bugs
exclude_category:bugs
exclude_categories:bugs,staff

-tag:internal
exclude_tag:internal
exclude_tags:internal,noise

The exact syntax should align with TopicsFilter conventions where possible.
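
As a rough illustration of how the prefix and `exclude_` forms listed above could normalize to one internal shape, the sketch below parses a single token. The `TOKEN` pattern, `parse_token`, and the result hash are hypothetical and are not taken from TopicsFilter or the researcher filter; the final syntax may differ.

```ruby
# Hypothetical sketch: normalize "-", "=", "-=" prefixes, exclude_* keys,
# and the researcher-style "category:=bugs" form into one structure.
TOKEN = /\A(?<prefix>-=|=-|-|=)?(?<key>[a-z_]+):(?<value>\S+)\z/

def parse_token(token)
  m = TOKEN.match(token)
  return nil unless m

  prefix = m[:prefix].to_s
  key = m[:key]
  value = m[:value]
  exclude = prefix.include?("-")
  exact = prefix.include?("=")

  # exclude_category:bugs is treated the same as -category:bugs
  if key.start_with?("exclude_")
    key = key.delete_prefix("exclude_")
    exclude = true
  end

  # category:=bugs (existing researcher syntax) means exact match
  if value.start_with?("=")
    value = value.delete_prefix("=")
    exact = true
  end

  { key: key, values: value.split(","), exclude: exclude, exact: exact }
end

excluded_exact = parse_token("-=category:bugs")
excluded_tags = parse_token("exclude_tags:internal,noise")
```

Normalizing early means each filter implementation only sees `exclude` and `exact` flags, regardless of which spelling the user or the UI produced.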

Align with TopicsFilter

Core already has:

lib/topics_filter.rb

PostsFilter should intentionally mirror the useful parts of TopicsFilter:

  • class name style: PostsFilter
  • filter_from_query_string(query_string) style API where appropriate
  • aliases like singular/plural filter names
  • prefix handling for inclusion/exclusion/exact matching
  • option metadata for UI builders
  • custom filter extension hooks
  • strict allowlisted filters, no arbitrary SQL

Potential core shape:

filter = PostsFilter.new(
  guardian: guardian,
  scope: Post.all,
)

posts = filter.filter_from_query_string(query_string)

However, we should preserve the convenient researcher-style constructor too if it reduces migration friction:

filter = PostsFilter.new(
  query_string,
  guardian: guardian,
  limit: limit,
  offset: offset,
)

posts = filter.search

Recommendation: implement the TopicsFilter-aligned API as primary, and provide small compatibility helpers for the old researcher API during migration.

Plugin extension model

TopicsFilter has extension points for custom filters. PostsFilter should too.

Target shape:

PostsFilter.add_filter("assigned_to", enabled: -> { SiteSetting.assign_enabled }) do |scope, values, guardian|
  # plugin-provided filtering
end

or similar, matching TopicsFilter conventions as closely as possible.

The assign plugin should register assigned_to: rather than core owning assign-specific SQL.

Desired assign syntax:

assigned_to:username
assigned_to:username1,username2
assigned_to:*
assigned_to:nobody
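
The registration idea can be sketched without Rails. Everything below is an assumption for illustration: `FilterRegistry` is a hypothetical stand-in for whatever hook PostsFilter ends up exposing, and an in-memory array of hashes stands in for an ActiveRecord scope.

```ruby
# Hypothetical sketch of a filter registry with an enabled: guard.
class FilterRegistry
  Entry = Struct.new(:enabled, :block)

  def initialize
    @filters = {}
  end

  def add_filter(name, enabled: -> { true }, &block)
    @filters[name] = Entry.new(enabled, block)
  end

  # Returns the scope unchanged when the filter is unknown or disabled.
  # A strict parser would record the fragment as invalid instead.
  def apply(name, scope, values, guardian = nil)
    entry = @filters[name]
    return scope unless entry && entry.enabled.call
    entry.block.call(scope, values, guardian)
  end
end

posts = [
  { id: 1, assigned_to: "sam" },
  { id: 2, assigned_to: nil },
  { id: 3, assigned_to: "jo" },
]

registry = FilterRegistry.new
registry.add_filter("assigned_to") do |scope, values, _guardian|
  if values == ["*"]
    scope.reject { |post| post[:assigned_to].nil? }       # assigned to anyone
  elsif values == ["nobody"]
    scope.select { |post| post[:assigned_to].nil? }       # unassigned
  else
    scope.select { |post| values.include?(post[:assigned_to]) }
  end
end

assigned_ids = registry.apply("assigned_to", posts, ["*"]).map { |post| post[:id] }
```

In the real implementation the block would return a relation rather than an array, and the `enabled:` lambda would consult the plugin's site setting as in the target shape above.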

Workflow Post node

Add a new workflow node:

action:post

Location:

plugins/discourse-workflows/lib/discourse_workflows/nodes/post/v1.rb

Operations:

create
get
list

Operation: create

This is straightforward and should reuse the existing action:create_post logic.

Parameters:

topic_id
raw
reply_to_post_number
author_username

Output:

{
  "post": { ... }
}

Notes:

  • Keep action:create_post for compatibility.
  • Share the implementation so behavior does not drift.
  • Continue avoiding recursive workflow triggers when creating posts from workflows.

Operation: get

Also straightforward.

Parameters:

post_id
actor_username / actor setting aligned with topic retriever
include_raw
include_cooked

Behavior:

  • Find post.
  • Authorize through the same actor/guardian model used by topic retrieval.
  • Serialize workflow-friendly post data.

Operation: list

This is the important operation.

It should use PostsFilter.

The UI should expose common filters with controls from day 0.

Day-0 UI fields

Recommended first UI:

operation: list

Date range:
  created_after
  created_before
  topic_created_after
  topic_created_before

Scope:
  categories
  exclude_categories
  exact_category_match / include_subcategories
  tags
  exclude_tags
  topics
  usernames
  groups

Post type:
  all regular posts
  first posts only
  replies only

Status:
  open
  closed
  archived
  no replies
  single user

Text search:
  keywords
  topic_keywords

Ordering:
  latest
  oldest
  latest_topic
  oldest_topic
  likes

Limits:
  max_results
  offset
  memory_cap_mb, default 50

Advanced:
  raw filter string / additional filter query

The UI should compile these controls into the same PostsFilter backend. The advanced raw filter string can be appended or combined with generated filters.
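
One possible way to compile the controls is a plain field-to-filter mapping. This is a sketch under assumptions: `FIELD_TO_FILTER`, `compile_filter`, and the exact field names are illustrative, dates are assumed to arrive as already-formatted strings, and `offset` / `memory_cap_mb` are left out because they are node options rather than filter terms.

```ruby
# Hypothetical sketch: turn UI field values into a PostsFilter query string.
FIELD_TO_FILTER = {
  created_after: "after",
  created_before: "before",
  topic_created_after: "topic_after",
  topic_created_before: "topic_before",
  categories: "categories",
  exclude_categories: "exclude_categories",
  tags: "tags",
  exclude_tags: "exclude_tags",
  topics: "topics",
  usernames: "usernames",
  groups: "groups",
  post_type: "post_type",
  status: "status",
  keywords: "keywords",
  topic_keywords: "topic_keywords",
  order: "order",
  max_results: "max_results",
}.freeze

def compile_filter(fields, advanced: nil)
  parts =
    FIELD_TO_FILTER.filter_map do |field, key|
      value = fields[field]
      # Skip controls the admin left blank.
      next if value.nil? || (value.respond_to?(:empty?) && value.empty?)
      "#{key}:#{Array(value).join(",")}"
    end
  parts << advanced.strip if advanced && !advanced.strip.empty?
  parts.join(" ")
end

query =
  compile_filter(
    { created_after: "2026-05-22", categories: %w[support], tags: %w[bug regression], order: "likes", max_results: 200 },
  )
# query is "after:2026-05-22 categories:support tags:bug,regression order:likes max_results:200"
```

Appending the advanced string with a space gives AND semantics. If the advanced string itself contains OR, simple appending would attach the generated filters to the first OR group only, so that combination needs an explicit decision.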

Output

Return one workflow item per post:

[
  { "json": { "post": { ... } } },
  { "json": { "post": { ... } } }
]

This composes well with downstream workflow nodes. A report workflow can use a Code node to combine items into a corpus.

A future output mode can produce a single item with a posts array, but that is not required for the first version.

Workflow post serialization

Do not expose full ActiveRecord attributes directly.

Create a small serializer/helper for workflow output.

Suggested fields:

{
  "id": 123,
  "topic_id": 45,
  "topic_title": "Example topic",
  "topic_slug": "example-topic",
  "post_number": 2,
  "post_url": "/t/example-topic/45/2",
  "username": "sam",
  "user_id": 7,
  "created_at": "2026-05-29T12:00:00Z",
  "updated_at": "2026-05-29T12:05:00Z",
  "raw": "Post body",
  "cooked": "<p>Post body</p>",
  "excerpt": "Post body",
  "like_count": 3,
  "reply_count": 1,
  "score": 0.5,
  "category_id": 4,
  "category_name": "General",
  "tags": ["weekly-report"]
}

For AI/report workflows, the critical fields are:

  • raw
  • post_url
  • topic_title
  • username
  • created_at
  • category_name
  • tags
  • engagement counts
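
A minimal sketch of the allowlist approach, assuming post data has already been gathered into a string-keyed hash. `WORKFLOW_POST_FIELDS` and `serialize_post` are hypothetical names; the `include_raw` / `include_cooked` switches mirror the get parameters above.

```ruby
# Hypothetical sketch: only allowlisted fields ever reach workflow output.
WORKFLOW_POST_FIELDS = %w[
  id topic_id topic_title topic_slug post_number post_url username user_id
  created_at updated_at raw cooked excerpt like_count reply_count score
  category_id category_name tags
].freeze

def serialize_post(attributes, include_raw: true, include_cooked: false)
  output = attributes.slice(*WORKFLOW_POST_FIELDS)
  output.delete("raw") unless include_raw
  output.delete("cooked") unless include_cooked
  output
end

serialized =
  serialize_post(
    { "id" => 123, "raw" => "Post body", "cooked" => "<p>Post body</p>", "internal_flag" => true },
  )
```

Anything not on the list, such as the made-up `internal_flag` key here, is dropped, so new columns on the posts table cannot leak into workflows by accident.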

Memory cap

Post: list needs both count and memory safeguards.

Initial recommendation:

memory cap: 50 MB serialized output

Behavior:

  • Track approximate serialized payload size while building output items.
  • Stop when the cap is reached.
  • Include metadata indicating truncation.
  • Log a workflow warning when truncation happens.

Possible metadata:

{
  "truncated": true,
  "truncation_reason": "memory_cap",
  "memory_cap_bytes": 52428800,
  "posts_returned": 137
}

Because workflows normally return one item per post, metadata placement needs design. Options:

  1. Add metadata to each item.
  2. Add a final metadata item.
  3. Add workflow execution log warning only.
  4. Support a single-output wrapper mode later.

Recommendation for first version:

  • log warning in execution log
  • expose truncated metadata on each item only if needed by downstream nodes
  • keep implementation simple
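
The cap behavior could look roughly like the sketch below. It is an assumption-laden illustration: `build_items` is a hypothetical helper, size is approximated by the JSON byte size of each item, and the loop stops before the cap would be exceeded rather than after.

```ruby
require "json"

# Hypothetical sketch: build one workflow item per post, stopping before
# the approximate serialized size would exceed the cap.
def build_items(posts, memory_cap_bytes:)
  items = []
  bytes_used = 0
  truncated = false

  posts.each do |post|
    item = { "json" => { "post" => post } }
    size = JSON.generate(item).bytesize
    if bytes_used + size > memory_cap_bytes
      truncated = true
      break
    end
    items << item
    bytes_used += size
  end

  { items: items, truncated: truncated, bytes_used: bytes_used }
end

sample_posts = (1..5).map { |id| { "id" => id, "raw" => "x" * 100 } }
capped = build_items(sample_posts, memory_cap_bytes: 300)
```

With a 300-byte cap and posts a little over 100 bytes each, only the first two items fit and `capped[:truncated]` is true. The caller can then log the execution warning from that flag.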

Security and permissions

PostsFilter must be permission-aware by default.

Use the existing researcher approach as the base:

Post
  .secured(guardian)
  .joins(:topic)
  .merge(Topic.secured(guardian))
  .where("topics.archetype = 'regular'")

Workflow actor semantics should align with the current topic retriever/list behavior.

Private content should only appear if the workflow actor can see it. PMs should remain excluded unless a separate explicit PM feature is added later.

Discourse AI migrations

Researcher tool

Update:

plugins/discourse-ai/lib/agents/tools/researcher.rb

from:

DiscourseAi::Utils::Research::Filter

to:

PostsFilter

Keep the AI-specific pieces in Discourse AI:

  • LlmFormatter
  • token batching
  • goals
  • inference
  • cancellation
  • tool output formatting

Research formatter

Keep:

plugins/discourse-ai/lib/utils/research/llm_formatter.rb

in the AI plugin for now. It is LLM-specific and does not belong in core.

AI report automation

Update:

plugins/discourse-ai/lib/automation/report_context_generator.rb
plugins/discourse-ai/lib/automation/report_runner.rb

to use PostsFilter as well.

This is required for a complete replacement. The report automation should not keep a separate relation-building implementation long term.

Weekly AI report workflow after this change

Desired workflow:

Weekly schedule
  -> Post: list
       created_after: 7 days ago
       order: latest
       max_results: 200
       memory cap: 50 MB
  -> Code: build post corpus / chunk if needed
      -> AI Agent: themes report
      -> AI Agent: actions report
      -> AI Agent: validation report
  -> Merge reports
  -> AI Agent: synthesize and validate
  -> Topic: create

More examples enabled by PostsFilter:

created_after: 7 days ago
categories: support
tags: bug,regression
order: likes

created_after: 30 days ago
groups: moderators
post_type: reply
order: latest

topic_keywords: upload,error
OR
tags: uploads

Implementation phases

Phase 1: Extract PostsFilter into core

  • Create PostsFilter, aligned with TopicsFilter.
  • Port the researcher filter behavior.
  • Preserve existing tests from Discourse AI researcher filter specs.
  • Add missing exclusion filters needed for complete replacement.
  • Add plugin extension hooks.

Phase 2: Register plugin-specific filters

  • Move assigned_to: out of the extracted core filter.
  • Register it from the assign plugin or equivalent plugin initializer.
  • Ensure behavior matches topic filter extensibility.

Phase 3: Migrate Discourse AI researcher

  • Update researcher tool to use PostsFilter.
  • Keep LLM formatter in AI.
  • Keep behavior and specs passing.

Phase 4: Migrate AI report automation

  • Update ReportContextGenerator to use PostsFilter.
  • Preserve report behavior.
  • Use the same include/exclude category/tag semantics as workflows and researcher.

Phase 5: Add workflow action:post

  • Implement create.
  • Implement get.
  • Implement list with PostsFilter.
  • Add memory cap handling.
  • Add workflow post serializer.
  • Add node specs.

Phase 6: Add day-0 workflow UI

  • Add form controls for the common filters.
  • Include an advanced/raw filter input for power users.
  • Ensure generated filter maps to PostsFilter behavior.

Phase 7: Add workflow template

Add a template for:

Weekly AI post report

Graph:

Schedule
  -> Post: list
  -> Code: build corpus
  -> AI Agent: themes
  -> AI Agent: actions
  -> AI Agent: validation
  -> Merge
  -> AI Agent: synthesize
  -> Topic: create

Testing plan

Core PostsFilter specs

Port and expand current researcher filter specs for:

  • usernames, including unicode usernames
  • groups
  • categories
  • exact category matching
  • category parent/child paths
  • tags
  • topics
  • post type
  • date filters
  • topic date filters
  • status filters
  • keywords
  • topic keywords
  • ordering
  • max results
  • OR groups
  • invalid filters
  • PM exclusion
  • secure category visibility
  • plugin-registered filters
  • exclusion filters

Discourse AI specs

  • Researcher tool still finds and processes the same posts.
  • Dry run still counts matches.
  • Invalid filters still return useful error messages.
  • AI report automation includes/excludes the same posts as before.

Workflow specs

  • Post: create creates expected posts.
  • Post: get retrieves and serializes visible posts.
  • Post: get rejects inaccessible posts.
  • Post: list returns expected posts for common UI filters.
  • Post: list handles advanced raw filters.
  • Post: list respects actor permissions.
  • Post: list enforces count and memory caps.
  • Post: list returns one workflow item per post.

PR breakdown

PR 1: Core PostsFilter

  • Add PostsFilter to core.
  • Align with TopicsFilter conventions.
  • Port researcher behavior and specs.
  • Add extension hook.

PR 2: Plugin filter registration

  • Move/register assigned_to: through plugin extension.
  • Add tests for plugin filter registration.

PR 3: Discourse AI migrations

  • Researcher uses PostsFilter.
  • AI report automation uses PostsFilter.
  • Remove duplicated query logic.

PR 4: Workflow Post node backend

  • Add action:post.
  • Implement create, get, list.
  • Add serializer and caps.

PR 5: Workflow UI and template

  • Add usable filter controls.
  • Add advanced filter input.
  • Add weekly AI report workflow template.

Risks

Filter language becomes a core API

Once workflows and AI both depend on it, PostsFilter syntax becomes semi-public.

Mitigation:

  • document supported filters
  • follow TopicsFilter conventions
  • keep parser strict
  • reject/track invalid filters

Memory and cost explosions

Post bodies can be large and AI nodes can be expensive.

Mitigation:

  • default conservative count limit
  • hard 50 MB serialized output cap to start
  • log truncation
  • allow limits above the default only through explicit configuration, and only if we later support them

Actor/private-content ambiguity

Scheduled workflows can run without an obvious human actor.

Mitigation:

  • align with topic retriever behavior
  • make actor visible in node configuration where needed
  • never bypass Post.secured / Topic.secured

Plugin filter compatibility

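Custom filters must not make OR relations structurally incompatible. ActiveRecord's `or` raises an ArgumentError when the two relations are not structurally compatible (for example, when only one side applies a limit).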

Mitigation:

  • document filter registration constraints
  • test custom filters with OR groups where possible

Success criteria

  • PostsFilter exists in core and is aligned with TopicsFilter.
  • Existing Discourse AI researcher behavior is preserved through PostsFilter.
  • AI report automation uses PostsFilter; no duplicate report-only post relation remains.
  • Workflows have a day-0 usable Post node with create, get, and rich list.
  • Post: list is powerful enough to replace the HTTP/Data Explorer workaround for AI report workflows.
  • Assign-related post filtering is plugin-registered, not hardcoded in core.
  • Post retrieval remains permission-safe and memory-bounded.