Make post retrieval a first-class Discourse primitive that can be reused by:
- Discourse Workflows
- Discourse AI researcher
- Discourse AI report automation
- future reporting / automation features
The end state is:
Core
└── PostsFilter
    - safe, permission-aware post filtering
    - reusable from Ruby and UI-backed workflows
    - extensible by plugins
Discourse AI
├── researcher tool uses PostsFilter
└── AI report automation uses PostsFilter
Discourse Workflows
└── action:post
    - create
    - get
    - list, backed by PostsFilter
This avoids HTTP self-calls, Data Explorer workarounds, custom SQL, and duplicated filter logic.
- Name / shape
  - Core already has TopicsFilter.
  - The new core primitive should be named PostsFilter for consistency.
- Workflow UI must be usable from day 0
  - A raw filter string alone is not enough.
  - The workflow Post: list node needs an admin-friendly filter UI at launch.
  - A text filter can exist as an advanced escape hatch, but common filters should be configured with controls.
- Actor / permission semantics should align with topic retrieval
  - Follow the same actor pattern used by the existing workflow topic retriever/list operations.
  - The node should not invent a separate private-content model.
- Ship a complete replacement, not a partial filter
  - PostsFilter should fully replace the relevant Discourse AI researcher filter behavior, not only support a minimal date/category/tag subset.
- Plugin filters should work like topic filters
  - assigned_to: should be registered by the assign plugin, analogous to how topic filters are extensible.
  - Core should provide a filter registration extension point.
- Migrate all existing consumers
  - Researcher tool should move to PostsFilter.
  - AI report automation should move to PostsFilter.
  - Workflows should use PostsFilter for Post: list.
- Memory cap
  - Result size is tricky because post bodies can be large and downstream AI nodes can multiply cost.
  - Start with a memory cap around 50 MB for retrieved/serialized post data.
  - This is in addition to count limits.
The richer existing implementation lives in Discourse AI:
plugins/discourse-ai/lib/utils/research/filter.rb
plugins/discourse-ai/lib/utils/research/llm_formatter.rb
plugins/discourse-ai/lib/agents/tools/researcher.rb
The reusable part is:
DiscourseAi::Utils::Research::Filter
Current usage pattern:
filter = DiscourseAi::Utils::Research::Filter.new(
  filter_string,
  limit: max_results,
  guardian: guardian,
)
posts = filter.search
Important current behavior:
- Uses Post.secured(guardian).
- Uses Topic.secured(guardian).
- Excludes PMs by constraining to regular topic archetype.
- Supports AND filters and OR groups.
- Returns an ActiveRecord::Relation<Post>.
- Tracks invalid filter fragments.
- Supports a broad set of useful post filters.
The new PostsFilter should preserve the existing researcher filter behavior, including:
username:user1
usernames:user1,user2
group:group1
groups:group1,group2
post_type:first
post_type:reply
keywords:word1,word2
topic_keywords:word1,word2
topic:123
topics:123,456
category:bugs
category:=bugs
category:support/bugs
categories:bugs,feature
tag:urgent
tags:urgent,regression
after:YYYY-MM-DD
before:YYYY-MM-DD
topic_after:YYYY-MM-DD
topic_before:YYYY-MM-DD
status:open
status:closed
status:archived
status:noreplies
status:single_user
max_results:50
order:latest
order:oldest
order:latest_topic
order:oldest_topic
order:likes
It should also preserve OR grouping:
category:bugs OR tag:urgent
To fully replace AI report automation and make workflows useful, add exclusion support:
-category:bugs
-=category:bugs
exclude_category:bugs
exclude_categories:bugs,staff
-tag:internal
exclude_tag:internal
exclude_tags:internal,noise
The exact syntax should align with TopicsFilter conventions where possible.
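The filter grammar above (key:value fragments, comma-separated values, OR groups, exclusion prefixes, tracked invalid fragments) can be illustrated with a minimal parsing sketch. It is not the planned implementation: PostsFilterSketch, Clause, and the shortened allowlist are hypothetical names, the sketch only handles prefixes placed before the key (the -category: and -=category: forms), and it does not touch the database.

```ruby
# Hypothetical stand-in for the parsing half of PostsFilter; no database access.
module PostsFilterSketch
  # Strict allowlist of filter keys (a subset of the list above, for brevity).
  ALLOWED_KEYS = %w[category categories tag tags username usernames status order after before].freeze

  # Optional "-" / "=" / "-=" prefix, then key:value with no whitespace.
  TOKEN = /\A(?<prefix>-=|=-|-|=)?(?<key>[a-z_]+):(?<value>\S+)\z/

  Clause = Struct.new(:key, :values, :exclude, :exact)

  # Returns [groups, invalid]. Each group is an array of clauses that are
  # AND-ed together; the groups themselves are OR-ed. Fragments that do not
  # parse, or whose key is not allowlisted, are collected in `invalid`.
  def self.parse(query)
    invalid = []
    groups = query.to_s.split(/\s+OR\s+/).map do |group|
      group.split.filter_map do |token|
        match = TOKEN.match(token)
        if match && ALLOWED_KEYS.include?(match[:key])
          prefix = match[:prefix].to_s
          Clause.new(match[:key], match[:value].split(","), prefix.include?("-"), prefix.include?("="))
        else
          invalid << token
          nil
        end
      end
    end
    [groups.reject(&:empty?), invalid]
  end
end

groups, invalid = PostsFilterSketch.parse("-category:bugs tag:urgent,regression OR status:open bogus:1")
```

Here groups holds two AND-groups and invalid holds the rejected bogus:1 fragment. Keeping the parser allowlist-only means an unknown key can never reach SQL.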
Core already has:
lib/topics_filter.rb
PostsFilter should intentionally mirror the useful parts of TopicsFilter:
- class name style: PostsFilter
- filter_from_query_string(query_string) style API where appropriate
- aliases like singular/plural filter names
- prefix handling for inclusion/exclusion/exact matching
- option metadata for UI builders
- custom filter extension hooks
- strict allowlisted filters, no arbitrary SQL
Potential core shape:
filter = PostsFilter.new(
  guardian: guardian,
  scope: Post.all,
)
posts = filter.filter_from_query_string(query_string)
However, we should preserve the convenient researcher-style constructor too if it reduces migration friction:
filter = PostsFilter.new(
  query_string,
  guardian: guardian,
  limit: limit,
  offset: offset,
)
posts = filter.search
Recommendation: implement the TopicsFilter-aligned API as primary, and provide small compatibility helpers for the old researcher API during migration.
TopicsFilter has extension points for custom filters. PostsFilter should too.
Target shape:
PostsFilter.add_filter("assigned_to", enabled: -> { SiteSetting.assign_enabled }) do |scope, values, guardian|
  # plugin-provided filtering
end
or similar, matching TopicsFilter conventions as closely as possible.
The assign plugin should register assigned_to: rather than core owning assign-specific SQL.
Desired assign syntax:
assigned_to:username
assigned_to:username1,username2
assigned_to:*
assigned_to:nobody
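To make the registration idea concrete, here is a minimal, self-contained sketch of a filter registry with an enabled check, using the assign syntax above. Everything in it is an assumption for illustration: FilterRegistrySketch is a hypothetical class, posts are plain hashes rather than an ActiveRecord scope, and the assign_enabled variable stands in for SiteSetting.assign_enabled. The real extension point should follow whatever TopicsFilter already does.

```ruby
# Hypothetical registry; the real extension point should mirror TopicsFilter.
class FilterRegistrySketch
  Entry = Struct.new(:enabled, :handler)

  def initialize
    @filters = {}
  end

  # Register a named filter with an enabled check and a handler block.
  def add_filter(name, enabled: -> { true }, &handler)
    raise ArgumentError, "handler block required" unless handler
    @filters[name.to_s] = Entry.new(enabled, handler)
  end

  # Apply a registered filter. Returns nil for unknown or disabled filters so
  # the caller can record the fragment as invalid instead of guessing.
  def apply(name, scope, values, guardian)
    entry = @filters[name.to_s]
    return nil unless entry && entry.enabled.call
    entry.handler.call(scope, values, guardian)
  end
end

# Toy data: posts as hashes; assigned_to is nil when nobody is assigned.
posts = [
  { id: 1, assigned_to: "sam" },
  { id: 2, assigned_to: nil },
  { id: 3, assigned_to: "jo" },
]

registry = FilterRegistrySketch.new
assign_enabled = true # stands in for SiteSetting.assign_enabled

registry.add_filter("assigned_to", enabled: -> { assign_enabled }) do |scope, values, _guardian|
  if values.include?("*")
    scope.reject { |post| post[:assigned_to].nil? } # assigned to anyone
  elsif values.include?("nobody")
    scope.select { |post| post[:assigned_to].nil? } # unassigned only
  else
    scope.select { |post| values.include?(post[:assigned_to]) }
  end
end

assigned = registry.apply("assigned_to", posts, ["*"], nil)
```

With this shape, core only knows that a filter named assigned_to exists while the plugin is enabled; the assign-specific logic stays in the plugin's block.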
Add a new workflow node:
action:post
Location:
plugins/discourse-workflows/lib/discourse_workflows/nodes/post/v1.rb
Operations:
create
get
list
The create operation is straightforward and should reuse the existing action:create_post logic.
Parameters:
topic_id
raw
reply_to_post_number
author_username
Output:
{
  "post": { ... }
}
Notes:
- Keep action:create_post for compatibility.
- Share the implementation so behavior does not drift.
- Continue avoiding recursive workflow triggers when creating posts from workflows.
The get operation is also straightforward.
Parameters:
post_id
actor_username / actor setting aligned with topic retriever
include_raw
include_cooked
Behavior:
- Find post.
- Authorize through the same actor/guardian model used by topic retrieval.
- Serialize workflow-friendly post data.
The list operation is the important one.
It should use PostsFilter.
The UI should expose common filters with controls from day 0.
Recommended first UI:
operation: list
Date range:
  created_after
  created_before
  topic_created_after
  topic_created_before
Scope:
  categories
  exclude_categories
  exact_category_match / include_subcategories
  tags
  exclude_tags
  topics
  usernames
  groups
Post type:
  all regular posts
  first posts only
  replies only
Status:
  open
  closed
  archived
  no replies
  single user
Text search:
  keywords
  topic_keywords
Ordering:
  latest
  oldest
  latest_topic
  oldest_topic
  likes
Limits:
  max_results
  offset
  memory_cap_mb, default 50
Advanced:
  raw filter string / additional filter query
The UI should compile these controls into the same PostsFilter backend. The advanced raw filter string can be appended or combined with generated filters.
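One way to picture "controls compile into the same backend" is a small helper that turns form values into a filter query string and appends the advanced input. This is a sketch under assumptions: PostListQuerySketch and its field maps are hypothetical, the exclude_* keys use one of the candidate syntaxes listed earlier rather than a settled one, dates are expected already in YYYY-MM-DD form, and values containing whitespace are not handled.

```ruby
# Hypothetical helper: turns workflow form values into a filter query string.
module PostListQuerySketch
  # Multi-value form fields => filter key (comma-joined).
  LIST_FIELDS = {
    categories: "categories",
    exclude_categories: "exclude_categories",
    tags: "tags",
    exclude_tags: "exclude_tags",
    topics: "topics",
    usernames: "usernames",
    groups: "groups",
    keywords: "keywords",
    topic_keywords: "topic_keywords",
  }.freeze

  # Single-value form fields => filter key.
  SCALAR_FIELDS = {
    created_after: "after",
    created_before: "before",
    topic_created_after: "topic_after",
    topic_created_before: "topic_before",
    post_type: "post_type",
    status: "status",
    order: "order",
    max_results: "max_results",
  }.freeze

  # Builds the query from controls, then appends the advanced raw string.
  def self.build(controls, advanced: nil)
    parts = []
    LIST_FIELDS.each do |field, key|
      values = Array(controls[field]).map(&:to_s).reject(&:empty?)
      parts << "#{key}:#{values.join(",")}" unless values.empty?
    end
    SCALAR_FIELDS.each do |field, key|
      value = controls[field].to_s
      parts << "#{key}:#{value}" unless value.empty?
    end
    parts << advanced.strip if advanced && !advanced.strip.empty?
    parts.join(" ")
  end
end

query = PostListQuerySketch.build(
  { categories: %w[support], tags: %w[bug regression], created_after: "2026-05-22", order: "likes" },
  advanced: "status:open",
)
# query == "categories:support tags:bug,regression after:2026-05-22 order:likes status:open"
```

Generating a query string rather than building relations in the node keeps a single parsing and validation path, so the UI cannot express anything the raw filter cannot.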
Return one workflow item per post:
[
  { "json": { "post": { ... } } },
  { "json": { "post": { ... } } }
]
This composes well with downstream workflow nodes. A report workflow can use a Code node to combine items into a corpus.
A future output mode can produce a single item with a posts array, but that is not required for the first version.
Do not expose full ActiveRecord attributes directly.
Create a small serializer/helper for workflow output.
Suggested fields:
{
  "id": 123,
  "topic_id": 45,
  "topic_title": "Example topic",
  "topic_slug": "example-topic",
  "post_number": 2,
  "post_url": "/t/example-topic/45/2",
  "username": "sam",
  "user_id": 7,
  "created_at": "2026-05-29T12:00:00Z",
  "updated_at": "2026-05-29T12:05:00Z",
  "raw": "Post body",
  "cooked": "<p>Post body</p>",
  "excerpt": "Post body",
  "like_count": 3,
  "reply_count": 1,
  "score": 0.5,
  "category_id": 4,
  "category_name": "General",
  "tags": ["weekly-report"]
}
For AI/report workflows, the critical fields are:
- raw
- post_url
- topic_title
- username
- created_at
- category_name
- tags
- engagement counts
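The "do not expose full ActiveRecord attributes" rule amounts to an explicit allowlist. The sketch below shows that idea in plain Ruby. It is only an illustration: WorkflowPostSerializerSketch and PostStub are hypothetical, the field list is a subset of the suggested fields, internal_note is an invented attribute that stands for anything that must not leak, and the include_raw / include_cooked switches are borrowed from the get parameters.

```ruby
require "json"

# PostStub stands in for a Post record; internal_note is a made-up attribute
# representing data that must never reach workflow output.
PostStub = Struct.new(
  :id, :topic_id, :post_number, :username, :created_at,
  :like_count, :reply_count, :raw, :cooked, :internal_note,
  keyword_init: true,
)

# Hypothetical allowlist serializer for workflow items.
module WorkflowPostSerializerSketch
  BASE_FIELDS = %i[id topic_id post_number username created_at like_count reply_count].freeze

  # Copies only allowlisted fields; raw and cooked are opt-in/opt-out.
  def self.serialize(post, include_raw: true, include_cooked: false)
    data = BASE_FIELDS.to_h { |field| [field.to_s, post.public_send(field)] }
    data["raw"] = post.raw if include_raw
    data["cooked"] = post.cooked if include_cooked
    { "json" => { "post" => data } }
  end
end

post = PostStub.new(
  id: 123, topic_id: 45, post_number: 2, username: "sam",
  created_at: "2026-05-29T12:00:00Z", like_count: 3, reply_count: 1,
  raw: "Post body", cooked: "<p>Post body</p>", internal_note: "secret",
)
item = WorkflowPostSerializerSketch.serialize(post)
```

Because fields are copied one by one, adding a column to the underlying record never changes workflow output unless the allowlist is edited.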
Post: list needs both count and memory safeguards.
Initial recommendation:
memory cap: 50 MB serialized output
Behavior:
- Track approximate serialized payload size while building output items.
- Stop when the cap is reached.
- Include metadata indicating truncation.
- Log a workflow warning when truncation happens.
Possible metadata:
{
  "truncated": true,
  "truncation_reason": "memory_cap",
  "memory_cap_bytes": 52428800,
  "posts_returned": 137
}
Because workflows normally return one item per post, metadata placement needs design. Options:
- Add metadata to each item.
- Add a final metadata item.
- Add workflow execution log warning only.
- Support a single-output wrapper mode later.
Recommendation for first version:
- log warning in execution log
- expose truncated metadata on each item only if needed by downstream nodes
- keep implementation simple
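The memory cap behavior described above (track approximate serialized size, stop at the cap, report truncation) could look roughly like the following. This is a sketch, not the planned implementation: MemoryCapSketch is a hypothetical name, size is approximated by each item's JSON byte size, and the loop stops before adding the item that would exceed the cap. Logging the workflow warning is left to the caller.

```ruby
require "json"

# Hypothetical helper for bounding Post: list output by serialized size.
module MemoryCapSketch
  DEFAULT_CAP_BYTES = 50 * 1024 * 1024 # 52_428_800, as in the metadata example

  # items: enumerable of already-serialized workflow items (hashes).
  # Returns [kept_items, metadata].
  def self.take_within_cap(items, cap_bytes: DEFAULT_CAP_BYTES)
    kept = []
    used = 0
    truncated = false
    items.each do |item|
      size = JSON.generate(item).bytesize # approximate payload size
      if used + size > cap_bytes
        truncated = true
        break
      end
      kept << item
      used += size
    end
    metadata = {
      "truncated" => truncated,
      "memory_cap_bytes" => cap_bytes,
      "posts_returned" => kept.length,
    }
    metadata["truncation_reason"] = "memory_cap" if truncated
    [kept, metadata]
  end
end

items = Array.new(5) { |i| { "json" => { "post" => { "id" => i, "raw" => "x" * 100 } } } }
one_item_size = JSON.generate(items.first).bytesize
kept, metadata = MemoryCapSketch.take_within_cap(items, cap_bytes: one_item_size * 3)
```

In this usage the cap fits exactly three items, so kept has three entries and metadata reports truncation. Feeding the helper a lazily evaluated enumerable would avoid serializing posts past the cut-off.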
PostsFilter must be permission-aware by default.
Use the existing researcher approach as the base:
Post
  .secured(guardian)
  .joins(:topic)
  .merge(Topic.secured(guardian))
  .where("topics.archetype = 'regular'")
Workflow actor semantics should align with the current topic retriever/list behavior.
Private content should only appear if the workflow actor can see it. PMs should remain excluded unless a separate explicit PM feature is added later.
Update:
plugins/discourse-ai/lib/agents/tools/researcher.rb
from:
DiscourseAi::Utils::Research::Filter
to:
PostsFilter
Keep the AI-specific pieces in Discourse AI:
- LlmFormatter
- token batching
- goals
- inference
- cancellation
- tool output formatting
Keep:
plugins/discourse-ai/lib/utils/research/llm_formatter.rb
in the AI plugin for now. It is LLM-specific and does not belong in core.
Update:
plugins/discourse-ai/lib/automation/report_context_generator.rb
plugins/discourse-ai/lib/automation/report_runner.rb
to use PostsFilter as well.
This is required for a complete replacement. The report automation should not keep a separate relation-building implementation long term.
Desired workflow:
Weekly schedule
  -> Post: list
       created_after: 7 days ago
       order: latest
       max_results: 200
       memory cap: 50 MB
  -> Code: build post corpus / chunk if needed
  -> AI Agent: themes report
  -> AI Agent: actions report
  -> AI Agent: validation report
  -> Merge reports
  -> AI Agent: synthesize and validate
  -> Topic: create
More examples enabled by PostsFilter:
created_after: 7 days ago
categories: support
tags: bug,regression
order: likes

created_after: 30 days ago
groups: moderators
post_type: reply
order: latest

topic_keywords: upload,error
OR
tags: uploads
- Create PostsFilter, aligned with TopicsFilter.
- Port the researcher filter behavior.
- Preserve existing tests from Discourse AI researcher filter specs.
- Add missing exclusion filters needed for complete replacement.
- Add plugin extension hooks.
- Move assigned_to: out of the extracted core filter.
- Register it from the assign plugin or equivalent plugin initializer.
- Ensure behavior matches topic filter extensibility.
- Update researcher tool to use PostsFilter.
- Keep LLM formatter in AI.
- Keep behavior and specs passing.
- Update ReportContextGenerator to use PostsFilter.
- Preserve report behavior.
- Use the same include/exclude category/tag semantics as workflows and researcher.
- Implement create.
- Implement get.
- Implement list with PostsFilter.
- Add memory cap handling.
- Add workflow post serializer.
- Add node specs.
- Add form controls for the common filters.
- Include an advanced/raw filter input for power users.
- Ensure generated filter maps to PostsFilter behavior.
Add a template for:
Weekly AI post report
Graph:
Schedule
-> Post: list
-> Code: build corpus
-> AI Agent: themes
-> AI Agent: actions
-> AI Agent: validation
-> Merge
-> AI Agent: synthesize
-> Topic: create
Port and expand current researcher filter specs for:
- usernames, including unicode usernames
- groups
- categories
- exact category matching
- category parent/child paths
- tags
- topics
- post type
- date filters
- topic date filters
- status filters
- keywords
- topic keywords
- ordering
- max results
- OR groups
- invalid filters
- PM exclusion
- secure category visibility
- plugin-registered filters
- exclusion filters
- Researcher tool still finds and processes the same posts.
- Dry run still counts matches.
- Invalid filters still return useful error messages.
- AI report automation includes/excludes the same posts as before.
- Post: create creates expected posts.
- Post: get retrieves and serializes visible posts.
- Post: get rejects inaccessible posts.
- Post: list returns expected posts for common UI filters.
- Post: list handles advanced raw filters.
- Post: list respects actor permissions.
- Post: list enforces count and memory caps.
- Post: list returns one workflow item per post.
- Add PostsFilter to core.
- Align with TopicsFilter conventions.
- Port researcher behavior and specs.
- Add extension hook.
- Move/register assigned_to: through plugin extension.
- Add tests for plugin filter registration.
- Researcher uses PostsFilter.
- AI report automation uses PostsFilter.
- Remove duplicated query logic.
- Add action:post.
- Implement create, get, list.
- Add serializer and caps.
- Add usable filter controls.
- Add advanced filter input.
- Add weekly AI report workflow template.
Once workflows and AI both depend on it, PostsFilter syntax becomes semi-public.
Mitigation:
- document supported filters
- follow TopicsFilter conventions
- keep parser strict
- reject/track invalid filters
Post bodies can be large and AI nodes can be expensive.
Mitigation:
- default conservative count limit
- hard 50 MB serialized output cap to start
- log truncation
- require explicit higher limits only if we later support them
Scheduled workflows can run without an obvious human actor.
Mitigation:
- align with topic retriever behavior
- make actor visible in node configuration where needed
- never bypass Post.secured / Topic.secured
Custom filters must not make OR relations structurally incompatible.
Mitigation:
- document filter registration constraints
- test custom filters with OR groups where possible
- PostsFilter exists in core and is aligned with TopicsFilter.
- Existing Discourse AI researcher behavior is preserved through PostsFilter.
- AI report automation uses PostsFilter; no duplicate report-only post relation remains.
- Workflows have a day-0 usable Post node with create, get, and rich list.
- Post: list is powerful enough to replace the HTTP/Data Explorer workaround for AI report workflows.
- Assign-related post filtering is plugin-registered, not hardcoded in core.
- Post retrieval remains permission-safe and memory-bounded.