@daneuchar
Created March 26, 2026 17:59
idea
Idea 1: Producer-Side Agent for Data Product Discovery & Structuring
The core insight here is that producers often have raw data but struggle with the "productization" step. An AI agent dropped into a producer's environment could do several things. First, it would profile and catalog the data automatically — scanning tables, files, or streams to understand schema, data types, cardinality, null rates, and statistical distributions. Second, it could infer semantic meaning by looking at column names, sample values, and relationships to suggest business-friendly names, descriptions, and domain classification (e.g., "this looks like customer PII" or "this appears to be a clickstream event"). Third, it would recommend a data product structure — suggesting how to split or combine tables into coherent, self-contained data products aligned with domain boundaries. It could propose SLAs based on observed update frequency, suggest primary keys, recommend partitioning strategies, and flag quality issues before registration. Finally, it could generate the registration metadata — pre-filling your platform's registration form with descriptions, tags, lineage hints, and schema definitions so the producer just reviews and approves rather than writing from scratch.
The key architectural consideration is where this agent runs. If it connects directly to the producer's source systems (databases, S3 buckets, Kafka topics), it needs secure, scoped access. A lighter approach is having the producer point it at a sample dataset or a staging area.
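As a rough illustration of the profiling and registration pre-fill steps, here is a minimal sketch that works on a small in-memory sample, in line with the lighter "point it at a sample dataset" approach. All names (`profile_column`, `draft_registration`, the `pending_review` status, the sample rows) are hypothetical and not part of any real platform API. Semantic inference and SLA suggestions are left out.

```python
from collections import Counter


def profile_column(values):
    """Return basic profile statistics for one column of sample values."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    type_counts = Counter(type(v).__name__ for v in non_null)
    inferred = type_counts.most_common(1)[0][0] if type_counts else "unknown"
    return {
        "inferred_type": inferred,
        "null_rate": (total - len(non_null)) / total if total else 0.0,
        "cardinality": len(set(non_null)),
    }


def profile_table(rows):
    """Profile every column found in a list of row dictionaries."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


def suggest_primary_keys(profile, row_count):
    """Suggest columns that are never null and unique within the sample."""
    return [
        col
        for col, stats in profile.items()
        if stats["null_rate"] == 0.0 and stats["cardinality"] == row_count
    ]


def draft_registration(name, rows):
    """Build a pre-filled registration draft for the producer to review."""
    profile = profile_table(rows)
    return {
        "name": name,
        "row_count": len(rows),
        "schema": profile,
        "primary_key_candidates": suggest_primary_keys(profile, len(rows)),
        "status": "pending_review",  # hypothetical: nothing is registered automatically
    }


# Hypothetical sample data standing in for a producer's staging area.
SAMPLE_ROWS = [
    {"order_id": 1, "customer_email": "a@example.com", "amount": 10.5},
    {"order_id": 2, "customer_email": None, "amount": 20.0},
    {"order_id": 3, "customer_email": "a@example.com", "amount": 10.5},
    {"order_id": 4, "customer_email": "b@example.com", "amount": 7.25},
]

DRAFT = draft_registration("orders", SAMPLE_ROWS)
```

A primary key suggested from a sample is only a candidate, since uniqueness in four rows says little about the full table. That is one reason the draft stays in a review state instead of being registered directly.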
Idea 2: Natural Language Policy Generation
Access policies in a data mesh are notoriously painful because they combine business rules ("marketing can see aggregated demographics but not individual records") with technical enforcement (row-level security, column masking, role mappings in Starburst). An AI layer here would let a data steward type something like "allow the analytics team to query order data but mask customer email and restrict to orders from the last 2 years" and have the system generate the corresponding Starburst access control rules, column masks, and row filters. Beyond generation, the AI could validate policies against each other, detecting conflicts, gaps, or overly permissive rules. It could also explain existing policies in plain English so auditors or new team members can understand what's enforced without reading SQL predicates. One important design choice: you'd want a "propose and review" workflow rather than auto-apply, so humans stay in the loop for governance.
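One way to picture the "propose and review" workflow is as a small intermediate policy record that sits between the natural-language request and whatever enforcement rules are eventually generated. The sketch below is only an assumption about what such a record could look like: `PolicyProposal`, `explain`, `find_conflicts`, and `approve` are hypothetical names, the row filter is an unvalidated predicate string, and no Starburst API is called. The natural-language parsing step itself is not shown.

```python
from dataclasses import dataclass, replace
from itertools import combinations


@dataclass(frozen=True)
class PolicyProposal:
    """Hypothetical structured form of a steward's request, awaiting review."""
    principal: str
    table: str
    effect: str  # "allow" or "deny"
    masked_columns: tuple = ()
    row_filter: str = ""  # illustrative predicate text, not validated here
    status: str = "proposed"


def explain(policy):
    """Render a proposal as a short plain-English summary."""
    parts = [f"{policy.effect.upper()}: {policy.principal} on {policy.table}"]
    if policy.masked_columns:
        parts.append("masked columns: " + ", ".join(policy.masked_columns))
    if policy.row_filter:
        parts.append("rows restricted to: " + policy.row_filter)
    return "; ".join(parts)


def find_conflicts(policies):
    """Return pairs that target the same principal and table with opposite effects."""
    return [
        (a, b)
        for a, b in combinations(policies, 2)
        if a.principal == b.principal and a.table == b.table and a.effect != b.effect
    ]


def approve(policy, reviewer):
    """Return an approved copy; the original proposal is left unchanged."""
    return replace(policy, status=f"approved:{reviewer}")


# The steward's example request, expressed as a proposal.
ANALYTICS_ORDERS = PolicyProposal(
    principal="analytics_team",
    table="orders",
    effect="allow",
    masked_columns=("customer_email",),
    row_filter="order_date >= current_date - interval '2' year",
)

# A second, conflicting proposal used to exercise the validation step.
BLANKET_DENY = PolicyProposal(principal="analytics_team", table="orders", effect="deny")

CONFLICTS = find_conflicts([ANALYTICS_ORDERS, BLANKET_DENY])
APPROVED = approve(ANALYTICS_ORDERS, "steward")
```

Keeping proposals immutable and producing an approved copy means nothing changes state without an explicit reviewer action, which matches the intent of keeping humans in the loop. The conflict check here only catches direct allow/deny collisions; detecting gaps or overly permissive rules would need more context than this sketch carries.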