@willwade
Last active October 2, 2025 08:38
Details on NHS CARM

Project Abstract and Methodology

Title

Building a Unified AAC Research Data Infrastructure for the NHS

Abstract

Specialised AAC services across the NHS hold valuable data on the people they support, the interventions provided, and the technology prescribed. However, this data is fragmented across local SQL databases, Word/PDF case notes, and file storage systems. This fragmentation makes it difficult to answer critical national questions such as:

  • Who are the clients currently using AAC?
  • What is the distribution of diagnoses (e.g., cerebral palsy GMFCS III–V, MND bulbar vs spinal onset)?
  • Which equipment types, access methods, and symbol sets are most commonly used?
  • What are the average times from referral → assessment → provision?
  • Where are the bottlenecks, gaps, or inequities in service delivery?

This project proposes to build a national AAC research data layer that unifies data from multiple services into a consistent, de-identified, and ethically governed system. It will combine structured (SQL) and unstructured (Word/PDF) sources into a standard model, held in a Trusted Research Environment (TRE) whose access is governed by the HDR UK Data Access Agreement (DAA) template. This gives an independent, transparent basis for answering service planning and equity questions across the UK.


Aims

  1. Unify Data Sources: Connect to local SQL databases and file stores across services, extracting structured and unstructured data.
  2. Standardise & De-identify: Transform all data into a Common AAC Research Model (CARM), with pseudonymisation at source.
  3. Strengthen Governance: Implement the HDR UK DAA template within the TRE, ensuring clear, standardised data access processes.
  4. Enrich Clinical Understanding: Apply text mining and rule-based coding to extract structured diagnosis categories (e.g., GMFCS levels, MND onset type).
  5. Support National Queries: Provide an independent query interface, enabling researchers and commissioners to explore patterns in AAC usage, provision, and timelines.
  6. Highlight Gaps & Inequities: Identify where services struggle with provision timelines, access to technology, or equitable distribution of resources.

Methodology

Phase 1 – Foundations (2025 Q2–Q3)

  • HDR UK DAA adoption project (£30k grant, Feb–Aug 2026):
    • Gap analysis of current contracts vs DAA template.
    • Population of annexes, SOP updates, training of IG/contracts staff.
    • Pilot AAC dataset access via TRE under DAA.
    • Collect impact metrics: time from approval → DAA signature, contract iteration counts, qualitative ease-of-use.
  • Output: TRE operational with DAA in place; one pilot AAC dataset accessible under new governance.

Phase 2 – Technical Implementation (2025 Q3–2026 Q2)

  • Ingestion of structured SQL + unstructured file data into Spark/Delta.
  • Build bronze → silver → gold tables:
    • Persons (de-identified), Encounters, Provisions, Notes, Diagnoses.
  • Apply rule-based dictionaries (GMFCS, MND subtypes).
  • Establish de-identified gold dataset in TRE.

Phase 3 – Analytics & Evaluation (2026 Q2–Q4)

  • Run initial analyses:
    • Client mix (diagnoses, age, region).
    • Timelines (referral → assessment → provision).
    • Equipment/access method distributions.
    • Equity across IMD quintiles and regions.
  • Independent evaluation of data quality, accuracy of diagnosis inference, and TRE usability.

Phase 4 – Scale & Sustain (2027 onward)

  • Apply for larger NIHR i4i or SBRI Healthcare calls (~£300–500k) to extend to all specialised AAC services.
  • Integrate into NHS England commissioning metrics.
  • Publish annual AAC service equity and access reports.

Governance

  • Pseudonymisation at source using HMAC on NHS numbers; no direct identifiers exported.
  • Trusted Research Environment (TRE) holds only de-identified gold data.
  • Data Access Agreement (DAA) governs researcher access, embedded via HDR UK grant.
  • Transparency standards: publish data dictionary, governance processes, and small-n suppression rules.
  • Ethics: independent governance board with patient and assistive technology (AT) user representation.
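
The small-n suppression rule mentioned above could look something like the following. This is a minimal sketch, not the project's published rule: the threshold of 5, the function name, and the choice to leave true zeros visible are all assumptions that the governance board would need to confirm.

```python
from typing import Dict, Optional

# Assumed threshold: counts below this value are withheld. The real value
# would be set by the governance board and published with the rules.
SMALL_N_THRESHOLD = 5


def suppress_small_counts(
    counts: Dict[str, int], threshold: int = SMALL_N_THRESHOLD
) -> Dict[str, Optional[int]]:
    """Return a copy of counts with values in the range 1..threshold-1 set to None."""
    return {
        group: (None if 0 < n < threshold else n)
        for group, n in counts.items()
    }


if __name__ == "__main__":
    # Toy counts for illustration only; not real data.
    raw = {"CP": 120, "MND": 43, "Rett": 3, "TBI": 0}
    print(suppress_small_counts(raw))
```

This sketch covers primary suppression only. It does not handle cases where a suppressed cell could be recovered from row or column totals, which a production rule set would also need to address.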

Funding Strategy

  • Immediate (2025): HDR UK TRE/SDE DAA Adoption Grant (£30k, Feb 2025–Aug 2026).
  • Mid-term (2026–27): NIHR i4i PDA or FAST awards for R&D of middleware + multi-site deployment (£200–500k).
  • Challenge-led (2026–28): SBRI Healthcare themed calls (productivity, equity, digital data) to fund national scaling.
  • Complementary: HDR UK small calls (TRE infrastructure), Health Foundation/charitable funds for user involvement, and local ICB contributions for service deployment.

Expected Impact

  • Provide the first unified national picture of AAC service users in the NHS.
  • Enable evidence-based planning of provision models, workforce, and funding.
  • Support equity monitoring, so that differences in access across diagnoses, geographies, and socioeconomic groups can be detected and addressed.
  • Allow commissioners to identify where provision delays or equipment mismatches occur and act proactively.
  • Demonstrate a governance-first approach by embedding the HDR UK DAA, creating a replicable model for other datasets.

Project Technical Details

Overview

This project establishes a national AAC research data infrastructure. It ingests SQL and file-based data from individual services, transforms them into a Common AAC Research Model (CARM), and provides secure, de-identified query access via a Trusted Research Environment (TRE). Governance is strengthened by adopting the HDR UK Data Access Agreement (DAA) template, ensuring consistent, transparent, and efficient contracting for data use.


Data Sources

  1. SQL Databases (service-local)

    • Typical: SQLite, MySQL, SQL Server, Postgres.
    • Contents: demographics, encounters, referrals, device records.
    • Access: read-only connectors with local pseudonymisation.
  2. File Stores (service-local)

    • Locations: SMB shares, SharePoint/OneDrive, S3/MinIO.
    • Contents: assessment reports, clinic letters, case notes.
    • Ingestion: crawlers parse documents with Apache Tika + OCR (Tesseract).

Data Flow & Architecture

Layers

  • Bronze: raw extracts (SQL tables, file text).
  • Silver: cleaned and structured:
    • persons, encounters, provisions, notes_sentences, entity_spans.
  • Gold: analysis-ready:
    • dx_inferred (diagnoses, with evidence + confidence).
    • timelines (referral → provision intervals).
    • equipment_usage (device classes, access methods, symbol sets).

Governance

  • Pseudonymisation at source: HMAC(NHS number, project key).
  • DAA adoption: governs all TRE access.
  • TRE access: SQL (Trino/Presto) and JSON API query interfaces, with small-n suppression rules applied to outputs.
  • Auditability: every derived diagnosis linked to source snippet (entity span).
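
The pseudonymisation step, HMAC(NHS number, project key), might be implemented along these lines. This is a sketch under assumptions: the document does not name a hash function, so SHA-256 is assumed here, and the function name and the way the key is supplied are illustrative. In practice the key would stay with each service or a key custodian and never enter the TRE.

```python
import hashlib
import hmac


def pseudonymise_nhs_number(nhs_number: str, project_key: bytes) -> str:
    """Return a keyed, non-reversible pseudonym (hex HMAC-SHA256) for an NHS number."""
    # Strip spaces and separators so "943 476 5919" and "9434765919"
    # produce the same pseudonym.
    digits = "".join(ch for ch in nhs_number if ch.isdigit())
    if len(digits) != 10:
        raise ValueError("expected a 10-digit NHS number")
    return hmac.new(project_key, digits.encode("ascii"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Illustrative key and test-format number only; never hard-code a real key.
    key = b"example-project-key"
    print(pseudonymise_nhs_number("943 476 5919", key))
```

Because the same key and NHS number always yield the same pseudonym, records for one person can be linked across services without the identifier itself being exported. A different key gives unrelated pseudonyms, so the key must be shared across sites only if cross-site linkage is intended.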

Timeline & Milestones

Phase 1 – Governance & Foundations (Q2–Q3 2025)

  • HDR UK DAA Adoption Project (£30k grant)
    • Map existing agreements vs DAA template.
    • Populate annexes, update SOPs.
    • Deliver training for IG/contracts staff.
    • Publish transparency page per HDR UK Alliance standards.
  • Output: TRE operational with DAA, ready to accept pilot AAC dataset.

Phase 2 – Ingestion & Model Build (Q3 2025 – Q2 2026)

  • Deploy SQL connectors (read-only).
  • Ingest file stores, parse docs → docs_raw.
  • Transform to notes_sentences and entity_spans.
  • Build CARM silver tables: persons, encounters, provisions.
  • Apply rule-based extraction for GMFCS levels, MND subtypes.
  • Build CARM gold tables: dx_inferred, timelines, equipment_usage.
  • Output: First pilot AAC gold dataset, securely available in TRE.

Phase 3 – Analytics & Evaluation (Q2–Q4 2026)

  • Run core analyses:
    • Client mix: age, sex, condition (CP, MND, Rett, TBI).
    • Referral/assessment/provision timelines by subgroup.
    • Equipment & access method distributions.
    • Regional and IMD quintile comparisons.
  • Evaluate DAA adoption:
    • Metrics: median days approval→DAA signature (before vs after), iterations, requester satisfaction.
  • Output: Independent evaluation report + first national AAC dataset publication (de-identified).

Phase 4 – Scale & Sustain (2027 onward)

  • Apply for NIHR i4i PDA/FAST or SBRI Healthcare funding (~£300–500k).
  • Extend ingestion nodes to all NHS specialised AAC services.
  • Regular refreshes of gold dataset.
  • Embed reporting into NHS England commissioning dashboards.
  • Output: Sustainable, national AAC research infrastructure.

Milestone Diagram (textual Gantt)

2025 Q2 ──────────────────────────────┐
   Phase 1: HDR UK DAA Adoption       │
        - Contract mapping            │
        - SOPs, annexes, training     │
        - TRE readiness               │
2025 Q3 ──────────────────────────────┤
   Phase 2: Ingestion & Model Build   │
        - SQL + file ingestion        │
        - Bronze → Silver → Gold      │
        - First pilot dataset         │
2026 Q2 ──────────────────────────────┤
   Phase 3: Analytics & Evaluation    │
        - National queries            │
        - DAA impact metrics          │
        - Evaluation report           │
2027+   ──────────────────────────────┤
   Phase 4: Scale & Sustain           │
        - National roll-out           │
        - Annual AAC reports          │
        - NIHR/SBRI scale-up          │

Example Queries Enabled

  • Demographics & diagnosis mix

    SELECT condition_code, COUNT(DISTINCT person_pid)
    FROM dx_inferred
    WHERE confidence >= 0.8
    GROUP BY condition_code;
  • Referral-to-provision intervals

    -- DATEDIFF(end, start) as in Spark SQL; Trino/Presto would use date_diff('day', start, end).
    SELECT AVG(DATEDIFF(provision.issued_dt, e.start_ts)) AS avg_days
    FROM encounters e
    JOIN aac_provisions provision ON e.encounter_id = provision.encounter_id;
  • Equipment usage patterns

    SELECT device_class, access_method, COUNT(*) AS n
    FROM aac_provisions
    GROUP BY device_class, access_method;
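
The interval query can be tried out locally before any TRE exists. The sketch below loads two toy rows into an in-memory SQLite database and computes the days between encounter start and provision. SQLite has no DATEDIFF, so `julianday()` subtraction is substituted here. The table and column names follow the queries above; the rows, the device and access values, and the helper names are invented for illustration.

```python
import sqlite3
from statistics import mean, median
from typing import List


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with two made-up encounter/provision pairs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE encounters (encounter_id INTEGER PRIMARY KEY, start_ts TEXT);
        CREATE TABLE aac_provisions (
            encounter_id INTEGER, issued_dt TEXT, device_class TEXT, access_method TEXT
        );
        INSERT INTO encounters VALUES (1, '2025-01-01'), (2, '2025-02-01');
        INSERT INTO aac_provisions VALUES
            (1, '2025-01-31', 'dedicated_device', 'eye_gaze'),
            (2, '2025-04-02', 'tablet', 'direct_touch');
        """
    )
    return conn


def provision_intervals(conn: sqlite3.Connection) -> List[float]:
    """Days from encounter start to provision, one value per joined row."""
    rows = conn.execute(
        """
        SELECT julianday(p.issued_dt) - julianday(e.start_ts) AS days
        FROM encounters e
        JOIN aac_provisions p ON e.encounter_id = p.encounter_id
        ORDER BY e.encounter_id
        """
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    days = provision_intervals(build_toy_db())
    print("mean days:", mean(days), "median days:", median(days))
```

Returning the individual intervals rather than a single AVG makes it easy to report a median as well, which is usually less sensitive to a few very long waits.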

Expected Benefits

  • Data visibility: first national picture of AAC users in NHS specialised services.
  • Service improvement: clear evidence of referral bottlenecks, provision delays, and equity gaps.
  • Research acceleration: faster, standardised access to AAC datasets via the TRE + DAA.
  • Scalability: reproducible ingestion pipeline applicable across all AAC services.
  • Policy alignment: supports NHS England, NIHR, and HDR UK objectives for data-driven healthcare.
