- **Purpose:**
  Develop a Node.js application that runs once per day, retrieves events from a PostgreSQL database older than a configurable threshold (default one year), groups them by their UTC creation date, and exports each group as a CSV file. Each CSV is named in the format `events_YYYY-MM-DD.csv` and stored in an Azure Data Lake under the folder `titan-pulse-event-archive`. After a successful upload, the corresponding events are deleted from the database in a single transaction.
- **Deployment Environment:**
  The application runs in a Docker container (base image `node:20-alpine3.18`) deployed as a Kubernetes CronJob in an Azure Kubernetes Service (AKS) cluster, scheduled daily at 02:00 UTC.
- **Database Schema:**
  The PostgreSQL table is defined as follows:

  ```sql
  CREATE TABLE IF NOT EXISTS public."Event" (
      id integer NOT NULL DEFAULT nextval('"Event_id_seq"'::regclass),
      event_id text NOT NULL,
      event_type text NOT NULL,
      created_at timestamp(3) without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
      processed_at timestamp(3) without time zone,
      payload jsonb NOT NULL,
      CONSTRAINT "Event_pkey" PRIMARY KEY (id)
  );
  ```

  The `created_at` column is used to identify events older than one year, as in the query sketch below.
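  As a reference, here is a minimal sketch of the selection query using the `pg` library; the module path `lib/db.js` and the function name `selectExpiredEvents` are assumptions for illustration, not the final design:

  ```js
  // lib/db.js: hypothetical module; a minimal sketch of the selection query.
  import pg from "pg";

  export const pool = new pg.Pool({
    connectionString: process.env.PG_CONNECTION_STRING,
  });

  // Select events older than the retention threshold. The threshold is passed
  // as a PostgreSQL interval string such as "1 year" (see the
  // EVENT_RETENTION_PERIOD variable in the CronJob manifest below).
  export async function selectExpiredEvents(client, retention) {
    const { rows } = await client.query(
      `SELECT id, event_id, event_type, created_at, processed_at, payload
         FROM public."Event"
        WHERE created_at < NOW() - $1::interval
        ORDER BY created_at`,
      [retention]
    );
    return rows;
  }
  ```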
- **CSV File Export:**
  - Columns: All columns (`id`, `event_id`, `event_type`, `created_at`, `processed_at`, `payload`) are included.
  - JSON Payload: Must be properly escaped so it can later be re-imported as valid JSON (see the sketch after this list).
  - Naming & Structure:
    - Files are grouped by day (UTC), based on `created_at`.
    - For each day, a single CSV file is generated, named `events_YYYY-MM-DD.csv`.
    - Files are stored inside the `titan-pulse-event-archive` folder.
  - Appending: If a CSV file for a given day already exists, new records are appended to it.
  - Folder Management: The application should verify the existence of `titan-pulse-event-archive` and create it if needed.
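  A minimal sketch of the grouping and serialization, assuming rows shaped like the `Event` table; the module path and helper names (`groupByUtcDay`, `toCsvBuffer`) are hypothetical. Stringifying the `jsonb` payload lets fast-csv's standard quoting (doubling embedded quotes) keep the value re-importable via `JSON.parse`:

  ```js
  // lib/csv.js: hypothetical module; a sketch of grouping and CSV generation.
  import { writeToBuffer } from "fast-csv";

  // Group rows into a Map keyed by the UTC date (YYYY-MM-DD) of created_at.
  export function groupByUtcDay(rows) {
    const groups = new Map();
    for (const row of rows) {
      const day = new Date(row.created_at).toISOString().slice(0, 10);
      if (!groups.has(day)) groups.set(day, []);
      groups.get(day).push(row);
    }
    return groups;
  }

  // Serialize one day's rows to a CSV buffer. writeHeader is set to false when
  // appending to an existing file (see the upload sketch further below).
  export function toCsvBuffer(rows, writeHeader = true) {
    const records = rows.map((r) => ({
      id: r.id,
      event_id: r.event_id,
      event_type: r.event_type,
      created_at: new Date(r.created_at).toISOString(),
      processed_at: r.processed_at ? new Date(r.processed_at).toISOString() : "",
      payload: JSON.stringify(r.payload), // escaped by fast-csv's quoting
    }));
    return writeToBuffer(records, { headers: true, writeHeaders: writeHeader });
  }
  ```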
- **Process Flow:**
  - Query: Connect to PostgreSQL (using the `pg` library) and query for events older than the configurable threshold.
  - Grouping: Group the retrieved events by day (UTC), based on `created_at`.
  - CSV Generation: Use `fast-csv` to generate a CSV for each day, including a header row.
  - Upload: Connect to Azure Storage (using `@azure/storage-blob`), ensure the folder exists, then upload (or append to) the CSV files (an upload sketch follows this list).
  - Transaction & Deletion:
    - The entire operation (export, upload, deletion) must occur in a single transaction.
    - If any step fails, log the error using standard container logging, abort the transaction, and do not delete the events.
  - Dry-run Mode:
    - A configurable mode in which the CSV export and upload are executed but deletion from PostgreSQL is skipped.
  - Metrics Logging:
    - Log the number of events processed per day and the total execution time as standard container logs.
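  A sketch of the upload step, using append blobs so existing daily files can be extended without rewriting them. The container name (via an assumed `ARCHIVE_CONTAINER` variable) is an illustration only; blob "folders" are virtual, so `titan-pulse-event-archive` exists as soon as a blob uses it as a name prefix:

  ```js
  // lib/storage.js: hypothetical module; a minimal sketch of the upload step.
  import { BlobServiceClient } from "@azure/storage-blob";
  import { toCsvBuffer } from "./csv.js";

  const FOLDER = "titan-pulse-event-archive";

  const service = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING
  );
  const container = service.getContainerClient(
    process.env.ARCHIVE_CONTAINER ?? "event-archive" // assumed container name
  );

  // Upload one day's rows, appending when the blob already exists. The header
  // row is written only when the blob is first created, so appended batches do
  // not repeat it. Note: append blocks are capped at 4 MiB, so very large days
  // would need chunking.
  export async function uploadDay(day, rows) {
    await container.createIfNotExists();
    const blob = container.getAppendBlobClient(`${FOLDER}/events_${day}.csv`);
    const { succeeded: created } = await blob.createIfNotExists();
    const buffer = await toCsvBuffer(rows, created);
    await blob.appendBlock(buffer, buffer.length);
  }
  ```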
- **Configuration & Credentials:**
  All sensitive information and configuration settings (PostgreSQL and Azure Storage connection strings, retention threshold, dry-run flag, etc.) are provided via environment variables sourced from Kubernetes secrets.
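  A small sketch of a config module that centralizes these environment variables and fails fast when a required secret is missing; the module path and export names are assumptions:

  ```js
  // lib/config.js: hypothetical module; a sketch of environment-based config.
  function required(name) {
    const value = process.env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  }

  export const config = {
    pgConnectionString: required("PG_CONNECTION_STRING"),
    azureConnectionString: required("AZURE_STORAGE_CONNECTION_STRING"),
    retentionPeriod: process.env.EVENT_RETENTION_PERIOD ?? "1 year",
    dryRun: process.env.DRY_RUN === "true",
  };
  ```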
- **Runtime & Libraries:**
  - Node.js version: `node:20-alpine3.18`
  - PostgreSQL connection: `pg` library
  - CSV generation: `fast-csv`
  - Azure Storage interaction: `@azure/storage-blob`
  - Use ES Modules (ESM) import syntax.
  - Organize the code into separate modules for better maintainability.
  - Store the modules in the `lib` folder (see the entry-point sketch after this list).
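  A sketch of an entry point tying the hypothetical `lib/` modules from the earlier sketches together: one database transaction around export, upload, and deletion; a dry-run flag that skips the `DELETE`; and per-day plus total-time metrics on standard output:

  ```js
  // app.js: hypothetical entry point wiring the sketched lib/ modules together.
  import { config } from "./lib/config.js";
  import { pool, selectExpiredEvents } from "./lib/db.js";
  import { groupByUtcDay } from "./lib/csv.js";
  import { uploadDay } from "./lib/storage.js";

  async function run() {
    const started = Date.now();
    const client = await pool.connect();
    try {
      await client.query("BEGIN");
      const rows = await selectExpiredEvents(client, config.retentionPeriod);
      for (const [day, events] of groupByUtcDay(rows)) {
        await uploadDay(day, events);
        console.log(`events processed for ${day}: ${events.length}`); // metric
        if (!config.dryRun) {
          await client.query(
            `DELETE FROM public."Event" WHERE id = ANY($1::int[])`,
            [events.map((e) => e.id)]
          );
        }
      }
      await client.query("COMMIT");
    } catch (err) {
      console.error("archival failed, rolling back:", err);
      await client.query("ROLLBACK"); // nothing is deleted on failure
      process.exitCode = 1;
    } finally {
      client.release();
      await pool.end();
      console.log(`total execution time: ${Date.now() - started}ms`); // metric
    }
  }

  await run();
  ```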
- **Resource Constraints:**
  - Requests:
    - Memory: 256Mi
    - CPU: 100m
  - Limits:
    - Memory: 512Mi
    - CPU: 300m
- **Trigger & Scheduling:**
  A Kubernetes CronJob triggers the job daily at 02:00 UTC. Below is an example YAML manifest that includes all the necessary configuration details:
  ```yaml
  apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: event-archiver
    labels:
      app: event-archiver
  spec:
    schedule: "0 2 * * *" # Runs daily at 02:00 UTC
    jobTemplate:
      spec:
        template:
          spec:
            containers:
              - name: event-archiver
                image: your-docker-repo/event-archiver:latest # Replace with your image
                imagePullPolicy: Always
                command: ["node", "app.js"]
                env:
                  - name: PG_CONNECTION_STRING
                    valueFrom:
                      secretKeyRef:
                        name: pg-credentials
                        key: connectionString
                  - name: AZURE_STORAGE_CONNECTION_STRING
                    valueFrom:
                      secretKeyRef:
                        name: azure-storage-credentials
                        key: connectionString
                  - name: EVENT_RETENTION_PERIOD
                    value: "1 year" # or an equivalent duration (configurable)
                  - name: DRY_RUN
                    value: "false" # Set to "true" for dry-run mode
                  # Additional configuration variables as needed
                resources:
                  requests:
                    memory: "256Mi"
                    cpu: "100m"
                  limits:
                    memory: "512Mi"
                    cpu: "300m"
            restartPolicy: OnFailure
  ```
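  As a usage note: once the manifest is applied, a one-off run can be triggered for testing with `kubectl create job --from=cronjob/event-archiver event-archiver-manual` (the job name is arbitrary), which is convenient for exercising dry-run mode before enabling deletion.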
- **Process:**
  The application retrieves, groups, and exports events older than the configured threshold into CSVs named by their UTC date. It then uploads (or appends) these files to Azure Storage and deletes the events from PostgreSQL in one atomic transaction, unless running in dry-run mode.
- **Deployment:**
  It runs as a Kubernetes CronJob at 02:00 UTC using a Node.js container (`node:20-alpine3.18`) with credentials injected via Kubernetes secrets.