- **Purpose:**
  Develop a Node.js application that runs once per day, retrieves events from a PostgreSQL database older than a configurable threshold (default one year), groups them by their UTC creation date, and exports each group as a CSV file. Each CSV is named in the format `events_YYYY-MM-DD.csv` and stored in an Azure Data Lake under the folder `titan-pulse-event-archive`. After a successful upload, the corresponding events are deleted from the database in a single transaction.
- **Deployment Environment:**
  The application runs in a Docker container (base image `node:20-alpine3.18`) deployed as a Kubernetes CronJob in an Azure Kubernetes Service (AKS) cluster, scheduled daily at 02:00 UTC.
- **Database Schema:**
  The PostgreSQL table is defined as follows:

  ```sql
  CREATE TABLE IF NOT EXISTS public."Event" (
      id integer NOT NULL DEFAULT nextval('"Event_id_seq"'::regclass),
      event_id text NOT NULL,
      event_type text NOT NULL,
      created_at timestamp(3) without time zone NOT NULL DEFAULT CURRENT_TIMESTAMP,
      processed_at timestamp(3) without time zone,
      payload jsonb NOT NULL,
      CONSTRAINT "Event_pkey" PRIMARY KEY (id)
  );
  ```

  The `created_at` column is used to identify events older than one year, as in the query sketch below.
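  As a reference, here is a minimal sketch of the selection query using the `pg` library; the module path `lib/db.js` and the function name `selectExpiredEvents` are assumptions for illustration, not the final design:

  ```js
  // lib/db.js: hypothetical module; a minimal sketch of the selection query.
  import pg from "pg";

  export const pool = new pg.Pool({
    connectionString: process.env.PG_CONNECTION_STRING,
  });

  // Select events older than the retention threshold. The threshold is passed
  // as a PostgreSQL interval string such as "1 year" (see the
  // EVENT_RETENTION_PERIOD variable in the CronJob manifest below).
  export async function selectExpiredEvents(client, retention) {
    const { rows } = await client.query(
      `SELECT id, event_id, event_type, created_at, processed_at, payload
         FROM public."Event"
        WHERE created_at < NOW() - $1::interval
        ORDER BY created_at`,
      [retention]
    );
    return rows;
  }
  ```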
- **CSV File Export:**
  - Columns: All columns (`id`, `event_id`, `event_type`, `created_at`, `processed_at`, `payload`) are included.
  - JSON Payload: Must be properly escaped so it can later be re-imported as valid JSON (see the sketch after this list).
  - Naming & Structure:
    - Files are grouped by day (UTC), based on `created_at`.
    - For each day, a single CSV file is generated, named `events_YYYY-MM-DD.csv`.
    - Files are stored inside the `titan-pulse-event-archive` folder.
  - Appending: If a CSV file for a given day already exists, new records are appended to it.
  - Folder Management: The application should verify the existence of `titan-pulse-event-archive` and create it if needed.
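  A minimal sketch of the grouping and serialization, assuming rows shaped like the `Event` table; the module path and helper names (`groupByUtcDay`, `toCsvBuffer`) are hypothetical. Stringifying the `jsonb` payload lets fast-csv's standard quoting (doubling embedded quotes) keep the value re-importable via `JSON.parse`:

  ```js
  // lib/csv.js: hypothetical module; a sketch of grouping and CSV generation.
  import { writeToBuffer } from "fast-csv";

  // Group rows into a Map keyed by the UTC date (YYYY-MM-DD) of created_at.
  export function groupByUtcDay(rows) {
    const groups = new Map();
    for (const row of rows) {
      const day = new Date(row.created_at).toISOString().slice(0, 10);
      if (!groups.has(day)) groups.set(day, []);
      groups.get(day).push(row);
    }
    return groups;
  }

  // Serialize one day's rows to a CSV buffer. writeHeader is set to false when
  // appending to an existing file (see the upload sketch further below).
  export function toCsvBuffer(rows, writeHeader = true) {
    const records = rows.map((r) => ({
      id: r.id,
      event_id: r.event_id,
      event_type: r.event_type,
      created_at: new Date(r.created_at).toISOString(),
      processed_at: r.processed_at ? new Date(r.processed_at).toISOString() : "",
      payload: JSON.stringify(r.payload), // escaped by fast-csv's quoting
    }));
    return writeToBuffer(records, { headers: true, writeHeaders: writeHeader });
  }
  ```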
- **Process Flow:**
  - Query: Connect to PostgreSQL (using the `pg` library) and query for events older than the configurable threshold.
  - Grouping: Group the retrieved events by day (UTC), based on `created_at`.
  - CSV Generation: Use `fast-csv` to generate a CSV for each day, including a header row.
  - Upload: Connect to Azure Storage (using `@azure/storage-blob`), ensure the folder exists, then upload (or append to) the CSV files (an upload sketch follows this list).
  - Transaction & Deletion:
    - The entire operation (export, upload, deletion) must occur in a single transaction.
    - If any step fails, log the error using standard container logging, abort the transaction, and do not delete the events.
  - Dry-run Mode:
    - A configurable mode in which the CSV export and upload are executed but deletion from PostgreSQL is skipped.
  - Metrics Logging:
    - Log the number of events processed per day and the total execution time as standard container logs.
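  A sketch of the upload step, using append blobs so existing daily files can be extended without rewriting them. The container name (via an assumed `ARCHIVE_CONTAINER` variable) is an illustration only; blob "folders" are virtual, so `titan-pulse-event-archive` exists as soon as a blob uses it as a name prefix:

  ```js
  // lib/storage.js: hypothetical module; a minimal sketch of the upload step.
  import { BlobServiceClient } from "@azure/storage-blob";
  import { toCsvBuffer } from "./csv.js";

  const FOLDER = "titan-pulse-event-archive";

  const service = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING
  );
  const container = service.getContainerClient(
    process.env.ARCHIVE_CONTAINER ?? "event-archive" // assumed container name
  );

  // Upload one day's rows, appending when the blob already exists. The header
  // row is written only when the blob is first created, so appended batches do
  // not repeat it. Note: append blocks are capped at 4 MiB, so very large days
  // would need chunking.
  export async function uploadDay(day, rows) {
    await container.createIfNotExists();
    const blob = container.getAppendBlobClient(`${FOLDER}/events_${day}.csv`);
    const { succeeded: created } = await blob.createIfNotExists();
    const buffer = await toCsvBuffer(rows, created);
    await blob.appendBlock(buffer, buffer.length);
  }
  ```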
- **Configuration & Credentials:**
  All sensitive information and configuration settings (PostgreSQL and Azure Storage connection strings, retention threshold, dry-run flag, etc.) are provided via environment variables sourced from Kubernetes secrets.
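  A small sketch of a config module that centralizes these environment variables and fails fast when a required secret is missing; the module path and export names are assumptions:

  ```js
  // lib/config.js: hypothetical module; a sketch of environment-based config.
  function required(name) {
    const value = process.env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  }

  export const config = {
    pgConnectionString: required("PG_CONNECTION_STRING"),
    azureConnectionString: required("AZURE_STORAGE_CONNECTION_STRING"),
    retentionPeriod: process.env.EVENT_RETENTION_PERIOD ?? "1 year",
    dryRun: process.env.DRY_RUN === "true",
  };
  ```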
- **Runtime & Libraries:**
  - Node.js version: `node:20-alpine3.18`
  - PostgreSQL connection: `pg` library
  - CSV generation: `fast-csv`
  - Azure Storage interaction: `@azure/storage-blob`
  - Use ES Modules (ESM) import syntax.
  - Organize the code into separate modules for better maintainability.
  - Store the modules in the `lib` folder (see the entry-point sketch after this list).
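  A sketch of an entry point tying the hypothetical `lib/` modules from the earlier sketches together: one database transaction around export, upload, and deletion; a dry-run flag that skips the `DELETE`; and per-day plus total-time metrics on standard output:

  ```js
  // app.js: hypothetical entry point wiring the sketched lib/ modules together.
  import { config } from "./lib/config.js";
  import { pool, selectExpiredEvents } from "./lib/db.js";
  import { groupByUtcDay } from "./lib/csv.js";
  import { uploadDay } from "./lib/storage.js";

  async function run() {
    const started = Date.now();
    const client = await pool.connect();
    try {
      await client.query("BEGIN");
      const rows = await selectExpiredEvents(client, config.retentionPeriod);
      for (const [day, events] of groupByUtcDay(rows)) {
        await uploadDay(day, events);
        console.log(`events processed for ${day}: ${events.length}`); // metric
        if (!config.dryRun) {
          await client.query(
            `DELETE FROM public."Event" WHERE id = ANY($1::int[])`,
            [events.map((e) => e.id)]
          );
        }
      }
      await client.query("COMMIT");
    } catch (err) {
      console.error("archival failed, rolling back:", err);
      await client.query("ROLLBACK"); // nothing is deleted on failure
      process.exitCode = 1;
    } finally {
      client.release();
      await pool.end();
      console.log(`total execution time: ${Date.now() - started}ms`); // metric
    }
  }

  await run();
  ```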
- **Resource Constraints:**
  - Requests:
    - Memory: 256Mi
    - CPU: 100m
  - Limits:
    - Memory: 512Mi
    - CPU: 300m
- **Trigger & Scheduling:**
  A Kubernetes CronJob triggers the job daily at 02:00 UTC. Below is an example YAML manifest that includes all the necessary configuration details:
  ```yaml
  apiVersion: batch/v1
  kind: CronJob
  metadata:
    name: event-archiver
    labels:
      app: event-archiver
  spec:
    schedule: "0 2 * * *" # Runs daily at 02:00 UTC
    jobTemplate:
      spec:
        template:
          spec:
            containers:
              - name: event-archiver
                image: your-docker-repo/event-archiver:latest # Replace with your image
                imagePullPolicy: Always
                command: ["node", "app.js"]
                env:
                  - name: PG_CONNECTION_STRING
                    valueFrom:
                      secretKeyRef:
                        name: pg-credentials
                        key: connectionString
                  - name: AZURE_STORAGE_CONNECTION_STRING
                    valueFrom:
                      secretKeyRef:
                        name: azure-storage-credentials
                        key: connectionString
                  - name: EVENT_RETENTION_PERIOD
                    value: "1 year" # or an equivalent duration (configurable)
                  - name: DRY_RUN
                    value: "false" # Set to "true" for dry-run mode
                  # Additional configuration variables as needed
                resources:
                  requests:
                    memory: "256Mi"
                    cpu: "100m"
                  limits:
                    memory: "512Mi"
                    cpu: "300m"
            restartPolicy: OnFailure
  ```
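  As a usage note: once the manifest is applied, a one-off run can be triggered for testing with `kubectl create job --from=cronjob/event-archiver event-archiver-manual` (the job name is arbitrary), which is convenient for exercising dry-run mode before enabling deletion.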
- **Process:**
  The application retrieves, groups, and exports events older than the configured threshold into CSVs named by their UTC date. It then uploads (or appends) these files to Azure Storage and deletes the events from PostgreSQL in one atomic transaction, unless running in dry-run mode.
- **Deployment:**
  It runs as a Kubernetes CronJob at 02:00 UTC using a Node.js container (`node:20-alpine3.18`) with credentials injected via Kubernetes secrets.