Task | API Endpoint | Script Example | Section |
---|---|---|---|
User Staging | `/api/v2/user/status` | `stage_users.py` | 3.1 |
Tag Migration | `/api/v2/tag` | `migrate_tags.py` | 2.4 |
Policy Testing | `/api/v2/policy/test` | `test_policy.py` | 2.3 |
User Sync | `/api/v2/user/sync` | `sync_users.py` | 2.2 |
SDD Scan | `/api/v2/sdd/scan` | `run_sdd_scan.py` | 2.4 |
Data Product Creation | `/api/v2/dataSource` | `create_data_product.py` | 2.5 |
This Standard Operating Procedure (SOP) outlines best practices for Business As Usual (BAU) operations, change management, and incident handling for the Immuta Data Security Platform integrated with Snowflake. It addresses challenges in your environment: restricted MMD access, AD-based authentication, incomplete API documentation, and complex tasks like tag migration across environments.
Core Principles:
- Reliability First: Ensure uptime and rapid recovery with proactive monitoring.
- Automation-Driven: Prioritize API-based automation to reduce toil, despite documentation gaps.
- Security by Default: Enforce least-privilege and auditability in all operations.
- Cost Efficiency: Optimize Snowflake compute and storage usage.
- Continuous Learning: Integrate incident learnings to prevent recurrence.
- User-Centric Design: Streamline Data Marketplace access for seamless user experience.
- Disciplined Change Management: Test, validate, and log all changes to minimize risks.
- Tasks: Configure and monitor a dedicated Snowflake warehouse for Immuta.
- Incident Learning: Undersized warehouses cause query queuing; unoptimized syncs spike costs (e.g., during SDD scans).
- Checklist:
- Use X-Small warehouse with 1-minute auto-suspend and transient tables.
- Monitor query history (filter by `IMMUTA_USER`) for performance (>5s query duration triggers alerts); see the sketch below.
- Set Snowflake budget alerts (>10 credits/day).
- Define SLI: 99% of Immuta queries complete in <5s.
- Best Practice: Use `/api/v2/dataSource/sync` to batch metadata updates, scheduling during off-peak hours (2 AM NZST).
- Reminder: Review warehouse usage biweekly via Snowflake’s cost management dashboard.
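
A minimal monitoring sketch for the query-history check is below. It assumes the Immuta service account is named `IMMUTA_USER`, that the `snowflake-connector-python` package is installed, and that `conn_params` carries your account credentials; adjust to your environment:

```python
import snowflake.connector

def slow_immuta_queries(conn_params):
    """Flag Immuta queries from the last day that breached the 5s SLI."""
    conn = snowflake.connector.connect(**conn_params)  # conn_params is assumed
    try:
        cur = conn.cursor()
        cur.execute(
            """
            SELECT query_id, total_elapsed_time
            FROM snowflake.account_usage.query_history
            WHERE user_name = 'IMMUTA_USER'           -- assumed service account name
              AND total_elapsed_time > 5000           -- milliseconds (>5s SLI)
              AND start_time > DATEADD('day', -1, CURRENT_TIMESTAMP())
            """
        )
        return cur.fetchall()
    finally:
        conn.close()
```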
- Tasks: Ensure secure, reliable connectivity for metadata and policy enforcement.
- Incident Learning: Expired AD credentials or network latency (>100ms) halt operations.
- Checklist:
- Validate AD-based credentials monthly via `/api/v2/connection/test`; see the sketch below.
- Apply least-privilege to Immuta’s Snowflake user (e.g., `SELECT` for metadata, `CREATE VIEW` for policies).
- Monitor latency between Immuta and Snowflake (<50ms target).
- Document zero-downtime credential rotation process.
- Best Practice: Enable Connections feature for scalable onboarding (contact Immuta support for pre-Feb 2025 tenants).
- Reminder: Test connections in a sandbox before production changes.
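
A minimal sketch of the monthly credential check against `/api/v2/connection/test`; the HTTP method and response handling are assumptions, so verify against your tenant in a sandbox first:

```python
import requests
import logging

# IMMUTA_URL and AD-authenticated headers are assumed to be configured as
# elsewhere in this SOP; POST is an assumed method for this endpoint.
def validate_connection(headers):
    try:
        response = requests.post(f"{IMMUTA_URL}/api/v2/connection/test", headers=headers, timeout=30)
        response.raise_for_status()
        logging.info("Connection test passed")
        return True
    except requests.exceptions.RequestException as e:
        logging.error(f"Connection test failed: {e}")
        return False
```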
- Tasks: Monitor Immuta services, API, and resource utilization.
- Incident Learning: API failures (e.g., 500 errors) or job queue backlogs (>100 tasks) block operations.
- Checklist:
- Monitor logs (`/var/log/immuta`) and API p99 latency (>500ms triggers alerts).
- Track CPU/memory/disk for on-prem Immuta instances (alert on >80% utilization).
- Set health checks via `/api/v2/health` (99.9% uptime SLI); see the sketch below.
- Alert on job queue length (>100 tasks).
- Best Practice: Deploy Immuta in high-availability mode across two availability zones.
- Reminder: Subscribe to Immuta’s Status page for service updates.
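
A minimal health-check sketch against `/api/v2/health`, suitable for wiring into a cron job or monitoring agent (the response shape is an assumption):

```python
import requests
import logging

# IMMUTA_URL and headers are assumed to be configured as elsewhere in this SOP.
def check_health(headers):
    try:
        response = requests.get(f"{IMMUTA_URL}/api/v2/health", headers=headers, timeout=10)
        healthy = response.status_code == 200
    except requests.exceptions.RequestException:
        healthy = False
    if not healthy:
        logging.error("Immuta health check failed")  # hook your alerting here
    return healthy
```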
- Tasks: Sync users/groups from AD to Immuta for accurate access control.
- Incident Learning: Sync failures cause stale attributes, leading to policy misapplication or access denials.
- Checklist:
- Monitor sync logs daily via `/api/v2/audit` (alert on <95% success rate).
- Audit orphaned accounts quarterly using `/api/v2/user`.
- Alert on user/group count changes (>5% deviation).
- Document manual attribute override process.
- Best Practice: Automate syncs with `/api/v2/user/sync`:

```python
import requests
import logging

# IMMUTA_URL is assumed to be defined elsewhere (e.g., loaded from config).
def sync_users(headers):
    try:
        response = requests.post(f"{IMMUTA_URL}/api/v2/user/sync", headers=headers)
        response.raise_for_status()
        logging.info(f"Sync completed: {response.json()['status']}")
    except requests.exceptions.HTTPError as e:
        logging.error(f"Sync failed: {e}, Response: {e.response.text}")
        raise
```
- Reminder: Validate AD group mappings weekly.
- Tasks: Configure and audit default/admin access.
- Incident Learning: Stale birthright policies overexpose data; excessive admin privileges increase risks.
- Checklist:
- Review birthright policies quarterly via `/api/v2/permissions`.
- Log privileged actions in UAM (audit via `/api/v2/audit`).
- Implement just-in-time (JIT) access for admins using AD groups.
- Best Practice: Automate birthright assignments with `/api/v2/user/permissions`; see the sketch below.
- Reminder: Document privileged roles in ADO.
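
A hedged sketch of birthright automation against `/api/v2/user/permissions`; the path layout and payload shape are assumptions, so confirm them in a sandbox before use:

```python
import requests
import logging

# IMMUTA_URL and headers are assumed as configured elsewhere in this SOP.
def assign_birthright(headers, user_id, permissions):
    # Path and payload shape are assumptions; validate against your tenant.
    response = requests.put(
        f"{IMMUTA_URL}/api/v2/user/{user_id}/permissions",
        headers=headers,
        json={"permissions": permissions},
    )
    response.raise_for_status()
    logging.info(f"Assigned {permissions} to user {user_id}")
```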
- Tasks: Build scalable ABAC policies and monitor performance.
- Incident Learning: Complex policies (>5 conditions) slow Snowflake queries, causing timeouts.
- Checklist:
- Write plain-English policies (e.g., “Mask SSN for non-Compliance users”).
- Test policies in sandboxes with `/api/v2/policy/test`.
- Monitor Snowflake Query Profile for Immuta view/UDF bottlenecks (>10% query time).
- Limit policy conditions to <5 for performance.
- Best Practice: Use Policy-as-Code with Git/CI-CD pipelines (e.g., GitHub Actions, Azure DevOps).
- Reminder: Stage users (`/api/v2/user/status`) before policy changes.
- Tasks: Ensure policy accuracy and compliance.
- Incident Learning: Misconfigured policies cause data leaks or over-restrictions.
- Checklist:
- Automate policy testing with scripts validating access for sample users.
- Conduct impact analysis (affected users/tables) before deployment.
- Assign policy owners for accountability.
- Best Practice: Use `/api/v2/policy/version` for rollback; integrate with SIEM for audit logging.
- Reminder: Log policy changes in UAM.
- Tasks: Automate and validate sensitive data tagging.
- Incident Learning: False positives/negatives cause compliance risks; heavy scans spike Snowflake costs.
- Checklist:
- Run incremental SDD scans weekly via `/api/v2/sdd/scan` for new data.
- Validate tags with data owners using `/api/v2/tag`.
- Exclude non-sensitive schemas to reduce load.
- Document tag dispute resolution process.
- Best Practice: Create custom classifiers for domain-specific data and test in sandboxes.
- Reminder: Audit tags monthly for compliance.
- Tasks: Automate tag migration (e.g., Env A to Env B).
- Incident Learning: Missing `id` fields or duplicate tags cause migration failures.
- Checklist:
- Export tags with `/api/v2/tag`, strip `id` fields, preserve hierarchy (e.g., `tagA.tagB`).
- Validate tag uniqueness in target environment before import.
- Test migrations in staging.
- Best Practice: Use a Python script with error handling:

```python
import requests
import logging

def migrate_tags(source_url, target_url, headers):
    try:
        source_response = requests.get(f"{source_url}/api/v2/tag", headers=headers)
        source_response.raise_for_status()
        # Strip environment-specific "id" fields before import.
        cleaned_tags = [{k: v for k, v in tag.items() if k != "id"} for tag in source_response.json()]
        for tag in cleaned_tags:
            response = requests.post(f"{target_url}/api/v2/tag", headers=headers, json=tag)
            response.raise_for_status()
        logging.info("Tag migration completed")
    except requests.exceptions.HTTPError as e:
        logging.error(f"Migration failed: {e}, Response: {e.response.text}")
        raise
```

- Reminder: Document API quirks (e.g., inconsistent `id` fields) in ADO doc.
- Tasks: Maintain high-quality data products for user access.
- Incident Learning: Poor metadata increases support tickets; manual approvals create bottlenecks.
- Checklist:
- Audit data product metadata quarterly via `/api/v2/dataSource`.
- Enable auto-approvals for pre-governed datasets.
- Monitor usage via `/api/v2/audit` (alert on >10 access requests/hour).
- Define SLIs: <5 access requests/hour, >90% user satisfaction (via monthly surveys).
- Best Practice: Automate data product creation with `/api/v2/dataSource` and clear metadata.
- Reminder: Collect user feedback monthly via surveys or Slack.
- Tasks: Set up proactive monitoring for Immuta and Snowflake.
- Incident Learning: Vague alerts delay detection (high MTTD); missing metrics prolong diagnosis (high MTTI).
- Checklist:
- Alert on:
- Policy change failures (`/api/v2/audit`, >1 failure/hour).
- High denied access attempts (>10/user/hour); see the polling sketch below.
- Anomalous data access (>100x typical query volume).
- Snowflake query failures from Immuta views (>5/hour).
- Immuta resource limits (disk >90%, license usage >95%).
- Define SLIs: API uptime (99.9%), query latency (<5s), sync success rate (>95%).
- Create dashboards correlating Immuta API, Snowflake queries, and AD syncs.
- Best Practice: Integrate UAM with SIEM (e.g., Datadog) for real-time alerts.
- Reminder: Tune alerts weekly to minimize false positives.
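
A minimal polling sketch for the denied-access alert; the audit API's filter parameters and response shape are assumptions (verify the real query syntax in a sandbox), and the one-hour window is left to your scheduler:

```python
import requests
import logging

# IMMUTA_URL and headers are assumed as configured elsewhere in this SOP.
def denied_access_alert(headers, user_id, threshold=10):
    # Filter parameters are assumptions about the audit API's query syntax.
    response = requests.get(
        f"{IMMUTA_URL}/api/v2/audit",
        headers=headers,
        params={"eventType": "accessDenied", "userId": user_id},
    )
    response.raise_for_status()
    count = len(response.json())
    if count > threshold:
        logging.warning(f"User {user_id} exceeded denied-access threshold: {count}")
    return count
```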
- Tasks: Test changes in a production-like staging environment.
- Incident Learning: Un-tested changes cause outages or access issues.
- Checklist:
- Use anonymized data and user personas for testing.
- Test negative scenarios (e.g., unauthorized access).
- Validate API changes (e.g., `/api/v2/policy`) in sandboxes.
- Best Practice: Automate testing in CI/CD pipelines (e.g., GitHub Actions, Azure DevOps).
- Reminder: Log test results in ADO doc.
- Tasks: Stage users before policy/attribute changes.
- Incident Learning: Active users during changes trigger query failures or lockouts.
- Checklist:
- Stage users via `/api/v2/user/status` in bulk.
- Verify status post-change to avoid lockouts.
- Best Practice: Automate staging:

```python
import requests
import logging

# IMMUTA_URL is assumed to be defined elsewhere (e.g., loaded from config).
def stage_users(headers, user_ids):
    try:
        for user_id in user_ids:
            response = requests.put(
                f"{IMMUTA_URL}/api/v2/user/{user_id}/status",
                headers=headers,
                json={"status": "staged"},
            )
            response.raise_for_status()
            logging.info(f"Staged user {user_id}")
    except requests.exceptions.HTTPError as e:
        logging.error(f"Staging failed: {e}, Response: {e.response.text}")
        raise
```
- Reminder: Schedule changes for off-peak hours (2 AM NZST).
- Tasks: Evaluate impact and require peer reviews.
- Incident Learning: Un-reviewed changes increase error risk.
- Checklist:
- Assess blast radius (e.g., users/tables affected).
- Mandate peer reviews for API-driven changes.
- Categorize changes: minor (1-hour approval), major (2-day review), emergency (escalation path).
- Best Practice: Use Git for Policy-as-Code with automated linting.
- Reminder: Log reviews in a change management system.
- Tasks: Deploy incrementally with versioning.
- Incident Learning: Big-bang deployments amplify failure impact.
- Checklist:
- Use `/api/v2/policy/version` for all changes.
- Test high-risk changes on small user groups (canary testing).
- Log deployment steps in UAM.
- Best Practice: Automate rollouts via CI/CD with GitHub Actions or Azure DevOps.
- Reminder: Define rollback triggers (e.g., >5% query failures).
- Tasks: Prepare tested rollback procedures.
- Incident Learning: Un-tested rollbacks delay recovery (high MTTR).
- Checklist:
- Document rollback steps (e.g., `/api/v2/policy/rollback`).
- Test rollbacks in staging.
- Best Practice: Automate rollback scripts for critical changes.
- Reminder: Validate rollback success post-deployment.
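
To support the rollback-automation best practice above, here is a hedged sketch against `/api/v2/policy/rollback` (the endpoint appears in this SOP's playbooks; the request body is an assumption):

```python
import requests
import logging

# IMMUTA_URL and headers are assumed as configured elsewhere in this SOP.
def rollback_policy(headers, policy_id, target_version):
    # Request body shape is an assumption; test the rollback path in staging.
    response = requests.post(
        f"{IMMUTA_URL}/api/v2/policy/rollback",
        headers=headers,
        json={"policyId": policy_id, "version": target_version},
    )
    response.raise_for_status()
    logging.info(f"Rolled back policy {policy_id} to version {target_version}")
```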
- Tasks: Monitor system behavior to confirm stability.
- Incident Learning: Most incidents occur post-change due to undetected errors.
- Checklist:
- Monitor API latency, Snowflake query performance, and denied access rates.
- Validate changes with test queries (e.g., `/api/v2/dataSource/test`); see the sketch below.
- Set a 24-hour bake time for stability.
- Best Practice: Use automated validation scripts for change outcomes.
- Reminder: Document observations in ADO doc.
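
A minimal post-change validation sketch using `/api/v2/dataSource/test` (the HTTP method and payload are assumptions):

```python
import requests
import logging

# IMMUTA_URL and headers are assumed as configured elsewhere in this SOP.
def validate_data_source(headers, data_source_id):
    # Method and payload shape are assumptions; confirm in a sandbox.
    response = requests.post(
        f"{IMMUTA_URL}/api/v2/dataSource/test",
        headers=headers,
        json={"dataSourceId": data_source_id},
    )
    response.raise_for_status()
    logging.info(f"Post-change validation passed for data source {data_source_id}")
```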
- Tasks: Maintain playbooks for common incidents.
- Examples:
- Policy Lockdown: Roll back via `/api/v2/policy/rollback`, stage users, notify data owners.
- Detection: >10 denied accesses/min (`/api/v2/audit`).
- Triage: Check policy logs, revert changes.
- Escalation: SRE lead within 10 minutes.
- SDD Over-Tagging: Pause scans (`/api/v2/sdd/pause`), validate tags, update classifiers.
- Detection: >20% tags flagged as incorrect.
- Triage: Review scan logs, consult data owners.
- Escalation: Data governance lead within 30 minutes.
- Immuta Service Failure: Check `/api/v2/health`, restart services, escalate to support.
- Detection: API errors >5/min.
- Triage: Check logs (`/var/log/immuta`), restart services.
- Escalation: Immuta support within 15 minutes.
- Data Leak: Isolate data source (`/api/v2/dataSource/disable`), audit logs, notify compliance.
- Detection: Anomalous access (>100x typical volume).
- Triage: Review UAM logs, isolate source.
- Escalation: Compliance team within 5 minutes.
- Checklist:
- Define detection, triage, escalation, and communication (Slack template).
- Conduct quarterly tabletop exercises.
- Run proactive checks (e.g., pre-deployment policy validation scripts).
- Best Practice: Automate triage with scripts checking `/api/v2/health` and `/api/v2/audit`; see the sketch below.
- Reminder: Update playbooks post-incident.
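
A first-pass triage sketch combining the two endpoints named above; the audit filter parameters are assumptions:

```python
import requests
import logging

# IMMUTA_URL and headers are assumed as configured elsewhere in this SOP.
def triage(headers):
    """Route to a playbook: service health first, then denied-access volume."""
    try:
        health = requests.get(f"{IMMUTA_URL}/api/v2/health", headers=headers, timeout=10)
    except requests.exceptions.RequestException:
        health = None
    if health is None or health.status_code != 200:
        logging.error("Health check failed; follow the Immuta Service Failure playbook")
        return "service_failure"
    audit = requests.get(
        f"{IMMUTA_URL}/api/v2/audit",
        headers=headers,
        params={"eventType": "accessDenied"},  # filter syntax is an assumption
    )
    audit.raise_for_status()
    if len(audit.json()) > 10:
        logging.warning("High denied-access volume; follow the Policy Lockdown playbook")
        return "policy_lockdown"
    return "healthy"
```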
- Tasks: Analyze incidents/near-misses for root causes.
- Checklist:
- Document timeline, impact, root causes (5 Whys), and actions.
- Assign owners/deadlines for corrective measures.
- Share learnings in ADO doc.
- Best Practice: Use structured postmortem templates for consistency.
- Reminder: Review postmortems quarterly.
- Tasks: Update SOPs with incident findings.
- Incident Learning: Unincorporated learnings lead to repeat incidents.
- Checklist:
- Maintain “Known Issues” section for API quirks (e.g., tag `id` issues).
- Review SOP quarterly for new learnings.
- Best Practice: Version-control SOP in Git for traceability.
- Reminder: Share learnings with governance teams.
- Tasks: Define restricted emergency access.
- Incident Learning: Missing procedures delay recovery in outages.
- Checklist:
- Restrict break-glass accounts with multi-factor approval.
- Log actions in UAM.
- Test procedures in staging.
- Best Practice: Store credentials in Azure Key Vault.
- Reminder: Review access annually.
- Use `/api/v2` for all tasks, with robust error handling:

```python
import requests
import logging
import time

# IMMUTA_URL is assumed to be defined elsewhere (e.g., loaded from config).
def call_api(endpoint, headers, payload=None, method="GET", retries=3):
    for attempt in range(retries):
        try:
            response = requests.request(method, f"{IMMUTA_URL}{endpoint}", headers=headers, json=payload)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.HTTPError as e:
            logging.error(f"API error: {e}, Status: {e.response.status_code}, Response: {e.response.text}")
            if e.response.status_code == 429 and attempt < retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff on rate limits
                continue
            raise
```
- Deploy scripts on a build agent / tooling server (Python/FastAPI) to bypass MMD restrictions.
- Document quirks (e.g., missing `id` fields) in ADO doc.
- Schedule syncs (`/api/v2/dataSource/sync`) for off-peak hours (2 AM NZST).
- Monitor credit usage biweekly; use transient tables to reduce costs.
- Enforce least-privilege for Immuta’s Snowflake user and AD groups.
- Encrypt data with Azure Key Vault integration.
- Train users on Data Marketplace navigation and policy interpretation via quarterly sessions.
- Provide Slack/ticketing support channels.
- Maintain ADO doc for configurations, API quirks, and incident learnings.
- Update SOP quarterly in Git.
- Tag Migration Failures: Duplicate tags or missing `id` fields cause errors. Validate uniqueness and strip `id` fields.
- SDD Performance: Over-scanning spikes Snowflake costs. Use incremental scans and exclude non-sensitive schemas.
- API Rate Limits: Undocumented 429 errors disrupt automation. Implement exponential backoff.
This toolkit provides reusable scripts for automating Immuta operations in an MMD environment with AD authentication and Snowflake integration, addressing API documentation gaps.
- Purpose: Stage users before policy updates.
- Script: See Section 3.1 (User Staging Protocol).
- Best Practice: Run during off-peak hours; validate status post-staging.
- Purpose: Migrate tags between environments.
- Script: See Section 2.4 (Tag Migration Across Environments).
- Best Practice: Test in staging; validate tag uniqueness.
- Purpose: Validate policy logic before deployment.
- Script:

```python
import requests
import logging

# IMMUTA_URL is assumed to be defined elsewhere (e.g., loaded from config).
def test_policy(headers, policy_id, test_data):
    try:
        response = requests.post(f"{IMMUTA_URL}/api/v2/policy/{policy_id}/test", headers=headers, json=test_data)
        response.raise_for_status()
        logging.info(f"Policy test passed: {response.json()['result']}")
    except requests.exceptions.HTTPError as e:
        logging.error(f"Policy test failed: {e}, Response: {e.response.text}")
        raise
```
- Best Practice: Use representative user personas and datasets.
- Purpose: Sync AD users with Immuta.
- Script: See Section 2.2 (User Synchronization with AD).
- Best Practice: Implement retry logic for transient errors.
- Purpose: Run incremental SDD scans.
- Script:

```python
import requests
import logging

# IMMUTA_URL is assumed to be defined elsewhere (e.g., loaded from config).
def run_sdd_scan(headers, data_source_id):
    try:
        response = requests.post(
            f"{IMMUTA_URL}/api/v2/sdd/scan",
            headers=headers,
            json={"dataSourceId": data_source_id, "incremental": True},
        )
        response.raise_for_status()
        logging.info(f"SDD scan started for {data_source_id}")
    except requests.exceptions.HTTPError as e:
        logging.error(f"SDD scan failed: {e}, Response: {e.response.text}")
        raise
```
- Best Practice: Schedule weekly scans for new data.
- Purpose: Create Data Marketplace products.
- Script:

```python
import requests
import logging

# IMMUTA_URL is assumed to be defined elsewhere (e.g., loaded from config).
def create_data_product(headers, data_source_config):
    try:
        response = requests.post(f"{IMMUTA_URL}/api/v2/dataSource", headers=headers, json=data_source_config)
        response.raise_for_status()
        logging.info(f"Data product created: {data_source_config['name']}")
    except requests.exceptions.HTTPError as e:
        logging.error(f"Data product creation failed: {e}, Response: {e.response.text}")
        raise
```
- Best Practice: Include clear metadata (e.g., description, owner).
- Strategy:
- Test API calls in a sandbox to identify undocumented behaviors.
- Use JSON schema validation:

```python
from jsonschema import validate

# Minimal schema: responses must be objects with at least a "name" string.
schema = {
    "type": "object",
    "required": ["name"],
    "properties": {"name": {"type": "string"}},
}

def validate_response(response_data):
    validate(instance=response_data, schema=schema)
```
- Document quirks in ADO doc (e.g., tag `id` inconsistencies).
- Best Practice: Cache API responses to handle rate limits (429 errors).
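
A minimal TTL-cache sketch for read-heavy endpoints, wrapping the `call_api` helper defined in the API usage section above; the 5-minute TTL is an arbitrary starting point:

```python
import time

_cache = {}  # endpoint -> (fetched_at, data)

def cached_get(endpoint, headers, ttl_seconds=300):
    """Serve repeated GETs from a short-lived cache to stay under rate limits."""
    now = time.time()
    if endpoint in _cache:
        fetched_at, data = _cache[endpoint]
        if now - fetched_at < ttl_seconds:
            return data
    data = call_api(endpoint, headers)  # call_api is defined earlier in this SOP
    _cache[endpoint] = (now, data)
    return data
```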
- Setup:
- Deploy a Python/FastAPI server on-premises or Azure.
- Configure AD authentication with OAuth tokens in Azure Key Vault.
- Example FastAPI endpoint:

```python
from fastapi import FastAPI

app = FastAPI()

@app.post("/stage-users")
async def stage_users_endpoint(user_ids: list):
    # get_token_from_key_vault() and stage_users() are assumed to be defined
    # elsewhere (see Section 3.1 for stage_users and the Key Vault sketch below).
    headers = {"Authorization": f"Bearer {get_token_from_key_vault()}"}
    stage_users(headers, user_ids)
    return {"status": "success"}
```
- Best Practice: Restrict server access to authorized SREs via AD groups.
- Reminder: Test server connectivity with Immuta and Snowflake.
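
The FastAPI sketch above calls `get_token_from_key_vault()`; a minimal implementation using the `azure-identity` and `azure-keyvault-secrets` packages might look like this (the vault URL and secret name are placeholders for your environment):

```python
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

VAULT_URL = "https://your-vault-name.vault.azure.net"  # placeholder

def get_token_from_key_vault(secret_name="immuta-api-token"):  # placeholder name
    credential = DefaultAzureCredential()
    client = SecretClient(vault_url=VAULT_URL, credential=credential)
    return client.get_secret(secret_name).value
```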
- Purpose: Automate Immuta policy deployment, user staging, and tag migration using CI/CD pipelines to ensure consistent, error-free changes.
- Pipeline Examples:
- GitHub Actions:

```yaml
name: Immuta Policy Deployment
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Run Policy Tests
        run: python test_policy.py
      - name: Deploy Policy
        run: python deploy_policy.py
        env:
          IMMUTA_TOKEN: ${{ secrets.IMMUTA_TOKEN }}
```
- Azure DevOps Pipeline:

```yaml
trigger:
  branches:
    include:
      - main
pool:
  vmImage: 'ubuntu-latest'
steps:
  - checkout: self
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.x'
  - script: |
      python test_policy.py
    displayName: 'Run Policy Tests'
  - script: |
      python deploy_policy.py
    displayName: 'Deploy Policy'
    env:
      IMMUTA_TOKEN: $(IMMUTA_TOKEN)
```
- Configuration Notes:
- Store `IMMUTA_TOKEN` in Azure DevOps secure variables or link to Azure Key Vault for AD-authenticated access.
- Use a service connection with AD credentials to access the Immuta API in your MMD-restricted environment.
- Add a `requirements.txt` step if dependencies are needed:

```yaml
- script: |
    pip install -r requirements.txt
  displayName: 'Install Dependencies'
```
- Best Practice:
- Include linting (e.g., `flake8` for Python scripts) and peer review gates in both pipelines.
- Use separate staging and production pipelines with approval gates for major changes.
- Log pipeline outputs to Azure Monitor or ADO doc for traceability.
- Reminder:
- Secure secrets in Azure Key Vault (for Azure DevOps) or GitHub Secrets (for GitHub Actions).
- Test pipelines in a sandbox environment before production deployment.