Skip to content

Instantly share code, notes, and snippets.

@oNddleo
Created November 13, 2024 15:31
Show Gist options
  • Save oNddleo/88d788c463a9a954bfae2c9cfa94545c to your computer and use it in GitHub Desktop.
Save oNddleo/88d788c463a9a954bfae2c9cfa94545c to your computer and use it in GitHub Desktop.
Different hard delete and soft delete

Let me break down the comparison between soft delete and hard delete in PostgreSQL, with specific contexts where each approach makes more sense.

Soft Delete vs Hard Delete in PostgreSQL - A Comprehensive Analysis

Soft Delete

Pros

  1. Data Recovery & Auditing

    • Easily recover accidentally deleted records
    • Maintain complete audit trail of all data changes
    • Historical analysis and trend tracking possible
    • Compliance with regulations requiring data history
  2. Referential Integrity

    • Preserves relationship history in complex data models
    • No orphaned records in related tables
    • Easier to maintain data consistency
    • Better for systems with complex foreign key relationships
  3. Business Intelligence

    • Historical data available for analytics
    • Ability to track deletion patterns
    • Complete data lifecycle visibility
    • Useful for customer behavior analysis

Cons

  1. Database Size

    • Continuous growth of database size
    • Increased storage costs
    • Regular archiving might be needed
    • Potential performance impact on full table scans
  2. Query Complexity

    • Every query needs WHERE is_deleted = FALSE
    • More complex indexes needed
    • Risk of forgetting the deletion filter
    • Additional coding overhead
  3. Performance Impact

    • Larger indices due to keeping deleted records
    • More complex query plans
    • Potential slower SELECT operations
    • Index maintenance overhead

Hard Delete

Pros

  1. Performance

    • Better query performance (smaller tables)
    • Simpler and faster index operations
    • Reduced storage requirements
    • Cleaner query patterns
  2. Data Clarity

    • No ambiguity about record state
    • Simpler application logic
    • Easier database maintenance
    • Clear space reclamation
  3. GDPR Compliance

    • True data deletion for privacy requirements
    • Easier to implement "right to be forgotten"
    • Clear data elimination proof
    • Simpler privacy compliance

Cons

  1. No Recovery

    • Impossible to recover accidentally deleted data
    • No historical analysis possible
    • Permanent loss of business intelligence
    • Need for careful delete operations
  2. Referential Integrity Challenges

    • Must carefully handle foreign key relationships
    • Risk of orphaned records
    • Complex cascade delete operations
    • Potential data consistency issues

Best Use Cases

Soft Delete is Better For:

  1. Financial Systems

    -- Example: Financial transaction records
    CREATE TABLE transactions (
        id SERIAL PRIMARY KEY,
        amount DECIMAL,
        is_deleted BOOLEAN DEFAULT FALSE,
        deleted_at TIMESTAMP,
        -- other fields
    );
    • Required audit trails
    • Legal compliance requirements
    • Historical record keeping
  2. Customer Relationship Management

    -- Example: Customer profiles
    CREATE TABLE customers (
        id SERIAL PRIMARY KEY,
        email VARCHAR(255),
        is_deleted BOOLEAN DEFAULT FALSE,
        deleted_at TIMESTAMP,
        -- other fields
    );
    • Customer reactivation possibilities
    • Behavior analysis needs
    • Relationship history tracking
  3. Content Management Systems

    -- Example: Blog posts or articles
    CREATE TABLE articles (
        id SERIAL PRIMARY KEY,
        title VARCHAR(255),
        is_deleted BOOLEAN DEFAULT FALSE,
        deleted_at TIMESTAMP,
        -- other fields
    );
    • Content versioning needs
    • Editorial workflow requirements
    • Recovery requirements

Hard Delete is Better For:

  1. Session Management

    -- Example: User sessions
    CREATE TABLE sessions (
        id SERIAL PRIMARY KEY,
        user_id INTEGER,
        last_active TIMESTAMP,
        -- other fields
    );
    • Temporary data
    • No historical value
    • Regular cleanup needed
  2. Log Data

    -- Example: Application logs
    CREATE TABLE logs (
        id SERIAL PRIMARY KEY,
        event_type VARCHAR(50),
        created_at TIMESTAMP,
        -- other fields
    );
    • High-volume data
    • Regular rotation needed
    • Time-bound relevance
  3. Personal Identifiable Information (PII)

    -- Example: User personal data
    CREATE TABLE user_data (
        id SERIAL PRIMARY KEY,
        user_id INTEGER,
        sensitive_data TEXT,
        -- other fields
    );
    • GDPR compliance
    • Privacy requirements
    • Legal deletion requirements

Hybrid Approach

Sometimes a hybrid approach makes sense:

-- Example: Hybrid deletion pattern
CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER,
    total DECIMAL,
    is_deleted BOOLEAN DEFAULT FALSE,
    deleted_at TIMESTAMP,
    -- other fields
);

-- Archival table for long-term storage
CREATE TABLE archived_orders (
    id INTEGER,
    user_id INTEGER,
    total DECIMAL,
    archived_at TIMESTAMP DEFAULT NOW(),
    -- other fields
);

-- Archival process
BEGIN;
INSERT INTO archived_orders 
SELECT id, user_id, total, NOW() 
FROM orders 
WHERE is_deleted = TRUE 
  AND deleted_at < NOW() - INTERVAL '1 year';

DELETE FROM orders 
WHERE is_deleted = TRUE 
  AND deleted_at < NOW() - INTERVAL '1 year';
COMMIT;

This approach:

  1. Uses soft delete initially
  2. Archives data after a certain period
  3. Hard deletes after archival
  4. Maintains both performance and history

Performance Optimization Tips

For Soft Delete:

-- Partial index for active records
CREATE INDEX idx_active_records ON table_name (id) 
WHERE is_deleted = FALSE;

-- Automated cleanup
CREATE FUNCTION cleanup_old_deleted_records()
RETURNS void AS $$
BEGIN
    DELETE FROM table_name 
    WHERE is_deleted = TRUE 
      AND deleted_at < NOW() - INTERVAL '5 years';
END;
$$ LANGUAGE plpgsql;

For Hard Delete:

-- Batch deletion for large tables
DELETE FROM large_table 
WHERE condition = true 
LIMIT 1000;

-- Use CASCADE with caution
DELETE FROM parent_table 
CASCADE;

Based on this analysis, here are the key decision factors:

  1. Choose Soft Delete when:

    • Data history is important for business/legal reasons
    • You need the ability to restore deleted data
    • You have complex relationships between tables
    • Audit trails are required
    • Data analytics on historical data is important
  2. Choose Hard Delete when:

    • Storage space is a primary concern
    • Performance is critical
    • Data has a clear expiration point
    • GDPR/privacy compliance requires true deletion
    • You're dealing with high-volume temporary data
  3. Consider Hybrid Approach when:

    • You need both historical data and performance
    • You have different retention requirements for different data types
    • You want to balance storage costs with data availability

Would you like me to elaborate on any specific aspect or provide more detailed examples for your particular use case?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment