
The Ultimate PostgreSQL Guide for Software Engineers

Introduction to PostgreSQL

What is PostgreSQL?

PostgreSQL (often abbreviated as "Postgres") is a powerful, open-source object-relational database management system (ORDBMS) with a strong reputation for reliability, feature robustness, and performance. First released in 1989 as a successor to the Ingres database project at the University of California, Berkeley, PostgreSQL has evolved over more than 30 years into one of the most advanced and capable database systems available.

As an object-relational database, PostgreSQL extends the traditional relational model with object-oriented features such as inheritance, complex objects, and user-defined types, making it particularly well-suited for complex data structures and relationships.

Why PostgreSQL Stands Out

PostgreSQL distinguishes itself from other database systems through several key characteristics:

  • Standards Compliance: PostgreSQL aims for strong compliance with the SQL standard, currently supporting most of SQL:2016.
  • Extensibility: Its architecture allows for custom data types, operators, functions, and procedural languages.
  • Concurrency and ACID Compliance: PostgreSQL delivers robust transaction support, ensuring data integrity even under concurrent operations.
  • Community-Driven Development: Being open-source, PostgreSQL benefits from a large, active community of developers worldwide.
  • Enterprise Features: Despite being free and open-source, PostgreSQL includes features traditionally found only in expensive enterprise database systems.

For software engineers accustomed to object-oriented paradigms, PostgreSQL provides a natural transition to the database world through its object-relational capabilities while maintaining the structured data approach of traditional relational databases.

PostgreSQL vs. Other Database Systems

| Feature            | PostgreSQL                       | MySQL                                 | Oracle                      | SQL Server                      | MongoDB                               |
| ------------------ | -------------------------------- | ------------------------------------- | --------------------------- | ------------------------------- | ------------------------------------- |
| License            | Open-source (PostgreSQL License) | Open-source with commercial elements  | Proprietary                 | Proprietary                     | Open-source with commercial elements  |
| Data Model         | Object-relational                | Relational                            | Relational                  | Relational                      | Document-oriented                     |
| ACID Compliance    | Full                             | Depends on storage engine             | Full                        | Full                            | Optional                              |
| Complex Data Types | Extensive                        | Limited                               | Good                        | Good                            | Native JSON support                   |
| Concurrency Model  | MVCC                             | Varies by storage engine              | MVCC                        | Locking with snapshot isolation | Document-level locking                |
| Extensibility      | Highly extensible                | Moderate                              | Extensive (with cost)       | Good                            | Plugins                               |
| Geospatial Support | PostGIS (powerful)               | Basic                                 | Oracle Spatial (extra cost) | Spatial extensions              | GeoJSON support                       |

When to Choose PostgreSQL

PostgreSQL is an excellent choice for:

  • Complex applications requiring rich data types and relationships
  • Projects needing advanced SQL features like window functions, common table expressions, and recursive queries
  • Applications requiring strict data integrity and transaction guarantees
  • Geographic information systems via PostGIS extension
  • High-volume OLTP (Online Transaction Processing) workloads
  • Systems that need to scale vertically with growing data complexity
  • Open-source projects requiring a no-cost database solution with enterprise features

Software engineers will appreciate PostgreSQL's logical consistency, predictable behavior, and comprehensive documentation, which help reduce development surprises and debugging sessions.

Understanding Relational Database Concepts

The Relational Model

The relational model organizes data into tables (formally called relations) containing rows (tuples) and columns (attributes). This model, developed by E.F. Codd in 1970, provides a structured approach to data management based on mathematical principles from set theory.

Think of a table as a spreadsheet with strictly enforced rules:

  • Each column must have a unique name within that table
  • Each column must contain the same type of data (e.g., text, numbers, dates)
  • Each row represents a single entity or relationship
  • The order of rows and columns is logically insignificant
  • Each cell contains exactly one value (or NULL)

The relational model's power comes from its ability to represent virtually any data while maintaining integrity and allowing flexible queries across related tables.

Relations (Tables)

In practical terms, a table is a collection of related data organized into rows and columns. Each table typically represents:

  1. A concrete entity (e.g., customers, products, orders)
  2. A relationship between entities (e.g., order_items connecting orders and products)

For example, if modeling an e-commerce system, you might have tables for:

  • customers - storing information about people who buy products
  • products - containing items available for sale
  • orders - tracking purchases made by customers

A table has a defined schema which specifies:

  • Column names
  • Data types for each column
  • Constraints that enforce rules about the data

Example of creating a table in PostgreSQL:

CREATE TABLE employees (
    employee_id INTEGER,      -- Unique identifier for each employee
    first_name VARCHAR(50),   -- First name, limited to 50 characters
    last_name VARCHAR(50),    -- Last name, limited to 50 characters
    hire_date DATE,           -- Date the employee was hired
    department_id INTEGER     -- References the department they work in
);

Here, employees is the table name, and each line in the parentheses defines a column with its data type.

Keys and Relationships

In a relational database, keys serve as the foundation for both row identification and establishing relationships between tables.

Primary Keys

A primary key uniquely identifies each row in a table. Think of it as a table's "unique row identifier":

  • Must be unique across all rows in the table
  • Cannot contain NULL values
  • Should be stable (unlikely to change)
  • Is typically a single column, but can be multiple columns (composite key)

Common primary key patterns:

  1. Natural keys: Real-world unique identifiers (e.g., social security numbers, ISBN)
  2. Surrogate keys: Artificial identifiers not derived from data (e.g., auto-incrementing integers, UUIDs)
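
As a quick sketch of the two patterns (the books and authors tables here are illustrative, not part of the running example):

CREATE TABLE books (
    isbn CHAR(13) PRIMARY KEY,     -- Natural key: a real-world identifier
    title VARCHAR(200) NOT NULL
);

CREATE TABLE authors (
    author_id SERIAL PRIMARY KEY,  -- Surrogate key: auto-generated, no external meaning
    name VARCHAR(100) NOT NULL
);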

Example of creating a table with a primary key:

CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,  -- This column uniquely identifies each product
    name VARCHAR(100) NOT NULL,      -- Product names cannot be null
    price NUMERIC(10,2)              -- Price with 10 digits total, 2 after decimal
);

Here, the PRIMARY KEY constraint ensures that:

  • Each product_id is unique
  • No product_id can be NULL
  • PostgreSQL automatically creates an index on the primary key column

Foreign Keys

A foreign key creates a relationship between tables by referencing a primary key in another table. It enforces referential integrity, ensuring that relationships remain valid:

CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,                    -- Unique order identifier
    customer_id INTEGER REFERENCES customers(customer_id),  -- Foreign key
    order_date DATE
);

In this example:

  • customer_id in the orders table must match an existing customer_id in the customers table
  • PostgreSQL will prevent operations that would break this relationship (e.g., deleting a customer who has orders)

The REFERENCES keyword establishes the foreign key relationship, pointing to the referenced table and column.

Composite Keys

A composite key is a primary key consisting of multiple columns. Use composite keys when:

  • No single column uniquely identifies a row
  • The table is a junction table implementing a many-to-many relationship

Example of a composite primary key in an order items table:

CREATE TABLE order_items (
    order_id INTEGER,          -- Part of composite primary key
    product_id INTEGER,        -- Part of composite primary key
    quantity INTEGER,
    unit_price NUMERIC(10,2),
    -- Composite primary key definition
    PRIMARY KEY (order_id, product_id),
    -- Foreign key relationships
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

In this example:

  • The combination of order_id and product_id must be unique
  • Each foreign key separately references its parent table
  • This structure prevents adding the same product to an order multiple times

Types of Relationships

The relational model supports three fundamental relationship types:

One-to-One (1:1)

In a one-to-one relationship, each record in Table A relates to exactly one record in Table B, and vice versa.

Example: A users table and a user_profiles table, where each user has exactly one profile.

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    password_hash VARCHAR(100) NOT NULL
);

CREATE TABLE user_profiles (
    user_id INTEGER PRIMARY KEY,  -- Both primary key and foreign key
    bio TEXT,
    avatar_url VARCHAR(255),
    birth_date DATE,
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);

Key characteristics:

  • The foreign key in user_profiles is also its primary key
  • This enforces the 1:1 relationship (a user can have at most one profile)
  • Used to:
    • Split large tables for performance or organization
    • Separate frequently accessed data from rarely accessed data
    • Handle optional data fields

One-to-Many (1:N)

In a one-to-many relationship, each record in Table A can relate to multiple records in Table B, but each record in Table B relates to only one record in Table A.

Example: A departments table and an employees table, where each department has many employees, but each employee belongs to one department.

CREATE TABLE departments (
    department_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    location VARCHAR(100)
);

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    -- Foreign key: an employee belongs to one department
    department_id INTEGER REFERENCES departments(department_id),
    hire_date DATE
);

Key characteristics:

  • The foreign key is on the "many" side (employees)
  • Most common relationship type
  • Represents natural hierarchies and belongingness

Many-to-Many (M:N)

In a many-to-many relationship, each record in Table A can relate to multiple records in Table B, and vice versa.

Example: A students table and a courses table, where each student can enroll in multiple courses, and each course can have multiple students.

This requires a junction table (also called a linking, bridge, or join table):

CREATE TABLE students (
    student_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE courses (
    course_id SERIAL PRIMARY KEY,
    title VARCHAR(100) NOT NULL,
    department VARCHAR(50) NOT NULL
);

-- Junction table implementing the many-to-many relationship
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    course_id INTEGER REFERENCES courses(course_id),
    enrollment_date DATE NOT NULL,
    grade VARCHAR(2),
    -- Composite primary key
    PRIMARY KEY (student_id, course_id)
);

Key characteristics:

  • The junction table contains foreign keys to both related tables
  • Often includes additional attributes about the relationship (enrollment date, grade)
  • The composite primary key prevents duplicate relationships

Here's a text-based visualization of these relationships:

One-to-One:
users (1)─────(1) user_profiles

One-to-Many:
departments (1)─────(N) employees

Many-to-Many:
students (M)─────(junction table: enrollments)─────(N) courses

Normalization

Normalization is the process of organizing database tables to:

  1. Minimize data redundancy
  2. Reduce anomalies during data operations
  3. Improve data integrity
  4. Optimize query flexibility

Think of normalization as "dividing data into the right tables, with the right relationships."

First Normal Form (1NF)

To achieve 1NF:

  • Each table must have a primary key
  • Each column must contain atomic (indivisible) values
  • No repeating groups or arrays of values

Before 1NF:

| customer_id | name       | phone_numbers      |
| ----------- | ---------- | ------------------ |
| 1           | John Doe   | 555-1234, 555-5678 |
| 2           | Jane Smith | 555-9876           |

After 1NF:

| customer_id | name       |
| ----------- | ---------- |
| 1           | John Doe   |
| 2           | Jane Smith |

| customer_id | phone_number |
| ----------- | ------------ |
| 1           | 555-1234     |
| 1           | 555-5678     |
| 2           | 555-9876     |

The improvement is that each cell now contains a single value, enabling better querying and data manipulation.

Second Normal Form (2NF)

To achieve 2NF:

  • Table must be in 1NF
  • All non-key attributes must depend on the entire primary key (especially important for tables with composite primary keys)

Before 2NF:

| order_id | product_id | product_name | quantity | customer_id |
| -------- | ---------- | ------------ | -------- | ----------- |
| 1        | 101        | Laptop       | 1        | 201         |
| 1        | 102        | Mouse        | 2        | 201         |
| 2        | 101        | Laptop       | 1        | 202         |

After 2NF:

| order_id | customer_id |
| -------- | ----------- |
| 1        | 201         |
| 2        | 202         |

| order_id | product_id | quantity |
| -------- | ---------- | -------- |
| 1        | 101        | 1        |
| 1        | 102        | 2        |
| 2        | 101        | 1        |

| product_id | product_name |
| ---------- | ------------ |
| 101        | Laptop       |
| 102        | Mouse        |

The improvement is that product_name depends only on product_id, not on the composite key (order_id, product_id), so it belongs in a separate table.

Third Normal Form (3NF)

To achieve 3NF:

  • Table must be in 2NF
  • All non-key attributes must depend directly on the primary key, not on other non-key attributes

Before 3NF:

| order_id | customer_id | customer_email   | order_date |
| -------- | ----------- | ---------------- | ---------- |
| 1        | 201         | john@example.com | 2023-01-15 |
| 2        | 202         | jane@example.com | 2023-01-16 |
| 3        | 201         | john@example.com | 2023-01-17 |

After 3NF:

| order_id | customer_id | order_date |
| -------- | ----------- | ---------- |
| 1        | 201         | 2023-01-15 |
| 2        | 202         | 2023-01-16 |
| 3        | 201         | 2023-01-17 |

| customer_id | customer_email   |
| ----------- | ---------------- |
| 201         | john@example.com |
| 202         | jane@example.com |

The improvement is that customer_email depends on customer_id, not on order_id, so it belongs in a separate table.

Practical Benefits of Normalization

  1. Reduces data redundancy: Information is stored in only one place
  2. Prevents update anomalies: Changing data in one place updates it everywhere
  3. Prevents deletion anomalies: You can delete a row without losing unrelated data
  4. Improves query flexibility: Allows more combinations of data to be queried

When to Denormalize

While normalization is generally good practice, sometimes controlled denormalization is justified:

  • When read performance is critical and data rarely changes
  • For analytical (OLAP) rather than transactional (OLTP) workloads
  • For computed columns that are frequently queried
  • For complex aggregations that would otherwise require many joins
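
As a sketch of controlled denormalization, you might cache each order's total directly on the orders table (assuming the orders and order_items tables from the earlier examples) so reads avoid a join, at the cost of keeping the copy in sync:

-- Add a redundant, precomputed column
ALTER TABLE orders ADD COLUMN total_amount NUMERIC(12,2);

-- Populate it from the normalized data
UPDATE orders o
SET total_amount = (
    SELECT COALESCE(SUM(oi.quantity * oi.unit_price), 0)
    FROM order_items oi
    WHERE oi.order_id = o.order_id
);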

Entity-Relationship Modeling

Entity-Relationship (ER) modeling is a technique for designing databases by visualizing the relationships between different data entities. It provides an abstract framework before writing any SQL code.

Core Components of ER Diagrams

  1. Entities: Distinguishable objects in your domain (rectangles in diagrams)

    • Examples: customers, products, orders
  2. Attributes: Properties that describe entities (ovals in diagrams)

    • Examples: customer_name, product_price, order_date
  3. Relationships: Associations between entities (diamonds in diagrams)

    • Examples: places (customer places order), contains (order contains products)
  4. Cardinality: The number of instances of one entity related to another (notations on relationship lines)

    • Examples: one-to-one, one-to-many, many-to-many

Sample ER Diagram (Text Representation)

[CUSTOMER]─────┐
 │             │
 │ customer_id │
 │ name        │
 │ email       │
 │             │
 └─────────────┘
      │
      │ places (1:N)
      ▼
[ORDER]─────────┐
 │              │
 │ order_id     │
 │ order_date   │
 │ total_amount │
 │              │
 └──────────────┘
      │
      │ contains (1:N)
      ▼
[ORDER_ITEM]────┐                           [PRODUCT]─────┐
 │              │                           │             │
 │ order_id     │      references (N:1)     │ product_id  │
 │ product_id   │──────────────────────────▶│ name        │
 │ quantity     │                           │ price       │
 │ unit_price   │                           │ category    │
 │              │                           │             │
 └──────────────┘                           └─────────────┘

From ER Diagram to PostgreSQL Tables

Translating the above ER diagram to PostgreSQL would look like:

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date DATE NOT NULL DEFAULT CURRENT_DATE,
    total_amount NUMERIC(10,2) NOT NULL DEFAULT 0
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price NUMERIC(10,2) NOT NULL,
    category VARCHAR(50) NOT NULL
);

CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER NOT NULL CHECK (quantity > 0),
    unit_price NUMERIC(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

ER Modeling Process

  1. Identify entities: Determine the main objects in your system
  2. Define attributes: Decide what properties each entity has
  3. Establish relationships: Determine how entities relate to each other
  4. Specify cardinality: Define the nature of each relationship (1:1, 1:N, M:N)
  5. Refine the model: Apply normalization principles
  6. Map to tables: Translate the model into SQL table definitions

This systematic approach helps ensure that your database structure properly reflects the business domain and can evolve as requirements change.

SQL Fundamentals

SQL Language Structure

SQL (Structured Query Language) is a declarative language, meaning you specify what results you want rather than how to get them. Understanding its components will help you navigate PostgreSQL effectively.

SQL is organized into several distinct sublanguages, each with a specific purpose:

Data Definition Language (DDL)

DDL commands create, modify, and remove database structures:

  • CREATE: Establishes new database objects
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    hire_date DATE
);
  • ALTER: Modifies existing database objects
ALTER TABLE employees ADD COLUMN last_name VARCHAR(50);
  • DROP: Removes database objects
DROP TABLE employees;
  • TRUNCATE: Quickly removes all rows from a table
TRUNCATE TABLE audit_logs;

Data Manipulation Language (DML)

DML commands manage the data stored within database objects:

  • SELECT: Retrieves data from tables
SELECT first_name, last_name FROM employees WHERE hire_date > '2020-01-01';
  • INSERT: Adds new rows to tables
INSERT INTO employees (first_name, last_name, hire_date)
VALUES ('John', 'Smith', '2022-06-15');
  • UPDATE: Modifies existing data
UPDATE employees SET last_name = 'Jones' WHERE employee_id = 101;
  • DELETE: Removes rows from tables
DELETE FROM employees WHERE hire_date < '2000-01-01';

Data Control Language (DCL)

DCL commands manage permissions and access:

  • GRANT: Assigns privileges to users
GRANT SELECT, INSERT ON employees TO hr_staff;
  • REVOKE: Removes privileges
REVOKE INSERT ON employees FROM temp_users;

Transaction Control Language (TCL)

TCL commands manage transactions (groups of operations treated as a single unit):

  • BEGIN: Starts a transaction
BEGIN;
  • COMMIT: Saves all changes made during the current transaction
COMMIT;
  • ROLLBACK: Undoes all changes made during the current transaction
ROLLBACK;
  • SAVEPOINT: Creates points within a transaction to which you can roll back
SAVEPOINT before_update;
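
Put together, a typical transaction might look like this sketch (it assumes a salary column on the employees table):

BEGIN;

UPDATE employees SET department_id = 3 WHERE employee_id = 101;

SAVEPOINT before_raise;

UPDATE employees SET salary = salary * 1.10 WHERE employee_id = 101;

-- Undo only the raise; the department transfer above survives
ROLLBACK TO SAVEPOINT before_raise;

COMMIT;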

PostgreSQL Data Types

Data types define what kind of values can be stored in a column and how PostgreSQL operates on them. Choosing the right data type is crucial for data integrity, storage efficiency, and query performance.

Numeric Types

PostgreSQL offers several numeric types to accommodate different requirements:

  • INTEGER / INT: Whole numbers from -2,147,483,648 to 2,147,483,647 (4 bytes)

    • Use when: Storing whole numbers like quantities, counts, or IDs within this range
    • Example: employee_id INTEGER, quantity INTEGER
  • SMALLINT: Whole numbers from -32,768 to 32,767 (2 bytes)

    • Use when: Space is a concern and values will remain within this smaller range
    • Example: age SMALLINT, year SMALLINT
  • BIGINT: Large whole numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (8 bytes)

    • Use when: Values might exceed the INTEGER range, like large IDs or counts
    • Example: transaction_id BIGINT, population BIGINT
  • NUMERIC(p, s) / DECIMAL(p, s): Exact numeric with specified precision (p) and scale (s)

    • Use when: Precision is critical, especially for financial calculations
    • Example: price NUMERIC(10,2) (10 digits total, 2 after decimal)
  • REAL: Floating-point number with 6 decimal digits precision (4 bytes)

    • Use when: Approximate numeric values are acceptable and range is moderate
    • Example: temperature REAL
  • DOUBLE PRECISION: Floating-point number with 15 decimal digits precision (8 bytes)

    • Use when: Approximate numeric values with greater precision are needed
    • Example: scientific_measurement DOUBLE PRECISION
  • SERIAL, BIGSERIAL: Auto-incrementing integers

    • Use when: You need an automatically generated unique identifier
    • Example: user_id SERIAL PRIMARY KEY
    • Note: SERIAL is actually a shorthand for creating an INTEGER column with a sequence
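
For example, a column declared as user_id SERIAL behaves roughly like this explicit version:

CREATE SEQUENCE users_user_id_seq AS INTEGER;

CREATE TABLE users (
    user_id INTEGER NOT NULL DEFAULT nextval('users_user_id_seq')
);

-- Tie the sequence's lifetime to the column
ALTER SEQUENCE users_user_id_seq OWNED BY users.user_id;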

Character Types

Character types store text data:

  • CHAR(n): Fixed-length character string padded with spaces if shorter than n

    • Use when: Data has a fixed length, like state codes or specific codes
    • Example: state_code CHAR(2), product_code CHAR(8)
    • Note: CHAR without length is a single character
  • VARCHAR(n): Variable-length character string with a maximum length of n

    • Use when: Text has a clear upper limit but varies in length
    • Example: username VARCHAR(50), last_name VARCHAR(100)
  • TEXT: Variable-length character string without explicit length limit

    • Use when: Length might be substantial or varies greatly
    • Example: description TEXT, article_content TEXT
    • Note: In PostgreSQL, there's no performance difference between TEXT and VARCHAR without a length specifier

Date/Time Types

Date and time types store temporal data:

  • DATE: Calendar date (year, month, day)

    • Use when: Only the date matters, not the time
    • Example: birth_date DATE, hire_date DATE
    • Format: 'YYYY-MM-DD'
  • TIME: Time of day (hour, minute, second, microsecond)

    • Use when: Only time matters, not the date
    • Example: meeting_time TIME
    • Format: 'HH:MM:SS.ssssss'
  • TIMESTAMP: Date and time (without timezone)

    • Use when: You need both date and time but don't care about timezone
    • Example: created_at TIMESTAMP
    • Format: 'YYYY-MM-DD HH:MM:SS.ssssss'
  • TIMESTAMPTZ: Date and time with timezone

    • Use when: Timezone awareness is important (usually the best choice)
    • Example: last_login TIMESTAMPTZ
    • Format: 'YYYY-MM-DD HH:MM:SS.ssssss+TZ'
    • Best practice: Store in UTC, convert to local timezone only for display
  • INTERVAL: Period of time

    • Use when: Representing durations or time differences
    • Example: duration INTERVAL
    • Usage: '2 days 3 hours 40 minutes'::INTERVAL
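
A few illustrative queries combining these types:

-- Render the current moment in a specific timezone
SELECT NOW() AT TIME ZONE 'UTC';

-- Date arithmetic with an interval
SELECT NOW() + INTERVAL '2 days 3 hours' AS deadline;

-- Subtracting one timestamp from another yields an interval
SELECT TIMESTAMPTZ '2023-05-01 12:00+00' - TIMESTAMPTZ '2023-04-29 09:00+00' AS elapsed;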

Boolean Type

  • BOOLEAN: True or false values
    • Use when: Representing binary states or flags
    • Example: is_active BOOLEAN, has_subscription BOOLEAN
    • Values: TRUE/FALSE, 't'/'f', 'yes'/'no', 'y'/'n', '1'/'0'

JSON Types

PostgreSQL offers powerful JSON support:

  • JSON: JavaScript Object Notation data stored as text

    • Use when: You need JSON validation but don't frequently query or manipulate JSON fields
  • JSONB: Binary JSON format

    • Use when: You frequently query or update JSON data
    • Advantages: Faster querying, supports indexing, more efficient storage
    • Example: user_preferences JSONB, document_data JSONB
-- Querying JSONB data
SELECT * FROM users WHERE user_preferences @> '{"theme": "dark"}';

Arrays

PostgreSQL allows any data type to be defined as a multi-dimensional array:

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    tags TEXT[],  -- Array of text
    dimensions INTEGER[3]  -- 3D array of integers
);

-- Insert with arrays
INSERT INTO products (name, tags, dimensions)
VALUES ('Smartphone', ARRAY['electronics', 'mobile', 'gadget'], ARRAY[142, 72, 8]);

-- Query arrays
SELECT * FROM products WHERE 'mobile' = ANY(tags);

UUID Type

  • UUID: Universally Unique Identifier (128-bit identifier)
    • Use when: You need globally unique identifiers or want to hide sequential IDs
    • Example: session_id UUID DEFAULT gen_random_uuid()

Special Types

  • CIDR, INET, MACADDR: Network address types

    • Use when: Storing IP addresses or network information
  • POINT, LINE, POLYGON: Geometric types

    • Use when: Storing basic geometric data
  • MONEY: Currency amount

    • Warning: The MONEY type has limitations with internationalization

Creating Database Objects

PostgreSQL database objects include databases, schemas, tables, views, and more. Understanding how to create and manage these objects is fundamental.

Creating a Database

A database in PostgreSQL is a named collection of schemas. Creating a database establishes a new isolated environment:

CREATE DATABASE project_db
    WITH OWNER = my_user        -- Who owns this database
    ENCODING = 'UTF8'           -- Character encoding
    LC_COLLATE = 'en_US.UTF-8'  -- String sort order
    LC_CTYPE = 'en_US.UTF-8'    -- Character classification
    TABLESPACE = pg_default     -- Physical storage location
    CONNECTION LIMIT = -1;      -- Max concurrent connections (-1 means unlimited)

Key parameters:

  • OWNER: User who owns the database (has all privileges)
  • ENCODING: Character encoding (UTF8 recommended)
  • TABLESPACE: Physical location for storage
  • CONNECTION LIMIT: Maximum concurrent connections allowed

Creating a Schema

A schema is a namespace within a database that contains named objects like tables, views, and functions:

CREATE SCHEMA inventory;

Schemas help:

  • Organize database objects logically
  • Allow multiple users to use the same database without interference
  • Put third-party applications in separate schemas to avoid name conflicts

To use objects in a specific schema, either fully qualify the name:

SELECT * FROM inventory.products;

Or set the search path:

SET search_path TO inventory, public;

Creating Tables

Tables are the primary structures for storing data. A table creation statement specifies:

  1. Table name
  2. Column names and data types
  3. Constraints that enforce data rules

Basic table creation syntax:

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,  -- Auto-incrementing ID as primary key
    first_name VARCHAR(50) NOT NULL, -- Required first name
    last_name VARCHAR(50) NOT NULL,  -- Required last name
    email VARCHAR(100) UNIQUE,       -- Optional email that must be unique
    age INTEGER CHECK (age >= 18),   -- Age must be at least 18
    created_at TIMESTAMPTZ DEFAULT NOW()  -- Automatically set to current time
);

Breaking down this statement:

  • CREATE TABLE customers: Specifies we're creating a table named "customers"
  • The parentheses contain a comma-separated list of column definitions
  • Each column definition includes:
    • Column name
    • Data type
    • Optional constraints or default values

Table Constraints

Constraints enforce rules on the data to maintain integrity:

  • NOT NULL: Column must contain a value

    first_name VARCHAR(50) NOT NULL
  • UNIQUE: Column value must be unique across all rows

    email VARCHAR(100) UNIQUE
  • PRIMARY KEY: Combines NOT NULL and UNIQUE; identifies each row

    customer_id INTEGER PRIMARY KEY
  • FOREIGN KEY: Ensures values reference existing values in another table

    department_id INTEGER REFERENCES departments(department_id)
  • CHECK: Enforces a Boolean condition on values

    price NUMERIC(10,2) CHECK (price > 0)
  • DEFAULT: Provides a value when none is specified

    created_at TIMESTAMPTZ DEFAULT NOW()
  • EXCLUSION: Ensures no two rows can satisfy a specified relationship

    EXCLUDE USING gist (room_id WITH =, time_slot WITH &&)

Named Constraints

For better error messages and easier management, you can name constraints:

CREATE TABLE products (
    product_id SERIAL,
    name VARCHAR(100) NOT NULL,
    price NUMERIC(10,2) NOT NULL,
    stock INTEGER NOT NULL,

    CONSTRAINT pk_products PRIMARY KEY (product_id),
    CONSTRAINT uq_product_name UNIQUE (name),
    CONSTRAINT chk_positive_price CHECK (price > 0),
    CONSTRAINT chk_non_negative_stock CHECK (stock >= 0)
);

Altering Tables

Tables can be modified after creation with the ALTER TABLE command:

-- Add a column
ALTER TABLE customers ADD COLUMN phone VARCHAR(20);

-- Change column data type
ALTER TABLE customers ALTER COLUMN phone TYPE VARCHAR(30);

-- Add a constraint
ALTER TABLE customers ADD CONSTRAINT valid_email
    CHECK (email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');

-- Remove a constraint
ALTER TABLE customers DROP CONSTRAINT valid_email;

-- Rename a column
ALTER TABLE customers RENAME COLUMN phone TO contact_number;

-- Set a default value
ALTER TABLE customers ALTER COLUMN status SET DEFAULT 'active';

-- Remove a default value
ALTER TABLE customers ALTER COLUMN status DROP DEFAULT;

-- Make a column NOT NULL
ALTER TABLE customers ALTER COLUMN phone SET NOT NULL;

-- Make a column nullable
ALTER TABLE customers ALTER COLUMN phone DROP NOT NULL;

-- Drop a column
ALTER TABLE customers DROP COLUMN phone;

Dropping Objects

To remove database objects, use DROP commands:

-- Drop a table (and dependent objects)
DROP TABLE customers CASCADE;

-- Drop a table only if it exists
DROP TABLE IF EXISTS customers;

-- Drop a schema
DROP SCHEMA inventory;

-- Drop a database (cannot be executed while connected to it)
DROP DATABASE project_db;

Warning: A DROP issued inside a still-open transaction can be rolled back (DDL in PostgreSQL is transactional), but once committed it is permanent and the data can only be recovered from a backup. Use with caution, especially with CASCADE, which removes all dependent objects.

SQL Comments

Comments document your SQL code, making it more readable and maintainable:

  • Single-line comments start with -- and continue to the end of the line:

    -- This is a single-line comment
    SELECT * FROM users; -- This retrieves all users
  • Multi-line comments use /* ... */ syntax and can span multiple lines:

    /* This is a multi-line comment
       that spans multiple lines
       and can be quite lengthy */
    SELECT * FROM users;

Best practices for SQL comments:

  1. Document the purpose of tables and columns
  2. Explain complex queries or operations
  3. Mark sections in large SQL scripts
  4. Note any assumptions or dependencies
  5. Include authorship and date information for significant changes
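
PostgreSQL can also attach documentation to objects themselves with COMMENT ON, which stores the text in the system catalog (visible via \d+ in psql):

COMMENT ON TABLE customers IS 'People who have registered or placed orders';
COMMENT ON COLUMN customers.created_at IS 'Set automatically when the row is inserted';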

Data Manipulation: The Core CRUD Operations

CRUD stands for Create, Read, Update, and Delete - the four fundamental operations for managing data in databases. In SQL, these correspond to INSERT, SELECT, UPDATE, and DELETE statements.

Inserting Data (Create)

The INSERT statement adds new rows to a table. Understanding its syntax and variations is essential for data creation.

Basic Insert Syntax

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

This syntax:

  1. Specifies the target table
  2. Lists the columns to populate (optional, but recommended)
  3. Provides values for those columns in the same order

Example with explanation:

INSERT INTO products (name, category, price, stock)
VALUES ('Laptop Pro X', 'Electronics', 1299.99, 50);

Breaking it down:

  • INSERT INTO products: Targets the products table
  • (name, category, price, stock): Lists columns to receive values
  • VALUES ('Laptop Pro X', 'Electronics', 1299.99, 50): Provides the values

Best practices:

  • Always list columns explicitly
  • Match values to column order
  • Ensure values match expected data types
  • Handle NULL values appropriately

Multiple Row Insert

To insert multiple rows efficiently, use a single INSERT statement with multiple value sets:

INSERT INTO products (name, category, price, stock)
VALUES
    ('Smartphone Y', 'Electronics', 699.99, 100),
    ('Wireless Earbuds', 'Accessories', 89.99, 200),
    ('USB-C Cable', 'Accessories', 12.99, 500);

This is much more efficient than multiple single-row INSERTs because:

  • It requires only one round trip to the database
  • Transaction overhead occurs only once
  • Indexes are updated once after all insertions

Insert with Returning Clause

PostgreSQL's RETURNING clause returns data from rows affected by the INSERT:

INSERT INTO products (name, category, price, stock)
VALUES ('Tablet Z', 'Electronics', 349.99, 75)
RETURNING product_id, name;

This statement:

  1. Inserts a new product
  2. Returns the product_id (possibly auto-generated) and name of the new row

The RETURNING clause is useful for:

  • Getting auto-generated primary keys
  • Confirming inserted values
  • Chaining operations without additional queries
  • Performing further operations with the inserted data

Insert from Select

You can insert data from a query instead of explicit values:

INSERT INTO product_archive (product_id, name, category, price, archived_date)
SELECT product_id, name, category, price, NOW()
FROM products
WHERE stock = 0;

This statement:

  1. Queries products with zero stock
  2. Inserts the results into product_archive
  3. Adds the current timestamp as archived_date

This approach is useful for:

  • Data migration
  • Creating summary tables
  • Archiving data
  • Duplicating rows with modifications

Insert with On Conflict (Upsert)

PostgreSQL's "upsert" feature handles conflicts with existing data:

INSERT INTO products (product_id, name, price, stock)
VALUES (101, 'Updated Laptop', 1199.99, 25)
ON CONFLICT (product_id)
DO UPDATE SET
    name = EXCLUDED.name,
    price = EXCLUDED.price,
    stock = products.stock + EXCLUDED.stock;

This statement:

  1. Attempts to insert a new product
  2. If the product_id already exists, updates specified columns instead
  3. References the attempted value with EXCLUDED
  4. References the existing row with the table name (products)

The ON CONFLICT clause is perfect for:

  • "Upsert" operations (update or insert)
  • Implementing idempotent operations
  • Handling import conflicts
  • Merging data from different sources
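
When duplicates should simply be ignored rather than merged, ON CONFLICT also supports DO NOTHING:

INSERT INTO products (product_id, name, price, stock)
VALUES (101, 'Laptop Pro X', 1299.99, 10)
ON CONFLICT (product_id) DO NOTHING;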

Querying Data (Read)

The SELECT statement retrieves data from tables and is the foundation of data analysis in SQL.

Basic Select Syntax

SELECT column1, column2, ...
FROM table_name
[WHERE condition]
[GROUP BY column(s)]
[HAVING group_condition]
[ORDER BY column(s)]
[LIMIT count];

Each clause has a specific role:

  • SELECT: Specifies which columns to return
  • FROM: Identifies the table(s) to query
  • WHERE: Filters rows based on conditions (before grouping)
  • GROUP BY: Combines rows with the same values
  • HAVING: Filters groups based on conditions (after grouping)
  • ORDER BY: Sorts the result set
  • LIMIT: Restricts the number of rows returned

Important: These clauses are processed in a specific order, regardless of how you write them:

  1. FROM + JOIN: Determines data source
  2. WHERE: Filters rows
  3. GROUP BY: Combines rows
  4. HAVING: Filters groups
  5. SELECT: Returns expressions
  6. ORDER BY: Sorts results
  7. LIMIT: Restricts output
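
One practical consequence of this ordering: an alias defined in SELECT is not yet visible to WHERE, but it is visible to ORDER BY:

-- Error: WHERE runs before SELECT, so "discounted" does not exist yet
-- SELECT price * 0.9 AS discounted FROM products WHERE discounted < 100;

-- Works: repeat the expression in WHERE
SELECT price * 0.9 AS discounted FROM products WHERE price * 0.9 < 100;

-- Works: ORDER BY runs after SELECT, so the alias is available
SELECT price * 0.9 AS discounted FROM products ORDER BY discounted;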

Selecting All Columns

SELECT * FROM products;

While SELECT * retrieves all columns, it's generally better to:

  • List specific columns needed
  • Avoid transmission of unnecessary data
  • Ensure query resilience to schema changes
  • Improve query performance

Selecting Specific Columns

SELECT product_id, name, price FROM products;

This approach:

  • Returns only specified columns
  • Reduces network traffic
  • Makes clear what data your application uses
  • Potentially improves query performance

Using Expressions and Functions

You can transform data during selection:

SELECT
    product_id,
    name,
    price,
    price * 0.9 AS discounted_price,
    UPPER(category) AS category_code,
    EXTRACT(YEAR FROM created_at) AS creation_year
FROM products;

This statement:

  1. Returns basic columns unchanged
  2. Calculates a discounted price (90% of original)
  3. Converts category to uppercase
  4. Extracts the year from created_at timestamp
  5. Renames calculated columns with AS

Using Aliases

Aliases create more readable column names and are essential for complex queries:

SELECT
    p.product_id AS id,
    p.name AS product_name,
    p.price AS retail_price,
    c.name AS category_name
FROM products p
JOIN categories c ON p.category_id = c.category_id;

Note that:

  • Column aliases use AS (which is optional but recommended)
  • Table aliases (p for products, c for categories) simplify joining tables
  • Aliases don't change the actual database; they only affect the query output

Filtering with WHERE

The WHERE clause filters rows based on conditions:

SELECT * FROM products
WHERE category = 'Electronics' AND price < 1000;

The condition can use:

  • Comparison operators: =, <>, !=, <, >, <=, >=
  • Logical operators: AND, OR, NOT
  • Pattern matching: LIKE, ILIKE, SIMILAR TO, ~
  • Range tests: BETWEEN, NOT BETWEEN
  • List tests: IN, NOT IN
  • NULL tests: IS NULL, IS NOT NULL

Multiple conditions use logical operators:

SELECT * FROM products
WHERE
    (category = 'Electronics' OR category = 'Accessories')
    AND price < 500
    AND stock > 0
    AND name NOT LIKE '%refurbished%';

Understanding operator precedence is crucial:

  1. Parentheses
  2. NOT
  3. AND
  4. OR

Always use parentheses to make the logic clear.
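
For example, because AND binds more tightly than OR, the two queries below are equivalent, which is often not what was intended:

-- Without parentheses...
SELECT * FROM products
WHERE category = 'Electronics' OR category = 'Accessories' AND price < 100;

-- ...is interpreted as:
SELECT * FROM products
WHERE category = 'Electronics' OR (category = 'Accessories' AND price < 100);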

Ordering Results

The ORDER BY clause sorts results:

SELECT * FROM products
ORDER BY price DESC, name ASC;

This statement:

  1. Orders primarily by price in descending order (highest first)
  2. When prices are equal, orders by name in ascending order (alphabetical)

You can sort by:

  • Column names
  • Column positions (ORDER BY 2 DESC) - not recommended
  • Expressions (ORDER BY price * quantity DESC)
  • NULL placement (NULLS FIRST or NULLS LAST)

Limiting Results

The LIMIT clause restricts the number of rows returned:

SELECT * FROM products
ORDER BY created_at DESC
LIMIT 10;

This returns the 10 most recently created products.

For pagination, combine LIMIT with OFFSET:

SELECT * FROM products
ORDER BY product_id
LIMIT 20 OFFSET 40; -- Returns products 41-60

This pattern is essential for:

  • Pagination in web applications
  • Preventing overwhelming result sets
  • Breaking large data processing into chunks

Best practice: Always use ORDER BY with LIMIT to ensure consistent results.

Distinct Values

Use DISTINCT to remove duplicates from results:

SELECT DISTINCT category FROM products;

For multiple columns, DISTINCT considers the unique combination of values:

SELECT DISTINCT category, status FROM products;

DISTINCT ON allows selecting one row per group:

SELECT DISTINCT ON (category) category, name, price
FROM products
ORDER BY category, price DESC;

This returns the most expensive product in each category.

Conditional Logic with CASE

The CASE expression provides if-then-else logic in SELECT statements:

SELECT
    name,
    price,
    CASE
        WHEN price < 50 THEN 'Budget'
        WHEN price BETWEEN 50 AND 200 THEN 'Mid-range'
        ELSE 'Premium'
    END AS price_category
FROM products;

CASE has two forms:

  1. Simple CASE (compares an expression to values):
CASE category
    WHEN 'Electronics' THEN 'Tech'
    WHEN 'Clothing' THEN 'Apparel'
    ELSE category
END AS department
  2. Searched CASE (evaluates Boolean expressions):
CASE
    WHEN stock = 0 THEN 'Out of stock'
    WHEN stock < 10 THEN 'Low stock'
    WHEN stock < 50 THEN 'Medium stock'
    ELSE 'Good stock'
END AS stock_status

CASE expressions can be used anywhere an expression is allowed:

  • SELECT list
  • WHERE clause
  • ORDER BY clause
  • GROUP BY clause

Updating Data (Update)

The UPDATE statement modifies existing rows in a table.

Basic Update Syntax

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Breaking down this syntax:

  1. UPDATE table_name: Specifies which table to update
  2. SET column1 = value1, ...: Lists columns to modify with new values
  3. WHERE condition: Restricts which rows are updated

Warning: Without a WHERE clause, UPDATE modifies ALL rows in the table.

Example with explanation:

UPDATE products
SET stock = 0, status = 'discontinued'
WHERE product_id = 101;

This statement:

  • Targets the products table
  • Sets stock to 0 and status to 'discontinued'
  • Only affects the row where product_id equals 101

Update with Calculation

You can reference existing column values in the SET clause:

UPDATE products
SET
    price = price * 1.05,
    last_price_update = NOW()
WHERE category = 'Electronics';

This increases prices by 5% and sets the update timestamp for all electronics products.

You can also use more complex expressions:

UPDATE order_items
SET
    subtotal = quantity * unit_price,
    discount = CASE
                 WHEN quantity >= 10 THEN quantity * unit_price * 0.15
                 WHEN quantity >= 5 THEN quantity * unit_price * 0.10
                 ELSE 0
               END
WHERE order_id = 1005;

Update with Returning

The RETURNING clause returns data from updated rows:

UPDATE orders
SET status = 'shipped', shipped_date = NOW()
WHERE order_id = 1005
RETURNING order_id, status, shipped_date;

This is useful for:

  • Confirming the updated values
  • Getting computed values
  • Chaining operations without additional queries

Update from Another Table

PostgreSQL allows updates based on data from other tables:

UPDATE products p
SET stock = p.stock + i.quantity
FROM inventory_updates i
WHERE p.product_id = i.product_id;

This statement:

  1. Joins products with inventory_updates on product_id
  2. Updates the stock in products by adding the quantity from inventory_updates

For more complex joins:

UPDATE products p
SET
    category_id = c.category_id,
    category_name = c.name
FROM new_categories c
WHERE p.category_name = c.old_name;

Deleting Data (Delete)

The DELETE statement removes rows from a table.

Basic Delete Syntax

DELETE FROM table_name
WHERE condition;

Breaking down this syntax:

  1. DELETE FROM table_name: Specifies which table to delete from
  2. WHERE condition: Restricts which rows are deleted

Warning: Without a WHERE clause, DELETE removes ALL rows from the table.

Example with explanation:

DELETE FROM products
WHERE product_id = 101;

This statement removes the product with ID 101 from the products table.

Delete with Returning

The RETURNING clause returns data from deleted rows:

DELETE FROM products
WHERE stock = 0 AND status = 'discontinued'
RETURNING product_id, name;

This is useful for:

  • Keeping a record of what was deleted
  • Performing further operations with the deleted data
  • Confirming what was removed

Delete Using Joins

To delete rows based on data in other tables:

DELETE FROM products p
USING order_items oi
WHERE p.product_id = oi.product_id
  AND oi.order_id = 1005;

This statement:

  1. Joins products with order_items
  2. Deletes products that appear in order 1005

Delete All Rows

To remove all rows from a table while maintaining the table structure:

DELETE FROM temp_logs;

For better performance when deleting all rows:

TRUNCATE TABLE temp_logs;

TRUNCATE is faster because it:

  • Doesn't scan individual rows
  • Doesn't trigger row-level triggers
  • Resets sequences if specified
  • Requires fewer transaction log entries

Example with options:

TRUNCATE TABLE temp_logs, audit_trail
RESTART IDENTITY    -- Reset sequences
CASCADE;            -- Also truncate dependent tables

Filtering Data with WHERE

The WHERE clause is crucial for targeting specific data in all DML operations.

Comparison Operators

These operators compare values:

  • = Equal to

    SELECT * FROM products WHERE category = 'Electronics';
  • <> or != Not equal to

    SELECT * FROM products WHERE status <> 'discontinued';
  • < Less than

    SELECT * FROM orders WHERE total_amount < 100;
  • > Greater than

    SELECT * FROM products WHERE price > 1000;
  • <= Less than or equal to

    SELECT * FROM employees WHERE hire_date <= '2022-01-01';
  • >= Greater than or equal to

    SELECT * FROM inventory WHERE stock >= 10;

Logical Operators

Logical operators combine conditions:

  • AND Both conditions must be true

    SELECT * FROM products WHERE category = 'Electronics' AND price < 500;
  • OR Either condition can be true

    SELECT * FROM customers WHERE country = 'US' OR country = 'CA';
  • NOT Negates a condition

    SELECT * FROM products WHERE NOT (category = 'Electronics' OR category = 'Accessories');

Pattern Matching

For text searching:

  • LIKE Case-sensitive pattern matching

    SELECT * FROM products WHERE name LIKE 'Smart%'; -- Starts with "Smart"
  • ILIKE Case-insensitive pattern matching

    SELECT * FROM customers WHERE email ILIKE '%@gmail.com'; -- Ends with "@gmail.com"

Pattern matching wildcards:

  • % Matches any sequence of characters
  • _ Matches any single character

Examples:

  • 'A%' - Starts with A
  • '%Z' - Ends with Z
  • '%keyword%' - Contains "keyword"
  • 'A_C' - Three characters: starts with A, ends with C, with any single character in between

For more advanced pattern matching:

  • SIMILAR TO Regular expression pattern matching

    SELECT * FROM products WHERE name SIMILAR TO '(Phone|Tablet)%';
  • ~ Matches regular expression, case-sensitive

    SELECT * FROM customers WHERE email ~ '^[a-zA-Z0-9._%+-]+@gmail\.com$';
  • ~* Matches regular expression, case-insensitive

    SELECT * FROM products WHERE description ~* 'wireless|bluetooth';

Range Conditions

For values within a range:

  • BETWEEN Within inclusive range

    SELECT * FROM products WHERE price BETWEEN 10 AND 50;

    This is equivalent to price >= 10 AND price <= 50

  • NOT BETWEEN Outside inclusive range

    SELECT * FROM orders WHERE order_date NOT BETWEEN '2023-01-01' AND '2023-01-31';

List Conditions

For matching against a set of values:

  • IN Matches any value in list

    SELECT * FROM products WHERE category IN ('Electronics', 'Accessories', 'Wearables');

    This is equivalent to category = 'Electronics' OR category = 'Accessories' OR category = 'Wearables'

  • NOT IN Matches no values in list

    SELECT * FROM customers WHERE country NOT IN ('US', 'CA', 'MX');

NULL Conditions

For handling NULL values (which represent missing data):

  • IS NULL Value is NULL

    SELECT * FROM orders WHERE shipped_date IS NULL;
  • IS NOT NULL Value is not NULL

    SELECT * FROM customers WHERE phone IS NOT NULL;

Important: NULL cannot be compared with = or <>. Always use IS NULL or IS NOT NULL.
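
A quick demonstration of why:

SELECT NULL = NULL;   -- Returns NULL (unknown), not TRUE
SELECT NULL IS NULL;  -- Returns TRUE
SELECT 5 <> NULL;     -- Returns NULL, so a WHERE clause would filter the row out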

Existence Check

For checking if related data exists:

  • EXISTS Subquery returns at least one row
    SELECT * FROM customers c
    WHERE EXISTS (
        SELECT 1
        FROM orders o
        WHERE o.customer_id = c.customer_id
    );
    This finds customers who have placed at least one order.
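
The complementary form, NOT EXISTS, finds rows with no match; here, customers who have never placed an order:

SELECT * FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);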

Advanced Querying Techniques

PostgreSQL-Specific Features

Advanced Data Types

JSONB for Document Storage

CREATE TABLE document_store (
    id SERIAL PRIMARY KEY,
    document JSONB NOT NULL
);

-- Insert JSON document
INSERT INTO document_store (document)
VALUES ('{"name": "John Doe", "contact": {"email": "[email protected]", "phone": "555-1234"}, "tags": ["customer", "premium"]}');

-- Query JSON data
SELECT
    document->'name' AS name,
    document->'contact'->>'email' AS email,
    document->'tags' AS tags
FROM document_store;

-- Search within JSON array
SELECT * FROM document_store
WHERE document->'tags' ? 'premium';

-- Update JSON data
UPDATE document_store
SET document = document || '{"status": "active", "last_login": "2023-05-01"}'::jsonb
WHERE id = 1;

Arrays

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    categories TEXT[] NOT NULL,
    available_sizes VARCHAR(3)[]
);

-- Insert with arrays
INSERT INTO products (name, categories, available_sizes)
VALUES ('T-shirt', ARRAY['Clothing', 'Casual', 'Summer'], ARRAY['S', 'M', 'L', 'XL']);

-- Query with arrays
SELECT * FROM products WHERE 'Casual' = ANY(categories);
SELECT * FROM products WHERE categories @> ARRAY['Clothing', 'Summer'];
SELECT * FROM products WHERE available_sizes && ARRAY['M', 'L']; -- Overlap

-- Expand array into rows
SELECT product_id, name, unnest(categories) AS category
FROM products;

-- Update array elements
UPDATE products
SET categories = array_append(categories, 'Featured')
WHERE product_id = 1;

Ranges

-- Integer equality inside a GIST exclusion constraint requires btree_gist
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE room_bookings (
    booking_id SERIAL PRIMARY KEY,
    room_id INTEGER,
    reservation DATERANGE NOT NULL,
    EXCLUDE USING GIST (room_id WITH =, reservation WITH &&)
);

-- Insert range data
INSERT INTO room_bookings (room_id, reservation)
VALUES (101, '[2023-05-01, 2023-05-05)');

-- Query ranges
SELECT * FROM room_bookings
WHERE reservation @> '2023-05-03'::DATE; -- Contains date

SELECT * FROM room_bookings
WHERE reservation && '[2023-05-03, 2023-05-10)'::DATERANGE; -- Overlaps with range

HStore Key-Value Pairs

-- Enable extension
CREATE EXTENSION IF NOT EXISTS hstore;

CREATE TABLE product_attributes (
    product_id INTEGER PRIMARY KEY,
    attributes HSTORE
);

-- Insert hstore data
INSERT INTO product_attributes (product_id, attributes)
VALUES (1, 'color => "red", size => "medium", material => "cotton"');

-- Query hstore
SELECT * FROM product_attributes
WHERE attributes -> 'color' = 'red';

-- Update hstore
UPDATE product_attributes
SET attributes = attributes || 'weight => "200g"'::hstore
WHERE product_id = 1;

UUID Type

CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE sessions (
    session_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id INTEGER NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    data JSONB
);

-- Insert with UUID
INSERT INTO sessions (user_id, data)
VALUES (1001, '{"browser": "Chrome", "ip": "192.168.1.1"}'::jsonb);

Full-Text Search

PostgreSQL includes a powerful full-text search capability:

Creating a TSVector Column

CREATE TABLE articles (
    article_id SERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    body TEXT NOT NULL,
    search_vector TSVECTOR
);

CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

-- Populate the search vector
UPDATE articles
SET search_vector = to_tsvector('english', title || ' ' || body);

-- Create a trigger to keep it updated
CREATE FUNCTION articles_search_trigger() RETURNS trigger AS $$
BEGIN
    NEW.search_vector := to_tsvector('english', NEW.title || ' ' || NEW.body);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_search_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_search_trigger();

Basic Text Search

-- Search for documents containing 'postgresql' and 'database'
SELECT title
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql & database');

-- Search for documents with 'fast' followed by 'query'
SELECT title, ts_rank(search_vector, query) AS rank
FROM articles, to_tsquery('english', 'fast <-> query') query
WHERE search_vector @@ query
ORDER BY rank DESC;

Highlighting Results

SELECT
    title,
    ts_headline(
        'english',
        body,
        to_tsquery('english', 'postgresql & database'),
        'StartSel=<b>, StopSel=</b>, MaxWords=50, MinWords=10'
    ) AS headline
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql & database');

Inheritance

PostgreSQL supports table inheritance for building hierarchies:

CREATE TABLE vehicles (
    vehicle_id SERIAL PRIMARY KEY,
    manufacturer VARCHAR(100),
    model VARCHAR(100),
    year INTEGER
);

CREATE TABLE cars (
    doors INTEGER,
    body_style VARCHAR(50)
) INHERITS (vehicles);

CREATE TABLE trucks (
    payload_capacity NUMERIC,
    bed_length NUMERIC
) INHERITS (vehicles);

-- Queries can target specific tables
SELECT * FROM cars;

-- Or the entire hierarchy
SELECT * FROM vehicles;

-- Or exclude specific child tables
SELECT * FROM ONLY vehicles;

Custom Data Types and Domains

Creating Domains

Domains are custom data types with constraints:

-- Create an email domain
CREATE DOMAIN email AS VARCHAR(255)
    CHECK (VALUE ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');

-- Use the domain
CREATE TABLE subscribers (
    subscriber_id SERIAL PRIMARY KEY,
    email email NOT NULL UNIQUE,
    subscribed_at TIMESTAMPTZ DEFAULT NOW()
);

Custom Composite Types

-- Create a custom type
CREATE TYPE address AS (
    street VARCHAR(100),
    city VARCHAR(50),
    state VARCHAR(2),
    zip VARCHAR(10)
);

-- Use the custom type
CREATE TABLE customer_profiles (
    customer_id INTEGER PRIMARY KEY,
    shipping_address address,
    billing_address address
);

-- Insert data with composite type
INSERT INTO customer_profiles (customer_id, shipping_address, billing_address)
VALUES (
    1001,
    ROW('123 Main St', 'Portland', 'OR', '97201'),
    ROW('123 Main St', 'Portland', 'OR', '97201')
);

-- Query composite type fields
SELECT
    customer_id,
    (shipping_address).city AS shipping_city,
    (billing_address).zip AS billing_zip
FROM customer_profiles;

Enumerations

-- Create an enum type
CREATE TYPE order_status AS ENUM ('pending', 'processing', 'shipped', 'delivered', 'canceled');

-- Use the enum type
CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    status order_status NOT NULL DEFAULT 'pending',
    created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Insert with enum
INSERT INTO orders (customer_id, status)
VALUES (1001, 'processing');

-- Invalid enum value will raise an error
-- INSERT INTO orders (customer_id, status) VALUES (1002, 'unknown');

Recursive Queries

PostgreSQL's implementation of Common Table Expressions (CTEs) supports recursive queries:

Organization Hierarchy

WITH RECURSIVE org_hierarchy AS (
    -- Base case: CEO (no manager)
    SELECT employee_id, name, manager_id, 1 AS level, name::TEXT AS path
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive case: all employees under managers
    SELECT
        e.employee_id,
        e.name,
        e.manager_id,
        oh.level + 1 AS level,
        oh.path || ' > ' || e.name AS path
    FROM employees e
    JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT level, path, employee_id, name
FROM org_hierarchy
ORDER BY path;

Bill of Materials

CREATE TABLE parts (
    part_id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE assemblies (
    parent_id INTEGER REFERENCES parts(part_id),
    child_id INTEGER REFERENCES parts(part_id),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (parent_id, child_id)
);

WITH RECURSIVE part_hierarchy AS (
    -- Base case: top-level part
    SELECT
        p.part_id,
        p.name,
        0 AS level,
        ARRAY[p.part_id] AS path,
        1::FLOAT AS quantity
    FROM parts p
    WHERE p.part_id = 101  -- Starting with specific product

    UNION ALL

    -- Recursive case: all child parts
    SELECT
        c.part_id,
        c.name,
        ph.level + 1,
        ph.path || c.part_id,
        ph.quantity * a.quantity
    FROM part_hierarchy ph
    JOIN assemblies a ON a.parent_id = ph.part_id
    JOIN parts c ON c.part_id = a.child_id
    WHERE NOT c.part_id = ANY(ph.path)  -- Prevent cycles
)
SELECT level, name, quantity
FROM part_hierarchy
ORDER BY path;

Triggers

Triggers execute functions automatically when specific events occur:

-- Create a function for the trigger
CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = NOW();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply the trigger to a table
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price NUMERIC(10,2) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TRIGGER update_products_modtime
BEFORE UPDATE ON products
FOR EACH ROW
EXECUTE FUNCTION update_modified_column();

Auditing Changes with Triggers

-- Create audit table
CREATE TABLE audit_log (
    audit_id SERIAL PRIMARY KEY,
    table_name TEXT NOT NULL,
    operation TEXT NOT NULL,
    record_id INTEGER NOT NULL,
    changed_data JSONB,
    changed_by TEXT DEFAULT CURRENT_USER,
    changed_at TIMESTAMPTZ DEFAULT NOW()
);

-- Create audit trigger function
CREATE OR REPLACE FUNCTION audit_changes()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO audit_log (table_name, operation, record_id, changed_data)
        VALUES (TG_TABLE_NAME, TG_OP, NEW.product_id, row_to_json(NEW));
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO audit_log (table_name, operation, record_id, changed_data)
        VALUES (
            TG_TABLE_NAME,
            TG_OP,
            NEW.product_id,
            jsonb_build_object(
                'old', row_to_json(OLD),
                'new', row_to_json(NEW)
            )
        );
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO audit_log (table_name, operation, record_id, changed_data)
        VALUES (TG_TABLE_NAME, TG_OP, OLD.product_id, row_to_json(OLD));
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

-- Apply audit trigger
CREATE TRIGGER products_audit
AFTER INSERT OR UPDATE OR DELETE ON products
FOR EACH ROW EXECUTE FUNCTION audit_changes();

Views and Materialized Views

Regular Views

Views are virtual tables based on the result of a SELECT query:

CREATE VIEW product_summary AS
SELECT
    category,
    COUNT(*) AS product_count,
    AVG(price) AS average_price,
    MIN(price) AS min_price,
    MAX(price) AS max_price
FROM products
GROUP BY category;

-- Query the view like a table
SELECT * FROM product_summary WHERE product_count > 10;

Updatable Views

Simple views are automatically updatable when they draw from a single table and use no aggregates, DISTINCT, GROUP BY, or similar constructs:

CREATE VIEW active_products AS
SELECT product_id, name, category, price, stock, status
FROM products
WHERE status = 'active'
WITH CHECK OPTION;  -- reject writes that would not satisfy the WHERE clause

-- Insert through the view
INSERT INTO active_products (name, category, price, stock, status)
VALUES ('New Product', 'Electronics', 499.99, 100, 'active');

-- Update through the view
UPDATE active_products
SET price = 449.99
WHERE product_id = 101;

Materialized Views

Materialized views store the query results physically:

CREATE MATERIALIZED VIEW monthly_sales AS
SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(total_amount) AS total_sales,
    COUNT(DISTINCT customer_id) AS customer_count
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
WITH DATA;

-- Query the materialized view
SELECT * FROM monthly_sales ORDER BY month DESC;

-- Refresh the materialized view
REFRESH MATERIALIZED VIEW monthly_sales;

-- Concurrent refresh (doesn't block reads)
REFRESH MATERIALIZED VIEW CONCURRENTLY monthly_sales;
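
Note that CONCURRENTLY requires a unique index on the materialized view:

CREATE UNIQUE INDEX idx_monthly_sales_month ON monthly_sales (month);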

Performance Optimization

Indexing Strategies

B-tree Indexes (Default)

-- Basic index
CREATE INDEX idx_products_name ON products(name);

-- Multi-column index
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);

-- Index with included columns
CREATE INDEX idx_products_category_incl ON products(category) INCLUDE (price, stock);

Unique Indexes

CREATE UNIQUE INDEX idx_users_email ON users(email);

Partial Indexes

-- Index only active products
CREATE INDEX idx_products_active ON products(name) WHERE status = 'active';

-- Index high-value orders
CREATE INDEX idx_orders_high_value ON orders(customer_id, order_date)
WHERE total_amount > 1000;

Expression Indexes

-- Index on lowercase email for case-insensitive searches
CREATE INDEX idx_users_email_lower ON users(LOWER(email));

-- Index for pattern matching at the beginning of strings
CREATE INDEX idx_products_name_pattern ON products(name text_pattern_ops);

GIN Indexes (Generalized Inverted Indexes)

-- Index for full-text search
CREATE INDEX idx_articles_fts ON articles USING GIN(search_vector);

-- Index for JSONB containment operations
CREATE INDEX idx_document_store_gin ON document_store USING GIN(document);

-- Index for array operations
CREATE INDEX idx_products_categories ON products USING GIN(categories);

BRIN Indexes (Block Range Indexes)

-- For large tables with ordered data (e.g., time-series data)
CREATE INDEX idx_logs_timestamp ON logs USING BRIN(log_time);

GiST Indexes (Generalized Search Tree)

-- Enable postgis extension
CREATE EXTENSION IF NOT EXISTS postgis;

-- Create a GiST index on a geometry column
CREATE INDEX idx_locations_geometry ON locations USING GIST(geometry);

-- Index for range overlaps
CREATE INDEX idx_reservations_daterange ON reservations USING GIST(date_range);

EXPLAIN and Query Analysis

Basic EXPLAIN

EXPLAIN SELECT * FROM products WHERE category = 'Electronics';

EXPLAIN ANALYZE

EXPLAIN ANALYZE SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date > '2023-01-01';

Detailed Output

EXPLAIN (FORMAT JSON, ANALYZE, BUFFERS)
SELECT * FROM products WHERE price > 100;

Visualizing Query Plans

Third-party tools like pgAdmin provide visual query plan diagrams based on EXPLAIN output.
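
To capture plans of live queries automatically, the contrib module auto_explain can log them; a minimal session-level sketch (loading the module typically requires superuser privileges):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- log the plan of every statement
SET auto_explain.log_analyze = true;    -- include actual row counts and timings

SELECT * FROM products WHERE price > 100;  -- plan is written to the server log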

Query Optimization Techniques

Join Optimization

-- Choose the join type that matches the question being asked:
-- use INNER JOIN instead of LEFT JOIN when unmatched rows are not needed
SELECT p.name, c.name AS category_name
FROM products p
INNER JOIN categories c ON p.category_id = c.category_id
WHERE p.price > 100;

-- Use EXISTS for existence checks (often faster than IN)
SELECT name, email
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date > '2023-01-01'
);

Avoid Function Calls on Indexed Columns

-- Inefficient: wrapping an indexed column in a function defeats a plain index on name
SELECT * FROM products WHERE LOWER(name) = 'laptop';

-- Better: create an expression index that matches the query exactly
CREATE INDEX idx_products_name_lower ON products(LOWER(name));
SELECT * FROM products WHERE LOWER(name) = 'laptop';

-- Alternatively, ILIKE handles case-insensitive matching and can use a
-- trigram index (pg_trgm extension)
SELECT * FROM products WHERE name ILIKE 'laptop';

Minimize Data Transfer

-- Select only needed columns
SELECT customer_id, name, email FROM customers
WHERE signup_date > '2023-01-01';

-- Use LIMIT to restrict result size
SELECT * FROM logs
ORDER BY log_time DESC
LIMIT 1000;

Use COPY for Bulk Operations

-- Export query results to file
COPY (
    SELECT * FROM customers WHERE country = 'US'
) TO '/tmp/us_customers.csv' WITH CSV HEADER;

-- Import data from file
COPY customers(name, email, signup_date)
FROM '/tmp/new_customers.csv' WITH CSV HEADER;
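
Note that server-side COPY reads and writes files on the database server and requires suitable privileges there; from psql, the client-side \copy variant uses files on your own machine:

\copy customers(name, email, signup_date) FROM 'new_customers.csv' WITH CSV HEADER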

Partitioning

PostgreSQL supports table partitioning for managing large tables:

Declarative Range Partitioning

CREATE TABLE orders (
    order_id SERIAL,
    customer_id INTEGER NOT NULL,
    order_date DATE NOT NULL,
    total_amount NUMERIC(10,2),
    PRIMARY KEY (order_id, order_date)
) PARTITION BY RANGE (order_date);

-- Create partitions
CREATE TABLE orders_2022 PARTITION OF orders
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');

CREATE TABLE orders_2023 PARTITION OF orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

CREATE TABLE orders_2024 PARTITION OF orders
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

List Partitioning

CREATE TABLE sales (
    sale_id SERIAL,
    product_id INTEGER NOT NULL,
    sale_date DATE NOT NULL,
    region VARCHAR(20) NOT NULL,
    amount NUMERIC(10,2),
    PRIMARY KEY (sale_id, region)
) PARTITION BY LIST (region);

CREATE TABLE sales_na PARTITION OF sales
    FOR VALUES IN ('US', 'CA', 'MX');

CREATE TABLE sales_eu PARTITION OF sales
    FOR VALUES IN ('UK', 'FR', 'DE', 'IT', 'ES');

CREATE TABLE sales_asia PARTITION OF sales
    FOR VALUES IN ('CN', 'JP', 'KR', 'IN');

Hash Partitioning

CREATE TABLE events (
    event_id SERIAL,
    device_id INTEGER NOT NULL,
    event_type VARCHAR(50) NOT NULL,
    event_time TIMESTAMPTZ NOT NULL,
    data JSONB,
    PRIMARY KEY (event_id, device_id)
) PARTITION BY HASH (device_id);

CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE events_p1 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE events_p2 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE events_p3 PARTITION OF events
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

Partition Maintenance

-- Create a new partition for future data
CREATE TABLE orders_2025 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2026-01-01');

-- Detach a partition (stops routing new data, keeps existing data)
ALTER TABLE orders DETACH PARTITION orders_2022;

-- Attach an existing table as a partition
ALTER TABLE orders ATTACH PARTITION orders_2022
    FOR VALUES FROM ('2022-01-01') TO ('2023-01-01');
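
A DEFAULT partition (PostgreSQL 11+) catches rows that match no other partition, preventing insert failures for out-of-range keys:

CREATE TABLE orders_default PARTITION OF orders DEFAULT;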

Transaction Management

Transaction Basics

A transaction is a sequence of operations performed as a single logical unit of work:

-- Start a transaction
BEGIN;

-- Perform operations within the transaction
UPDATE accounts SET balance = balance - 100 WHERE account_id = 123;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 456;

-- Commit the transaction
COMMIT;

-- Or rollback if needed
-- ROLLBACK;

ACID Properties

PostgreSQL transactions adhere to ACID properties:

  • Atomicity: Transactions are all-or-nothing (fully complete or fully rolled back)
  • Consistency: Transactions bring the database from one valid state to another
  • Isolation: Concurrent transactions don't interfere with each other
  • Durability: Once committed, changes persist even after system failures
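
A quick sketch of atomicity in practice: if the transaction is rolled back (or any statement fails), none of its changes persist:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE account_id = 123;
-- An error occurred, or we changed our mind
ROLLBACK;
-- Account 123's balance is unchanged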

Transaction Isolation Levels

PostgreSQL supports all four standard isolation levels:

-- Set isolation level for the current transaction
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- ... transaction operations ...
COMMIT;

Available isolation levels:

  • READ UNCOMMITTED: In PostgreSQL, this behaves the same as READ COMMITTED
  • READ COMMITTED: Queries see only data committed before the query began (default)
  • REPEATABLE READ: All queries in a transaction see a consistent snapshot of the database
  • SERIALIZABLE: Complete isolation; concurrent serializable transactions act as if executed sequentially
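
For example, under REPEATABLE READ a transaction keeps seeing the snapshot taken at its first query, even after other sessions commit changes (assuming an initial balance of 500):

-- Session 1
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE account_id = 123;  -- returns 500

-- Session 2 (separate connection)
UPDATE accounts SET balance = 600 WHERE account_id = 123;  -- autocommits

-- Session 1, same transaction
SELECT balance FROM accounts WHERE account_id = 123;  -- still returns 500
COMMIT;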

Savepoints

Savepoints allow partial rollbacks within a transaction:

BEGIN;

UPDATE products SET stock = stock - 1 WHERE product_id = 101;

-- Create a savepoint
SAVEPOINT after_stock_update;

UPDATE orders SET status = 'processing' WHERE order_id = 5001;

-- Oops, something went wrong, rollback to the savepoint
ROLLBACK TO after_stock_update;

-- Continue with different operations
UPDATE orders SET status = 'pending' WHERE order_id = 5001;

COMMIT;

Advisory Locks

PostgreSQL provides application-level locks for coordination:

-- Attempt to acquire an exclusive advisory lock
SELECT pg_try_advisory_lock(123);

-- Release the lock when done
SELECT pg_advisory_unlock(123);

-- Advisory locks are held until explicitly released or the session ends
-- (transaction-scoped variants such as pg_advisory_xact_lock also exist)
SELECT pg_advisory_lock(hash_numeric(customer_id))
FROM customers WHERE name = 'Acme Corp';

-- Do work while holding the lock...

-- Release the lock
SELECT pg_advisory_unlock(hash_numeric(customer_id))
FROM customers WHERE name = 'Acme Corp';

Handling Deadlocks

PostgreSQL automatically detects deadlocks and resolves them by aborting one of the transactions involved; the aborted transaction should be retried by the application:

-- Limit how long a statement may wait on locks
SET lock_timeout = '5s';

-- Exception handling is a PL/pgSQL feature, so catch deadlock errors
-- inside a DO block or function (the failed statements are rolled back)
DO $$
BEGIN
    -- ... operations that might deadlock ...
EXCEPTION WHEN deadlock_detected THEN
    RAISE NOTICE 'Deadlock detected; retry the transaction';
END $$;

Security in PostgreSQL

User Management

Creating Roles/Users

-- Create a login role (user)
CREATE ROLE alice WITH LOGIN PASSWORD 'secure_password';

-- Create a group role
CREATE ROLE developers;

-- Add a user to a group
GRANT developers TO alice;

-- Create a superuser (admin)
CREATE ROLE admin WITH SUPERUSER LOGIN PASSWORD 'very_secure_password';

Role Attributes

-- User can create databases
CREATE ROLE dbcreator WITH CREATEDB LOGIN PASSWORD 'secure_password';

-- User can create roles
CREATE ROLE usermanager WITH CREATEROLE LOGIN PASSWORD 'secure_password';

-- Set connection limit for a role
ALTER ROLE webservice WITH CONNECTION LIMIT 100;

-- Set validity period for a role
CREATE ROLE temp_user WITH LOGIN PASSWORD 'temporary' VALID UNTIL '2023-12-31';

Privilege Management

Basic Privileges

-- Grant table privileges
GRANT SELECT, INSERT, UPDATE ON products TO sales_team;

-- Grant all privileges on a table
GRANT ALL PRIVILEGES ON orders TO order_processor;

-- Grant column-level privileges
GRANT SELECT (customer_id, name, email) ON customers TO marketing;
GRANT UPDATE (status, updated_at) ON orders TO support_team;

-- Grant privileges on all tables in a schema
GRANT SELECT ON ALL TABLES IN SCHEMA public TO analyst;

-- Revoke privileges
REVOKE INSERT, UPDATE, DELETE ON customers FROM marketing;

Default Privileges

-- Set default privileges for new tables
ALTER DEFAULT PRIVILEGES
FOR ROLE app_owner
IN SCHEMA app
GRANT SELECT ON TABLES TO readonly_role;

Schema Privileges

-- Grant usage on schema
GRANT USAGE ON SCHEMA analytics TO analyst;

-- Grant create on schema
GRANT CREATE ON SCHEMA app TO developer;

Row-Level Security (RLS)

-- Enable row-level security on a table
ALTER TABLE customer_data ENABLE ROW LEVEL SECURITY;

-- Create a policy for account managers
-- (current_user_id() and current_customer_id() below are
-- application-defined helpers, not built-in functions)
CREATE POLICY account_managers_policy ON customer_data
    FOR ALL
    TO account_managers
    USING (manager_id = current_user_id());

-- Create policy for customers to see only their own data
CREATE POLICY customer_data_policy ON customer_data
    FOR SELECT
    TO customer_role
    USING (customer_id = current_customer_id());

-- Create policy for different operations
CREATE POLICY update_own_data ON customer_data
    FOR UPDATE
    TO customer_role
    USING (customer_id = current_customer_id())
    WITH CHECK (customer_id = current_customer_id());

Data Encryption

Column-Level Encryption

-- Using pgcrypto extension
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- Store encrypted data
CREATE TABLE sensitive_data (
    id SERIAL PRIMARY KEY,
    name TEXT,
    ssn TEXT,
    encrypted_ssn BYTEA
);

-- Insert with encryption
INSERT INTO sensitive_data (name, ssn, encrypted_ssn)
VALUES (
    'John Doe',
    '123-45-6789',
    pgp_sym_encrypt('123-45-6789', 'encryption_key')
);

-- Query encrypted data
SELECT
    name,
    pgp_sym_decrypt(encrypted_ssn, 'encryption_key') AS decrypted_ssn
FROM sensitive_data;

Transport Layer Security (TLS)

Configure postgresql.conf for encrypted connections:

ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'

Client Connection Control

In pg_hba.conf:

# Require SSL for remote connections
hostssl all all 0.0.0.0/0 scram-sha-256

Audit Logging

Using PostgreSQL's Built-in Logging

In postgresql.conf:

log_statement = 'mod'           # Log all modification statements
log_min_duration_statement = 1000  # Log statements taking more than 1 second

Audit Triggers

As shown in the Triggers section, custom audit triggers can be implemented to track all changes to sensitive tables.

Advanced PostgreSQL Capabilities

Stored Procedures and Functions

Creating a Basic Function

CREATE OR REPLACE FUNCTION calculate_discount(
    price NUMERIC,
    discount_percent NUMERIC
) RETURNS NUMERIC AS $$
BEGIN
    RETURN ROUND(price * (1 - discount_percent / 100), 2);
END;
$$ LANGUAGE plpgsql;

-- Use the function
SELECT
    name,
    price,
    calculate_discount(price, 15) AS discounted_price
FROM products;

Table-Returning Functions

CREATE OR REPLACE FUNCTION get_products_by_category(
    category_name VARCHAR
) RETURNS TABLE (
    product_id INTEGER,
    product_name VARCHAR,
    price NUMERIC
) AS $$
BEGIN
    RETURN QUERY
    SELECT p.product_id, p.name, p.price
    FROM products p
    JOIN categories c ON p.category_id = c.category_id
    WHERE c.name = category_name
    ORDER BY p.price DESC;
END;
$$ LANGUAGE plpgsql;

-- Use the function
SELECT * FROM get_products_by_category('Electronics');

Stored Procedures (PostgreSQL 11+)

CREATE OR REPLACE PROCEDURE process_order(
    order_id_param INTEGER,
    process_payment BOOLEAN DEFAULT true
) AS $$
DECLARE
    total_amount NUMERIC;
BEGIN
    -- Update order status
    UPDATE orders SET status = 'processing'
    WHERE order_id = order_id_param;

    -- Get the total amount
    SELECT SUM(price * quantity) INTO total_amount
    FROM order_items oi
    JOIN products p ON oi.product_id = p.product_id
    WHERE oi.order_id = order_id_param;

    -- Process payment if requested
    IF process_payment THEN
        INSERT INTO payments (order_id, amount, payment_date)
        VALUES (order_id_param, total_amount, NOW());
    END IF;

    COMMIT;
END;
$$ LANGUAGE plpgsql;

-- Call the procedure
CALL process_order(1001, true);

Functions in Different Languages

PLV8 (JavaScript):

CREATE EXTENSION IF NOT EXISTS plv8;

CREATE OR REPLACE FUNCTION calculate_distance_js(
    lat1 FLOAT, lon1 FLOAT,
    lat2 FLOAT, lon2 FLOAT
) RETURNS FLOAT AS $$
    const R = 6371; // Earth's radius in km

    function toRad(degrees) {
        return degrees * Math.PI / 180;
    }

    const dLat = toRad(lat2 - lat1);
    const dLon = toRad(lon2 - lon1);

    const a = Math.sin(dLat/2) * Math.sin(dLat/2) +
              Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
              Math.sin(dLon/2) * Math.sin(dLon/2);

    const c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));

    return R * c;
$$ LANGUAGE plv8;

PL/Python:

CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION validate_email_py(email TEXT)
RETURNS BOOLEAN AS $$
import re
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
return bool(re.match(pattern, email))
$$ LANGUAGE plpython3u;

Loops and Control Flow

For Loop

CREATE OR REPLACE FUNCTION generate_dates(
    start_date DATE,
    end_date DATE
) RETURNS TABLE (date_value DATE) AS $$
BEGIN
    FOR date_value IN
        SELECT generate_series(start_date, end_date, '1 day'::interval)::date
    LOOP
        RETURN NEXT;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;
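
The same series can be produced more concisely with a SQL-language function and no explicit loop (shown under a different name to keep both variants):

CREATE OR REPLACE FUNCTION generate_dates_sql(
    start_date DATE,
    end_date DATE
) RETURNS TABLE (date_value DATE) AS $$
    SELECT generate_series(start_date, end_date, '1 day'::interval)::date;
$$ LANGUAGE sql;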

While Loop

CREATE OR REPLACE FUNCTION fibonacci(n INTEGER)
RETURNS INTEGER AS $$
DECLARE
    i INTEGER := 0;
    j INTEGER := 1;
    temp INTEGER;
    counter INTEGER := 0;
BEGIN
    IF n < 1 THEN
        RETURN 0;
    END IF;

    WHILE counter < n - 1 LOOP
        temp := i + j;
        i := j;
        j := temp;
        counter := counter + 1;
    END LOOP;

    RETURN j;
END;
$$ LANGUAGE plpgsql;

Conditional Logic

CREATE OR REPLACE FUNCTION get_tax_rate(state_code VARCHAR)
RETURNS NUMERIC AS $$
BEGIN
    CASE state_code
        WHEN 'CA' THEN RETURN 7.25;
        WHEN 'NY' THEN RETURN 8.875;
        WHEN 'WA' THEN RETURN 6.5;
        WHEN 'TX' THEN RETURN 6.25;
        ELSE RETURN 0;
    END CASE;
END;
$$ LANGUAGE plpgsql;

Cursors

Cursors allow row-by-row processing of query results:

CREATE OR REPLACE FUNCTION process_large_result()
RETURNS void AS $$
DECLARE
    curs CURSOR FOR SELECT * FROM large_table ORDER BY id;
    row_data large_table%ROWTYPE;
    counter INTEGER := 0;
BEGIN
    -- Open the cursor
    OPEN curs;

    LOOP
        -- Fetch the next row
        FETCH curs INTO row_data;

        -- Exit when no more rows
        EXIT WHEN NOT FOUND;

        -- Process the row
        -- ... do something with row_data ...

        counter := counter + 1;

        -- Report progress every 1000 rows (COMMIT is not allowed inside
        -- a function; use a procedure for transaction control mid-loop)
        IF counter % 1000 = 0 THEN
            RAISE NOTICE 'Processed % rows', counter;
        END IF;
    END LOOP;

    -- Close the cursor
    CLOSE curs;
END;
$$ LANGUAGE plpgsql;

Error Handling

CREATE OR REPLACE FUNCTION transfer_funds(
    from_account_id INTEGER,
    to_account_id INTEGER,
    transfer_amount NUMERIC
) RETURNS BOOLEAN AS $$
DECLARE
    from_balance NUMERIC;
BEGIN
    -- Check if accounts exist
    PERFORM 1 FROM accounts WHERE account_id IN (from_account_id, to_account_id);
    IF NOT FOUND THEN
        RAISE EXCEPTION 'One or both accounts do not exist';
    END IF;

    -- Get current balance
    SELECT balance INTO from_balance
    FROM accounts
    WHERE account_id = from_account_id;

    -- Check sufficient funds
    IF from_balance < transfer_amount THEN
        RAISE EXCEPTION 'Insufficient funds (balance: %, needed: %)',
                        from_balance, transfer_amount;
    END IF;

    -- Perform the transfer
    UPDATE accounts SET balance = balance - transfer_amount
    WHERE account_id = from_account_id;

    UPDATE accounts SET balance = balance + transfer_amount
    WHERE account_id = to_account_id;

    RETURN true;
EXCEPTION
    WHEN OTHERS THEN
        -- Log the error (note: if the re-raised exception ultimately aborts
        -- the whole transaction, this insert is rolled back with it)
        INSERT INTO error_log (function_name, error_message, error_detail)
        VALUES ('transfer_funds', SQLERRM, SQLSTATE);

        -- Re-raise so the caller sees the failure
        RAISE;
END;
$$ LANGUAGE plpgsql;

Event Triggers

Event triggers fire when database-level events occur:

-- Create a function to log DDL commands
CREATE OR REPLACE FUNCTION log_ddl_events()
RETURNS event_trigger AS $$
BEGIN
    INSERT INTO ddl_history (event_type, object_type, object_identity, command_text, username)
    VALUES (
        TG_EVENT,
        TG_TAG,
        (SELECT object_identity FROM pg_event_trigger_ddl_commands() LIMIT 1),
        current_query(),
        current_user
    );
END;
$$ LANGUAGE plpgsql;

-- Create the event trigger
CREATE EVENT TRIGGER log_ddl ON ddl_command_end
EXECUTE FUNCTION log_ddl_events();

Using PostgreSQL in Applications

Connection Pooling

Connection pooling improves performance by reusing database connections:

PgBouncer

# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20

Connection String with PgBouncer

postgresql://username:password@localhost:6432/mydb

Data Migration Strategies

Using pg_dump and pg_restore

# Create a full database backup
pg_dump -h localhost -U myuser -d mydb -F c -f mydb_backup.dump

# Restore from backup
pg_restore -h localhost -U myuser -d mydb_new mydb_backup.dump

# Backup a specific schema
pg_dump -h localhost -U myuser -d mydb -n myschema -F c -f myschema_backup.dump

# Backup specific tables
pg_dump -h localhost -U myuser -d mydb -t mytable1 -t mytable2 -F c -f tables_backup.dump

Schema Migration Tools

Using Flyway:

|- db/migrations/
   |- V1__Create_initial_tables.sql
   |- V2__Add_user_authentication.sql
   |- V3__Add_product_categories.sql

Example V1 migration:

-- V1__Create_initial_tables.sql
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    price NUMERIC(10,2) NOT NULL,
    stock INTEGER NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

Flyway command:

flyway -url=jdbc:postgresql://localhost:5432/mydb \
       -user=myuser \
       -password=mypassword \
       -locations=filesystem:db/migrations \
       migrate

Deployment, Maintenance, and Operations

Configuration Tuning

Key configuration parameters in postgresql.conf:

Memory Settings

# Memory allocation
shared_buffers = 2GB                  # 25% of available RAM up to 8GB
work_mem = 64MB                       # For sorting operations
maintenance_work_mem = 256MB          # For VACUUM, CREATE INDEX, etc.
effective_cache_size = 6GB            # Estimate of disk cache available

# Background writer
bgwriter_delay = 200ms
bgwriter_lru_maxpages = 100
bgwriter_lru_multiplier = 2.0

Write-Ahead Log (WAL)

# WAL settings
wal_level = replica                   # Needed for replication
max_wal_size = 1GB
min_wal_size = 80MB
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

Query Planner

# Query planner settings
random_page_cost = 1.1                # Lower for SSDs
effective_io_concurrency = 200        # Higher for SSDs
default_statistics_target = 100       # Increase for complex queries

Connections

# Connection settings
max_connections = 200

Backup and Recovery

Logical Backups

# Create a full database backup
pg_dump -h localhost -U postgres -d mydb > mydb_backup.sql

# Backup specific tables
pg_dump -h localhost -U postgres -d mydb -t mytable1 -t mytable2 > tables_backup.sql

# Backup all databases
pg_dumpall -h localhost -U postgres > all_dbs_backup.sql

# Restore from backup
psql -h localhost -U postgres -d mydb < mydb_backup.sql

Physical Backups (Base Backup)

# Create a base backup
pg_basebackup -h localhost -U postgres -D /backup/path -Ft -z -P

# Restore from base backup
# 1. Stop PostgreSQL
systemctl stop postgresql

# 2. Clear data directory
rm -rf /var/lib/postgresql/data/*

# 3. Extract backup
tar -xzf /backup/path/base.tar.gz -C /var/lib/postgresql/data/

# 4. Configure recovery (PostgreSQL 12+; recovery.conf is only used by
#    version 11 and earlier)
echo "restore_command = 'cp /backup/path/pg_wal/%f %p'" >> /var/lib/postgresql/data/postgresql.conf
touch /var/lib/postgresql/data/recovery.signal

# 5. Start PostgreSQL
systemctl start postgresql

Point-In-Time Recovery (PITR)

# In postgresql.conf
wal_level = replica
archive_mode = on
archive_command = 'cp %p /archive/path/%f'

Recovery steps:

# 1. Restore base backup as above

# 2. Configure the recovery target (PostgreSQL 12+)
echo "restore_command = 'cp /archive/path/%f %p'" >> /var/lib/postgresql/data/postgresql.conf
echo "recovery_target_time = '2023-05-25 14:30:00'" >> /var/lib/postgresql/data/postgresql.conf
touch /var/lib/postgresql/data/recovery.signal

# 3. Start PostgreSQL
systemctl start postgresql

Monitoring and Maintenance

VACUUM and ANALYZE

-- Basic vacuum (reclaims space, updates statistics)
VACUUM ANALYZE;

-- Vacuum specific tables
VACUUM ANALYZE products, orders;

-- Full vacuum (locks table, reclaims more space)
VACUUM FULL orders;

-- Just update statistics
ANALYZE;
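
Autovacuum performs these tasks automatically in the background; for frequently updated tables, its thresholds can be tightened per table:

-- Vacuum once ~5% of rows are dead instead of the default 20%
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.05);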

Monitoring Queries

-- Find long-running queries
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '5 minutes'
ORDER BY duration DESC;

-- Cancel a specific query
SELECT pg_cancel_backend(12345);  -- PID of the process

-- Terminate a connection
SELECT pg_terminate_backend(12345);  -- PID of the process

Table Bloat Check

Exact bloat measurement requires the pgstattuple extension; a quick size overview helps spot candidates for VACUUM:

SELECT
    schemaname,
    relname,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    pg_size_pretty(pg_relation_size(relid)) AS table_size,
    pg_size_pretty(pg_indexes_size(relid)) AS index_size
FROM
    pg_catalog.pg_statio_user_tables
ORDER BY
    pg_total_relation_size(relid) DESC
LIMIT 20;

Index Usage Statistics

SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid::regclass)) AS index_size
FROM
    pg_stat_user_indexes
ORDER BY
    idx_scan DESC;

Replication and High Availability

Streaming Replication Setup

Primary server (postgresql.conf):

listen_addresses = '*'
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10

Primary server (pg_hba.conf):

# Allow replication connections
host    replication     replicator      10.0.0.0/24          scram-sha-256

Create replication user:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secure_password';

Standby server setup:

# Create a base backup
pg_basebackup -h primary_server -U replicator -D /var/lib/postgresql/data -P -Xs -R

Standby server (postgresql.conf; note that pg_basebackup -R already writes primary_conninfo to postgresql.auto.conf and creates standby.signal):

primary_conninfo = 'host=primary_server port=5432 user=replicator password=secure_password'
hot_standby = on

Logical Replication

-- On primary server
CREATE PUBLICATION my_publication FOR TABLE customers, orders, products;

-- On replica server
CREATE SUBSCRIPTION my_subscription
CONNECTION 'host=primary_server port=5432 dbname=mydb user=replicator password=secure_password'
PUBLICATION my_publication;
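
Replication progress can then be checked on the subscriber:

SELECT subname, received_lsn, latest_end_lsn FROM pg_stat_subscription;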

Connection Pooling and Load Balancing

Using pgpool-II for load balancing:

# pgpool.conf
backend_hostname0 = 'primary.example.com'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/data'

backend_hostname1 = 'replica1.example.com'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/data'

load_balance_mode = on

PostgreSQL Best Practices

Database Design

Normalization Guidelines

  1. First Normal Form (1NF)

    • Each table cell should contain a single value
    • Each record needs to be unique
  2. Second Normal Form (2NF)

    • Meet all requirements of 1NF
    • All non-key attributes depend on the entire primary key
  3. Third Normal Form (3NF)

    • Meet all requirements of 2NF
    • No transitive dependencies: every non-key attribute depends directly on the primary key

Example of normalization:

Unnormalized:

CREATE TABLE orders_unnormalized (
    order_id INTEGER,
    customer_name VARCHAR(100),
    customer_email VARCHAR(100),
    customer_address VARCHAR(200),
    product_id INTEGER,
    product_name VARCHAR(100),
    product_category VARCHAR(50),
    quantity INTEGER,
    price NUMERIC(10,2),
    order_date DATE
);

Normalized (3NF):

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100) UNIQUE,
    address VARCHAR(200)
);

CREATE TABLE product_categories (
    category_id SERIAL PRIMARY KEY,
    name VARCHAR(50) UNIQUE
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    category_id INTEGER REFERENCES product_categories(category_id),
    price NUMERIC(10,2)
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    order_date DATE
);

CREATE TABLE order_items (
    order_id INTEGER REFERENCES orders(order_id),
    product_id INTEGER REFERENCES products(product_id),
    quantity INTEGER,
    price_at_time NUMERIC(10,2),
    PRIMARY KEY (order_id, product_id)
);

Effective Use of Constraints

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    order_date DATE NOT NULL DEFAULT CURRENT_DATE,
    status VARCHAR(20) NOT NULL CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'canceled')),
    total_amount NUMERIC(10,2) NOT NULL CHECK (total_amount >= 0),
    shipping_address TEXT NOT NULL,
    CONSTRAINT orders_customer_date_unique UNIQUE (customer_id, order_date, shipping_address)
);

Proper Indexing Strategy

General indexing guidelines:

  • Index foreign keys
  • Index columns used in WHERE clauses
  • Index columns used in JOIN conditions
  • Index columns used in ORDER BY or GROUP BY
  • Avoid over-indexing (indexes slow down writes)
  • Monitor index usage and remove unused indexes
-- Index for foreign key
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Compound index for common query pattern
CREATE INDEX idx_orders_status_date ON orders(status, order_date);

-- Partial index for a specific query pattern (index predicates must be
-- immutable, so expressions like CURRENT_DATE cannot appear in them)
CREATE INDEX idx_orders_pending ON orders(order_date)
WHERE status = 'pending';

-- Expression index for case-insensitive search
CREATE INDEX idx_customers_name_lower ON customers(LOWER(name));

Query Optimization

Use Prepared Statements

Prepared statements improve performance and protect against SQL injection:

-- Define a prepared statement
PREPARE find_products(VARCHAR, NUMERIC) AS
SELECT * FROM products
WHERE category = $1 AND price <= $2;

-- Execute the prepared statement
EXECUTE find_products('Electronics', 1000);

-- Deallocate when done
DEALLOCATE find_products;

Avoid SELECT *

Always specify only the columns you need:

-- Bad: Retrieves all columns
SELECT * FROM customers WHERE country = 'US';

-- Good: Retrieves only needed columns
SELECT customer_id, name, email FROM customers WHERE country = 'US';

Use EXISTS Instead of COUNT for Existence Check

-- Less efficient
SELECT COUNT(*) > 0
FROM orders
WHERE customer_id = 123;

-- More efficient
SELECT EXISTS (
    SELECT 1
    FROM orders
    WHERE customer_id = 123
);

EXPLAIN ANALYZE for Query Analysis

EXPLAIN ANALYZE
SELECT c.name, COUNT(o.order_id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.signup_date > '2023-01-01'
GROUP BY c.name
HAVING COUNT(o.order_id) > 5
ORDER BY order_count DESC;

Use Set-Returning Functions Effectively

Place set-returning functions in the FROM clause rather than the select list; the function becomes an explicit, aliased row source that is easier to join and filter:

-- Works, but harder to read and extend
SELECT generate_series(1, 1000) AS num, random() AS random_value;

-- Clearer: the function is an explicit row source
SELECT s.num, random() AS random_value
FROM generate_series(1, 1000) AS s(num);

Performance and Scalability

Effective Use of Transactions

-- Group related operations in a single transaction
BEGIN;

-- Create an order
INSERT INTO orders (customer_id, status, total_amount, shipping_address)
VALUES (123, 'pending', 0, '123 Main St') RETURNING order_id;

-- Add order items
INSERT INTO order_items (order_id, product_id, quantity, price_at_time)
VALUES
    (currval('orders_order_id_seq'), 101, 2, 29.99),
    (currval('orders_order_id_seq'), 102, 1, 49.99);

-- Update order total
UPDATE orders
SET total_amount = (
    SELECT SUM(quantity * price_at_time)
    FROM order_items
    WHERE order_id = currval('orders_order_id_seq')
)
WHERE order_id = currval('orders_order_id_seq');

COMMIT;
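
The currval() calls work but depend on session state; a data-modifying CTE expresses the same insert-and-reference pattern in a single statement:

WITH new_order AS (
    INSERT INTO orders (customer_id, status, total_amount, shipping_address)
    VALUES (123, 'pending', 0, '123 Main St')
    RETURNING order_id
)
INSERT INTO order_items (order_id, product_id, quantity, price_at_time)
SELECT order_id, 101, 2, 29.99 FROM new_order
UNION ALL
SELECT order_id, 102, 1, 49.99 FROM new_order;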

Batch Processing for Large Operations

-- Process in batches of 1000
DO $$
DECLARE
    batch_size INTEGER := 1000;
    total_rows INTEGER;
    processed_rows INTEGER := 0;
BEGIN
    SELECT COUNT(*) INTO total_rows FROM large_table WHERE needs_processing = true;

    WHILE processed_rows < total_rows LOOP
        -- Process a batch
        WITH batch AS (
            SELECT id
            FROM large_table
            WHERE needs_processing = true
            ORDER BY id
            LIMIT batch_size
            FOR UPDATE SKIP LOCKED
        )
        UPDATE large_table lt
        SET
            processed = true,
            needs_processing = false,
            processed_at = NOW()
        FROM batch
        WHERE lt.id = batch.id;

        processed_rows := processed_rows + batch_size;
        COMMIT;  -- transaction control inside DO blocks requires PostgreSQL 11+

        RAISE NOTICE 'Processed % of % rows', LEAST(processed_rows, total_rows), total_rows;
    END LOOP;
END $$;

Partitioning for Large Tables

-- Create a partitioned table for time-series data
CREATE TABLE metrics (
    metric_id BIGSERIAL,
    device_id INTEGER NOT NULL,
    metric_time TIMESTAMPTZ NOT NULL,
    value NUMERIC NOT NULL,
    PRIMARY KEY (metric_id, metric_time)
) PARTITION BY RANGE (metric_time);

-- Create monthly partitions
CREATE TABLE metrics_y2023m01 PARTITION OF metrics
    FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');

CREATE TABLE metrics_y2023m02 PARTITION OF metrics
    FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');

-- Partitions must exist before rows arrive: on declaratively partitioned
-- tables, a BEFORE INSERT trigger cannot create the target partition,
-- so pre-create partitions from a scheduled job (cron, pg_cron, or pg_partman)
CREATE OR REPLACE FUNCTION create_metrics_partition(for_date DATE)
RETURNS void AS $$
DECLARE
    partition_name TEXT;
    start_date DATE;
    end_date DATE;
BEGIN
    start_date := DATE_TRUNC('month', for_date);
    end_date := start_date + INTERVAL '1 month';
    partition_name := 'metrics_y' || TO_CHAR(start_date, 'YYYY') || 'm' || TO_CHAR(start_date, 'MM');

    IF NOT EXISTS (SELECT 1 FROM pg_class WHERE relname = partition_name) THEN
        EXECUTE FORMAT(
            'CREATE TABLE %I PARTITION OF metrics FOR VALUES FROM (%L) TO (%L)',
            partition_name, start_date, end_date
        );

        EXECUTE FORMAT(
            'CREATE INDEX %I ON %I (device_id, metric_time)',
            'idx_' || partition_name || '_device_time', partition_name
        );

        RAISE NOTICE 'Created partition % for date range % to %', partition_name, start_date, end_date;
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Pre-create next month's partition (run periodically from a scheduler)
SELECT create_metrics_partition((CURRENT_DATE + INTERVAL '1 month')::date);

Security Best Practices

Principle of Least Privilege

-- Create role with minimal permissions
CREATE ROLE app_readonly;
GRANT CONNECT ON DATABASE mydb TO app_readonly;
GRANT USAGE ON SCHEMA public TO app_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO app_readonly;

-- For future tables
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO app_readonly;

-- Create application-specific role with specific permissions
CREATE ROLE order_processor;
GRANT CONNECT ON DATABASE mydb TO order_processor;
GRANT USAGE ON SCHEMA public TO order_processor;

-- Grant specific permissions
GRANT SELECT ON customers, products TO order_processor;
GRANT SELECT, INSERT, UPDATE ON orders, order_items TO order_processor;
GRANT USAGE, SELECT ON SEQUENCE orders_order_id_seq TO order_processor;

Connection Security

# In pg_hba.conf
# Require SSL for all connections
hostssl    all             all             0.0.0.0/0               scram-sha-256
host       all             all             127.0.0.1/32            scram-sha-256

# In postgresql.conf
ssl = on
ssl_cert_file = 'server.crt'
ssl_key_file = 'server.key'
ssl_ciphers = 'HIGH:!aNULL'
password_encryption = scram-sha-256

Sensitive Data Protection

-- Create encrypted columns using pgcrypto
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE user_data (
    user_id INTEGER PRIMARY KEY,
    username VARCHAR(50) NOT NULL,
    -- Store encrypted sensitive data
    ssn_encrypted BYTEA,
    credit_card_encrypted BYTEA
);

-- Insert with encryption
INSERT INTO user_data (user_id, username, ssn_encrypted, credit_card_encrypted)
VALUES (
    1,
    'johndoe',
    pgp_sym_encrypt('123-45-6789', 'encryption_key'),
    pgp_sym_encrypt('4111-1111-1111-1111', 'encryption_key')
);

-- Query with decryption
SELECT
    user_id,
    username,
    pgp_sym_decrypt(ssn_encrypted, 'encryption_key') AS ssn,
    pgp_sym_decrypt(credit_card_encrypted, 'encryption_key') AS credit_card
FROM user_data
WHERE user_id = 1;

Tools and Extensions Ecosystem

Essential Extensions

PostGIS for Geospatial Data

-- Enable PostGIS
CREATE EXTENSION postgis;

-- Create a table with geometry
CREATE TABLE locations (
    location_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    address TEXT,
    geom GEOMETRY(Point, 4326)  -- WGS84 coordinate system
);

-- Insert point data
INSERT INTO locations (name, address, geom)
VALUES (
    'Eiffel Tower',
    'Champ de Mars, 5 Avenue Anatole France, 75007 Paris, France',
    ST_SetSRID(ST_MakePoint(2.2945, 48.8584), 4326)
);

-- Query by distance (ST_Distance on geometry returns degrees here;
-- cast both arguments to geography for exact meters)
SELECT
    name,
    address,
    ST_Distance(
        geom,
        ST_SetSRID(ST_MakePoint(2.3522, 48.8566), 4326)  -- Paris center
    ) * 111000 AS distance_meters  -- Approximate conversion to meters
FROM locations
ORDER BY geom <-> ST_SetSRID(ST_MakePoint(2.3522, 48.8566), 4326)
LIMIT 10;

-- Find locations within a radius
SELECT name, address
FROM locations
WHERE ST_DWithin(
    geom,
    ST_SetSRID(ST_MakePoint(2.3522, 48.8566), 4326),
    0.01  -- Approximately 1km
);

pgcrypto for Encryption

-- Enable pgcrypto
CREATE EXTENSION pgcrypto;

-- Hash passwords
SELECT crypt('my_password', gen_salt('bf'));

-- Verify passwords
SELECT (
    crypt('entered_password', stored_hash) = stored_hash
) AS password_match;

-- Symmetric encryption
SELECT pgp_sym_encrypt('sensitive data', 'encryption_key');
SELECT pgp_sym_decrypt(encrypted_data, 'encryption_key');

-- Public key encryption
SELECT pgp_pub_encrypt('sensitive data', dearmor('public key here'));
SELECT pgp_pub_decrypt(encrypted_data, dearmor('private key here'), 'key password');

pg_stat_statements for Query Analysis

-- Enable the extension
CREATE EXTENSION pg_stat_statements;

-- Requires settings in postgresql.conf (the preload needs a server restart)
# shared_preload_libraries = 'pg_stat_statements'
# pg_stat_statements.max = 10000
# pg_stat_statements.track = all

-- Query for most time-consuming queries
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    stddev_exec_time,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- Reset statistics
SELECT pg_stat_statements_reset();

hstore for Key-Value Storage

-- Enable hstore
CREATE EXTENSION hstore;

-- Create a table with hstore
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    attributes hstore
);

-- Insert with hstore
INSERT INTO products (name, attributes)
VALUES (
    'Smartphone X',
    'color => "black", storage => "128GB", ram => "8GB", screen => "6.1 inch"'
);

-- Query hstore data
SELECT name, attributes -> 'color' AS color
FROM products
WHERE attributes ? 'ram' AND attributes @> 'storage => "128GB"';

-- Update hstore values
UPDATE products
SET attributes = attributes || 'camera => "12MP"'::hstore
WHERE product_id = 1;

-- Get all keys and values
SELECT name, skeys(attributes) AS attribute, svals(attributes) AS value
FROM products;

Administration Tools

pgAdmin

pgAdmin is a comprehensive GUI administration tool for PostgreSQL:

  • Database object management
  • SQL query tool with visual explain
  • Server monitoring
  • Backup and restore interface
  • User and role management
  • Scheduled tasks

pg_dump and pg_restore

# Dump a database in custom format
pg_dump -Fc -f backup.dump mydb

# Dump schema only
pg_dump -s -f schema.sql mydb

# Dump data only
pg_dump -a -f data.sql mydb

# Restore a database
pg_restore -d mydb_new backup.dump

# Restore only specific tables
pg_restore -d mydb_new -t customers -t orders backup.dump

psql Command-Line Utility

# Connect to a database
psql -h localhost -U myuser -d mydb

# Common psql commands
\l                  # List databases
\c dbname           # Connect to database
\dt                 # List tables
\d tablename        # Describe table
\du                 # List roles
\timing             # Toggle query timing
\e                  # Edit in external editor
\i file.sql         # Execute SQL from file
\copy               # Import/export CSV data
\?                  # Help on psql commands
\q                  # Quit

Connection Poolers

PgBouncer is a lightweight connection pooler:

  • Low memory footprint
  • Supports connection pooling modes:
    • Session pooling
    • Transaction pooling
    • Statement pooling
  • Online restart and reconfiguration without dropping client connections

Pgpool-II provides additional features:

  • Connection pooling
  • Load balancing
  • Automatic failover
  • Query caching
  • Parallel query execution

Monitoring Solutions

pg_stat_* Views

PostgreSQL provides built-in statistics views:

-- Database statistics
SELECT * FROM pg_stat_database WHERE datname = current_database();

-- Table statistics
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
ORDER BY seq_scan DESC;

-- Index statistics
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Activity monitoring
SELECT pid, usename, application_name, client_addr, state, query_start, query
FROM pg_stat_activity
WHERE state != 'idle';

-- Lock monitoring
SELECT locktype, relation::regclass, mode, granted
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE relation IS NOT NULL;

Conclusion

PostgreSQL stands as one of the most powerful, flexible, and reliable database systems available today. Its combination of SQL standard compliance, advanced features, and extensibility makes it suitable for a wide range of applications, from simple CRUD operations to complex analytical workloads.

As a software engineer new to SQL, you now have a comprehensive foundation in PostgreSQL that covers:

  1. Core SQL concepts and how to manipulate data effectively
  2. Advanced querying techniques to solve complex data problems
  3. PostgreSQL-specific features that extend beyond standard SQL
  4. Performance optimization strategies to ensure your database runs efficiently
  5. Security best practices to protect your data
  6. Operational knowledge for maintaining healthy PostgreSQL deployments

The journey to PostgreSQL mastery is ongoing, as the database continues to evolve with each new release. The PostgreSQL community is active and supportive, with extensive documentation, forums, and resources available to help you solve specific challenges.

Remember that database design decisions have long-lasting impacts on application performance and maintainability. Take time to properly model your data, choose appropriate indexes, and optimize your queries. Regular monitoring and maintenance will ensure your PostgreSQL databases continue to perform optimally as your applications grow.

By leveraging PostgreSQL's rich feature set and following the best practices outlined in this guide, you're well-equipped to build robust, efficient, and scalable database-backed applications.
