Skip to content

Instantly share code, notes, and snippets.

@jbollman7
Created January 23, 2023 19:57
Show Gist options
  • Save jbollman7/b00cfd901948eaa8a461a16f0f8b65d8 to your computer and use it in GitHub Desktop.
Save jbollman7/b00cfd901948eaa8a461a16f0f8b65d8 to your computer and use it in GitHub Desktop.
SQL Server- Scaling - performance

Design for Scale

Introduction

  • Know your workloads, and apply correct db features ie columnstore index is great for warehouse, but would be terrible for transaction database.

Most apps break into two categories

  • OLTP Online transaction processing (ERP, sales website) - Lots of writing, want transactions to finsih quickly.
  • Online analytical processiong (OLAP) - Data warehouse
    • high cpu

Wait time for blocking actions are unpredictable. ie if a read is taking place, the write cannot proceed until block is finish. --Solution-- Read committed snapshot isolation-- This will prevent readers from blocking writers.

OLTP (write fast, reading afterthought)

  • Minimize indexes, except indexes for updates and deletes
  • Keep transactions small and avoid use of TRIGGERS, which slow down transactip speed.

What slows down OLTP Apps

  • Blocking
  • Latching - Lightweight locks placed on pages in SQL bugger pool (RAM).
  • Writing to the transaction log.
  • Ultimately CPU - rare for CPU to be bottleneck after obvious optimizations that took place.

OLTP Ultimate Design: Indexes

  • Minimize indexes to bare minimum needs for updates and delete
    • Only use a clustered index on each table -- Use a narrow key like an int.
  • Additional nonclustered indexes should only support UPDATE and DELETE operations.

Cluster Indexes vs Uncluster indexes

Cluster indexes are...

  • Only time rows are sorted is when table contains a clustered index.
  • Only one index(column) per table, because the rows can be stored in only one order.
  • Auto created when you add a new column to a table.
  • Cannot modify existing clustered index.
  • PRIMARY KEY auto created a clustered index.
  • Unique, only one row can exist in each cluster with an index on the table. Nonclustered indexes are...
  • When using UNIQUE constriant, auto creates a non clustered index

Indexes are auto created when PRIMARY KEY and UNIQUE constraints are

defined on table columns

OLTP Ultimate Design: Transaction Log

  • Transaction log should be on a very fast storage volume
    • SQL Server 2016 up allows putting the tail of the transaction log on persistemnt memory
  • Use SSDs, not spinning disks.

For High=volume apps

  • consider using in-memory OLTP. Eliminate bottlenecks but is more cplicated.
  • Latchless ddata structure that uses row versioning to provide consistency.
    • NO locking
  • No writing transactions to transaction log - bit risky.

OLAP Applications

  • OLAP apps need to support large-scale read operations
  • Also support bulk-loading operations.
  • Users may issue complex aggregation (ad-hoc) queries that have large impact on performance.

When thinking OLAP Performance think Columnstore indexes

Archiving Data

  • Many apps fail to consider long-term data storage reqs
  • This negatively impacts perf and maintenance and adds expense to supporitng apps.

Archiving Data Solutions

  • Dumping data into flatfiles
  • Moving files to another database

Archiving Data SQL Server specific Solutions

  • Stretch DB - Rows moved to azure database, based on predicate value
    • Move all sales records from 2017 older would be slowly moved over automatically to Azure. Then a query would be provided and seamless
    • Major benefit is you did not need to change ANY app code.
    • Downside to Stretch DB? - Extremely expensive 3k a month starting and will go up from there\
  • Temporal tables -not strictly archival feature
    • Allows you track data change over time, change record is a history table, and keeps a record of all data changes over time.
  • Data partitioning
    • Split tables - Archive older records in their own tables.
    • When archiving out sub-tables, we dont have to do I/O. we do a metadata operatio
    • No rewriting of data --Partition switching
  • Design these features early!! Hard to do after the fact. Archiving should be a consideration when designing the schema.

Features That help App SCALE

In-memory OLTP Overview

  • Perfect if you have insert performance issues.
  • 100x performance vs on disk.
  • Uses row versioning ot eliminale all latching activity.
  • You can use non durable tables, this bypasses the transaction log
    • Good use case is staging env. You want durable tables in prod for obvious reasons.
  • Native compiled Stored procedures
    • Eliminate runtime compilation overhead.
    • Queries do not need to be compiled before they're executed against the engine.
  • working set stored entirely in memory, still accessible via T-SQL.
    • Fundamentally different than the "DBCC PINTABLE" feature in earlier SQL Server versions.

Columnstore overview

  • You Really should use Columnstore for all OLAP tables
    • Best on fat tables (lots of columns)
    • Or table with millions of records(rows)
    • Tables that are frequently inserted updated or deleted, you Wont get great performance with columnstore.
      • At least if you are insert/update/delete a single record at a time, little performance gain.
  • Makes aggregations 10-100x faster
  • Gives great compression
  • Excellent bulk loading performance.

Alway On Availability groups: Read-only routing

  • Reads are easier to scale than writes are.
  • Make multiple copies of your data, allow users to access the copies
  • Although, routing users to those copies of your data is challenging. 2012 R2 and up.
  • Takeaway have HA Availability Groups of Read-only nodes. If you have to do a bulk insert, it wont block the readonly for other users.

Partitioning and compression

  • Partitioning splits one large table into multiple tables.
  • Should be thought of as a management feature, and secondly as a performance feature.
  • You can
    • Load data into empty partition
    • Switch that partition into your primary table
    • MetaData operation
      • Much shorter lock on primary table
      • Much less I/O
  • Compression
    • Higher CPU for less IOPS and less memory
    • Almost every workload will benefit from data compression
      • IO is expensive operation compared to CPU cycles
    • Row Compression & Page compression
      • Row Compression makes fixed with colum the same as variable width
        • char(15) => varchar(15)
        • Extremely cheap in cpu - SAP recommends all tables be row compressed
        • Low compression ratio
      • Page compression
        • High Compression ratio.
        • deduplication
        • Uses more CPU, don't use if you are close to maxing out your CPU.
      • You can compress granually, to get more performance

Indexing strategies

  • Cluster indexes are tables organized by the cluster key
  • All of your tables should have a clustered index
    • Any query that calls a sort, its much more efficient to have a clustered index.
    • Notable exception data warehouse (OLAP) staging table.
  • Your tables should have some nonclustered indexes
    • Columns in your WHERE clause.
  • Index is more of an art than a science.
  • When testing queries, you need to look at execution plans
    • SQL Server issues a warning if an index would have helped the query performance.
    • See if the indexes you do have are being used in execution plan.
  • Review missing Index Dynamic Managemeng Reviews (DMVs)
  • THere is some built in scripts you can find that tells you
    • All columns that would benefit of adding an index
    • All indexes, and the Read/Writes for them.
      • IE if you have an index that is all writes, you want to remove the index because you get no perf boost

Building a Tenacy Model

Tenacy models

  • All tenants in a single database
    • Single set of tables.
  • Sharded - partitioned data across databases
    • Horizontal scale
    • Hardest to implement.
  • Database per tenant
    • No noisy neighbor problem

Multitenant app with 1 db for all tenants

  • Good pattern if you have large number of customers/consumers
  • Row level security to manage reporting and access
  • if customer interaction is low, this is good
    • NOT a good design pattern if you are managing the db as a managed service.
      • It would be really hard to do a point restore for a single tenant
        • Grabbing the single set of rows to restore, this would have hughe impacts on other customers, high level of error, etc.
    • Limitations
      • Costly - scale up is an option but you will run out of scale.
    • Simplest but is not always best

Multitenant app with sharded multitenant databases Sharding

  • Best pattern for horizontal scale.
  • Sharding Tools
    • Elastic Scale toolkit in VS
      • Allows splitting and merging of shards.

Database per tenant: Single-Tenancy

  • Provision a databaser for each new customer.
  • Highest level of seperation
  • Good for B2B applications
  • Reduces security concerns for mixed-data workloads.
    • Dont need to worry about row-level security.
  • Cannot Scale to to many dbs.
    • Cloud can help with auto backup, HA etc.
    • Still keeping creds and managment of 2k+ dbs would be challenging.

Azure Design Patters for Scale

Caching layer

  • Will require code changes in the app. Scenario:
    • App needs data,
    • App asks Cache
      • If data is found, this is a HIT
      • No connection to database/ no query
      • big performance
    • APp asks Cache and data wasnt there
      • Query is made to DB
      • Data is returned
      • Data is stored in Cache
      • ALl subsequent calls will find the data in cache, until it expires.
        • Big performance gain
    • Best used for apps that have more Reads that writes
  • Cache tiers can scale out
    • No state to worry about like dbs
    • Add nodes to build it out.
    • Geting cache performance is cheapter than database performance.
    • Pattern is not great with heavy insert/update
      • Because these requests have to go to the Database to perform the operation.

Scale-out options in Azure

  • Many small things are normally chepater than one really big thing in the cloud
  • Azure SQL DB offers
    • Elastic pools
      • Share resources across many dbs. (Memory, cpu, IOPS)
    • Data sync - replicate small amount of data
    • No dynamic scale

Design anti-patterns

  • Triggers
  • Long transactions
    • If transactions are longer than microseconds, blocking is a real concern.
  • Using SELECT *
  • Not using clustered indexes (put it on PK) helps
  • Not using pooled connection
    • Has several connections open. Call can use one of these open connections. versus explicit connection with cpu/mempry/iops overhead
  • Row based operations
    • Cursors, While loops, scalar valued functions are processed row by row.
    • Scales linearly
    • Alternatives to using these row based operations
      • Temp table to help do a join
      • Use Common Table Expression CTE
      • Dynamc Sql - stored procedures without parameters.
        • Like running ad-hoc queries
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment