- Know your workloads and apply the correct db features, e.g., a columnstore index is great for a warehouse but would be terrible for a transactional database.
Most apps break into two categories:
- Online transaction processing (OLTP) - ERP, sales websites. Lots of writing; you want transactions to finish quickly.
- Online analytical processing (OLAP) - data warehouses.
  - High CPU usage.
Wait times for blocking actions are unpredictable: if a read is taking place, a conflicting write cannot proceed until the read finishes. Solution: Read Committed Snapshot Isolation (RCSI), which prevents readers from blocking writers.
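A minimal sketch of enabling RCSI, assuming a hypothetical database named SalesDB:

```sql
-- Enable Read Committed Snapshot Isolation: readers see the last committed
-- version of each row instead of blocking behind in-flight writes.
ALTER DATABASE SalesDB SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE; -- roll back open transactions so the change can apply
```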
OLTP (write fast; reading is an afterthought)
- Minimize indexes, except indexes that support updates and deletes.
- Keep transactions small and avoid TRIGGERS, which slow down transaction speed.
- Typical bottlenecks:
  - Blocking.
  - Latching - lightweight locks placed on pages in the SQL buffer pool (RAM).
  - Writing to the transaction log.
  - Ultimately CPU - though it's rare for CPU to be the bottleneck once the obvious optimizations are in place.
- Minimize indexes to the bare minimum needed for updates and deletes.
  - Use only one clustered index per table, with a narrow key like an int.
  - Additional nonclustered indexes should only support UPDATE and DELETE operations.
Clustered indexes are...
- The only way rows get sorted: rows are stored in order only when the table has a clustered index.
- Limited to one per table, because the rows can be stored in only one order.
- Created automatically when you add a PRIMARY KEY to a table.
- Not modifiable in place: you cannot alter an existing clustered index's key; you must drop and recreate it.
- Unique: each row must have a distinct clustered key (SQL Server adds a hidden uniquifier if the key is not unique).
Nonclustered indexes are...
- Created automatically when you add a UNIQUE constraint.
- Defined on one or more table columns.
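A hedged sketch of these rules on a hypothetical dbo.Orders table - the PRIMARY KEY creates the narrow int clustered index, and the UNIQUE constraint creates a nonclustered index automatically:

```sql
CREATE TABLE dbo.Orders
(
    OrderID     int IDENTITY(1,1) NOT NULL,
    OrderNumber char(12)  NOT NULL,
    CustomerID  int       NOT NULL,
    OrderDate   datetime2 NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID),
    CONSTRAINT UQ_Orders_OrderNumber UNIQUE (OrderNumber) -- nonclustered by default
);

-- An explicit nonclustered index to support targeted UPDATE/DELETE lookups:
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);
```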
- The transaction log should be on a very fast storage volume.
  - SQL Server 2016 and up allows putting the tail of the transaction log on persistent memory.
  - Use SSDs, not spinning disks.
- Consider using In-Memory OLTP. It eliminates these bottlenecks but is more complicated (details below).
  - Latch-free data structures that use row versioning to provide consistency.
  - NO locking.
  - Optionally, no writing of transactions to the transaction log (non-durable tables) - a bit risky.
- OLAP apps need to support large-scale read operations.
  - They must also support bulk-loading operations.
  - Users may issue complex ad-hoc aggregation queries that have a large impact on performance.
When thinking about OLAP performance, think columnstore indexes (see the columnstore sketch further below).
- Many apps fail to consider long-term data storage requirements.
  - This negatively impacts performance and maintenance and adds expense to supporting the apps.
- Options for archiving:
  - Dumping data into flat files.
  - Moving files to another database.
  - Stretch DB - rows are moved to an Azure database based on a predicate value.
    - e.g., all sales records from 2017 and older would be migrated over to Azure automatically in the background; queries then span local and remote data seamlessly.
    - Major benefit: you do not need to change ANY app code.
    - Downside to Stretch DB? Extremely expensive - roughly $3k a month to start, and it goes up from there.
  - Temporal tables - not strictly an archival feature.
    - They let you track data changes over time: every change is recorded in a history table (sketch after this list).
  - Data partitioning
    - Split tables - archive older records into their own tables or partitions.
    - When archiving out sub-tables, we don't have to do I/O; it's a metadata operation.
    - No rewriting of data - this is partition switching (example further below).
- Design these features early!! They are hard to add after the fact; archiving should be a consideration when designing the schema.
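A minimal temporal-table sketch (SQL Server 2016+); the table and history-table names are hypothetical:

```sql
CREATE TABLE dbo.Products
(
    ProductID int           NOT NULL PRIMARY KEY CLUSTERED,
    Name      nvarchar(100) NOT NULL,
    Price     money         NOT NULL,
    ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   datetime2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductsHistory));

-- Query the table as it looked at a point in time:
SELECT * FROM dbo.Products FOR SYSTEM_TIME AS OF '2017-01-01T00:00:00';
```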
In-Memory OLTP
- Perfect if you have insert performance issues.
- Up to 100x performance vs. on-disk tables.
- Uses row versioning to eliminate all latching activity.
- You can use non-durable tables, which bypass the transaction log.
  - A good use case is a staging environment; you want durable tables in prod for obvious reasons.
- Natively compiled stored procedures
  - Eliminate runtime compilation overhead.
  - Queries are compiled to machine code when the procedure is created, not each time they're executed against the engine.
- The working set is stored entirely in memory but is still accessible via T-SQL.
- Fundamentally different from the "DBCC PINTABLE" feature in earlier SQL Server versions.
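A hedged sketch, assuming the database already has a MEMORY_OPTIMIZED_DATA filegroup; the table, procedure, and column names are hypothetical:

```sql
-- Memory-optimized table; DURABILITY = SCHEMA_ONLY skips the transaction log
-- entirely (fine for staging, risky for production data).
CREATE TABLE dbo.SessionState
(
    SessionID int NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload   varbinary(8000) NULL,
    TouchedAt datetime2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);

-- Natively compiled stored procedure: compiled to machine code at creation,
-- so there is no interpretation/compilation overhead at execution time.
CREATE PROCEDURE dbo.TouchSession @SessionID int
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    UPDATE dbo.SessionState
    SET TouchedAt = SYSUTCDATETIME()
    WHERE SessionID = @SessionID;
END;
```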
- You really should use columnstore indexes for all OLAP tables.
  - Best on fat tables (lots of columns) or tables with millions of rows.
  - Tables that are frequently inserted, updated, or deleted won't see great performance with columnstore.
    - At least if you insert/update/delete a single record at a time, there is little performance gain.
  - Makes aggregations 10-100x faster.
  - Gives great compression.
  - Excellent bulk-loading performance.
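A minimal sketch on a hypothetical warehouse fact table:

```sql
-- Store the whole table column-wise: high compression, fast scans/aggregations.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;

-- The kind of ad-hoc aggregation that benefits (batch-mode execution):
SELECT ProductKey, SUM(SalesAmount) AS TotalSales
FROM dbo.FactSales
GROUP BY ProductKey;
```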
- Reads are easier to scale than writes.
  - Make multiple copies of your data and let users access the copies.
  - Routing users to those copies of your data is the challenging part. Availability Groups (SQL Server 2012 and up) handle this.
  - Takeaway: use Availability Groups with read-only secondary nodes. If you have to do a bulk insert, it won't block reads for other users.
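A hedged sketch of read-only routing; the AG, replica, and server names are all hypothetical:

```sql
-- Allow read-intent connections on a secondary and advertise its routing URL:
ALTER AVAILABILITY GROUP [SalesAG]
MODIFY REPLICA ON N'SQLNODE2' WITH
    (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY,
                     READ_ONLY_ROUTING_URL = N'TCP://sqlnode2.contoso.com:1433'));

-- Route read-intent connections arriving at the primary over to the secondary:
ALTER AVAILABILITY GROUP [SalesAG]
MODIFY REPLICA ON N'SQLNODE1' WITH
    (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQLNODE2')));
```

Clients opt in by adding ApplicationIntent=ReadOnly to their connection string.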
- Partitioning splits one large table into multiple smaller partitions.
  - It should be thought of first as a management feature and second as a performance feature.
  - You can:
    - Load data into an empty partition.
    - Switch that partition into your primary table.
      - A metadata-only operation.
      - Much shorter lock on the primary table.
      - Much less I/O.
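A hedged sketch of the load-and-switch pattern; all object names are hypothetical, and the staging table must match the target's schema, indexes, and filegroup, with a check constraint covering the target partition's range:

```sql
CREATE PARTITION FUNCTION pfSalesByYear (date)
    AS RANGE RIGHT FOR VALUES ('2016-01-01', '2017-01-01', '2018-01-01');

CREATE PARTITION SCHEME psSalesByYear
    AS PARTITION pfSalesByYear ALL TO ([PRIMARY]);

-- ...bulk-load dbo.SalesStaging, then switch it in as a metadata-only operation:
ALTER TABLE dbo.SalesStaging SWITCH TO dbo.Sales PARTITION 4;
```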
- Compression
  - Trades higher CPU for fewer IOPS and less memory.
  - Almost every workload will benefit from data compression, because I/O is an expensive operation compared to CPU cycles.
  - Two flavors: row compression & page compression.
    - Row compression makes fixed-width columns behave like variable-width ones.
      - char(15) => varchar(15)
      - Extremely cheap in CPU - SAP recommends all its tables be row compressed.
      - Low compression ratio.
    - Page compression
      - High compression ratio (includes deduplication of values within a page).
      - Uses more CPU; don't use it if you are close to maxing out your CPU.
  - You can compress granularly (per index or per partition) to balance performance.
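A minimal sketch against a hypothetical dbo.Sales table - estimate first, then compress:

```sql
-- Estimate how much space a given compression level would save:
EXEC sp_estimate_data_compression_savings
    @schema_name = 'dbo', @object_name = 'Sales',
    @index_id = NULL, @partition_number = NULL,
    @data_compression = 'PAGE';

-- Cheap in CPU, low ratio:
ALTER TABLE dbo.Sales REBUILD WITH (DATA_COMPRESSION = ROW);

-- Higher ratio (dedup within pages), more CPU - can target a single index:
ALTER INDEX IX_Sales_CustomerID ON dbo.Sales
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```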
- Clustered indexes are tables organized by the clustering key.
  - All of your tables should have a clustered index.
  - Any query that requires a sort is much more efficient when a clustered index exists.
  - Notable exception: data warehouse (OLAP) staging tables.
- Your tables should have some nonclustered indexes.
  - Start with the columns in your WHERE clauses.
  - Indexing is more of an art than a science.
  - When testing queries, you need to look at execution plans.
    - SQL Server issues a warning in the plan if an index would have helped the query.
    - Check whether the indexes you do have are being used in the execution plan.
  - Review the missing-index Dynamic Management Views (DMVs).
    - There are built-in queries you can find that tell you:
      - All columns that would benefit from adding an index.
      - All indexes, and the reads/writes for each of them.
        - e.g., if you have an index that is all writes, remove it - you get no performance benefit from it.
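A hedged sketch of those built-in queries, using the missing-index and index-usage DMVs:

```sql
-- Columns SQL Server believes would benefit from a new index:
SELECT d.statement AS table_name,
       d.equality_columns, d.inequality_columns, d.included_columns,
       s.user_seeks, s.avg_user_impact
FROM sys.dm_db_missing_index_details AS d
JOIN sys.dm_db_missing_index_groups AS g ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats AS s ON s.group_handle = g.index_group_handle;

-- Reads vs. writes per existing index; an all-write index is a removal candidate:
SELECT OBJECT_NAME(u.object_id) AS table_name, i.name AS index_name,
       u.user_seeks + u.user_scans + u.user_lookups AS reads,
       u.user_updates AS writes
FROM sys.dm_db_index_usage_stats AS u
JOIN sys.indexes AS i ON i.object_id = u.object_id AND i.index_id = u.index_id
WHERE u.database_id = DB_ID();
```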
Multi-tenant design patterns
- All tenants in a single database
  - Single set of tables.
  - Simplest pattern, but not always the best.
  - Use row-level security to manage reporting and access.
  - Good if customer interaction is low.
  - NOT a good design pattern if you are offering the db as a managed service.
    - It would be really hard to do a point-in-time restore for a single tenant: grabbing just that tenant's rows to restore would have huge impacts on other customers, a high chance of error, etc.
  - Limitations
    - Costly - scaling up is an option, but you will eventually run out of scale.
- Sharded - data partitioned across databases
  - Horizontal scale - the best pattern for horizontal scale.
  - Hardest to implement.
  - Sharding tools
    - Elastic Scale toolkit in VS
      - Allows splitting and merging of shards.
- Database per tenant
  - No noisy-neighbor problem.
  - Good pattern if you have a large number of customers/consumers.
  - Provision a database for each new customer.
  - Highest level of separation.
  - Good for B2B applications.
  - Reduces security concerns for mixed-data workloads.
    - No need to worry about row-level security.
  - Cannot scale to too many dbs.
    - The cloud can help with auto backup, HA, etc.
    - Still, keeping credentials and managing 2k+ dbs would be challenging.
  - Will require code changes in the app.
Scenario:
- App needs data.
- App asks the cache.
  - If the data is found, this is a HIT.
    - No connection to the database, no query.
    - Big performance win.
- App asks the cache and the data wasn't there (a MISS).
  - A query is made to the DB.
  - The data is returned.
  - The data is stored in the cache.
  - All subsequent calls will find the data in the cache, until it expires.
  - Big performance gain on those later calls.
- Best used for apps that have more reads than writes.
- Cache tiers can scale out.
  - No state to worry about, unlike dbs.
  - Just add nodes to build it out.
- Getting performance from cache is cheaper than getting it from the database.
- The pattern is not great with heavy insert/update workloads, because those requests still have to go to the database to perform the operation.
- Many small things are normally cheaper than one really big thing in the cloud.
- Azure SQL DB offers:
  - Elastic pools
    - Share resources (memory, CPU, IOPS) across many dbs.
  - Data sync - replicate a small amount of data.
    - No dynamic scale.
Common performance anti-patterns
- Triggers
- Long transactions
  - If transactions take longer than microseconds, blocking is a real concern.
- Using SELECT *
- Not using clustered indexes (putting one on the PK helps).
- Not using pooled connections
  - A pool keeps several connections open, and each call can reuse one of them, versus opening an explicit connection each time with its CPU/memory/IOPS overhead.
- Row-based operations
  - Cursors, WHILE loops, and scalar-valued functions are processed row by row.
  - Their cost scales linearly with the number of rows.
  - Alternatives to these row-based operations (see the set-based sketch at the end of this section):
    - Use a temp table to help do a join.
    - Use a Common Table Expression (CTE).
- Dynamic SQL - stored procedures built without parameters are like running ad-hoc queries: no plan reuse (see the sp_executesql sketch below).
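A hedged sketch of both points, with hypothetical table and column names. First, a row-by-row cursor next to its set-based equivalent:

```sql
-- Row-based: cost grows with every row fetched.
DECLARE @id int;
DECLARE c CURSOR FOR SELECT OrderID FROM dbo.Orders WHERE Status = 'stale';
OPEN c;
FETCH NEXT FROM c INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE dbo.Orders SET Status = 'archived' WHERE OrderID = @id;
    FETCH NEXT FROM c INTO @id;
END;
CLOSE c;
DEALLOCATE c;

-- Set-based equivalent: one statement, one plan.
UPDATE dbo.Orders SET Status = 'archived' WHERE Status = 'stale';
```

And parameterized dynamic SQL via sp_executesql, which gets plan reuse where string concatenation would not:

```sql
DECLARE @sql nvarchar(max) =
    N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @CustomerID;';
EXEC sp_executesql @sql, N'@CustomerID int', @CustomerID = 42;
```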