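Keys
- partition key (hash key): its value is hashed to decide which partition stores the item; see https://stackoverflow.com/a/45581869/5733330
- sort key (range key): optional; together with the partition key it forms a composite primary key, and items that share a partition key are stored ordered by their sort key

Provisioned Throughput

Overview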
- RCU (Read Capacity Units) & WCU (Write Capacity Units)
- In provisioned capacity mode, tables must have provisioned read and write capacity units
- Read Capacity Units (RCU): throughput for reads
- Write Capacity Units (WCU): throughput for writes
- Option to set up auto-scaling of throughput to meet demand
- Throughput can be exceeded temporarily using "burst credit"
- If burst credits are exhausted, you'll get a "ProvisionedThroughputExceededException"
- It's then advised to retry with exponential back-off
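The retry advice can be sketched in Python as exponential back-off with "full jitter". This is a minimal sketch, not SDK code: `ThrottledError` is a hypothetical stand-in for the throttling error (with boto3 it surfaces as a `ClientError` whose error code is `ProvisionedThroughputExceededException`), and the AWS SDKs already retry throttled calls on their own, so this mainly illustrates the idea.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on throttling with exponential back-off.

    Before retry number n (0-based) we sleep a random time in
    [0, min(max_delay, base_delay * 2**n)] ("full jitter"), so clients
    that were throttled together do not all retry at the same moment.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")
```

The `sleep` parameter only exists so the waiting can be swapped out (e.g. in tests); in normal use the default `time.sleep` applies.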
Write Capacity Units
- One write capacity unit represents one write per second for an item up to 1 KB in size
- If the items are larger than 1 KB, more WCU are consumed (the item size is rounded up to the next whole KB)
Example: we write 10 objects per second of 2 KB each.
- We need 2*10 = 20 WCU
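The WCU arithmetic as a small Python helper (a sketch; the name `required_wcu` is ours, and transactional writes, which cost 2x, are ignored):

```python
import math


def required_wcu(writes_per_second: float, item_size_kb: float) -> int:
    """WCU for standard (non-transactional) writes.

    1 WCU = one write per second of an item up to 1 KB; larger items are
    rounded up to the next whole KB.
    """
    units_per_write = math.ceil(item_size_kb)  # e.g. 4.5 KB -> 5 units
    return math.ceil(writes_per_second * units_per_write)
```

`required_wcu(10, 2)` gives the 20 WCU of the example above; `required_wcu(6, 4.5)` gives 30 because 4.5 KB is rounded up to 5 KB; 120 writes per minute of 2 KB items is `required_wcu(120 / 60, 2)`, i.e. 4 WCU.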
Strongly Consistent Read vs Eventually Consistent Read

Eventually Consistent Read: If we read just after a write, we may get stale data, because the write may not have been replicated yet.

Strongly Consistent Read: If we read just after a write, we'll get the most up-to-date data.

By default, DynamoDb uses Eventually Consistent Reads, but GetItem, Query & Scan provide a "ConsistentRead" parameter that you can set to True.

Read Capacity Units
- One read capacity unit represents one strongly consistent read per second, or 2 eventually consistent reads per second, for an item up to 4 KB in size.
- If the items are larger than 4 KB, more RCU are consumed (the item size is rounded up to the next multiple of 4 KB)
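The same arithmetic for reads, as a hypothetical helper `required_rcu` (a sketch that ignores transactional reads):

```python
import math


def required_rcu(reads_per_second: float, item_size_kb: float,
                 strongly_consistent: bool = True) -> int:
    """RCU for standard (non-transactional) reads.

    1 RCU = one strongly consistent read per second, or two eventually
    consistent reads per second, of an item up to 4 KB; larger items are
    rounded up to the next multiple of 4 KB.
    """
    units_per_read = math.ceil(item_size_kb / 4)  # e.g. 6 KB -> 2 units
    rcu = reads_per_second * units_per_read
    if not strongly_consistent:
        rcu /= 2  # eventually consistent reads cost half as much
    return math.ceil(rcu)
```

`required_rcu(10, 4)` is 10 RCU; `required_rcu(16, 12, strongly_consistent=False)` is 24 RCU (12 KB = 3 units, 16 x 3 / 2); `required_rcu(10, 6)` is 20 RCU because 6 KB rounds up to 8 KB, i.e. 2 units per read.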
Basic APIs

Writing Data
- PutItem: Write data to DynamoDb (create data or full replace)
- UpdateItem: Update data in DynamoDb (partial update of attributes)
- Atomic counters: You can use the UpdateItem operation to implement an atomic counter, a numeric attribute that is incremented unconditionally, without interfering with other write requests.
- Conditional Writes: Accept a write / update only if conditions are respected, otherwise reject.
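A minimal sketch of both ideas against boto3's low-level DynamoDB client (`client.update_item` / `client.put_item`); the table and attribute names are hypothetical. The counter uses an `ADD` update expression, which treats a missing attribute as 0. The conditional put uses `attribute_not_exists`, so it only creates the item if no item with that key exists yet; when the condition fails, DynamoDB rejects the write and boto3 raises `ConditionalCheckFailedException`.

```python
from typing import Any, Dict


def increment_counter(client: Any, table: str, key: Dict[str, Any],
                      attribute: str, by: int = 1) -> int:
    """Atomically add `by` to a numeric attribute and return the new value."""
    response = client.update_item(
        TableName=table,
        Key=key,
        UpdateExpression="ADD #attr :inc",  # ADD starts from 0 if the attribute is missing
        ExpressionAttributeNames={"#attr": attribute},
        ExpressionAttributeValues={":inc": {"N": str(by)}},
        ReturnValues="UPDATED_NEW",  # only the updated attributes come back
    )
    return int(response["Attributes"][attribute]["N"])


def put_if_absent(client: Any, table: str, item: Dict[str, Any],
                  partition_key: str) -> bool:
    """Create `item` only if no item with the same key exists yet."""
    try:
        client.put_item(
            TableName=table,
            Item=item,
            ConditionExpression="attribute_not_exists(#pk)",
            ExpressionAttributeNames={"#pk": partition_key},
        )
        return True
    except client.exceptions.ConditionalCheckFailedException:
        return False  # an item with this key already exists
```

With a real client (`boto3.client("dynamodb")`), `increment_counter(client, "pages", {"page_id": {"S": "home"}}, "views")` would bump a hypothetical `views` counter without reading it first.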
DeleteItem
- Delete an individual row
- Ability to perform a conditional delete

DeleteTable
- Delete a whole table and all its items
- Much quicker deletion than calling DeleteItem on all items

Batching Writes

BatchWriteItem
- Up to 25 PutItem and / or DeleteItem in one call
- Up to 16 MB of data written
- Up to 400 KB of data per item

Batching allows you to reduce latency by reducing the number of API calls made against DynamoDb.

Operations are done in parallel for better efficiency.

It's possible for part of a batch to fail, in which case we have to retry the failed items (returned in UnprocessedItems) using an exponential back-off algorithm.
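A sketch of batching plus retrying the failed part, assuming boto3's low-level client. Requests use the BatchWriteItem shapes `{"PutRequest": {"Item": ...}}` / `{"DeleteRequest": {"Key": ...}}`; the function name and retry parameters are our own choices.

```python
import random
import time
from typing import Any, Callable, Dict, List

MAX_BATCH_SIZE = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def batch_write_all(
    client: Any,
    table: str,
    requests: List[Dict[str, Any]],
    max_retries: int = 8,
    base_delay: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Write `requests` to `table` in batches of 25.

    Whatever DynamoDB reports back in UnprocessedItems is resent after an
    exponentially growing, jittered pause.
    """
    for start in range(0, len(requests), MAX_BATCH_SIZE):
        pending: Dict[str, Any] = {table: requests[start:start + MAX_BATCH_SIZE]}
        attempt = 0
        while pending:
            response = client.batch_write_item(RequestItems=pending)
            pending = response.get("UnprocessedItems") or {}
            if pending:
                if attempt >= max_retries:
                    raise RuntimeError("some writes are still unprocessed after retries")
                sleep(random.uniform(0, base_delay * 2 ** attempt))
                attempt += 1
```

Only the unprocessed part of a batch is resent, never the whole batch.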
Reading Data

GetItem
- Read based on Primary Key
- Primary Key = HASH or HASH-RANGE (partition key alone, or partition key + sort key)
- Eventually consistent read by default
- Option to use strongly consistent reads (more RCU - might take longer)
- A ProjectionExpression can be specified to retrieve only certain attributes
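A GetItem sketch with boto3's low-level client, showing `ConsistentRead` and `ProjectionExpression` together. The `users` table keyed only by `user_id`, and its `name` / `email` attributes, are hypothetical; `name` is a DynamoDB reserved word, hence the `#n` placeholder.

```python
from typing import Any, Dict, Optional


def get_user_summary(client: Any, table: str, user_id: str,
                     strongly_consistent: bool = False) -> Optional[Dict[str, Any]]:
    """Fetch only the `name` and `email` attributes of one user item."""
    response = client.get_item(
        TableName=table,
        Key={"user_id": {"S": user_id}},     # full primary key (here: partition key only)
        ConsistentRead=strongly_consistent,  # True = strongly consistent, uses more RCU
        ProjectionExpression="#n, #e",       # only return these attributes
        ExpressionAttributeNames={"#n": "name", "#e": "email"},
    )
    return response.get("Item")  # no "Item" key when nothing matches
```

The projection reduces the data sent back, not the RCU consumed: capacity is still based on the full item size.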
BatchGetItem
- Up to 100 items
- Up to 16 MB of data
- Items are retrieved in parallel to minimize latency
- Keys that could not be read are returned in UnprocessedKeys and should be retried
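A sketch of reading many keys with BatchGetItem (boto3 low-level client assumed; function name and retry settings are ours). Keys are sent 100 at a time and anything in `UnprocessedKeys` is retried with back-off. BatchGetItem does not return items in key order, so the result is an unordered list.

```python
import random
import time
from typing import Any, Callable, Dict, List

MAX_KEYS_PER_CALL = 100  # BatchGetItem reads at most 100 items per call


def batch_get_all(client: Any, table: str, keys: List[Dict[str, Any]],
                  max_retries: int = 8, base_delay: float = 0.05,
                  sleep: Callable[[float], None] = time.sleep) -> List[Dict[str, Any]]:
    """Read every key in `keys`, retrying whatever comes back in UnprocessedKeys."""
    items: List[Dict[str, Any]] = []
    for start in range(0, len(keys), MAX_KEYS_PER_CALL):
        pending: Dict[str, Any] = {table: {"Keys": keys[start:start + MAX_KEYS_PER_CALL]}}
        attempt = 0
        while pending:
            response = client.batch_get_item(RequestItems=pending)
            items.extend(response.get("Responses", {}).get(table, []))
            pending = response.get("UnprocessedKeys") or {}
            if pending:
                if attempt >= max_retries:
                    raise RuntimeError("some keys are still unprocessed after retries")
                sleep(random.uniform(0, base_delay * 2 ** attempt))
                attempt += 1
    return items
```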
Query

Query returns items based on:
- PartitionKey value (must be "=" operator)
- SortKey value (=, <, <=, >, >=, BETWEEN, begins_with) - optional
- FilterExpression to further filter the results (applied after the items are read, so it does not reduce the RCU consumed)

Returns:
- Up to 1 MB of data
- Or the number of items specified in Limit

Able to do pagination on the results.

Can query a table, a local secondary index, or a global secondary index.
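A Query sketch with a key condition and pagination, assuming boto3's low-level client. The table layout (partition key `customer_id`, sort key `order_date` stored as ISO-8601 strings) is hypothetical. DynamoDB returns `LastEvaluatedKey` while more pages remain; passing it back as `ExclusiveStartKey` continues the read.

```python
from typing import Any, Dict, List


def query_orders_since(client: Any, table: str, customer_id: str,
                       since: str) -> List[Dict[str, Any]]:
    """Return all orders of one customer with order_date >= `since`."""
    request: Dict[str, Any] = {
        "TableName": table,
        # partition key must use "="; the sort key may use a range operator
        "KeyConditionExpression": "#pk = :c AND #sk >= :d",
        "ExpressionAttributeNames": {"#pk": "customer_id", "#sk": "order_date"},
        "ExpressionAttributeValues": {":c": {"S": customer_id}, ":d": {"S": since}},
    }
    items: List[Dict[str, Any]] = []
    while True:
        page = client.query(**request)  # each call reads at most 1 MB of data
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:  # no LastEvaluatedKey: this was the last page
            return items
        request["ExclusiveStartKey"] = last_key  # resume after the last item read
```

To query an LSI or GSI instead of the table, add an `IndexName` entry to the request.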
Scan
- Scan the entire table and then filter out data (inefficient)
- Returns up to 1 MB of data - use pagination to keep on reading
- Consumes a lot of RCU
- Limit impact using Limit, or reduce the size of the result and pause
- For faster performance, use parallel scans:
  - Multiple workers scan separate segments of the table at the same time
  - Increases the throughput and RCU consumed
  - Limit the impact of parallel scans just like you would for Scans
- Can use a ProjectionExpression + FilterExpression (no change to RCU)
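A parallel scan sketch: each worker thread passes its own `Segment` number plus the shared `TotalSegments`, and pages through its segment independently. It assumes a boto3 low-level client shared across threads (boto3's documentation describes low-level clients as generally thread-safe); the function names and the default of 4 segments are ours.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def scan_segment(client: Any, table: str, segment: int,
                 total_segments: int) -> List[Dict[str, Any]]:
    """Scan one segment to the end, following LastEvaluatedKey."""
    request: Dict[str, Any] = {
        "TableName": table,
        "Segment": segment,               # which slice this worker reads
        "TotalSegments": total_segments,  # how many slices the table is split into
    }
    items: List[Dict[str, Any]] = []
    while True:
        page = client.scan(**request)
        items.extend(page.get("Items", []))
        if "LastEvaluatedKey" not in page:
            return items
        request["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def parallel_scan(client: Any, table: str, total_segments: int = 4) -> List[Dict[str, Any]]:
    """Scan all segments concurrently, one thread per segment."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [pool.submit(scan_segment, client, table, s, total_segments)
                   for s in range(total_segments)]
        return [item for f in futures for item in f.result()]
```

Adding a `Limit` to each request keeps individual pages small, which is one way to soften the RCU spike.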
Indexes (GSI + LSI)
- LSI: Local Secondary Indexes keep the table's original hash key. If a table has hash+range, think of its LSIs as hash+range1, hash+range2, ... hash+range6: you get up to 5 more range attributes to query on. LSIs must be defined when the table is created, and they share the table's provisioned throughput.
- GSI: Global Secondary Indexes can use entirely different hash/range keys per index, which breaks the original "one hash key per table" model. This is also why each GSI needs its own provisioned throughput, which you pay for.
DynamoDb Concurrency
- DynamoDb has a feature called "Conditional Update / Delete"
- That means you can ensure an item hasn't changed before altering it
- That makes DynamoDb suitable for optimistic locking / optimistic concurrency control
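Optimistic locking can be sketched with a version-number attribute and a conditional UpdateItem (boto3 low-level client assumed; the `users` table, `email` and `version` attributes are hypothetical). The write only succeeds if the version is still the one we read; otherwise boto3 raises `ConditionalCheckFailedException` and the caller should re-read and retry. A conditional delete works the same way by passing a `ConditionExpression` to `delete_item`.

```python
from typing import Any


def update_email_if_unchanged(client: Any, table: str, user_id: str,
                              new_email: str, expected_version: int) -> bool:
    """Optimistic locking: write only if nobody changed the item since we read it."""
    try:
        client.update_item(
            TableName=table,
            Key={"user_id": {"S": user_id}},
            UpdateExpression="SET #e = :email, #v = :next",
            ConditionExpression="#v = :expected",  # reject if the version moved on
            ExpressionAttributeNames={"#e": "email", "#v": "version"},
            ExpressionAttributeValues={
                ":email": {"S": new_email},
                ":expected": {"N": str(expected_version)},
                ":next": {"N": str(expected_version + 1)},
            },
        )
        return True
    except client.exceptions.ConditionalCheckFailedException:
        return False  # lost the race: re-read the item and try again
```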
DynamoDb TTL (Time to Live)
- TTL = automatically delete an item after an expiry date / time
- TTL is provided at no extra cost; deletions do not use WCU / RCU
- TTL is a background task operated by the DynamoDb service itself
- Helps reduce storage and manage the table size over time
- Helps adhere to regulatory norms
- TTL is enabled on the table by choosing a TTL attribute; each item (row) that should expire stores its expiry time in that attribute as a Unix epoch timestamp in seconds (Number type)
- DynamoDb typically deletes expired items within 48 hours of expiration
- Items deleted due to TTL are also deleted in GSI / LSI
- DynamoDb Streams can help recover expired items
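A TTL sketch, assuming boto3's low-level client: `update_time_to_live` turns TTL on for a table, and each item carries its expiry time as epoch seconds. The `sessions` table and `expires_at` attribute names are hypothetical.

```python
import time
from typing import Any, Dict, Optional


def enable_ttl(client: Any, table: str, attribute: str) -> None:
    """Turn on TTL for `table`, using `attribute` as the expiry-time attribute."""
    client.update_time_to_live(
        TableName=table,
        TimeToLiveSpecification={"Enabled": True, "AttributeName": attribute},
    )


def with_expiry(item: Dict[str, Any], attribute: str, lifetime_seconds: int,
                now: Optional[float] = None) -> Dict[str, Any]:
    """Return a copy of `item` that expires `lifetime_seconds` from now.

    TTL values are Unix epoch timestamps in seconds, stored as a Number.
    """
    now = time.time() if now is None else now
    return {**item, attribute: {"N": str(int(now + lifetime_seconds))}}
```

For example, `with_expiry({"session_id": {"S": "abc"}}, "expires_at", 3600, now=1_600_000_000)` adds `"expires_at": {"N": "1600003600"}`, ready to be written with PutItem.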
DynamoDb CLI - Good to Know
- --projection-expression: attributes to retrieve
- --filter-expression: filter results
- General CLI pagination options (including DynamoDb / S3):
  - Optimization:
    - --page-size: the full dataset is still received, but each API call requests less data (helps avoid timeouts)
  - Pagination:
    - --max-items: max number of results returned by the CLI; returns a NextToken
    - --starting-token: specify the last received NextToken to keep on reading
DynamoDb Transactions
- Transaction = Ability to Create / Update / Delete multiple rows in different tables at the same time.
- It's an "all or nothing" type of operation.
- Write Modes: Standard, Transactional
- Read Modes: Eventual Consistency, Strong Consistency, Transactional
- Transactional reads / writes consume 2x the RCU / WCU of standard ones
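A transaction sketch with boto3's low-level `transact_write_items`: two updates in an `accounts` table and a put in a `transfers` table either all succeed or none do. Tables, attributes and the credit-transfer scenario are hypothetical. If any condition fails, DynamoDB cancels the whole transaction and boto3 raises `TransactionCanceledException`.

```python
from typing import Any


def transfer_credits(client: Any, from_user: str, to_user: str, amount: int,
                     transfer_id: str) -> None:
    """Move credits between two users and log the transfer, all or nothing."""
    amount_value = {"N": str(amount)}
    client.transact_write_items(
        TransactItems=[
            {   # debit, but only if the balance covers the amount
                "Update": {
                    "TableName": "accounts",
                    "Key": {"user_id": {"S": from_user}},
                    "UpdateExpression": "SET #b = #b - :amt",
                    "ConditionExpression": "#b >= :amt",
                    "ExpressionAttributeNames": {"#b": "balance"},
                    "ExpressionAttributeValues": {":amt": amount_value},
                }
            },
            {   # credit the receiver
                "Update": {
                    "TableName": "accounts",
                    "Key": {"user_id": {"S": to_user}},
                    "UpdateExpression": "ADD #b :amt",
                    "ExpressionAttributeNames": {"#b": "balance"},
                    "ExpressionAttributeValues": {":amt": amount_value},
                }
            },
            {   # record the transfer in a second table, at most once
                "Put": {
                    "TableName": "transfers",
                    "Item": {
                        "transfer_id": {"S": transfer_id},
                        "from_user": {"S": from_user},
                        "to_user": {"S": to_user},
                        "amount": amount_value,
                    },
                    "ConditionExpression": "attribute_not_exists(transfer_id)",
                }
            },
        ]
    )
```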
DynamoDb as Session State Cache
- It's common to use DynamoDb to store session state.
- vs ElastiCache:
  - ElastiCache is in-memory, but DynamoDb is serverless
  - Both are key / value stores
- vs EFS:
  - EFS must be attached to EC2 instances as a network drive
- vs EBS & Instance Store:
  - EBS & Instance Store can only be used for local caching, not shared caching
- vs S3:
  - S3 has higher latency and is not meant for small objects
DynamoDb Write Sharding
- Imagine we have a voting application with 2 candidates, candidate A and candidate B.
- If we use a partition key of candidate_id, we will run into hot partition issues, as all writes go to only 2 partition key values.
- Solution: add a suffix to the partition key value (usually a random suffix, sometimes a calculated suffix).
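A write-sharding sketch in plain Python. The `#` separator, the 10-shard count and the function names are assumptions for illustration. A random suffix spreads writes evenly; a calculated suffix (here a hash of a hypothetical `voter_id`) always maps the same voter to the same shard. Either way, counting a candidate's votes means querying every shard key and summing.

```python
import hashlib
import random
from typing import List

SHARD_COUNT = 10  # assumption: 10 suffixes per candidate


def random_shard_key(candidate_id: str) -> str:
    """Random suffix: spreads writes evenly, but reads must query every shard."""
    return f"{candidate_id}#{random.randrange(SHARD_COUNT)}"


def calculated_shard_key(candidate_id: str, voter_id: str) -> str:
    """Calculated suffix: the same voter always maps to the same shard."""
    digest = hashlib.sha256(voter_id.encode("utf-8")).hexdigest()
    return f"{candidate_id}#{int(digest, 16) % SHARD_COUNT}"


def all_shard_keys(candidate_id: str) -> List[str]:
    """Every partition key to query (and sum) when counting one candidate's votes."""
    return [f"{candidate_id}#{i}" for i in range(SHARD_COUNT)]
```

The calculated variant lets you find one voter's item again with a single GetItem, because the key can be recomputed from `voter_id`.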
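DynamoDb - Write Types
- Concurrent Writes: two writes to the same item at the same time both succeed; the last one overwrites the first
- Conditional Writes: a write succeeds only if its condition holds (e.g. "update only if value = 1"), so the second of two conflicting writes fails
- Atomic Writes: e.g. two concurrent "increase by 1" and "increase by 2" updates both apply, and the value increases by 3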
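DynamoDb - Large Objects Pattern
- Items are limited to 400 KB, so large objects are stored in S3 while DynamoDb keeps a small item with the S3 object key and metadata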
DynamoDb Operations

Table Cleanup
- Option 1: Scan + Delete = very slow, expensive, consumes RCU & WCU
- Option 2: Drop Table + Recreate Table = fast, cheap, efficient

Copying a DynamoDb Table:
- Option 1: Use AWS Data Pipeline (uses EMR)
- Option 2: Create a backup and restore the backup into a new table name (can take some time)
- Option 3: Scan + Write (PutItem / BatchWriteItem) => write your own code
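Option 3 as a sketch, assuming boto3's low-level client and a `target` table that already exists with the same key schema as `source` (the function name and retry settings are ours). It pages through a Scan and writes each page with BatchWriteItem, resending UnprocessedItems with back-off.

```python
import random
import time
from typing import Any, Callable, Dict, List

MAX_BATCH_SIZE = 25  # BatchWriteItem limit


def copy_table(client: Any, source: str, target: str, max_retries: int = 8,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Copy every item of `source` into the existing table `target`.

    Returns the number of items copied. Consumes RCU on `source` (Scan)
    and WCU on `target` (writes) in proportion to the table size.
    """
    copied = 0
    scan_request: Dict[str, Any] = {"TableName": source}
    while True:
        page = client.scan(**scan_request)
        items: List[Dict[str, Any]] = page.get("Items", [])
        for start in range(0, len(items), MAX_BATCH_SIZE):
            chunk = items[start:start + MAX_BATCH_SIZE]
            pending: Dict[str, Any] = {target: [{"PutRequest": {"Item": i}} for i in chunk]}
            for attempt in range(max_retries + 1):
                response = client.batch_write_item(RequestItems=pending)
                pending = response.get("UnprocessedItems") or {}
                if not pending:
                    break
                sleep(random.uniform(0, 0.05 * 2 ** attempt))  # back off before resending
            else:
                raise RuntimeError("writes still unprocessed after retries")
        copied += len(items)
        if "LastEvaluatedKey" not in page:
            return copied
        scan_request["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```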
DynamoDb - Security & Other features
- Security:
  - VPC Endpoints available to access DynamoDb without going over the internet
  - Access fully controlled by IAM
  - Encryption at rest using KMS
  - Encryption in transit using SSL / TLS
- Backup and Restore feature available
  - Point in time restore like RDS
  - No performance impact
- Global Tables
  - Multi-region, fully replicated, high performance
- AWS DMS (Database Migration Service) can be used to migrate to DynamoDb (from MongoDB, Oracle, MySQL, S3, etc.)
- You can launch a local DynamoDb on your computer for development purposes
Created March 5, 2021 12:26
AWS DynamoDb (gist nikkaroraa/805dbe203040494ebe5a4c29cff1e2cd)