A bash script to list or delete delete-markers in a versioning-enabled Amazon S3 bucket. Deleting an object's delete marker restores the deleted object.
This script is useful for identifying, restoring, and validating accidental deletes on Amazon S3.
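Since the script itself is not included in this excerpt, the boto3 sketch below illustrates the same idea; the bucket and prefix values are placeholders, not taken from the original script. It lists the current delete markers under a prefix and removes each one, which restores the previous version of the object.

import boto3

s3 = boto3.client('s3')
bucket, prefix = 'my-bucket', 'path/to/restore/'   # placeholder values

# List delete markers under the prefix (paginated)
paginator = s3.get_paginator('list_object_versions')
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for marker in page.get('DeleteMarkers', []):
        if marker['IsLatest']:
            print(marker['Key'], marker['VersionId'])
            # Removing the delete marker restores the previous version of the object
            s3.delete_object(Bucket=bucket, Key=marker['Key'], VersionId=marker['VersionId'])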
Step 1) Define the env variables and aliases below in ~/.bash_profile or ~/.bashrc:
# SPARK_CONF_DIR: local dir holding the Spark configuration files
export SPARK_CONF_DIR='/Users/dixitm/Workspace/conf/spark-conf-dir'
# DATA_PLATFORM_ROOT: local root dir where the Spark catalog & metastore are set up
export DATA_PLATFORM_ROOT="/Users/dixitm/Workspace/data/local-data-platform"
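As a rough illustration only (the warehouse sub-directory and config usage here are assumptions, not taken from the actual setup), a local PySpark session could derive its warehouse location from DATA_PLATFORM_ROOT, while spark-submit picks up SPARK_CONF_DIR automatically:

import os
from pyspark.sql import SparkSession

data_platform_root = os.environ['DATA_PLATFORM_ROOT']

# Point the Spark SQL catalog/warehouse at the local data platform directory
spark = (SparkSession.builder
         .appName('local-data-platform')
         .config('spark.sql.warehouse.dir', os.path.join(data_platform_root, 'warehouse'))
         .enableHiveSupport()
         .getOrCreate())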
Use the commands below to install pyenv:
$ sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
Use the steps below to run the Elasticsearch container:
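The actual steps are not included in this excerpt; as a placeholder sketch (the image tag, container name, and port mapping are assumptions), a single-node Elasticsearch container can be started from Python with the Docker SDK:

import docker

client = docker.from_env()
# Run a single-node Elasticsearch container, exposing the REST API on localhost:9200
client.containers.run(
    'docker.elastic.co/elasticsearch/elasticsearch:8.13.4',
    name='elasticsearch',
    detach=True,
    ports={'9200/tcp': 9200},
    environment={'discovery.type': 'single-node', 'xpack.security.enabled': 'false'},
)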
Create a Spark application, written in Scala or PySpark, that reads the provided signals dataset, processes the data, and stores the entire output as specified below.
For each entity_id in the signals dataset, find the item_id with the oldest month_id and the item_id with the newest month_id. In some cases this may be the same item. If two different items share the same month_id, take the item with the lower item_id. Finally, sum the count of signals for each entity and output it as total_signals. The correct output should contain one row per unique entity_id.
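One possible PySpark sketch of this logic is shown below. The column names entity_id, item_id, month_id, and count follow the description above, while the input and output paths and the use of Parquet are assumptions.

from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName('signals-aggregation').getOrCreate()
signals = spark.read.parquet('/path/to/signals')   # input path is a placeholder

# Oldest item: earliest month_id, ties broken by lower item_id; newest item: latest month_id
oldest_w = Window.partitionBy('entity_id').orderBy(F.col('month_id').asc(), F.col('item_id').asc())
newest_w = Window.partitionBy('entity_id').orderBy(F.col('month_id').desc(), F.col('item_id').asc())

result = (signals
          .withColumn('oldest_item_id', F.first('item_id').over(oldest_w))
          .withColumn('newest_item_id', F.first('item_id').over(newest_w))
          .groupBy('entity_id', 'oldest_item_id', 'newest_item_id')
          .agg(F.sum('count').alias('total_signals')))

result.write.mode('overwrite').parquet('/path/to/output')   # output path is a placeholder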
import json
import os
import boto3

dynamodb = boto3.resource('dynamodb')

def truncate_table(table_name):
    # Delete every item by scanning only the table's key attributes
    table = dynamodb.Table(table_name)
    key_names = [k['AttributeName'] for k in table.key_schema]
    scan_kwargs = {'ProjectionExpression': ', '.join(key_names)}
    with table.batch_writer() as batch:
        for item in table.scan(**scan_kwargs)['Items']:
            batch.delete_item(Key={k: item[k] for k in key_names})
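A hypothetical invocation is shown below (the table name is illustrative). Note that a single scan call returns at most 1 MB of items, so for larger tables the scan would need to be repeated with ExclusiveStartKey / LastEvaluatedKey pagination.

truncate_table('my-dev-table')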