dharamsk’s gists

dharamsk / load.py

Created November 21, 2024 17:27

Florida corporate data scrape


	# portal page: https://dos.fl.gov/sunbiz/other-services/data-downloads/corporate-data-file/
	# log in to SFTP: https://sftp.floridados.gov/ (credentials are listed on the portal page above)
	# download contents of /doc/cor/
	# run this script in that directory

	# paste schema from webpage into LLM and ask for it in a structured format
	# https://dos.fl.gov/sunbiz/other-services/data-downloads/corporate-data-file/file-structure/

	FIELD_SPECS = [
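The preview cuts off before the field list, so its contents are not reproduced here. As a rough sketch of how such a list might be used, assuming the schema describes fixed-width records (a name, 1-based start position, and length per field), the parsing step could look like the following. `FieldSpec`, `EXAMPLE_SPECS`, `parse_record`, and every field name and position are hypothetical placeholders, not values from the official file structure.

```python
from typing import NamedTuple


class FieldSpec(NamedTuple):
    name: str    # column name (hypothetical)
    start: int   # 1-based start position within the line
    length: int  # field width in characters


# Hypothetical entries for illustration only; the real list comes from the
# file-structure page linked in the comments above.
EXAMPLE_SPECS = [
    FieldSpec("corp_number", 1, 12),
    FieldSpec("corp_name", 13, 20),
    FieldSpec("status", 33, 1),
]


def parse_record(line: str, specs: list[FieldSpec]) -> dict[str, str]:
    """Slice one fixed-width line into a dict of whitespace-stripped values."""
    return {
        spec.name: line[spec.start - 1 : spec.start - 1 + spec.length].strip()
        for spec in specs
    }


# Build a sample line that matches the hypothetical layout above.
sample = "P12345678901" + "ACME WIDGETS INC".ljust(20) + "A"
record = parse_record(sample, EXAMPLE_SPECS)
```

Keeping the layout in a data structure rather than in hard-coded slices means a schema change only touches the spec list.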

dharamsk / tf_big_plans.py

Created April 23, 2020 20:38

Display relevant terraform policy diffs, omitting redundant items

	#!/usr/bin/env python3

	"""
	This Python script improves the usability of terraform 0.12 by
	eliminating the display of redundant changes, typically found in
	resource attributes like policy/policy_data for AWS/GCP providers.

	This is a known limitation with the legacy terraform SDK, as described here:
	https://github.com/hashicorp/terraform/issues/21901
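The preview ends before the script's logic, so its actual approach is not shown. One way to illustrate the general idea of hiding redundant policy changes, assuming the old and new policy documents are available as JSON strings, is to normalize both before diffing so that key order and whitespace never register as changes. `normalize` and `policy_diff` are hypothetical helper names, and this is a sketch of the idea rather than the gist's method.

```python
import difflib
import json


def normalize(policy_json: str) -> list[str]:
    """Pretty-print a JSON document with sorted keys, one line per element."""
    return json.dumps(json.loads(policy_json), indent=2, sort_keys=True).splitlines()


def policy_diff(before: str, after: str) -> list[str]:
    """Return only the added/removed lines between two JSON policy documents."""
    diff = difflib.unified_diff(
        normalize(before), normalize(after), "before", "after", n=0, lineterm=""
    )
    # Drop the file headers and hunk markers, keeping just the changed lines.
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


before = '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": "s3:GetObject"}]}'
reordered = '{"Statement": [{"Action": "s3:GetObject", "Effect": "Allow"}], "Version": "2012-10-17"}'
changed = '{"Version": "2012-10-17", "Statement": [{"Effect": "Allow", "Action": "s3:PutObject"}]}'

no_change = policy_diff(before, reordered)
real_change = policy_diff(before, changed)
```

Here `no_change` is empty because the two documents differ only in key order, while `real_change` holds one removed and one added line for the `Action` value.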

dharamsk / bigquery_relax_schema_on_all_tables.py

Created April 3, 2020 18:41

This Python snippet modifies every schema in a dataset to "relax" REQUIRED columns to NULLABLE. Here it is applied only to tables modified in the last 24 hours, but it could be adapted to run other operations across all tables in a dataset.

	# for all tables modified in the last 24 hours
	# relax all columns to be NULLABLE instead of REQUIRED
	# Python3

	from google.cloud import bigquery
	from datetime import datetime, timedelta

	CLIENT = bigquery.Client() # auth using default credentials/project

	DATASET = 'your_dataset'
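The rest of the snippet is not shown in the preview. A minimal sketch of the relaxation step, assuming the schema is handled in its API (dict) form so nested RECORD fields can be relaxed recursively, might look like this. `relax_fields` and `relax_recent_tables` are hypothetical names; the client calls used (`list_tables`, `get_table`, `update_table`, `SchemaField.to_api_repr`, `SchemaField.from_api_repr`) belong to the `google-cloud-bigquery` library, and the second function is defined but not called here since it needs credentials.

```python
from datetime import datetime, timedelta, timezone


def relax_fields(fields):
    """Return a copy of API-style schema dicts with REQUIRED changed to NULLABLE.

    Recurses into the nested "fields" of RECORD columns.
    """
    relaxed = []
    for field in fields:
        new_field = dict(field)
        if new_field.get("mode") == "REQUIRED":
            new_field["mode"] = "NULLABLE"
        if "fields" in new_field:
            new_field["fields"] = relax_fields(new_field["fields"])
        relaxed.append(new_field)
    return relaxed


def relax_recent_tables(dataset, hours=24):
    """Relax every table in `dataset` that was modified in the last `hours` hours."""
    # Imported here so relax_fields stays usable without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()  # default credentials/project, as in the snippet above
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    for item in client.list_tables(dataset):
        table = client.get_table(item.reference)
        if table.modified < cutoff:
            continue
        current = [f.to_api_repr() for f in table.schema]
        table.schema = [
            bigquery.SchemaField.from_api_repr(f) for f in relax_fields(current)
        ]
        client.update_table(table, ["schema"])


example_schema = [
    {"name": "id", "type": "INTEGER", "mode": "REQUIRED"},
    {
        "name": "address",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [{"name": "zip", "type": "STRING", "mode": "REQUIRED"}],
    },
    {"name": "tags", "type": "STRING", "mode": "REPEATED"},
]
relaxed_schema = relax_fields(example_schema)
```

REPEATED columns are left alone, since only the REQUIRED to NULLABLE change is the relaxation described above.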

dharamsk / alter_table_attributes.py

Created October 17, 2018 05:40

Redshift: Programmatically configure dist style and sort key of *existing* tables

	# I wrote this script to alter 700 Redshift tables to DISTSTYLE ALL (and remove their sort keys),
	# but it is partially set up to specify any dist style and sort key on a table-by-table basis;
	# all that's needed is to modify the main() function to accept a dict with the table configs

	# Redshift clusters with large node types will waste disk space and network bandwidth
	# when small tables use EVEN or DISTKEY dist styles
	# sort keys will double the minimum size of a table, also wasting space
	# see here for more on minimum table size calculation:
	# https://aws.amazon.com/premiumsupport/knowledge-center/redshift-cluster-storage-space/
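The script body is not shown in the preview, and a 2018 script may well have used a different mechanism. As a sketch of the "dict with the table configs" idea mentioned in the comments, the following builds statements using the `ALTER TABLE ... ALTER DISTSTYLE` and `ALTER SORTKEY` forms that current Redshift supports. `alter_statements` and the config keys (`diststyle`, `distkey`, `sortkey`) are hypothetical, as are the table names.

```python
def alter_statements(table_configs):
    """Build ALTER TABLE statements from {table: {"diststyle", "distkey", "sortkey"}}.

    Missing keys default to DISTSTYLE ALL and no sort key. Identifiers are
    interpolated as-is, so the config must come from a trusted source.
    """
    statements = []
    for table, config in table_configs.items():
        diststyle = config.get("diststyle", "ALL").upper()
        if diststyle == "KEY":
            statements.append(
                f"ALTER TABLE {table} ALTER DISTSTYLE KEY DISTKEY {config['distkey']};"
            )
        else:
            statements.append(f"ALTER TABLE {table} ALTER DISTSTYLE {diststyle};")

        sortkey = config.get("sortkey")
        if sortkey:
            statements.append(
                f"ALTER TABLE {table} ALTER SORTKEY ({', '.join(sortkey)});"
            )
        else:
            statements.append(f"ALTER TABLE {table} ALTER SORTKEY NONE;")
    return statements


statements = alter_statements(
    {
        "public.small_lookup": {},
        "public.events": {
            "diststyle": "key",
            "distkey": "user_id",
            "sortkey": ["created_at"],
        },
    }
)
```

Generating the statements separately from executing them makes it easy to review the full batch before running it against a cluster.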

dharamsk / debug_avro_schemas.py

Created April 14, 2018 00:33

Avro Debugger Script

	import random
	import argparse
	import mock
	from where_ever.stream_handler import *
	from where_ever.tests.test_stream_handler import *



	"""
	This will check an avro schema against an example data record
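The docstring is truncated and the script's implementation is not visible. To illustrate what checking a schema against an example record can involve, here is a small hand-rolled sketch that walks a parsed Avro schema and reports the path of each mismatch. It covers only primitives, records, arrays, and unions, not the full Avro specification, and `PRIMITIVES` and `find_errors` are hypothetical names rather than anything from the gist or an Avro library.

```python
def _is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)


def _is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


PRIMITIVES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "int": _is_int,
    "long": _is_int,
    "float": _is_number,
    "double": _is_number,
    "string": lambda v: isinstance(v, str),
    "bytes": lambda v: isinstance(v, bytes),
}


def find_errors(schema, value, path="$"):
    """Return a list of 'path: problem' strings; an empty list means a match."""
    if isinstance(schema, list):  # union: the value must fit at least one branch
        if any(not find_errors(branch, value, path) for branch in schema):
            return []
        return [f"{path}: {value!r} matches no branch of union {schema}"]
    if isinstance(schema, str):  # primitive type name
        check = PRIMITIVES.get(schema)
        if check is None:
            return [f"{path}: unsupported type {schema!r}"]
        return [] if check(value) else [f"{path}: expected {schema}, got {value!r}"]

    kind = schema["type"]
    if kind == "record":
        if not isinstance(value, dict):
            return [f"{path}: expected record, got {value!r}"]
        errors = []
        for field in schema["fields"]:
            name = field["name"]
            if name not in value:
                if "default" not in field:
                    errors.append(f"{path}.{name}: missing field")
                continue
            errors.extend(find_errors(field["type"], value[name], f"{path}.{name}"))
        return errors
    if kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array, got {value!r}"]
        errors = []
        for i, item in enumerate(value):
            errors.extend(find_errors(schema["items"], item, f"{path}[{i}]"))
        return errors
    # Wrapped form such as {"type": "string"}, or a type this sketch does not cover.
    return find_errors(kind, value, path)


schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": ["null", "int"]},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
    ],
}
good = find_errors(schema, {"name": "a", "age": None, "tags": ["x"]})
bad = find_errors(schema, {"name": 1, "tags": ["x", 2]})
```

Reporting a path per problem, rather than a single pass/fail result, is what makes this kind of check useful for debugging a large record.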

Dharam dharamsk