Skip to content

Instantly share code, notes, and snippets.

@arianagiorgi
arianagiorgi / README.md
Created December 21, 2017 21:47
Uploading Large CSVs in Postgres

Uploading Large CSVs in Postgres

When importing a large data csv file into postgres, sometimes importing with csvkit alone will stall, and it's useful to use csvkit to build the table and the postgres COPY command to get the rest of the data in there.

You will need to pip install csvkit as well as psycopg2, preferably in a virtualenv.

  1. First we'll make the CREATE TABLE statement and pipe it into a sql file. This example uses the first 100 lines of the file to determine the data type of each field, but this can be adjusted as needed.
  • csvfile.csv = large data file
  • yourtable = table you want to create
  • outputfile.sql = file which will contain CREATE TABLE statement
@arianagiorgi
arianagiorgi / README.md
Last active June 27, 2017 19:43
AWS Lambda Automation

Automation with AWS Lambda

Setup Python-lambda

We'll be using Python-lambda

  1. Create new directory, virtualenv. If you're attaching this scraper to an interactive, you can just add a new folder to the main directory.

  2. (venv) $ pip install python-lambda

@arianagiorgi
arianagiorgi / README.md
Last active July 27, 2018 21:41
PostGIS Conversion

PostGIS Conversion

Shapefile > PostGIS

Easiest way to do this is through shp2pgsql.

If you're uploading to a database that isn't already primed for geometries, you'll have to run CREATE EXTENSION postgis;

Example:

@arianagiorgi
arianagiorgi / README.md
Last active December 14, 2022 01:51
Jittering overlapping GeoJSON points

Jittering GeoJSON points

Problem: You have geometry points that have the same (or nearly same) lat and lon in your GeoJSON/shp file and when you map them they overlap.

Solution: Using Leaflet... You can use the markercluster plugin and spiderfy the cluster. Using MapboxGL... you'll have to add random jitter (slightly change the lat and lon of the overlapping points). This process will take you through how to do that using PostGIS and QGIS.

  1. Get data into PostGIS

  2. You can use the following SQL query to determine what points will need to be jittered. This will depend on what your fields are. In my dataset, some points had the exact same geom (lat/lon) while other had a slightly different lat/lon but the same address. In both cases, I wanted to jitter these points.