@camallen
camallen / code_used_to_break_and_fix_workflow_and_translation_strings_data.rb
Created October 22, 2019 09:56
Broken survey task labels on AmazonCam Tambopata project (invalid translation strings data)
# 1. The underlying cause of the issue
# ~17:00-17:30 on Friday 18th October, 2019
# the commands I used to work on the translation fix issue
# fix the workflow strings for an older survey task (missing descriptions)
# https://www.zooniverse.org/lab/3040/workflows/2485
workflow = Workflow.find 2485
tasks = workflow.tasks
task_string_extractor = TasksVisitors::ExtractStrings.new
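# A hedged guess at how this continues; the gist preview stops here, so the
# visitor call and the final save below are assumptions, not confirmed steps.
# task_string_extractor.visit(tasks)
# workflow.update!(tasks: tasks)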
@camallen
camallen / one_task_workflows_with_single_questions.sql
Created September 20, 2019 13:01
Query to find all one-task workflows whose only task is a single-question ('single' type) task
select id, tasks --(kv.value -> 'type')
from
workflows w,
jsonb_each(w.tasks) kv
WHERE kv.value -> 'type' = '"single"'
AND id IN (
select id FROM
(SELECT id, (SELECT COUNT(*) FROM jsonb_object_keys(tasks)) as keys FROM workflows) as keys_tuple
WHERE keys = 1
)
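-- A hedged variant (an assumption, not part of the gist): also pull the
-- question translation key for each matching single-question workflow
select w.id, kv.key as task_key, kv.value ->> 'question' as question_key
from
workflows w,
jsonb_each(w.tasks) kv
WHERE kv.value -> 'type' = '"single"'
AND (SELECT COUNT(*) FROM jsonb_object_keys(w.tasks)) = 1;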
@camallen
camallen / aggregation_caesar_via_docker.sh
Last active August 6, 2019 15:01
Run the aggregation-for-caesar code offline via docker
# use docker to run the aggregation code container with the current dir as /data
docker run --rm -it --name aggregation_caesar -v ${PWD}:/data zooniverse/aggregation-for-caesar:latest bash
# based on the examples in the documentation at https://aggregation-caesar.zooniverse.org/Scripts.html
# configure the extractors/reducers from the workflows export
panoptes_aggregation config /data/workflows.csv $WORKFLOW_ID
# extract the data for the different tasks
# note the extractor config from the step above
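# a hedged sketch of the remaining steps from the linked docs; the export csv
# and config yaml file names below are assumptions based on the config output
# panoptes_aggregation extract /data/classifications.csv /data/Extractor_config_workflow_${WORKFLOW_ID}_*.yaml -o extracted
# reduce the extracted data into consensus answers per subject
# panoptes_aggregation reduce /data/question_extractor_extracted.csv /data/Reducer_config_workflow_${WORKFLOW_ID}_*_question_extractor.yaml -o reduced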
@camallen
camallen / empty_all_submitted_wambo_anntoation_data.rb
Created July 22, 2019 16:29
Empty all the PII classification task annotation data from the WAMBO project
# https://www.zooniverse.org/admin/project_status/h-spiers/where-are-my-body-organs
# https://www.zooniverse.org/lab/
wambo_project_id = 5372
wambo_classifications_scope = Classification.where(project_id: wambo_project_id).select(:id)
wambo_classifications_scope.find_in_batches do |classifications|
classifications_update_scope = Classification.where(id: classifications)
# overwrite the submitted task annotations for all workflows
# this will most likely break dumps in the API and that's ok.
# we've shipped the data safely
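# a hedged completion sketch: the preview ends here, so overwriting the
# annotations column with an empty array is an assumption about the real fix
classifications_update_scope.update_all(annotations: [])
end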
@camallen
camallen / prn-snapshot-generator.js
Created January 17, 2019 16:34
Generate a snapshot of PRN map overlay using chrome headless and puppeteer
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://prn-maps.planetaryresponsenetwork.org/', {waitUntil: 'networkidle0'});
await page.screenshot({path: 'headless-screenshot.jpeg', type: 'jpeg', fullPage: true});
// looks like pdf generation is problematic for centering the page
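// a hedged completion: the preview cuts off here, so closing the browser
// and ending the async IIFE is an assumption about the remaining lines
await browser.close();
})();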
@camallen
camallen / large_toast_relations.sql
Created December 6, 2018 11:03
Find large TOAST relations in PostgreSQL
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
select t1.oid, t1.relname, t1.relkind, t2.relkind, t2.relpages, t2.reltuples
from pg_class t1
inner join pg_class t2
on t1.reltoastrelid = t2.oid
where t1.relkind = 'r'
and t2.relkind = 't'
order by reltuples DESC;
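-- a hedged combination of the two queries above (an assumption, not in the
-- gist): show each TOAST relation's size alongside its parent table
select t1.relname as table_name,
       t2.relname as toast_name,
       pg_size_pretty(pg_relation_size(t2.oid)) as toast_size
from pg_class t1
inner join pg_class t2 on t1.reltoastrelid = t2.oid
where t1.relkind = 'r'
and t2.relkind = 't'
order by pg_relation_size(t2.oid) desc
limit 20;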
@camallen
camallen / clear_sidekiq_locks_redis_cli.sh
Created November 2, 2018 13:37
Clear out sidekiq unique & congestion locks from redis
# ssh onto the redis node in question and run these cmds
# congestion:* keys are for https://github.com/parrish/Sidekiq-Congestion
# uniquejobs:* https://github.com/mhenrixon/sidekiq-unique-jobs
# get a count of the locks
redis-cli -n 0 --scan --pattern 'congestion:*' | wc -l
redis-cli -n 0 --scan --pattern 'uniquejobs:*' | wc -l
# clear them for both key namespaces
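# a hedged sketch of the deletion step; piping the scanned keys to DEL via
# xargs is an assumption about how the original commands finished
redis-cli -n 0 --scan --pattern 'congestion:*' | xargs -r redis-cli -n 0 del
redis-cli -n 0 --scan --pattern 'uniquejobs:*' | xargs -r redis-cli -n 0 del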
@camallen
camallen / glacier_restore_files.sh
Created September 25, 2018 12:57
Request a Glacier restore for each S3 key listed in a file
#!/bin/sh
bucket="zooniverse-cold-storage"
path_prefix="data/Galaxy-Zoo-SDSS/COS_stamps_large"
for x in `cat glacier_to_restore.txt`
do
bucket_path="${path_prefix}/${x}"
echo "Begin restoring ${bucket_path}"
aws s3api restore-object --restore-request '{"Days":5,"GlacierJobParameters":{"Tier":"Bulk"}}' --bucket ${bucket} --key ${bucket_path}
echo "Done restoring $bucket_path"
@camallen
camallen / download_or_restore_from_galcier.sh
Created September 25, 2018 12:55
Download data or request a restore from Glacier storage
while read row; do
#echo $row
file_name=$(echo $row | awk '{print $4}')
#echo $file_name
#test if the file is already downloaded
file -E $file_name &> /dev/null
LOCAL_FILE_TEST_RESULT=$?
if [ $LOCAL_FILE_TEST_RESULT -eq 0 ]; then
#echo "${file_name} exists locally - skipping"
continue
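# a hedged completion: the preview ends here, so the closing fi/done and the
# input listing file (a placeholder name) are assumptions; the download or
# glacier restore-object request for missing files would go below the fi
fi
# aws s3api restore-object ... or aws s3 cp ... for the missing file goes here
done < s3_object_listing.txt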
@camallen
camallen / test_file_mime_type.py
Created September 21, 2018 08:17
Test file mime types with the python-magic lib
# https://github.com/ahupp/python-magic#usage
import magic, csv
file_paths = (
'480_CornellFeeders_20171024_0921_000.mp4',
'480_CornellFeeders_20171024_0921_000.mp4',
'480_CornellFeeders_20171024_0921_001.mp4',
'480_CornellFeeders_20171024_0921_002.mp4'
)
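# a hedged sketch of the mime-type check the preview cuts off, following the
# linked python-magic usage docs; the csv output file name is an assumption
mime_detector = magic.Magic(mime=True)
with open('file_mime_types.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(['file_path', 'mime_type'])
    for file_path in file_paths:
        # from_file inspects the file's magic bytes to determine its mime type
        writer.writerow([file_path, mime_detector.from_file(file_path)])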