@camallen
camallen / code_used_to_break_and_fix_workflow_and_translation_strings_data.rb
Created October 22, 2019 09:56
Broken survey task labels on AmazonCam Tambopata project (invalid translation strings data)
# 1. The underlying cause of the issue
# ~17:00-17:30 on Friday 18th October, 2019
# the commands I used to work on the translation fix issue
# fix the workflow strings for an older survey task (missing descriptions)
# https://www.zooniverse.org/lab/3040/workflows/2485
workflow = Workflow.find 2485
tasks = workflow.tasks
task_string_extractor = TasksVisitors::ExtractStrings.new
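# A hedged guess at how this continues; the gist preview stops here, so the
# visitor call and the final save below are assumptions, not confirmed steps.
# task_string_extractor.visit(tasks)
# workflow.update!(tasks: tasks)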
@camallen
camallen / one_task_workflows_with_single_questions.sql
Created September 20, 2019 13:01
Query to find all one-task workflows whose only task is a single-question ('single' type) task
select id, tasks --(kv.value -> 'type')
from
workflows w,
jsonb_each(w.tasks) kv
WHERE kv.value -> 'type' = '"single"'
AND id IN (
select id FROM
(SELECT id, (SELECT COUNT(*) FROM jsonb_object_keys(tasks)) as keys FROM workflows) as keys_tuple
WHERE keys = 1
)
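-- A hedged variant (an assumption, not part of the gist): also pull the
-- question translation key for each matching single-question workflow
select w.id, kv.key as task_key, kv.value ->> 'question' as question_key
from
workflows w,
jsonb_each(w.tasks) kv
WHERE kv.value -> 'type' = '"single"'
AND (SELECT COUNT(*) FROM jsonb_object_keys(w.tasks)) = 1;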
@camallen
camallen / aggregation_caesar_via_docker.sh
Last active August 6, 2019 15:01
Run the aggregation-for-caesar code offline via docker
# use docker to run the aggregation code container with the current dir as /data
docker run --rm -it --name aggregation_caesar -v ${PWD}:/data zooniverse/aggregation-for-caesar:latest bash
# based on the examples in the documentation at https://aggregation-caesar.zooniverse.org/Scripts.html
# configure the extractors/reducers from the workflows export
panoptes_aggregation config /data/workflows.csv $WORKFLOW_ID
# extract the data for the different tasks
# note the extractor config from the step above
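# a hedged sketch of the remaining steps from the linked docs; the export csv
# and config yaml file names below are assumptions based on the config output
# panoptes_aggregation extract /data/classifications.csv /data/Extractor_config_workflow_${WORKFLOW_ID}_*.yaml -o extracted
# reduce the extracted data into consensus answers per subject
# panoptes_aggregation reduce /data/question_extractor_extracted.csv /data/Reducer_config_workflow_${WORKFLOW_ID}_*_question_extractor.yaml -o reduced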
@camallen
camallen / empty_all_submitted_wambo_anntoation_data.rb
Created July 22, 2019 16:29
Empty all the PII classification task annotation data from the WAMBO project
# https://www.zooniverse.org/admin/project_status/h-spiers/where-are-my-body-organs
# https://www.zooniverse.org/lab/
wambo_project_id = 5372
wambo_classifications_scope = Classification.where(project_id: wambo_project_id).select(:id)
wambo_classifications_scope.find_in_batches do |classifications|
classifications_update_scope = Classification.where(id: classifications)
# overwrite the submitted task annotations for all workflows
# this will most likely break dumps in the API and that's ok.
# we've shipped the data safely
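# a hedged completion sketch: the preview ends here, so overwriting the
# annotations column with an empty array is an assumption about the real fix
classifications_update_scope.update_all(annotations: [])
end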
@camallen
camallen / prn-snapshot-generator.js
Created January 17, 2019 16:34
Generate a snapshot of PRN map overlay using chrome headless and puppeteer
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://prn-maps.planetaryresponsenetwork.org/', {waitUntil: 'networkidle0'});
await page.screenshot({path: 'headless-screenshot.jpeg', type: 'jpeg', fullPage: true});
// looks like pdf generation is problematic for centering the page
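// a hedged completion: the preview cuts off here, so closing the browser
// and ending the async IIFE is an assumption about the remaining lines
await browser.close();
})();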
@camallen
camallen / large_toast_relations.sql
Created December 6, 2018 11:03
Find large TOAST relations in PostgreSQL
SELECT nspname || '.' || relname AS "relation",
       pg_size_pretty(pg_relation_size(C.oid)) AS "size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_relation_size(C.oid) DESC
LIMIT 20;
select t1.oid, t1.relname, t1.relkind, t2.relkind, t2.relpages, t2.reltuples
from pg_class t1
inner join pg_class t2
on t1.reltoastrelid = t2.oid
where t1.relkind = 'r'
and t2.relkind = 't'
order by reltuples DESC;
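-- a hedged combination of the two queries above (an assumption, not in the
-- gist): show each TOAST relation's size alongside its parent table
select t1.relname as table_name,
       t2.relname as toast_name,
       pg_size_pretty(pg_relation_size(t2.oid)) as toast_size
from pg_class t1
inner join pg_class t2 on t1.reltoastrelid = t2.oid
where t1.relkind = 'r'
and t2.relkind = 't'
order by pg_relation_size(t2.oid) desc
limit 20;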
@camallen
camallen / clear_sidekiq_locks_redis_cli.sh
Created November 2, 2018 13:37
Clear out sidekiq unique & congestion locks from redis
# ssh onto the redis node in question and run these cmds
# congestion:* keys are for https://github.com/parrish/Sidekiq-Congestion
# uniquejobs:* https://github.com/mhenrixon/sidekiq-unique-jobs
# get a count of the locks
redis-cli -n 0 --scan --pattern 'congestion:*' | wc -l
redis-cli -n 0 --scan --pattern 'uniquejobs:*' | wc -l
# clear them for both key namespaces
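# a hedged sketch of the deletion step; piping the scanned keys to DEL via
# xargs is an assumption about how the original commands finished
redis-cli -n 0 --scan --pattern 'congestion:*' | xargs -r redis-cli -n 0 del
redis-cli -n 0 --scan --pattern 'uniquejobs:*' | xargs -r redis-cli -n 0 del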
@camallen
camallen / glacier_restore_files.sh
Created September 25, 2018 12:57
Request a Glacier restore for each S3 key listed in a file
#!/bin/sh
bucket="zooniverse-cold-storage"
path_prefix="data/Galaxy-Zoo-SDSS/COS_stamps_large"
for x in `cat glacier_to_restore.txt`
do
bucket_path="${path_prefix}/${x}"
echo "Begin restoring ${bucket_path}"
aws s3api restore-object --restore-request '{"Days":5,"GlacierJobParameters":{"Tier":"Bulk"}}' --bucket ${bucket} --key ${bucket_path}
echo "Done restoring $bucket_path"
@camallen
camallen / download_or_restore_from_galcier.sh
Created September 25, 2018 12:55
Download data or request a restore from Glacier storage
while read row; do
#echo $row
file_name=$(echo $row | awk '{print $4}')
#echo $file_name
#test if the file is already downloaded
file -E $file_name &> /dev/null
LOCAL_FILE_TEST_RESULT=$?
if [ $LOCAL_FILE_TEST_RESULT -eq 0 ]; then
#echo "${file_name} exists locally - skipping"
continue
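# a hedged completion: the preview ends here, so the closing fi/done and the
# input listing file (a placeholder name) are assumptions; the download or
# glacier restore-object request for missing files would go below the fi
fi
# aws s3api restore-object ... or aws s3 cp ... for the missing file goes here
done < s3_object_listing.txt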
@camallen
camallen / test_file_mime_type.py
Created September 21, 2018 08:17
Test file mime types with the python-magic lib
# https://github.com/ahupp/python-magic#usage
import magic, csv
file_paths = (
'480_CornellFeeders_20171024_0921_000.mp4',
'480_CornellFeeders_20171024_0921_000.mp4',
'480_CornellFeeders_20171024_0921_001.mp4',
'480_CornellFeeders_20171024_0921_002.mp4'
)
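# a hedged sketch of the mime-type check the preview cuts off, following the
# linked python-magic usage docs; the csv output file name is an assumption
mime_detector = magic.Magic(mime=True)
with open('file_mime_types.csv', 'w', newline='') as out_file:
    writer = csv.writer(out_file)
    writer.writerow(['file_path', 'mime_type'])
    for file_path in file_paths:
        # from_file inspects the file's magic bytes to determine its mime type
        writer.writerow([file_path, mime_detector.from_file(file_path)])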