@isoboroff
isoboroff / my-ps-to-pdf.sh
Created June 28, 2023 20:39
Convert Postscript files to PDF with embedded OCR
#!/bin/bash
psfile=$1
tempfoo=`basename $0`
TMPDIR=`mktemp -d /tmp/${tempfoo}.XXXXXX` || exit 1
echo $TMPDIR
gs -o $TMPDIR/%05d.png -sDEVICE=png16m -r300 -dPDFFitPage=true $psfile
isoboroff / index.html
Created November 17, 2021 19:23
Updating times according to the viewer's timezone
<script src="moment.min.js"></script>
<script src="moment-timezone-with-data-10-year-range.js"></script>
<script>
const my_zone = moment.tz.guess(true);
// Set up timezone selector with all the zones.
// The user's current guessed zone is selected.
const sel = document.querySelector('select.timezone');
moment.tz.names().forEach( zone => {
const option = document.createElement('option');
isoboroff / elastic-baseline.py
Created August 27, 2021 13:42
Do a TREC title-only run against an ElasticSearch index.
#!/usr/bin/env python3
from elasticsearch import Elasticsearch, TransportError
import argparse
import re
import sys
ap = argparse.ArgumentParser(description='Do a baseline run against an Elasticsearch index')
ap.add_argument('--host', default='localhost', help='Elasticsearch host')
ap.add_argument('--port', type=int, default=9200, help='Elasticsearch port')
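A title-only baseline run normally ends by writing ranked hits in the standard TREC run format: topic ID, the literal `Q0`, document ID, rank, score, and a run tag. The sketch below shows only that formatting step. The function name `format_trec_run`, the sample document IDs and scores, and the run tag are hypothetical and are not taken from the gist.

```python
def format_trec_run(topic_id, hits, run_tag="es-baseline"):
    """Format ranked (docno, score) pairs as TREC run lines.

    `hits` is assumed to be sorted by descending score already,
    as a search engine would return them.
    """
    lines = []
    for rank, (docno, score) in enumerate(hits, start=1):
        # Columns: topic, literal Q0, document ID, rank, score, run tag.
        lines.append(f"{topic_id} Q0 {docno} {rank} {score:.4f} {run_tag}")
    return lines


if __name__ == "__main__":
    # Hypothetical topic and hits, for illustration only.
    for line in format_trec_run("301", [("DOC-A", 12.5), ("DOC-B", 9.25)]):
        print(line)
```

Ranks start at 1 here, which evaluation tools such as trec_eval accept; the score column is what actually determines ordering during evaluation.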
isoboroff / gen-key.py
Created February 18, 2021 20:07
Django ./manage.py command to generate secret keys
from django.core.management.base import BaseCommand, CommandError
from django.utils.crypto import get_random_string
class Command(BaseCommand):
help='''Generate a secret key the same way Django does.
You will need to install it either in settings.py for dev or externally for production
'''
def add_arguments(self, parser):
parser.add_argument('-l', '--length', type=int, default=50)
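The underlying idea can be sketched without Django: draw `length` characters from a fixed alphabet using a cryptographically secure random source. This is a standard-library illustration, not the gist's code. The function name `generate_secret_key` is hypothetical, and the alphabet is an assumption meant to match the one Django's own secret-key helper uses.

```python
import secrets

# Assumed alphabet: lowercase letters, digits, and a few punctuation marks.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789!@#$%^&*(-_=+)"


def generate_secret_key(length=50):
    """Return a random key of `length` characters drawn from ALPHABET."""
    # secrets.choice uses the operating system's secure random source.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_secret_key())
```

Inside a Django project, `get_random_string(length, allowed_chars)` from `django.utils.crypto` (already imported in the gist) serves the same purpose.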
isoboroff / storage.py
Created February 18, 2021 14:08
A minimal Django custom storage backend to use GitPython to store revisions to uploaded files and disallow deletes
from django.conf import settings
from django.core.files.storage import FileSystemStorage
from django.core.files.base import File
from django.utils.deconstruct import deconstructible
from git import Repo
import io
@deconstructible
class VersionedStorage(FileSystemStorage):
#!/usr/bin/env python3
if __name__ == "__main__":
import json
import argparse
import spacy
import dateparser
import signal
from contextlib import contextmanager
from tqdm import tqdm
isoboroff / tweets-to-fortunes.py
Last active July 26, 2018 16:49
Convert a file of tweets into a file of fortunes as used by Unix fortune(1) and Emacs cookie-mode.
#!/usr/bin/env python3
import json
import argparse
import re
# source files from https://github.com/bpb27/trump_tweet_data_archive
# Removes URLs since M-x cookie-doctor gets confused by them
argparser = argparse.ArgumentParser(description='Convert from JSON array of condensed tweets to cookie format')
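The conversion itself is small: strip URLs from each tweet, drop tweets that end up empty, and join the rest with a line containing only `%`, which is the separator fortune(1) expects. The sketch below is one way to do that, not the rest of the gist. The function name `tweets_to_fortunes`, the URL pattern, and the assumption that each tweet object carries its body in a `text` field are all illustrative.

```python
import json
import re

# Simple URL matcher; assumed to be good enough for tweet text.
URL_RE = re.compile(r"https?://\S+")


def tweets_to_fortunes(tweets):
    """Turn a list of tweet dicts into fortune-file text."""
    entries = []
    for tweet in tweets:
        # Assumes the tweet body lives under the "text" key.
        text = URL_RE.sub("", tweet["text"]).strip()
        if text:  # skip tweets that were nothing but a URL
            entries.append(text)
    if not entries:
        return ""
    # Fortune files separate entries with a line containing only "%".
    return "\n%\n".join(entries) + "\n"


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = json.loads('[{"text": "First tweet https://t.co/abc"}, {"text": "Second tweet"}]')
    print(tweets_to_fortunes(sample), end="")
```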
isoboroff / getTrueName.c
Created November 9, 2017 22:19
Dereference an OS X Finder alias
// getTrueName.c
//
// DESCRIPTION
// Resolve HFS and HFS+ aliased files (and soft links), and return the
// name of the "Original" or actual file. Directories have a "/"
// appended. The error number returned is 255 on error, 0 if the file
// was an alias, or 1 if the argument given was not an alias
//
// BUILD INSTRUCTIONS
// gcc-3.3 -o getTrueName -framework Carbon getTrueName.c
isoboroff / gist:bca2aee7877567cf781b
Created July 17, 2015 19:25
Sometimes you mistakenly unpack tens of thousands of files into a single directory and maybe your OS/filesystem is unhappy about deleting it. This is recursive rm(1) in pure C.
#include <sys/types.h>
#include <dirent.h>
/**
* Usage: rm_all <dir-name>
* From lkml...
*/
#define u32 unsigned int
isoboroff / gist:424fcdf63fa760c1d1a7
Created May 20, 2014 17:06
Getting out of Solr Zookeeper /solr/overseer/queue hell
I had a large index job crash at about doc 75M. This is with CDH4.6. I could not list the /solr/overseer/queue directory in ZK because it had millions of entries.
Here are the steps I followed to avoid a re-index:
1. Shut down all solr-servers and zookeeper-servers
2. Run zookeeper-server-initialize --force --myid X on each ZK server. This should result in an empty ZK space.
3. solrctl --init
4. hadoop fs -mv /solr/the-collection /hold
5. Restart solr-server instances
6. solrctl instancedir --create ... # re-upload the config info
7. solrctl collection --create # with the same number of nodes and replicas as before