Start up a Lambda-like Docker container:
docker run -i -t -v /tmp:/var/task lambci/lambda:build /bin/bash
Install some dependencies inside the container:
yum install gperf freetype-devel libxml2-devel git libtool -y
easy_install pip
SSH into the instance and bring it up to date:
ssh -i keyfile.pem ubuntu@<ip>
sudo apt -y update && sudo apt -y upgrade
sudo apt install -y p7zip-full build-essential linux-image-extra-virtual linux-source
Disable the nouveau driver (it conflicts with NVIDIA's driver):
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u
sudo reboot  # to activate the latest kernel
Just a quickie test in Python 3 (using Requests) to see whether Google Cloud Vision can effectively OCR a scanned data table and preserve its structure, the way products such as ABBYY FineReader can OCR an image and provide Excel-ready output.
The short answer: no. While Cloud Vision provides bounding polygon coordinates in its output, it doesn't provide them at the word or region level, which would be needed to calculate the data delimiters.
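For reference, the test itself amounts to one POST against the public v1 REST endpoint. Here is a minimal sketch, assuming an API key in the GOOGLE_API_KEY environment variable and a scanned table saved as table.png (both placeholders):

import base64
import os

import requests

VISION_ENDPOINT = 'https://vision.googleapis.com/v1/images:annotate'

def ocr_image(path, api_key):
    # The v1 REST API takes base64-encoded image bytes in the JSON body.
    with open(path, 'rb') as f:
        content = base64.b64encode(f.read()).decode('ascii')
    body = {'requests': [{
        'image': {'content': content},
        'features': [{'type': 'TEXT_DETECTION'}],
    }]}
    resp = requests.post(VISION_ENDPOINT, params={'key': api_key}, json=body)
    resp.raise_for_status()
    # textAnnotations[0] holds the full detected text; later entries
    # carry boundingPoly coordinates for smaller pieces.
    return resp.json()['responses'][0].get('textAnnotations', [])

for annotation in ocr_image('table.png', os.environ['GOOGLE_API_KEY']):
    print(annotation['description'])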
On the other hand, the OCR quality is pretty good if you just need to identify text anywhere in an image, without regard to its physical coordinates. I've included two examples:
### 1. A low-resolution photo of road signs
import Crawler from 'crawler';
import url from 'url';

const BASE_ADDRESS = 'https://en.wikipedia.org/';

// Match "Visa requirements for <country> citizens" page URLs, and the
// "visa required" vs. "visa not required" phrasings in page content.
const COUNTRY_PATTERN = /.*?Visa_requirements_for_(.*?)_citizens.*?/i;
const VISA_REQUIRED_PATTERN = /.*?visa\s+required.*?/i;
const VISA_NOT_REQUIRED_PATTERN = /.*?visa\s+not\s+required.*?/i;

// Accumulates results keyed by country as pages are crawled.
const visaRequirements = {};
var AWS = require('aws-sdk');
var http = require('http');
var httpProxy = require('http-proxy');
var express = require('express');
var bodyParser = require('body-parser');
var stream = require('stream');

// Expect exactly one argument: the Elasticsearch cluster endpoint to proxy to.
if (process.argv.length !== 3) {
  console.error('usage: aws-es-proxy <my-cluster-endpoint>');
  process.exit(1);
}
""" | |
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy) | |
BSD License | |
""" | |
import numpy as np | |
# data I/O | |
data = open('input.txt', 'r').read() # should be simple plain text file | |
chars = list(set(data)) | |
data_size, vocab_size = len(data), len(chars) |
Python syntax here: 2.7 (online REPL)
JavaScript ES6 via Babel transpilation (online REPL)
import math
# Change YOUR_TOKEN to your prerender token
# Change example.com (server_name) to your website URL
# Change /path/to/your/root to the correct value

server {
    listen 80;
    server_name example.com;
    root /path/to/your/root;
    index index.html;
When the directory structure of your Node.js application (not library!) has some depth, you end up with a lot of annoying relative paths in your require calls like:
const Article = require('../../../../app/models/article');
Those suck for maintenance and they're ugly.
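One commonly cited workaround is to expose a root-relative require once from the entry point. A sketch (the rootRequire name is made up here, and putting anything on global is a trade-off in itself):

// index.js — the app entry point, which lives at the project root
const path = require('path');

// Resolve module paths from the project root instead of the calling file.
global.rootRequire = (name) => require(path.join(__dirname, name));

// Then, from any file at any depth:
// const Article = rootRequire('app/models/article');

Alternatives in the same spirit include setting the NODE_PATH environment variable or extracting shared code into its own package.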
1) Start with only one known domain from a botnet: qwmrxczhrcmbcagehqwxlvsnj.ru

2) Get the intersection of names looked up by the IPs that have looked up this domain. It takes less than one minute.

$ curl https://sgraph.umbrella.com/dnsdb/clientlookups/i/name/qwmrxczhrcmbcagehqwxlvsnj.ru | sort -rn > /tmp/a

3) Remove popular domains (see the sketch of filter-popular below):

$ cut -f2 /tmp/a | filter-popular > /tmp/aa
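filter-popular is an internal tool, so its exact behavior is an assumption here. As a stand-in, a minimal Python sketch that drops anything on a locally cached popularity list (popular.txt, one domain per line, is a placeholder) might look like:

#!/usr/bin/env python
# filter-popular (stand-in): drop any stdin domain found on a popularity list.
import sys

with open('popular.txt') as f:
    popular = {line.strip().lower() for line in f}

for line in sys.stdin:
    domain = line.strip().lower()
    if domain and domain not in popular:
        print(domain)

It slots into the same pipeline: cut -f2 /tmp/a | ./filter-popular > /tmp/aa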