Antonio Villamarin anvius

r4v / mailcheck_helper.php
Last active May 21, 2017 17:55
Advanced Mail Validation
<?php
/*
* Check whether a given email address is valid using:
* filter_var()
* an MX record check
* a check that the user exists, via telnet and an SMTP session
*
* I'm using this helper in an AJAX response, so results are echoed as JSON
*/
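The preview stops at the header comment, so the PHP itself is not shown. As a rough illustration of the three checks it lists, here is a minimal Python sketch (not the gist's code). Assumptions: the regex is deliberately loose and only stands in for PHP's `filter_var()`; Python's standard library has no MX lookup, so the caller passes in a resolver; the SMTP probe uses `smtplib` instead of telnet, issues MAIL FROM and RCPT TO, and quits without sending a message; `verify@example.com` is a placeholder sender.

```python
import json
import re
import smtplib
from typing import Callable, List

# Deliberately loose pattern (an assumption): one "@", no whitespace, a dot in the domain.
# PHP's filter_var(..., FILTER_VALIDATE_EMAIL) applies much stricter rules.
_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def smtp_accepts(address: str, mx_host: str, sender: str, timeout: float = 10.0) -> bool:
    """Ask the MX host whether it would accept mail for `address`; no message is sent."""
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:  # QUIT is sent on exit
        smtp.ehlo_or_helo_if_needed()
        code, _ = smtp.mail(sender)
        if code != 250:
            return False
        code, _ = smtp.rcpt(address)
        # 250 = recipient OK, 251 = user not local, will forward.
        return code in (250, 251)


def check_email(address: str,
                resolve_mx: Callable[[str], List[str]],
                sender: str = "verify@example.com") -> str:
    """Run syntax, MX and SMTP checks in order and return the result as JSON."""
    result = {"email": address, "syntax": False, "mx": False, "smtp": None}
    if not _EMAIL_RE.match(address):
        return json.dumps(result)
    result["syntax"] = True

    mx_hosts = resolve_mx(address.rsplit("@", 1)[1])
    if not mx_hosts:
        return json.dumps(result)
    result["mx"] = True

    try:
        result["smtp"] = smtp_accepts(address, mx_hosts[0], sender)
    except (OSError, smtplib.SMTPException):
        # Refused connection, timeout, blocked port 25, ...: leave the answer unknown.
        result["smtp"] = None
    return json.dumps(result)
```

With the third-party dnspython 2.x package, a resolver could call `dns.resolver.resolve(domain, "MX")`, sort the answers by `preference` and return each `exchange` as a string. Treat a positive SMTP answer as a hint rather than proof: catch-all servers accept every recipient, and many servers and networks block or rate-limit this kind of probing.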
# -*- coding: utf-8 -*-
"""
common-crawl-cdx.py
A simple example program to analyze the Common Crawl index.
This is implemented as a single stream job which accesses S3 via HTTP,
so that it can easily be run from any laptop, but it could easily be
converted to an EMR job which processes the 300 index files in parallel.
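The docstring describes a single streaming pass over gzipped index files. A minimal Python sketch of that idea, assuming each index line has the CDXJ layout `SURT-key timestamp {json}` and that the JSON carries a `mime` field; the tally by MIME type is only an example analysis, not necessarily what this gist computes:

```python
import gzip
import io
import json
from collections import Counter
from typing import IO, Tuple


def parse_cdxj_line(line: str) -> Tuple[str, str, dict]:
    """Split one index line into (SURT key, timestamp, JSON fields).

    Only the first two spaces separate fields, because the JSON part
    may itself contain spaces.
    """
    surt, timestamp, payload = line.rstrip("\n").split(" ", 2)
    return surt, timestamp, json.loads(payload)


def count_mime_types(stream: IO[bytes]) -> Counter:
    """Read a gzipped index file sequentially and tally records by 'mime'."""
    counts: Counter = Counter()
    with gzip.open(stream, "rt", encoding="utf-8") as lines:
        for line in lines:
            if not line.strip():
                continue  # skip blank lines
            _, _, fields = parse_cdxj_line(line)
            counts[fields.get("mime", "unknown")] += 1
    return counts
```

To stream a real file, pass an open HTTP response such as `urllib.request.urlopen(index_url)`, where `index_url` is a placeholder for one of the index files; `gzip.open` only reads forward, so nothing needs to be written to disk.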
penguinpowernz / README.md
Last active March 8, 2022 20:19
OVPN Splitter

OVPN File Splitter

Have an OpenVPN config file with inline certificates that you need to split up because the Ubuntu team hasn't fixed that 2+-year-old bug in NetworkManager?

You've come to the right place!

This script splits the file into its respective certs/keys and outputs a replacement .ovpn file that references them by path.
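The script itself is not shown here. As a sketch of the splitting step, the Python below pulls inline `<ca>`, `<cert>`, `<key>`, `<tls-auth>` and `<tls-crypt>` blocks out of a config and replaces each with a directive that points at a file. The output file names are hypothetical choices, not necessarily the ones this script uses.

```python
import re
from typing import Dict, Tuple

# Inline block tag -> hypothetical file name it is written to.
INLINE_FILES = {
    "ca": "ca.crt",
    "cert": "client.crt",
    "key": "client.key",
    "tls-auth": "ta.key",
    "tls-crypt": "tc.key",
}

# <tag> ... </tag> with the same tag name on both ends (\1 backreference).
_BLOCK_RE = re.compile(
    r"<(ca|cert|key|tls-auth|tls-crypt)>[ \t]*\n(.*?)\n[ \t]*</\1>", re.DOTALL
)


def split_ovpn(config: str) -> Tuple[str, Dict[str, str]]:
    """Return (rewritten config, {file name: file contents})."""
    config = config.replace("\r\n", "\n")  # normalize Windows line endings
    files: Dict[str, str] = {}

    def replace(match):
        tag, body = match.group(1), match.group(2)
        name = INLINE_FILES[tag]
        files[name] = body.strip("\n") + "\n"
        # Point the directive at the extracted file instead of the inline block.
        return f"{tag} {name}"

    return _BLOCK_RE.sub(replace, config), files
```

The caller then writes each entry of the returned dict next to the new .ovpn file. Lines outside the inline blocks, such as `key-direction`, are left untouched.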

Usage

miku / ngram.go
Last active February 13, 2021 23:23
Handbaked N-Gram string similarity in Golang.
package main
import (
	"fmt"

	"github.com/juju/utils/set"
)

func jaccard(a, b set.Strings) float64 {
	return float64(a.Intersection(b).Size()) / float64(a.Union(b).Size())
}

func ngrams(s string, n int) set.Strings {
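The preview cuts off inside `ngrams`, so its body is not shown. The measure itself, the Jaccard index of two sets of character n-grams (intersection size divided by union size), can be sketched in Python as follows. Treating two empty sets as identical (1.0) is this sketch's own choice; the Go `jaccard` above would divide zero by zero in that case.

```python
from typing import Set


def ngrams(s: str, n: int) -> Set[str]:
    """Return the set of all length-n substrings of s (empty if len(s) < n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A intersect B| / |A union B|, with two empty sets treated as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)
```

For example, the bigrams of "night" and "nacht" share only "ht" out of seven distinct bigrams, so `jaccard(ngrams("night", 2), ngrams("nacht", 2))` is 1/7.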
keo / bootstrap.sh
Last active January 25, 2024 15:49
Setup encrypted partition for Docker containers
#!/bin/sh
# Setup encrypted disk image
# For Ubuntu 14.04 LTS
CRYPTFS_ROOT=/cryptfs
apt-get update
apt-get -y upgrade
apt-get -y install cryptsetup
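The preview ends after `cryptsetup` is installed, so the rest of the gist's steps are not shown. One common way to continue is a file-backed LUKS container mounted where Docker keeps its data. The Python below is only a dry-run planner that returns those commands without running anything. The image path, size, mapper name and mount point are hypothetical (only `/cryptfs` comes from `CRYPTFS_ROOT` above), and the Docker daemon is assumed to be stopped while its data directory is replaced.

```python
import shlex
from typing import List


def plan_encrypted_docker_root(image: str, size: str, mapper: str,
                               mount_point: str) -> List[List[str]]:
    """Return, in order, the commands to create and mount an encrypted image."""
    device = f"/dev/mapper/{mapper}"
    return [
        ["fallocate", "-l", size, image],           # reserve space for the disk image
        ["cryptsetup", "luksFormat", image],        # write a LUKS header (asks for a passphrase)
        ["cryptsetup", "luksOpen", image, mapper],  # unlock it as /dev/mapper/<mapper>
        ["mkfs.ext4", device],                      # create a filesystem inside the container
        ["mkdir", "-p", mount_point],
        ["mount", device, mount_point],             # mount it where Docker stores its data
    ]


if __name__ == "__main__":
    # Print the plan only; review it before running anything as root.
    for cmd in plan_encrypted_docker_root("/cryptfs/docker.img", "20G",
                                          "docker", "/var/lib/docker"):
        print(shlex.join(cmd))
```

If your `cryptsetup` does not attach a loop device for image files automatically, attach one with `losetup` first and pass the loop device instead of the file.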
alehandrof / simpletask gtd.md
Last active January 23, 2025 15:53
How to GTD with Simpletask

How to GTD with Simpletask

This is a guide to implementing Getting Things Done (GTD) using [Simpletask][] by [Mark Janssen][].

Simpletask uses the [todo.txt][] syntax, but has sufficient differences and quirks of its own to be worth describing in detail---at least, that's the story I'm going with. I actually began this guide as an exploration of my own trusted system. Personal workflows are by definition eccentric; I have included only what seems to me to be broadly useful.

This implementation of GTD covers the "standard" classifications: next actions by context, projects, somedays, agendas by person and meeting, etc. In a departure from strict GTD, each entry in these lists is also tagged with an area of focus, interest or responsibility. I find that the ability to slice the system by this extra dimension is worth the additional complexity at the processing and organizing stages. Limitations, issues and workarounds are discussed at the end.
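To make the todo.txt conventions behind these lists concrete, here is a small Python sketch (not part of Simpletask) that groups open tasks by `@context` or `+project`. It relies on the standard todo.txt rules that completed tasks start with `x ` and that tags follow whitespace or start the line. The area-of-focus tagging scheme is not shown in this excerpt, so it is left out.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def group_open_tasks(lines: Iterable[str], sigil: str) -> Dict[str, List[str]]:
    """Group open tasks by tag: sigil '@' for contexts, '+' for projects."""
    # A tag must start the line or follow whitespace, so "bob@example.com" is not a context.
    pattern = re.compile(r"(?:^|\s)" + re.escape(sigil) + r"(\S+)")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        task = raw.strip()
        if not task or task.startswith("x "):
            continue  # skip blank lines and completed tasks
        for tag in pattern.findall(task):
            grouped[tag].append(task)
    return dict(grouped)
```

Calling it with `"@"` gives next-action lists by context; calling it with `"+"` gives the project lists.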

Before we begin, some words of wisdom

chilts / alexa.js
Created October 30, 2013 09:27
Getting the Alexa top 1 million sites directly from the server, unzipping it, parsing the csv and getting each line as an array.
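var request = require('request');
var unzip = require('unzip');
var csv2 = require('csv2');

// Stream the zipped CSV straight from S3 without saving it to disk.
request.get('http://s3.amazonaws.com/alexa-static/top-1m.csv.zip')
  // Parse the zip as it arrives; each file inside the archive emits an 'entry' event.
  .pipe(unzip.Parse())
  .on('entry', function (entry) {
    // Feed the entry through the CSV parser; each row arrives as an array, e.g. [rank, domain].
    entry.pipe(csv2()).on('data', console.log);
  });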
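A rough Python counterpart of this pipeline, using only the standard library (`zipfile` and `csv` in place of the `unzip` and `csv2` packages). One design difference: the zip format keeps its directory at the end of the archive, so `zipfile` needs the whole archive in memory or in a seekable file rather than parsing it as a stream the way `unzip.Parse()` does. The S3 URL comes from the gist and may no longer be served.

```python
import csv
import io
import zipfile
from typing import Iterator, List


def iter_zipped_csv_rows(zip_bytes: bytes) -> Iterator[List[str]]:
    """Yield every row (a list of strings) from each .csv member of a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(".csv"):
                continue
            with archive.open(name) as raw:
                # Decode bytes to text; newline="" is what the csv module expects.
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.reader(text)
```

Usage: read the download with `urllib.request.urlopen(url).read()` and iterate `iter_zipped_csv_rows(data)`, printing each row.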
tbrianjones / free_email_provider_domains.txt
Last active May 4, 2026 00:18
A list of free email provider domains. Some of these are probably not around anymore. I've combined a dozen lists from around the web. Current "major providers" should all be in here as of the date this list was created.
1033edge.com
11mail.com
123.com
123box.net
123india.com
123mail.cl
123qwe.co.uk
126.com
150ml.com
15meg4free.com
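A typical use of such a list is checking whether an address belongs to a free provider. A minimal Python sketch, assuming the list is saved one domain per line. The match is exact and case-insensitive, so subdomains of listed domains are not matched.

```python
from typing import Iterable, Set


def load_domains(lines: Iterable[str]) -> Set[str]:
    """Build a lowercase domain set from one-domain-per-line text, skipping blanks."""
    return {line.strip().lower() for line in lines if line.strip()}


def is_free_provider(address: str, domains: Set[str]) -> bool:
    """True if the part after the last '@' is exactly one of the listed domains."""
    domain = address.rsplit("@", 1)[-1].strip().lower()
    return domain in domains
```

Usage: `with open("free_email_provider_domains.txt") as f: domains = load_domains(f)`, then call `is_free_provider(address, domains)`.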
pe3 / scrape_entire_website_with_wget.sh
Last active January 12, 2026 09:51
Scrape An Entire Website with wget
This worked very nicely for a single-page site:
```
wget \
--recursive \
--page-requisites \
--convert-links \
[website]
```
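`--page-requisites` makes wget also download the files a page needs in order to display (images, stylesheets, scripts), and `--convert-links` rewrites links afterwards so they work for local viewing. To show which references "requisites" roughly covers, here is a simplified Python sketch that collects them from one HTML page with the standard `html.parser`. It is an illustration, not wget's algorithm: wget handles more cases, and this sketch does not look inside CSS.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class RequisiteCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, stylesheets and icons."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        url = None
        if tag in ("img", "script"):
            url = attr.get("src")
        elif tag == "link":
            # Only stylesheets and icons count; rel=canonical etc. are plain links.
            rels = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rels or "icon" in rels:
                url = attr.get("href")
        if url:
            self.urls.append(urljoin(self.base_url, url))


def collect_requisites(html: str, base_url: str) -> List[str]:
    parser = RequisiteCollector(base_url)
    parser.feed(html)
    return parser.urls
```

Ordinary `<a href>` links are deliberately ignored here; following those is what `--recursive` is for.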
wget options
christianroman / test.py
Created May 30, 2013 16:02
Bypass Captcha using 10 lines of code with Python, OpenCV & Tesseract OCR engine
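import cv2.cv as cv   # legacy OpenCV 2.x API (the cv2.cv module was removed in OpenCV 3)
import tesseract      # Tesseract OCR bindings

# Load the CAPTCHA as a single-channel grayscale image.
gray = cv.LoadImage('captcha.jpeg', cv.CV_LOAD_IMAGE_GRAYSCALE)
# Binarize in place: pixels brighter than 231 become 255, everything else becomes 0.
cv.Threshold(gray, gray, 231, 255, cv.CV_THRESH_BINARY)

api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
# Recognize only digits and lowercase letters, and treat the image as a single word.
api.SetVariable("tessedit_char_whitelist", "0123456789abcdefghijklmnopqrstuvwxyz")
api.SetPageSegMode(tesseract.PSM_SINGLE_WORD)
tesseract.SetCvImage(gray, api)
print(api.GetUTF8Text())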
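The preprocessing step that matters most here is the fixed binary threshold. With `CV_THRESH_BINARY`, OpenCV sets a pixel to the max value (255) when it is strictly greater than the threshold (231) and to 0 otherwise, so Tesseract receives a two-tone image. A dependency-free Python sketch of that rule on a nested-list "image", for illustration only (real code would keep using OpenCV):

```python
from typing import List

Image = List[List[int]]


def threshold_binary(gray: Image, thresh: int, maxval: int) -> Image:
    """Binary threshold: maxval where pixel > thresh, else 0 (pixel == thresh -> 0)."""
    return [[maxval if px > thresh else 0 for px in row] for row in gray]
```

With the gist's values, `threshold_binary(img, 231, 255)` keeps only near-white pixels white.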