Jan Vlnas jnv

@jnv
jnv / dropbox-thumbs-names.js
Created October 12, 2013 13:00
Get numerical filenames of pictures in Dropbox gallery (thumbnails view)
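var thumbs = document.querySelectorAll("#gallery-view-media a[onclick^=Lightbox]");
var ids = [];
for (var i = 0; i < thumbs.length; ++i)
{
  var item = thumbs[i];
  // The last path segment of the link is the file name
  var fname = item.href.split('/').pop();
  // Keep only the leading digits; skip names without a numeric prefix
  var match = fname.match(/^\d+/);
  if (match) ids.push(match[0]);
}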
@jnv
jnv / rfacebook-wordcloud.R
Last active December 29, 2015 06:09
Loading posts from Facebook, tokenization, building a corpus, and creating a word cloud. For StuNoMe Digital Humanities
### Rfacebook
# Install the library - only needs to be run once
install.packages("Rfacebook")
# Load the library
library(Rfacebook)
# Token obtained from https://developers.facebook.com/tools/explorer
token <- "..."
@jnv
jnv / dh-ngram-cloud.R
Created November 25, 2013 17:02
N-grams into a word cloud
# Based on https://gist.github.com/josefslerka/4148592
## Load libraries
library(textcat)
library(tau)
library(wordcloud)
library(tm) # provides Corpus() and DirSource()
## Create the corpus
# For texts in Windows encoding, use encoding="cp1250"
mujKorpus <- Corpus(DirSource("klaus", encoding="UTF-8"), readerControl = list(language = "cz"))
@jnv
jnv / srt2txt.rb
Last active November 20, 2022 16:11
Converts SRT subtitles to plain text, strips irrelevant parts. Requires gems: srt, sanitize. Created for Doctor Who text analysis project.
#!/usr/bin/env ruby
require "srt"
require "sanitize"
REJECT_LINES = [/Best watched using Open Subtitles MKV Player/,
/Subtitles downloaded from www.OpenSubtitles.org/, /^Subtitles by/,
/www.tvsubtitles.net/, /[email protected]/, /addic7ed/, /allsubs.org/,
/www.seriessub.com/, /www.transcripts.subtitle.me.uk/, /~ Bad Wolf Team/,
/^Transcript by/, /^Update by /, /UKsubtitles.ru/
]
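A minimal sketch of how a reject list like the one above might be applied, assuming the subtitle text has already been extracted into plain lines. The `strip_rejected` helper and the shortened `SAMPLE_REJECTS` list are hypothetical, not part of the original script, and the srt and sanitize gems are left out so the sketch runs on its own.

```ruby
# Hypothetical, shortened reject list for illustration only.
SAMPLE_REJECTS = [/^Subtitles by/, /addic7ed/, /^Transcript by/].freeze

# Returns only the lines that match none of the given patterns.
def strip_rejected(lines, patterns = SAMPLE_REJECTS)
  lines.reject { |line| patterns.any? { |pattern| line =~ pattern } }
end

lines = ["Subtitles by someone", "Hello, Doctor.", "Synced by addic7ed", "Run!"]
puts strip_rejected(lines)
```

Keeping the patterns in one constant means new credit lines can be filtered by adding a regular expression, without touching the filtering logic.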
library(tm)
dirname <- "episodes"
rawCorpus <- Corpus(DirSource(dirname, recursive=TRUE), readerControl=list(language="en"))
my.corpus <- rawCorpus
my.stopwords <- c(stopwords("english"),"ain't","just","can","get","got","will")
my.stopwords <- rev(my.stopwords) # Hack to apply i'll etc. before i
my.stopwords <- my.stopwords[my.stopwords != "who"] # Not a stopword. Not here.
@jnv
jnv / 11a-dsl.ipynb
Last active December 29, 2015 23:59
@jnv
jnv / effective_tld_names.dat
Created December 23, 2013 22:01
Public Suffix List minus the private part
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at http://mozilla.org/MPL/2.0/.
// ===BEGIN ICANN DOMAINS===
// ac : http://en.wikipedia.org/wiki/.ac
ac
com.ac
edu.ac
@jnv
jnv / shotdetect2csv.rb
Created March 14, 2014 22:35
Converts Shotdetect's results.xml file to CSV
#!/usr/bin/env ruby
# Extracts shotdetect's results.xml file to csv
# with relative position of shot in movie.
# Writes converted file to results.csv file.
#
# Usage: ./shotdetect2csv.rb results.xml
require "nokogiri"
require "csv"
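The header comments describe writing each shot together with its relative position in the movie. A rough sketch of that calculation, assuming each shot is already available as a start time in milliseconds and the total duration is known. The `shots_to_csv` helper, the column names, and the sample values are hypothetical, and the XML parsing with nokogiri is omitted.

```ruby
require "csv"

# Hypothetical helper: turns shot start times (ms) into CSV rows, with
# each shot's position expressed as a fraction of the total duration.
def shots_to_csv(starts_ms, total_ms)
  CSV.generate do |csv|
    csv << %w[id start_ms relative_position]
    starts_ms.each_with_index do |start, index|
      csv << [index, start, (start.to_f / total_ms).round(4)]
    end
  end
end

puts shots_to_csv([0, 2500, 10_000], 20_000)
```

Expressing the position as a fraction of the total duration is one possible reading of "relative position"; it makes shots from movies of different lengths comparable.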
@jnv
jnv / data.tsv
Last active August 29, 2015 13:57
(WIP) Comparison of caloric value and carbohydrates
type	name	price	kcal	sacharides	sugar	qty
d	Cappy Jablko	39.9	47	11.3	11.3	1000
d	Hello Jablko	37.9	45	11	9.6	1000
d	Hello Čerstvě vylisovaná jablečná šťáva	39.9	40	9.8	8.4	1000
d	Relax Jablko	39.9	42	10.2	10.2	1000
o	Kubík Multivitamín	16.9	52.10333333	12.2	11.9	300
o	Jupík Jahoda	16.5	24	6	6	330
m	Rubín Jablko	40.9	40	9.83	0	1000
d	Pfanner Jablko	47.9	44	10.3	9.9	1000
d	Relax Pomeranč	42.9	44	10	10	1000