@pcarrier
Created January 3, 2014 15:00
HASH_CHUNK_SIZE = 65536
UINT64_MAX = 2**64 - 1
require 'xmlrpc/client'
require 'time'
require 'pp'
def withOST
  server = XMLRPC::Client.new_from_uri 'http://api.opensubtitles.org/xml-rpc'
  # Empty username and password: anonymous login.
  login = server.call 'LogIn', '', '', 'en', 'OS Test User Agent'
  status = login['status']
  raise status unless status == '200 OK'
  token = login['token']
  begin
    yield server, token
  ensure
    # Release the session token even if the block raised.
    server.call 'LogOut', token
  end
end
def log msg
  STDERR.puts "#{DateTime.now.rfc3339}: #{msg}"
end
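def hashFile(file)
  # OpenSubtitles movie hash: file size plus the sum of the 64-bit
  # little-endian words in the first and last HASH_CHUNK_SIZE bytes,
  # truncated to 64 bits.
  file.seek(0, IO::SEEK_SET)
  buffer = file.sysread(HASH_CHUNK_SIZE)   # beginning of file
  file.seek(-HASH_CHUNK_SIZE, IO::SEEK_END)
  buffer << file.sysread(HASH_CHUNK_SIZE)  # end of file
  bufsize = buffer.size
  if bufsize != 2 * HASH_CHUNK_SIZE
    raise "Only read #{bufsize} bytes"
  end
  # 'Q<' forces little-endian; plain 'Q' is native-endian and would
  # produce a different hash on a big-endian machine.
  buffhash = buffer.unpack('Q<*').reduce(0) do |acc, v|
    (acc + v) & UINT64_MAX
  end
  # After reading the last chunk, file.pos equals the file size.
  return (file.pos + buffhash) & UINT64_MAX
end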
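def scrobble(files)
  # The default block must store the new entry (h[k] = ...). The original
  # `Hash.new do {:filenames => []} end` only returned a fresh hash on each
  # missing-key lookup without storing it, so every `<<` went into a
  # temporary object and md stayed empty ("md: {}" in the output below).
  md = Hash.new { |h, k| h[k] = {:filenames => []} }
  files.each do |name|
    # The hash is a 64-bit value, i.e. up to 16 hex digits; '%08x' only
    # pads to 8, which is why one hash below printed with 15 digits.
    hash = File.open(name, 'rb') { |f| '%016x' % hashFile(f) }
    md[hash][:filenames] << name
    # Added for #ruby question
    STDERR.puts "#{hash}: #{name}"
  end
  # Added for #ruby question
  STDERR.puts "md: #{md}"
  ## Output (from the original version, before the two fixes above):
  # a2ce46cd62b81668: 1312.7128v1.pdf
  # fa3ed90f2ffcedba: Berkeley-Latency-Mar2012.pdf
  # bb960143a7334619: MIT-CSAIL-TR-2009-002.pdf
  # 311cdf88ad358bef: S0002-9947-1965-0170805-7.pdf
  # e39b083c3485094: acm-float-10.1.1.102.244.pdf
  # e925f184a114cecf: bitcoin.pdf
  # ca50f53cac1082d8: edmonds.pdf
  # 9a38c83de294a47f: gidra13asplos-naps.pdf
  # f6d2f7e5b3e8e844: karp.pdf
  # cd3200dd8af596ac: p761-thompson.pdf
  # f5cbf07550c62ab5: paper-reading.pdf
  # f798d8a6d059e58b: rcix-oopsla-2013.pdf
  # 593caea3394f5e61: re-writing.pdf
  # de53d1e1dfa988fd: shannon1948.pdf
  # ad78d8fa3dd06b69: thenightwatch.pdf
  # f358aa1753759610: turing-1936.pdf
  # md: {}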
  res = withOST do |server, token|
    server.call 'CheckMovieHash2', token, md.keys
  end
  status = res['status']
  raise status unless status == '200 OK'
  res['data'].each do |hash, infos|
    md[hash][:infos] = infos
  end
  require 'awesome_print'
  ap md
end
scrobble(ARGV) if __FILE__ == $0
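The sum performed by hashFile can be checked without a real video file. The sketch below reimplements the same arithmetic on an in-memory string: first and last 64 KiB read as little-endian 64-bit words, summed with wraparound, plus the total length. The names `HASH_CHUNK`, `MASK64` and `os_hash_of_string` are hypothetical and not part of the gist or of any OpenSubtitles library.

```ruby
HASH_CHUNK = 65_536
MASK64 = 2**64 - 1

# Hypothetical helper mirroring hashFile, but operating on a String.
# Like hashFile, the head and tail chunks overlap when the data is
# shorter than 2 * HASH_CHUNK bytes.
def os_hash_of_string(data)
  raise ArgumentError, 'need at least one chunk' if data.bytesize < HASH_CHUNK
  head = data.byteslice(0, HASH_CHUNK)
  tail = data.byteslice(data.bytesize - HASH_CHUNK, HASH_CHUNK)
  # Sum little-endian uint64 words, truncating to 64 bits at every step.
  sum = (head + tail).unpack('Q<*').reduce(0) { |acc, v| (acc + v) & MASK64 }
  # Add the total length and truncate again.
  (data.bytesize + sum) & MASK64
end

zeros = "\0".b * (2 * HASH_CHUNK)
puts format('%016x', os_hash_of_string(zeros))  # prints 0000000000020000

# One all-ones word wraps the sum around 2**64.
wrap = "\xFF".b * 8 + "\0".b * (2 * HASH_CHUNK - 8)
puts format('%016x', os_hash_of_string(wrap))   # prints 000000000001ffff
```

For all-zero data the result is just the length (131072 = 0x20000); the second case shows why each addition is masked with `UINT64_MAX` in hashFile.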
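The `md: {}` in the recorded output comes from the difference between a `Hash.new` default block that only returns a value and one that also stores it. A minimal, self-contained illustration; the variable names are arbitrary:

```ruby
# The default block only *returns* a fresh hash; nothing is stored.
lost = Hash.new { |_h, _k| { filenames: [] } }
lost['abc'][:filenames] << 'a.pdf'  # appends to a temporary hash
p lost                              # prints {}

# The default block assigns into the hash, so later lookups
# see the same inner hash and its array keeps growing.
kept = Hash.new { |h, k| h[k] = { filenames: [] } }
kept['abc'][:filenames] << 'a.pdf'
kept['abc'][:filenames] << 'b.pdf'
p kept                              # one entry holding both names
```

The second form is what `scrobble` needs, because it appends to `md[hash][:filenames]` and later reads `md.keys` and sets `md[hash][:infos]`.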