Last active
August 29, 2015 14:22
Voice memo transcriber engine. This is a very simple, highly inflexible script that probably won't be useful to many other people, but hey, you never know!
=begin README

Here's what this script does:

1. checks a Gmail inbox
2. finds attachments (which it expects to be WAV files)
3. pipes them through Google's voice transcription service
4. emails you the results

I use it with the Instacorder iPhone app for a super-fast, push-to-talk
voice memo transcription service, with no steps/confirmations along the way.

A few important things to know:

1. Google's transcription API maxes out at around 60 seconds. Longer recordings
   ~fail silently~.

2. I have this script running as a cron job on a desktop computer. You could
   just as easily run it manually, put it on a server, etc.

3. I use a dedicated email account for voice memos, so the script naively plows
   through EVERYTHING. It wouldn't be hard to modify it to discriminate between
   emails/attachments -- maybe most easily using some key in the subject line?

3a. But, absent such modification: don't run this script on your everyday Gmail
    account!!

4. To get your GOOGLE_KEY for the Speech API, you'll need to follow the
   directions here, including the part where you sign up for the chromium-dev
   group: http://www.chromium.org/developers/how-tos/api-keys
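
The 60-second limit in item 1 could be guarded against by checking a
recording's length before uploading it. What follows is only a minimal sketch,
assuming the attachment is an uncompressed PCM WAV with the canonical 44-byte
header (an assumption; nothing here verifies what Instacorder actually
produces). The names wav_duration_seconds, too_long_to_transcribe?,
WAV_HEADER_BYTES and MAX_SECONDS are hypothetical and not part of the script:

```ruby
# Hypothetical constants: canonical PCM WAV header size, and the rough
# upper limit mentioned in the README.
WAV_HEADER_BYTES = 44
MAX_SECONDS = 60

# Estimate a WAV file's duration in seconds from its header.
# Returns nil if the file does not look like a canonical PCM WAV.
def wav_duration_seconds(path)
  header = File.binread(path, WAV_HEADER_BYTES)
  return nil if header.nil? || header.bytesize < WAV_HEADER_BYTES
  return nil unless header[0, 4] == "RIFF" && header[8, 4] == "WAVE"

  # Bytes 28..31 hold the byte rate as a little-endian 32-bit integer.
  byte_rate = header[28, 4].unpack("V").first
  return nil if byte_rate.zero?

  # Everything after the header is assumed to be audio data.
  (File.size(path) - WAV_HEADER_BYTES) / byte_rate.to_f
end

# True only when the duration is known and exceeds the limit.
def too_long_to_transcribe?(path)
  duration = wav_duration_seconds(path)
  !duration.nil? && duration > MAX_SECONDS
end
```

The main loop could call too_long_to_transcribe?(full_path) and log a warning
rather than upload a recording that would come back empty.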

That's it!

=end

require "rubygems"
require "mail"
require "gmail"
require "json"
# also requires: curl on the command line

# CONST #

GMAIL_POWERED_ADDRESS = "[email protected]" # also works with google apps for your domain
PASSWORD = "password"
GOOGLE_KEY = "key_with_speech_api_enabled" # see: http://www.chromium.org/developers/how-tos/api-keys
TRANSCRIBED_EMAIL_TO = "[email protected]"
TRANSCRIBED_EMAIL_SUBJECT = "Transcribed voice memo"
THIS_DIR = __dir__
AUDIO_DIR = "#{THIS_DIR}/audio" # you must create this directory
LOGFILE = "log.txt" # you must create this file
SPEW_LOG_TO_CONSOLE = false # false = save to logfile instead

# GLOBAL #

$gmail = Gmail.connect!(GMAIL_POWERED_ADDRESS, PASSWORD)

# METHODS #

def transcribe_audio(full_path_to_audio_file)
  log "transcribing #{full_path_to_audio_file}..."
  cmd = "curl -X POST --data-binary @'#{full_path_to_audio_file}' \
    --header 'Content-Type: audio/l16; rate=16000;' \
    'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=#{GOOGLE_KEY}'"
  raw_response = `#{cmd}`
  # String#[] with a capture-group index returns the first captured transcript,
  # or nil when the response contains none (e.g. the recording was too long)
  best_transcript = raw_response[/"transcript":"(.+?)"/, 1]
  if best_transcript.nil?
    log "no transcript found in the response"
  else
    log "transcribed as: #{best_transcript}..."
  end
  best_transcript
end
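
=begin NOTE

transcribe_audio pulls the transcript out of the raw response text with a
regex. Since "json" is already required, the response could be parsed instead.
This is only a minimal sketch, assuming the service returns one JSON object
per line, shaped like {"result":[{"alternative":[{"transcript":"..."}]}]}
(an assumption inferred from the regex; the script does not document the
format). best_transcript_from is a hypothetical helper, not part of the
script:

```ruby
require "json"

# Return the first non-empty transcript found in the raw response,
# or nil if there is none. Lines that are not valid JSON are skipped.
def best_transcript_from(raw_response)
  raw_response.each_line do |line|
    line = line.strip
    next if line.empty?

    begin
      parsed = JSON.parse(line)
    rescue JSON::ParserError
      next
    end
    next unless parsed.is_a?(Hash)

    Array(parsed["result"]).each do |result|
      next unless result.is_a?(Hash)

      Array(result["alternative"]).each do |alternative|
        next unless alternative.is_a?(Hash)

        text = alternative["transcript"]
        return text if text.is_a?(String) && !text.empty?
      end
    end
  end
  nil
end
```

Unlike the regex, this also decodes JSON escapes (such as \" or \u00e9) in the
transcript, and it returns nil rather than raising when nothing came back.

=end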

def log(msg)
  if SPEW_LOG_TO_CONSOLE
    puts msg
  else
    File.open(File.join(THIS_DIR, LOGFILE), 'a') do |log_file|
      log_file.puts msg
    end
  end
end

def log_end
  log("---\n\n")
  # if logfile is larger than 10 megabytes...
  if File.size(File.join(THIS_DIR, LOGFILE)) > 10 * 1024 * 1024
    $gmail.deliver do
      to TRANSCRIBED_EMAIL_TO
      from GMAIL_POWERED_ADDRESS
      subject "ALERT: your transcriber.rb logfile is getting large"
      body "that's all :)"
    end
  end
end

#####################
#                   #
#  BEGIN ZE SCRIPT  #
#                   #
#####################

if $gmail.inbox.count(:unread) <= 0
  # nothing to do, so close the Gmail session before quitting
  $gmail.logout
  exit
end
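
=begin NOTE

The README suggests telling voice memos apart from other mail with some key in
the subject line. A minimal sketch of that check follows. SUBJECT_KEY, its
value "[memo]", and voice_memo_subject? are all hypothetical; pick whatever
key your recording app can put in the subject:

```ruby
# Hypothetical marker that voice memo emails carry in their subject line.
SUBJECT_KEY = "[memo]"

# True when the subject contains the key (case-insensitive).
# A missing subject never matches.
def voice_memo_subject?(subject, key = SUBJECT_KEY)
  return false if subject.nil?
  subject.downcase.include?(key.downcase)
end
```

In the loop over unread messages, a line such as
`next unless voice_memo_subject?(email.message.subject)` would skip everything
else. Even with a filter like this, a dedicated account remains the safer
setup.

=end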

log "found unread message(s) at #{Time.now}"

$gmail.inbox.find(:unread).each do |email|
  email.message.attachments.each do |attachment|
    full_path = File.join(AUDIO_DIR, attachment.filename)
    # binary write, so the audio bytes are saved untouched
    File.binwrite(full_path, attachment.body.decoded)

    transcript = transcribe_audio(full_path)
    if transcript.nil?
      log "no transcript for #{full_path}, so I didn't mark the email as read..."
      next
    end

    transcription_email = $gmail.compose do
      to TRANSCRIBED_EMAIL_TO
      from GMAIL_POWERED_ADDRESS
      subject TRANSCRIBED_EMAIL_SUBJECT
      body transcript
    end

    begin
      transcription_email.deliver!
      email.read!
      log "transcription sent!"
    rescue => e
      log "well, something went wrong (#{e.message}), so I didn't mark the email as read..."
    end
  end
end

log_end
$gmail.logout
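
=begin NOTE

The script uses attachment.filename as-is, both in File.join and inside the
single-quoted curl command. A filename containing a quote or "../" could break
the command or land outside the audio directory. A minimal sketch of a
sanitizing step follows. safe_audio_path and DEFAULT_MEMO_NAME are
hypothetical names, not part of the script:

```ruby
# Hypothetical fallback used when nothing usable is left of the filename.
DEFAULT_MEMO_NAME = "memo.wav"

# Build a path inside audio_dir from an untrusted attachment filename:
# drop any directory part, then replace every character outside a small
# safe set with an underscore.
def safe_audio_path(audio_dir, filename)
  base = File.basename(filename.to_s)
  cleaned = base.gsub(/[^A-Za-z0-9._-]/, "_")
  cleaned = DEFAULT_MEMO_NAME if cleaned.empty? || cleaned.start_with?(".")
  File.join(audio_dir, cleaned)
end
```

Calling safe_audio_path(AUDIO_DIR, attachment.filename) in place of the plain
File.join keeps quotes and spaces out of the shell command. Two attachments
with the same cleaned name would still overwrite each other, as they do now.

=end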