alotaiba/google_speech2text.md

Created February 3, 2012 13:20

Google Speech To Text API

Base URL: https://www.google.com/speech-api/v1/recognize
It accepts POST requests with a voice file encoded in FLAC format, plus query parameters for control.

Query Parameters

client
The name of the client you're connecting from. For spoofing purposes, let's use chromium

lang
Speech language, for example, ar-QA for Qatari Arabic, or en-US for U.S. English

maxresults
Maximum number of results to return for an utterance

POST

body
Should contain the FLAC-encoded voice data as binary

HTTP Header

Content-Type
Should be audio/x-flac; rate=16000;, which gives the MIME type and the sample rate of the FLAC file

User-Agent
Can be the client's user agent string. For spoofing purposes, we'll use Chrome's
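
The parameters and headers described above can be assembled programmatically. The following is a minimal sketch in Ruby, not part of the original gist; the helper names recognize_url and flac_content_type are hypothetical and only illustrate how the pieces fit together.

```ruby
require 'uri'

# Hypothetical helper: assembles the recognize URL from the three
# query parameters described above (client, lang, maxresults).
def recognize_url(client:, lang:, maxresults:)
  query = URI.encode_www_form(client: client, lang: lang, maxresults: maxresults)
  "https://www.google.com/speech-api/v1/recognize?#{query}"
end

# Hypothetical helper: builds the Content-Type value for a FLAC body
# recorded at the given sample rate.
def flac_content_type(rate)
  "audio/x-flac; rate=#{rate};"
end

puts recognize_url(client: 'chromium', lang: 'ar-QA', maxresults: 10)
puts flac_content_type(16000)
```

The printed URL and header value match the ones used in the wget and curl examples.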

Examples

These examples assume you have a voice file encoded in FLAC called alsalam-alikum.flac.

wget

This will save the JSON response in a file called recognized.json

wget --post-file='alsalam-alikum.flac' \
--user-agent='Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
--header='Content-Type: audio/x-flac; rate=16000;' \
-O 'recognized.json' \
'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=ar-QA&maxresults=10'

curl

curl -X POST \
--data-binary @alsalam-alikum.flac \
--user-agent 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.7 (KHTML, like Gecko) Chrome/16.0.912.77 Safari/535.7' \
--header 'Content-Type: audio/x-flac; rate=16000;' \
'https://www.google.com/speech-api/v1/recognize?client=chromium&lang=ar-QA&maxresults=10'

Kaspler commented May 22, 2014

@akifnaeem21

I'm in the same situation as you: I need to finish my BS project by the end of May.
V1 was also working fine for me until now... Anyway, thanks to this page and all the comments here, I managed to make it work with V2.

Here is my C# code (it's just a prototype where I tested, so it's simple):


using System;
using System.IO;
using System.IO.Compression;
using System.Net;


class Program
{
    static void Main(string[] args)
    {
        using (var fileStream = new FileStream(@"C:\LetsArrangeMeeting.flac", FileMode.Open))
        {
            const string requestUrl = "https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=KEY_GOES_HERE&client=chromium&maxresults=6&pfilter=2";
            var request = (HttpWebRequest)WebRequest.Create(requestUrl);
            ConfigureRequest(request);
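            // Dispose the request stream before asking for the response,
            // so the chunked upload is properly terminated.
            using (var requestStream = request.GetRequestStream())
            {
                CopyStream(fileStream, requestStream);
            }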

            using (var response = request.GetResponse())
            {
                using (var responseStream = response.GetResponseStream())
                {
                    using (var zippedStream = new GZipStream(responseStream, CompressionMode.Decompress))
                    {
                        using (var sr = new StreamReader(zippedStream))
                        {
                            var res = sr.ReadToEnd();
                            Console.WriteLine(res);
                        }
                    }
                }
            }
        }

        Console.ReadLine();
    }

    private static void CopyStream(FileStream fileStream, Stream requestStream)
    {
        var buffer = new byte[32768];
        int read;
        while ((read = fileStream.Read(buffer, 0, buffer.Length)) > 0)
        {
            requestStream.Write(buffer, 0, read);
        }
    }

    private static void ConfigureRequest(HttpWebRequest request)
    {
        request.KeepAlive = true;
        request.SendChunked = true;
        request.ContentType = "audio/x-flac; rate=44100";
        request.UserAgent =
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2";
        request.Headers.Set(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch");
        request.Headers.Set(HttpRequestHeader.AcceptLanguage, "en-GB,en-US;q=0.8,en;q=0.6");
        request.Headers.Set(HttpRequestHeader.AcceptCharset, "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
        request.Method = "POST";
    }
}

And this is the result:


{"result":[]}
{"result":[{"alternative":[{"transcript":"hello let's arrange a meeting in Tel Aviv at 5 o'clock"},{"transcript":"hello lets arrange a meeting in Tel Aviv at 5 o'clock"},{"transcript":"hello let's arrange a meeting in Tel Aviv at 5 o'clock Buy"},{"transcript":"hello lips arrange a meeting in Tel Aviv at 5 o'clock"},{"transcript":"hello let's arrange a meeting in Tel Aviv at 5 o'clock by"},{"transcript":"hello let's arrange a meeting in Tel Aviv at 5 o'clock buy"}],"final":true}],"result_index":0}

Hope this helps!
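
The V2 response shown in this comment consists of two JSON documents, one per line, and the first has an empty result list. A minimal Ruby sketch of collecting the transcripts from such a body follows. The helper name transcripts is hypothetical, and the sample body is an abbreviated form of the output above, not a fresh API response.

```ruby
require 'json'

# Hypothetical helper: returns every transcript found in a
# line-delimited V2 response body, skipping blank lines and
# lines whose "result" list is empty.
def transcripts(body)
  body.each_line.flat_map do |line|
    line = line.strip
    next [] if line.empty?

    doc = JSON.parse(line)
    (doc['result'] || []).flat_map do |result|
      (result['alternative'] || []).map { |alt| alt['transcript'] }
    end
  end
end

# Abbreviated sample modelled on the two-line output shown above.
sample = <<~BODY
  {"result":[]}
  {"result":[{"alternative":[{"transcript":"hello let's arrange a meeting"},{"transcript":"hello lets arrange a meeting"}],"final":true}],"result_index":0}
BODY

puts transcripts(sample).first
```

Iterating over all lines, rather than indexing the second one directly, also copes with a body that contains only the empty {"result":[]} line.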

ecmnet commented May 22, 2014

V2 seems to be down. None of my API keys work. Same experience?

kkaarrss commented May 22, 2014

Everything seems normal for me.

ghost commented May 24, 2014

I also only got an empty response. Does anyone know how to fix this?

mudler commented May 24, 2014

Empty response here too. I'm digging through the Chromium code, but no luck so far.
Edit: it seems pfilter does the trick, gillesdemey/google-speech-v2#6 (comment)

pratyushkhatait commented May 26, 2014

@NSThread how did you get the key? Are there any limits on how many times the key can be used? I used "https://console.developers.google.com" to get a key, but it is not working.

hauhhvn commented May 27, 2014

I want to make more than 50 requests / day. How can I do that?

Homez386 commented May 29, 2014

Same as hauhhvn! Why doesn't Google have commercial offerings for this API (at least, I haven't seen any)? When I try to extend the quota in my developer console, I'm sent to some private Google doc. I requested permission to read it twice, but there has been no reaction!

pannous commented Jun 2, 2014

<title>Error 403 (Forbidden)!!1</title> original google response "!!1" lol

wildroo commented Jun 12, 2014

Working Java code:

//libs to import
import java.io.BufferedReader;
import java.io.DataOutputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.net.ssl.HttpsURLConnection;

/**
 * Send post to google
 */
private void sendPost() throws Exception {
    String USER_AGENT = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.2 (KHTML, like Gecko) Chrome/15.0.874.121 Safari/535.2",
    url = "https://www.google.com/speech-api/v2/recognize?output=json&lang=ru-RU&key=AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw&client=chromium&maxresults=6&pfilter=2";

    URL obj = new URL(url);
    HttpsURLConnection con = (HttpsURLConnection) obj.openConnection();

    // add request headers
    con.setRequestMethod("POST");
    con.setRequestProperty("User-Agent", USER_AGENT);
    con.setRequestProperty("Content-Type", "audio/l16; rate=16000");
    // No Accept-Encoding header is sent, so the response can be read below as plain text.

    // Send post request
    con.setDoOutput(true);
    DataOutputStream wr = new DataOutputStream(con.getOutputStream());
    wr.write(Files.readAllBytes(Paths
            .get("C:\\tmp\\test_sounds\\1_16000.wav")));
    wr.flush();
    wr.close();

    int responseCode = con.getResponseCode();
    System.out.println("\nSending 'POST' request to URL : " + url);
    System.out.println("Response Code : " + responseCode);

    BufferedReader in = new BufferedReader(new InputStreamReader(
            con.getInputStream()));
    String inputLine;
    StringBuffer response = new StringBuffer();

    while ((inputLine = in.readLine()) != null) {
        response.append(inputLine);
    }
    in.close();

    // print result
    System.out.println(response.toString());

}

yochze commented Jun 13, 2014

Here's my working Ruby code.

        require 'net/http'
        require 'uri'
        require 'openssl'

        url = 'https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw&results=6&pfilter=2'
        uri = URI.parse(url)
        http = Net::HTTP.new(uri.host, uri.port)
        http.use_ssl = true
        # Note: VERIFY_NONE disables certificate checks
        http.verify_mode = OpenSSL::SSL::VERIFY_NONE
        request = Net::HTTP::Post.new(uri.request_uri)
        # Read the FLAC file as raw bytes
        request.body = File.binread("your_flac_file.flac")
        request.content_type = 'audio/x-flac; rate=16000;'
        res = http.request(request)

BTW - Does anybody know what the time limit for an audio file is?

bezerko commented Jun 28, 2014

I thought I would share this for those who just get back:
{"result":[]}

I was setting the "rate" in the Content-Type HTTP header to 16000, which had always worked in V1. Changing this to 44100 fixed the issue:

"Content-type": "audio/x-flac; rate=44100"

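If the empty result comes from a mismatch between the declared rate and the file's real sample rate (the comment above suggests this but does not confirm it), the rate can be read from the FLAC file instead of being hard-coded. A minimal Ruby sketch follows. The helper name flac_sample_rate is hypothetical, and it assumes the data starts with the "fLaC" marker followed directly by the STREAMINFO block, as the FLAC format requires.

```ruby
# Hypothetical helper: reads the 20-bit sample rate from the STREAMINFO
# metadata block at the start of FLAC data.
# Layout: 4-byte "fLaC" marker, 4-byte block header, then 10 bytes of
# block/frame sizes, so the sample rate starts at byte offset 18.
def flac_sample_rate(bytes)
  raise ArgumentError, 'not a FLAC stream' unless bytes.byteslice(0, 4).bytes == 'fLaC'.bytes
  raise ArgumentError, 'first block is not STREAMINFO' unless (bytes.getbyte(4) & 0x7F) == 0

  b = bytes.byteslice(18, 3).bytes
  (b[0] << 12) | (b[1] << 4) | (b[2] >> 4)
end

# Synthetic header for illustration only: 44100 Hz, mono, 16-bit.
header = ('fLaC'.bytes + [0x00, 0x00, 0x00, 0x22] + [0] * 10 + [0x0A, 0xC4, 0x40, 0xF0]).pack('C*')

puts "audio/x-flac; rate=#{flac_sample_rate(header)}"
```

With a real file, the same helper would be called as flac_sample_rate(File.binread('alsalam-alikum.flac')).
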
LRNAB commented Jul 2, 2014

Does anyone know how to get the key?

voicer commented Jul 3, 2014

@LRNAB
You can get a key for the Google API for free in the Google APIs console.
But the question is how to increase the limit of 50 requests a day.
I found no way to get more, and the link to order more quota is dead.

Google is forcing me and other developers to find another way to solve this problem.
I see a solution in JavaScript for the Chrome browser. But anybody who needs this API for, say, C# would have to switch the sound source from the microphone to a file stream in FLAC format.

My question now is: how do I change the sound source from the microphone to a binary file on disk?

sakhnevych commented Jul 28, 2014

@voicer
Could you tell me if there is a way to use the JavaScript for the Chrome browser from a terminal? I realize it is a slightly strange question, but I can't find a way to get unlimited access to the Google Speech API from Linux using annyang.

smithas commented Aug 8, 2014

I went through the docs for creating a key, but it gives me a 403 Forbidden error. A few weeks ago I tried a key shared by someone here, which worked, but now that fails as well. What step am I missing that leads to the 403 error? Is there any other way? Please help.

ijazsarwar commented Sep 23, 2014

Has anyone got it working in Java with V2? I am getting a 403 error.
Also, judging by the comments, it looks like it only converts 10 to 15 seconds of audio... Is there any way to extend that time limit to, say, one minute or even longer?

amd5200 commented Oct 11, 2014

I switched to pocketsphinx; it works offline and supports Chinese.

marplx commented Oct 16, 2014

I also have issues with authentication for the Google Speech2Text API on Android.
It works with my API key over WiFi if I tell Google my current IP, but the Android key isn't working. I provided the correct debug-keystore SHA1 but am receiving: The client does not have permission to get url ... from this server. I use the Apache HttpClient within my Android application. Any ideas?

@ijazsarwar This Google API is not intended to recognize speech continuously. It's for short commands. That's why there is a limit of a few seconds. Take a look at my master's thesis project, where I try to send 15-second chunks to Google to get something continuous-like.

@amd5200 I'd love to talk to you about PocketSphinx on Android for generic vocabulary searches, since I already tried unsuccessfully to get it to work as described here. You'll find my email in my GitHub profile.

amd5200 commented Oct 27, 2014

@marfnk I just used it on my single-board computer, similar to a Raspberry Pi, and for Chinese. ( https://www.youtube.com/watch?v=EucxVToC58E&list=UUgBhkLQyk0LwxpKAmIZQNgA&hd=1 )
I never tried PocketSphinx on Android, so I have no idea about this.

shinichi0802 commented Aug 18, 2015

Is it possible to use AJAX to post to the Google Speech API?

amsehili commented Oct 27, 2015

Hello everybody,

Some time ago I created this shell script which packages everything you need to use the API v2 (record data for a given duration or use a file, specify language, filter out results, etc.): https://github.com/amsehili/gspeech-rec

For more details about the reverse engineering being used, check out this article: https://aminesehili.wordpress.com/2015/02/08/on-the-use-of-googles-speech-recognition-api-version-2/

Cheers!

rogo21 commented Feb 4, 2016

How can I divide audio into 15-second frames using Java code or the command line?
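
One way to approach the 15-second splitting asked about here is to slice raw PCM by byte count before sending each piece. The Ruby sketch below is only an illustration and has not been tested against the API. It assumes uncompressed 16-bit mono audio (the kind declared as audio/l16; rate=16000 in the Java example earlier in this thread) with any container header already stripped, and the helper name pcm_chunks is hypothetical.

```ruby
# Hypothetical helper: splits raw PCM data into chunks of at most
# `seconds` seconds each. The chunk size in bytes is
# rate * seconds * bytes_per_sample * channels.
def pcm_chunks(pcm, rate:, seconds: 15, bytes_per_sample: 2, channels: 1)
  chunk_bytes = rate * seconds * bytes_per_sample * channels
  (0...pcm.bytesize).step(chunk_bytes).map do |offset|
    pcm.byteslice(offset, chunk_bytes)
  end
end

# 40 seconds of silence at 16 kHz, 16-bit mono, for illustration only.
silence = "\x00".b * (16_000 * 2 * 40)

pcm_chunks(silence, rate: 16_000).each_with_index do |chunk, i|
  puts "chunk #{i}: #{chunk.bytesize} bytes"
end
```

Compressed formats such as FLAC cannot be cut at arbitrary byte offsets this way; they would need to be decoded first or split with an audio tool.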

ghost commented Jul 21, 2016

Hi, https://www.google.com/speech-api/v2/recognize -> 400. That’s an error.
I am in Canada

Swethamr402 commented Jul 29, 2016

HI, Sending 'POST' request to URL : https://www.google.com/speech-api/v2/recognize?output=json&lang=en-us&key=AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw&results=6&pfilter=2
Response Code : 200
{"result":[]}

I am getting the above response, but no text from the recorded file. How do I get the transcript?

IvanZhao commented Sep 5, 2016

Hi everybody,
When I POST to https://speech.googleapis.com/v1beta1/speech:asyncrecognize,
I receive an error message like this:
{
  "code": 403,
  "errors": [
    {
      "domain": "global",
      "message": "Requests from this Android client application are blocked.",
      "reason": "forbidden"
    }
  ],
  "message": "Requests from this Android client application are blocked.",
  "status": "PERMISSION_DENIED"
}
Does anybody know how to resolve this problem?
Thanks.

indrabayu commented Nov 24, 2016

    public static async Task<string> RequestGoogleSpeechAPIAsync(byte[] byteArray) 
    {
        var httpClient = new HttpClient();
        var mediaType = new MediaTypeWithQualityHeaderValue("audio/x-flac");
        var parameter = new NameValueHeaderValue("rate", "16000");
        mediaType.Parameters.Add(parameter);

        var url = "https://www.google.com/speech-api/v2/recognize?output=json&lang=en-US&key=";
        var appSettings = ConfigurationManager.AppSettings;
        var apiKey = "AIzaSyBOti4mM-6x9WDnZIjIeyEU21OpBXqWBgw";
        var uri = new Uri(url + apiKey);

        using (MemoryStream ms = new MemoryStream(byteArray, 0, byteArray.Length))
        {
            var param = new StreamContent(ms);
            param.Headers.ContentType = mediaType;

            var result = await httpClient.PostAsync(uri, param);

            var responseFromServer = await result.Content.ReadAsStringAsync();
            var responseArray = responseFromServer.Split('\n');
            // The second line of the response holds the actual result
            var responseJson = await Task.Factory.StartNew(() => JsonConvert.DeserializeObject<SpajamHonsen.Models.GoogleSpeechAPIResponseModel.Resuls>(responseArray[1]));

            return responseJson.result[0].alternative[0].transcript;
        }
    }

fj4870 commented Dec 1, 2016

@akifnaeem21
Could you please share your speech-to-text C# code?

Nalinh commented Apr 4, 2017

I can't get it to run :(
The result is blank. Can anyone tell me why, please?

Reejesh-PK commented Oct 27, 2022

Here is the updated API: https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize

Stack Overflow reference: https://stackoverflow.com/questions/50760057/how-to-use-googles-cloud-speech-to-text-rest-api-to-transcribe-a-video
