Monday 2 November 2020

How to end Google Speech-to-Text streamingRecognize gracefully and get back the pending text results?

I'd like to be able to end a Google Speech-to-Text stream (created with streamingRecognize) and get back the pending SR (speech recognition) results.

In a nutshell, here is the relevant Node.js code:

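const speech = require('@google-cloud/speech');
const util = require('util');

// assumes `request` (the streaming config) and `audioChunk` (a Buffer of audio)
// are defined elsewhere, and that this code runs inside an async function
const speechClient = new speech.SpeechClient();

// create SR stream
const stream = speechClient.streamingRecognize(request);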

// observe data event
const dataPromise = new Promise(resolve => stream.on('data', resolve));

// observe error event
const errorPromise = new Promise((resolve, reject) => stream.on('error', reject));

// observe finish event
const finishPromise = new Promise(resolve => stream.on('finish', resolve));

// send the audio
stream.write(audioChunk);

// for testing purposes only, give the SR stream 2 seconds to absorb the audio
await new Promise(resolve => setTimeout(resolve, 2000));

// end the SR stream gracefully, by observing the completion callback
const endPromise = util.promisify(callback => stream.end(callback))();

// a 5-second test timeout
const timeoutPromise = new Promise(resolve => setTimeout(resolve, 5000));

// finishPromise wins the race here
await Promise.race([
  dataPromise, errorPromise, finishPromise, endPromise, timeoutPromise]);

// endPromise wins the race here
await Promise.race([
  dataPromise, errorPromise, endPromise, timeoutPromise]);

// timeoutPromise wins the race here
await Promise.race([dataPromise, errorPromise, timeoutPromise]);

// I don't see any data or error events, dataPromise and errorPromise don't get settled

What I experience is that the SR stream ends successfully, but I don't get any data events or error events. Neither dataPromise nor errorPromise gets resolved or rejected.
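For reference when reading the events above: on a Node.js duplex stream such as the one streamingRecognize returns, 'finish' only signals that the writable side (the outgoing audio) has been flushed after end(); responses arrive on the readable side, and 'end' is what signals that no more of them will come. A minimal sketch that collects every response until the readable side ends is below. collectUntilEnd is a hypothetical helper name, and whether the Speech client actually delivers the pending results before 'end' is exactly the behavior this question reports as broken, so treat this as the right event to wait on rather than as a fix.

```javascript
// Resolve with every 'data' chunk the duplex stream emits until its
// readable side ends ('end'); reject if the stream emits 'error'.
// Works with any Node.js Duplex stream, including object-mode streams.
function collectUntilEnd(duplex) {
  return new Promise((resolve, reject) => {
    const received = [];
    duplex.on('data', chunk => received.push(chunk));
    duplex.once('end', () => resolve(received));
    duplex.once('error', reject);
  });
}
```

Attach it right after creating the stream and before writing (const responsesPromise = collectUntilEnd(stream);), then write the audio, call stream.end(), and await responsesPromise.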

How can I signal the end of my audio, close the SR stream and still get the pending SR results?

I need to stick with the streamingRecognize API because the audio I'm streaming is real-time, even though it may stop suddenly.

To clarify, it works as long as I keep streaming audio: I do receive the real-time SR results. However, when I send the final audio chunk and end the stream as above, I don't get the final results I would otherwise expect.

To get the final results, I actually have to keep streaming silence for several more seconds, which may increase the STT bill. I feel there must be a better way to get them.
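The silence-padding workaround described above can be sketched as follows, assuming the request uses 16-bit mono LINEAR16 audio at 16 kHz, where all-zero samples are digital silence. makeSilence and padWithSilence, the 100 ms chunk size, and the 5-second cap are hypothetical choices for illustration and should be matched to the config actually sent. The done callback lets the caller stop as soon as a final result has been seen, which bounds the extra audio sent but does not avoid it.

```javascript
// Build a chunk of silence for 16-bit mono PCM (LINEAR16):
// each sample is 2 bytes, and Buffer.alloc zero-fills, so every sample is 0.
function makeSilence(durationMs, sampleRateHertz = 16000) {
  const samples = Math.round((sampleRateHertz * durationMs) / 1000);
  return Buffer.alloc(samples * 2);
}

// Write one silence chunk every `chunkMs` until `done()` returns true or
// `maxMs` of silence has been written, then end the stream.
// Resolves with the number of milliseconds of silence written.
async function padWithSilence(stream, done, { chunkMs = 100, maxMs = 5000, sampleRateHertz = 16000 } = {}) {
  const chunk = makeSilence(chunkMs, sampleRateHertz);
  let sentMs = 0;
  while (!done() && sentMs < maxMs) {
    stream.write(chunk);
    sentMs += chunkMs;
    // pace the silence roughly in real time
    await new Promise(resolve => setTimeout(resolve, chunkMs));
  }
  stream.end();
  return sentMs;
}
```

A typical done predicate is a flag that a 'data' handler sets once a response contains a final result.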

Updated: it appears that the only proper time to end a streamingRecognize stream is upon a data event where StreamingRecognitionResult.is_final is true. It also appears that we're expected to keep streaming audio until a data event fires in order to get any result at all, final or interim.
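Ending the stream only once a final result has arrived, as described in the update above, might look like the sketch below. It assumes the Node.js client exposes the flag in camelCase as isFinal on each entry of response.results (the proto field is is_final), with transcripts under alternatives; endOnFinalResult is a hypothetical helper, and audio still has to keep flowing until that final result shows up.

```javascript
// Resolve with the transcript of the first final result, ending the
// stream's writable side at that point; reject on 'error'.
// Responses that contain only interim results (isFinal false) are ignored.
function endOnFinalResult(stream) {
  return new Promise((resolve, reject) => {
    let settled = false;
    stream.on('data', response => {
      if (settled) return;
      const finalResult = (response.results || []).find(r => r.isFinal);
      if (!finalResult) return; // interim only: keep sending audio
      settled = true;
      stream.end(); // signal that no more audio will be written
      const best = (finalResult.alternatives || [])[0];
      resolve(best ? best.transcript : '');
    });
    stream.once('error', reject);
  });
}
```

Start it before writing audio, keep writing chunks while the promise is pending, and stop writing once it resolves, since writing after end() raises a "write after end" error.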

This looks like a bug to me; I'm filing an issue.

Updated: it now seems to have been confirmed as a bug. Until it's fixed, I'm looking for a potential workaround.

Updated: for future reference, here is the list of current and previously tracked issues involving streamingRecognize.

I'd expect this to be a common problem for those who use streamingRecognize, and I'm surprised it hasn't been reported before. I'm submitting it as a bug to issuetracker.google.com as well.



