Wednesday, 30 January 2019

PocketSphinx is not continuously outputting speech to text as expected

Motivation & back story (optional, no need to read):

I had a working piece of code that used PocketSphinx and the Google recognizer together in order to support both menu-based voice navigation and continuous speech to text.

The user could therefore navigate the app by voice whenever the words they spoke matched a menu item, for example:

"Ok atom, start x" "Ok atom, stop x" "Ok atom, do z"

Note: the "Ok atom" trigger phrase was detected by PocketSphinx; PocketSphinx was then canceled and Google voice was used for the longer phrases because of its better accuracy.

If the recognized words did not match a menu item, the app would instead start continuous speech to text (handled by Google voice) and send the text to the backend, like a chatbot:

"Hi my name is dr green thumb...." "What for dinner tonight...."

If there was silence for 10 seconds, Google voice would be canceled and PocketSphinx would be started again to listen for the keyword/wake word trigger "Ok atom".

**Everything was OK until I also needed the actual raw audio bytes for a backend application. I have searched all over Stack Overflow, and none of the solutions work or are realistic for my project.

Then I found that PocketSphinx can give you back the raw data, so I had to factor out Google voice and use ONLY PocketSphinx.**

The Actual Problem:

However, after removing the Google voice recognizer and using only PocketSphinx (because I need easy access to the raw audio bytes of the recognition), PocketSphinx keeps hearing only its trigger phrase "ok atom" and never any other words spoken before or after it!

No matter what I say, for example "hello", "hi", "1, 2, 3", etc., it only hears the trigger "ok atom". See my relevant code snippets below.

The code snippet of the AsyncTask that sets up PocketSphinx:

    @Override
    protected Exception doInBackground(Void... params) {
        try {
            SpeechRecognitionService speechService = serviceReference.get();

            Assets assets = new Assets(speechService);
            File assetDir = assets.syncAssets();

            speechService.pocketSphinxRecognizer = defaultSetup()
                    .setAcousticModel(new File(assetDir, "en-us-ptm"))
                    .setDictionary(new File(assetDir, "cmudict-en-us.dict"))
                    // threshold to balance between false +/- (higher is less sensitive, was 1e-45f)
                    .setKeywordThreshold(1e-30f)
                    .getRecognizer();
            speechService.pocketSphinxRecognizer.addListener(listener);

            // create keyword-activation search
            speechService.pocketSphinxRecognizer.addKeyphraseSearch(KWS_SEARCH, KEYPHRASE);
        } catch (IOException e) {
            return e;
        }
        return null;
    }

The code snippet for the PocketSphinx life cycle methods:

    private String KWS_SEARCH = "ok atom";

    private void switchSearch(String searchName) {
        pocketSphinxRecognizer.stop();
        if (searchName.equals(KWS_SEARCH)) {
            // keyword search: listen without a timeout
            pocketSphinxRecognizer.startListening(searchName);
        } else {
            // any other search: listen with a 10 second timeout
            pocketSphinxRecognizer.startListening(searchName, 10000);
        }
    }

    private class PocketSphinxRecognitionListener implements edu.cmu.pocketsphinx.RecognitionListener {
        @Override
        public void onPartialResult(Hypothesis hypothesis) {
            try {
                if (hypothesis != null) {
                    String cmd = hypothesis.getHypstr();
                    Log.d(TAG, "onPartialResult:" + cmd);

                    if (cmd.equalsIgnoreCase(KWS_SEARCH)) {
                        // the trigger phrase was heard
                        handleResults(cmd);
                    } else {
                        // anything else goes to the backend
                        sendToBacknedForProcessing(cmd);
                    }
                }
            } catch (NullPointerException exc) {
                exc.printStackTrace();
            }
        }

        @Override
        public void onBeginningOfSpeech() {}

        @Override
        public void onEndOfSpeech() {
            // go back to keyword spotting once the speaker stops
            if (!pocketSphinxRecognizer.getSearchName().equals(KWS_SEARCH)) {
                switchSearch(KWS_SEARCH);
            }
        }

        @Override
        public void onResult(Hypothesis hypothesis) {
            if (hypothesis != null) {
                String cmd = hypothesis.getHypstr();
                Log.d(TAG, "onResult:" + cmd);

                sendToBacknedForProcessing(cmd);
            }
        }

        @Override
        public void onError(Exception e) {
            Log.e(TAG, "Pocketsphinx encountered an exception: " + e.getMessage());
        }

        @Override
        public void onTimeout() {
            switchSearch(KWS_SEARCH);
        }
    }

I followed this popular article (https://www.guidearea.com/pocketsphinx-continuous-speech-recognition-android-tutorial/) step by step, but I still do not get correct results.

What I want is to support two modes: voice navigation when the right words are recognized, and otherwise continuous speech to text that is sent to the backend.
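To make the intended two-mode behavior concrete, here is a minimal sketch of the routing decision only, written in plain Java with no PocketSphinx or Android dependency. Everything in it (the class `UtteranceRouter`, its `route` method, the `Mode` enum, and the example command set) is a hypothetical name made up for illustration. It is not part of my project or of any library, and it says nothing about how the recognizer itself has to be configured.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper: classifies recognized text as a menu command or free-form speech. */
final class UtteranceRouter {
    enum Mode { MENU_COMMAND, FREE_FORM }

    /** Result of routing: which mode applies and the text to hand on. */
    static final class Routed {
        final Mode mode;
        final String text;

        Routed(Mode mode, String text) {
            this.mode = mode;
            this.text = text;
        }
    }

    private final String wakePhrase;
    private final Set<String> menuCommands = new HashSet<>();

    UtteranceRouter(String wakePhrase, Set<String> menuCommands) {
        // Store everything lower-cased so matching is case-insensitive.
        this.wakePhrase = wakePhrase.trim().toLowerCase(Locale.ROOT);
        for (String command : menuCommands) {
            this.menuCommands.add(command.trim().toLowerCase(Locale.ROOT));
        }
    }

    Routed route(String hypothesis) {
        String trimmed = hypothesis.trim();
        String normalized = trimmed.toLowerCase(Locale.ROOT);
        if (normalized.startsWith(wakePhrase)) {
            // Drop the wake phrase plus any separating commas or whitespace.
            String rest = normalized.substring(wakePhrase.length())
                    .replaceFirst("^[\\s,]+", "");
            if (menuCommands.contains(rest)) {
                return new Routed(Mode.MENU_COMMAND, rest);
            }
        }
        // No known command: pass the original text on as free-form speech.
        return new Routed(Mode.FREE_FORM, trimmed);
    }
}
```

With the commands "start x", "stop x" and "do z" registered, `route("Ok atom, start x")` yields `MENU_COMMAND` with the text "start x", while "Hi my name is dr green thumb" yields `FREE_FORM` with the text unchanged. Note that in this sketch a bare "ok atom" also falls through to `FREE_FORM`; a real implementation would probably treat it as its own case.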

Thanks a million!


