Wednesday, 22 May 2019

Why does Azure Content Moderator fail to detect custom terms?

I am trying to detect custom flagged keywords in a chunk of text using Azure Cognitive Services (more specifically, azure-cognitiveservices-vision-contentmoderator==1.0.0).

The client is set up as follows:

from azure.cognitiveservices.vision.contentmoderator import ContentModeratorClient
from msrest.authentication import CognitiveServicesCredentials

subscription_key = '<my_key>'
endpoint_url = 'https://westeurope.api.cognitive.microsoft.com/'

client = ContentModeratorClient(endpoint_url, CognitiveServicesCredentials(subscription_key))

Afterwards, I am able to create a list of custom terms (its ID is 123) using the methods client.list_management_term_lists.create and client.list_management_term.add_term:

client.list_management_term_lists.create(
    content_type="application/json",
    body={
        "name": "My custom list",
        "description": "Monty Python related terms",
    }
)

client.list_management_term.add_term(
    list_id=123,
    term="eggs",
    language="eng"
)

I can verify that this works as intended, since

terms_data = client.list_management_term.get_all_terms(list_id=123, language="eng").data
terms_data.as_dict()

yields

{'language': 'eng', 'terms': [{'term': 'eggs'}, {'term': 'spam'}], 'status': {'code': 3000, 'description': 'OK'}, 'tracking_id': 'some_id'}
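To rule out a mismatch between the list contents and the text being screened, a small check like the following can compare the stored terms against the ones I expect. This is only a sketch that works on the dictionary shown above; the helper names listed_terms and missing_terms are my own and not part of the SDK.

```python
def listed_terms(terms_dict):
    """Return the set of term strings found in a get_all_terms(...).data.as_dict() result."""
    return {entry["term"] for entry in terms_dict.get("terms", [])}


def missing_terms(terms_dict, expected):
    """Return the expected terms that are not present in the list, sorted alphabetically."""
    return sorted(set(expected) - listed_terms(terms_dict))


# Dictionary copied from the output above.
terms_dict = {
    "language": "eng",
    "terms": [{"term": "eggs"}, {"term": "spam"}],
    "status": {"code": 3000, "description": "OK"},
    "tracking_id": "some_id",
}

# Prints an empty list when every expected term is stored in the list.
print(missing_terms(terms_dict, ["spam", "eggs"]))
```

With the data above nothing is missing, so the list itself appears to contain both terms.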

However, if I attempt detection with

import io

TEXT = "Do you like spam and eggs ?"

text_screen = client.text_moderation.screen_text(
    text_content_type="text/plain",
    text_content=io.StringIO(TEXT),
    language="eng",
    list_id=123,
    classify=True
)

text_screen.as_dict()

no "Terms" entity appears; all I get from the above is:

{'original_text': 'Do you like spam and eggs ?',
 'normalized_text': ' you like spam  eggs ?',
 'classification': {'category1': {'score': 0.028309470042586327},
  'category2': {'score': 0.14004242420196533},
  'category3': {'score': 0.12679287791252136},
  'review_recommended': False},
 'status': {'code': 3000, 'description': 'OK'},
 'language': 'eng'}
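To make the missing entity explicit, here is a minimal sketch of how I check the result. It assumes that matched terms, when present, would show up under a 'terms' key as a list of entries each carrying a 'term' field; that shape is my assumption, and detected_terms is a hypothetical helper rather than an SDK function.

```python
def detected_terms(screen_dict):
    """Return matched term strings from a screen_text(...).as_dict() result.

    Assumption: matches appear under a 'terms' key as a list of dicts with a
    'term' field. A missing or None 'terms' value is treated as no matches.
    """
    return [entry.get("term") for entry in (screen_dict.get("terms") or [])]


# Dictionary copied from the output above; note that it has no 'terms' key.
screen_dict = {
    "original_text": "Do you like spam and eggs ?",
    "normalized_text": " you like spam  eggs ?",
    "classification": {
        "category1": {"score": 0.028309470042586327},
        "category2": {"score": 0.14004242420196533},
        "category3": {"score": 0.12679287791252136},
        "review_recommended": False,
    },
    "status": {"code": 3000, "description": "OK"},
    "language": "eng",
}

# Prints an empty list, because no terms were reported for this text.
print(detected_terms(screen_dict))
```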

What am I doing wrong and how should I do it properly?

Also (not sure if relevant), calling

client.list_management_term_lists.refresh_index_method(list_id=123, language="eng")

raises an APIErrorException: Operation returned an invalid status code 'Not Found'.
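To capture that failure without stopping the rest of the script, the call can be wrapped in a small generic helper. This is a sketch only: call_and_report is a hypothetical name, and fake_refresh is a stand-in that imitates the error message so the example runs without a subscription key. In practice the first argument would be client.list_management_term_lists.refresh_index_method.

```python
def call_and_report(operation, **kwargs):
    """Call operation(**kwargs) and return (True, result) on success,
    or (False, "<ExceptionName>: <message>") if it raises."""
    try:
        return True, operation(**kwargs)
    except Exception as exc:  # broad on purpose: this only reports the failure
        return False, f"{type(exc).__name__}: {exc}"


def fake_refresh(list_id, language):
    """Hypothetical stand-in for the SDK call; always fails like the real one did."""
    raise RuntimeError("Operation returned an invalid status code 'Not Found'")


ok, info = call_and_report(fake_refresh, list_id=123, language="eng")
print(ok, info)
```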


