Tuesday 2 May 2023

Django SearchRank not taking full text search operators into account

I'm trying to add a new endpoint that does full text search with AND, OR and NOT operators and also tolerates typos using TrigramSimilarity.
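TrigramSimilarity compiles to pg_trgm's `similarity()` function. To illustrate what that score means, here is a minimal pure-Python approximation based on pg_trgm's documented behaviour: each word is lower-cased and padded with two spaces before and one space after, the trigrams are collected into a set, and similarity is the number of shared trigrams divided by the number of distinct trigrams overall. It skips locale and multibyte details. `trigrams` and `similarity` are hypothetical helper names, not part of Django or PostgreSQL.

```python
import re


def trigrams(text: str) -> set[str]:
    """Approximate pg_trgm: lower-case, split into alphanumeric words,
    pad each word with two leading spaces and one trailing space."""
    result: set[str] = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = f"  {word} "
        # Every 3-character window of the padded word is one trigram.
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared distinct trigrams divided by all distinct trigrams (Jaccard)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(sorted(trigrams("cat")))         # ['  c', ' ca', 'at ', 'cat']
print(similarity("graph", "grahp"))    # 0.3333333333333333
```

Because a transposed typo such as "grahp" still shares some trigrams with "graph", the score degrades gradually instead of dropping to zero; pg_trgm's `%` operator then applies a cut-off (0.3 by default).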

I came across the question "Combine trigram with ranked searching in django 1.10" and tried to use that approach, but SearchRank is not behaving as I'd expect and I'm confused about how it works.

When my code looks like the basic implementation of full text search, the negative filter works fine:

    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")

        qs = Project.objects.annotate(
            search=vector,
        ).filter(
            search=query,
        )

        return Response({
            "results": qs.values()
        })

[Screenshot: the returned documents]
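Conceptually, `filter(search=query)` keeps only the rows whose tsvector matches the tsquery (PostgreSQL's `@@` operator), and `websearch_to_tsquery` turns `-graph` into a negated term. The toy below mimics that boolean match in plain Python so the effect of `-` is easy to see. It is a simplification with stated assumptions: no stemming, no stop words, no quoted phrases, and `OR` binds more loosely than the implicit AND. `tokenize`, `parse_websearch` and `matches` are hypothetical helpers, not Django or PostgreSQL APIs.

```python
import re


def tokenize(document: str) -> set[str]:
    """Lower-case alphanumeric words (no stemming, unlike to_tsvector)."""
    return set(re.findall(r"[^\W_]+", document.lower()))


def parse_websearch(query: str) -> list[list[tuple[str, bool]]]:
    """Toy parser: a list of OR-groups, each a list of (term, negated) pairs joined by AND."""
    groups: list[list[tuple[str, bool]]] = [[]]
    for token in query.split():
        if token.lower() == "or":
            groups.append([])  # start a new alternative
            continue
        negated = token.startswith("-")
        term = token.lstrip("-").lower()
        if term:
            groups[-1].append((term, negated))
    return [group for group in groups if group]


def matches(document: str, query: str) -> bool:
    """True if at least one OR-group has every term satisfied.
    A plain term must be present; a negated term must be absent."""
    words = tokenize(document)
    return any(
        all((term in words) != negated for term, negated in group)
        for group in parse_websearch(query)
    )


print(matches("APT29 Attack Graph", "apt29 -graph"))  # False
print(matches("APT29 Campaign", "apt29 -graph"))      # True
print(matches("Attack Graph", "apt29 OR graph"))      # True
```

Under this model "APT29 Attack Graph" is rejected by `apt29 -graph`, which is consistent with the filter-based snippet behaving as expected.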

But I need to implement this with SearchRank so that I can later apply some logic to the rank score and the similarity score.

This is what my code looks like when annotating with the rank instead of the tsvector:

    @action(detail=False, methods=["get"])
    def search(self, request, *args, **kwargs):
        search_query = request.query_params.get("search")
        vector = SearchVector("name", weight="A")
        query = SearchQuery(search_query, search_type="websearch")
        rank = SearchRank(vector, query, cover_density=True)

        qs = Project.objects.annotate(
            rank=rank,
        ).order_by("-rank")
        return Response({
            "results": qs.values()
        })



And the response looks like this: [Screenshot: the documents I got back]

The rank given to the document named "APT29 Attack Graph" is 1. I'd expect the - operator to rank it lower, ideally 0.

Does SearchRank not take search operators into consideration?

This is the PostgreSQL query plan for the queryset:

    Sort  (cost=37.78..37.93 rows=62 width=655)
      Sort Key: (ts_rank_cd(setweight(to_tsvector(COALESCE(name, ''::text)), 'A'::"char"), websearch_to_tsquery('apt29 -graph'::text))) DESC
      ->  Seq Scan on firedrill_project  (cost=0.00..35.93 rows=62 width=655)
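The plan above contains only a Sort over a Seq Scan and no Filter step, so annotating with SearchRank and ordering by it keeps every row of `firedrill_project` and only changes their order. The toy model below contrasts that with filtering first. Its `naive_rank` score is made up for illustration: it counts required terms and ignores excluded ones, which happens to reproduce the non-zero rank reported above, but it is not how `ts_rank_cd` computes scores. The project names other than "APT29 Attack Graph" are hypothetical.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[^\W_]+", text.lower()))


def parse(query: str) -> tuple[set[str], set[str]]:
    """Toy: split 'apt29 -graph' into required and excluded terms (AND only)."""
    tokens = query.lower().split()
    required = {t for t in tokens if not t.startswith("-")}
    excluded = {t[1:] for t in tokens if t.startswith("-") and len(t) > 1}
    return required, excluded


def naive_rank(text: str, required: set[str]) -> int:
    # Hypothetical score: counts required terms only, so excluded terms never lower it.
    return len(words(text) & required)


def is_match(text: str, required: set[str], excluded: set[str]) -> bool:
    w = words(text)
    return required <= w and not (excluded & w)


def rank_only(names: list[str], query: str) -> list[str]:
    required, _ = parse(query)
    # Like annotate(rank=...).order_by("-rank"): every row is kept, only the order changes.
    return sorted(names, key=lambda n: naive_rank(n, required), reverse=True)


def filter_then_rank(names: list[str], query: str) -> list[str]:
    required, excluded = parse(query)
    # Like keeping filter(search=query): rows failing the boolean match are dropped first.
    kept = [n for n in names if is_match(n, required, excluded)]
    return sorted(kept, key=lambda n: naive_rank(n, required), reverse=True)


projects = ["APT29 Attack Graph", "APT29 Emulation", "Ransomware Drill"]
print(rank_only(projects, "apt29 -graph"))
# ['APT29 Attack Graph', 'APT29 Emulation', 'Ransomware Drill']
print(filter_then_rank(projects, "apt29 -graph"))
# ['APT29 Emulation']
```

In Django terms, the second function roughly corresponds to keeping the `search=query` filter from the first snippet and adding the `rank` annotation on top of it, rather than replacing the filter with the annotation.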

Also, if there is a better way to do this kind of search without introducing new dependencies (Elasticsearch, Haystack, etc.), please point me to it.

I tried different search operators and looked for alternative ways to do this, but I've had no success so far.



