Sunday, 18 October 2020

Sending large dictionary via API call breaks development server

I am running a Django app with a PostgreSQL database, and I am trying to send a very large dictionary (consisting of time-series data) to the database.

My goal is to write the data into the DB as fast as possible. I am using the requests library to send the data via an API call (built with Django REST Framework).

My API view is simple:

from rest_framework import status
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(["POST"])
def CreateDummy(request):
    # request.data['time_series'] is a dict, so iterate its items;
    # this issues one INSERT per time series.
    for elem, ts in request.data["time_series"].items():
        TimeSeries.objects.create(data_json=ts)

    msg = {"detail": "Created successfully"}
    return Response(msg, status=status.HTTP_201_CREATED)
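
A batched variant with bulk_create should at least cut this down to a handful of INSERT statements instead of one per time series. A minimal sketch of what I mean (CreateDummyBulk is a made-up name and batch_size=500 is an unbenchmarked guess):

@api_view(["POST"])
def CreateDummyBulk(request):  # hypothetical variant of CreateDummy above
    # Build unsaved instances, then let Django batch the INSERTs.
    # Note: this still holds all instances in memory at once.
    objs = [
        TimeSeries(data_json=ts)
        for ts in request.data["time_series"].values()
    ]
    TimeSeries.objects.bulk_create(objs, batch_size=500)  # batch_size is a guess
    msg = {"detail": "Created successfully"}
    return Response(msg, status=status.HTTP_201_CREATED)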

request.data['time_series'] is a huge dictionary structured like this:

{"Building1": {1: 123, 2: 345, 4: 567, ..., 31536000: 2345}, ..., "Building30": {...}}

That means the dict has 30 keys, and each value is itself a dict with 31,536,000 entries (one value per second for a 365-day year), i.e. roughly 946 million values in total.
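
For a sense of scale, a rough back-of-the-envelope estimate of the serialized size (my own assumption here: every value is a small integer, as in the example above):

import json

# Serialize 1,000 fake entries to estimate the JSON size per entry.
sample = {i: 123 for i in range(1, 1001)}
bytes_per_entry = len(json.dumps(sample)) / 1000

# 30 buildings x 31,536,000 entries each.
total_bytes = bytes_per_entry * 31_536_000 * 30
print(f"~{total_bytes / 1e9:.1f} GB of JSON")

If that estimate is roughly right, a single request body on the order of 10 GB would be a problem all by itself.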

My API request looks like this (where data is the dictionary described above):

payload = {
    "time_series": data,
}

requests.request(
    "post", url=endpoint, json=payload
)

The code saves the time-series data to a jsonb field in the backend. That works if I only loop over the first 4 elements of the dictionary: I can get that data in within about a minute. But when I loop over the whole dict, my development server shuts down. I guess the memory is insufficient. I get a requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response')). Is the whole dict loaded into memory before the loop starts iterating? I doubt it, because I read that in Python 3 looping with .items() returns a lightweight view rather than a copy and is the preferred way to do this.

Is there a better way to deal with massive dicts in Django/Python? Should I loop through half of it and then through the other half, or send it in smaller chunks (see the sketch below)? Or is there a faster way? Maybe using pandas? Or maybe sending the data differently? I guess I am looking for the most performant way to do this.
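
To illustrate the splitting idea, this is roughly what I mean on the client side, assuming the endpoint can accept one building at a time:

import requests

# Send one building per request instead of the whole dict at once,
# so each request body stays small enough to parse.
for building, ts in data.items():
    payload = {"time_series": {building: ts}}
    requests.post(endpoint, json=payload)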

Happy to provide more code if needed.

Any help, hints, or guides are very much appreciated! Thanks in advance.

EDIT2: I think it is not my RAM usage or the size of the dict; I still have 5 GiB of RAM left when the server shuts down. ~~And the size of the dict is 1176 bytes~~ The dict is much larger, see comments.

EDIT3: I can't even print the huge dict; the server shuts down then as well.

EDIT4: When I split the data up and don't send it all at once, the server can handle it. But when I try to query the data back, the server breaks again. It breaks on my production server (nginx + AWS RDS setup) and on my local dev server. I am pretty sure it's because Django can't handle queries that big with my current setup. But how could I solve this?

EDIT5: So what I am looking for is a two-part solution: one for the creation of the data and one for querying it. The creation of the data I described above. But even if I get all that data into the database, I will still have problems getting it out again.

I got the data in by creating each time series on its own rather than all together. So let's assume I have this huge dataset in my DB and I try to query it back. All time-series objects belong to a network, so I tried this:


class TimeSeriesByTypeAndCreationMethod(ListAPIView):
    """Query time series in a specific network."""

    serializer_class = TimeSeriesSerializer

    def get_queryset(self):
        """Query time series by name of network, type of data,
        creation method, and source.
        """
        network = self.kwargs["name_network"]

        # filter() is lazy, so exists() issues only one cheap query
        # before the full result set is evaluated by the serializer.
        time_series = TimeSeries.objects.filter(
            network_element__network__name=network,
        )
        if not time_series.exists():
            raise NotFound()
        return time_series

But the query breaks the server just like the data creation did before. I think this is also too much data to load at once. I thought I could use raw SQL to avoid breaking the server... Or is there also a better way?
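
One direction I'm considering for the query side is streaming the rows in chunks instead of materializing the whole queryset at once. A sketch (stream_time_series is a made-up helper, and chunk_size=100 is a guess that would need tuning):

def stream_time_series(network):
    # values_list avoids building full model instances; iterator()
    # fetches rows from the DB in chunks instead of all at once.
    qs = TimeSeries.objects.filter(
        network_element__network__name=network,
    ).values_list("data_json", flat=True)
    for data_json in qs.iterator(chunk_size=100):  # chunk_size is a guess
        yield data_json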



