Friday, 6 October 2023

Encode/decode mechanism for LARGE UNSTRUCTURED JSON data

I have JSON data that could have a structure like this:



    [
        {
            "action": 4,
            "key1": {
                "key1": "key1",
                "key2": "key1",
                "key3": "key3",
                "key4": ["key4"],
                "key5": [
                    {
                        "key5": "key5"
                    }
                ]
            }
        },
        {
            "action": 2,
            "key2": [
                3,
                {
                    "key121": "key1",
                    "key21": "key1",
                    "key33": "key3",
                    "key4": ["key4"],
                    "key5": [
                        {
                            "key5": "key5"
                        }
                    ]
                },
                {
                    "key121": "key1",
                    "key2133": "key1",
                    "key33333": "key3",
                    "key41": ["key4"],
                    "key521": [
                        {
                            "key531": "key5"
                        }
                    ]
                }
            ],
            "key3": "key3",
            // .... more and more here
        }
        // .... more and more here
    ]

The size could be more than 1 MB.

The data above is just an example to show that the structure can be very dynamic. I need some way to encode/decode this kind of data. The flow is the following:

  • encode -> store in Redis
  • read from Redis -> decode

I need the best possible performance for the second part, read from Redis -> decode. The smaller the encoded payload, the less time it takes to fetch it from Redis, and then I need an efficient way to decode the encoded data back into JSON.
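For reference, here is a minimal sketch of that flow with plain JSON.stringify/JSON.parse and a Redis client (I'm assuming the node-redis v4 package and an illustrative key name here; they are not part of my actual setup):


const { createClient } = require('redis'); // assumption: node-redis v4

async function storePayload(client, key, payload) {
    // encode -> store in Redis
    await client.set(key, JSON.stringify(payload));
}

async function loadPayload(client, key) {
    // read from Redis -> decode
    const encoded = await client.get(key);
    return encoded === null ? null : JSON.parse(encoded);
}

(async () => {
    const client = createClient(); // defaults to localhost:6379
    await client.connect();
    await storePayload(client, 'payload:example', require('./data.json'));
    const decoded = await loadPayload(client, 'payload:example');
    console.log('decoded items', decoded.length);
    await client.quit();
})();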

What I have tried

  • JSON.stringify/JSON.parse - works, but I need better performance.
  • avsc (https://www.npmjs.com/package/avsc) - it is good, but since my structure is very dynamic I run into a lot of problems with it, for example arrays containing different record types (which, as far as I understand, avsc does not support).
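For illustration, this is roughly how I used avsc (a sketch only; Type.forValue infers an Avro schema from a sample value, and the dynamic parts, such as the "key2" array above that mixes a number with records having different fields, are exactly where I hit the problems described):


const avro = require('avsc');
const data = require('./data.json');

// Infer an Avro schema from a sample payload and use it for encode/decode.
const type = avro.Type.forValue(data);

const encoded = type.toBuffer(data);      // encode -> store in Redis
const decoded = type.fromBuffer(encoded); // read from Redis -> decode

console.log('encoded size', encoded.length);
console.log('round trip ok', JSON.stringify(decoded) === JSON.stringify(data));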

The process will run under load (more than 15 million times an hour, i.e. over 4,000 operations per second), so performance is the key requirement.

My benchmark results for JSON.stringify/JSON.parse over 10k iterations:

  1. JSON parse

    total 20ms

    avg-iteration 0.0016619213000001163ms

  2. JSON stringify

    total 18ms

    avg-iteration 0.0013616301999999764ms

To be clear, the key performance concern is the decode part, so the encode process can take more time as long as decoding takes less.

Some example benchmark code:


const data = require('./data.json');

// Small stopwatch helper based on process.hrtime().
class time {
    startTime = 0;

    init() {
        this.startTime = process.hrtime();
    }

    capture() {
        // process.hrtime(start) returns [seconds, nanoseconds] elapsed.
        const end = process.hrtime(this.startTime);
        this.startTime = process.hrtime();
        // Convert the elapsed time to milliseconds.
        return end[0] * 1000 + end[1] / 1e6;
    }
}

const results = [];

const mainNow = Date.now();

const t = new time();

// Each iteration measures a full JSON.stringify + JSON.parse round trip.
for (let i = 0; i < 1000; i++) {
    t.init();
    const stringified = JSON.stringify(data);
    JSON.parse(stringified);
    results.push(t.capture());
}

console.log('results', results);
console.log('total', Date.now() - mainNow);
console.log('avg', results.reduce((acc, curr) => acc + curr, 0) / results.length);
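Since the decode path is the part that matters most, a variant of the benchmark that times JSON.parse on its own (a sketch, not part of the measurements above; it uses process.hrtime.bigint() for the timestamps) could look like this:


const data = require('./data.json');

// Pre-encode once so that the loop below measures only the decode cost.
const encoded = JSON.stringify(data);

const iterations = 10000;
const start = process.hrtime.bigint();

for (let i = 0; i < iterations; i++) {
    JSON.parse(encoded);
}

const totalMs = Number(process.hrtime.bigint() - start) / 1e6;

console.log('total', totalMs.toFixed(2), 'ms');
console.log('avg-iteration', (totalMs / iterations).toFixed(6), 'ms');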




