Sunday 6 December 2020

Batch convert pandas dataframe to protobuf

I have a dataframe with about 1.5 million rows. I want to convert this to a protobuf.

Naive method

# generated with protoc
import my_proto

pb = my_proto.Table()
for _, row in big_table.iterrows():
    e = pb.rows.add()
    e.similarity = row["similarity"]
    e.id = row["id"]

The throughput is about 100 rows per second, so converting the whole dataframe takes a couple of hours.

Is there a way to do this in a non-incremental fashion?
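
One possible direction, offered as a sketch rather than a confirmed answer: `iterrows()` builds a pandas Series for every row, which is often the dominant cost in a loop like the one above. Pulling each column out once and zipping the resulting lists avoids that overhead, although the protobuf message is still filled one row at a time. The helper name `fill_rows` and the `Fake*` classes below are hypothetical; the fakes only stand in for the generated `my_proto` classes so the example runs on its own. The fields `similarity` and `id` are assumed from the loop in the question.

```python
from dataclasses import dataclass, field
from typing import Iterable


def fill_rows(table, similarities: Iterable[float], ids: Iterable[int]):
    """Append one row per (similarity, id) pair to table.rows."""
    for similarity, row_id in zip(similarities, ids):
        # add() on a repeated message field accepts keyword arguments
        # that initialise the new element.
        table.rows.add(similarity=similarity, id=row_id)
    return table


# Hypothetical stand-ins for the protoc-generated classes (assumption:
# a Table message with a repeated `rows` field of similarity/id pairs).
@dataclass
class FakeRow:
    similarity: float = 0.0
    id: int = 0


class FakeRows(list):
    def add(self, **kwargs):
        row = FakeRow(**kwargs)
        self.append(row)
        return row


@dataclass
class FakeTable:
    rows: FakeRows = field(default_factory=FakeRows)


demo = fill_rows(FakeTable(), [0.9, 0.4, 0.7], [1, 2, 3])

# With the objects from the question, the call would look like:
#   pb = fill_rows(my_proto.Table(),
#                  big_table["similarity"].tolist(),
#                  big_table["id"].tolist())
```

Calling `.tolist()` on each column converts NumPy scalars to plain Python numbers, which protobuf setters generally expect. How much faster this is depends on the data and the protobuf backend in use, so it is worth timing on a slice of the real table first.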
