I have a dataframe with about 1.5 million rows. I want to convert this to a protobuf.
Naive method
# generated with protoc
import my_proto

pb = my_proto.Table()
for _, row in big_table.iterrows():
    e = pb.rows.add()
    e.similarity = row["similarity"]
    e.id = row["id"]
The throughput is about 100 rows per second, which works out to roughly four hours for the full 1.5 million rows.
Is there a way to do this in a non-incremental fashion?
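One likely bottleneck is `iterrows()` itself, which builds a pandas Series for every row. A common speedup is to pull the columns out as plain Python lists once and zip over them, caching the bound `rows.add` method. The sketch below assumes a schema like the question's, with a `Table` message holding a repeated row message with `similarity` and `id` fields; since `my_proto` is the asker's generated module, a tiny stand-in with the same `.rows.add()` API is used here so the example is self-contained.

```python
# Minimal stand-ins mimicking the generated protobuf API
# (with real protoc output you would use my_proto.Table instead).
class _Row:
    __slots__ = ("similarity", "id")
    def __init__(self):
        self.similarity = 0.0
        self.id = 0

class _Rows(list):
    def add(self):
        r = _Row()
        self.append(r)
        return r

class Table:  # hypothetical stand-in for my_proto.Table
    def __init__(self):
        self.rows = _Rows()

def fill_table(pb, similarities, ids):
    """Populate pb.rows from two equal-length column sequences."""
    add = pb.rows.add  # cache the bound method outside the loop
    for sim, id_ in zip(similarities, ids):
        e = add()
        e.similarity = sim
        e.id = id_
    return pb

# With the real dataframe the call would look like:
# fill_table(pb, big_table["similarity"].tolist(), big_table["id"].tolist())
pb = fill_table(Table(), [0.5, 0.9], [10, 11])
```

Extracting the columns with `.tolist()` up front replaces ~1.5 million per-row Series constructions with a single pass over plain Python objects, which in practice tends to be an order of magnitude or more faster than `iterrows()`; the protobuf field assignments themselves remain per-row either way.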
from Batch convert pandas dataframe to protobuf