I have various "unknown" field-separated files that are being uploaded by users (I have zero control over, or even knowledge of, what they will be, other than that their extensions will end in "v"), and I would like to know whether there are existing libraries (hopefully in Python) that infer the following information about an unknown field-separated file:
- What line number the header is on.
- Whether there is a header or not.
- What the separator is.
- Whether any rows are skipped after the header.
In the above example, the header would be on line 2, and the data would start on line 4 (the separator here is a tab, but that isn't visible in the grid above).
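For two of the four items (the separator, and whether there is a header at all), Python's standard library already has a heuristic: `csv.Sniffer`. A minimal sketch is below; the helper name `sniff_basic` and the candidate delimiter set `",\t;|"` are my own assumptions. Note that `Sniffer.has_header()` assumes the header, if any, is the very first line, so it does not answer "which line is the header on" or "are rows skipped after it".

```python
import csv


def sniff_basic(sample: str, delimiters: str = ",\t;|"):
    """Return (delimiter, has_header) using the standard-library csv.Sniffer.

    `delimiters` restricts which characters the sniffer may pick; the default
    set here is an assumption, not something the csv module prescribes.
    """
    sniffer = csv.Sniffer()
    # sniff() raises csv.Error if it cannot settle on a delimiter.
    dialect = sniffer.sniff(sample, delimiters=delimiters)
    # has_header() compares the first row against the rows below it
    # (numeric vs. non-numeric columns, consistent string lengths).
    return dialect.delimiter, sniffer.has_header(sample)


sample = "name\tage\tcity\nalice\t30\tNYC\nbob\t25\tLA\n"
print(sniff_basic(sample))  # ('\t', True)
```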
Are there any open-source libraries (ML/AI?) that try to infer file header information from the first ~100 lines of data or so? Here's one approach I found via a Google search, though it doesn't name any software packages: https://www.computer.org/csdl/proceedings/hpcc/2016/4297/00/07828554.pdf.
Update: essentially, I'm looking to see whether a library exists (in any language) that I could pass the first ~100 rows of data to and that would make an educated guess at (1) which line the header is on, (2) which line the data starts on, and (3) what the delimiter is.
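As a rough illustration of what such a guess could look like, here is a hand-rolled heuristic sketch, not an existing library. Everything in it (the name `guess_layout`, the candidate delimiters, and the rules) is a hypothetical assumption: pick the delimiter whose most common field count (greater than 1) covers the most lines, treat the most common per-row "type signature" (numeric/text/empty per field) as the data shape, take the first row with that signature as the data start, and take the first full-width row before it as the header.

```python
import csv
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical candidate set; extend as needed.
CANDIDATE_DELIMITERS = [",", "\t", ";", "|"]


def _field_class(value: str) -> str:
    """Coarse type of one field: 'empty', 'num' or 'text'."""
    if not value.strip():
        return "empty"
    try:
        float(value)
        return "num"
    except ValueError:
        return "text"


def guess_layout(lines: List[str]) -> Tuple[str, Optional[int], int]:
    """Guess (delimiter, header_line, data_start_line), 1-based line numbers.

    header_line is None when no separate header row is detected.
    """
    best = None  # (score, delimiter, parsed rows, modal field count)
    for delim in CANDIDATE_DELIMITERS:
        rows = list(csv.reader(lines, delimiter=delim))
        widths = Counter(len(r) for r in rows if len(r) > 1)
        if not widths:
            continue  # this delimiter never splits a line into columns
        width, score = widths.most_common(1)[0]
        if best is None or score > best[0]:
            best = (score, delim, rows, width)
    if best is None:
        raise ValueError("no candidate delimiter splits the sample into columns")
    _, delim, rows, width = best

    # Type signature of every row that has the modal number of fields.
    sigs = {i: tuple(_field_class(f) for f in r)
            for i, r in enumerate(rows) if len(r) == width}
    data_sig = Counter(sigs.values()).most_common(1)[0][0]
    data_start = min(i for i, s in sigs.items() if s == data_sig)

    # Header: first full-width row that appears before the data block.
    before = [i for i in sigs if i < data_start]
    header = min(before) + 1 if before else None
    return delim, header, data_start + 1


sample = ("Exported by SomeTool v1.2\n"
          "id\tname\tweight\n"
          "--\t----\t------\n"
          "1\tapple\t0.5\n"
          "2\tpear\t0.7\n"
          "3\tplum\t0.2\n")
print(guess_layout(sample.splitlines()))  # ('\t', 2, 4)
```

The made-up sample mirrors the layout described above (header on line 2, data from line 4). An obvious weakness: if every column is text, the header and data rows share the same signature, so this sketch reports no header.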
