I have a large amount of HTML clipboard data from Excel, about 250MB (though it contains a lot of formatting, so the data that actually gets pasted in is much, much smaller than that).
Currently I am using DOMParser, which is just one line of code with everything happening behind the scenes:
const parser = new DOMParser();
const doc3 = parser.parseFromString(htmlString, "text/html");
However, it takes ~18s to parse this, and during that time the page blocks entirely until it finishes. If the work is offloaded to a web worker, the user instead gets an action with no progress indication that just "waits" ~18s until something eventually happens, which I would argue is almost the same as freezing, even though yes, the user can technically still interact with the page.
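To illustrate, a rough sketch of the worker hand-off I mean (parse-worker.js is a hypothetical worker file; note that DOMParser itself isn't exposed inside workers, so the worker would need its own string-based parser). The point is the main thread only hears back once, with no progress in between:

const worker = new Worker("parse-worker.js"); // hypothetical worker script
worker.postMessage(htmlString);               // hand the ~250MB string to the worker
worker.onmessage = (e) => {
  // fires only after the whole parse is done -- no intermediate progress
  console.log("parsed rows:", e.data.rowCount);
};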
Is there an alternative way to parse a large HTML/XML string, perhaps something that doesn't load everything at once and so can stay responsive? What might be a good solution for this? I suspect something like https://github.com/isaacs/sax-js might be in line with that, but I'm not really sure.
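For what it's worth, here is a rough sketch of the chunked approach I'm imagining with sax-js (non-strict mode; the td handler, chunk size, and cell structure are just placeholders, and I haven't verified it copes with Excel's clipboard HTML):

import sax from "sax"; // sax-js, bundled for the browser

async function parseInChunks(htmlString, chunkSize = 1 << 20) {
  const parser = sax.parser(false, { lowercase: true }); // non-strict for HTML-ish input
  const cells = [];
  parser.onopentag = (node) => {
    if (node.name === "td") cells.push({ attrs: node.attributes, text: "" });
  };
  parser.ontext = (text) => {
    if (cells.length) cells[cells.length - 1].text += text;
  };
  parser.onerror = () => parser.resume(); // skip markup that isn't well-formed XML

  for (let i = 0; i < htmlString.length; i += chunkSize) {
    parser.write(htmlString.slice(i, i + chunkSize));
    // yield to the event loop between chunks so the page stays responsive
    await new Promise((r) => setTimeout(r, 0));
    console.log(`parsed ${Math.min(i + chunkSize, htmlString.length)} of ${htmlString.length} chars`);
  }
  parser.close();
  return cells;
}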
Update: here is a sample Excel file: https://drive.google.com/file/d/1GIK7q_aU5tLuDNBVtlsDput8Oo1Ocz01/view?usp=sharing. You can download the file, open it in Excel, press Cmd-A (select all) and Cmd-C (copy), and the data will be placed on your clipboard. For me the text/html clipboard format comes out to about 249MB.
Yes, it is also available as text/plain (which we use as a backup), but the point of grabbing the text/html is to capture the formatting, both data formatting (for example, numberType=Percent, 3 decimals) and stylistic formatting (for example, background color=red). Please use that as a test for any sample code. Here is the actual text/html content (as ASCII) as it appears in the clipboard: https://drive.google.com/file/d/1ZUL2A4Rlk3KPqO4vSSEEGBWuGXj7j5Vh/view?usp=sharing
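For reference, the way I grab both clipboard formats on paste is roughly the following (standard clipboardData API; handleHtml and handlePlain are placeholder functions):

document.addEventListener("paste", (event) => {
  event.preventDefault();
  const html = event.clipboardData.getData("text/html");   // ~249MB for this sheet, includes formatting
  const plain = event.clipboardData.getData("text/plain"); // backup, values only
  if (html) {
    handleHtml(html);   // placeholder: parse values + formatting
  } else {
    handlePlain(plain); // placeholder: values only
  }
});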