I'm trying to load a very large, complex PDF that contains tables and figures. It's roughly 600 pages. When I use the fast strategy with the Unstructured API in LangChain-JS with Next.js, it seems to work but misses some of the data I need. However, when I use the hi_res strategy, it gives me a timeout error. I've tried setting the timeout option to various values to no avail. I'm perfectly OK with the process taking as much time as it needs. Any help would be very much appreciated.
ERROR:
error TypeError: fetch failed
    at Object.fetch (node:internal/deps/undici/undici:11576:11)
    at UnstructuredLoader._partition (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/unstructured.js:139:26)
    at UnstructuredLoader.load (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/unstructured.js:154:26)
    at UnstructuredDirectoryLoader.load (e:/Web-Development/Developing/Nextjs/projects/gpt4-pdf/node_modules/langchain/dist/document_loaders/fs/directory.js:80:40)
    at run (e:\Web-Development\Developing\Nextjs\projects\gpt4-pdf\scripts\ingest.ts:48:21)
    at <anonymous> (e:\Web-Development\Developing\Nextjs\projects\gpt4-pdf\scripts\ingest.ts:78:3) {
  cause: HeadersTimeoutError: Headers Timeout Error
      at Timeout.onParserTimeout [as callback] (node:internal/deps/undici/undici:9748:32)
      at Timeout.onTimeout [as _onTimeout] (node:internal/deps/undici/undici:8047:17)
      at listOnTimeout (node:internal/timers:573:17)
      at process.processTimers (node:internal/timers:514:7) {
    code: 'UND_ERR_HEADERS_TIMEOUT'
  }
}
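From what I can tell, the UND_ERR_HEADERS_TIMEOUT comes from undici (the fetch implementation built into Node) giving up while waiting for response headers, not from the loader's own timeout option, which may be why raising that value changes nothing. A minimal sketch of the workaround I'm considering, assuming the undici package's setGlobalDispatcher is actually honoured by the built-in fetch that LangChain uses:

// Sketch only: raise (or disable) undici's header/body timeouts globally
// before the loader runs. Assumes `npm install undici` and that Node's
// built-in fetch reads the same global dispatcher.
import { Agent, setGlobalDispatcher } from "undici";

setGlobalDispatcher(
  new Agent({
    headersTimeout: 0, // 0 disables the headers timeout entirely
    bodyTimeout: 0,    // 0 disables the body timeout as well
  })
);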
The code I'm using where the error occurs:
const options = {
  apiKey: process.env.UNSTRUCTURED_API_KEY,
  strategy: "hi_res",
  timeout: 10000, // tried various values from 10000 to 10000000
};

const unstructuredLoader = new UnstructuredDirectoryLoader(
  filePath,
  options
);

const rawDocs = await unstructuredLoader.load();
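Another option I'm considering is splitting the 600-page PDF into smaller files before running the loader, so each hi_res request finishes well within the timeout. A rough sketch with pdf-lib (the chunk size and output naming are just placeholders, not anything I've settled on):

// Sketch only: split a large PDF into smaller files with pdf-lib so each
// piece can be sent to the Unstructured API in a separate request.
import { PDFDocument } from "pdf-lib";
import { readFile, writeFile } from "node:fs/promises";

async function splitPdf(inputPath: string, pagesPerChunk = 50) {
  const source = await PDFDocument.load(await readFile(inputPath));
  const total = source.getPageCount();

  for (let start = 0; start < total; start += pagesPerChunk) {
    const count = Math.min(pagesPerChunk, total - start);
    const indices = Array.from({ length: count }, (_, i) => start + i);

    // Copy the page range into a fresh document and write it out.
    const chunk = await PDFDocument.create();
    const pages = await chunk.copyPages(source, indices);
    pages.forEach((page) => chunk.addPage(page));
    await writeFile(`${inputPath}.part-${start / pagesPerChunk}.pdf`, await chunk.save());
  }
}

The resulting part files could then be pointed at the same UnstructuredDirectoryLoader and the returned documents concatenated.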