I am exporting huge PDF files, some of them over 1 GB, and I have already reduced thread_count to 4. What else do I need to do to avoid the timeout? Thanks.
```
ERROR contentpump.DatabaseContentReader: RuntimeException reading /pdf/docIns/docIns-222581.pdf :com.marklogic.xcc.exceptions.StreamingResultException: RequestException instantiating ResultItem 301805: Time limit exceeded
22/01/24 17:48:09 INFO contentpump.DatabaseContentReader: host name: xxx.us-central.compute.internal
22/01/24 17:48:09 INFO contentpump.DatabaseContentReader: Retrying connect
22/01/24 17:53:16 INFO contentpump.LocalJobRunner: completed 3%
```
Thread count won't make a difference, because each document can only be read by one thread at a time. The limiting factor is network transfer time, the time to read the file off MarkLogic's disk into available memory, or some combination of the two.
You could try fetching the document over REST (the /v1/documents endpoint) and see whether that is quicker. You could also use `xdmp:zip-create` to compress it within MarkLogic and see whether downloading the compressed file is fast enough. Alternatively, consider using MarkLogic to store a URL alongside the searchable (meta)data, and fetch the document itself from somewhere else (a CDN or S3, for example).
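To try the REST route on a single large file, here is a minimal Python sketch (standard library only) that streams one document from GET /v1/documents straight to disk in 1 MB chunks, so a 1 GB PDF is never held in client memory. The `download_document` helper, the host, port, and credentials are hypothetical; it assumes a REST-enabled app server (for example the one on port 8000) that uses digest authentication, so adjust if yours is configured differently.

```python
import shutil
import urllib.parse
import urllib.request


def download_document(base_url, doc_uri, dest_path, user=None, password=None,
                      chunk_size=1024 * 1024, timeout=600):
    """Stream one document from the MarkLogic REST API to a local file.

    Hypothetical helper: copies the response body to disk chunk by chunk
    instead of reading the whole document into memory.
    """
    # GET /v1/documents?uri=... returns the document content.
    url = (base_url.rstrip("/") + "/v1/documents?"
           + urllib.parse.urlencode({"uri": doc_uri}))

    handlers = []
    if user is not None:
        # Assumption: the app server uses digest auth; this handler only
        # kicks in when the server answers 401 with a Digest challenge.
        password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, base_url, user, password)
        handlers.append(urllib.request.HTTPDigestAuthHandler(password_mgr))
    opener = urllib.request.build_opener(*handlers)

    # `timeout` is a per-operation socket timeout in seconds on the client
    # side, not a cap on the total transfer time.
    with opener.open(url, timeout=timeout) as response, \
            open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
    return dest_path


if __name__ == "__main__":
    # Hypothetical host and credentials; replace with your own.
    download_document("http://localhost:8000",
                      "/pdf/docIns/docIns-222581.pdf",
                      "docIns-222581.pdf",
                      user="admin", password="admin")
```

Timing this against the export of the same URI gives a quick, direct comparison of the two paths.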