Thursday, 16 May 2019

fs.createWriteStream doesn't use back-pressure when writing data to a file, causing high memory usage

Problem

I'm trying to scan a drive directory (recursively walk all the paths) and write all the paths to a file (as it's finding them) using fs.createWriteStream in order to keep the memory usage low, but it doesn't work, the memory usage reaches 2GB during the scan.

Expected

I was expecting fs.createWriteStream to automatically handle memory/disk usage at all times, keeping memory usage at a minimum with back-pressure.

Code

const fs = require('fs')
const walkdir = require('walkdir')

let dir = 'C:/'

let options = {
  "max_depth": 0,
  "track_inodes": true,
  "return_object": false,
  "no_return": true,
}

const wstream = fs.createWriteStream("C:/Users/USERNAME/Desktop/paths.txt")

let walker = walkdir(dir, options)

walker.on('path', (path) => {
  wstream.write(path + '\n')
})

walker.on('end', (path) => {
  wstream.end()
})

Is it because I'm not using .pipe()? I tried creating a new Stream.Readable({read{}}) and then inside the .on('path' emitter pushing paths into it with readable.push(path) but that didn't really work.

UPDATE:

Method 2:

I tried the proposed in the answers drain method but it doesn't help much, it does reduce memory usage to 500mb (which is still too much for a stream) but it slows down the code significantly (from seconds to minutes)

Method 3:

I also tried using readdirp, it uses less memory and is faster but I don't know how to pause it and use the drain method there:

const readdirp = require('readdirp')

let dir = 'C:/'
const wstream = fs.createWriteStream("C:/Users/USERNAME/Desktop/paths.txt")

readdirp(dir, {alwaysStat: false, type: 'files_directories'})
  .on('data', (entry) => {
    wstream.write(`${entry.fullPath}\n`)
  })

Method 4:

I also tried doing this operation with a custom recursive walker, and even though it uses only 30mb of memory, it is like 10 times slower than the other ones and it is synchronous which is undesirable:

const fs = require('fs')
const path = require('path')

let dir = 'C:/'
function customRecursiveWalker(dir) {
  fs.readdirSync(dir).forEach(file => {
    let fullPath = path.join(dir, file)
    // Folders
    if (fs.lstatSync(fullPath).isDirectory()) {
      fs.appendFileSync("C:/Users/USERNAME/Desktop/paths.txt", `${fullPath}\n}`)
      customRecursiveWalker(fullPath)
    } 
    // Files
    else {
      fs.appendFileSync("C:/Users/USERNAME/Desktop/paths.txt", `${fullPath}\n}`)
    }  
  })
}
customRecursiveWalker(dir)



from fs.createWriteStream doesn't use back-pressure when writing data to a file, causing high memory usage

No comments:

Post a Comment