I have an app which has to extract color info from a video and it does this by analyzing each frame. First I extract the frames and then load an array of their locations in memory. As you might imagine, for even a small video it can be in the thousands.
The function I use to extract each frames color info is a promise so I opted to batch an array of promises with Promise.all
With each files absolute path, I read the file with fs and then pass it along to be processed. I've done this with many stand alone images and know the process only takes about a second but suddenly it was taking almost 20min to process 1 image. I finally figured out that using promisify on fs.readFile was what caused the bottle neck. What I don't understand is why?
In the first one fs.readFile is transformed inside of the promise that's returned while in the second one fs.readFile is just used as it normally would be and I wait for resolve to be called. I don't mind using the non-promise one, I'm just curious why this would cause such a slow down?
The second I stopped using promisify the app sped back up to 1 frame / second
The slow code:
async analyzeVideo(){
await this._saveVideo();
await this._extractFrames();
await this._removeVideo();
const colorPromises = this.frameExtractor.frames.map(file => {
return new Promise(resolve => {
//transform image into data
const readFile = promisify(fs.readFile);
readFile(file)
.then(data => {
const analyzer = new ColorAnalyzer(data);
analyzer.init()
.then(colors => {
resolve(colors)
})
})
.catch((e)=> console.log(e));
})
});
const colors = await runAllQueries(colorPromises);
await this._removeFrames();
this.colors = colors;
async function runAllQueries(promises) {
const batches = _.chunk(promises, 50);
const results = [];
while (batches.length) {
const batch = batches.shift();
const result = await Promise.all(batch)
.catch(e=>console.log(e));
results.push(result)
}
return _.flatten(results);
}
}
The fast code:
async analyzeVideo(){
await this._saveVideo();
await this._extractFrames();
await this._removeVideo();
const colorPromises = this.frameExtractor.frames.map(file => {
return new Promise(resolve => {
//transform image into data
fs.readFile(file, (err, data) => {
const analyzer = new ColorAnalyzer(data);
analyzer.init()
.then(colors => {
resolve(colors)
})
});
})
});
const colors = await runAllQueries(colorPromises);
await this._removeFrames();
this.colors = colors;
async function runAllQueries(promises) {
const batches = _.chunk(promises, 50);
const results = [];
while (batches.length) {
const batch = batches.shift();
const result = await Promise.all(batch)
.catch(e=>console.log(e));
results.push(result)
}
return _.flatten(results);
}
}
You don't need to
promisifyin each loop iteration, just do it once at the top of the module.Most likely the issue is caused by Promises that are never settled. You are not handling the error correctly, so
Promise.allmay never finish if an error is thrown.Instead of logging the error in
.catch, you'll have torejecttoo, orresolveat least if you don't care about the errors. Alsoanalyzer.init()errors are not being catched (if that function can reject)Aside from that
runAllQueriesis not doing what you think it's doing. You already executed all the promises.I recommend you use p-limit instead
Furthermore, if you're executing 50 reads at a time, you should increase the value of
UV_THREADPOOL_SIZEwhich defaults to 4.At your entry point, before any require:
Or call the script as:
UV_THREADPOOL_SIZE=64 node index.js