Good approaches for queuing simultaneous NodeJS processes
I am building a simple application to download a set of XML files and
parse them into a database using the async module
(https://npmjs.org/package/node-async) for flow control. The overall flow
is as follows (a rough sketch in code follows the list):
1. Download list of datasets from API (single Request call)
2. Download metadata for each dataset to get link to XML file (async.forEach)
3. Download XML for each dataset (async.parallel)
4. Parse XML for each dataset into JSON objects (async.parallel)
5. Save each JSON object to a database (async.forEach)
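Stripped down, the structure looks something like this. This is only a
sketch: the API URL, the field names (metadataUrl, xmlUrl), and the
parseDatasetXml and saveToDb helpers are placeholders for my real code,
and I have collapsed the async.parallel groupings of steps 3 and 4 into a
simple chain for readability:

    var async = require('async');
    var request = require('request');

    // 1. Download the list of datasets (single Request call)
    request('https://api.example.com/datasets', function (err, res, body) {
      if (err) throw err;
      var datasets = JSON.parse(body);

      // 2. One parent task per dataset -- all of them start immediately
      async.forEach(datasets, function (dataset, done) {
        request(dataset.metadataUrl, function (err, res, meta) {
          if (err) return done(err);

          // 3. Download the XML for this dataset
          request(JSON.parse(meta).xmlUrl, function (err, res, xml) {
            if (err) return done(err);

            // 4. Parse the XML into a JSON object
            parseDatasetXml(xml, function (err, json) {
              if (err) return done(err);

              // 5. Save the JSON object to the database
              saveToDb(json, done);
            });
          });
        });
      }, function (err) {
        if (err) console.error(err);
      });
    });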
In effect, for each dataset there is a parent process (2) which sets off a
series of asynchronous child processes (3, 4, 5). The challenge I am
facing is that, because so many parent processes fire before all of the
children of a particular process are complete, child processes pile up in
the event loop, and it takes a long time for all of the child processes of
a particular parent to resolve and allow garbage collection to clean
everything up. The result is that, even though the program doesn't appear
to have any memory leaks, memory usage is still too high, ultimately
crashing the program.
One solution that worked was to make some of the child processes
synchronous so that they can be grouped together in the event loop.
However, I have also seen an alternative solution discussed here:
https://groups.google.com/forum/#!topic/nodejs/Xp4htMTfvYY, which pushes
parent processes into a queue and only allows a certain number to run at
once. My question, then, is: does anyone know of a more robust module for
handling this type of queueing, or any other viable alternative for
handling this kind of flow control? I have been searching but so far no
luck.
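If I understand that thread correctly, the idea maps onto async.queue
(part of the async module I am already using), which takes a worker
function and a concurrency limit. A minimal sketch of what I have in mind,
where processDataset is a hypothetical wrapper around steps 3-5 for a
single dataset and datasets is the array from step 1:

    var async = require('async');

    // processDataset is assumed to wrap steps 3-5 (download XML, parse,
    // save) for one dataset and call done() once everything has finished.
    var queue = async.queue(function (dataset, done) {
      processDataset(dataset, done);
    }, 5); // run at most 5 parent tasks at a time

    queue.drain = function () {
      console.log('All datasets processed');
    };

    datasets.forEach(function (dataset) {
      queue.push(dataset, function (err) {
        if (err) console.error('Dataset failed:', err);
      });
    });

Bounding the concurrency this way should let each parent's children
resolve and be garbage-collected before the next batch of parents fires,
but I would still be interested in a more battle-tested module if one
exists.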
Thanks.