My Netty-based app:
1. ingests hundreds of thousands of messages per second from a single TCP connection
2. processes those messages in a couple of inbound handlers
3. sends the processing results somewhere downstream
Currently, all of this runs on a single thread, since everything arrives on one TCP connection. I would like to know how I can parallelize step 2, the processing. The difficulty is that messages cannot just be processed in parallel willy-nilly, because there is a partial order on them. Think of it as a key(message) function: all messages for which this function returns the same result must be processed sequentially, but messages with different results may run in parallel. So I am thinking of mapping messages to threads via hash(key(message)) % threadCount.
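Outside of Netty, that mapping idea could be sketched like this (the class name and the use of plain executors are illustrative only, not part of my app). One detail worth noting: `hashCode()` can be negative, so `Math.floorMod` is safer than `%`:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class KeyedDispatch {
    private final ExecutorService[] lanes;

    public KeyedDispatch(int threadCount) {
        lanes = new ExecutorService[threadCount];
        for (int i = 0; i < threadCount; i++) {
            // one thread per lane keeps per-key ordering
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    // Messages with equal keys always land on the same single-threaded lane,
    // so they are processed sequentially; different keys may run in parallel.
    public void submit(Object key, Runnable work) {
        // Math.floorMod avoids negative indices when hashCode() is negative
        lanes[Math.floorMod(key.hashCode(), lanes.length)].execute(work);
    }
}
```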
Imagine this pipeline:
pipeline.addLast(deframer);
pipeline.addLast(new IdleStateHandler(...));
pipeline.addLast(decoder);
pipeline.addLast(bizLogicHandler1);
pipeline.addLast(bizLogicHandler2);
In the decoder, I am able to compute the result of key(message), so I would like to parallelize everything downstream of decoder. It is documented that in order to use multiple threads I can do
static final EventExecutorGroup group = new DefaultEventExecutorGroup(16);
...
pipeline.addLast(group, "bizLogicHandler1", bizLogicHandler1);
pipeline.addLast("bizLogicHandler2", bizLogicHandler2);
which I guess means bizLogicHandler1 and everything below it (in the above example that would be bizLogicHandler2) will be able to run in parallel? (Or would I have to specify group for bizLogicHandler2 as well?)
The above will, however, still run completely serially, as the documentation explains. It offers UnorderedThreadPoolEventExecutor as an alternative that maximizes parallelism at the cost of giving up ordering entirely, which does not work in my case.
Looking at the EventExecutorGroup and EventExecutor interfaces, I don't see how it would be possible to convey which messages can be processed in parallel and which must be processed sequentially.
Any ideas?
It turned out this is quite easy to do with one LocalServerChannel and as many LocalChannels as the desired degree of parallelism.
The server channel receives messages and dispatches them to one of the client channels; the other direction (from the client channels back to the server channel) works as well. I successfully parallelized an app this way, achieving much higher throughput by scaling out to more cores.
Here's a bare-bones version, with most of the error handling, logging, and business logic stripped away:
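A minimal sketch of the setup might look like the following. Names such as KeyedLocalDispatch, the LocalAddress string, and the handler-factory parameter are illustrative assumptions, not the original code. The key property it relies on is that each accepted LocalChannel is registered with a single event loop, so everything routed to the same lane is processed sequentially:

```java
import io.netty.bootstrap.Bootstrap;
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.DefaultEventLoopGroup;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.local.LocalAddress;
import io.netty.channel.local.LocalChannel;
import io.netty.channel.local.LocalServerChannel;
import java.util.function.Supplier;

public class KeyedLocalDispatch {
    private final Channel[] lanes;

    public KeyedLocalDispatch(int parallelism,
                              Supplier<ChannelHandler[]> bizHandlers) throws InterruptedException {
        LocalAddress address = new LocalAddress("keyed-dispatch"); // illustrative name
        EventLoopGroup group = new DefaultEventLoopGroup(parallelism);

        // Server side: each accepted LocalChannel runs on one event loop,
        // so messages written to the same lane keep their order.
        new ServerBootstrap()
                .group(group)
                .channel(LocalServerChannel.class)
                .childHandler(new ChannelInitializer<LocalChannel>() {
                    @Override
                    protected void initChannel(LocalChannel ch) {
                        // fresh handler instances per lane, so they need not be @Sharable
                        ch.pipeline().addLast(bizHandlers.get());
                    }
                })
                .bind(address)
                .sync();

        // Client side: one LocalChannel per lane.
        Bootstrap client = new Bootstrap()
                .group(group)
                .channel(LocalChannel.class)
                .handler(new ChannelInitializer<LocalChannel>() {
                    @Override
                    protected void initChannel(LocalChannel ch) {
                        // handlers for results flowing back from the lanes would go here
                    }
                });
        lanes = new Channel[parallelism];
        for (int i = 0; i < parallelism; i++) {
            lanes[i] = client.connect(address).sync().channel();
        }
    }

    // Called from the decoder: equal keys always map to the same lane,
    // preserving the required partial order; distinct keys may run in parallel.
    public void dispatch(Object key, Object message) {
        lanes[Math.floorMod(key.hashCode(), lanes.length)].writeAndFlush(message);
    }
}
```

The decoder would call dispatch(key(message), message) instead of forwarding the message down its own pipeline, and the business-logic handlers would be installed in the lane channels rather than in the original pipeline.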