What type of communication protocol should I use for a short-lived data stream to a web application?

I am developing a set of backend microservices that together form an LLM inference system; think of something like the OpenAI API. In the microservice that acts as the gateway (the front-end websites connect to it), I need to stream a sequence of strings for a few seconds (20 at most) until the response is finished. I am unsure which protocol, technique, or tool to use for this. Some details:

WebSockets

I have used WebSockets before, but I now prefer not to use them because sticky connections cause problems when scaling down on container orchestration systems. I am aware that the server can close the connection on its own once the transaction finishes. My remaining problem is that I cannot send the request payload together with the connection request; I can only send it immediately after the connection is established. That adds an extra round trip (roughly doubling the latency) and feels bug-prone for some reason.
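For comparison, the WebSocket flow described above (connect, send the request body once the socket is open, then receive tokens until the server closes) can be sketched in TypeScript as follows. This is an illustration under assumptions, not a recommended design: `SocketLike` and `streamOverSocket` are hypothetical names, and `SocketLike` is a minimal stand-in for the parts of the browser `WebSocket` API used here, so the flow can be exercised with a stub.

```typescript
// Hypothetical minimal subset of the browser WebSocket API, so the flow can be tested with a stub.
interface SocketLike {
  onopen: ((ev: any) => void) | null;
  onmessage: ((ev: any) => void) | null;
  onerror: ((ev: any) => void) | null;
  onclose: ((ev: any) => void) | null;
  send(data: string): void;
}

// Sends `payload` once the socket opens, forwards each message to `onToken`,
// and resolves when the server closes the connection at the end of the response.
function streamOverSocket(
  socket: SocketLike,
  payload: unknown,
  onToken: (token: string) => void,
): Promise<void> {
  return new Promise((resolve, reject) => {
    // The request body can only be sent after the handshake has completed:
    // this is the extra round trip compared with a single HTTP POST.
    socket.onopen = () => socket.send(JSON.stringify(payload));
    socket.onmessage = (ev) => onToken(String(ev.data));
    socket.onerror = () => reject(new Error("WebSocket error"));
    // The server closing the socket marks the end of the streamed response.
    socket.onclose = () => resolve();
  });
}
```

The `open` handler is the only place the payload can go out, which is exactly the ordering constraint the question describes.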

HTTP streaming

I am aware that you can stream data directly from normal HTTP endpoints. The problem is that the browser buffers the response, so it is not really streaming.
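On the client side, `fetch` exposes the response body as a `ReadableStream`, so text can be processed chunk by chunk instead of waiting for `response.text()`. A minimal TypeScript sketch of that read loop follows; `readTextStream` is a hypothetical helper name. It only covers the client: whether chunks actually arrive incrementally also depends on the server flushing them and on any proxies in between.

```typescript
// Reads a streamed HTTP response body chunk by chunk, calling `onChunk` as soon
// as each piece of text is decoded, and returns the full text at the end.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onChunk: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = "";
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // `stream: true` holds back an incomplete multi-byte UTF-8 sequence
      // until the rest of it arrives in the next chunk.
      const text = decoder.decode(value, { stream: true });
      if (text) {
        onChunk(text);
        full += text;
      }
    }
    // Flush anything the decoder is still holding.
    const tail = decoder.decode();
    if (tail) {
      onChunk(tail);
      full += tail;
    }
  } finally {
    reader.releaseLock();
  }
  return full;
}
```

In a browser it would be called along the lines of `await readTextStream(response.body!, appendToPage)`, where `appendToPage` is a hypothetical function that updates the UI.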

SSE

Right now I am using short-lived SSE connections: the client sends an HTTP POST to the endpoint, and the response comes back as text/event-stream. However, the browser still buffers the events, which I don't think it is supposed to do. Maybe there is a bug in my code.
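Because the browser's built-in `EventSource` only issues GET requests, a POST that answers with text/event-stream has to be read with `fetch` and the event framing parsed by hand. Below is a simplified sketch in TypeScript (the class name `SseDataParser` is hypothetical) that turns decoded text chunks into the `data:` payloads of complete events. It handles LF and CRLF line endings and multi-line `data:` fields, and deliberately ignores `event:`, `id:`, `retry:`, and comment lines.

```typescript
// Simplified incremental parser for the text/event-stream format:
// only `data:` fields are collected; other fields and comments are skipped.
class SseDataParser {
  private buffer = "";
  private dataLines: string[] = [];

  // Feed decoded text; returns the data of every event completed by this chunk.
  push(text: string): string[] {
    this.buffer += text;
    const events: string[] = [];
    let newline: number;
    while ((newline = this.buffer.indexOf("\n")) !== -1) {
      let line = this.buffer.slice(0, newline);
      // Strip a trailing "\r" so CRLF line endings also work.
      if (line.endsWith("\r")) line = line.slice(0, -1);
      this.buffer = this.buffer.slice(newline + 1);

      if (line === "") {
        // A blank line ends the current event; events with empty data are not dispatched.
        const data = this.dataLines.join("\n");
        this.dataLines = [];
        if (data !== "") events.push(data);
      } else if (line.startsWith("data:")) {
        // A single space after the colon is not part of the value.
        let value = line.slice(5);
        if (value.startsWith(" ")) value = value.slice(1);
        this.dataLines.push(value);
      }
      // `event:`, `id:`, `retry:` and ":" comment lines are ignored in this sketch.
    }
    return events;
  }
}
```

In a client it would sit inside the read loop over `response.body`: decode each chunk, pass the text to `parser.push(...)`, and render every returned payload as soon as it comes back.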

This does not seem like a problem hard enough to require a novel solution. What is the standard, best-practice approach here?
