Quick backstory. We've been running all our domains and subdomains through Cloudflare (proxied, not just DNS) for years. Our SaaS app runs on Heroku.
In November 2023 some customers, and only some, started reporting timeouts and slow pages. This happened on some requests but not all, and could happen even when the same page was reloaded a few times. A request to a static page might succeed 10 times and then stall on the 11th reload, for example.
The stalling was weird because it happened even before the browser made the DNS request. Here's an example with a 10-second stall. Sometimes the request times out completely, then immediately reloads and succeeds.
In Safari, the stalling showed up as 'redirects', and these also took exactly 10 seconds.

We ruled out backend problems with our app server, and all issues ceased as soon as we turned Cloudflare proxying off.
I turned to Cloudflare support and had the strangest tech support experience ever. Tickets went unanswered for up to a week. When I turned to their community support forum, my post was shut down by a moderator. Going back to Cloudflare support, I was asked to submit HAR and netlog files, including some captured from Chrome after launching it via a script to get a clean log, which I did multiple times. I was able to easily reproduce the problem on various machines and browsers. Customers in other locations around the UK also experienced the same symptoms. While Cloudflare initially told me they could see the stalling issue, they couldn't tell me what caused it. They said they could not find any clues in the netlogs.
Stranger still, even hitting Cloudflare's diagnostics URL at https://mysub.domain.com/cdn-cgi/trace showed the same random timeout issue, but only on our proxied subdomain. This URL never hits our backend servers, which completely eliminates them as a source of the problem.
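For anyone who wants to check their own proxied subdomain, here is a rough sketch of how the /cdn-cgi/trace test could be scripted. It is not what I actually ran (I used browsers and netlogs). The hostname is the placeholder from above, and the 5-second stall threshold, 20 attempts and 1-second pause are arbitrary assumptions. Each attempt opens a fresh connection, so the timing includes DNS, TCP and TLS, and a script like this may not reproduce a stall that only occurs inside the browser.

```python
import socket
import time
import urllib.error
import urllib.request

# Placeholder hostname from the post; substitute your own proxied subdomain.
TRACE_URL = "https://mysub.domain.com/cdn-cgi/trace"
# Assumption: anything slower than this counts as a "stall".
STALL_THRESHOLD_S = 5.0


def time_request(url, timeout=30.0):
    """Fetch url once and return (elapsed_seconds, outcome_string)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            outcome = f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        outcome = f"HTTP {exc.code}"
    except (urllib.error.URLError, socket.timeout, OSError) as exc:
        # DNS failure, connection failure, or timeout.
        outcome = f"error: {exc}"
    return time.monotonic() - start, outcome


def summarize(samples, stall_threshold=STALL_THRESHOLD_S):
    """Count stalled and failed attempts in a list of (elapsed, outcome)."""
    stalled = [s for s in samples if s[0] >= stall_threshold]
    failed = [s for s in samples if s[1].startswith("error")]
    return {
        "attempts": len(samples),
        "stalled": len(stalled),
        "failed": len(failed),
        "slowest_s": max((s[0] for s in samples), default=0.0),
    }


if __name__ == "__main__":
    samples = []
    for i in range(20):  # assumption: 20 attempts, one per second
        elapsed, outcome = time_request(TRACE_URL)
        samples.append((elapsed, outcome))
        print(f"{i + 1:2d}: {elapsed:6.2f}s  {outcome}")
        time.sleep(1)
    print(summarize(samples))
```

Running the same loop against a DNS-only (unproxied) hostname gives a baseline to compare against.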
Then, a few days ago, I was suddenly unable to reproduce the issue any longer. Cloudflare said they didn't make any changes. When I asked them again if they had found the root cause they said that 'The situation seems to be that we don't have anything broken at present, and don't have any reproduction of an error.'
I'm now at a point where a widespread issue has seemingly solved itself (I have my doubts) and I'm left to risk turning on Cloudflare proxying again and hoping for the best.
The reason I am posting here is to find out if anyone has experienced a similar issue. I've found several similar issues on the Cloudflare forums, but they have a policy of closing each topic after 5 days without replies, which makes it impossible to post a solution if one is found later.
