Cloud Version - Sporadic API Failures

I have…

  • [ ] Checked the logs and have provided the logs if I found something suspicious there

I’m submitting a…

  • [ ] Regression (a behavior that stopped working in a new release)
  • [X] Bug report
  • [X] Performance issue
  • [ ] Documentation issue or request

Current behavior

API requests such as GET::https://cloud.squidex.io/api/apps/ivagaming/schemas are failing sporadically. I may be able to run 5 requests successfully, and then the 6th one times out after 30 seconds and returns a CF (Cloudflare) error. We first noticed this issue on 28/07/2020 09:32 GMT+2.

Expected behavior

Requests succeed 100% of the time.

Minimal reproduction of the problem

Perform the request GET::https://cloud.squidex.io/api/apps/ivagaming/schemas multiple times until it fails. There is no reliable way of reproducing this error.
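Since the failure is intermittent, a small loop that repeats the request and records the status and duration of each attempt may help put numbers on it. The following is only a rough sketch in Python using the standard library. The function names, the 35-second client timeout (chosen to be a bit longer than the 30 seconds after which the CF error comes back), the 5-second "slow" threshold, and the optional bearer token are all assumptions for illustration, not part of the original tests.

```python
import time
import urllib.error
import urllib.request

URL = "https://cloud.squidex.io/api/apps/ivagaming/schemas"


def fetch_status(url, timeout=35.0, token=None):
    """Return the HTTP status code, or None if no HTTP response arrived."""
    request = urllib.request.Request(url)
    if token:
        # Hypothetical: supply whatever access token your client normally uses.
        request.add_header("Authorization", f"Bearer {token}")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Error responses such as Cloudflare's 522 land here.
        return error.code
    except OSError:
        # Client-side timeout or connection failure (URLError is an OSError).
        return None


def probe(url, attempts=20, fetch=fetch_status, clock=time.monotonic, pause=0.0):
    """Issue the request repeatedly; return (attempt, status, seconds) tuples."""
    results = []
    for attempt in range(1, attempts + 1):
        started = clock()
        status = fetch(url)
        elapsed = clock() - started
        results.append((attempt, status, elapsed))
        if pause:
            time.sleep(pause)
    return results


def summarize(results, slow_after=5.0):
    """Count failed attempts and attempts that succeeded but were slow."""
    failed = sum(
        1 for _, status, _ in results if status is None or status >= 400
    )
    slow = sum(
        1
        for _, status, elapsed in results
        if status is not None and status < 400 and elapsed >= slow_after
    )
    return {"total": len(results), "failed": failed, "slow": slow}
```

Calling `summarize(probe(URL))` from each location (for example Malta and the VPN endpoint) would give comparable failure and slow-request counts. To send a token, pass something like `fetch=lambda u: fetch_status(u, token=my_token)`, where `my_token` is a placeholder for your own credential.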

Environment

  • [ ] Self hosted with docker
  • [ ] Self hosted with IIS
  • [ ] Self hosted with other version
  • [X] Cloud version

Version: Current cloud version

Browser:

  • [ ] Chrome (desktop)
  • [ ] Chrome (Android)
  • [ ] Chrome (iOS)
  • [ ] Firefox
  • [ ] Safari (desktop)
  • [ ] Safari (iOS)
  • [ ] IE
  • [ ] Edge
  • [X] Not a browser issue

Others:
I ran the tests both from Malta and through a VPN from the Netherlands (which is where our production systems are). Both behave the same way, which might rule out a CF edge issue.

Sometimes requests succeed after taking a long time to complete (e.g. 15 seconds).

Tests were made with both Postman and RestSharp.

We’re getting the same: 522 Cloudflare errors.

I have just restarted everything, hence the 522, but I still do not know what the root cause is, because no component has reported an error during the last few hours.

It looks good now, Sebastian. Do you have a reverse proxy set up in front of Kubernetes? Maybe Nginx? If so, check whether it’s hitting max workers (maybe for some reason some workers are not returning to the pool, leading to starvation).

No, I have no reverse proxy.

Ah, alright then. If you need any help with this, let me know. I know how difficult it is to debug and find the root cause of inconsistent infrastructure issues.