Timeout error - Failed to load schemas

I have…

  • [X] Checked the logs and have provided the logs if I found something suspicious there

I’m submitting a…

  • [ ] Regression (a behavior that stopped working in a new release)
  • [X] Bug report
  • [ ] Performance issue
  • [ ] Documentation issue or request

Current behavior

Once in a while, our users are getting “Failed to load schema” errors in Squidex. The response time of the web app goes up to 10+ seconds according to Azure, while normally it’s <1 second.
Sometimes the problem goes away by itself, sometimes we need to restart the app service.

Expected behavior

No errors while using Squidex.

Minimal reproduction of the problem

Not easy to reproduce; it happens randomly.

Environment

  • [X] Self hosted with docker
  • [ ] Self hosted with IIS
  • [ ] Self hosted with other version
  • [ ] Cloud version

Version: 4.2.0

Browser:

  • [X] Chrome (desktop)
  • [ ] Chrome (Android)
  • [ ] Chrome (iOS)
  • [ ] Firefox
  • [ ] Safari (desktop)
  • [ ] Safari (iOS)
  • [ ] IE
  • [ ] Edge

Others:

Log level is set to Information and up. There are a couple of warnings about Orleans timeouts:

Response did not arrive on time in 00:00:30 for message: Request S127.0.0.1:11111:323927419*cli/3d72f395@9aa42091->S127.0.0.1:11111:323927419*grn/70B001E1/f1aa11a5@c5db9f9f #261976: . Target History is: <S127.0.0.1:11111:323927419:*grn/70B001E1/f1aa11a5:@c5db9f9f>. About to break its promise.

Orleans is set to Development mode as we only have 1 node.

We’re hosting on Linux App Services in a Docker Container on Azure. Mongo is on a separate VM and we’re using Premium storage for assets.

Thanks for your support request, but I have no idea how to help. It looks like something is blocking all threads, either temporarily or permanently, but without more insight I cannot tell what it could be. It would be good to have some log analysis in place, so that you can see what happened before such a timeout issue.
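As a starting point for that kind of log analysis, here is a minimal Python sketch that pulls the Orleans timeout warnings out of a log and counts them per target. The regular expression is inferred only from the single warning quoted above, so it is an assumption about the format and not an official Orleans log schema. The function names `parse_timeout` and `count_by_target` are made up for this example.

```python
import re
from collections import Counter
from datetime import timedelta

# Pattern inferred from the one sample warning in this thread (an assumption,
# not a documented format): timeout duration, sender, target and message id.
TIMEOUT_RE = re.compile(
    r"Response did not arrive on time in (?P<h>\d+):(?P<m>\d+):(?P<s>\d+) "
    r"for message: Request (?P<sender>\S+)->(?P<target>\S+) #(?P<id>\d+)"
)


def parse_timeout(line):
    """Return the fields of an Orleans timeout warning, or None if no match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return {
        "timeout": timedelta(
            hours=int(m["h"]), minutes=int(m["m"]), seconds=int(m["s"])
        ),
        "sender": m["sender"],
        "target": m["target"],
        "message_id": int(m["id"]),
    }


def count_by_target(lines):
    """Count timeout warnings per target address across an iterable of lines."""
    counts = Counter()
    for line in lines:
        parsed = parse_timeout(line)
        if parsed is not None:
            counts[parsed["target"]] += 1
    return counts
```

Running `count_by_target(open("squidex.log"))` (the file name is hypothetical) would show whether the timeouts cluster on one grain or are spread across many, which may help narrow down what was blocked.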

In general, I also recommend setting up Docker health checks.
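A health check is essentially a command that exits with 0 when the app responds and 1 when it does not. A rough Python sketch of such a probe is below. The default URL `http://localhost/healthz` is an assumption and should be replaced with whatever endpoint your deployment actually exposes. It is also an assumption that Python is available wherever the probe runs; it may be easier to run it from outside the container as an external monitor.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Send one GET request and return (healthy, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            healthy = 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and non-2xx answers (HTTPError) land here.
        healthy = False
    return healthy, time.monotonic() - start


def main(argv):
    """Return 0 if the endpoint is healthy and 1 otherwise."""
    # Hypothetical default URL, used only when no argument is given.
    url = argv[1] if len(argv) > 1 else "http://localhost/healthz"
    ok, elapsed = probe(url)
    print(f"{'healthy' if ok else 'unhealthy'} in {elapsed:.2f}s")
    return 0 if ok else 1
```

To use it as a script, end the file with `import sys` and `sys.exit(main(sys.argv))`, then point the health check command at it. Logging the elapsed time on every run also gives a simple record of when response times start to climb towards the 10+ seconds mentioned above.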