502 Bad Gateway on some graphql queries

I have…

  • [ x ] Checked the logs and have provided the logs if I found something suspicious there

I’m submitting a…

  • [ x ] Regression (a behavior that stopped working in a new release)
  • [ ] Bug report
  • [ ] Performance issue
  • [ ] Documentation issue or request

Current behavior

I tried to run a basic query on an existing schema in the GraphQL UI and got a 502 Bad Gateway result.

Expected behavior

The query should return data records.

Minimal reproduction of the problem

I’m unsure how to reproduce it, as some of the queries work and some don’t.
Even in the sample query above, some fields don’t work, but if we remove those fields the query works. That same working query stopped working after some time. It is very hard to track what’s going on. We haven’t made any changes to the schema or the data since upgrading from version 4.0.3 to 4.3.

I took a backup of the app and restored it as a fresh copy in my local Docker, and the query seems to work there. Also, the problem app produces no error logs or anything else that makes sense.

I’d just like to check whether anyone has ever experienced this kind of 502 error in Squidex or knows any possible cause of it.

Environment

  • [ x ] Self hosted with docker

Version: 4.3

Browser:

  • [ x ] Chrome (desktop)

Do you have something in front of Squidex, like nginx or a load balancer? If yes, I would check the logs of these services. My assumption is that the response headers get too large.
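One rough way to test the "headers too large" assumption is to call the GraphQL endpoint directly, bypassing the proxy, and add up the response headers. The sketch below uses only the Python standard library; the URL, the token, and the function names are placeholders for illustration, not part of Squidex.

```python
import json
from urllib.request import Request, urlopen


def header_bytes(pairs):
    """Approximate the on-the-wire size of (name, value) header pairs."""
    # Each header is sent as "Name: value\r\n".
    return sum(len(f"{name}: {value}\r\n".encode("utf-8")) for name, value in pairs)


def measure_response_headers(url, query, token):
    """POST a GraphQL query and return (total_bytes, three_largest_headers)."""
    request = Request(
        url,  # hypothetical: the GraphQL endpoint, reached without the proxy
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical access token
        },
        method="POST",
    )
    with urlopen(request) as response:
        pairs = response.headers.items()
    # Report the three headers with the longest values.
    largest = sorted(pairs, key=lambda p: len(p[1]), reverse=True)[:3]
    return header_bytes(pairs), [(name, len(value)) for name, value in largest]
```

If the total comes close to or exceeds the buffer size configured in the proxy, the proxy is the likely source of the 502.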

Our devops says:

we run squidex in google kubernetes, we have a nginx ingress in front of squidex

He’s trying all versions from 4.0.3 upward. It seems to be working for 4.1.
I’ll let him try other versions and update the findings tomorrow.

He should check nginx first.

These are the configs I found, and I think they are more than enough?

nginx.ingress.kubernetes.io/proxy-body-size: 50m
nginx.ingress.kubernetes.io/proxy-buffer-size: 8k

Should be, I can only speak about my experience. When you search the forum for 502 Bad Gateway it was always nginx…
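Of the two annotations quoted above, proxy-body-size limits the request body sent by the client, while proxy-buffer-size sets the buffer nginx uses for the first part of the upstream response, which includes the headers. A small sketch, assuming the response header size has already been measured, can check whether it fits a given setting. The helper names are made up for illustration.

```python
def parse_nginx_size(text):
    """Convert an nginx size such as '8k' or '50m' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)  # a plain number is already in bytes


def fits_in_buffer(header_size, proxy_buffer_size):
    """Return True if header_size bytes fit within the configured buffer."""
    return header_size <= parse_nginx_size(proxy_buffer_size)


# Example: 9000 bytes of headers against the 8k annotation above.
print(fits_in_buffer(9000, "8k"))
```

With these numbers the headers do not fit in 8k but would fit in a larger buffer such as 64k.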

Thanks for the prompt reply @Sebastian :slight_smile:

To add more details to @mayvelous’ description:

  1. We manage the Squidex deployment with Helm 2 in a Google Cloud Kubernetes cluster
  2. We have an Nginx ingress in GKE that routes traffic to Squidex

We were using 4.0.2 before, and the GraphQL query shown in May’s screenshot worked very well from the beginning, so IMHO it shouldn’t be an Nginx config issue. I also tried upgrading to Squidex 4.1.3, and I can confirm that works well too.

This only happens when we upgrade to 4.2.0. I tried 4.3.0 and it has the same issue.

I also noticed that in 4.2.0 the query works if I remove a few fields. For example, if I query:

{
  queryShowContents {
    flatData {
      name
      logoImage { url }
      bannerImage {  id }
    }
  }
}

This works, but when I add more properties it starts failing again.

I looked at the logs and couldn’t find any errors. I assume this might be a migration issue? Could you advise how I can investigate further?

Thank you.

I have made a few changes to the headers over time to support the CDN, and this also affects GraphQL. I am pretty sure that the issue is somewhere there. I never had a 502 coming from Squidex.

Thanks @Sebastian, we have solved the issue by increasing the buffer to 64k.

edited:

Increasing the buffer to 64k fixed a small GraphQL query, but the larger ones are still failing. We are getting this error:

          "message": "request to https://m************.com.au/api/content/myradio-hit/graphql#f4b401c9bca0c1db7e9836bacbaf58a25fcec1a2 failed, reason: Parse Error",
          "type": "system",
          "errno": "HPE_HEADER_OVERFLOW",
          "code": "HPE_HEADER_OVERFLOW"

Could you point us to the header changes you made so we can investigate? Thank you, @Sebastian.

I added better surrogate key support: https://docs.fastly.com/en/guides/getting-started-with-surrogate-keys

But there is a setting: https://github.com/Squidex/squidex/blob/master/backend/src/Squidex/appsettings.json#L42

You can also try to disable them with the X-SurrogateKeys: 0 header.
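For a quick test, the header can be added to a single GraphQL request. The following is a minimal sketch using only the Python standard library; the URL, the token, and the function name are placeholders, and only the X-SurrogateKeys header itself comes from this thread.

```python
import json
from urllib.request import Request


def build_graphql_request(url, query, token, surrogate_keys=False):
    """Build a POST request that opts out of surrogate keys by default."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",  # hypothetical access token
    }
    if not surrogate_keys:
        # Header named in this thread; "0" asks for no surrogate keys.
        headers["X-SurrogateKeys"] = "0"
    body = json.dumps({"query": query}).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")
```

The request can then be sent with `urllib.request.urlopen(request)`. If the large queries succeed with the header set, the surrogate keys are what pushes the response headers over the limit.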


Hi @Sebastian, setting CACHING__MAXSURROGATEKEYSSIZE to 0 solves the problem, thanks!
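The variable name appears to follow the ASP.NET Core convention for environment variables, where a double underscore stands for the ":" that separates nested configuration keys, and key names are matched case-insensitively. The sketch below only illustrates that naming rule. The helper is hypothetical, and the exact key names in appsettings.json are an assumption.

```python
def env_to_config_key(name):
    """Map an environment variable name to a ':'-separated config path."""
    # "__" separates nesting levels; lowercasing is only for comparison,
    # since configuration keys are matched case-insensitively.
    return name.replace("__", ":").lower()


print(env_to_config_key("CACHING__MAXSURROGATEKEYSSIZE"))
# prints caching:maxsurrogatekeyssize
```

So the variable should override the surrogate key size setting in the caching section of the linked appsettings.json.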


Thank you both. I was pulling my hair out and bugging all the DevOps guys this whole week.
This fix just unblocked my work. Cheers :slight_smile: