Warnings about silos, Orleans, pings, connection timeouts, sockets, etc.

Hi,

Today, when I woke up and went to continue working on a blog post I created yesterday, I was greeted by a server error in my front-end app (a SquidexClient exception, because Squidex was down), and I noticed Squidex was behaving strangely.

I spent an hour reading logs and trying to understand what was going on, but eventually decided to restart my two Squidex pods, and Squidex started working again.

The Deployment and Pods themselves did not show any errors, but the Squidex logs show the following warnings. It has been like this since I first deployed Squidex roughly 19 days ago, and I’m starting to believe it has somehow exhausted some limit or something similar:

(Also, I’m seeing error messages whenever I save a blog post: I have to click Save 2-3 times before it saves successfully, with a message saying something like another client is already performing the operation. I think this could be related to the issue I experienced today.)

{
  "logLevel": "Warning",
  "message": "-Did not get ping response for ping #1096 from S10.244.0.169:11111:276688976. Reason = Original Exc Type: Orleans.Runtime.OrleansMessageRejectionException Message:Silo S10.244.0.173:11111:277454327 is rejecting message: Request S10.244.0.173:11111:277454327*stg/15/0000000f@S0000000f->S10.244.0.169:11111:276688976*stg/15/0000000f@S0000000f #10754: . Reason = Exception getting a sending socket to endpoint S10.244.0.169:11111:276688976",
  "eventId": {
    "id": 100613
  },
  "app": {
    "name": "Squidex",
    "version": "1.0.0.0",
    "sessionId": "710f4057-6d7e-4091-a025-bc2fd0e5f32e"
  },
  "timestamp": "2018-10-17T07:39:53.8207321Z",
  "category": "Orleans.Runtime.MembershipService.MembershipOracleData"
}

{
  "logLevel": "Warning",
  "message": "Noticed that silo S10.244.0.169:11111:276688976 has not updated it's IAmAliveTime table column recently. Last update was at 10/08/2018 10:23:08, now is 10/17/2018 07:39:56, no update for 8.21:16:47.9160000, which is more than 00:10:00.",
  "eventId": {
    "id": 100625
  },
  "app": {
    "name": "Squidex",
    "version": "1.0.0.0",
    "sessionId": "710f4057-6d7e-4091-a025-bc2fd0e5f32e"
  },
  "timestamp": "2018-10-17T07:39:56.2473998Z",
  "category": "Orleans.Runtime.MembershipService.MembershipOracleData"
}

{
  "logLevel": "Warning",
  "message": "Noticed that silo S10.244.0.171:11111:276688976 has not updated it's IAmAliveTime table column recently. Last update was at 10/08/2018 10:23:12, now is 10/17/2018 07:39:56, no update for 8.21:16:44.1400000, which is more than 00:10:00.",
  "eventId": {
    "id": 100625
  },
  "app": {
    "name": "Squidex",
    "version": "1.0.0.0",
    "sessionId": "710f4057-6d7e-4091-a025-bc2fd0e5f32e"
  },
  "timestamp": "2018-10-17T07:39:56.2474549Z",
  "category": "Orleans.Runtime.MembershipService.MembershipOracleData"
}

{
  "logLevel": "Warning",
  "message": "Noticed that silo S10.244.0.170:11111:276688976 has not updated it's IAmAliveTime table column recently. Last update was at 10/08/2018 10:23:07, now is 10/17/2018 07:39:56, no update for 8.21:16:48.9160000, which is more than 00:10:00.",
  "eventId": {
    "id": 100625
  },
  "app": {
    "name": "Squidex",
    "version": "1.0.0.0",
    "sessionId": "710f4057-6d7e-4091-a025-bc2fd0e5f32e"
  },
  "timestamp": "2018-10-17T07:39:56.2474747Z",
  "category": "Orleans.Runtime.MembershipService.MembershipOracleData"
}

{
  "logLevel": "Warning",
  "message": "Exception getting a sending socket to endpoint S10.244.0.171:11111:276688976",
  "eventId": {
    "id": 101021
  },
  "exception": {
    "type": "Orleans.Runtime.OrleansException",
    "message": "Could not connect to 10.244.0.171:11111: HostUnreachable",
    "stackTrace": "   at Orleans.Runtime.SocketManager.Connect(Socket s, IPEndPoint endPoint, TimeSpan connectionTimeout)\n   at Orleans.Runtime.SocketManager.SendingSocketCreator(IPEndPoint target)\n   at Orleans.Runtime.LRU`2.Get(TKey key)\n   at Orleans.Runtime.Messaging.SiloMessageSender.GetSendingSocket(Message msg, Socket& socket, SiloAddress& targetSilo, String& error)"
  },
  "app": {
    "name": "Squidex",
    "version": "1.0.0.0",
    "sessionId": "710f4057-6d7e-4091-a025-bc2fd0e5f32e"
  },
  "timestamp": "2018-10-17T07:39:56.9221413Z",
  "category": "Runtime.Messaging.SiloMessageSender/PingSender"
}

{
  "logLevel": "Warning",
  "message": "Exception getting a sending socket to endpoint S10.244.0.171:11111:276688976",
  "eventId": {
    "id": 101021
  },
  "exception": {
    "type": "Orleans.Runtime.OrleansException",
    "message": "Could not connect to 10.244.0.171:11111: HostUnreachable",
    "stackTrace": "   at Orleans.Runtime.SocketManager.Connect(Socket s, IPEndPoint endPoint, TimeSpan connectionTimeout)\n   at Orleans.Runtime.SocketManager.SendingSocketCreator(IPEndPoint target)\n   at Orleans.Runtime.LRU`2.Get(TKey key)\n   at Orleans.Runtime.Messaging.SiloMessageSender.GetSendingSocket(Message msg, Socket& socket, SiloAddress& targetSilo, String& error)"
  },
  "app": {
    "name": "Squidex",
    "version": "1.0.0.0",
    "sessionId": "710f4057-6d7e-4091-a025-bc2fd0e5f32e"
  },
  "timestamp": "2018-10-17T07:39:56.9335387Z",
  "category": "Runtime.Messaging.SiloMessageSender/SystemSender"
}

Any ideas why this might have happened and how I can fix it? I can provide more logs or do more testing if necessary.

I would migrate to the newest version. It could be a bug that I have already fixed: [SOLVED] New schema -> 404

Thanks, I killed the pods so the latest Squidex image has been pulled. I will keep an eye on the logs.
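
For anyone following along, this is roughly what I ran to force the new image to be pulled (just a sketch; the app=squidex label is specific to my manifests, and it only works because the Deployment uses a floating tag like latest with imagePullPolicy: Always):

# List the current Squidex pods (assumes they are labelled app=squidex)
kubectl get pods -l app=squidex

# Delete them; the Deployment recreates the pods and pulls the image again
kubectl delete pods -l app=squidex

# Watch the replacement pods come up
kubectl get pods -l app=squidex -w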

You can also have a look at the dashboard under http://my-squidex/orleans to check whether your cluster looks okay.

I tried the Orleans dashboard, but I cannot access it; it gives me a:

502 Bad Gateway
nginx/x.x.x

The URL that I tried, btw: https://subdomain.mycustomdomain.com/orleans

Any ideas?

No idea, can you bypass Nginx?

Not sure how I would be able to test that… I’m not the greatest at Nginx…

Depending on your setup, you can try to use the internal port. Nginx forwards all requests to port 5000 or so.
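
If you just want to peek at the dashboard, a kubectl port-forward straight to a pod is the simplest way around Nginx (a rough sketch; the app=squidex label and the internal port 5000 are assumptions about your setup):

# Find a Squidex pod (assuming the pods are labelled app=squidex)
kubectl get pods -l app=squidex

# Forward the pod's internal port 5000 to localhost
kubectl port-forward <squidex-pod-name> 5000:5000

# Then open http://localhost:5000/orleans in your browser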

Okay, don’t really have time for that right now, but will try it in the future.

I am also working on a health endpoint that can be used by monitoring tools or Kubernetes.

The health endpoint has been implemented: https://cloud-staging.squidex.io/healthz
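
You can check it quickly from the command line, for example (just a sketch; a healthy instance should answer with HTTP 200):

# Show the status line and headers of the health check response
curl -i https://cloud-staging.squidex.io/healthz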

The memory limit is configurable.

In Kubernetes I added the health endpoint as readiness and liveness probes:

      readinessProbe:        # take the pod out of the Service while the check fails
        httpGet:
          path: /healthz
          port: 80
      livenessProbe:         # restart the container if the check keeps failing
        httpGet:
          path: /healthz
          port: 80