Squidex restarts due to failed readiness probe

I have…

  • Read the following guideline: Troubleshooting and Support | Squidex. I understand that my support request might get deleted if I do not follow the guideline.
  • Used code blocks with ``` to format my code examples like JSON or logs properly.

I’m submitting a…

  • Regression (a behavior that stopped working in a new release)
  • Bug report
  • Performance issue
  • Documentation issue or request

Current behavior

Expected behavior

Minimal reproduction of the problem

Self-hosted Squidex instance on EKS, version 7.0.2, deployed with Helm

Environment

App Name:

  • Self-hosted Squidex instance on EKS, version 7.0.2, deployed with Helm

Version: 7.0.2

Browser:

  • Chrome (desktop)
  • Chrome (Android)
  • Chrome (iOS)
  • Firefox
  • Safari (desktop)
  • Safari (iOS)
  • IE
  • Edge

Others:

I am not sure what you expect from me. Restarting the pod is what liveness probes are for (it is probably not the readiness probe). The interesting thing would be to understand and analyze the root cause, perhaps with Kubernetes events or logs.
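A minimal sketch of collecting the Kubernetes events and logs mentioned above, assuming `kubectl` is installed and configured for the cluster. The `squidex` namespace and the pod name come from the `kubectl logs` command in this thread; the helper names `diagnostic_commands` and `run_all` are hypothetical. The kubectl subcommands and flags used are standard ones.

```python
import subprocess
from typing import List


def diagnostic_commands(pod: str, namespace: str = "squidex") -> List[List[str]]:
    """Build kubectl invocations that help explain why a pod keeps restarting.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    return [
        # Recent events in the namespace (probe failures, back-offs), oldest first.
        ["kubectl", "get", "events", "-n", namespace,
         "--sort-by=.metadata.creationTimestamp"],
        # Pod details: probe settings, last container state, exit code and reason.
        ["kubectl", "describe", "pod", pod, "-n", namespace],
        # Logs of the previous (crashed) container instance, not the current one.
        ["kubectl", "logs", pod, "-n", namespace, "--previous"],
    ]


def run_all(pod: str, namespace: str = "squidex") -> None:
    """Run each command and print its output; assumes kubectl is on PATH."""
    for cmd in diagnostic_commands(pod, namespace):
        print("$", " ".join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        print(result.stdout or result.stderr)
```

For example, `run_all("squidex-squidex7-66cf56d798-lrvgm")` would print the namespace events, the pod description, and the crashed container's logs in one pass. Keeping command construction separate from execution makes it easy to review the commands before running them against a production cluster.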

Hello Sebastian, thank you for the quick response. Unfortunately, we do not have pod logs; we only have the log output below, which indicates a cancellation-token error from the MongoDB driver. We also did not make any changes to Squidex before the issue started.

```
kubectl logs squidex-squidex7-66cf56d798-lrvgm -n squidex --previous
```

```
Unhandled exception. System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at MongoDB.Driver.Core.Misc.SemaphoreSlimSignalable.IsSignaled(CancellationTokenSource signalTokenSource, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Misc.SemaphoreSlimSignalable.GetCancellationTokenContext(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Misc.SemaphoreSlimSignalable.WaitSignaledAsync(TimeSpan timeout, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool.AcquireConnectionHelper.AcquireConnectionAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool.AcquireConnectionAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Servers.Server.GetChannelAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.RetryableReadContext.InitializeAsync(CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.RetryableReadContext.CreateAsync(IReadBinding binding, Boolean retryRequested, CancellationToken cancellationToken)
   at MongoDB.Driver.Core.Operations.FindOperation`1.ExecuteAsync(IReadBinding binding, CancellationToken cancellationToken)
   at MongoDB.Driver.OperationExecutor.ExecuteReadOperationAsync[TResult](IReadBinding binding, IReadOperation`1 operation, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.ExecuteReadOperationAsync[TResult](IClientSessionHandle session, IReadOperation`1 operation, ReadPreference readPreference, CancellationToken cancellationToken)
   at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
   at MongoDB.Driver.IAsyncCursorSourceExtensions.FirstOrDefaultAsync[TDocument](IAsyncCursorSource`1 source, CancellationToken cancellationToken)
   at Squidex.Infrastructure.States.MongoSnapshotStoreBase`2.ReadAsync(DomainId key, CancellationToken ct) in /src/src/Squidex.Infrastructure.MongoDb/States/MongoSnapshotStoreBase.cs:line 36
   at Squidex.Infrastructure.States.Persistence`1.ReadSnapshotAsync(CancellationToken ct) in /src/src/Squidex.Infrastructure/States/Persistence.cs:line 128
   at Squidex.Infrastructure.States.Persistence`1.ReadAsync(Int64 expectedVersion, CancellationToken ct) in /src/src/Squidex.Infrastructure/States/Persistence.cs:line 102
   at Squidex.Infrastructure.States.SimpleState`1.LoadAsync(CancellationToken ct) in /src/src/Squidex.Infrastructure/States/SimpleState.cs:line 46
   at Squidex.Infrastructure.EventSourcing.Consume.EventConsumerWorker.StartAsync(CancellationToken ct) in /src/src/Squidex.Infrastructure/EventSourcing/Consume/EventConsumerWorker.cs:line 37
   at Squidex.Hosting.BackgroundHost.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.RunAsync(IHost host, CancellationToken token)
   at Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions.Run(IHost host)
   at Squidex.Program.Main(String[] args) in /src/src/Squidex/Program.cs:line 17
```

Troubleshooting steps performed so far:

• The Squidex Kubernetes deployment fails to roll out a new release.
• Issues noticed with the readiness probe (0 sec) and the liveness probe (300 sec).
• Squidex produces no application logs, as the pods never reach the Ready state.
• By checking the Squidex pod configuration, we found that container port 80 was blocked. However, the Helm configuration and environment variables show port 80 as open.
• Rolled back the Helm chart to a known good release version, but the issue persists.
• Deleted and redeployed the Squidex application using the same Helm configuration; the issue still persists.
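The probe values reported above can be turned into a rough restart timeline. This is a minimal sketch assuming the 300 sec figure is the liveness probe's `initialDelaySeconds` and that `periodSeconds` and `failureThreshold` keep their Kubernetes defaults (10 and 3); the class and function names are hypothetical, and real kubelet timing also depends on probe jitter and `timeoutSeconds`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class LivenessProbe:
    initial_delay_seconds: int  # assumption: the "300 sec" reported above
    period_seconds: int = 10    # Kubernetes default
    failure_threshold: int = 3  # Kubernetes default


def approx_restart_window(probe: LivenessProbe) -> Tuple[int, int]:
    """Rough (earliest, latest) seconds after container start at which the kubelet
    restarts a container whose liveness probe never succeeds.

    The first probe lands roughly in [delay, delay + period); the restart follows
    the failure_threshold-th consecutive failure, (threshold - 1) periods later.
    Probe timeouts and jitter are ignored.
    """
    earliest = (probe.initial_delay_seconds
                + (probe.failure_threshold - 1) * probe.period_seconds)
    return earliest, earliest + probe.period_seconds


# Example with the assumed 300 s initial delay and default period/threshold.
print(approx_restart_window(LivenessProbe(initial_delay_seconds=300)))  # (320, 330)
```

Note that an unhandled exception like the one in the log terminates the .NET process on its own, so a Deployment's container would be restarted in that case regardless of these probe settings.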

Please format your logs properly using code blocks.

Hi Sebastian, please find the formatted log.

Somehow your DB is just super broken. It cannot even complete this easy operation.

Can you suggest a solution, or what we have to do to recover Squidex? It is not coming up.

We were making a PATCH API call to a schema that consistently timed out; after that, Squidex went into this state. Could this clue be useful?

@Sebastian, is there a way to see which operation caused the DB to break? Please suggest what we should do to recover the DB.

I think it is just a correlation.

It is just general slowness. What is the max pool size for your MongoDB connection?

Perhaps restart MongoDB and hope it solves the problem. Somehow it is not responsive anymore. The exception above comes from just a simple update by primary key on a collection with only 5-6 documents.

```
mongodb://%s?maxPoolSize=40000
```

20,000 is the max pool size in the cluster we have issues with, but this setting has been working fine for a year.
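To answer the max-pool-size question directly, the value can be read from the connection string, and multiplying it by the number of Squidex pods gives an upper bound on how many connections the deployment may open to each MongoDB server. A minimal sketch using only the Python standard library; the host name, the replica count, and the function names are hypothetical, one `MongoClient` per pod is assumed, and 100 is the MongoDB drivers' default when `maxPoolSize` is not set.

```python
from urllib.parse import parse_qs, urlsplit

DRIVER_DEFAULT_MAX_POOL_SIZE = 100  # MongoDB drivers' default when maxPoolSize is absent


def max_pool_size(connection_string: str) -> int:
    """Return maxPoolSize from a MongoDB URI, falling back to the driver default."""
    query = parse_qs(urlsplit(connection_string).query)
    for key, values in query.items():
        # MongoDB URI option names are case-insensitive.
        if key.lower() == "maxpoolsize":
            return int(values[-1])
    return DRIVER_DEFAULT_MAX_POOL_SIZE


def worst_case_connections(connection_string: str, replicas: int) -> int:
    """Upper bound on connections to each MongoDB server if every pod fills its pool
    (assumes a single MongoClient per pod)."""
    return max_pool_size(connection_string) * replicas


# The template from this thread, with a hypothetical host filled in for %s.
uri = "mongodb://%s?maxPoolSize=40000" % "mongo-0:27017"
print(worst_case_connections(uri, replicas=2))  # 80000
```

The pool only grows under load, so a large limit that "has been working fine" may simply never have been reached until requests started piling up behind a slow database.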