Cannot modify anything when clustering is enabled

I have…

  • [X] Checked the logs and have uploaded a log file and provided a link because I found something suspicious there. Please do not post the log file in the topic because very often something important is missing.

I’m submitting a…

  • [ ] Regression (a behavior that stopped working in a new release)
  • [X] Bug report
  • [ ] Performance issue
  • [ ] Documentation issue or request

Current behavior

Situation: three Squidex nodes and one MongoDB instance. One node acts as the admin and does not serve content to end users; the other two do. All of them were set to Cluster Mode = Development (so no clustering at all).

Then we set all cluster members to Cluster Mode = MongoDB. Checking /orleans shows three silos as OK, but when we try to add schemas, add fields to an existing schema, change the order of fields, or do anything else that involves a write, one of these errors is shown on screen:

Failed to create schema. Please reload.
Failed to add field. Please reload.
Failed to reorder fields. Please reload.

If we disable clustering again, we can modify things again (with the drawback that we have to restart the other two nodes for changes to take effect for end users).

Expected behavior

Be able to modify when cluster mode is set to MongoDB.

Minimal reproduction of the problem

The one described above.

Environment

Self hosted with docker running in AWS ECS

  • [X] Self hosted with docker
  • [ ] Self hosted with IIS
  • [ ] Self hosted with other version
  • [ ] Cloud version

Version: [VERSION]

5.6.0

Browser:

  • [X] Chrome (desktop)
  • [ ] Chrome (Android)
  • [ ] Chrome (iOS)
  • [ ] Firefox
  • [ ] Safari (desktop)
  • [ ] Safari (iOS)
  • [ ] IE
  • [ ] Edge

Others:
StackTrace associated:

{“logLevel”:“Error”,“message”:“An unexpected exception has occurred.”,“timestamp”:“2021-10-06T11:48:46Z”,“app”:{“name”:“Squidex”,“version”:“5.6.0”,“sessionId”:“1f182070-a028-4989-a0fd-75cf7a7596c5”},“web”:{“requestId”:“00-60f3accd8f2a524f8694e5afcc0f3603-6ab8579009a72840-00”,“requestPath”:"/apps/joopbox/schemas",“requestMethod”:“POST”,“routeValues”:{“area”:“Api”,“action”:“PostSchema”,“controller”:“Schemas”}},“exception”:{“type”:“System.TypeAccessException”,“message”:“Named type \u0022Squidex.Infrastructure.Orleans.J\u00601\u003CSquidex.Infrastructure.Commands.CommandRequest\u003E\u0022 is invalid: Type string \u0022Squidex.Infrastructure.Commands.CommandRequest\u0022 cannot be resolved.”,“stackTrace”:" at Orleans.Internal.OrleansTaskExtentions.\u003CToTypedTask\u003Eg__ConvertAsync|4_0[T](Task\u00601 asyncTask)\n at Squidex.Infrastructure.Commands.GrainCommandMiddleware\u00602.ExecuteCommandAsync(TCommand typedCommand) in /src/src/Squidex.Infrastructure/Commands/GrainCommandMiddleware.cs:line 52\n at Squidex.Infrastructure.Commands.GrainCommandMiddleware\u00602.ExecuteCommandAsync(CommandContext context) in /src/src/Squidex.Infrastructure/Commands/GrainCommandMiddleware.cs:line 35\n at Squidex.Infrastructure.Commands.GrainCommandMiddleware\u00602.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Infrastructure/Commands/GrainCommandMiddleware.cs:line 29\n at Squidex.Infrastructure.Commands.GrainCommandMiddleware\u00602.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Infrastructure/Commands/GrainCommandMiddleware.cs:line 29\n at Squidex.Domain.Apps.Entities.Comments.DomainObject.CommentsCommandMiddleware.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Comments/DomainObject/CommentsCommandMiddleware.cs:line 50\n at Squidex.Infrastructure.Commands.GrainCommandMiddleware\u00602.HandleAsync(CommandContext context, NextDelegate next) in 
/src/src/Squidex.Infrastructure/Commands/GrainCommandMiddleware.cs:line 29\n at Squidex.Infrastructure.Commands.GrainCommandMiddleware\u00602.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Infrastructure/Commands/GrainCommandMiddleware.cs:line 29\n at Squidex.Domain.Apps.Entities.Contents.DomainObject.ContentsBulkUpdateCommandMiddleware.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Contents/DomainObject/ContentsBulkUpdateCommandMiddleware.cs:line 119\n at Squidex.Infrastructure.Commands.GrainCommandMiddleware\u00602.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Infrastructure/Commands/GrainCommandMiddleware.cs:line 29\n at Squidex.Domain.Apps.Entities.Assets.DomainObject.AssetCommandMiddleware.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Assets/DomainObject/AssetCommandMiddleware.cs:line 145\n at Squidex.Domain.Apps.Entities.Assets.DomainObject.AssetsBulkUpdateCommandMiddleware.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Assets/DomainObject/AssetsBulkUpdateCommandMiddleware.cs:line 111\n at Squidex.Infrastructure.Commands.GrainCommandMiddleware\u00602.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Infrastructure/Commands/GrainCommandMiddleware.cs:line 29\n at Squidex.Domain.Apps.Entities.Apps.DomainObject.AppCommandMiddleware.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Apps/DomainObject/AppCommandMiddleware.cs:line 48\n at Squidex.Domain.Apps.Entities.Schemas.Indexes.SchemasIndex.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Schemas/Indexes/SchemasIndex.cs:line 135\n at Squidex.Domain.Apps.Entities.Schemas.Indexes.SchemasIndex.HandleAsync(CommandContext context, NextDelegate next) in 
/src/src/Squidex.Domain.Apps.Entities/Schemas/Indexes/SchemasIndex.cs:line 147\n at Squidex.Domain.Apps.Entities.Rules.Indexes.RulesIndex.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Rules/Indexes/RulesIndex.cs:line 60\n at Squidex.Domain.Apps.Entities.Apps.Indexes.AppsIndex.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Apps/Indexes/AppsIndex.cs:line 222\n at Squidex.Domain.Apps.Entities.Apps.Invitation.InviteUserCommandMiddleware.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Domain.Apps.Entities/Apps/Invitation/InviteUserCommandMiddleware.cs:line 58\n at Squidex.Infrastructure.Commands.CustomCommandMiddlewareRunner.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Infrastructure/Commands/CustomCommandMiddlewareRunner.cs:line 32\n at Squidex.Web.CommandMiddlewares.ETagCommandMiddleware.HandleAsync(CommandContext context, NextDelegate next) in /src/src/Squidex.Web/CommandMiddlewares/ETagCommandMiddleware.cs:line 55\n at Squidex.Infrastructure.Commands.InMemoryCommandBus.PublishAsync(ICommand command) in /src/src/Squidex.Infrastructure/Commands/InMemoryCommandBus.cs:line 71\n at Squidex.Areas.Api.Controllers.Schemas.SchemasController.InvokeCommandAsync(ICommand command) in /src/src/Squidex/Areas/Api/Controllers/Schemas/SchemasController.cs:line 380\n at Squidex.Areas.Api.Controllers.Schemas.SchemasController.PostSchema(String app, CreateSchemaDto request) in /src/src/Squidex/Areas/Api/Controllers/Schemas/SchemasController.cs:line 121\n at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.\u003CInvokeActionMethodAsync\u003Eg__Awaited|12_0(ControllerActionInvoker invoker, ValueTask\u00601 
actionResultValueTask)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.\u003CInvokeNextActionFilterAsync\u003Eg__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State\u0026 next, Scope\u0026 scope, Object\u0026 state, Boolean\u0026 isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.\u003CInvokeInnerFilterAsync\u003Eg__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)\n at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.\u003CInvokeNextExceptionFilterAsync\u003Eg__Awaited|25_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)"}}

I have never seen this. I am running the same version in Squidex Cloud, so I have to dig into this.

I can give you some more info based on my tests. As it is deployed on AWS ECS, there are autoscaling activities, so members join and leave the cluster.

When it is not working, if I drop these MongoDB collections:

  • Orleans_OrleansMembershipSingle
  • Orleans_OrleansReminderV2

it starts working again, but when I force new nodes to join, something gets messed up and I can update on certain nodes but not on others. When nodes begin to leave the cluster, the node that did not let me add items starts letting me do so.

I have seen that version 5.9.0 adds support for clustering in Kubernetes. Is that exclusive to Kubernetes, or is it something related to the Orleans implementation that allows nodes to join and leave very frequently?

Cluster membership is shared through the storage (MongoDB). The Kubernetes support only means that while starting up, Orleans queries Kubernetes for all pods with certain labels and removes dead entries from the membership table. This makes the cluster more stable, because when a member dies unexpectedly it takes a while until all members agree that it is actually dead.

Btw: Are you sure that all your pods run the same version? Or have you built an extension in some way?
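One quick way to check the version question from the logs: every Squidex log entry in this thread carries an app.version field, so the versions each host reports can be collected from there. This is only a minimal sketch. It assumes the syslog-style prefix seen later in this thread ("Oct 7 15:21:42 <host> squidex[115]: {...}") and plain ASCII quotes in the raw log files; the function name versions_by_host is made up for illustration.

```python
import json
from collections import defaultdict


def versions_by_host(lines):
    """Map each host name to the set of Squidex versions found in its log lines."""
    seen = defaultdict(set)
    for line in lines:
        start = line.find("{")
        if start == -1:
            continue  # no JSON payload on this line
        prefix, payload = line[:start], line[start:]
        try:
            entry = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(entry, dict):
            continue
        app = entry.get("app")
        version = app.get("version") if isinstance(app, dict) else None
        if version is None:
            continue
        # Assumed prefix layout: "<month> <day> <time> <host> squidex[pid]: "
        parts = prefix.split()
        host = parts[3] if len(parts) >= 4 else "unknown"
        seen[host].add(version)
    return dict(seen)
```

If any host reports more than one version, or the hosts disagree with each other, the nodes are not all running the same build.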

I have been monitoring how tasks get shut down in ECS and simulated it locally. When the dotnet process receives the SIGTERM signal it starts a controlled shutdown, but I see these two warnings in the logs:

Oct 7 15:21:42 staging-customercms01 squidex[115]: {“logLevel”:“Warning”,“message”:“Lifecycle stop operations canceled by request.”,“timestamp”:“2021-10-07T13:21:42Z”,“app”:{“name”:“Squidex”,“version”:“5.6.0”,“sessionId”:“2ceb27e4-3ad0-439b-8a76-73c0898a5f81”},“category”:“Orleans.Runtime.SiloLifecycleSubject”}
Oct 7 15:21:42 staging-customercms01 squidex[115]: {“logLevel”:“Warning”,“message”:“Lifecycle stop operations canceled by request.”,“timestamp”:“2021-10-07T13:21:42Z”,“app”:{“name”:“Squidex”,“version”:“5.6.0”,“sessionId”:“2ceb27e4-3ad0-439b-8a76-73c0898a5f81”},“category”:“Orleans.Runtime.SiloLifecycleSubject”}

I don’t know whether that cancellation of lifecycle stop operations is also what fills my membership table with lots of dead silos.

I am only in charge of the infrastructure; I am checking with the development team whether they have extended the product in any way that could interfere. I will keep you posted.

Thanks in advance for your support.

Both are normal things:

  1. It seems that Orleans needs longer to shut down than your grace period, hence the warnings.
  2. The dead entries are not removed immediately, but after a few days.
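
Regarding point 1: assuming the grace period in question is the ECS container stop timeout (the time between SIGTERM and SIGKILL), it is controlled by the stopTimeout parameter of the container definition. A rough sketch of patching it into a list of container definitions follows. The container name, the image tag and the value of 120 seconds are placeholders, not recommendations; how long Orleans actually needs is not known from this thread.

```python
import copy
import json


def set_stop_timeout(container_definitions, container_name, seconds):
    """Return a copy of the container definitions with stopTimeout set
    on the container whose name matches container_name."""
    updated = copy.deepcopy(container_definitions)
    for container in updated:
        if container.get("name") == container_name:
            container["stopTimeout"] = seconds
    return updated


# Placeholder definition; a real one would come from your task definition.
definitions = [
    {"name": "squidex", "image": "squidex/squidex:5.6.0", "essential": True}
]

patched = set_stop_timeout(definitions, "squidex", 120)
print(json.dumps(patched, indent=2))
```

The patched list would then go into a new task definition revision. The allowed maximum depends on the launch type, so check the ECS documentation for your setup before picking a value.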