The future of Orleans?
At the moment Squidex uses Microsoft Orleans for many parts of the architecture, but it makes deployment and operations more complicated. Therefore I am thinking about removing this dependency.
What is Orleans?
Orleans is an actor framework. An actor is more or less a class with methods where everything is handled single-threaded. So when a method is called while another method is still in progress, the call is queued and handled afterwards. Furthermore, these actors are distributed across the cluster automatically, and when the actor that you want to talk to lives on another node, the communication is established automatically.
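To illustrate the queuing behavior, here is a minimal sketch in Python with asyncio. It is not how Orleans is implemented and it ignores distribution across nodes; `CounterActor`, `increment` and `demo` are hypothetical names chosen only for this example.

```python
import asyncio


class CounterActor:
    """Hypothetical actor: calls are queued in a mailbox and handled one at a time."""

    def __init__(self):
        self._mailbox = asyncio.Queue()
        self._value = 0
        self._worker = None

    def start(self):
        # A single worker task drains the mailbox, so handlers never overlap.
        self._worker = asyncio.create_task(self._run())

    async def _run(self):
        while True:
            amount, reply = await self._mailbox.get()
            current = self._value
            # Yield to other tasks in the middle of the update. Without the
            # mailbox, concurrent callers could interleave here and lose updates.
            await asyncio.sleep(0)
            self._value = current + amount
            reply.set_result(self._value)

    async def increment(self, amount):
        # Callers do not touch the state directly; they enqueue a message
        # and wait for the reply.
        reply = asyncio.get_running_loop().create_future()
        await self._mailbox.put((amount, reply))
        return await reply

    def stop(self):
        self._worker.cancel()


async def demo():
    actor = CounterActor()
    actor.start()
    # 100 concurrent callers, but the actor processes them strictly in sequence.
    results = await asyncio.gather(*(actor.increment(1) for _ in range(100)))
    actor.stop()
    return results
```

Running `asyncio.run(demo())` returns one result per caller, and no increment is lost even though all callers run concurrently.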
What are the benefits?
As mentioned before, method calls are queued. Usually when you update an object in the database you use a mechanism called optimistic concurrency. The idea is that you also store a version number in the database, and when you read a value to be updated you also get its version. Then you only make the update when the version that you have in memory has not been changed in the meantime. The problem is that you do not detect concurrent updates before you write to the database. So a lot of expensive operations might have happened already, like reading the value from the database.
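The optimistic concurrency pattern described above can be sketched as follows. This is only an illustration against an in-memory stand-in for a database; `InMemoryStore`, `VersionConflict` and `update_with_retry` are hypothetical names and not part of Squidex.

```python
class VersionConflict(Exception):
    """Raised when the stored version differs from the one the caller read."""


class InMemoryStore:
    """Hypothetical stand-in for a database table with a version column."""

    def __init__(self):
        self._rows = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys are reported as version 0 with no value.
        return self._rows.get(key, (0, None))

    def write(self, key, value, expected_version):
        current_version, _ = self._rows.get(key, (0, None))
        if current_version != expected_version:
            # Someone else wrote in the meantime; the caller's work is wasted.
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        self._rows[key] = (current_version + 1, value)
        return current_version + 1


def update_with_retry(store, key, transform, max_attempts=3):
    """Read, transform and write; repeat the whole cycle on a conflict."""
    for _ in range(max_attempts):
        version, value = store.read(key)
        new_value = transform(value)  # potentially expensive work
        try:
            return store.write(key, new_value, expected_version=version)
        except VersionConflict:
            continue  # the conflict is only detected at write time
    raise VersionConflict(f"gave up after {max_attempts} attempts")
```

Note that the conflict only surfaces inside `write`, after the read and the transformation have already been paid for. That is the cost the paragraph above refers to.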
With actors you have one instance of every domain object (content, assets and so on) and therefore you do not have this performance problem.
Because actors just live in the cluster, you do not need special nodes for background operations. Squidex has a lot of background jobs that cannot run in parallel. You just tell the system that you only want one actor of each type to exist in the cluster.
What are the downsides?
- Cluster management is complicated. You have to ensure that the nodes can talk to each other and you need a stable cluster. This makes a lot of scenarios quite complex; for example, auto-scaling is not that easy.
When we talk about an architecture without Orleans we would have to talk about how Orleans is used today.
Background operations are only allowed to run once per deployment. Typical tasks are:
- Rule execution
- Event processing
- Content scheduling
- Backup restore
- Backup operations
Without Orleans you need special kinds of Squidex nodes (just a configuration) and in some cases a way for them to talk to each other, for example with PubSub mechanisms like Redis or Google PubSub.
=> Without Orleans these deployments become more complex.
Because the actor lives in memory, it is very cheap to access data that is used very often. Typical examples are apps and schemas. The app instance is needed for almost every database call, and often you want to have the newest version and not a cached instance.
=> Without Orleans you have way more calls to the database or more caching for some endpoints.
As mentioned above, you need to solve parallel updates on domain objects. Without Orleans you would solve this at the database level.
=> Without Orleans you have more database calls and also more exceptions for parallel updates.
Rules are a special case. They work on events. Let's talk about a concrete example: when you make an update to a content item, you create a new ContentUpdated event. Based on this event, a background process creates an enriched event that contains the new data and the old data of the content item. The new data is already part of the event that has been queried from the database. But the question is where we can get the old data from. Therefore the content actor keeps the previous data in memory for a little while. Usually the event is handled by the rule system more or less immediately, so it is very likely that the content actor is still alive and has the old data available. So this operation is basically free in most cases. If the old data is not available, we have to query it. We could use a distributed cache for that if configured, but it is more expensive.
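The enrichment flow can be sketched roughly like this. It is a simplified illustration and not the actual Squidex implementation; `ContentActor`, `previous_data`, `enrich`, `load_old_data` and the `data_old` field are hypothetical names, and the retention time is an assumed parameter.

```python
import time


class ContentActor:
    """Hypothetical content actor that remembers the previous data for a short time."""

    def __init__(self, content_id, data, keep_seconds=30.0, clock=time.monotonic):
        self.content_id = content_id
        self.version = 0
        self.data = data
        self._keep_seconds = keep_seconds
        self._clock = clock
        self._previous = None  # (version, data, expires_at)

    def update(self, new_data):
        # Keep the old data around so the rule system can pick it up cheaply.
        self._previous = (self.version, self.data, self._clock() + self._keep_seconds)
        self.version += 1
        self.data = new_data
        return {"type": "ContentUpdated", "id": self.content_id,
                "version": self.version, "data": new_data}

    def previous_data(self, event_version):
        # Only answer if we still hold exactly the version before the event.
        if self._previous is None:
            return None
        version, data, expires_at = self._previous
        if version == event_version - 1 and self._clock() < expires_at:
            return data
        return None


def enrich(event, actor, load_old_data):
    """Attach the old data: from the actor if possible, otherwise via a query."""
    old_data = actor.previous_data(event["version"]) if actor is not None else None
    if old_data is None:
        # Slow path: the actor is gone or the data expired, so query a store.
        old_data = load_old_data(event["id"], event["version"] - 1)
    return {**event, "data_old": old_data}
```

As long as the event is enriched within the retention window, `load_old_data` is never called. Once the window has passed, the sketch falls back to the more expensive query.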
With Orleans
- PRO: A lot of implementations are easy if you understand Orleans.
- PRO: Fewer database calls, because actors work like a cache, but are always consistent.
- PRO: Easier to deploy because you have fewer requirements (like Redis).
- PRO: Some operations are very fast, because they are in-memory.
- CON: The architecture is very special, making it harder for new developers.
- CON: Harder to deploy in some cases and harder in operations.
Without Orleans
- PRO: Simpler architecture.
- PRO: Easier to deploy to things like managed containers or even serverless deployments.
- CON: More database calls, perhaps you shift some of the operational challenges to the database.
- CON: Some operations are slower, because you have more database calls.