I am working on a very technical feature that matters a lot for integrations and legacy systems, and I want to share my thoughts with you.
At the moment, all IDs are generated as GUIDs (globally unique identifiers). A GUID is a 16-byte identifier, and a new one is generated for each content item, app, schema, rule and so on.
A GUID looks like this:
I find this article very useful: https://devblogs.microsoft.com/oldnewthing/20080627-00/?p=21823
Because a GUID is globally unique, you only need the identifier to identify a content item; you do not need the name or ID of the app, for example. When you use the API, the app name and schema name are first converted to their internal identifiers, and we then use these IDs for all further operations. The uniqueness makes them very easy to deal with: you cannot accidentally address a content item from another app just because you forgot to include the app ID or name in your database query.
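To make the 16-byte claim and the "ID alone is enough" point concrete, here is a minimal Python sketch using the standard `uuid` module. The `contents` dictionary and the `find_content` helper are hypothetical and only illustrate the idea; they are not Squidex code.

```python
import uuid
from typing import Optional

# A random (version 4) GUID: 128 bits, i.e. 16 bytes.
content_id = uuid.uuid4()
print(content_id)             # text form: 32 hex digits plus 4 hyphens
print(len(content_id.bytes))  # 16

# Hypothetical store: a single global index keyed only by the GUID.
contents = {
    content_id: {"app": "my-app", "schema": "blog-posts", "title": "Hello"},
}


def find_content(item_id: uuid.UUID) -> Optional[dict]:
    """Look up a content item by ID alone; no app or schema name is needed."""
    return contents.get(item_id)
```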
But there are a few problems:
1. Backups
When we restore a backup, we cannot use the previous IDs anymore, because they are already in use. Either the previous app has not been deleted and the restored app is a clone of it, or it has been deleted but is still somewhere in the system, because Squidex does not delete things. Therefore we have to generate a new ID for each entity in the backup (rules, schemas, content items, assets, asset folders…). The code that does this is very ugly and relies on the JSON serializer, which is a hack that happens to work.
Another problem is that you cannot store or hard-code IDs. Imagine that a particular content item should always be shown on the start page of a website. You cannot just hard-code the ID to fetch this content item; you have to use another field, because the IDs change when you restore a backup. You also cannot cache or store the content ID in a database, because again, it might change.
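The effect of a restore on stored IDs can be sketched as follows. This is a simplified two-pass remapping under an assumed entity shape (`{"id": ..., "references": [...]}`); it is not the JSON-serializer-based code that Squidex actually uses, and `restore_backup` is a hypothetical name.

```python
import uuid


def restore_backup(entities):
    """Copy entities from a backup, giving each one a fresh ID.

    Assumed shape: each entity is {"id": str, "references": [str, ...]}.
    Returns the restored entities and the old-to-new ID mapping.
    """
    # First pass: allocate a new ID for every old ID.
    id_map = {e["id"]: str(uuid.uuid4()) for e in entities}
    # Second pass: rewrite each entity's own ID and every reference it holds.
    restored = [
        {
            **e,
            "id": id_map[e["id"]],
            "references": [id_map.get(r, r) for r in e.get("references", [])],
        }
        for e in entities
    ]
    return restored, id_map


# A start page that references an image, as stored in the backup.
image_id = str(uuid.uuid4())
start_page_id = str(uuid.uuid4())
backup = [
    {"id": start_page_id, "references": [image_id]},
    {"id": image_id, "references": []},
]

restored, id_map = restore_backup(backup)

# A client that hard-coded the old start page ID no longer finds the item.
still_found = any(e["id"] == start_page_id for e in restored)
print(still_found)  # False
```

References inside the backup survive because they are rewritten together with the IDs, but anything stored outside the backup (a hard-coded constant, a cache, a row in another database) still points at the old ID.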
2. Legacy systems and migration
When people start to work with Squidex, they usually have existing data that they need to migrate, and these records typically come from a database. So they already have an ID, for example an auto-incremented integer, a string, a URI or a GUID. Because the ID is generated by Squidex, it cannot be your existing ID. Therefore you have to introduce a field that holds just the old ID, and sometimes you have to resolve references only because you need the previous ID, even though you are not interested in the other fields. This makes the data modelling more complicated than necessary.
3. Synchronization
Because you cannot define the ID yourself, synchronization processes are harder than they need to be. When you want to do an upsert (either insert or update), you always need an extra query per content item, which really slows everything down. And because unique validation in Squidex is only weakly consistent, you have no guarantee that your content item is not created twice.
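The extra query per item can be illustrated with a small in-memory stand-in. `ContentStore`, `sync_record` and the `legacyId` field are hypothetical names for this sketch and do not correspond to the Squidex API.

```python
import uuid


class ContentStore:
    """Hypothetical in-memory stand-in for a store that generates its own IDs."""

    def __init__(self):
        self.items = {}
        self.queries = 0

    def query_by_field(self, field, value):
        self.queries += 1
        return [i for i in self.items.values() if i["data"].get(field) == value]

    def create(self, data):
        item = {"id": str(uuid.uuid4()), "data": dict(data)}
        self.items[item["id"]] = item
        return item

    def update(self, item_id, data):
        self.items[item_id]["data"] = dict(data)
        return self.items[item_id]


def sync_record(store, record):
    """Emulated upsert: query by the legacy ID field, then update or create."""
    matches = store.query_by_field("legacyId", record["legacyId"])
    if matches:
        return store.update(matches[0]["id"], record)
    # Between the query above and this create, a concurrent sync could
    # insert the same record, which is how duplicates can appear.
    return store.create(record)


store = ContentStore()
records = [{"legacyId": 1, "title": "A"}, {"legacyId": 2, "title": "B"}]

# Run the synchronization twice: the second run updates instead of inserting.
for _ in range(2):
    for record in records:
        sync_record(store, record)

print(len(store.items))  # 2
print(store.queries)     # 4: one lookup per record per run
```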
Solution: Define your own ID
To solve all three problems, I am working on a big improvement that lets you define the content IDs yourself.
This consists of the following tasks:
- Convert all code that uses GUIDs to use strings.
- Ensure that the app id is always used when referencing other content items.
- Get rid of the hack for backups.
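Once IDs are user-defined strings, they are no longer globally unique, so every lookup has to carry the app ID as well. A minimal sketch of such a composite key, with hypothetical names (`ContentKey`, `put`, `get`) that are not taken from the PR:

```python
from typing import NamedTuple


class ContentKey(NamedTuple):
    """Composite key: a content ID is only meaningful together with its app."""

    app_id: str
    content_id: str


contents = {}


def put(app_id, content_id, data):
    contents[ContentKey(app_id, content_id)] = data


def get(app_id, content_id):
    return contents.get(ContentKey(app_id, content_id))


# Two apps can use the same string ID without colliding.
put("app-a", "start-page", {"title": "Welcome to A"})
put("app-b", "start-page", {"title": "Welcome to B"})

print(get("app-a", "start-page")["title"])  # Welcome to A
print(get("app-b", "start-page")["title"])  # Welcome to B
```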
It sounds like a small feature, but have a look at the PR to see how much it touches: https://github.com/Squidex/squidex/pull/524/files
But this provides the following advantages:
- A new upsert endpoint becomes possible.
- You can define your own content IDs.
- IDs do not change when restoring a backup.
- Better performance for upsert and synchronization scenarios.
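With caller-defined IDs, an upsert can become a single write keyed by the existing ID. The sketch below is only an assumption about how such an operation could behave; `UpsertStore` and its `upsert` method are hypothetical and do not describe the final endpoint.

```python
class UpsertStore:
    """Hypothetical in-memory store where the caller chooses the content ID."""

    def __init__(self):
        self.items = {}
        self.writes = 0

    def upsert(self, app_id, content_id, data):
        """Insert or replace in one operation; returns True if newly created."""
        self.writes += 1
        key = (app_id, content_id)
        created = key not in self.items
        self.items[key] = dict(data)
        return created


upsert_store = UpsertStore()
legacy_rows = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]

# Reuse the legacy database ID as the content ID; no lookup query is needed.
first_run = [upsert_store.upsert("my-app", str(r["id"]), r) for r in legacy_rows]
second_run = [upsert_store.upsert("my-app", str(r["id"]), r) for r in legacy_rows]

print(first_run)                # [True, True]
print(second_run)               # [False, False]
print(len(upsert_store.items))  # 2
```

Running the synchronization twice leaves exactly one item per legacy row, because the ID itself decides whether the write is an insert or an update.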
There will be two restrictions:
- Content IDs must be unique within an app (e.g. you cannot use the same ID in two schemas). I am not 100% sure about this yet.
- Content ids cannot be reused. Once a content item has been deleted, the ID is “lost” for this app.
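The two restrictions can be sketched as a small per-app registry that remembers deleted IDs. This is an illustration under my own assumptions; `AppContentIds` and `IdAlreadyUsedError` are hypothetical names, and how Squidex actually enforces the rules is not shown here.

```python
class IdAlreadyUsedError(Exception):
    """Raised when a content ID is live or was used before in this app."""


class AppContentIds:
    """Hypothetical per-app registry of content IDs."""

    def __init__(self):
        self._live = {}        # content ID -> schema name
        self._deleted = set()  # IDs that are "lost" for this app

    def create(self, schema, content_id):
        # Unique within the app, regardless of schema, and never reusable.
        if content_id in self._live or content_id in self._deleted:
            raise IdAlreadyUsedError(content_id)
        self._live[content_id] = schema

    def delete(self, content_id):
        del self._live[content_id]
        self._deleted.add(content_id)


ids = AppContentIds()
ids.create("pages", "home")

try:
    ids.create("posts", "home")  # same ID in another schema of the same app
except IdAlreadyUsedError:
    print("rejected: ID is already used in this app")

ids.delete("home")

try:
    ids.create("pages", "home")  # the deleted ID cannot come back
except IdAlreadyUsedError:
    print("rejected: ID was deleted and cannot be reused")
```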
Feedback is very welcome.