[IMPLEMENTED] Custom content ids

Sebastian · May 25, 2020, 4:22pm

I am working on a very technical feature that is highly relevant for integrations and legacy systems, and I want to share my thoughts with you.

Current Situation

At the moment, all IDs are generated as GUIDs (globally unique identifiers). A GUID is a 16-byte globally unique identifier that is generated for each new content item, app, schema, rule and so on.

A GUID looks like this: eb793bbc-2be5-48c5-9cce-8dbca5d57d21.
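For illustration, here is a minimal sketch using Python's standard `uuid` module (not Squidex's own .NET code) showing that a GUID like the one above is a 16-byte (128-bit) value with a 36-character text form. The "4" at the start of the third group of the example marks it as a random, version 4 GUID.

```python
import uuid

# A new random (version 4) GUID, comparable to what is generated for each new entity.
new_id = uuid.uuid4()

# Canonical text form: 32 hex digits in five hyphen-separated groups (36 characters).
text = str(new_id)

# The underlying value is 16 bytes (128 bits) long.
raw = new_id.bytes

# The example GUID from this post; its version digit ("4" in "48c5") marks it as random.
example = uuid.UUID("eb793bbc-2be5-48c5-9cce-8dbca5d57d21")

print(len(text), len(raw), example.version)  # 36 16 4
```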

I find this article very useful: https://devblogs.microsoft.com/oldnewthing/20080627-00/?p=21823

Because it is globally unique, you only need the identifier to identify a content item; you do not need the name or ID of the app, for example. When you use the API, the app name and schema name are first converted to their internal identifiers, and these IDs are then used for all further operations. The uniqueness makes them very easy to deal with: you cannot accidentally address a content item from another app because you forgot to include the app ID or name in your database query.

But there are a few problems:

1. Backups

When we restore a backup, we cannot reuse the previous IDs because they are already in use. The restored app is a clone of a previous app that may not have been deleted, and even if it has been deleted, Squidex does not physically remove data, so the previous app still exists somewhere in the system. Therefore we have to generate a new ID for each entity in the backup (rules, schemas, content items, assets, asset folders…). The code that does this is very ugly and relies on the JSON serializer, which makes it a hack that happens to work.
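The restore workaround can be illustrated with a hypothetical Python sketch (not the actual Squidex code, which is written in C#): parse the serialized backup, map every known old ID to a fresh GUID, and replace any string value equal to an old ID so that references between entities stay consistent. The `remap_ids` function and the backup shape are assumptions made for this example.

```python
import json
import uuid


def remap_ids(backup_json, known_ids):
    """Hypothetical: replace every known old ID in a serialized backup with a fresh GUID."""
    mapping = {old: str(uuid.uuid4()) for old in known_ids}

    def rewrite(node):
        # Walk the parsed JSON; dictionary keys are left untouched, only values are rewritten.
        if isinstance(node, dict):
            return {key: rewrite(value) for key, value in node.items()}
        if isinstance(node, list):
            return [rewrite(item) for item in node]
        if isinstance(node, str):
            # Any string equal to an old ID is replaced, including references.
            return mapping.get(node, node)
        return node

    return json.dumps(rewrite(json.loads(backup_json))), mapping


backup = json.dumps({
    "contents": [
        {"id": "a1", "data": {"title": "Home"}},
        {"id": "b2", "data": {"related": ["a1"]}},  # reference to a1
    ]
})
restored, mapping = remap_ids(backup, {"a1", "b2"})
```

The sketch also shows why this is fragile: any field value that happens to equal an ID would be rewritten as well, and an ID you hard-coded (here `a1`) no longer exists after the restore.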

Another problem is that you cannot store or hard-code IDs. Imagine that a particular content item should always be shown on the start page of a website. You cannot just hard-code its ID to fetch this content item; you have to use another field, because the IDs change when you restore a backup. For the same reason, you cannot cache or store the content ID in a database: it might change.

2. Legacy systems and migration

When people start to work with Squidex, they usually have existing data that they need to migrate, and these records typically come from a database. So they already have an ID, for example an auto-incremented integer, a string, a URI or also a GUID. Because the ID is generated by Squidex, it cannot be your existing ID. Therefore you have to introduce a field that just holds the old ID, and sometimes you have to resolve references only to access the previous ID, even though you are not interested in the other fields. This makes data modelling more complicated than necessary.

3. Performance

Because you cannot define the ID yourself, synchronization processes are harder than they need to be. When you want to make an upsert (insert or update), you always need an extra query per content item to find the existing item, which really slows everything down. And because unique-field validation in Squidex is only weakly consistent, you have no guarantee that a content item is not created twice.
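To make the cost concrete, here is a hypothetical Python sketch with an in-memory stand-in for the content API that counts round trips. `FakeContentApi` and its methods are invented for illustration and are not the Squidex client API. Without custom IDs, each record needs a lookup by a legacy-ID field before it can be created or updated; with caller-chosen IDs, a single upsert per record is enough.

```python
import uuid


class FakeContentApi:
    """Hypothetical in-memory stand-in for a content API; counts round trips."""

    def __init__(self):
        self.items = {}    # content id -> data
        self.requests = 0  # number of simulated HTTP calls

    def query_by_field(self, field, value):
        self.requests += 1
        return next((cid for cid, data in self.items.items() if data.get(field) == value), None)

    def create(self, data):
        self.requests += 1
        cid = str(uuid.uuid4())  # server-generated ID
        self.items[cid] = data
        return cid

    def update(self, cid, data):
        self.requests += 1
        self.items[cid] = data

    def upsert(self, cid, data):
        # Hypothetical endpoint: insert or update under a caller-chosen ID.
        self.requests += 1
        self.items[cid] = data


def sync_with_generated_ids(api, records):
    # Without custom IDs: find each record by a legacy-ID field first.
    for legacy_id, data in records.items():
        payload = {**data, "legacyId": legacy_id}
        existing = api.query_by_field("legacyId", legacy_id)
        if existing is None:
            api.create(payload)
        else:
            api.update(existing, payload)


def sync_with_custom_ids(api, records):
    # With custom IDs: the legacy ID is the content ID, so one call per record.
    for legacy_id, data in records.items():
        api.upsert(legacy_id, data)


records = {"42": {"title": "Home"}, "43": {"title": "FAQ"}}
old_api, new_api = FakeContentApi(), FakeContentApi()
sync_with_generated_ids(old_api, records)
sync_with_custom_ids(new_api, records)
print(old_api.requests, new_api.requests)  # 4 2
```

The lookup-then-create path also shows the duplicate risk: two sync processes running at the same time can both see no existing item and both call `create`.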

Solution: Define your own ID

To solve all three problems, I am working on a big improvement so that you can define the content IDs yourself.

This consists of the following tasks:

Convert all code that uses GUIDs to use strings.
Ensure that the app id is always used when referencing other content items.
Get rid of the hack for backups.

It sounds like a small feature, but have a look at the PR: https://github.com/Squidex/squidex/pull/524/files

In return, it provides the following advantages:

A new upsert endpoint becomes possible.
Define your own content IDs.
IDs do not change when restoring a backup.
Better performance for upsert and synchronization scenarios.

There will be two restrictions:

Content IDs must be unique within an app (e.g. you cannot use the same ID in two schemas). I am not 100% sure about this yet.
Content IDs cannot be reused. Once a content item has been deleted, its ID is “lost” for this app.
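The two proposed restrictions could be modelled roughly as follows. This is a hypothetical Python sketch, assuming that IDs are keyed by (app ID, content ID) regardless of schema and that deleted IDs are kept as tombstones; `ContentStore` and `try_create` are invented names.

```python
class ContentStore:
    """Hypothetical store enforcing the two proposed ID rules."""

    def __init__(self):
        self.items = {}       # (app_id, content_id) -> (schema, data); schema is not part of the key
        self.deleted = set()  # tombstones: (app_id, content_id) pairs that were deleted

    def create(self, app_id, schema, content_id, data):
        key = (app_id, content_id)
        if key in self.items:
            raise ValueError(f"{content_id!r} already exists in app {app_id!r}")
        if key in self.deleted:
            raise ValueError(f"{content_id!r} was deleted and cannot be reused in app {app_id!r}")
        self.items[key] = (schema, data)

    def delete(self, app_id, content_id):
        key = (app_id, content_id)
        del self.items[key]    # raises KeyError if the item does not exist
        self.deleted.add(key)  # the ID stays blocked for this app


store = ContentStore()


def try_create(app_id, schema, content_id):
    try:
        store.create(app_id, schema, content_id, {})
        return "created"
    except ValueError as error:
        return f"rejected: {error}"


print(try_create("app-1", "pages", "homepage"))  # created
print(try_create("app-2", "pages", "homepage"))  # different app, so also created
print(try_create("app-1", "posts", "homepage"))  # same app, other schema: rejected (already exists)
store.delete("app-1", "homepage")
print(try_create("app-1", "pages", "homepage"))  # deleted ID: rejected (cannot be reused)
```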

Feedback is very welcome.

Matthias · May 25, 2020, 8:54pm

Hi @Sebastian,

interesting improvements!

But another problem is that you cannot store or hard code ids. Let’s imagine that a particular content item should always be shown on the start page of a website.

I really like that it will be possible to reference content by a string.

When we restore a backup we cannot use the previous ids anymore, because they are already in use.

It took me a couple of minutes to understand how you are solving the restore problem, so in case anyone else is wondering: by adding the app name to each call, the tuple (app name, string ID) is unique enough to allow restoring a backup, since the restored backup has another app name. Internally, you would probably still use a GUID, but let us developers use the string ID instead.

Ensure that the app id is always used when referencing other content items.

Yes, given how you solved the “restore same ID” problem, that is not an issue. I haven’t looked at the PR in detail, but please mark the old methods as obsolete and keep the new ones side by side to avoid breaking changes when the new version is published to Squidex Cloud.

Content ids must be unique within an app (e.g. you cannot use the same id in two schemas). I am not 100% sure about this yet.

This definitely makes sense. Even with generic names like root, it’s reasonable to expect the user to use root-home, root-blog or something similar.

Content ids cannot be reused. Once a content item has been deleted, the ID is “lost” for this app.

This is sort of a no-go. I hoped that I could now hard-code string IDs instead of GUIDs, but I see the problem, given that you keep even deleted items. The problem I see is that when I want to reuse an ID to which I’ve given a semantically meaningful name, such as homepage, I’ll end up with homepage2, homepage3, … whenever I delete a content item. Then I’ll also have FAQ7, FAQ8. I might be exaggerating a bit here, but it feels wrong to discard an ID.

Let’s say I start a new project, delete some schemas and content items, and then realize: damn, I can’t use ‘homepage’ or ‘faq’ again. Right now, the only workaround I can think of is deleting the app and creating another one to use the string ID again.

So, a couple of thoughts:

Upon deleting a content item, does it make sense to change its ID, for example by appending a suffix such as the deletion date?
Does it make sense to hard-delete (not just soft-delete) data? Or to somehow rename deleted items so that their string IDs become available again?
Does it make sense to use another parameter when retrieving content, so that instead of (app id, string id) the lookup uses (app id, string id, parameter X), where X could be, for instance, a not-before date? Then, when I delete homepage and recreate it, I could query for “give me the content of app id + string id, ignoring everything I did with that string id before date X”. X could be optional for string IDs that have never been deleted.
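The first suggestion, renaming a deleted item’s ID so that the original ID becomes free again, could look roughly like the hypothetical Python sketch below. The `~deleted-` suffix format and the `tombstone_id` helper are assumptions for illustration, not part of Squidex.

```python
from datetime import datetime, timezone


def tombstone_id(content_id, deleted_at):
    """Hypothetical: rename a deleted item's ID by appending the deletion time (UTC)."""
    return f"{content_id}~deleted-{deleted_at.strftime('%Y%m%dT%H%M%SZ')}"


items = {"homepage": {"title": "Old home"}}
when = datetime(2020, 5, 25, 20, 54, tzinfo=timezone.utc)

# "Delete" by moving the item to its tombstone ID, which frees the original ID.
items[tombstone_id("homepage", when)] = items.pop("homepage")
items["homepage"] = {"title": "New home"}  # the ID can be used again

print(sorted(items))  # ['homepage', 'homepage~deleted-20200525T205400Z']
```

One catch is that anything still holding the old ID, such as a running integration, would now resolve to the new item, which relates to the background-process concern in Sebastian’s reply.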

Sebastian · May 26, 2020, 4:54am

No, the internal GUID does not exist anymore. The combination of app ID and content ID is what makes an item unique.
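A minimal Python sketch of identity through the combination of app ID and content ID. `ContentKey` is a hypothetical name, and the placeholder app IDs assume that a restored app gets a different app ID than the original while keeping its content IDs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentKey:
    """Hypothetical identity: unique only as the pair (app_id, content_id)."""
    app_id: str
    content_id: str


# Placeholder app IDs; assumption: the restored app gets a new app ID.
original = ContentKey("app-original", "homepage")
restored = ContentKey("app-restored", "homepage")  # same content ID after the restore

# frozen=True makes instances hashable, so they can be used as dictionary keys.
contents = {original: {"title": "Home"}, restored: {"title": "Home"}}

print(original == restored)                                # False
print(original == ContentKey("app-original", "homepage"))  # True
print(len(contents))                                       # 2
```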

One reason why I cannot delete old items is that background processes might still be running, e.g. a Kafka or webhook integration.

Sebastian · July 18, 2020, 3:52pm

This has been implemented in the master branch.

Sebastian · October 8, 2020, 5:11pm