[IMPLEMENTED] Permanent deletion

If you can give me a little direction on this, maybe we can add a small option on the schema side and enable permanent deletion when this option is turned on.

You have to add an abstract method to the base domain object:
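Something like this, as a rough sketch (the names here are simplified stand-ins, not the exact Squidex types):

```csharp
using System.Threading.Tasks;

// Simplified stand-in for the persistence abstraction mentioned below.
public interface IPersistence
{
    Task DeleteAsync();
}

public abstract class DomainObjectBase
{
    // New abstract hook: each derived class decides how to physically
    // remove its own persisted state (events and/or snapshots).
    protected abstract Task DeleteCoreAsync();
}
```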

And then implementations in the derived classes, DomainObject and LogSnapshotDomainObject. Both have access to persistence.DeleteAsync() or something similar. Then you call this method when the command has a certain flag, and this flag needs to be populated through all layers.
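A rough sketch of the derived implementations, continuing the simplified types from above:

```csharp
using System.Threading.Tasks;

public class DomainObject : DomainObjectBase
{
    private readonly IPersistence persistence;

    public DomainObject(IPersistence persistence)
    {
        this.persistence = persistence;
    }

    protected override Task DeleteCoreAsync()
    {
        // Physically removes the stored events for this aggregate.
        return persistence.DeleteAsync();
    }
}

public class LogSnapshotDomainObject : DomainObjectBase
{
    private readonly IPersistence persistence;

    public LogSnapshotDomainObject(IPersistence persistence)
    {
        this.persistence = persistence;
    }

    protected override Task DeleteCoreAsync()
    {
        // Snapshots are removed the same way here.
        return persistence.DeleteAsync();
    }
}
```

The command handler would then call DeleteCoreAsync() only when the incoming delete command carries that flag.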

Hi Sebastian,

There is no proper usage example for the bulk import, and I do not understand exactly how to use it.

The `PostContents(string app, string name, [FromBody] ImportContentsDto request)` method is available in ContentsController.cs.

I think we should use this. However, a usage example of this method is not available in squidex-samples. My idea is to add a `bool HardDeleteContents { get; set; }` property to ImportContentsDto.

A CommandMiddleware would process the command when this flag is true and also execute a command to delete all data for the corresponding schema. In fact, I would be very happy if you coded this flow as a sample, because you have much more control over and knowledge of the system. But if you show me the way, I will try to do it myself anyway.
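Roughly what I have in mind (the flag is my own suggestion, nothing that exists yet):

```csharp
public sealed class ImportContentsDto
{
    // ... existing import fields ...

    // Proposed flag: when true, a command middleware would first delete
    // all existing contents of the schema before the import runs.
    public bool HardDeleteContents { get; set; }
}
```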

Are you using the C# client SDK?

If yes, it provides methods for this. If not, please do not use that endpoint. The correct one is this:

How does it work?

You have to create a POST request like this:

POST `/api/content/{app}/{name}/bulk`
```json
{
	"jobs": [{
		"type": "Upsert",
		"data": {
			"number": {
				"iv": 1
			}
		}
	}, {
		"type": "Delete",
		"id": "123..."
	}]
}
```
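If you are not using the SDK, a minimal sketch with a plain HttpClient could look like this (the base address and authentication are assumed to be configured on the client already; app and schema names are placeholders):

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class BulkExample
{
    public static async Task RunAsync(HttpClient client)
    {
        // Same body as the request shown above; the values are placeholders.
        const string json = @"{
            ""jobs"": [
                { ""type"": ""Upsert"", ""data"": { ""number"": { ""iv"": 1 } } },
                { ""type"": ""Delete"", ""id"": ""123..."" }
            ]
        }";

        using var content = new StringContent(json, Encoding.UTF8, "application/json");

        // Replace my-app and my-schema with your app and schema names.
        var response = await client.PostAsync("/api/content/my-app/my-schema/bulk", content);

        response.EnsureSuccessStatusCode();
    }
}
```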

What it does not provide is an endpoint to delete all contents for a schema, and there are several reasons for that. One of them is that it is harder to do with event sourcing, because you not only need to delete the data in the content collections, which would in fact be a single command like `DELETE CONTENTS WHERE schemaId = '...'`.

You also have to create the deletion events.

This is the root of the problem. To work reliably you have to keep the deletion events. Squidex works like a database in some ways. A database maintains a list of sorted operations (e.g. the oplog in MongoDB), which is used for synchronization. If you just delete everything, all other systems like indices, usage counters and external systems where the data might be synced do not get the information that the content has been deleted.

Therefore the delete command has to be updated to allow a permanent flag. When the flag is true, the data has to be deleted (there is a persistence.DeleteAsync() method for that) and then the delete event needs to be published. It is not possible otherwise. We could make an optimization to delete these events after a month or so, but that is another story.
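A compact sketch of that ordering, with hypothetical stand-in types (the real command and event classes in Squidex differ):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in, only to illustrate the order of operations.
public sealed record DeleteContentCommand(bool Permanent);

public sealed class ContentDeletionFlow
{
    private readonly Func<Task> deleteData;      // e.g. persistence.DeleteAsync()
    private readonly Func<Task> publishDeleted;  // publishes the deleted event

    public ContentDeletionFlow(Func<Task> deleteData, Func<Task> publishDeleted)
    {
        this.deleteData = deleteData;
        this.publishDeleted = publishDeleted;
    }

    public async Task HandleAsync(DeleteContentCommand command)
    {
        if (command.Permanent)
        {
            // Physically remove the stored data first.
            await deleteData();
        }

        // The deleted event is published in every case, so that indices,
        // usage counters and synced external systems learn about it.
        await publishDeleted();
    }
}
```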

If this does not work for you, Squidex is probably not the best tool for you, at least not for this use case.

When a schema is deleted, are the events deleted? My only fear is that the database turns into garbage after a while when doing too many deletes or inserts…

No. Right now, nothing gets deleted. This is by design, because I think that for the kind of content that Squidex is built for (manually created content in the range of < 1 million records) the content grows more slowly than disk sizes.

Deleting 60,000 content items over the bulk update should be acceptably fast, something like 1,000 items/sec. So I think this is not a big issue. I am open to allowing permanent deletions, as it also has advantages like GDPR compliance and so on, but at least the deleted event must stay in the system.

I am working on this at the moment.

Hi Sebastian,

I did the BulkInsert job and tried it with 20,000 demo records. It took about 1 minute and it works properly. Before doing this, I was thinking about deleting all existing data. I think we can solve this if you create a method such as "clear all contents of this schema" on ContentRepository and create an action/command named “truncate” on the contents side.

Expected behavior:
When the truncate command is run, or when a flag is specified in the model before the bulk insert:

1. Deletion of all data belonging to the schema from the ContentsDatabase (Published, All).
2. Deletion of all events belonging to this schema from the Event2 collection.

Thus, when 60,000 records are created, garbage data will not accumulate in the system every time.

It would not help that much to do it on the content repository alone, because you also have to delete the events. If it takes only one minute, I think it is fast enough for now.

Yes. To delete events, we need a method like RemoveSchemaEvents(DomainId schemaId). This method would be called before the bulk insert, triggered by something like “TruncateSchema = true” on the BulkUpdateDto model. Thus, all contents and events of the relevant schema would be cleared, and a clean insert would then take place. Can you help improve this?

In this way, we can completely clean out the data that would otherwise turn into garbage before the bulk insert.

I will not implement that. The reason is that some users have business rules that need to run on every deletion. Of course you can have methods to bypass everything, but I think it is not worth it. Squidex is not a general-purpose database.

I think that developing this option on the model would contribute to the product. It would also help with GDPR compliance. But you know best. If needed, I can write this as a custom development on my side.

But you can already delete content in a GDPR-compliant way. With the new option, of course.

Are you working on a new development for this topic?
When can you implement it? I will also write code for my own needs.

Thank you

It takes a while to deploy it to the cloud, but I will merge it today. The change adds a permanent flag to all asset and content deletion endpoints.

Can you tell us how we can use this development and how it works?
Is it in the “feature / status” package?

Thank you.

The dev packages always use the version number from the build pipeline:

So dev-5647 in this case.

How does the hard delete work?

The delete endpoints for assets and contents have a query parameter ?permanent=true; the same applies to all bulk endpoints.
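For example, a permanent single deletion would look like this (same path format as the bulk example above; `{id}` is the content id):

DELETE `/api/content/{app}/{name}/{id}?permanent=true`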