2023-07-09 Query optimization

Sebastian · July 9, 2023, 4:48pm

Query Optimizations

I was thinking about how to optimize the queries. This feature is available in docker tag dev-7594 and not deployed yet.

Motivation

One idea was to only query the data fields that are needed. Some of you have content items with very large array fields or components fields and in this case the list view is really slow.

Therefore I created a very small example where I create mongo documents with around 5 MB and only I make 2 tests:

Query all fields
Query a subset of fields only.

And the difference is huge:

Query Normal 00:00:04.6009169: 100
Query Limited 00:00:00.0294461: 100

You can find the source code here: https://gist.github.com/SebastianStehle/3d3b1bd18da989c1b285c9e4db1c57b0

Therefore I also integrated it into Squidex as an experiment:

Implementation

I have added a new header called X-Fields where you can define the list of fields as a comma-separated list. The whole query system on the API level needs a redesign in my opinion but it would be a major breaking change. But this is a topic for another time.

This field is respected by the UI in the contents list and also in GraphQL.

To enable the feature in GraphQL you need a special directive:

query {
  queryMySchemaContents @optimizeFieldQueries {
    data {
      myNumber {
        iv
      }
    }
  }
}

The reason is the following: If you have very connected data you might query the same object twice when references during GraphQL. In this case Squidex uses a per-request-cache and only queries the object once. But this does not work, because the cached entity might not have all fields for the second part of the query.

Therefore you have to opt-in for the feature now, until I understand it better.

This header is also not used for other parts of the UI yet, for example when resolving references. But this will come.

What do you have to do?

It depends: If you have a schema with large array or component fields this can be a performance improvement for you. But it depends if this array is list field or not. If it is a list field (or reference field), only the number of items are shown in the overview and it has a high cost to query the array, just because we need the count. In this case it is better to introduce a new field and to use scripting to automatically calculate this field value, e.g.

ctx.data.arrayCount.iv = ctx.data.array.iv.length || 0;
replace();

Sebastian · July 9, 2023, 4:48pm

Sebastian · July 9, 2023, 4:48pm

Sebastian · July 9, 2023, 4:49pm

Sebastian · July 9, 2023, 4:50pm

Sebastian · July 9, 2023, 4:50pm