Slow response from content API

I have…

I’m submitting a…

  • [ ] Regression (a behavior that stopped working in a new release)
  • [ ] Bug report
  • [X] Performance issue
  • [ ] Documentation issue or request

Current behavior

When performing a GET request to the Content API with a full-text search, the query often takes longer than 10 seconds. Some queries even take longer than 20 seconds.

Query that takes +/- 10 seconds with Postman:
{"Take":10,"Skip":0,"FullText":"when i do this it is pretty slow","Sort":[{"Path":"data/CreateDate/iv","Order":"descending"}],"Filter":null}

Expected behavior

I would expect it to be faster.

Minimal reproduction of the problem

GET request: /api/content/{APP}/{SCHEMA}?q=%7b%22Take%22%3a10%2c%22Skip%22%3a0%2c%22FullText%22%3a%22when+i+do+this+it+is+pretty+slow%22%2c%22Sort%22%3a%5b%7b%22Path%22%3a%22data%2fCreateDate%2fiv%22%2c%22Order%22%3a%22descending%22%7d%5d%2c%22Filter%22%3anull%7d
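
For reference, the q parameter above is just this JSON object, URL-encoded. A minimal Python sketch of the same request (the app, schema, and token are placeholders, and cloud.squidex.io as the direct host is an assumption):

import json
import requests

APP = "my-app"             # placeholder
SCHEMA = "my-schema"       # placeholder
TOKEN = "my-access-token"  # placeholder

query = {
    "Take": 10,
    "Skip": 0,
    "FullText": "when i do this it is pretty slow",
    "Sort": [{"Path": "data/CreateDate/iv", "Order": "descending"}],
    "Filter": None,
}

# requests URL-encodes the JSON into the q parameter for us.
response = requests.get(
    f"https://cloud.squidex.io/api/content/{APP}/{SCHEMA}",
    params={"q": json.dumps(query)},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(f"{response.elapsed.total_seconds():.1f}s, status {response.status_code}")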

Environment

App Name:

  • [ ] Self hosted with docker
  • [ ] Self hosted with IIS
  • [ ] Self hosted with other version
  • [X] Cloud version

Version: [VERSION]

Browser:

  • [ ] Chrome (desktop)
  • [ ] Chrome (Android)
  • [ ] Chrome (iOS)
  • [ ] Firefox
  • [ ] Safari (desktop)
  • [ ] Safari (iOS)
  • [ ] IE
  • [ ] Edge

Others:

If this is the expected result, does someone have tips or tricks to improve the performance?

I need the app and schema name; otherwise it is hard to analyze.

In my case: /removed/removed/

I had a first look and I think I can explain it to you.

The full-text search that is currently implemented is very basic. All content items from all apps are stored in a single index.

This kind of index is called an inverted index, so it looks like this:

hello: [id1, id2, id3]
world: [id1, id5]

Usually you remove stop words from such an index. These are words that appear in almost all texts, like “a, an, the, is” and so on. These are the stop words for English (just as an example): https://github.com/apache/lucene/blob/d5d6dc079395c47cd6d12dcce3bcfdd2c7d9dc63/lucene/analysis/common/src/java/org/apache/lucene/analysis/en/EnglishAnalyzer.java#L48
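
As a toy illustration of both points (this is not the actual Squidex code, and the stop word list is made up):

# Toy inverted index: word -> set of document ids.
docs = {
    "id1": "the query is slow",
    "id2": "the editor is fast",
    "id3": "the api is stable",
}

index = {}
for doc_id, text in docs.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

# A tiny, made-up stop word list; see the Lucene link for a real one.
STOP_WORDS = {"a", "an", "the", "is", "it", "i", "when", "do", "this"}

def search(query, remove_stop_words):
    words = [w for w in query.lower().split()
             if not (remove_stop_words and w in STOP_WORDS)]
    # Union of the posting lists; common words drag in almost every document.
    result = set()
    for word in words:
        result |= index.get(word, set())
    return result

print(search("when i do this it is pretty slow", False))  # all three ids
print(search("when i do this it is pretty slow", True))   # only {'id1'}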

The “bad” thing is that Squidex content is multi-language, so we cannot really optimize for a single language, and the full-text search in MongoDB is not clever enough to search across languages.
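
For context, a MongoDB text index is tied to one language at a time: stemming and stop word removal follow the index's (or the query's) language setting, so a single index cannot apply several languages' rules to the same field. A rough PyMongo sketch (database and collection names are made up):

from pymongo import MongoClient, TEXT

contents = MongoClient()["demo"]["contents"]  # hypothetical collection

# The index gets exactly one default language; its stemming and
# stop word rules apply only to that language.
contents.create_index([("texts", TEXT)], default_language="english")

# A query may override the language, but still only one per search.
results = contents.find({"$text": {"$search": "pretty slow", "$language": "en"}})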

Therefore stop words are not removed. Your search is not very specific, so the result set is very big. This is only an explanation; I will dig into how it can be improved. In general I recommend:

  1. Always use the CDN, contents.squidex.io, for production (see the sketch after this list).
  2. Use a specialized search engine like Elasticsearch or Algolia to customize the search experience for your workload.
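
For point 1, using the CDN is only a base URL change on the client, so repeated queries can be answered from the cache instead of hitting the search index again. A sketch (the direct host, app, and schema are placeholders):

import requests

API_BASE = "https://cloud.squidex.io"     # direct API host (assumed)
CDN_BASE = "https://contents.squidex.io"  # CDN host from point 1

path = "/api/content/my-app/my-schema"    # placeholder app and schema
params = {"q": '{"Take":10,"Skip":0,"FullText":"slow"}'}

# Same request, different host: only the base URL changes.
response = requests.get(CDN_BASE + path, params=params)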

When you search for something like “slow”, it is much faster.


Thanks for the explanation and advice; the use of the CDN might improve performance a bit.

I could use a service like Elasticsearch, but I think it is also important for Squidex that this works well. From my point of view this is a potential security (availability) issue: a (D)DoS attack would potentially be possible here.

100% agree. I have already made two changes here that are not deployed yet:

  1. I now eliminate stop words on the server side (not sure yet what to do with unlocalized content).
  2. I have ensured that the timeout works properly (conceptually like the sketch after this list).
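
Squidex itself is written in C#, so this is only a conceptual Python sketch of what point 2 amounts to: capping the query time at the database with MongoDB's maxTimeMS (collection name and limit are made up):

from pymongo import MongoClient
from pymongo.errors import ExecutionTimeout

contents = MongoClient()["demo"]["contents"]  # hypothetical collection

try:
    # maxTimeMS makes the server abort the query instead of letting an
    # unselective full-text search run for tens of seconds.
    ids = list(contents.find({"$text": {"$search": "pretty slow"}}).max_time_ms(2000))
except ExecutionTimeout:
    ids = []  # fail fast instead of blocking the request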

Can you provide an update on whether or when these changes have been deployed?
I would like to test it to determine whether this fixes the problems for me.

It has just been deployed…
