Lucene.Net error: Index was outside the bounds of the array

Hi, everyone!

I’m getting a strange error when running a full-text search for certain terms on the contents page.

My main language is pt-BR and when I search for “Rio de Janeiro” in my collection (and other terms containing “de”, like “Duque de Caxias” - “de” is equivalent to “of”), an exception is being thrown in the TextIndexerGrain class, on line 98.

Does anyone have any idea what is happening here?

The line:

var hits = index.Searcher.Search(query, MaxResults).ScoreDocs;

The message:

Index was outside the bounds of the array.

The stack trace:

at Lucene.Net.Codecs.Lucene41.ForUtil.ReadBlock(IndexInput in, Byte[] encoded, Int32[] decoded)
at Lucene.Net.Codecs.Lucene41.Lucene41PostingsReader.BlockDocsAndPositionsEnum.RefillDocs()
at Lucene.Net.Codecs.Lucene41.Lucene41PostingsReader.BlockDocsAndPositionsEnum.Advance(Int32 target)
at Lucene.Net.Search.ExactPhraseScorer.NextDoc()
at Lucene.Net.Search.Weight.DefaultBulkScorer.ScoreRange(ICollector collector, Scorer scorer, Int32 currentDoc, Int32 end)
at Lucene.Net.Search.BooleanScorer.Score(ICollector collector, Int32 max)
at Lucene.Net.Search.BulkScorer.Score(ICollector collector)
at Lucene.Net.Search.IndexSearcher.Search(IList`1 leaves, Weight weight, ICollector collector)
at Lucene.Net.Search.IndexSearcher.Search(IList`1 leaves, Weight weight, ScoreDoc after, Int32 nDocs)
at Lucene.Net.Search.IndexSearcher.Search(Weight weight, ScoreDoc after, Int32 nDocs)
at Lucene.Net.Search.IndexSearcher.Search(Query query, Filter filter, Int32 n)
at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n)
at Squidex.Domain.Apps.Entities.Contents.Text.TextIndexerGrain.SearchAsync(String queryText, SearchContext context) in D:\Projetos\Observatorio\src\admin\backend\src\Squidex.Domain.Apps.Entities\Contents\Text\TextIndexerGrain.cs:line 98
at Squidex.Domain.Apps.Entities.Contents.Text.OrleansCodeGenTextIndexerGrainMethodInvoker.<Invoke>d__0.MoveNext() in D:\Projetos\Observatorio\src\admin\backend\src\Squidex.Domain.Apps.Entities\obj\Debug\netcoreapp3.0\Squidex.Domain.Apps.Entities.orleans.g.cs:line 1429
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Orleans.Runtime.GrainMethodInvoker.<Invoke>d__21.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Squidex.Infrastructure.Orleans.StateFilter.<Invoke>d__0.MoveNext() in D:\Projetos\Observatorio\src\admin\backend\src\Squidex.Infrastructure\Orleans\StateFilter.cs:line 21
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Orleans.Runtime.GrainMethodInvoker.<Invoke>d__21.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at Squidex.Infrastructure.Orleans.LoggingFilter.<Invoke>d__2.MoveNext() in D:\Projetos\Observatorio\src\admin\backend\src\Squidex.Infrastructure\Orleans\LoggingFilter.cs:line 30

Do you host it yourself? Please fill in all the information from the template

I have…

  • [x] Checked the logs and have provided the logs if I found something suspicious there

I’m submitting a…

  • [ ] Regression (a behavior that stopped working in a new release)
  • [x] Bug report
  • [ ] Performance issue
  • [ ] Documentation issue or request

Current behavior

My main language is pt-BR and when I search for “Rio de Janeiro” in my collection (and other terms containing “de”, like “Duque de Caxias” - “de” is equivalent to “of”), an exception is being thrown in the TextIndexerGrain class, on line 98. Index was outside the bounds of the array.

Expected behavior

The system should return all contents containing the exact search term (since I’m using full-text search with double quotes).

Minimal reproduction of the problem

I have a collection called “Localization” (cities, states, regions, etc.) that has a field “Name”. When I type “Rio de Janeiro” in the search box of the contents page, the message “Failed to load contents. Please reload.” is shown. It also happens with other names containing the preposition “de” (equivalent to “of”), like “Duque de Caxias”.

The error occurs on line 98 of the TextIndexerGrain class: var hits = index.Searcher.Search(query, MaxResults).ScoreDocs; and it seems to be a Lucene.Net error.

Environment

  • [x] Self hosted with docker
  • [ ] Self hosted with IIS
  • [ ] Self hosted with other version
  • [ ] Cloud version

Version: 4.0.0 Beta 1 - 2019-10-27

Browser:

  • [x] Chrome (desktop)
  • [x] Chrome (Android)
  • [ ] Chrome (iOS)
  • [ ] Firefox
  • [ ] Safari (desktop)
  • [ ] Safari (iOS)
  • [ ] IE
  • [ ] Edge

Others:
Apparently the error only occurs if the searched term exists in the name field. That is, if I search for a name that doesn’t exist, no exception is thrown.

The Log:
{ "logLevel": "Error", "action": "GrainInvoked", "status": "Failed", "grain": "Squidex.Domain.Apps.Entities.Contents.Text.TextIndexerGrain", "grainMethod": "System.Threading.Tasks.Task\u00601[System.Collections.Generic.List\u00601[System.Guid]] SearchAsync(System.String, Squidex.Domain.Apps.Entities.Contents.Text.SearchContext)", "exception": { "type": "System.IndexOutOfRangeException", "message": "Index was outside the bounds of the array.", "stackTrace": " at Lucene.Net.Codecs.Lucene41.ForUtil.ReadBlock(IndexInput in, Byte[] encoded, Int32[] decoded)\r\n at Lucene.Net.Codecs.Lucene41.Lucene41PostingsReader.BlockDocsAndPositionsEnum.RefillDocs()\r\n at Lucene.Net.Codecs.Lucene41.Lucene41PostingsReader.BlockDocsAndPositionsEnum.Advance(Int32 target)\r\n at Lucene.Net.Search.ExactPhraseScorer.NextDoc()\r\n at Lucene.Net.Search.Weight.DefaultBulkScorer.Score(ICollector collector, Int32 max)\r\n at Lucene.Net.Search.BooleanScorer.Score(ICollector collector, Int32 max)\r\n at Lucene.Net.Search.BulkScorer.Score(ICollector collector)\r\n at Lucene.Net.Search.IndexSearcher.Search(IList\u00601 leaves, Weight weight, ICollector collector)\r\n at Lucene.Net.Search.IndexSearcher.Search(IList\u00601 leaves, Weight weight, ScoreDoc after, Int32 nDocs)\r\n at Lucene.Net.Search.IndexSearcher.Search(Weight weight, ScoreDoc after, Int32 nDocs)\r\n at Lucene.Net.Search.IndexSearcher.Search(Query query, Filter filter, Int32 n)\r\n at Lucene.Net.Search.IndexSearcher.Search(Query query, Int32 n)\r\n at Squidex.Domain.Apps.Entities.Contents.Text.TextIndexerGrain.SearchAsync(String queryText, SearchContext context) in D:\\Projetos\\Observatorio\\src\\admin\\backend\\src\\Squidex.Domain.Apps.Entities\\Contents\\Text\\TextIndexerGrain.cs:line 98\r\n at Squidex.Domain.Apps.Entities.Contents.Text.OrleansCodeGenTextIndexerGrainMethodInvoker.Invoke(IAddressable grain, InvokeMethodRequest request) in 
D:\\Projetos\\Observatorio\\src\\admin\\backend\\src\\Squidex.Domain.Apps.Entities\\obj\\Debug\\netcoreapp3.0\\Squidex.Domain.Apps.Entities.orleans.g.cs:line 1429\r\n at Orleans.Runtime.GrainMethodInvoker.Invoke()\r\n at Squidex.Infrastructure.Orleans.StateFilter.Invoke(IIncomingGrainCallContext context) in D:\\Projetos\\Observatorio\\src\\admin\\backend\\src\\Squidex.Infrastructure\\Orleans\\StateFilter.cs:line 21\r\n at Orleans.Runtime.GrainMethodInvoker.Invoke()\r\n at Squidex.Infrastructure.Orleans.LoggingFilter.Invoke(IIncomingGrainCallContext context) in D:\\Projetos\\Observatorio\\src\\admin\\backend\\src\\Squidex.Infrastructure\\Orleans\\LoggingFilter.cs:line 30" }, "app": { "name": "Squidex", "version": "1.0.0.0", "sessionId": "8f558af1-96d5-4531-ba6b-25e5802d403c" }, "web": { "requestId": "d3d98738-68a9-47da-ae1c-e773b97687db", "requestPath": "/content/observatorio/6f7685fa-db7e-4b50-a766-c78d40bb3aea", "requestMethod": "GET", "routeValues": { "area": "Api", "action": "GetContents", "controller": "Contents" } }, "timestamp": "2020-03-24T20:37:28Z" }

Can you provide me a backup of your mongo database?

Yes, I can. I’m preparing the backup right now and I’ll send it to you ASAP.

Thank you.

Hi, Sebastian. Sorry for the delay.

My friend @AlanBazan is taking over this task. He’s going to send you (in private) the link to download our MongoDB backup.

Thank you!

Hi Sebastian. I’ve sent the backup by email (I don’t know the right channel for that, so I sent it to hello@squidex.io).

Some additional information…

I got the Lucene.Net source code for debugging and found the code that is causing the error.
The Lucene.Net class “Lucene.Net.Codecs.Lucene41.ForUtil” has the following method:

internal void ReadBlock(IndexInput @in, byte[] encoded, int[] decoded)
{
    int numBits = @in.ReadByte();
    Debug.Assert(numBits <= 32, numBits.ToString());

    if (numBits == ALL_VALUES_EQUAL)
    {
        int value = @in.ReadVInt32();
        Arrays.Fill(decoded, 0, Lucene41PostingsFormat.BLOCK_SIZE, value);
        return;
    }

    int encodedSize = encodedSizes[numBits];
    @in.ReadBytes(encoded, 0, encodedSize);

    PackedInt32s.IDecoder decoder = decoders[numBits];
    int iters = iterations[numBits];
    Debug.Assert(iters * decoder.ByteValueCount >= Lucene41PostingsFormat.BLOCK_SIZE);

    decoder.Decode(encoded, 0, decoded, 0, iters);
}

The variables “encodedSizes”, “decoders” and “iterations” are arrays with 33 positions (indices 0 to 32).

The first line of this method ( int numBits = @in.ReadByte(); ) calls the ReadByte method of the “Squidex.Domain.Apps.Entities.MongoDb.FullText.MongoIndexInput” class. But this method returns the value “49” while reading the “_ix_Lucene41_0.doc” file, causing the IndexOutOfRangeException when getting the “encodedSizes” array value.
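To make the failure mode concrete, here is a minimal, self-contained sketch of a header check that fails fast with a descriptive message instead of an IndexOutOfRangeException. It is not Lucene.Net code: the class BlockHeaderCheck, the helper ReadNumBits and the constant MaxBitsPerValue are hypothetical names, and a plain System.IO.Stream stands in for IndexInput.

```csharp
using System;
using System.IO;

public static class BlockHeaderCheck
{
    // Assumption: mirrors the 33-slot lookup tables (indices 0..32) described above.
    private const int MaxBitsPerValue = 32;

    // Hypothetical helper: validates the header byte before it is used as an array index.
    public static int ReadNumBits(Stream input)
    {
        int numBits = input.ReadByte(); // Stream.ReadByte returns -1 at end of stream

        if (numBits < 0)
        {
            throw new EndOfStreamException("No header byte left to read.");
        }

        if (numBits > MaxBitsPerValue)
        {
            throw new InvalidDataException(
                $"Invalid block header {numBits}; expected 0..{MaxBitsPerValue}.");
        }

        return numBits;
    }

    public static void Main()
    {
        int[] encodedSizes = new int[MaxBitsPerValue + 1];

        // 49 is the header value observed while reading the .doc file.
        var corrupt = new MemoryStream(new byte[] { 49 });

        try
        {
            int size = encodedSizes[ReadNumBits(corrupt)];
            Console.WriteLine(size);
        }
        catch (InvalidDataException ex)
        {
            // The message names the bad value instead of a bare array bounds error.
            Console.WriteLine(ex.Message);
        }
    }
}
```

A check like this does not repair anything; it only makes it obvious that the bytes being read are not a valid block header, which points at the stored data or the reader rather than at the query.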

I forced execution of the next “@in.ReadByte()” call and it returned 0. I might be wrong, but it seems to me that this ends the reading of the file.

When I change the “if” of this method to filter the values outside the range, the result appears correctly and no error is thrown.

        if (numBits == ALL_VALUES_EQUAL || numBits > 32)
        {
            int value = @in.ReadVInt32();
            Arrays.Fill(decoded, 0, Lucene41PostingsFormat.BLOCK_SIZE, value);
            return;
        }

But I don’t know the real source of this error: whether it comes from Lucene.Net, from MongoIndexInput, or from a configuration mistake in our database.

One more thing…

The change above does not work for the “Duque de Caxias” term, just for “Rio de Janeiro”.
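One possible reason the workaround helps for one term and not the other is that it only hides the bad header and does not recover the original values. The sketch below illustrates this with a toy block format. It is an assumption-heavy illustration, not Lucene's real bit packing: MaskedHeaderDemo, its BlockSize of 4 and the one-byte-per-value layout are all made up, and the misaligned read position is only one hypothetical way to end up with a header such as 49.

```csharp
using System;
using System.IO;

public static class MaskedHeaderDemo
{
    private const int BlockSize = 4;       // toy size, chosen for readability
    private const int AllValuesEqual = 0;

    // Toy decoder (assumption: NOT Lucene's encoding). Header 0 means "one value
    // repeated BlockSize times"; headers 1..32 mean "BlockSize raw bytes follow".
    public static int[] ReadBlock(Stream input, bool maskInvalidHeader)
    {
        int numBits = input.ReadByte();
        var decoded = new int[BlockSize];

        // With maskInvalidHeader = true this mirrors the "|| numBits > 32" change.
        if (numBits == AllValuesEqual || (maskInvalidHeader && numBits > 32))
        {
            int value = input.ReadByte();
            for (int i = 0; i < BlockSize; i++) decoded[i] = value;
            return decoded;
        }

        if (numBits < 0 || numBits > 32)
        {
            throw new InvalidDataException($"Invalid block header {numBits}.");
        }

        for (int i = 0; i < BlockSize; i++) decoded[i] = input.ReadByte();
        return decoded;
    }

    public static void Main()
    {
        // One valid block: header 8, then the values 49, 5, 9, 12.
        byte[] file = { 8, 49, 5, 9, 12 };

        int[] correct = ReadBlock(new MemoryStream(file), maskInvalidHeader: false);

        // Hypothetical fault: the reader starts one byte late, so 49 is read as the header.
        var misaligned = new MemoryStream(file) { Position = 1 };
        int[] masked = ReadBlock(misaligned, maskInvalidHeader: true);

        Console.WriteLine(string.Join(",", correct)); // prints 49,5,9,12
        Console.WriteLine(string.Join(",", masked));  // prints 5,5,5,5
    }
}
```

In this toy case the masked read completes without an exception but returns values that were never stored, so whether a search still looks right afterwards depends on the bytes that happen to follow.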

The MongoIndexInput is obsolete anyway. I have replaced it with another approach. Perhaps it just works with the current version…I will test your backup.

I got your DB, but it is an older version. If you have the time you could clone your DB and deploy a second copy of Squidex with the newest version and see if the problem still occurs. I have changed a few things about the full text search there, which I can explain in detail if you want.

I have tried with the latest version of Squidex and it worked.
Thanks @Sebastian !
