[SOLVED] Backing up an app kills the StatefulSet pod

Hi,

I will post more details later.

Squidex crashed just now after I clicked Backup and then downloaded the backup .zip file.

That’s literally all I did.

I don’t know why it is doing this but it’s not the first time this has happened.

I will gather logs later and post them here.

Thanks.

This is from the Backup UI after I click Backup so that you can see how many events there are etc.:

Started:
Duration: less than a minute (00:00:10)
Events: **3.4k**, Assets: **323**
Download: Ready

Update 1:
It took 5 mins for the new StatefulSet pod to come back online.

Update 2:
The .zip file is 32 MB.

Update 3:
Took another backup (from a different app). This time the .zip file was 103 MB. The StatefulSet pod did not crash this time.

The only option I see is to create a backup of your DB with mongodump and provide this to me via PM.

Usually only a stack overflow exception or an out-of-memory (OOM) kill can take down a pod.
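One way to tell these two cases apart might be to look at the container's last termination state in the pod status. The sketch below is only an illustration: it assumes the container restarted in place (the last state is lost if the pod itself was deleted and recreated), and the sample JSON, pod name and container name are hypothetical, shaped like the output of `kubectl get pod <name> -o json`. An OOM kill typically shows the reason `OOMKilled`, while other crashes usually show `Error`.

```python
import json


def last_termination_reasons(pod_json: str) -> dict:
    """Map container name -> (reason, exit code) of its last termination, if any."""
    pod = json.loads(pod_json)
    reasons = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        # lastState.terminated is only present if the container has restarted.
        terminated = status.get("lastState", {}).get("terminated")
        if terminated:
            reasons[status["name"]] = (
                terminated.get("reason"),
                terminated.get("exitCode"),
            )
    return reasons


# Hypothetical sample; the names and values are assumptions for illustration only.
sample = json.dumps({
    "metadata": {"name": "squidex-0"},
    "status": {
        "containerStatuses": [
            {
                "name": "squidex",
                "restartCount": 1,
                "lastState": {
                    "terminated": {"reason": "OOMKilled", "exitCode": 137}
                },
            }
        ]
    },
})

for name, (reason, code) in last_termination_reasons(sample).items():
    print(f"{name}: last terminated with reason={reason}, exit code={code}")
```

In practice the JSON could be piped in from kubectl instead of the hard-coded sample.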

Okay.

I scaled up the number of replicas from 1 to 5.

I know; you should not have just 1 pod.

Maybe this is the cause, I’m not sure. I still think it should terminate gracefully and give you a user friendly error and continue with life; not crash your application. But yeah, without that data you mentioned it’s hard to troubleshoot/reproduce.

But I will back up again in the not so distant future, and if it crashes then (after more data, memory usage etc. has accumulated) I will give you some material to work with.

Until then I will see if the scaling up of the pods helps.

I’ll keep you updated - better leave this thread open for a little while longer if possible.

Thanks.

If the bug is a stack overflow exception, there is nothing I can do, because the process dies without any chance to write a log.

Hi Sebastian,

This was due to the node having insufficient resources (CPU/Memory).

Figured it out after installing Prometheus and Grafana and monitoring the operation.

So you can close this one now. Thanks for the help.
