MongoDB connection failed: cannot connect to database Squidex

Hi there. I tried following the setup for Kubernetes on Google Cloud, and after a lot of back and forth I have managed to get it to start up, except that it cannot connect to the MongoDB database. The inner error is very confusing, as it says that the status is connected:

---> System.TimeoutException: A timeout occured after 30000ms selecting a server using CompositeServerSelector{ Selectors = MongoDB.Driver.MongoClient+AreSessionsSupportedServerSelector, LatencyLimitingServerSelector{ AllowedLatencyRange = 00:00:00.0150000 } }. Client view of cluster state is { ClusterId : "1", ConnectionMode : "ReplicaSet", Type : "ReplicaSet", State : "Connected", Servers : [{ ServerId: "{ ClusterId : 1, EndPoint : "Unspecified/mongo-0.mongo:27017" }", EndPoint: "Unspecified/mongo-0.mongo:27017", State: "Connected", Type: "ReplicaSetGhost", WireVersionRange: "[0, 5]", LastUpdateTimestamp: "2020-03-25T12:16:57.4123533Z" }] }.
at MongoDB.Driver.Core.Clusters.Cluster.ThrowTimeoutException(IServerSelector selector, ClusterDescription description)
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedHelper.HandleCompletedTask(Task completedTask)
at MongoDB.Driver.Core.Clusters.Cluster.WaitForDescriptionChangedAsync(IServerSelector selector, ClusterDescription description, Task descriptionChangedTask, TimeSpan timeout, CancellationToken cancellationToken)
at MongoDB.Driver.Core.Clusters.Cluster.SelectServerAsync(IServerSelector selector, CancellationToken cancellationToken)
at MongoDB.Driver.MongoClient.AreSessionsSupportedAfterSeverSelctionAsync(CancellationToken cancellationToken)
at MongoDB.Driver.MongoClient.AreSessionsSupportedAsync(CancellationToken cancellationToken)
at MongoDB.Driver.MongoClient.StartImplicitSessionAsync(CancellationToken cancellationToken)
at MongoDB.Driver.MongoCollectionImpl`1.UsingImplicitSessionAsync[TResult](Func`2 funcAsync, CancellationToken cancellationToken)
at Squidex.Infrastructure.MongoDb.MongoRepositoryBase`1.InitializeAsync(CancellationToken ct) in /src/src/Squidex.Infrastructure.MongoDb/MongoDb/MongoRepositoryBase.cs:line 109

I am hoping someone can point me toward a cause for this, so I can start fixing my setup. The YAML for the setup is almost entirely based on the kubernetes subfolder of the docker-squidex Git repo (not using Helm), with values only changed for naming.

Without details, I cannot help. Have you checked that your mongo cluster is healthy?

Hi Sebastian, thanks for your reply.

Everything seems healthy in GCP: the services are healthy, the persistent storage volumes are bound, and the mongo stateful set and all 3 of its pods are healthy.

The only unhealthy one is the (single) Squidex pod, which fails over and over on startup due to the above error, and which Google has therefore classified as “CrashLoopBackOff”.

They are all in the same cluster.

What seems really weird is that it fails on choosing a server when all the states are connected. (I added line breaks and such for clarity.)

Client view of cluster state is 
	{ ClusterId : "1", 
	ConnectionMode : "Automatic", 
	Type : "ReplicaSet", 
	State : "Connected", 
	Servers : 
		[
			{ ServerId: "
				{ ClusterId : 1, 
				EndPoint : "Unspecified/mongo-0.mongo:27017" 
			}",
			EndPoint: "Unspecified/mongo-0.mongo:27017", 
			State: "Connected", 
			Type: "ReplicaSetGhost", 
			WireVersionRange: "[0, 5]", 
			LastUpdateTimestamp: "2020-03-25T16:31:41.6665895Z" 
		}, 
			{ ServerId: "
				{ ClusterId : 1,
				EndPoint : "Unspecified/mongo-1.mongo:27017" 
			}",
			EndPoint: "Unspecified/mongo-1.mongo:27017",
			State: "Connected",
			Type: "ReplicaSetGhost",
			WireVersionRange: "[0, 5]",
			LastUpdateTimestamp: "2020-03-25T16:31:41.7031505Z" 
		}, 
			{ ServerId: "
				{ ClusterId : 1, 
				EndPoint : "Unspecified/mongo-2.mongo:27017" 
			}", 
			EndPoint: "Unspecified/mongo-2.mongo:27017", 
			State: "Connected", 
			Type: "ReplicaSetGhost", 
			WireVersionRange: "[0, 5]", 
			LastUpdateTimestamp: "2020-03-25T16:31:41.7024407Z" 
		}
		] 
	}. 

The stateful set is defined as:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 3
  template:
    metadata:
      labels:
        role: mongo
    spec:
      terminationGracePeriodSeconds: 10
      containers:
        - name: mongo
          image: mongo:3.4
          command:
          - mongod
          - "--replSet"
          - rs0
          - "--bind_ip"
          - 0.0.0.0
          - "--smallfiles"
          - "--noprealloc"
          ports:
          - containerPort: 27017
          volumeMounts:
          - name: mongo-persistent-storage
            mountPath: /data/db
        - name: mongo-sidecar
          image: cvallance/mongo-k8s-sidecar
          env:
          - name: MONGO_SIDECAR_POD_LABELS
            value: "role=mongo"
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: "mongo-persistent-storage"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 100Gi

And the connection strings in Squidex are set as

- name: EVENTSTORE__MONGODB__CONFIGURATION
  value: "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/"
- name: EVENTSTORE__MONGODB__DATABASE
  value: "Squidex"
- name: STORE__MONGODB__CONFIGURATION
  value: "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/"
- name: STORE__MONGODB__DATABASE
  value: "Squidex"
- name: STORE__MONGODB__CONTENTDATABASE
  value: "SquidexContent"
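As an aside on the error above: Type "ReplicaSetGhost" means the driver can reach the members, but they have not yet been initiated into a replica set. Once the set is initiated, it may also help to name the set explicitly in the connection string so the driver verifies it is talking to the right set. A sketch only, assuming the set name rs0 from the --replSet argument in the StatefulSet above:

```yaml
# Sketch: same hosts as above, with the replica set named explicitly.
# "rs0" is the --replSet value from the StatefulSet.
- name: EVENTSTORE__MONGODB__CONFIGURATION
  value: "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/?replicaSet=rs0"
- name: STORE__MONGODB__CONFIGURATION
  value: "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/?replicaSet=rs0"
```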

Can you connect to one of the mongo pods and check their status? Just to be sure.

I can see the process running in the pods with

gcloud container clusters get-credentials ****** --zone europe-west1-b --project ************ && kubectl exec mongo-0 -c mongo -- ps -f

Which gives me

UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 15:42 ?        00:00:15 mongod --replSet rs0 --bind_ip 0.0.0.0 --smallfiles --noprealloc
root         201       0  0 17:06 ?        00:00:00 ps -f

Looks like TIME and STIME (which I assume is start time) do not match, correct? It looks like it might have only run for 15 seconds, but was started a little over an hour ago. Is that correctly interpreted? I will look for the logs, but in the Google Cloud log viewer the last log also seems to be from the start time:

{
   "textPayload": "2020-03-25T15:27:53.293+0000 I NETWORK  [conn69] received client metadata from 10.12.1.13:40126 conn69: { driver: { name: \"mongo-csharp-driver\", version: \"2.10.1.0\" }, os: { type: \"Linux\", name: \"Linux 4.14.138+ #1 SMP Tue Sep 3 02:58:08 PDT 2019\", architecture: \"x86_64\", version: \"4.14.138+\" }, platform: \".NET Core 3.1.2\" }\n",
   "insertId": "a7ggcr3nvjps3so0i",
   "resource": {
     "type": "k8s_container",
     "labels": {
       "location": "europe-west1-b",
       "project_id": "*********",
       "cluster_name": "******",
       "pod_name": "mongo-0",
       "container_name": "mongo",
       "namespace_name": "default"
     }
   },
   "timestamp": "2020-03-25T15:27:53.293527736Z",
   "severity": "INFO",
   "labels": {
     "k8s-pod/role": "mongo",
     "k8s-pod/controller-revision-hash": "mongo-b55568c7c",
     "k8s-pod/statefulset_kubernetes_io/pod-name": "mongo-0"
   },
   "logName": "projects/*********/logs/stdout",
   "receiveTimestamp": "2020-03-25T15:28:00.235383033Z"
 }
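The same logs can also be pulled without the Cloud log viewer; a sketch, assuming the pod name mongo-0 and the container names from the StatefulSet above:

```shell
# Tail the mongod log and the sidecar log for the mongo-0 pod.
kubectl logs mongo-0 -c mongo --tail=50
kubectl logs mongo-0 -c mongo-sidecar --tail=50
```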

Use a tool or the mongo client to connect to one pod:

kubectl port-forward mongo-0 27018:27017

And then use your mongo client and check the status with

rs.status()

If you see three members (1 primary, 2 secondaries), everything is fine.
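The two steps above as one sketch, assuming the pod name mongo-0 and a local mongo client (27018 is just a free local port):

```shell
# Forward a local port to the pod, then query the replica set status through it.
kubectl port-forward mongo-0 27018:27017 &
mongo --port 27018 --eval 'rs.status()'
```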

I figured out how to get a shell in the pod, with just

gcloud container clusters get-credentials
kubectl exec -it mongo-0 -c mongo /bin/bash

Now, I guess the replica set is not initialized correctly then. After starting an interactive mongo shell (just running "mongo") and then running "rs.status()", it gives me

> rs.status()
{
        "info" : "run rs.initiate(...) if not yet done for the set",
        "ok" : 0,
        "errmsg" : "no replset config has been received",
        "code" : 94,
        "codeName" : "NotYetInitialized"
}

I will look into why it is not initialized; I guess I thought the command run at startup (from the .yaml) would initialize it as well:

command:
  - mongod
  - "--replSet"
  - rs0
  - "--bind_ip"
  - 0.0.0.0
  - "--smallfiles"
  - "--noprealloc"

The sidecar should handle that.
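If the sidecar cannot do it, the set can also be initiated by hand from the mongo shell on one pod. A sketch, assuming the set name rs0 and the pod DNS names from the headless "mongo" service in the StatefulSet above; adjust to your naming:

```javascript
// Run inside the mongo shell on mongo-0; the other members join automatically.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-0.mongo:27017" },
    { _id: 1, host: "mongo-1.mongo:27017" },
    { _id: 2, host: "mongo-2.mongo:27017" }
  ]
})
```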

Okay, it seems like maybe I should have looked at the sidecar logs as well, I apologize for the oversight.

The sidecar loops a complaint like this:

'pods is forbidden: User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" at the cluster scope',

Looks like I will need to look into roles and such on kubernetes as well. Thanks for all your help Sebastian.

In the interest of completeness: for GKE versions newer than 1.8, an additional piece of Kubernetes setup is required. We are currently running 1.14.10, but up to 1.15.9 is available to us in Google Cloud.

For the sidecar to have the read access it needs (it uses the default service account from the default namespace, as is apparent from my error message), you can grant this access with this manifest:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: default-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
  - kind: ServiceAccount
    name: default
    namespace: default
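To apply and verify it, a sketch (the filename mongo-rbac.yaml is just an example; the second command checks exactly the permission the sidecar's error complained about):

```shell
# Apply the ClusterRoleBinding, then impersonate the default service account
# to confirm it can now list pods at the cluster scope.
kubectl apply -f mongo-rbac.yaml
kubectl auth can-i list pods --as=system:serviceaccount:default:default
# Should print "yes" once the binding is active.
```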

I am running 1.15.9 and I don’t need the service account.