Unable to schedule jobs every couple of days

0 votes

I have set up a single-node Kubernetes cluster using flannel.

Most of the time everything works fine, but every couple of days the cluster reaches a state where it cannot start new pods and they stay stuck in the "Pending" state.

It's a very weird problem and I have no idea what to do.

Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason      Message
  --------- --------    -----   ----                -------------   --------    ------      -------
  2m        2m      1   {default-scheduler }                Normal      Scheduled   Successfully assigned dex-1939802596-zt1r3 to superserver-03
  1m        2s      21  {kubelet superserver-03}            Warning     FailedSync  Error syncing pod, skipping: failed to "SetupNetwork" for "somepod-1939802596-zt1r3_somenamespace" with SetupNetworkError: "Failed to setup network for pod \"somepod-1939802596-zt1r3_somenamespace(167f8345-faeb-11e6-94f3-0cc47a9a5cf2)\" using network plugins \"cni\": no IP addresses available in network: cbr0; Skipping pod"

Technical details:

kubeadm version: version.Info{Major:"1", Minor:"6+", GitVersion:"v1.6.0-alpha.0.2074+a092d8e0f95f52", GitCommit:"a092d8e0f95f5200f7ae2cba45c75ab42da36537", GitTreeState:"clean", BuildDate:"2016-12-13T17:03:18Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Kubernetes Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.3", GitCommit:"029c3a408176b55c30846f0faedf56aae5992e9b", GitTreeState:"clean", BuildDate:"2017-02-15T06:34:56Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

Started the cluster with these commands:

kubeadm init --pod-network-cidr 10.244.0.0/16 --api-advertise-addresses 192.168.1.200

kubectl taint nodes --all dedicated-

kubectl -n kube-system apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Some syslog logs that may be relevant (I got many of those):

Feb 23 11:07:49 server-03 kernel: [  155.480669] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
Feb 23 11:07:49 server-03 dockerd[1414]: time="2017-02-23T11:07:49.735590817+02:00" level=warning msg="Couldn't run auplink before unmount /var/lib/docker/aufs/mnt/89bb7abdb946d858e175d80d6e1d2fdce0262af8c7afa9c6ad9d776f1f5028c4-init: exec: \"auplink\": executable file not found in $PATH"
Feb 23 11:07:49 server-03 kernel: [  155.496599] aufs au_opts_verify:1597:dockerd[24704]: dirperm1 breaks the protection by the permission bits on the lower branch
Feb 23 11:07:49 server-03 systemd-udevd[29313]: Could not generate persistent MAC address for vethd4d85eac: No such file or directory
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.756976    1228 cni.go:255] Error adding network: no IP addresses available in network: cbr0
Feb 23 11:07:49 server-03 kernel: [  155.514994] IPv6: eth0: IPv6 duplicate address fe80::835:deff:fe4f:c74d detected!
Feb 23 11:07:49 server-03 kernel: [  155.515380] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Feb 23 11:07:49 server-03 kernel: [  155.515588] device vethd4d85eac entered promiscuous mode
Feb 23 11:07:49 server-03 kernel: [  155.515643] cni0: port 34(vethd4d85eac) entered forwarding state
Feb 23 11:07:49 server-03 kernel: [  155.515663] cni0: port 34(vethd4d85eac) entered forwarding state
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.757001    1228 cni.go:209] Error while adding to cni network: no IP addresses available in network: cbr0
Feb 23 11:07:49 server-03 kubelet[1228]: E0223 11:07:49.757056    1228 docker_manager.go:2201] Failed to setup network for pod "somepod-752955044-58g59_somenamespace(5d6c28e1-f8dd-11e6-9843-0cc47a9a5cf2)" using network plugins "cni": no IP addresses available in network: cbr0; Skipping pod
Sep 19, 2018 in Kubernetes by lina
• 8,110 points
63 views

1 answer to this question.

0 votes

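Yes, that is a very weird problem and a weird state to be stuck in.

The error "no IP addresses available in network: cbr0" means the pod IP range on the node has been exhausted.

Try setting up a cron job to run a cleanup script on reboot.

Garbage-collecting what dead pods leave behind after a Docker daemon restart should help with your issue.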
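
One possible shape for such a cleanup, as a rough sketch only and not the script from the original answer: it assumes the cluster uses the CNI host-local IPAM plugin, which keeps one file per reserved IP (named after the IP, with the owning container ID as its first line), and it assumes the state for this network lives in /var/lib/cni/networks/cbr0. The function names and the directory constant are hypothetical; check the directory on your own node before deleting anything.

```python
import ipaddress
import subprocess
from pathlib import Path

# Assumption: host-local IPAM state for the "cbr0" network lives here.
NETWORK_DIR = "/var/lib/cni/networks/cbr0"


def find_stale_reservations(network_dir, live_container_ids):
    """Return IP-named files whose recorded container ID is not in live_container_ids."""
    stale = []
    for entry in sorted(Path(network_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            ipaddress.ip_address(entry.name)
        except ValueError:
            continue  # not an IP reservation (bookkeeping file), leave it alone
        lines = entry.read_text().split()
        if not lines:
            continue  # empty file: owner unknown, be conservative and skip it
        if lines[0] not in live_container_ids:
            stale.append(entry)
    return stale


def live_container_ids():
    """Full IDs of every container Docker still knows about, running or stopped."""
    result = subprocess.run(
        ["docker", "ps", "-a", "-q", "--no-trunc"],
        check=True, capture_output=True, text=True,
    )
    return set(result.stdout.split())


def cleanup(network_dir=NETWORK_DIR, dry_run=True):
    """List stale reservations; delete them only when dry_run is False."""
    stale = find_stale_reservations(network_dir, live_container_ids())
    for path in stale:
        print(("would remove " if dry_run else "removing ") + str(path))
        if not dry_run:
            path.unlink()
    return stale
```

Calling `cleanup()` only prints what it would delete; `cleanup(dry_run=False)` removes the files. To run it at boot, a crontab `@reboot` entry that invokes a script calling `cleanup(dry_run=False)` is one option, ideally after Docker has started so that the container list is accurate.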

answered Sep 19, 2018 by Kalgi
• 40,480 points
