
Docker 20.10.5 Fails to Start
Closed, Resolved · Public

Description

I just noticed after upgrading to 20.10.5 the systemd unit for dockerd fails to start with the following error. Reverting to 19.03.14 fixes the issue for me. @JoshStrobl did something change in the most recent update where network manager would be blocking the new bridge?

Apr 05 11:43:28 io dockerd[4676]: time="2021-04-05T11:43:28.485698755-07:00" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock.ttrpc
Apr 05 11:43:28 io dockerd[4676]: time="2021-04-05T11:43:28.485721312-07:00" level=info msg=serving... address=/var/run/docker/containerd/containerd.sock
Apr 05 11:43:28 io dockerd[4676]: time="2021-04-05T11:43:28.485734661-07:00" level=info msg="containerd successfully booted in 0.021669s"
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.495539700-07:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.495567612-07:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.495596550-07:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock  <nil> 0 <nil>}] <nil> >
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.495610139-07:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.496006355-07:00" level=info msg="parsed scheme: \"unix\"" module=grpc
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.496020170-07:00" level=info msg="scheme \"unix\" not registered, fallback to default scheme" module=grpc
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.496037135-07:00" level=info msg="ccResolverWrapper: sending update to cc: {[{unix:///var/run/docker/containerd/containerd.sock  <nil> 0 <nil>}] <nil> >
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.496047002-07:00" level=info msg="ClientConn switching balancer to \"pick_first\"" module=grpc
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.505758961-07:00" level=info msg="[graphdriver] using prior storage driver: overlay2"
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.509422198-07:00" level=info msg="Loading containers: start."
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.513367756-07:00" level=info msg="Firewalld: docker zone already exists, returning"
Apr 05 11:43:28 io dockerd[4661]: time="2021-04-05T11:43:28.887999032-07:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a prefe>
Apr 05 11:43:29 io dockerd[4661]: time="2021-04-05T11:43:29.020490647-07:00" level=info msg="stopping event stream following graceful shutdown" error="<nil>" module=libcontainerd namespace=moby
Apr 05 11:43:29 io dockerd[4661]: time="2021-04-05T11:43:29.021181039-07:00" level=info msg="stopping healthcheck following graceful shutdown" module=libcontainerd
Apr 05 11:43:29 io dockerd[4661]: time="2021-04-05T11:43:29.021283057-07:00" level=info msg="stopping event stream following graceful shutdown" error="context canceled" module=libcontainerd namespace=plugins.mo>
Apr 05 11:43:30 io dockerd[4661]: failed to start daemon: Error initializing network controller: Error creating default "bridge" network: Failed to program NAT chain: COMMAND_FAILED: 'python-nftables' failed: i>
Apr 05 11:43:30 io dockerd[4661]: internal:0:0-0: Error: Could not process rule: No such file or directory
Apr 05 11:43:30 io dockerd[4661]: internal:0:0-0: Error: Could not process rule: No such file or directory
Apr 05 11:43:30 io dockerd[4661]: internal:0:0-0: Error: Could not process rule: No such file or directory
Apr 05 11:43:30 io dockerd[4661]: internal:0:0-0: Error: Could not process rule: No such file or directory
Apr 05 11:43:30 io dockerd[4661]: internal:0:0-0: Error: Could not process rule: No such file or directory
Apr 05 11:43:30 io dockerd[4661]: internal:0:0-0: Error: Could not process rule: No such file or directory
Apr 05 11:43:30 io dockerd[4661]: internal:0:0-0: Error: Could not process rule: No such file or directory
Apr 05 11:43:30 io dockerd[4661]: JSON blob:
Apr 05 11:43:30 io dockerd[4661]: {"nftables": [{"metainfo": {"json_schema_version": 1}}, {"insert": {"rule": {"family": "inet", "table": "firewalld", "chain": "filter_INPUT_ZONES", "expr": [{"match": {"left": >
Apr 05 11:43:30 io systemd[1]: docker.service: Main process exited, code=exited, status=1/FAILURE
Apr 05 11:43:30 io systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 05 11:43:30 io systemd[1]: Failed to start Docker Application Container Engine.
Apr 05 11:43:30 io systemd[1]: docker.service: Scheduled restart job, restart counter is at 3.
Apr 05 11:43:30 io systemd[1]: Stopped Docker Application Container Engine.
Apr 05 11:43:30 io systemd[1]: docker.service: Start request repeated too quickly.
Apr 05 11:43:30 io systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 05 11:43:30 io systemd[1]: Failed to start Docker Application Container Engine.
Apr 05 11:44:01 io systemd[1]: docker.service: Start request repeated too quickly.
Apr 05 11:44:01 io systemd[1]: docker.service: Failed with result 'exit-code'.
Apr 05 11:44:01 io systemd[1]: Failed to start Docker Application Container Engine.
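For anyone trying to reproduce this, note that once the restart counter trips ("Start request repeated too quickly"), systemd will refuse further starts until the unit's failure state is cleared. A minimal sketch for retrying and capturing the same journal output, assuming a systemd system with the docker.service unit installed:

```shell
# Clear the failure/restart counter so systemd will attempt another start.
sudo systemctl reset-failed docker.service

# Try to start the daemon again.
sudo systemctl start docker.service

# Show the most recent docker.service log lines from the current boot.
journalctl -u docker.service -b --no-pager | tail -n 50
```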

Event Timeline

alecbcs created this task. Mon, Apr 5, 6:53 PM
JoshStrobl triaged this task as Normal priority.
JoshStrobl edited projects, added Software; removed Lacks Project.

Cheers, will start taking a look tomorrow. Reverting to an older Docker isn't a viable solution though, given we've fully moved to cgroups v2 support.
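For context, a quick way to confirm whether a machine is running the unified cgroups v2 hierarchy that newer Docker releases target (this is a generic check, not something from this thread) is to inspect the filesystem type mounted at /sys/fs/cgroup:

```shell
# Prints "cgroup2fs" on a pure cgroups v2 (unified) system;
# prints "tmpfs" on a legacy/hybrid cgroups v1 layout.
stat -fc %T /sys/fs/cgroup/
```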

During a short investigation, I came across an open bug about Docker not removing iptables entries. I decided to take a look at ours and noticed our iptables release is practically prehistoric. I have a local build of a new release already, but there are some minor ABI changes, so I'll need to perform local rebuilds before I dig further. I doubt that's the issue, but after spotting it I kinda can't look away from that problem either. Your "Failed to program NAT chain" error isn't something I'm getting locally, and I didn't notice the networking bug during my testing, so I'll need to dig further.

@alecbcs could you let me know what container you're using to validate this and if you have any other sort of network configuration? I tested it with the nextcloud image and it's able to use networking just fine.

@alecbcs Does it work if you disable or temporarily remove firewalld?


@JoshStrobl this error occurs when just starting the dockerd daemon on the host machine. I haven't been able to get the daemon running at all to test a container with 20.10.5.

@ReillyBrogan Yes, completely scrubbing firewalld from my system does seem to fix the issue. However, simply disabling firewalld has no effect.
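The distinction above is worth spelling out: one plausible (unverified) explanation is that dockerd still detects firewalld over D-Bus as long as the package is installed, even with the service stopped, and only removing the package eliminates that path. A rough sketch of the two tests on Solus, assuming `firewalld` is the package name under Solus's `eopkg` package manager:

```shell
# Test 1: merely stop and disable the service (did NOT help here).
sudo systemctl stop firewalld
sudo systemctl disable firewalld

# Test 2: remove the package entirely (this DID let dockerd start).
sudo eopkg remove firewalld
sudo systemctl restart docker.service
```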

Yea, I don't have firewalld on my system, but I do know it has been rather disruptive for folks using snap as well. It appears there is an upstream issue filed against firewalld about it blocking the startup of dockerd: https://github.com/firewalld/firewalld/issues/765

May be worth reporting it to them on that specific issue so they can aggregate details. If you don't mind, I'll close this issue as an upstream one. I'll still get our firewalld and iptables updated though.

Ah ha! Glad to hear this is a firewalld issue and not something with our configuration. It must be some new change with how Docker is accessing the socket connections. I'll add myself to that GitHub issue. Thanks for taking a look at this Josh! Totally feel free to close as an upstream bug. I'll watch for firewalld updates.

JoshStrobl closed this task as Resolved. Fri, Apr 9, 8:40 AM
JoshStrobl added a project: Upstream Issue.