# docker-nfsd

---

### Table of Contents

1. [Overview](#1-overview)
2. [Purpose](#2-purpose)
3. [Architecture](#3-architecture)
4. [How it Works](#4-how-it-works)
5. [Why it Was Created](#5-why-it-was-created)
6. [Limitations and Notes](#6-limitations-and-notes)
7. [Installation](#7-installation)
8. [Usage](#8-usage)
9. [Design Philosophy](#9-design-philosophy)
10. [License](#10-license)

---

## 1. Overview

`docker-nfsd` is a small daemon that allows Docker and Docker Swarm to mount
NFS shares directly through the kernel.
It implements the [Docker VolumeDriver API](https://docs.docker.com/engine/extend/plugins_volume/)
over a UNIX socket and performs real mounts using the standard
`mount(2)` and `umount(2)` syscalls.

The goal is simple: let Docker request a mount, and let the kernel do the work.

---

## 2. Purpose

Docker supports external "volume drivers" to manage persistent storage.
In theory, this should make it possible to mount any remote filesystem.
In practice, on Swarm, **distributed storage that “just works” does not exist**.

Even in 2025, Docker provides **no stable driver for NFS, Ceph, or S3**,
which is absurd, considering NFS has existed longer than Docker itself.

Existing NFS “drivers” are mostly containerized plugins written in Go or Python,
each with its own runtime, namespaces, and orchestration layers.
They fail on Swarm because mounts are host-level operations that cannot be done
inside a container in any reliable or consistent way.

`docker-nfsd` was written to solve that:
a native, host-level daemon that performs kernel mounts directly, without tricks.

---

## 3. Architecture

```
┌──────────────────────────────┐
│        Docker Daemon         │
│     (Engine / SwarmKit)      │
└──────────────┬───────────────┘
               │ HTTP over UNIX socket
               ▼
┌──────────────────────────────┐
│         docker-nfsd          │
│ Implements VolumeDriver API  │
│ - Receives JSON requests     │
│ - Calls mount(2)/umount(2)   │
│ - Exposes mountpoints        │
└──────────────┬───────────────┘
               │ Kernel syscalls
               ▼
┌──────────────────────────────┐
│         Linux Kernel         │
│      NFS client (v4.1)       │
└──────────────────────────────┘
```

---

## 4. How it Works

1. **Socket registration**
   The daemon listens on `/run/docker/plugins/docker-nfsd.sock`.

2. **Docker interaction**
   When Docker needs a volume, it connects to that socket and sends
   HTTP POST requests with JSON bodies (`/Plugin.Activate`, `/VolumeDriver.Mount`, etc.).

3. **Volume creation and exposure**
   `docker-nfsd` creates a dedicated mount directory under
   `/var/lib/docker-volumes-nfsd/<volume-name>/`
   and performs a real kernel mount using:

   ```c
   /* source is "<server>:<path>"; noatime is a mount flag, and addr= must be a numeric IP */
   mount(source, target, "nfs4", MS_NOATIME, "nfsvers=4.1,soft,addr=<server-ip>");
   ```

   Once mounted, Docker bind-mounts that directory into the container.
   The container never sees NFS directly, only the mounted directory.

4. **Unmount and cleanup**
   When the container stops, Docker calls `/VolumeDriver.Unmount`.
   The daemon executes `umount(2)` and releases the directory.

At no point does this involve FUSE, RPC daemons, or helper binaries.
All logic happens through kernel syscalls.

---

## 5. Why it Was Created

While setting up Swarm clusters with shared storage, we encountered
the following hard limitations:

* **Docker provides no reliable host-level storage driver** for NFS or Ceph.
* The so-called “official” NFS drivers run *inside containers*, which
  cannot perform kernel-level mounts on the host.
* Other solutions rely on heavyweight sidecars or Go daemons
  that introduce complexity without solving the actual problem.

Our requirements were simple:

* Persistent shared volumes across Swarm nodes.
* No extra layers of abstraction.
* A driver that survives restarts and behaves like any normal service.

So we wrote `docker-nfsd`: a 100 KB C daemon that does exactly that.

---

## 6. Limitations and Notes

* **Privileges:** Runs as root, because `mount(2)` requires `CAP_SYS_ADMIN`.
  (No different from `sshd`, `nfsd`, or `dockerd` itself.)
* **NFS Versions:** Uses NFSv4.1 by default. The kernel client also supports
  older NFS versions, which can be selected through the mount options if desired.
* **Concurrency:** Each mount request is independent.
  The daemon can handle multiple volumes concurrently: one mount per request,
  not one per host. Docker may issue many simultaneous mounts, and they will all work.
* **Scope:** Designed for Linux systems using Docker Swarm or standalone Docker.
  (If you’re running Swarm on Windows, you’re on your own, and probably deserve it.)

---

## 7. Installation

### Build and install

```bash
sudo apt install build-essential libmicrohttpd-dev
make
sudo make install
sudo systemctl enable --now docker-nfsd
```

This installs the binary under `/usr/local/sbin/docker-nfsd`
and registers a systemd service unit.

---

## 8. Usage

Example:

```bash
docker volume create -d docker-nfsd \
  --opt server=127.0.0.1 \
  --opt path=/exports/data \
  myvolume

docker run --rm -v myvolume:/mnt alpine df -h /mnt
```

`docker-nfsd` will:

* create `/var/lib/docker-volumes-nfsd/myvolume`
* mount `127.0.0.1:/exports/data` there
* return the mountpoint to Docker

From Docker’s perspective, it’s a normal persistent volume.

---

## 9. Design Philosophy

* Written in **pure C** for transparency and performance.
* Uses only **libmicrohttpd** and **syscalls**.
* Does one job, and does it predictably.
* Follows the same principle as every proper Unix daemon:

> “Start once, listen quietly, do your work, and stay out of the way.”

---

## 10. License

GPLv2 with the Affero clause (free as in freedom, and free as in beer).

Use it, modify it, improve it, or ignore it.
Just don’t rewrite it in Go.

---

### — Sotiris from Greece