One subject seems completely abstruse to many users of containers on Linux: sharing data between a host and a container, or between containers.
I think that solving this problem is not much different from solving it without containers on Linux and Unix. From my perspective, managing file permissions is much the same with or without containers; the big change, for me, is the introduction of namespaces, especially user namespaces.
So what exactly is the problem? And where does it come from?
The problem is that a process running within a container runs with a certain user ID and group ID (UID and GID), and those IDs might differ from the ones of the caller (the user creating and running the container), which is not always obvious. This is especially true with container technologies like Docker, which by default run the process within the container as root (unless overridden in the Dockerfile or on the command line), while any user with write access to the Docker socket can create such a container. So by default there is a discrepancy between the UID and GID of the caller (probably a standard user) and those of an arbitrary Docker container.
In traditional Unix / Linux, this is “normal” or “expected” behaviour. You usually cannot run a process as root from your normal user unless you use `sudo` or a setuid program, so a program you launch usually does not run with a different UID/GID than your own user. And when you use a program with `sudo`, you understand that this might become a problem: if you use `sudo` to run `tcpdump -w net-trace.pcap`, you know the file `net-trace.pcap` will be owned by root and that you might not be able to modify it afterwards. The same reflex needs to apply when running a container.
If you have done Unix/Linux development for most of your career and have adopted the principle of least privilege (I still know a few people who only use the `root` account), you are used to creating applications that run in the background (as a service) under a dedicated user, and to handling the permissions of the data those applications need. So introducing containers (without user namespaces) should not bring any surprise here; it is part of the expectations. But, as you will see later, you can still be bitten by some edge cases of the container implementation.
So, let us see how to fix this problem of user/group IDs and file permissions. Note that the solution is similar whether or not you use containers, and applies to all container implementations (e.g. LXC, Docker, etc.). Then we will see how to handle file permissions when using user namespaces (hint: the principles are the same, but a few extra steps are needed to work out the effective UID/GID). Finally, for Docker, we will see a few edge cases that can still catch you off guard with respect to file permissions and volume declarations inside a Dockerfile.
A sample problem illustration
Here is an example container where we hit the file permission problem between the host and the container. I am running the following example as the `itsme` user, which has no particular rights other than read/write access to the local dockerd service.
```shell
$ whoami
itsme
$ mkdir ~/test-perm; cd ~/test-perm
$ docker run -it --rm -v ~/test-perm:/workdir --workdir /workdir alpine:latest touch protected
$ ls -l protected
-rw-r--r-- 1 root root 0 Jul 17 22:00 protected
$ echo Hello > protected && cat protected
bash: protected: Permission denied
```
In the above example, I am running a container based on Alpine Linux (it would work the same with other base images like Ubuntu, Debian, etc.) in which I create a file in the working folder, which is a Docker bind-mount “volume”. The file created is owned by root because Docker containers run as root by default (unless overridden on the command line or in the Dockerfile of the image). Therefore, as a normal non-root user, I cannot write to that file and I get a “Permission denied” error.
Note: Docker supports three types of data storage for containers: bind-mounts (mounting a file or directory from the host inside the container), named volumes (often simply called “volumes”, which can provide more flexibility than bind-mounts since you can use different drivers for the backend) and tmpfs mounts (for temporary data). In this article, I will mostly talk about bind-mounts, and at the end I will explain how named volumes differ.
Handling File Permissions with Containers
There are several ways to fix the above problem, but the principle is always the same: Unix file permissions or ACLs must match. This means that the user launching the container and the process running inside it need to either share the same user ID, share a common group ID, or have proper permissions set on the files/directories, possibly using ACLs.
Concretely, to solve our example, we can either make the running process use the same UID as our user, or set file permissions so that both can read/write.
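The matching rule above can be sketched as a small POSIX shell function. This is a simplified sketch: `can_write` is a hypothetical helper name, it relies on `stat -c` (GNU coreutils or BusyBox), and it deliberately ignores ACLs, supplementary groups and root's ability to bypass permissions, so it only models the traditional owner/group/other check.

```shell
#!/bin/sh
# can_write UID GID PATH (hypothetical helper, for illustration only)
# Succeeds if a process running as UID:GID would get write access to PATH
# from the classic Unix permission bits alone. ACLs, supplementary groups
# and root's override (CAP_DAC_OVERRIDE) are deliberately ignored.
can_write() (
    uid=$1 gid=$2 path=$3
    # %u = owner UID, %g = owner GID, %a = permission bits in octal
    info=$(stat -c '%u %g %a' "$path") || exit 2
    set -- $info
    owner=$1 group=$2
    mode=$(( 0$3 ))   # the leading 0 makes the shell read the bits as octal
    if [ "$uid" -eq "$owner" ]; then
        bits=$(( (mode >> 6) & 7 ))   # owner triad (used even if group is more generous)
    elif [ "$gid" -eq "$group" ]; then
        bits=$(( (mode >> 3) & 7 ))   # group triad
    else
        bits=$(( mode & 7 ))          # "other" triad
    fi
    # The write bit is worth 2 in each rwx triad
    [ $(( bits & 2 )) -ne 0 ]
)
```

For example, `can_write 165536 165536 ~/test-perm && echo writable || echo denied` tells you whether that UID/GID pair could create files in the directory, as far as plain Unix permissions are concerned.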
Setting the UID for the container running process
With Docker, you can use `--user <user|uid>` to set the user of the running process (or processes). This might not always work as expected if you are using user namespaces (we will see that later) or if the running process changes its effective ID (e.g. nginx).
```shell
$ whoami
itsme
$ mkdir ~/test-perm; cd ~/test-perm
$ docker run -it --rm -v ~/test-perm:/workdir --workdir /workdir --user $(id -u) alpine:latest touch protected
$ ls -l protected
-rw-r--r-- 1 itsme root 0 Jul 17 22:49 protected
$ echo Hello > protected && cat protected
Hello
```
In this example, the file created within the container is owned by the calling user, and because that user has read/write access, we can write to the file and print it. Note that its group is still `root`: only the UID was given, so the process kept GID 0.
There is one caveat with this technique: you should use the user ID rather than the user name (the same applies to groups), because there is no guarantee that a user name inside the container has the same UID as the user with the same name on the host. For example, if you have installed nginx on the host, its `nginx` user might have a completely different UID than the `nginx` user you would use inside a container. So this is not the easiest solution, but running as a user other than root inside your container is highly recommended, so you could consider the next solutions as complementary ones.
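Docker's `--user` option also accepts a `uid:gid` pair, which keeps the caller's primary group as well as the UID. Below is a minimal sketch of a wrapper around the command used above; the function name `docker_as_me` and the `DOCKER` override variable (handy for a dry run with `DOCKER=echo`) are hypothetical conveniences, not Docker features.

```shell
#!/bin/sh
# docker_as_me IMAGE [COMMAND...] (hypothetical wrapper)
# Runs IMAGE with the current directory bind-mounted on /workdir, as the
# calling user's numeric UID *and* GID, so that files created in /workdir
# belong to the caller. Set DOCKER=echo to print the command instead.
docker_as_me() {
    "${DOCKER:-docker}" run -it --rm \
        -v "$PWD:/workdir" --workdir /workdir \
        --user "$(id -u):$(id -g)" \
        "$@"
}
```

Running `docker_as_me alpine:latest touch protected` from `~/test-perm` should then create a file owned by your own user and primary group rather than `itsme root`. Numeric IDs are used on purpose, for the reason given above.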
Using ACL
ACLs (Access Control Lists) go further than the “simpler” traditional Unix permissions by granting permissions to more than one user or group. They also support inheritance: we can set default permissions on a directory, and every file and sub-directory created inside it afterwards will inherit them. We will use this inheritance feature on the working directory so that files created within the container inherit the permissions set on the parent directory.
```shell
$ whoami
itsme
$ mkdir ~/test-perm; cd ~/test-perm
$ setfacl -dm "u:itsme:rw" ~/test-perm
$ docker run -it --rm -v ~/test-perm:/workdir --workdir /workdir alpine:latest touch protected
$ ls -l protected
-rw-rw-r--+ 1 root root 0 Jul 17 23:11 protected
$ echo Hello > protected && cat protected
Hello
```
To understand why my user can write to the `protected` file, we need to check its ACL. You can tell there are extra permissions because `ls -l` shows a little `+` sign after the traditional Unix permissions.
```shell
$ getfacl protected
# file: protected
# owner: root
# group: root
user::rw-
user:itsme:rw-
group::r-x          #effective:r--
mask::rw-
other::r--
```
You can see that the permissions we set on the parent directory have been inherited. This explains why we were able to write to and print that file.
ACLs are very powerful and not too complex to use, so I really recommend taking a look at them and using them. However, ACLs are “optional”: they might not be enabled on your file system (pretty rare), they can be difficult to use with special file systems (e.g. NFS), or the tools to manipulate them (`setfacl`, `getfacl`) might not be installed by default on your server.
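One subtlety of `setfacl -dm`: it only sets the *default* (inherited) ACL. As long as you own the directory this does not matter, but if the directory ends up owned by another user (as happens with user namespaces), you also need an *access* ACL entry to create or delete files in it yourself. A minimal sketch, assuming the `setfacl` tool from the acl package; `share_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# share_dir DIR USER (hypothetical helper)
# Grants USER rwx on DIR itself (access ACL), and makes every file or
# directory created inside DIR afterwards inherit the same entry (default
# ACL). New regular files still end up with effective rw- for USER, because
# the mode requested at creation (usually 0666) limits the ACL mask.
share_dir() {
    setfacl -m "u:$2:rwx" "$1" &&   # access ACL: applies to DIR now
    setfacl -dm "u:$2:rwx" "$1"     # default ACL: inherited by new entries
}
```

For instance, `share_dir ~/test-perm itsme` before starting the container would give `itsme` both the inherited permissions shown above and the right to add or remove files in `~/test-perm`, whoever owns it.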
Other solutions
My example was a very limited one, where the container creates an empty file and exits, so there are only a few ways to make that file writable by the “calling” user. In a real-world example, with a stateful server process (e.g. a web server like nginx, an application server like Jetty, or a database like MongoDB) that stores data you need to access from the host or from another container, you can adapt your container and the volume so that multiple users can read or write the data. You can prepare the volume with the necessary permissions so that the UID or GID the process will have inside the container can access the data it needs to read or write.
If you want an example of how to do it, you can refer to my GitHub project for running the Ubiquiti UniFi Controller application inside Docker. The application and database servers running inside that container do not use the root user; they use a dedicated user, so the container runs unprivileged. Therefore, the data volume needs some extra steps, like setting permissions and ownership, so that the container is able to access, modify and create files. It actually works, even when using user namespaces, which are our next topic.
Another solution is to fix permission problems inside the container by forcing the ownership of files and directories during start-up or shutdown of the application. In my opinion this should be avoided, as it will cause lots of trouble when you deploy such a container in production environments with tighter restrictions, such as removing certain (or all) capabilities or using user namespaces. This is especially relevant for Docker, which by default grants its running containers CAP_CHOWN and CAP_FOWNER (see the capabilities definitions), the capabilities used to override file ownership and permissions: a start-up script that silently relies on them will break as soon as they are removed.
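To check whether a process (for example one running inside a container) still holds CAP_CHOWN and CAP_FOWNER, you can decode the `CapEff` bitmask from `/proc/self/status`. A minimal sketch; `has_cap` is a hypothetical helper name, while the bit numbers (CAP_CHOWN = 0, CAP_FOWNER = 3) come from `<linux/capability.h>`.

```shell
#!/bin/sh
# has_cap HEXMASK BIT (hypothetical helper)
# Succeeds if capability number BIT is set in HEXMASK, a hexadecimal mask
# as printed on the CapEff/CapPrm/CapBnd lines of /proc/<pid>/status.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Effective capabilities of the awk process, i.e. of a program freshly
# started from this shell (falls back to 0 if /proc is unavailable).
cap_eff=$(awk '/^CapEff:/ { print $2 }' /proc/self/status 2>/dev/null)
[ -n "$cap_eff" ] || cap_eff=0

# Capability numbers from <linux/capability.h>
for entry in CAP_CHOWN:0 CAP_FOWNER:3; do
    name=${entry%%:*}
    bit=${entry#*:}
    if has_cap "$cap_eff" "$bit"; then
        echo "$name: present"
    else
        echo "$name: missing"
    fi
done
```

Saved as a hypothetical `check-caps.sh`, it could be piped into a container with `docker run --rm -i alpine sh < check-caps.sh`; adding `--cap-drop ALL` should turn both lines into “missing”, which is exactly the situation where a chown-at-start-up script fails.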
Shared volumes between containers
It is possible to share named volumes or bind-mount volumes between multiple containers. Those cases are actually similar to the above one, with one caveat to keep in mind: if you use different base images for your containers, there is no guarantee that a user with the same name has the same UID or GID in both containers. So you need to make sure that the users created inside those containers share the same UID/GID, or you need to use ACLs to force proper file permissions.
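One way to check whether two images agree on a user's numeric IDs is to compare the entries in their `/etc/passwd` files. A minimal sketch; it assumes you have already copied each image's `/etc/passwd` to the host (for instance with `docker create --name peek <image>` followed by `docker cp peek:/etc/passwd ./passwd-a`), and `passwd_ids` / `same_ids` are hypothetical helper names.

```shell
#!/bin/sh
# passwd_ids NAME FILE (hypothetical helper)
# Prints "UID:GID" for user NAME from a passwd-format FILE
# (fields: name:password:UID:GID:gecos:home:shell). Prints nothing if absent.
passwd_ids() {
    awk -F: -v user="$1" '$1 == user { print $3 ":" $4; exit }' "$2"
}

# same_ids NAME FILE_A FILE_B (hypothetical helper)
# Succeeds only if NAME exists in both files with identical UID and GID.
same_ids() (
    a=$(passwd_ids "$1" "$2")
    b=$(passwd_ids "$1" "$3")
    [ -n "$a" ] && [ "$a" = "$b" ]
)
```

If `same_ids nginx passwd-a passwd-b` fails, the two containers will not see each other's files as owned by “their” nginx user, and you need to align the IDs or fall back to ACLs.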
With user namespaces (see the next section for more details), containers sharing the same volumes should either use the same UID or GID within the same UID mapping (see the “User and group ID mappings” section of the User Namespaces documentation) or use ACLs to set permissions for the UID/GID pairs of each user namespace.
User namespaces
I will not detail here how to set up Docker to use user namespaces; you can check the online literature for that. But I really do recommend activating them on your Docker host, because they are easy to deactivate on a per-container basis (with `docker run --userns=host`), whereas you cannot activate them for a single container if your Docker daemon was not configured for user namespaces. Be aware that user namespaces might make it harder to set proper volume permissions, and might break some containers which expect to run as root and will fail to perform certain commands due to their limited privileges.
For those who do not know what user namespaces are, very briefly: they are a Linux kernel feature that creates a new namespace in which users and groups (and especially UIDs and GIDs) are mapped to a separate range of IDs, so they do not correspond to the same users or groups in another such namespace. So `root` inside one namespace does not map to `root` on the host, nor to `root` in another namespace.
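The kernel exposes the mapping of each process in `/proc/<pid>/uid_map` (and `gid_map`), one range per line: the first ID inside the namespace, the corresponding first ID outside it (as seen from the host when read from the host), and the length of the range. Translating a UID from inside the namespace to the host is then a simple offset computation. A minimal sketch; `ns_to_host` is a hypothetical helper name, and the sample mapping `0 165536 65536` used in the test matches the `dockremap` range shown later in this article.

```shell
#!/bin/sh
# ns_to_host UID [MAPFILE] (hypothetical helper)
# Translates UID as seen inside a user namespace into the ID it has outside,
# using a uid_map-style file with lines "ID-inside ID-outside LENGTH".
# Fails (printing nothing) if UID is not covered by any range; such
# unmapped IDs show up as the overflow ID (65534, "nobody").
ns_to_host() {
    awk -v id="$1" '
        id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
        END { exit !found }
    ' "${2:-/proc/self/uid_map}"
}
```

Pointed at the `uid_map` of a container process (its host PID is given by `docker inspect --format '{{.State.Pid}}' <container>`), `ns_to_host 0 /proc/<pid>/uid_map` should print the host UID behind the container's `root`.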
So our example above now fails in a completely new way:
```shell
$ whoami
itsme
$ mkdir ~/test-perm; cd ~/test-perm
$ docker run -it --rm -v ~/test-perm:/workdir --workdir /workdir alpine:latest touch protected
touch: protected: Permission denied
```
This time, the container fails to create the `protected` file. This is because the user inside the container is no longer really `root`: more exactly, it is `root` inside the container, but its UID on the host is not 0, so technically it is just a normal user. The first thing to do is therefore to set the permissions so that the container can write to the folder and create new files. Basically, the fake `root` inside the container needs `write` permission on the `/workdir` directory (that is, on `~/test-perm` on the host). This can be granted using standard Unix permissions or using ACLs, as we have seen above. In such cases, I like to give UID/GID ownership of the directory to the user running inside the container and add myself to the ACL.
I am not using the solution where we set the UID of the container's running process, as it will not work for data shared between the host and a container: just like for the `root` user, the UID of the process running inside the container is going to be different on the host. However, to share data only between two or more containers, this solution can be used as long as all of them use the same user namespace (e.g. the default `dockremap` one).
On my test system, I have used the “default” `dockremap` user: I set `userns-remap` to `default` in the Docker `daemon.json` file. After restarting Docker, a new `dockremap` user was created and the files `/etc/subuid` and `/etc/subgid` were updated for that new user. On my test system, both ranges start at 165536:
```shell
$ grep dockremap /etc/subuid /etc/subgid
/etc/subuid:dockremap:165536:65536
/etc/subgid:dockremap:165536:65536
```
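For reference, the `daemon.json` change described above boils down to a single key. A minimal sketch that writes it to a file of your choice; `write_userns_config` is a hypothetical helper, and since it overwrites its target, merge the key by hand if your `/etc/docker/daemon.json` already contains other settings. Docker has to be restarted afterwards (e.g. `sudo systemctl restart docker` on systemd hosts).

```shell
#!/bin/sh
# write_userns_config [FILE] (hypothetical helper)
# Writes a daemon.json enabling user namespace remapping with the "default"
# setting, which makes dockerd create and use the "dockremap" user.
# WARNING: overwrites FILE; defaults to /etc/docker/daemon.json (needs root).
write_userns_config() {
    cat > "${1:-/etc/docker/daemon.json}" <<'EOF'
{
  "userns-remap": "default"
}
EOF
}
```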
Given these ranges, the `root` user inside the container, which has UID/GID 0 within this user namespace, is UID/GID 0 + 165536 = 165536 on the host. Armed with this knowledge, we just need to set the proper ownership on the `~/test-perm` directory to solve our “Permission denied” trouble. Here are the commands, continuing from the last ones above:
```shell
$ sudo chown 165536:165536 ~/test-perm/
$ docker run -it --rm -v ~/test-perm:/workdir --workdir /workdir alpine:latest touch protected
$ ls -l protected
-rw-r--r-- 1 165536 165536 0 Jul 18 21:17 protected
$ echo Hello > protected && cat protected
bash: protected: Permission denied
```
So now we are back to the problem we already solved earlier, and my recommendation is to use ACLs. Here is the complete example again:
```shell
$ whoami
itsme
$ mkdir ~/test-perm; cd ~/test-perm
$ setfacl -dm "u:itsme:rw" ~/test-perm
$ sudo chown 165536:165536 ~/test-perm/
$ docker run -it --rm -v ~/test-perm:/workdir --workdir /workdir alpine:latest touch protected
$ ls -l protected
-rw-rw-r--+ 1 165536 165536 0 Jul 18 21:23 protected
$ echo Hello > protected && cat protected
Hello
```
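Et voilà! That said, the above crude example of using user namespaces is very basic. For instance, after the `chown`, `itsme` only holds a default (inherited) ACL entry on `~/test-perm`, not an access entry, so creating new files directly in that directory from the host would still be denied unless you also run `setfacl -m "u:itsme:rwx" ~/test-perm`.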
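Rather than hard-coding 165536, the host-side IDs can be derived from `/etc/subuid` and `/etc/subgid`: the host ID is the start of the `dockremap` range plus the ID used inside the container. A minimal sketch, assuming the `name:start:count` format shown above; `subid_start` and `host_owner` are hypothetical helper names, and the container ID is assumed to fit within the range (65536 IDs in the example above).

```shell
#!/bin/sh
# subid_start NAME FILE (hypothetical helper)
# Prints the first subordinate ID of NAME from a subuid/subgid-style FILE
# (lines "name:start:count"); only the first matching line is used.
subid_start() {
    awk -F: -v name="$1" '$1 == name { print $2; found = 1; exit }
        END { exit !found }' "$2"
}

# host_owner CONTAINER_UID CONTAINER_GID [NAME] [SUBUID] [SUBGID]
# Prints the "UID:GID" pair to pass to chown on the host so that files
# belong to CONTAINER_UID:CONTAINER_GID inside a remapped container.
host_owner() (
    name=${3:-dockremap}
    uid_base=$(subid_start "$name" "${4:-/etc/subuid}") || exit 1
    gid_base=$(subid_start "$name" "${5:-/etc/subgid}") || exit 1
    echo "$(( uid_base + $1 )):$(( gid_base + $2 ))"
)
```

With the ranges shown above, `sudo chown "$(host_owner 0 0)" ~/test-perm/` is equivalent to the `chown 165536:165536` used in the example, and `host_owner 101 101` would give the host owner for a container user with UID/GID 101.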
Important note: some Docker image maintainers set file and directory permissions in a start-up script which is executed when the container starts. This works when not using user namespaces, as the container's `root` user has the right to override file permissions and set new ones. However, when using user namespaces, the container's `root` user is just a standard user for the host, so the start-up script will fail to set file permissions properly. If you encounter such a problem, contact the image maintainer; they should fix it. If they do not, you can circumvent the issue by preparing the data so that the container's user is already the rightful owner of the files and directories (and by using ACLs if you need more fine-grained access control). By doing so, you might be lucky and the start-up script will not try to change the permissions, and thus will not fail.
If you want to know more about using user namespaces with Docker, have a look online; here are some good pointers:
- Julien Enselme – User namespaces and Docker Volumes
- Jean-Tiare Le Bigot – Introducing User Namespace for Docker
- Linux.com – Using User namespace in Docker
Edge cases, things to know
Even though there is nothing really new about containers and file permissions, there are a few quirks and edge cases you should know about.
Docker Named Volumes can be pre-populated
When you start a container that creates a new named volume, the volume is automatically populated with the content of the Docker image at the path where the volume is mounted.
So if you run `docker run --rm -v wordpress-src:/usr/src/wordpress --entrypoint echo wordpress:fpm-alpine "volume created"`, it prints the message “volume created” and exits; the container is then deleted, but not its named volume, which should contain the WordPress PHP source code.
You can check that the volume persists:
```shell
$ docker volume ls
DRIVER    VOLUME NAME
local     wordpress-src
```
And now we can run another container to display it:
```shell
$ docker run --rm -t -v wordpress-src:/usr/src/wordpress alpine ls /usr/src/wordpress/
index.php  wp-config-sample.php  wp-mail.php  license.txt  wp-content  wp-settings.php  [...]
```
You can clean up the volume using `docker volume rm wordpress-src`.
In such a scenario, the pre-populated files and directories keep the UID and GID they have in the container image that was used to start the container and create the volume.
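To see which IDs a pre-populated volume ended up with, you can list the distinct owners of its files from the host. A minimal sketch; `list_owners` is a hypothetical helper, it relies on `stat -c` (GNU coreutils or BusyBox), and for a local named volume the directory can be found with `docker volume inspect --format '{{ .Mountpoint }}' wordpress-src` (reading it usually requires root).

```shell
#!/bin/sh
# list_owners DIR (hypothetical helper)
# Prints each distinct numeric "UID:GID" pair found under DIR (DIR included),
# prefixed by the number of entries that have it.
list_owners() {
    find "$1" -exec stat -c '%u:%g' {} + | sort | uniq -c
}
```

Numeric IDs are printed on purpose: the names the host associates with them may differ from the names used inside the image.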
VOLUME instruction in a Dockerfile
A Dockerfile contains the “source” of a container image: it is the specification, or description, of how to build the content of a container image. It contains a set of defined instructions which are executed to build the image.
One such instruction is `VOLUME ...`. It instructs Docker (or the image builder) to mark a certain path as a volume, which will be created, populated and mounted at runtime (even if the user does not specify it). This is used to hold data that should persist even after the container is deleted, or to mark a path where data might be modified, e.g. when running containers in read-only mode.
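So when the container is started, Docker creates a volume (an anonymous one, unless you mount a named volume at that path yourself) and populates it with any data that exists at that location in the image. But there is a major caveat with `VOLUME` instructions: if any subsequent instruction in this Dockerfile, or in any other Dockerfile based on this image, tries to modify the data within the volume, these modifications are silently DISCARDED! (Docker's documentation describes this for the legacy builder; with BuildKit such changes are kept, so the outcome depends on which builder you use.)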
My recommendations are either:
- do NOT use the `VOLUME` instruction in your Dockerfile; or
- if you really, really want to use it, put it at the end! But do consider not using it at all.
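Following the second recommendation, here is a minimal sketch of a Dockerfile in which every change to the volume path happens before the `VOLUME` instruction, written out from a shell function so that the build context can be created in one step. The `/data` path, the UID/GID 1000, the file contents and the `make_volume_demo` helper are illustrative assumptions; a *child* image modifying `/data` would still be affected by the caveat above.

```shell
#!/bin/sh
# make_volume_demo DIR (hypothetical helper)
# Writes a small build context in DIR whose Dockerfile prepares /data
# (content and ownership) first and declares it as a VOLUME last.
make_volume_demo() {
    mkdir -p "$1"
    cat > "$1/Dockerfile" <<'EOF'
FROM alpine:latest
# All changes to /data happen BEFORE the VOLUME instruction...
RUN mkdir -p /data \
 && echo "seeded at build time" > /data/readme.txt \
 && chown -R 1000:1000 /data
USER 1000:1000
# ...and VOLUME comes last, so no later step can lose changes to /data.
VOLUME /data
EOF
}
```

After `make_volume_demo ./volume-demo && docker build -t volume-demo ./volume-demo`, running `docker run --rm volume-demo ls -ln /data` should show `readme.txt` owned by 1000:1000 in the freshly created volume, since pre-population keeps the ownership set in the image.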
Footnotes
Image credits: Photo container ship by Konflikty.pl (CC BY) with the “Denied” stamp by tswedensky (CC0)