Sadly, ZFS on Linux is not at the same maturity level as on FreeBSD (or even Solaris). There is a FUSE implementation, but nothing has happened there for more than 16 months, and in my opinion it is not yet stable. As for native ports, only one ZFS implementation for Linux is still being developed, by the Lawrence Livermore National Laboratory, and it is still at the release candidate stage.
The state of ZFS on Linux is perhaps not too good today, but there is another file system under development, with good support, that could soon compete with ZFS: btrfs (pronounced ‘butter-fs’). Btrfs is still experimental.
Yesterday, the root file system of one of my virtual machines running Oracle Linux 6.3 filled up. As it was configured with LVM, this was not much trouble, but I wanted to try btrfs, so I decided to move /var to another partition using btrfs. I created a new hard disk in my VM and started it. Here is the rest of the story.
Warning: following these instructions might break your system. My advice is to create a virtual machine and experiment with it before doing this on a real system.
Partitioning
The first task was to partition the new hard disk. As it is a VM disk, it was only a few tens of GB in size, but I wanted to treat it like a real-life disk, which nowadays can be 2 to 4 TB. For partitions bigger than 2 TB you cannot use the traditional MBR partitioning scheme, so I went for GPT, which is not too annoying as long as it is not the boot disk.
$ sudo parted /dev/sdb mklabel gpt
$ sudo parted -a optimal /dev/sdb mkpart primary 8MB 98%
I left 8 MB free at the beginning of the disk and did not use the disk in full. This is often a good idea if you want to mirror it later on.
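The 2 TB limit comes from MBR storing sector numbers in 32 bits. As a rough sketch of that reasoning, and assuming 512-byte sectors, the choice of partition table could be scripted as below. The function name choose_label is made up for this illustration; gpt and msdos are the label names parted expects.

```shell
# Hypothetical helper: pick a parted label type from a disk size in bytes.
# Assumption: 512-byte sectors, so MBR can address at most 2^32 * 512 bytes (2 TiB).
choose_label() {
    size_bytes=$1
    mbr_limit=$((4294967296 * 512))
    if [ "$size_bytes" -gt "$mbr_limit" ]; then
        echo gpt
    else
        echo msdos
    fi
}
```

It could be fed with the real disk size, for example: sudo parted /dev/sdb mklabel "$(choose_label "$(sudo blockdev --getsize64 /dev/sdb)")". Nothing prevents using GPT on a small disk, as I did here.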
Formatting
There is no such term as pool in btrfs, although formatting a partition as btrfs is a bit like creating a pool. In btrfs language this is called a volume.
$ sudo mkfs.btrfs /dev/sdb1
WARNING! - Btrfs Btrfs v0.19 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
fs created label (null) on /dev/sdb1
nodesize 4096 leafsize 4096 sectorsize 4096 size 15.67GB
Btrfs Btrfs v0.19
$ sudo btrfs filesystem show /dev/sdb1
Label: none uuid: a18c65b0-83d9-4cb2-b183-d01654bdea9d
Total devices 1 FS bytes used 28.00KB
devid 1 size 15.67GB used 2.04GB path /dev/sdb1
Btrfs Btrfs v0.19
Btrfs volumes and subvolumes, a story of boxes
I really recommend reading the article about btrfs and the boxes story by Funtoo Linux. As a quick summary: if one creates a subvolume and mounts it, only the files in that subvolume are visible, not the ones outside. The main volume sees everything, because it is the root. If you know LVM or ZFS, think of the root volume as the equivalent of the volume group in LVM or the pool in ZFS, although this is not completely true. Subvolumes created under the main volume correspond to logical volumes in LVM and file systems in ZFS; again this is not completely true, but the analogy is good enough.
So I want to create a subvolume that will host /var. This has the advantage that I could create another subvolume for /usr, and /var won’t see the files in /usr.
Creating subvolumes
When using ZFS and creating a pool, the pool is directly accessible (no need to mount it). This is not the case for btrfs. After formatting the partition, the root volume needs to be mounted. Then I will create a var subvolume within this root.
$ sudo mkdir /mnt/root
$ sudo mount /dev/sdb1 /mnt/root
$ cd /mnt/root
$ sudo btrfs subvolume create var
$ sudo btrfs subvolume list /mnt/root
ID 256 top level 5 path var
Note down the ID of the subvolume, as it is needed to mount it, which I will do now (adding the compress and space_cache options).
$ sudo mkdir /mnt/var
$ sudo mount -o subvolid=256,compress,space_cache /dev/sdb1 /mnt/var
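Instead of copying the ID by hand, it can be pulled out of the listing. This is only a sketch: subvol_id is a name invented here, and it assumes the listing keeps the shape shown above, with the ID in the second column and the subvolume path, without spaces, in the last one.

```shell
# Hypothetical helper: read "btrfs subvolume list" output on stdin and
# print the ID of the subvolume whose path (last column) equals $1.
# Assumption: the path contains no spaces.
subvol_id() {
    awk -v name="$1" '$1 == "ID" && $NF == name { print $2 }'
}
```

Usage would then look like: id=$(sudo btrfs subvolume list /mnt/root | subvol_id var), followed by the mount command with subvolid=$id.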
Migrating the data to the new mount point
I now need to copy all the files and their metadata over to the subvolume. This is done with rsync so that as much metadata as possible is transferred.
$ sudo rsync --progress -aHAXS /var/ /mnt/var
A short explanation about the previous command:
- ‘a’ is for the archive mode, which will keep file types (symlink, etc.) and many other attributes;
- ‘H’, ‘A’ and ‘X’ preserve hard links, ACLs and extended attributes respectively;
- ‘S’ handles sparse files (if any) efficiently;
- It is ‘/var/’ and not ‘/var’ because the trailing slash avoids the creation of the var directory at the destination.
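The trailing slash rule is easy to get wrong, so here is a small demonstration that can be run safely in a scratch directory before touching /var. All paths are temporary and made up for the example; it only assumes that rsync is installed.

```shell
# Scratch demonstration of the rsync trailing-slash rule (temporary paths only).
demo=$(mktemp -d)
mkdir -p "$demo/src/log" "$demo/with_slash" "$demo/without_slash"
echo hello > "$demo/src/log/messages"

# With the trailing slash, the contents of src are copied.
rsync -a "$demo/src/" "$demo/with_slash"
# Without it, the directory src itself is created at the destination.
rsync -a "$demo/src" "$demo/without_slash"

ls "$demo/with_slash"      # lists: log
ls "$demo/without_slash"   # lists: src
```

Applied to my case, omitting the slash would have produced /mnt/var/var, which is not what I want once the subvolume is mounted on /var.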
If the system is using SELinux, I can either reboot twice (the first time it will restore the SELinux context) or I can restore the SELinux context manually and reboot once. I chose the second.
$ cd /mnt/var
$ sudo /sbin/restorecon -R *
Let’s have a look at the disk space usage. File system usage is a bit peculiar with btrfs.
$ sudo btrfs filesystem df /mnt/var
Data: total=1.01GB, used=106.62MB
System, DUP: total=8.00MB, used=4.00KB
System: total=4.00MB, used=0.00
Metadata, DUP: total=1.00GB, used=4.73MB
Metadata: total=8.00MB, used=0.00
$ df -Th /dev/sdb1
btrfs 16G 145M 14G 2% /mnt/var
$ sudo du -sh /var
306M /var
I’m still unsure myself how to fully interpret those lines.
System configuration update
The configuration file that holds the information about the mount points needs to be changed. In my case /var was part of the / partition, so I just needed to add a new line to /etc/fstab. If you already had /var on a dedicated partition, you need to update the corresponding line instead.
/dev/sdb1 /var btrfs defaults,subvolid=256,compress,space_cache 1 2
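If you script this step, the line can be generated rather than typed. The sketch below simply reproduces the fields of the line above; fstab_line is a made-up name and the options are the ones I chose, not a recommendation.

```shell
# Hypothetical helper: print an fstab entry for a btrfs subvolume.
# $1 = device (or UUID=... spec), $2 = mount point, $3 = subvolume ID.
fstab_line() {
    printf '%s %s btrfs defaults,subvolid=%s,compress,space_cache 1 2\n' "$1" "$2" "$3"
}
```

For example, fstab_line /dev/sdb1 /var 256 prints the line shown above. Since device names such as /dev/sdb1 can change when disks are added, passing UUID=a18c65b0-83d9-4cb2-b183-d01654bdea9d (the uuid reported by btrfs filesystem show) as the first argument may be more robust.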
Switching to the new var
I need to close all my running applications and log out. I then open a virtual terminal with Ctrl+Alt+F2. Under virtualization the key combination differs: with VirtualBox it is RightCtrl+F2 on a Linux host or Ctrl+Alt+F2 on a Windows host, and with VMware it is Alt+F2. I switch to init 3 to stop a few more processes (especially X11), stop the logging system, rename the old /var, recreate an empty /var so that the mount point exists at boot, and reboot.
$ sudo init 3
$ sudo service rsyslog stop
$ sudo mv /var /varold
$ sudo mkdir /var
$ sudo reboot
After the reboot, check that /var is the correct mount point using df -Th /var and that files are updated there: sudo ls -l /var/log/messages. After a few days, you might want to delete /varold. I did it right away to free up some precious space.
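The check after the reboot can also be scripted before deleting anything. This is a sketch under the assumption that /proc/mounts is available; fs_type_of is a name invented for the example.

```shell
# Hypothetical helper: read /proc/mounts-style lines on stdin and print the
# file system type of mount point $1. The last matching entry wins, since
# later mounts shadow earlier ones.
fs_type_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }'
}
```

Running fs_type_of /var < /proc/mounts should print btrfs if the new subvolume is really mounted on /var; if it prints nothing, /var is still part of the root file system and /varold should be kept.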