2017-01-26

Docker 1.10 with Devicemapper direct LVM vs. OverlayFS

During some load testing work we had big problems with excessive disk consumption on Devicemapper direct LVM - our use-case of very many small (CPU/RAM-wise) containers is not a good fit for this storage driver. We have experimented and compared Docker 1.10 (docker-1.10.3-59.el7.x86_64) with Devicemapper direct LVM vs. OverlayFS (with XFS as a "hosting" FS), and this post contains some random numbers from that. Docker also has a page comparing different storage drivers. Red Hat documentation warns about OverlayFS stability, so we will see how it goes later under real load.

Configuring Docker for Devicemapper direct LVM

You need an LVM volume group with free space for this.
systemctl stop docker
:>/etc/sysconfig/docker-storage
rm -rf /var/lib/docker/*
echo "VG='volume_group_with_free_space_for_docker'" >/etc/sysconfig/docker-storage-setup
docker-storage-setup
systemctl start docker
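To double-check which driver ended up being used (just a sanity check, not part of the setup itself):
docker info | grep -i 'storage driver'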

Configuring Docker for OverlayFS

Docker's documentation advises using a separate partition for OverlayFS, as this storage driver consumes lots of inodes ("overlay2" should handle that better, but as far as I can tell it only comes with Docker 1.12). We will see how it goes, as we did not do any tuning when creating the filesystem there. See "How to create XFS filesystem for OverlayFS", "Changing Storage Configuration" and "Overlay Graph Driver"; the steps below combine them:
systemctl stop docker
sed -i '/OPTIONS=/s/--selinux-enabled//' /etc/sysconfig/docker
:>/etc/sysconfig/docker-storage
rm -rf /var/lib/docker/*
lvcreate --name docker --extents 100%FREE volume_group_with_free_space_for_docker
mkfs -t xfs -n ftype=1 /dev/volume_group_with_free_space_for_docker/docker
echo "/dev/volume_group_with_free_space_for_docker/docker /var/lib/docker xfs defaults 0 0" >>/etc/fstab
mount /var/lib/docker
echo "STORAGE_DRIVER='overlay'" >/etc/sysconfig/docker-storage-setup
docker-storage-setup
systemctl start docker
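Again a quick sanity check (not part of the setup itself) that the overlay driver is active and that the backing XFS was really created with ftype=1:
docker info | grep -i 'storage driver'
xfs_info /var/lib/docker | grep ftype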

Comparing Devicemapper direct LVM vs. OverlayFS

This is the container we are using.

Starting containers

# time for i in $(seq 10); do docker run -h "con$i.example.com" -d r7perfsat; done
Devicemapper direct LVM: 38.832s
OverlayFS: 7.530s

Inspecting container sizes

# time docker inspect --size -f "SizeRw={{.SizeRw}} SizeRootFs={{.SizeRootFs}}" some_container
Devicemapper direct LVM: 18.254s
  SizeRw=2917 SizeRootFs=1633247414
OverlayFS: 2.694s
  SizeRw=2921 SizeRootFs=1633247406

Note: I am not sure whether the above is a relevant test case.

Stopping containers

# time docker stop 10_containers_we_have_created_before
Devicemapper direct LVM: 4.888s
OverlayFS: 3.266s

Removing stopped containers

# time docker rm 10_containers_we_have_created_before
Devicemapper direct LVM: 2.289s
OverlayFS: 0.206s
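For completeness, the stop/remove timings above can be taken in one go roughly like this (assuming the 10 test containers are the only ones on the host, otherwise the docker ps list needs filtering):

time docker stop $(docker ps -q)
time docker rm $(docker ps -aq)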

Registering containers

Running an Ansible playbook which registers the container to Satellite 6 via subscription-manager, installs the Katello Agent and does a few quick changes, on 5 of these containers in parallel:

Devicemapper direct LVM: 2m54.169s
OverlayFS: 2m45.890s

Note that during this test the load seemed much smaller on the OverlayFS-enabled Docker host, but that might be because of the lots of other containers that are running on the host.
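Per container the playbook does roughly the equivalent of this (just a sketch; the real playbook is not shown here and the organization and activation key names are made up):

docker exec some_container subscription-manager register --org Example_Org --activationkey example_key
docker exec some_container yum -y install katello-agent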

Downgrading a few packages in containers

Running an Ansible playbook which downgrades a few small packages, on 5 of these containers in parallel:

Devicemapper direct LVM: 1m34.035s
OverlayFS: 1m1.685s
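Again, per container this is roughly equivalent to (the package name here is made up):

docker exec some_container yum -y downgrade some_small_package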

At first glance, OverlayFS seems faster.

2017-01-06

Changing (growing) the size of an XFS partition which is not on LVM

XFS is the default file system type in Red Hat Enterprise Linux 7. I know quite well how to grow a file system when it sits on LVM, but when it does not, I was very unsure. Anyway, this worked:

I want to increase the root partition size, and because the host is actually a virtual machine, it is easy to add more space. In the VM we are using about 50 GB out of the 161 GB of available disk space. When using fdisk, the guide advises switching it to "sectors" mode. That seems to be the default, but the option is there just to be sure. Note the start sector (5222400 in my case) of the /dev/vda3 partition which hosts the root file system:

[root@rhevm ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3        48G   36G   13G  75% /
devtmpfs         11G     0   11G   0% /dev
tmpfs            11G     0   11G   0% /dev/shm
tmpfs            11G  8.4M   11G   1% /run
tmpfs            11G     0   11G   0% /sys/fs/cgroup
/dev/vda1       497M  208M  289M  42% /boot
tmpfs           2.1G     0  2.1G   0% /run/user/0
[root@rhevm ~]# fdisk -u=sectors -l /dev/vda 

Disk /dev/vda: 161.1 GB, 161061273600 bytes, 314572800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000e38b0

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *        2048     1026047      512000   83  Linux
/dev/vda2         1026048     5222399     2098176   82  Linux swap / Solaris
/dev/vda3         5222400   104857599    49817600   83  Linux

If we try to extend the XFS file system now, it won't work because the partition does not have space for the expansion (ignore the actual numbers, as I have captured them after the expansion shown below):

[root@rhevm ~]# xfs_info /
meta-data=/dev/vda3              isize=256    agcount=9, agsize=3113600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=6081, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@rhevm ~]# xfs_growfs -D 30000000 /
meta-data=/dev/vda3              isize=256    agcount=9, agsize=3113600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=26214400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=6081, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data size 30000000 too large, maximum is 26214400

So, let's delete the partition and create it again at the required size (I want it to be 100 GB):

[root@rhevm ~]# fdisk -u=sectors /dev/vda 
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): m   
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   g   create a new empty GPT partition table
   G   create an IRIX (SGI) partition table
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): p

Disk /dev/vda: 161.1 GB, 161061273600 bytes, 314572800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000e38b0

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *        2048     1026047      512000   83  Linux
/dev/vda2         1026048     5222399     2098176   82  Linux swap / Solaris
/dev/vda3         5222400   104857599    49817600   83  Linux

Command (m for help): d
Partition number (1-3, default 3): 3
Partition 3 is deleted

Command (m for help): n
Partition type:
   p   primary (2 primary, 0 extended, 2 free)
   e   extended
Select (default p): p
Partition number (3,4, default 3): 3
First sector (5222400-314572799, default 5222400): 5222400
Last sector, +sectors or +size{K,M,G} (5222400-314572799, default 314572799): +100G
Partition 3 of type Linux and of size 100 GiB is set

Command (m for help): p

Disk /dev/vda: 161.1 GB, 161061273600 bytes, 314572800 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x000e38b0

   Device Boot      Start         End      Blocks   Id  System
/dev/vda1   *        2048     1026047      512000   83  Linux
/dev/vda2         1026048     5222399     2098176   82  Linux swap / Solaris
/dev/vda3         5222400   214937599   104857600   83  Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.
[root@rhevm ~]# partprobe 
Error: Partition(s) 3 on /dev/vda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
[root@rhevm ~]# shutdown -r now

After reboot, we are finally ready to grow the file system:

[root@rhevm ~]# xfs_growfs /
meta-data=/dev/vda3              isize=256    agcount=4, agsize=3113600 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=12454400, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=6081, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
data blocks changed from 12454400 to 26214400
[root@rhevm ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3       100G   36G   65G  36% /
devtmpfs         11G     0   11G   0% /dev
tmpfs            11G     0   11G   0% /dev/shm
tmpfs            11G  8.4M   11G   1% /run
tmpfs            11G     0   11G   0% /sys/fs/cgroup
/dev/vda1       497M  208M  289M  42% /boot
tmpfs           2.1G     0  2.1G   0% /run/user/0

2017-01-02

Remember that in Bash each command in pipeline is executed in a subshell

This took me some time to debug recently, so I wanted to share. Nothing new at all, but it is good to be reminded of it from time to time :-)

$ function aaa() {
>   return 10 | true
> }
$ aaa
$ echo $?
0

The above can be a bit surprising (why the heck am I not getting exit code 10 when that return was executed?), especially when hidden in some bigger chunk of code, but with set -o pipefail I get what I wanted:

$ set -o pipefail
$ aaa
$ echo $?
10
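
The same subshell rule is also the reason why a variable set by read at the end of a pipeline does not survive in the parent shell (nothing new either, just another reminder):

$ x=original
$ echo changed | read x
$ echo $x
original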