BtrFS and readonly snapshots

In a previous post I got started with BtrFS, and as mentioned there, BtrFS supports snapshots. With a snapshot you create a point-in-time copy of a subvolume, and you can even create a clone that serves as a new working subvolume. To start, we first need the top level of the BtrFS volume, which can and must always be identified as subvolid 0. The explicit ID matters because the default subvolume that gets mounted can be changed to something other than the real root of a BtrFS volume. We start by updating /etc/fstab so we can mount the BtrFS volume itself.

LABEL=datavol	/home	btrfs	defaults,subvol=home	0	0
LABEL=datavol	/media/btrfs-datavol	btrfs	defaults,noauto,subvolid=0	0	0

As /media is a temporary file system, meaning it is recreated on every reboot, we need to create a mount point for the BtrFS volume before mounting. After that we create two read-only snapshots with a small delay in between. As there is currently no naming convention for snapshots, I adopted the ZFS naming scheme, with the @ sign as separator between the subvolume name and the timestamp.

$ sudo mkdir -m 0755 /media/btrfs-datavol
$ sudo mount /media/btrfs-datavol
$ cd /media/btrfs-datavol
$ sudo btrfs subvolume snapshot -r home home\@`date "+%Y%M%d-%H%m%S-%Z"`
Create a readonly snapshot of 'home' in './home@20124721-080109-CET'
...
$ sudo btrfs subvolume snapshot -r home home\@`date "+%Y%M%d-%H%m%S-%Z"`
Create a readonly snapshot of 'home' in './home@20124721-080131-CET'
$ ls -l
totaal 0
drwxr-xr-x 1 root root 52 nov 21  2010 home
drwxr-xr-x 1 root root 52 nov 21  2010 home@20124721-080109-CET
drwxr-xr-x 1 root root 52 nov 21  2010 home@20124721-080131-CET
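
One caveat about the transcript above: in the date format, %M is the minute and %m is the month, so the two ended up swapped and the names read 20124721-080109 (year, minute, day, then hour, month, second). Below is a minimal sketch of a helper with the specifiers in the intended order. The function name snapshot_name is hypothetical, and the btrfs command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): build a ZFS-style
# snapshot name of the form "<subvolume>@<timestamp>".
# %m is the month and %M is the minute, so the resulting names
# sort chronologically as plain text.
snapshot_name() {
    # $1: name of the subvolume to snapshot
    printf '%s@%s\n' "$1" "$(date '+%Y%m%d-%H%M%S-%Z')"
}

# Dry run: print the snapshot command instead of executing it.
echo "btrfs subvolume snapshot -r home $(snapshot_name home)"
```

With names built this way, a plain ls already lists the snapshots of a subvolume from oldest to newest.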

We now have two read-only snapshots, so let's test whether they really are read-only subvolumes. Creating a new file shouldn't be possible.

$ sudo touch home@20124721-080109-CET/test.txt
touch: cannot touch `home@20124721-080109-CET/test.txt': Read-only file system

Creating snapshots is fun and handy for migrations or as an on-disk backup solution, but they do consume space, as the deltas between snapshots are kept on disk. This means that data you change or remove after taking a snapshot stays on disk for as long as a snapshot still references it. Freeing disk space therefore takes more than removing files from the current subvolume: you also have to delete the older snapshots that still include the removed data.

$ sudo btrfs subvolume delete home@20124721-080109-CET
Delete subvolume '/media/btrfs-datavol/home@20124721-080109-CET'
$ ls -l 
totaal 0
drwxr-xr-x 1 root root 52 nov 21  2010 home
drwxr-xr-x 1 root root 52 nov 21  2010 home@20124721-080131-CET
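
Since old snapshots keep removed data alive, at some point you want to prune them. Here is a minimal sketch of how that selection could look. The function name snapshots_to_prune and the snapshot names are made up for illustration, and it assumes timestamps in year-month-day order so that the names sort chronologically. The delete commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: read snapshot names on stdin and print all but
# the newest $1 of them. Assumes names like home@YYYYmmdd-HHMMSS-TZ,
# which sort chronologically as plain text.
snapshots_to_prune() {
    LC_ALL=C sort | awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Dry run with illustrative names: keep the two newest snapshots and
# print the delete command for everything older.
printf '%s\n' \
    'home@20120121-084709-CET' \
    'home@20120122-084709-CET' \
    'home@20120123-084709-CET' |
snapshots_to_prune 2 |
while IFS= read -r snap; do
    echo "btrfs subvolume delete $snap"
done
```

In real use the names would come from a listing of the mounted volume rather than from printf, and the echo would be replaced by the actual delete once the output looks right.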

As a last step we unmount the BtrFS volume again. This is where ZFS and BtrFS differ too much for my taste. To create and access snapshots on ZFS the zpool doesn't need to be mounted, but then again, with the first few releases of ZFS the zpool needed to be mounted as well. So there is still hope, as BtrFS is still under development.

$ sudo umount /media/btrfs-datavol

Seeing what is possible with BtrFS, something like Sun's TimeSlider becomes an option. So do live upgrades with rollbacks, as is possible with Solaris 11, but for that BtrFS with read-write snapshots needs to be tested in the near future.

First steps with BtrFS

After using ZFS on Solaris, I missed the ZFS features on Linux, and with no chance of ZFS coming to Linux I had to make do with MD and LVM. Or at least until BtrFS became mature enough, and since Linux 3.0 that time has slowly come. With Linux 3.0 BtrFS supports automatic defragmentation and scrubbing of volumes. The second is maybe the most important feature of both ZFS and BtrFS, as it can be used to actively scan data on disk for errors.

The first tests with BtrFS were in a virtual machine a long time ago, when the userland tools were still in development. Now the btrfs command follows the path set by Sun Microsystems and basically combines what the zfs and zpool commands do for ZFS. But nothing compares to a test in the real world, so I broke a mirror and created a BtrFS volume with the label datavol:

$ sudo mkfs.btrfs -L 'datavol' /dev/sdb2

Now we can mount the volume and create a subvolume on it, which we are going to use as our new home volume for the users' home directories.

$ sudo mount /dev/sdb2 /mnt
$ sudo btrfs subvolume create /mnt/home
$ sudo umount /dev/sdb2

When updating /etc/fstab we can tell mount to use the volume label instead of a physical path to a device or some obscure UUID. You can also specify which subvolume you want to mount.

LABEL=datavol	/home	btrfs	defaults,subvol=home	0	0

After unmounting and disabling the original volume for /home we can mount everything and copy all the data over, with rsync for example, to see how BtrFS works in the real world.

$ sudo mount -a

As hinted before, scrubbing is important, as it lets you verify that all your data and metadata on disk is still correct. By default a scrub runs as a read-write test, or you can run a read-only test that just checks whether all data can be accessed. There is even an option to read parts of the volume that are still unused. In the example below the subvolume for /home is being scrubbed, and with success.

$ sudo btrfs scrub status /home
scrub status for afed6685-315d-4c4d-bac2-865388b28fd2
	scrub started at Sat Jan 17 15:11:58 2012, running for 106 seconds
	total bytes scrubbed: 5.77GB with 0 errors
...
$ sudo btrfs scrub status /mnt
scrub status for afed6685-315d-4c4d-bac2-865388b28fd2
	scrub started at Sat Jan 17 15:11:58 2012 and finished after 11125 seconds
	total bytes scrubbed: 792.82GB with 0 errors
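
For a cron job or a monitoring check it helps to reduce that status output to a single number. Here is a minimal sketch, assuming the output layout shown above; other versions of the tools may print the summary differently. The function name scrub_errors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `btrfs scrub status` text on stdin and print
# the error count from the "total bytes scrubbed: ... with N errors" line.
# Assumes the output layout shown in this post.
scrub_errors() {
    awk '/total bytes scrubbed:/ && / with [0-9]+ errors/ {
        for (i = 1; i < NF; i++)
            if ($i == "with") { print $(i + 1); exit }
    }'
}

# Sample input copied from the status output above.
errors=$(scrub_errors <<'EOF'
scrub status for afed6685-315d-4c4d-bac2-865388b28fd2
    scrub started at Sat Jan 17 15:11:58 2012 and finished after 11125 seconds
    total bytes scrubbed: 792.82GB with 0 errors
EOF
)
echo "scrub errors: $errors"
```

On a live system the input would come from a pipe, for example sudo btrfs scrub status /home | scrub_errors.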

My first impressions of BtrFS in the real world are a lot better with kernel 3.1 than they were around kernel 2.6.30, and I'm slowly starting to say it is becoming ready to be included in RHEL 7 or Debian 8, for example, as the default storage solution, the same as ZFS became in Solaris 11. But it is not all glory, as a lot of work still needs to be done.

The first is encryption: the LUKS era ends with BtrFS, as it is not smart to put LUKS between your disks and BtrFS. You lose the advantage of balancing data between disks when you do mirroring, for example. But then again LVM has the same issue, where you first need to set up software RAID with MD, with LUKS on top of it and LVM on top of that. For home directories EncFS may be an option, but it still leaves a lot of areas uncovered that would be covered by LUKS out of the box.

The second issue is the integration of BtrFS in distributions and the handling of snapshots. For now you first need to mount the volume before you can make a snapshot of a subvolume. The same goes for accessing a snapshot, and there I think ZFS still has an advantage, with the .zfs directory accessible to everyone who has access to the filesystem. But time will tell, and for now the first tests look great.

Internet Packet Filter gone wild

Recently there was a discussion on Usenet in n.c.o.l.n in which someone suggested that stateful filtering may not always be very wise. Today a funny Sun Alert arrived in my mailbox.

Sun Alert ID: 274710
Title: Solaris 10 IP Filter (ipfilter(5)) Patches (WITHDRAWN) May Cause a Memory Leak for Systems
With IPF’s Stateful Filtering Configured
Product: Solaris 10 Operating System
Category: Availability
Release Phase: Workaround
Workaround Date: 22-Dec-2009

To view this Sun Alert document please go to the following URL:
http://sunsolve.sun.com/search/document.do?assetkey=1-66-274710-1

Before people start posting FUD: for Solaris, among other products, Sun has a fairly strict gate for getting code accepted, and part of that gate is peer review of the code to prevent problems. But apparently things still slip through sometimes, as with the 10 Gbit/s driver in Solaris, where the problem went unnoticed until the driver came under real load and the machine had almost run out of memory. This may show why some sysadmins are a bit more reluctant with patches and updates, and also that setting up infrastructure requires not only technical skill but also logical insight into what is possible.

Disk information in Linux

In Solaris you can use the command iostat -En to look up, among other things, the serial number of a hard disk, and also which firmware version is active. But how do you see this in Linux? For years the question was which file in the /proc filesystem you needed. Now you can retrieve this information with the lshw command.

$ sudo lshw -C disk
*-disk:0
description: ATA Disk
product: ST31000340AS
vendor: Seagate
physical id: 0
bus info: scsi@0:0.0.0
logical name: /dev/sda
version: SD1A
serial: 9QJ0K3J6
size: 931GiB (1TB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 signature=4b1bb901
*-disk:1
description: ATA Disk
product: ST31000340AS
vendor: Seagate
physical id: 1
bus info: scsi@2:0.0.0
logical name: /dev/sdb
version: SD1A
serial: 9QJ0VXJF
size: 931GiB (1TB)
capabilities: partitioned partitioned:dos
configuration: ansiversion=5 signature=e605e605
*-cdrom
description: DVD-RAM writer
product: CDDVDW SH-S223F
vendor: TSSTcorp
physical id: 0.0.0
bus info: scsi@3:0.0.0
logical name: /dev/cdrom
logical name: /dev/cdrw
logical name: /dev/dvd
logical name: /dev/dvdrw
logical name: /dev/scd0
logical name: /dev/sr0
version: SB01
capabilities: removable audio cd-r cd-rw dvd dvd-r dvd-ram
configuration: ansiversion=5 status=nodisc

It is important to realize that, among other things, USB storage also shows up in the output of this command.
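
If you only care about which serial number belongs to which device, the output can be condensed. Here is a minimal sketch, assuming the field names shown in the output above; the function name disk_serials is hypothetical, and the sample input is a shortened copy of that output.

```shell
#!/bin/sh
# Hypothetical helper: read `lshw -C disk` text on stdin and print
# "<first logical name> <serial>" for every device that reports a serial.
# Assumes "logical name:" appears before "serial:" within a device
# section, as in the output shown in this post.
disk_serials() {
    awk '
        /^[ \t]*\*-/                          { name = "" }     # new device section
        /^[ \t]*logical name:/ && name == ""  { name = $NF }    # keep the first name only
        /^[ \t]*serial:/                      { print name, $NF }
    '
}

# Shortened sample taken from the lshw output above.
disk_serials <<'EOF'
*-disk:0
logical name: /dev/sda
version: SD1A
serial: 9QJ0K3J6
*-disk:1
logical name: /dev/sdb
version: SD1A
serial: 9QJ0VXJF
*-cdrom
logical name: /dev/cdrom
logical name: /dev/sr0
version: SB01
EOF
```

On a live system this would be fed from a pipe, for example sudo lshw -C disk | disk_serials. Devices without a serial line, such as the DVD writer here, are simply skipped.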

ZFS vs Btrfs

In 2006 Sun Microsystems integrated a new filesystem called ZFS into Solaris 10, and many were skeptical. There was also plenty of criticism because of the chosen license, and that criticism persists now that ZFS is also missing from Apple's Snow Leopard. The takeover by Oracle may not make things any better, but there is nevertheless a book on ZFS on the roadmap.

In the same period Oracle started work on Btrfs for Linux, which has been included in the kernel since Linux 2.6.29. ZFS is now becoming reasonably mature, and that took some years, but where does Btrfs stand? A virtual test machine with a few empty disks and the latest version of Debian Testing should be enough to find out. Unfortunately this is not yet a successful combination, which indicates that Btrfs may not be fully mature yet.

Looking at the technology and implementation, there are more differences between ZFS and Btrfs. ZFS offers RAID 0, 1, and 10, as well as several generations of RAID-Z and combinations of these, with which RAID 5 and 6 are implemented. Btrfs, for the time being, only offers RAID0, RAID1, and RAID10 without any extensions. Btrfs also keeps relying on, for example, other RAID and LVM solutions, and on the hardware always being correct. A nice point of ZFS, among others, is that it takes the hardware into account in its decisions, for example about where data can be read fastest.

On almost every point ZFS appears to be mature, while Btrfs still needs many years of development, whereas ZFS was production-ready as early as 2006. And although ZFS is tightly interwoven with Solaris and has a non-GPL license, it is most certainly worth using. It may also be a question whether, with the arrival of Solaris 11 and the outcome of the Sun-Oracle merger, it is still interesting to look at Btrfs at all. Certainly when you look at how mature OpenSolaris is now compared with what Linux does not offer. A license alone is often not enough, or it has to be worth the wait, but live upgrade with ZFS is truly worthwhile these days and a breath of fresh air.