Ezjail host
Background
This is my personal checklist for when I am setting up a new ezjail host. I like my jail hosts configured in a very specific way. There is a good chance that what is right for me is not right for you. As always, YMMV.
Also note that I talk a lot about the German hosting provider Hetzner. If you are using another provider or doing this at home, just ignore the Hetzner-specific stuff. Much of the content here can be used with little or no changes outside Hetzner.
Installation
OS install with mfsbsd
After receiving the server from Hetzner I boot it using the rescue system which puts me at an mfsbsd prompt via SSH. This is perfect for installing a zfs-only server.
Changes to zfsinstall
I edit the zfsinstall script /root/bin/zfsinstall and add "usr" to FS_LIST near the top of the script. I do this because I like to have /usr as a separate ZFS dataset.
Check disks
I create a small zpool using just 30 GB, enough to comfortably install the base OS and so on. The rest of the disk space will be used for GELI, which will have the other ZFS pool on top. This encrypted zpool will house the actual jails and data. This setup lets me keep all the important data encrypted while still allowing the physical server to boot without the human intervention that full disk encryption would require.
Note that the disks in this server are not new; they have been in use for around two years (18023 hours / 24 ≈ 751 days):
[root@rescue ~]# grep "ada[0-9]:" /var/run/dmesg.boot | grep "MB "
ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
[root@rescue ~]# smartctl -a /dev/ada0 | grep Power_On_Hours
  9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 18023
[root@rescue ~]# smartctl -a /dev/ada1 | grep Power_On_Hours
  9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 18023
[root@rescue ~]#
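The hours-to-days conversion is plain integer arithmetic. As a minimal sketch (hours_to_days is a hypothetical helper name, not part of smartmontools), it can be done directly in the shell:

```shell
#!/bin/sh
# hours_to_days is a hypothetical helper: it converts a SMART
# Power_On_Hours value into whole days using integer division.
hours_to_days() {
    echo $(( $1 / 24 ))
}

hours_to_days 18023   # the value reported for ada0 and ada1; prints 750
```

Integer division drops the remaining 23 hours, so the result is 750 whole days, or just over two years.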
Destroy existing partitions
Any existing partitions need to be deleted first. This can be done with the destroygeom command as shown below:
[root@rescue ~]# destroygeom -d ada0 -d ada1
Destroying geom ada0:
    Deleting partition 3 ... done
Destroying geom ada1:
    Deleting partition 1 ... done
    Deleting partition 2 ... done
    Deleting partition 3 ... done
Install FreeBSD
Installing FreeBSD with mfsbsd is easy. I run the command below, adjusting it for the release I want to install:
[root@rescue ~]# zfsinstall -d ada0 -d ada1 -r mirror -z 30G -t /nfs/mfsbsd/10.0-release-amd64.tbz
Creating GUID partitions on ada0 ... done
Configuring ZFS bootcode on ada0 ... done
=>        34  3907029101  ada0  GPT  (1.8T)
          34        2014        - free -  (1M)
        2048         128     1  freebsd-boot  (64k)
        2176    62914560     2  freebsd-zfs  (30G)
    62916736  3844112399        - free -  (1.8T)
Creating GUID partitions on ada1 ... done
Configuring ZFS bootcode on ada1 ... done
=>        34  3907029101  ada1  GPT  (1.8T)
          34        2014        - free -  (1M)
        2048         128     1  freebsd-boot  (64k)
        2176    62914560     2  freebsd-zfs  (30G)
    62916736  3844112399        - free -  (1.8T)
Creating ZFS pool tank on ada0p2 ada1p2 ... done
Creating tank root partition: ... done
Creating tank partitions: var tmp usr ... done
Setting bootfs for tank to tank/root ... done
NAME            USED  AVAIL  REFER  MOUNTPOINT
tank            270K  29.3G    31K  none
tank/root       127K  29.3G    34K  /mnt
tank/root/tmp    31K  29.3G    31K  /mnt/tmp
tank/root/usr    31K  29.3G    31K  /mnt/usr
tank/root/var    31K  29.3G    31K  /mnt/var
Extracting FreeBSD distribution ... done
Writing /boot/loader.conf... done
Writing /etc/fstab...Writing /etc/rc.conf... done
Copying /boot/zfs/zpool.cache ... done
Installation complete.
The system will boot from ZFS with clean install on next reboot
You may type "chroot /mnt" and make any adjustments you need.
For example, change the root password or edit/create /etc/rc.conf for
for system services.
WARNING - Don't export ZFS pool "tank"!
[root@rescue ~]#
Post install configuration (before reboot)
Before rebooting into the installed FreeBSD I need to make certain I can reach the server through SSH after the reboot. This means:
- Adding network settings to /etc/rc.conf
- Adding sshd_enable="YES" to /etc/rc.conf
- Change PermitRootLogin to yes in /etc/ssh/sshd_config (note: this is now the default in the zfsinstall image that Hetzner provides)
- Add nameservers to /etc/resolv.conf
- Finally I set the root password.
All of these steps are essential if I am going to have any chance of logging in after reboot. Most of these changes can be done from the mfsbsd shell but the password change requires chroot into the newly installed environment.
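Since a missed item here means a server I cannot log in to, the checklist can be verified mechanically before rebooting. The following is only a sketch: check_prereboot is a hypothetical helper, and it assumes that network settings appear as at least one ifconfig_* line in rc.conf. It cannot tell whether the root password has been set, so that step still has to be checked by hand.

```shell
#!/bin/sh
# check_prereboot is a hypothetical helper. Given the root of the new
# install, it prints one line per checklist item it cannot find and
# returns non-zero if anything is missing.
check_prereboot() {
    root=$1
    missing=0

    # Assumption: network settings show up as at least one ifconfig_* line.
    grep -q '^ifconfig_' "$root/etc/rc.conf" 2>/dev/null ||
        { echo "no ifconfig_ line in rc.conf"; missing=1; }

    grep -q '^sshd_enable="YES"' "$root/etc/rc.conf" 2>/dev/null ||
        { echo "sshd_enable is not set in rc.conf"; missing=1; }

    grep -Eiq '^PermitRootLogin[[:space:]]+yes' "$root/etc/ssh/sshd_config" 2>/dev/null ||
        { echo "PermitRootLogin is not yes in sshd_config"; missing=1; }

    grep -q '^nameserver' "$root/etc/resolv.conf" 2>/dev/null ||
        { echo "no nameserver in resolv.conf"; missing=1; }

    return "$missing"
}
```

From the mfsbsd shell this would be called as check_prereboot /mnt, which prints nothing when all four items are present.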
I use the chroot command but start another shell as bash is not installed in /mnt:
[root@rescue ~]# chroot /mnt/ csh
rescue# ee /etc/rc.conf
rescue# ee /etc/ssh/sshd_config
rescue# passwd
New Password:
Retype New Password:
rescue#
So, the network settings are sorted, root password is set, and root is permitted to ssh in. Time to reboot (this is the exciting part).
Remember to use shutdown -r now and not reboot when you reboot. shutdown -r now performs the proper shutdown process including rc.d scripts and disk buffer flushing. reboot is the "bigger hammer" to use when something is preventing shutdown -r now from working.
Basic config after first boot
If the server boots without any problems, I do some basic configuration before I continue with the disk partitioning.
Timezone
I run the command tzsetup to set the proper timezone, and set the time using ntpdate if necessary.
Note: The current Hetzner FreeBSD image has the timezone set to CEST; I like my servers configured as UTC.
Basic ports
I also add some basic ports with pkg so I can get screen etc. up and running as soon as possible:
# pkg install bash screen sudo portmaster
I then add the following to /usr/local/etc/portmaster.rc:
ALWAYS_SCRUB_DISTFILES=dopt
PM_DEL_BUILD_ONLY=pm_dbo
SAVE_SHARED=wopt
PM_LOG=/var/log/portmaster.log
PM_IGNORE_FAILED_BACKUP_PACKAGE=pm_ignore_failed_backup_package
An explanation of these options can be found on the Portmaster page.
After a rehash and adding my non-root user with adduser, I am ready to continue with the disk configuration. I also remember to disable root login in /etc/ssh/sshd_config.
Further disk configuration
After the reboot into the installed FreeBSD environment, I need to do some further disk configuration.
Create swap partitions
Swap-on-zfs is not a good idea for various reasons. To keep my swap encrypted but still off zfs I use geli onetime encryption. To avoid problems if a disk dies I also use gmirror. First I add the partitions with gpart:
$ sudo gpart add -t freebsd-swap -s 10G /dev/ada0
ada0p3 added
$ sudo gpart add -t freebsd-swap -s 10G /dev/ada1
ada1p3 added
$
Then I make sure gmirror is loaded, and loaded on boot:
$ sudo sysrc -f /boot/loader.conf geom_mirror_load="YES"
$ sudo kldload geom_mirror
Then I create the gmirror:
sudo gmirror label swapmirror /dev/ada0p3 /dev/ada1p3
Finally I add the following line to /etc/fstab to get encrypted swap on top of the gmirror:
/dev/mirror/swapmirror.eli none swap sw,keylen=256,sectorsize=4096 0 0
I can enable the new swap partition right away:
$ sudo swapon /dev/mirror/swapmirror.eli
$ swapinfo
Device                      1K-blocks  Used    Avail  Capacity
/dev/mirror/swapmirror.eli    8388604     0  8388604        0%
$
Create GELI partitions
First I create the partitions to hold the geli devices:
$ sudo gpart add -t freebsd-ufs ada0
ada0p4 added
$ sudo gpart add -t freebsd-ufs ada1
ada1p4 added
I add them as freebsd-ufs type partitions, as there is no dedicated freebsd-geli type.
Create GELI key
To create a GELI key I copy some data from /dev/random:
$ sudo dd if=/dev/random of=/root/geli.key bs=256k count=1
1+0 records in
1+0 records out
262144 bytes transferred in 0.003347 secs (78318372 bytes/sec)
$
Create GELI volumes
I create the GELI volumes with 4k blocksize and 256bit AES encryption:
$ sudo geli init -s 4096 -K /root/geli.key -l 256 /dev/ada0p4
Enter new passphrase:
Reenter new passphrase:
Metadata backup can be found in /var/backups/ada0p4.eli and
can be restored with the following command:
        # geli restore /var/backups/ada0p4.eli /dev/ada0p4
$ sudo geli init -s 4096 -K /root/geli.key -l 256 /dev/ada1p4
Enter new passphrase:
Reenter new passphrase:
Metadata backup can be found in /var/backups/ada1p4.eli and
can be restored with the following command:
        # geli restore /var/backups/ada1p4.eli /dev/ada1p4
$
Enable AESNI
Most Intel CPUs have hardware acceleration of AES which helps a lot with GELI performance. I load the aesni module during boot from /boot/loader.conf:
$ sudo sysrc -f /boot/loader.conf aesni_load="YES"
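Before relying on the module it is worth confirming that the CPU actually advertises AES-NI. A minimal sketch, assuming the CPU feature flags are recorded in /var/run/dmesg.boot with AESNI listed among them; has_aesni is a hypothetical helper name:

```shell
#!/bin/sh
# has_aesni is a hypothetical helper. It succeeds if the given boot log
# (default: /var/run/dmesg.boot) lists AESNI among the CPU features.
has_aesni() {
    grep -q 'AESNI' "${1:-/var/run/dmesg.boot}"
}
```

It can be combined with loading the module right away, without waiting for a reboot: has_aesni && sudo kldload aesni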
Attach GELI volumes
Now I just need to attach the GELI volumes before I am ready to create the second zpool:
$ sudo geli attach -k /root/geli.key /dev/ada0p4
Enter passphrase:
$ sudo geli attach -k /root/geli.key /dev/ada1p4
Enter passphrase:
$
Create second zpool
$ sudo zpool create gelipool mirror /dev/ada0p4.eli /dev/ada1p4.eli
$ zpool status
  pool: gelipool
 state: ONLINE
  scan: none requested
config:
        NAME            STATE     READ WRITE CKSUM
        gelipool        ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            ada0p4.eli  ONLINE       0     0     0
            ada1p4.eli  ONLINE       0     0     0
errors: No known data errors
  pool: tank
 state: ONLINE
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
errors: No known data errors
$
Create ZFS filesystems on the new zpool
The last remaining thing is to create a filesystem in the new zfs pool:
$ zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
gelipool        624K  3.54T   144K  /gelipool
tank            704M  28.6G    31K  none
tank/root       704M  28.6G   413M  /
tank/root/tmp    38K  28.6G    38K  /tmp
tank/root/usr   291M  28.6G   291M  /usr
tank/root/var   505K  28.6G   505K  /var
$ sudo zfs set mountpoint=none gelipool
$ sudo zfs set compression=on gelipool
$ sudo zfs create -o mountpoint=/usr/jails gelipool/jails
$ zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
gelipool         732K  3.54T   144K  none
gelipool/jails   144K  3.54T   144K  /usr/jails
tank             704M  28.6G    31K  none
tank/root        704M  28.6G   413M  /
tank/root/tmp     38K  28.6G    38K  /tmp
tank/root/usr    291M  28.6G   291M  /usr
tank/root/var    505K  28.6G   505K  /var
$
Disable atime
One last thing I like to do is to disable atime, or access time, on the filesystem. Access times are recorded every time a file is read, and while this can have its use cases, I never use it. Disabling it means a lot fewer write operations, as a read operation no longer triggers a write. Disabling it is easy:
$ sudo zfs set atime=off tank
$ sudo zfs set atime=off gelipool
$
The next things are post-install configuration stuff like OS upgrade, ports, firewall and so on. The basic install is finished \o/
Reserved space
Running out of space in ZFS is bad. Stuff will run slowly and may stop working entirely until some space is freed. The problem is that ZFS is a copy-on-write filesystem, which means that all writes, even a deletion, require writing new data to the disk. I've more than once wound up in a situation where I couldn't delete a file to free up disk space because the disk was full.
Sometimes this can be resolved by overwriting a large file, like:
$ echo > /path/to/a/large/file
This will overwrite the file thereby freeing up some space, but sometimes even this is not possible. This is where reserved space comes in. I create a new filesystem in each pool, set them readonly and without a mountpoint, and with 1G reserved each:
$ sudo zfs create -o mountpoint=none -o reservation=1G -o readonly=on gelipool/reserved
$ sudo zfs create -o mountpoint=none -o reservation=1G -o readonly=on tank/reserved
If I run out of space for some reason, I can just delete the dataset, or unset the reservation property, and I immediately have 1G of disk space available. Yay!
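The reservation is a last resort, so it helps to notice a pool filling up before it is needed. The following is a sketch only: pools_over is a hypothetical helper, and the 80 percent threshold in the usage line is an arbitrary example value.

```shell
#!/bin/sh
# pools_over is a hypothetical helper. It reads lines of the form
# "<pool> <capacity>%" on stdin, as printed by
# `zpool list -H -o name,capacity`, and prints every pool whose
# capacity is at or above the threshold given as the first argument.
pools_over() {
    awk -v limit="$1" '{
        cap = $2
        sub(/%/, "", cap)              # strip the trailing percent sign
        if (cap + 0 >= limit + 0) print $1
    }'
}
```

Typical use would be zpool list -H -o name,capacity | pools_over 80, for example from a cron job that mails any output.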
Ports
Installing the ports tree
I need to bootstrap the ports system. I use portsnap as it is much faster than c(v)sup. Initially I run portsnap fetch extract, and when I need to update the tree later I use portsnap fetch update.
smartd
I install smartd to monitor the disks for problems:
$ sudo pkg install smartmontools
I create the file /usr/local/etc/smartd.conf and add this line to it:
DEVICESCAN -a -m thomas@gibfest.dk
This makes smartd monitor all disks and send me an email if it finds an error.
Remember to enable smartd in /etc/rc.conf and start it:
sudo sysrc smartd_enable="YES"
sudo service smartd start
openntpd
I install net/openntpd to keep the clock in sync. I find this a lot easier to configure than the base ntpd.
sudo pkg install openntpd
I enable openntpd in /etc/rc.conf:
sudo sysrc openntpd_enable="YES"
and add a one line config file:
$ grep -v "^#" /usr/local/etc/ntpd.conf | grep -v "^$"
servers de.pool.ntp.org
$
I then sync the clock and start openntpd:
sudo ntpdate de.pool.ntp.org
sudo service openntpd start
ntpdate
I also enable ntpdate to help set the clock after a reboot. I add the following two settings to /etc/rc.conf using sysrc:
sudo sysrc ntpdate_enable="YES"
sudo sysrc ntpdate_hosts="de.pool.ntp.org"
Upgrade OS (buildworld)
I usually run -STABLE on my hosts, which means I need to build and install a new world and kernel. I also like having rctl available on my jail hosts, so I can limit jail resources in all kinds of neat ways. I also like having DTRACE available. Additionally, I need the built world to populate ezjail's basejail.
Note: I will need to update the host and the jails many times during the lifespan of this server, which is likely > 2-3 years. As new security problems are found or features are added that I want, I will update host and jails. There is a section about staying up to date later in this page. This section (the one you are reading now) only covers the OS update I run right after installing the server.
Fetching sources
I install subversion:
sudo pkg install subversion
and then get the sources:
sudo svn checkout https://svn0.eu.freebsd.org/base/stable/10/ /usr/src/
Error validating server certificate for 'https://svn0.eu.freebsd.org:443':
 - The certificate is not issued by a trusted authority. Use the
   fingerprint to validate the certificate manually!
Certificate information:
 - Hostname: svnmir.bme.FreeBSD.org
 - Valid: from Jun 29 12:24:17 2013 GMT until Jun 29 12:24:17 2015 GMT
 - Issuer: clusteradm, FreeBSD.org, CA, US(clusteradm@FreeBSD.org)
 - Fingerprint: 39:B0:53:35:CE:60:C7:BB:00:54:96:96:71:10:94:BB:CE:1C:07:A7
(R)eject, accept (t)emporarily or accept (p)ermanently? p
A    /usr/src/sys
A    /usr/src/sys/arm
A    /usr/src/sys/arm/ti
A    /usr/src/sys/arm/ti/cpsw
A    /usr/src/sys/arm/ti/am335x
A    /usr/src/sys/arm/ti/usb
A    /usr/src/sys/arm/ti/twl
A    /usr/src/sys/arm/ti/ti_mmchs.c
A    /usr/src/sys/arm/ti/ti_cpuid.h
A    /usr/src/sys/arm/ti/ti_mmchs.h
A    /usr/src/sys/arm/ti/ti_i2c.c
A    /usr/src/sys/arm/ti/ti_i2c.h
A    /usr/src/sys/arm/ti/am335x/am335x_pwm.c
A    /usr/src/sys/arm/ti/ti_sdma.c
<snip>
A    /usr/src/sys/fs/nandfs/nandfs_cleaner.c
A    /usr/src/sys/fs/nandfs/nandfs_bmap.c
A    /usr/src/sys/fs/nandfs/bmap.c
A    /usr/src/sys/fs/nandfs/nandfs_subr.c
A    /usr/src/sys/fs/nandfs/bmap.h
A    /usr/src/sys/fs/nandfs/nandfs_vfsops.c
A    /usr/src/sys/fs/nandfs/nandfs_sufile.c
 U   /usr/src
Checked out revision 282899.
$
This takes a while the first time, but subsequent runs are much faster.
Note: I use the full Subversion port instead of just svnup because checking out the code with the full SVN client means the SVN revision shows up in the output of uname -a.
Create kernel config
After the sources finish downloading, I create a new kernel config file /etc/TYKJAIL with the following content:
include GENERIC
ident TYKJAIL

#rctl
options RACCT
options RCTL
I then create a symlink in /usr/src/sys/amd64/conf/ that points to the kernel config file in /etc/:
# ln -s /etc/TYKJAIL /usr/src/sys/amd64/conf/
# ls -l /usr/src/sys/amd64/conf/TYKJAIL
lrwxr-xr-x 1 root wheel 9 Jul 22 16:14 /usr/src/sys/amd64/conf/TYKJAIL -> /etc/TYKJAIL
Finally I enable the kernel config in /etc/make.conf:
$ cat /etc/make.conf
KERNCONF=TYKJAIL
Building world and kernel
Finally I start the build. I use -j to start one thread per core in the system. sysctl hw.ncpu shows the number of available cores:
# sysctl hw.ncpu
hw.ncpu: 12
To build the new system:
# cd /usr/src/
# time (sudo make -j$(sysctl -n hw.ncpu) buildworld && sudo make -j$(sysctl -n hw.ncpu) kernel) && date
After the build finishes, reboot and run mergemaster, installworld, and mergemaster again:
# cd /usr/src/
# sudo mergemaster -pFUi && sudo make installworld && sudo mergemaster -FUi
DO NOT OVERWRITE /etc/group AND /etc/master.passwd AND OTHER CRITICAL FILES!
Reboot after the final mergemaster completes, and boot into the newly built world.
Preparing ezjail
ezjail needs to be installed and a bit of configuration is also needed, in addition to bootstrapping /usr/jails/basejail and /usr/jails/newjail.
Installing ezjail
Just install it with pkg:
sudo pkg install ezjail
Configuring ezjail
Then I go edit the ezjail config file /usr/local/etc/ezjail.conf and add/change these three lines near the bottom:
ezjail_use_zfs="YES"
ezjail_jailzfs="gelipool/jails"
ezjail_use_zfs_for_jails="YES"
This makes ezjail use separate ZFS datasets under gelipool/jails for the basejail and newjail, as well as for each jail created. ezjail_use_zfs_for_jails is supported since ezjail 3.2.2.
Bootstrapping ezjail
Finally I populate basejail and newjail from the world I built earlier:
$ sudo ezjail-admin update -i
The last line of the output is a message saying:
Note: a non-standard /etc/make.conf was copied to the template jail in order to get the ports collection running inside jails.
This is because ezjail defaults to symlinking the ports collection in the same way it symlinks the basejail. I prefer having a separate, individual ports collection in each of my jails though, so I remove the symlinks and make.conf from newjail:
$ sudo rm /usr/jails/newjail/etc/make.conf /usr/jails/newjail/usr/ports /usr/jails/newjail/usr/src
$ sudo mkdir /usr/jails/newjail/usr/src
ZFS goodness
Note that ezjail has created two new ZFS datasets to hold basejail and newjail:
$ zfs list -r gelipool/jails
NAME                      USED  AVAIL  REFER  MOUNTPOINT
gelipool/jails            239M  3.54T   476K  /usr/jails
gelipool/jails/basejail   236M  3.54T   236M  /usr/jails/basejail
gelipool/jails/newjail   3.10M  3.54T  3.10M  /usr/jails/newjail
ezjail flavours
ezjail has a pretty awesome feature that makes it possible to create templates or flavours which apply common settings when creating a new jail. I always have a basic flavour which adds a user for me, installs an SSH key, adds a few packages like bash, screen, sudo and portmaster - and configures those packages. Basically, everything I find myself doing over and over again every time I create a new jail.
It is also possible, of course, to create more advanced flavours. I've had one that installs a complete nginx+php-fpm server with all the necessary packages and configs.
ezjail flavours are technically pretty simple. By default, they are located in the same place as basejail and newjail, and ezjail comes with an example flavour to get you started. Basically a flavour is a file/directory hierarchy which is copied to the jail, and a shell script called ezjail.flavour which is run once, the first time the jail is started, and then deleted.
For reference, I've included my basic flavour here. First is a listing of the files included in the flavour, and then the ezjail.flavour script which performs tasks beyond copying config files.
$ find /usr/jails/flavours/tykbasic
/usr/jails/flavours/tykbasic
/usr/jails/flavours/tykbasic/ezjail.flavour
/usr/jails/flavours/tykbasic/usr
/usr/jails/flavours/tykbasic/usr/local
/usr/jails/flavours/tykbasic/usr/local/etc
/usr/jails/flavours/tykbasic/usr/local/etc/portmaster.rc
/usr/jails/flavours/tykbasic/usr/local/etc/sudoers
/usr/jails/flavours/tykbasic/usr/home
/usr/jails/flavours/tykbasic/usr/home/tykling
/usr/jails/flavours/tykbasic/usr/home/tykling/.ssh
/usr/jails/flavours/tykbasic/usr/home/tykling/.ssh/authorized_keys
/usr/jails/flavours/tykbasic/usr/home/tykling/.screenrc
/usr/jails/flavours/tykbasic/etc
/usr/jails/flavours/tykbasic/etc/fstab
/usr/jails/flavours/tykbasic/etc/rc.conf
/usr/jails/flavours/tykbasic/etc/periodic.conf
/usr/jails/flavours/tykbasic/etc/resolv.conf
As you can see, the flavour contains files like /etc/resolv.conf and other stuff to make the jail work. The name of the flavour here is tykbasic which means that if I want a file to end up in /usr/home/tykling after the flavour has been applied, I need to put that file in the folder /usr/jails/flavours/tykbasic/usr/home/tykling/ - remember to also chown the files in the flavour appropriately.
Finally, my ezjail.flavour script looks like so:
#!/bin/sh
#
# BEFORE: DAEMON
#
# ezjail flavour example

# Timezone
###########
#
ln -s /usr/share/zoneinfo/Europe/Copenhagen /etc/localtime

# Groups
#########
#
pw groupadd -q -n tykling

# Users
########
#
# To generate a password hash for use here, do:
# openssl passwd -1 "the password"
echo -n '$1$L/fC0UrO$bi65/BOIAtMkvluDEDCy31' | pw useradd -n tykling -u 1001 -s /bin/sh -m -d /usr/home/tykling -g tykling -c 'tykling' -H 0

# Packages
###########
#
env ASSUME_ALWAYS_YES=YES pkg bootstrap
pkg install -y bash
pkg install -y sudo
pkg install -y portmaster
pkg install -y screen

#change shell to bash
chsh -s bash tykling

#update /etc/aliases
echo "root: thomas@gibfest.dk" >> /etc/aliases
newaliases

#remove adjkerntz from crontab
cat /etc/crontab | grep -E -v "(Adjust the time|adjkerntz)" > /etc/crontab.new
mv /etc/crontab.new /etc/crontab

#remove ports symlink
rm /usr/ports

# create symlink to /usr/home in / (adduser defaults to /usr/username as homedir)
ln -s /usr/home /home
Creating a flavour is easy: just create a folder under /usr/jails/flavours/ that has the name of the flavour, and start adding files and folders there. The ezjail.flavour script should be placed in the root (see the example further up the page).
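The directory layout for a new flavour can be sketched in a few lines of shell. This is only an illustration: make_flavour is a hypothetical helper, the two subdirectories are just the ones my own flavour happens to use, and making the stub script executable is an assumption modelled on the example flavour.

```shell
#!/bin/sh
# make_flavour is a hypothetical helper: make_flavour BASEDIR NAME
# creates BASEDIR/NAME with a couple of common subdirectories and a stub
# ezjail.flavour script. An existing ezjail.flavour is left untouched.
make_flavour() {
    dir="$1/$2"
    mkdir -p "$dir/etc" "$dir/usr/local/etc" || return 1
    if [ ! -e "$dir/ezjail.flavour" ]; then
        printf '%s\n' '#!/bin/sh' '#' '# BEFORE: DAEMON' '#' > "$dir/ezjail.flavour"
        # Assumption: the script should be executable, like the example flavour.
        chmod 755 "$dir/ezjail.flavour"
    fi
}
```

On the jail host it would be called as make_flavour /usr/jails/flavours myflavour (run as root), after which config files can be dropped into the new tree.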
Finally I add the following to /usr/local/etc/ezjail.conf to make ezjail always use my new flavour:
ezjail_default_flavour="tykbasic"
Configuration
This section outlines what I do to further prepare the machine to be a nice ezjail host.
Firewall
One of the first things I fix is to enable the pf firewall from OpenBSD. I add the following to /etc/rc.conf to enable pf at boot time:
sudo sysrc pf_enable="YES"
sudo sysrc pflog_enable="YES"
I also create a very basic /etc/pf.conf:
[root@ ~]# cat /etc/pf.conf 
### macros
if="em0"
table <portknock> persist
#external addresses
tykv4="a.b.c.d"
tykv6="2002:ab:cd::/48"
table <allowssh> { $tykv4,$tykv6 }
#local addresses
glasv4="w.x.y.z"
### scrub
scrub in on $if all fragment reassemble
################
### filtering
### block everything
block log all
################
### skip loopback interface(s)
set skip on lo0
################
### icmp6                                                                                                        
pass in quick on $if inet6 proto icmp6 all icmp6-type {echoreq,echorep,neighbradv,neighbrsol,routeradv,routersol}
################
### pass outgoing
pass out quick on $if all
################
### portknock rule (more than 5 connections in 10 seconds to the port specified will add the "offending" IP to the <portknock> table)
pass in quick on $if inet proto tcp from any to $glasv4 port 32323 synproxy state (max-src-conn-rate 5/10, overload <portknock>)
### pass incoming ssh and icmp
pass in quick on $if proto tcp from { <allowssh>, <portknock> } to ($if) port 22
pass in quick on $if inet proto icmp all icmp-type { 8, 11 }
################
### pass ipv6 fragments (hack to workaround pf not handling ipv6 fragments)
pass in on $if inet6
block in log on $if inet6 proto udp
block in log on $if inet6 proto tcp
block in log on $if inet6 proto icmp6
block in log on $if inet6 proto esp
block in log on $if inet6 proto ipv6
To load pf without rebooting I run the following:
[root@ ~]# kldload pf
[root@ ~]# kldload pflog
[root@ ~]# pfctl -ef /etc/pf.conf && sleep 60 && pfctl -d
No ALTQ support in kernel
ALTQ related functions disabled
I get no prompt after this because pf has cut my SSH connection. But I can SSH back in if I did everything right, and if not, I can just wait 60 seconds after which pf will be disabled again. I SSH in and reattach to the screen I am running this in, and press control-c, so the "sleep 60" is interrupted and pf is not disabled. Neat little trick for when you want to avoid locking yourself out :)
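The same trick can be wrapped in a small function that also parses the ruleset first, so a syntax error never reaches the running firewall. This is a sketch under assumptions: try_pf_rules is a hypothetical name, and the PFCTL variable is an invented override hook that makes a dry run possible (it is not something pfctl itself knows about).

```shell
#!/bin/sh
# try_pf_rules is a hypothetical wrapper around the enable/sleep/disable
# trick. PFCTL is an assumed override hook so the sequence can be
# dry-run (for example PFCTL=echo); it defaults to the real pfctl.
try_pf_rules() {
    rules=$1
    grace=${2:-60}
    pfctl_cmd=${PFCTL:-pfctl}

    # -n parses the ruleset without loading it, so syntax errors are
    # caught before anything changes.
    "$pfctl_cmd" -nf "$rules" || return 1

    # Enable pf with the new rules, then disable it again after the
    # grace period unless the sleep is interrupted first.
    "$pfctl_cmd" -ef "$rules" && sleep "$grace" && "$pfctl_cmd" -d
}
```

Run inside screen as root with try_pf_rules /etc/pf.conf, then reattach and press control-c within the grace period to keep pf enabled.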
Replacing sendmail with Postfix
I always replace Sendmail with Postfix on every server I manage. See Replacing_Sendmail_With_Postfix for more info.
Listening daemons
When you add an IP alias for a jail, any daemons listening on * will also listen on the jail's IP, which is not what I want. For example, I want the jail's sshd, rather than the host's sshd, to listen on port 22 on the jail's IP. Check for listening daemons like so:
$ sockstat -l46
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     master     1554  12 tcp4   *:25                  *:*
root     master     1554  13 tcp6   *:25                  *:*
root     sshd       948   3  tcp6   *:22                  *:*
root     sshd       948   4  tcp4   *:22                  *:*
root     syslogd    789   6  udp6   *:514                 *:*
root     syslogd    789   7  udp4   *:514                 *:*
$
This tells me that I need to change Postfix, sshd and syslogd to stop listening on all IP addresses.
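On a host with many daemons the sockstat output gets long, and the wildcard listeners can be picked out with a small filter. A minimal sketch, assuming the column layout shown above (command in the second column, local address in the sixth); wildcard_listeners is a hypothetical helper name:

```shell
#!/bin/sh
# wildcard_listeners is a hypothetical helper. It reads `sockstat -l46`
# output on stdin, skips the header line, and prints "command address"
# once for each socket bound to all addresses (local address "*:port").
wildcard_listeners() {
    awk 'NR > 1 && $6 ~ /^\*:/ { print $2, $6 }' | sort -u
}
```

Used as sockstat -l46 | wildcard_listeners, it prints nothing once every daemon has been bound to a specific address.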
Postfix
The defaults in Postfix are really nice on FreeBSD, and most of the time a completely empty config file is fine for a system mailer (sendmail replacement). However, to make Postfix stop listening on port 25 on all IP addresses, I do need one line in /usr/local/etc/postfix/main.cf:
$ cat /usr/local/etc/postfix/main.cf
inet_interfaces=localhost
$
sshd
To make sshd stop listening on all IP addresses I uncomment and edit the ListenAddress line in /etc/ssh/sshd_config:
$ grep ListenAddress /etc/ssh/sshd_config
ListenAddress x.y.z.226
#ListenAddress ::
(IP address obfuscated..)
syslogd
I don't need my syslogd to listen on the network at all, so I add the following line to /etc/rc.conf:
$ grep syslog /etc/rc.conf
syslogd_flags="-ss"
Restarting services
Finally I restart Postfix, sshd and syslogd:
$ sudo /etc/rc.d/syslogd restart
Stopping syslogd.
Waiting for PIDS: 789.
Starting syslogd.
$ sudo /etc/rc.d/sshd restart
Stopping sshd.
Waiting for PIDS: 948.
Starting sshd.
$ sudo /usr/local/etc/rc.d/postfix restart
postfix/postfix-script: stopping the Postfix mail system
postfix/postfix-script: starting the Postfix mail system
A check with sockstat reveals that no more services are listening on all IP addresses:
$ sockstat -l46
USER     COMMAND    PID   FD PROTO  LOCAL ADDRESS         FOREIGN ADDRESS
root     master     1823  12 tcp4   127.0.0.1:25          *:*
root     master     1823  13 tcp6   ::1:25                *:*
root     sshd       1617  3  tcp4   x.y.z.226:22          *:*
$
Network configuration
Network configuration is a big part of any jail setup. If I have enough IP addresses (ipv4 and ipv6) I can just add IP aliases as needed. If I only have one or a few v4 IPs I will need to use rfc1918 addresses for the jails. In that case, I create a new loopback interface, lo1 and add the IP aliases there. I then use the pf firewall to redirect incoming traffic to the right jail, depending on the port in use.
IPv4
If rfc1918 jails are needed, I add the following to /etc/rc.conf to create the lo1 interface on boot:
### lo1 interface for ipv4 rfc1918 jails
cloned_interfaces="lo1"
When the lo1 interface is created, or if it isn't needed, I am ready to start adding IP aliases for jails as needed.
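The port-based redirect mentioned above boils down to one pf rdr rule per forwarded port. As a rough sketch, the rule text can be generated like this; rdr_rule is a hypothetical helper, a.b.c.d is the same placeholder used in pf.conf, and 10.0.0.2 is an assumed RFC 1918 alias on lo1:

```shell
#!/bin/sh
# rdr_rule is a hypothetical helper: rdr_rule IF EXT_IP PORT JAIL_IP
# prints a pf redirect rule that forwards one TCP port on the public
# address to the same port on a jail address.
rdr_rule() {
    printf 'rdr on %s inet proto tcp from any to %s port %s -> %s port %s\n' \
        "$1" "$2" "$3" "$4" "$3"
}

# Example with placeholder addresses (both are assumptions).
rdr_rule em0 a.b.c.d 80 10.0.0.2
```

In pf.conf the generated line belongs in the translation section, before the filter rules, and the redirected traffic still needs a matching pass rule. Jails on RFC 1918 addresses also need a nat rule to reach the outside world.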
IPv6
On the page Hetzner_ipv6 I've explained how to make IPv6 work on a Hetzner server where the supplied IPv6 default gateway is outside the IPv6 subnet assigned.
When basic IPv6 connectivity works, I am ready to start adding IP aliases for jails as needed.
Allow ping from inside jails
I add the following to /etc/sysctl.conf so the jails are allowed to do icmp ping. This enables raw socket access, which can be a security issue if you have untrusted root users in your jails. Use with caution.
#allow ping in jails
security.jail.allow_raw_sockets=1
Tips & tricks
Get jail info out of top
To make top show the jail id of the jail in which the process is running in a column, I need to specify the -j flag to top. Since this is a multi-cpu server I am working on, I also like giving the -P flag to top, to get a separate line of cpu stats per core. Finally, I like -a to get the full commandline/info of the running processes. I add the following to my .bashrc in my homedir on the jail host:
alias top="nice top -j -P -a"
...this way I don't have to remember passing -j -P -a to top every time. Also, I've been told to run top with nice to limit the cpu used by top itself. I took the advice so the complete alias looks like above.
Base jails
My jails need various services - they all need firewalling and backup, but some jails also need a database. So I always have a postgres jail on each jail host, so I only need to maintain one postgres server. Also, many of my jails serve some sort of web application, and I like to terminate SSL for those in one place in an attempt to keep the individual jails as simple as possible. This section describes how I configure the postgres and web jails that provide services to the other jails.
Postgres jail
I don't need a public v4 IP for this jail, so I configure it with an RFC1918 v4 IP on a loopback interface, and of course a real IPv6 address. I add a AAAA record in DNS for the v6 IP so I have something to point the clients at.
I install the latest Postgres server port; at the time of writing that is databases/postgresql93-server. But before I can run /usr/local/etc/rc.d/postgresql initdb I need to permit the use of SysV shared memory in the jail. This is done in the ezjail config file for the jail, in the _parameters line. I need to add allow.sysvipc=1 so I change the line from:
export jail_postgres_kush_tyknet_dk_parameters=""
to:
export jail_postgres_kush_tyknet_dk_parameters="allow.sysvipc=1"
After restarting the jail I can run initdb and start Postgres. When a jail needs a database I need to:
- Add a DB user (with the createuser -P someusername command)
- Add a database with the new user as owner (createdb -O someusername somedbname)
- Add permissions in /usr/local/pgsql/data/pg_hba.conf
- Open a hole in the firewall so the jail can reach the database on TCP port 5432
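The pg_hba.conf step is the one I tend to get wrong, so here is a sketch of what the added line looks like. pg_hba_line is a hypothetical helper, 10.0.0.5/32 stands in for the client jail's address, and md5 password authentication is an assumption; use whatever method fits your setup.

```shell
#!/bin/sh
# pg_hba_line is a hypothetical helper: pg_hba_line DB USER ADDRESS
# prints a tab-separated pg_hba.conf entry allowing USER to reach DB
# from ADDRESS over TCP. The md5 auth method is an assumption.
pg_hba_line() {
    printf 'host\t%s\t%s\t%s\tmd5\n' "$1" "$2" "$3"
}

# Example with the placeholder names from the list above.
pg_hba_line somedbname someusername 10.0.0.5/32
```

The output can be appended to /usr/local/pgsql/data/pg_hba.conf inside the postgres jail, followed by a reload of Postgres so the change takes effect.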
Web Jail
I need a public V4 IP for the web jail and I also give it a V6 IP. Since I use a different V6 IP per website, I will need additional v6 addresses when I start adding websites. I add the v6 addresses to the web jail in batches of 10 as I need them. After creating the jail and bootstrapping the ports collection I install security/openssl and www/nginx and configure it. More on that later.
ZFS snapshots and backup
So, since all this is ZFS based, there are a few tricks I use to make it easier to restore data in case of accidental file deletion or other data loss.
Periodic snapshots using sysutils/zfs-periodic
sysutils/zfs-periodic is a little script that uses the FreeBSD periodic(8) system to make snapshots of filesystems at regular intervals. It supports making hourly snapshots with a small change to periodic(8), but I've settled for daily, weekly and monthly snapshots on my servers.
After installing sysutils/zfs-periodic I add the following to /etc/periodic.conf:
#daily zfs snapshots
daily_zfs_snapshot_enable="YES"
daily_zfs_snapshot_pools="tank gelipool"
daily_zfs_snapshot_keep=7

#weekly zfs snapshots
weekly_zfs_snapshot_enable="YES"
weekly_zfs_snapshot_pools="tank gelipool"
weekly_zfs_snapshot_keep=5

#monthly zfs snapshots
monthly_zfs_snapshot_enable="YES"
monthly_zfs_snapshot_pools="tank gelipool"
monthly_zfs_snapshot_keep=6

#monthly zfs scrub
monthly_zfs_scrub_pools="tank gelipool"
monthly_zfs_scrub_enable="YES"
Note that the last bit also enables a monthly scrub of the pools. Remember to change the pool names and to set the number of snapshots to retain to something appropriate. These things are always a tradeoff between disk space and safety. Think it over and find some values that make you sleep well at night :)
After this has been running for a few days, you should have a bunch of daily snapshots:
$ zfs list -t snapshot | grep gelipool@
gelipool@daily-2012-09-02      0      -    31K  -
gelipool@daily-2012-09-03      0      -    31K  -
gelipool@daily-2012-09-04      0      -    31K  -
gelipool@daily-2012-09-05      0      -    31K  -
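To get a feel for what a keep value means, the following sketch prints the snapshots that fall outside the newest N. It is not how zfs-periodic itself does the pruning (I have not checked that); the function name snapshots_beyond_keep is invented for the illustration.

```shell
#!/bin/sh
# Read snapshot names on stdin and print the ones that fall outside the
# newest KEEP entries. The daily-YYYY-MM-DD suffix sorts chronologically.
snapshots_beyond_keep() {
    keep=$1
    sort | awk -v keep="$keep" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }'
}

# With keep=3, only the oldest of these four snapshots is surplus.
printf '%s\n' \
    gelipool@daily-2012-09-02 gelipool@daily-2012-09-03 \
    gelipool@daily-2012-09-04 gelipool@daily-2012-09-05 |
    snapshots_beyond_keep 3
```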
Back-to-back ZFS mirroring
I am lucky enough to have more than one of these jail hosts, which is the whole reason I started writing down how I configure them. One of the advantages of having more than one is that I can configure zfs send/receive jobs and make server A send its data to server B, and vice versa.
Introduction
The concept is pretty basic, but as it often happens, security considerations turn what was a simple and elegant idea into something... else. To make the back-to-back backup scheme work without sacrificing too much security, I first make a jail on each jailhost called backup.jailhostname. This jail will have control over a designated zfs dataset which will house the backups sent from the other server.
Create ZFS dataset
First I create the zfs dataset:
$ sudo zfs create cryptopool/backups
$ sudo zfs set jailed=on cryptopool/backups
'jail' the new dataset
I create the jail like I normally do, but after creating it, I edit the ezjail config file and tell it which extra zfs dataset to use:
$ grep dataset /usr/local/etc/ezjail/backup_glas_tyknet_dk
export jail_backup_glas_tyknet_dk_zfs_datasets="cryptopool/backups"
This makes ezjail run the zfs jail command with the proper jail id when the jail is started.
jail sysctl settings
I also add the following to the jail's ezjail config:
# grep parameters /usr/local/etc/ezjail/backup_glas_tyknet_dk
export jail_backup_glas_tyknet_dk_parameters="allow.mount.zfs=1 enforce_statfs=1"
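Judging by the names above, the ezjail config file and its variables are named after the jail hostname with every non-alphanumeric character replaced by an underscore. A small sketch that builds the two config lines for a given hostname; the helper name ezjail_safename is an assumption, and the exact rule ezjail applies internally may differ.

```shell
#!/bin/sh
# Turn a jail hostname into the form used in ezjail variable and file names:
# every character that is not a letter or digit becomes an underscore.
ezjail_safename() {
    printf '%s' "$1" | tr -c 'A-Za-z0-9' '_'
}

name=$(ezjail_safename backup.glas.tyknet.dk)
echo "export jail_${name}_zfs_datasets=\"cryptopool/backups\""
echo "export jail_${name}_parameters=\"allow.mount.zfs=1 enforce_statfs=1\""
```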
Configuring the backup jail
The jail is ready to run now, and inside the jail a zfs list looks like this:
$ zfs list
NAME                 USED  AVAIL  REFER  MOUNTPOINT
cryptopool          3.98G  2.52T    31K  none
cryptopool/backups    62K  2.52T    31K  none
$
I don't want to open up root ssh access to this jail, but the remote servers need to call zfs receive which requires root permissions. zfs allow to the rescue! zfs allow makes it possible to say "user X is permitted to do action Y on dataset Z" which is what I need here. In the backup jail I add a user called tykbackup which will be used as the user receiving the zfs snapshots from the remote servers. 
I then run the following commands to allow the user to work with the dataset:
$ sudo zfs allow tykbackup atime,compression,create,mount,mountpoint,readonly,receive cryptopool/backups
$ sudo zfs allow cryptopool/backups
---- Permissions on cryptopool/backups -------------------------------
Local+Descendent permissions:
        user tykbackup atime,compression,create,mount,mountpoint,readonly,receive
$ 
Testing if it worked:
$ sudo su tykbackup
$ zfs create cryptopool/backups/test
$ zfs list cryptopool/backups/test
NAME                      USED  AVAIL  REFER  MOUNTPOINT
cryptopool/backups/test    31K  2.52T    31K  none
$ zfs destroy cryptopool/backups/test
cannot destroy 'cryptopool/backups/test': permission denied
$
Since the user tykbackup has not been granted the destroy permission on the cryptopool/backups dataset, I get Permission Denied (as I expected) when trying to destroy cryptopool/backups/test. Works like a charm.
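The same check can be done without creating test datasets by reading the zfs allow output. A rough sketch, assuming the output format shown above; has_zfs_perm is a made-up helper name.

```shell
#!/bin/sh
# Read `zfs allow <dataset>` output on stdin and succeed if USER holds PERM.
has_zfs_perm() {
    awk -v user="$1" -v perm="$2" '
        $1 == "user" && $2 == user {
            n = split($3, perms, ",")
            for (i = 1; i <= n; i++) if (perms[i] == perm) found = 1
        }
        END { exit(found ? 0 : 1) }'
}

allow_output='---- Permissions on cryptopool/backups ----
Local+Descendent permissions:
        user tykbackup atime,compression,create,mount,mountpoint,readonly,receive'

echo "$allow_output" | has_zfs_perm tykbackup receive && echo "receive: allowed"
echo "$allow_output" | has_zfs_perm tykbackup destroy || echo "destroy: denied"
```

On the backup jail the input would come from zfs allow cryptopool/backups instead of the embedded sample.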
To allow automatic SSH operations I add the public ssh key for the root user of the server being backed up to /usr/home/tykbackup/.ssh/authorized_keys:
$ cat /usr/home/tykbackup/.ssh/authorized_keys
from="ryst.tyknet.dk",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="/usr/home/tykbackup/zfscmd.sh $SSH_ORIGINAL_COMMAND" ssh-rsa AAAAB3......KR2Z root@ryst.tyknet.dk
The script called zfscmd.sh is placed on the backup server to allow the ssh client to issue different command line arguments depending on what needs to be done. The script is very simple:
#!/bin/sh
# $1 is the script path from $SSH_ORIGINAL_COMMAND; drop it and pass the rest to zfs
shift
/sbin/zfs "$@"
exit $?
A few notes: Aside from restricting the command this SSH key can run, I've restricted it to only be able to log in from the IP of the server being backed up. These are very basic restrictions that should always be in place no matter what kind of backup you are using.
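The wrapper could be tightened further by only accepting the zfs subcommands the mirror script actually uses, which are list and receive. A sketch of such a check follows; zfscmd_allowed is a hypothetical helper and not part of the script above.

```shell
#!/bin/sh
# Decide whether a forced-command invocation may run. The arguments are the
# words of $SSH_ORIGINAL_COMMAND: the script path, then the zfs subcommand.
zfscmd_allowed() {
    case "$2" in
        list|receive) return 0 ;;
        *)            return 1 ;;
    esac
}

# In zfscmd.sh this could gate the call, for example:
#   zfscmd_allowed "$@" || { echo "zfs $2 not permitted" >&2; exit 1; }
#   shift; exec /sbin/zfs "$@"
zfscmd_allowed /usr/home/tykbackup/zfscmd.sh list -t snapshot && echo "list: ok"
zfscmd_allowed /usr/home/tykbackup/zfscmd.sh destroy -r x || echo "destroy: refused"
```

The zfs allow permissions already stop destructive commands, so this is a second layer rather than a replacement.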
Add the periodic script
I then add the script /usr/local/etc/periodic/daily/999.zfs-mirror to each server being backed up with the following content:
#!/bin/sh
### check pidfile
pidfile="/var/run/$(basename $0).pid"
if [ -f "$pidfile" ]; then
        echo "pidfile $pidfile exists, bailing out"
        exit 1
fi
echo $$ > "$pidfile"
### remove the pidfile on every exit path, including the early exits below
trap 'rm -f "$pidfile"' EXIT
### If there is a global system configuration file, suck it in.
if [ -r /etc/defaults/periodic.conf ]; then
        . /etc/defaults/periodic.conf
        source_periodic_confs
fi
case "$daily_zfs_mirror_enable" in
    [Yy][Ee][Ss])
        ;;
    *)
        exit
        ;;
esac
pools=$daily_zfs_mirror_pools
if [ -z "$pools" ]; then
        pools='tank'
fi
targethost=$daily_zfs_mirror_targethost
if [ -z "$targethost" ]; then
        echo '$daily_zfs_mirror_targethost must be set in /etc/periodic.conf'
        exit 1
fi
targetuser=$daily_zfs_mirror_targetuser
if [ -z "$targetuser" ]; then
        echo '$daily_zfs_mirror_targetuser must be set in /etc/periodic.conf'
        exit 1
fi
targetfs=$daily_zfs_mirror_targetfs
if [ -z "$targetfs" ]; then
        echo '$daily_zfs_mirror_targetfs must be set in /etc/periodic.conf'
        exit 1
fi
if [ -n "$daily_zfs_mirror_skip" ]; then
        egrep="($(echo $daily_zfs_mirror_skip | sed "s/ /|/g"))"
fi
### get todays date for later use
tday=$(date +%Y-%m-%d)
echo -n "Doing daily ZFS mirroring - "
date
### loop through the configured pools
for pool in $pools; do
        echo "    Processing pool $pool ..."
        ### enumerate datasets with daily snapshots from today
        #echo $egrep
        #echo "@daily-$tday"
        if [ -n "$egrep" ]; then
                datasets=$(zfs list -t snapshot -o name | grep "^$pool/" | egrep -v "$egrep" | grep "@daily-$tday")
        else
                datasets=$(zfs list -t snapshot -o name | grep "^$pool/" | grep "@daily-$tday")
        fi
        echo "found datasets: $datasets"
        for snapshot in $datasets; do
                dataset=$(echo -n $snapshot | cut -d "@" -f 1 | cut -d "/" -f 2-)
                echo "working on dataset $dataset"
                ### find the latest daily snapshot of this dataset on the remote node, if any
                echo ssh ${targetuser}@${targethost} /usr/home/tykbackup/zfscmd.sh list -t snapshot \| grep "^${targetfs}/${dataset}@daily-" \| cut -d " " -f 1 \| tail -1
                lastgoodsnap=$(ssh ${targetuser}@${targethost} /usr/home/tykbackup/zfscmd.sh list -t snapshot | grep "^${targetfs}/${dataset}@daily-" | cut -d " " -f 1 | tail -1)
                if [ -z "$lastgoodsnap" ]; then
                        echo "No remote daily snapshot found for local daily snapshot $snapshot - cannot send incremental - sending full backup"
                        zfs send -v $snapshot | ssh ${targetuser}@${targethost} /usr/home/tykbackup/zfscmd.sh receive -v -F -u -d $targetfs
                        if [ $? -ne 0 ]; then
                                echo "    Unable to send full snapshot of $dataset to $targetfs on host $targethost"
                        else
                                echo "    Successfully sent a full snapshot of $dataset to $targetfs on host $targethost - future sends will be incremental"
                        fi
                else
                        ### check if this snapshot has already been sent for some reason, skip if so
                        temp=$(echo $snapshot | cut -d "/" -f 2-)
                        lastgoodsnap="$(echo $lastgoodsnap | sed "s,${targetfs}/,,")"
                        if [ "$temp" = "$lastgoodsnap" ]; then
                                echo "    The snapshot $snapshot has already been sent to $targethost, skipping..."
                        else
                                ### add pool name to lastgoodsnap
                                lastgoodsnap="${pool}/${lastgoodsnap}"
                                ### zfs send the difference between latest remote snapshot and todays local snapshot
                                echo "    Sending the diff between local snapshot $(hostname)@$lastgoodsnap and $(hostname)@$snapshot to ${targethost}@${targetfs} ..."
                                zfs send -I $lastgoodsnap $snapshot | ssh ${targetuser}@${targethost} /usr/home/tykbackup/zfscmd.sh receive -F -u -d $targetfs
                                if [ $? -ne 0 ]; then
                                        echo "    There was a problem sending the diff between $lastgoodsnap and $snapshot to $targetfs on $targethost"
                                else
                                        echo "    Successfully sent the diff between $lastgoodsnap and $snapshot to $targethost"
                                fi
                        fi
                fi
        done
done
### remove pidfile
rm /var/run/$(basename $0).pid
Remember to also chmod +x /usr/local/etc/periodic/daily/999.zfs-mirror and enable it in /etc/periodic.conf:
#daily zfs mirror
daily_zfs_mirror_enable="YES"
daily_zfs_mirror_targethost="backup.glas.tyknet.dk"
daily_zfs_mirror_targetuser="tykbackup"
daily_zfs_mirror_targetfs="cryptopool/backups/kush.tyknet.dk"
daily_zfs_mirror_pools="tank gelipool"
daily_zfs_mirror_skip="gelipool/reserved gelipool/backups" # space separated list of datasets to skip
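The daily_zfs_mirror_skip setting is turned into a regular expression by the periodic script. The sketch below repeats that step in isolation so the effect of a skip list can be tried out first; the function name skip_regex and the sample snapshot names are made up.

```shell
#!/bin/sh
# Build the alternation used to filter out skipped datasets, the same way the
# periodic script does it.
skip_regex() {
    echo "($(echo "$1" | sed 's/ /|/g'))"
}

re=$(skip_regex "gelipool/reserved gelipool/backups")
echo "$re"

# Hypothetical snapshot names; only the jail snapshot survives the filter.
printf '%s\n' \
    gelipool/reserved@daily-2012-09-05 \
    gelipool/backups/other@daily-2012-09-05 \
    gelipool/jails/web@daily-2012-09-05 |
    grep -E -v "$re"
```

Because the pattern is not anchored, an entry also skips every child dataset, as well as any dataset whose name merely starts with the same string.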
Create the target filesystems in the backup jail
Finally I create the destination filesystems in the backupjail, one per server being backed up. The filesystem that needs to be created is the one specified in the setting daily_zfs_mirror_targetfs in /etc/periodic.conf on the server being backed up.
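Because the script calls zfs receive with -d, the pool name is dropped from the sent snapshot name and the rest is appended to the target filesystem. The following sketch shows the resulting name on the backup server; remote_snapshot_name is a hypothetical helper and the dataset jails/web is only an example.

```shell
#!/bin/sh
# Map a local snapshot name to the name it gets on the backup server when
# received with `zfs receive -d TARGETFS`: the first path element (the pool)
# is discarded and the remainder is placed below TARGETFS.
remote_snapshot_name() {
    targetfs=$1; snapshot=$2
    echo "${targetfs}/${snapshot#*/}"
}

# The target filesystem itself has to exist first, created in the backup jail:
#   zfs create cryptopool/backups/kush.tyknet.dk
remote_snapshot_name cryptopool/backups/kush.tyknet.dk gelipool/jails/web@daily-2012-09-05
```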
Run the periodic script
I usually do the initial run of the periodic script by hand, so I can catch and fix any errors right away. The script loops over all datasets in the configured pools that have a daily snapshot from today and sends that snapshot in full to the backup server. The next time the script runs it sends an incremental stream instead of the full dataset.
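The choice between a full and an incremental send boils down to whether the backup server already has a snapshot of the dataset. A dry-run sketch that only prints the command the script would pick; send_command is an invented name and the snapshot names are examples.

```shell
#!/bin/sh
# Print the send command the script would choose: a full stream when the
# backup host has no snapshot of the dataset yet, otherwise an incremental
# stream from the last snapshot both sides share.
send_command() {
    snapshot=$1; lastgoodsnap=$2
    if [ -z "$lastgoodsnap" ]; then
        echo "zfs send -v $snapshot"
    else
        echo "zfs send -I $lastgoodsnap $snapshot"
    fi
}

send_command gelipool/jails/web@daily-2012-09-05 ""
send_command gelipool/jails/web@daily-2012-09-05 gelipool/jails/web@daily-2012-09-04
```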
Caveats
This script does not handle deleting datasets (including their snapshots) on the backup server when the dataset is deleted from the server being backed up. You will need to do that manually. This could be considered a feature, or a missing feature, depending on your preferences. :) The last bit of these instructions is intentionally missing and will be written asap.
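Finding the leftover datasets by eye gets harder as the number of jails grows. One possible approach, sketched here under the assumption that both sides have been dumped to text files with one dataset name per line (relative to the pool and to the target filesystem respectively); orphaned_datasets and the file contents are made up.

```shell
#!/bin/sh
# Print datasets that are listed in the backup file but not in the local file.
# Lines are compared as fixed strings and must match exactly.
orphaned_datasets() {
    grep -F -x -v -f "$1" "$2"
}

tmp=$(mktemp -d)
printf '%s\n' jails/web jails/postgres > "$tmp/local"
printf '%s\n' jails/web jails/postgres jails/oldjail > "$tmp/backup"
orphaned_datasets "$tmp/local" "$tmp/backup"
rm -rf "$tmp"
```

The output is only a list of candidates; the actual zfs destroy on the backup server stays a manual step.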
Staying up-to-date
I update my ezjail hosts and jails to track -STABLE regularly. This section describes the procedure I use. It is essential that the jail host and the jails use the same world and kernel version, or bad stuff will happen.
Updating the jail host
First I update world and kernel of the jail host like I normally would. This is described earlier in this guide, see Ezjail_host#Building_world_and_kernel.
Updating ezjail's basejail
To update ezjail's basejail located in /usr/jails/basejail, I run the same commands as when bootstrapping ezjail, see the section Ezjail_host#Bootstrapping_ezjail.
Running mergemaster in the jails
Finally, to run mergemaster in all jails I use the following script. It runs mergemaster in each jail; the script comments should explain the rest. When it is finished the jails can be started:
#! /bin/sh
### check if .mergemasterrc exists,
### move it out of the way if so
MM_RC=0
if [ -e /root/.mergemasterrc ]; then
	MM_RC=1
	mv /root/.mergemasterrc /root/.mergemasterrc.old
fi
### loop through jails
for jailroot in $(ezjail-admin list | cut -c 57- | tail +3 | grep "/"); do
	echo "processing ${jailroot}:"
	### check if jailroot exists
	if [ -n "${jailroot}" -a -d "${jailroot}" ]; then
		### create .mergemasterrc
		cat <<EOF > /root/.mergemasterrc
AUTO_INSTALL=yes
AUTO_UPGRADE=yes
FREEBSD_ID=yes
PRESERVE_FILES=yes
PRESERVE_FILES_DIR=/var/tmp/mergemaster/preserved-files-$(basename ${jailroot})-$(date +%y%m%d-%H%M%S)
IGNORE_FILES="/boot/device.hints /etc/motd"
EOF
		### remove backup of /etc from previous run (if it exists)
		if [ -d "${jailroot}/etc.bak" ]; then
			rm -rfI "${jailroot}/etc.bak"
		fi
		
		### create backup of /etc as /etc.bak
		cp -pRP "${jailroot}/etc" "${jailroot}/etc.bak"
		
		### check if mtree from last mergemaster run exists
		if [ ! -e ${jailroot}/var/db/mergemaster.mtree ]; then
			### delete /etc/rc.d/*
			rm -rfI ${jailroot}/etc/rc.d/*
		fi
		### run mergemaster for this jail
		mergemaster -D "${jailroot}"
	else
		echo "${jailroot} doesn't exist"
	fi
	sleep 2
done
### if an existing .mergemasterrc was moved out of the way in the beginning, move it back now
if [ ${MM_RC} -eq 1 ]; then
	mv /root/.mergemasterrc.old /root/.mergemasterrc
else
	rm /root/.mergemasterrc
fi
### done, a bit of output
echo "Done. If everything went well the /etc.bak backup folders can be deleted now."
exit 0
To restart all jails I run the command ezjail-admin restart.
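The jail loop in the script above finds the jail roots by cutting the ezjail-admin list output at a fixed column. An alternative sketch that takes the last field of each line instead; jail_roots is a hypothetical helper and the listing below is invented sample output.

```shell
#!/bin/sh
# Read `ezjail-admin list` output on stdin and print each jail root: skip the
# two header lines and keep the last field when it is an absolute path.
jail_roots() {
    awk 'NR > 2 && $NF ~ /^\// { print $NF }'
}

# Hypothetical listing, for illustration only.
jail_roots <<'EOF'
STA JID  IP              Hostname                       Root Directory
--- ---- --------------- ------------------------------ ------------------------
ZR  1    192.0.2.10      web.example.org                /usr/jails/web.example.org
    1    re0|2001:db8::10
ZR  2    192.0.2.11      postgres.example.org           /usr/jails/postgres.example.org
EOF
```

This should keep working if a long hostname shifts the columns, but like the original it breaks on jail roots that contain spaces.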
Replacing a defective disk
I had a broken harddisk on one of my servers this evening. This section describes how I replaced the disk to make everything work again.
Booting into the rescue system
After Hetzner staff physically replaced the disk my server was unable to boot because the disk that died was the first one on the controller. The cheap Hetzner hardware is unable to boot from the secondary disk, probably a BIOS restriction. If the other disk had broken the server would have booted fine and this whole process could have been done with the server running. Anyway, I booted into the rescue system and partitioned the disk, added a bootloader and added it to the root zpool. After this I was able to boot the server normally, so the rest of the work was done without the rescue system.
Partitioning the new disk
The following shows the commands I ran to partition the disk:
[root@rescue ~]# gpart create -s GPT /dev/ad4
ad4 created
[root@rescue ~]# /sbin/gpart add -b 2048 -t freebsd-boot -s 128 /dev/ad4
ad4p1 added
[root@rescue ~]# gpart add -t freebsd-zfs -s 30G /dev/ad4
ad4p2 added
[root@rescue ~]# gpart add -t freebsd-ufs /dev/ad4
ad4p3 added
[root@rescue ~]# gpart show
=>        34  1465149101  ad6  GPT  (698G)
          34        2014       - free -  (1M)
        2048         128    1  freebsd-boot  (64k)
        2176    62914560    2  freebsd-zfs  (30G)
    62916736  1402232399    3  freebsd-ufs  (668G)
=>        34  1465149101  ad4  GPT  (698G)
          34        2014       - free -  (1M)
        2048         128    1  freebsd-boot  (64k)
        2176    62914560    2  freebsd-zfs  (30G)
    62916736  1402232399    3  freebsd-ufs  (668G)
[root@rescue ~]#
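The sector counts in the gpart output can be checked with a bit of shell arithmetic. A minimal sketch, assuming 512-byte sectors; sectors_to_bytes is a made-up helper name.

```shell
#!/bin/sh
# Convert a sector count from gpart output to bytes, assuming 512-byte sectors.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# -b 2048 starts the first partition at 1 MiB
echo "start offset: $(sectors_to_bytes 2048) bytes"
# -s 128 gives the 64k boot partition
echo "freebsd-boot: $(sectors_to_bytes 128) bytes"
# 62914560 sectors is the 30G root pool partition
echo "freebsd-zfs:  $(( $(sectors_to_bytes 62914560) / 1073741824 )) GiB"
```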
Importing the pool and replacing the disk
Next step is importing the zpool (remember altroot=/mnt !) and replacing the defective disk:
[root@rescue ~]# zpool import
   pool: tank
     id: 3572845459378280852
  state: DEGRADED
 status: One or more devices are missing from the system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 config:
        tank                      DEGRADED
          mirror-0                DEGRADED
            11006001397618753837  UNAVAIL  cannot open
            ad6p2                 ONLINE
[root@rescue ~]# zpool import -o altroot=/mnt/ tank
[root@rescue ~]# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
  scan: scrub repaired 0 in 0h2m with 0 errors on Thu Nov  1 05:00:49 2012
config:
        NAME                      STATE     READ WRITE CKSUM
        tank                      DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            11006001397618753837  UNAVAIL      0     0     0  was /dev/ada0p2
            ad6p2                 ONLINE       0     0     0
errors: No known data errors
[root@rescue ~]# zpool replace tank 11006001397618753837 ad4p2
Make sure to wait until resilver is done before rebooting.
If you boot from pool 'tank', you may need to update
boot code on newly attached disk 'ad4p2'.
Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:
        gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0
[root@rescue ~]#
[root@rescue ~]# zpool status
  pool: tank
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Nov 27 00:24:41 2012
        823M scanned out of 3.11G at 45.7M/s, 0h0m to go
        823M resilvered, 25.88% done
config:
        NAME                        STATE     READ WRITE CKSUM
        tank                        DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            replacing-0             UNAVAIL      0     0     0
              11006001397618753837  UNAVAIL      0     0     0  was /dev/ada0p2
              ad4p2                 ONLINE       0     0     0  (resilvering)
            ad6p2                   ONLINE       0     0     0
errors: No known data errors
[root@rescue ~]# zpool status
  pool: tank
 state: ONLINE
  scan: resilvered 3.10G in 0h2m with 0 errors on Tue Nov 27 01:26:45 2012
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada0p2  ONLINE       0     0     0
            ada1p2  ONLINE       0     0     0
errors: No known data errors
[root@rescue ~]#
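The transcript above does not show the bootloader step. Adapted from the hint that zpool replace prints, it would look roughly like the output of the sketch below. The helper name bootcode_command is invented, and the /boot paths are taken from the hint as-is; in the rescue system they may need to point below the altroot instead.

```shell
#!/bin/sh
# Print the bootcode command from the zpool hint, adapted to a given disk.
# Partition index 1 is the freebsd-boot partition created earlier.
bootcode_command() {
    echo "gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 $1"
}

bootcode_command ad4
```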
Reboot into non-rescue system
At this point I rebooted the machine into the normal FreeBSD system.
Re-create geli partition
To recreate the geli partition on p3 of the new disk, I just follow the same steps as when I originally created it, more info here.
To attach the new geli volume I run geli attach as described here.
Add the geli device to the encrypted zpool
First I check that both geli devices are available, and I check the device name that needs replacing in zpool status output:
[tykling@haze ~]$ geli status
      Name  Status  Components
ada1p3.eli  ACTIVE  ada1p3
ada0p3.eli  ACTIVE  ada0p3
[tykling@haze ~]$ zpool status gelipool
  pool: gelipool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 68K in 0h28m with 0 errors on Thu Nov  1 05:58:08 2012
config:
        NAME                      STATE     READ WRITE CKSUM
        gelipool                  DEGRADED     0     0     0
          mirror-0                DEGRADED     0     0     0
            18431995264718840299  REMOVED      0     0     0  was /dev/ada0p3.eli
            ada1p3.eli            ONLINE       0     0     0
errors: No known data errors
[tykling@haze ~]$
To replace the device and begin resilvering:
[tykling@haze ~]$ sudo zpool replace gelipool 18431995264718840299 ada0p3.eli
Password:
[tykling@haze ~]$ zpool status
  pool: gelipool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Nov 27 00:53:40 2012
        759M scanned out of 26.9G at 14.6M/s, 0h30m to go
        759M resilvered, 2.75% done
config:
        NAME                        STATE     READ WRITE CKSUM
        gelipool                    DEGRADED     0     0     0
          mirror-0                  DEGRADED     0     0     0
            replacing-0             REMOVED      0     0     0
              18431995264718840299  REMOVED      0     0     0  was /dev/ada0p3.eli/old
              ada0p3.eli            ONLINE       0     0     0  (resilvering)
            ada1p3.eli              ONLINE       0     0     0
errors: No known data errors
[tykling@haze ~]$
When the resilver is finished, the system is good as new.
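Waiting for a resilver can be scripted by looking for the "resilver in progress" line in the zpool status output. A small sketch, assuming the status wording shown in the transcripts above; resilver_in_progress is a hypothetical helper.

```shell
#!/bin/sh
# Read `zpool status` output on stdin; succeed while a resilver is running.
resilver_in_progress() {
    grep -q 'resilver in progress'
}

# On a live system this could poll until the resilver has finished:
#   while zpool status gelipool | resilver_in_progress; do sleep 60; done
echo '  scan: resilver in progress since Tue Nov 27 00:53:40 2012' |
    resilver_in_progress && echo "still resilvering"
echo '  scan: resilvered 3.10G in 0h2m with 0 errors on Tue Nov 27 01:26:45 2012' |
    resilver_in_progress || echo "resilver finished"
```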