
Sign up to save your podcasts
Or
OpenZFS and DTrace updates in NetBSD, NetBSD network security stack audit, Performance of MySQL on ZFS, OpenSMTP results from p2k18, legacy Windows backup to FreeNAS, ZFS block size importance, and NetBSD as router on a stick.
##Headlines
merge a new version of the CDDL dtrace and ZFS code. This changes the upstream vendor from OpenSolaris to FreeBSD, and this version is based on FreeBSD svn r315983.
in addition to the 10 years of improvements from upstream, this version also has these NetBSD-specific enhancements:
###NetBSD network stack security audit
Over the last five months, hundreds of patches were committed to the source tree as a result of this work. Dozens of bugs were fixed, among which a good number of actual, remotely-triggerable vulnerabilities.
Changes were made to strengthen the networking subsystems and improve code quality: reinforce the mbuf API, add many KASSERTs to enforce assumptions, simplify packet handling, and verify compliance with RFCs. This was done in several layers of the NetBSD kernel, from device drivers to L4 handlers.
The IPv6 Buffer Overflow: The overflow allowed an attacker to write one byte of packet-controlled data into ‘packet_storage+off’, where ‘off’ could be approximately controlled too. This allowed at least a pretty bad remote DoS/Crash
DigitalOcean
###MySQL on ZFS Performance
I used sysbench to create a table of 10M rows and then, using export/import tablespace, I copied it 329 times. I ended up with 330 tables for a total size of about 850GB. The dataset generated by sysbench is not very compressible, so I used lz4 compression in ZFS. For the other ZFS settings, I used what can be found in my earlier ZFS posts but with the ARC size limited to 1GB. I then used that plain configuration for the first benchmarks. Here are the results with the sysbench point-select benchmark, a uniform distribution and eight threads. The InnoDB buffer pool was set to 2.5GB.
ZFS stores the files in B-trees in a very similar fashion as InnoDB stores data. To access a piece of data in a B-tree, you need to access the top level page (often called root node) and then one block per level down to a leaf-node containing the data. With no cache, to read something from a three levels B-tree thus requires 3 IOPS.
The extra IOPS performed by ZFS are needed to access those internal blocks in the B-trees of the files. These internal blocks are labeled as metadata. Essentially, in the above benchmark, the ARC is too small to contain all the internal blocks of the table files’ B-trees. If we continue the comparison with InnoDB, it would be like running with a buffer pool too small to contain the non-leaf pages. The test dataset I used has about 600MB of non-leaf pages, about 0.1% of the total size, which was well cached by the 3GB buffer pool. So only one InnoDB page, a leaf page, needed to be read per point-select statement.
To correctly set the ARC size to cache the metadata, you have two choices. First, you can guess values for the ARC size and experiment. Second, you can try to evaluate it by looking at the ZFS internal data. Let’s review these two approaches.
You’ll read/hear often the ratio 1GB of ARC for 1TB of data, which is about the same 0.1% ratio as for InnoDB. I wrote about that ratio a few times, having nothing better to propose. Actually, I found it depends a lot on the recordsize used. The 0.1% ratio implies a ZFS recordsize of 128KB. A ZFS filesystem with a recordsize of 128KB will use much less metadata than another one using a recordsize of 16KB because it has 8x fewer leaf pages. Fewer leaf pages require less B-tree internal nodes, hence less metadata. A filesystem with a recordsize of 128KB is excellent for sequential access as it maximizes compression and reduces the IOPS but it is poor for small random access operations like the ones MySQL/InnoDB does.
I was reluctant to grow the ARC to 7GB, which was nearly half the overall system memory. At best, the ZFS performance would only match XFS. A larger InnoDB page size would increase the CPU load for decompression on an instance with only two vCPUs; not great either. The last option, the L2ARC, was the most promising.
ZFS is much more complex than XFS and EXT4 but, that also means it has more tunables/options. I used a simplistic setup and an unfair benchmark which initially led to poor ZFS results. With the same benchmark, very favorable to XFS, I added a ZFS L2ARC and that completely reversed the situation, more than tripling the ZFS results, now 66% above XFS.
We have seen in this post why the general perception is that ZFS under-performs compared to XFS or EXT4. The presence of B-trees for the files has a big impact on the amount of metadata ZFS needs to handle, especially when the recordsize is small. The metadata consists mostly of the non-leaf pages (or internal nodes) of the B-trees. When properly cached, the performance of ZFS is excellent. ZFS allows you to optimize the use of EBS volumes, both in term of IOPS and size when the instance has fast ephemeral storage devices. Using the ephemeral device of an i3.large instance for the ZFS L2ARC, ZFS outperformed XFS by 66%.
###OpenSMTPD new config
OpenSMTPD started ten years ago out of dissatisfaction with other solutions, mainly because I considered them way too complex for me not to get things wrong from time to time.
Anatomy of a design error
The initial configuration format was very different, I was inspired by pyr@’s hoststated, which eventually became relayd, and designed my configuration format with blocks enclosed by brackets.
When I first showed OpenSMTPD to pyr@, he convinced me that PF-like one-line rules would be awesome, and it was awesome indeed.
It helped us maintain our goal of simple configuration files, it helped fight feature creeping, it helped us gain popularity and become a relevant MTA, it helped us get where we are now 10 years later.
That being said, I believe this was a design error. A design error that could not have been predicted until we hit the wall to understand WHY this was an error. One-line rules are semantically wrong, they are SMTP wrong, they are wrong.
One-line rules are making the entire daemon more complex, preventing some features from being implemented, making others more complex than they should be, they no longer serve our goals.
To get to the point: we should move to two-line rules :-)
OpenSMTPD decides to accept or reject messages based on one-line rules such as:
accept from any for domain poolp.org deliver to mbox
Which can essentially be split into three units:
To ensure that we meet the requirements of the transactions, the matching must be performed during the SMTP transaction before we take a decision for the recipient.
The first solution, which we’ve been using for a decade, was to save the action within the envelope and kind of carve it in stone. This works fine… however it comes with the downsides that errors fixed in configuration files can’t be caught up by envelopes, that delivery action must be validated way ahead of time during the SMTP transaction which is much trickier, that the parsing of delivery methods takes place as the _smtpd user rather than the recipient user, and that envelope structures that are passed all over OpenSMTPD carry delivery-time informations, and more, and more, and more. The code becomes more complex in general, less safe in some particular places, and some areas are nightmarish to deal with because they have to deal with completely unrelated code that can’t be dealt with later in the code path.
The second solution can’t be done. An envelope may be the result of nested rules, for example an external client, hitting an alias, hitting a user with a .forward file resolving to a user. An envelope on disk may no longer match any rule or it may match a completely different rule If we could ensure that it matched the same rule, evaluating the ruleset may spawn new envelopes which would violate the transaction. Trying to imagine how we could work around this leads to more and more and more RFC violations, incoherent states, duplicate mails, etc…
There is simply no way to deal with this with atomic rules, the matching and the action must be two separate units that are evaluated at two different times, failure to do so will necessarily imply that you’re either using our first solution and all its downsides, or that you are currently in a world of pain trying to figure out why everything is burning around you. The minute the action is written to an on-disk envelope, you have failed.
A proper ruleset must define a set of matching patterns resolving to an action identifier that is carved in stone, AND a set of named action set that is resolved dynamically at delivery time.
Break
##News Roundup
I have some old Windows servers (10 years and counting) and I have been using rsync to back them up to my FreeNAS box. It has been working great for me.
First of all, I do have my Windows servers backup in virtualized format. However, those are only one-time snapshops that I run once in a while. These are classic ASP IIS web servers that I can easily put up on a new VM. However, many of these legacy servers generate gigabytes of data a day in their repositories. Running VM conversion daily is not ideal.
My solution was to use some sort of rsync solution just for the data repos. I’ve tried some applications that didn’t work too well with Samba shares and these old servers have slow I/O. Copying files to external sata or usb drive was not ideal. We’ve moved on from Windows to Linux and do not have any Windows file servers of capacity to provide network backups. Hence, I decided to use Delta Copy with FreeNAS. So here is a little write up on how to set it up. I have 4 Windows 2000 servers backing up daily with this method.
First, download Delta Copy and install it. It is open-source and pretty much free. It is basically a wrapper for cygwin’s rsync. When you install it, it will ask you to install the Server services which allows you to run it as a Rsync server on Windows. You don’t need to do this. Instead, you will be just using the Delta Copy Client application. But before we do that, we will need to configure our Rsync service for our Windows Clients on FreeNAS.
There you have it. Windows rsync to FreeNAS using DeltaCopy.
iXsystems
###How to write ATF tests for NetBSD
I have recently started contributing to the amazing NetBSD foundation. I was thinking of trying out a new OS for a long time. Switching to the NetBSD OS has been a fun change.
My first contribution to the NetBSD foundation was adding regression tests for the Address Sanitizer (ASan) in the Automated Testing Framework(ATF) which NetBSD has. I managed to complete it with the help of my really amazing mentor Kamil. This post is gonna be about the ATF framework that NetBSD has and how to you can add multiple tests with ease.
In ATF tests we will basically be talking about test programs which are a suite of test cases for a specific application or program.
There are a variety of commands that the atf suite offers. These include :
atf-check: The versatile command that is a vital part of the checking process. man page
atf-run: Command used to run a test program. man page
atf-fail: Report failure of a test case.
atf-report: used to pretty print the atf-run. man page
atf-set: To set atf test conditions.
We will be taking a better look at the syntax and usage later.
Let’s start with the Basics
The ATF testing framework comes preinstalled with a default NetBSD installation. It is used to write tests for various applications and commands in NetBSD. One can write the Test programs in either the C language or in shell script. In this post I will be dealing with the Bash part.
###The Importance of ZFS Block Size
One of the important tunables in ZFS is the recordsize (for normal datasets) and volblocksize (for zvols). These default to 128KB and 8KB respectively.
###Using a Raspberry Pi 2 as a Router on a Stick Starring NetBSD
A few weeks ago I set about upgrading my feeble networking skills by playing around with a Cisco 2970 switch. I set up a couple of VLANs and found the urge to set up a router to route between them. The 2970 isn’t a modern layer 3 switch so what am I to do?
Why not make use of the Raspberry Pi 2 that I’ve never used and put it to some good use as a ‘router on a stick’.
I could install a Linux based OS as I am quite familiar with it but where’s the fun in that? In my home lab I use SmartOS which by the way is a shit hot hypervisor but as far as I know there aren’t any Illumos distributions for the Raspberry Pi. On the desktop I use Solus OS which is by far the slickest Linux based OS that I’ve had the pleasure to use but Solus’ focus is purely desktop. It’s looking like BSD then!
I believe FreeBSD is renowned for it’s top notch networking stack and so I wrote to the BSDNow show on Jupiter Broadcasting for some help but it seems that the FreeBSD chaps from the show are off on a jolly to some BSD conference or another(love the show by the way).
It looks like me and the luvverly NetBSD are on a date this Saturday. I’ve always had a secret love for NetBSD. She’s a beautiful, charming and promiscuous lover(looking at the supported architectures) and I just can’t stop going back to her despite her misgivings(ahem, zfs). Just my type of grrrl!
Let’s crack on…
##Beastie Bits
Tarsnap
##Feedback/Questions
4.9
8989 ratings
OpenZFS and DTrace updates in NetBSD, NetBSD network security stack audit, Performance of MySQL on ZFS, OpenSMTP results from p2k18, legacy Windows backup to FreeNAS, ZFS block size importance, and NetBSD as router on a stick.
##Headlines
merge a new version of the CDDL dtrace and ZFS code. This changes the upstream vendor from OpenSolaris to FreeBSD, and this version is based on FreeBSD svn r315983.
in addition to the 10 years of improvements from upstream, this version also has these NetBSD-specific enhancements:
###NetBSD network stack security audit
Over the last five months, hundreds of patches were committed to the source tree as a result of this work. Dozens of bugs were fixed, among which a good number of actual, remotely-triggerable vulnerabilities.
Changes were made to strengthen the networking subsystems and improve code quality: reinforce the mbuf API, add many KASSERTs to enforce assumptions, simplify packet handling, and verify compliance with RFCs. This was done in several layers of the NetBSD kernel, from device drivers to L4 handlers.
The IPv6 Buffer Overflow: The overflow allowed an attacker to write one byte of packet-controlled data into ‘packet_storage+off’, where ‘off’ could be approximately controlled too. This allowed at least a pretty bad remote DoS/Crash
DigitalOcean
###MySQL on ZFS Performance
I used sysbench to create a table of 10M rows and then, using export/import tablespace, I copied it 329 times. I ended up with 330 tables for a total size of about 850GB. The dataset generated by sysbench is not very compressible, so I used lz4 compression in ZFS. For the other ZFS settings, I used what can be found in my earlier ZFS posts but with the ARC size limited to 1GB. I then used that plain configuration for the first benchmarks. Here are the results with the sysbench point-select benchmark, a uniform distribution and eight threads. The InnoDB buffer pool was set to 2.5GB.
ZFS stores the files in B-trees in a very similar fashion as InnoDB stores data. To access a piece of data in a B-tree, you need to access the top level page (often called root node) and then one block per level down to a leaf-node containing the data. With no cache, to read something from a three levels B-tree thus requires 3 IOPS.
The extra IOPS performed by ZFS are needed to access those internal blocks in the B-trees of the files. These internal blocks are labeled as metadata. Essentially, in the above benchmark, the ARC is too small to contain all the internal blocks of the table files’ B-trees. If we continue the comparison with InnoDB, it would be like running with a buffer pool too small to contain the non-leaf pages. The test dataset I used has about 600MB of non-leaf pages, about 0.1% of the total size, which was well cached by the 3GB buffer pool. So only one InnoDB page, a leaf page, needed to be read per point-select statement.
To correctly set the ARC size to cache the metadata, you have two choices. First, you can guess values for the ARC size and experiment. Second, you can try to evaluate it by looking at the ZFS internal data. Let’s review these two approaches.
You’ll read/hear often the ratio 1GB of ARC for 1TB of data, which is about the same 0.1% ratio as for InnoDB. I wrote about that ratio a few times, having nothing better to propose. Actually, I found it depends a lot on the recordsize used. The 0.1% ratio implies a ZFS recordsize of 128KB. A ZFS filesystem with a recordsize of 128KB will use much less metadata than another one using a recordsize of 16KB because it has 8x fewer leaf pages. Fewer leaf pages require less B-tree internal nodes, hence less metadata. A filesystem with a recordsize of 128KB is excellent for sequential access as it maximizes compression and reduces the IOPS but it is poor for small random access operations like the ones MySQL/InnoDB does.
I was reluctant to grow the ARC to 7GB, which was nearly half the overall system memory. At best, the ZFS performance would only match XFS. A larger InnoDB page size would increase the CPU load for decompression on an instance with only two vCPUs; not great either. The last option, the L2ARC, was the most promising.
ZFS is much more complex than XFS and EXT4 but, that also means it has more tunables/options. I used a simplistic setup and an unfair benchmark which initially led to poor ZFS results. With the same benchmark, very favorable to XFS, I added a ZFS L2ARC and that completely reversed the situation, more than tripling the ZFS results, now 66% above XFS.
We have seen in this post why the general perception is that ZFS under-performs compared to XFS or EXT4. The presence of B-trees for the files has a big impact on the amount of metadata ZFS needs to handle, especially when the recordsize is small. The metadata consists mostly of the non-leaf pages (or internal nodes) of the B-trees. When properly cached, the performance of ZFS is excellent. ZFS allows you to optimize the use of EBS volumes, both in term of IOPS and size when the instance has fast ephemeral storage devices. Using the ephemeral device of an i3.large instance for the ZFS L2ARC, ZFS outperformed XFS by 66%.
###OpenSMTPD new config
OpenSMTPD started ten years ago out of dissatisfaction with other solutions, mainly because I considered them way too complex for me not to get things wrong from time to time.
Anatomy of a design error
The initial configuration format was very different, I was inspired by pyr@’s hoststated, which eventually became relayd, and designed my configuration format with blocks enclosed by brackets.
When I first showed OpenSMTPD to pyr@, he convinced me that PF-like one-line rules would be awesome, and it was awesome indeed.
It helped us maintain our goal of simple configuration files, it helped fight feature creeping, it helped us gain popularity and become a relevant MTA, it helped us get where we are now 10 years later.
That being said, I believe this was a design error. A design error that could not have been predicted until we hit the wall to understand WHY this was an error. One-line rules are semantically wrong, they are SMTP wrong, they are wrong.
One-line rules are making the entire daemon more complex, preventing some features from being implemented, making others more complex than they should be, they no longer serve our goals.
To get to the point: we should move to two-line rules :-)
OpenSMTPD decides to accept or reject messages based on one-line rules such as:
accept from any for domain poolp.org deliver to mbox
Which can essentially be split into three units:
To ensure that we meet the requirements of the transactions, the matching must be performed during the SMTP transaction before we take a decision for the recipient.
The first solution, which we’ve been using for a decade, was to save the action within the envelope and kind of carve it in stone. This works fine… however it comes with the downsides that errors fixed in configuration files can’t be caught up by envelopes, that delivery action must be validated way ahead of time during the SMTP transaction which is much trickier, that the parsing of delivery methods takes place as the _smtpd user rather than the recipient user, and that envelope structures that are passed all over OpenSMTPD carry delivery-time informations, and more, and more, and more. The code becomes more complex in general, less safe in some particular places, and some areas are nightmarish to deal with because they have to deal with completely unrelated code that can’t be dealt with later in the code path.
The second solution can’t be done. An envelope may be the result of nested rules, for example an external client, hitting an alias, hitting a user with a .forward file resolving to a user. An envelope on disk may no longer match any rule or it may match a completely different rule If we could ensure that it matched the same rule, evaluating the ruleset may spawn new envelopes which would violate the transaction. Trying to imagine how we could work around this leads to more and more and more RFC violations, incoherent states, duplicate mails, etc…
There is simply no way to deal with this with atomic rules, the matching and the action must be two separate units that are evaluated at two different times, failure to do so will necessarily imply that you’re either using our first solution and all its downsides, or that you are currently in a world of pain trying to figure out why everything is burning around you. The minute the action is written to an on-disk envelope, you have failed.
A proper ruleset must define a set of matching patterns resolving to an action identifier that is carved in stone, AND a set of named action set that is resolved dynamically at delivery time.
Break
##News Roundup
I have some old Windows servers (10 years and counting) and I have been using rsync to back them up to my FreeNAS box. It has been working great for me.
First of all, I do have my Windows servers backup in virtualized format. However, those are only one-time snapshops that I run once in a while. These are classic ASP IIS web servers that I can easily put up on a new VM. However, many of these legacy servers generate gigabytes of data a day in their repositories. Running VM conversion daily is not ideal.
My solution was to use some sort of rsync solution just for the data repos. I’ve tried some applications that didn’t work too well with Samba shares and these old servers have slow I/O. Copying files to external sata or usb drive was not ideal. We’ve moved on from Windows to Linux and do not have any Windows file servers of capacity to provide network backups. Hence, I decided to use Delta Copy with FreeNAS. So here is a little write up on how to set it up. I have 4 Windows 2000 servers backing up daily with this method.
First, download Delta Copy and install it. It is open-source and pretty much free. It is basically a wrapper for cygwin’s rsync. When you install it, it will ask you to install the Server services which allows you to run it as a Rsync server on Windows. You don’t need to do this. Instead, you will be just using the Delta Copy Client application. But before we do that, we will need to configure our Rsync service for our Windows Clients on FreeNAS.
There you have it. Windows rsync to FreeNAS using DeltaCopy.
iXsystems
###How to write ATF tests for NetBSD
I have recently started contributing to the amazing NetBSD foundation. I was thinking of trying out a new OS for a long time. Switching to the NetBSD OS has been a fun change.
My first contribution to the NetBSD foundation was adding regression tests for the Address Sanitizer (ASan) in the Automated Testing Framework(ATF) which NetBSD has. I managed to complete it with the help of my really amazing mentor Kamil. This post is gonna be about the ATF framework that NetBSD has and how to you can add multiple tests with ease.
In ATF tests we will basically be talking about test programs which are a suite of test cases for a specific application or program.
There are a variety of commands that the atf suite offers. These include :
atf-check: The versatile command that is a vital part of the checking process. man page
atf-run: Command used to run a test program. man page
atf-fail: Report failure of a test case.
atf-report: used to pretty print the atf-run. man page
atf-set: To set atf test conditions.
We will be taking a better look at the syntax and usage later.
Let’s start with the Basics
The ATF testing framework comes preinstalled with a default NetBSD installation. It is used to write tests for various applications and commands in NetBSD. One can write the Test programs in either the C language or in shell script. In this post I will be dealing with the Bash part.
###The Importance of ZFS Block Size
One of the important tunables in ZFS is the recordsize (for normal datasets) and volblocksize (for zvols). These default to 128KB and 8KB respectively.
###Using a Raspberry Pi 2 as a Router on a Stick Starring NetBSD
A few weeks ago I set about upgrading my feeble networking skills by playing around with a Cisco 2970 switch. I set up a couple of VLANs and found the urge to set up a router to route between them. The 2970 isn’t a modern layer 3 switch so what am I to do?
Why not make use of the Raspberry Pi 2 that I’ve never used and put it to some good use as a ‘router on a stick’.
I could install a Linux based OS as I am quite familiar with it but where’s the fun in that? In my home lab I use SmartOS which by the way is a shit hot hypervisor but as far as I know there aren’t any Illumos distributions for the Raspberry Pi. On the desktop I use Solus OS which is by far the slickest Linux based OS that I’ve had the pleasure to use but Solus’ focus is purely desktop. It’s looking like BSD then!
I believe FreeBSD is renowned for it’s top notch networking stack and so I wrote to the BSDNow show on Jupiter Broadcasting for some help but it seems that the FreeBSD chaps from the show are off on a jolly to some BSD conference or another(love the show by the way).
It looks like me and the luvverly NetBSD are on a date this Saturday. I’ve always had a secret love for NetBSD. She’s a beautiful, charming and promiscuous lover(looking at the supported architectures) and I just can’t stop going back to her despite her misgivings(ahem, zfs). Just my type of grrrl!
Let’s crack on…
##Beastie Bits
Tarsnap
##Feedback/Questions
1,971 Listeners
272 Listeners
283 Listeners
265 Listeners
215 Listeners
154 Listeners
65 Listeners
189 Listeners
181 Listeners
44 Listeners
21 Listeners
135 Listeners
92 Listeners
29 Listeners
47 Listeners