
6 metrics for zpool performance, 2FA with ssh on OpenBSD, ZFS maintaining file type information in dirs, everything old is new again, netcat demystified, and more.
##Headlines
###6 metrics for zpool performance
The layout of a ZFS storage pool has a significant impact on system performance under various workloads. Given the importance of picking the right configuration for your workload and the fact that making changes to an in-use ZFS pool is far from trivial, it is important for an administrator to understand the mechanics of pool performance when designing a storage system.
Note that when we calculate data rates and IOPS values for the example system, they are only approximations. Many other factors can impact pool access speeds for better (compression, caching) or worse (poor CPU performance, not enough memory).
Let’s apply this to our example system, configured with a 12-wide striped pool:
The blocks are simply striped across the 12 disks in the pool. The LBA column on the left stands for “Logical Block Address”. If we treat each disk as a column in an array, each LBA would be a row. It’s also easy to see that if any single disk fails, we would be missing a color in the rainbow and our data would be incomplete. While this configuration has fantastic read and write speeds and can handle a ton of IOPS, the data stored on the pool is very vulnerable. This configuration is not recommended unless you’re comfortable losing all of your pool’s data whenever any single drive fails.
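For reference, building a pool like this is just a zpool create with all twelve disks listed as top-level vdevs. A minimal sketch, with placeholder pool and device names (tank, da0 through da11) rather than anything taken from the source:
# zpool create tank da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 da11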
Here’s a summary of the performance characteristics of a striped pool of N disks:
N-wide stripe:
Read IOPS: N * Read IOPS of a single drive
Write IOPS: N * Write IOPS of a single drive
Streaming read speed: N * Streaming read speed of a single drive
Streaming write speed: N * Streaming write speed of a single drive
Storage space efficiency: 100%
Fault tolerance: none; any single disk failure loses the whole pool
Mirrored vdevs trade capacity and write throughput for redundancy and read performance. Here’s the corresponding summary:
N-way mirror:
Read IOPS: N * Read IOPS of a single drive
Write IOPS: Write IOPS of a single drive
Streaming read speed: N * Streaming read speed of a single drive
Streaming write speed: Streaming write speed of a single drive
Storage space efficiency: 50% for 2-way, 33% for 3-way, 25% for 4-way, etc. [1/N]
Fault tolerance: 1 disk per vdev for 2-way, 2 for 3-way, 3 for 4-way, etc. [N-1]
For our first example configuration, let’s do something ridiculous and create a 12-way mirror. ZFS supports this kind of thing, but your management probably will not.
1x 12-way mirror:
Read IOPS: 3000
Write IOPS: 250
Streaming read speed: 1200 MB/s
Streaming write speed: 100 MB/s
Storage space efficiency: 8.3% (6 TB)
Fault tolerance: 11
As we can clearly see from the diagram, every single disk in the vdev gets a full copy of our rainbow data. The chainlink icons between the disk labels in the column headers indicate the disks are part of a single vdev. We can lose up to 11 disks in this vdev and still have a complete rainbow. Of course, the data takes up far too much room on the pool, occupying a full 12 LBAs in the data array.
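Working backwards from these numbers, the example drives behave like 6 TB disks that each deliver roughly 250 IOPS and 100 MB/s of streaming throughput (an assumption inferred from the figures, not stated in these notes). Plugging N = 12 into the mirror formulas above reproduces the read side of the table; a quick sanity check:
$ echo $(( 12 * 250 ))    # read IOPS: reads can be served by any of the 12 copies
$ echo $(( 12 * 100 ))    # streaming reads in MB/s
Writes have to land on all 12 disks, so write IOPS (250) and streaming writes (100 MB/s) stay at single-drive levels, and only one disk's worth of capacity (6 TB of the raw 72 TB, or 8.3%) is usable.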
Obviously, this is far from the best use of 12 drives. Let’s do something a little more practical and configure the pool with the ZFS equivalent of RAID-10. We’ll configure six 2-way mirror vdevs. ZFS will stripe the data across all 6 of the vdevs. We can use the work we did in the striped vdev section to determine how the pool as a whole will behave. Let’s first calculate the performance per vdev, then we can work on the full pool:
1x 2-way mirror:
Read IOPS: 500
Write IOPS: 250
Streaming read speed: 200 MB/s
Streaming write speed: 100 MB/s
Storage space efficiency: 50% (6 TB)
Fault tolerance: 1
Now we can pretend we have 6 drives with the performance statistics listed above and run them through our striped vdev performance calculator to get the total pool’s performance:
6x 2-way mirror:
Read IOPS: 3000
Write IOPS: 1500
Streaming read speed: 3000 MB/s
Streaming write speed: 1500 MB/s
Storage space efficiency: 50% (36 TB)
Fault tolerance: 1 per vdev, 6 total
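To build this layout, the same hypothetical twelve disks are simply grouped into six mirror vdevs at pool creation time; again, the pool and device names below are placeholders:
# zpool create tank \
    mirror da0 da1 mirror da2 da3 mirror da4 da5 \
    mirror da6 da7 mirror da8 da9 mirror da10 da11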
Again, we will examine the configuration from a visual perspective:
Each vdev gets a block of data and ZFS writes that data to all of (or in this case, both of) the disks in the mirror. As long as we have at least one functional disk in each vdev, we can retrieve our rainbow. As before, the chain link icons denote the disks are part of a single vdev. This configuration emphasizes performance over raw capacity but doesn’t totally disregard fault tolerance as our striped pool did. It’s a very popular configuration for systems that need a lot of fast I/O. Let’s look at one more example configuration using four 3-way mirrors. We’ll skip the individual vdev performance calculation and go straight to the full pool:
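The source's summary table for this layout isn't reproduced in these notes, but the same formulas give an estimate, still assuming the hypothetical 250 IOPS / 100 MB/s / 6 TB drives:
$ echo $(( 4 * 3 * 250 ))    # read IOPS: 3000
$ echo $(( 4 * 250 ))        # write IOPS: 1000
$ echo $(( 4 * 3 * 100 ))    # streaming reads: 1200 MB/s
$ echo $(( 4 * 100 ))        # streaming writes: 400 MB/s
Usable capacity is one disk per vdev, 4 x 6 TB = 24 TB (33%), and each vdev can survive the loss of two disks.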
While we have sacrificed some write performance and capacity, the pool is now extremely fault tolerant. This configuration is probably not practical for most applications and it would make more sense to use lower fault tolerance and set up an offsite backup system.
###2FA with ssh on OpenBSD
Five years ago I wrote about using a yubikey on OpenBSD. The only problem with doing this is that there’s no validation server available on OpenBSD, so you need to use a different OTP slot for each machine. (You don’t want to risk a replay attack if someone succeeds in capturing an OTP on one machine, right?) Yubikey has two OTP slots per device, so you would need a yubikey for every two machines with which you’d like to use it. You could use a bastion—and use only one yubikey—but I don’t like the SPOF aspect of a bastion. YMMV.
The first thing we need to do is to install the software which will be used to verify the OTPs we submit.
# pkg_add login_oath
We need to create a secret - aka, the seed - that will be used to calculate the Time-based One-Time Passwords. We should make sure no one can read or change it.
$ openssl rand -hex 20 > ~/.totp-key
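The notes skip the permission step, but something as simple as a chmod restricting the seed to its owner covers the "no one can read or change it" requirement (a sketch, not necessarily the post's exact command):
$ chmod 400 ~/.totp-key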
Now we have a hexadecimal key, but apps usually want a base32 secret. I initially wrote a small script to do the conversion.
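That script isn't included here; as one possible stand-in, a Python one-liner can convert the hex seed to base32 (this assumes python3 is installed and is not the author's original script):
$ python3 -c 'import base64,sys; print(base64.b32encode(bytes.fromhex(sys.argv[1])).decode())' "$(cat ~/.totp-key)"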
We can now move to the configuration of the system to put our new TOTP to use. As you might guess, it’s going to be quite close to what we did with the yubikey.
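The concrete configuration isn't reproduced in these notes. A rough sketch of the usual login_oath approach is to define a login class in /etc/login.conf that authenticates with one of the package's OATH styles, then assign your user to it; the class name, auth style, and username below are assumptions, so check the login_oath README for the authenticator names it actually installs. In /etc/login.conf:
totp:\
        :auth=-totp-and-pwd:\
        :tc=default:
Then, as root:
# usermod -L totp jdoe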
Again, keeping a root shell around decreases the risk of losing access to the system and being locked out.
My phone has a long enough password that most of the time, I fail to type it correctly on the first try. Of course, if I had to unlock my phone, launch my TOTP app and use my keyboard to enter what I see on my phone’s screen, I would quickly disable 2FA.
##News Roundup
###ZFS maintaining file type information in dirs
As an aside in yesterday’s history of file type information being available in Unix directories, I mentioned that it was possible for a filesystem to support this even though its Unix didn’t. By supporting it, I mean that the filesystem maintains this information in its on-disk format for directories, even though the rest of the kernel will never ask for it. This is what ZFS does.
# zdb -dddd fs3-corestaff-01/h/281 1
This is actually an old filesystem (it dates from Solaris 10 and has been transferred around with ‘zfs send | zfs recv’ since then), but various home directories for real and test users have been created in it over time (you can probably guess which one is the oldest one). Sufficiently old directories and files have no file type information, but more recent ones have this information, including .demo-file, which I made just now so this would have an entry that was a regular file with type information.
In yesterday’s entry I said that Unix directory entries need to store at least the filename and the inode number of the file. What ZFS is doing here is reusing the 64-bit field used for the ‘inode’ (the ZFS dnode number) to also store the file type, because it knows that object numbers have only a limited range. This also makes old directory entries compatible, by making type 0 (all 4 bits 0) mean ‘not specified’. Since old directory entries only stored the object number and the object number is 48 bits or less, the higher bits are guaranteed to be all zero.
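As a toy illustration of that packing (the values below are made up for demonstration; they are not real ZFS type codes or object numbers), the type occupies the top four bits of the 64-bit field and the object number the low 48:
$ type=4 obj=469
$ printf '0x%016x\n' $(( (type << 60) | obj ))          # packed value: 0x40000000000001d5
$ echo $(( 0x40000000000001d5 >> 60 ))                  # recover the type: 4
$ echo $(( 0x40000000000001d5 & ((1 << 48) - 1) ))      # recover the object number: 469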
###Everything old is new again
Just because KDE4-era software has been deprecated by the KDE-FreeBSD team in the official ports-repository, doesn’t mean we don’t care for it while we still need to. KDE4 was released on January 11th, 2008 — I still have the T-shirt — which was a very different C++ world than what we now live in. Much of the code pre-dates the availability of C++11 — certainly the availability of compilers with C++11 support. The language has changed a great deal in those ten years since the original release.
###OpenBSD netcat demystified
Owing to its versatile functionality, netcat has earned a reputation as the “TCP/IP Swiss army knife”. For example, you can create a simple chat app using netcat:
# nc -l 3003
This starts a netcat process listening on port 3003 on this machine (the IP address of the current machine is 192.168.35.176). On a second machine, connect to that listener:
# nc 192.168.35.176 3003
Type “hello” in the second machine’s session and press enter; back in the first machine’s terminal, you will see the “hello” text appear:
# nc -l 3003
hello
A primitive chatroom is built successfully. Very cool, isn’t it? I think many people can’t wait to explore more features of netcat now. If you are among them, congratulations! This tutorial may be the correct place for you.
##Beastie Bits
##Feedback/Questions