I want you to act as an expert systems administrator specializing in ZFS filesystems. I want you to give me commands to run to collect information about my ZFS drives and configuration, and I will paste the output back to you, and you will then make configuration recommendations (possibly including the commands to run) to increase performance given my use case and priorities (which you can also ask me anything about).
Great, I'll be happy to help you with your ZFS configuration. Let's start by gathering some basic information about your current ZFS setup.
Please run the following commands and provide me with the output:
- To check the ZFS version and support on your system:
zfs --version
- To list the current ZFS pools and their basic properties:
zpool list
- To get more detailed information about the ZFS pools:
zpool status
- To display properties for each dataset (filesystem or volume) in your pool:
zfs list -o name,used,avail,refer,mountpoint,compression,checksum,atime,recordsize
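Optionally, if these tools are available on your system, the following give useful extra context on caching and per-device activity (not required for this first pass):
arc_summary
zpool iostat -v 5 3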
Once I have this information, I'll be able to make configuration recommendations tailored to your system. Additionally, please provide some information about your use case and priorities, such as:
- What type of data is stored on the ZFS filesystem? (e.g. large files, small files, databases, VM images, etc.)
- What are your primary concerns? (e.g. data integrity, performance, capacity, etc.)
- What is your typical workload like? (e.g. read-heavy, write-heavy, mixed, etc.)
To first answer your questions, the type of data on this filesystem is mainly game data but can also include app data and log data, code in various languages, etc. My primary concerns are performance; I'm happy with the data integrity and capacity for now. My typical workload is read-heavy to mixed.
Here is the output of the commands you requested:
pmarreck 🟢 3h20m43s413ms
20230402133659 ~ bash> zfs --version
zfs-2.1.9-1
zfs-kmod-2.1.9-1
pmarreck 🟢 40ms
20230403231338 ~ bash> zpool list
NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
bpool 3.75G 714M 3.05G - - 1% 18% 1.00x ONLINE -
rpool 8.92T 3.90T 5.02T - - 9% 43% 1.00x ONLINE -
pmarreck 🟢 37ms
20230403231345 ~ bash> zpool status
pool: bpool
state: ONLINE
scan: scrub repaired 0B in 00:00:09 with 0 errors on Sun Apr 2 02:00:22 2023
config:
NAME STATE READ WRITE CKSUM
bpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD101FZBX-00ATAA0_VCKR0ALP-part2 ONLINE 0 0 0
ata-WDC_WD101FZBX-00ATAA0_VCKR0UGP-part2 ONLINE 0 0 0
cache
nvme-nvme.1987-4130443630373032303341373031353239333735-53616272656e7420526f636b657420342e3020325442-00000001-part2 ONLINE 0 0 0
nvme-nvme.1987-3033383530373032313135443031363835333438-53616272656e7420526f636b657420342e3020325442-00000001-part2 ONLINE 0 0 0
errors: No known data errors
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 06:39:41 with 0 errors on Sun Apr 2 08:39:55 2023
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
ata-WDC_WD101FZBX-00ATAA0_VCKR0ALP-part3 ONLINE 0 0 0
ata-WDC_WD101FZBX-00ATAA0_VCKR0UGP-part3 ONLINE 0 0 0
logs
nvme-Sabrent_Rocket_4.0_2TB_A0D6070203A701529375-part1 ONLINE 0 0 0
nvme-nvme.1987-3033383530373032313135443031363835333438-53616272656e7420526f636b657420342e3020325442-00000001-part1 ONLINE 0 0 0
cache
nvme-nvme.1987-4130443630373032303341373031353239333735-53616272656e7420526f636b657420342e3020325442-00000001-part3 ONLINE 0 0 0
nvme-nvme.1987-3033383530373032313135443031363835333438-53616272656e7420526f636b657420342e3020325442-00000001-part3 ONLINE 0 0 0
errors: No known data errors
pmarreck 🟢 22ms
20230403231351 ~ bash> zfs list -o name,used,avail,refer,mountpoint,compression,checksum,atime,recordsize
NAME USED AVAIL REFER MOUNTPOINT COMPRESS CHECKSUM ATIME RECSIZE
bpool 714M 2.93G 96K /boot lz4 on off 128K
bpool/nixos 712M 2.93G 96K none lz4 on off 128K
bpool/nixos/root 712M 2.93G 712M /boot lz4 on off 128K
rpool 3.90T 4.90T 96K / zstd-8 edonr off 1M
rpool/nixos 3.90T 4.90T 96K none zstd-8 edonr off 1M
rpool/nixos/home 3.80T 4.90T 3.80T /home zstd-8 edonr off 1M
rpool/nixos/root 86.6G 4.90T 86.6G / zstd-8 edonr off 1M
rpool/nixos/var 15.8G 4.90T 96K /var zstd-8 edonr off 1M
rpool/nixos/var/lib 15.8G 4.90T 15.8G /var/lib zstd-8 edonr off 1M
rpool/nixos/var/log 1.19M 4.90T 1.19M /var/log zstd-8 edonr off 1M
Thank you for providing the requested information and describing your use case. Based on the information provided, here are a few recommendations to improve performance for your ZFS filesystem:
- Since your workload is read-heavy to mixed, consider tuning the ZFS Adaptive Replacement Cache (ARC). The maximum ARC size is controlled by the zfs_arc_max module parameter; on Linux the default ceiling is typically about half of physical RAM, and for a read-heavy system with memory to spare, a larger ARC generally helps. Here's an example of how to set it to 8 GB (adjust the value according to your system's memory; the first line persists the setting across module reloads, the second applies it immediately):
echo "options zfs zfs_arc_max=8589934592" | sudo tee /etc/modprobe.d/zfs.conf
echo 8589934592 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
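If you'd like to see the ARC's current size against its ceiling before and after the change (paths as exposed by the OpenZFS module on Linux; arc_summary, if installed, gives a friendlier report):
grep -E '^(size|c_max) ' /proc/spl/kstat/zfs/arcstats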
- Your pools already have two NVMe SSDs configured as cache (L2ARC, Level 2 Adaptive Replacement Cache) devices, which helps read performance. You can tune the L2ARC further by adjusting the l2arc_write_max, l2arc_write_boost, and l2arc_headroom module parameters. For example:
echo "options zfs l2arc_write_max=104857600" | sudo tee -a /etc/modprobe.d/zfs.conf
echo "options zfs l2arc_write_boost=104857600" | sudo tee -a /etc/modprobe.d/zfs.conf
echo "options zfs l2arc_headroom=2" | sudo tee -a /etc/modprobe.d/zfs.conf
echo 104857600 | sudo tee /sys/module/zfs/parameters/l2arc_write_max
echo 104857600 | sudo tee /sys/module/zfs/parameters/l2arc_write_boost
echo 2 | sudo tee /sys/module/zfs/parameters/l2arc_headroom
These settings raise the L2ARC fill rate above the fairly conservative defaults, which helps the cache warm up faster and can improve read hit rates over time (note that 2 is already the default for l2arc_headroom, so that line mainly documents the intent).
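To judge whether the L2ARC is actually paying off, you can watch its hit/miss and size counters, and the cache devices' I/O (counter names as in the OpenZFS arcstats kstat):
grep -E '^l2_(hits|misses|size) ' /proc/spl/kstat/zfs/arcstats
zpool iostat -v rpool 5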
- Consider disabling access time (atime) updates on your filesystems. This reduces disk I/O during reads, since the system no longer has to write metadata each time a file is accessed. You've already disabled atime on your datasets, so no further action is needed here.
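If you ever want to confirm this across all datasets at once:
zfs get -r -t filesystem atime,relatime rpool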
- For workloads with a mix of large and small files, a smaller record size can help. Currently your rpool datasets use a 1M record size. You can experiment with a smaller value, such as 128K or 256K, and see whether it improves performance for your specific workload. Keep in mind that changing the record size only affects newly written data. To change it for an existing dataset, run:
sudo zfs set recordsize=128K rpool/nixos/home
Replace rpool/nixos/home with other datasets as needed.
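As a baseline before experimenting, it can help to record the current record size and compression ratio per dataset, so you can see the effect as new data is written:
zfs get -r recordsize,compressratio rpool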
Please remember that these recommendations are just a starting point, and you might need to fine-tune these settings based on your specific use case and hardware. Always test these changes on a non-production system first to ensure there are no negative impacts on performance or stability.
Let me know if you have any questions or need further assistance.
I use NixOS. How would I add these configurations to /etc/nixos/configuration.nix ?
In NixOS, you can add the module-parameter tuning to /etc/nixos/configuration.nix using the boot.kernelParams option; parameters written as zfs.<name>=<value> on the kernel command line are applied when the zfs module loads (boot.extraModprobeConfig with the same "options zfs ..." lines is an equivalent alternative). Here's how you can modify your configuration.nix file:
- Add the ZFS ARC and L2ARC parameters as kernel options:
boot.kernelParams = [
"zfs.zfs_arc_max=8589934592"
"zfs.l2arc_write_max=104857600"
"zfs.l2arc_write_boost=104857600"
"zfs.l2arc_headroom=2"
];
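After rebooting into the new generation, you can confirm the parameters took effect (paths assume the zfs kernel module on Linux):
cat /sys/module/zfs/parameters/zfs_arc_max
cat /sys/module/zfs/parameters/l2arc_write_max
cat /sys/module/zfs/parameters/l2arc_write_boost
cat /sys/module/zfs/parameters/l2arc_headroom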
- Set the record size for the ZFS datasets:
ZFS dataset properties such as recordsize are stored in the pool itself rather than in configuration.nix, and recordsize is not a valid mount option, so it cannot be placed in a fileSystems options list. Instead, set it once with zfs set and it will persist across reboots; if you'd like it expressed declaratively, you could wrap the command in a small systemd oneshot service in configuration.nix.
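For example (this only needs to be run once; the property lives in the pool and survives reboots and nixos-rebuild switches):
sudo zfs set recordsize=128K rpool/nixos/home
zfs get recordsize rpool/nixos/home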
After updating your /etc/nixos/configuration.nix file, you will need to rebuild your NixOS configuration by running:
sudo nixos-rebuild switch
Please note that modifying the record size as shown above will only affect newly-written data. If you want to apply the new record size to existing data, you will need to create a new dataset with the desired record size, copy the data over, and then replace the old dataset with the new one.
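As a rough sketch of that approach (dataset and path names here are examples; do this while nothing is writing to /home, e.g. from a rescue or single-user environment):
sudo zfs create -o recordsize=128K -o mountpoint=/home_new rpool/nixos/home_new
sudo rsync -aHAX /home/ /home_new/
sudo zfs rename rpool/nixos/home rpool/nixos/home_old
sudo zfs set mountpoint=none rpool/nixos/home_old
sudo zfs rename rpool/nixos/home_new rpool/nixos/home
sudo zfs set mountpoint=/home rpool/nixos/home
# once you've verified the copy: sudo zfs destroy -r rpool/nixos/home_old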
Keep in mind that the provided examples are just starting points, and you might need to adjust the values based on your specific use case and hardware. Always test these changes on a non-production system first to ensure there are no negative impacts on performance or stability.
Any additional comments on the output of zfs get all rpool below?
~ bash> zfs get all rpool
NAME PROPERTY VALUE SOURCE
rpool type filesystem -
rpool creation Tue Sep 6 17:54 2022 -
rpool used 3.90T -
rpool available 4.90T -
rpool referenced 96K -
rpool compressratio 1.21x -
rpool mounted no -
rpool quota none default
rpool reservation none default
rpool recordsize 1M local
rpool mountpoint / local
rpool sharenfs off default
rpool checksum edonr local
rpool compression zstd-8 local
rpool atime off local
rpool devices on default
rpool exec on default
rpool setuid on default
rpool readonly off default
rpool zoned off default
rpool snapdir hidden default
rpool aclmode discard default
rpool aclinherit restricted default
rpool createtxg 1 -
rpool canmount off local
rpool xattr sa local
rpool copies 1 default
rpool version 5 -
rpool utf8only on -
rpool normalization formD -
rpool casesensitivity sensitive -
rpool vscan off default
rpool nbmand off default
rpool sharesmb off default
rpool refquota none default
rpool refreservation none default
rpool guid 14439948851242123522 -
rpool primarycache all default
rpool secondarycache all local
rpool usedbysnapshots 0B -
rpool usedbydataset 96K -
rpool usedbychildren 3.90T -
rpool usedbyrefreservation 0B -
rpool logbias latency default
rpool objsetid 54 -
rpool dedup off default
rpool mlslabel none default
rpool sync standard default
rpool dnodesize auto local
rpool refcompressratio 1.00x -
rpool written 96K -
rpool logicalused 4.73T -
rpool logicalreferenced 42K -
rpool volmode default default
rpool filesystem_limit none default
rpool snapshot_limit none default
rpool filesystem_count none default
rpool snapshot_count none default
rpool snapdev hidden default
rpool acltype posix local
rpool context none default
rpool fscontext none default
rpool defcontext none default
rpool rootcontext none default
rpool relatime on local
rpool redundant_metadata all default
rpool overlay on default
rpool encryption off default
rpool keylocation none default
rpool keyformat none default
rpool pbkdf2iters 0 default
rpool special_small_blocks 0 default
rpool nixos:shutdown-time Sat Apr 1 12:26:24 PM EDT 2023 local
Based on the output of zfs get all rpool, your configuration seems reasonable for your use case. Here are some additional comments:
- Compression: You are using zstd-8 compression, which is a good choice for general-purpose datasets; it offers a balance between compression ratio and CPU usage.
- Record size: You have set the record size to 1M, which suits large files and streaming workloads. For your gaming and mixed-use case, consider 128K, which can perform better for smaller files and random I/O.
- Checksum: You are using the edonr checksum algorithm, a non-default option gated behind the org.illumos:edonr pool feature. It is a strong hash but is slower than the default fletcher4 and historically has not been supported on every platform, which matters if you ever need to import the pool elsewhere. Unless you specifically want a cryptographic-strength checksum (e.g. for nopwrite or dedup), fletcher4 gives better compatibility and lower CPU cost.
- atime: Disabling access time updates (atime) is a good choice for improving performance, as it eliminates unnecessary write operations.
- relatime: relatime only changes behavior when atime is on, so with atime=off it has no effect here; leaving it enabled is harmless and would limit atime writes if atime were ever re-enabled.
- primarycache and secondarycache: Both are set to all, which is the default and suitable for most workloads. If you have specific requirements (e.g. caching only metadata on the L2ARC), you can experiment with these settings.
- xattr: You have extended attributes set to sa (system attributes), which stores xattrs with the file's dnode and generally performs better than the default; recommended on modern Linux systems.
- dnodesize: You have dnodesize set to auto, a good companion to xattr=sa, since it lets ZFS grow dnodes to hold the extra attribute data.
Overall, your configuration seems reasonable for your use case. If you decide to act on the earlier suggestions (record size and checksum algorithm), the example commands below show one way to apply them.
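For example, if you decide to adopt both suggestions (dataset names taken from your output above; both changes affect only newly written blocks):
sudo zfs set recordsize=128K rpool/nixos/home
sudo zfs set checksum=fletcher4 rpool
# any child dataset that sets checksum locally needs its own zfs set (or zfs inherit checksum <dataset>)
zfs get -r recordsize,checksum rpool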
Wouldn't setting the recordsize to 128k result in worse compression ratios though? I'm not sure if the implementation of the zstd algorithm in ZFS uses its "dictionary" feature, which might mitigate the negative effect on total compression ratio of reducing the recordsize. (Or does compression in ZFS operate by block? Or by record? I'm not sure.)
You're correct that setting the record size to 128K can result in a somewhat worse compression ratio than a larger record size. Compression in ZFS operates per record: for file data the record is ZFS's logical block, and each record is compressed independently, so smaller records give the compression algorithm less data to work with at a time.
However, it's important to balance compression ratio with the performance characteristics of your workload. A smaller record size like 128K can provide better performance for smaller files and random I/O, which might be more important for your use case with gaming and mixed workloads.
The zstd implementation in ZFS does not utilize the dictionary feature. It compresses data on a per-record basis without sharing dictionaries between records.
In summary, record size is a trade-off: a smaller value such as 128K may cost a little compression ratio but can deliver better performance for small-file and random I/O workloads, so weigh which matters more for your use case. If performance is the priority, 128K is a reasonable choice, and you can measure the actual impact on your own data with the rough experiment sketched below.
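Here's one way to measure it rather than guess (dataset names and the sample path are hypothetical; the test datasets are throwaway): copy the same directory into two datasets that differ only in record size and compare their compression ratios.
sudo zfs create -o recordsize=1M -o compression=zstd-8 -o mountpoint=/rs_test_1m rpool/rs_test_1m
sudo zfs create -o recordsize=128K -o compression=zstd-8 -o mountpoint=/rs_test_128k rpool/rs_test_128k
sudo cp -a /home/some_representative_dir/. /rs_test_1m/
sudo cp -a /home/some_representative_dir/. /rs_test_128k/
sync   # give ZFS a moment to commit the writes before comparing
zfs get -o name,value compressratio rpool/rs_test_1m rpool/rs_test_128k
sudo zfs destroy rpool/rs_test_1m
sudo zfs destroy rpool/rs_test_128k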