All About ZFS Architecture


Let’s look at how ZFS works from the inside. At the bottom there is a set of disks; above them sits an abstraction called a virtual device, or vdev in ZFS terminology. Vdevs come in different implementations: a mirror, which keeps identical copies of the data on two or more disks, and RAID-Z, similar in logic to classic RAID 5 and 6 (raidz1 with single parity, raidz2 with double, and raidz3 with triple). A new dRAID layout is also on the way, which we will talk about later.

How exactly data is stored and protected – with an emphasis on performance or on usable space – is the vdev’s responsibility. From a set of vdevs we assemble a shared pool. In classic mdadm terms, to get the equivalent of RAID 10 (a stripe over a set of mirrors), you create several mirror vdevs and combine them into one pool. Each vdev is, in effect, a separate virtual storage unit acting as a stripe member: within the pool, each unique block of data is stored on only one vdev.
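As a sketch, the mdadm RAID 10 analog described above would be assembled like this (the pool name `tank` and disk names `/dev/sda`–`/dev/sdd` are placeholders; adjust them to your system):

```shell
# Create a pool striped over two mirror vdevs (the RAID 10 analog).
zpool create tank mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd

# Show the resulting layout: two mirror vdevs inside one pool.
zpool status tank
```

Each `mirror` keyword starts a new vdev; ZFS then spreads unique blocks across the two mirrors.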

Logically, ZFS elements are divided into three subsystems:

  1. SPA (Storage Pool Allocator) is directly responsible for allocating blocks on the disks and storing data there. This layer decides where a particular data block is placed, while abstracting away the disks themselves: when we access it, we see a single address space, regardless of the specific set of vdevs underneath.
  2. DMU (Data Management Unit) – at this level, ZFS looks like ordinary object storage. There are implementations that use it, with some modifications, on its own: for example, the Lustre distributed file system builds its own layer on top of the ZFS DMU.
  3. DSL (Dataset and Snapshot Layer) is the consumer of this object storage. This component deals directly with file systems and snapshots, and with the logic that implements a POSIX-compatible file system (it includes the ZPL – ZFS POSIX Layer).
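At the DSL level, these objects surface as ordinary file systems (datasets) and snapshots. A minimal sketch, assuming a pool named `tank` already exists:

```shell
# Create a file system (dataset) inside the pool.
zfs create tank/projects

# Take an instant copy-on-write snapshot of its current state.
zfs snapshot tank/projects@before-upgrade

# List the datasets and snapshots managed by the DSL.
zfs list -t all -r tank
```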

ZFS also includes other subsystems that work together with these layers to make them effective.

ARC (Adaptive Replacement Cache): a cache developed to solve the read problem.

ARC is remarkable in that it tracks not only the most recently used objects (an LRU list) but also the most frequently used ones (an MFU list), and balances its caching between the two.

The classic Linux page cache has an eviction problem: if you read a file larger than the amount of RAM, the old data gets evicted from the cache, since by default the whole file is pulled through the page cache.

ARC is an intelligent replacement for the page cache. When ZFS was created, hard drives with low IOPS were the norm. Because of copy-on-write, reads tend to turn into random operations; various tricks are used to speed them up – for example, accumulating data and writing it out in large blocks – but these optimizations do not always work, so smart caching is needed. Typically, during normal operation, about 99% of read requests should hit the cache; if the hit rate is lower, something is wrong and it is worth adding RAM.
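On Linux with OpenZFS, the cumulative ARC hit rate can be estimated from the kernel’s arcstats counters (a rough sketch; if the `arc_summary` tool is installed, it gives a friendlier report of the same data):

```shell
# Pull cumulative hit/miss counters from the ARC kstat file.
hits=$(awk '$1 == "hits" {print $3}' /proc/spl/kstat/zfs/arcstats)
misses=$(awk '$1 == "misses" {print $3}' /proc/spl/kstat/zfs/arcstats)

# In healthy steady-state operation this should be around 99%.
echo "ARC hit rate: $(( hits * 100 / (hits + misses) ))%"
```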

If the working set does not always fit entirely in RAM, the cache can be extended onto a faster separate SSD; this second tier is called L2ARC (Level 2 ARC).
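Adding an L2ARC device is a one-line operation (a sketch, assuming a pool named `tank` and a spare NVMe drive at `/dev/nvme0n1`):

```shell
# Attach a fast SSD as a second-tier read cache (L2ARC).
zpool add tank cache /dev/nvme0n1

# The device shows up in a "cache" section of the pool layout.
zpool status tank
```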

ZIL (ZFS Intent Log): data is written to ZFS in transactions, and a transaction is a set of expensive operations: computing checksums, building the block tree, and writing metadata several times to different locations on disk for safety. ZFS therefore tries to fill each transaction with as much data as possible. Here (surprise) a special kind of log appears, which is indispensable when fast synchronous writes are needed and latency is critical. What is unusual is that it is exposed as a separate entity, which allows different devices to be used for persistent storage of the synchronous-write stream. This log is usually tiny, and it is the ZIL (ZFS Intent Log) that writes it.
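Because the intent log is a separate entity, it can be placed on its own fast device, commonly called a SLOG. A sketch, assuming a pool named `tank` and two spare NVMe drives (mirroring the log device is a common precaution, so a single device failure cannot lose in-flight synchronous writes):

```shell
# Place the ZIL on a dedicated mirrored low-latency device.
zpool add tank log mirror /dev/nvme0n1 /dev/nvme1n1

# Synchronous write behavior is controlled per dataset
# via the "sync" property (standard / always / disabled).
zfs get sync tank
```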

ARC and ZIL – and in particular the dedicated cache and log devices, which are technically optional – are necessary for high storage performance; without them, the system will run slower. In production, ZFS is most often used for large storage installations: the architecture is designed to make efficient use of many HDDs and SSDs, plenty of RAM, and CPU cores.
