How to determine whether enabling ZFS deduplication, which removes redundant data from ZFS file systems, will save you disk space without reducing performance.
What Is ZFS Deduplication?
In Oracle Solaris 11, you can use the deduplication (dedup) property to remove redundant data from your ZFS file systems. If a file system has the dedup property enabled, duplicate data blocks are removed as they are written to disk. The result is that only unique data is stored on disk and common components are shared between files, as shown in Figure 1.
Figure 1. Only Unique Data Is Stored on Disk
In some cases, deduplication can result in savings in disk space usage and cost. However, you must consider the memory requirements before enabling the dedup property. Also consider whether enabling compression on your file systems instead would be a good way to reduce disk space consumption.
Use the following steps to enable deduplication. Note that it is important to perform the first two steps before attempting to use deduplication.
Step 1: Determine Whether Your Data Is Dedup-able
Determine if your data would benefit from deduplication space savings by using the ZFS debugging tool, zdb. If your data is not “dedup-able,” there is no point in enabling dedup.
Deduplication is performed using checksums. If a block has the same checksum as a block that is already written to the pool, it is considered to be a duplicate and, thus, just a pointer to the already-stored block is written to disk.
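To make the checksum idea concrete, here is a toy sketch that slices a single file into fixed-size blocks, checksums each one, and reports total blocks divided by unique blocks. It is not how ZFS is implemented: the function name block_dedup_ratio is hypothetical, POSIX cksum (a CRC) stands in for the SHA-256 checksum that ZFS uses for deduplication by default, and mktemp -d is assumed to be available.

```shell
# Toy model only: ZFS checksums blocks inside the pool; this sketch works on
# fixed-size slices of one non-empty file and uses a CRC as a stand-in checksum.
block_dedup_ratio() {
    file=$1
    bs=${2:-131072}                           # 128 KB, the default ZFS recordsize
    tmp=$(mktemp -d) || return 1
    split -a 4 -b "$bs" "$file" "$tmp/blk."   # one temporary file per block
    for blk in "$tmp"/blk.*; do
        cksum < "$blk"                        # prints "<crc> <byte count>"
    done > "$tmp/sums"
    total=$(wc -l < "$tmp/sums")
    unique=$(sort -u "$tmp/sums" | wc -l)
    rm -rf "$tmp"
    # Blocks with the same checksum would be stored once, so the best-case
    # ratio is total blocks divided by unique blocks.
    awk -v t="$total" -v u="$unique" 'BEGIN { printf "%.2f\n", t / u }'
}
```

For example, block_dedup_ratio /path/to/file 131072 prints a single ratio for that file. Treat the result as an illustration of the principle rather than a prediction for a real pool, which zdb simulates far more accurately.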
Therefore, the process of trying to deduplicate data that cannot be deduplicated simply wastes CPU resources. ZFS deduplication is in-band. This means that deduplication occurs when you write data to disk and impacts both CPU and memory resources.
As a rule of thumb, if the estimated deduplication ratio is greater than 2, you might see deduplication space savings. In the example shown in Listing 1, the deduplication ratio (1.00) is less than 2, so enabling dedup is not recommended.
# zdb -S tank
Simulated DDT histogram:

bucket             allocated                        referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    1.00M    126G    126G    126G    1.00M    126G    126G    126G
     2    11.8K    573M    573M    573M    23.9K   1.12G   1.12G   1.12G
     4      370    418K    418K    418K    1.79K   1.93M   1.93M   1.93M
     8      127    194K    194K    194K    1.25K   2.39M   2.39M   2.39M
    16       43   22.5K   22.5K   22.5K      879    456K    456K    456K
    32       12      6K      6K      6K      515    258K    258K    258K
    64        4      2K      2K      2K      318    159K    159K    159K
   128        1     512     512     512      200    100K    100K    100K
 Total    1.02M    127G    127G    127G    1.03M    127G    127G    127G

dedup = 1.00, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.0
Listing 1: Determining the Deduplication Ratio
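If you want to script the check against the threshold of 2, a small filter over the zdb output might look like the following sketch. The function name check_dedup_ratio is hypothetical, and the parsing assumes the summary line has the same "dedup = N.NN, ..." layout as in Listing 1 and that a POSIX-compatible awk is on the PATH.

```shell
# Reads "zdb -S <pool>" output on stdin and reports whether the simulated
# dedup ratio exceeds a threshold (default 2, the rule of thumb above).
# Exit status: 0 = ratio above threshold, 1 = not above, 2 = no summary line.
check_dedup_ratio() {
    awk -v min="${1:-2}" '
        /^ *dedup = / { ratio = $3; sub(/,$/, "", ratio); seen = 1 }
        END {
            if (!seen) { print "no dedup summary line found"; exit 2 }
            ok = (ratio + 0 > min + 0)
            printf "dedup ratio %.2f: %s\n", ratio, (ok ? "worth considering" : "not recommended")
            exit (ok ? 0 : 1)
        }'
}

# Usage (as root, on a system with the pool imported):
#   zdb -S tank | check_dedup_ratio
```

The exit status makes the function usable in a larger script, for example as a guard before any step that enables the property.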
Step 2: Determine Whether Your System Has Enough Memory to Support Deduplication Operations
This step is critical because deduplication tables consume memory and eventually spill over and consume disk space. At that point, ZFS has to perform extra read and write operations for every block of data on which deduplication is attempted, which causes a reduction in performance.
Furthermore, the cause of the performance reduction is difficult to determine if you are unaware that deduplication is active and can have adverse effects. A system that has large pools but little memory does not perform deduplication well. Some operations, such as removing a large file system with dedup enabled, severely decrease system performance if the system doesn’t meet the memory requirements.
Calculate the memory requirement as follows:
- Each in-core deduplication table (DDT) entry is approximately 320 bytes.
- Multiply the number of allocated blocks by 320.
Here’s an example using the data from the zdb output in Listing 1:
In-core DDT size: 1.02M allocated blocks x 320 bytes = approximately 326.4 MB of memory required.
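The same arithmetic can be scripted. The sketch below is one way to do it; the function name ddt_memory_mb is hypothetical, the 320-byte entry size is the approximation given above, and the K/M/G suffixes printed by zdb are assumed to be powers of 1024.

```shell
# Estimate in-core DDT memory in MB from the allocated block count.
#   $1 = allocated blocks, as a plain number or with a K/M/G suffix (e.g. 1.02M)
#   $2 = bytes per DDT entry (optional, defaults to the approximate 320)
ddt_memory_mb() {
    awk -v n="$1" -v entry="${2:-320}" 'BEGIN {
        mult = 1
        suffix = substr(n, length(n))
        if (suffix == "K")      mult = 1024
        else if (suffix == "M") mult = 1024 * 1024
        else if (suffix == "G") mult = 1024 * 1024 * 1024
        blocks = (n + 0) * mult               # n + 0 keeps only the numeric part
        printf "%.1f\n", blocks * entry / (1024 * 1024)
    }'
}

# Usage: take the allocated "blocks" value from the Total row of zdb -S output.
#   ddt_memory_mb "$(zdb -S tank | awk '$1 == "Total" { print $2 }')"
```

With the value from Listing 1, ddt_memory_mb 1.02M reproduces the 326.4 MB figure. Compare the result with the memory your system can realistically spare, because the table competes with everything else held in memory.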
Step 3: Enable the dedup Property
Be sure that you enable dedup only for file systems that have dedup-able data, and ensure your systems have enough memory to support dedup operations.
Deduplication is easily enabled on a file system, for example:
# zfs set dedup=on mypool/myfs
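If you prefer to gate this step in a script, the following sketch prints the enable command only when both earlier checks pass. Everything here is illustrative: dedup_gate is a hypothetical helper, the memory budget is a figure you choose yourself, and the function deliberately prints the command instead of running it.

```shell
# Hypothetical pre-flight check: print the enable command only if both
# constraints from Steps 1 and 2 are satisfied.
#   $1 = file system, $2 = simulated dedup ratio (from zdb -S),
#   $3 = estimated DDT size in MB, $4 = memory in MB you can spare for the DDT
dedup_gate() {
    awk -v fs="$1" -v ratio="$2" -v need="$3" -v budget="$4" 'BEGIN {
        if (ratio + 0 <= 2) {
            print "skip: dedup ratio " ratio " is not greater than 2"
            exit 1
        }
        if (need + 0 > budget + 0) {
            print "skip: DDT needs " need " MB but only " budget " MB is budgeted"
            exit 1
        }
        print "zfs set dedup=on " fs          # printed for review, not executed
    }'
}

# Usage:
#   dedup_gate mypool/myfs 2.5 326.4 4096
```

Printing the command keeps the final decision with the administrator, which suits a property whose effects are hard to diagnose after the fact.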
Conclusion
After you evaluate the two constraints on deduplication, the deduplication ratio and the memory requirements, you can make a decision about whether to implement deduplication and what the likely savings will be.
About the Author
This article was originally written by Dominic Kay and was updated by Cindy Swearingen, Oracle Solaris Product Manager.