We wanted to test how well Windows Server 2012 R2 data deduplication worked with our set of 83 VHDs totaling 7.09TB. We did not want to test on our production disaster recovery setup, so we needed a separate test environment.
We installed an extra 4TB Western Digital Red Pro in our existing Server 2012 R2 DR server and copied over the first 3.5TB of virtual machines. We left the system to dedup overnight, then copied more virtual machines on. After a few cycles of this, our 7.09TB of VHDs fit onto the 4TB drive using just 3.07TB of space.
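As a quick sanity check on those figures, the savings work out to roughly a 2.3:1 dedup ratio (the numbers below are just the ones quoted above, run through the obvious arithmetic):

```python
# Back-of-the-envelope check of the dedup savings described above.
original_tb = 7.09   # total size of the 83 VHDs
stored_tb = 3.07     # space actually used after deduplication

savings = 1 - stored_tb / original_tb   # fraction of space saved
ratio = original_tb / stored_tb         # dedup ratio

print(f"Space saved: {savings:.0%}")    # ~57%
print(f"Dedup ratio: {ratio:.2f}:1")    # ~2.31:1
```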
Windows performs deduplication when the system is not busy. The "chunks" are taken out of each file and placed in System Volume Information\Dedup\ChunkStore. Files on the hard drive become reparse points that link to the relevant data within the ChunkStore. Because of this, you can easily tell when deduplication is complete: a 200GB file shows as using "0 bytes" on disk, since all of its data now lives in the ChunkStore.
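The underlying idea is simple, even though Microsoft's actual implementation is far more sophisticated (it uses variable-size chunking rather than the fixed-size chunks shown here). This toy sketch shows the concept: unique chunks go into a shared store keyed by hash, and each file is reduced to a list of chunk references:

```python
import hashlib

CHUNK = 8        # toy chunk size; the real feature uses much larger, variable-size chunks

chunk_store = {}  # hash -> chunk bytes (stands in for the on-disk ChunkStore)

def dedup(data: bytes) -> list:
    """Replace a file's contents with a list of chunk hashes."""
    refs = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        h = hashlib.sha256(chunk).hexdigest()
        chunk_store.setdefault(h, chunk)   # each unique chunk is stored once
        refs.append(h)
    return refs

def rehydrate(refs: list) -> bytes:
    """Rebuild the original file from its chunk references."""
    return b"".join(chunk_store[h] for h in refs)

# Two "VHDs" that share most of their content dedup well:
vhd_a = dedup(b"AAAAAAAABBBBBBBBCCCCCCCC")
vhd_b = dedup(b"AAAAAAAABBBBBBBBDDDDDDDD")
print(len(chunk_store))   # 4 unique chunks stored instead of 6
```

This is also why a deduplicated file reports "0 bytes" of its own: everything it once held is reachable only through references into the shared store.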
We next ran our first synchronization over the top of the deduplicated data. The files grew slightly beyond "0 bytes", and soon after the sync completed, deduplication started again.
Once we started loading the disk up with lots of simultaneous synchronizations, we noticed a significant performance drop when hashing deduplicated files. With deduplication on a large mechanical disk, the capacity-to-read-performance ratio is very poor, so the synchronization was unable to complete on our test set of data.
This test failed because we had set unreasonable performance expectations. Even at 130MB/s, which is about as fast as the test disk runs in ideal circumstances, hash checking all of this data would take at least 16 hours. We were seeing disk speeds of around 40MB/s, far too slow. In the future we will continue testing Data Deduplication on a RAID 10 array.
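The 16-hour figure follows directly from the data set size and the disk speed (treating 7.09TB as 7.09 × 1024 × 1024 MB; exact hours shift slightly with decimal vs binary units):

```python
# Rough time-to-hash estimate behind the numbers above.
total_mb = 7.09 * 1024 * 1024   # 7.09TB of VHDs, treating 1TB as 1024*1024 MB

def hash_hours(throughput_mb_s: float) -> float:
    """Hours needed to read (and hash) the full data set at a given speed."""
    return total_mb / throughput_mb_s / 3600

print(f"{hash_hours(130):.1f} h")   # ideal sequential speed -> ~15.9 hours
print(f"{hash_hours(40):.1f} h")    # observed speed -> ~51.6 hours
```

At the roughly 40MB/s we actually saw, a full hash pass balloons to over two days, which is why the sync could never finish.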