In this second part of VMware vSAN Best Practices, I will focus on some of the cluster configuration aspects, the way things work, and the “dos & don’ts“, illustrated with scenarios. If you haven’t done so yet, I recommend that you read the first part of VMware vSAN Best Practices.
Let’s begin with an important topic in the planning phase of your vSAN Hyper-Converged Infrastructure, the disk groups’ layout.
We already know that a vSAN Disk Group consists of 1 Flash Device (cache) and 1 to 7 Capacity Device(s). In addition, a vSAN host can have up to 5 Disk Groups. With that in mind, as a Best Practice, VMware recommends at least two disk groups.
Working with multiple disk groups enables:
- Increase in performance
- More room for IO (caching)*
- Better tolerance to failure
*The current implementation of vSAN has a functional limit of 600 GB for the caching device. This means using a larger device won’t increase the write buffer capacity; it will, however, improve wear leveling and longevity.
Therefore, a good starting point (again, depending on the needs) would be 2 Disk Groups, each with 1 cache device + 3 capacity devices.
In addition to processing IO in parallel, the second cache device provides a better chance of absorbing longer bursts of writes during the de-staging period without running into congestion. Apart from this, the host can still serve IO if one Disk Group fails.
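To make the trade-off concrete, here is a minimal Python sketch comparing one large disk group against the recommended two smaller ones. The device sizes are hypothetical examples, not a recommendation:

```python
# A minimal sketch (hypothetical sizes) showing why two disk groups help:
# each disk group contributes its own write buffer and fails independently.

CACHE_BUFFER_LIMIT_GB = 600  # functional write-buffer limit per cache device

def host_layout(disk_groups, cache_gb, capacity_devices, capacity_gb):
    """Summarize raw capacity and effective write buffer for one host."""
    raw_capacity = disk_groups * capacity_devices * capacity_gb
    # A larger cache device only helps endurance, not buffer size:
    effective_buffer = disk_groups * min(cache_gb, CACHE_BUFFER_LIMIT_GB)
    return raw_capacity, effective_buffer

# One big disk group vs. the recommended two smaller ones:
one_dg = host_layout(disk_groups=1, cache_gb=800, capacity_devices=6, capacity_gb=1920)
two_dg = host_layout(disk_groups=2, cache_gb=800, capacity_devices=3, capacity_gb=1920)

print(one_dg)  # (11520, 600)  -> same raw capacity, single 600 GB buffer
print(two_dg)  # (11520, 1200) -> same raw capacity, 1200 GB aggregate buffer
```

Same raw capacity either way; the two-disk-group layout simply ends up with twice the aggregate write buffer and an independent failure domain.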
A vSAN cluster can scale up or scale out to accommodate growth in the way that best reflects the needs of the organization. However, don’t forget you can design your vSAN hosts for future scale-up expansion without having to add any host or license…
More details on scale up expansion: blogs.vmware.com
Deduplication & Compression are a great opportunity to save space when working with an All-Flash vSAN configuration and Advanced licensing. To begin with, the first thing to consider is that it is handled globally at the cluster level.
Secondly, it is interesting to know that at least 70% of customers who enabled Deduplication & Compression see 2x or more savings. On the other hand, there are still workloads that don’t benefit from efficiency savings, such as encrypted data.
The vSAN deduplication algorithm uses 4 KB fixed blocks, which is a significant advantage compared to other solutions. Indeed, a larger block size makes it more challenging to find a match to deduplicate against.
Also, deduplication occurs on a per-disk-group basis. This leads to the recurring question: “Why doesn’t it use Global Deduplication like some other vendors?”
In fact, Global Deduplication requires maintaining a global metadata table, which naturally grows larger and becomes more vulnerable to global corruption should a problem occur… VMware has chosen to work at the disk group level to maintain the best possible balance of benefits versus cost.
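As a toy illustration of why block size matters (this is not vSAN’s implementation, just a few lines of Python on made-up data), a fixed-block deduplicator simply hashes fixed-size blocks and keeps one copy per unique hash. The smaller the block, the more likely two blocks match:

```python
import hashlib

def dedup_ratio(data: bytes, block_size: int) -> float:
    """Logical block count divided by unique block count at a given block size."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)

# 256 KB of a repeating 4 KB pattern, with one byte changed in each 64 KB
# region, so every 64 KB stretch differs slightly from the others.
pattern = bytes(range(256)) * 16             # one 4 KB block
data = bytearray(pattern * 64)               # 256 KB total
for i in range(4):
    data[i * 64 * 1024] = 255 - i            # a single differing byte per region
data = bytes(data)

print(dedup_ratio(data, 4 * 1024))    # 12.8 -> most 4 KB blocks still match
print(dedup_ratio(data, 64 * 1024))   # 1.0  -> no 64 KB block matches another
```

One changed byte “poisons” an entire 64 KB block, while at 4 KB granularity the other 15 blocks in that region still deduplicate; that is the advantage of the smaller fixed block.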
Now, you might be concerned about balancing space savings against performance impact. In most cases the performance penalty is negligible compared to the space savings, but as always it is a matter of balance.
The reason for such an impact is an extra step in the IO process for deduplication.
As a result, in write-intensive situations there is a potential to fill up the buffer (remember, the write buffer from Part I). Consequently, it takes a little longer for those writes to be de-staged to the capacity (flash devices) tier.
In short, should I stay or should I go? In this case, I would say: it depends. Indeed, the best way to evaluate the gain versus the impact is still to enable it and monitor very closely, for instance in a POC initiative.
To sum up, every environment is different, so the savings are truly difficult to evaluate. Needless to say, if your workload is composed of 80% databases and analytics, it’s probably not worth enabling.
However, if you have a majority of VDI and virtualised servers in addition to some unstructured data, you may consider enabling it and be surprised by the gains.
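As a rough illustration of why the workload mix matters, here is a back-of-the-envelope Python sketch. The per-workload ratios (1.1x for databases/analytics, 3x for VDI) are made-up examples for the calculation, not vSAN figures:

```python
# Back-of-the-envelope blended savings for a workload mix.
# Each entry is (fraction_of_logical_data, savings_ratio); the ratios here
# are hypothetical examples, not measured vSAN numbers.

def blended_ratio(mix):
    """Physical space is sum(fraction / ratio); blended savings is its inverse."""
    physical = sum(fraction / ratio for fraction, ratio in mix)
    return 1 / physical

# Mostly databases/analytics (poor dedup) vs. mostly VDI (great dedup):
db_heavy  = blended_ratio([(0.8, 1.1), (0.2, 3.0)])
vdi_heavy = blended_ratio([(0.2, 1.1), (0.8, 3.0)])

print(round(db_heavy, 2))   # 1.26 -> probably not worth the overhead
print(round(vdi_heavy, 2))  # 2.23 -> likely worth enabling
```

The arithmetic shows why a blanket recommendation is impossible: the same feature yields very different cluster-wide ratios depending on what fraction of the data deduplicates well.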
vSAN includes a built-in Data-at-Rest Encryption (D@RE) feature, which is a per-cluster setting supporting Hybrid, All–Flash and stretched clusters.
It doesn’t require self-encrypting drives and works with all other vSAN services.
Encryption is the final step of the write IO process, just before the blocks land in Cache or Capacity tier (data in flight is still transmitted unencrypted):
Write IO to buffer:
- Write IO broken into 64 KB chunks
- Checksum performed on 4 KB blocks
- Encryption performed on 4 KB blocks
- Lands in buffer

De-staging to capacity tier:
- Decryption performed on 4 KB blocks
- Dedupe performed on 4 KB blocks
- Compression performed on 4 KB blocks
- Encryption performed on 2-4 KB blocks
- Lands in capacity tier
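The ordering above can be sketched as runnable Python. The transforms are toy stand-ins (an XOR “cipher” and zlib, not vSAN’s real crypto or compression), but the sequence of steps matches the list:

```python
import zlib

# Toy stand-ins (NOT vSAN's real crypto/compression) so the flow is runnable:
def checksum(block):  return zlib.crc32(block)
def encrypt(block):   return bytes(x ^ 0xAA for x in block)  # toy XOR "cipher"
def decrypt(block):   return bytes(x ^ 0xAA for x in block)
def compress(block):  return zlib.compress(block)

CHUNK = 64 * 1024
BLOCK = 4 * 1024

def write_to_buffer(io: bytes):
    """Ingest path: 64 KB chunks -> 4 KB checksum -> 4 KB encrypt -> buffer."""
    buffer = []
    for i in range(0, len(io), CHUNK):
        chunk = io[i:i + CHUNK]
        for j in range(0, len(chunk), BLOCK):
            block = chunk[j:j + BLOCK]
            checksum(block)                 # checksum on 4 KB blocks
            buffer.append(encrypt(block))   # encrypted before landing in cache
    return buffer

def destage(buffer):
    """De-stage path: decrypt -> dedupe -> compress -> re-encrypt -> capacity."""
    capacity, seen = [], set()
    for enc in buffer:
        plain = decrypt(enc)                # decrypt first, so dedupe and
        if plain in seen:                   # compression see plaintext blocks
            continue
        seen.add(plain)
        capacity.append(encrypt(compress(plain)))
    return capacity

# Two identical 4 KB blocks: the buffer holds both, capacity keeps only one.
buf = write_to_buffer(bytes(8192))
print(len(buf), len(destage(buf)))   # 2 1
```

The decrypt step in the middle is the key point: it is why D@RE can coexist with Deduplication & Compression, since those services operate on plaintext blocks before the data is re-encrypted for the capacity tier.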
As a Best Practice, or should I say a mandatory practice, the KMS server should not be stored inside the vSAN cluster.
Extract from: https://blogs.vmware.com:
Specific to vSAN though, it is important to keep the KMS external to the vSAN datastore it is providing key management for.
If the KMS resides on the datastore it is providing key management for, a circular dependency can occur.
Hosts in a vSAN cluster that has vSAN Encryption enabled will directly contact the KMS they are assigned to upon boot up to unlock/mount disk groups.
Consider the following scenario:
- KMS resides on a vSAN cluster that has vSAN Encryption enabled.
- Hosts that have KMS disks for a virtualized KMS appliance lose power. The KMS is then not accessible.
- Those hosts are rebooted, and attempt to connect to the (now unavailable) KMS appliance.
- The previously failed vSAN hosts will boot, but will not unlock or mount the disk groups.
- The KMS appliance’s disks are still not available and will not be.
It is important to remember that a KMS appliance should not be stored on the vSAN datastore that it is providing keys for. This is not a supported configuration.
Green is not enough
As a rule of thumb, you always want to verify the health of your components before performing tasks such as upgrades.
It is a common, general Best Practice, to check if everything is “green”, running as it should before considering any change.
For the most part, vSAN is actually very good at testing and reporting this for you, telling you what’s happening in a very intelligent way.
However, there are tricky situations where a closer upstream monitoring is an absolute necessity as vSAN Health is a little “selfish”.
Consider the following scenario:
A vSAN stretched cluster, regardless of the number of hosts, with a native Active/Passive uplink port policy – a common type of configuration.
In that case, if for whatever reason the active uplink suffers from a high CRC error rate, you might start to see a substantial decrease in your overall stretched cluster performance, caused by continuously soaring latency (hundreds of milliseconds).
Indeed, incomplete or corrupted Layer 2 frames (due to a bad NIC, switch, cable, SFP…) are dropped, showing up as input and output errors on the switch ports.
As a result, vSAN core elements (VMkernel, DOM, etc.) might start to max out their CPU threads dealing with retransmits or waiting for ACKs. This kind of snowball effect can lead to an almost inoperable environment if not detected soon enough.
Why doesn’t vSAN use the second (or any other) passive link in such a situation?
To clarify, vSAN doesn’t do its own path failover; it is in fact at the mercy of the vSphere networking stack to fail over.
With that in mind, vSphere only understands link failure; it cannot detect high CRC or other Layer 2 error rates… Thus, it cannot act accordingly, which can lead to the situation described above.
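A minimal sketch of that decision logic (assumed behavior based on the description above, not actual vSphere code):

```python
# Illustrative only: why an Active/Passive teaming policy doesn't help here.
# The failover decision reacts to link state only, so an error counter,
# however high, never triggers a switch to the passive uplink.

def select_uplink(active, passive):
    """Simplified Active/Passive decision: only a hard link-down fails over."""
    if active["link_up"]:
        return active["name"]    # stays put even with a soaring error counter
    return passive["name"]

flaky  = {"name": "vmnic0", "link_up": True, "crc_errors": 98213}
backup = {"name": "vmnic1", "link_up": True, "crc_errors": 0}

print(select_uplink(flaky, backup))   # vmnic0 -> traffic stays on the bad NIC
```

As long as the flaky NIC reports link-up, traffic keeps flowing through it; the healthy passive uplink is only used on a hard link failure.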
All this to say, this can be avoided with a strong upfront monitoring strategy.
For instance, involve the networking teams to integrate SNMP monitoring with your vROps instance. Another thing to strongly consider on switches is the use of SYSLOG. Apart from this, ship the logs to vRLI and run a search for “err*, crit*, warn*”… Then build a dashboard with specific thresholds for the errors you’ve found.
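As a sketch of that search-and-threshold idea in Python (the sample log lines and the threshold value are made-up examples):

```python
import re

# Grep switch syslog output for error/critical/warning entries and alert past
# a threshold. The pattern mirrors the "err*, crit*, warn*" search above.
PATTERN = re.compile(r"\b(err\w*|crit\w*|warn\w*)\b", re.IGNORECASE)
CRC_THRESHOLD = 50   # hypothetical: alert past 50 matching lines per interval

def scan(lines):
    """Return the matching lines and whether the alert threshold was crossed."""
    hits = [line for line in lines if PATTERN.search(line)]
    return hits, len(hits) > CRC_THRESHOLD

sample = [
    "Jan 10 10:01:02 sw01 %ETH-4-ERR_DISABLE: CRC error threshold exceeded",
    "Jan 10 10:01:05 sw01 %SYS-5-CONFIG_I: Configured from console",
]
hits, alert = scan(sample)
print(len(hits), alert)   # 1 False
```

In practice you would let vRLI run this kind of query continuously and drive the dashboard threshold from it, rather than scanning files by hand.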
Finally, as a Best Practice, I would strongly recommend closely monitoring all aspects of vSAN. To illustrate that, as written in Part I, the networking layer is a foundation of vSAN, yet it is maybe not your first line of investigation when your disk read/write latency spikes.
In short, the final words will be “it depends“. Again, there are “standard” best practices, but what we’ve seen above are some examples of the “Dos and Don’ts“.
Next week, for the final VMware vSAN Best Practices Part III, we’ll dive a little bit deeper into SPBM – Storage Policy Based Management.