In this third and final part of the vSAN Best Practices Series, let’s dive a little deeper into SPBM and advanced vSAN settings. We will also detail the potential impact of Storage Policy changes and how to better deal with them.
SPBM – Storage Policy Based Management
SPBM is VMware’s policy framework for vSAN and vVols. A policy describes a desired class of service (configuration, availability, QoS, striping, etc.). Policies can be assigned at the VM level or per virtual disk (VMDK), including the VMDKs that back container persistent volumes, depending on your needs. Creating a storage policy is nothing more than defining your requirements for an object.
Levels of protection and performance are no longer reliant upon the capabilities of an external storage array, but are controlled and managed by the hypervisor.
SPBM enables the Policy Based Data Placement option, which lets you control where your storage workload runs, for instance:
- Dual Site Mirroring
- None – Keep on preferred
- None – Keep on non-preferred
How much capacity does it cost?
The examples below consider a stretched cluster without site disaster tolerance.
Obviously, it all depends on criteria such as the Failure Tolerance Method (RAID-1 or RAID-5/6) and the Failures To Tolerate (FTT, the number of host failures you can withstand).
The default and most common option is RAID-1, FTT=1. This mirroring Failure Tolerance Method places one copy of an object on one host, another copy on a second host, and a witness component on a third host acting as tiebreaker. A 100GB object will cost 200GB:
Let’s now increase our FTT to 2. We can now withstand two failures: two hosts or two disks going down at the same time. It is still a “RAID-1” Failure Tolerance Method, meaning full copies of the object are spread across hosts, but instead of two copies we now have three, plus two extra witness components to determine quorum. A 100GB object will cost 300GB:
This time, to save capacity, we decide to go with RAID-5 erasure coding. Our data components are striped across multiple hosts, with parity information written to tolerate one failure. With this configuration, a 100GB RAID-5 object will cost 133GB of disk space:
Indeed, vSAN RAID-5 erasure coding consumes roughly a third less capacity than RAID-1 at FTT=1 (133GB instead of 200GB). Using RAID-5 with FTT=1 means that the minimum number of hosts in the cluster is four. A four-host configuration tolerates the failure of one host without data loss, but redundancy is lost until that host comes back. The best practice to maintain full redundancy after the failure of one host is to opt for a minimum of five hosts.
Finally, a RAID-6 configuration, which tolerates two failures without data loss (FTT=2), requires six hosts but only costs 150GB to store 100GB, making it the most capacity-efficient way to get FTT=2. RAID-6 is nothing more than a dual-parity version of the erasure coding scheme used in RAID-5. It offers a guaranteed 50% saving in raw capacity compared to RAID-1 at the same FTT:
- 100GB object consumes 300GB with RAID-1, FTT=2
- 100GB object consumes 150GB with RAID-6, FTT=2
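The capacity figures above can be reproduced with a few lines of Python. This is only a sketch: the multipliers come from the examples in this article, the 3+1 and 4+2 layouts for RAID-5 and RAID-6 are stated as assumptions, the function names are made up for illustration, and the (very small) witness components are ignored.

```python
from fractions import Fraction

# Raw-capacity multipliers per (method, FTT), matching the examples above.
# Assumption: RAID-5 is laid out as 3 data + 1 parity, RAID-6 as 4 data + 2 parity.
OVERHEAD = {
    ("RAID-1", 1): Fraction(2),     # two full copies
    ("RAID-1", 2): Fraction(3),     # three full copies
    ("RAID-5", 1): Fraction(4, 3),  # 3 data + 1 parity
    ("RAID-6", 2): Fraction(3, 2),  # 4 data + 2 parity
}


def raw_capacity_gb(size_gb, method, ftt):
    """Raw capacity consumed by an object of size_gb (witnesses ignored)."""
    return float(size_gb * OVERHEAD[(method, ftt)])


def savings_vs_mirror(method, ftt):
    """Fraction of raw capacity saved compared to RAID-1 at the same FTT."""
    return float(1 - OVERHEAD[(method, ftt)] / OVERHEAD[("RAID-1", ftt)])


if __name__ == "__main__":
    for method, ftt in OVERHEAD:
        used = raw_capacity_gb(100, method, ftt)
        print(f"{method} FTT={ftt}: 100GB object consumes {used:.0f}GB")
```

Comparing each erasure coding policy against RAID-1 at the same FTT keeps the comparison fair: RAID-5 is measured against a two-copy mirror, RAID-6 against a three-copy mirror.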
Nothing comes for free: there are tradeoffs to consider with erasure coding. As shown in the examples above, RAID-5 requires at least four hosts and RAID-6 at least six, to satisfy FTT=1 and FTT=2 respectively.
Erasure coding also introduces write amplification, as parity components must be updated whenever data components are written. Indeed, a single write operation results in 2 reads and 2 writes in RAID-5, and in 3 reads and 3 writes in RAID-6 (due to double parity). That’s one reason why vSAN requires all-flash nodes to support erasure coding.
Furthermore, an increase in latency as measured by the VM may also occur when choosing RAID-5/6. In addition to the write amplification itself, the amplified I/O must travel across the east-west network before the operation completes.
A last performance impact to consider is operating in a degraded state. Under failure conditions, vSAN must handle additional tasks to keep I/O protected while it is rebuilding. The impact also depends on whether the missing component holds data or parity.
Anyway, even though there is a potential performance impact, it is limited and, again, you can take advantage of vSAN’s granularity to exclude, for instance, mission-critical VMs from the erasure coding storage policy.
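To get a feel for what this amplification means on the backend, here is a rough sketch that multiplies a guest write rate by the read and write counts quoted above. The RAID-1 entry (one write per mirror copy, no reads) is an assumption added for comparison, and the model deliberately ignores caching and full-stripe writes, so treat the numbers as an upper-bound illustration rather than a measurement.

```python
# Backend I/Os generated by one small guest write: (reads, writes).
# RAID-5 and RAID-6 values are the ones quoted in the article;
# the RAID-1 FTT=1 value is an assumption (one write per mirror copy).
IO_PER_GUEST_WRITE = {
    "RAID-1 FTT=1": (0, 2),
    "RAID-5": (2, 2),
    "RAID-6": (3, 3),
}


def backend_ios(guest_write_iops, policy):
    """Total backend I/Os per second caused by guest_write_iops guest writes."""
    reads, writes = IO_PER_GUEST_WRITE[policy]
    return guest_write_iops * (reads + writes)


if __name__ == "__main__":
    for policy in IO_PER_GUEST_WRITE:
        print(f"{policy}: 1000 guest writes/s -> {backend_ios(1000, policy)} backend I/Os/s")
```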
Here comes the question, “Which is the best?”, with always the same answer: “It all depends”. We can think about it the same way we did for Deduplication & Compression in Part II: performance versus capacity savings.
Here’s the thing: SPBM is too often overlooked because it looks too simple, but why not benefit from this ease of use? For that reason, as a general recommendation, I would start with RAID-5, then closely analyze and report on performance. If RAID-5 doesn’t make it, move the object to another, more suitable policy.
vSAN gives us flexibility in addition to ease of use, so why not consider multiple Storage Policies, somewhat like the tiering we (as legacy storage admins) used and loved?
This lets us define a granular level of criticality for our objects, balancing performance and capacity needs, while leaving room for adjustment as requirements evolve.
Quality of Service
When it comes to Quality of Service, a common error (be honest, we’ve all done that) is to be reactive, limiting the IOPS of a particular VM object once it is cannibalising resources at the expense of the others. Unfortunately, by then it’s often too late. Here’s an example:
Tim is an employee in our medium-sized company. Our environment is working well, Tim is very happy with the performance of his VDI, everything is fine.
Our company starts growing and, with more and more users joining, performance decreases.
Tim no longer gets the level of performance he had in the past and is less happy than the newcomers, Craig and Eddy, who are just starting to use their VDI.
This situation creates frustration for early users like Tim, because they have known something better, while for Craig or Eddy this is what they’ve always experienced…
As a best practice, I would opt for a more proactive approach, which consists of evaluating what an acceptable level of performance is and starting with this value. Setting the expectation (QoS) upfront gives all users the same level of performance, at least over a longer period of time.
vSAN Quality of Service is applied at the DOM client layer, the first vSAN layer, just below the vSCSI layer. If you limit the IOPS of a guest VM, this limit only applies to the traffic generated by this guest VM (and its objects), not to Storage vMotion, vMotion, cloning operations or resync…
The result of throttling IOPS is an increase in reported latency. Indeed, the performance metrics do not distinguish between latency caused by the device or the stack and latency imposed by the IOPS limit.
As of today (vSAN 7), IOPS limits are not compatible with objects enabled for vSAN File Services.
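One way to picture why a limit shows up as latency is Little's law, a general queueing result (not something specific to vSAN): if a VM keeps a fixed number of I/Os in flight and completions are capped at the limit, the average time each I/O spends in the system is the number in flight divided by the throughput. The sketch below is a simplified model under that assumption, with a hypothetical function name, and it does not describe how vSAN computes its metrics.

```python
def throttled_latency_ms(outstanding_ios, iops_limit):
    """Average latency implied by Little's law when throughput is capped.

    Assumes the VM keeps `outstanding_ios` I/Os in flight at all times and
    would otherwise run faster than `iops_limit`.
    """
    if iops_limit <= 0:
        raise ValueError("iops_limit must be positive")
    return outstanding_ios * 1000.0 / iops_limit


if __name__ == "__main__":
    for limit in (500, 1000, 5000):
        latency = throttled_latency_ms(32, limit)
        print(f"32 outstanding I/Os at {limit} IOPS limit -> ~{latency:.1f} ms")
```

In this model, halving the limit doubles the latency the VM observes, even though the devices underneath are no slower than before.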
By design, vSAN already applies a certain level of striping to its objects. When you increase the striping value in your Storage Policy, vSAN decides, based on resource availability, whether to place the additional stripe on another host in the cluster or within the same disk group on the local host.
In the latter case, you won’t feel a big difference: all the writes still come into the same disk group, through the same cache buffer. There’s no special recommendation for striping. It can be interesting to look at where the object stripes land, but do not expect a huge performance gain from using more stripes.
Object Space Reservation
To start with, vSAN uses Thin Provisioning by default, not Thick, and the recommendation is to stay with Thin. Although Thick is possible, it is Lazy Zeroed only; the Eager Zeroed option is not supported.
Maybe there is a certain workload for which you definitely need it; otherwise, the best practice is to use Thin Provisioning with vSAN. Also, Thick provisioning has a negative impact on Deduplication & Compression: it prevents the 4KB blocks of objects using that (Thick-enabled) storage policy from being deduplicated.
For more information on this vSphere storage fundamental, I invite you to read this article: https://www.nakivo.com/blog/thick-and-thin-provisioning-difference/
Storage Policy Changes
Changing a policy doesn’t come for free. Repairs and rebuilds are two common causes of backend traffic, and a policy change can trigger the same kind of rebuild. To illustrate this, let’s imagine you’ve got a RAID-1 protected VM that is consuming too much datastore space, and you would like to reconfigure it with RAID-5 protection.
Assuming two separate policies exist (RAID-1 and RAID-5), the only thing you need to do is edit the VM and change its policy… straightforward, no big deal.
The example below shows how the additional overhead space changes from the original state, through the transition period, to its final overhead matching RAID-5 erasure coding.
If you had only a single RAID-1 policy, editing it to go RAID-5 would change the policy for every single VM using it.
Another relevant example comes from the context of a Stretched Cluster. Changing the site disaster tolerance (PFTT, Primary Failures To Tolerate) to Dual Site Mirroring would result in a copy of the data being created on the second site in order to remain compliant.
The same goes if we decide to change the Failure Tolerance Method to RAID-5. To guarantee data consistency and access during this move, vSAN keeps the current mirrored copy in place while it creates the RAID-5 stripes.
Again, there’s going to be a transient period during which you’ll have both the mirror and the RAID-5 copies. Obviously, vSAN gets rid of the transient overhead as soon as the RAID-5 layout has been provisioned and is in use. For this reason, it is really important to have enough capacity available during this period to host both copies of the data.
As you’ve read in the above examples, we don’t really want to change a policy and impact all the VMs assigned to it; this doesn’t make sense if it can be avoided.
💡 Why not clone the affected policy, set the desired parameters in this clone and assign a few VMs at a time to that new policy?
When the storage policy of a large number of objects is modified, the changes are executed in an arbitrary order. vSAN 6.7 U3 brought regulation of the transient space used by policy changes: they are now processed in batches to reduce the amount of transient space consumed.
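The transient period can be put into numbers with a small sketch. It reuses the multipliers from the capacity examples earlier in this article (2x for RAID-1 FTT=1, 4/3 for RAID-5) and assumes, for simplicity, that the old layout stays fully on disk until the new one is complete; the function names are made up for illustration.

```python
RAID1_FTT1 = 2.0      # two full copies
RAID5_FTT1 = 4.0 / 3  # assumption: 3 data + 1 parity layout


def transient_peak_gb(size_gb, old_multiplier, new_multiplier):
    """Raw capacity held at the peak of the change: old layout plus new layout."""
    return size_gb * (old_multiplier + new_multiplier)


def extra_free_needed_gb(size_gb, new_multiplier):
    """Free raw capacity needed before starting (the old layout is already on disk)."""
    return size_gb * new_multiplier


def can_convert(free_gb, size_gb, new_multiplier):
    """True if the datastore has room to build the new layout next to the old one."""
    return free_gb >= extra_free_needed_gb(size_gb, new_multiplier)


if __name__ == "__main__":
    peak = transient_peak_gb(100, RAID1_FTT1, RAID5_FTT1)
    need = extra_free_needed_gb(100, RAID5_FTT1)
    print(f"100GB object, RAID-1 -> RAID-5: peak ~{peak:.0f}GB, needs ~{need:.0f}GB free")
```

For a 100GB object this means roughly 133GB of free raw capacity is needed up front, even though the object ends up consuming less space than before once the mirror is removed.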
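The "few VMs at a time" idea can also be planned by hand. The sketch below is a manual planning aid, not a description of vSAN's internal batching: it groups VMs into waves so that the transient space of each wave stays under a budget you choose. The VM names, sizes, budget and function name are all hypothetical, and actually assigning the cloned policy to each wave is left out.

```python
def plan_waves(vm_sizes_gb, new_multiplier, transient_budget_gb):
    """Group VMs into waves whose new-layout footprint fits the budget.

    vm_sizes_gb maps a VM name to its logical size in GB; waves are filled
    greedily in the order the VMs are given.
    """
    waves, current, used = [], [], 0.0
    for name, size_gb in vm_sizes_gb.items():
        need = size_gb * new_multiplier
        if need > transient_budget_gb:
            raise ValueError(f"{name} alone needs {need:.0f}GB, above the budget")
        if current and used + need > transient_budget_gb:
            waves.append(current)  # close the current wave and start a new one
            current, used = [], 0.0
        current.append(name)
        used += need
    if current:
        waves.append(current)
    return waves


if __name__ == "__main__":
    vms = {"vm-a": 100, "vm-b": 100, "vm-c": 300, "vm-d": 50}  # hypothetical
    for number, wave in enumerate(plan_waves(vms, 4 / 3, 500), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Waiting for one wave to become compliant before starting the next lets vSAN release the transient space in between.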
Slack (Free) Space
This brings us to another very important design consideration: Slack Space. Every HCI vendor will say the same: you definitely need space for cluster balancing, repairs/rebuilds, host evacuation, etc. The VMware best practice for Slack Space is 30%, no less.
It is important to consider which operations require a rebuild, from both a capacity and a performance perspective, and the impact they can have. No doubt vSAN is smart enough to handle background operations and prioritize your workload to avoid contention, throttling them if needed and using the maximum resources when there’s no contention, with the goal of finishing the process as quickly as possible.
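A quick way to keep an eye on this is to check the free ratio of the datastore against the 30% figure. The sketch below assumes you already have the raw capacity and used space at hand (how you collect them is up to you); the function names and the sample numbers are hypothetical.

```python
def slack_ok(capacity_gb, used_gb, min_free_ratio=0.30):
    """True if at least min_free_ratio of the raw capacity is still free."""
    if capacity_gb <= 0:
        raise ValueError("capacity_gb must be positive")
    free_ratio = (capacity_gb - used_gb) / capacity_gb
    return free_ratio >= min_free_ratio


def max_usable_gb(capacity_gb, min_free_ratio=0.30):
    """Raw capacity that can be consumed while keeping the slack space intact."""
    return capacity_gb * (1 - min_free_ratio)


if __name__ == "__main__":
    capacity, used = 10_000, 7_500  # hypothetical datastore, in GB
    print(f"usable before eating into slack: {max_usable_gb(capacity):.0f}GB")
    print("slack space OK" if slack_ok(capacity, used) else "below 30% slack space")
```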
Anyway, not every modification will require a rebuild:
The goal here was to provide you with some of the keys to start or pursue your journey with vSAN as best as possible.
We could discuss vSAN for hours, but I cannot cover everything without losing some people along the way (including myself).
Don’t forget that behind vSAN’s ease of use there’s a powerful engine. Don’t treat it too lightly, and plan your setup accordingly. I truly encourage you to make use of the different options (RAID-1, RAID-5/6, with or without QoS), to vary the Failures To Tolerate levels and, please, not to set the networking topic aside.
I hope you enjoyed this read. We are open to feedback: use the comment fields below or get in touch with us on our social networks.