vSAN Best Practices – Part I

It is common to hear “vSAN is a piece of cake” or “vSAN is straightforward”. That is not false, but it is not 100% true either. vSAN is easy and straightforward to set up because it has been designed for that purpose. However, there are parts of the configuration you need to understand and consider to make your journey with vSAN as great as it should be.

In this three-part, “version-agnostic” article, I’ll review some of the best practices and real-life advice you really need to understand in order to stay away from the “easy peasy next-next-yes-ok” trap.

The Big Picture

I’ll begin with a couple of rules of thumb that I recommend knowing if you don’t want to go down in flames with your HCI infrastructure. Again, even though this is not a vSAN training, it is always good to review some of the fundamentals.

For those already familiar with the solution, maybe this is an opportunity to review some concepts. Anyway, make yourself comfortable (not too much) and grab a coffee!

First, vSAN is not block storage (nor file storage, by the way)! That said, what I mean here is that a vSAN datastore is more than a bunch of aggregated disks: it’s an object store. In fact, vSAN is all about objects (virtual disks, VMs, snapshots, etc.).

Each of these objects is made of one or more components, depending on its size and your requirements. Indeed, since the maximum size of a single component is 255 GB, larger objects (e.g. a vmdk) require several components. These components are distributed across the cluster based on the VM Storage Policy; this is where Storage Policy Based Management comes in.
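To make the 255 GB limit a bit more concrete, here is a minimal sketch in Python. The function names are hypothetical, and it deliberately ignores witnesses, stripe width and any other reason vSAN may have to split an object further, so treat the result as a lower bound rather than what you will actually see in your cluster.

```python
import math

# Maximum component size, as quoted in the text above.
MAX_COMPONENT_GB = 255


def components_per_replica(object_size_gb: float) -> int:
    """Minimum number of components needed to hold one copy of an object."""
    if object_size_gb <= 0:
        raise ValueError("object size must be positive")
    return math.ceil(object_size_gb / MAX_COMPONENT_GB)


def mirrored_data_components(object_size_gb: float, ftt: int = 1) -> int:
    """Data components of a RAID-1 object: (ftt + 1) full copies.

    Assumption: witness components are not counted here.
    """
    if ftt < 0:
        raise ValueError("ftt must be zero or greater")
    return (ftt + 1) * components_per_replica(object_size_gb)


if __name__ == "__main__":
    for size_gb in (100, 255, 600):
        print(size_gb, components_per_replica(size_gb),
              mirrored_data_components(size_gb, ftt=1))
```

With these assumptions, a 600 GB vmdk needs at least 3 components per copy, so at least 6 data components once it is mirrored.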

As you may already know, each host participating in a vSAN cluster provides storage through its local disks. Those disks are part of Disk Group(s), each composed of a Flash device (referred to as Cache) and one or more Capacity devices (Flash or Spinning Disks).

Consequently, this gives two possible vSAN cluster configurations: All-Flash and Hybrid.

Always a matter of cash cache!

Something we can never say often enough is how important cache is! Even though I’m not talking about “legacy” storage (I don’t like this term, but that’s the way it is), our write IOs are still landing in cache, how lucky we are!

Where’s my write?

To demonstrate how things happen in a vSAN cluster, let’s have a look at how it handles read and write IOs in All-Flash and Hybrid configurations.

In a nutshell, write IOs land in the cache tier and are immediately acknowledged back to the guest VM.

Following this, an All-Flash system de-stages the data to its capacity tier, also made of Flash devices. As a consequence, reads can be served from cache or capacity depending on de-stage periodicity, and everything is fine: life is good, you’re walking on air.

It is another story when it comes to Hybrid. A Hybrid configuration handles the cache tier differently, splitting it into a write buffer (30%) and a read cache (70%).
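If you want to see what that 30/70 split means for a given cache device, the arithmetic is simple. This is only a sketch: the function name is hypothetical, and the 400 GB device used in the example is an arbitrary figure, not a recommendation.

```python
# Hybrid cache tier split, as described in the text above.
WRITE_BUFFER_PERCENT = 30
READ_CACHE_PERCENT = 70


def hybrid_cache_split(cache_device_gb: float) -> dict:
    """Return the write buffer and read cache sizes (in GB) of a Hybrid cache device."""
    if cache_device_gb <= 0:
        raise ValueError("cache device size must be positive")
    return {
        "write_buffer_gb": cache_device_gb * WRITE_BUFFER_PERCENT / 100,
        "read_cache_gb": cache_device_gb * READ_CACHE_PERCENT / 100,
    }


if __name__ == "__main__":
    # Hypothetical 400 GB cache device.
    print(hybrid_cache_split(400))
```

On that hypothetical 400 GB device, 120 GB would buffer writes and 280 GB would serve as read cache.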

Do You Wanna Hit It?

The Donnas

Indeed, like its enterprise storage big brothers, vSAN wants read hits. No need to break the rules: the goal is always to have 90% of reads served from cache, which is much faster than the spinning capacity tier.

For that purpose, hot blocks (those that keep getting read) are promoted into this read cache to avoid read misses, which would otherwise increase latency and cause a performance penalty.
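To get a feel for why the hit ratio matters so much, you can weight the latency of each tier by how often it is hit. The sketch below does just that. The latency values (0.1 ms for cache, 8 ms for spinning disks) are purely illustrative assumptions, not measurements, and the function name is hypothetical.

```python
def average_read_latency_ms(hit_ratio: float,
                            cache_latency_ms: float,
                            capacity_latency_ms: float) -> float:
    """Average read latency as a weighted mix of cache hits and misses."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return (hit_ratio * cache_latency_ms
            + (1.0 - hit_ratio) * capacity_latency_ms)


if __name__ == "__main__":
    # Assumed latencies, for illustration only.
    CACHE_MS = 0.1
    SPINNING_MS = 8.0
    for ratio in (0.5, 0.9, 0.99):
        latency = average_read_latency_ms(ratio, CACHE_MS, SPINNING_MS)
        print(f"hit ratio {ratio:.0%}: {latency:.2f} ms on average")
```

Even with these made-up numbers, the trend is clear: every read miss is paid at spinning-disk prices, so a few points of hit ratio make a visible difference.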

Therefore, you can imagine how much importance should be given to this cache tier, even more so with a Hybrid cluster. That’s why, as a Best Practice, it might be a good idea to opt for something like Intel Optane NVMe for the cache tier and something a little bit cheaper for capacity.

This would increase IOPS and reduce disk latencies, while not adding complexity to your infrastructure.

Another Best Practice, if available, is to dedicate one Storage Controller per Disk Group.

I would also recommend opting for solutions like DellEMC VxRail or vSAN Ready Nodes that are co-engineered with VMware, making it easier to match the recommended configurations.

Keep in mind, failure is not binary and when it comes to hardware, you are as weak as your weakest component…

Networking: A penny saved is a penny earned…

Please, don’t. Just don’t. We all know the situation: your company has invested a lot of money in a brand new, fancy HCI infrastructure, and you (or someone else) might be tempted to save money on something “less important” like networking.

This is exactly where you would be wrong, unless you want to see your environment (and your day) go down in flames!

You need to keep in mind that networking is the core, literally the backbone, of an HCI infrastructure. In fact, every action generates network activity: putting a host in maintenance mode, recovering from a failure, changing a policy, etc.

Needless to say, a majority of performance issues reported to support are related to customer network trouble…

With that in mind, we can definitely say 25GbE is the new 10GbE: if you are planning a new environment, this is the way to go. In addition, you might consider enterprise-class Ethernet switches with sufficient buffers to handle significant workloads.
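As a back-of-the-envelope illustration of what link speed means when data has to move between hosts (maintenance mode, rebuild, policy change), here is a small sketch. It is a rough estimate under loud assumptions: a single link, decimal units, and an `efficiency` factor (hypothetical, defaulting to 0.7) standing in for protocol overhead and competing traffic. Real resync times depend on much more than raw bandwidth.

```python
def transfer_hours(data_tb: float, link_gbps: float,
                   efficiency: float = 0.7) -> float:
    """Rough time (in hours) to move data_tb terabytes over a single link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    really available for this traffic; it is not a vSAN setting.
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits_to_move = data_tb * 1e12 * 8          # terabytes to bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_move / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical 10 TB of data to move.
    for speed in (10, 25):
        print(f"{speed}GbE: about {transfer_hours(10, speed):.1f} hours")
```

Whatever efficiency you assume, the ratio stays the same: the 25GbE link moves the same data 2.5 times faster than the 10GbE one.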

In addition to that, another hardware consideration is the use of RDMA-capable network interfaces. Indeed, RDMA-capable NICs are able to bypass the CPU/kernel and transfer data directly to the memory of a given host.

This is a pretty simple way to speed up operations like vMotion, background and metadata operations, or rebuilds, making your environment simply much more performant. This technology has been fully supported since vSphere 6.7.

The more the merrier

Now comes the question of how many hosts are needed to match your requirements. I would say « Livin’ on the Edge » is best left to Aerosmith and kept away from your production.

What I’m saying here is that while the minimum number of hosts required to set up a vSAN cluster is 3, it might be handy to add an extra one.

To illustrate, if I consider a mirrored (RAID-1) cluster with the minimum of 3 hosts, then I have 1 component on one host, 1 copy on another host and a witness on a third one. In this case I would (in fact, VMware would) recommend adding an additional host.

The reason behind this is that, in case of a host failure, you want to have a host where you can immediately rebuild, without having to wait. This provides extra flexibility and extra protection during rebuilds.

And this is not limited to the RAID-1 FTM: adding extra node(s) should definitely be part of the cluster sizing decisions.
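The rule of thumb can be written down in a couple of lines. For RAID-1, tolerating n failures takes 2n + 1 hosts (the copies plus the witness), and the advice here is to add one more so there is somewhere to rebuild. The sketch below only covers RAID-1, and the function names are hypothetical.

```python
def min_hosts_raid1(ftt: int) -> int:
    """Minimum number of hosts for a RAID-1 policy tolerating ftt failures."""
    if ftt < 1:
        raise ValueError("ftt must be at least 1")
    return 2 * ftt + 1


def recommended_hosts_raid1(ftt: int, spare_hosts: int = 1) -> int:
    """Minimum plus spare host(s) so a rebuild can start right after a failure."""
    if spare_hosts < 0:
        raise ValueError("spare_hosts must be zero or greater")
    return min_hosts_raid1(ftt) + spare_hosts


if __name__ == "__main__":
    for ftt in (1, 2):
        print(f"FTT={ftt}: minimum {min_hosts_raid1(ftt)} hosts, "
              f"recommended {recommended_hosts_raid1(ftt)}")
```

For FTT=1, that gives the 3-host minimum from the example above and 4 hosts once the extra one is added.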

And that’s it for this first part. Please don’t hesitate to let us know your opinion on this article, leave a comment and share it if you liked it.

Part II will focus on the Cluster Configuration.

8 Replies to “vSAN Best Practices – Part I”

  1. Very good read!
    1 question: taking Erasure Coding (RAID-5) as local protection method with SFTT=1 in a stretched cluster, is it possible to expand the vSAN cluster by adding another host without the cluster needing to rebuild for a certain amount of time?

    1. Hi Tim, thanks for reading!
      I would say that, no matter your Failure Tolerance Method (except for 2-node clusters), adding an additional host is supported and does not involve any automatic rebalancing; there’s no specific action required in such a situation. Depending on the reason why you are adding a host (e.g. running out of space), you might want to trigger a manual rebalance across all the hosts in the cluster; again, it is a manual process… You may find more information over here:
      Hope this helps!

  2. I read the new way is 25Gbe, but can i use 10Gbe when i am using big SQL machines with 256K blocks?

    1. Hello Sander and thanks for reading. I don’t see any immediate constraint in using either 10GbE or 25GbE with a 256KB block size… Anyway, as you know, IOPS and block size are closely related to each other, so this block size should definitely be considered in your performance sizing. Are you using vSAN?
