At my current company, the powers that be have bought into a multi-cluster approach. I'm not talking about one cluster per environment or per region. No. I mean one non-production and one production cluster per system, per region. Essentially, each team gets its own pair of clusters in every region we choose to operate in. I have repeatedly advised against this approach, and here's why...
Time and Effort
Each cluster takes time and effort to create. Sure, we could manage all of them with Rancher. But even then, we're talking about multiple security reviews for each cluster with the enterprise security team (yeah, we're that kind of enterprise 😢), drastically higher cost from duplicated cloud resources and a reduced ability to pack workloads onto shared resources, and a whole plethora of other problems. The fact is, physical isolation (a.k.a. cluster isolation) very rarely makes sense.
Note: It is advisable to isolate non-production from production. Stuff all of your non-production environments into one cluster, and let production live in a cluster all by itself. This is a common practice.
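To make that rule concrete, here's a minimal sketch of the placement logic it implies: every non-production environment for every team lands in one shared cluster per region, production lands in the other, and teams are separated by namespace rather than by cluster. The environment names, the `<region>-nonprod`/`<region>-prod` cluster names, and the `<team>-<env>` namespace convention are all hypothetical; substitute whatever your organization actually uses.

```python
from typing import NamedTuple

# Hypothetical environment names: anything that isn't production shares one cluster.
NON_PROD_ENVS = {"dev", "qa", "staging"}
PROD_ENVS = {"prod"}


class Placement(NamedTuple):
    cluster: str
    namespace: str


def place(team: str, env: str, region: str) -> Placement:
    """Return the (cluster, namespace) a team's environment should live in.

    Assumes exactly two clusters per region, named "<region>-nonprod" and
    "<region>-prod" (a hypothetical naming scheme), with one namespace per
    team per environment inside them.
    """
    if env in PROD_ENVS:
        tier = "prod"
    elif env in NON_PROD_ENVS:
        tier = "nonprod"
    else:
        raise ValueError(f"unknown environment: {env!r}")
    return Placement(cluster=f"{region}-{tier}", namespace=f"{team}-{env}")


if __name__ == "__main__":
    for env in ("dev", "qa", "staging", "prod"):
        print(place("orders", env, "eastus"))
```

Only the namespace changes from one non-prod environment to the next; the cluster count per region stays at two no matter how many teams or systems you add.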
I'm currently in a situation where an application on one cluster needs to retrieve data, via an HTTP REST API, from an application that resides in a different cluster. Sure, we could link these clusters together with a service mesh. But then our cluster is coupled to another team's cluster, and we're managing yet another layer on top of our clusters. Who owns the management and coordination of the service mesh? For that matter, the service mesh would need to span many, many clusters to tie everything together. That's even more overhead in man-hours.
So, having established that a service mesh would be a lot more effort, consider the alternative: to make this HTTP request, the application has to go out over the web instead of keeping the request internal to Kubernetes. That drastically increases latency! What could have been a 10ms round trip with one or two hops has become a 50ms round trip with at least five hops. Not to mention that we now need an Application Gateway in front of both clusters.
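For contrast, here's roughly what the call target looks like when both applications share a cluster. Kubernetes DNS gives every Service a name of the form `<service>.<namespace>.svc.<cluster-domain>`, so the request can stay inside the cluster network. This is a small sketch: the service name, namespace, port, path, and public gateway hostname below are hypothetical, and `cluster.local` is only the default cluster domain, which your cluster may override.

```python
def in_cluster_url(service: str, namespace: str, port: int, path: str,
                   cluster_domain: str = "cluster.local") -> str:
    """Build the in-cluster URL for a Kubernetes Service.

    "cluster.local" is the Kubernetes default cluster domain; pass a
    different value if your cluster is configured otherwise.
    """
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{service}.{namespace}.svc.{cluster_domain}:{port}{path}"


def cross_cluster_url(public_host: str, path: str) -> str:
    """The same API reached through the other cluster's public entry point
    (e.g. its Application Gateway): out over the web and back in again.
    The hostname is hypothetical."""
    if not path.startswith("/"):
        path = "/" + path
    return f"https://{public_host}{path}"


if __name__ == "__main__":
    print(in_cluster_url("orders-api", "team-orders", 8080, "/v1/items"))
    print(cross_cluster_url("orders.example.com", "/v1/items"))
```

The first URL resolves to a cluster-internal Service address; the second has to leave the cluster, traverse the public edge, and come back in through the other team's gateway, which is where the extra hops come from.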
Cost has been an implicit theme throughout this article. Crazy amounts of additional man-hours are required to build and configure everything, even if you are using infrastructure as code. And much of the infrastructure is not just redundant, but outright unnecessary. For each cluster, we are paying for separate Application Gateways, API Management instances, Load Balancers, Virtual Machine Scale Sets, IP addresses, and so on. The list goes on and on. Beyond that, we can't even pack workloads from different teams onto the same VMs to share compute capacity and scale efficiently.
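To see why losing resource packing costs money, here's a toy comparison of how many nodes the same pods need when each team has its own cluster versus when they share one. It uses first-fit decreasing, a textbook bin-packing heuristic, and is not the algorithm kube-scheduler actually uses. It also only models CPU requests (in millicores) and ignores memory and the system pods and headroom every extra cluster carries. The 4-vCPU node size and the per-team pod requests are made-up numbers for illustration.

```python
from typing import Dict, List

NODE_MILLICORES = 4000  # hypothetical node size: 4 vCPUs


def nodes_needed(requests: List[int], capacity: int = NODE_MILLICORES) -> int:
    """Count nodes needed for the given pod CPU requests using first-fit
    decreasing (a heuristic approximation, not the real scheduler)."""
    free: List[int] = []  # remaining millicores on each node opened so far
    for req in sorted(requests, reverse=True):
        if req > capacity:
            raise ValueError(f"pod request {req}m exceeds node capacity {capacity}m")
        for i, remaining in enumerate(free):
            if req <= remaining:
                free[i] = remaining - req
                break
        else:
            # No existing node has room: open a new one.
            free.append(capacity - req)
    return len(free)


def compare(teams: Dict[str, List[int]]) -> Dict[str, int]:
    """Nodes needed with one cluster per team vs. one shared cluster."""
    separate = sum(nodes_needed(pods) for pods in teams.values())
    shared = nodes_needed([pod for pods in teams.values() for pod in pods])
    return {"separate_clusters": separate, "shared_cluster": shared}


if __name__ == "__main__":
    teams = {
        "team-a": [3000, 3000],
        "team-b": [1000, 1000],
        "team-c": [1000, 1000],
    }
    print(compare(teams))  # {'separate_clusters': 4, 'shared_cluster': 3}
```

In the separate case, team-b and team-c each pay for a mostly idle node; in the shared case, their small pods fill the gaps left next to team-a's large ones.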
By taking the physical isolation approach, we have effectively stripped Kubernetes of many of its core, foundational capabilities. Metaphorically, each cluster has become our new virtual machine. In all honesty, if you are planning to adopt the physical isolation paradigm, you should probably consider a different way to run your applications until you are comfortable enough with Kubernetes to understand how to make a cluster safe for all of your teams and applications to share in harmony.
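As a starting point for "safe for all of your teams," here's a minimal sketch that generates three common per-tenant guardrails: a Namespace, a ResourceQuota capping what the tenant can request, and a NetworkPolicy that only admits traffic from pods in the same namespace. The namespace name, quota values, and object names are hypothetical. NetworkPolicy is only enforced if your cluster's network plugin supports it, and a real setup will likely need extra rules (for example, for an ingress controller in another namespace).

```python
import json
from typing import Any, Dict, List


def tenant_manifests(namespace: str, cpu: str, memory: str) -> List[Dict[str, Any]]:
    """Return Namespace, ResourceQuota and NetworkPolicy objects for one tenant.

    cpu and memory are Kubernetes quantity strings such as "4" or "8Gi";
    the values used below are illustrative, not recommendations.
    """
    namespace_obj = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": namespace},
    }
    # Caps the total CPU/memory that all pods in this namespace may request or limit.
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": namespace},
        "spec": {"hard": {
            "requests.cpu": cpu, "requests.memory": memory,
            "limits.cpu": cpu, "limits.memory": memory,
        }},
    }
    # Selects every pod in the namespace and only admits ingress from pods
    # in this same namespace; traffic from other namespaces is denied.
    same_namespace_only = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    return [namespace_obj, quota, same_namespace_only]


if __name__ == "__main__":
    # A v1 List can be piped to `kubectl apply -f -`.
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": tenant_manifests("orders-dev", "4", "8Gi")}
    print(json.dumps(bundle, indent=2))
```

Note that once a quota covers CPU and memory, pods in that namespace generally need requests and limits set (directly or via a LimitRange default), or they may be rejected.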
Bottom line: your company most likely needs just two clusters. To be clear, that's two clusters (non-prod and prod) per region you choose to operate in.