Bare Metal vs. Virtualization Performance

I just built a new homelab server. Specs are as follows:

Neil's Lab Server Specifications

CPU Model: Intel® Xeon® Gold 6326, 16 Cores (32 Threads), 2.90 GHz (Base), 3.50 GHz (Turbo)
CPU Cooler: Noctua NH-U12S DX-4189
Motherboard: Supermicro X12SPi
RAM: Samsung 6x 16GB (96GB) DDR4-3200 RDIMM ECC PC4-25600R Dual Rank
NIC (On board): Intel X550 2x 10G Base-T
NIC (PCIe): Supermicro AOC-SGP-I4 4x 1GbE
OS NVMe: 2x 1TB (2TB) Samsung 970 Pro
OS NVMe Carrier: Supermicro AOC-SLG3-2M2 PCIe x8
NAS NVMe: 8x 2TB (16TB) Samsung 970 Evo
NAS NVMe Carrier: 2x Quad Gigabyte GC-4XM2G4 PCIe x16
Power Supply: EVGA 750 Watt 210-GQ-0750-V1
Chassis: NZXT H510i Flow

I was trying to find out how running under an ESXi (or Proxmox) hypervisor compares with bare metal performance. A Google search yields garbage results. I've used VMware Workstation Pro as well as VMware Fusion before, but both of those are Type 2 hypervisors.

Type 1 Hypervisor:
    [Guest OS] [Guest OS]
    [ Type 1 Hypervisor ]
    [      Hardware     ]

Type 2 Hypervisor:
    [Guest OS] [Guest OS]
    [ Type 2 Hypervisor ]
    [      Host OS      ]
    [      Hardware     ]

So naturally, I was curious: what is the performance impact of running a virtualized OS on a Type 1 hypervisor? The most popular options appear to be:

I was able to get a VMware vSphere 7.0 Enterprise Plus license from UC Berkeley, so I went with VMware ESXi. vSphere comes with a bazillion packages for managing a datacenter, the hypervisor being just one of the pieces.

For benchmarking, I just used Geekbench 5.4.3 for Linux to keep things simple. Ubuntu Server 20.04 was installed both as the bare metal OS and as a virtual machine on a VMFS datastore, each running on a Samsung 970 Pro 1TB NVMe drive.

VMWare ESXi - New Virtual Machine Wizard

Geekbench5 results are as follows:

Geekbench5 - Bare Metal (Baseline) vs. VM

I am quite pleased with these results. Giving up about 5% of bare metal performance for the flexibility of running multiple VMs, with cool features such as snapshots, is a no-brainer for me. ESXi is rock solid and very much an enterprise-grade piece of software. I'll compare Proxmox next when I get a chance.

For completeness, the full Geekbench results can be accessed here:

Bogo Ops test (stress-ng)

Using the stress-ng utility, the results are even more impressive, with the VM landing within roughly 1% of bare metal in both the single-core and multi-core runs:

*********************
SINGLE CORE BOGUS OPS
*********************

VM :~$ sudo stress-ng --matrix 1 -t 60s --metrics-brief
 dispatching hogs: 1 matrix
 successful run completed in 60.00s (1 min, 0.00 secs)
 stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
 matrix           237270     60.00     60.00      0.00      3954.50      3954.50

BARE METAL :~$ sudo stress-ng --matrix 1 -t 60s --metrics-brief
 dispatching hogs: 1 matrix
 successful run completed in 60.00s (1 min, 0.00 secs)
 stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
 matrix           240077     60.00     59.99      0.00      4001.28      4001.95

*********************
MULTI CORE BOGUS OPS
*********************
VM: $ sudo stress-ng --matrix 0 -t 60s --metrics-brief
 dispatching hogs: 32 matrix
 successful run completed in 60.01s (1 min, 0.01 secs)
 stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
 matrix          4531734     60.00   1919.50      0.00     75529.30      2360.89

BARE METAL: $ sudo stress-ng --matrix 0 -t 60s --metrics-brief
 dispatching hogs: 32 matrix
 successful run completed in 60.01s (1 min, 0.01 secs)
 stressor       bogo ops real time  usr time  sys time   bogo ops/s   bogo ops/s
                           (secs)    (secs)    (secs)   (real time) (usr+sys time)
 matrix          4578028     60.00   1919.57      0.00     76300.66      2384.92
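
To put the matrix numbers above in perspective, here is a minimal Python sketch that turns the "bogo ops/s (real time)" column into a relative overhead figure. The values are copied by hand from the four runs; the dictionary layout and the overhead_percent helper are my own naming for illustration, not anything stress-ng provides.

```python
# Bogo ops/s (real time) copied from the stress-ng --matrix runs above.
RESULTS = {
    "single-core": {"vm": 3954.50, "bare_metal": 4001.28},
    "multi-core": {"vm": 75529.30, "bare_metal": 76300.66},
}


def overhead_percent(vm: float, bare_metal: float) -> float:
    """Return how much slower the VM is than bare metal, in percent."""
    return (1.0 - vm / bare_metal) * 100.0


# Hypothetical helper structure: one overhead value per test.
overheads = {
    name: overhead_percent(r["vm"], r["bare_metal"]) for name, r in RESULTS.items()
}
for name, pct in overheads.items():
    print(f"{name}: VM is {pct:.2f}% slower than bare metal")

# Throughput ratio when going from 1 worker to 32 workers on each platform.
scaling = {
    platform: RESULTS["multi-core"][platform] / RESULTS["single-core"][platform]
    for platform in ("vm", "bare_metal")
}
for platform, factor in scaling.items():
    print(f"{platform}: 32 workers deliver {factor:.1f}x the single-worker rate")
```

The overhead works out to a little over 1% in both cases, and the 32-worker runs scale by about 19x on the VM and on bare metal alike. Keep in mind that bogo ops are a rough throughput indicator rather than a rigorous benchmark score, so treat these as ballpark figures.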


*********************
MULTI CORE STRESS TEST
*********************
VM:~$ sudo stress-ng --cpu 32 --cpu-method all --perf -t 60
 dispatching hogs: 32 cpu
 successful run completed in 61.30s (1 min, 1.30 secs)
 cpu:
                          0 Cache L1D Read                  0.00 /sec
                          0 Cache L1D Read Miss             0.00 /sec
                          0 Cache L1D Write                 0.00 /sec
                          0 Cache L1I Read Miss             0.00 /sec
                          0 Cache DTLB Read                 0.00 /sec
                          0 Cache DTLB Read Miss            0.00 /sec
                          0 Cache DTLB Write                0.00 /sec
                          0 Cache DTLB Write Miss           0.00 /sec
                          0 Cache ITLB Read Miss            0.00 /sec
                          0 Cache BPU Read                  0.00 /sec
                          0 Cache BPU Read Miss             0.00 /sec
          1,918,611,096,224 CPU Clock                      31.30 B/sec
          1,918,611,245,760 Task Clock                     31.30 B/sec
                     26,592 Page Faults Total             433.84 /sec
                     26,592 Page Faults Minor             433.84 /sec
                          0 Page Faults Major               0.00 /sec
                      1,376 Context Switches               22.45 /sec
                          0 CPU Migrations                  0.00 /sec
                          0 Alignment Faults                0.00 /sec
                          0 Emulation Faults                0.00 /sec
                     26,528 Page Faults User              432.79 /sec
                        128 Page Faults Kernel              2.09 /sec
                      3,424 System Call Enter              55.86 /sec
                      3,392 System Call Exit               55.34 /sec
                        256 TLB Flushes                     4.18 /sec
                          0 Kmalloc                         0.00 /sec
                          0 Kmalloc Node                    0.00 /sec
                          0 Kfree                           0.00 /sec
                         32 Kmem Cache Alloc                0.52 /sec
                          0 Kmem Cache Alloc Node           0.00 /sec
                         32 Kmem Cache Free                 0.52 /sec
                     25,376 MM Page Alloc                 414.00 /sec
                          0 MM Page Free                    0.00 /sec
                  1,124,672 RCU Utilization                18.35 K/sec
                        448 Sched Migrate Task              7.31 /sec
                          0 Sched Move NUMA                 0.00 /sec
                      1,408 Sched Wakeup                   22.97 /sec
                          0 Sched Proc Exec                 0.00 /sec
                          0 Sched Proc Exit                 0.00 /sec
                          0 Sched Proc Fork                 0.00 /sec
                          0 Sched Proc Free                 0.00 /sec
                          0 Sched Proc Hang                 0.00 /sec
                          0 Sched Proc Wait                 0.00 /sec
                      1,376 Sched Switch                   22.45 /sec
                         32 Signal Generate                 0.52 /sec
                         32 Signal Deliver                  0.52 /sec
                        928 IRQ Entry                      15.14 /sec
                        928 IRQ Exit                       15.14 /sec
                    408,768 Soft IRQ Entry                  6.67 K/sec
                    408,768 Soft IRQ Exit                   6.67 K/sec
                          0 Writeback Dirty Inode           0.00 /sec
                          0 Writeback Dirty Page            0.00 /sec
                          0 Migrate MM Pages                0.00 /sec
                          0 SKB Consume                     0.00 /sec
                          0 SKB Kfree                       0.00 /sec
                          0 IOMMU IO Page Fault             0.00 /sec
                          0 IOMMU Map                       0.00 /sec
                          0 IOMMU Unmap                     0.00 /sec
                          0 Filemap page-cache add          0.00 /sec
                          0 Filemap page-cache del          0.00 /sec
                          0 OOM Compact Retry               0.00 /sec
                          0 OOM Wake Reaper                 0.00 /sec
                          0 Thermal Zone Trip               0.00 /sec


BARE METAL:~$ sudo stress-ng --cpu 32 --cpu-method all --perf -t 60
 dispatching hogs: 32 cpu
 successful run completed in 61.30s (1 min, 1.30 secs)
 cpu:
          6,299,757,819,808 CPU Cycles                      0.10 T/sec
          5,413,717,415,168 Instructions                   88.31 B/sec (0.859 instr. per cycle)
            993,084,418,080 Branch Instructions            16.20 B/sec
             21,493,168,320 Branch Misses                   0.35 B/sec ( 2.16%)
             47,817,010,688 Bus Cycles                      0.78 B/sec
          5,546,730,077,536 Total Cycles                   90.48 B/sec
             12,255,527,424 Cache References                0.20 B/sec
                164,721,152 Cache Misses                    2.69 M/sec ( 1.34%)
            557,545,772,192 Cache L1D Read                  9.10 B/sec
             64,777,397,440 Cache L1D Read Miss             1.06 B/sec
            390,375,844,512 Cache L1D Write                 6.37 B/sec
              1,156,206,112 Cache L1I Read Miss            18.86 M/sec
              3,278,834,912 Cache LL Read                  53.49 M/sec
                  3,822,464 Cache LL Read Miss             62.36 K/sec
                 85,624,000 Cache LL Write                  1.40 M/sec
                 47,737,408 Cache LL Write Miss             0.78 M/sec
            515,595,101,888 Cache DTLB Read                 8.41 B/sec
                    279,424 Cache DTLB Read Miss            4.56 K/sec
            369,159,797,632 Cache DTLB Write                6.02 B/sec
                  1,280,000 Cache DTLB Write Miss          20.88 K/sec
             10,606,716,288 Cache ITLB Read Miss            0.17 B/sec
            988,307,728,992 Cache BPU Read                 16.12 B/sec
             21,333,012,864 Cache BPU Read Miss             0.35 B/sec
                  9,740,960 Cache NODE Read                 0.16 M/sec
                          0 Cache NODE Read Miss            0.00 /sec
                 51,092,544 Cache NODE Write                0.83 M/sec
                          0 Cache NODE Write Miss           0.00 /sec
          1,917,173,399,968 CPU Clock                      31.27 B/sec
          1,917,190,695,168 Task Clock                     31.28 B/sec
                     26,624 Page Faults Total             434.32 /sec
                     26,624 Page Faults Minor             434.32 /sec
                          0 Page Faults Major               0.00 /sec
                    160,672 Context Switches                2.62 K/sec
                          0 CPU Migrations                  0.00 /sec
                          0 Alignment Faults                0.00 /sec
                          0 Emulation Faults                0.00 /sec
                     26,560 Page Faults User              433.27 /sec
                        128 Page Faults Kernel              2.09 /sec
                      3,936 System Call Enter              64.21 /sec
                      3,904 System Call Exit               63.69 /sec
                        256 TLB Flushes                     4.18 /sec
                          0 Kmalloc                         0.00 /sec
                          0 Kmalloc Node                    0.00 /sec
                         64 Kfree                           1.04 /sec
                         64 Kmem Cache Alloc                1.04 /sec
                          0 Kmem Cache Alloc Node           0.00 /sec
                        224 Kmem Cache Free                 3.65 /sec
                     25,344 MM Page Alloc                 413.44 /sec
                         32 MM Page Free                    0.52 /sec
                  1,294,656 RCU Utilization                21.12 K/sec
                        128 Sched Migrate Task              2.09 /sec
                          0 Sched Move NUMA                 0.00 /sec
                    160,832 Sched Wakeup                    2.62 K/sec
                          0 Sched Proc Exec                 0.00 /sec
                          0 Sched Proc Exit                 0.00 /sec
                          0 Sched Proc Fork                 0.00 /sec
                          0 Sched Proc Free                 0.00 /sec
                          0 Sched Proc Hang                 0.00 /sec
                          0 Sched Proc Wait                 0.00 /sec
                    160,672 Sched Switch                    2.62 K/sec
                         32 Signal Generate                 0.52 /sec
                         32 Signal Deliver                  0.52 /sec
                        992 IRQ Entry                      16.18 /sec
                        992 IRQ Exit                       16.18 /sec
                    366,624 Soft IRQ Entry                  5.98 K/sec
                    366,624 Soft IRQ Exit                   5.98 K/sec
                          0 Writeback Dirty Inode           0.00 /sec
                          0 Writeback Dirty Page            0.00 /sec
                          0 Migrate MM Pages                0.00 /sec
                          0 SKB Consume                     0.00 /sec
                          0 SKB Kfree                       0.00 /sec
                          0 IOMMU IO Page Fault             0.00 /sec
                          0 IOMMU Map                       0.00 /sec
                          0 IOMMU Unmap                     0.00 /sec
                          0 Filemap page-cache add          0.00 /sec
                          0 Filemap page-cache del          0.00 /sec
                          0 OOM Compact Retry               0.00 /sec
                          0 OOM Wake Reaper                 0.00 /sec
                          0 Thermal Zone Trip               0.00 /sec
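
The percentages that stress-ng prints next to some of the bare-metal counters are simple ratios of the raw values. As a sanity check, this short Python sketch recomputes them from the numbers above; the dictionary keys are my own labels, chosen for illustration. The VM run shows no hardware counter values, so only the bare-metal figures are used.

```python
# Raw counter values copied from the bare-metal `stress-ng --perf` output above.
counters = {
    "cpu_cycles": 6_299_757_819_808,
    "instructions": 5_413_717_415_168,
    "branch_instructions": 993_084_418_080,
    "branch_misses": 21_493_168_320,
    "cache_references": 12_255_527_424,
    "cache_misses": 164_721_152,
}

# Instructions retired per CPU cycle.
ipc = counters["instructions"] / counters["cpu_cycles"]
# Share of branch instructions that were mispredicted, in percent.
branch_miss_pct = 100.0 * counters["branch_misses"] / counters["branch_instructions"]
# Share of cache references that missed, in percent.
cache_miss_pct = 100.0 * counters["cache_misses"] / counters["cache_references"]

print(f"instructions per cycle: {ipc:.3f}")
print(f"branch miss rate:       {branch_miss_pct:.2f}%")
print(f"cache miss rate:        {cache_miss_pct:.2f}%")
```

These match the 0.859 instr. per cycle, 2.16% and 1.34% that stress-ng reported.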
