2018-06-10 | Adam Boliński

The Magic of a Millions-of-IOPS Database



This blog post is the first of a series in which I will cover a database environment built in conjunction with the software-defined storage NVMesh, developed by Excelero (www.excelero.com). Tiento has become an official partner of Excelero in the EMEA region.

In this particular post I would like to show how we can build a high-performance Oracle RAC database environment on a very powerful storage subsystem, capable of millions of IOPS and a transfer rate of around 6-7 gigabytes per second. Most importantly for a database environment, we can get very low latency, around 0.1-0.2 ms.

First of all, I would like to present the storage subsystem that will be used in this environment. At the beginning of last year I started to cooperate with Excelero (www.excelero.com), a great, innovative company that develops the very powerful, low-latency software-defined storage product NVMesh.

Before we go further, I would like to present how the NVMesh concept works and how this infrastructure is built, in this case in my small company lab.

NVMesh software is built on NVMe devices and a low-latency network, using Excelero's own implementation of RDMA called RDDA (patented Remote Direct Drive Access), which provides low, almost local, latency for remote storage devices. Distributed NVMe storage resources are pooled, with the ability to create arbitrary, dynamic block volumes that can be utilized by any host running the NVMesh block client. These virtual volumes can be striped, mirrored, or both, while enjoying centralized management, monitoring, and administration. In short, applications can enjoy the latency, throughput, and IOPS of a local NVMe device while at the same time getting the benefits of centralized, redundant storage.
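To make the striped-plus-mirrored idea concrete, here is a minimal Python sketch of how a logical block address on such a virtual volume could map onto mirrored pairs of drives. This is my own illustration of the general RAID-10-style layout, not the actual NVMesh internals:

```python
# Illustrative only: maps a logical block on a striped+mirrored virtual
# volume onto physical (drive, block) replicas -- not actual NVMesh code.

def map_logical_block(lba, stripe_width, mirrors=2):
    """Return the (drive, physical_block) replicas holding logical block lba.

    Data is striped round-robin across `stripe_width` mirrored drive groups;
    each group keeps `mirrors` identical copies for redundancy.
    """
    group = lba % stripe_width          # which mirrored group gets this block
    depth = lba // stripe_width         # row inside each drive of the group
    # drives in a group are numbered group*mirrors .. group*mirrors+mirrors-1
    return [(group * mirrors + m, depth) for m in range(mirrors)]

# Logical block 5 on a 2-way stripe of mirrored pairs (4 drives total):
print(map_logical_block(5, stripe_width=2))   # -> [(2, 2), (3, 2)]
```

A read can be served from either replica, while a write must go to both, which is why mirroring costs write bandwidth but not read latency.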





So to build this lab we need a RoCE or InfiniBand network, NVMe devices, InfiniBand/RoCE cards, and a few servers (depending on needs and the number of NVMe devices).

My environment consists of an InfiniBand network in EDR technology, ConnectX-5 VPI cards, four E5-2600v3/v4 servers, and NVMe devices (HGST SN260 and HGST SN200). It is important to understand that you must have knowledge of InfiniBand networking, NVMe storage devices, and CPUs (many CPUs cannot utilize such a powerful I/O subsystem and end up at 1.0-1.2 million IOPS with 4k blocks). So to deploy this environment you need at least medium knowledge of these infrastructure parts, and I will not cover them more deeply here, because each is a topic for several very long blog posts.

In an NVMesh environment we manage all storage resources (physical drives, clients, targets, logical drives, etc.) using the NVMesh console via a REST API. This is how it looks:
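As a sketch of what driving such a management REST API from a script could look like, here is a small Python example that builds (but does not send) a volume-creation request. The hostname, endpoint path, and JSON fields are hypothetical illustrations, not the documented NVMesh API; consult the Excelero documentation for the real resource names:

```python
# Hypothetical sketch of creating a volume through a management REST API.
# The host, URL path, and JSON fields below are illustrative assumptions,
# NOT the documented NVMesh API.
import json
import urllib.request

def build_create_volume_request(mgmt_host, name, size_gb, raid="striped"):
    """Build (but do not send) an HTTP request describing a new volume."""
    payload = json.dumps({"name": name, "capacityGB": size_gb,
                          "raidLevel": raid}).encode()
    return urllib.request.Request(
        url=f"https://{mgmt_host}/api/volumes",   # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_create_volume_request("nvmesh-mgmt", "oradata01", 6400)
print(req.method, req.full_url)   # POST https://nvmesh-mgmt/api/volumes
```

The point is simply that everything visible in the console is also scriptable, which matters once you start provisioning volumes for many database nodes.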



In this case my goal is to achieve over 1 million 8k IOPS in a database environment. My virtual disk in NVMesh is built from two stripe sets of equal size, the first from 2 x SN200 (3.2 TB each) and the second from 2 x SN260 (3.2 TB each). For the storage servers where the NVMe cards are installed I use two servers, each with one E5-2600v3 processor.

I have two DB servers, each with one E5-2680v4 CPU, 64 GB RAM, and a two-port ConnectX-5 VPI card connected to an InfiniBand EDR switch (MSB 7790), running Oracle RAC 12.2 software with the newest patches.

Now that the lab environment is created, let's check our first results using calibrate_io. I know it is not the best tool for this, but for a first shot let's try it.

max_iops = 1510657
latency = 0
max_mbps = 11034
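A quick cross-check of these figures (my own arithmetic): 1,510,657 IOPS at the 8 KiB database block size implies roughly 12.4 GB/s, in the same ballpark as the reported max_mbps of 11,034. The two need not match exactly, since calibrate_io measures peak IOPS (small random I/O) and peak MB/s (large sequential I/O) in separate phases:

```python
# Rough consistency check between the reported IOPS and MB/s figures.
# calibrate_io measures peak IOPS and peak throughput in separate phases,
# so the implied bandwidth only needs to be in the same range, not equal.
max_iops = 1_510_657
block_size = 8 * 1024                            # 8 KiB database block

mb_per_sec = max_iops * block_size / 1_000_000   # decimal MB/s
print(f"{mb_per_sec:.0f} MB/s implied by the IOPS figure")  # ~12375 MB/s
```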

So performance is very good: we have over 1.5 million IOPS (8k database blocks) and throughput around 11 GB/s. Now let's try to measure the latency of this storage I/O; for this test we will be using SLOB.

So let's see the first results; the database has just been restarted to keep these micro-benchmark results clean.

As we can see, most of the latencies are below 256 microseconds, and many of them are around 128 microseconds. Let's check how this looks in the AWR report.
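The histograms printed by such micro-benchmarks typically use power-of-two microsecond buckets, which is why the latencies cluster at values like 128 and 256 us. A minimal sketch of that bucketing (my own illustration, with made-up sample values) looks like this:

```python
# Illustrative bucketing of I/O latencies into power-of-two microsecond bins,
# similar to the histograms printed by low-level I/O latency tools.
from collections import Counter

def bucket_us(latency_us):
    """Return the power-of-two bin (in microseconds) containing latency_us."""
    b = 1
    while b < latency_us:
        b *= 2
    return b

# Hypothetical samples, in microseconds:
samples_us = [119, 130, 140, 151, 160, 210, 255, 300]
histogram = Counter(bucket_us(s) for s in samples_us)
print(sorted(histogram.items()))   # most samples land in the 128/256 us bins
```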

                                                   ........ Wait Time ........  ..... Summary Avg Wait Time .....
I#  Class     Event                    Waits   %Timeouts  Total(s)  Avg Wait  %DB time       Avg       Min       Max  Std Dev  Cnt
*             DB CPU                                        294.41              97.48                                           2
    Cluster   gc cr multi block grant  1,456        0.00      3.94    2.71ms      1.31    1.46ms  146.44us    2.77ms   1.86ms   2
    User I/O  db file sequential read  18,503       0.00      2.65  143.05us      0.88  151.18us  142.23us  160.14us  12.66us   2


As we can see, the summary average for db file sequential read is 151.18 us. OK, let's have a look at the latency of log file parallel write.


                                                     ........ Wait Time ........  ..... Summary Avg Wait Time .....
I#  Class       Event                    Waits    %Timeouts  Total(s)  Avg Wait  %DB time      Avg      Min      Max  Std Dev  Cnt
*               DB CPU                                         294.41              97.48                                        2
    System I/O  log file parallel write  373,320       0.00     26.40   70.72us      8.74  62.57us  54.38us  70.75us  11.57us   2

Log file parallel write latency is also very low, with a summary average of 62.57 us. So the latencies are very good, but let's check the maximum load on the storage servers (targets), where the NVMe cards are installed and from which I/O is served to the clients (the two DB servers).
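Both Avg Wait figures in the reports above can be reproduced from the raw columns with simple arithmetic (Avg Wait = Total(s) / Waits), which is a useful sanity check when reading AWR output:

```python
# Reproduce the AWR "Avg Wait" column from Total(s) and Waits, in microseconds.
def avg_wait_us(total_s, waits):
    """Average wait time in microseconds from total seconds and wait count."""
    return total_s / waits * 1_000_000

# db file sequential read: 2.65 s of waits over 18,503 events
print(f"db file sequential read: {avg_wait_us(2.65, 18_503):.1f} us")    # ~143.2
# log file parallel write: 26.40 s of waits over 373,320 events
print(f"log file parallel write: {avg_wait_us(26.40, 373_320):.2f} us")  # ~70.72
```

The small difference from the printed 143.05 us comes from the Total(s) column itself being rounded in the report.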

As we can see, the offload capability works perfectly: we don't see any significant load on the storage servers, even though each of them serves over 1 million IOPS (4k).

In conclusion, we can see that this low-latency NVMesh storage subsystem works perfectly in an Oracle RAC environment. My next post will be a TPS test, aiming for the maximum number of transactions with a minimal number of cores/Oracle licenses, using a selected Intel processor and NVMesh as the storage subsystem. We will check how to decrease Oracle licenses in a production environment using the NVMesh SDS; my first tests show very optimistic results. The following two posts will show how we can use it in a PostgreSQL environment as well, and how this storage works with the new Grid Infrastructure 18c, where a new feature was introduced that omits Cache Fusion and depends on a very high-performance I/O storage subsystem.