2016-01-28 | Adam Boliński

Extreme Performance RAC

Powerful RAC implementation using
Flash Shared Storage

Based on a client's need to build an inexpensive yet very powerful Oracle RAC cluster with a modest amount of storage, I designed a RAC configuration using PCIe shared flash storage to achieve the best possible performance, availability, scalability, and cost savings. The design uses an extended cluster configuration and ASM preferred reads to provide high availability (HA) while delivering the best performance.

The Oracle RAC database has a shared-everything architecture. All data files, control files, SPFILEs, and redo log files in Oracle RAC environments must reside on cluster-aware shared disks so that all of the cluster database instances can access the shared storage. All database instances must see the same view of the cluster storage and the data files. The redo log files on the shared storage are used for instance recovery.

PCIe SSDs are attached directly to the server. They offer very desirable features for databases that require high performance at a reasonable cost. Until now, however, it was not possible to use server-attached PCIe SSDs as RAC shared storage.

To meet the requirements of Oracle Grid Infrastructure, we build the complete infrastructure on revolutionary technology from Virident/HGST: FlashMAX III cards combined with Virident/HGST's sharing software.

Image 1

Using the sharing software, we can "share" a PCIe flash card with remote servers, so a card does not need to be physically installed in the server that reads from and writes to it.

For many years it was a problem to use flash SSD cards to build a RAC cluster, for two reasons: capacity (the flash cards/disks were too small) and, most importantly, the lack of high availability — if a server goes down, its storage goes down as well.

Both limitations have been resolved by Virident/HGST: card capacities range from 550 GB to 4.8 TB, and, most importantly, PCIe cards can be shared between servers by Virident/HGST software called vShare.

Image of 2-node Cluster using vShare Software and Virident/HGST Cards


The main question anyone will ask is how we achieve enough bandwidth between the two (or more) servers so that the available PCIe flash card performance is not degraded.

To achieve the required performance, we use a QDR/FDR InfiniBand network, which gives us a large amount of bandwidth with the lowest possible latency.


Configuration of ASM Disk Group


  • Servers 1 and 2 have FlashMAX PCIe cards, which are configured with ASM normal redundancy.
  • ASM maintains the consistency of data across the two PCIe SSDs by means of ASM mirroring.
  • The diskgroup consists of logical shareable disks across the servers. In our case, each FlashMAX SSD is exposed as a disk.
  • Each diskgroup has two failure groups, FG1 and FG2, one on each node, so all data in server1:FG1 is also available in server2:FG2. Oracle ASM ensures that at least two copies of the data are present, one in each failure group. This way, should a server (i.e. a failure group) go down, there is no impact on data availability.
  • Each failure group can contain one or more FlashMAX PCIe SSDs. Data is evenly distributed across all the SSDs within a failure group.
  • Both servers can access both failure groups for reads and writes.
  • ASM preferred read is set up for node affinity: all reads on server1 access data only from FG1, and all reads on server2 access data only from FG2, over the direct high-speed PCIe bus attached to that server. Each failure group holds a mirror copy of the data.
  • Writes on server1 go directly to FG1 through the PCIe bus and to FG2 through the high-speed InfiniBand interconnect.
  • When needed, we add a quorum disk (e.g. from an NFS server).
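The layout above can be sketched as a single CREATE DISKGROUP statement. This is a minimal sketch: the diskgroup name, disk paths, and quorum path below are illustrative assumptions, not the actual values from this cluster.

```sql
-- Sketch: normal-redundancy diskgroup with one failgroup per node
-- plus an NFS quorum failgroup. Names and paths are hypothetical.
CREATE DISKGROUP DATA NORMAL REDUNDANCY
  FAILGROUP FG1 DISK '/dev/vgca0'                 -- local FlashMAX partition on node 1
  FAILGROUP FG2 DISK '/dev/vgcb0'                 -- vShare-exported partition from node 2
  QUORUM FAILGROUP FGQ DISK '/nfs/quorum/qdisk'   -- small quorum disk served over NFS
  ATTRIBUTE 'compatible.asm' = '12.1';
```

With normal redundancy, ASM writes every extent to both failure groups, which is what makes the loss of an entire node survivable.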


Implementation and results:
Check the InfiniBand cards and InfiniBand connections with ibstat:

CA 'mlx5_0'
CA type: MT4113
Number of ports: 2
Firmware version: 10.10.1010
Hardware version: 0
Node GUID: 0x24be05ffffa646d0
System image GUID: 0x24be05ffffa646d0
Port 1:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 1
LMC: 0
SM lid: 1
Capability mask: 0x06514848
Port GUID: 0x24be05ffffa646d0
Link layer: InfiniBand
Port 2:
State: Active
Physical state: LinkUp
Rate: 56
Base lid: 4
LMC: 0
SM lid: 4
Capability mask: 0x0651484a
Port GUID: 0x24be05ffffa646d8
Link layer: InfiniBand

vShare software Installation

vgc-rdma-3.10.0-123.el7.x86_64-2.1.VS 19694.14a19f5.Bormio.release.x86_64

vShare Configuration

name: prod_cluster
host: prod1
backing-dev: /dev/vgca0
size: 923
initiators: prod2
host: prod2
backing-dev: /dev/vgca0
size: 923
initiators: prod1

prod1: 0x24be05ffffa646f0
prod2: 0x24be05ffffa646d0
prod1: 0x24be05ffffa646f8
prod2: 0x24be05ffffa646d8

Check the InfiniBand configuration (GUIDS lists the port GUIDs used on each node):

GUIDS="0x24be05ffffa646f0 0x24be05ffffa646f8"
GUIDS="0x24be05ffffa646d0 0x24be05ffffa646d8"

ASM Configuration

'compatible.asm' = '',
'compatible.rdbms' = '12.1'

On the ASM instance of node 1:
alter system set asm_preferred_read_failure_groups='FG1';

On the ASM instance of node 2:
alter system set asm_preferred_read_failure_groups='FG2';
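Note that in general ASM_PREFERRED_READ_FAILURE_GROUPS takes values of the form diskgroup_name.failure_group_name and is set separately on each ASM instance. Whether preferred reads are actually in effect can be verified from the standard ASM views — a sketch, not output from this system:

```sql
-- PREFERRED_READ is 'Y' for disks this ASM instance reads locally.
SELECT dg.name AS diskgroup, d.failgroup, d.path, d.preferred_read
FROM   v$asm_disk d
JOIN   v$asm_diskgroup dg ON dg.group_number = d.group_number;
```

Run on each node, this should show the local failure group flagged 'Y' and the remote one 'N'.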

Flash Card Status:

vgc-monitor: Virident Cluster Solutions 2.1.19690.a738097.Bormio.release
Driver Uptime: 31 days 1:45
Card Name    Num Partitions    Card Type            Status
vgca         1                 VBL-M2-LP-1100-2B    Good

Partitions   Usable Capacity   RAID      FMC
vgca0        923 GB            enabled   enabled

After completing the installation of Oracle Grid Infrastructure and Oracle Database software and migrating all databases to the new Oracle RAC cluster, we checked I/O performance using the dbms_resource_manager.calibrate_io procedure.
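The calibration was run roughly as follows. This is a sketch of the standard DBMS_RESOURCE_MANAGER.CALIBRATE_IO call; the num_physical_disks and max_latency arguments shown are illustrative, not the values used on this cluster.

```sql
-- Requires timed_statistics = TRUE and asynchronous I/O enabled.
SET SERVEROUTPUT ON
DECLARE
  l_iops    PLS_INTEGER;
  l_mbps    PLS_INTEGER;
  l_latency PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 2,     -- illustrative value
    max_latency        => 10,    -- ms, illustrative value
    max_iops           => l_iops,
    max_mbps           => l_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops = ' || l_iops);
  DBMS_OUTPUT.PUT_LINE('max_mbps = ' || l_mbps);
  DBMS_OUTPUT.PUT_LINE('latency  = ' || l_latency);
END;
/
```

The OUT parameters returned by the procedure correspond directly to the max_iops, max_mbps, and latency figures reported below.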

The performance results are shown below:

max_iops = 840027
latency = 0
max_mbps = 9898

After migrating all databases to the new Oracle RAC cluster, we compared its performance with the old RAC cluster on Oracle Database batch processing and reporting workloads.
The new cluster with Virident/HGST flash cards achieved 425% of the old cluster's performance.