Extreme Performance RAC
Powerful RAC implementation using
Flash Shared Storage
Based on a client's need for an inexpensive yet very powerful Oracle RAC cluster with a modest amount of storage, I designed a RAC configuration using PCIe shared flash storage to achieve the best possible performance, availability, scalability, and cost savings. The design uses an extended cluster configuration and ASM preferred reads to provide high availability (HA) while getting the best performance.
An Oracle RAC database has a shared-everything architecture. All data files, control files, SPFILEs, and redo log files in an Oracle RAC environment must reside on cluster-aware shared disks so that every cluster database instance can access them. All database instances must see the same view of the cluster storage and the data files. The redo log files on the shared storage are also used for instance recovery.
PCIe SSDs are attached directly to the server. They offer very desirable features for databases that require high performance in a cost-effective solution. Until now, however, it was not possible to use server-attached PCIe SSDs as RAC storage.
To meet the requirements of Oracle Grid Infrastructure, we build the complete infrastructure on technology from Virident/HGST: FlashMAX III cards combined with Virident/HGST sharing software.
Using the sharing software, a PCIe flash card can be "shared" with remote servers, so a card does not need to be physically installed in the server that reads from and writes to it.
For many years it was a problem to build a RAC cluster on SSD flash cards for two reasons: capacity (the flash cards/disks were too small) and, most importantly, the lack of any high-availability option, since if a server goes down, its storage goes down with it.
Both limitations have been resolved by Virident/HGST: card capacities range from 550 GB to 4.8 TB, and, most importantly, PCIe cards can be shared between servers by Virident/HGST software called vShare.
Figure: 2-node cluster using vShare software and Virident/HGST cards.
The main question everyone asks is how to achieve enough bandwidth between two (or more) servers so that the available PCIe flash card performance is not degraded.
To achieve the needed performance, we use a QDR/FDR InfiniBand network, which gives us a large amount of bandwidth with the lowest possible latency.
Configuration of ASM Disk Group
- Servers 1 and 2 each have a FlashMAX PCIe card, and the cards are configured with ASM normal redundancy.
- ASM handles the consistency of data across the two PCIe SSDs by means of ASM mirroring.
- The disk group consists of logical disks shared across the servers. In our case, each FlashMAX SSD is exposed as one ASM disk.
- Each disk group has two failure groups, FG1 and FG2, one on each node, so all data in server1:FG1 is also available in server2:FG2. Oracle ASM ensures that at least two copies of the data are present, one in each failure group. This way, should a server (i.e., a failure group) go down, there is no impact on data availability.
- Each failure group can contain one or more FlashMAX PCIe SSDs. The data is evenly distributed across all the SSDs in a single failure group.
- Both servers can access both failure groups for reads and writes.
- ASM Preferred Read is set up for node affinity, so all reads on server1 access data only from FG1 and all reads on server2 access data only from FG2, through the direct high-speed PCIe bus attached to each server. Each failure group holds a mirror copy of the data.
- Writes on server1 go directly to FG1 through the PCIe bus and to FG2 through the high-speed InfiniBand interconnect.
- When needed, we add a quorum disk (e.g., from an NFS server).
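As a minimal sketch of the layout described above, the disk group could be created with DDL like the following. The disk group name, disk paths, and quorum disk location are assumptions for illustration, not taken from the actual configuration:

```sql
-- Normal-redundancy disk group with one failure group per node,
-- plus an NFS-backed quorum failure group (paths are hypothetical).
CREATE DISKGROUP data NORMAL REDUNDANCY
  FAILGROUP fg1 DISK '/dev/vgca0'            -- local FlashMAX on prod1
  FAILGROUP fg2 DISK '/dev/vshare_prod2_a'   -- prod2's FlashMAX, exported via vShare
  QUORUM FAILGROUP fgq DISK '/nfs/quorum/disk1'
  ATTRIBUTE 'compatible.asm' = '12.1', 'compatible.rdbms' = '12.1';
```

With this layout, ASM mirrors every extent between fg1 and fg2, and the quorum failure group holds only the voting data needed to break ties when one node is down.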
Implementation and results:
Check the InfiniBand cards and connections:
CA 'mlx5_0'
    CA type: MT4113
    Number of ports: 2
    Firmware version: 10.10.1010
    Hardware version: 0
    Node GUID: 0x24be05ffffa646d0
    System image GUID: 0x24be05ffffa646d0
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 13
        LMC: 0
        SM lid: 1
        Capability mask: 0x06514848
        Port GUID: 0x24be05ffffa646d0
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 4
        LMC: 0
        SM lid: 4
        Capability mask: 0x0651484a
        Port GUID: 0x24be05ffffa646d8
        Link layer: InfiniBand
vShare software Installation
kmod-vgc-3.10.0-123.el7.x86_64-4.1.3-77760.C9C.x86_64.rpm
vgc-rdma-3.10.0-123.el7.x86_64-2.1.VS-19694.14a19f5.Bormio.release.x86_64.rpm
vgc-oratools-2.1.VS-19694.14a19f5.Bormio.release.x86_64.rpm
vgc-utils-2.1.VS-19694.14a19f5.Bormio.release.x86_64.rpm
[cluster]
name: prod_cluster

[vShare:vshare_prod1_a]
host: prod1
backing-dev: /dev/vgca0
size: 923
initiators: prod2

[vShare:vshare_prod2_a]
host: prod2
backing-dev: /dev/vgca0
size: 923
initiators: prod1

[ib:path1]
prod1: 0x24be05ffffa646f0
prod2: 0x24be05ffffa646d0

[ib:path2]
prod1: 0x24be05ffffa646f8
prod2: 0x24be05ffffa646d8
Check that the InfiniBand configuration is correct:
prod1: GUIDS="0x24be05ffffa646f0 0x24be05ffffa646f8"
prod2: GUIDS="0x24be05ffffa646d0 0x24be05ffffa646d8"
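To double-check that the GUIDs in the vShare configuration match the hardware, the Port GUIDs can be pulled out of saved `ibstat`-style output. This is a small helper sketch; the sample file name and its contents are assumptions for illustration:

```shell
# Create a small sample in the style of the ibstat output shown above
# (in practice, redirect real ibstat output to this file).
cat > ibstat_sample.txt <<'EOF'
Port 1:
  Port GUID: 0x24be05ffffa646d0
Port 2:
  Port GUID: 0x24be05ffffa646d8
EOF

# Extract just the Port GUID values; these are the values that should
# appear in the GUIDS variable for the corresponding host.
awk -F': ' '/Port GUID/ {print $2}' ibstat_sample.txt
```

Running the same extraction against each node's real `ibstat` output makes it easy to spot a mismatch between the configuration file and the installed cards.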
Set the disk group compatibility attributes and the ASM preferred read failure groups:

ATTRIBUTE 'compatible.asm' = '184.108.40.206', 'compatible.rdbms' = '12.1'

alter system set asm_preferred_read_failure_groups='FG1' sid='+ASM1';
alter system set asm_preferred_read_failure_groups='FG2' sid='+ASM2';
Flash Card Status:
# vgc-monitor
vgc-monitor: Virident Cluster Solutions 2.1.19690.a738097.Bormio.release
Driver Uptime: 31 days 1:45

Card Name    Num Partitions    Card Type            Status
vgca         1                 VBL-M2-LP-1100-2B    Good

Partitions   Usable Capacity   RAID                 FMC
vgca0        923 GB            enabled              enabled
After completing the installation of the Oracle Grid Infrastructure and Oracle Database software and migrating all databases to the new Oracle RAC cluster, we checked I/O performance using the dbms_resource_manager.calibrate_io procedure.
The performance results are shown below:
max_iops = 840027
latency  = 0
max_mbps = 9898
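The calibration can be invoked with a PL/SQL block like the one below. The num_physical_disks and max_latency values are assumptions to be adjusted for the environment; CALIBRATE_IO requires timed statistics and asynchronous I/O to be enabled on the data files:

```sql
SET SERVEROUTPUT ON
DECLARE
  l_iops    PLS_INTEGER;
  l_mbps    PLS_INTEGER;
  l_latency PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 2,    -- assumption: one FlashMAX card per node
    max_latency        => 10,   -- assumption: tolerated latency in ms
    max_iops           => l_iops,
    max_mbps           => l_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops = ' || l_iops);
  DBMS_OUTPUT.PUT_LINE('latency  = ' || l_latency);
  DBMS_OUTPUT.PUT_LINE('max_mbps = ' || l_mbps);
END;
/
```

The three OUT parameters correspond directly to the max_iops, latency, and max_mbps figures reported above.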
After migrating all databases to the new Oracle RAC cluster, we compared the performance of the new cluster with the old one on Oracle Database batch processing and reporting workloads.
The new cluster with Virident/HGST flash cards achieved a 425% performance boost compared to the old one.