2019-05-18 | Adam Boliński

DB Backup Appliance DbUp – Part I

DB Backup Appliance DbUp – Part I

 

Hello this is first of series of blogs regarding my work on DB Backup Appliance.

Purpose of this is to create No Data Lost Backup Appliance similar to ZDLRA (but with limited capabilities), main requirements is to have  Backup Appliance with every committed transaction by Primary/Production DB.

This series of posts won’t be describing how this Appliance is designed and how it works under the hood but I will be presenting how it is working from user perspective. 

Production Environment

First question which I get from my clients is the question, how this DB Backup Appliance affects  Production Database.

To show differences between working with DbUp and without I created test environment with DbUp and without DbUp connected, testing tool will be SLOB as always.

Testing Environment as follows :

Slob.conf


UPDATE_PCT=35
SCAN_PCT=0
RUN_TIME=300
WORK_LOOP=0
SCALE=80M
SCAN_TABLE_SZ=1M
WORK_UNIT=512
REDO_STRESS=LITE

 

Test was done using 64 and 128 users scenario. 

Before I will present the results just few info about production Environment , DB Files located on cheap NVMe device Intel P3520 (rather slow among NVMe devices), redo logs are located on separate disk SSD WD Blue, we have 128GB memory and i9 Skylake-X Processor.

Results without DbUp Connected to Production Environment we will start from the storage perspective so here we have Block IO Latency in the time of testing using SLOB tool.

Results without DbUp Appliance Connected


This latency measurement was generated using eBPF functionality 

 

Let’s check out AWR results after SLOB tests 

Load Profile

  Per Second Per Transaction Per Exec Per Call
DB Time(s): 63.6 0.8 0.27 26.68
DB CPU(s): 1.8 0.0 0.01 0.74
Background CPU(s): 0.7 0.0 0.00 0.00
Redo size (bytes): 15,096,649.0 183,375.3    
Logical read (blocks): 151,595.3 1,841.4    
Block changes: 103,222.3 1,253.8    
Physical read (blocks): 120,001.1 1,457.6    
Physical write (blocks): 64,004.4 777.5    
Read IO requests: 120,001.1 1,457.6    
Write IO requests: 62,348.2 757.3    
Read IO (MB): 937.5 11.4    
Write IO (MB): 500.0 6.1    
IM scan rows: 0.0 0.0    
Session Logical Read IM: 0.0 0.0    
User calls: 2.4 0.0    
Parses (SQL): 3.7 0.1    
Hard parses (SQL): 0.1 0.0    
SQL Work Area (MB): 0.6 0.0    
Logons: 0.0 0.0    
User logons: 0.0 0.0    
Executes (SQL): 238.2 2.9    
Rollbacks: 0.0 0.0    
Transactions: 82.3      

Background Wait Events

EventWaits%Time -outsTotal Wait Time (s)Avg waitWaits /txn% bg time
db file parallel write3,536,9630458129.45us142.1747.01
log file parallel write84,99402643.11ms3.4227.15
LGWR any worker group20,4740522.54ms0.825.33
db file async I/O submit525,16304687.86us21.114.74
LGWR worker group ordering180013.03ms0.010.06

Top 10 Foreground Events by Total Wait Time

EventWaitsTotal Wait Time (sec)Avg Wait% DB timeWait Class
db file sequential read36,168,52118.8K519.25us97.8User I/O
DB CPU 534.5 2.8 
log file switch (private strand flush incomplete)89.67.06ms.0Configuration
read by other session1,053.5453.52us.0User I/O
row cache mutex108.43.95ms.0Concurrency
buffer busy waits2,890.266.69us.0Concurrency
library cache: mutex X189.2816.24us.0Concurrency
log file sync35.11.95ms.0Commit
undo segment extension508.79ms.0Configuration
control file sequential read1760230.95us.0System I/O

Ok so let’s check what figures we get when we connect DbUp Appliance to Production Database.

Results with DbUp Appliance Connected

Results of Block IO Latency Distribution.

Please take a look that number of counts are greater then before , but we monitor BIO latency 3 times longer then before. Most important is % distribution between usecs counts.


Let’s check it out AWR results

Load Profile

Per SecondPer TransactionPer ExecPer Call
DB Time(s):63.40.80.2425.68
DB CPU(s):1.80.00.010.74
Background CPU(s):0.70.00.000.00
Redo size (bytes):16,211,189.1192,260.8  
Logical read (blocks):148,449.61,760.6  
Block changes:102,062.11,210.4  
Physical read (blocks):121,683.11,443.1  
Physical write (blocks):61,994.9735.2  
Read IO requests:121,678.71,443.1  
Write IO requests:59,703.4708.1  
Read IO (MB):950.711.3  
Write IO (MB):484.35.7  
IM scan rows:0.00.0  
Session Logical Read IM:0.00.0  
User calls:2.50.0  
Parses (SQL):12.70.2  
Hard parses (SQL):1.90.0  
SQL Work Area (MB):0.70.0  
Logons:0.10.0  
User logons:0.00.0  
Executes (SQL):268.53.2  
Rollbacks:0.00.0  
Transactions:84.3   

Background Wait Events

EventWaits%Time -outsTotal Wait Time (s)Avg waitWaits /txn% bg time
db file parallel write3,536,9630458129.45us142.1747.01
log file parallel write84,99402643.11ms3.4227.15
LGWR any worker group20,4740522.54ms0.825.33
db file async I/O submit525,16304687.86us21.114.74
LGWR worker group ordering180013.03ms0.010.06
enq: CR – block range reuse ckpt79801627.72us0.030.05
control file sequential read73000603.10us0.030.05
control file parallel write19900807.65us0.010.02
db file sequential read14800610.74us0.010.01
LGWR all worker groups39002.10ms0.000.01
latch free45700119.52us0.020.01
LGWR wait for redo copy9910035.89us0.040.00
reliable message7980031.38us0.030.00
direct path write12001.69ms0.000.00

Top 10 Foreground Events by Total Wait Time

EventWaitsTotal Wait Time (sec)Avg Wait% DB timeWait Class
db file sequential read36,168,52118.8K519.25us97.8User I/O
DB CPU 534.5 2.8 
log file switch (private strand flush incomplete)89.67.06ms.0Configuration
read by other session1,053.5453.52us.0User I/O
row cache mutex108.43.95ms.0Concurrency
buffer busy waits2,890.266.69us.0Concurrency
library cache: mutex X189.2816.24us.0Concurrency
log file sync35.11.95ms.0Commit
undo segment extension508.79ms.0Configuration
control file sequential read1760230.95us.0System I/O

As you can see there is no difference between this to results in terms of latency and Wait Events.

Based on this results we can ensure that DbUp Appliance do not affect Production Environment.