Performance improvements due to better NSS disk storage allocation policies with OES 2015 SP1

Beginning with the OES 2015 SP1 release, the NSS file system has improved disk storage allocation policies. This improvement in disk storage allocation is referred to as delayed block allocation.

Delayed block allocation improves file system performance and reduces file fragmentation by writing larger amounts of data at a time. It aggregates sequential file blocks before writing them to disk, so that multiple blocks can be allocated as a single extent (a set of contiguous disk blocks) instead of as separate disk blocks. This feature is enabled by default.
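To make the idea concrete, here is a minimal toy model in Python. It is not the NSS allocator: the round-robin writers, the simple "next free block" pointer, and the function names `count_extents` and `simulate` are all assumptions made for illustration. It only shows why allocating several sequential blocks together yields fewer extents when multiple files grow at the same time.

```python
def count_extents(blocks):
    """Count runs of consecutive disk block numbers in one file's block list."""
    if not blocks:
        return 0
    extents = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:
            extents += 1
    return extents


def simulate(num_files, blocks_per_file, batch):
    """Files take turns appending `batch` blocks until each is complete.

    batch=1 models allocating a block on every write request; a larger
    batch models buffering sequential blocks and allocating them together.
    Disk blocks come from a simple increasing pointer (hypothetical).
    Returns the total number of extents across all files.
    """
    next_free = 0
    layout = {f: [] for f in range(num_files)}
    remaining = {f: blocks_per_file for f in range(num_files)}
    while any(remaining.values()):
        for f in range(num_files):
            n = min(batch, remaining[f])
            layout[f].extend(range(next_free, next_free + n))
            next_free += n
            remaining[f] -= n
    return sum(count_extents(b) for b in layout.values())


# Four files of 256 blocks each, written concurrently.
print(simulate(4, 256, batch=1))   # 1024: every block is its own extent
print(simulate(4, 256, batch=64))  # 16: four extents per file
```

In this toy model, the interleaving of writers is what fragments the files; batching does not remove the interleaving, it just makes each interleaved piece much larger.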

For more information about file fragmentation, see

https://en.wikipedia.org/wiki/File_system_fragmentation

The benefits of delayed block allocation are:

    • It combines write requests and allocates extents in large chunks, so fewer extents are allocated per file. In other words, more of each file's blocks are contiguous.

    • Read performance improves because data is written in more contiguous blocks, which minimizes the rotational and seek latencies caused by disk head movement on a rotational disk.

    • Write performance improves because fewer writes of NSS metadata (such as the journal, file map, and free tree) are needed: the metadata is now updated once for all aggregated sequential file blocks instead of once for each individual user file block write.



The following use cases might benefit from these capabilities of delayed block allocation:

    • File server use case, where multiple users access the file system and multiple write requests arrive at the same time. With traditional block allocation, blocks were allocated as each write request arrived. Delayed block allocation instead waits for a certain period of time so that it can allocate as many contiguous blocks, or as large an extent, as possible. This also helps keep allocations contiguous when several files are growing at the same time, and thereby improves access time. File server users can notice the difference when they open and read those files.

    • Full and incremental backup performance improves because fewer extents are allocated. This has a direct impact on the backup window. For example, assume that data grows by approximately 200 GB per week and that the incremental backup takes around 12 hours; with delayed block allocation, the same job might take around 2 hours. This estimate is based on the throughput observed in the 'Backup Performance test' results shown below.
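The backup window estimate can be reproduced with a short calculation. The sketch below uses the backup throughput figures from the 'Backup Performance test' (272.45 MB/min without DBA and 1572.13 MB/min with DBA). It assumes that 1 GB = 1024 MB and that the lab throughput carries over unchanged to a 200 GB job, which may not hold in a real environment; the function name `backup_hours` is made up for this example.

```python
def backup_hours(data_gb, throughput_mb_per_min):
    """Estimate backup duration in hours for a given data size and throughput."""
    data_mb = data_gb * 1024          # assumption: 1 GB = 1024 MB
    minutes = data_mb / throughput_mb_per_min
    return minutes / 60


weekly_growth_gb = 200

without_dba = backup_hours(weekly_growth_gb, 272.45)
with_dba = backup_hours(weekly_growth_gb, 1572.13)

print(f"Without DBA: {without_dba:.1f} hours")  # about 12.5 hours
print(f"With DBA:    {with_dba:.1f} hours")     # about 2.2 hours
```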



I conducted the following tests to see the performance benefits offered by the delayed block allocation feature for the aforementioned use cases. Tests were performed with and without Delayed Block Allocation (DBA).

    1. File server use case - read and write test from multiple CIFS connections



Test Information:

Around 590 Novell CIFS connections were used for this test. Each connection mapped the NSS volume and performed continuous write operations on files of different sizes. This test created approximately 230 GB of data. After the write test completed, the created data was read by the same number of connections. Java programs were used for this test.

File sizes - 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, and 2MB

Record size – 4KB

Number of files per connection – 1000
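As a rough sanity check, the total data volume can be derived from these parameters. The sketch below assumes that each connection's 1000 files were split evenly across the ten file sizes (100 files of each size); the test description does not state the distribution, so treat this as an assumption.

```python
# File sizes used in the test, in KB (4 KB up to 2 MB).
file_sizes_kb = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

connections = 590
files_per_connection = 1000

# Assumption: files are evenly distributed across the ten sizes.
files_per_size = files_per_connection // len(file_sizes_kb)

per_connection_kb = files_per_size * sum(file_sizes_kb)
total_kb = per_connection_kb * connections
total_gb = total_kb / (1024 * 1024)

print(f"Per connection: {per_connection_kb / 1024:.1f} MB")  # about 399.6 MB
print(f"Total:          {total_gb:.1f} GB")                  # about 230.2 GB
```

Under that assumption, the result is in line with the approximately 230 GB reported for the test.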

Test Results:

| Write Test – Response Time | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Avg Open response time (ms) | 36.69 | 8.86 | -75.851 |
| Avg Close response time (ms) | 29.82 | 11.58 | -61.167 |
| Avg Create response time (ms) | 66.31 | 58.17 | -12.275 |
| Avg Write response time (ms) | 76.09 | 28.38 | -62.702 |
| Time taken for test completion (mins) | 140 | 55 | -60.714 |

 

| Write Test – Throughput | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Avg Write throughput (KB/sec) | 53.07 | 142.42 | 168.362 |


With DBA, write throughput improved by about 168% and the write test completed in about 60% less time than without DBA.
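The "% Change w.r.t. Without DBA" column appears to be the relative change against the baseline (without DBA) value. A minimal sketch, assuming that definition, reproduces the write test figures; the function name `percent_change` is chosen only for this example.

```python
def percent_change(without_dba, with_dba):
    """Relative change of the DBA result against the non-DBA baseline, in percent."""
    return (with_dba - without_dba) / without_dba * 100


# Values taken from the write test results.
print(round(percent_change(53.07, 142.42), 1))  # 168.4 (write throughput, KB/sec)
print(round(percent_change(140, 55), 1))        # -60.7 (completion time, mins)
print(round(percent_change(76.09, 28.38), 1))   # -62.7 (write response time, ms)
```

A negative value means the metric decreased, which is an improvement for response and completion times; a positive value is an improvement for throughput.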

| Read Test – Response Time | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Avg Open response time (ms) | 2720.65 | 498.59 | -81.673 |
| Avg Close response time (ms) | 2.71 | 2.51 | -7.38 |
| Avg Read response time (ms) | 196.91 | 39.16 | -80.112 |
| Time taken for test completion (mins) | 690 | 135 | -80.434 |

 

| Read Test – Throughput | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Avg Read throughput (KB/sec) | 20.24 | 101.79 | 402.915 |


With DBA, read throughput improved by about 403% and the read test completed in about 80% less time than without DBA.

The above data represents the average response time and throughput seen by each individual connection while around 590 connections were performing operations simultaneously.

    2. Backup Performance test



Test Information:

In this test, local backup performance was measured using the TSAtest tool, which is available as part of Open Enterprise Server. First, approximately 30 GB of data was created over 590 Novell CIFS connections, with each connection creating around 50 files of 1 MB each. The created data was then backed up using a single instance of the TSAtest tool to measure backup performance.

Test Results:

| Backup Test | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Total Files | 29,451 | 29,101 | |
| Total Extents | 7,482,988 | 67,425 | |
| Backup Throughput (MB/min) | 272.45 | 1572.13 | 477.034 |
| Time taken for test completion (mins) | 103 | 18 | -82.524 |


Here, an extent is a set of contiguous blocks allocated to a created file.
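Dividing the extent counts by the file counts shows how fragmented the files were in each run. The sketch below uses the totals from the backup test (about 7.48 million extents without DBA and 67,425 with DBA). The 4 KB block size is an assumption for this illustration; the backup test description does not state it.

```python
# Totals reported by the backup test.
files_without, extents_without = 29_451, 7_482_988
files_with, extents_with = 29_101, 67_425

# Assumption: 4 KB blocks, so a 1 MB file occupies 256 blocks.
block_size_kb = 4
blocks_per_file = 1024 // block_size_kb

avg_without = extents_without / files_without
avg_with = extents_with / files_with

print(f"Blocks per 1 MB file:         {blocks_per_file}")   # 256
print(f"Extents per file without DBA: {avg_without:.1f}")   # about 254.1
print(f"Extents per file with DBA:    {avg_with:.1f}")      # about 2.3
```

Under that assumption, almost every block of a file was a separate extent without DBA, while with DBA each 1 MB file was stored in roughly two to three extents.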

These test results show significant performance improvements for both use cases:

    1. File server

    2. Backup performance



Please note that the above test results were observed under lab conditions, and results may vary based on the test, hardware, or network configuration. The number of connections, file access protocol, type of test, and number of operations also play a crucial role in determining the test results.

The following are the hardware details for the server and clients that were used in the above tests:

Server Configuration:

RAM size - 16151420 kB

CPU information

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              9
CPU MHz:               3101.000
BogoMIPS:              6185.94
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

Client Configuration:

Version – Windows 7 Enterprise SP1 32 bit

RAM size – 2GB
