Highlighted
Trusted Contributor.
Trusted Contributor.
289 views

Relativity Linux - SQL Performance Issue -- COUNT(*) , distinct and GROUP BY

Jump to solution

I am doing the POC of Relativity on RM/COBOL Native File System. I am trying to use SQL using JDBC/ODBC connection [DBeaver/Microsoft Query]. File has 50M records. I noticed that GROUP BY, Distinct and COUNT on Primary Key is taking more than 10 mins. 

 

BackEnd file is Not marked for Recovery. Query Plan Viewer doesnot show any suspicious. 

I maybe missing some setup. I am new to Relativity. Can you please advise /guide ? 

 

0 Likes
1 Solution

Accepted Solutions
Highlighted
Knowledge Partner
Knowledge Partner

One thing to consider is the fact that Relativity DataServer (which is what you are using on the Linux server) sends back results in packages of a few rows at a time (few being relative to 50M).

The queries that take a long time are doing full table scans to produce the initial row(s) in the result set.   That means you are reading 50M records to do, for example, a count(*).

A query 'without a function' can produce the initial rows of the result set as soon as enough records have been read to create the rows in the first package of rows being sent back to the client.   Because the SQL client displays these rows without waiting for the entire result set to arrive, the response is visually 'quick', despite the fact that the server will have to do a full table scan should the client continue to consume rows and demand the entire result set.

Depending on the 'function' being used in the query, you may be forcing the SQL engine to do a sort on the data before producing initial rows.  DISTINCT and GROUP BY would, in general, require a sort to create the desired result set.  So, not only are you forcing a full table scan (potentially - depends on the SELECT criteria), you are also forcing a sort of the selected rows before the initial row(s) in the result set can be produced and sent back to the client.

As a final note, one might think that COUNT(*) should not require a full table scan.  However, consider the shared file nature of the underlying file store.  Even though Relativity stores cardinality (information (essentially number of rows produced in a full table scan) for the purpose of query optimization, a full table scan is still required to give an accurate count of rows, because a COBOL program may have added or deleted records.

Any relational database system will appear to have slow performance if the queries it processes require full table scans of 50M records (though with huge memory now available some will cache that many records to speed up the process).  Query optimization tries to avoid full table scans if at all possible.  In a more normal production environment where users are finding result sets that constrain the data needed to only a few rows that can be found using indexing, Relativity - just like other relational systems - will perform just fine.

One test on your concept of performance might be to ask yourself, "How would I do this particular query using COBOL?"  If you find yourself lacking indices in the files to look up records, you will see the difficulty faced by the query optimization process.


Tom Morrison
Consultant

View solution in original post

0 Likes
12 Replies
Highlighted
Knowledge Partner
Knowledge Partner

Your problem may be Microsoft Query.  From my own experience, I have seen Microsoft Query attempt to do its own optimization - usually with very poor results, because it has a high likelihood of demanding a full table scan.  

It appears that DBeaver has its own JDBC capability, so I would first choose to use the JDBC driver supplied with Relativity.

You can test the ODBC connection directly using Microsoft's ODBC Test.  Progress has made ODBC Test available here.  Before making conclusions about Relativity performance, make sure the the actual ODBC query has the performance issue.

Finally, please give specific SQL statement(s) that exhibit the performance problem, annotating which SQL columns contain all/part of the COBOL index keys.


Tom Morrison
Consultant

0 Likes
Highlighted
Trusted Contributor.
Trusted Contributor.

Thanks Tom.

I tested via ODBCTest. Query with Functions like DISTINCT/COUNT are taking same time as in DBeaver/Microsoft Query whereas Query without function are working fast.

 

Relativity has 

> 1 Composite Cluster Index [corresponding to Primary Key of RM Cobol file]. 3 columns. Unique Value

> 1 Cluster Indexes [corresponding to Composite Split Key RM Cobol file].3 Columns. Non-Unique

> 1 Cluster Indexes [corresponding to Composite Split Key RM Cobol file].2 Columns. Non-Unique

> 1 Cluster Index  [corresponding to Composite Split Key RM Cobol file].1 Columns. Non-Unique

File has about 90 fields. 50M+ records.

Query used is select count(*) from table1

 

0 Likes
Highlighted
Knowledge Partner
Knowledge Partner

One thing to consider is the fact that Relativity DataServer (which is what you are using on the Linux server) sends back results in packages of a few rows at a time (few being relative to 50M).

The queries that take a long time are doing full table scans to produce the initial row(s) in the result set.   That means you are reading 50M records to do, for example, a count(*).

A query 'without a function' can produce the initial rows of the result set as soon as enough records have been read to create the rows in the first package of rows being sent back to the client.   Because the SQL client displays these rows without waiting for the entire result set to arrive, the response is visually 'quick', despite the fact that the server will have to do a full table scan should the client continue to consume rows and demand the entire result set.

Depending on the 'function' being used in the query, you may be forcing the SQL engine to do a sort on the data before producing initial rows.  DISTINCT and GROUP BY would, in general, require a sort to create the desired result set.  So, not only are you forcing a full table scan (potentially - depends on the SELECT criteria), you are also forcing a sort of the selected rows before the initial row(s) in the result set can be produced and sent back to the client.

As a final note, one might think that COUNT(*) should not require a full table scan.  However, consider the shared file nature of the underlying file store.  Even though Relativity stores cardinality (information (essentially number of rows produced in a full table scan) for the purpose of query optimization, a full table scan is still required to give an accurate count of rows, because a COBOL program may have added or deleted records.

Any relational database system will appear to have slow performance if the queries it processes require full table scans of 50M records (though with huge memory now available some will cache that many records to speed up the process).  Query optimization tries to avoid full table scans if at all possible.  In a more normal production environment where users are finding result sets that constrain the data needed to only a few rows that can be found using indexing, Relativity - just like other relational systems - will perform just fine.

One test on your concept of performance might be to ask yourself, "How would I do this particular query using COBOL?"  If you find yourself lacking indices in the files to look up records, you will see the difficulty faced by the query optimization process.


Tom Morrison
Consultant

View solution in original post

0 Likes
Highlighted
Trusted Contributor.
Trusted Contributor.

Thanks Tom.

Actual query is using Table scan. I used Query Plan Viewer. Data Source is not using indexes with count(*).

Apologies for my ignorance.

Does it mean we have to live with slow performance if we use Relativity ?  I tried creating a table with 3 columns from 89 column Table below. All 3 columns are indexed and part of Primary Key. Still given bad performance. 

Is there any option to reduce sort period or something that we can do ?

0 Likes
Highlighted
Knowledge Partner
Knowledge Partner

You keep using the term 'slow performance' but fail to explain your metric for performance.  Is it simply time for the first row(s) to appear in the SQL client?  And, do the queries you are studying realistically represent the queries you will be using in your application.  

count(*) doesn't really require using an index.  You are going to do a full table scan.  

A word about the query plan viewer.  What you are viewing is the SQL engine's plan, but it does not always show optimizations that are provided by the underlying file access system.

You eliminate the need to sort the result set (on GROUP BY, for example) if you have an index that provides the required sort order.   So, some questions:
    - Is there redundant data in your keys?  For example, it is not unusual for data to be replicated in various keys.  Use the redundant data feature in the Designer.  I do note that you state split keys for your alternate keys, so this may not apply.
    - Are your GROUP BY columns in the same order from most significant to least as in the file key(s), without gaps?  
    - Is every character position in your key(s) included in your column definitions?

There is extensive information in the Designer help file regarding things that can help performance.


Tom Morrison
Consultant

0 Likes
Highlighted
Trusted Contributor.
Trusted Contributor.

Thanks Tom - 

I tried to use option CONVERT INVALID NUMERIC TO NULL in desginer on key fields. Seems I end up loosing indexes so seeing below problem.

Apologies for late reply. I kept on reading your email multiple times. 

Finally, I am able to realize meaning of your statement  "How would I do this particular query using COBOL?"  .

Post realization 🙂 able to create queries those are efficient. 

Understanding I build is :

> If  my query should involve and within indexes , it should give me good response time.

> Before writing SQL, I am thinking how each sql is converted into Cobol statement [ Start with Key option].

I hope I am correct path now.

0 Likes
Highlighted
Knowledge Partner
Knowledge Partner

I am happy that you are having some success.

I strongly suggest that you do not use 'invalid numeric are NULL' in the key fields (use it only on non-key fields).  This creates a problem in that NULL in indexes have to group all at the beginning or end of the sort order if the index is to be useful.  Having NULL pop up randomly (when an invalid numeric happens) destroys the ability to use a key for predictable results (in the SQL sense).

This leads me to another question:
          Why do you have invalid numerics in your key fields?

I am suspecting that there might be a reason you are having invalid numerics in key fields.  Could you explain?


Tom Morrison
Consultant

0 Likes
Highlighted
Trusted Contributor.
Trusted Contributor.

reg.

"This leads me to another question:
          Why do you have invalid numerics in your key fields?

I am suspecting that there might be a reason you are having invalid numeric in key fields.  Could you explain?"

There is bad data accumulated over the period of time due to coding issue for certain period. I used TABLE FILTER to remove invalid numeric data from Table. Will be working to remove those from source as well. [Our files has only 1 Record Definition or Layout]

I have one question - [ donot know whether relevant to current discussion]

I am planning to connect Relativity to Web-browser so that User can Maintain [Add/Delete/Modify] or View one record or few records [not more than 100] . Also, code on web-Browser will have all the data integrity rules in place. I hope Relativity can be used for this purpose.

 

0 Likes
Highlighted
Knowledge Partner
Knowledge Partner

I hope Relativity can be used for this purpose.

As a default, Relativity creates tables that are read only.  The reason for this is that, in a typical use, the business rules (i.e. data integrity) reside in the COBOL application.

However, you may use the Designer(*) to write enable any tables you wish, subject to the demands of the underlying data structure.  In particular, OCCURS and redefinition require special attention.  There are topics in the Designer help file that are very useful to building tables that support UPDATE and INSERT.  There is also a wizard available to assist you in finding issues that require attention.

Relativity also supports GRANT/REVOKE permissions.

So, in other words, yes.  Relativity is a real database.  There are special issues when the underlying COBOL data structure is 'hostile' to relational database concepts, and these issues appear in UPDATE/INSERT.  But this really has minimal impact on the SQL application.

I would suggest that you start a new topic(s) for any questions that pertain to write-enabling.

(*) A new Relativity Designer is now being shipped.  Statements that I make regarding Designer capabilities pertain to the 'old' Designer, which I continue to use.  I have already discovered one capability that is not supported by the 'new' Designer - the ability to export CREATE TABLE DDL.  There may be other features that were dropped.  I am sure that new features were added.


Tom Morrison
Consultant

0 Likes
Highlighted
Trusted Contributor.
Trusted Contributor.

Thanks Tom. This helped. Will raise separate ticket for write-enabling.

0 Likes
Highlighted
Knowledge Partner
Knowledge Partner

One final thing.  I need to add another source of information that I discovered just yesterday in my work for a client.

There is a topic in the data client help file (relcs32.chm on my installation) titled "Relativity Indexes (Advanced)" that explains how Relativity produces indexes from the keys in the file.  It also has a good technical description of the issue of allowing NULLs to be produced in the indexes. 

(This help file also clearly defines the SQL language that is supported by Relativity, including the scalar functions supported.  That's what I was looking for when I came across the index topic.)

----

I truly appreciate that you stayed with the process to investigate and solve your issue.  You clearly did quite a bit of work to determine the NULL issue and especially your inventive way of using table filters to avoid the records containing bad data.


Tom Morrison
Consultant

0 Likes
The opinions expressed above are the personal opinions of the authors, not of Micro Focus. By using this site, you accept the Terms of Use and Rules of Participation. Certain versions of content ("Material") accessible here may contain branding from Hewlett-Packard Company (now HP Inc.) and Hewlett Packard Enterprise Company. As of September 1, 2017, the Material is now offered by Micro Focus, a separately owned and operated company. Any reference to the HP and Hewlett Packard Enterprise/HPE marks is historical in nature, and the HP and Hewlett Packard Enterprise/HPE marks are the property of their respective owners.