This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Ignore errors on Relativity query?

I've been scouring through the forums looking for information on this. We have a lot of Relativity tables and most of them work great without any errors. Sometimes we've run into problems like:

Numeric item in key contains nonnumeric characters
Numeric value out of range
Numeric item contains nonnumeric characters (this one happens on dates too!)

This causes problems as I'm trying to SELECT * on a table and when an error happens, the ENTIRE query is halted (Relativity server on linux, connecting via unixODBC). I found a great post by Tom who explains how you can use the Relativity designer to create a function f(x) IsNumeric() or IsValidDate() to get around some of these errors.

This has been a great solution so far - because while we intend to fix the programs that allow this bad data to get into the data files, we also really need something that can just query a table and get very row possible, excluding errors. Again, the programs that introduce these errors should and will be fixed, but in the meantime we still need to be able to access this data through Relativity despite the errors.

In the same post he mentions this:


There is also a setting on the data source that can tell the Relativity SQL engine to produce NULL on all invalid numerics that do not appear in record keys.  In my opinion it is better to use my first suggestion to address individual issues of non-numeric data, rather than use the data source setting.  But if you have a lot of bad data in many fields, then this setting may be useful.

How can I access this setting, maybe through the odbc.ini settings (we have the relativity server on a linux box) that would allows us to skip all rows with errors instead of throwing an error and halting the query forever?

Thank you so much in advance!

  • 0

    I would like to add - we've looked into using Filters as well and those work great too, albeit sometimes they can bug out when trying to perform the "opposite" query, which helps us easily find and address bad data. I am super curious if we can handle it all with one blanket setting, which can be used for my program, and then disabled when needed to help us catch errors

  • Verified Answer

    +1

    Note: I am using the classic Designer, not the newer Designer.  I think the classic Designer is shipped, but not installed, on current release media.  You can ask Support about this.  One hopes the following has a close analog in the 'new' Designer.

    With the data source open in the Designer, use the Tools/Options menu option.  On the left side, select "Data Source".  You will see four options dealing with invalid numeric and/or date data; set these as you see fit.

    You might ask, "Why 'except for Record Key Data...'?"  When you apply these 'adjustments' to items in the record key, the SQL engine must assume that the table index involving the record key data item is unordered.  This does really nasty things to query optimization, and typically reduces any operation on the table to a full table scan.  (Table indexes built solely on record key data items that are not numeric or dates will be unaffected.)

    Since you are working on data cleansing, the lack of optimization may not bother you.  But I can assure you that you don't want to do this in a production environment unless you can put up with users complaining about performanceGrimacing.

  • 0 in reply to 

    I am a bit confused by this added question.

    I am not sure what the 'opposite query' might be.  Perhaps the logical not of the filter expressions?

    You can certainly define multiple (very similar) tables on the same record type, but using different filter expression(s).

    Perhaps you can provide more detail on what you are trying to achieve.  Start with the '20000 foot view' and we can try to zero in on a solution.

  • 0 in reply to 

    So the "AND NOT" expression (AND NOT NUMBER-IS-BAD) shows us all of the "good" rows, likewise, "AND" (AND NUMBER-IS-BAD) is a neat shortcut used to see which rows are the ones with problems.

    What we are doing at the moment is:

    SELECT * FROM (TABLENAME)

    In order to grab all of these rows and send them to a MySQL table. The reason for this redundant table as I understand it is:

    1. MySQL is currently querying faster than against the Relativity catalog.

    We've been learning Relativity along the way so I recognize this "middleman" table was created out of necessity because we didn't know how to handle a lot of these problems until recent

    Hopefully this is helpful, thank you for your time thus far

  • Verified Answer

    +1 in reply to 

    Ok, so you are building a MySQL data warehouse.  I have a client that does this on a nightly basis.  A couple of observations...

    First, MySQL offers several advantages over ODBC access.  The SQL engine benefits from the huge investment of the open source community as well as Oracle.  The toolsets built around MySQL are far more extensive.  And, MySQL access across the internet is  quite a bit easier.  

    Second, you might as well set the 'treat invalid as NULL' on record key items, because all you are doing are full table scans (SELECT * FROM table).  This will maximize the data being returned, and where there is invalid numeric/date data, you will get NULL.  In this application, where query optimization is not an issue, you really don't care about the table indexes Relativity creates.

    In my client's application, the 'data pull' part of the app does the full table scan on the Relativity data and stores the data in XML formatted files for later use by the 'data push' into the MySQL database (using LOAD XML LOCAL INFILE).  The data warehouse is rebuilt in its entirety every 24 hours.  

  • 0 in reply to 

    Hi!

    This is exactly it and we're almost finished with it all. Your advice has been thorough and helpful.

    I'm encountering this final error - we've gotten through all 150+ tables totally fine except for this:

    SQLSTATE[22003]: Numeric value out of range: [Micro Focus][Relativity for RM]Numeric value out of range.

    Normally the error message will supply the column name of the bad record, but it doesn't. Thankfully, we've found it to be a  S9(4)V99 PACKED DECIMAL. That data type seems exactly correct according to the COBOL file, and I've even peeked into the records manually to see if there's anything weird going on. Sadly, the data looks completely correct which is putting me at a loss.

    When Relativity is pulling the data via the client, it just says:

    ...

    21.27

    20.19

    ERROR! (1)

    0

    0

    ...

    And happily chugs along, whereas when I pull on Linux it stops the SELECT statement prematurely.

    Have you encountered anything like this or possibly have a solution?

  • 0 in reply to 

    Indeed I have encountered this.

    Consider the actual storage layout.  The PIC describes 6 4-bit digits and a 4-bit sign.  The actual storage allocated is rounded up to an 8-bit boundary.  So there is room for an additional digit.

    RM/COBOL has settings which allow storage of numbers with more significant digits than are described in the PIC.  (These are there to allow backward compatibility for applications ported to RM/COBOL from minicomputers where the actual picture digits were not considered.)  Relativity is using the PIC and is probably finding a nonzero value in the 'extra' digit - a number out of range for the 4 digits to the left of the decimal.

    My initial workaround would be to change the PIC to S9(5)V99.

    If that works - great.  If not, I used another Relativity 'hack' which might work here.

  • 0 in reply to 

    Michael, have you been able to work around the out of range error?

    You say you are on Linux.  Are you scripting in PHP?  If so, the reason the query 'shuts down' is due to the PHP ODBC interface layer getting an ODBC error that is not handled by your code.  To handle the error, you will need to use the more basic interface methods (e.g. odbc_result) rather than what I call the 'convenience' methods (odbc_fetch_array).  By getting down to the basic row and column methods, you can isolate the error.  This is how the Relativity client can give you the ERROR! (1) message.  

  • 0 in reply to 

    Hi Tom,

    So - we did find that the field had changed sizes at some point during development, and yes, it needs to turn into a S9(5)V99. We're having big trouble trying to get the Designer to accept merging the change from source, which I can't describe because I personally haven't been the developer messing with the catalog. I can say for sure that when we use the Classic Designer and Merge the COBOL FD from source, we get this: "Catalog invalid: Item has a duplicate presentation sequence"

    To answer another question: I am indeed using PHP. This is what I found during my development:

    1. I found that using the odbc_ set of functions with PHP did not really work that well: I tried really hard to get errors to come out of odbc_ functions. No try-catch exceptions, no errors from odbc_error() or odbc_errormsg(), nothing.
    2. I eventually switched to the PDO way of doing things in PHP and was happy to find that errors threw exceptions as expected! (although - once we encounter an exception, there is no way to continue loading rows past the exception. which is fair)
    3. After seeing your reply, I did try testing the odbc_ functions again, using odbc_result. What I saw was similar to my previous testing results. I have included the quick snippet of code I wrote so you can be sure that I hopefully wrote the code using odbc_result the way you expected

    <?php
    
    $conn = odbc_connect(); // CENSORED FOR POSTING ONLINE
    $sql = "SELECT * FROM TABLENAME"; // CENSORED
    
    $result = odbc_exec($conn, $sql);
    $cols = odbc_num_fields($result);
    $numrows = 0;
    while (odbc_fetch_row($result)) {
    
        for ($i = 1; $i <= $cols; $i++) {
            echo odbc_result($result, $i) . ",";
        }
        echo "\n";
    
        $numrows++;
    }
    
    echo odbc_errormsg() . "\n"; // STRING IS EMPTY DESPITE ERROR
    echo "Done. {$numrows} rows found!\n"; // NUMROWS IS NOT ACCURATE TO TOTAL NUMBER OF ROWS (PROCESSING WAS HALTED UPON ERROR)

    We are very glad for your help - we've gone down from 30+ tables with errors to just these last 2 all thanks to you SmirkThumbsup

  • 0 in reply to 

    First, to the issue of changing to s9(5)v99:

    I wouldn't try to change the original COBOL source.

    Instead use the Designer to create a redefinition of the original s9(4)v99 item.  At least in the classic designer, you should be able to create an item that is signed  packed-decimal with 7 digits and a scale of -2.   (The concept is that of a COBOL REDEFINES.  I am not where I can try the new Designer but it certainly must have the same capability.  This would obviate the need to merge a change into the existing catalog.)  Once you have the redefinition, delete the table column built on the old data item and rebuild it using the new data item.

    [Concerning the presentation sequence error... The presentation sequence is a metadata value by which the Classic Designer remembers the source order.  If the new Designer does not use that scheme, it may not be maintaining the presentation sequence values.  I will report this to support, but I doubt it will have a very high priority.]

    Now regarding PHP script error processing, I would have to experiment with various techniques.  I will also go back and review the data pull script my client is using.