Anonymous_User Absent Member.

Matching analysis is very slow on large dataset


Hi,

I'm running a matching analysis on 30K+ users, using the matching key
uid+fullname.

It's taking about 2 days to finish the match. That speed doesn't
seem right to me.

Is this normal?


--
anja98
------------------------------------------------------------------------
anja98's Profile: https://forums.netiq.com/member.php?userid=1121
View this thread: https://forums.netiq.com/showthread.php?t=2952

jtl1 Absent Member.

Re: Matching analysis is very slow on large dataset

Hello,

I don't know what's normal, but I've stopped using Analyzer and switched back to old-fashioned tools (grep, sed, awk, Excel...).
From my point of view, the performance is not acceptable.

Best regards,
Tobias

On 2012-09-06 07:54, anja98 wrote:
>
> Hi,
>
> I'm running a matching analysis on 30K+ users, using the matching key
> uid+fullname.
>
> It's taking about 2 days to finish the match. That speed doesn't
> seem right to me.
>
> Is this normal?
>
>
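
For anyone who wants to try the same kind of match outside Analyzer, here is a minimal Python sketch of matching two record sets on a composite uid+fullname key using a hash index. It is not how Analyzer works internally; the record layout, the `match_key` and `match_datasets` names, and the case/whitespace normalisation are all assumptions made for illustration.

```python
from collections import defaultdict


def match_key(record):
    """Build the composite key (uid, fullname).

    Trimming and lower-casing are assumptions; adjust to your data.
    """
    return (record["uid"].strip().lower(), record["fullname"].strip().lower())


def match_datasets(source, target):
    """Return (matched pairs, unmatched source records).

    The target is indexed once by key, so each source record is a
    dictionary lookup instead of a scan over the whole target set.
    """
    index = defaultdict(list)
    for rec in target:
        index[match_key(rec)].append(rec)

    matched, unmatched = [], []
    for rec in source:
        hits = index.get(match_key(rec))
        if hits:
            matched.extend((rec, hit) for hit in hits)
        else:
            unmatched.append(rec)
    return matched, unmatched


if __name__ == "__main__":
    # Synthetic data only: 30,000 source users, every second one in the target.
    source = [{"uid": f"u{i}", "fullname": f"User {i}"} for i in range(30000)]
    target = [{"uid": f"U{i}", "fullname": f"user {i} "} for i in range(0, 30000, 2)]
    matched, unmatched = match_datasets(source, target)
    print(len(matched), len(unmatched))
```

Because the index is built once, the work grows roughly linearly with the number of records, which is why a 30K-user match on a simple key would normally be expected to take far less than days.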


Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


Please try Analyzer 4.0.2; data import performance is improved to a large
extent.

-KPRajesh


--
kprajesh
------------------------------------------------------------------------
kprajesh's Profile: https://forums.netiq.com/member.php?userid=333
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


Which version are you using? There were some performance changes in 4.0
SP2 that improved the ability to scale to large sizes. Also, what are
the matches being done? That seems like too long for 30k matches, but
if you have regular expressions which are doing some weird things
performance can be negatively and unnecessarily impacted regardless of
the language or environment used.

Good luck.
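
To illustrate the point about regular expressions, here is a generic Python sketch (not an Analyzer pattern or API; both patterns and the `timed_match` helper are made up for this example). The two patterns accept the same strings, but the one with nested quantifiers backtracks heavily on input that almost matches.

```python
import re
import time

# Nested quantifiers: on a near-miss input the engine tries a huge
# number of ways to split the run of "a" characters before giving up.
SLOW = re.compile(r"^(a+)+$")

# Same accepted strings, single quantifier, fails after one pass.
FAST = re.compile(r"^a+$")


def timed_match(pattern, text):
    """Return (matched?, elapsed seconds) for one match attempt."""
    start = time.perf_counter()
    matched = pattern.match(text) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    # A run of "a" followed by a character that forces the match to fail.
    text = "a" * 18 + "b"
    for name, pattern in (("slow", SLOW), ("fast", FAST)):
        matched, elapsed = timed_match(pattern, text)
        print(f"{name}: matched={matched} elapsed={elapsed:.6f}s")
```

Each extra "a" in the input roughly doubles the time for the slow pattern, so be careful about increasing the length when experimenting.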
Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


ab;12502 Wrote:
> Which version are you using? There were some performance changes in 4.0
> SP2 that improved the ability to scale to large sizes. Also, what are
> the matches being done? That seems like too long for 30k matches, but
> if you have regular expressions which are doing some weird things
> performance can be negatively and unnecessarily impacted regardless of
> the language or environment used.
>
> Good luck.
>


I'm using Analyzer 4.0.2. We are running a simple matching profile based
on uid+fullname. We did not run an analysis profile, so no regular
expressions are involved.


--
anja98
------------------------------------------------------------------------
anja98's Profile: https://forums.netiq.com/member.php?userid=1121
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


I used Analyzer 4.0.2 to match workforceID+manager info for 53k users
from different sources (eDir/LDAP Tivoli). After a number of attempts with
the "internal" database, I switched to an "external" MySQL database instance
(default installation on the same workstation).

With external MySQL, the process finished in 40 min.
This configuration works much better and faster than the
"internal" database, but it has its own bugs and problems. 😞

Analyzer promised to be a very powerful tool, and it's sad that
development seems to be frozen(?), the number of bugs is huge, and the
direction is not clear...


--
al_b
------------------------------------------------------------------------
al_b's Profile: https://forums.netiq.com/member.php?userid=209
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


Yes, performance improvement is the key change in Analyzer 4.0.2.

There are patches for 4.0.2, available as attachments on the
bugs listed below:
1) Bug 769713 - Unable to import flat file data if "First Row Contains
Field Names" is checked.
2) Bug 785113 - Can not delete any created Objects under a Connection
3) Bug 785583 - Cleaner Script only effects current page and not entire
dataset
4) Bug 785130 - data merges on large datasets
5) Bug 785859 - Slow dataset creation wizard - Determining Scope
6) Bug 786370 - Data browser - "Number of rows per page" setting not
being honored

If you are facing any other issues apart from these, please post
the details.


-KPRajesh


--
kprajesh
------------------------------------------------------------------------
kprajesh's Profile: https://forums.netiq.com/member.php?userid=333
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


kprajesh;216619 Wrote:
> Yes, performance improvement is the key change in Analyzer 4.0.2.
>
> There are patches for 4.0.2, available as attachments on the
> bugs listed below:
> 1) Bug 769713 - Unable to import flat file data if "First Row Contains
> Field Names" is checked.
> 2) Bug 785113 - Can not delete any created Objects under a Connection
> 3) Bug 785583 - Cleaner Script only effects current page and not entire
> dataset
> 4) Bug 785130 - data merges on large datasets
> 5) Bug 785859 - Slow dataset creation wizard - Determining Scope
> 6) Bug 786370 - Data browser - "Number of rows per page" setting not
> being honored
>
> If you are facing any other issues apart from these, please post
> the details.
>
>
> -KPRajesh


This is interesting! 🙂
I recognized all my bugs. 🙂


--
al_b
------------------------------------------------------------------------
al_b's Profile: https://forums.netiq.com/member.php?userid=209
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


Nice to know that the above fixes were useful to you... 🙂
Do post if you are facing any new issues.

-KPRajesh


--
kprajesh
------------------------------------------------------------------------
kprajesh's Profile: https://forums.netiq.com/member.php?userid=333
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset

kprajesh wrote:

> Nice to know that the above fixes were useful to you... 🙂
> Do post if you are facing any new issues.


Nice to know those fixes *would* be useful, if the bugs they are attached to
were public...

--

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


Lothar Haeger;216674 Wrote:
> kprajesh wrote:
>
> > Nice to know that the above fixes were useful to you... 🙂
> > Do post if you are facing any new issues.

>
> Nice to know those fixes *would* be useful, if the bugs they are
> attached to were public...
>
> --


Interestingly, most of these bugs were created by Novell PSE
following my request, and the good news is that most of them are fixed!

1) Bug 769713 - Unable to import flat file data if "First Row Contains
Field Names" is checked. *found a way to avoid this bug*
2) Bug 785113 - Can not delete any created Objects under a Connection
-*fixed*
3) Bug 785583 - Cleaner Script only effects current page and not entire
dataset -*By design?*
4) Bug 785130 - data merges on large datasets -*fixed*
5) Bug 785859 - Slow dataset creation wizard - Determining Scope -*almost
fixed*
6) Bug 786370 - Data browser - "Number of rows per page" setting not
being honored -*fixed*

Now we need some kind of official Analyzer patch: all these fixes
currently come as a large number of updated jars/packages/etc. (in many
cases a bug took more than one fix), and it's really hard to track which
is the actual "correct" working version.


--
al_b
------------------------------------------------------------------------
al_b's Profile: https://forums.netiq.com/member.php?userid=209
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


Yes, the behavior reported in "Bug 785583 - Cleaner Script only effects
current page and not entire dataset" is as per design.

I'll need to check and get back on the official Analyzer patch release.

-KPRajesh


--
kprajesh
------------------------------------------------------------------------
kprajesh's Profile: https://forums.netiq.com/member.php?userid=333
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


kprajesh;216778 Wrote:
> Yes, the behavior reported in "Bug 785583 - Cleaner Script only effects
> current page and not entire dataset" is as per design.
>
> I'll need to check and get back on the official Analyzer patch release.
>
> -KPRajesh


Maybe the name of this context menu item should be changed from "*Apply Cleaner
Script to Column*" to "*Apply Cleaner Script to Column visible on
current page*"?

I would also like to see two more related context menu items:
1. Apply Cleaner Script to All information in Column.
2. Apply Cleaner Script to current record.


--
al_b
------------------------------------------------------------------------
al_b's Profile: https://forums.netiq.com/member.php?userid=209
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


Ok, will look at it.
Currently, you can achieve the same by increasing the number of rows per
page of the DataBrowser. The default setting of 1000 can be changed from
Analyzer > Windows > Preferences > DataBrowser.

-KPRajesh
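
The difference between cleaning only the visible page and cleaning the whole column can be sketched in a few lines of Python. This is a rough model only, not Analyzer's cleaner-script API: `clean_value`, `apply_to_page`, and `apply_to_all` are hypothetical names, and the column is just a list of strings. The default of 1000 rows per page is the DataBrowser default mentioned above.

```python
def clean_value(value):
    """Hypothetical cleaner: trim whitespace and lower-case the value."""
    return value.strip().lower()


def apply_to_page(column, page, rows_per_page=1000):
    """Clean only the rows that fall on the given zero-based page."""
    start = page * rows_per_page
    end = start + rows_per_page
    cleaned = [clean_value(v) for v in column[start:end]]
    return column[:start] + cleaned + column[end:]


def apply_to_all(column):
    """Clean every row in the column, regardless of paging."""
    return [clean_value(v) for v in column]


if __name__ == "__main__":
    column = [" Alice ", "BOB", " Carol"]
    # With 2 rows per page, the third row is left untouched.
    print(apply_to_page(column, 0, rows_per_page=2))
    # With the page size raised to cover the dataset, the results agree.
    print(apply_to_page(column, 0, rows_per_page=3) == apply_to_all(column))
```

In this model, raising the page size until one page covers the whole dataset makes the per-page operation behave like the whole-column one, which is the workaround described above.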


--
kprajesh
------------------------------------------------------------------------
kprajesh's Profile: https://forums.netiq.com/member.php?userid=333
View this thread: https://forums.netiq.com/showthread.php?t=2952

Anonymous_User Absent Member.

Re: Matching analysis is very slow on large dataset


kprajesh;221624 Wrote:
> Ok, will look at it.
> Currently, you can achieve the same by increasing the number of rows per
> page of the DataBrowser. The default setting of 1000 can be changed from
> Analyzer > Windows > Preferences > DataBrowser.
>
> -KPRajesh



Analyzer 4.0.2 Patch 1 Build number 20121128 released today!

http://download.novell.com/Download?buildid=1zS-32Djj0s~

Bug fixes in this patch
- When a right click is done on import data, the metrics menu can not
pop-up. Bug 520552
- Null pointer exception when importing the data from MySQL. Bug 732817
- Unable to import a flat file data if the "First Row Contains Field
Names" is checked. Bug 769713
- "Display Attributes" are not shown in the uniqueness results. Bug
772557
- Issue with data merges on large data sets when multi-valued attributes
are used. Bug 785130
- Cleaner Script only effects current page and not entire dataset. Bug
785583
- Slow dataset creation wizard - Determining Scope. Bug 785859
- Data Browser - "Number of rows per page" setting is not being honored.
Bug 786370
- Matching dataset throws exception when dataset with multi-valued
attribute support is enabled. Bug 789938
- Importing data causes a MySQL exception. Bug 791079


--
al_b
------------------------------------------------------------------------
al_b's Profile: https://forums.netiq.com/member.php?userid=209
View this thread: https://forums.netiq.com/showthread.php?t=2952
