AD Driver - slow processing


This seems to be typical, but I'm wondering if I missed anything.

Processing thru the eDir driver: 1000 events queued usually clear in
under 10 minutes or so.
Processing thru the JDBC driver (hitting a read-only table in Oracle):
1000 events queued usually clear in 5 minutes or so.
Processing thru the loopback driver: load up 1000 identities and let it
rip, under 5 minutes.
AD driver? 1000 events take over an hour to complete and bog down
password processing and sync while the events chew thru.
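
For scale, those figures work out to roughly the per-event times below. This is only a sketch: it treats the quoted bounds ("under 10 minutes", "over an hour") as exact durations, which is an assumption, not a measurement.

```python
# Rough per-event cost implied by the queue-drain times quoted above.
# The durations are the stated bounds taken as exact values (an assumption).
EVENTS = 1000

drain_seconds = {
    "eDir": 10 * 60,      # "under 10 minutes"
    "JDBC": 5 * 60,       # "5 minutes or so"
    "loopback": 5 * 60,   # "under 5 minutes"
    "AD": 60 * 60,        # "over an hour"
}


def seconds_per_event(total_seconds: float, events: int = EVENTS) -> float:
    """Average wall-clock seconds spent on each queued event."""
    return total_seconds / events


per_event = {name: seconds_per_event(s) for name, s in drain_seconds.items()}

for name, secs in per_event.items():
    print(f"{name:9s} {secs:.2f} s/event")
```

Several seconds per event is a long time for a handful of LDAP operations, which fits the observation that the system mostly seems to be waiting.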

setup
AD Remote loader, Virtual, 2008R2, 8GB RAM, 8 processors, 1GB network,
trace set to 0/off, server is the PDC-E, password shim loaded and
running on all other DCs
IDVault - IDM 4.5 AE, Virtual, SLES 11 SP3, 16GB RAM, 4 processors,
1GB network, trace set to 0, server has R/W replica of tree. Java heap
is min 512MB/max 1024MB, other drivers are JDBC, RRSD, 2 loopbacks, eDir,
userapp is on a different box.

AD driver is mostly stock, nothing special going on in it. The AD is
fairly flat with the users kept in 3 different location OU's in it.

Anyone have any suggestions/ideas? It's virtual, so we can up the memory
or procs, but nothing seems to be working overly hard; it mostly seems
to be waiting.

thanks in advance,

Dave G.


--
dbgallo
------------------------------------------------------------------------
dbgallo's Profile: https://forums.netiq.com/member.php?userid=3621
View this thread: https://forums.netiq.com/showthread.php?t=55347

  • dbgallo <dbgallo@no-mx.forums.microfocus.com> wrote:
    >
    >


    AD driver doesn't have loopback protection so is very chatty compared to
    other drivers.

    Have you done any LDAP performance profiling on the AD side? When I've been
    asked this question in the past, the biggest culprit has normally been AD,
    as it can be slow to respond at times.

    Optimising AD LDAP performance is not really within the scope of these
    forums.

    What is the AD shim version, and what are the Remote Loader memory params?
    (The default for IDM 4.x Remote Loaders is 512MB/128MB.)

    --
    If you find this post helpful and are logged into the web interface, show
    your appreciation and click on the star below...

  • using a tool like Apache Directory Studio against the same server, I
    query the entire domain and return a search in under 3 minutes with
    20000 objects, not as quick as eDir, but ok for us, replication delta is
    under 60 seconds.

    AD driver version on the vault side is 4.0.0.4
    AD driver version on the remote loader is 4.0.0.4
    and the memory is set to 512/128



  • dbgallo <dbgallo@no-mx.forums.microfocus.com> wrote:
    > using a tool like Apache Directory Studio against the same server, I
    > query the entire domain and return a search in under 3 minutes with
    > 20000 objects, not as quick as eDir, but ok for us, replication delta is
    > under 60 seconds.

    Is this the exact same query as performed by the shim? Otherwise this isn't
    really that valid a test.

    Have you turned on any tracing of how efficient the LDAP queries are
    against AD?
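
    A like-for-like test would time many small lookups issued one after
    another, not one large sweep. The sketch below shows the shape of such
    a test. `fake_lookup` is a hypothetical stand-in with an invented 1 ms
    delay; a real test would replace it with an LDAP client call that uses
    the same base, scope and filter as the shim (values that would have to
    come from a driver trace).

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def time_lookups(lookup: Callable[[str], object], keys: Iterable[str]) -> List[float]:
    """Run one lookup per key, serially, and return each call's duration in seconds."""
    durations = []
    for key in keys:
        start = time.perf_counter()
        lookup(key)
        durations.append(time.perf_counter() - start)
    return durations


def fake_lookup(key: str) -> dict:
    """Hypothetical stand-in for a single-object LDAP search against AD."""
    time.sleep(0.001)  # pretend each round trip costs about 1 ms
    return {"cn": key}


durations = time_lookups(fake_lookup, [f"user{i}" for i in range(20)])
print(
    f"calls={len(durations)} "
    f"mean={mean(durations) * 1000:.2f} ms "
    f"max={max(durations) * 1000:.2f} ms"
)
```

    Looking at the maximum as well as the mean helps, because a few very
    slow lookups can hide behind a reasonable average.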

    It may be that an index on AD could make a huge difference here.

    I've never had to adjust the AD shim max heap above 512M.

  • On 02/11/2016 11:27 AM, dbgallo wrote:
    >
    > using a tool like Apache Directory Studio against the same server, I
    > query the entire domain and return a search in under 3 minutes with
    > 20000 objects, not as quick as eDir, but ok for us, replication delta is
    > under 60 seconds.


    This is not at all what you describe your driver as doing. In your
    case, you said you queued up 1,000 events (not 1 event) that then went
    over to Microsoft Active Directory (MAD) via the driver. That would look
    a bit like this (outlining just the major events, and consolidating many
    things into single lines because we cannot see your driver config logic):


    for i in {1..1000}; do
      Add sync/modify/whatever to TAO file on engine side
      Get object from eDirectory
      Process through the MAD driver config's Subscriber channel
      At some point, probably do query for object in MAD (Matching or
        Merging, depending on create or just a sync of an associated object)
      Send through SSLized connection (hopefully) to the Remote Loader (RL)
      RL uses MAD driver (shim) to act as MAD client (hopefully SSLized)
      MAD returns the object, or not, to the RL
      RL returns the output to the engine
      Finish other policies in driver config, possibly doing more queries
        (see section above) to complete that
      Schema mapping changes attribute names for most/all attributes
      Final document (add/modify/etc.) sent to Remote Loader (SSLized still)
      RL uses MAD driver (shim) to act as MAD client (hopefully SSLized)
      MAD returns the object, or not, to the RL
      RL returns the output to the engine
      Engine processes status of operation
      Engine updates association on the object
      Engine clears the event from the TAO file
    done


    vs. what you did:


    Query MAD from client (possibly using LDAPS, but not the default)
    Server processes query, returns 20,000 objects
    Client gets objects.


    While the latter is admittedly a big query, one big operation is usually
    much faster than many small operations, and in your case each operation
    has the potential to trigger further nested operations.
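
    A back-of-the-envelope model of that difference, assuming every round
    trip is strictly serial. The round-trip count and latency are invented
    illustrative values, not measurements from this system:

```python
def serial_seconds(events: int, round_trips_per_event: int, latency_s: float) -> float:
    """Total time when every round trip must finish before the next one starts."""
    return events * round_trips_per_event * latency_s


# Hypothetical figures chosen only to illustrate the arithmetic.
EVENTS = 1000
ROUND_TRIPS = 6    # e.g. matching query, extra policy queries, the write, status
LATENCY_S = 0.6    # assumed average wait per round trip

total = serial_seconds(EVENTS, ROUND_TRIPS, LATENCY_S)
print(f"{total:.0f} s total, {total / EVENTS:.1f} s per event")
```

    In a serial pipeline like this, removing a single round trip per event
    or shaving the latency of one slow step scales across the whole queue.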

    Seeing a level three trace, probably from both the engine and RL, would
    help us find what may be slow, but at the end of the day you're doing a
    lot of operations involving several machines.

    It would perhaps help us to know what these operations are and why they
    are caused. Are they creates? Resyncs to fix some problem? Predictable
    modifies that happen to objects regularly? What exactly causes the
    events, and could it be throttled to maybe do it in smaller batches? By
    default IDM is First-In-First-Out (FIFO) because event order often matters
    (deletes come after password changes which come after creates, and other
    orders may fail badly) so it will try to keep that order for you as much
    as possible.

    Do you have auditing turned on, and with what settings? Do you have
    tracing set or turned up, and to what level?

    Also you may find that some features of IDM 4.5 let you prioritize certain
    traffic (passwords in particular) to help with this kind of thing.

    Some organizations that cannot get around this have, in the past, set up a
    password-sync driver that only does passwords so those always go through
    very quickly. It's an option, and pretty easily done, but in most cases
    there are other things that can be done to fix the symptom.
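
    To illustrate why a dedicated password driver helps, here is a toy
    model of FIFO queueing. The 3.6 seconds per event is an assumed average
    (roughly 1000 events per hour) and the event names are made up:

```python
from collections import deque


def wait_before_processing(queue: deque, target: str, seconds_per_event: float) -> float:
    """Seconds a target event waits while everything ahead of it in a FIFO queue is processed."""
    waited = 0.0
    for event in queue:
        if event == target:
            return waited
        waited += seconds_per_event
    raise ValueError(f"{target!r} is not queued")


SECONDS_PER_EVENT = 3.6  # assumed average, roughly 1000 events per hour

# One shared queue: the password change arrives behind 1000 bulk modifies.
shared = deque([f"modify-{i}" for i in range(1000)] + ["password-change"])

# Dedicated password driver: its queue only ever holds password events.
password_only = deque(["password-change"])

shared_wait = wait_before_processing(shared, "password-change", SECONDS_PER_EVENT)
dedicated_wait = wait_before_processing(password_only, "password-change", SECONDS_PER_EVENT)
print(f"shared queue wait: {shared_wait / 60:.0f} min, dedicated queue wait: {dedicated_wait:.0f} s")
```

    The model ignores any cost of running a second driver; it only shows
    that a password event no longer waits behind unrelated bulk events.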

    --
    Good luck.


  • indexing is on , for most of the user class attributes, was able to
    confirm that with the AD schema console.
    The slowness is consistent whether it's 100 events or 1000, they take
    proportionally the same time. The AD server is not out of memory, nor
    running out of CPU.
    The exclusions for the C:\Novell directory are in place, the
    dirxml_remote process is excluded from scans as well.

    this is one of those perplexing things, the other drivers run fine 99.9%
    of the time.



  • On 2/11/2016 10:44 AM, dbgallo wrote:
    > setup
    > AD Remote loader, Virtual, 2008R2, 8GB RAM, 8 processors, 1GB network,
    > trace set to 0/off, server is the PDC-E, password shim loaded and
    > running on all other DCs
    > IDVault - IDM 4.5 AE, Virtual, SLES 11 SP3, 16GB RAM, 4 processors,
    > 1GB network, trace set to 0, server has R/W replica of tree. Java heap
    > is min 512MB/max 1024MB, other drivers are JDBC, RRSD, 2 loopbacks, eDir,
    > userapp is on a different box.


    Just a point of clarity: the remote loader trace and the vault trace are both set to zero, correct?
    When you see the slowdown, is it when there are multiple drivers with stuff queued? Or does the AD driver end up standing
    alone with a backup of events?

    What if you stop the other drivers? I've seen cases where it appears that there are no events on the other drivers, but in
    reality the loopback on the AD driver is causing other events to be processed on other drivers and they are just keeping up.
    So on the dashboard it looks like they are doing nothing, but in reality they are working hard, and that causes the AD
    events to be slower.

    Then, as others have mentioned, I would look next at any queries going to AD in the driver, like the matching policy, and
    what attributes they are using. A slow query would be my next culprit to consider.

    --
    -----------------------------------------------------------------------
    Will Schneider
    Knowledge Partner http://forums.netiq.com


  • remote loader trace is set to 0 and the Driver trace has been set to 0
    OR no trace file at all, no real difference in performance.

    I downed the other drivers in the driver set, did not change processing
    time.

    We are querying for "userPrincipalName" to set a variable for a compare
    (so we don't needlessly overwrite it), but that attribute is indexed in
    AD.
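
    The compare amounts to something like the sketch below. The function
    name and the case-insensitive comparison are assumptions made for
    illustration, not the actual policy logic:

```python
from typing import Optional


def upn_needs_update(current: Optional[str], desired: str) -> bool:
    """Return True only when the value read from AD differs from the value we would write."""
    if current is None:
        # Nothing is set in AD yet, so the write is needed.
        return True
    # Case-insensitive comparison is an assumption of this sketch.
    return current.strip().lower() != desired.strip().lower()


print(upn_needs_update("jdoe@example.com", "jdoe@example.com"))
print(upn_needs_update("old@example.com", "new@example.com"))
```

    The trade-off is that every event pays for one extra read in order to
    avoid some writes.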



  • dbgallo <dbgallo@no-mx.forums.microfocus.com> wrote:
    > remote loader trace is set to 0 and the Driver trace has been set to 0
    > OR no trace file at all, no real difference in performance.
    >
    > I downed the other drivers in the driver set, did not change processing
    > time.
    >
    > We are querying for "userPrincipalName" to set a variable for a compare
    > (so we don't needlessly overwrite it), but that attribute is indexed in
    > AD.


    Do you have any PowerShell commands in your AD driver (by using PSExecute)?

    --
    Best regards
    Marcus

  • No PowerShell involved.



  • dbgallo <dbgallo@no-mx.forums.microfocus.com> wrote:
    > indexing is on , for most of the user class attributes, was able to
    > confirm that with the AD schema console.

    What type of index though?
    Substring indexes can slow writes. Regular indexes are generally ok.

    It could still be a match or query that generates an inefficient LDAP filter.
    It could also be policies that query one attribute at a time rather than
    pre-fetching all required attributes (the IDM engine has some built-in
    optimisations here, but it is quite possible to inadvertently bypass these).
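
    The cost difference can be shown with a toy directory that counts read
    round trips. `CountingDirectory` is a hypothetical in-memory stand-in,
    not an IDM or AD API, and the entry data is made up:

```python
from typing import Dict, Iterable


class CountingDirectory:
    """Hypothetical in-memory stand-in for AD that counts read round trips."""

    def __init__(self, entries: Dict[str, Dict[str, str]]):
        self._entries = entries
        self.round_trips = 0

    def read(self, dn: str, attributes: Iterable[str]) -> Dict[str, str]:
        """Return the requested attributes of one entry in a single round trip."""
        self.round_trips += 1
        entry = self._entries[dn]
        return {name: entry[name] for name in attributes if name in entry}


ENTRY = {
    "cn=jdoe": {
        "userPrincipalName": "jdoe@example.com",
        "mail": "jdoe@example.com",
        "sn": "Doe",
    }
}
WANTED = ["userPrincipalName", "mail", "sn"]

# One query per attribute: one round trip for each attribute of the object.
one_at_a_time = CountingDirectory(ENTRY)
separate: Dict[str, str] = {}
for name in WANTED:
    separate.update(one_at_a_time.read("cn=jdoe", [name]))

# Pre-fetch: a single query that names every attribute the policies will need.
prefetch = CountingDirectory(ENTRY)
together = prefetch.read("cn=jdoe", WANTED)

print(one_at_a_time.round_trips, prefetch.round_trips)
```

    Both approaches return the same data; only the number of trips to the
    directory differs, and that number is multiplied by every queued event.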

    Are you querying things that are really on another object (like group
    membership - which is a backlink)?

    > The slowness is consistent whether it's 100 events or 1000, they take
    > proportionally the same time. The AD server is not out of memory, nor
    > running out of CPU.
    > The exclusions for the C:\Novell directory are in place, the
    > dirxml_remote process is excluded from scans as well.
    >
    > this is one of those perplexing things, the other drivers run fine 99.9%
    > of the time.

    Consistent slowness might point to some sort of timeout, fallback or
    similar. A Wireshark trace (or even Sysinternals Process Monitor, which
    might be good enough) could identify any long pauses and what causes them.
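
    Once timestamps have been exported from such a capture, spotting the
    pauses is simple. A minimal sketch, assuming the timestamps are already
    parsed into `datetime` values; the sample times below are made up:

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def find_pauses(
    timestamps: List[datetime], threshold: timedelta
) -> List[Tuple[datetime, timedelta]]:
    """Return (start, gap) for every gap between consecutive timestamps that is at least threshold."""
    ordered = sorted(timestamps)
    pauses = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap >= threshold:
            pauses.append((earlier, gap))
    return pauses


# Made-up packet times for illustration; real ones would come from the capture.
base = datetime(2016, 2, 11, 10, 0, 0)
sample = [base + timedelta(seconds=s) for s in (0.0, 0.1, 0.2, 3.2, 3.3)]

pauses = find_pauses(sample, timedelta(seconds=1))
for start, gap in pauses:
    print(f"pause of {gap.total_seconds():.1f} s after {start.time()}")
```

    A pause that recurs with the same length on every event would be a
    strong hint of a timeout or fallback rather than a slow query.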
