DR for BSM, RUM

Hi Team


We have 2 Gateway servers behind a load balancer, 2 DPSs (Data Processing Servers) in failover, and 2 database nodes in an Oracle RAC cluster, plus 3 RUM Engines connected to BSM along with 3 RUM Client Monitor probes.

We want to set up DR for the above scenario. Please suggest best practices for configuring it.

BSM 9.26

RUM 9.26


Regards

  • We have the exact same configuration and we have done an annual DR exercise as well, though we are on BSM 9.25 IP1.  The HP deployment and hardening documents describe the various aspects.  If you have any specific questions, I may be able to help.

    Thanks,

    Rufeng

  • Hi Parkar,

    What level of resiliency do you need?  Are you trying to remove all SPOFs (single points of failure), cater for a single data centre failure, or something else?

    With 2 GWs, 2 DPSs and 2 DB nodes, your BSM environment should be OK for DR - as long as everything is set up according to the DR doc (and the server pairs are in different data centres - and not in the same rack ;) ).

    However, RUM is a bit different, and you would have to share some more specific information about your monitored apps and data centres for me to give specific suggestions, so here are some more general thoughts:

    With client monitor probes, it depends on how the incoming data is routed to them.  If you have the data going through an aggregator or load balancer, you can balance the traffic across the probes and ensure you have enough capacity to handle peak volumes going through the remaining probes if one fails.
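
    As a rough illustration, here is a minimal liveness-check sketch in Python.  Everything in it is a placeholder - the probe hostnames, the port, and the assumption that answering a plain HTTP request is a good enough health signal - so adapt it to however your probes actually receive data.

        #!/usr/bin/env python3
        # Sketch: poll each Client Monitor probe and flag any that stop
        # answering, so its traffic can be redirected. Hostnames, port and
        # the plain-HTTP liveness check are all placeholder assumptions.
        import urllib.error
        import urllib.request

        PROBES = [
            "http://cm-probe-1.example.com:8080/",  # hypothetical endpoints
            "http://cm-probe-2.example.com:8080/",
            "http://cm-probe-3.example.com:8080/",
        ]

        def probe_is_up(url, timeout=5):
            """True if the probe answers the HTTP request at all."""
            try:
                urllib.request.urlopen(url, timeout=timeout)
                return True
            except urllib.error.HTTPError:
                return True   # it responded, even if with an error status
            except (urllib.error.URLError, OSError):
                return False  # refused, timed out, or unresolvable

        for url in PROBES:
            state = "UP" if probe_is_up(url) else "DOWN - redirect its traffic"
            print(f"{url}: {state}")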

    I worked on a solution to cater for component failure in an environment with a lot of sniffer Probes.  This used network aggregators (aka Network Packet Brokers) to filter the traffic and split the high-volume output across lots of probes.  With advanced models, it's possible to set up rules and triggers to automatically redirect traffic to a different output port (probe) if one probe fails.

    The application being monitored ran active/active across 2 data centres, and was designed to have enough capacity to run in peak periods from only one data centre if required.  Therefore, we tried to ensure we also had enough capacity to do the same with RUM Engines and Probes.  In this scenario, we didn't care if one data centre failed, because all the user traffic would be going through the other data centre anyway.  If an individual Probe failed, we could redirect traffic to other probes (ensuring Engine/Probe assignments were updated in BSM/EUM if required).  If an Engine failed, we could move the Probe to a spare, or another with enough free capacity (again, checking Engine/Probe assignments).
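
    To make the capacity maths concrete, here is a back-of-the-envelope check.  All the figures are made up - substitute your own probe sizing and peak traffic numbers.

        #!/usr/bin/env python3
        # Sketch: N-1 capacity check - can the surviving probes absorb peak
        # traffic if one or more fails? All figures are placeholders.

        PROBE_CAPACITY_PPS = 1500  # pages/sec one probe handles (placeholder)
        PEAK_TRAFFIC_PPS = 2400    # total peak pages/sec (placeholder)
        TOTAL_PROBES = 3

        def survives(failed_probes):
            """True if the remaining probes still cover the peak load."""
            remaining = (TOTAL_PROBES - failed_probes) * PROBE_CAPACITY_PPS
            return remaining >= PEAK_TRAFFIC_PPS

        for failed in range(TOTAL_PROBES + 1):
            verdict = "OK" if survives(failed) else "NOT enough capacity"
            print(f"{failed} probe(s) down: {verdict}")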

    Generally, you can double up on RUM capacity and minimise SPOFs, as long as the business requirements justify the costs.

    Regards,
    Tim

  • We are looking for the best solution, one that requires minimal time when an application switchover is done.

  • Also, how do we ensure DR for the RUM Engine and RUM Probe?

  • Hi Parkar,

    This is what I was touching on in my previous post.  There isn't a default failover mechanism, but there are ways of doing it with some manual intervention if you have spare capacity.

    How does the monitored data get to your Probes?  Is it via a load balancer or anything with traffic management capabilities (e.g. an aggregator)?

    How many RUM Apps do you have in total and per Probe?

    How many Probes does each Engine have?  Is it a 1:1 ratio?

    Regards,

    Tim


  • RUM Probe to RUM Engine is 1:1, and there is no load balancer for the RUM Client Monitor probes.


  • Hi Parkar,

    You only really have the following options:

    Probe Failure: Agree a process with the network / data centre team to redirect the traffic from the failed probe to one of the other Probes, then update the Engine/Probe assignments for the RUM apps in EUM.  This depends on the location of the probes and how the data is routed.

    Engine Failure: Connect the failed Engine's Probe to another Engine, then update Engine/Probe assignments for the RUM apps in EUM.  When the failed Engine is running again, both Engines will think that they own the Probe, so revert the config quickly.

    However, it depends on whether there is enough capacity on your Engines and Probes, and whether there is cross connectivity between them (see the sketch below).  It may be worth requesting a spare Engine and Probe.
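
    If it helps, here is a minimal Python sketch for verifying cross connectivity ahead of a failover.  It should be run from each Engine host in turn.  The hostnames are placeholders, and 2020 is only an assumed Probe listening port - confirm the real port from your own probe configuration.

        #!/usr/bin/env python3
        # Sketch: verify every Probe is reachable over TCP before a failover
        # forces a re-assignment. Run from each Engine host. Hostnames are
        # placeholders; 2020 is an ASSUMED Probe listening port.
        import socket

        PROBES = [
            "rum-probe-1.example.com",   # hypothetical Probe hostnames
            "rum-probe-2.example.com",
            "rum-probe-3.example.com",
        ]
        PROBE_PORT = 2020                # assumption - check your probe config

        def reachable(host, port, timeout=5):
            """Attempt a plain TCP connection; True means the port answered."""
            try:
                with socket.create_connection((host, port), timeout=timeout):
                    return True
            except OSError:
                return False

        for host in PROBES:
            status = "reachable" if reachable(host, PROBE_PORT) else "BLOCKED"
            print(f"{host}:{PROBE_PORT}: {status}")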


    Regards,

    Tim