Acclaimed Contributor

Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

Good morning,

I'm curious as to what other companies are doing in the area of Disaster Recovery or Backup Contingency Planning for outages for Service Manager.  

In our company, when we make changes to Service Manager that require changes to the dbdicts, we schedule an outage, kick everybody off the application, perform our changes and checkout, and then bring the application back up.  While the main Service Manager application is down, we actually have our users create tickets in another system, and then we use ConnectIt to move those tickets over to HPSM.  The other system is limited, does not have all the data our Production system has, and it's kind of a pain trying to keep the core data in sync.

We use that same system for unexpected outages - like if HPSM goes down for some reason.

For 'Hard Down' disaster recovery, the database team and server teams bring up whole copies of Service Manager in another data center, but that's only for a company-wide disaster, and not something that's available for scheduled outages or unexpected HPSM outages.

So, I'm curious how other companies have solved this. What do you do in your companies for unexpected outages or scheduled maintenance?

Established Member

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

Hi Jacob,

We have exactly the same requirement, but HP said it is not supported on the application side. For example, for a planned outage we considered moving users to the DR instance while we worked on the primary instance. During that window both instances would be updating their databases, but Service Manager does not support active synchronization between database instances. For an enterprise-level application like Service Manager, clients expect this functionality to be available. HP is not even planning to add this feature to SM 9.5x, so I'm not sure when it will be available in Service Manager.

Regards,

Madhava

Outstanding Contributor

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

One of my customers, which no longer uses HP, had a two-datacenter setup: one active and one standby.

In case of a major outage in the active datacenter, they would switch over to the standby datacenter. The database was synced between the two datacenters, so all data was available regardless of which datacenter was active. Only one Service Manager instance was active at any time.

Once a year they did a disaster recovery test where this setup was tested and it worked perfectly.

Service Manager was running on AIX.

Honored Contributor

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

We swing over to our secondary data center for DR exercises.  If I had a planned outage that was a significant length I could probably plan to also swing over to DR but we mirror the database to our secondary data center and I think it would be problematic to run both instances.  I've never had an unplanned outage last more than 15-20 minutes.

And most of my scheduled outages are less than 30 minutes so people just have to do without for that short time.  I do plan all my outages during my maintenance window which is 9pm on Sundays because that was determined to be the least busy time for Service Manager usage.  So far this has worked for us.
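To make the "least busy time" rule concrete, here is a minimal Python sketch that checks whether a planned outage fits inside a weekly window opening Sunday at 21:00. Only the start time comes from the post; the 3-hour window length and the function names are assumptions for illustration, and nothing here is part of Service Manager.

```python
from datetime import datetime, timedelta

# Window opens Sunday 21:00 (datetime.weekday() == 6 means Sunday).
WINDOW_START_WEEKDAY = 6
WINDOW_START_HOUR = 21
# Assumed window length; use whatever your organization agreed on.
WINDOW_LENGTH = timedelta(hours=3)


def window_start_for(moment: datetime) -> datetime:
    """Return the start of the most recent window at or before `moment`."""
    days_back = (moment.weekday() - WINDOW_START_WEEKDAY) % 7
    start = (moment - timedelta(days=days_back)).replace(
        hour=WINDOW_START_HOUR, minute=0, second=0, microsecond=0
    )
    if start > moment:
        # Sunday but before 21:00: the relevant window is last week's.
        start -= timedelta(days=7)
    return start


def outage_fits_window(start: datetime, duration: timedelta) -> bool:
    """True if an outage starting at `start` begins and ends inside a window."""
    window_start = window_start_for(start)
    window_end = window_start + WINDOW_LENGTH
    return window_start <= start < window_end and start + duration <= window_end
```

With these assumptions, a 30-minute outage starting Sunday at 21:00 fits, while a 3-hour outage starting at 23:00 runs past the end of the window.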

When we upgraded from 9.40 to 9.41, we knew the outage would be 3-4 hours. The powers that be were not happy about that and didn't like the idea of tickets being tracked on paper or in a spreadsheet, Access database, etc. I had to let them open tickets in the QA environment (after resetting the number table to match production), and then after the outage I unloaded the following tables in QA for the outage period: SYSATTACHMENTS, activity, activityservicemgmt, incidents, probsummary, screlation, and moved them to production.

It was a pain on my side, but the only impact on the users was that they had to sign in through a different link during the outage window.
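The selection step of that workaround could be sketched roughly as below. This only illustrates the logic; it is not how the unload was actually performed (that was done inside Service Manager). It assumes each record is a plain dict with a hypothetical `opened` timestamp and `number` ID, whereas real field names differ per table. The second function shows why the number table was reset first: any QA ticket ID that already exists in production would collide on import.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Set

# Tables named in the post above.
OUTAGE_TABLES = ["SYSATTACHMENTS", "activity", "activityservicemgmt",
                 "incidents", "probsummary", "screlation"]


def records_in_window(records: Iterable[Dict], start: datetime,
                      end: datetime, time_field: str = "opened") -> List[Dict]:
    """Keep records whose (hypothetical) timestamp falls in [start, end)."""
    return [r for r in records if start <= r[time_field] < end]


def select_for_unload(qa_tables: Dict[str, List[Dict]], start: datetime,
                      end: datetime) -> Dict[str, List[Dict]]:
    """For each outage table, pick the QA records created during the outage."""
    return {name: records_in_window(qa_tables.get(name, []), start, end)
            for name in OUTAGE_TABLES}


def colliding_ids(qa_records: Iterable[Dict], prod_ids: Set[str],
                  id_field: str = "number") -> Set[str]:
    """IDs created in QA that already exist in production.

    A non-empty result means the QA number table was not advanced
    past production before the outage.
    """
    return {r[id_field] for r in qa_records if r[id_field] in prod_ids}
```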

Knowledge Partner

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

Hi Jacob

For scheduled maintenance, I think you are being ultraconservative. I never shut down the system before creating new fields in the dbdict. You can make a dbdict backup or a hot backup, so I don't see the point of doing this. For activities that may require an outage or partial outage, like changing a table structure, if your client can't tolerate any unavailability you can always find a way to make it quicker. In my example, for instance: clone the table, perform the change, sync, and replace the original.
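A minimal sketch of the clone, change, sync and replace idea, using SQLite from the Python standard library so it runs anywhere. The table `probsummary_demo` and column `new_field` are hypothetical, and the sketch does not touch Service Manager's own dbdict metadata, which would have to be kept in step on a real system. It also does all the copying inside one transaction, whereas on a busy production table you would copy in the background and sync late changes just before the swap.

```python
import sqlite3


def add_column_via_shadow_table(conn: sqlite3.Connection) -> None:
    """Add `new_field` to the hypothetical `probsummary_demo` table by cloning it.

    All steps run in one transaction; SQLite supports transactional DDL,
    so a failure leaves the original table untouched.
    """
    cur = conn.cursor()
    cur.execute("BEGIN")
    try:
        # 1. Clone: the new table already has the extra column.
        cur.execute(
            "CREATE TABLE probsummary_new ("
            " number TEXT PRIMARY KEY,"
            " brief_description TEXT,"
            " new_field TEXT DEFAULT NULL)"
        )
        # 2. Sync: copy the existing rows into the clone.
        cur.execute(
            "INSERT INTO probsummary_new (number, brief_description) "
            "SELECT number, brief_description FROM probsummary_demo"
        )
        # 3. Replace: swap the clone in under the original name.
        cur.execute("ALTER TABLE probsummary_demo RENAME TO probsummary_old")
        cur.execute("ALTER TABLE probsummary_new RENAME TO probsummary_demo")
        cur.execute("DROP TABLE probsummary_old")
        conn.commit()
    except Exception:
        conn.rollback()
        raise
```

The caller should commit any pending work before calling it, since SQLite does not allow a `BEGIN` inside an open transaction.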

For unexpected outages: in 10 years of working with this tool for different clients, every problem that lasted longer than a full restart was related to components outside my control, such as LDAP, a hardware load balancer, or general network failures.

General advice would be:
RTE: Have a standby or snapshot of your HPSM RTE servers. Discuss the best place to keep it with your Windows/Linux/virtualization team.

Database: The same as for any other application: appropriate RAID settings; full, differential and log backups; separate storage for the database, logs and backups. It all depends on what infrastructure you already have and how much money you are willing to invest.
Discuss strategies for your database with your DB team. Depending on how much downtime you can accept, most of them are transparent to HPSM and, when needed, take only a few minutes to be ready to use.
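As an illustration of how full, differential and log backups combine at restore time, here is a simplified Python sketch that picks the backups needed to reach a point in time: the newest full backup, the newest differential taken after it, then log backups up to the target. It models backups by completion time only; real database engines track log sequence numbers, so treat this as a conceptual sketch rather than how any particular DBMS decides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Backup:
    kind: str          # "full", "diff" or "log" (simplified model)
    finished: datetime


def restore_chain(backups: List[Backup], target: datetime) -> Optional[List[Backup]]:
    """Pick backups to reach `target`: one full, at most one diff, then logs.

    Returns None if no full backup finished at or before `target`.
    """
    ordered = sorted(backups, key=lambda b: b.finished)
    fulls = [b for b in ordered if b.kind == "full" and b.finished <= target]
    if not fulls:
        return None
    chain = [fulls[-1]]
    # A differential holds all changes since its full, so only the newest counts.
    diffs = [b for b in ordered if b.kind == "diff"
             and chain[0].finished < b.finished <= target]
    if diffs:
        chain.append(diffs[-1])
    # Replay logs after the last restored backup until one reaches the target.
    last = chain[-1].finished
    for b in ordered:
        if b.kind != "log" or b.finished <= last:
            continue
        chain.append(b)
        if b.finished >= target:
            break
    return chain
```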

This topic is very interesting. I liked it and would like more people to join this discussion...

Thanks!

Regards,
Breno Abreu

Acclaimed Contributor

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

So you do the dbdict changes at the database level, rather than via the UI?

We kick users out because if you're updating a record in the Incident module (or Change module or whatever) and have a record lock while changes are being made to the dbdict, those changes don't actually make it down to the database.  

And as much as we'd love to have short outages (our work could be done in less than 30 minutes when we're doing release activities), we have process owners who feel the need to do a full regression test in Prod on release day, even though in 4 years of releases we have introduced zero defects that were 'new' in Production.

Knowledge Partner

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

So you do the dbdict changes at the database level, rather than via the UI?

It depends on the environment. At most clients I've worked for, the HPSM database user had DDL grants, so I mostly use the dbdict utility alone. The behavior is the same either way; the only difference is that if your user does not have DDL grants, you need to involve the DB team in the change. Nothing else changes.

We kick users out because if you're updating a record in the Incident module (or Change module or whatever) and have a record lock while changes are being made to the dbdict, those changes don't actually make it down to the database.  

Are you sure? I have never faced this situation, and I just ran a test in my private 9.4x playground and did not observe it.

And as much as we'd love to have short outages (our work could be done in less than 30 minutes, when we're doing release activities) we have process owners who, though we have 4 years of releases with zero defects introduced with the release that were 'new' in Production, they feel the necessity to do a full regression test in Prod on release day.

You can probably reduce the outage and still satisfy your process owners by creating a pre-prod environment. By the way, 4 years of releases with zero defects is quite impressive. Congratulations!

Regards,
Breno Abreu

Acclaimed Contributor

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

Are you sure? I have never faced this situation, and I just ran a test in my private 9.4x playground and did not observe it.

We observed this on 7.1x, 9.2x and 9.3x. We were trying to do in-flight changes (letting users stay on). We'd make a change to the dbdict, save, exit HPSM, log back in, and the changes would be gone. No recycle, just a logout.

You probably can reduce the outage and keep satisfying your process owners by creating a pre-prod environment.

Heh, we have a pre-prod, which is why I can say we haven't introduced any 'new' defects. The theory is that the process teams do full regression testing in the UAT environment and sign off, then we move the code to Staging, where they smoke test and sign off, and then we move that code into Prod. What _really_ happens is they sign off in UAT and Staging without really testing. Then we go into Prod, they regression test, and they find issues. So we go back to the lower environment (where they said it was defect-free) and replicate the issue. If it exists in the code they signed off on, then at that point it's working as designed (even if we have to change it as soon as possible). We give 3 weeks for testing, and they do more in the 4-6 hour outage than in those 3 weeks.

Outstanding Contributor

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

We give 3 weeks for testing, and they do more in the 4 - 6 hour outage than in those 3 weeks.

Hehe, so few customers do proper testing in advance.

Outstanding Contributor

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

When to kick out users or restart the server depends on the type of customization applied.

For example, I make most simple dbdict changes with users online, unless a new field also becomes mandatory, in which case I do the update in a maintenance window. Just adding a new field requires users to log off and back on for it to take effect. Reindexing large tables is always done in a maintenance window, since it locks the entire table.

Very few customizations actually require a server restart; when one does, it is always done in the maintenance window.
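The rule of thumb in this post could be written down as a small decision function, which can serve as a checklist during change planning. The `Change` fields and `Handling` values are hypothetical names, and the sketch treats reindexing a small table as safe online, which the post does not actually state.

```python
from dataclasses import dataclass
from enum import Enum


class Handling(Enum):
    ONLINE = "online; users log off and back on to pick up the change"
    MAINTENANCE_WINDOW = "schedule in the maintenance window"


@dataclass(frozen=True)
class Change:
    # Hypothetical description of one customization.
    adds_field: bool = False
    field_is_mandatory: bool = False
    reindexes_table: bool = False
    large_table: bool = False
    needs_server_restart: bool = False


def plan_change(change: Change) -> Handling:
    """Apply the rule of thumb from the post above to one change."""
    if change.needs_server_restart:
        return Handling.MAINTENANCE_WINDOW
    if change.reindexes_table and change.large_table:
        # Reindexing locks the entire table.
        return Handling.MAINTENANCE_WINDOW
    if change.adds_field and change.field_is_mandatory:
        return Handling.MAINTENANCE_WINDOW
    # Assumption: everything else, including small reindexes, goes online.
    return Handling.ONLINE
```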

Acclaimed Contributor

Re: Service Manager and Disaster Recovery/Backup Contingency Plan/Scheduled Maintenance

Thanks guys, this has been the most insightful thread I've read in ages.

Recently I listened to a webinar which stated that, for upcoming releases, ITSM will adopt a container approach for installation and upgrade procedures. I'm by no means an expert in that area, but from what I heard and understood, it would make disaster recovery a bit easier (at least for the pieces you have control over).

If you are interested, search for a webinar called "HPE ITSM Automation and Containers – Accelerating Deployment and Time to Value".

---
Moving on, this account is no longer active. Best regards, Kelalek
- So Long, and Thanks for All the Fish
The opinions expressed above are the personal opinions of the authors, not of Micro Focus. By using this site, you accept the Terms of Use and Rules of Participation. Certain versions of content ("Material") accessible here may contain branding from Hewlett-Packard Company (now HP Inc.) and Hewlett Packard Enterprise Company. As of September 1, 2017, the Material is now offered by Micro Focus, a separately owned and operated company. Any reference to the HP and Hewlett Packard Enterprise/HPE marks is historical in nature, and the HP and Hewlett Packard Enterprise/HPE marks are the property of their respective owners.