Respected Contributor.

IPC Read Error

Hello Everybody,

There is a mysterious problem in my DP environment. I have an Oracle backup that works normally; however, every time it is started, I get an IPC Read Error after exactly 40 minutes. The backup continues and finishes with the status "completed/errors".

I have already tried adding OB2IPCKEEPALIVE=1 to the .omnirc file and setting the values below:

        ndd -set /dev/tcp tcp_keepalive_interval 600000

        ndd -set /dev/tcp tcp_time_wait_interval 60000

... but the error persists.

The DP agent is installed on HP-UX B11.31, an OS I am not familiar with.

Outstanding Contributor.

Can you attach the session output? It should show which binary reported the IPC read error, which can help narrow things down.

Respected Contributor.

Hi,

This is the part of the session output that shows the IPC Read Error.

That's all there is! It occurs exactly 40 minutes after the backup starts.

-----------------------------------------------------------------------------

[Major] From: BSM@CM_SERVER "BACKUPNAME_ORACLE_ONLINE" Time: 20/07/2017 17:02:10

[61:3003] Lost connection to OB2BAR Backup DA named "ERROR"

on host 10.0.1.160.

Ipc subsystem reports: "IPC Read Error

System error: [232] Connection reset by peer

 

Outstanding Contributor.

Hm, this still looks like idle connections getting dropped. Are you sure keepalive is in effect? Where did you set .omnirc? On the cell server or the agent host?
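
For what it's worth, an OS-wide keepalive interval only applies to sockets that have keepalive switched on, which is why it matters whether the setting is really in effect on the host that owns the idle connection. A minimal Python sketch of that per-socket switch follows. It is not Data Protector code, and the helper name `open_keepalive_socket` is made up for illustration:

```python
import socket


def open_keepalive_socket() -> socket.socket:
    """Create a TCP socket with SO_KEEPALIVE enabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Without this option the kernel sends no keepalive probes on the socket,
    # whatever the system-wide keepalive interval is set to.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = open_keepalive_socket()
# Read the option back to confirm that it took effect on this socket.
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print("keepalive enabled:", keepalive_on)
```

So tuning the interval on one host does little if the process holding the idle connection never enables keepalive on its sockets.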

 

OB2BAR Backup DA can be:

1a) channel connection with active object

1b) channel connection without an active object (after sbtclose before sbtend).

2) connection to inet over which ob2rman.pl is sending output of RMAN and progress of other non-rman/mml operations.

3) connection from util_cmd, spawned by ob2rman.pl to wait for aborts

 

Does this happen with any Oracle backup, regardless of size? If yes, then the disconnect is not likely to come from 1a) or 1b).

Do you see any failed objects in the backup after the error? If yes, then 1b) is likely the culprit, but I have a hunch that's not the case with your situation because this would result in an error propagating to RMAN, which you would notice.

How many channels do you use in your backup?

- If 1, then 1b) is not likely to be a culprit.

- If more than 1, check if there are fewer objects running for extended periods of time than there are channels. AFAIK Oracle tries to load channels pretty evenly, so this imbalance should not be 40 minutes in.

Are you starting backup via RMAN directly?

- If yes, then 2) and 3) are not culprits (because ob2rman.pl is not running).

- If not, either 2) or 3) are prone to inactivity timeouts (2 if RMAN script doesn't print anything for extended periods, and 3 for any long session).

For 2), check when the last output from RMAN was printed (it would really be helpful if you could attach the full session output). DP does not prefix each line with a timestamp, but you may be able to infer some of the timing (e.g. channel closed should happen close to object stop, which should have a dump).
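
The elimination steps above can be sketched as a small Python function. This is only a rough illustration: it treats "not likely" as "ruled out", it leaves out the failed-objects and channel-imbalance hints, and the function name, parameter names, and example inputs are all made up:

```python
from typing import List


def remaining_candidates(happens_regardless_of_size: bool,
                         channels: int,
                         started_via_rman_directly: bool) -> List[str]:
    """Return the connection types (1a, 1b, 2, 3) still under suspicion."""
    candidates = ["1a", "1b", "2", "3"]
    ruled_out = set()
    if happens_regardless_of_size:
        # The channel connections are unlikely if backup size does not matter.
        ruled_out |= {"1a", "1b"}
    if channels == 1:
        # A single channel is unlikely to be sitting without an active object.
        ruled_out.add("1b")
    if started_via_rman_directly:
        # ob2rman.pl is not running, so its connections do not exist.
        ruled_out |= {"2", "3"}
    return [c for c in candidates if c not in ruled_out]


# Hypothetical answers: any backup size, one channel, started from DP.
print(remaining_candidates(True, 1, False))  # ['2', '3']
```

Whatever is left in the list is where the session output needs a closer look.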

Respected Contributor.

Hi,

This happens on only 2 Oracle servers, and on both the error occurs after 40 minutes.

In this case, I have 27 allocated channels, but I believe the backup isn't using all of them, because the Oracle edition isn't Enterprise. (If needed, I can edit the channels.)

As for .omnirc, I have OB2IPCKEEPALIVE=1 on both the Cell Manager and the Oracle server, and I also configured "tcp_keepalive_interval" and "tcp_time_wait_interval" on the Oracle server. See below:

ndd -set /dev/tcp tcp_keepalive_interval 600000
ndd -set /dev/tcp tcp_time_wait_interval 60000

For more detail, I'm attaching the session file.

Thanks for your help.
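
As a quick sanity check on the units, here is a small Python sketch that converts the two ndd values quoted in this thread and compares them with the 40-minute failure point. It assumes the HP-UX ndd values are in milliseconds; the variable names are only illustrative:

```python
MS_PER_MINUTE = 60_000

keepalive_interval_ms = 600_000  # tcp_keepalive_interval from the post
time_wait_interval_ms = 60_000   # tcp_time_wait_interval from the post
failure_after_min = 40           # observed time until the IPC Read Error

keepalive_min = keepalive_interval_ms / MS_PER_MINUTE
time_wait_min = time_wait_interval_ms / MS_PER_MINUTE

# Whole keepalive intervals that fit into the time before the failure.
intervals_before_failure = int(failure_after_min // keepalive_min)

print(f"keepalive interval: {keepalive_min:g} min")   # 10 min
print(f"TIME_WAIT interval: {time_wait_min:g} min")   # 1 min
print(f"keepalive intervals before failure: {intervals_before_failure}")  # 4
```

If keepalive probes were really being sent on the affected connection, it would have seen traffic roughly every 10 minutes, so a drop at 40 minutes suggests they are not in effect for that connection. As far as I know, tcp_time_wait_interval only controls how long closed connections stay in TIME_WAIT and has no bearing on idle established connections.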

Respected Contributor.

See the attachment.

Outstanding Contributor.

Well, RMAN is responsive, so it's not 2). Out of 27 allocated channels, the session appears to be using only one.

But AFAIK a channel does not connect to BSM until a file is assigned to it and sbtopen is called. Even if it did, there are 26 idle channels but only one "lost connection" report, so it doesn't look like 1b). Still, you could try reducing the channels to 1 or 2.

There are no other object failures reported, and the only used channel has an active object when the error is reported, so it's not 1a).

This leaves only 3) - after the failure, can you try aborting the session, and posting the session output?

- If connection from 3) is still active, it should issue an abort handler in the agent.

- If it's not active, then keepalives do not seem to be turned on, or another issue is at play. You may then need to open a support ticket.

 

Respected Contributor.

Hi,

See the attachment!

... after the error, I aborted the backup, but the data continued to be backed up normally until the abort command finished.

 

Outstanding Contributor.

The ob2rman abort handler did not receive the abort after the failure, so it was probably its connection that got disconnected. As I said, that connection is usually idle. Either TCP keepalives are not properly configured, or an unknown issue is at play.

At any rate, at this point I suggest you contact support.
