pappa_recd

Lieutenant Commander
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
2016-04-03
15:59
2214 views
Request Timeout Error : Timeout Exceeded
Hi,
When SSPR application is left idle for around 4-5 hours I am getting the error "RequestTimeoutError: Timeout exceeded". After trying 5-6 times am able to login to the application without any issues.
DEV is working fine but the problem is with QA and PROD. The difference between DEv and QA/PROD is in DEV all servers (SSPR,IDV(user store), Oracle(store for response)) are in the app tier. But in QA / PRD SSSPR servers are in DMZ.
Username search filter used is (&(objectClass=person)(cn=%USERNAME%)(!(lockedByIntruder=true)))
Below is the snippet from the trace level log
2016-04-03T14:44:18Z, TRACE, http.PwmRequest, {5w} POST request for: /sspr/private/Login [10.199.180.16]
preventCache='1459694658664'
processAction='restLogin'
pwmFormID='EGTpgtfHxeWngxenYlKGxvxQ4EXeRXYF153dc92d917iIZFl'
2016-04-03T14:44:18Z, TRACE, ldap.UserSearchEngine, {5w} username '******' does not appear to be a DN (does not start with configured ldap naming attribute 'cn') [10.199.180.16]
2016-04-03T14:44:18Z, DEBUG, ldap.UserSearchEngine, {5w} beginning user search process [10.199.180.16]
2016-04-03T14:44:18Z, DEBUG, ldap.UserSearchEngine, {5w} performing ldap search for user; searchID=5 profile=IDV base=ou=users,o=***** filter=SearchHelper: filter: (&(objectClass=person)(cn=******)(!(lockedByIntruder=true))), scope: SUBTREE, attributes: [] [10.199.180.16]
2016-04-03T14:45:15Z, TRACE, http.PwmRequest, {5v} POST request for: /sspr/public/CommandServlet [172.19.49.216]
processAction='idleUpdate'
time='1459694715278'
pwmFormID='ihobI156Ug5hAl1hjsbQkGYA6uKrWHIB153dc839260qdz42'
2016-04-03T14:45:15Z, TRACE, servlet.CommandServlet, {5v} received request for action idleUpdate [172.19.49.216]
When SSPR application is left idle for around 4-5 hours I am getting the error "RequestTimeoutError: Timeout exceeded". After trying 5-6 times am able to login to the application without any issues.
DEV is working fine but the problem is with QA and PROD. The difference between DEv and QA/PROD is in DEV all servers (SSPR,IDV(user store), Oracle(store for response)) are in the app tier. But in QA / PRD SSSPR servers are in DMZ.
Username search filter used is (&(objectClass=person)(cn=%USERNAME%)(!(lockedByIntruder=true)))
Below is the snippet from the trace level log
2016-04-03T14:44:18Z, TRACE, http.PwmRequest, {5w} POST request for: /sspr/private/Login [10.199.180.16]
preventCache='1459694658664'
processAction='restLogin'
pwmFormID='EGTpgtfHxeWngxenYlKGxvxQ4EXeRXYF153dc92d917iIZFl'
2016-04-03T14:44:18Z, TRACE, ldap.UserSearchEngine, {5w} username '******' does not appear to be a DN (does not start with configured ldap naming attribute 'cn') [10.199.180.16]
2016-04-03T14:44:18Z, DEBUG, ldap.UserSearchEngine, {5w} beginning user search process [10.199.180.16]
2016-04-03T14:44:18Z, DEBUG, ldap.UserSearchEngine, {5w} performing ldap search for user; searchID=5 profile=IDV base=ou=users,o=***** filter=SearchHelper: filter: (&(objectClass=person)(cn=******)(!(lockedByIntruder=true))), scope: SUBTREE, attributes: [] [10.199.180.16]
2016-04-03T14:45:15Z, TRACE, http.PwmRequest, {5v} POST request for: /sspr/public/CommandServlet [172.19.49.216]
processAction='idleUpdate'
time='1459694715278'
pwmFormID='ihobI156Ug5hAl1hjsbQkGYA6uKrWHIB153dc839260qdz42'
2016-04-03T14:45:15Z, TRACE, servlet.CommandServlet, {5v} received request for action idleUpdate [172.19.49.216]
2 Replies


Knowledge Partner
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
2016-04-03
20:42
Since what you posted from the logs shows LDAP, do you suspect that this
is a timeout between the client (SSPR) the server (LDAP)? Does it happen
if you cause connections to be made every hour or so for more than the
four or five hours?
I've seen environments in the past with very poor setups in
routers/firewalls/switches that automatically broke idle, or even active,
connections after some timeout. This is really a terrible practice
generally, but network folks seem to think it's neat to "clean up"
connections for no good reason. Maybe it makes sense when dealing with
the public, but a connection between an application you own, and an LDAP
service you own, isn't that way. TCP has pretty relaxed timeouts on
connections that are idle and have NO traffic, and that's fine.
If this is your issue, there is a problem that exacerbates it, which is
that devices which prevent the "old" connections from continuing often do
so illegally (per the TCP protocol) meaning they simply stop allowing
packets from one system's socket to another while the connection is
ESTABLISHED according to the endpoints; what they should (MUST per TCP) do
instead is forge a RST for each side and send it to the other side, so
basically each side sees the other side as immediately killing the
connection. In most cases, this will cause the client to immediately
reconnect (again, this timeout is pointless, but networking equipment
sometimes does pointless things) but at least the client and server know
that the connection is broken, and the client's new connection will be
used instead of the one that has a permanent block because of the bad
network device.
If this is your problem, it's pretty easy to see. Get a LAN/wire trace
from both sides at the same time (application and LDAP service) and then
wait until the problem happens. If it happens, you'll see packets from
the application to the LDAP service that never arrive. Clearly they are
being sent, and clearly they are not arriving, thus you have something in
between that is blocking them. Find that device and fix it.
Feel free to capture with the command above, then compress the output and
post it somewhere for review.
--
Good luck.
If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below...
is a timeout between the client (SSPR) the server (LDAP)? Does it happen
if you cause connections to be made every hour or so for more than the
four or five hours?
I've seen environments in the past with very poor setups in
routers/firewalls/switches that automatically broke idle, or even active,
connections after some timeout. This is really a terrible practice
generally, but network folks seem to think it's neat to "clean up"
connections for no good reason. Maybe it makes sense when dealing with
the public, but a connection between an application you own, and an LDAP
service you own, isn't that way. TCP has pretty relaxed timeouts on
connections that are idle and have NO traffic, and that's fine.
If this is your issue, there is a problem that exacerbates it, which is
that devices which prevent the "old" connections from continuing often do
so illegally (per the TCP protocol) meaning they simply stop allowing
packets from one system's socket to another while the connection is
ESTABLISHED according to the endpoints; what they should (MUST per TCP) do
instead is forge a RST for each side and send it to the other side, so
basically each side sees the other side as immediately killing the
connection. In most cases, this will cause the client to immediately
reconnect (again, this timeout is pointless, but networking equipment
sometimes does pointless things) but at least the client and server know
that the connection is broken, and the client's new connection will be
used instead of the one that has a permanent block because of the bad
network device.
If this is your problem, it's pretty easy to see. Get a LAN/wire trace
from both sides at the same time (application and LDAP service) and then
wait until the problem happens. If it happens, you'll see packets from
the application to the LDAP service that never arrive. Clearly they are
being sent, and clearly they are not arriving, thus you have something in
between that is blocking them. Find that device and fix it.
#The next bit should all be on one line
sudo /usr/sbin/tcpdump -n -s 0 -i any -w /tmp/ldap.cap -v port 389 or port 636
Feel free to capture with the command above, then compress the output and
post it somewhere for review.
--
Good luck.
If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below...
pappa_recd

Lieutenant Commander
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
2016-04-04
17:10
This is what I see in the SSPR debug log
From SSPR Debulg Log:
2016-04-03T21:13:22Z, WARN , provider.FailOverWrapper, unable to reach ldap server ldaps://DE08U7445.honeywell.com:636, last error: DE08U7445.honeywell.com:636; socket closed
2016-04-03T05:26:59Z, WARN , provider.FailOverWrapper, unable to reach ldap server ldaps://DE08U7445.honeywell.com:636, last error: DE08U7445.honeywell.com:636; socket closed
2016-04-03T03:50:50Z, WARN , provider.FailOverWrapper, unable to reach ldap server ldaps://DE08U7445.honeywell.com:636, last error: DE08U7445.honeywell.com:636; socket closed
Note the "socket closed" message. Something is closing the connection SSPR has with the LDAP server. This most likely points to a network issue, usually a firewall/gateway that is closing a connection from the middle.
Setting > LDAP ⇨ General LDAP Settings ⇨ LDAP Idle Time out = 0
Observations:
- SSPR will want to keep ldap connections open forever.
- But it seems a gateway, firewall, switch or something is closing them.
- The timeout occurs because it takes awhile for SSPR to figure out that the connection is gone and to set a new one.
I have change the Idle time out to 30 and the problem seems to be gone.
From SSPR Debulg Log:
2016-04-03T21:13:22Z, WARN , provider.FailOverWrapper, unable to reach ldap server ldaps://DE08U7445.honeywell.com:636, last error: DE08U7445.honeywell.com:636; socket closed
2016-04-03T05:26:59Z, WARN , provider.FailOverWrapper, unable to reach ldap server ldaps://DE08U7445.honeywell.com:636, last error: DE08U7445.honeywell.com:636; socket closed
2016-04-03T03:50:50Z, WARN , provider.FailOverWrapper, unable to reach ldap server ldaps://DE08U7445.honeywell.com:636, last error: DE08U7445.honeywell.com:636; socket closed
Note the "socket closed" message. Something is closing the connection SSPR has with the LDAP server. This most likely points to a network issue, usually a firewall/gateway that is closing a connection from the middle.
Setting > LDAP ⇨ General LDAP Settings ⇨ LDAP Idle Time out = 0
Observations:
- SSPR will want to keep ldap connections open forever.
- But it seems a gateway, firewall, switch or something is closing them.
- The timeout occurs because it takes awhile for SSPR to figure out that the connection is gone and to set a new one.
I have change the Idle time out to 30 and the problem seems to be gone.