sm.exe high CPU utilization

We've had this happen twice: once on 9/13/16 around 1:15 pm and again today around the same time. Users start complaining that Service Manager is running slow. I see that the 3 or 4 sm.exe processes are pegging the CPU at 96-99%. Then users start getting all kinds of random errors.

examples:
Unrecoverable error in application: secRoleBasedAccess on panel set.area.name
Unrecoverable error in application:  se.lock.object on panel get.object
Unrecoverable error in application:  Ruleset.runMultiple on panel run.rule
Unrecoverable error in application:  sc.setup.manage on panel manage.call
Cannot display menu; most likely a condition cannot be evaluated.

By the time I get involved, users are getting kicked out and I can't even sign in. 

Neither the database team nor the server team sees anything wrong on their side.

I have SM set up to write to 8 logs before overwriting (sm.log, sm.log.1 .. sm.log.7). Normally this is enough to cover 7-10 days' worth of logs. But both times this has happened, so much gets written that the 8 logs together cover only about 2-3 minutes. That is not enough to figure out what's going on, as the logs from the very beginning are lost.

What I do see in the logs is this, over and over, with a different number of bytes each time:

8976( 15000) 12/09/2016 13:39:09  RTE I Diagnostic Services           0           0           0
  8976( 15000) 12/09/2016 13:39:09  RTE E sm_alloct: Not enough shared memory available to allocate 416 bytes
  8976( 15000) 12/09/2016 13:39:09  RTE I ------ Shared Memory ------
  8976( 15000) 12/09/2016 13:39:09  RTE I
  8976( 15000) 12/09/2016 13:39:09  RTE I Shared Memory Release     9.40.1002
  8976( 15000) 12/09/2016 13:39:09  RTE I
  8976( 15000) 12/09/2016 13:39:09  RTE I Current Size             256000000
  8976( 15000) 12/09/2016 13:39:09  RTE I
  8976( 15000) 12/09/2016 13:39:09  RTE I Segment Allocation        46466648
  8976( 15000) 12/09/2016 13:39:09  RTE I Large Block Allocation   209513472
  8976( 15000) 12/09/2016 13:39:09  RTE I
  8976( 15000) 12/09/2016 13:39:09  RTE I Unused Space                 19880  (0%)
  8976( 15000) 12/09/2016 13:39:09  RTE I Free Space               200005648  (78%)
  8976( 15000) 12/09/2016 13:39:09  RTE I
  8976( 15000) 12/09/2016 13:39:09  RTE I
  8976( 15000) 12/09/2016 13:39:09  RTE I Shared Memory Type  Allocations    Frees     Allocated
  8976( 15000) 12/09/2016 13:39:09  RTE I ------------------  ----------- ----------- -----------
  8976( 15000) 12/09/2016 13:39:09  RTE I Not named                  2599        2331      325488
  8976( 15000) 12/09/2016 13:39:09  RTE I User blocks                 865         777      180224
  8976( 15000) 12/09/2016 13:39:09  RTE I Messages                      0           0           0
  8976( 15000) 12/09/2016 13:39:09  RTE I Resource locks                0           0           0
  8976( 15000) 12/09/2016 13:39:09  RTE I Database Services           520           0       23440
  8976( 15000) 12/09/2016 13:39:09  RTE I Cache overhead                1           0       39936
  8976( 15000) 12/09/2016 13:39:09  RTE I Application cache        748784      746093     5126544
  8976( 15000) 12/09/2016 13:39:09  RTE I DBDICT cache            2318885     2318467     6302208
  8976( 15000) 12/09/2016 13:39:09  RTE I SQL descriptor cache      15722       15192     1211776
  8976( 15000) 12/09/2016 13:39:09  RTE I Join/ERD/Type cache         386           0     1116032
  8976( 15000) 12/09/2016 13:39:09  RTE I String Type cache        421397      419418     1051904
  8976( 15000) 12/09/2016 13:39:09  RTE I IR Expert cache           74944         770    40616800
  8976( 15000) 12/09/2016 13:39:09  RTE I Diagnostic Services           0           0           0
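
A minimal sketch for reading a shared memory report like the one above, assuming the three-number row layout shown in this excerpt (Allocations, Frees, Allocated). The function names and the shortened `SAMPLE` are illustrative only; the rows are copied from the log above. It simply ranks the memory types by bytes currently allocated, which makes it easier to spot which cache is holding the most space.

```python
import re

# Matches table rows such as:
#   8976( 15000) 12/09/2016 13:39:09  RTE I IR Expert cache  74944  770  40616800
# Rows without three trailing integers (headers, sizes, percentages) are skipped.
ROW = re.compile(
    r"RTE I (?P<name>\S.*?)\s+(?P<allocs>\d+)\s+(?P<frees>\d+)\s+(?P<bytes>\d+)\s*$"
)

# A few rows copied from the report above.
SAMPLE = """\
  8976( 15000) 12/09/2016 13:39:09  RTE I Current Size             256000000
  8976( 15000) 12/09/2016 13:39:09  RTE I Free Space               200005648  (78%)
  8976( 15000) 12/09/2016 13:39:09  RTE I Shared Memory Type  Allocations    Frees     Allocated
  8976( 15000) 12/09/2016 13:39:09  RTE I Application cache        748784      746093     5126544
  8976( 15000) 12/09/2016 13:39:09  RTE I DBDICT cache            2318885     2318467     6302208
  8976( 15000) 12/09/2016 13:39:09  RTE I IR Expert cache           74944         770    40616800
"""


def parse_allocations(lines):
    """Return {memory type: bytes currently allocated} for one report."""
    table = {}
    for line in lines:
        match = ROW.search(line)
        if match:
            table[match.group("name")] = int(match.group("bytes"))
    return table


def largest_consumers(table, top=3):
    """Return the `top` memory types with the most bytes allocated."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    for name, size in largest_consumers(parse_allocations(SAMPLE.splitlines())):
        print(f"{name:<20} {size:>12}")
```

In the excerpt above, the IR Expert cache row has by far the largest Allocated value of any memory type listed.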

We are on SM 9.40 and have been since January.
Anyone run across this before?


  • Hello FCBCD,

    In 9.40 I have sometimes noticed strange behavior from the report process, but I don't think that is the cause of your issue, because reports run under one sm.exe, not three.

    Almost every time I have had this kind of issue, it was related to infrastructure problems. How long did the issue last? Did it resolve on its own or only after a complete restart?

    Next time, run the full report using smdoctor.

    Hope it helps or .... even better: I hope it never happens again :).

    Regards,

    Breno Abreu

  • Most likely this is caused by incorrect memory settings:

    8976( 15000) 12/09/2016 13:39:09  RTE I Unused Space                 19880  (0%)

    What you need to do is:

    1. If you don't use IR, add ir_disable:1 to the sm.ini file.

    2. If you are using IR, remove unnecessary IR keys.

    3. Increase shared memory to 256M and add the cache_clean_interval:2700 parameter to sm.ini.
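
The checklist above can be sketched as a quick review of sm.ini. This is only an illustration: it assumes sm.ini holds one `name:value` parameter per line with `#` starting a comment, it uses the `shared_memory` parameter name mentioned later in this thread, and `SAMPLE_INI`, `read_ini_params`, and `review` are hypothetical names, not part of Service Manager.

```python
# Hypothetical sm.ini fragment; the value mirrors "Current Size" from the log.
SAMPLE_INI = """\
# hypothetical sm.ini fragment
shared_memory:256000000
"""


def read_ini_params(text):
    """Parse 'name:value' lines, skipping blank lines and '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        params[name.strip()] = value.strip() if sep else ""
    return params


def review(params, uses_ir):
    """Return reminders for the settings suggested in this reply."""
    notes = []
    if not uses_ir and params.get("ir_disable") != "1":
        notes.append("IR not in use: consider adding ir_disable:1")
    if "shared_memory" not in params:
        notes.append("shared_memory is not set explicitly")
    if "cache_clean_interval" not in params:
        notes.append("cache_clean_interval is not set (2700 was suggested)")
    return notes


if __name__ == "__main__":
    for note in review(read_ini_params(SAMPLE_INI), uses_ir=False):
        print(note)
```

Step 2 (removing unnecessary IR keys) is done inside Service Manager itself and is not covered by this sketch.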

    Let me know if it helps,

    Ling-Yan

  • Thanks BrenoAbreu and lingyanmeng

    It turns out that it was KMUpdate. Every instance of this happened after a full reindex of the knowledgebase, and it doesn't affect just one sm.exe process but several. I have a case open with HP to determine why this is happening, but in the meantime I have updated my shared_memory setting in the ini and also increased physical memory and virtual processors on the knowledge search server. I have also implemented more stringent monitoring to alert me of issues.
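
One possible shape for that kind of monitoring, as a minimal sketch rather than what was actually deployed: count the `sm_alloct` errors per minute in sm.log and flag when a threshold is reached, so the problem is caught before the rotating logs overwrite the start of the incident. The function names, the threshold, and the month/day reading of the timestamp are assumptions; the sample error line is copied from the log earlier in this thread.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the shared memory allocation error seen in sm.log.
ERROR = re.compile(
    r"(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+RTE E sm_alloct: "
    r"Not enough shared memory available"
)

SAMPLE = [
    "  8976( 15000) 12/09/2016 13:39:09  RTE I Diagnostic Services           0           0           0",
    "  8976( 15000) 12/09/2016 13:39:09  RTE E sm_alloct: Not enough shared memory available to allocate 416 bytes",
]


def errors_per_minute(lines, stamp_format="%m/%d/%Y %H:%M:%S"):
    """Count sm_alloct errors per minute.

    stamp_format is an assumption; pass "%d/%m/%Y %H:%M:%S" if the
    server writes day/month dates.
    """
    counts = Counter()
    for line in lines:
        match = ERROR.search(line)
        if match:
            when = datetime.strptime(match.group("stamp"), stamp_format)
            counts[when.replace(second=0)] += 1
    return counts


def should_alert(counts, threshold=1):
    """True if any single minute reached `threshold` errors."""
    return any(n >= threshold for n in counts.values())


if __name__ == "__main__":
    counts = errors_per_minute(SAMPLE)
    if should_alert(counts):
        print("sm_alloct errors detected:", sum(counts.values()))
```

In practice the lines would come from reading sm.log on a schedule, and the alert would go to whatever notification channel is already in place.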

  • Hi,

    When sm.exe CPU utilization is high on the SM server, first go to Task Manager > Details > sm.exe, right-click, and check the affinity to see how many CPU cores each process is using.

    Then change the affinity to 5 and 4 for every sm.exe process.

    Wait a few minutes and check again.

    I hope the CPU comes back down to a normal state.

     

    Regards,

    Rakesh