doc: Update stallwarn.rst with recent changes

This commit calls out the possibility of self-detected stalls, adds new messages, and calls out the use for stack traces.

Reported-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
@@ -189,8 +189,8 @@ rcupdate.rcu_task_stall_timeout
 Interpreting RCU's CPU Stall-Detector "Splats"
 ==============================================
 
-For non-RCU-tasks flavors of RCU, when a CPU detects that it is stalling,
-it will print a message similar to the following::
+For non-RCU-tasks flavors of RCU, when a CPU detects that some other
+CPU is stalling, it will print a message similar to the following::
 
 	INFO: rcu_sched detected stalls on CPUs/tasks:
 	2-...: (3 GPs behind) idle=06c/0/0 softirq=1453/1455 fqs=0
@@ -204,6 +204,8 @@ PREEMPT_RCU builds can be stalled by tasks as well as by CPUs, and that
 the tasks will be indicated by PID, for example, "P3421". It is even
 possible for an rcu_state stall to be caused by both CPUs *and* tasks,
 in which case the offending CPUs and tasks will all be called out in the list.
+In some cases, CPUs will detect themselves stalling, which will result
+in a self-detected stall.
 
 CPU 2's "(3 GPs behind)" indicates that this CPU has not interacted with
 the RCU core for the past three grace periods. In contrast, CPU 16's "(0
@@ -283,7 +285,8 @@ If the relevant grace-period kthread has been unable to run prior to
 the stall warning, as was the case in the "All QSes seen" line above,
 the following additional line is printed::
 
-	kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
+	rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5
+	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
 
 Starving the grace-period kthreads of CPU time can of course result
 in RCU CPU stall warnings even when all CPUs and tasks have passed
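When triaging logs, the "kthread starved" line in the hunk above can be split into its fields mechanically. The following is a minimal sketch, not part of the kernel tree: the function name and the dictionary keys are labels invented here for illustration, and the pattern is derived only from the single sample line shown in the hunk, so real logs may need a looser match. The leading flavor name is treated as optional so that both the old and the new form of the line are accepted.

```python
import re

# Pattern derived from the one sample line in the hunk above.
# The group names are illustrative labels, not kernel identifiers.
STARVED_RE = re.compile(
    r"(?:(?P<flavor>\S+) )?kthread starved for (?P<jiffies>\d+) jiffies! "
    r"g(?P<g>\d+) f(?P<f>0x[0-9a-fA-F]+) "
    r"(?P<gp_state>\w+)\((?P<gp_state_num>\d+)\) "
    r"->state=(?P<task_state>0x[0-9a-fA-F]+) ->cpu=(?P<cpu>\d+)"
)


def parse_starved_line(line):
    """Return the fields of a 'kthread starved' line as a dict, or None."""
    match = STARVED_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Decimal fields.
    for key in ("jiffies", "g", "gp_state_num", "cpu"):
        fields[key] = int(fields[key])
    # Hexadecimal fields (int() accepts the "0x" prefix with base 16).
    for key in ("f", "task_state"):
        fields[key] = int(fields[key], 16)
    return fields


sample = ("rcu_sched kthread starved for 23807 jiffies! g7075 f0x0 "
          "RCU_GP_WAIT_FQS(3) ->state=0x1 ->cpu=5")
print(parse_starved_line(sample))
```

A line that does not match yields `None`, so the helper can be run over a whole log and the non-matching lines skipped.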
@@ -313,15 +316,21 @@ is the current ``TIMER_SOFTIRQ`` count on cpu 4. If this value does not
 change on successive RCU CPU stall warnings, there is further reason to
 suspect a timer problem.
 
+These messages are usually followed by stack dumps of the CPUs and tasks
+involved in the stall. These stack traces can help you locate the cause
+of the stall, keeping in mind that the CPU detecting the stall will have
+an interrupt frame that is mainly devoted to detecting the stall.
+
 
 Multiple Warnings From One Stall
 ================================
 
-If a stall lasts long enough, multiple stall-warning messages will be
-printed for it. The second and subsequent messages are printed at
+If a stall lasts long enough, multiple stall-warning messages will
+be printed for it. The second and subsequent messages are printed at
 longer intervals, so that the time between (say) the first and second
 message will be about three times the interval between the beginning
-of the stall and the first message.
+of the stall and the first message. It can be helpful to compare the
+stack dumps for the different messages for the same stalled grace period.
 
 
 Stall Warnings for Expedited Grace Periods
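The spacing described under "Multiple Warnings From One Stall" can be made concrete with a little arithmetic. This is a rough sketch under two assumptions that the text does not state: the 21-second first-warning timeout is only an illustrative value, and every later warning is assumed to keep the same threefold spacing that the text gives for the first and second messages. The function name is invented here.

```python
def warning_times(first_timeout_s, count):
    """Approximate times, in seconds since the stall began, of the
    first `count` stall warnings.

    Assumes each later warning follows the previous one by three times
    the initial timeout (the text only states this for the first pair).
    """
    times = []
    t = first_timeout_s
    for _ in range(count):
        times.append(t)
        t += 3 * first_timeout_s
    return times


# Illustrative timeout of 21 seconds: first warning at 21 s,
# second 63 s later at 84 s, third at 147 s.
print(warning_times(21, 3))  # [21, 84, 147]
```

Knowing roughly when each warning fired makes it easier to line up the stack dumps from successive messages for the same stalled grace period.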