PM: Suspend/hibernation debug documentation update (rev. 2)

Update the suspend/hibernation debugging and testing documentation to describe
the newly introduced testing facilities.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
This commit is contained in:
Rafael J. Wysocki 2007-11-19 23:43:34 +01:00 committed by Len Brown
parent 4cc79776c9
commit ce2b7147bb
2 changed files with 172 additions and 70 deletions

View file

@ -1,45 +1,111 @@
Debugging suspend and resume Debugging hibernation and suspend
(C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL (C) 2007 Rafael J. Wysocki <rjw@sisk.pl>, GPL
1. Testing suspend to disk (STD) 1. Testing hibernation (aka suspend to disk or STD)
To verify that the STD works, you can try to suspend in the "reboot" mode: To check if hibernation works, you can try to hibernate in the "reboot" mode:
# echo reboot > /sys/power/disk # echo reboot > /sys/power/disk
# echo disk > /sys/power/state # echo disk > /sys/power/state
and the system should suspend, reboot, resume and get back to the command prompt and the system should create a hibernation image, reboot, resume and get back to
where you have started the transition. If that happens, the STD is most likely the command prompt where you have started the transition. If that happens,
to work correctly, but you need to repeat the test at least a couple of times in hibernation is most likely to work correctly. Still, you need to repeat the
a row for confidence. This is necessary, because some problems only show up on test at least a couple of times in a row for confidence. [This is necessary,
a second attempt at suspending and resuming the system. You should also test because some problems only show up on a second attempt at suspending and
the "platform" and "shutdown" modes of suspend: resuming the system.] Moreover, hibernating in the "reboot" and "shutdown"
modes causes the PM core to skip some platform-related callbacks which on ACPI
systems might be necessary to make hibernation work. Thus, if you machine fails
to hibernate or resume in the "reboot" mode, you should try the "platform" mode:
# echo platform > /sys/power/disk # echo platform > /sys/power/disk
# echo disk > /sys/power/state # echo disk > /sys/power/state
or which is the default and recommended mode of hibernation.
Unfortunately, the "platform" mode of hibernation does not work on some systems
with broken BIOSes. In such cases the "shutdown" mode of hibernation might
work:
# echo shutdown > /sys/power/disk # echo shutdown > /sys/power/disk
# echo disk > /sys/power/state # echo disk > /sys/power/state
in which cases you will have to press the power button to make the system (it is similar to the "reboot" mode, but it requires you to press the power
resume. If that does not work, you will need to identify what goes wrong. button to make the system resume).
a) Test mode of STD If neither "platform" nor "shutdown" hibernation mode works, you will need to
identify what goes wrong.
To verify if there are any drivers that cause problems you can run the STD a) Test modes of hibernation
in the test mode:
# echo test > /sys/power/disk To find out why hibernation fails on your system, you can use a special testing
facility available if the kernel is compiled with CONFIG_PM_DEBUG set. Then,
there is the file /sys/power/pm_test that can be used to make the hibernation
core run in a test mode. There are 5 test modes available:
freezer
- test the freezing of processes
devices
- test the freezing of processes and suspending of devices
platform
- test the freezing of processes, suspending of devices and platform
global control methods(*)
processors
- test the freezing of processes, suspending of devices, platform
global control methods(*) and the disabling of nonboot CPUs
core
- test the freezing of processes, suspending of devices, platform global
control methods(*), the disabling of nonboot CPUs and suspending of
platform/system devices
(*) the platform global control methods are only available on ACPI systems
and are only tested if the hibernation mode is set to "platform"
To use one of them it is necessary to write the corresponding string to
/sys/power/pm_test (eg. "devices" to test the freezing of processes and
suspending devices) and issue the standard hibernation commands. For example,
to use the "devices" test mode along with the "platform" mode of hibernation,
you should do the following:
# echo devices > /sys/power/pm_test
# echo platform > /sys/power/disk
# echo disk > /sys/power/state # echo disk > /sys/power/state
in which case the system should freeze tasks, suspend devices, disable nonboot Then, the kernel will try to freeze processes, suspend devices, wait 5 seconds,
CPUs (if any), wait for 5 seconds, enable nonboot CPUs, resume devices, thaw resume devices and thaw processes. If "platform" is written to
tasks and return to your command prompt. If that fails, most likely there is /sys/power/pm_test , then after suspending devices the kernel will additionally
a driver that fails to either suspend or resume (in the latter case the system invoke the global control methods (eg. ACPI global control methods) used to
may hang or be unstable after the test, so please take that into consideration). prepare the platform firmware for hibernation. Next, it will wait 5 seconds and
To find this driver, you can carry out a binary search according to the rules: invoke the platform (eg. ACPI) global methods used to cancel hibernation etc.
Writing "none" to /sys/power/pm_test causes the kernel to switch to the normal
hibernation/suspend operations. Also, when open for reading, /sys/power/pm_test
contains a space-separated list of all available tests (including "none" that
represents the normal functionality) in which the current test level is
indicated by square brackets.
Generally, as you can see, each test level is more "invasive" than the previous
one and the "core" level tests the hardware and drivers as deeply as possible
without creating a hibernation image. Obviously, if the "devices" test fails,
the "platform" test will fail as well and so on. Thus, as a rule of thumb, you
should try the test modes starting from "freezer", through "devices", "platform"
and "processors" up to "core" (repeat the test on each level a couple of times
to make sure that any random factors are avoided).
If the "freezer" test fails, there is a task that cannot be frozen (in that case
it usually is possible to identify the offending task by analysing the output of
dmesg obtained after the failing test). Failure at this level usually means
that there is a problem with the tasks freezer subsystem that should be
reported.
If the "devices" test fails, most likely there is a driver that cannot suspend
or resume its device (in the latter case the system may hang or become unstable
after the test, so please take that into consideration). To find this driver,
you can carry out a binary search according to the rules:
- if the test fails, unload a half of the drivers currently loaded and repeat - if the test fails, unload a half of the drivers currently loaded and repeat
(that would probably involve rebooting the system, so always note what drivers (that would probably involve rebooting the system, so always note what drivers
have been loaded before the test), have been loaded before the test),
@ -47,23 +113,46 @@ have been loaded before the test),
recently and repeat. recently and repeat.
Once you have found the failing driver (there can be more than just one of Once you have found the failing driver (there can be more than just one of
them), you have to unload it every time before the STD transition. In that case them), you have to unload it every time before hibernation. In that case please
please make sure to report the problem with the driver. make sure to report the problem with the driver.
It is also possible that a cycle can still fail after you have unloaded It is also possible that the "devices" test will still fail after you have
all modules. In that case, you would want to look in your kernel configuration unloaded all modules. In that case, you may want to look in your kernel
for the drivers that can be compiled as modules (testing again with them as configuration for the drivers that can be compiled as modules (and test again
modules), and possibly also try boot time options such as "noapic" or "noacpi". with these drivers compiled as modules). You may also try to use some special
kernel command line options such as "noapic", "noacpi" or even "acpi=off".
If the "platform" test fails, there is a problem with the handling of the
platform (eg. ACPI) firmware on your system. In that case the "platform" mode
of hibernation is not likely to work. You can try the "shutdown" mode, but that
is rather a poor man's workaround.
If the "processors" test fails, the disabling/enabling of nonboot CPUs does not
work (of course, this only may be an issue on SMP systems) and the problem
should be reported. In that case you can also try to switch the nonboot CPUs
off and on using the /sys/devices/system/cpu/cpu*/online sysfs attributes and
see if that works.
If the "core" test fails, which means that suspending of the system/platform
devices has failed (these devices are suspended on one CPU with interrupts off),
the problem is most probably hardware-related and serious, so it should be
reported.
A failure of any of the "platform", "processors" or "core" tests may cause your
system to hang or become unstable, so please beware. Such a failure usually
indicates a serious problem that very well may be related to the hardware, but
please report it anyway.
b) Testing minimal configuration b) Testing minimal configuration
If the test mode of STD works, you can boot the system with "init=/bin/bash" If all of the hibernation test modes work, you can boot the system with the
and attempt to suspend in the "reboot", "shutdown" and "platform" modes. If "init=/bin/bash" command line parameter and attempt to hibernate in the
that does not work, there probably is a problem with a driver statically "reboot", "shutdown" and "platform" modes. If that does not work, there
compiled into the kernel and you can try to compile more drivers as modules, probably is a problem with a driver statically compiled into the kernel and you
so that they can be tested individually. Otherwise, there is a problem with a can try to compile more drivers as modules, so that they can be tested
modular driver and you can find it by loading a half of the modules you normally individually. Otherwise, there is a problem with a modular driver and you can
use and binary searching in accordance with the algorithm: find it by loading a half of the modules you normally use and binary searching
in accordance with the algorithm:
- if there are n modules loaded and the attempt to suspend and resume fails, - if there are n modules loaded and the attempt to suspend and resume fails,
unload n/2 of the modules and try again (that would probably involve rebooting unload n/2 of the modules and try again (that would probably involve rebooting
the system), the system),
@ -71,19 +160,19 @@ the system),
load n/2 modules more and try again. load n/2 modules more and try again.
Again, if you find the offending module(s), it(they) must be unloaded every time Again, if you find the offending module(s), it(they) must be unloaded every time
before the STD transition, and please report the problem with it(them). before hibernation, and please report the problem with it(them).
c) Advanced debugging c) Advanced debugging
In case the STD does not work on your system even in the minimal configuration In case that hibernation does not work on your system even in the minimal
and compiling more drivers as modules is not practical or some modules cannot configuration and compiling more drivers as modules is not practical or some
be unloaded, you can use one of the more advanced debugging techniques to find modules cannot be unloaded, you can use one of the more advanced debugging
the problem. First, if there is a serial port in your box, you can boot the techniques to find the problem. First, if there is a serial port in your box,
kernel with the 'no_console_suspend' parameter and try to log kernel you can boot the kernel with the 'no_console_suspend' parameter and try to log
messages using the serial console. This may provide you with some information kernel messages using the serial console. This may provide you with some
about the reasons of the suspend (resume) failure. Alternatively, it may be information about the reasons of the suspend (resume) failure. Alternatively,
possible to use a FireWire port for debugging with firescope it may be possible to use a FireWire port for debugging with firescope
(ftp://ftp.firstfloor.org/pub/ak/firescope/). On i386 it is also possible to (ftp://ftp.firstfloor.org/pub/ak/firescope/). On x86 it is also possible to
use the PM_TRACE mechanism documented in Documentation/s2ram.txt . use the PM_TRACE mechanism documented in Documentation/s2ram.txt .
2. Testing suspend to RAM (STR) 2. Testing suspend to RAM (STR)
@ -91,16 +180,25 @@ use the PM_TRACE mechanism documented in Documentation/s2ram.txt .
To verify that the STR works, it is generally more convenient to use the s2ram To verify that the STR works, it is generally more convenient to use the s2ram
tool available from http://suspend.sf.net and documented at tool available from http://suspend.sf.net and documented at
http://en.opensuse.org/s2ram . However, before doing that it is recommended to http://en.opensuse.org/s2ram . However, before doing that it is recommended to
carry out the procedure described in section 1. carry out STR testing using the facility described in section 1.
Assume you have resolved the problems with the STD and you have found some Namely, after writing "freezer", "devices", "platform", "processors", or "core"
failing drivers. These drivers are also likely to fail during the STR or into /sys/power/pm_test (available if the kernel is compiled with
during the resume, so it is better to unload them every time before the STR CONFIG_PM_DEBUG set) the suspend code will work in the test mode corresponding
transition. Now, you can follow the instructions at to given string. The STR test modes are defined in the same way as for
http://en.opensuse.org/s2ram to test the system, but if it does not work hibernation, so please refer to Section 1 for more information about them. In
"out of the box", you may need to boot it with "init=/bin/bash" and test particular, the "core" test allows you to test everything except for the actual
s2ram in the minimal configuration. In that case, you may be able to search invocation of the platform firmware in order to put the system into the sleep
for failing drivers by following the procedure analogous to the one described in state.
1b). If you find some failing drivers, you will have to unload them every time
before the STR transition (ie. before you run s2ram), and please report the Among other things, the testing with the help of /sys/power/pm_test may allow
problems with them. you to identify drivers that fail to suspend or resume their devices. They
should be unloaded every time before an STR transition.
Next, you can follow the instructions at http://en.opensuse.org/s2ram to test
the system, but if it does not work "out of the box", you may need to boot it
with "init=/bin/bash" and test s2ram in the minimal configuration. In that
case, you may be able to search for failing drivers by following the procedure
analogous to the one described in section 1. If you find some failing drivers,
you will have to unload them every time before an STR transition (ie. before
you run s2ram), and please report the problems with them.

View file

@ -6,9 +6,9 @@ Testing suspend and resume support in device drivers
Unfortunately, to effectively test the support for the system-wide suspend and Unfortunately, to effectively test the support for the system-wide suspend and
resume transitions in a driver, it is necessary to suspend and resume a fully resume transitions in a driver, it is necessary to suspend and resume a fully
functional system with this driver loaded. Moreover, that should be done functional system with this driver loaded. Moreover, that should be done
several times, preferably several times in a row, and separately for the suspend several times, preferably several times in a row, and separately for hibernation
to disk (STD) and the suspend to RAM (STR) transitions, because each of these (aka suspend to disk or STD) and suspend to RAM (STR), because each of these
cases involves different ordering of operations and different interactions with cases involves slightly different operations and different interactions with
the machine's BIOS. the machine's BIOS.
Of course, for this purpose the test system has to be known to suspend and Of course, for this purpose the test system has to be known to suspend and
@ -22,20 +22,24 @@ for more information about the debugging of suspend/resume functionality.
Once you have resolved the suspend/resume-related problems with your test system Once you have resolved the suspend/resume-related problems with your test system
without the new driver, you are ready to test it: without the new driver, you are ready to test it:
a) Build the driver as a module, load it and try the STD in the test mode (see: a) Build the driver as a module, load it and try the test modes of hibernation
Documents/power/basic-pm-debugging.txt, 1a)). (see: Documents/power/basic-pm-debugging.txt, 1).
b) Load the driver and attempt to suspend to disk in the "reboot", "shutdown" b) Load the driver and attempt to hibernate in the "reboot", "shutdown" and
and "platform" modes (see: Documents/power/basic-pm-debugging.txt, 1). "platform" modes (see: Documents/power/basic-pm-debugging.txt, 1).
c) Compile the driver directly into the kernel and try the STD in the test mode. c) Compile the driver directly into the kernel and try the test modes of
hibernation.
d) Attempt to suspend to disk with the driver compiled directly into the kernel d) Attempt to hibernate with the driver compiled directly into the kernel
in the "reboot", "shutdown" and "platform" modes. in the "reboot", "shutdown" and "platform" modes.
e) Attempt to suspend to RAM using the s2ram tool with the driver loaded (see: e) Try the test modes of suspend (see: Documents/power/basic-pm-debugging.txt,
Documents/power/basic-pm-debugging.txt, 2). As far as the STR tests are 2). [As far as the STR tests are concerned, it should not matter whether or
concerned, it should not matter whether or not the driver is built as a module. not the driver is built as a module.]
f) Attempt to suspend to RAM using the s2ram tool with the driver loaded
(see: Documents/power/basic-pm-debugging.txt, 2).
Each of the above tests should be repeated several times and the STD tests Each of the above tests should be repeated several times and the STD tests
should be mixed with the STR tests. If any of them fails, the driver cannot be should be mixed with the STR tests. If any of them fails, the driver cannot be