aha/mm
Paul Jackson 3e0d98b9f1 [PATCH] cpuset: memory pressure meter
Provide a simple per-cpuset metric of memory pressure, tracking the -rate-
that the tasks in a cpuset call try_to_free_pages(), the synchronous
(direct) memory reclaim code.

This enables a batch manager monitoring a job running in a dedicated cpuset
to efficiently detect the level of memory pressure that job is causing.

This is useful both on tightly managed systems running a wide mix of
submitted jobs, which may choose to terminate or reprioritize jobs that are
trying to use more memory than allowed on the nodes assigned to them, and
with tightly coupled, long running, massively parallel scientific computing
jobs that will dramatically fail to meet required performance goals if they
start to use more memory than they are allowed.

This patch just provides a very economical way for the batch manager to
monitor a cpuset for signs of memory pressure.  It's up to the batch
manager or other user code to decide what to do about it and take action.

==> Unless this feature is enabled by writing "1" to the special file
    /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
    code of __alloc_pages() for this metric reduces to simply noticing
    that the cpuset_memory_pressure_enabled flag is zero.  So only
    systems that enable this feature will compute the metric.
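
A minimal user-space sketch of switching the feature on, assuming cpusets
are mounted at /dev/cpuset as in the text above.  The helper names
(set_memory_pressure_enabled, is_memory_pressure_enabled) and the
cpuset_root parameter are hypothetical and are not part of this patch.

```python
import os


def set_memory_pressure_enabled(enabled, cpuset_root="/dev/cpuset"):
    """Write "1" or "0" to the top-level memory_pressure_enabled file.

    cpuset_root is an assumption: the mount point of the cpuset filesystem.
    Returns the path that was written.
    """
    path = os.path.join(cpuset_root, "memory_pressure_enabled")
    with open(path, "w") as f:
        f.write("1" if enabled else "0")
    return path


def is_memory_pressure_enabled(cpuset_root="/dev/cpuset"):
    """Return True if the flag file currently reads "1"."""
    path = os.path.join(cpuset_root, "memory_pressure_enabled")
    with open(path) as f:
        return f.read().strip() == "1"
```

A batch manager would typically call set_memory_pressure_enabled(True)
once at startup, which requires write permission on that file.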

Why a per-cpuset, running average:

    Because this meter is per-cpuset, rather than per-task or mm, the
    system load imposed by a batch scheduler monitoring this metric is
    sharply reduced on large systems, because a scan of the tasklist can be
    avoided on each set of queries.

    Because this meter is a running average, instead of an accumulating
    counter, a batch scheduler can detect memory pressure with a single
    read, instead of having to read and accumulate results for a period of
    time.

    Because this meter is per-cpuset rather than per-task or mm, the
    batch scheduler can obtain the key information, memory pressure in a
    cpuset, with a single read, rather than having to query and accumulate
    results over the entire (dynamically changing) set of tasks in the
    cpuset.
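
The single-read pattern described above might look roughly like this from
a batch scheduler's side.  This is only a sketch: it assumes the per-cpuset
file is named memory_pressure and holds one integer (reclaims per second,
times 1000), and the function names and the limit_per_second parameter are
hypothetical.

```python
import os

SCALE = 1000  # the file reports reclaims attempted per second, times 1000


def read_memory_pressure(cpuset_dir):
    """One read per cpuset: return the raw integer from its meter file."""
    # Assumption: the per-cpuset file is called "memory_pressure".
    with open(os.path.join(cpuset_dir, "memory_pressure")) as f:
        return int(f.read().strip())


def reclaims_per_second(raw):
    """Convert the raw file value back to reclaims per second."""
    return raw / SCALE


def cpusets_over_limit(cpuset_dirs, limit_per_second):
    """Return the cpusets whose recent direct reclaim rate exceeds the limit."""
    return [
        d for d in cpuset_dirs
        if reclaims_per_second(read_memory_pressure(d)) > limit_per_second
    ]
```

No tasklist scan and no per-task accumulation is needed; what to do with
the cpusets returned is left to the batch manager's own policy.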

A per-cpuset simple digital filter (requires a spinlock and 3 words of data
per-cpuset) is kept, and updated by any task attached to that cpuset, if it
enters the synchronous (direct) page reclaim code.

A per-cpuset file provides an integer number representing the recent
(half-life of 10 seconds) rate of direct page reclaims caused by the tasks
in the cpuset, in units of reclaims attempted per second, times 1000.
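
To illustrate how such a filter can behave, here is a user-space model in
Python.  It is a sketch under stated assumptions, not the kernel code from
this patch: time is counted in whole seconds, the value decays by a factor
of 0.933 per second (0.933**10 is about 0.5, giving the 10 second
half-life), and the names FMeter, FM_COEF, FM_SCALE, FM_MAXTICKS and
FM_MAXCNT are chosen for the example.  The three words of state are cnt,
val and time; the spinlock is omitted because the model is single-threaded.

```python
FM_COEF = 933        # assumed per-second decay factor, times 1000
FM_SCALE = 1000      # fixed-point scale; also the "times 1000" in the output
FM_MAXTICKS = 99     # assumed cap: decaying further is pointless
FM_MAXCNT = 1000000  # assumed cap on unprocessed events


class FMeter:
    """Exponentially decaying event-rate meter with three words of state."""

    def __init__(self, now):
        self.cnt = 0      # events seen since the last update, times FM_SCALE
        self.val = 0      # filtered rate, events per second times FM_SCALE
        self.time = now   # time of the last update, in whole seconds

    def _update(self, now):
        ticks = min(FM_MAXTICKS, now - self.time)
        if ticks <= 0:
            return
        # Decay the old value once per elapsed second.
        for _ in range(ticks):
            self.val = (FM_COEF * self.val) // FM_SCALE
        self.time = now
        # Fold in the events collected since the previous update.
        self.val += ((FM_SCALE - FM_COEF) * self.cnt) // FM_SCALE
        self.cnt = 0

    def mark_event(self, now):
        """Record one direct reclaim attempt at time now."""
        self._update(now)
        self.cnt = min(FM_MAXCNT, self.cnt + FM_SCALE)

    def value(self, now):
        """Return the current filtered rate, times FM_SCALE."""
        self._update(now)
        return self.val
```

In this model, a burst of ten events in one second reads back as 670 one
second later and decays to about half of that after ten further quiet
seconds.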

Signed-off-by: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-01-08 20:13:42 -08:00
bootmem.c [PATCH] FRV: Clean up bootmem allocator's page freeing algorithm 2006-01-06 08:33:26 -08:00
fadvise.c [PATCH] xip: madvice/fadvice: execute in place 2005-06-24 00:06:42 -07:00
filemap.c [PATCH] find_lock_page(): call __lock_page() directly. 2006-01-06 08:33:26 -08:00
filemap.h [PATCH] xip: reduce code duplication 2005-06-24 00:06:41 -07:00
filemap_xip.c [PATCH] mm: rmap with inner ptlock 2005-10-29 21:40:41 -07:00
fremap.c VM: add common helper function to create the page tables 2005-11-29 14:03:14 -08:00
highmem.c [PATCH] gfp_t: the rest 2005-10-28 08:16:51 -07:00
hugetlb.c [PATCH] mm: make hugepages obey cpusets. 2006-01-08 20:12:43 -08:00
internal.h [PATCH] FRV: Clean up bootmem allocator's page freeing algorithm 2006-01-06 08:33:26 -08:00
Kconfig [PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support 2006-01-08 20:12:41 -08:00
madvise.c [PATCH] madvise(MADV_REMOVE): remove pages from tmpfs shm backing store 2006-01-06 08:33:22 -08:00
Makefile [PATCH] slob: introduce the SLOB allocator 2006-01-08 20:13:41 -08:00
memory.c [PATCH] mm: pfault optimisation 2006-01-06 08:33:27 -08:00
memory_hotplug.c [PATCH] memhotplug: __add_section remove unused pgdat definition 2006-01-06 08:33:21 -08:00
mempolicy.c [PATCH] cpuset: mempolicy one more nodemask conversion 2006-01-08 20:13:42 -08:00
mempool.c [PATCH] gfp_t: mm/* (easy parts) 2005-10-28 08:16:47 -07:00
mincore.c [PATCH] freepgt: sys_mincore ignore FIRST_USER_PGD_NR 2005-04-19 13:29:20 -07:00
mlock.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
mmap.c Make sure we copy pages inserted with "vm_insert_page()" on fork 2005-12-16 10:21:23 -08:00
mprotect.c [PATCH] unpaged: private write VM_RESERVED 2005-11-22 09:13:42 -08:00
mremap.c Make sure we copy pages inserted with "vm_insert_page()" on fork 2005-12-16 10:21:23 -08:00
msync.c mm: re-architect the VM_UNPAGED logic 2005-11-28 14:34:23 -08:00
nommu.c [PATCH] NOMMU: Make SYSV IPC SHM use ramfs facilities on NOMMU 2006-01-06 08:33:32 -08:00
oom_kill.c [PATCH] Optimise oom kill of current task 2006-01-08 20:12:45 -08:00
page-writeback.c identify multipage ->writepages() calls 2006-01-06 14:58:38 -05:00
page_alloc.c [PATCH] cpuset: memory pressure meter 2006-01-08 20:13:42 -08:00
page_io.c [PATCH] mm: split page table lock 2005-10-29 21:40:42 -07:00
pdflush.c [PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap 2006-01-08 20:12:41 -08:00
prio_tree.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
readahead.c [PATCH] add AOP_TRUNCATED_PAGE, prepend AOP_ to WRITEPAGE_ACTIVATE 2006-01-03 11:45:42 -08:00
rmap.c [PATCH] rmap: additional diagnostics in page_remove_rmap() 2006-01-08 20:12:44 -08:00
shmem.c [PATCH] NOMMU: Make SYSV IPC SHM use ramfs facilities on NOMMU 2006-01-06 08:33:32 -08:00
slab.c [PATCH] slob: introduce mm/util.c for shared functions 2006-01-08 20:13:41 -08:00
slob.c [PATCH] slob: introduce the SLOB allocator 2006-01-08 20:13:41 -08:00
sparse.c [PATCH] Change maxaligned_in_smp alignemnt macros to internodealigned_in_smp macros 2006-01-08 20:13:38 -08:00
swap.c [PATCH] consolidate lru_add_drain() and lru_drain_cache() 2006-01-06 08:33:28 -08:00
swap_state.c [PATCH] SwapMig: add_to_swap() avoid atomic allocations 2006-01-08 20:12:42 -08:00
swapfile.c [PATCH] mm: clean up local variables 2006-01-08 20:12:43 -08:00
thrash.c [PATCH] temporarily disable swap token on memory pressure 2005-11-28 14:42:25 -08:00
tiny-shmem.c [PATCH] NOMMU: Make SYSV IPC SHM use ramfs facilities on NOMMU 2006-01-06 08:33:32 -08:00
truncate.c [PATCH] drop-pagecache 2006-01-08 20:12:40 -08:00
util.c [PATCH] slob: introduce mm/util.c for shared functions 2006-01-08 20:13:41 -08:00
vmalloc.c [PATCH] kernel-doc: fix warnings in vmalloc.c 2005-11-07 07:53:56 -08:00
vmscan.c [PATCH] SwapMig: Switch error handling in migrate_pages to use -Exx 2006-01-08 20:12:42 -08:00