This patch clean up/fixes for memcg's uncharge soft limit path.
Problems:
Now, res_counter_charge()/uncharge() handles softlimit information at
charge/uncharge and softlimit-check is done when event counter per memcg
goes over limit. Now, event counter per memcg is updated only when
memory usage is over soft limit. Here, considering hierarchical memcg
management, ancesotors should be taken care of.
Now, ancerstors(hierarchy) are handled in charge() but not in uncharge().
This is not good.
Prolems:
1. memcg's event counter incremented only when softlimit hits. That's bad.
It makes event counter hard to be reused for other purpose.
2. At uncharge, only the lowest level rescounter is handled. This is bug.
Because ancesotor's event counter is not incremented, children should
take care of them.
3. res_counter_uncharge()'s 3rd argument is NULL in most case.
ops under res_counter->lock should be small. No "if" sentense is better.
Fixes:
* Removed soft_limit_xx poitner and checks in charge and uncharge.
Do-check-only-when-necessary scheme works enough well without them.
* make event-counter of memcg incremented at every charge/uncharge.
(per-cpu area will be accessed soon anyway)
* All ancestors are checked at soft-limit-check. This is necessary because
ancesotor's event counter may never be modified. Then, they should be
checked at the same time.
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__css_put() doesn't check a bug as refcnt goes to minus.
I think it should be caught. This patch adds a check for it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__mem_cgroup_largest_soft_limit_node() returns a mem_cgroup_per_zone "mz"
with incremnted mz->mem->css's refcnt. Then, the caller of this function
has to call css_put(mz->mem->css).
But, mz can be !NULL even if "not found" i.e. without css_get(). By
this, css->refcnt will go down to minus.
This may cause various things...one of results will be
initite-loop in css_tryget() as this.
INFO: RCU detected CPU 0 stall (t=10000 jiffies)
sending NMI to all CPUs:
NMI backtrace for cpu 0
CPU 0:
<snip>
<<EOE>> <IRQ> [<ffffffff810884bd>] trace_hardirqs_off+0xd/0x10
[<ffffffff8102a940>] flat_send_IPI_mask+0x90/0xb0
[<ffffffff8102a9c9>] flat_send_IPI_all+0x69/0x70
[<ffffffff81027372>] arch_trigger_all_cpu_backtrace+0x62/0xa0
[<ffffffff810bff8e>] __rcu_pending+0x7e/0x370
[<ffffffff810c02c7>] rcu_check_callbacks+0x47/0x130
[<ffffffff81063a26>] update_process_times+0x46/0x70
[<ffffffff81085930>] tick_sched_timer+0x60/0x160
[<ffffffff810858d0>] ? tick_sched_timer+0x0/0x160
[<ffffffff8107a03a>] __run_hrtimer+0xba/0x150
[<ffffffff8107a325>] hrtimer_interrupt+0xd5/0x1b0
[<ffffffff81426dfe>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[<ffffffff8142cacd>] smp_apic_timer_interrupt+0x6d/0x9b
[<ffffffff8100cb33>] apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff811317b6>] ? mem_cgroup_walk_tree+0x156/0x180
[<ffffffff811316d3>] ? mem_cgroup_walk_tree+0x73/0x180
[<ffffffff81131692>] ? mem_cgroup_walk_tree+0x32/0x180
[<ffffffff81131a00>] ? mem_cgroup_get_local_stat+0x0/0x110
[<ffffffff81131d5b>] ? mem_control_stat_show+0x14b/0x330
[<ffffffff810a57fd>] ? cgroup_seqfile_show+0x3d/0x60
Above shows CPU0 caught in css_tryget()'s inifinite loop because
of bad refcnt.
This is a fix to set mz=NULL at the top of retry path.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some configurations of the Timberdale FPGA has the uartlite
included.
Signed-off-by: Richard Röjfors <richard.rojfors@mocean-labs.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch size comment is like so last millenium. Update it to modern
times.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Some manufacturers provide vendor information in non-vendor specific CIS
tuples. For example, Broadcom uses an Extended Function tuple to provide
the MAC address on some of their network cards, as in the case of the
Nintendo Wii WLAN daughter card.
This patch allows passing whitelisted FUNCE tuples unknown to the SDIO
core to a matching SDIO driver instead of rejecting them and failing.
Signed-off-by: Albert Herranz <albert_herranz@yahoo.es>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The page_address_in_vma() is not only used in unuse_vma().
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just like ip_fast_csum, the assembly snippet in csum_ipv6_magic needs a
memory clobber, as it is only passed the address of the buffer, not a
memory reference to the buffer itself.
This caused failures in Hurd's pfinetv4 when we tried to compile it with
gcc-4.3 (bogus checksums).
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: "David S. Miller" <davem@davemloft.net>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix some build failures when using gcc-4.x for MN10300.
Firstly, __get_user() fails to build because the pointer points to a const and
__gu_val ends up being read-only:
In file included from include/linux/mempolicy.h:62,
from init/main.c:50:
include/linux/pagemap.h: In function 'fault_in_pages_readable':
include/linux/pagemap.h:394: error: read-only variable '__gu_val' used as 'asm' output
include/linux/pagemap.h:394: error: read-only variable '__gu_val' used as 'asm' output
include/linux/pagemap.h:394: error: read-only variable '__gu_val' used as 'asm' output
include/linux/pagemap.h:400: error: read-only variable '__gu_val' used as 'asm' output
include/linux/pagemap.h:400: error: read-only variable '__gu_val' used as 'asm' output
include/linux/pagemap.h:400: error: read-only variable '__gu_val' used as 'asm' output
make[1]: *** [init/main.o] Error 1
Secondly, gcc-4 doesn't allow casts of lvalues:
UPD include/linux/compile.h
arch/mn10300/kernel/rtc.c: In function 'calibrate_clock':
arch/mn10300/kernel/rtc.c:170: error: lvalue required as left operand of assignment
arch/mn10300/kernel/rtc.c:172: error: lvalue required as left operand of assignment
make[1]: *** [arch/mn10300/kernel/rtc.o] Error 1
These are seen with gcc 4.2.1.
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Marek Vasut <marek.vasut@gmail.com>
Acked-by: Tomas Cech <sleep_walker@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Revert 45d80eea87 ("m68k: convert to
asm-generic/hardirq.h") - it fails to compile due to an inclusion tangle:
In file included from include/linux/irq.h:12,
from include/asm-generic/hardirq.h:6,
from /usr/src/devel/arch/m68k/include/asm/hardirq_mm.h:6,
from /usr/src/devel/arch/m68k/include/asm/hardirq.h:4,
from include/linux/hardirq.h:10,
from /usr/src/devel/arch/m68k/include/asm/system_mm.h:69,
from /usr/src/devel/arch/m68k/include/asm/system.h:4,
from include/linux/list.h:7,
from include/linux/preempt.h:11,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:29,
from include/linux/time.h:8,
from include/linux/timex.h:56,
from include/linux/sched.h:56,
from arch/m68k/kernel/asm-offsets.c:14:
include/linux/smp.h:17: error: field 'list' has incomplete type
Cc: Christoph Hellwig <hch@lst.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The asm-generic/gpio.h header uses the might_sleep() macro but doesn't
include the header for it, so any source code that might include
linux/gpio.h before linux/kernel.h can easily lead to a build failure.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
drivers/input/input.c:1277: warning: 'input_dev_reset' defined but not used
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Starting from commit 4a4962263f "reduce
symbol table for loaded modules (v2)", the kernel/module.c build is broken
with CONFIG_KALLSYMS disabled.
CC kernel/module.o
kernel/module.c:1995: warning: type defaults to 'int' in declaration of 'Elf_Hdr'
kernel/module.c:1995: error: expected ';', ',' or ')' before '*' token
kernel/module.c: In function 'load_module':
kernel/module.c:2203: error: 'strmap' undeclared (first use in this function)
kernel/module.c:2203: error: (Each undeclared identifier is reported only once
kernel/module.c:2203: error: for each function it appears in.)
kernel/module.c:2239: error: 'symoffs' undeclared (first use in this function)
kernel/module.c:2239: error: implicit declaration of function 'layout_symtab'
kernel/module.c:2240: error: 'stroffs' undeclared (first use in this function)
make[1]: *** [kernel/module.o] Error 1
make: *** [kernel/module.o] Error 2
There are three different issues:
- layout_symtab() takes a const Elf_Ehdr
- layout_symtab() needs to return a value
- symoffs/stroffs/strmap are referenced by the load_module() code
despite being ifdefed out, which seems unnecessary given the noop
behaviour of layout_symtab()/add_kallsyms() in the case of
CONFIG_KALLSYMS=n.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In ax25_make_new, if kmemdup of digipeat returns an error, there would
be an oops in sk_free while calling sk_destruct, because sk_protinfo
is NULL at the moment; move sk->sk_destruct initialization after this.
BTW of reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since commit 9b22ea5609
( net: fix packet socket delivery in rx irq handler )
We lost rx timestamping of packets received on accelerated vlans.
Effect is that tcpdump on real dev can show strange timings, since it gets rx timestamps
too late (ie at skb dequeueing time, not at skb queueing time)
14:47:26.986871 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 1
14:47:26.986786 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 1
14:47:27.986888 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 2
14:47:27.986781 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 2
14:47:28.986896 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 3
14:47:28.986780 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 3
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
port_mutex was unlocked twice.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When requesting all prl entries (kprl.addr == INADDR_ANY) and there are
more prl entries than there is space passed from userspace, the existing
code would always copy cmax+1 entries, which is more than can be handled.
This patch makes the kernel copy only exactly cmax entries.
Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
Acked-By: Fred L. Templin <Fred.L.Templin@boeing.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
opens a window in sock_wfree() where another cpu
might free the socket we are working on.
A fix is to call sk->sk_write_space(sk) while still
holding a reference on sk.
Reported-by: Jike Song <albcamus@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This provides safety against negative optlen at the type
level instead of depending upon (sometimes non-trivial)
checks against this sprinkled all over the the place, in
each and every implementation.
Based upon work done by Arjan van de Ven and feedback
from Linus Torvalds.
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched_clock: Fix atomicity/continuity bug by using cmpxchg64()
x86: Provide an alternative() based cmpxchg64()
Commit def0a9b257 (sched_clock: Make it NMI safe) assumed
cmpxchg() of 64bit values was available on X86_32.
That is not so - and causes some subtle scheduler misbehavior due
to incorrect timestamps off to up by ~4 seconds.
Two symptoms are known right now:
- interactivity problems seen by Arjan: up to 600 msecs
latencies instead of the expected 20-40 msecs. These
latencies are very visible on the desktop.
- incorrect CPU stats: occasionally too high percentages in 'top',
and crazy CPU usage stats.
Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090930170754.0886ff2e@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cmpxchg64() today generates, to quote Linus, "barf bag" code.
cmpxchg64() is about to get used in the scheduler to fix a bug there,
but it's a prerequisite that cmpxchg64() first be made non-sucking.
This patch turns cmpxchg64() into an efficient implementation that
uses the alternative() mechanism to just use the raw instruction on
all modern systems.
Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe. (Interested parties with i486 based SMP systems
are welcome to submit fix patches for that.)
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
[ fixed asm constraint bug ]
Fixed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090930170754.0886ff2e@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
arch/mips/include/asm/unaligned.h: linux/unaligned/generic.h is included more than once.
Entirely legitimate but just noise.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Produce an error if request_irq() fails.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: "Ithamar R. Adema" <ithamar.adema@team-embedded.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
There are 16 individual channels (NUM_DBDMA_CHANS) to save/restore plus the
global ddma block config (the +1). The last register in a channel can be
skipped since it's read-only (at offset 0x18).
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Manuel Lauss <manuel.lauss@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Patch 14275ccdb1e4b487cca745aba994699c426a31ee and
d5dedd4507 are conflicting and the
conflict was resolved badly in merge
92241940be501f798cb21db344bbb3d1ec3c4f1c resulting in the BCM1480 changes
of 14275ccdb1e4b487cca745aba994699c426a31ee getting lost. Sort out the
damage.
Reported and initial patch by Mark Mason <mmason@upwardaccess.com>.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
The definition of the irq_ipi structure has two initializations of the
flags field. This combines them.
[Ralf: The issue was originally introduced by commit
be4894196d79455f420dd7bb78be7dc73bec115c (linux-mips.org) rsp.
033890b084 (kernel.org). The original
intention of the code was to initialize .flags with both flags ored together.
The broken C code as actually implemented will be compiled by an equally
broken gcc to use only the last initialization, that is IRQF_PERCPU
which means this turned into an SMTC bug for 2.6.23 and newer.]
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r@
identifier I, s, fld;
position p0,p;
expression E;
@@
struct I s =@p0 { ... .fld@p = E, ...};
@s@
identifier I, s, r.fld;
position r.p0,p;
expression E;
@@
struct I s =@p0 { ... .fld@p = E, ...};
@script:python@
p0 << r.p0;
fld << r.fld;
ps << s.p;
pr << r.p;
@@
if int(ps[0].line)!=int(pr[0].line) or int(ps[0].column)!=int(pr[0].column):
cocci.print_main(fld,p0)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix missing initialization of i_dir_start_lookup member
nilfs2: fix missing zero-fill initialization of btree node cache
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: Fix time encoding with extra epoch bits
ext4: Add a stub for mpage_da_data in the trace header
jbd2: Use tracepoints for history file
ext4: Use tracepoints for mb_history trace file
ext4, jbd2: Drop unneeded printks at mount and unmount time
ext4: Handle nested ext4_journal_start/stop calls without a journal
ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode
ext4: Avoid updating the inode table bh twice in no journal mode
ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first
ext4: async direct IO for holes and fallocate support
ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O
ext4: Split uninitialized extents for direct I/O
ext4: release reserved quota when block reservation for delalloc retry
ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks
ext4: Fix hueristic which avoids group preallocation for closed files
ext4: Use ext4_msg() for ext4_da_writepage() errors
ext4: Update documentation about quota mount options
* git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
fat: Check s_dirt in fat_sync_fs()
vfat: change the default from shortname=lower to shortname=mixed
fat/nls: Fix handling of utf8 invalid char
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM / yenta: Fix cardbus suspend/resume regression
PM / PCMCIA: Drop second argument of pcmcia_socket_dev_suspend()
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
sony-laptop: re-read the rfkill state when resuming from suspend
sony-laptop: check for rfkill hard block at load time
wext: add back wireless/ dir in sysfs for cfg80211 interfaces
wext: Add bound checks for copy_from_user
mac80211: improve/fix mlme messages
cfg80211: always get BSS
iwlwifi: fix 3945 ucode info retrieval after failure
iwlwifi: fix memory leak in command queue handling
iwlwifi: fix debugfs buffer handling
cfg80211: don't set privacy w/o key
cfg80211: wext: don't display BSSID unless associated
net: Add explicit bound checks in net/socket.c
bridge: Fix double-free in br_add_if.
isdn: fix netjet/isdnhdlc build errors
atm: dereference of he_dev->rbps_virt in he_init_group()
ax25: Add missing dev_put in ax25_setsockopt
Revert "sit: stateless autoconf for isatap"
net: fix double skb free in dcbnl
net: fix nlmsg len size for skb when error bit is set.
net: fix vlan_get_size to include vlan_flags size
...