* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFSv4: Fix a regression in the NFSv4 state manager
NFSv4: Release the sequence id before restarting a CLOSE rpc call
nfs41: fix session fore channel negotiation
nfs41: do not zero seqid portion of stateid on close
nfs: run state manager in privileged mode
nfs: make recovery state manager operations privileged
nfs: enforce FIFO ordering of operations trying to acquire slot
rpc: add a new priority in RPC task
nfs: remove rpc_task argument from nfs4_find_slot
rpc: add rpc_queue_empty function
nfs: change nfs4_do_setlk params to identify recovery type
nfs: do not do a LOOKUP after open
nfs: minor cleanup of session draining
* 'module' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
modpost: fix segfault with short symbol names
module: handle ppc64 relocating kcrctabs when CONFIG_RELOCATABLE=y
Kbuild: clear marker out of modpost
module: make MODULE_SYMBOL_PREFIX into a CONFIG option
ARM: unexport symbols used to implement floating point emulation
ARM: use unified discard definition in linker script
x86: don't export inline function
sparc64: don't export static inline pci_ functions
* 'for-2.6.33' of git://linux-nfs.org/~bfields/linux: (42 commits)
nfsd: remove pointless paths in file headers
nfsd: move most of nfsfh.h to fs/nfsd
nfsd: remove unused field rq_reffh
nfsd: enable V4ROOT exports
nfsd: make V4ROOT exports read-only
nfsd: restrict filehandles accepted in V4ROOT case
nfsd: allow exports of symlinks
nfsd: filter readdir results in V4ROOT case
nfsd: filter lookup results in V4ROOT case
nfsd4: don't continue "under" mounts in V4ROOT case
nfsd: introduce export flag for v4 pseudoroot
nfsd: let "insecure" flag vary by pseudoflavor
nfsd: new interface to advertise export features
nfsd: Move private headers to source directory
vfs: nfsctl.c un-used nfsd #includes
lockd: Remove un-used nfsd headers #includes
s390: remove un-used nfsd #includes
sparc: remove un-used nfsd #includes
parsic: remove un-used nfsd #includes
compat.c: Remove dependence on nfsd private headers
...
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (26 commits)
net: sh_eth alignment fix for sh7724 using NET_IP_ALIGN V2
ixgbe: allow tx of pre-formatted vlan tagged packets
ixgbe: Fix 82598 premature copper PHY link indicatation
ixgbe: Fix tx_restart_queue/non_eop_desc statistics counters
bcm63xx_enet: fix compilation failure after get_stats_count removal
packet: dont call sleeping functions while holding rcu_read_lock()
tcp: Revert per-route SACK/DSACK/TIMESTAMP changes.
ipvs: zero usvc and udest
netfilter: fix crashes in bridge netfilter caused by fragment jumps
ipv6: reassembly: use seperate reassembly queues for conntrack and local delivery
sky2: leave PCI config space writeable
sky2: print Optima chip name
x25: Update maintainer.
ipvs: fix synchronization on connection close
netfilter: xtables: document minimal required version
drivers/net/bonding/: : use pr_fmt
can: CAN_MCP251X should depend on HAS_DMA
drivers/net/usb: Correct code taking the size of a pointer
drivers/net/cpmac.c: Correct code taking the size of a pointer
drivers/net/sfc: Correct code taking the size of a pointer
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (22 commits)
Input: ALPS - add interleaved protocol support (Dell E6x00 series)
Input: keyboard - don't override beep with a bell
Input: altera_ps2 - fix test of unsigned in altera_ps2_probe()
Input: add mc13783 touchscreen driver
Input: ep93xx_keypad - update driver to new core support
Input: wacom - separate pen from express keys on Graphire
Input: wacom - add defines for data packet report IDs
Input: wacom - add support for new LCD tablets
Input: wacom - add defines for packet lengths of various devices
Input: wacom - ensure the device is initialized properly upon resume
Input: at32psif - do not sleep in atomic context
Input: i8042 - add Gigabyte M1022M to the noloop list
Input: i8042 - allow installing platform filters for incoming data
Input: i8042 - fix locking in interrupt routine
Input: ALPS - do not set REL_X/REL_Y capabilities on the touchpad
Input: document use of input_event() function
Input: sa1111ps2 - annotate probe() and remove() methods
Input: ambakmi - annotate probe() and remove() methods
Input: gscps2 - fix probe() and remove() annotations
Input: altera_ps2 - add annotations to probe and remove methods
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (33 commits)
sh: Fix test of unsigned in se7722_irq_demux()
sh: mach-ecovec24: Add FSI sound support
sh: mach-ecovec24: Add mt9t112 camera support
sh: mach-ecovec24: Add tw9910 support
sh: MSIOF/mmc_spi platform data for the Ecovec24 board
sh: ms7724se: Add ak4642 support
sh: Fix up FPU build for SH5
sh: Remove old early serial console code V2
sh: sh5 scif pdata (sh5-101/sh5-103)
sh: sh4a scif pdata (sh7757/sh7763/sh7770/sh7780/sh7785/sh7786/x3)
sh: sh4a scif pdata (sh7343/sh7366/sh7722/sh7723/sh7724)
sh: sh4 scif pdata (sh7750/sh7760/sh4-202)
sh: sh3 scif pdata (sh7705/sh770x/sh7710/sh7720)
sh: sh2a scif pdata (sh7201/sh7203/sh7206/mxg)
sh: sh2 scif pdata (sh7616)
sh-sci: Extend sh-sci driver with early console V2
sh: Stub in P3 ioremap support for nommu parts.
sh: wire up vmallocinfo support in ioremap() implementations.
sh: Make the unaligned trap handler always obey notification levels.
sh: Couple kernel and user write page perm bits for CONFIG_X2TLB
...
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
ppc440spe-adma: adds updated ppc440spe adma driver
iop-adma.c: use resource_size()
dmaengine: clarify the meaning of the DMA_CTRL_ACK flag
sh: stylistic improvements for the DMA driver
dmaengine: fix dmatest to verify minimum transfer length and test buffer size
sh: DMA driver has to specify its alignment requirements
Add COH 901 318 DMA block driver v5
* git://git.infradead.org/mtd-2.6: (90 commits)
jffs2: Fix long-standing bug with symlink garbage collection.
mtd: OneNAND: Fix test of unsigned in onenand_otp_walk()
mtd: cfi_cmdset_0002, fix lock imbalance
Revert "mtd: move mxcnd_remove to .exit.text"
mtd: m25p80: add support for Macronix MX25L4005A
kmsg_dump: fix build for CONFIG_PRINTK=n
mtd: nandsim: add support for 4KiB pages
mtd: mtdoops: refactor as a kmsg_dumper
mtd: mtdoops: make record size configurable
mtd: mtdoops: limit the maximum mtd partition size
mtd: mtdoops: keep track of used/unused pages in an array
mtd: mtdoops: several minor cleanups
core: Add kernel message dumper to call on oopses and panics
mtd: add ARM pismo support
mtd: pxa3xx_nand: Fix PIO data transfer
mtd: nand: fix multi-chip suspend problem
mtd: add support for switching old SST chips into QRY mode
mtd: fix M29W800D dev_id and uaddr
mtd: don't use PF_MEMALLOC
mtd: Add bad block table overrides to Davinci NAND driver
...
Fixed up conflicts (mostly trivial) in
drivers/mtd/devices/m25p80.c
drivers/mtd/maps/pcmciamtd.c
drivers/mtd/nand/pxa3xx_nand.c
kernel/printk.c
* git://git.infradead.org/iommu-2.6:
implement early_io{re,un}map for ia64
Revert "Intel IOMMU: Avoid memory allocation failures in dma map api calls"
intel-iommu: ignore page table validation in pass through mode
intel-iommu: Fix oops with intel_iommu=igfx_off
intel-iommu: Check for an RMRR which ends before it starts.
intel-iommu: Apply BIOS sanity checks for interrupt remapping too.
intel-iommu: Detect DMAR in hyperspace at probe time.
dmar: Fix build failure without NUMA, warn on bogus RHSA tables and don't abort
iommu: Allocate dma-remapping structures using numa locality info
intr_remap: Allocate intr-remapping table using numa locality info
dmar: Allocate queued invalidation structure using numa locality info
dmar: support for parsing Remapping Hardware Static Affinity structure
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6: (116 commits)
V4L/DVB (13698): pms: replace asm/uaccess.h to linux/uaccess.h
V4L/DVB (13690): radio/si470x: #include <sched.h>
V4L/DVB (13688): au8522: modify the attributes of local filter coefficients
V4L/DVB (13687): cx231xx: use NULL when pointer is needed
V4L/DVB: Davinci VPFE Capture: remove unused #include <linux/version.h>
V4L/DVB (13685): Correct code taking the size of a pointer
V4L/DVB (13684): Fix some cut-and-paste noise in dib0090.h
V4L/DVB (13683): sanio-ms: clean up init, exit and id_table
V4L/DVB (13682): dib8000: make some constant static
V4L/DVB: lgs8gxx: Use shifts rather than multiply/divide when possible
V4L/DVB (13680b): DocBook/media: create links for included sources
V4L/DVB (13680a): DocBook/media: copy images after building HTML
V4L/DVB (13678): Add support for yet another DvbWorld, TeVii and Prof USB devices
V4L/DVB (13676): configurable IRQ mode on NetUP Dual DVB-S2 CI; IRQ from CAM processing (CI interface works faster)
V4L/DVB (13674): stv090x: Add DiSEqC envelope mode
V4L/DVB (13673): lnbp21: Implement 22 kHz tone control
V4L/DVB (13671): sh_mobile_ceu_camera: Remove frame size page alignment
V4L/DVB (13670): soc-camera: Add mt9t112 camera driver
V4L/DVB (13669): tw9910: Add sync polarity support
V4L/DVB (13668): tw9910: remove cropping
...
* akpm: (173 commits)
genalloc: use bitmap_find_next_zero_area
ia64: use bitmap_find_next_zero_area
sparc: use bitmap_find_next_zero_area
mlx4: use bitmap_find_next_zero_area
isp1362-hcd: use bitmap_find_next_zero_area
iommu-helper: use bitmap library
bitmap: introduce bitmap_set, bitmap_clear, bitmap_find_next_zero_area
qnx4: use hweight8
qnx4fs: remove remains of the (defunct) write support
resource: constify arg to resource_size() and resource_type()
gru: send cross partition interrupts using the gru
gru: function to generate chipset IPI values
gru: update driver version number
gru: improve GRU TLB dropin statistics
gru: fix GRU interrupt race at deallocate
gru: add hugepage support
gru: fix bug in allocation of kernel contexts
gru: update GRU structures to match latest hardware spec
gru: check for correct GRU chiplet assignment
gru: remove stray local_irq_enable
...
Use bitmap library and kill some unused iommu helper functions.
1. s/iommu_area_free/bitmap_clear/
2. s/iommu_area_reserve/bitmap_set/
3. Use bitmap_find_next_zero_area instead of find_next_zero_area
This cannot be simple substitution because find_next_zero_area
doesn't check the last bit of the limit in bitmap
4. Remove iommu_area_free, iommu_area_reserve, and find_next_zero_area
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This introduces new bitmap functions:
bitmap_set: Set specified bit area
bitmap_clear: Clear specified bit area
bitmap_find_next_zero_area: Find free bit area
These are mostly stolen from iommu helper. The differences are:
- Use find_next_bit instead of doing test_bit for each bit
- Rewrite bitmap_set and bitmap_clear
Instead of setting or clearing for each bit.
- Check the last bit of the limit
iommu-helper doesn't want to find such area
- The return value if there is no zero area
find_next_zero_area in iommu helper: returns -1
bitmap_find_next_zero_area: return >= bitmap size
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Lothar Wassmann <LW@KARO-electronics.de>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
resource_size() doesn't change the resource it operates on, so the res
parameter can be marked const. Same for resource_type().
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently the locking in blockdev_direct_IO is a mess, we have three
different locking types and very confusing checks for some of them. The
most complicated one is DIO_OWN_LOCKING for reads, which happens to not
actually be used.
This patch gets rid of the DIO_OWN_LOCKING - as mentioned above the read
case is unused anyway, and the write side is almost identical to
DIO_NO_LOCKING. The difference is that DIO_NO_LOCKING always sets the
create argument for the get_blocks callback to zero, but we can easily
move that to the actual get_blocks callbacks. There are four users of the
DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is
fine with the new version, ocfs2 only errors out if create were ever set,
and we can remove this dead code now, the block device code only ever uses
create for an error message if we are fully beyond the device which can
never happen, and last but not least XFS will need the new behavour for
writes.
Now we can replace the lock_type variable with a flags one, where no flag
means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
flag. Separate out the check for not allowing to fill holes into a
separate flag, although for now both flags always get set at the same
time.
Also revamp the documentation of the locking scheme to actually make
sense.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Elder <aelder@sgi.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Don't know the reason, but it appears ki_wait field of iocb never gets used.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement shrinking the reserved memory for crash kernel, if it is more
than enough.
For example, if you have already reserved 128M, now you just want 100M,
you can do:
# echo $((100*1024*1024)) > /sys/kernel/kexec_crash_size
Note, you can only do this before loading the crash kernel.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have HARD_MSGMAX lower on 64bit than on 32bit, since usually 64bit
machines have more memory than 32bit machines.
Making it higher on 64bit seems reasonable, and keep the original number
on 32bit.
Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Based on Nick's findings:
sysv sem has the concept of semaphore arrays that consist out of multiple
semaphores. Atomic operations that affect multiple semaphores are
supported.
The patch is the first step for optimizing simple, single semaphore
operations: In addition to the global list of all pending operations, a
2nd, per-semaphore list with the simple operations is added.
Note: this patch does not make sense by itself, the new list is used
nowhere.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No changes in compiled code. The patch adds the new helper, si_fromuser()
and changes check_kill_permission() to use this helper.
The real effect of this patch is that from now we "officially" consider
SEND_SIG_NOINFO signal as "from user-space" signals. This is already true
if we look at the code which uses SEND_SIG_NOINFO, except __send_signal()
has another opinion - see the next patch.
The naming of these special SEND_SIG_XXX siginfo's is really bad
imho. From __send_signal()'s pov they mean
SEND_SIG_NOINFO from user
SEND_SIG_PRIV from kernel
SEND_SIG_FORCED no info
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Reviewed-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested by Roland.
Change tracehook_report_syscall_exit() to look at step flag and send the
trap signal if needed.
This change affects ia64, microblaze, parisc, powerpc, sh. They pass
nonzero "step" argument to tracehook but since it was ignored the tracee
reports via ptrace_notify(), this is not right and not consistent.
- PTRACE_SETSIGINFO doesn't work
- if the tracer resumes the tracee with signr != 0 the new signal
is generated rather than delivering it
- If PT_TRACESYSGOOD is set the tracee reports the wrong exit_code
I don't have a powerpc machine, but I think this test-case should see the
difference:
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <assert.h>
#include <stdio.h>
int main(void)
{
int pid, status;
if (!(pid = fork())) {
assert(ptrace(PTRACE_TRACEME) == 0);
kill(getpid(), SIGSTOP);
getppid();
return 0;
}
assert(pid == wait(&status));
assert(ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_TRACESYSGOOD) == 0);
assert(ptrace(PTRACE_SYSCALL, pid, 0,0) == 0);
assert(pid == wait(&status));
assert(ptrace(PTRACE_SINGLESTEP, pid, 0,0) == 0);
assert(pid == wait(&status));
if (status == 0x57F)
return 0;
printf("kernel bug: status=%X shouldn't have 0x80\n", status);
return 1;
}
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested by Roland.
Currently there is no way to synthesize a single-stepping trap in the
arch-independent manner. This patch adds the default helper which fills
siginfo_t, arch/ can can override it.
Architetures which implement user_enable_single_step() should add
user_single_step_siginfo() also.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No functional changes.
ptrace_init_task() looks confusing, as if we always auto-attach when "bool
ptrace" argument is true, while in fact we attach only if current is
traced.
Make the code more explicit and kill now unused ptrace_link().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mem_cgroup_move_parent() calls try_charge first and cancel_charge on
failure. IMHO, charge/uncharge(especially charge) is high cost operation,
so we should avoid it as far as possible.
This patch tries to delay try_charge in mem_cgroup_move_parent() by
re-ordering checks it does.
And this patch renames mem_cgroup_move_account() to
__mem_cgroup_move_account(), changes the return value of
__mem_cgroup_move_account() from int to void, and adds a new
wrapper(mem_cgroup_move_account()), which checks whether a @pc is valid
for moving account and calls __mem_cgroup_move_account().
This patch removes the last caller of trylock_page_cgroup(), so removes
its definition too.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In global VM, FILE_MAPPED is used but memcg uses MAPPED_FILE. This makes
grep difficult. Replace memcg's MAPPED_FILE with FILE_MAPPED
And in global VM, mapped shared memory is accounted into FILE_MAPPED.
But memcg doesn't. fix it.
Note:
page_is_file_cache() just checks SwapBacked or not.
So, we need to check PageAnon.
Cc: Balbir Singh <balbir@in.ibm.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In massive parallel enviroment, res_counter can be a performance
bottleneck. One strong techinque to reduce lock contention is reducing
calls by coalescing some amount of calls into one.
Considering charge/uncharge chatacteristic,
- charge is done one by one via demand-paging.
- uncharge is done by
- in chunk at munmap, truncate, exit, execve...
- one by one via vmscan/paging.
It seems we have a chance to coalesce uncharges for improving scalability
at unmap/truncation.
This patch is a for coalescing uncharge. For avoiding scattering memcg's
structure to functions under /mm, this patch adds memcg batch uncharge
information to the task. A reason for per-task batching is for making use
of caller's context information. We do batched uncharge (deleyed
uncharge) when truncation/unmap occurs but do direct uncharge when
uncharge is called by memory reclaim (vmscan.c).
The degree of coalescing depends on callers
- at invalidate/trucate... pagevec size
- at unmap ....ZAP_BLOCK_SIZE
(memory itself will be freed in this degree.)
Then, we'll not coalescing too much.
On x86-64 8cpu server, I tested overheads of memcg at page fault by
running a program which does map/fault/unmap in a loop. Running
a task per a cpu by taskset and see sum of the number of page faults
in 60secs.
[without memcg config]
40156968 page-faults # 0.085 M/sec ( +- 0.046% )
27.67 cache-miss/faults
[root cgroup]
36659599 page-faults # 0.077 M/sec ( +- 0.247% )
31.58 miss/faults
[in a child cgroup]
18444157 page-faults # 0.039 M/sec ( +- 0.133% )
69.96 miss/faults
[child with this patch]
27133719 page-faults # 0.057 M/sec ( +- 0.155% )
47.16 miss/faults
We can see some amounts of improvement.
(root cgroup doesn't affected by this patch)
Another patch for "charge" will follow this and above will be improved more.
Changelog(since 2009/10/02):
- renamed filed of memcg_batch (as pages to bytes, memsw to memsw_bytes)
- some clean up and commentary/description updates.
- added initialize code to copy_process(). (possible bug fix)
Changelog(old):
- fixed !CONFIG_MEM_CGROUP case.
- rebased onto the latest mmotm + softlimit fix patches.
- unified patch for callers
- added commetns.
- make ->do_batch as bool.
- removed css_get() at el. We don't need it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* small define cleanup in header
* fix #ifdeffery in procfs.c via Kconfig
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
/proc/fs/reiserfs/version is on the way of removing ->read_proc interface.
It's empty however, so simply remove it instead of doing dummy
conversion. It's hard to see what information userspace can extract from
empty file.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a helper function to enable raster. Also add one member in the
private data structure to track the current blank status, another function
pointer which takes in the platform specific callback function to control
panel power.
These updates will help in adding suspend/resume and frame buffer blank
operation features.
Signed-off-by: Chaithrika U S <chaithrika@ti.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch provides the acceleration entry points for the SM501
framebuffer driver.
This patch provides the sync, copyarea and fillrect entry points, using
the SM501's 2D acceleration engine to perform the operations in-chip
rather than across the bus.
Signed-off-by: Ben Dooks <ben@simtec.co.uk>
Signed-off-by: Simtec Linux Team <linux@simtec.co.uk>
Signed-off-by: Vincent Sanders <vince@simtec.co.uk>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Drivers may use gpiolib sysfs as part of their public user space
interface. The GPIO number and polarity might change from board to
board. The gpio_export_link() call can be used to hide the GPIO number
from user space. Add support for also hiding the GPIO line polarity
changes from user space.
Signed-off-by: Jani Nikula <ext-jani.1.nikula@nokia.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A GPIO driver for the Timberdale FPGA found on the Intel Atom board
Russellville.
The GPIO driver also has an IRQ-chip to support interrupts on the pins.
Signed-off-by: Richard Röjfors <richard.rojfors@mocean-labs.com>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fix node-oriented allocation handling in oom-kill.c I myself think of this
as a bugfix not as an ehnancement.
In these days, things are changed as
- alloc_pages() eats nodemask as its arguments, __alloc_pages_nodemask().
- mempolicy don't maintain its own private zonelists.
(And cpuset doesn't use nodemask for __alloc_pages_nodemask())
So, current oom-killer's check function is wrong.
This patch does
- check nodemask, if nodemask && nodemask doesn't cover all
node_states[N_HIGH_MEMORY], this is CONSTRAINT_MEMORY_POLICY.
- Scan all zonelist under nodemask, if it hits cpuset's wall
this faiulre is from cpuset.
And
- modifies the caller of out_of_memory not to call oom if __GFP_THISNODE.
This doesn't change "current" behavior. If callers use __GFP_THISNODE
it should handle "page allocation failure" by itself.
- handle __GFP_NOFAIL+__GFP_THISNODE path.
This is something like a FIXME but this gfpmask is not used now.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hioryu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 5ad6468801 "ksm: let shared pages
be swappable" breaks the build on m68knommu and I suspect on any nommu:
In file included from kernel/fork.c:52:
include/linux/ksm.h:129: warning: 'enum ttu_flags' declared inside parameter list
include/linux/ksm.h:129: warning: its scope is only this definition or declaration, which is probably not what you want
include/linux/ksm.h:129: error: parameter 2 ('flags') has incomplete type
make[1]: *** [kernel/fork.o] Error 1
make[1]: *** Waiting for unfinished jobs....
Let's fix that with CONFIG_MMU around most of the !CONFIG_KSM declarations.
Reported-by: Steven King <sfking@fdwdc.com>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Tested-by: Steven King <sfking@fdwdc.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It has been experimentally found out, that the sensor only supports up to
512x384 video output and also has some restrictions on minimum scale. We
disable non-working size ranges until, maybe, someone finds out how to properly
set them up. Also add cropping support, an auto white balance control, platform
data to specify master clock frequency and polarity of the IOCTL pin.
create mode 100644 include/media/rj54n1cb0c.h
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Convert soc-camera core and all soc-camera drivers to the new mediabus
API. This also takes soc-camera client drivers one step closer to also be
usable with generic v4l2-subdev host drivers.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Video subdevices, like cameras, decoders, connect to video bridges over
specialised busses. Data is being transferred over these busses in various
formats, which only loosely correspond to fourcc codes, describing how video
data is stored in RAM. This is not a one-to-one correspondence, therefore we
cannot use fourcc codes to configure subdevice output data formats. This patch
adds codes for several such on-the-bus formats and an API, similar to the
familiar .s_fmt(), .g_fmt(), .try_fmt(), .enum_fmt() API for configuring those
codes. After all users of the old API in struct v4l2_subdev_video_ops are
converted, it will be removed. Also add helper routines to support generic
pass-through mode for the soc-camera framework.
create mode 100644 drivers/media/video/soc_mediabus.c
create mode 100644 include/media/soc_mediabus.h
create mode 100644 include/media/v4l2-mediabus.h
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
After this change drivers can be further extended to not fail, if they don't
get platform data, but to use defaults.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Up to now, if a client driver needed platform data apart from those contained
in struct soc_camera_link, it had to embed the struct into its own object. This
makes the use of such a driver in configurations other than soc-camera
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
The 16-bit monochrome fourcc code has been previously abused for a 10-bit
format, add a new 10-bit code instead. Also add missing 8- and 10-bit Bayer
fourcc codes for completeness.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Introduce new v4l2-subdev sensor operations, move .enum_framesizes() and
.enum_frameintervals() methods to it, add a new .g_skip_top_lines() method
and switch soc-camera to use it instead of .y_skip_top soc_camera_device
member, which can now be removed.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Reviewed-by: Hans Verkuil <hverkuil@xs4all.nl>
Reviewed-by: Sergio Aguirre <saaguirre@ti.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
It creates a regression, triggering badness for SYN_RECV
sockets, for example:
[19148.022102] Badness at net/ipv4/inet_connection_sock.c:293
[19148.022570] NIP: c02a0914 LR: c02a0904 CTR: 00000000
[19148.023035] REGS: eeecbd30 TRAP: 0700 Not tainted (2.6.32)
[19148.023496] MSR: 00029032 <EE,ME,CE,IR,DR> CR: 24002442 XER: 00000000
[19148.024012] TASK = eee9a820[1756] 'privoxy' THREAD: eeeca000
This is likely caused by the change in the 'estab' parameter
passed to tcp_parse_options() when invoked by the functions
in net/ipv4/tcp_minisocks.c
But even if that is fixed, the ->conn_request() changes made in
this patch series is fundamentally wrong. They try to use the
listening socket's 'dst' to probe the route settings. The
listening socket doesn't even have a route, and you can't
get the right route (the child request one) until much later
after we setup all of the state, and it must be done by hand.
This stuff really isn't ready, so the best thing to do is a
full revert. This reverts the following commits:
f55017a93f022c3f7d821aba721ebacda42ebd67345cda2fd6dc343475ed05eaade2786a2a2d6bf8
Signed-off-by: David S. Miller <davem@davemloft.net>