aha/fs
Fengguang Wu 1f7decf6d9 writeback: remove pages_skipped accounting in __block_write_full_page()
Miklos Szeredi <miklos@szeredi.hu> and me identified a writeback bug:

> The following strange behavior can be observed:
>
> 1. large file is written
> 2. after 30 seconds, nr_dirty goes down by 1024
> 3. then for some time (< 30 sec) nothing happens (disk idle)
> 4. then nr_dirty again goes down by 1024
> 5. repeat from 3. until whole file is written
>
> So basically a 4Mbyte chunk of the file is written every 30 seconds.
> I'm quite sure this is not the intended behavior.

It can be produced by the following test scheme:

# cat bin/test-writeback.sh
grep nr_dirty /proc/vmstat
echo 1 > /proc/sys/fs/inode_debug
dd if=/dev/zero of=/var/x bs=1K count=204800&
while true; do grep nr_dirty /proc/vmstat; sleep 1; done

# bin/test-writeback.sh
nr_dirty 19207
nr_dirty 19207
nr_dirty 30924
204800+0 records in
204800+0 records out
209715200 bytes (210 MB) copied, 1.58363 seconds, 132 MB/s
nr_dirty 47150
nr_dirty 47141
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47205
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47214
nr_dirty 47215
nr_dirty 47216
nr_dirty 47216
nr_dirty 47216
nr_dirty 47154
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47143
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47142
nr_dirty 47134
nr_dirty 47134
nr_dirty 47135
nr_dirty 47135
nr_dirty 47135
nr_dirty 46097 <== -1038
nr_dirty 46098
nr_dirty 46098
nr_dirty 46098
[...]
nr_dirty 46091
nr_dirty 46092
nr_dirty 46092
nr_dirty 45069 <== -1023
nr_dirty 45056
nr_dirty 45056
nr_dirty 45056
[...]
nr_dirty 37822
nr_dirty 36799 <== -1023
[...]
nr_dirty 36781
nr_dirty 35758 <== -1023
[...]
nr_dirty 34708
nr_dirty 33672 <== -1024
[...]
nr_dirty 33692
nr_dirty 32669 <== -1023

% ls -li /var/x
847824 -rw-r--r-- 1 root root 200M 2007-08-12 04:12 /var/x

% dmesg|grep 847824  # generated by a debug printk
[  529.263184] redirtied inode 847824 line 548
[  564.250872] redirtied inode 847824 line 548
[  594.272797] redirtied inode 847824 line 548
[  629.231330] redirtied inode 847824 line 548
[  659.224674] redirtied inode 847824 line 548
[  689.219890] redirtied inode 847824 line 548
[  724.226655] redirtied inode 847824 line 548
[  759.198568] redirtied inode 847824 line 548

# line 548 in fs/fs-writeback.c:
543                 if (wbc->pages_skipped != pages_skipped) {
544                         /*
545                          * writeback is not making progress due to locked
546                          * buffers.  Skip this inode for now.
547                          */
548                         redirty_tail(inode);
549                 }

More debug efforts show that __block_write_full_page()
never has the chance to call submit_bh() for that big dirty file:
the buffer head is *clean*. So basicly no page io is issued by
__block_write_full_page(), hence pages_skipped goes up.

Also the comment in generic_sync_sb_inodes():

544                         /*
545                          * writeback is not making progress due to locked
546                          * buffers.  Skip this inode for now.
547                          */

and the comment in __block_write_full_page():

1713                 /*
1714                  * The page was marked dirty, but the buffers were
1715                  * clean.  Someone wrote them back by hand with
1716                  * ll_rw_block/submit_bh.  A rare case.
1717                  */

do not quite agree with each other. The page writeback should be skipped for
'locked buffer', but here it is 'clean buffer'!

This patch fixes this bug. Though I'm not sure why __block_write_full_page()
is called only to do nothing and who actually issued the writeback for us.

This is the two possible new behaviors after the patch:

1) pretty nice: wait 30s and write ALL:)
2) not so good:
	- during the dd: ~16M
	- after 30s:      ~4M
	- after 5s:       ~4M
	- after 5s:     ~176M

The next patch will fix case (2).

Cc: David Chinner <dgc@sgi.com>
Cc: Ken Chen <kenchen@google.com>
Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
Signed-off-by: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:43:02 -07:00
..
9p 9PFS: clean up explicit check for mandatory locks 2007-10-09 18:32:46 -04:00
adfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
affs fs: mark nibblemap const 2007-10-17 08:42:47 -07:00
afs KEYS: Make request_key() and co fundamentally asynchronous 2007-10-17 08:42:57 -07:00
autofs
autofs4 fs/autofs4/inode.c: kmalloc + memset conversion to kzalloc 2007-10-17 08:42:50 -07:00
befs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
bfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
cifs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
coda Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
configfs mm: bdi init hooks 2007-10-17 08:42:45 -07:00
cramfs cramfs: error message about endianess 2007-10-17 08:42:53 -07:00
debugfs docbook: fix filesystems content 2007-10-15 17:56:36 -07:00
devpts
dlm menuconfig: transform NLS and DLM menus 2007-10-17 08:43:00 -07:00
ecryptfs Clean up duplicate includes in fs/ecryptfs/ 2007-10-17 08:42:48 -07:00
efs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
exportfs
ext2 ext2/4: use is_power_of_2() 2007-10-17 08:42:53 -07:00
ext3 ext3: lighten up resize transaction requirements 2007-10-17 08:43:01 -07:00
ext4 Fix f_version type: should be u64 instead of unsigned long 2007-10-17 08:42:53 -07:00
fat Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
freevxfs mm: Remove slab destructors from kmem_cache_create(). 2007-07-20 10:11:58 +09:00
fuse Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
gfs2 fs: correct SuS compliance for open of large file without options 2007-10-17 08:43:01 -07:00
hfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
hfsplus Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
hostfs uml: fix hostfs style 2007-10-16 09:43:07 -07:00
hpfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
hppfs
hugetlbfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
isofs fs/isofs/namei.c: Remove uninitialized local vars warning 2007-10-17 08:42:58 -07:00
jbd Group short-lived and reclaimable kernel allocations 2007-10-16 09:43:00 -07:00
jbd2 mm: Remove slab destructors from kmem_cache_create(). 2007-07-20 10:11:58 +09:00
jffs2 Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
jfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
lockd NFS/SUNRPC: use transport protocol naming 2007-10-09 17:17:53 -04:00
minix limit minixfs printks on corrupted dir i_size 2007-10-17 08:42:53 -07:00
msdos
ncpfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
nfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
nfs_common
nfsd fs/nfsd/export.c: make 3 functions static 2007-10-16 09:43:10 -07:00
nls menuconfig: transform NLS and DLM menus 2007-10-17 08:43:00 -07:00
ntfs writeback: fix ntfs with sb_has_dirty_inodes() 2007-10-17 08:43:02 -07:00
ocfs2 Fix f_version type: should be u64 instead of unsigned long 2007-10-17 08:42:53 -07:00
openpromfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
partitions fs/partitions/sun.c endianness annotations 2007-10-14 12:41:51 -07:00
proc Don't truncate /proc/PID/environ at 4096 characters 2007-10-17 08:43:00 -07:00
qnx4 Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
ramfs Remove valueless definition of hard-selected RAMFS option 2007-10-17 08:42:56 -07:00
reiserfs reiserfs: do not repair wrong journal params 2007-10-17 08:43:01 -07:00
romfs fs/romfs/inode.c: trivial improvements 2007-10-17 08:42:47 -07:00
smbfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
sysfs spin_lock_unlocked cleanups 2007-10-17 08:43:01 -07:00
sysv Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
udf fs/udf/balloc.c: mark a variable as uninitialized_var() 2007-10-17 08:43:00 -07:00
ufs ufs: Fix mount check in ufs_fill_super() 2007-10-17 08:42:51 -07:00
vfat
xfs writeback: remove pages_skipped accounting in __block_write_full_page() 2007-10-17 08:43:02 -07:00
aio.c aio: account I/O wait time properly 2007-10-17 08:42:53 -07:00
anon_inodes.c anon-inodes use open coded atomic_inc for the shared inode 2007-10-17 08:43:00 -07:00
attr.c Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check 2007-07-17 12:00:03 -07:00
bad_inode.c
binfmt_aout.c core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe 2007-10-17 08:42:50 -07:00
binfmt_elf.c Break ELF_PLATFORM and stack pointer randomization dependency 2007-10-17 08:43:01 -07:00
binfmt_elf_fdpic.c core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe 2007-10-17 08:42:50 -07:00
binfmt_em86.c
binfmt_flat.c binfmt_flat: warning fixes 2007-10-17 08:42:54 -07:00
binfmt_misc.c mm: variable length argument support 2007-07-19 10:04:45 -07:00
binfmt_script.c mm: variable length argument support 2007-07-19 10:04:45 -07:00
binfmt_som.c core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe 2007-10-17 08:42:50 -07:00
bio.c bio: make freeing of ->bi_io_vec conditional in bio_free() 2007-10-16 11:03:52 +02:00
block_dev.c Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
buffer.c writeback: remove pages_skipped accounting in __block_write_full_page() 2007-10-17 08:43:02 -07:00
char_dev.c mm: bdi init hooks 2007-10-17 08:42:45 -07:00
compat.c mm: variable length argument support 2007-07-19 10:04:45 -07:00
compat_ioctl.c Clean up duplicate includes in fs/ 2007-10-17 08:42:48 -07:00
dcache.c vfs: use the predefined d_unhashed inline function instead 2007-10-17 08:43:00 -07:00
dcookies.c Remove fs.h from mm.h 2007-07-29 17:09:29 -07:00
direct-io.c remove ZERO_PAGE 2007-10-16 09:42:53 -07:00
dnotify.c mm: Remove slab destructors from kmem_cache_create(). 2007-07-20 10:11:58 +09:00
dquot.c quota: send messages via netlink 2007-10-17 08:42:56 -07:00
drop_caches.c
eventfd.c
eventpoll.c mm: Remove slab destructors from kmem_cache_create(). 2007-07-20 10:11:58 +09:00
exec.c exec: RT sub-thread can livelock and monopolize CPU on exec 2007-10-17 08:42:54 -07:00
fcntl.c F_DUPFD_CLOEXEC implementation 2007-10-17 08:43:01 -07:00
fifo.c
file.c
file_table.c fs: use kmem_cache_zalloc instead 2007-10-17 08:42:48 -07:00
filesystems.c
fs-writeback.c writeback: fix ntfs with sb_has_dirty_inodes() 2007-10-17 08:43:02 -07:00
generic_acl.c Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check 2007-07-17 12:00:03 -07:00
inode.c fs: remove the unused mempages parameter 2007-10-17 08:42:49 -07:00
inotify.c
inotify_user.c change inotifyfs magic as the same magic is used for futexfs 2007-10-17 08:43:00 -07:00
internal.h
ioctl.c
ioprio.c
Kconfig menuconfig: transform Network Filesystems menu 2007-10-17 08:43:00 -07:00
Kconfig.binfmt
libfs.c make fs/libfs.c:simple_commit_write() static 2007-10-17 08:42:53 -07:00
locks.c Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
Makefile Remove valueless definition of hard-selected RAMFS option 2007-10-17 08:42:56 -07:00
mbcache.c mm: Remove slab destructors from kmem_cache_create(). 2007-07-20 10:11:58 +09:00
mpage.c mm: buffered write cleanup 2007-10-16 09:42:54 -07:00
namei.c fix execute checking in permission() 2007-10-17 08:42:52 -07:00
namespace.c fs: remove the unused mempages parameter 2007-10-17 08:42:49 -07:00
nfsctl.c nfsctl: use vfs_path_lookup 2007-07-19 10:04:45 -07:00
no-block.c
open.c fs: correct SuS compliance for open of large file without options 2007-10-17 08:43:01 -07:00
pipe.c sched: affine sync wakeups 2007-10-15 17:00:19 +02:00
pnode.c
pnode.h
posix_acl.c
quota.c [IA64] Fix build failure in fs/quota.c 2007-07-27 15:40:13 -07:00
quota_v1.c
quota_v2.c
read_write.c Cleanup macros for distinguishing mandatory locks 2007-10-09 18:32:46 -04:00
read_write.h
readdir.c
select.c Use ERESTART_RESTARTBLOCK if poll() is interrupted by a signal 2007-10-17 08:42:53 -07:00
seq_file.c [FS] seq_file: Introduce the seq_open_private() 2007-10-10 16:55:33 -07:00
signalfd.c rename signalfd_siginfo fields 2007-10-17 08:43:01 -07:00
splice.c Merge branch 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block 2007-10-16 10:09:16 -07:00
stack.c
stat.c
super.c writeback: fix periodic superblock dirty inode flushing 2007-10-17 08:43:02 -07:00
sync.c
timerfd.c make timerfd return a u64 and fix the __put_user 2007-07-26 11:35:17 -07:00
utimes.c VFS: check nanoseconds in utimensat 2007-10-17 08:42:52 -07:00
xattr.c Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check 2007-07-17 12:00:03 -07:00
xattr_acl.c