aha/fs
Andreas Dilger 717d50e497 Ext4: Uninitialized Block Groups
In pass1 of e2fsck, every inode table in the fileystem is scanned and checked,
regardless of whether it is in use.  This is this the most time consuming part
of the filesystem check.  The unintialized block group feature can greatly
reduce e2fsck time by eliminating checking of uninitialized inodes.

With this feature, there is a a high water mark of used inodes for each block
group.  Block and inode bitmaps can be uninitialized on disk via a flag in the
group descriptor to avoid reading or scanning them at e2fsck time.  A checksum
of each group descriptor is used to ensure that corruption in the group
descriptor's bit flags does not cause incorrect operation.

The feature is enabled through a mkfs option

	mke2fs /dev/ -O uninit_groups

A patch adding support for uninitialized block groups to e2fsprogs tools has
been posted to the linux-ext4 mailing list.

The patches have been stress tested with fsstress and fsx.  In performance
tests testing e2fsck time, we have seen that e2fsck time on ext3 grows
linearly with the total number of inodes in the filesytem.  In ext4 with the
uninitialized block groups feature, the e2fsck time is constant, based
solely on the number of used inodes rather than the total inode count.
Since typical ext4 filesystems only use 1-10% of their inodes, this feature can
greatly reduce e2fsck time for users.  With performance improvement of 2-20
times, depending on how full the filesystem is.

The attached graph shows the major improvements in e2fsck times in filesystems
with a large total inode count, but few inodes in use.

In each group descriptor if we have

EXT4_BG_INODE_UNINIT set in bg_flags:
        Inode table is not initialized/used in this group. So we can skip
        the consistency check during fsck.
EXT4_BG_BLOCK_UNINIT set in bg_flags:
        No block in the group is used. So we can skip the block bitmap
        verification for this group.

We also add two new fields to group descriptor as a part of
uninitialized group patch.

        __le16  bg_itable_unused;       /* Unused inodes count */
        __le16  bg_checksum;            /* crc16(sb_uuid+group+desc) */

bg_itable_unused:

If we have EXT4_BG_INODE_UNINIT not set in bg_flags
then bg_itable_unused will give the offset within
the inode table till the inodes are used. This can be
used by fsck to skip list of inodes that are marked unused.

bg_checksum:
Now that we depend on bg_flags and bg_itable_unused to determine
the block and inode usage, we need to make sure group descriptor
is not corrupt. We add checksum to group descriptor to
detect corruption. If the descriptor is found to be corrupt, we
mark all the blocks and inodes in the group used.

Signed-off-by: Avantika Mathur <mathur@us.ibm.com>
Signed-off-by: Andreas Dilger <adilger@clusterfs.com>
Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
2007-10-17 18:50:00 -04:00
..
9p 9p: fix bad kconfig cross-dependency 2007-10-17 14:31:07 -05:00
adfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
affs fs: mark nibblemap const 2007-10-17 08:42:47 -07:00
afs KEYS: Make request_key() and co fundamentally asynchronous 2007-10-17 08:42:57 -07:00
autofs
autofs4 fs/autofs4/inode.c: kmalloc + memset conversion to kzalloc 2007-10-17 08:42:50 -07:00
befs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
bfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
cifs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
coda Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
configfs r/o bind mounts: filesystem helpers for custom 'struct file's 2007-10-17 08:43:04 -07:00
cramfs cramfs: error message about endianess 2007-10-17 08:42:53 -07:00
debugfs docbook: fix filesystems content 2007-10-15 17:56:36 -07:00
devpts
dlm menuconfig: transform NLS and DLM menus 2007-10-17 08:43:00 -07:00
ecryptfs Clean up duplicate includes in fs/ecryptfs/ 2007-10-17 08:42:48 -07:00
efs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
exportfs
ext2 ext2 reservations 2007-10-17 08:43:02 -07:00
ext3 ext3: lighten up resize transaction requirements 2007-10-17 08:43:01 -07:00
ext4 Ext4: Uninitialized Block Groups 2007-10-17 18:50:00 -04:00
fat Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
freevxfs
fuse fuse: clean up execute permission checking 2007-10-17 08:43:04 -07:00
gfs2 fs: correct SuS compliance for open of large file without options 2007-10-17 08:43:01 -07:00
hfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
hfsplus Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
hostfs uml: fix hostfs style 2007-10-16 09:43:07 -07:00
hpfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
hppfs
hugetlbfs r/o bind mounts: filesystem helpers for custom 'struct file's 2007-10-17 08:43:04 -07:00
isofs fs/isofs/namei.c: Remove uninitialized local vars warning 2007-10-17 08:42:58 -07:00
jbd JBD: replace jbd_kmalloc with kmalloc directly 2007-10-17 18:49:57 -04:00
jbd2 JBD2: debug code cleanup. 2007-10-17 18:49:59 -04:00
jffs2 Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
jfs introduce I_SYNC 2007-10-17 08:43:02 -07:00
lockd
minix limit minixfs printks on corrupted dir i_size 2007-10-17 08:42:53 -07:00
msdos
ncpfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
nfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
nfs_common
nfsd Implement file posix capabilities 2007-10-17 08:43:07 -07:00
nls menuconfig: transform NLS and DLM menus 2007-10-17 08:43:00 -07:00
ntfs writeback: fix ntfs with sb_has_dirty_inodes() 2007-10-17 08:43:02 -07:00
ocfs2 Fix f_version type: should be u64 instead of unsigned long 2007-10-17 08:42:53 -07:00
openpromfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
partitions fs/partitions/sun.c endianness annotations 2007-10-14 12:41:51 -07:00
proc Don't truncate /proc/PID/environ at 4096 characters 2007-10-17 08:43:00 -07:00
qnx4 Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
ramfs Remove valueless definition of hard-selected RAMFS option 2007-10-17 08:42:56 -07:00
reiserfs reiserfs: do not repair wrong journal params 2007-10-17 08:43:01 -07:00
romfs fs/romfs/inode.c: trivial improvements 2007-10-17 08:42:47 -07:00
smbfs Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
sysfs spin_lock_unlocked cleanups 2007-10-17 08:43:01 -07:00
sysv Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
udf fs/udf/balloc.c: mark a variable as uninitialized_var() 2007-10-17 08:43:00 -07:00
ufs ufs: Fix mount check in ufs_fill_super() 2007-10-17 08:42:51 -07:00
vfat
xfs Merge branch 'for-linus' of git://oss.sgi.com:8090/xfs/xfs-2.6 2007-10-17 09:04:11 -07:00
aio.c aio: account I/O wait time properly 2007-10-17 08:42:53 -07:00
anon_inodes.c anon-inodes use open coded atomic_inc for the shared inode 2007-10-17 08:43:00 -07:00
attr.c Implement file posix capabilities 2007-10-17 08:43:07 -07:00
bad_inode.c
binfmt_aout.c core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe 2007-10-17 08:42:50 -07:00
binfmt_elf.c Break ELF_PLATFORM and stack pointer randomization dependency 2007-10-17 08:43:01 -07:00
binfmt_elf_fdpic.c core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe 2007-10-17 08:42:50 -07:00
binfmt_em86.c
binfmt_flat.c binfmt_flat: warning fixes 2007-10-17 08:42:54 -07:00
binfmt_misc.c
binfmt_script.c
binfmt_som.c core_pattern: ignore RLIMIT_CORE if core_pattern is a pipe 2007-10-17 08:42:50 -07:00
bio.c bio: make freeing of ->bi_io_vec conditional in bio_free() 2007-10-16 11:03:52 +02:00
block_dev.c Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
buffer.c writeback: remove pages_skipped accounting in __block_write_full_page() 2007-10-17 08:43:02 -07:00
char_dev.c mm: bdi init hooks 2007-10-17 08:42:45 -07:00
compat.c
compat_ioctl.c Clean up duplicate includes in fs/ 2007-10-17 08:42:48 -07:00
dcache.c vfs: use the predefined d_unhashed inline function instead 2007-10-17 08:43:00 -07:00
dcookies.c
direct-io.c remove ZERO_PAGE 2007-10-16 09:42:53 -07:00
dnotify.c
dquot.c quota: send messages via netlink 2007-10-17 08:42:56 -07:00
drop_caches.c
eventfd.c
eventpoll.c
exec.c security/ cleanups 2007-10-17 08:43:07 -07:00
fcntl.c F_DUPFD_CLOEXEC implementation 2007-10-17 08:43:01 -07:00
fifo.c
file.c
file_table.c r/o bind mounts: filesystem helpers for custom 'struct file's 2007-10-17 08:43:04 -07:00
filesystems.c
fs-writeback.c introduce I_SYNC 2007-10-17 08:43:02 -07:00
generic_acl.c
inode.c introduce I_SYNC 2007-10-17 08:43:02 -07:00
inotify.c
inotify_user.c change inotifyfs magic as the same magic is used for futexfs 2007-10-17 08:43:00 -07:00
internal.h
ioctl.c
ioprio.c
Kconfig Ext4: Uninitialized Block Groups 2007-10-17 18:50:00 -04:00
Kconfig.binfmt
libfs.c make fs/libfs.c:simple_commit_write() static 2007-10-17 08:42:53 -07:00
locks.c Slab API: remove useless ctor parameter and reorder parameters 2007-10-17 08:42:45 -07:00
Makefile Remove valueless definition of hard-selected RAMFS option 2007-10-17 08:42:56 -07:00
mbcache.c
mpage.c mm: buffered write cleanup 2007-10-16 09:42:54 -07:00
namei.c r/o bind mounts: give permission() a local 'mnt' variable 2007-10-17 08:43:05 -07:00
namespace.c fs: remove the unused mempages parameter 2007-10-17 08:42:49 -07:00
nfsctl.c
no-block.c
open.c Implement file posix capabilities 2007-10-17 08:43:07 -07:00
pipe.c sched: affine sync wakeups 2007-10-15 17:00:19 +02:00
pnode.c
pnode.h
posix_acl.c
quota.c
quota_v1.c
quota_v2.c
read_write.c Cleanup macros for distinguishing mandatory locks 2007-10-09 18:32:46 -04:00
read_write.h
readdir.c
select.c Use ERESTART_RESTARTBLOCK if poll() is interrupted by a signal 2007-10-17 08:42:53 -07:00
seq_file.c [FS] seq_file: Introduce the seq_open_private() 2007-10-10 16:55:33 -07:00
signalfd.c rename signalfd_siginfo fields 2007-10-17 08:43:01 -07:00
splice.c Implement file posix capabilities 2007-10-17 08:43:07 -07:00
stack.c
stat.c
super.c writeback: fix periodic superblock dirty inode flushing 2007-10-17 08:43:02 -07:00
sync.c
timerfd.c
utimes.c VFS: check nanoseconds in utimensat 2007-10-17 08:42:52 -07:00
xattr.c
xattr_acl.c