Merge branch 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6

* 'for_linus' of git://git.infradead.org/~dedekind/ubifs-2.6:
  UBIFS: include to compilation
  UBIFS: add new flash file system
  UBIFS: add brief documentation
  MAINTAINERS: add UBIFS section
  do_mounts: allow UBI root device name
  VFS: export sync_sb_inodes
  VFS: move inode_lock into sync_sb_inodes
Linus Torvalds 2008-07-16 15:02:57 -07:00
commit 9c1be0c471
41 changed files with 33055 additions and 11 deletions

@@ -0,0 +1,164 @@
Introduction
=============

UBIFS file-system stands for UBI File System. UBI stands for "Unsorted
Block Images". UBIFS is a flash file system, which means it is designed
to work with flash devices. It is important to understand that UBIFS
is completely different from any traditional file-system in Linux, like
Ext2, XFS, JFS, etc. UBIFS represents a separate class of file-systems
which work with MTD devices, not block devices. The other Linux
file-system of this class is JFFS2.

To make this clearer, here is a small comparison of MTD devices and
block devices.

1. MTD devices represent flash devices and they consist of eraseblocks of
   rather large size, typically about 128KiB. Block devices consist of
   small blocks, typically 512 bytes.
2. MTD devices support 3 main operations - read from some offset within an
   eraseblock, write to some offset within an eraseblock, and erase a whole
   eraseblock. Block devices support 2 main operations - read a whole
   block and write a whole block.
3. The whole eraseblock has to be erased before it becomes possible to
   re-write its contents. Blocks may simply be re-written.
4. Eraseblocks become worn out after some number of erase cycles -
   typically 100K-1G for SLC NAND and NOR flashes, and 1K-10K for MLC
   NAND flashes. Blocks do not have the wear-out property.
5. Eraseblocks may become bad (only on NAND flashes) and software should
   deal with this. Blocks on hard drives typically do not become bad,
   because hardware has mechanisms to substitute bad blocks, at least in
   modern LBA disks.
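A toy in-memory eraseblock can illustrate the erase-before-write rule in items 2 and 3. This is a minimal sketch, not MTD code. It assumes the common NAND/NOR behaviour that an erase sets every byte to 0xFF and that programming can only clear bits (1 to 0). The names 'toy_eb', 'toy_erase()' and 'toy_write()' are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_EB_SIZE 64 /* hypothetical; real eraseblocks are ~128KiB */

struct toy_eb {
	uint8_t data[TOY_EB_SIZE];
	unsigned int erase_cnt; /* grows with every erase (item 4: wear) */
};

/* Erase the whole eraseblock: every bit becomes 1 (0xFF bytes). */
static void toy_erase(struct toy_eb *eb)
{
	memset(eb->data, 0xFF, sizeof(eb->data));
	eb->erase_cnt++;
}

/*
 * Program @len bytes at @offs. Programming can only clear bits, so a write
 * that needs any 0 -> 1 transition is refused and the caller has to erase
 * the whole eraseblock first. Returns 0 on success and -1 on failure.
 */
static int toy_write(struct toy_eb *eb, size_t offs, const uint8_t *buf,
		     size_t len)
{
	size_t i;

	if (offs > TOY_EB_SIZE || len > TOY_EB_SIZE - offs)
		return -1;
	for (i = 0; i < len; i++)
		if ((eb->data[offs + i] & buf[i]) != buf[i])
			return -1; /* would need a 0 -> 1 transition */
	for (i = 0; i < len; i++)
		eb->data[offs + i] &= buf[i];
	return 0;
}
```

Writing new data over already-programmed bytes fails until 'toy_erase()' is called. A block device, by contrast, would simply overwrite the block.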

It should be quite obvious why UBIFS is very different from traditional
file-systems.

UBIFS works on top of UBI. UBI is a separate software layer which may be
found in drivers/mtd/ubi. UBI is basically a volume management and
wear-leveling layer. It provides so-called UBI volumes, which are a
higher-level abstraction than an MTD device. The programming model of UBI
devices is very similar to that of MTD devices - they still consist of
large eraseblocks and have read/write/erase operations, but UBI devices
are free of limitations like wear and bad blocks (items 4 and 5 in the
above list).

In a sense, UBIFS is the next generation of the JFFS2 file-system, but it
is very different from and incompatible with JFFS2. The following are the
main differences.

* JFFS2 works on top of MTD devices, UBIFS depends on UBI and works on
  top of UBI volumes.
* JFFS2 does not have an on-media index and has to build it while
  mounting, which requires a full media scan. UBIFS maintains the FS
  indexing information on the flash media and does not require a full
  media scan, so it mounts many times faster than JFFS2.
* JFFS2 is a write-through file-system, while UBIFS supports write-back,
  which makes UBIFS much faster on writes.

Similarly to JFFS2, UBIFS supports on-the-fly compression, which makes
it possible to fit quite a lot of data onto the flash.

Similarly to JFFS2, UBIFS is tolerant of unclean reboots and power-cuts.
It does not need tools like fsck.ext2. UBIFS automatically replays its
journal and recovers from crashes, ensuring that the on-flash data
structures are consistent.

UBIFS scales logarithmically (most of the data structures it uses are
trees), so the mount time and memory consumption do not depend linearly
on the flash size, as they do in the case of JFFS2. This is because UBIFS
maintains the FS index on the flash media. However, UBIFS depends on
UBI, which scales linearly, so overall the UBI/UBIFS stack scales
linearly. Nevertheless, UBI/UBIFS scales considerably better than JFFS2.

The authors of UBIFS believe that it is possible to develop UBI2, which
would scale logarithmically as well. UBI2 would support the same API as
UBI, but it would be binary incompatible with UBI, so UBIFS would not
need to be changed to use UBI2.

Mount options
=============

(*) == default.

norm_unmount (*)    commit on unmount; the journal is committed
                    when the file-system is unmounted so that the
                    next mount does not have to replay the journal
                    and it becomes very fast;
fast_unmount        do not commit on unmount; this option makes
                    unmount faster, but the next mount slower
                    because of the need to replay the journal.

Quick usage instructions
========================

The UBI volume to mount is specified using "ubiX_Y" or "ubiX:NAME" syntax,
where "X" is the UBI device number, "Y" is the UBI volume number, and
"NAME" is the UBI volume name.

Mount volume 0 on UBI device 0 to /mnt/ubifs:

    $ mount -t ubifs ubi0_0 /mnt/ubifs

Mount "rootfs" volume of UBI device 0 to /mnt/ubifs ("rootfs" is the
volume name):

    $ mount -t ubifs ubi0:rootfs /mnt/ubifs

The following is an example of the kernel boot arguments to attach mtd0
to UBI and mount volume "rootfs":

    ubi.mtd=0 root=ubi0:rootfs rootfstype=ubifs
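A small parser can illustrate how the two volume name forms differ: a '_' after the device number introduces a volume number, and a ':' introduces a volume name. This is a hypothetical userspace sketch. 'struct ubi_vol_spec' and 'parse_ubi_spec()' are not names or code from the kernel.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing a volume specification. */
struct ubi_vol_spec {
	int ubi_num;    /* X: UBI device number */
	int vol_id;     /* Y: volume number, or -1 if a name was given */
	char name[128]; /* NAME, or "" if a number was given */
};

/* Parse a decimal number at *p, advance *p past it, return -1 on error. */
static int parse_num(const char **p)
{
	char *end;
	long val;

	if (!isdigit((unsigned char)**p))
		return -1;
	val = strtol(*p, &end, 10);
	if (val > INT_MAX)
		return -1;
	*p = end;
	return (int)val;
}

/* Returns 0 if @str is "ubiX_Y" or "ubiX:NAME", -1 otherwise. */
static int parse_ubi_spec(const char *str, struct ubi_vol_spec *spec)
{
	const char *p = str;

	if (strncmp(p, "ubi", 3) != 0)
		return -1;
	p += 3;
	spec->ubi_num = parse_num(&p);
	if (spec->ubi_num < 0)
		return -1;

	if (*p == '_') {                /* "ubiX_Y": volume by number */
		p++;
		spec->vol_id = parse_num(&p);
		spec->name[0] = '\0';
		return (spec->vol_id >= 0 && *p == '\0') ? 0 : -1;
	}
	if (*p == ':' && p[1] != '\0' && strlen(p + 1) < sizeof(spec->name)) {
		spec->vol_id = -1;      /* "ubiX:NAME": volume by name */
		strcpy(spec->name, p + 1);
		return 0;
	}
	return -1;
}
```

With this sketch, "ubi0_0" yields device 0 and volume 0, and "ubi0:rootfs" yields device 0 and the name "rootfs".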

Module Parameters for Debugging
===============================

When UBIFS has been compiled with debugging enabled, there are 3 module
parameters that are available to control aspects of testing and debugging.
The parameters are unsigned integers where each bit controls an option.
The parameters are:

debug_msgs    Selects which debug messages to display, as follows:

        Message Type                        Flag value

        General messages                    1
        Journal messages                    2
        Mount messages                      4
        Commit messages                     8
        LEB search messages                 16
        Budgeting messages                  32
        Garbage collection messages         64
        Tree Node Cache (TNC) messages      128
        LEB properties (lprops) messages    256
        Input/output messages               512
        Log messages                        1024
        Scan messages                       2048
        Recovery messages                   4096

debug_chks    Selects extra checks that UBIFS can do while running:

        Check                               Flag value

        General checks                      1
        Check Tree Node Cache (TNC)         2
        Check indexing tree size            4
        Check orphan area                   8
        Check old indexing tree             16
        Check LEB properties (lprops)       32
        Check leaf nodes and inodes         64

debug_tsts    Selects a mode of testing, as follows:

        Test mode                           Flag value

        Force in-the-gaps method            2
        Failure mode for recovery testing   4

For example, set debug_msgs to 5 to display General messages and Mount
messages.
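Each bit of debug_msgs selects one message type, so values are combined with bitwise OR, and a type is enabled when its bit is set. The enum names below are hypothetical labels for the flag values in the table, not identifiers taken from UBIFS.

```c
/* Hypothetical names for the debug_msgs flag values listed above. */
enum dbg_msg_flags {
	DBG_MSG_GEN   = 1,    /* General messages */
	DBG_MSG_JNL   = 2,    /* Journal messages */
	DBG_MSG_MNT   = 4,    /* Mount messages */
	DBG_MSG_CMT   = 8,    /* Commit messages */
	DBG_MSG_FIND  = 16,   /* LEB search messages */
	DBG_MSG_BUDG  = 32,   /* Budgeting messages */
	DBG_MSG_GC    = 64,   /* Garbage collection messages */
	DBG_MSG_TNC   = 128,  /* Tree Node Cache (TNC) messages */
	DBG_MSG_LP    = 256,  /* LEB properties (lprops) messages */
	DBG_MSG_IO    = 512,  /* Input/output messages */
	DBG_MSG_LOG   = 1024, /* Log messages */
	DBG_MSG_SCAN  = 2048, /* Scan messages */
	DBG_MSG_RCVRY = 4096, /* Recovery messages */
};

/* Non-zero if messages of type @flag are enabled by the @debug_msgs mask. */
static int dbg_msg_enabled(unsigned int debug_msgs, enum dbg_msg_flags flag)
{
	return (debug_msgs & flag) != 0;
}
```

General (1) OR Mount (4) gives 5, which matches the example value above.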
References
==========
UBIFS documentation and FAQ/HOWTO at the MTD web site:
http://www.linux-mtd.infradead.org/doc/ubifs.html
http://www.linux-mtd.infradead.org/faq/ubifs.html

@@ -2336,6 +2336,16 @@ L: linux-mtd@lists.infradead.org
 W: http://www.linux-mtd.infradead.org/doc/jffs2.html
 S: Maintained

+UBI FILE SYSTEM (UBIFS)
+P: Artem Bityutskiy
+M: dedekind@infradead.org
+P: Adrian Hunter
+M: ext-adrian.hunter@nokia.com
+L: linux-mtd@lists.infradead.org
+T: git git://git.infradead.org/~dedekind/ubifs-2.6.git
+W: http://www.linux-mtd.infradead.org/doc/ubifs.html
+S: Maintained
+
 JFS FILESYSTEM
 P: Dave Kleikamp
 M: shaggy@austin.ibm.com

@@ -1375,6 +1375,9 @@ config JFFS2_CMODE_FAVOURLZO

 endchoice

+# UBIFS File system configuration
+source "fs/ubifs/Kconfig"
+
 config CRAMFS
 	tristate "Compressed ROM file system support (cramfs)"
 	depends on BLOCK

@@ -101,6 +101,7 @@ obj-$(CONFIG_NTFS_FS) += ntfs/
 obj-$(CONFIG_UFS_FS) += ufs/
 obj-$(CONFIG_EFS_FS) += efs/
 obj-$(CONFIG_JFFS2_FS) += jffs2/
+obj-$(CONFIG_UBIFS_FS) += ubifs/
 obj-$(CONFIG_AFFS_FS) += affs/
 obj-$(CONFIG_ROMFS_FS) += romfs/
 obj-$(CONFIG_QNX4FS_FS) += qnx4/

@@ -424,8 +424,6 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
  * WB_SYNC_HOLD is a hack for sys_sync(): reattach the inode to sb->s_dirty so
  * that it can be located for waiting on in __writeback_single_inode().
  *
- * Called under inode_lock.
- *
  * If `bdi' is non-zero then we're being asked to writeback a specific queue.
  * This function assumes that the blockdev superblock's inodes are backed by
  * a variety of queues, so all inodes are searched. For other superblocks,
@@ -441,11 +439,12 @@ __writeback_single_inode(struct inode *inode, struct writeback_control *wbc)
  * on the writer throttling path, and we get decent balancing between many
  * throttled threads: we don't want them all piling up on inode_sync_wait.
  */
-static void
-sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
+void generic_sync_sb_inodes(struct super_block *sb,
+				struct writeback_control *wbc)
 {
 	const unsigned long start = jiffies;	/* livelock avoidance */

+	spin_lock(&inode_lock);
 	if (!wbc->for_kupdate || list_empty(&sb->s_io))
 		queue_io(sb, wbc->older_than_this);

@@ -524,8 +523,16 @@ sync_sb_inodes(struct super_block *sb, struct writeback_control *wbc)
 		if (!list_empty(&sb->s_more_io))
 			wbc->more_io = 1;
 	}
+	spin_unlock(&inode_lock);
 	return;		/* Leave any unwritten inodes on s_io */
 }
+EXPORT_SYMBOL_GPL(generic_sync_sb_inodes);
+
+static void sync_sb_inodes(struct super_block *sb,
+				struct writeback_control *wbc)
+{
+	generic_sync_sb_inodes(sb, wbc);
+}

 /*
  * Start writeback of dirty pagecache data against all unlocked inodes.
@@ -565,11 +572,8 @@ restart:
 			 * be unmounted by the time it is released.
 			 */
 			if (down_read_trylock(&sb->s_umount)) {
-				if (sb->s_root) {
-					spin_lock(&inode_lock);
+				if (sb->s_root)
 					sync_sb_inodes(sb, wbc);
-					spin_unlock(&inode_lock);
-				}
 				up_read(&sb->s_umount);
 			}
 			spin_lock(&sb_lock);
@@ -607,9 +611,7 @@ void sync_inodes_sb(struct super_block *sb, int wait)
 			(inodes_stat.nr_inodes - inodes_stat.nr_unused) +
 			nr_dirty + nr_unstable;
 	wbc.nr_to_write += wbc.nr_to_write / 2;		/* Bit more for luck */
-	spin_lock(&inode_lock);
 	sync_sb_inodes(sb, &wbc);
-	spin_unlock(&inode_lock);
 }

/*
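A single-threaded toy can show the effect of this fs-writeback.c change. The function now takes inode_lock itself. That lets external callers such as UBIFS use the exported 'generic_sync_sb_inodes()' without touching the lock. It also means the old call sites must stop taking the lock, or a non-recursive spinlock would deadlock. In this sketch 'toy_lock()' and the other 'toy_*' names are hypothetical stand-ins; the toy lock asserts instead of deadlocking.

```c
#include <assert.h>

/* Toy non-recursive lock standing in for inode_lock (single-threaded). */
static int toy_inode_lock_held;

static void toy_lock(void)
{
	/* A real spinlock taken twice would deadlock; the toy asserts. */
	assert(!toy_inode_lock_held);
	toy_inode_lock_held = 1;
}

static void toy_unlock(void)
{
	assert(toy_inode_lock_held);
	toy_inode_lock_held = 0;
}

static int toy_dirty_inodes = 3;

/* "Exported" variant: takes the lock itself, so callers must not hold it. */
void toy_generic_sync_sb_inodes(void)
{
	toy_lock();
	toy_dirty_inodes = 0; /* pretend every dirty inode was written back */
	toy_unlock();
}

/* Static wrapper kept for the existing internal callers. */
static void toy_sync_sb_inodes(void)
{
	toy_generic_sync_sb_inodes();
}
```

Keeping a static 'sync_sb_inodes()' wrapper means the internal call sites only lose their lock/unlock pairs. They do not need renaming.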

fs/ubifs/Kconfig

@@ -0,0 +1,72 @@
config UBIFS_FS
tristate "UBIFS file system support"
select CRC16
select CRC32
select CRYPTO if UBIFS_FS_ADVANCED_COMPR
select CRYPTO if UBIFS_FS_LZO
select CRYPTO if UBIFS_FS_ZLIB
select CRYPTO_LZO if UBIFS_FS_LZO
select CRYPTO_DEFLATE if UBIFS_FS_ZLIB
depends on MTD_UBI
help
UBIFS is a file system for flash devices which works on top of UBI.
config UBIFS_FS_XATTR
bool "Extended attributes support"
depends on UBIFS_FS
help
This option enables support for extended attributes.
config UBIFS_FS_ADVANCED_COMPR
bool "Advanced compression options"
depends on UBIFS_FS
help
This option allows you to explicitly choose which compressors, if any,
are enabled in UBIFS. Removing compressors means inability to read
existing file systems.
If unsure, say 'N'.
config UBIFS_FS_LZO
bool "LZO compression support" if UBIFS_FS_ADVANCED_COMPR
depends on UBIFS_FS
default y
help
LZO compressor is generally faster than zlib but compresses worse.
Say 'Y' if unsure.
config UBIFS_FS_ZLIB
bool "ZLIB compression support" if UBIFS_FS_ADVANCED_COMPR
depends on UBIFS_FS
default y
help
Zlib compresses better than LZO but it is slower. Say 'Y' if unsure.
# Debugging-related stuff
config UBIFS_FS_DEBUG
bool "Enable debugging"
depends on UBIFS_FS
select DEBUG_FS
select KALLSYMS_ALL
help
This option enables UBIFS debugging.
config UBIFS_FS_DEBUG_MSG_LVL
int "Default message level (0 = no extra messages, 3 = lots)"
depends on UBIFS_FS_DEBUG
default "0"
help
This controls the amount of debugging messages produced by UBIFS.
If reporting bugs, please try to have available a full dump of the
messages at level 1 while the misbehaviour was occurring. Level 2
may become necessary if level 1 messages were not enough to find the
bug. Generally Level 3 should be avoided.
config UBIFS_FS_DEBUG_CHKS
bool "Enable extra checks"
depends on UBIFS_FS_DEBUG
help
If extra checks are enabled UBIFS will check the consistency of its
internal data structures during operation. However, UBIFS performance
is dramatically slower when this option is selected, especially if the
file system is large.

fs/ubifs/Makefile

@@ -0,0 +1,9 @@
obj-$(CONFIG_UBIFS_FS) += ubifs.o
ubifs-y += shrinker.o journal.o file.o dir.o super.o sb.o io.o
ubifs-y += tnc.o master.o scan.o replay.o log.o commit.o gc.o orphan.o
ubifs-y += budget.o find.o tnc_commit.o compress.o lpt.o lprops.o
ubifs-y += recovery.o ioctl.o lpt_commit.o tnc_misc.o
ubifs-$(CONFIG_UBIFS_FS_DEBUG) += debug.o
ubifs-$(CONFIG_UBIFS_FS_XATTR) += xattr.o

fs/ubifs/budget.c

@@ -0,0 +1,731 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Adrian Hunter
* Artem Bityutskiy (Битюцкий Артём)
*/
/*
* This file implements the budgeting sub-system which is responsible for UBIFS
* space management.
*
* Factors such as compression, wasted space at the ends of LEBs, space in other
* journal heads, the effect of updates on the index, and so on, make it
* impossible to accurately predict the amount of space needed. Consequently
* approximations are used.
*/
#include "ubifs.h"
#include <linux/writeback.h>
#include <asm/div64.h>
/*
* When pessimistic budget calculations say that there is not enough space,
* UBIFS starts writing back dirty inodes and pages, doing garbage collection,
* or committing. The constants below define the maximum number of times
* UBIFS repeats the operations.
*/
#define MAX_SHRINK_RETRIES 8
#define MAX_GC_RETRIES 4
#define MAX_CMT_RETRIES 2
#define MAX_NOSPC_RETRIES 1
/*
* The constant below defines the number of dirty pages which should be
* written back when trying to shrink the liability.
*/
#define NR_TO_WRITE 16
/**
* struct retries_info - information about re-tries while making free space.
* @prev_liability: previous liability
* @shrink_cnt: how many times the liability was shrunk
* @shrink_retries: count of liability shrink re-tries (increased when
* liability does not shrink)
* @try_gc: GC should be tried first
* @gc_retries: how many times GC was run
* @cmt_retries: how many times commit has been done
* @nospc_retries: how many times GC returned %-ENOSPC
*
* Since we consider budgeting to be the fast-path, and this structure has to
* be allocated on stack and zeroed out, we make it smaller using bit-fields.
*/
struct retries_info {
long long prev_liability;
unsigned int shrink_cnt;
unsigned int shrink_retries:5;
unsigned int try_gc:1;
unsigned int gc_retries:4;
unsigned int cmt_retries:3;
unsigned int nospc_retries:1;
};
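The bit-field widths in 'struct retries_info' only need to hold the MAX_*_RETRIES limits defined above. Each counter is only incremented while it is below its limit, so it never exceeds that limit. The sketch below checks the widths at compile time and compares the size against a layout that uses a plain 'unsigned int' per counter. 'struct retries_plain' and 'BITFIELD_MAX()' are hypothetical, and exact sizes depend on the ABI.

```c
#define MAX_SHRINK_RETRIES 8
#define MAX_GC_RETRIES 4
#define MAX_CMT_RETRIES 2
#define MAX_NOSPC_RETRIES 1

/* Largest value an unsigned bit-field of @bits bits can hold. */
#define BITFIELD_MAX(bits) ((1u << (bits)) - 1)

/* Same layout as struct retries_info above. */
struct retries_info {
	long long prev_liability;
	unsigned int shrink_cnt;
	unsigned int shrink_retries:5;
	unsigned int try_gc:1;
	unsigned int gc_retries:4;
	unsigned int cmt_retries:3;
	unsigned int nospc_retries:1;
};

/* Hypothetical alternative with a full unsigned int per counter. */
struct retries_plain {
	long long prev_liability;
	unsigned int shrink_cnt;
	unsigned int shrink_retries, try_gc, gc_retries, cmt_retries,
		     nospc_retries;
};

/* Each width must be able to represent its retry limit. */
_Static_assert(MAX_SHRINK_RETRIES <= BITFIELD_MAX(5), "shrink_retries too narrow");
_Static_assert(MAX_GC_RETRIES <= BITFIELD_MAX(4), "gc_retries too narrow");
_Static_assert(MAX_CMT_RETRIES <= BITFIELD_MAX(3), "cmt_retries too narrow");
_Static_assert(MAX_NOSPC_RETRIES <= BITFIELD_MAX(1), "nospc_retries too narrow");
```

On common ABIs the 14 bits of counters share one 'unsigned int', so the bit-field version is half the size of the plain one. That also halves the memset() in 'ubifs_budget_space()'.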
/**
* shrink_liability - write-back some dirty pages/inodes.
* @c: UBIFS file-system description object
* @nr_to_write: how many dirty pages to write-back
*
* This function shrinks UBIFS liability by means of writing back some amount
* of dirty inodes and their pages. Returns the number of pages which were
* written back. The returned value does not include dirty inodes which were
* synchronized.
*
* Note, this function synchronizes even VFS inodes which are locked
* (@i_mutex) by the caller of the budgeting function, because write-back does
* not touch @i_mutex.
*/
static int shrink_liability(struct ubifs_info *c, int nr_to_write)
{
int nr_written;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_NONE,
.range_end = LLONG_MAX,
.nr_to_write = nr_to_write,
};
generic_sync_sb_inodes(c->vfs_sb, &wbc);
nr_written = nr_to_write - wbc.nr_to_write;
if (!nr_written) {
/*
* Re-try again but wait on pages/inodes which are being
* written-back concurrently (e.g., by pdflush).
*/
memset(&wbc, 0, sizeof(struct writeback_control));
wbc.sync_mode = WB_SYNC_ALL;
wbc.range_end = LLONG_MAX;
wbc.nr_to_write = nr_to_write;
generic_sync_sb_inodes(c->vfs_sb, &wbc);
nr_written = nr_to_write - wbc.nr_to_write;
}
dbg_budg("%d pages were written back", nr_written);
return nr_written;
}
/**
* run_gc - run garbage collector.
* @c: UBIFS file-system description object
*
* This function runs garbage collector to make some more free space. Returns
* zero if a free LEB has been produced, %-EAGAIN if commit is required, and a
* negative error code in case of failure.
*/
static int run_gc(struct ubifs_info *c)
{
int err, lnum;
/* Make some free space by garbage-collecting dirty space */
down_read(&c->commit_sem);
lnum = ubifs_garbage_collect(c, 1);
up_read(&c->commit_sem);
if (lnum < 0)
return lnum;
/* GC freed one LEB, return it to lprops */
dbg_budg("GC freed LEB %d", lnum);
err = ubifs_return_leb(c, lnum);
if (err)
return err;
return 0;
}
/**
* make_free_space - make more free space on the file-system.
* @c: UBIFS file-system description object
* @ri: information about previous invocations of this function
*
* This function is called when an operation cannot be budgeted because there
* is supposedly no free space. But in most cases there is some free space:
* o budgeting is pessimistic, so it always budgets more than is actually
* needed, so shrinking the liability is one way to make free space - the
* cached data will take less space than it was budgeted for;
* o GC may turn some dark space into free space (budgeting treats dark space
* as not available);
* o commit may free some LEB, i.e., turn freeable LEBs into free LEBs.
*
* So this function tries to do the above. Returns %-EAGAIN if some free space
* was presumably made and the caller has to re-try budgeting the operation.
* Returns %-ENOSPC if it could not make more free space, and other negative
* error codes on failures.
*/
static int make_free_space(struct ubifs_info *c, struct retries_info *ri)
{
int err;
/*
* If we have some dirty pages and inodes (liability), try to write
* them back unless this was tried too many times without effect
* already.
*/
if (ri->shrink_retries < MAX_SHRINK_RETRIES && !ri->try_gc) {
long long liability;
spin_lock(&c->space_lock);
liability = c->budg_idx_growth + c->budg_data_growth +
c->budg_dd_growth;
spin_unlock(&c->space_lock);
if (ri->prev_liability >= liability) {
/* Liability does not shrink, next time try GC then */
ri->shrink_retries += 1;
if (ri->gc_retries < MAX_GC_RETRIES)
ri->try_gc = 1;
dbg_budg("liability did not shrink: retries %d of %d",
ri->shrink_retries, MAX_SHRINK_RETRIES);
}
dbg_budg("force write-back (count %d)", ri->shrink_cnt);
shrink_liability(c, NR_TO_WRITE + ri->shrink_cnt);
ri->prev_liability = liability;
ri->shrink_cnt += 1;
return -EAGAIN;
}
/*
* Try to run garbage collector unless it was already tried too many
* times.
*/
if (ri->gc_retries < MAX_GC_RETRIES) {
ri->gc_retries += 1;
dbg_budg("run GC, retries %d of %d",
ri->gc_retries, MAX_GC_RETRIES);
ri->try_gc = 0;
err = run_gc(c);
if (!err)
return -EAGAIN;
if (err == -EAGAIN) {
dbg_budg("GC asked to commit");
err = ubifs_run_commit(c);
if (err)
return err;
return -EAGAIN;
}
if (err != -ENOSPC)
return err;
/*
* GC could not make any progress. If this is the first time,
* then it makes sense to try to commit, because it might make
* some dirty space.
*/
dbg_budg("GC returned -ENOSPC, retries %d",
ri->nospc_retries);
if (ri->nospc_retries >= MAX_NOSPC_RETRIES)
return err;
ri->nospc_retries += 1;
}
/* Neither GC nor write-back helped, try to commit */
if (ri->cmt_retries < MAX_CMT_RETRIES) {
ri->cmt_retries += 1;
dbg_budg("run commit, retries %d of %d",
ri->cmt_retries, MAX_CMT_RETRIES);
err = ubifs_run_commit(c);
if (err)
return err;
return -EAGAIN;
}
return -ENOSPC;
}
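The escalation order in 'make_free_space()' (write-back first, then GC, then commit) can be exercised in userspace by replacing the real operations with stubs. In this hypothetical model, write-back never reduces the liability. GC returns only 0, -EAGAIN or -ENOSPC (the real code also propagates other errors). The counters record how often each step ran. 'struct toy_fs', 'struct toy_retries' and 'toy_make_free_space()' are illustrative names.

```c
#include <errno.h>

#define MAX_SHRINK_RETRIES 8
#define MAX_GC_RETRIES 4
#define MAX_CMT_RETRIES 2
#define MAX_NOSPC_RETRIES 1

/* Hypothetical stand-in for the parts of struct ubifs_info used here. */
struct toy_fs {
	long long liability;       /* budgeted but not written-back data */
	int gc_result;             /* stub GC result: 0, -EAGAIN or -ENOSPC */
	int shrinks, gcs, commits; /* how often each step ran */
};

/* Same counters as struct retries_info, without bit-fields. */
struct toy_retries {
	long long prev_liability;
	unsigned int shrink_cnt, shrink_retries, try_gc;
	unsigned int gc_retries, cmt_retries, nospc_retries;
};

/* Mirrors make_free_space(): -EAGAIN means "retry budgeting". */
static int toy_make_free_space(struct toy_fs *fs, struct toy_retries *ri)
{
	/* Step 1: write-back, until it stops shrinking the liability */
	if (ri->shrink_retries < MAX_SHRINK_RETRIES && !ri->try_gc) {
		long long liability = fs->liability;

		if (ri->prev_liability >= liability) {
			ri->shrink_retries += 1;
			if (ri->gc_retries < MAX_GC_RETRIES)
				ri->try_gc = 1;
		}
		fs->shrinks += 1;   /* stub: the liability is not reduced */
		ri->prev_liability = liability;
		ri->shrink_cnt += 1;
		return -EAGAIN;
	}

	/* Step 2: garbage collection */
	if (ri->gc_retries < MAX_GC_RETRIES) {
		ri->gc_retries += 1;
		ri->try_gc = 0;
		fs->gcs += 1;
		if (fs->gc_result == 0)
			return -EAGAIN;
		if (fs->gc_result == -EAGAIN) {
			fs->commits += 1;   /* GC asked for a commit */
			return -EAGAIN;
		}
		if (ri->nospc_retries >= MAX_NOSPC_RETRIES)
			return -ENOSPC;
		ri->nospc_retries += 1;
	}

	/* Step 3: commit */
	if (ri->cmt_retries < MAX_CMT_RETRIES) {
		ri->cmt_retries += 1;
		fs->commits += 1;
		return -EAGAIN;
	}
	return -ENOSPC;
}
```

Calling this in a loop until the result is not -EAGAIN, as 'ubifs_budget_space()' does, gives up with -ENOSPC after five calls in the worst case. The liability never shrinks and GC always returns -ENOSPC, so the run consists of three write-back attempts, two GC runs and one commit.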
/**
* ubifs_calc_min_idx_lebs - calculate amount of eraseblocks for the index.
* @c: UBIFS file-system description object
*
* This function calculates and returns the number of eraseblocks which should
* be kept for index usage.
*/
int ubifs_calc_min_idx_lebs(struct ubifs_info *c)
{
int ret;
uint64_t idx_size;
idx_size = c->old_idx_sz + c->budg_idx_growth + c->budg_uncommitted_idx;
/* And make sure we have twice the index size of space reserved */
idx_size <<= 1;
/*
* We do not maintain 'old_idx_size' as 'old_idx_lebs'/'old_idx_bytes'
* pair, nor similarly the two variables for the new index size, so we
* have to do this costly 64-bit division on fast-path.
*/
if (do_div(idx_size, c->leb_size - c->max_idx_node_sz))
ret = idx_size + 1;
else
ret = idx_size;
/*
* The index head is not available for the in-the-gaps method, so add an
* extra LEB to compensate.
*/
ret += 1;
/*
* At present the index needs at least 2 LEBs: one for the index head
* and one for in-the-gaps method (which currently does not cater for
* the index head and so excludes it from consideration).
*/
if (ret < 2)
ret = 2;
return ret;
}
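As a worked example of 'ubifs_calc_min_idx_lebs()', the sketch below restates it with ordinary 64-bit division in place of do_div(). The inputs are passed explicitly. The LEB size of 126976 bytes and max_idx_node_sz of 256 bytes used below are hypothetical values chosen for the arithmetic.

```c
#include <stdint.h>

/*
 * Userspace restatement of ubifs_calc_min_idx_lebs(): the inputs are passed
 * explicitly and plain 64-bit division replaces the kernel's do_div().
 */
static int toy_calc_min_idx_lebs(uint64_t old_idx_sz, uint64_t idx_growth,
				 uint64_t uncommitted_idx, int leb_size,
				 int max_idx_node_sz)
{
	uint64_t per_leb = (uint64_t)(leb_size - max_idx_node_sz); /* same divisor as the original */
	uint64_t idx_size = old_idx_sz + idx_growth + uncommitted_idx;
	int ret;

	idx_size <<= 1;                                  /* reserve twice the index size */
	ret = (int)((idx_size + per_leb - 1) / per_leb); /* round up to whole LEBs */
	ret += 1;                                        /* index head is not usable */
	return ret < 2 ? 2 : ret;                        /* at least 2 LEBs */
}
```

Take 300000 bytes of committed index and 10000 bytes of budgeted growth. The doubled size is 620000 bytes, and at 126720 bytes per LEB that rounds up to 5 LEBs. Adding 1 for the index head gives 6. An empty index still gets the minimum of 2.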
/**
* ubifs_calc_available - calculate available FS space.
* @c: UBIFS file-system description object
* @min_idx_lebs: minimum number of LEBs reserved for the index
*
* This function calculates and returns amount of FS space available for use.
*/
long long ubifs_calc_available(const struct ubifs_info *c, int min_idx_lebs)
{
int subtract_lebs;
long long available;
/*
* Force the amount available to the total size reported if the used
* space is zero.
*/
if (c->lst.total_used <= UBIFS_INO_NODE_SZ &&
c->budg_data_growth + c->budg_dd_growth == 0) {
/* Do the same calculation as for c->block_cnt */
available = c->main_lebs - 2;
available *= c->leb_size - c->dark_wm;
return available;
}
available = c->main_bytes - c->lst.total_used;
/*
* Now 'available' contains theoretically available flash space
* assuming there is no index, so we have to subtract the space which
* is reserved for the index.
*/
subtract_lebs = min_idx_lebs;
/* Take into account that GC reserves one LEB for its own needs */
subtract_lebs += 1;
/*
* The GC journal head LEB is not really accessible. And since
* different write types go to different heads, we may count only on
* one head's space.
*/
subtract_lebs += c->jhead_cnt - 1;
/* We also reserve one LEB for deletions, which bypass budgeting */
subtract_lebs += 1;
available -= (long long)subtract_lebs * c->leb_size;
/* Subtract the dead space which is not available for use */
available -= c->lst.total_dead;
/*
* Subtract dark space, which might or might not be usable - it depends
* on the data which we have on the media and which will be written. If
* this is a lot of uncompressed or not-compressible data, the dark
* space cannot be used.
*/
available -= c->lst.total_dark;
/*
* However, there is more dark space. The index may be bigger than
* @min_idx_lebs. Those extra LEBs are assumed to be available, but
* their dark space is not included in total_dark, so it is subtracted
* here.
*/
if (c->lst.idx_lebs > min_idx_lebs) {
subtract_lebs = c->lst.idx_lebs - min_idx_lebs;
available -= subtract_lebs * c->dark_wm;
}
/* The calculations are rough and may end up with a negative number */
return available > 0 ? available : 0;
}
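Concrete numbers make the subtractions in 'ubifs_calc_available()' easier to follow. The sketch is a userspace restatement over a hypothetical 'struct toy_space'. TOY_INO_NODE_SZ is a placeholder for UBIFS_INO_NODE_SZ, whose real value is not shown in this file, and all numbers below are invented.

```c
/* Placeholder for UBIFS_INO_NODE_SZ; the value is an assumption. */
#define TOY_INO_NODE_SZ 160

/* Hypothetical subset of the struct ubifs_info fields read by the original. */
struct toy_space {
	long long main_bytes, total_used, total_dead, total_dark;
	long long budg_data_growth, budg_dd_growth;
	int main_lebs, leb_size, dark_wm, jhead_cnt, idx_lebs;
};

static long long toy_calc_available(const struct toy_space *s, int min_idx_lebs)
{
	long long available;
	int subtract_lebs;

	/* Nothing used and nothing budgeted: report the full size */
	if (s->total_used <= TOY_INO_NODE_SZ &&
	    s->budg_data_growth + s->budg_dd_growth == 0)
		return (long long)(s->main_lebs - 2) * (s->leb_size - s->dark_wm);

	available = s->main_bytes - s->total_used;
	subtract_lebs = min_idx_lebs      /* index reservation */
		      + 1                 /* LEB reserved by GC */
		      + s->jhead_cnt - 1  /* only one journal head counts */
		      + 1;                /* LEB for deletions */
	available -= (long long)subtract_lebs * s->leb_size;
	available -= s->total_dead;
	available -= s->total_dark;
	/* Dark space of index LEBs beyond min_idx_lebs */
	if (s->idx_lebs > min_idx_lebs)
		available -= (long long)(s->idx_lebs - min_idx_lebs) * s->dark_wm;
	return available > 0 ? available : 0;
}
```

Take 100 LEBs of 126976 bytes, 1000000 bytes used, min_idx_lebs of 4 and three journal heads. The sketch subtracts 8 LEBs (4 + 1 + 2 + 1), then 70000 bytes of dead and dark space, then 1024 bytes for the one extra index LEB. That leaves 10610768 bytes.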
/**
* can_use_rp - check whether the user is allowed to use reserved pool.
* @c: UBIFS file-system description object
*
* UBIFS has so-called "reserved pool" which is flash space reserved
* for the superuser and for users whose UID/GID is recorded in UBIFS superblock.
* This function checks whether current user is allowed to use reserved pool.
* Returns %1 if the current user is allowed to use reserved pool and %0
* otherwise.
*/
static int can_use_rp(struct ubifs_info *c)
{
if (current->fsuid == c->rp_uid || capable(CAP_SYS_RESOURCE) ||
(c->rp_gid != 0 && in_group_p(c->rp_gid)))
return 1;
return 0;
}
/**
* do_budget_space - reserve flash space for index and data growth.
* @c: UBIFS file-system description object
*
* This function makes sure UBIFS has enough free eraseblocks for index growth
* and data.
*
* When budgeting index space, UBIFS reserves twice as many LEBs as the index
* would take if it was consolidated and written to the flash. This guarantees
* that the "in-the-gaps" commit method always succeeds and UBIFS will always
* be able to commit dirty index. So this function basically adds amount of
* budgeted index space to the size of the current index, multiplies this by 2,
* and makes sure this does not exceed the amount of free eraseblocks.
*
* Notes about @c->min_idx_lebs and @c->lst.idx_lebs variables:
* o @c->lst.idx_lebs is the number of LEBs the index currently uses. It might
* be large, because UBIFS does not do any index consolidation as long as
* there is free space. IOW, the index may take a lot of LEBs, but the LEBs
* will contain a lot of dirt.
* o @c->min_idx_lebs is the number of LEBs the index presumably takes. IOW,
* the index may be consolidated to take up to @c->min_idx_lebs LEBs.
*
* This function returns zero in case of success, and %-ENOSPC in case of
* failure.
*/
static int do_budget_space(struct ubifs_info *c)
{
long long outstanding, available;
int lebs, rsvd_idx_lebs, min_idx_lebs;
/* First budget index space */
min_idx_lebs = ubifs_calc_min_idx_lebs(c);
/* Now 'min_idx_lebs' contains number of LEBs to reserve */
if (min_idx_lebs > c->lst.idx_lebs)
rsvd_idx_lebs = min_idx_lebs - c->lst.idx_lebs;
else
rsvd_idx_lebs = 0;
/*
* The number of LEBs that are available to be used by the index is:
*
* @c->lst.empty_lebs + @c->freeable_cnt + @c->idx_gc_cnt -
* @c->lst.taken_empty_lebs
*
* @empty_lebs are available because they are empty. @freeable_cnt are
* available because they contain only free and dirty space and the
* index allocation always occurs after wbufs are synch'ed.
* @idx_gc_cnt are available because they are index LEBs that have been
* garbage collected (including trivial GC) and are awaiting the commit
* before they can be unmapped - note that the in-the-gaps method will
* grab these if it needs them. @taken_empty_lebs are empty_lebs that
* have already been allocated for some purpose (also includes those
* LEBs on the @idx_gc list).
*
* Note, @taken_empty_lebs may temporarily be higher by one because of
* the way we serialize LEB allocations and budgeting. See a comment in
* 'ubifs_find_free_space()'.
*/
lebs = c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt -
c->lst.taken_empty_lebs;
if (unlikely(rsvd_idx_lebs > lebs)) {
dbg_budg("out of indexing space: min_idx_lebs %d (old %d), "
"rsvd_idx_lebs %d", min_idx_lebs, c->min_idx_lebs,
rsvd_idx_lebs);
return -ENOSPC;
}
available = ubifs_calc_available(c, min_idx_lebs);
outstanding = c->budg_data_growth + c->budg_dd_growth;
if (unlikely(available < outstanding)) {
dbg_budg("out of data space: available %lld, outstanding %lld",
available, outstanding);
return -ENOSPC;
}
if (available - outstanding <= c->rp_size && !can_use_rp(c))
return -ENOSPC;
c->min_idx_lebs = min_idx_lebs;
return 0;
}
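The three ways 'do_budget_space()' can fail are easier to see side by side. The sketch takes precomputed inputs in a hypothetical 'struct toy_budget_state' instead of reading 'struct ubifs_info' under @c->space_lock. Unlike the original, it does not update @c->min_idx_lebs on success.

```c
#include <errno.h>

/* Hypothetical snapshot of the quantities do_budget_space() compares. */
struct toy_budget_state {
	int min_idx_lebs;      /* from ubifs_calc_min_idx_lebs() */
	int idx_lebs;          /* LEBs the index currently occupies */
	int empty_lebs, freeable_cnt, idx_gc_cnt, taken_empty_lebs;
	long long available;   /* from ubifs_calc_available() */
	long long outstanding; /* budg_data_growth + budg_dd_growth */
	long long rp_size;     /* reserved pool size */
	int may_use_rp;        /* result of can_use_rp() */
};

static int toy_do_budget_space(const struct toy_budget_state *s)
{
	int rsvd_idx_lebs = s->min_idx_lebs > s->idx_lebs ?
			    s->min_idx_lebs - s->idx_lebs : 0;
	int lebs = s->empty_lebs + s->freeable_cnt + s->idx_gc_cnt -
		   s->taken_empty_lebs;

	if (rsvd_idx_lebs > lebs)
		return -ENOSPC; /* out of indexing space */
	if (s->available < s->outstanding)
		return -ENOSPC; /* out of data space */
	if (s->available - s->outstanding <= s->rp_size && !s->may_use_rp)
		return -ENOSPC; /* only the reserved pool is left */
	return 0;
}
```

The last check is why an ordinary user can get -ENOSPC while the superuser, or a user whose UID/GID is recorded in the superblock, can still write.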
/**
* calc_idx_growth - calculate approximate index growth from budgeting request.
* @c: UBIFS file-system description object
* @req: budgeting request
*
* For now we assume each new node adds one znode. This is a rather poor
* approximation, though.
*/
static int calc_idx_growth(const struct ubifs_info *c,
const struct ubifs_budget_req *req)
{
int znodes;
znodes = req->new_ino + (req->new_page << UBIFS_BLOCKS_PER_PAGE_SHIFT) +
req->new_dent;
return znodes * c->max_idx_node_sz;
}
/**
* calc_data_growth - calculate approximate amount of new data from budgeting
* request.
* @c: UBIFS file-system description object
* @req: budgeting request
*/
static int calc_data_growth(const struct ubifs_info *c,
const struct ubifs_budget_req *req)
{
int data_growth;
data_growth = req->new_ino ? c->inode_budget : 0;
if (req->new_page)
data_growth += c->page_budget;
if (req->new_dent)
data_growth += c->dent_budget;
data_growth += req->new_ino_d;
return data_growth;
}
/**
* calc_dd_growth - calculate approximate amount of data which makes other data
* dirty from budgeting request.
* @c: UBIFS file-system description object
* @req: budgeting request
*/
static int calc_dd_growth(const struct ubifs_info *c,
const struct ubifs_budget_req *req)
{
int dd_growth;
dd_growth = req->dirtied_page ? c->page_budget : 0;
if (req->dirtied_ino)
dd_growth += c->inode_budget << (req->dirtied_ino - 1);
if (req->mod_dent)
dd_growth += c->dent_budget;
dd_growth += req->dirtied_ino_d;
return dd_growth;
}
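Here is a worked example of the three growth helpers. The request below is roughly what creating a file in a directory might look like: one new inode, one new directory entry and one dirtied parent inode. The per-object budgets are hypothetical round numbers. UBIFS_BLOCKS_PER_PAGE_SHIFT is assumed to be 0 (4KiB pages with 4KiB UBIFS blocks).

```c
/* Hypothetical per-object budgets; UBIFS derives the real ones at mount time. */
struct toy_budgets {
	int inode_budget, page_budget, dent_budget, max_idx_node_sz;
};

/* The struct ubifs_budget_req fields read by the three helpers. */
struct toy_req {
	int new_ino, new_page, new_dent, new_ino_d;
	int dirtied_page, dirtied_ino, mod_dent, dirtied_ino_d;
};

/* Assumption: 4KiB pages and 4KiB UBIFS blocks, so one block per page. */
#define TOY_BLOCKS_PER_PAGE_SHIFT 0

static int toy_idx_growth(const struct toy_budgets *b, const struct toy_req *r)
{
	int znodes = r->new_ino + (r->new_page << TOY_BLOCKS_PER_PAGE_SHIFT) +
		     r->new_dent;

	return znodes * b->max_idx_node_sz; /* one znode per new node */
}

static int toy_data_growth(const struct toy_budgets *b, const struct toy_req *r)
{
	int growth = r->new_ino ? b->inode_budget : 0;

	if (r->new_page)
		growth += b->page_budget;
	if (r->new_dent)
		growth += b->dent_budget;
	return growth + r->new_ino_d;
}

static int toy_dd_growth(const struct toy_budgets *b, const struct toy_req *r)
{
	int growth = r->dirtied_page ? b->page_budget : 0;

	if (r->dirtied_ino) /* same shift as calc_dd_growth() above */
		growth += b->inode_budget << (r->dirtied_ino - 1);
	if (r->mod_dent)
		growth += b->dent_budget;
	return growth + r->dirtied_ino_d;
}
```

With these numbers the request budgets 480 bytes of new data (inode plus directory entry) and 160 bytes of dirtied data (the parent inode). It also budgets 384 bytes of index growth, for two new znodes.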
/**
* ubifs_budget_space - ensure there is enough space to complete an operation.
* @c: UBIFS file-system description object
* @req: budget request
*
* This function allocates budget for an operation. It uses a pessimistic
* approximation of how much flash space the operation needs. The goal of this
* function is to make sure UBIFS always has flash space to flush all dirty
* pages, dirty inodes, and dirty znodes (liability). This function may force
* commit, garbage-collection or write-back. Returns zero in case of success,
* %-ENOSPC if there is no free space and other negative error codes in case of
* failures.
*/
int ubifs_budget_space(struct ubifs_info *c, struct ubifs_budget_req *req)
{
int uninitialized_var(cmt_retries), uninitialized_var(wb_retries);
int err, idx_growth, data_growth, dd_growth;
struct retries_info ri;
ubifs_assert(req->dirtied_ino <= 4);
ubifs_assert(req->dirtied_ino_d <= UBIFS_MAX_INO_DATA * 4);
data_growth = calc_data_growth(c, req);
dd_growth = calc_dd_growth(c, req);
if (!data_growth && !dd_growth)
return 0;
idx_growth = calc_idx_growth(c, req);
memset(&ri, 0, sizeof(struct retries_info));
again:
spin_lock(&c->space_lock);
ubifs_assert(c->budg_idx_growth >= 0);
ubifs_assert(c->budg_data_growth >= 0);
ubifs_assert(c->budg_dd_growth >= 0);
if (unlikely(c->nospace) && (c->nospace_rp || !can_use_rp(c))) {
dbg_budg("no space");
spin_unlock(&c->space_lock);
return -ENOSPC;
}
c->budg_idx_growth += idx_growth;
c->budg_data_growth += data_growth;
c->budg_dd_growth += dd_growth;
err = do_budget_space(c);
if (likely(!err)) {
req->idx_growth = idx_growth;
req->data_growth = data_growth;
req->dd_growth = dd_growth;
spin_unlock(&c->space_lock);
return 0;
}
/* Restore the old values */
c->budg_idx_growth -= idx_growth;
c->budg_data_growth -= data_growth;
c->budg_dd_growth -= dd_growth;
spin_unlock(&c->space_lock);
if (req->fast) {
dbg_budg("no space for fast budgeting");
return err;
}
err = make_free_space(c, &ri);
if (err == -EAGAIN) {
dbg_budg("try again");
cond_resched();
goto again;
} else if (err == -ENOSPC) {
dbg_budg("FS is full, -ENOSPC");
c->nospace = 1;
if (can_use_rp(c) || c->rp_size == 0)
c->nospace_rp = 1;
smp_wmb();
} else
ubifs_err("cannot budget space, error %d", err);
return err;
}
/**
* ubifs_release_budget - release budgeted free space.
* @c: UBIFS file-system description object
* @req: budget request
*
* This function releases the space budgeted by 'ubifs_budget_space()'. Note,
* since the index changes (which were budgeted for in @req->idx_growth) will
* only be written to the media on commit, this function moves the index budget
* from @c->budg_idx_growth to @c->budg_uncommitted_idx. The latter will be
* zeroed by the commit operation.
*/
void ubifs_release_budget(struct ubifs_info *c, struct ubifs_budget_req *req)
{
ubifs_assert(req->dirtied_ino <= 4);
ubifs_assert(req->dirtied_ino_d <= UBIFS_MAX_INO_DATA * 4);
if (!req->recalculate) {
ubifs_assert(req->idx_growth >= 0);
ubifs_assert(req->data_growth >= 0);
ubifs_assert(req->dd_growth >= 0);
}
if (req->recalculate) {
req->data_growth = calc_data_growth(c, req);
req->dd_growth = calc_dd_growth(c, req);
req->idx_growth = calc_idx_growth(c, req);
}
if (!req->data_growth && !req->dd_growth)
return;
c->nospace = c->nospace_rp = 0;
smp_wmb();
spin_lock(&c->space_lock);
c->budg_idx_growth -= req->idx_growth;
c->budg_uncommitted_idx += req->idx_growth;
c->budg_data_growth -= req->data_growth;
c->budg_dd_growth -= req->dd_growth;
c->min_idx_lebs = ubifs_calc_min_idx_lebs(c);
ubifs_assert(c->budg_idx_growth >= 0);
ubifs_assert(c->budg_data_growth >= 0);
ubifs_assert(c->min_idx_lebs < c->main_lebs);
spin_unlock(&c->space_lock);
}
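The bookkeeping in 'ubifs_budget_space()' and 'ubifs_release_budget()' follows a simple lifecycle. Growth is added to the budget counters and subtracted on release, and the index part moves to @c->budg_uncommitted_idx, which the commit later zeroes. The model below is single-threaded and leaves out @c->space_lock and the space checks. 'struct toy_counters' and the 'toy_*' functions are hypothetical.

```c
#include <assert.h>

/* Hypothetical single-threaded copy of the budgeting counters. */
struct toy_counters {
	long long budg_idx_growth, budg_data_growth, budg_dd_growth;
	long long budg_uncommitted_idx;
};

/* What one successful budgeting request reserved. */
struct toy_grant {
	long long idx_growth, data_growth, dd_growth;
};

/* Success path of ubifs_budget_space(), without the space checks. */
static void toy_budget(struct toy_counters *c, const struct toy_grant *g)
{
	c->budg_idx_growth += g->idx_growth;
	c->budg_data_growth += g->data_growth;
	c->budg_dd_growth += g->dd_growth;
}

/* ubifs_release_budget(): the index budget waits for the commit. */
static void toy_release(struct toy_counters *c, const struct toy_grant *g)
{
	c->budg_idx_growth -= g->idx_growth;
	c->budg_uncommitted_idx += g->idx_growth;
	c->budg_data_growth -= g->data_growth;
	c->budg_dd_growth -= g->dd_growth;
	assert(c->budg_idx_growth >= 0 && c->budg_data_growth >= 0);
}

/* The commit writes the index, so the uncommitted part is zeroed. */
static void toy_commit(struct toy_counters *c)
{
	c->budg_uncommitted_idx = 0;
}
```

Between release and commit, the index growth still counts towards 'ubifs_calc_min_idx_lebs()' through @c->budg_uncommitted_idx. This is why releasing a budget does not immediately shrink the index reservation.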
/**
* ubifs_convert_page_budget - convert budget of a new page.
* @c: UBIFS file-system description object
*
* This function converts budget which was allocated for a new page of data to
* the budget of changing an existing page of data. The latter is smaller than
* the former, so this function only does simple re-calculation and does not
* involve any write-back.
*/
void ubifs_convert_page_budget(struct ubifs_info *c)
{
spin_lock(&c->space_lock);
/* Release the index growth reservation */
c->budg_idx_growth -= c->max_idx_node_sz << UBIFS_BLOCKS_PER_PAGE_SHIFT;
/* Release the data growth reservation */
c->budg_data_growth -= c->page_budget;
/* Increase the dirty data growth reservation instead */
c->budg_dd_growth += c->page_budget;
/* And re-calculate the indexing space reservation */
c->min_idx_lebs = ubifs_calc_min_idx_lebs(c);
spin_unlock(&c->space_lock);
}
/**
* ubifs_release_dirty_inode_budget - release dirty inode budget.
* @c: UBIFS file-system description object
* @ui: UBIFS inode to release the budget for
*
* This function releases budget corresponding to a dirty inode. It is usually
* called after the inode has been written to the media and marked as clean.
*/
void ubifs_release_dirty_inode_budget(struct ubifs_info *c,
struct ubifs_inode *ui)
{
struct ubifs_budget_req req = {.dd_growth = c->inode_budget,
.dirtied_ino_d = ui->data_len};
ubifs_release_budget(c, &req);
}
/**
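* ubifs_budg_get_free_space - return the amount of free space.
* @c: UBIFS file-system description object
*
* This function returns the amount of free space on the file-system. It
* returns zero if the LEBs which have to be reserved for the index cannot be
* provided, or if the outstanding data budget is not smaller than the
* available space.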
*/
long long ubifs_budg_get_free_space(struct ubifs_info *c)
{
int min_idx_lebs, rsvd_idx_lebs;
long long available, outstanding, free;
/* Do exactly the same calculations as in 'do_budget_space()' */
spin_lock(&c->space_lock);
min_idx_lebs = ubifs_calc_min_idx_lebs(c);
if (min_idx_lebs > c->lst.idx_lebs)
rsvd_idx_lebs = min_idx_lebs - c->lst.idx_lebs;
else
rsvd_idx_lebs = 0;
if (rsvd_idx_lebs > c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt
- c->lst.taken_empty_lebs) {
spin_unlock(&c->space_lock);
return 0;
}
available = ubifs_calc_available(c, min_idx_lebs);
outstanding = c->budg_data_growth + c->budg_dd_growth;
c->min_idx_lebs = min_idx_lebs;
spin_unlock(&c->space_lock);
if (available > outstanding)
free = ubifs_reported_space(c, available - outstanding);
else
free = 0;
return free;
}
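The functions `ubifs_release_budget()` and `ubifs_convert_page_budget()` move reservations between several budget counters. The single-threaded C model below is a sketch of that bookkeeping under stated assumptions, not UBIFS code. `struct budg_model`, `struct budg_req_model` and the `model_*()` functions are hypothetical names. The sketch leaves out `c->space_lock` and the `min_idx_lebs` recalculation, and the byte counts used with it are arbitrary.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the UBIFS budget counters. The real
 * fields live in struct ubifs_info and are protected by c->space_lock,
 * which this single-threaded sketch omits.
 */
struct budg_model {
	long long idx_growth;      /* plays the role of c->budg_idx_growth */
	long long uncommitted_idx; /* plays the role of c->budg_uncommitted_idx */
	long long data_growth;     /* plays the role of c->budg_data_growth */
	long long dd_growth;       /* plays the role of c->budg_dd_growth */
};

/* Hypothetical reduced budget request (cf. struct ubifs_budget_req). */
struct budg_req_model {
	long long idx_growth;
	long long data_growth;
	long long dd_growth;
};

/* Reserve space: add the request to the outstanding totals. */
void model_budget(struct budg_model *b, const struct budg_req_model *r)
{
	b->idx_growth += r->idx_growth;
	b->data_growth += r->data_growth;
	b->dd_growth += r->dd_growth;
}

/*
 * Release space the way ubifs_release_budget() does: the index part is not
 * dropped but moved to the "uncommitted" counter, because index changes only
 * reach the media at commit time.
 */
void model_release(struct budg_model *b, const struct budg_req_model *r)
{
	if (!r->data_growth && !r->dd_growth)
		return;
	b->idx_growth -= r->idx_growth;
	b->uncommitted_idx += r->idx_growth;
	b->data_growth -= r->data_growth;
	b->dd_growth -= r->dd_growth;
	assert(b->idx_growth >= 0);
	assert(b->data_growth >= 0);
}

/*
 * Turn a "new page" reservation into a cheaper "dirty page" one, as in
 * ubifs_convert_page_budget(); page_idx stands for the index part of a
 * new-page budget.
 */
void model_convert_page(struct budg_model *b, long long page_budget,
			long long page_idx)
{
	b->idx_growth -= page_idx;
	b->data_growth -= page_budget;
	b->dd_growth += page_budget;
}

/* A commit writes the index out, so the uncommitted counter is zeroed. */
void model_commit(struct budg_model *b)
{
	b->uncommitted_idx = 0;
}
```

For example, suppose a request of 256 index bytes and 4096 data bytes is budgeted and then released. Afterwards `idx_growth` and `data_growth` are back to zero, but `uncommitted_idx` stays at 256 until `model_commit()` runs. The index reservation is only truly given back once the index has been written.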

677
fs/ubifs/commit.c Normal file

@ -0,0 +1,677 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Adrian Hunter
* Artem Bityutskiy (Битюцкий Артём)
*/
/*
* This file implements functions that manage the running of the commit process.
* Each affected module has its own functions to accomplish its part in the
* commit and those functions are called here.
*
* The commit is the process whereby all updates to the index and LEB properties
* are written out together and the journal becomes empty. This keeps the
* file system consistent - at all times the state can be recreated by reading
* the index and LEB properties and then replaying the journal.
*
* The commit is split into two parts named "commit start" and "commit end".
* During commit start, the commit process has exclusive access to the journal
* by holding the commit semaphore down for writing. As few I/O operations as
* possible are performed during commit start; instead, the nodes that are to be
* written are merely identified. During commit end, the commit semaphore is no
* longer held and the journal is again in operation, allowing users to continue
* to use the file system while the bulk of the commit I/O is performed. The
* purpose of this two-step approach is to prevent the commit from causing any
* latency blips. Note that in any case, the commit does not prevent lookups
* (as permitted by the TNC mutex) or access to VFS data structures, e.g. the
* page cache.
*/
#include <linux/freezer.h>
#include <linux/kthread.h>
#include "ubifs.h"
/**
* do_commit - commit the journal.
* @c: UBIFS file-system description object
*
* This function implements UBIFS commit. It has to be called with the commit
* semaphore (@c->commit_sem) held for writing; the semaphore is released by
* this function. Returns zero in case of success and a negative error code in
* case of failure.
*/
static int do_commit(struct ubifs_info *c)
{
int err, new_ltail_lnum, old_ltail_lnum, i;
struct ubifs_zbranch zroot;
struct ubifs_lp_stats lst;
dbg_cmt("start");
if (c->ro_media) {
err = -EROFS;
goto out_up;
}
/* Sync all write buffers (necessary for recovery) */
for (i = 0; i < c->jhead_cnt; i++) {
err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
if (err)
goto out_up;
}
err = ubifs_gc_start_commit(c);
if (err)
goto out_up;
err = dbg_check_lprops(c);
if (err)
goto out_up;
err = ubifs_log_start_commit(c, &new_ltail_lnum);
if (err)
goto out_up;
err = ubifs_tnc_start_commit(c, &zroot);
if (err)
goto out_up;
err = ubifs_lpt_start_commit(c);
if (err)
goto out_up;
err = ubifs_orphan_start_commit(c);
if (err)
goto out_up;
ubifs_get_lp_stats(c, &lst);
up_write(&c->commit_sem);
err = ubifs_tnc_end_commit(c);
if (err)
goto out;
err = ubifs_lpt_end_commit(c);
if (err)
goto out;
err = ubifs_orphan_end_commit(c);
if (err)
goto out;
old_ltail_lnum = c->ltail_lnum;
err = ubifs_log_end_commit(c, new_ltail_lnum);
if (err)
goto out;
err = dbg_check_old_index(c, &zroot);
if (err)
goto out;
mutex_lock(&c->mst_mutex);
c->mst_node->cmt_no = cpu_to_le64(++c->cmt_no);
c->mst_node->log_lnum = cpu_to_le32(new_ltail_lnum);
c->mst_node->root_lnum = cpu_to_le32(zroot.lnum);
c->mst_node->root_offs = cpu_to_le32(zroot.offs);
c->mst_node->root_len = cpu_to_le32(zroot.len);
c->mst_node->ihead_lnum = cpu_to_le32(c->ihead_lnum);
c->mst_node->ihead_offs = cpu_to_le32(c->ihead_offs);
c->mst_node->index_size = cpu_to_le64(c->old_idx_sz);
c->mst_node->lpt_lnum = cpu_to_le32(c->lpt_lnum);
c->mst_node->lpt_offs = cpu_to_le32(c->lpt_offs);
c->mst_node->nhead_lnum = cpu_to_le32(c->nhead_lnum);
c->mst_node->nhead_offs = cpu_to_le32(c->nhead_offs);
c->mst_node->ltab_lnum = cpu_to_le32(c->ltab_lnum);
c->mst_node->ltab_offs = cpu_to_le32(c->ltab_offs);
c->mst_node->lsave_lnum = cpu_to_le32(c->lsave_lnum);
c->mst_node->lsave_offs = cpu_to_le32(c->lsave_offs);
c->mst_node->lscan_lnum = cpu_to_le32(c->lscan_lnum);
c->mst_node->empty_lebs = cpu_to_le32(lst.empty_lebs);
c->mst_node->idx_lebs = cpu_to_le32(lst.idx_lebs);
c->mst_node->total_free = cpu_to_le64(lst.total_free);
c->mst_node->total_dirty = cpu_to_le64(lst.total_dirty);
c->mst_node->total_used = cpu_to_le64(lst.total_used);
c->mst_node->total_dead = cpu_to_le64(lst.total_dead);
c->mst_node->total_dark = cpu_to_le64(lst.total_dark);
if (c->no_orphs)
c->mst_node->flags |= cpu_to_le32(UBIFS_MST_NO_ORPHS);
else
c->mst_node->flags &= ~cpu_to_le32(UBIFS_MST_NO_ORPHS);
err = ubifs_write_master(c);
mutex_unlock(&c->mst_mutex);
if (err)
goto out;
err = ubifs_log_post_commit(c, old_ltail_lnum);
if (err)
goto out;
err = ubifs_gc_end_commit(c);
if (err)
goto out;
err = ubifs_lpt_post_commit(c);
if (err)
goto out;
spin_lock(&c->cs_lock);
c->cmt_state = COMMIT_RESTING;
wake_up(&c->cmt_wq);
dbg_cmt("commit end");
spin_unlock(&c->cs_lock);
return 0;
out_up:
up_write(&c->commit_sem);
out:
ubifs_err("commit failed, error %d", err);
spin_lock(&c->cs_lock);
c->cmt_state = COMMIT_BROKEN;
wake_up(&c->cmt_wq);
spin_unlock(&c->cs_lock);
ubifs_ro_mode(c, err);
return err;
}
/**
* run_bg_commit - run background commit if it is needed.
* @c: UBIFS file-system description object
*
* This function runs background commit if it is needed. Returns zero in case
* of success and a negative error code in case of failure.
*/
static int run_bg_commit(struct ubifs_info *c)
{
spin_lock(&c->cs_lock);
/*
* Run background commit only if background commit was requested or if
* commit is required.
*/
if (c->cmt_state != COMMIT_BACKGROUND &&
c->cmt_state != COMMIT_REQUIRED)
goto out;
spin_unlock(&c->cs_lock);
down_write(&c->commit_sem);
spin_lock(&c->cs_lock);
if (c->cmt_state == COMMIT_REQUIRED)
c->cmt_state = COMMIT_RUNNING_REQUIRED;
else if (c->cmt_state == COMMIT_BACKGROUND)
c->cmt_state = COMMIT_RUNNING_BACKGROUND;
else
goto out_cmt_unlock;
spin_unlock(&c->cs_lock);
return do_commit(c);
out_cmt_unlock:
up_write(&c->commit_sem);
out:
spin_unlock(&c->cs_lock);
return 0;
}
/**
* ubifs_bg_thread - UBIFS background thread function.
* @info: points to the file-system description object
*
* This function implements various file-system background activities:
* o when a write-buffer timer expires it synchronizes the appropriate
* write-buffer;
* o when the journal is about to be full, it starts in-advance commit.
*
* Note, other stuff like background garbage collection may be added here in
* the future.
*/
int ubifs_bg_thread(void *info)
{
int err;
struct ubifs_info *c = info;
ubifs_msg("background thread \"%s\" started, PID %d",
c->bgt_name, current->pid);
set_freezable();
while (1) {
if (kthread_should_stop())
break;
if (try_to_freeze())
continue;
set_current_state(TASK_INTERRUPTIBLE);
/* Check if there is something to do */
if (!c->need_bgt) {
/*
* Check again after setting the task state: if
* 'kthread_stop()' was called just before, we would
* otherwise go to sleep, never be woken up, and block
* forever the task waiting in 'kthread_stop()'.
*/
if (kthread_should_stop())
break;
schedule();
continue;
} else
__set_current_state(TASK_RUNNING);
c->need_bgt = 0;
err = ubifs_bg_wbufs_sync(c);
if (err)
ubifs_ro_mode(c, err);
run_bg_commit(c);
cond_resched();
}
dbg_msg("background thread \"%s\" stops", c->bgt_name);
return 0;
}
/**
* ubifs_commit_required - set commit state to "required".
* @c: UBIFS file-system description object
*
* This function is called if a commit is required but cannot be done from the
* calling function, so it is just flagged instead.
*/
void ubifs_commit_required(struct ubifs_info *c)
{
spin_lock(&c->cs_lock);
switch (c->cmt_state) {
case COMMIT_RESTING:
case COMMIT_BACKGROUND:
dbg_cmt("old: %s, new: %s", dbg_cstate(c->cmt_state),
dbg_cstate(COMMIT_REQUIRED));
c->cmt_state = COMMIT_REQUIRED;
break;
case COMMIT_RUNNING_BACKGROUND:
dbg_cmt("old: %s, new: %s", dbg_cstate(c->cmt_state),
dbg_cstate(COMMIT_RUNNING_REQUIRED));
c->cmt_state = COMMIT_RUNNING_REQUIRED;
break;
case COMMIT_REQUIRED:
case COMMIT_RUNNING_REQUIRED:
case COMMIT_BROKEN:
break;
}
spin_unlock(&c->cs_lock);
}
/**
* ubifs_request_bg_commit - notify the background thread to do a commit.
* @c: UBIFS file-system description object
*
* This function is called if the journal is full enough to make a commit
* worthwhile, so the background thread is kicked to start it.
*/
void ubifs_request_bg_commit(struct ubifs_info *c)
{
spin_lock(&c->cs_lock);
if (c->cmt_state == COMMIT_RESTING) {
dbg_cmt("old: %s, new: %s", dbg_cstate(c->cmt_state),
dbg_cstate(COMMIT_BACKGROUND));
c->cmt_state = COMMIT_BACKGROUND;
spin_unlock(&c->cs_lock);
ubifs_wake_up_bgt(c);
} else
spin_unlock(&c->cs_lock);
}
/**
* wait_for_commit - wait for commit.
* @c: UBIFS file-system description object
*
* This function sleeps until the commit operation is no longer running.
*/
static int wait_for_commit(struct ubifs_info *c)
{
dbg_cmt("pid %d goes sleep", current->pid);
/*
* The following sleeps if the condition is false, and will be woken
* when the commit ends. It is possible, although very unlikely, that we
* will wake up and see the subsequent commit running, rather than the
* one we were waiting for, and go back to sleep. However, we will be
* woken again, so there is no danger of sleeping forever.
*/
wait_event(c->cmt_wq, c->cmt_state != COMMIT_RUNNING_BACKGROUND &&
c->cmt_state != COMMIT_RUNNING_REQUIRED);
dbg_cmt("commit finished, pid %d woke up", current->pid);
return 0;
}
/**
* ubifs_run_commit - run or wait for commit.
* @c: UBIFS file-system description object
*
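* This function runs a commit or, if a commit is already running, waits for it
* to finish. Returns zero in case of success and a negative error code in case
* of failure (%-EINVAL if the commit state is broken).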
*/
int ubifs_run_commit(struct ubifs_info *c)
{
int err = 0;
spin_lock(&c->cs_lock);
if (c->cmt_state == COMMIT_BROKEN) {
err = -EINVAL;
goto out;
}
if (c->cmt_state == COMMIT_RUNNING_BACKGROUND)
/*
* We set the commit state to 'running required' to indicate
* that we want it to complete as quickly as possible.
*/
c->cmt_state = COMMIT_RUNNING_REQUIRED;
if (c->cmt_state == COMMIT_RUNNING_REQUIRED) {
spin_unlock(&c->cs_lock);
return wait_for_commit(c);
}
spin_unlock(&c->cs_lock);
/* Ok, the commit is indeed needed */
down_write(&c->commit_sem);
spin_lock(&c->cs_lock);
/*
* Since we unlocked 'c->cs_lock', the state may have changed, so
* re-check it.
*/
if (c->cmt_state == COMMIT_BROKEN) {
err = -EINVAL;
goto out_cmt_unlock;
}
if (c->cmt_state == COMMIT_RUNNING_BACKGROUND)
c->cmt_state = COMMIT_RUNNING_REQUIRED;
if (c->cmt_state == COMMIT_RUNNING_REQUIRED) {
up_write(&c->commit_sem);
spin_unlock(&c->cs_lock);
return wait_for_commit(c);
}
c->cmt_state = COMMIT_RUNNING_REQUIRED;
spin_unlock(&c->cs_lock);
err = do_commit(c);
return err;
out_cmt_unlock:
up_write(&c->commit_sem);
out:
spin_unlock(&c->cs_lock);
return err;
}
/**
* ubifs_gc_should_commit - determine if it is time for GC to run commit.
* @c: UBIFS file-system description object
*
* This function is called by garbage collection to determine if commit should
* be run. If commit state is @COMMIT_BACKGROUND, which means that the journal
* is full enough to start commit, the state is upgraded to @COMMIT_REQUIRED:
* it is not absolutely necessary to commit yet, but committing is likely to be
* better than to keep doing GC. This function returns %1 if the commit state
* is (now) @COMMIT_REQUIRED, i.e. GC has to initiate commit, and %0 if not.
*/
int ubifs_gc_should_commit(struct ubifs_info *c)
{
int ret = 0;
spin_lock(&c->cs_lock);
if (c->cmt_state == COMMIT_BACKGROUND) {
dbg_cmt("commit required now");
c->cmt_state = COMMIT_REQUIRED;
} else
dbg_cmt("commit not requested");
if (c->cmt_state == COMMIT_REQUIRED)
ret = 1;
spin_unlock(&c->cs_lock);
return ret;
}
#ifdef CONFIG_UBIFS_FS_DEBUG
/**
* struct idx_node - hold index nodes during index tree traversal.
* @list: list
* @iip: index in parent (slot number of this indexing node in the parent
* indexing node)
* @upper_key: all keys in this indexing node have to be less than or equal to
* this key
* @idx: index node (8-byte aligned because all node structures must be 8-byte
* aligned)
*/
struct idx_node {
struct list_head list;
int iip;
union ubifs_key upper_key;
struct ubifs_idx_node idx __attribute__((aligned(8)));
};
/**
* dbg_old_index_check_init - get information for the next old index check.
* @c: UBIFS file-system description object
* @zroot: root of the index
*
* This function records information about the index that will be needed for the
* next old index check i.e. 'dbg_check_old_index()'.
*
* This function returns %0 on success and a negative error code on failure.
*/
int dbg_old_index_check_init(struct ubifs_info *c, struct ubifs_zbranch *zroot)
{
struct ubifs_idx_node *idx;
int lnum, offs, len, err = 0;
c->old_zroot = *zroot;
lnum = c->old_zroot.lnum;
offs = c->old_zroot.offs;
len = c->old_zroot.len;
idx = kmalloc(c->max_idx_node_sz, GFP_NOFS);
if (!idx)
return -ENOMEM;
err = ubifs_read_node(c, idx, UBIFS_IDX_NODE, len, lnum, offs);
if (err)
goto out;
c->old_zroot_level = le16_to_cpu(idx->level);
c->old_zroot_sqnum = le64_to_cpu(idx->ch.sqnum);
out:
kfree(idx);
return err;
}
/**
* dbg_check_old_index - check the old copy of the index.
* @c: UBIFS file-system description object
* @zroot: root of the new index
*
* In order to be able to recover from an unclean unmount, a complete copy of
* the index must exist on flash. This is the "old" index. The commit process
* must write the "new" index to flash without overwriting or destroying any
* part of the old index. This function is run at commit end in order to check
* that the old index does indeed exist completely intact.
*
* This function returns %0 on success and a negative error code on failure.
*/
int dbg_check_old_index(struct ubifs_info *c, struct ubifs_zbranch *zroot)
{
int lnum, offs, len, err = 0, uninitialized_var(last_level), child_cnt;
int first = 1, iip;
union ubifs_key lower_key, upper_key, l_key, u_key;
unsigned long long uninitialized_var(last_sqnum);
struct ubifs_idx_node *idx;
struct list_head list;
struct idx_node *i;
size_t sz;
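/*
* Initialize the list before the first 'goto out': if
* dbg_old_index_check_init() fails, the 'out_free' path walks it.
*/
INIT_LIST_HEAD(&list);
if (!(ubifs_chk_flags & UBIFS_CHK_OLD_IDX))
goto out;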
sz = sizeof(struct idx_node) + ubifs_idx_node_sz(c, c->fanout) -
UBIFS_IDX_NODE_SZ;
/* Start at the old zroot */
lnum = c->old_zroot.lnum;
offs = c->old_zroot.offs;
len = c->old_zroot.len;
iip = 0;
/*
* Traverse the index tree preorder depth-first, i.e. visit a node and then
* its subtrees from left to right.
*/
while (1) {
struct ubifs_branch *br;
/* Get the next index node */
i = kmalloc(sz, GFP_NOFS);
if (!i) {
err = -ENOMEM;
goto out_free;
}
i->iip = iip;
/* Keep the index nodes on our path in a linked list */
list_add_tail(&i->list, &list);
/* Read the index node */
idx = &i->idx;
err = ubifs_read_node(c, idx, UBIFS_IDX_NODE, len, lnum, offs);
if (err)
goto out_free;
/* Validate index node */
child_cnt = le16_to_cpu(idx->child_cnt);
if (child_cnt < 1 || child_cnt > c->fanout) {
err = 1;
goto out_dump;
}
if (first) {
first = 0;
/* Check root level and sqnum */
if (le16_to_cpu(idx->level) != c->old_zroot_level) {
err = 2;
goto out_dump;
}
if (le64_to_cpu(idx->ch.sqnum) != c->old_zroot_sqnum) {
err = 3;
goto out_dump;
}
/* Set last values as though root had a parent */
last_level = le16_to_cpu(idx->level) + 1;
last_sqnum = le64_to_cpu(idx->ch.sqnum) + 1;
key_read(c, ubifs_idx_key(c, idx), &lower_key);
highest_ino_key(c, &upper_key, INUM_WATERMARK);
}
key_copy(c, &upper_key, &i->upper_key);
if (le16_to_cpu(idx->level) != last_level - 1) {
err = 3;
goto out_dump;
}
/*
* The index is always written bottom up, hence a child's sqnum
* is always less than its parent's.
*/
if (le64_to_cpu(idx->ch.sqnum) >= last_sqnum) {
err = 4;
goto out_dump;
}
/* Check key range */
key_read(c, ubifs_idx_key(c, idx), &l_key);
br = ubifs_idx_branch(c, idx, child_cnt - 1);
key_read(c, &br->key, &u_key);
if (keys_cmp(c, &lower_key, &l_key) > 0) {
err = 5;
goto out_dump;
}
if (keys_cmp(c, &upper_key, &u_key) < 0) {
err = 6;
goto out_dump;
}
if (keys_cmp(c, &upper_key, &u_key) == 0)
if (!is_hash_key(c, &u_key)) {
err = 7;
goto out_dump;
}
/* Go to next index node */
if (le16_to_cpu(idx->level) == 0) {
/* At the bottom, so go up until we can go right */
while (1) {
/* Drop the bottom of the list */
list_del(&i->list);
kfree(i);
/* No more list means we are done */
if (list_empty(&list))
goto out;
/* Look at the new bottom */
i = list_entry(list.prev, struct idx_node,
list);
idx = &i->idx;
/* Can we go right? */
if (iip + 1 < le16_to_cpu(idx->child_cnt)) {
iip = iip + 1;
break;
} else
/* Nope, so go up again */
iip = i->iip;
}
} else
/* Go down left */
iip = 0;
/*
* We have the parent in 'idx' and now we set up for reading the
* child pointed to by slot 'iip'.
*/
last_level = le16_to_cpu(idx->level);
last_sqnum = le64_to_cpu(idx->ch.sqnum);
br = ubifs_idx_branch(c, idx, iip);
lnum = le32_to_cpu(br->lnum);
offs = le32_to_cpu(br->offs);
len = le32_to_cpu(br->len);
key_read(c, &br->key, &lower_key);
if (iip + 1 < le16_to_cpu(idx->child_cnt)) {
br = ubifs_idx_branch(c, idx, iip + 1);
key_read(c, &br->key, &upper_key);
} else
key_copy(c, &i->upper_key, &upper_key);
}
out:
err = dbg_old_index_check_init(c, zroot);
if (err)
goto out_free;
return 0;
out_dump:
dbg_err("dumping index node (iip=%d)", i->iip);
dbg_dump_node(c, idx);
list_del(&i->list);
kfree(i);
if (!list_empty(&list)) {
i = list_entry(list.prev, struct idx_node, list);
dbg_err("dumping parent index node");
dbg_dump_node(c, &i->idx);
}
out_free:
while (!list_empty(&list)) {
i = list_entry(list.next, struct idx_node, list);
list_del(&i->list);
kfree(i);
}
ubifs_err("failed, error %d", err);
if (err > 0)
err = -EINVAL;
return err;
}
#endif /* CONFIG_UBIFS_FS_DEBUG */
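`dbg_check_old_index()` walks the index preorder depth-first. It keeps only the nodes on the current path in a list, together with each node's slot in its parent. The C sketch below reproduces that traversal over a hypothetical in-memory tree, using an array instead of a linked list; `struct tnode`, `walk_check()` and `MAX_DEPTH` are illustrative names, not UBIFS ones. It applies three of the same checks: a non-leaf node must have at least one child, each child must be exactly one level below its parent, and each child must have a smaller sequence number than its parent. Flash reads and key-range checks are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 16	/* hypothetical bound on tree height */

/* Hypothetical in-memory stand-in for an on-flash index node. */
struct tnode {
	int level;                /* 0 for the bottom level */
	unsigned long long sqnum; /* sequence number when it was written */
	int child_cnt;
	struct tnode **child;
};

/*
 * Walk the tree preorder depth-first, keeping only the current path and the
 * slot of each path node in its parent. Returns the number of nodes visited,
 * or -1 if a node fails a check.
 */
int walk_check(struct tnode *root)
{
	struct tnode *path[MAX_DEPTH];
	int slot[MAX_DEPTH];	/* slot[d] is the slot of path[d] in path[d - 1] */
	int depth = 0, visited = 0, iip = 0;
	struct tnode *n = root;

	for (;;) {
		/* Validate the node, and check it against its parent if any */
		if (n->level > 0 && n->child_cnt < 1)
			return -1;
		if (depth > 0) {
			const struct tnode *parent = path[depth - 1];

			if (n->level != parent->level - 1 ||
			    n->sqnum >= parent->sqnum)
				return -1;
		}
		assert(depth < MAX_DEPTH);
		path[depth] = n;
		slot[depth] = iip;
		depth++;
		visited++;

		if (n->level == 0) {
			/* At the bottom: go up until we can go right */
			for (;;) {
				depth--;	/* drop the bottom of the path */
				if (depth == 0)
					return visited;
				n = path[depth - 1];
				if (iip + 1 < n->child_cnt) {
					iip++;
					break;
				}
				iip = slot[depth - 1];
			}
		} else {
			iip = 0;	/* go down left */
		}
		n = path[depth - 1]->child[iip];
	}
}
```

Only the current path is held, as in the original list of `struct idx_node`. Memory therefore grows with the tree height, not with the number of index nodes.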

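The commit-state transitions are spread across `ubifs_request_bg_commit()`, `ubifs_commit_required()`, `ubifs_gc_should_commit()`, `run_bg_commit()` and `do_commit()`. They can be summarised as pure functions. The following C sketch is an illustration only: the `M_*` states and `model_*()` helpers are hypothetical stand-ins for the `COMMIT_*` states. All locking (`c->cs_lock`, `c->commit_sem`) and wake-ups are omitted.

```c
#include <assert.h>

/* Hypothetical mirror of the commit states used in commit.c. */
enum cmt_model_state {
	M_RESTING,
	M_BACKGROUND,
	M_REQUIRED,
	M_RUNNING_BACKGROUND,
	M_RUNNING_REQUIRED,
	M_BROKEN,
};

/* Like ubifs_commit_required(): escalate the state to "required". */
enum cmt_model_state model_commit_required(enum cmt_model_state s)
{
	switch (s) {
	case M_RESTING:
	case M_BACKGROUND:
		return M_REQUIRED;
	case M_RUNNING_BACKGROUND:
		return M_RUNNING_REQUIRED;
	default:
		return s;	/* already required, running required, or broken */
	}
}

/* Like ubifs_request_bg_commit(): only a resting FS asks for a bg commit. */
enum cmt_model_state model_request_bg(enum cmt_model_state s, int *wake)
{
	*wake = (s == M_RESTING);	/* wake the background thread? */
	return *wake ? M_BACKGROUND : s;
}

/* Like ubifs_gc_should_commit(): GC upgrades a requested bg commit. */
int model_gc_should_commit(enum cmt_model_state *s)
{
	if (*s == M_BACKGROUND)
		*s = M_REQUIRED;
	return *s == M_REQUIRED;
}

/* Like run_bg_commit(): start a pending commit; returns 1 if one started. */
int model_bg_start(enum cmt_model_state *s)
{
	if (*s == M_REQUIRED)
		*s = M_RUNNING_REQUIRED;
	else if (*s == M_BACKGROUND)
		*s = M_RUNNING_BACKGROUND;
	else
		return 0;
	return 1;
}

/* Like the end of do_commit(): success rests, failure breaks. */
enum cmt_model_state model_commit_end(int err)
{
	return err ? M_BROKEN : M_RESTING;
}
```

Writing the transitions as side-effect-free functions makes some properties easy to check. For instance, the broken state is sticky: none of the helpers ever leaves `M_BROKEN`.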
253
fs/ubifs/compress.c Normal file

@ -0,0 +1,253 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
* Copyright (C) 2006, 2007 University of Szeged, Hungary
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Adrian Hunter
* Artem Bityutskiy (Битюцкий Артём)
* Zoltan Sogor
*/
/*
* This file provides a single place to access compression and
* decompression.
*/
#include <linux/crypto.h>
#include "ubifs.h"
/* Fake description object for the "none" compressor */
static struct ubifs_compressor none_compr = {
.compr_type = UBIFS_COMPR_NONE,
.name = "no compression",
.capi_name = "",
};
#ifdef CONFIG_UBIFS_FS_LZO
static DEFINE_MUTEX(lzo_mutex);
static struct ubifs_compressor lzo_compr = {
.compr_type = UBIFS_COMPR_LZO,
.comp_mutex = &lzo_mutex,
.name = "LZO",
.capi_name = "lzo",
};
#else
static struct ubifs_compressor lzo_compr = {
.compr_type = UBIFS_COMPR_LZO,
.name = "LZO",
};
#endif
#ifdef CONFIG_UBIFS_FS_ZLIB
static DEFINE_MUTEX(deflate_mutex);
static DEFINE_MUTEX(inflate_mutex);
static struct ubifs_compressor zlib_compr = {
.compr_type = UBIFS_COMPR_ZLIB,
.comp_mutex = &deflate_mutex,
.decomp_mutex = &inflate_mutex,
.name = "zlib",
.capi_name = "deflate",
};
#else
static struct ubifs_compressor zlib_compr = {
.compr_type = UBIFS_COMPR_ZLIB,
.name = "zlib",
};
#endif
/* All UBIFS compressors */
struct ubifs_compressor *ubifs_compressors[UBIFS_COMPR_TYPES_CNT];
/**
* ubifs_compress - compress data.
* @in_buf: data to compress
* @in_len: length of the data to compress
* @out_buf: output buffer where compressed data should be stored
* @out_len: output buffer length is returned here
* @compr_type: type of compression to use on entry; the type of compression
* actually used on exit
*
* This function compresses input buffer @in_buf of length @in_len and stores
* the result in the output buffer @out_buf and the resulting length in
* @out_len. If the input buffer does not compress (its length, aligned to 8
* bytes, would not shrink), it is just copied to @out_buf. The same happens if
* @compr_type is %UBIFS_COMPR_NONE, if @in_len is less than
* %UBIFS_MIN_COMPR_LEN, or if a compression error occurred.
*
* Note, if the input buffer was not compressed, it is copied to the output
* buffer and %UBIFS_COMPR_NONE is returned in @compr_type.
*
* This function does not return a value: a compression failure is not fatal,
* the data are simply stored uncompressed.
*/
void ubifs_compress(const void *in_buf, int in_len, void *out_buf, int *out_len,
int *compr_type)
{
int err;
struct ubifs_compressor *compr = ubifs_compressors[*compr_type];
if (*compr_type == UBIFS_COMPR_NONE)
goto no_compr;
/* If the input data is small, do not even try to compress it */
if (in_len < UBIFS_MIN_COMPR_LEN)
goto no_compr;
if (compr->comp_mutex)
mutex_lock(compr->comp_mutex);
err = crypto_comp_compress(compr->cc, in_buf, in_len, out_buf,
out_len);
if (compr->comp_mutex)
mutex_unlock(compr->comp_mutex);
if (unlikely(err)) {
ubifs_warn("cannot compress %d bytes, compressor %s, "
"error %d, leave data uncompressed",
in_len, compr->name, err);
goto no_compr;
}
/*
* Presently, we just require that compression results in less data,
* rather than any defined minimum compression ratio or amount.
*/
if (ALIGN(*out_len, 8) >= ALIGN(in_len, 8))
goto no_compr;
return;
no_compr:
memcpy(out_buf, in_buf, in_len);
*out_len = in_len;
*compr_type = UBIFS_COMPR_NONE;
}
/**
* ubifs_decompress - decompress data.
* @in_buf: data to decompress
* @in_len: length of the data to decompress
* @out_buf: output buffer where decompressed data should be stored
* @out_len: output length is returned here
* @compr_type: type of compression
*
* This function decompresses data from buffer @in_buf into buffer @out_buf.
* The length of the uncompressed data is returned in @out_len. This function
* returns %0 on success or a negative error code on failure.
*/
int ubifs_decompress(const void *in_buf, int in_len, void *out_buf,
int *out_len, int compr_type)
{
int err;
struct ubifs_compressor *compr;
if (unlikely(compr_type < 0 || compr_type >= UBIFS_COMPR_TYPES_CNT)) {
ubifs_err("invalid compression type %d", compr_type);
return -EINVAL;
}
compr = ubifs_compressors[compr_type];
if (unlikely(!compr->capi_name)) {
ubifs_err("%s compression is not compiled in", compr->name);
return -EINVAL;
}
if (compr_type == UBIFS_COMPR_NONE) {
memcpy(out_buf, in_buf, in_len);
*out_len = in_len;
return 0;
}
if (compr->decomp_mutex)
mutex_lock(compr->decomp_mutex);
err = crypto_comp_decompress(compr->cc, in_buf, in_len, out_buf,
out_len);
if (compr->decomp_mutex)
mutex_unlock(compr->decomp_mutex);
if (err)
ubifs_err("cannot decompress %d bytes, compressor %s, "
"error %d", in_len, compr->name, err);
return err;
}
/**
* compr_init - initialize a compressor.
* @compr: compressor description object
*
* This function initializes the requested compressor and returns zero in case
* of success or a negative error code in case of failure.
*/
static int __init compr_init(struct ubifs_compressor *compr)
{
if (compr->capi_name) {
compr->cc = crypto_alloc_comp(compr->capi_name, 0, 0);
if (IS_ERR(compr->cc)) {
ubifs_err("cannot initialize compressor %s, error %ld",
compr->name, PTR_ERR(compr->cc));
return PTR_ERR(compr->cc);
}
}
ubifs_compressors[compr->compr_type] = compr;
return 0;
}
/**
* compr_exit - de-initialize a compressor.
* @compr: compressor description object
*/
static void compr_exit(struct ubifs_compressor *compr)
{
if (compr->capi_name)
crypto_free_comp(compr->cc);
return;
}
/**
* ubifs_compressors_init - initialize UBIFS compressors.
*
* This function initializes the compressors which were compiled in and
* registers all compressors in 'ubifs_compressors'. Returns zero in case of
* success and a negative error code in case of failure.
*/
int __init ubifs_compressors_init(void)
{
int err;
err = compr_init(&lzo_compr);
if (err)
return err;
err = compr_init(&zlib_compr);
if (err)
goto out_lzo;
ubifs_compressors[UBIFS_COMPR_NONE] = &none_compr;
return 0;
out_lzo:
compr_exit(&lzo_compr);
return err;
}
/**
* ubifs_compressors_exit - de-initialize UBIFS compressors.
*/
void __exit ubifs_compressors_exit(void)
{
compr_exit(&lzo_compr);
compr_exit(&zlib_compr);
}
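The fallback rules in `ubifs_compress()` do not depend on the compressor used. Small inputs are skipped, and the data are stored uncompressed if compression fails or does not shrink the 8-byte aligned length. The sketch below shows these rules with a toy run-length encoder standing in for LZO/deflate. `rle_compress()`, `model_compress()` and the `MODEL_*` constants are hypothetical. The value 128 for `MODEL_MIN_COMPR_LEN` is an assumption that plays the role of `UBIFS_MIN_COMPR_LEN`, not a quote of it.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MIN_COMPR_LEN 128	/* assumed stand-in for UBIFS_MIN_COMPR_LEN */
#define ALIGN8(x) (((x) + 7) & ~7)

enum { MODEL_COMPR_NONE, MODEL_COMPR_RLE };

/*
 * Toy run-length encoder (count, byte pairs) standing in for a real
 * compressor. Returns the output length, or -1 if @out is too small.
 */
static int rle_compress(const unsigned char *in, int in_len,
			unsigned char *out, int out_cap)
{
	int i = 0, o = 0;

	while (i < in_len) {
		int run = 1;

		while (i + run < in_len && in[i + run] == in[i] && run < 255)
			run++;
		if (o + 2 > out_cap)
			return -1;
		out[o++] = (unsigned char)run;
		out[o++] = in[i];
		i += run;
	}
	return o;
}

/*
 * Same decision logic as ubifs_compress(): skip tiny inputs, and fall back
 * to a plain copy if compression fails or the aligned size does not shrink.
 * @out_buf must be able to hold at least @in_len bytes.
 */
void model_compress(const void *in_buf, int in_len, void *out_buf,
		    int out_cap, int *out_len, int *compr_type)
{
	int n;

	assert(out_cap >= in_len);
	if (*compr_type == MODEL_COMPR_NONE || in_len < MODEL_MIN_COMPR_LEN)
		goto no_compr;
	n = rle_compress(in_buf, in_len, out_buf, out_cap);
	if (n < 0 || ALIGN8(n) >= ALIGN8(in_len))
		goto no_compr;
	*out_len = n;
	return;
no_compr:
	memcpy(out_buf, in_buf, in_len);
	*out_len = in_len;
	*compr_type = MODEL_COMPR_NONE;
}
```

Two cases illustrate the rules. A 512-byte run of identical bytes shrinks to 6 bytes and stays compressed. By contrast, 256 distinct bytes expand under this encoder, so they are copied verbatim and the type is reset to `MODEL_COMPR_NONE`.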

2289
fs/ubifs/debug.c Normal file

File diff suppressed because it is too large

403
fs/ubifs/debug.h Normal file

@ -0,0 +1,403 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Artem Bityutskiy (Битюцкий Артём)
* Adrian Hunter
*/
#ifndef __UBIFS_DEBUG_H__
#define __UBIFS_DEBUG_H__
#ifdef CONFIG_UBIFS_FS_DEBUG
#define UBIFS_DBG(op) op
#define ubifs_assert(expr) do { \
if (unlikely(!(expr))) { \
printk(KERN_CRIT "UBIFS assert failed in %s at %u (pid %d)\n", \
__func__, __LINE__, current->pid); \
dbg_dump_stack(); \
} \
} while (0)
#define ubifs_assert_cmt_locked(c) do { \
if (unlikely(down_write_trylock(&(c)->commit_sem))) { \
up_write(&(c)->commit_sem); \
printk(KERN_CRIT "commit lock is not locked!\n"); \
ubifs_assert(0); \
} \
} while (0)
#define dbg_dump_stack() do { \
if (!dbg_failure_mode) \
dump_stack(); \
} while (0)
/* Generic debugging messages */
#define dbg_msg(fmt, ...) do { \
spin_lock(&dbg_lock); \
printk(KERN_DEBUG "UBIFS DBG (pid %d): %s: " fmt "\n", current->pid, \
__func__, ##__VA_ARGS__); \
spin_unlock(&dbg_lock); \
} while (0)
#define dbg_do_msg(typ, fmt, ...) do { \
if (ubifs_msg_flags & typ) \
dbg_msg(fmt, ##__VA_ARGS__); \
} while (0)
#define dbg_err(fmt, ...) do { \
spin_lock(&dbg_lock); \
ubifs_err(fmt, ##__VA_ARGS__); \
spin_unlock(&dbg_lock); \
} while (0)
const char *dbg_key_str0(const struct ubifs_info *c,
const union ubifs_key *key);
const char *dbg_key_str1(const struct ubifs_info *c,
const union ubifs_key *key);
/*
* DBGKEY macros require dbg_lock to be held, which it is in the dbg message
* macros.
*/
#define DBGKEY(key) dbg_key_str0(c, (key))
#define DBGKEY1(key) dbg_key_str1(c, (key))
/* General messages */
#define dbg_gen(fmt, ...) dbg_do_msg(UBIFS_MSG_GEN, fmt, ##__VA_ARGS__)
/* Additional journal messages */
#define dbg_jnl(fmt, ...) dbg_do_msg(UBIFS_MSG_JNL, fmt, ##__VA_ARGS__)
/* Additional TNC messages */
#define dbg_tnc(fmt, ...) dbg_do_msg(UBIFS_MSG_TNC, fmt, ##__VA_ARGS__)
/* Additional lprops messages */
#define dbg_lp(fmt, ...) dbg_do_msg(UBIFS_MSG_LP, fmt, ##__VA_ARGS__)
/* Additional LEB find messages */
#define dbg_find(fmt, ...) dbg_do_msg(UBIFS_MSG_FIND, fmt, ##__VA_ARGS__)
/* Additional mount messages */
#define dbg_mnt(fmt, ...) dbg_do_msg(UBIFS_MSG_MNT, fmt, ##__VA_ARGS__)
/* Additional I/O messages */
#define dbg_io(fmt, ...) dbg_do_msg(UBIFS_MSG_IO, fmt, ##__VA_ARGS__)
/* Additional commit messages */
#define dbg_cmt(fmt, ...) dbg_do_msg(UBIFS_MSG_CMT, fmt, ##__VA_ARGS__)
/* Additional budgeting messages */
#define dbg_budg(fmt, ...) dbg_do_msg(UBIFS_MSG_BUDG, fmt, ##__VA_ARGS__)
/* Additional log messages */
#define dbg_log(fmt, ...) dbg_do_msg(UBIFS_MSG_LOG, fmt, ##__VA_ARGS__)
/* Additional gc messages */
#define dbg_gc(fmt, ...) dbg_do_msg(UBIFS_MSG_GC, fmt, ##__VA_ARGS__)
/* Additional scan messages */
#define dbg_scan(fmt, ...) dbg_do_msg(UBIFS_MSG_SCAN, fmt, ##__VA_ARGS__)
/* Additional recovery messages */
#define dbg_rcvry(fmt, ...) dbg_do_msg(UBIFS_MSG_RCVRY, fmt, ##__VA_ARGS__)
/*
* Debugging message type flags (must match msg_type_names in debug.c).
*
* UBIFS_MSG_GEN: general messages
* UBIFS_MSG_JNL: journal messages
* UBIFS_MSG_MNT: mount messages
* UBIFS_MSG_CMT: commit messages
* UBIFS_MSG_FIND: LEB find messages
* UBIFS_MSG_BUDG: budgeting messages
* UBIFS_MSG_GC: garbage collection messages
* UBIFS_MSG_TNC: TNC messages
* UBIFS_MSG_LP: lprops messages
* UBIFS_MSG_IO: I/O messages
* UBIFS_MSG_LOG: log messages
* UBIFS_MSG_SCAN: scan messages
* UBIFS_MSG_RCVRY: recovery messages
*/
enum {
UBIFS_MSG_GEN = 0x1,
UBIFS_MSG_JNL = 0x2,
UBIFS_MSG_MNT = 0x4,
UBIFS_MSG_CMT = 0x8,
UBIFS_MSG_FIND = 0x10,
UBIFS_MSG_BUDG = 0x20,
UBIFS_MSG_GC = 0x40,
UBIFS_MSG_TNC = 0x80,
UBIFS_MSG_LP = 0x100,
UBIFS_MSG_IO = 0x200,
UBIFS_MSG_LOG = 0x400,
UBIFS_MSG_SCAN = 0x800,
UBIFS_MSG_RCVRY = 0x1000,
};
/* Debugging message type flags for each default debug message level */
#define UBIFS_MSG_LVL_0 0
#define UBIFS_MSG_LVL_1 0x1
#define UBIFS_MSG_LVL_2 0x7f
#define UBIFS_MSG_LVL_3 0xffff
/*
* Debugging check flags (must match chk_names in debug.c).
*
* UBIFS_CHK_GEN: general checks
* UBIFS_CHK_TNC: check TNC
* UBIFS_CHK_IDX_SZ: check index size
* UBIFS_CHK_ORPH: check orphans
* UBIFS_CHK_OLD_IDX: check the old index
* UBIFS_CHK_LPROPS: check lprops
* UBIFS_CHK_FS: check the file-system
*/
enum {
UBIFS_CHK_GEN = 0x1,
UBIFS_CHK_TNC = 0x2,
UBIFS_CHK_IDX_SZ = 0x4,
UBIFS_CHK_ORPH = 0x8,
UBIFS_CHK_OLD_IDX = 0x10,
UBIFS_CHK_LPROPS = 0x20,
UBIFS_CHK_FS = 0x40,
};
/*
* Special testing flags (must match tst_names in debug.c).
*
* UBIFS_TST_FORCE_IN_THE_GAPS: force the use of in-the-gaps method
* UBIFS_TST_RCVRY: failure mode for recovery testing
*/
enum {
UBIFS_TST_FORCE_IN_THE_GAPS = 0x2,
UBIFS_TST_RCVRY = 0x4,
};
#if CONFIG_UBIFS_FS_DEBUG_MSG_LVL == 1
#define UBIFS_MSG_FLAGS_DEFAULT UBIFS_MSG_LVL_1
#elif CONFIG_UBIFS_FS_DEBUG_MSG_LVL == 2
#define UBIFS_MSG_FLAGS_DEFAULT UBIFS_MSG_LVL_2
#elif CONFIG_UBIFS_FS_DEBUG_MSG_LVL == 3
#define UBIFS_MSG_FLAGS_DEFAULT UBIFS_MSG_LVL_3
#else
#define UBIFS_MSG_FLAGS_DEFAULT UBIFS_MSG_LVL_0
#endif
#ifdef CONFIG_UBIFS_FS_DEBUG_CHKS
#define UBIFS_CHK_FLAGS_DEFAULT 0xffffffff
#else
#define UBIFS_CHK_FLAGS_DEFAULT 0
#endif
extern spinlock_t dbg_lock;
extern unsigned int ubifs_msg_flags;
extern unsigned int ubifs_chk_flags;
extern unsigned int ubifs_tst_flags;
/* Dump functions */
const char *dbg_ntype(int type);
const char *dbg_cstate(int cmt_state);
const char *dbg_get_key_dump(const struct ubifs_info *c,
const union ubifs_key *key);
void dbg_dump_inode(const struct ubifs_info *c, const struct inode *inode);
void dbg_dump_node(const struct ubifs_info *c, const void *node);
void dbg_dump_budget_req(const struct ubifs_budget_req *req);
void dbg_dump_lstats(const struct ubifs_lp_stats *lst);
void dbg_dump_budg(struct ubifs_info *c);
void dbg_dump_lprop(const struct ubifs_info *c, const struct ubifs_lprops *lp);
void dbg_dump_lprops(struct ubifs_info *c);
void dbg_dump_leb(const struct ubifs_info *c, int lnum);
void dbg_dump_znode(const struct ubifs_info *c,
const struct ubifs_znode *znode);
void dbg_dump_heap(struct ubifs_info *c, struct ubifs_lpt_heap *heap, int cat);
void dbg_dump_pnode(struct ubifs_info *c, struct ubifs_pnode *pnode,
struct ubifs_nnode *parent, int iip);
void dbg_dump_tnc(struct ubifs_info *c);
void dbg_dump_index(struct ubifs_info *c);
/* Checking helper functions */
typedef int (*dbg_leaf_callback)(struct ubifs_info *c,
struct ubifs_zbranch *zbr, void *priv);
typedef int (*dbg_znode_callback)(struct ubifs_info *c,
struct ubifs_znode *znode, void *priv);
int dbg_walk_index(struct ubifs_info *c, dbg_leaf_callback leaf_cb,
dbg_znode_callback znode_cb, void *priv);
/* Checking functions */
int dbg_check_lprops(struct ubifs_info *c);
int dbg_old_index_check_init(struct ubifs_info *c, struct ubifs_zbranch *zroot);
int dbg_check_old_index(struct ubifs_info *c, struct ubifs_zbranch *zroot);
int dbg_check_cats(struct ubifs_info *c);
int dbg_check_ltab(struct ubifs_info *c);
int dbg_check_synced_i_size(struct inode *inode);
int dbg_check_dir_size(struct ubifs_info *c, const struct inode *dir);
int dbg_check_tnc(struct ubifs_info *c, int extra);
int dbg_check_idx_size(struct ubifs_info *c, long long idx_size);
int dbg_check_filesystem(struct ubifs_info *c);
void dbg_check_heap(struct ubifs_info *c, struct ubifs_lpt_heap *heap, int cat,
int add_pos);
int dbg_check_lpt_nodes(struct ubifs_info *c, struct ubifs_cnode *cnode,
int row, int col);
/* Force the use of in-the-gaps method for testing */
#define dbg_force_in_the_gaps_enabled \
(ubifs_tst_flags & UBIFS_TST_FORCE_IN_THE_GAPS)
int dbg_force_in_the_gaps(void);
/* Failure mode for recovery testing */
#define dbg_failure_mode (ubifs_tst_flags & UBIFS_TST_RCVRY)
void dbg_failure_mode_registration(struct ubifs_info *c);
void dbg_failure_mode_deregistration(struct ubifs_info *c);
#ifndef UBIFS_DBG_PRESERVE_UBI
#define ubi_leb_read dbg_leb_read
#define ubi_leb_write dbg_leb_write
#define ubi_leb_change dbg_leb_change
#define ubi_leb_erase dbg_leb_erase
#define ubi_leb_unmap dbg_leb_unmap
#define ubi_is_mapped dbg_is_mapped
#define ubi_leb_map dbg_leb_map
#endif
int dbg_leb_read(struct ubi_volume_desc *desc, int lnum, char *buf, int offset,
int len, int check);
int dbg_leb_write(struct ubi_volume_desc *desc, int lnum, const void *buf,
int offset, int len, int dtype);
int dbg_leb_change(struct ubi_volume_desc *desc, int lnum, const void *buf,
int len, int dtype);
int dbg_leb_erase(struct ubi_volume_desc *desc, int lnum);
int dbg_leb_unmap(struct ubi_volume_desc *desc, int lnum);
int dbg_is_mapped(struct ubi_volume_desc *desc, int lnum);
int dbg_leb_map(struct ubi_volume_desc *desc, int lnum, int dtype);
static inline int dbg_read(struct ubi_volume_desc *desc, int lnum, char *buf,
int offset, int len)
{
return dbg_leb_read(desc, lnum, buf, offset, len, 0);
}
static inline int dbg_write(struct ubi_volume_desc *desc, int lnum,
const void *buf, int offset, int len)
{
return dbg_leb_write(desc, lnum, buf, offset, len, UBI_UNKNOWN);
}
static inline int dbg_change(struct ubi_volume_desc *desc, int lnum,
const void *buf, int len)
{
return dbg_leb_change(desc, lnum, buf, len, UBI_UNKNOWN);
}
#else /* !CONFIG_UBIFS_FS_DEBUG */
#define UBIFS_DBG(op)
#define ubifs_assert(expr) ({})
#define ubifs_assert_cmt_locked(c)
#define dbg_dump_stack()
#define dbg_err(fmt, ...) ({})
#define dbg_msg(fmt, ...) ({})
#define dbg_key(c, key, fmt, ...) ({})
#define dbg_gen(fmt, ...) ({})
#define dbg_jnl(fmt, ...) ({})
#define dbg_tnc(fmt, ...) ({})
#define dbg_lp(fmt, ...) ({})
#define dbg_find(fmt, ...) ({})
#define dbg_mnt(fmt, ...) ({})
#define dbg_io(fmt, ...) ({})
#define dbg_cmt(fmt, ...) ({})
#define dbg_budg(fmt, ...) ({})
#define dbg_log(fmt, ...) ({})
#define dbg_gc(fmt, ...) ({})
#define dbg_scan(fmt, ...) ({})
#define dbg_rcvry(fmt, ...) ({})
#define dbg_ntype(type) ""
#define dbg_cstate(cmt_state) ""
#define dbg_get_key_dump(c, key) ({})
#define dbg_dump_inode(c, inode) ({})
#define dbg_dump_node(c, node) ({})
#define dbg_dump_budget_req(req) ({})
#define dbg_dump_lstats(lst) ({})
#define dbg_dump_budg(c) ({})
#define dbg_dump_lprop(c, lp) ({})
#define dbg_dump_lprops(c) ({})
#define dbg_dump_leb(c, lnum) ({})
#define dbg_dump_znode(c, znode) ({})
#define dbg_dump_heap(c, heap, cat) ({})
#define dbg_dump_pnode(c, pnode, parent, iip) ({})
#define dbg_dump_tnc(c) ({})
#define dbg_dump_index(c) ({})
#define dbg_walk_index(c, leaf_cb, znode_cb, priv) 0
#define dbg_old_index_check_init(c, zroot) 0
#define dbg_check_old_index(c, zroot) 0
#define dbg_check_cats(c) 0
#define dbg_check_ltab(c) 0
#define dbg_check_synced_i_size(inode) 0
#define dbg_check_dir_size(c, dir) 0
#define dbg_check_tnc(c, x) 0
#define dbg_check_idx_size(c, idx_size) 0
#define dbg_check_filesystem(c) 0
#define dbg_check_heap(c, heap, cat, add_pos) ({})
#define dbg_check_lprops(c) 0
#define dbg_check_lpt_nodes(c, cnode, row, col) 0
#define dbg_force_in_the_gaps_enabled 0
#define dbg_force_in_the_gaps() 0
#define dbg_failure_mode 0
#define dbg_failure_mode_registration(c) ({})
#define dbg_failure_mode_deregistration(c) ({})
#endif /* !CONFIG_UBIFS_FS_DEBUG */
#endif /* !__UBIFS_DEBUG_H__ */

fs/ubifs/dir.c (new file, 1240 lines added)

File diff suppressed because it is too large.

fs/ubifs/file.c (new file, 1275 lines added)

File diff suppressed because it is too large.

fs/ubifs/find.c (new file, 975 lines added)

@@ -0,0 +1,975 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Artem Bityutskiy (Битюцкий Артём)
* Adrian Hunter
*/
/*
* This file contains functions for finding LEBs for various purposes e.g.
* garbage collection. In general, lprops category heaps and lists are used
* for fast access, falling back on scanning the LPT as a last resort.
*/
#include <linux/sort.h>
#include "ubifs.h"
/**
* struct scan_data - data provided to scan callback functions
* @min_space: minimum number of bytes for which to scan
* @pick_free: whether it is OK to scan for empty LEBs
* @lnum: LEB number found is returned here
* @exclude_index: whether to exclude index LEBs
*/
struct scan_data {
int min_space;
int pick_free;
int lnum;
int exclude_index;
};
/**
* valuable - determine whether LEB properties are valuable.
* @c: the UBIFS file-system description object
* @lprops: LEB properties
*
* This function returns %1 if the LEB properties should be added to the LEB
* properties tree in memory. Otherwise %0 is returned.
*/
static int valuable(struct ubifs_info *c, const struct ubifs_lprops *lprops)
{
int n, cat = lprops->flags & LPROPS_CAT_MASK;
struct ubifs_lpt_heap *heap;
switch (cat) {
case LPROPS_DIRTY:
case LPROPS_DIRTY_IDX:
case LPROPS_FREE:
heap = &c->lpt_heap[cat - 1];
if (heap->cnt < heap->max_cnt)
return 1;
if (lprops->free + lprops->dirty >= c->dark_wm)
return 1;
return 0;
case LPROPS_EMPTY:
n = c->lst.empty_lebs + c->freeable_cnt -
c->lst.taken_empty_lebs;
if (n < c->lsave_cnt)
return 1;
return 0;
case LPROPS_FREEABLE:
return 1;
case LPROPS_FRDI_IDX:
return 1;
}
return 0;
}
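`valuable()` decides whether an LEB's properties are worth caching in memory during an LPT scan. Below is a simplified userspace model of the same decision table. The category numbering (heap categories first, matching the `cat - 1` heap index) and all names are assumptions made for illustration, and the real function reads these values from `struct ubifs_info` rather than taking them as parameters.

```c
#include <stdbool.h>

/* Assumed category numbering: the three heap categories come first, as the
 * "cat - 1" heap index in valuable() implies. Names are hypothetical. */
enum leb_cat {
	CAT_UNCAT,
	CAT_DIRTY,
	CAT_DIRTY_IDX,
	CAT_FREE,
	CAT_EMPTY,
	CAT_FREEABLE,
	CAT_FRDI_IDX,
};

/* Fill level of the bounded heap for the LEB's category. */
struct heap_state {
	int cnt;
	int max_cnt;
};

/*
 * Model of valuable(): @spare_lebs stands for
 * empty_lebs + freeable_cnt - taken_empty_lebs.
 */
static bool keep_in_memory(enum leb_cat cat, const struct heap_state *heap,
			   int free_plus_dirty, int dark_wm,
			   int spare_lebs, int lsave_cnt)
{
	switch (cat) {
	case CAT_DIRTY:
	case CAT_DIRTY_IDX:
	case CAT_FREE:
		/* Keep if the heap has room or free + dirty reaches @dark_wm. */
		return heap->cnt < heap->max_cnt || free_plus_dirty >= dark_wm;
	case CAT_EMPTY:
		/* Keep while fewer than @lsave_cnt spare LEBs are accounted for. */
		return spare_lebs < lsave_cnt;
	case CAT_FREEABLE:
	case CAT_FRDI_IDX:
		return true;
	default:
		return false;
	}
}
```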
/**
* scan_for_dirty_cb - dirty space scan callback.
* @c: the UBIFS file-system description object
* @lprops: LEB properties to scan
* @in_tree: whether the LEB properties are in main memory
* @data: information passed to and from the caller of the scan
*
* This function returns a code that indicates whether the scan should continue
* (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
* in main memory (%LPT_SCAN_ADD), or whether the scan should stop
* (%LPT_SCAN_STOP).
*/
static int scan_for_dirty_cb(struct ubifs_info *c,
const struct ubifs_lprops *lprops, int in_tree,
struct scan_data *data)
{
int ret = LPT_SCAN_CONTINUE;
/* Exclude LEBs that are currently in use */
if (lprops->flags & LPROPS_TAKEN)
return LPT_SCAN_CONTINUE;
/* Determine whether to add these LEB properties to the tree */
if (!in_tree && valuable(c, lprops))
ret |= LPT_SCAN_ADD;
/* Exclude LEBs with too little space */
if (lprops->free + lprops->dirty < data->min_space)
return ret;
/* If specified, exclude index LEBs */
if (data->exclude_index && lprops->flags & LPROPS_INDEX)
return ret;
/* If specified, exclude empty or freeable LEBs */
if (lprops->free + lprops->dirty == c->leb_size) {
if (!data->pick_free)
return ret;
/* Exclude LEBs with too little dirty space (unless it is empty) */
} else if (lprops->dirty < c->dead_wm)
return ret;
/* Finally we found space */
data->lnum = lprops->lnum;
return LPT_SCAN_ADD | LPT_SCAN_STOP;
}
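The filter order in `scan_for_dirty_cb()` is easiest to follow on a toy table. The sketch below is a hypothetical userspace model: `SCAN_CONTINUE`, `SCAN_ADD` and `SCAN_STOP` stand in for the real `LPT_SCAN_*` codes (their values are assumed here), the "valuable" tree-caching logic is left out, and a linear loop over an array replaces `ubifs_lpt_scan_nolock()`.

```c
#include <stdbool.h>

/* Stand-ins for the LPT_SCAN_* return codes; the values are assumed. */
#define SCAN_CONTINUE 0
#define SCAN_ADD      1
#define SCAN_STOP     2

/* Illustrative subset of LEB properties; not the real struct ubifs_lprops. */
struct leb_props {
	int lnum;
	int free;
	int dirty;
	bool taken;
	bool index;
};

/* Inputs and output of one scan, loosely modelled on struct scan_data. */
struct dirty_query {
	int leb_size;
	int dead_wm;		/* minimum dirty space worth garbage-collecting */
	int min_space;		/* required free + dirty space */
	bool pick_free;		/* may an empty or freeable LEB be returned? */
	bool exclude_index;
	int lnum;		/* result, -1 if nothing was found */
};

/* Same filter order as scan_for_dirty_cb(), without the "valuable" logic. */
static int dirty_cb(const struct leb_props *lp, struct dirty_query *q)
{
	if (lp->taken)
		return SCAN_CONTINUE;
	if (lp->free + lp->dirty < q->min_space)
		return SCAN_CONTINUE;
	if (q->exclude_index && lp->index)
		return SCAN_CONTINUE;
	if (lp->free + lp->dirty == q->leb_size) {
		if (!q->pick_free)
			return SCAN_CONTINUE;
	} else if (lp->dirty < q->dead_wm) {
		return SCAN_CONTINUE;
	}
	q->lnum = lp->lnum;
	return SCAN_ADD | SCAN_STOP;
}

/* Linear stand-in for the LPT scan: stop at the first accepted LEB. */
static int scan_array(const struct leb_props *arr, int n, struct dirty_query *q)
{
	q->lnum = -1;
	for (int i = 0; i < n; i++)
		if (dirty_cb(&arr[i], q) & SCAN_STOP)
			break;
	return q->lnum;
}
```

With `pick_free` off, a completely free LEB is skipped even though it has the most space; with it on, that LEB is accepted first.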
/**
* scan_for_dirty - find an LEB with enough free plus dirty space.
* @c: the UBIFS file-system description object
* @min_space: minimum amount of free plus dirty space the returned LEB has to
* have
* @pick_free: if it is OK to return a free or freeable LEB
* @exclude_index: whether to exclude index LEBs
*
* This function returns a pointer to the LEB properties found or a negative
* error code.
*/
static const struct ubifs_lprops *scan_for_dirty(struct ubifs_info *c,
int min_space, int pick_free,
int exclude_index)
{
const struct ubifs_lprops *lprops;
struct ubifs_lpt_heap *heap;
struct scan_data data;
int err, i;
/* There may be an LEB with enough dirty space on the free heap */
heap = &c->lpt_heap[LPROPS_FREE - 1];
for (i = 0; i < heap->cnt; i++) {
lprops = heap->arr[i];
if (lprops->free + lprops->dirty < min_space)
continue;
if (lprops->dirty < c->dead_wm)
continue;
return lprops;
}
/*
* An LEB may have fallen off the bottom of the dirty heap and ended up
* as uncategorized even though it now has enough dirty space for us,
* so check the uncategorized list. N.B. neither empty nor freeable LEBs
* can end up as uncategorized, because they are kept on lists, not
* finite-sized heaps.
*/
list_for_each_entry(lprops, &c->uncat_list, list) {
if (lprops->flags & LPROPS_TAKEN)
continue;
if (lprops->free + lprops->dirty < min_space)
continue;
if (exclude_index && (lprops->flags & LPROPS_INDEX))
continue;
if (lprops->dirty < c->dead_wm)
continue;
return lprops;
}
/* We have looked everywhere in main memory, now scan the flash */
if (c->pnodes_have >= c->pnode_cnt)
/* All pnodes are in memory, so skip scan */
return ERR_PTR(-ENOSPC);
data.min_space = min_space;
data.pick_free = pick_free;
data.lnum = -1;
data.exclude_index = exclude_index;
err = ubifs_lpt_scan_nolock(c, -1, c->lscan_lnum,
(ubifs_lpt_scan_callback)scan_for_dirty_cb,
&data);
if (err)
return ERR_PTR(err);
ubifs_assert(data.lnum >= c->main_first && data.lnum < c->leb_cnt);
c->lscan_lnum = data.lnum;
lprops = ubifs_lpt_lookup_dirty(c, data.lnum);
if (IS_ERR(lprops))
return lprops;
ubifs_assert(lprops->lnum == data.lnum);
ubifs_assert(lprops->free + lprops->dirty >= min_space);
ubifs_assert(lprops->dirty >= c->dead_wm ||
(pick_free &&
lprops->free + lprops->dirty == c->leb_size));
ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
ubifs_assert(!exclude_index || !(lprops->flags & LPROPS_INDEX));
return lprops;
}
/**
* ubifs_find_dirty_leb - find a dirty LEB for the Garbage Collector.
* @c: the UBIFS file-system description object
* @ret_lp: LEB properties are returned here on exit
* @min_space: minimum amount of free plus dirty space the returned LEB has to
* have
* @pick_free: controls whether it is OK to pick empty or index LEBs
*
* This function tries to find a dirty logical eraseblock which has at least
* @min_space free and dirty space. It prefers to take an LEB from the dirty or
* dirty index heap, and it falls back to LPT scanning if the heaps are empty
* or do not have an LEB which satisfies the @min_space criteria.
*
* Note:
* o LEBs which have less than dead watermark of dirty space are never picked
* by this function;
*
* This function returns zero and the LEB properties of the found dirty LEB in
* case of success, %-ENOSPC if no dirty LEB was found, and a negative error
* code in case of other failures. The returned LEB is marked as "taken".
*
* The additional @pick_free argument controls whether this function may return
* a free or freeable LEB if one is present. For example, GC must set it to %1
* when called from the journal space reservation function, because the
* appearance of free space may coincide with the loss of enough dirty space
* for GC to succeed anyway.
*
* In contrast, if the Garbage Collector is called from budgeting, it should
* just make free space, not return LEBs which are already free or freeable.
*
* In addition, @pick_free is set to %2 by the recovery process in order to
* recover gc_lnum, in which case an index LEB must not be returned.
*/
int ubifs_find_dirty_leb(struct ubifs_info *c, struct ubifs_lprops *ret_lp,
int min_space, int pick_free)
{
int err = 0, sum, exclude_index = pick_free == 2 ? 1 : 0;
const struct ubifs_lprops *lp = NULL, *idx_lp = NULL;
struct ubifs_lpt_heap *heap, *idx_heap;
ubifs_get_lprops(c);
if (pick_free) {
int lebs, rsvd_idx_lebs = 0;
spin_lock(&c->space_lock);
lebs = c->lst.empty_lebs;
lebs += c->freeable_cnt - c->lst.taken_empty_lebs;
/*
* Note, the index may consume more LEBs than have been reserved
* for it. It is OK because it might be consolidated by GC.
* But if the index takes fewer LEBs than are reserved for it,
* this function must avoid picking those reserved LEBs.
*/
if (c->min_idx_lebs >= c->lst.idx_lebs) {
rsvd_idx_lebs = c->min_idx_lebs - c->lst.idx_lebs;
exclude_index = 1;
}
spin_unlock(&c->space_lock);
/* Check if there are enough free LEBs for the index */
if (rsvd_idx_lebs < lebs) {
/* OK, try to find an empty LEB */
lp = ubifs_fast_find_empty(c);
if (lp)
goto found;
/* Or a freeable LEB */
lp = ubifs_fast_find_freeable(c);
if (lp)
goto found;
} else
/*
* We cannot pick free/freeable LEBs in the below code.
*/
pick_free = 0;
} else {
spin_lock(&c->space_lock);
exclude_index = (c->min_idx_lebs >= c->lst.idx_lebs);
spin_unlock(&c->space_lock);
}
/* Look on the dirty and dirty index heaps */
heap = &c->lpt_heap[LPROPS_DIRTY - 1];
idx_heap = &c->lpt_heap[LPROPS_DIRTY_IDX - 1];
if (idx_heap->cnt && !exclude_index) {
idx_lp = idx_heap->arr[0];
sum = idx_lp->free + idx_lp->dirty;
/*
* Since we reserve twice as much space for the index as it
* actually takes, it does not make sense to pick index LEBs
* with less than half an LEB of free plus dirty space.
*/
if (sum < min_space || sum < c->half_leb_size)
idx_lp = NULL;
}
if (heap->cnt) {
lp = heap->arr[0];
if (lp->dirty + lp->free < min_space)
lp = NULL;
}
/* Pick the LEB with most space */
if (idx_lp && lp) {
if (idx_lp->free + idx_lp->dirty >= lp->free + lp->dirty)
lp = idx_lp;
} else if (idx_lp && !lp)
lp = idx_lp;
if (lp) {
ubifs_assert(lp->dirty >= c->dead_wm);
goto found;
}
/* Did not find a dirty LEB on the dirty heaps, have to scan */
dbg_find("scanning LPT for a dirty LEB");
lp = scan_for_dirty(c, min_space, pick_free, exclude_index);
if (IS_ERR(lp)) {
err = PTR_ERR(lp);
goto out;
}
ubifs_assert(lp->dirty >= c->dead_wm ||
(pick_free && lp->free + lp->dirty == c->leb_size));
found:
dbg_find("found LEB %d, free %d, dirty %d, flags %#x",
lp->lnum, lp->free, lp->dirty, lp->flags);
lp = ubifs_change_lp(c, lp, LPROPS_NC, LPROPS_NC,
lp->flags | LPROPS_TAKEN, 0);
if (IS_ERR(lp)) {
err = PTR_ERR(lp);
goto out;
}
memcpy(ret_lp, lp, sizeof(struct ubifs_lprops));
out:
ubifs_release_lprops(c);
return err;
}
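When both heaps have candidates, `ubifs_find_dirty_leb()` compares only the top of the dirty heap with the top of the dirty index heap, drops an index LEB that offers less than half an LEB of free plus dirty space, and prefers the index LEB on ties. A hypothetical standalone version of that comparison follows; `struct leb_space` and `pick_dirtiest()` are illustrative names, not UBIFS types.

```c
#include <stddef.h>

/* Illustrative subset of LEB properties. */
struct leb_space {
	int lnum;
	int free;
	int dirty;
};

/*
 * Choose between the top of the dirty heap (@data) and the top of the dirty
 * index heap (@idx); either may be NULL. Returns the chosen LEB or NULL.
 */
static const struct leb_space *pick_dirtiest(const struct leb_space *data,
					      const struct leb_space *idx,
					      int min_space, int half_leb_size,
					      int exclude_index)
{
	if (idx) {
		int sum = idx->free + idx->dirty;

		/* Index LEBs need at least half an LEB of free + dirty space. */
		if (exclude_index || sum < min_space || sum < half_leb_size)
			idx = NULL;
	}
	if (data && data->free + data->dirty < min_space)
		data = NULL;
	/* Prefer the index LEB on ties, as the ">=" in the original does. */
	if (idx && data)
		return idx->free + idx->dirty >= data->free + data->dirty ?
		       idx : data;
	return idx ? idx : data;
}
```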
/**
* scan_for_free_cb - free space scan callback.
* @c: the UBIFS file-system description object
* @lprops: LEB properties to scan
* @in_tree: whether the LEB properties are in main memory
* @data: information passed to and from the caller of the scan
*
* This function returns a code that indicates whether the scan should continue
* (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
* in main memory (%LPT_SCAN_ADD), or whether the scan should stop
* (%LPT_SCAN_STOP).
*/
static int scan_for_free_cb(struct ubifs_info *c,
const struct ubifs_lprops *lprops, int in_tree,
struct scan_data *data)
{
int ret = LPT_SCAN_CONTINUE;
/* Exclude LEBs that are currently in use */
if (lprops->flags & LPROPS_TAKEN)
return LPT_SCAN_CONTINUE;
/* Determine whether to add these LEB properties to the tree */
if (!in_tree && valuable(c, lprops))
ret |= LPT_SCAN_ADD;
/* Exclude index LEBs */
if (lprops->flags & LPROPS_INDEX)
return ret;
/* Exclude LEBs with too little space */
if (lprops->free < data->min_space)
return ret;
/* If specified, exclude empty LEBs */
if (!data->pick_free && lprops->free == c->leb_size)
return ret;
/*
* LEBs that have only free and dirty space must not be allocated
* because they may have been unmapped already or they may have data
* that is obsolete only because of nodes that are still sitting in a
* wbuf.
*/
if (lprops->free + lprops->dirty == c->leb_size && lprops->dirty > 0)
return ret;
/* Finally we found space */
data->lnum = lprops->lnum;
return LPT_SCAN_ADD | LPT_SCAN_STOP;
}
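The last check in `scan_for_free_cb()` is the subtle one: an LEB whose space is entirely free plus dirty, but not entirely free, is rejected for data because it may already be unmapped or its dirty space may depend on nodes still sitting in a write-buffer. The predicate below restates the callback's rules as a hypothetical userspace function; the struct and function names are illustrative, not UBIFS types.

```c
#include <stdbool.h>

/* Illustrative subset of LEB properties; not the real struct ubifs_lprops. */
struct leb_props {
	int lnum;
	int free;
	int dirty;
	bool taken;
	bool index;
};

/* Same acceptance rules as scan_for_free_cb(), as a plain predicate. */
static bool usable_for_data(const struct leb_props *lp, int leb_size,
			    int min_space, bool pick_free)
{
	if (lp->taken || lp->index)
		return false;
	if (lp->free < min_space)
		return false;
	/* Empty LEBs only when the caller allows it. */
	if (!pick_free && lp->free == leb_size)
		return false;
	/* Freeable but not empty: may be unmapped or depend on wbuf contents. */
	if (lp->free + lp->dirty == leb_size && lp->dirty > 0)
		return false;
	return true;
}
```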
/**
* do_find_free_space - find a data LEB with free space.
* @c: the UBIFS file-system description object
* @min_space: minimum amount of free space required
* @pick_free: whether it is OK to scan for empty LEBs
* @squeeze: whether to try to find space in a non-empty LEB first
*
* This function returns a pointer to the LEB properties found or a negative
* error code.
*/
static
const struct ubifs_lprops *do_find_free_space(struct ubifs_info *c,
int min_space, int pick_free,
int squeeze)
{
const struct ubifs_lprops *lprops;
struct ubifs_lpt_heap *heap;
struct scan_data data;
int err, i;
if (squeeze) {
lprops = ubifs_fast_find_free(c);
if (lprops && lprops->free >= min_space)
return lprops;
}
if (pick_free) {
lprops = ubifs_fast_find_empty(c);
if (lprops)
return lprops;
}
if (!squeeze) {
lprops = ubifs_fast_find_free(c);
if (lprops && lprops->free >= min_space)
return lprops;
}
/* There may be an LEB with enough free space on the dirty heap */
heap = &c->lpt_heap[LPROPS_DIRTY - 1];
for (i = 0; i < heap->cnt; i++) {
lprops = heap->arr[i];
if (lprops->free >= min_space)
return lprops;
}
/*
* An LEB may have fallen off the bottom of the free heap and ended up
* as uncategorized even though it now has enough free space for us,
* so check the uncategorized list. N.B. neither empty nor freeable LEBs
* can end up as uncategorized, because they are kept on lists, not
* finite-sized heaps.
*/
list_for_each_entry(lprops, &c->uncat_list, list) {
if (lprops->flags & LPROPS_TAKEN)
continue;
if (lprops->flags & LPROPS_INDEX)
continue;
if (lprops->free >= min_space)
return lprops;
}
/* We have looked everywhere in main memory, now scan the flash */
if (c->pnodes_have >= c->pnode_cnt)
/* All pnodes are in memory, so skip scan */
return ERR_PTR(-ENOSPC);
data.min_space = min_space;
data.pick_free = pick_free;
data.lnum = -1;
err = ubifs_lpt_scan_nolock(c, -1, c->lscan_lnum,
(ubifs_lpt_scan_callback)scan_for_free_cb,
&data);
if (err)
return ERR_PTR(err);
ubifs_assert(data.lnum >= c->main_first && data.lnum < c->leb_cnt);
c->lscan_lnum = data.lnum;
lprops = ubifs_lpt_lookup_dirty(c, data.lnum);
if (IS_ERR(lprops))
return lprops;
ubifs_assert(lprops->lnum == data.lnum);
ubifs_assert(lprops->free >= min_space);
ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
ubifs_assert(!(lprops->flags & LPROPS_INDEX));
return lprops;
}
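The `squeeze` and `pick_free` flags only reorder the first two in-memory lookups in `do_find_free_space()`; the dirty heap, the uncategorized list and the LPT scan always follow in that order. The sketch below is a hypothetical model that only records the search order (the real function returns as soon as one lookup succeeds); the enum names are invented labels for the calls made in the function above.

```c
#include <stdbool.h>

/* Hypothetical labels for the lookups made by do_find_free_space(). */
enum source {
	SRC_FAST_FREE,	/* ubifs_fast_find_free() */
	SRC_FAST_EMPTY,	/* ubifs_fast_find_empty() */
	SRC_DIRTY_HEAP,	/* walk of the LPROPS_DIRTY heap */
	SRC_UNCAT_LIST,	/* walk of c->uncat_list */
	SRC_LPT_SCAN,	/* ubifs_lpt_scan_nolock(), unless all pnodes are cached */
};

/* Record the order in which places are searched; returns how many. */
static int free_space_search_order(bool squeeze, bool pick_free,
				   enum source order[5])
{
	int n = 0;

	if (squeeze)
		order[n++] = SRC_FAST_FREE;
	if (pick_free)
		order[n++] = SRC_FAST_EMPTY;
	if (!squeeze)
		order[n++] = SRC_FAST_FREE;
	order[n++] = SRC_DIRTY_HEAP;
	order[n++] = SRC_UNCAT_LIST;
	order[n++] = SRC_LPT_SCAN;
	return n;
}
```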
/**
* ubifs_find_free_space - find a data LEB with free space.
* @c: the UBIFS file-system description object
* @min_space: minimum amount of required free space
* @free: contains amount of free space in the LEB on exit
* @squeeze: whether to try to find space in a non-empty LEB first
*
* This function looks for an LEB with at least @min_space bytes of free space.
* It tries to find an empty LEB if possible. If no empty LEBs are available,
* this function searches for a non-empty data LEB. The returned LEB is marked
* as "taken".
*
* This function returns the found LEB number in case of success, %-ENOSPC if
* it failed to find an LEB with @min_space bytes of free space, and other
* negative error codes in case of failure.
*/
int ubifs_find_free_space(struct ubifs_info *c, int min_space, int *free,
int squeeze)
{
const struct ubifs_lprops *lprops;
int lebs, rsvd_idx_lebs, pick_free = 0, err, lnum, flags;
dbg_find("min_space %d", min_space);
ubifs_get_lprops(c);
/* Check if there are enough empty LEBs for commit */
spin_lock(&c->space_lock);
if (c->min_idx_lebs > c->lst.idx_lebs)
rsvd_idx_lebs = c->min_idx_lebs - c->lst.idx_lebs;
else
rsvd_idx_lebs = 0;
lebs = c->lst.empty_lebs + c->freeable_cnt + c->idx_gc_cnt -
c->lst.taken_empty_lebs;
ubifs_assert(lebs + c->lst.idx_lebs >= c->min_idx_lebs);
if (rsvd_idx_lebs < lebs)
/*
* OK to allocate an empty LEB, but we still don't want to go
* looking for one if there aren't any.
*/
if (c->lst.empty_lebs - c->lst.taken_empty_lebs > 0) {
pick_free = 1;
/*
* Because we release the space lock, we must account
* for this allocation here. After the LEB properties
* flags have been updated, we subtract one. Note, the
* result of this is that lprops also decreases
* @taken_empty_lebs in 'ubifs_change_lp()', so it is
* off by one for a short period of time, which may
* introduce a small disturbance to budgeting
* calculations. This is harmless, because in the
* worst case it would make the budgeting subsystem
* more pessimistic than needed.
*
* Fundamentally, this is about serialization of the
* budgeting and lprops subsystems. We could make the
* @space_lock a mutex and avoid dropping it before
* calling 'ubifs_change_lp()', but a mutex is more
* heavy-weight, and we want budgeting to be as fast as
* possible.
*/
c->lst.taken_empty_lebs += 1;
}
spin_unlock(&c->space_lock);
lprops = do_find_free_space(c, min_space, pick_free, squeeze);
if (IS_ERR(lprops)) {
err = PTR_ERR(lprops);
goto out;
}
lnum = lprops->lnum;
flags = lprops->flags | LPROPS_TAKEN;
lprops = ubifs_change_lp(c, lprops, LPROPS_NC, LPROPS_NC, flags, 0);
if (IS_ERR(lprops)) {
err = PTR_ERR(lprops);
goto out;
}
if (pick_free) {
spin_lock(&c->space_lock);
c->lst.taken_empty_lebs -= 1;
spin_unlock(&c->space_lock);
}
*free = lprops->free;
ubifs_release_lprops(c);
if (*free == c->leb_size) {
/*
* Ensure that empty LEBs have been unmapped. They may not have
* been, for example, because of an unclean unmount. Also
* LEBs that were freeable LEBs (free + dirty == leb_size) will
* not have been unmapped.
*/
err = ubifs_leb_unmap(c, lnum);
if (err)
return err;
}
dbg_find("found LEB %d, free %d", lnum, *free);
ubifs_assert(*free >= min_space);
return lnum;
out:
if (pick_free) {
spin_lock(&c->space_lock);
c->lst.taken_empty_lebs -= 1;
spin_unlock(&c->space_lock);
}
ubifs_release_lprops(c);
return err;
}
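The reservation check at the start of `ubifs_find_free_space()` is plain arithmetic: an empty LEB may be handed out for data only while more LEBs are available (empty, freeable and index-GC LEBs, minus those already taken) than the index still needs to grow into, and only if an untaken empty LEB actually exists. Below is a hypothetical restatement, with the counters passed in as a struct instead of being read under `c->space_lock`; the struct and function names are illustrative.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters read under c->space_lock. */
struct space_counts {
	int min_idx_lebs;	/* LEBs the index must be able to occupy */
	int idx_lebs;		/* LEBs the index occupies now */
	int empty_lebs;
	int freeable_cnt;
	int idx_gc_cnt;
	int taken_empty_lebs;
};

/* Mirrors how ubifs_find_free_space() decides whether to set pick_free. */
static bool may_pick_empty(const struct space_counts *s)
{
	int rsvd_idx_lebs = s->min_idx_lebs > s->idx_lebs ?
			    s->min_idx_lebs - s->idx_lebs : 0;
	int lebs = s->empty_lebs + s->freeable_cnt + s->idx_gc_cnt -
		   s->taken_empty_lebs;

	/* Keep enough LEBs in reserve for the index. */
	if (rsvd_idx_lebs >= lebs)
		return false;
	/* Only look for an empty LEB if an untaken one exists. */
	return s->empty_lebs - s->taken_empty_lebs > 0;
}
```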
/**
* scan_for_idx_cb - callback used by the scan for a free LEB for the index.
* @c: the UBIFS file-system description object
* @lprops: LEB properties to scan
* @in_tree: whether the LEB properties are in main memory
* @data: information passed to and from the caller of the scan
*
* This function returns a code that indicates whether the scan should continue
* (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
* in main memory (%LPT_SCAN_ADD), or whether the scan should stop
* (%LPT_SCAN_STOP).
*/
static int scan_for_idx_cb(struct ubifs_info *c,
const struct ubifs_lprops *lprops, int in_tree,
struct scan_data *data)
{
int ret = LPT_SCAN_CONTINUE;
/* Exclude LEBs that are currently in use */
if (lprops->flags & LPROPS_TAKEN)
return LPT_SCAN_CONTINUE;
/* Determine whether to add these LEB properties to the tree */
if (!in_tree && valuable(c, lprops))
ret |= LPT_SCAN_ADD;
/* Exclude index LEBs */
if (lprops->flags & LPROPS_INDEX)
return ret;
/* Exclude LEBs that cannot be made empty */
if (lprops->free + lprops->dirty != c->leb_size)
return ret;
/*
* We are allocating for the index so it is safe to allocate LEBs with
* only free and dirty space, because write buffers are sync'd at commit
* start.
*/
data->lnum = lprops->lnum;
return LPT_SCAN_ADD | LPT_SCAN_STOP;
}
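Compared with `scan_for_free_cb()`, the index callback accepts exactly the kind of LEB the data path rejects: a non-index LEB whose space is entirely free plus dirty, which is safe here because write-buffers are synced at commit start. A hypothetical userspace restatement, with illustrative names:

```c
#include <stdbool.h>

/* Illustrative subset of LEB properties; not the real struct ubifs_lprops. */
struct leb_props {
	int lnum;
	int free;
	int dirty;
	bool taken;
	bool index;
};

/* Same checks as scan_for_idx_cb(): only LEBs that can be made empty. */
static bool usable_for_index(const struct leb_props *lp, int leb_size)
{
	if (lp->taken || lp->index)
		return false;
	return lp->free + lp->dirty == leb_size;
}
```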
/**
* scan_for_leb_for_idx - scan for a free LEB for the index.
* @c: the UBIFS file-system description object
*/
static const struct ubifs_lprops *scan_for_leb_for_idx(struct ubifs_info *c)
{
struct ubifs_lprops *lprops;
struct scan_data data;
int err;
data.lnum = -1;
err = ubifs_lpt_scan_nolock(c, -1, c->lscan_lnum,
(ubifs_lpt_scan_callback)scan_for_idx_cb,
&data);
if (err)
return ERR_PTR(err);
ubifs_assert(data.lnum >= c->main_first && data.lnum < c->leb_cnt);
c->lscan_lnum = data.lnum;
lprops = ubifs_lpt_lookup_dirty(c, data.lnum);
if (IS_ERR(lprops))
return lprops;
ubifs_assert(lprops->lnum == data.lnum);
ubifs_assert(lprops->free + lprops->dirty == c->leb_size);
ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
ubifs_assert(!(lprops->flags & LPROPS_INDEX));
return lprops;
}
/**
* ubifs_find_free_leb_for_idx - find a free LEB for the index.
* @c: the UBIFS file-system description object
*
* This function looks for a free LEB and returns that LEB number. The returned
* LEB is marked as "taken", "index".
*
* Only empty LEBs are allocated. This is for two reasons. First, the commit
* calculates the number of LEBs to allocate based on the assumption that they
* will be empty. Second, free space at the end of an index LEB is not
* guaranteed to be empty because it may have been used by the in-the-gaps
* method prior to an unclean unmount.
*
* If no LEB is found, %-ENOSPC is returned. For other failures another negative
* error code is returned.
*/
int ubifs_find_free_leb_for_idx(struct ubifs_info *c)
{
const struct ubifs_lprops *lprops;
int lnum = -1, err, flags;
ubifs_get_lprops(c);
lprops = ubifs_fast_find_empty(c);
if (!lprops) {
lprops = ubifs_fast_find_freeable(c);
if (!lprops) {
ubifs_assert(c->freeable_cnt == 0);
if (c->lst.empty_lebs - c->lst.taken_empty_lebs > 0) {
lprops = scan_for_leb_for_idx(c);
if (IS_ERR(lprops)) {
err = PTR_ERR(lprops);
goto out;
}
}
}
}
if (!lprops) {
err = -ENOSPC;
goto out;
}
lnum = lprops->lnum;
dbg_find("found LEB %d, free %d, dirty %d, flags %#x",
lnum, lprops->free, lprops->dirty, lprops->flags);
flags = lprops->flags | LPROPS_TAKEN | LPROPS_INDEX;
lprops = ubifs_change_lp(c, lprops, c->leb_size, 0, flags, 0);
if (IS_ERR(lprops)) {
err = PTR_ERR(lprops);
goto out;
}
ubifs_release_lprops(c);
/*
* Ensure that empty LEBs have been unmapped. They may not have been,
* for example, because of an unclean unmount. Also LEBs that were
* freeable LEBs (free + dirty == leb_size) will not have been unmapped.
*/
err = ubifs_leb_unmap(c, lnum);
if (err) {
ubifs_change_one_lp(c, lnum, LPROPS_NC, LPROPS_NC, 0,
LPROPS_TAKEN | LPROPS_INDEX, 0);
return err;
}
return lnum;
out:
ubifs_release_lprops(c);
return err;
}
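/*
* cmp_dirty_idx - sort() comparison callback: orders index LEBs by ascending
* free plus dirty space, so the dirtiest one ends up last.
*/
static int cmp_dirty_idx(const struct ubifs_lprops **a,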
const struct ubifs_lprops **b)
{
const struct ubifs_lprops *lpa = *a;
const struct ubifs_lprops *lpb = *b;
return lpa->dirty + lpa->free - lpb->dirty - lpb->free;
}
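/*
* swap_dirty_idx - sort() swap callback: exchanges two lprops pointers. @size
* is unused but required by the callback prototype.
*/
static void swap_dirty_idx(struct ubifs_lprops **a, struct ubifs_lprops **b,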
int size)
{
struct ubifs_lprops *t = *a;
*a = *b;
*b = t;
}
/**
* ubifs_save_dirty_idx_lnums - save an array of the most dirty index LEB nos.
* @c: the UBIFS file-system description object
*
* This function is called each commit to create an array of LEB numbers of
* dirty index LEBs sorted in order of dirty and free space. This is used by
* the in-the-gaps method of TNC commit.
*/
int ubifs_save_dirty_idx_lnums(struct ubifs_info *c)
{
int i;
ubifs_get_lprops(c);
/* Copy the LPROPS_DIRTY_IDX heap */
c->dirty_idx.cnt = c->lpt_heap[LPROPS_DIRTY_IDX - 1].cnt;
memcpy(c->dirty_idx.arr, c->lpt_heap[LPROPS_DIRTY_IDX - 1].arr,
sizeof(void *) * c->dirty_idx.cnt);
/* Sort it so that the dirtiest is now at the end */
sort(c->dirty_idx.arr, c->dirty_idx.cnt, sizeof(void *),
(int (*)(const void *, const void *))cmp_dirty_idx,
(void (*)(void *, void *, int))swap_dirty_idx);
dbg_find("found %d dirty index LEBs", c->dirty_idx.cnt);
if (c->dirty_idx.cnt)
dbg_find("dirtiest index LEB is %d with dirty %d and free %d",
c->dirty_idx.arr[c->dirty_idx.cnt - 1]->lnum,
c->dirty_idx.arr[c->dirty_idx.cnt - 1]->dirty,
c->dirty_idx.arr[c->dirty_idx.cnt - 1]->free);
/* Replace the lprops pointers with LEB numbers */
for (i = 0; i < c->dirty_idx.cnt; i++)
c->dirty_idx.arr[i] = (void *)(size_t)c->dirty_idx.arr[i]->lnum;
ubifs_release_lprops(c);
return 0;
}
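`ubifs_save_dirty_idx_lnums()` copies the dirty index heap, sorts it in ascending order of free plus dirty space so the dirtiest LEB ends up last, and then replaces each pointer with its LEB number. The userspace sketch below does the same with the C library's `qsort()`. It writes LEB numbers to a separate `int` array instead of reusing the pointer slots via a `(void *)(size_t)` cast, and it compares with `(a > b) - (a < b)`, a common idiom that avoids overflow, rather than subtraction. Both are choices made for this sketch, and `struct leb_space` and `save_dirty_lnums()` are illustrative names.

```c
#include <stdlib.h>

/* Illustrative subset of LEB properties. */
struct leb_space {
	int lnum;
	int free;
	int dirty;
};

/* Ascending by free + dirty, so the dirtiest index LEB ends up last. */
static int cmp_space(const void *a, const void *b)
{
	const struct leb_space *lpa = *(const struct leb_space *const *)a;
	const struct leb_space *lpb = *(const struct leb_space *const *)b;
	int sa = lpa->free + lpa->dirty, sb = lpb->free + lpb->dirty;

	return (sa > sb) - (sa < sb);
}

/* Sort @n pointers in place and write the matching LEB numbers to @lnums. */
static void save_dirty_lnums(const struct leb_space **arr, int n, int *lnums)
{
	qsort(arr, n, sizeof(*arr), cmp_space);
	for (int i = 0; i < n; i++)
		lnums[i] = arr[i]->lnum;
}
```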
/**
* scan_dirty_idx_cb - callback used by the scan for a dirty index LEB.
* @c: the UBIFS file-system description object
* @lprops: LEB properties to scan
* @in_tree: whether the LEB properties are in main memory
* @data: information passed to and from the caller of the scan
*
* This function returns a code that indicates whether the scan should continue
* (%LPT_SCAN_CONTINUE), whether the LEB properties should be added to the tree
* in main memory (%LPT_SCAN_ADD), or whether the scan should stop
* (%LPT_SCAN_STOP).
*/
static int scan_dirty_idx_cb(struct ubifs_info *c,
const struct ubifs_lprops *lprops, int in_tree,
struct scan_data *data)
{
int ret = LPT_SCAN_CONTINUE;
/* Exclude LEBs that are currently in use */
if (lprops->flags & LPROPS_TAKEN)
return LPT_SCAN_CONTINUE;
/* Determine whether to add these LEB properties to the tree */
if (!in_tree && valuable(c, lprops))
ret |= LPT_SCAN_ADD;
/* Exclude non-index LEBs */
if (!(lprops->flags & LPROPS_INDEX))
return ret;
/* Exclude LEBs with too little space */
if (lprops->free + lprops->dirty < c->min_idx_node_sz)
return ret;
/* Finally we found space */
data->lnum = lprops->lnum;
return LPT_SCAN_ADD | LPT_SCAN_STOP;
}
/**
* find_dirty_idx_leb - find a dirty index LEB.
* @c: the UBIFS file-system description object
*
* This function returns LEB number upon success and a negative error code upon
* failure. In particular, %-ENOSPC is returned if a dirty index LEB is not
* found.
*
* Note that this function scans the entire LPT but it is called very rarely.
*/
static int find_dirty_idx_leb(struct ubifs_info *c)
{
const struct ubifs_lprops *lprops;
struct ubifs_lpt_heap *heap;
struct scan_data data;
int err, i, ret;
/* Check all structures in memory first */
data.lnum = -1;
heap = &c->lpt_heap[LPROPS_DIRTY_IDX - 1];
for (i = 0; i < heap->cnt; i++) {
lprops = heap->arr[i];
ret = scan_dirty_idx_cb(c, lprops, 1, &data);
if (ret & LPT_SCAN_STOP)
goto found;
}
list_for_each_entry(lprops, &c->frdi_idx_list, list) {
ret = scan_dirty_idx_cb(c, lprops, 1, &data);
if (ret & LPT_SCAN_STOP)
goto found;
}
list_for_each_entry(lprops, &c->uncat_list, list) {
ret = scan_dirty_idx_cb(c, lprops, 1, &data);
if (ret & LPT_SCAN_STOP)
goto found;
}
if (c->pnodes_have >= c->pnode_cnt)
/* All pnodes are in memory, so skip scan */
return -ENOSPC;
err = ubifs_lpt_scan_nolock(c, -1, c->lscan_lnum,
(ubifs_lpt_scan_callback)scan_dirty_idx_cb,
&data);
if (err)
return err;
found:
ubifs_assert(data.lnum >= c->main_first && data.lnum < c->leb_cnt);
c->lscan_lnum = data.lnum;
lprops = ubifs_lpt_lookup_dirty(c, data.lnum);
if (IS_ERR(lprops))
return PTR_ERR(lprops);
ubifs_assert(lprops->lnum == data.lnum);
ubifs_assert(lprops->free + lprops->dirty >= c->min_idx_node_sz);
ubifs_assert(!(lprops->flags & LPROPS_TAKEN));
ubifs_assert((lprops->flags & LPROPS_INDEX));
dbg_find("found dirty LEB %d, free %d, dirty %d, flags %#x",
lprops->lnum, lprops->free, lprops->dirty, lprops->flags);
lprops = ubifs_change_lp(c, lprops, LPROPS_NC, LPROPS_NC,
lprops->flags | LPROPS_TAKEN, 0);
if (IS_ERR(lprops))
return PTR_ERR(lprops);
return lprops->lnum;
}
/**
* get_idx_gc_leb - try to get a LEB number from trivial GC.
* @c: the UBIFS file-system description object
*/
static int get_idx_gc_leb(struct ubifs_info *c)
{
const struct ubifs_lprops *lp;
int err, lnum;
err = ubifs_get_idx_gc_leb(c);
if (err < 0)
return err;
lnum = err;
/*
* The LEB was due to be unmapped after the commit but
* it is needed now for this commit.
*/
lp = ubifs_lpt_lookup_dirty(c, lnum);
if (unlikely(IS_ERR(lp)))
return PTR_ERR(lp);
lp = ubifs_change_lp(c, lp, LPROPS_NC, LPROPS_NC,
lp->flags | LPROPS_INDEX, -1);
if (unlikely(IS_ERR(lp)))
return PTR_ERR(lp);
dbg_find("LEB %d, dirty %d and free %d flags %#x",
lp->lnum, lp->dirty, lp->free, lp->flags);
return lnum;
}
/**
* find_dirtiest_idx_leb - find dirtiest index LEB from dirtiest array.
* @c: the UBIFS file-system description object
*/
static int find_dirtiest_idx_leb(struct ubifs_info *c)
{
const struct ubifs_lprops *lp;
int lnum;
while (1) {
if (!c->dirty_idx.cnt)
return -ENOSPC;
/* The lprops pointers were replaced by LEB numbers */
lnum = (size_t)c->dirty_idx.arr[--c->dirty_idx.cnt];
lp = ubifs_lpt_lookup(c, lnum);
if (IS_ERR(lp))
return PTR_ERR(lp);
if ((lp->flags & LPROPS_TAKEN) || !(lp->flags & LPROPS_INDEX))
continue;
lp = ubifs_change_lp(c, lp, LPROPS_NC, LPROPS_NC,
lp->flags | LPROPS_TAKEN, 0);
if (IS_ERR(lp))
return PTR_ERR(lp);
break;
}
dbg_find("LEB %d, dirty %d and free %d flags %#x", lp->lnum, lp->dirty,
lp->free, lp->flags);
ubifs_assert(lp->flags & LPROPS_TAKEN);
ubifs_assert(lp->flags & LPROPS_INDEX);
return lnum;
}
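`find_dirtiest_idx_leb()` relies on the `dirty_idx` array having been rewritten earlier so that each pointer slot holds a LEB number cast through `size_t`; a plain number stays meaningful after the lprops lock is released, while a pointer into the lprops might not. A minimal sketch of that round-trip plus the pop-and-skip loop, assuming hypothetical `struct dirty_arr` and `struct lprops_stub` types and a caller-supplied `usable()` predicate standing in for the fresh TAKEN/INDEX check. It uses `uintptr_t`, the C99 integer type intended for pointer round-trips, where the kernel code uses `size_t`.

```c
#include <assert.h>
#include <stdint.h>

#define DIRTY_ARR_MAX 8

/* Hypothetical stand-ins for the lprops entries and the dirty_idx array */
struct lprops_stub {
	int lnum;
};

struct dirty_arr {
	void *arr[DIRTY_ARR_MAX];
	int cnt;
};

/* Overwrite every pointer slot with the LEB number it pointed to */
static void save_lnums(struct dirty_arr *d)
{
	assert(d->cnt <= DIRTY_ARR_MAX);
	for (int i = 0; i < d->cnt; i++) {
		const struct lprops_stub *lp = d->arr[i];

		d->arr[i] = (void *)(uintptr_t)lp->lnum;
	}
}

/* Pop LEB numbers from the end, skipping ones the caller cannot use */
static int pop_usable(struct dirty_arr *d, int (*usable)(int lnum))
{
	while (d->cnt) {
		int lnum = (int)(uintptr_t)d->arr[--d->cnt];

		if (usable(lnum))
			return lnum;
	}
	return -1; /* plays the role of -ENOSPC */
}
```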
/**
* ubifs_find_dirty_idx_leb - try to find dirtiest index LEB as at last commit.
* @c: the UBIFS file-system description object
*
* This function attempts to find an untaken index LEB with the most free and
* dirty space that can be used without overwriting index nodes that were in the
* last index committed.
*/
int ubifs_find_dirty_idx_leb(struct ubifs_info *c)
{
int err;
ubifs_get_lprops(c);
/*
* We made an array of the dirtiest index LEB numbers as at the start of
* last commit. Try that array first.
*/
err = find_dirtiest_idx_leb(c);
/* Next try scanning the entire LPT */
if (err == -ENOSPC)
err = find_dirty_idx_leb(c);
/* Finally take any index LEBs awaiting trivial GC */
if (err == -ENOSPC)
err = get_idx_gc_leb(c);
ubifs_release_lprops(c);
return err;
}
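The three lookups form a fallback chain: each later strategy runs only when the earlier one returned `-ENOSPC`, while any other error ends the search immediately (in the kernel code all of this happens between `ubifs_get_lprops()` and `ubifs_release_lprops()`). A minimal sketch of that control flow, with a hypothetical `leb_finder` callback type standing in for the three helpers:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical strategy type: returns a LEB number or a negative errno */
typedef int (*leb_finder)(void *ctx);

/*
 * Try each strategy in order. Only -ENOSPC ("nothing found here") moves
 * on to the next one; success or a real error is returned as-is.
 */
static int find_with_fallbacks(const leb_finder *finders, int n, void *ctx)
{
	int err = -ENOSPC;

	assert(n > 0);
	for (int i = 0; i < n && err == -ENOSPC; i++)
		err = finders[i](ctx);
	return err;
}
```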

fs/ubifs/gc.c

@ -0,0 +1,773 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Adrian Hunter
* Artem Bityutskiy (Битюцкий Артём)
*/
/*
* This file implements garbage collection. The procedure for garbage collection
* is different depending on whether a LEB is an index LEB (contains index
* nodes) or not. For non-index LEBs, garbage collection finds a LEB which
* contains a lot of dirty space (obsolete nodes), and copies the non-obsolete
* nodes to the journal, at which point the garbage-collected LEB is free to be
* reused. For index LEBs, garbage collection marks the non-obsolete index nodes
* dirty in the TNC, and after the next commit, the garbage-collected LEB is
* free to be reused. Garbage collection will cause the number of dirty index nodes
* to grow, however sufficient space is reserved for the index to ensure the
* commit will never run out of space.
*/
#include <linux/pagemap.h>
#include "ubifs.h"
/*
* GC tries to optimize the way it fits nodes into available space, and it sorts
* nodes a little. The below constants are watermarks which define "large",
* "medium", and "small" nodes.
*/
#define MEDIUM_NODE_WM (UBIFS_BLOCK_SIZE / 4)
#define SMALL_NODE_WM UBIFS_MAX_DENT_NODE_SZ
/*
* GC may need to move more than one LEB to make progress. The below constants
* define "soft" and "hard" limits on the number of LEBs the garbage collector
* may move.
*/
#define SOFT_LEBS_LIMIT 4
#define HARD_LEBS_LIMIT 32
/**
* switch_gc_head - switch the garbage collection journal head.
* @c: UBIFS file-system description object
*
* This function switches the GC head to the next LEB which is reserved in
* @c->gc_lnum. Returns %0 in case of success, %-EAGAIN if commit is required,
* and other negative error codes in case of failures.
*/
static int switch_gc_head(struct ubifs_info *c)
{
int err, gc_lnum = c->gc_lnum;
struct ubifs_wbuf *wbuf = &c->jheads[GCHD].wbuf;
ubifs_assert(gc_lnum != -1);
dbg_gc("switch GC head from LEB %d:%d to LEB %d (waste %d bytes)",
wbuf->lnum, wbuf->offs + wbuf->used, gc_lnum,
c->leb_size - wbuf->offs - wbuf->used);
err = ubifs_wbuf_sync_nolock(wbuf);
if (err)
return err;
/*
* The GC write-buffer was synchronized, we may safely unmap
* 'c->gc_lnum'.
*/
err = ubifs_leb_unmap(c, gc_lnum);
if (err)
return err;
err = ubifs_add_bud_to_log(c, GCHD, gc_lnum, 0);
if (err)
return err;
c->gc_lnum = -1;
err = ubifs_wbuf_seek_nolock(wbuf, gc_lnum, 0, UBI_LONGTERM);
return err;
}
/**
* move_nodes - move nodes.
* @c: UBIFS file-system description object
* @sleb: describes nodes to move
*
* This function moves valid nodes from data LEB described by @sleb to the GC
* journal head. The obsolete nodes are dropped.
*
* When moving nodes we have to deal with the classical bin-packing problem: the
* space in the current GC journal head LEB and in @c->gc_lnum are the "bins",
* while the nodes in the @sleb->nodes list are the elements which should be
* fit optimally into the bins. This function uses the "first fit decreasing"
* strategy, although it does not really sort the nodes but just splits them
* into 3 classes - large, medium, and small, so they are roughly sorted.
*
* This function returns zero in case of success, %-EAGAIN if commit is
* required, and other negative error codes in case of other failures.
*/
static int move_nodes(struct ubifs_info *c, struct ubifs_scan_leb *sleb)
{
struct ubifs_scan_node *snod, *tmp;
struct list_head large, medium, small;
struct ubifs_wbuf *wbuf = &c->jheads[GCHD].wbuf;
int avail, err, min = INT_MAX;
INIT_LIST_HEAD(&large);
INIT_LIST_HEAD(&medium);
INIT_LIST_HEAD(&small);
list_for_each_entry_safe(snod, tmp, &sleb->nodes, list) {
struct list_head *lst;
ubifs_assert(snod->type != UBIFS_IDX_NODE);
ubifs_assert(snod->type != UBIFS_REF_NODE);
ubifs_assert(snod->type != UBIFS_CS_NODE);
err = ubifs_tnc_has_node(c, &snod->key, 0, sleb->lnum,
snod->offs, 0);
if (err < 0)
goto out;
lst = &snod->list;
list_del(lst);
if (!err) {
/* The node is obsolete, remove it from the list */
kfree(snod);
continue;
}
/*
* Sort the list of nodes so that large nodes go first, and
* small nodes go last.
*/
if (snod->len > MEDIUM_NODE_WM)
list_add(lst, &large);
else if (snod->len > SMALL_NODE_WM)
list_add(lst, &medium);
else
list_add(lst, &small);
/* And find the smallest node */
if (snod->len < min)
min = snod->len;
}
/*
* Join the three lists so that we'd have one roughly sorted list
* ('large' will be the head of the joined list).
*/
list_splice(&medium, large.prev);
list_splice(&small, large.prev);
if (wbuf->lnum == -1) {
/*
* The GC journal head is not set, because it is the first GC
* invocation since mount.
*/
err = switch_gc_head(c);
if (err)
goto out;
}
/* Write nodes to their new location. Use the first-fit strategy */
while (1) {
avail = c->leb_size - wbuf->offs - wbuf->used;
list_for_each_entry_safe(snod, tmp, &large, list) {
int new_lnum, new_offs;
if (avail < min)
break;
if (snod->len > avail)
/* This node does not fit */
continue;
cond_resched();
new_lnum = wbuf->lnum;
new_offs = wbuf->offs + wbuf->used;
err = ubifs_wbuf_write_nolock(wbuf, snod->node,
snod->len);
if (err)
goto out;
err = ubifs_tnc_replace(c, &snod->key, sleb->lnum,
snod->offs, new_lnum, new_offs,
snod->len);
if (err)
goto out;
avail = c->leb_size - wbuf->offs - wbuf->used;
list_del(&snod->list);
kfree(snod);
}
if (list_empty(&large))
break;
/*
* Waste the rest of the space in the LEB and switch to the
* next LEB.
*/
err = switch_gc_head(c);
if (err)
goto out;
}
return 0;
out:
list_for_each_entry_safe(snod, tmp, &large, list) {
list_del(&snod->list);
kfree(snod);
}
return err;
}
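The packing loop in `move_nodes()` can be simulated on plain node lengths to see how the three size classes and first-fit interact. This is a sketch under simplifying assumptions: the watermarks and LEB size are hypothetical numbers, 8-byte node alignment and the TNC updates are ignored, and the function only counts how many GC head LEBs end up being used.

```c
#include <assert.h>
#include <limits.h>

#define MEDIUM_WM 1024 /* hypothetical stand-in for MEDIUM_NODE_WM */
#define SMALL_WM 312   /* hypothetical stand-in for SMALL_NODE_WM */
#define MAX_NODES 64

/*
 * Pack node lengths into GC head LEBs of leb_size bytes, starting with a
 * head that already has 'used' bytes written. Returns the number of LEBs
 * touched, including the starting one.
 */
static int pack_nodes(const int *len, int n, int leb_size, int used)
{
	int order[MAX_NODES], placed[MAX_NODES] = { 0 };
	int k = 0, min = INT_MAX, lebs = 1, left = n;

	assert(n <= MAX_NODES);
	/* Bucket the nodes: large first, then medium, then small */
	for (int cls = 0; cls < 3; cls++) {
		for (int i = 0; i < n; i++) {
			int c = len[i] > MEDIUM_WM ? 0 : len[i] > SMALL_WM ? 1 : 2;

			if (c == cls)
				order[k++] = i;
		}
	}
	/* Remember the smallest node, as move_nodes() does */
	for (int i = 0; i < n; i++) {
		assert(len[i] > 0 && len[i] <= leb_size);
		if (len[i] < min)
			min = len[i];
	}
	while (left) {
		int avail = leb_size - used;

		/* First fit: write every remaining node that still fits */
		for (int j = 0; j < n && avail >= min; j++) {
			int i = order[j];

			if (placed[i] || len[i] > avail)
				continue;
			placed[i] = 1;
			used += len[i];
			avail = leb_size - used;
			left--;
		}
		if (!left)
			break;
		/* Waste the rest of this LEB and switch to a fresh one */
		lebs++;
		used = 0;
	}
	return lebs;
}
```

Because the large nodes are tried first, the small ones fill the tail of each LEB that the large ones could not use.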
/**
* gc_sync_wbufs - sync write-buffers for GC.
* @c: UBIFS file-system description object
*
* We must guarantee that obsoleting nodes are on flash. Unfortunately they may
* be in a write-buffer instead. That is, a node could be written to a
* write-buffer, obsoleting another node in a LEB that is GC'd. If that LEB is
* erased before the write-buffer is sync'd and then there is an unclean
* unmount, then an existing node is lost. To avoid this, we sync all
* write-buffers.
*
* This function returns %0 on success or a negative error code on failure.
*/
static int gc_sync_wbufs(struct ubifs_info *c)
{
int err, i;
for (i = 0; i < c->jhead_cnt; i++) {
if (i == GCHD)
continue;
err = ubifs_wbuf_sync(&c->jheads[i].wbuf);
if (err)
return err;
}
return 0;
}
/**
* ubifs_garbage_collect_leb - garbage-collect a logical eraseblock.
* @c: UBIFS file-system description object
* @lp: describes the LEB to garbage collect
*
* This function garbage-collects an LEB and returns one of the %LEB_FREED,
* %LEB_RETAINED, etc. positive codes in case of success, %-EAGAIN if commit is
* required, and other negative error codes in case of failures.
*/
int ubifs_garbage_collect_leb(struct ubifs_info *c, struct ubifs_lprops *lp)
{
struct ubifs_scan_leb *sleb;
struct ubifs_scan_node *snod;
struct ubifs_wbuf *wbuf = &c->jheads[GCHD].wbuf;
int err = 0, lnum = lp->lnum;
ubifs_assert(c->gc_lnum != -1 || wbuf->offs + wbuf->used == 0 ||
c->need_recovery);
ubifs_assert(c->gc_lnum != lnum);
ubifs_assert(wbuf->lnum != lnum);
/*
* We scan the entire LEB even though we only really need to scan up to
* (c->leb_size - lp->free).
*/
sleb = ubifs_scan(c, lnum, 0, c->sbuf);
if (IS_ERR(sleb))
return PTR_ERR(sleb);
ubifs_assert(!list_empty(&sleb->nodes));
snod = list_entry(sleb->nodes.next, struct ubifs_scan_node, list);
if (snod->type == UBIFS_IDX_NODE) {
struct ubifs_gced_idx_leb *idx_gc;
dbg_gc("indexing LEB %d (free %d, dirty %d)",
lnum, lp->free, lp->dirty);
list_for_each_entry(snod, &sleb->nodes, list) {
struct ubifs_idx_node *idx = snod->node;
int level = le16_to_cpu(idx->level);
ubifs_assert(snod->type == UBIFS_IDX_NODE);
key_read(c, ubifs_idx_key(c, idx), &snod->key);
err = ubifs_dirty_idx_node(c, &snod->key, level, lnum,
snod->offs);
if (err)
goto out;
}
idx_gc = kmalloc(sizeof(struct ubifs_gced_idx_leb), GFP_NOFS);
if (!idx_gc) {
err = -ENOMEM;
goto out;
}
idx_gc->lnum = lnum;
idx_gc->unmap = 0;
list_add(&idx_gc->list, &c->idx_gc);
/*
* Don't release the LEB until after the next commit, because
* it may contain data which is needed for recovery. So
* although we freed this LEB, it will become usable only after
* the commit.
*/
err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0,
LPROPS_INDEX, 1);
if (err)
goto out;
err = LEB_FREED_IDX;
} else {
dbg_gc("data LEB %d (free %d, dirty %d)",
lnum, lp->free, lp->dirty);
err = move_nodes(c, sleb);
if (err)
goto out;
err = gc_sync_wbufs(c);
if (err)
goto out;
err = ubifs_change_one_lp(c, lnum, c->leb_size, 0, 0, 0, 0);
if (err)
goto out;
if (c->gc_lnum == -1) {
c->gc_lnum = lnum;
err = LEB_RETAINED;
} else {
err = ubifs_wbuf_sync_nolock(wbuf);
if (err)
goto out;
err = ubifs_leb_unmap(c, lnum);
if (err)
goto out;
err = LEB_FREED;
}
}
out:
ubifs_scan_destroy(sleb);
return err;
}
/**
* ubifs_garbage_collect - UBIFS garbage collector.
* @c: UBIFS file-system description object
* @anyway: do GC even if there are free LEBs
*
* This function does out-of-place garbage collection. The return codes are:
* o positive LEB number if the LEB has been freed and may be used;
* o %-EAGAIN if the caller has to run commit;
* o %-ENOSPC if GC failed to make any progress;
* o other negative error codes in case of other errors.
*
* Garbage collector writes data to the journal when GC'ing data LEBs, and just
* marks indexing nodes dirty when GC'ing indexing LEBs. Thus, at some point
* commit may be required. But commit cannot be run from inside GC, because the
* caller might be holding the commit lock, so %-EAGAIN is returned instead.
* This error code means that the caller has to run commit, and re-run GC
* if there is still no free space.
*
* There are many reasons why this function may return %-EAGAIN:
* o the log is full and there is no space to write an LEB reference for
* @c->gc_lnum;
* o the journal is too large and exceeds size limitations;
* o GC moved indexing LEBs, but they can be used only after the commit;
* o the shrinker fails to find clean znodes to free and requests the commit;
* o etc.
*
* Note, if the file-system is close to being full, this function may return
* %-EAGAIN infinitely, so the caller has to limit the number of re-invocations
* of the function. E.g., this happens if the limits on the journal size are too
* tough and GC writes too much to the journal before an LEB is freed. This
* might also mean that the journal is too large, and the TNC becomes too big,
* so that the shrinker is constantly called, finds no clean znodes to free,
* and requests commit. This may also happen if the journal is all right,
* but another kernel process consumes too much memory. In short, infinite
* %-EAGAIN may happen, but only in some extreme or misconfigured cases.
*/
int ubifs_garbage_collect(struct ubifs_info *c, int anyway)
{
int i, err, ret, min_space = c->dead_wm;
struct ubifs_lprops lp;
struct ubifs_wbuf *wbuf = &c->jheads[GCHD].wbuf;
ubifs_assert_cmt_locked(c);
if (ubifs_gc_should_commit(c))
return -EAGAIN;
mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
if (c->ro_media) {
ret = -EROFS;
goto out_unlock;
}
/* We expect the write-buffer to be empty on entry */
ubifs_assert(!wbuf->used);
for (i = 0; ; i++) {
int space_before = c->leb_size - wbuf->offs - wbuf->used;
int space_after;
cond_resched();
/* Give the commit an opportunity to run */
if (ubifs_gc_should_commit(c)) {
ret = -EAGAIN;
break;
}
if (i > SOFT_LEBS_LIMIT && !list_empty(&c->idx_gc)) {
/*
* We've done enough iterations. Indexing LEBs were
* moved and will be available after the commit.
*/
dbg_gc("soft limit, some index LEBs GC'ed, -EAGAIN");
ubifs_commit_required(c);
ret = -EAGAIN;
break;
}
if (i > HARD_LEBS_LIMIT) {
/*
* We've moved too many LEBs and have not made
* progress, give up.
*/
dbg_gc("hard limit, -ENOSPC");
ret = -ENOSPC;
break;
}
/*
* Empty and freeable LEBs can turn up while we waited for
* the wbuf lock, or while we have been running GC. In that
* case, we should just return one of those instead of
* continuing to GC dirty LEBs. Hence we request
* 'ubifs_find_dirty_leb()' to return an empty LEB if it can.
*/
ret = ubifs_find_dirty_leb(c, &lp, min_space, anyway ? 0 : 1);
if (ret) {
if (ret == -ENOSPC)
dbg_gc("no more dirty LEBs");
break;
}
dbg_gc("found LEB %d: free %d, dirty %d, sum %d "
"(min. space %d)", lp.lnum, lp.free, lp.dirty,
lp.free + lp.dirty, min_space);
if (lp.free + lp.dirty == c->leb_size) {
/* An empty LEB was returned */
dbg_gc("LEB %d is free, return it", lp.lnum);
/*
* ubifs_find_dirty_leb() doesn't return freeable index
* LEBs.
*/
ubifs_assert(!(lp.flags & LPROPS_INDEX));
if (lp.free != c->leb_size) {
/*
* Write buffers must be sync'd before
* unmapping freeable LEBs, because one of them
* may contain data which obsoletes something
* in 'lp.lnum'.
*/
ret = gc_sync_wbufs(c);
if (ret)
goto out;
ret = ubifs_change_one_lp(c, lp.lnum,
c->leb_size, 0, 0, 0,
0);
if (ret)
goto out;
}
ret = ubifs_leb_unmap(c, lp.lnum);
if (ret)
goto out;
ret = lp.lnum;
break;
}
space_before = c->leb_size - wbuf->offs - wbuf->used;
if (wbuf->lnum == -1)
space_before = 0;
ret = ubifs_garbage_collect_leb(c, &lp);
if (ret < 0) {
if (ret == -EAGAIN || ret == -ENOSPC) {
/*
* These codes are not errors, so we have to
* return the LEB to lprops. But if the
* 'ubifs_return_leb()' function fails, its
* failure code is propagated to the caller
* instead of the original '-EAGAIN' or
* '-ENOSPC'.
*/
err = ubifs_return_leb(c, lp.lnum);
if (err)
ret = err;
break;
}
goto out;
}
if (ret == LEB_FREED) {
/* An LEB has been freed and is ready for use */
dbg_gc("LEB %d freed, return", lp.lnum);
ret = lp.lnum;
break;
}
if (ret == LEB_FREED_IDX) {
/*
* This was an indexing LEB and it cannot be
* immediately used. And instead of requesting the
* commit straight away, we try to garbage collect some
* more.
*/
dbg_gc("indexing LEB %d freed, continue", lp.lnum);
continue;
}
ubifs_assert(ret == LEB_RETAINED);
space_after = c->leb_size - wbuf->offs - wbuf->used;
dbg_gc("LEB %d retained, freed %d bytes", lp.lnum,
space_after - space_before);
if (space_after > space_before) {
/* GC makes progress, keep working */
min_space >>= 1;
if (min_space < c->dead_wm)
min_space = c->dead_wm;
continue;
}
dbg_gc("did not make progress");
/*
* GC moved an LEB but has not made any progress. This means
* that the previous GC head LEB contained too little free space
* and the LEB which was GC'ed contained only large nodes which
* did not fit that space.
*
* We can do 2 things:
* 1. pick another LEB in the hope it'll contain a small node
* which will fit the space we have at the end of current GC
* head LEB, but there is no guarantee, so we try this out
* unless we have already been working for too long;
* 2. request an LEB with more dirty space, which will force
* 'ubifs_find_dirty_leb()' to start scanning the lprops
* table, instead of just picking one from the heap
* (previously it already picked the dirtiest LEB).
*/
if (i < SOFT_LEBS_LIMIT) {
dbg_gc("try again");
continue;
}
min_space <<= 1;
if (min_space > c->dark_wm)
min_space = c->dark_wm;
dbg_gc("set min. space to %d", min_space);
}
if (ret == -ENOSPC && !list_empty(&c->idx_gc)) {
dbg_gc("no space, some index LEBs GC'ed, -EAGAIN");
ubifs_commit_required(c);
ret = -EAGAIN;
}
err = ubifs_wbuf_sync_nolock(wbuf);
if (!err)
err = ubifs_leb_unmap(c, c->gc_lnum);
if (err) {
ret = err;
goto out;
}
out_unlock:
mutex_unlock(&wbuf->io_mutex);
return ret;
out:
ubifs_assert(ret < 0);
ubifs_assert(ret != -ENOSPC && ret != -EAGAIN);
ubifs_ro_mode(c, ret);
ubifs_wbuf_sync_nolock(wbuf);
mutex_unlock(&wbuf->io_mutex);
ubifs_return_leb(c, lp.lnum);
return ret;
}
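The `min_space` bookkeeping in `ubifs_garbage_collect()` is a small feedback rule: halve the request when a retained LEB freed space, and double it once the soft limit has been reached without progress, clamping to `c->dead_wm` and `c->dark_wm` respectively. A sketch of one step of that rule as a standalone function; the function name and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * One step of the min_space adaptation, given whether the last retained
 * LEB freed space ('progressed') and the loop iteration 'iter'.
 */
static int adjust_min_space(int min_space, int progressed, int iter,
			    int soft_limit, int dead_wm, int dark_wm)
{
	assert(dead_wm <= dark_wm);
	if (progressed) {
		/* GC made progress: relax the request, but not below dead_wm */
		min_space >>= 1;
		if (min_space < dead_wm)
			min_space = dead_wm;
	} else if (iter >= soft_limit) {
		/* Stuck for too long: ask for dirtier LEBs, capped at dark_wm */
		min_space <<= 1;
		if (min_space > dark_wm)
			min_space = dark_wm;
	}
	/* Otherwise (no progress, below the soft limit) just try another LEB */
	return min_space;
}
```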
/**
* ubifs_gc_start_commit - garbage collection at start of commit.
* @c: UBIFS file-system description object
*
* If a LEB has only dirty and free space, then we may safely unmap it and make
* it free. Note, we cannot do this with indexing LEBs because dirty space may
* correspond to index nodes that are required for recovery. In that case, the
* LEB cannot be unmapped until after the next commit.
*
* This function returns %0 upon success and a negative error code upon failure.
*/
int ubifs_gc_start_commit(struct ubifs_info *c)
{
struct ubifs_gced_idx_leb *idx_gc;
const struct ubifs_lprops *lp;
int err = 0, flags;
ubifs_get_lprops(c);
/*
* Unmap (non-index) freeable LEBs. Note that recovery requires that all
* wbufs are sync'd before this, which is done in 'do_commit()'.
*/
while (1) {
lp = ubifs_fast_find_freeable(c);
if (unlikely(IS_ERR(lp))) {
err = PTR_ERR(lp);
goto out;
}
if (!lp)
break;
ubifs_assert(!(lp->flags & LPROPS_TAKEN));
ubifs_assert(!(lp->flags & LPROPS_INDEX));
err = ubifs_leb_unmap(c, lp->lnum);
if (err)
goto out;
lp = ubifs_change_lp(c, lp, c->leb_size, 0, lp->flags, 0);
if (unlikely(IS_ERR(lp))) {
err = PTR_ERR(lp);
goto out;
}
ubifs_assert(!(lp->flags & LPROPS_TAKEN));
ubifs_assert(!(lp->flags & LPROPS_INDEX));
}
/* Mark GC'd index LEBs OK to unmap after this commit finishes */
list_for_each_entry(idx_gc, &c->idx_gc, list)
idx_gc->unmap = 1;
/* Record index freeable LEBs for unmapping after commit */
while (1) {
lp = ubifs_fast_find_frdi_idx(c);
if (unlikely(IS_ERR(lp))) {
err = PTR_ERR(lp);
goto out;
}
if (!lp)
break;
idx_gc = kmalloc(sizeof(struct ubifs_gced_idx_leb), GFP_NOFS);
if (!idx_gc) {
err = -ENOMEM;
goto out;
}
ubifs_assert(!(lp->flags & LPROPS_TAKEN));
ubifs_assert(lp->flags & LPROPS_INDEX);
/* Don't release the LEB until after the next commit */
flags = (lp->flags | LPROPS_TAKEN) ^ LPROPS_INDEX;
lp = ubifs_change_lp(c, lp, c->leb_size, 0, flags, 1);
if (unlikely(IS_ERR(lp))) {
err = PTR_ERR(lp);
kfree(idx_gc);
goto out;
}
ubifs_assert(lp->flags & LPROPS_TAKEN);
ubifs_assert(!(lp->flags & LPROPS_INDEX));
idx_gc->lnum = lp->lnum;
idx_gc->unmap = 1;
list_add(&idx_gc->list, &c->idx_gc);
}
out:
ubifs_release_lprops(c);
return err;
}
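`ubifs_gc_start_commit()` computes the new flags as `(lp->flags | LPROPS_TAKEN) ^ LPROPS_INDEX`. The XOR clears the index bit only because the preceding assertion guarantees that the bit is set. The sketch below uses hypothetical bit values (the real `LPROPS_*` constants are defined in the UBIFS headers) to contrast it with a mask that clears the bit unconditionally.

```c
#include <assert.h>

/* Hypothetical bit values, not necessarily the real LPROPS_* ones */
#define F_TAKEN 0x10
#define F_INDEX 0x20

/* Set TAKEN and clear INDEX; correct only when INDEX is already set */
static int take_and_unindex(int flags)
{
	assert(flags & F_INDEX);
	return (flags | F_TAKEN) ^ F_INDEX;
}

/* Same result without depending on the prior state of INDEX */
static int take_and_unindex_masked(int flags)
{
	return (flags | F_TAKEN) & ~F_INDEX;
}
```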
/**
* ubifs_gc_end_commit - garbage collection at end of commit.
* @c: UBIFS file-system description object
*
* This function completes out-of-place garbage collection of index LEBs.
*/
int ubifs_gc_end_commit(struct ubifs_info *c)
{
struct ubifs_gced_idx_leb *idx_gc, *tmp;
struct ubifs_wbuf *wbuf;
int err = 0;
wbuf = &c->jheads[GCHD].wbuf;
mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
list_for_each_entry_safe(idx_gc, tmp, &c->idx_gc, list)
if (idx_gc->unmap) {
dbg_gc("LEB %d", idx_gc->lnum);
err = ubifs_leb_unmap(c, idx_gc->lnum);
if (err)
goto out;
err = ubifs_change_one_lp(c, idx_gc->lnum, LPROPS_NC,
LPROPS_NC, 0, LPROPS_TAKEN, -1);
if (err)
goto out;
list_del(&idx_gc->list);
kfree(idx_gc);
}
out:
mutex_unlock(&wbuf->io_mutex);
return err;
}
/**
* ubifs_destroy_idx_gc - destroy idx_gc list.
* @c: UBIFS file-system description object
*
* This function destroys the idx_gc list. It is called when unmounting or
* remounting read-only so locks are not needed.
*/
void ubifs_destroy_idx_gc(struct ubifs_info *c)
{
while (!list_empty(&c->idx_gc)) {
struct ubifs_gced_idx_leb *idx_gc;
idx_gc = list_entry(c->idx_gc.next, struct ubifs_gced_idx_leb,
list);
c->idx_gc_cnt -= 1;
list_del(&idx_gc->list);
kfree(idx_gc);
}
}
/**
* ubifs_get_idx_gc_leb - get a LEB from GC'd index LEB list.
* @c: UBIFS file-system description object
*
* Called during start commit so locks are not needed.
*/
int ubifs_get_idx_gc_leb(struct ubifs_info *c)
{
struct ubifs_gced_idx_leb *idx_gc;
int lnum;
if (list_empty(&c->idx_gc))
return -ENOSPC;
idx_gc = list_entry(c->idx_gc.next, struct ubifs_gced_idx_leb, list);
lnum = idx_gc->lnum;
/* c->idx_gc_cnt is updated by the caller when lprops are updated */
list_del(&idx_gc->list);
kfree(idx_gc);
return lnum;
}

fs/ubifs/io.c

@ -0,0 +1,914 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
* Copyright (C) 2006, 2007 University of Szeged, Hungary
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Artem Bityutskiy (Битюцкий Артём)
* Adrian Hunter
* Zoltan Sogor
*/
/*
* This file implements UBIFS I/O subsystem which provides various I/O-related
* helper functions (reading/writing/checking/validating nodes) and implements
* write-buffering support. Write buffers help to save space which otherwise
* would have been wasted for padding to the nearest minimal I/O unit boundary.
* Instead, data first goes to the write-buffer and is flushed when the
* buffer is full or when it is not used for some time (by timer). This is
* similar to the mechanism used by JFFS2.
*
* Write-buffers are defined by 'struct ubifs_wbuf' objects and protected by
* mutexes defined inside these objects. Since sometimes upper-level code
* has to lock the write-buffer (e.g. journal space reservation code), many
* functions related to write-buffers have "nolock" suffix which means that the
* caller has to lock the write-buffer before calling this function.
*
* UBIFS stores nodes at 64-bit (8-byte) aligned addresses. If the node length is not
* aligned, UBIFS starts the next node from the aligned address, and the padded
* bytes may contain any rubbish. In other words, UBIFS does not put padding
* bytes in those small gaps. Common headers of nodes store real node lengths,
* not aligned lengths. Indexing nodes also store real lengths in branches.
*
* UBIFS uses padding when it pads to the next min. I/O unit. In this case it
* uses padding nodes or padding bytes, if the padding node does not fit.
*
* All UBIFS nodes are protected by CRC checksums and UBIFS checks all nodes
* every time they are read from the flash media.
*/
#include <linux/crc32.h>
#include "ubifs.h"
/**
* ubifs_check_node - check node.
* @c: UBIFS file-system description object
* @buf: node to check
* @lnum: logical eraseblock number
* @offs: offset within the logical eraseblock
* @quiet: print no messages
*
* This function checks node magic number and CRC checksum. This function also
* validates node length to prevent UBIFS from misbehaving when an attacker
* feeds it a file-system image with incorrect nodes. For example, an overly
* large node length in the common header could cause UBIFS to read memory
* outside of the allocated buffer when checking the CRC checksum.
*
* This function returns zero in case of success, %-EUCLEAN in case of bad CRC
* or magic, and %-EINVAL in case of a bad node type or length.
*/
int ubifs_check_node(const struct ubifs_info *c, const void *buf, int lnum,
int offs, int quiet)
{
int err = -EINVAL, type, node_len;
uint32_t crc, node_crc, magic;
const struct ubifs_ch *ch = buf;
ubifs_assert(lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
ubifs_assert(!(offs & 7) && offs < c->leb_size);
magic = le32_to_cpu(ch->magic);
if (magic != UBIFS_NODE_MAGIC) {
if (!quiet)
ubifs_err("bad magic %#08x, expected %#08x",
magic, UBIFS_NODE_MAGIC);
err = -EUCLEAN;
goto out;
}
type = ch->node_type;
if (type < 0 || type >= UBIFS_NODE_TYPES_CNT) {
if (!quiet)
ubifs_err("bad node type %d", type);
goto out;
}
node_len = le32_to_cpu(ch->len);
if (node_len + offs > c->leb_size)
goto out_len;
if (c->ranges[type].max_len == 0) {
if (node_len != c->ranges[type].len)
goto out_len;
} else if (node_len < c->ranges[type].min_len ||
node_len > c->ranges[type].max_len)
goto out_len;
crc = crc32(UBIFS_CRC32_INIT, buf + 8, node_len - 8);
node_crc = le32_to_cpu(ch->crc);
if (crc != node_crc) {
if (!quiet)
ubifs_err("bad CRC: calculated %#08x, read %#08x",
crc, node_crc);
err = -EUCLEAN;
goto out;
}
return 0;
out_len:
if (!quiet)
ubifs_err("bad node length %d", node_len);
out:
if (!quiet) {
ubifs_err("bad node at LEB %d:%d", lnum, offs);
dbg_dump_node(c, buf);
dbg_dump_stack();
}
return err;
}
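The length checks in `ubifs_check_node()` follow a simple per-type rule: a range whose `max_len` is zero means a fixed length, otherwise the length must fall within `[min_len, max_len]`, and in every case the node must end inside the LEB. A minimal sketch of that rule, assuming a hypothetical `struct len_range` in place of the `c->ranges[]` entries:

```c
#include <assert.h>

/* Hypothetical per-type length descriptor; max_len == 0 means fixed length */
struct len_range {
	int len;     /* used when max_len == 0 */
	int min_len; /* used otherwise */
	int max_len;
};

/* Return 1 if node_len is acceptable for a node starting at offs */
static int node_len_ok(const struct len_range *r, int node_len, int offs,
		       int leb_size)
{
	assert(offs >= 0 && offs < leb_size);
	if (node_len + offs > leb_size)
		return 0; /* would run past the end of the eraseblock */
	if (r->max_len == 0)
		return node_len == r->len;
	return node_len >= r->min_len && node_len <= r->max_len;
}
```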
/**
* ubifs_pad - pad flash space.
* @c: UBIFS file-system description object
* @buf: buffer to put padding to
* @pad: how many bytes to pad
*
* The flash media obliges us to write only in chunks of %c->min_io_size and
* when we have to write less data we add a padding node to the write-buffer and
* pad it to the next minimal I/O unit's boundary. Padding nodes help when the
* media is being scanned. If the amount of wasted space is not enough to fit a
* padding node which takes %UBIFS_PAD_NODE_SZ bytes, we write padding bytes
* pattern (%UBIFS_PADDING_BYTE).
*
* Padding nodes are also used to fill gaps when the "commit-in-gaps" method is
* used.
*/
void ubifs_pad(const struct ubifs_info *c, void *buf, int pad)
{
uint32_t crc;
ubifs_assert(pad >= 0 && !(pad & 7));
if (pad >= UBIFS_PAD_NODE_SZ) {
struct ubifs_ch *ch = buf;
struct ubifs_pad_node *pad_node = buf;
ch->magic = cpu_to_le32(UBIFS_NODE_MAGIC);
ch->node_type = UBIFS_PAD_NODE;
ch->group_type = UBIFS_NO_NODE_GROUP;
ch->padding[0] = ch->padding[1] = 0;
ch->sqnum = 0;
ch->len = cpu_to_le32(UBIFS_PAD_NODE_SZ);
pad -= UBIFS_PAD_NODE_SZ;
pad_node->pad_len = cpu_to_le32(pad);
crc = crc32(UBIFS_CRC32_INIT, buf + 8, UBIFS_PAD_NODE_SZ - 8);
ch->crc = cpu_to_le32(crc);
memset(buf + UBIFS_PAD_NODE_SZ, 0, pad);
} else if (pad > 0)
/* Too little space, padding node won't fit */
memset(buf, UBIFS_PADDING_BYTE, pad);
}
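`ubifs_pad()` chooses between three outcomes for a gap of `pad` bytes: nothing, raw padding bytes, or a padding node whose `pad_len` field records how many zeroed bytes follow it. A sketch that reports which outcome applies; the 28-byte `PAD_NODE_SZ` is an assumption (a 24-byte common header plus a 32-bit length field), not a value taken from this file.

```c
#include <assert.h>

#define PAD_NODE_SZ 28 /* assumed size of a padding node */

enum pad_kind { PAD_NONE, PAD_BYTES, PAD_NODE };

/* Classify a gap; for PAD_NODE, *pad_len gets the bytes zeroed after it */
static enum pad_kind classify_pad(int pad, int *pad_len)
{
	assert(pad >= 0 && !(pad & 7));
	*pad_len = 0;
	if (pad >= PAD_NODE_SZ) {
		*pad_len = pad - PAD_NODE_SZ;
		return PAD_NODE;
	}
	return pad > 0 ? PAD_BYTES : PAD_NONE;
}
```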
/**
* next_sqnum - get next sequence number.
* @c: UBIFS file-system description object
*/
static unsigned long long next_sqnum(struct ubifs_info *c)
{
unsigned long long sqnum;
spin_lock(&c->cnt_lock);
sqnum = ++c->max_sqnum;
spin_unlock(&c->cnt_lock);
if (unlikely(sqnum >= SQNUM_WARN_WATERMARK)) {
if (sqnum >= SQNUM_WATERMARK) {
ubifs_err("sequence number overflow %llu, end of life",
sqnum);
ubifs_ro_mode(c, -EINVAL);
}
ubifs_warn("running out of sequence numbers, end of life soon");
}
return sqnum;
}
/**
* ubifs_prepare_node - prepare node to be written to flash.
* @c: UBIFS file-system description object
* @node: the node to prepare
* @len: node length
* @pad: if the buffer has to be padded
*
* This function prepares node at @node to be written to the media - it
* calculates node CRC, fills the common header, and adds proper padding up to
* the next minimum I/O unit if @pad is not zero.
*/
void ubifs_prepare_node(struct ubifs_info *c, void *node, int len, int pad)
{
uint32_t crc;
struct ubifs_ch *ch = node;
unsigned long long sqnum = next_sqnum(c);
ubifs_assert(len >= UBIFS_CH_SZ);
ch->magic = cpu_to_le32(UBIFS_NODE_MAGIC);
ch->len = cpu_to_le32(len);
ch->group_type = UBIFS_NO_NODE_GROUP;
ch->sqnum = cpu_to_le64(sqnum);
ch->padding[0] = ch->padding[1] = 0;
crc = crc32(UBIFS_CRC32_INIT, node + 8, len - 8);
ch->crc = cpu_to_le32(crc);
if (pad) {
len = ALIGN(len, 8);
pad = ALIGN(len, c->min_io_size) - len;
ubifs_pad(c, node + len, pad);
}
}
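When `pad` is non-zero, `ubifs_prepare_node()` first rounds the node length up to 8 bytes and then pads to the next `c->min_io_size` boundary. The arithmetic can be checked in isolation; `ALIGN_UP` here is a hypothetical stand-in for the kernel's `ALIGN()` and, like it, assumes a power-of-two alignment. For example, a 100-byte node on flash with a 512-byte minimal I/O unit is rounded to 104 bytes and followed by 408 bytes of padding.

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two) */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Bytes of padding added after a node of 'len' bytes */
static int prepare_pad(int len, int min_io_size)
{
	assert(min_io_size > 0 && !(min_io_size & (min_io_size - 1)));
	len = ALIGN_UP(len, 8);
	return ALIGN_UP(len, min_io_size) - len;
}
```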
/**
* ubifs_prep_grp_node - prepare node of a group to be written to flash.
* @c: UBIFS file-system description object
* @node: the node to prepare
* @len: node length
* @last: indicates the last node of the group
*
* This function prepares node at @node to be written to the media - it
* calculates node CRC and fills the common header.
*/
void ubifs_prep_grp_node(struct ubifs_info *c, void *node, int len, int last)
{
uint32_t crc;
struct ubifs_ch *ch = node;
unsigned long long sqnum = next_sqnum(c);
ubifs_assert(len >= UBIFS_CH_SZ);
ch->magic = cpu_to_le32(UBIFS_NODE_MAGIC);
ch->len = cpu_to_le32(len);
if (last)
ch->group_type = UBIFS_LAST_OF_NODE_GROUP;
else
ch->group_type = UBIFS_IN_NODE_GROUP;
ch->sqnum = cpu_to_le64(sqnum);
ch->padding[0] = ch->padding[1] = 0;
crc = crc32(UBIFS_CRC32_INIT, node + 8, len - 8);
ch->crc = cpu_to_le32(crc);
}
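As the @last parameter suggests, the nodes of one group are prepared one by one, and only the final node is marked as closing the group. Below is a hedged sketch of that marking pattern. struct demo_ch and mark_group() are hypothetical. The numeric group_type values are assumed to mirror UBIFS_IN_NODE_GROUP and UBIFS_LAST_OF_NODE_GROUP. CRC and sequence-number handling are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the UBIFS group_type values; illustration only. */
enum { NO_NODE_GROUP = 0, IN_NODE_GROUP = 1, LAST_OF_NODE_GROUP = 2 };

/* Hypothetical, simplified stand-in for struct ubifs_ch. */
struct demo_ch {
	uint8_t group_type;
};

/* Mark @cnt nodes as one group: every node is "in" it, the last one closes it. */
static void mark_group(struct demo_ch *nodes, int cnt)
{
	int i;

	assert(cnt > 0);
	for (i = 0; i < cnt; i++)
		nodes[i].group_type = (i == cnt - 1) ? LAST_OF_NODE_GROUP
						     : IN_NODE_GROUP;
}
```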
/**
* wbuf_timer_callback_nolock - write-buffer timer callback function.
* @data: timer data (write-buffer descriptor)
*
* This function is called when the write-buffer timer expires.
*/
static void wbuf_timer_callback_nolock(unsigned long data)
{
struct ubifs_wbuf *wbuf = (struct ubifs_wbuf *)data;
wbuf->need_sync = 1;
wbuf->c->need_wbuf_sync = 1;
ubifs_wake_up_bgt(wbuf->c);
}
/**
* new_wbuf_timer_nolock - start new write-buffer timer.
* @wbuf: write-buffer descriptor
*/
static void new_wbuf_timer_nolock(struct ubifs_wbuf *wbuf)
{
ubifs_assert(!timer_pending(&wbuf->timer));
if (!wbuf->timeout)
return;
wbuf->timer.expires = jiffies + wbuf->timeout;
add_timer(&wbuf->timer);
}
/**
* cancel_wbuf_timer_nolock - cancel write-buffer timer.
* @wbuf: write-buffer descriptor
*/
static void cancel_wbuf_timer_nolock(struct ubifs_wbuf *wbuf)
{
/*
* If the syncer is waiting for the lock (from the background thread's
* context) and another task is changing the write-buffer, then the syncing
* should be canceled.
*/
wbuf->need_sync = 0;
del_timer(&wbuf->timer);
}
/**
* ubifs_wbuf_sync_nolock - synchronize write-buffer.
* @wbuf: write-buffer to synchronize
*
* This function synchronizes write-buffer @wbuf and returns zero in case of
* success or a negative error code in case of failure.
*/
int ubifs_wbuf_sync_nolock(struct ubifs_wbuf *wbuf)
{
struct ubifs_info *c = wbuf->c;
int err, dirt;
cancel_wbuf_timer_nolock(wbuf);
if (!wbuf->used || wbuf->lnum == -1)
/* Write-buffer is empty or not seeked */
return 0;
dbg_io("LEB %d:%d, %d bytes",
wbuf->lnum, wbuf->offs, wbuf->used);
ubifs_assert(!(c->vfs_sb->s_flags & MS_RDONLY));
ubifs_assert(!(wbuf->avail & 7));
ubifs_assert(wbuf->offs + c->min_io_size <= c->leb_size);
if (c->ro_media)
return -EROFS;
ubifs_pad(c, wbuf->buf + wbuf->used, wbuf->avail);
err = ubi_leb_write(c->ubi, wbuf->lnum, wbuf->buf, wbuf->offs,
c->min_io_size, wbuf->dtype);
if (err) {
ubifs_err("cannot write %d bytes to LEB %d:%d",
c->min_io_size, wbuf->lnum, wbuf->offs);
dbg_dump_stack();
return err;
}
dirt = wbuf->avail;
spin_lock(&wbuf->lock);
wbuf->offs += c->min_io_size;
wbuf->avail = c->min_io_size;
wbuf->used = 0;
wbuf->next_ino = 0;
spin_unlock(&wbuf->lock);
if (wbuf->sync_callback)
err = wbuf->sync_callback(c, wbuf->lnum,
c->leb_size - wbuf->offs, dirt);
return err;
}
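The bookkeeping after a successful synchronization can be modelled with plain integers. This sketch uses a hypothetical struct demo_wbuf that keeps only the offs, used and avail fields. It ignores locking, the actual flash write and the sync callback.

```c
#include <assert.h>

/* Hypothetical, simplified model of the write-buffer bookkeeping fields. */
struct demo_wbuf {
	int offs;  /* LEB offset of the buffered min. I/O unit */
	int used;  /* bytes already occupied in the buffer */
	int avail; /* bytes still free in the buffer */
};

/*
 * Mirror the accounting done by ubifs_wbuf_sync_nolock() after a successful
 * write: the free tail is padded and reported as dirty space, and the
 * buffer moves on to the next min. I/O unit. Returns the dirty byte count.
 */
static int demo_wbuf_sync(struct demo_wbuf *w, int min_io_size)
{
	int dirt;

	assert(w->used + w->avail == min_io_size);
	if (w->used == 0)
		return 0;    /* empty buffer: nothing is written */
	dirt = w->avail;     /* the padding becomes dirty space */
	w->offs += min_io_size;
	w->avail = min_io_size;
	w->used = 0;
	return dirt;
}
```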
/**
* ubifs_wbuf_seek_nolock - seek write-buffer.
* @wbuf: write-buffer
* @lnum: logical eraseblock number to seek to
* @offs: logical eraseblock offset to seek to
* @dtype: data type
*
* This function targets the write buffer to logical eraseblock @lnum:@offs.
* The write-buffer is synchronized if it is not empty. Returns zero in case of
* success and a negative error code in case of failure.
*/
int ubifs_wbuf_seek_nolock(struct ubifs_wbuf *wbuf, int lnum, int offs,
int dtype)
{
const struct ubifs_info *c = wbuf->c;
dbg_io("LEB %d:%d", lnum, offs);
ubifs_assert(lnum >= 0 && lnum < c->leb_cnt);
ubifs_assert(offs >= 0 && offs <= c->leb_size);
ubifs_assert(offs % c->min_io_size == 0 && !(offs & 7));
ubifs_assert(lnum != wbuf->lnum);
if (wbuf->used > 0) {
int err = ubifs_wbuf_sync_nolock(wbuf);
if (err)
return err;
}
spin_lock(&wbuf->lock);
wbuf->lnum = lnum;
wbuf->offs = offs;
wbuf->avail = c->min_io_size;
wbuf->used = 0;
spin_unlock(&wbuf->lock);
wbuf->dtype = dtype;
return 0;
}
/**
* ubifs_bg_wbufs_sync - synchronize write-buffers.
* @c: UBIFS file-system description object
*
* This function is called by the background thread to synchronize write-buffers.
* Returns zero in case of success and a negative error code in case of
* failure.
*/
int ubifs_bg_wbufs_sync(struct ubifs_info *c)
{
int err, i;
if (!c->need_wbuf_sync)
return 0;
c->need_wbuf_sync = 0;
if (c->ro_media) {
err = -EROFS;
goto out_timers;
}
dbg_io("synchronize");
for (i = 0; i < c->jhead_cnt; i++) {
struct ubifs_wbuf *wbuf = &c->jheads[i].wbuf;
cond_resched();
/*
* If the mutex is locked then wbuf is being changed, so
* synchronization is not necessary.
*/
if (mutex_is_locked(&wbuf->io_mutex))
continue;
mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
if (!wbuf->need_sync) {
mutex_unlock(&wbuf->io_mutex);
continue;
}
err = ubifs_wbuf_sync_nolock(wbuf);
mutex_unlock(&wbuf->io_mutex);
if (err) {
ubifs_err("cannot sync write-buffer, error %d", err);
ubifs_ro_mode(c, err);
goto out_timers;
}
}
return 0;
out_timers:
/* Cancel all timers to prevent repeated errors */
for (i = 0; i < c->jhead_cnt; i++) {
struct ubifs_wbuf *wbuf = &c->jheads[i].wbuf;
mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
cancel_wbuf_timer_nolock(wbuf);
mutex_unlock(&wbuf->io_mutex);
}
return err;
}
/**
* ubifs_wbuf_write_nolock - write data to flash via write-buffer.
* @wbuf: write-buffer
* @buf: node to write
* @len: node length
*
* This function writes data to flash via write-buffer @wbuf. This means that
* the last piece of the node won't reach the flash media immediately if it
* does not fill a whole minimal I/O unit. Instead, that piece sits in RAM
* until the write-buffer is synchronized (e.g., by timer).
*
* This function returns zero in case of success and a negative error code in
* case of failure. If the node cannot be written because there is no more
* space in this logical eraseblock, %-ENOSPC is returned.
*/
int ubifs_wbuf_write_nolock(struct ubifs_wbuf *wbuf, void *buf, int len)
{
struct ubifs_info *c = wbuf->c;
int err, written, n, aligned_len = ALIGN(len, 8), offs;
dbg_io("%d bytes (%s) to wbuf at LEB %d:%d", len,
dbg_ntype(((struct ubifs_ch *)buf)->node_type), wbuf->lnum,
wbuf->offs + wbuf->used);
ubifs_assert(len > 0 && wbuf->lnum >= 0 && wbuf->lnum < c->leb_cnt);
ubifs_assert(wbuf->offs >= 0 && wbuf->offs % c->min_io_size == 0);
ubifs_assert(!(wbuf->offs & 7) && wbuf->offs <= c->leb_size);
ubifs_assert(wbuf->avail > 0 && wbuf->avail <= c->min_io_size);
ubifs_assert(mutex_is_locked(&wbuf->io_mutex));
if (c->leb_size - wbuf->offs - wbuf->used < aligned_len) {
err = -ENOSPC;
goto out;
}
cancel_wbuf_timer_nolock(wbuf);
if (c->ro_media)
return -EROFS;
if (aligned_len <= wbuf->avail) {
/*
* The node is not very large and fits entirely within
* write-buffer.
*/
memcpy(wbuf->buf + wbuf->used, buf, len);
if (aligned_len == wbuf->avail) {
dbg_io("flush wbuf to LEB %d:%d", wbuf->lnum,
wbuf->offs);
err = ubi_leb_write(c->ubi, wbuf->lnum, wbuf->buf,
wbuf->offs, c->min_io_size,
wbuf->dtype);
if (err)
goto out;
spin_lock(&wbuf->lock);
wbuf->offs += c->min_io_size;
wbuf->avail = c->min_io_size;
wbuf->used = 0;
wbuf->next_ino = 0;
spin_unlock(&wbuf->lock);
} else {
spin_lock(&wbuf->lock);
wbuf->avail -= aligned_len;
wbuf->used += aligned_len;
spin_unlock(&wbuf->lock);
}
goto exit;
}
/*
* The node is large enough and does not fit entirely within current
* minimal I/O unit. We have to fill and flush write-buffer and switch
* to the next min. I/O unit.
*/
dbg_io("flush wbuf to LEB %d:%d", wbuf->lnum, wbuf->offs);
memcpy(wbuf->buf + wbuf->used, buf, wbuf->avail);
err = ubi_leb_write(c->ubi, wbuf->lnum, wbuf->buf, wbuf->offs,
c->min_io_size, wbuf->dtype);
if (err)
goto out;
offs = wbuf->offs + c->min_io_size;
len -= wbuf->avail;
aligned_len -= wbuf->avail;
written = wbuf->avail;
/*
* The remaining data may span several whole min. I/O units, so write
* the largest multiple of the min. I/O unit size directly to the flash
* media. The node length is aligned to an 8-byte boundary because the
* write-buffer is flushed anyway if less than 8 bytes remain in it.
*/
n = aligned_len >> c->min_io_shift;
if (n) {
n <<= c->min_io_shift;
dbg_io("write %d bytes to LEB %d:%d", n, wbuf->lnum, offs);
err = ubi_leb_write(c->ubi, wbuf->lnum, buf + written, offs, n,
wbuf->dtype);
if (err)
goto out;
offs += n;
aligned_len -= n;
len -= n;
written += n;
}
spin_lock(&wbuf->lock);
if (aligned_len)
/*
* What is left does not fill a whole min. I/O unit, so copy
* it to the write-buffer and we are done.
*/
memcpy(wbuf->buf, buf + written, len);
wbuf->offs = offs;
wbuf->used = aligned_len;
wbuf->avail = c->min_io_size - aligned_len;
wbuf->next_ino = 0;
spin_unlock(&wbuf->lock);
exit:
if (wbuf->sync_callback) {
int free = c->leb_size - wbuf->offs - wbuf->used;
err = wbuf->sync_callback(c, wbuf->lnum, free, 0);
if (err)
goto out;
}
if (wbuf->used)
new_wbuf_timer_nolock(wbuf);
return 0;
out:
ubifs_err("cannot write %d bytes to LEB %d:%d, error %d",
len, wbuf->lnum, wbuf->offs, err);
dbg_dump_node(c, buf);
dbg_dump_stack();
dbg_dump_leb(c, wbuf->lnum);
return err;
}
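The length arithmetic in ubifs_wbuf_write_nolock() determines three parts of a node: how much completes the current buffer, how much goes straight to flash in whole minimal I/O units, and how much stays buffered. Below is a minimal sketch of that split, assuming a power-of-two minimal I/O unit. demo_wbuf_split() and struct demo_split are hypothetical. The -ENOSPC check and the actual writes are omitted.

```c
#include <assert.h>

#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical summary of how one node is split by the write path. */
struct demo_split {
	int fill;   /* bytes that complete the current buffer, which is flushed */
	int direct; /* whole min. I/O units written straight to flash */
	int tail;   /* 8-byte aligned bytes of this node left in the buffer */
};

/*
 * Sketch of the length arithmetic in ubifs_wbuf_write_nolock() for a node of
 * @len bytes when @avail bytes are free in the write-buffer.
 */
static struct demo_split demo_wbuf_split(int len, int avail, int min_io_size)
{
	struct demo_split s = { 0, 0, 0 };
	int aligned_len = ALIGN(len, 8);

	assert(len > 0 && avail > 0 && avail <= min_io_size);
	if (aligned_len < avail) {
		s.tail = aligned_len;   /* buffered, no flush yet */
		return s;
	}
	s.fill = avail;                 /* top up the buffer and flush it */
	aligned_len -= avail;
	/* largest multiple of the min. I/O unit, written directly */
	s.direct = aligned_len / min_io_size * min_io_size;
	s.tail = aligned_len - s.direct; /* remainder goes to the buffer */
	return s;
}
```

In every case, fill + direct + tail equals the node length rounded up to 8 bytes, which is a convenient invariant to assert when changing this code.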
/**
* ubifs_write_node - write node to the media.
* @c: UBIFS file-system description object
* @buf: the node to write
* @len: node length
* @lnum: logical eraseblock number
* @offs: offset within the logical eraseblock
* @dtype: node life-time hint (%UBI_LONGTERM, %UBI_SHORTTERM, %UBI_UNKNOWN)
*
* This function automatically fills node magic number, assigns sequence
* number, and calculates node CRC checksum. The length of the @buf buffer has
* to be aligned to the minimal I/O unit size. This function automatically
* appends padding node and padding bytes if needed. Returns zero in case of
* success and a negative error code in case of failure.
*/
int ubifs_write_node(struct ubifs_info *c, void *buf, int len, int lnum,
int offs, int dtype)
{
int err, buf_len = ALIGN(len, c->min_io_size);
dbg_io("LEB %d:%d, %s, length %d (aligned %d)",
lnum, offs, dbg_ntype(((struct ubifs_ch *)buf)->node_type), len,
buf_len);
ubifs_assert(lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
ubifs_assert(offs % c->min_io_size == 0 && offs < c->leb_size);
if (c->ro_media)
return -EROFS;
ubifs_prepare_node(c, buf, len, 1);
err = ubi_leb_write(c->ubi, lnum, buf, offs, buf_len, dtype);
if (err) {
ubifs_err("cannot write %d bytes to LEB %d:%d, error %d",
buf_len, lnum, offs, err);
dbg_dump_node(c, buf);
dbg_dump_stack();
}
return err;
}
/**
* ubifs_read_node_wbuf - read node from the media or write-buffer.
* @wbuf: wbuf to check for un-written data
* @buf: buffer to read to
* @type: node type
* @len: node length
* @lnum: logical eraseblock number
* @offs: offset within the logical eraseblock
*
* This function reads a node of known type and length, checks it and stores
* it in @buf. If the node partially or fully sits in the write-buffer, this
* function takes that data from the buffer and reads the rest from the flash
* media.
* Returns zero in case of success, %-EUCLEAN if CRC mismatched and a negative
* error code in case of failure.
*/
int ubifs_read_node_wbuf(struct ubifs_wbuf *wbuf, void *buf, int type, int len,
int lnum, int offs)
{
const struct ubifs_info *c = wbuf->c;
int err, rlen, overlap;
struct ubifs_ch *ch = buf;
dbg_io("LEB %d:%d, %s, length %d", lnum, offs, dbg_ntype(type), len);
ubifs_assert(wbuf && lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
ubifs_assert(!(offs & 7) && offs < c->leb_size);
ubifs_assert(type >= 0 && type < UBIFS_NODE_TYPES_CNT);
spin_lock(&wbuf->lock);
overlap = (lnum == wbuf->lnum && offs + len > wbuf->offs);
if (!overlap) {
/* We may safely unlock the write-buffer and read the data */
spin_unlock(&wbuf->lock);
return ubifs_read_node(c, buf, type, len, lnum, offs);
}
/* Only the part preceding the write-buffer is read from flash */
rlen = wbuf->offs - offs;
if (rlen < 0)
rlen = 0;
/* Copy the rest from the write-buffer */
memcpy(buf + rlen, wbuf->buf + offs + rlen - wbuf->offs, len - rlen);
spin_unlock(&wbuf->lock);
if (rlen > 0) {
/* Read everything that goes before write-buffer */
err = ubi_read(c->ubi, lnum, buf, offs, rlen);
if (err && err != -EBADMSG) {
ubifs_err("failed to read node %d from LEB %d:%d, "
"error %d", type, lnum, offs, err);
dbg_dump_stack();
return err;
}
}
if (type != ch->node_type) {
ubifs_err("bad node type (%d but expected %d)",
ch->node_type, type);
goto out;
}
err = ubifs_check_node(c, buf, lnum, offs, 0);
if (err) {
ubifs_err("expected node type %d", type);
return err;
}
rlen = le32_to_cpu(ch->len);
if (rlen != len) {
ubifs_err("bad node length %d, expected %d", rlen, len);
goto out;
}
return 0;
out:
ubifs_err("bad node at LEB %d:%d", lnum, offs);
dbg_dump_node(c, buf);
dbg_dump_stack();
return -EINVAL;
}
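The overlap handling in ubifs_read_node_wbuf() splits a read at the write-buffer offset. Below is a hedged user-space sketch of that split. demo_read_split() is hypothetical, @flash stands in for the whole logical eraseblock, and @wbuf stands in for the buffered minimal I/O unit. Node checking and locking are left out.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical sketch of the split done by ubifs_read_node_wbuf(): bytes
 * below @wbuf_offs come from flash, the rest is copied from the
 * write-buffer contents. Returns the number of bytes taken from flash.
 */
static int demo_read_split(char *dst, const char *flash, const char *wbuf,
			   int wbuf_offs, int offs, int len)
{
	int rlen;

	assert(offs >= 0 && len > 0);
	assert(offs + len > wbuf_offs);  /* caller checked for overlap */
	rlen = wbuf_offs - offs;         /* part that precedes the buffer */
	if (rlen < 0)
		rlen = 0;                /* node starts inside the buffer */
	memcpy(dst + rlen, wbuf + offs + rlen - wbuf_offs, len - rlen);
	memcpy(dst, flash + offs, rlen);
	return rlen;
}
```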
/**
* ubifs_read_node - read node.
* @c: UBIFS file-system description object
* @buf: buffer to read to
* @type: node type
* @len: node length (not aligned)
* @lnum: logical eraseblock number
* @offs: offset within the logical eraseblock
*
* This function reads a node of known type and length, checks it and
* stores it in @buf. Returns zero in case of success, %-EUCLEAN if CRC mismatched
* and a negative error code in case of failure.
*/
int ubifs_read_node(const struct ubifs_info *c, void *buf, int type, int len,
int lnum, int offs)
{
int err, l;
struct ubifs_ch *ch = buf;
dbg_io("LEB %d:%d, %s, length %d", lnum, offs, dbg_ntype(type), len);
ubifs_assert(lnum >= 0 && lnum < c->leb_cnt && offs >= 0);
ubifs_assert(len >= UBIFS_CH_SZ && offs + len <= c->leb_size);
ubifs_assert(!(offs & 7) && offs < c->leb_size);
ubifs_assert(type >= 0 && type < UBIFS_NODE_TYPES_CNT);
err = ubi_read(c->ubi, lnum, buf, offs, len);
if (err && err != -EBADMSG) {
ubifs_err("cannot read node %d from LEB %d:%d, error %d",
type, lnum, offs, err);
return err;
}
if (type != ch->node_type) {
ubifs_err("bad node type (%d but expected %d)",
ch->node_type, type);
goto out;
}
err = ubifs_check_node(c, buf, lnum, offs, 0);
if (err) {
ubifs_err("expected node type %d", type);
return err;
}
l = le32_to_cpu(ch->len);
if (l != len) {
ubifs_err("bad node length %d, expected %d", l, len);
goto out;
}
return 0;
out:
ubifs_err("bad node at LEB %d:%d", lnum, offs);
dbg_dump_node(c, buf);
dbg_dump_stack();
return -EINVAL;
}
/**
* ubifs_wbuf_init - initialize write-buffer.
* @c: UBIFS file-system description object
* @wbuf: write-buffer to initialize
*
* This function initializes the write-buffer. Returns zero in case of success
* and %-ENOMEM in case of failure.
*/
int ubifs_wbuf_init(struct ubifs_info *c, struct ubifs_wbuf *wbuf)
{
size_t size;
wbuf->buf = kmalloc(c->min_io_size, GFP_KERNEL);
if (!wbuf->buf)
return -ENOMEM;
size = (c->min_io_size / UBIFS_CH_SZ + 1) * sizeof(ino_t);
wbuf->inodes = kmalloc(size, GFP_KERNEL);
if (!wbuf->inodes) {
kfree(wbuf->buf);
wbuf->buf = NULL;
return -ENOMEM;
}
wbuf->used = 0;
wbuf->lnum = wbuf->offs = -1;
wbuf->avail = c->min_io_size;
wbuf->dtype = UBI_UNKNOWN;
wbuf->sync_callback = NULL;
mutex_init(&wbuf->io_mutex);
spin_lock_init(&wbuf->lock);
wbuf->c = c;
init_timer(&wbuf->timer);
wbuf->timer.function = wbuf_timer_callback_nolock;
wbuf->timer.data = (unsigned long)wbuf;
wbuf->timeout = DEFAULT_WBUF_TIMEOUT;
wbuf->next_ino = 0;
return 0;
}
/**
* ubifs_wbuf_add_ino_nolock - add an inode number into the wbuf inode array.
* @wbuf: the write-buffer to add the inode number to
* @inum: the inode number
*
* This function adds an inode number to the inode array of the write-buffer.
*/
void ubifs_wbuf_add_ino_nolock(struct ubifs_wbuf *wbuf, ino_t inum)
{
if (!wbuf->buf)
/* NOR flash or something similar */
return;
spin_lock(&wbuf->lock);
if (wbuf->used)
wbuf->inodes[wbuf->next_ino++] = inum;
spin_unlock(&wbuf->lock);
}
/**
* wbuf_has_ino - check whether the wbuf contains data from the inode.
* @wbuf: the write-buffer
* @inum: the inode number
*
* This function returns %1 if the write-buffer contains some data from the
* given inode, otherwise it returns %0.
*/
static int wbuf_has_ino(struct ubifs_wbuf *wbuf, ino_t inum)
{
int i, ret = 0;
spin_lock(&wbuf->lock);
for (i = 0; i < wbuf->next_ino; i++)
if (inum == wbuf->inodes[i]) {
ret = 1;
break;
}
spin_unlock(&wbuf->lock);
return ret;
}
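ubifs_wbuf_init() sizes the inode array as min_io_size / UBIFS_CH_SZ + 1 entries. Every node is at least UBIFS_CH_SZ bytes long (see the assertion in ubifs_prepare_node()), so this bounds how many nodes can share one buffer, and the expression adds one extra slot. Below is a minimal sketch of the array and its two helpers, without the spinlock. struct demo_ino_list and the demo_* functions are hypothetical. The 24-byte header size used in the checks is an assumed value of UBIFS_CH_SZ.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical model of the per-write-buffer inode list. */
struct demo_ino_list {
	ino_t *inodes;
	int next_ino;
	int cap;
	int used; /* bytes buffered; entries are only recorded when non-zero */
};

/* Same bound as ubifs_wbuf_init(): each node is at least @ch_sz bytes. */
static int demo_ino_list_init(struct demo_ino_list *l, int min_io_size,
			      int ch_sz)
{
	l->cap = min_io_size / ch_sz + 1;
	l->inodes = malloc(l->cap * sizeof(ino_t));
	l->next_ino = 0;
	l->used = 0;
	return l->inodes ? 0 : -1;
}

/* Like ubifs_wbuf_add_ino_nolock(): only track inodes of buffered data. */
static void demo_add_ino(struct demo_ino_list *l, ino_t inum)
{
	if (!l->used)
		return;
	assert(l->next_ino < l->cap);
	l->inodes[l->next_ino++] = inum;
}

/* Like wbuf_has_ino(): linear search of the recorded inode numbers. */
static int demo_has_ino(const struct demo_ino_list *l, ino_t inum)
{
	int i;

	for (i = 0; i < l->next_ino; i++)
		if (l->inodes[i] == inum)
			return 1;
	return 0;
}
```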
/**
* ubifs_sync_wbufs_by_inode - synchronize write-buffers for an inode.
* @c: UBIFS file-system description object
* @inode: inode to synchronize
*
* This function synchronizes write-buffers which contain nodes belonging to
* @inode. Returns zero in case of success and a negative error code in case of
* failure.
*/
int ubifs_sync_wbufs_by_inode(struct ubifs_info *c, struct inode *inode)
{
int i, err = 0;
for (i = 0; i < c->jhead_cnt; i++) {
struct ubifs_wbuf *wbuf = &c->jheads[i].wbuf;
if (i == GCHD)
/*
* GC head is special, do not look at it. Even if the
* head contains something related to this inode, it is
* a _copy_ of corresponding on-flash node which sits
* somewhere else.
*/
continue;
if (!wbuf_has_ino(wbuf, inode->i_ino))
continue;
mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
if (wbuf_has_ino(wbuf, inode->i_ino))
err = ubifs_wbuf_sync_nolock(wbuf);
mutex_unlock(&wbuf->io_mutex);
if (err) {
ubifs_ro_mode(c, err);
return err;
}
}
return 0;
}

fs/ubifs/ioctl.c

@ -0,0 +1,204 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
* Copyright (C) 2006, 2007 University of Szeged, Hungary
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Zoltan Sogor
* Artem Bityutskiy (Битюцкий Артём)
* Adrian Hunter
*/
/* This file implements the EXT2-compatible inode flag (FS_IOC_GETFLAGS/SETFLAGS) ioctl() calls */
#include <linux/compat.h>
#include <linux/smp_lock.h>
#include <linux/mount.h>
#include "ubifs.h"
/**
* ubifs_set_inode_flags - set VFS inode flags.
* @inode: VFS inode to set flags for
*
* This function propagates flags from UBIFS inode object to VFS inode object.
*/
void ubifs_set_inode_flags(struct inode *inode)
{
unsigned int flags = ubifs_inode(inode)->flags;
inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE | S_DIRSYNC);
if (flags & UBIFS_SYNC_FL)
inode->i_flags |= S_SYNC;
if (flags & UBIFS_APPEND_FL)
inode->i_flags |= S_APPEND;
if (flags & UBIFS_IMMUTABLE_FL)
inode->i_flags |= S_IMMUTABLE;
if (flags & UBIFS_DIRSYNC_FL)
inode->i_flags |= S_DIRSYNC;
}
/*
* ioctl2ubifs - convert ioctl inode flags to UBIFS inode flags.
* @ioctl_flags: flags to convert
*
* This function converts ioctl flags (@FS_COMPR_FL, etc) to UBIFS inode flags
* (@UBIFS_COMPR_FL, etc).
*/
static int ioctl2ubifs(int ioctl_flags)
{
int ubifs_flags = 0;
if (ioctl_flags & FS_COMPR_FL)
ubifs_flags |= UBIFS_COMPR_FL;
if (ioctl_flags & FS_SYNC_FL)
ubifs_flags |= UBIFS_SYNC_FL;
if (ioctl_flags & FS_APPEND_FL)
ubifs_flags |= UBIFS_APPEND_FL;
if (ioctl_flags & FS_IMMUTABLE_FL)
ubifs_flags |= UBIFS_IMMUTABLE_FL;
if (ioctl_flags & FS_DIRSYNC_FL)
ubifs_flags |= UBIFS_DIRSYNC_FL;
return ubifs_flags;
}
/*
* ubifs2ioctl - convert UBIFS inode flags to ioctl inode flags.
* @ubifs_flags: flags to convert
*
* This function converts UBIFS inode flags (@UBIFS_COMPR_FL, etc) to ioctl
* flags (@FS_COMPR_FL, etc).
*/
static int ubifs2ioctl(int ubifs_flags)
{
int ioctl_flags = 0;
if (ubifs_flags & UBIFS_COMPR_FL)
ioctl_flags |= FS_COMPR_FL;
if (ubifs_flags & UBIFS_SYNC_FL)
ioctl_flags |= FS_SYNC_FL;
if (ubifs_flags & UBIFS_APPEND_FL)
ioctl_flags |= FS_APPEND_FL;
if (ubifs_flags & UBIFS_IMMUTABLE_FL)
ioctl_flags |= FS_IMMUTABLE_FL;
if (ubifs_flags & UBIFS_DIRSYNC_FL)
ioctl_flags |= FS_DIRSYNC_FL;
return ioctl_flags;
}
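ioctl2ubifs() and ubifs2ioctl() mirror each other. One alternative, shown here as a hedged sketch rather than the UBIFS code, is to drive both directions from a single table, so that a newly supported flag only has to be added once. The numeric flag values are assumptions (FS_* as in <linux/fs.h>, UBIFS_* as in the UBIFS on-media header) and are defined locally so the example builds in user space. demo_ioctl2ubifs() and demo_ubifs2ioctl() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed flag values, defined locally for a user-space build. */
#define FS_COMPR_FL        0x00000004
#define FS_SYNC_FL         0x00000008
#define FS_IMMUTABLE_FL    0x00000010
#define FS_APPEND_FL       0x00000020
#define FS_DIRSYNC_FL      0x00010000

#define UBIFS_COMPR_FL     0x01
#define UBIFS_SYNC_FL      0x02
#define UBIFS_IMMUTABLE_FL 0x04
#define UBIFS_APPEND_FL    0x08
#define UBIFS_DIRSYNC_FL   0x10

/* One row per flag pair; both converters walk the same table. */
static const struct { int ioctl_fl, ubifs_fl; } flag_map[] = {
	{ FS_COMPR_FL,     UBIFS_COMPR_FL },
	{ FS_SYNC_FL,      UBIFS_SYNC_FL },
	{ FS_APPEND_FL,    UBIFS_APPEND_FL },
	{ FS_IMMUTABLE_FL, UBIFS_IMMUTABLE_FL },
	{ FS_DIRSYNC_FL,   UBIFS_DIRSYNC_FL },
};

#define FLAG_MAP_CNT (sizeof(flag_map) / sizeof(flag_map[0]))

static int demo_ioctl2ubifs(int ioctl_flags)
{
	int out = 0;
	size_t i;

	for (i = 0; i < FLAG_MAP_CNT; i++)
		if (ioctl_flags & flag_map[i].ioctl_fl)
			out |= flag_map[i].ubifs_fl;
	return out; /* bits without a table entry are dropped */
}

static int demo_ubifs2ioctl(int ubifs_flags)
{
	int out = 0;
	size_t i;

	for (i = 0; i < FLAG_MAP_CNT; i++)
		if (ubifs_flags & flag_map[i].ubifs_fl)
			out |= flag_map[i].ioctl_fl;
	return out;
}
```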
static int setflags(struct inode *inode, int flags)
{
int oldflags, err, release;
struct ubifs_inode *ui = ubifs_inode(inode);
struct ubifs_info *c = inode->i_sb->s_fs_info;
struct ubifs_budget_req req = { .dirtied_ino = 1,
.dirtied_ino_d = ui->data_len };
err = ubifs_budget_space(c, &req);
if (err)
return err;
/*
* The IMMUTABLE and APPEND_ONLY flags can only be changed by a
* task that has the CAP_LINUX_IMMUTABLE capability.
*/
mutex_lock(&ui->ui_mutex);
oldflags = ubifs2ioctl(ui->flags);
if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) {
if (!capable(CAP_LINUX_IMMUTABLE)) {
err = -EPERM;
goto out_unlock;
}
}
ui->flags = ioctl2ubifs(flags);
ubifs_set_inode_flags(inode);
inode->i_ctime = ubifs_current_time(inode);
release = ui->dirty;
mark_inode_dirty_sync(inode);
mutex_unlock(&ui->ui_mutex);
if (release)
ubifs_release_budget(c, &req);
if (IS_SYNC(inode))
err = write_inode_now(inode, 1);
return err;
out_unlock:
ubifs_err("can't modify inode %lu attributes", inode->i_ino);
mutex_unlock(&ui->ui_mutex);
ubifs_release_budget(c, &req);
return err;
}
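setflags() only requires CAP_LINUX_IMMUTABLE when the APPEND or IMMUTABLE bit actually changes. XOR-ing the old and new flag sets isolates the bits that differ. Below is a minimal sketch of that test. demo_needs_immutable_cap() is hypothetical, and the two flag values are assumed to match <linux/fs.h>.

```c
#include <assert.h>

#define FS_IMMUTABLE_FL 0x00000010 /* assumed, as in <linux/fs.h> */
#define FS_APPEND_FL    0x00000020 /* assumed, as in <linux/fs.h> */

/*
 * Hypothetical helper: non-zero if going from @oldflags to @newflags toggles
 * APPEND or IMMUTABLE, i.e. the change needs CAP_LINUX_IMMUTABLE.
 */
static int demo_needs_immutable_cap(int oldflags, int newflags)
{
	/* XOR keeps exactly the bits that differ between the two sets */
	return ((oldflags ^ newflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL)) != 0;
}
```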
long ubifs_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
int flags, err;
struct inode *inode = file->f_path.dentry->d_inode;
switch (cmd) {
case FS_IOC_GETFLAGS:
flags = ubifs2ioctl(ubifs_inode(inode)->flags);
return put_user(flags, (int __user *) arg);
case FS_IOC_SETFLAGS: {
if (IS_RDONLY(inode))
return -EROFS;
if (!is_owner_or_cap(inode))
return -EACCES;
if (get_user(flags, (int __user *) arg))
return -EFAULT;
if (!S_ISDIR(inode->i_mode))
flags &= ~FS_DIRSYNC_FL;
/*
* Make sure the file-system is read-write and make sure it
* will not become read-only while we are changing the flags.
*/
err = mnt_want_write(file->f_path.mnt);
if (err)
return err;
err = setflags(inode, flags);
mnt_drop_write(file->f_path.mnt);
return err;
}
default:
return -ENOTTY;
}
}
#ifdef CONFIG_COMPAT
long ubifs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
switch (cmd) {
case FS_IOC32_GETFLAGS:
cmd = FS_IOC_GETFLAGS;
break;
case FS_IOC32_SETFLAGS:
cmd = FS_IOC_SETFLAGS;
break;
default:
return -ENOIOCTLCMD;
}
return ubifs_ioctl(file, cmd, (unsigned long)compat_ptr(arg));
}
#endif

fs/ubifs/journal.c

File diff suppressed because it is too large

fs/ubifs/key.h

@ -0,0 +1,533 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Artem Bityutskiy (Битюцкий Артём)
* Adrian Hunter
*/
/*
* This header contains various key-related definitions and helper functions.
* UBIFS allows several key schemes, so we access key fields only via these
* helpers. At the moment only one key scheme is supported.
*
* Simple key scheme
* ~~~~~~~~~~~~~~~~~
*
* Keys are 64 bits long. The first 32 bits are the inode number (the parent
* inode number in case of a direntry key). The next 3 bits are the node type.
* The last 29 bits are the block number (in 4KiB units) in case of a data
* node, and the direntry hash in case of a direntry node. We use the "r5"
* hash borrowed from reiserfs.
*/
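The simple key scheme can be illustrated by packing a data key by hand. This sketch assumes UBIFS_DATA_KEY is 1 and that file data is split into 4KiB blocks, as the comment above states. struct demo_key, demo_data_key() and the helper macros are hypothetical stand-ins for union ubifs_key, data_key_init(), key_type() and key_block().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for the simple key scheme. */
#define KEY_BLOCK_BITS 29
#define KEY_BLOCK_MASK ((1u << KEY_BLOCK_BITS) - 1)
#define DATA_KEY       1u /* assumed value of UBIFS_DATA_KEY */
#define BLOCK_SHIFT    12 /* 4KiB data blocks */

/* Hypothetical in-memory key: two 32-bit halves, as in union ubifs_key. */
struct demo_key {
	uint32_t u32[2];
};

/* Build the data key covering byte @pos of inode @inum. */
static struct demo_key demo_data_key(uint32_t inum, uint64_t pos)
{
	struct demo_key k;

	/* the block number must fit in the 29 payload bits */
	assert((pos >> BLOCK_SHIFT) <= KEY_BLOCK_MASK);
	k.u32[0] = inum;
	k.u32[1] = (uint32_t)(pos >> BLOCK_SHIFT) | (DATA_KEY << KEY_BLOCK_BITS);
	return k;
}

/* Type lives in the top 3 bits of the second word. */
static unsigned int demo_key_type(struct demo_key k)
{
	return k.u32[1] >> KEY_BLOCK_BITS;
}

/* Block number lives in the low 29 bits of the second word. */
static unsigned int demo_key_block(struct demo_key k)
{
	return k.u32[1] & KEY_BLOCK_MASK;
}
```

With 29 bits of block number and 4KiB blocks, this layout addresses at most 2^29 * 4KiB = 2TiB of data per inode.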
#ifndef __UBIFS_KEY_H__
#define __UBIFS_KEY_H__
/**
* key_r5_hash - R5 hash function (borrowed from reiserfs).
* @s: direntry name
* @len: name length
*/
static inline uint32_t key_r5_hash(const char *s, int len)
{
uint32_t a = 0;
const signed char *str = (const signed char *)s;
while (*str) {
a += *str << 4;
a += *str >> 4;
a *= 11;
str++;
}
a &= UBIFS_S_KEY_HASH_MASK;
/*
* We use hash values as offset in directories, so values %0 and %1 are
* reserved for "." and "..". %2 is reserved for "end of readdir"
* marker.
*/
if (unlikely(a >= 0 && a <= 2))
a += 3;
return a;
}
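The r5 hash is easy to try out in user space. The copy below follows key_r5_hash() but multiplies by 16 instead of shifting, because left-shifting a negative signed char is undefined in ISO C. With GCC's two's-complement semantics, which the kernel relies on, both give the same value. KEY_HASH_MASK assumes UBIFS_S_KEY_HASH_MASK covers the low 29 bits, and demo_r5_hash() is a hypothetical name. Like the original, it stops at the NUL terminator and does not use a length.

```c
#include <assert.h>
#include <stdint.h>

#define KEY_HASH_MASK 0x1FFFFFFFu /* assumed: low 29 bits */

/* User-space copy of the r5 hash plus the UBIFS reserved-value fix-up. */
static uint32_t demo_r5_hash(const char *s)
{
	uint32_t a = 0;
	const signed char *str = (const signed char *)s;

	while (*str) {
		a += *str * 16; /* same as *str << 4 without signed-shift UB */
		a += *str >> 4;
		a *= 11;
		str++;
	}
	a &= KEY_HASH_MASK;
	/* 0, 1 and 2 are reserved for ".", ".." and the end-of-readdir marker */
	if (a <= 2)
		a += 3;
	return a;
}
```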
/**
* key_test_hash - testing hash function.
* @str: direntry name
* @len: name length
*/
static inline uint32_t key_test_hash(const char *str, int len)
{
uint32_t a = 0;
len = min_t(uint32_t, len, 4);
memcpy(&a, str, len);
a &= UBIFS_S_KEY_HASH_MASK;
if (unlikely(a >= 0 && a <= 2))
a += 3;
return a;
}
/**
* ino_key_init - initialize inode key.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: inode number
*/
static inline void ino_key_init(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum)
{
key->u32[0] = inum;
key->u32[1] = UBIFS_INO_KEY << UBIFS_S_KEY_BLOCK_BITS;
}
/**
* ino_key_init_flash - initialize on-flash inode key.
* @c: UBIFS file-system description object
* @k: key to initialize
* @inum: inode number
*/
static inline void ino_key_init_flash(const struct ubifs_info *c, void *k,
ino_t inum)
{
union ubifs_key *key = k;
key->j32[0] = cpu_to_le32(inum);
key->j32[1] = cpu_to_le32(UBIFS_INO_KEY << UBIFS_S_KEY_BLOCK_BITS);
memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
}
/**
* lowest_ino_key - get the lowest possible inode key.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: inode number
*/
static inline void lowest_ino_key(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum)
{
key->u32[0] = inum;
key->u32[1] = 0;
}
/**
* highest_ino_key - get the highest possible inode key.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: inode number
*/
static inline void highest_ino_key(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum)
{
key->u32[0] = inum;
key->u32[1] = 0xffffffff;
}
/**
* dent_key_init - initialize directory entry key.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: parent inode number
* @nm: direntry name and length
*/
static inline void dent_key_init(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum,
const struct qstr *nm)
{
uint32_t hash = c->key_hash(nm->name, nm->len);
ubifs_assert(!(hash & ~UBIFS_S_KEY_HASH_MASK));
key->u32[0] = inum;
key->u32[1] = hash | (UBIFS_DENT_KEY << UBIFS_S_KEY_HASH_BITS);
}
/**
* dent_key_init_hash - initialize directory entry key without re-calculating
* hash function.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: parent inode number
* @hash: direntry name hash
*/
static inline void dent_key_init_hash(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum,
uint32_t hash)
{
ubifs_assert(!(hash & ~UBIFS_S_KEY_HASH_MASK));
key->u32[0] = inum;
key->u32[1] = hash | (UBIFS_DENT_KEY << UBIFS_S_KEY_HASH_BITS);
}
/**
* dent_key_init_flash - initialize on-flash directory entry key.
* @c: UBIFS file-system description object
* @k: key to initialize
* @inum: parent inode number
* @nm: direntry name and length
*/
static inline void dent_key_init_flash(const struct ubifs_info *c, void *k,
ino_t inum, const struct qstr *nm)
{
union ubifs_key *key = k;
uint32_t hash = c->key_hash(nm->name, nm->len);
ubifs_assert(!(hash & ~UBIFS_S_KEY_HASH_MASK));
key->j32[0] = cpu_to_le32(inum);
key->j32[1] = cpu_to_le32(hash |
(UBIFS_DENT_KEY << UBIFS_S_KEY_HASH_BITS));
memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
}
/**
* lowest_dent_key - get the lowest possible directory entry key.
* @c: UBIFS file-system description object
* @key: where to store the lowest key
* @inum: parent inode number
*/
static inline void lowest_dent_key(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum)
{
key->u32[0] = inum;
key->u32[1] = UBIFS_DENT_KEY << UBIFS_S_KEY_HASH_BITS;
}
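Assuming the directory-entry and extended-attribute key types are adjacent numbers, lowest_dent_key() and lowest_xent_key() for the same inode bracket every directory entry key of that directory. The sketch below relies on two assumptions. First, keys are ordered by inode number and then by the second 32-bit word; keys_cmp() is not shown in this excerpt. Second, UBIFS_DENT_KEY and UBIFS_XENT_KEY are 2 and 3. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define KEY_HASH_BITS 29
#define DENT_KEY 2u /* assumed value of UBIFS_DENT_KEY */
#define XENT_KEY 3u /* assumed value of UBIFS_XENT_KEY */

struct demo_key {
	uint32_t u32[2];
};

/* Assumed ordering: inode number first, then the type/payload word. */
static int demo_keys_cmp(const struct demo_key *a, const struct demo_key *b)
{
	if (a->u32[0] != b->u32[0])
		return a->u32[0] < b->u32[0] ? -1 : 1;
	if (a->u32[1] != b->u32[1])
		return a->u32[1] < b->u32[1] ? -1 : 1;
	return 0;
}

static struct demo_key demo_key_make(uint32_t inum, uint32_t type,
				     uint32_t hash)
{
	struct demo_key k = { { inum, hash | (type << KEY_HASH_BITS) } };

	return k;
}

/* True if @k is a directory entry key of directory @dir. */
static int demo_is_dent_of(const struct demo_key *k, uint32_t dir)
{
	struct demo_key lo = demo_key_make(dir, DENT_KEY, 0); /* lowest_dent_key() */
	struct demo_key hi = demo_key_make(dir, XENT_KEY, 0); /* lowest_xent_key() */

	return demo_keys_cmp(k, &lo) >= 0 && demo_keys_cmp(k, &hi) < 0;
}
```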
/**
* xent_key_init - initialize extended attribute entry key.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: host inode number
* @nm: extended attribute entry name and length
*/
static inline void xent_key_init(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum,
const struct qstr *nm)
{
uint32_t hash = c->key_hash(nm->name, nm->len);
ubifs_assert(!(hash & ~UBIFS_S_KEY_HASH_MASK));
key->u32[0] = inum;
key->u32[1] = hash | (UBIFS_XENT_KEY << UBIFS_S_KEY_HASH_BITS);
}
/**
* xent_key_init_hash - initialize extended attribute entry key without
* re-calculating hash function.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: host inode number
* @hash: extended attribute entry name hash
*/
static inline void xent_key_init_hash(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum,
uint32_t hash)
{
ubifs_assert(!(hash & ~UBIFS_S_KEY_HASH_MASK));
key->u32[0] = inum;
key->u32[1] = hash | (UBIFS_XENT_KEY << UBIFS_S_KEY_HASH_BITS);
}
/**
* xent_key_init_flash - initialize on-flash extended attribute entry key.
* @c: UBIFS file-system description object
* @k: key to initialize
* @inum: host inode number
* @nm: extended attribute entry name and length
*/
static inline void xent_key_init_flash(const struct ubifs_info *c, void *k,
ino_t inum, const struct qstr *nm)
{
union ubifs_key *key = k;
uint32_t hash = c->key_hash(nm->name, nm->len);
ubifs_assert(!(hash & ~UBIFS_S_KEY_HASH_MASK));
key->j32[0] = cpu_to_le32(inum);
key->j32[1] = cpu_to_le32(hash |
(UBIFS_XENT_KEY << UBIFS_S_KEY_HASH_BITS));
memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
}
/**
* lowest_xent_key - get the lowest possible extended attribute entry key.
* @c: UBIFS file-system description object
* @key: where to store the lowest key
* @inum: host inode number
*/
static inline void lowest_xent_key(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum)
{
key->u32[0] = inum;
key->u32[1] = UBIFS_XENT_KEY << UBIFS_S_KEY_HASH_BITS;
}
/**
* data_key_init - initialize data key.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: inode number
* @block: block number
*/
static inline void data_key_init(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum,
unsigned int block)
{
ubifs_assert(!(block & ~UBIFS_S_KEY_BLOCK_MASK));
key->u32[0] = inum;
key->u32[1] = block | (UBIFS_DATA_KEY << UBIFS_S_KEY_BLOCK_BITS);
}
/**
* data_key_init_flash - initialize on-flash data key.
* @c: UBIFS file-system description object
* @k: key to initialize
* @inum: inode number
* @block: block number
*/
static inline void data_key_init_flash(const struct ubifs_info *c, void *k,
ino_t inum, unsigned int block)
{
union ubifs_key *key = k;
ubifs_assert(!(block & ~UBIFS_S_KEY_BLOCK_MASK));
key->j32[0] = cpu_to_le32(inum);
key->j32[1] = cpu_to_le32(block |
(UBIFS_DATA_KEY << UBIFS_S_KEY_BLOCK_BITS));
memset(k + 8, 0, UBIFS_MAX_KEY_LEN - 8);
}
/**
* trun_key_init - initialize truncation node key.
* @c: UBIFS file-system description object
* @key: key to initialize
* @inum: inode number
*
* Note, UBIFS does not have truncation keys on the media and this function is
* only used for purposes of replay.
*/
static inline void trun_key_init(const struct ubifs_info *c,
union ubifs_key *key, ino_t inum)
{
key->u32[0] = inum;
key->u32[1] = UBIFS_TRUN_KEY << UBIFS_S_KEY_BLOCK_BITS;
}
/**
* key_type - get key type.
* @c: UBIFS file-system description object
* @key: key to get type of
*/
static inline int key_type(const struct ubifs_info *c,
const union ubifs_key *key)
{
return key->u32[1] >> UBIFS_S_KEY_BLOCK_BITS;
}
/**
* key_type_flash - get type of an on-flash formatted key.
* @c: UBIFS file-system description object
* @k: key to get type of
*/
static inline int key_type_flash(const struct ubifs_info *c, const void *k)
{
const union ubifs_key *key = k;
return le32_to_cpu(key->j32[1]) >> UBIFS_S_KEY_BLOCK_BITS;
}
/**
* key_inum - fetch inode number from key.
* @c: UBIFS file-system description object
* @k: key to fetch inode number from
*/
static inline ino_t key_inum(const struct ubifs_info *c, const void *k)
{
const union ubifs_key *key = k;
return key->u32[0];
}
/**
* key_inum_flash - fetch inode number from an on-flash formatted key.
* @c: UBIFS file-system description object
* @k: key to fetch inode number from
*/
static inline ino_t key_inum_flash(const struct ubifs_info *c, const void *k)
{
const union ubifs_key *key = k;
return le32_to_cpu(key->j32[0]);
}
/**
* key_hash - get directory entry hash.
* @c: UBIFS file-system description object
* @key: the key to get hash from
*/
static inline int key_hash(const struct ubifs_info *c,
const union ubifs_key *key)
{
return key->u32[1] & UBIFS_S_KEY_HASH_MASK;
}
/**
* key_hash_flash - get directory entry hash from an on-flash formatted key.
* @c: UBIFS file-system description object
* @k: the key to get hash from
*/
static inline int key_hash_flash(const struct ubifs_info *c, const void *k)
{
const union ubifs_key *key = k;
return le32_to_cpu(key->j32[1]) & UBIFS_S_KEY_HASH_MASK;
}
/**
* key_block - get data block number.
* @c: UBIFS file-system description object
* @key: the key to get the block number from
*/
static inline unsigned int key_block(const struct ubifs_info *c,
const union ubifs_key *key)
{
return key->u32[1] & UBIFS_S_KEY_BLOCK_MASK;
}
/**
* key_block_flash - get data block number from an on-flash formatted key.
* @c: UBIFS file-system description object
* @k: the key to get the block number from
*/
static inline unsigned int key_block_flash(const struct ubifs_info *c,
const void *k)
{
const union ubifs_key *key = k;
return le32_to_cpu(key->j32[1]) & UBIFS_S_KEY_BLOCK_MASK;
}
/**
* key_read - transform a key to in-memory format.
* @c: UBIFS file-system description object
* @from: the key to transform
* @to: the key to store the result
*/
static inline void key_read(const struct ubifs_info *c, const void *from,
union ubifs_key *to)
{
const union ubifs_key *f = from;
to->u32[0] = le32_to_cpu(f->j32[0]);
to->u32[1] = le32_to_cpu(f->j32[1]);
}
/**
* key_write - transform a key from in-memory format.
* @c: UBIFS file-system description object
* @from: the key to transform
* @to: the key to store the result
*/
static inline void key_write(const struct ubifs_info *c,
const union ubifs_key *from, void *to)
{
union ubifs_key *t = to;
t->j32[0] = cpu_to_le32(from->u32[0]);
t->j32[1] = cpu_to_le32(from->u32[1]);
memset(to + 8, 0, UBIFS_MAX_KEY_LEN - 8);
}
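key_write() and key_read() convert between the CPU-order in-memory key and the little-endian on-flash form, and key_write() also zero-fills the unused tail of the on-flash key slot. The portable sketch below does the same with explicit byte shifts instead of cpu_to_le32()/le32_to_cpu(). The 16-byte slot size stands in for UBIFS_MAX_KEY_LEN and is an assumption, as are all the names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_KEY_LEN 16 /* assumed stand-in for UBIFS_MAX_KEY_LEN */

/* Store a 32-bit value as 4 little-endian bytes, like cpu_to_le32(). */
static void put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Load 4 little-endian bytes as a 32-bit value, like le32_to_cpu(). */
static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Counterpart of key_write(): 8 key bytes, then zero padding. */
static void sketch_key_write(const uint32_t from[2],
			     uint8_t to[SKETCH_MAX_KEY_LEN])
{
	put_le32(to, from[0]);
	put_le32(to + 4, from[1]);
	memset(to + 8, 0, SKETCH_MAX_KEY_LEN - 8);
}

/* Counterpart of key_read(). */
static void sketch_key_read(const uint8_t from[SKETCH_MAX_KEY_LEN],
			    uint32_t to[2])
{
	to[0] = get_le32(from);
	to[1] = get_le32(from + 4);
}
```

key_write_idx() performs the same 8-byte conversion without the zero-fill.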
/**
* key_write_idx - transform a key from in-memory format for the index.
* @c: UBIFS file-system description object
* @from: the key to transform
* @to: the key to store the result
*/
static inline void key_write_idx(const struct ubifs_info *c,
const union ubifs_key *from, void *to)
{
union ubifs_key *t = to;
t->j32[0] = cpu_to_le32(from->u32[0]);
t->j32[1] = cpu_to_le32(from->u32[1]);
}
/**
* key_copy - copy a key.
* @c: UBIFS file-system description object
* @from: the key to copy from
* @to: the key to copy to
*/
static inline void key_copy(const struct ubifs_info *c,
const union ubifs_key *from, union ubifs_key *to)
{
to->u64[0] = from->u64[0];
}
/**
* keys_cmp - compare keys.
* @c: UBIFS file-system description object
* @key1: the first key to compare
* @key2: the second key to compare
*
* This function compares 2 keys and returns %-1 if @key1 is less than
* @key2, 0 if the keys are equivalent and %1 if @key1 is greater than @key2.
*/
static inline int keys_cmp(const struct ubifs_info *c,
const union ubifs_key *key1,
const union ubifs_key *key2)
{
if (key1->u32[0] < key2->u32[0])
return -1;
if (key1->u32[0] > key2->u32[0])
return 1;
if (key1->u32[1] < key2->u32[1])
return -1;
if (key1->u32[1] > key2->u32[1])
return 1;
return 0;
}
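keys_cmp() orders keys by inode number first and by the second word after that, so all keys of one inode are contiguous and, within an inode, grouped by type. The sketch below reuses that comparison as a qsort() comparator on a hypothetical two-word key array to make the resulting order visible; the test data assumes the 29-bit type/block layout of the simple key format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_key {
	uint32_t u32[2];
};

/* Same three-way comparison as keys_cmp(). */
static int sketch_keys_cmp(const struct sketch_key *a,
			   const struct sketch_key *b)
{
	if (a->u32[0] != b->u32[0])
		return a->u32[0] < b->u32[0] ? -1 : 1;
	if (a->u32[1] != b->u32[1])
		return a->u32[1] < b->u32[1] ? -1 : 1;
	return 0;
}

/* Adapter with the signature qsort() expects. */
static int sketch_qsort_cmp(const void *a, const void *b)
{
	return sketch_keys_cmp(a, b);
}

static void sketch_sort_keys(struct sketch_key *keys, size_t n)
{
	qsort(keys, n, sizeof(*keys), sketch_qsort_cmp);
}
```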
/**
* is_hash_key - is a key vulnerable to hash collisions.
* @c: UBIFS file-system description object
* @key: key
*
* This function returns %1 if @key is a hashed key or %0 otherwise.
*/
static inline int is_hash_key(const struct ubifs_info *c,
const union ubifs_key *key)
{
int type = key_type(c, key);
return type == UBIFS_DENT_KEY || type == UBIFS_XENT_KEY;
}
/**
* key_max_inode_size - get maximum file size allowed by current key format.
* @c: UBIFS file-system description object
*/
static inline unsigned long long key_max_inode_size(const struct ubifs_info *c)
{
switch (c->key_fmt) {
case UBIFS_SIMPLE_KEY_FMT:
return (1ULL << UBIFS_S_KEY_BLOCK_BITS) * UBIFS_BLOCK_SIZE;
default:
return 0;
}
}
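With the simple key format, the width of the block field bounds the file size. Assuming UBIFS_S_KEY_BLOCK_BITS is 29 and UBIFS_BLOCK_SIZE is 4096 (neither is defined in this excerpt), the limit works out to 2^29 * 2^12 = 2^41 bytes, i.e. 2 TiB. The sketch below only restates that arithmetic with hypothetical names.

```c
#include <assert.h>

#define SKETCH_KEY_BLOCK_BITS 29   /* assumed UBIFS_S_KEY_BLOCK_BITS */
#define SKETCH_BLOCK_SIZE     4096 /* assumed UBIFS_BLOCK_SIZE */

/* Mirrors the UBIFS_SIMPLE_KEY_FMT branch of key_max_inode_size(). */
static unsigned long long sketch_max_inode_size(void)
{
	return (1ULL << SKETCH_KEY_BLOCK_BITS) * SKETCH_BLOCK_SIZE;
}
```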
#endif /* !__UBIFS_KEY_H__ */

fs/ubifs/log.c Normal file (805 additions)

@@ -0,0 +1,805 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Artem Bityutskiy (Битюцкий Артём)
* Adrian Hunter
*/
/*
* This file is part of the UBIFS journal implementation and contains various
* functions which manipulate the log. The log is a fixed area on the flash
* which does not contain any data; it only refers to buds (via reference
* nodes) and records commit start nodes. The log is part of the journal.
*/
#include "ubifs.h"
#ifdef CONFIG_UBIFS_FS_DEBUG
static int dbg_check_bud_bytes(struct ubifs_info *c);
#else
#define dbg_check_bud_bytes(c) 0
#endif
/**
* ubifs_search_bud - search bud LEB.
* @c: UBIFS file-system description object
* @lnum: logical eraseblock number to search
*
* This function looks up bud LEB @lnum in the buds tree. Returns the bud
* description object in case of success and %NULL if there is no bud with
* this LEB number.
*/
struct ubifs_bud *ubifs_search_bud(struct ubifs_info *c, int lnum)
{
struct rb_node *p;
struct ubifs_bud *bud;
spin_lock(&c->buds_lock);
p = c->buds.rb_node;
while (p) {
bud = rb_entry(p, struct ubifs_bud, rb);
if (lnum < bud->lnum)
p = p->rb_left;
else if (lnum > bud->lnum)
p = p->rb_right;
else {
spin_unlock(&c->buds_lock);
return bud;
}
}
spin_unlock(&c->buds_lock);
return NULL;
}
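ubifs_search_bud() (and ubifs_get_wbuf() below) is an ordinary binary-search-tree lookup keyed by LEB number, done under @c->buds_lock. The sketch below shows the same descent on a plain, unbalanced tree in user space; `sketch_bud` and the helpers are hypothetical, and the kernel's rb-tree rebalancing and locking are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ubifs_bud, linked as a plain BST. */
struct sketch_bud {
	int lnum;
	struct sketch_bud *left, *right;
};

/* Insert without rebalancing; duplicate LEB numbers are not expected. */
static void sketch_bud_insert(struct sketch_bud **root, struct sketch_bud *bud)
{
	while (*root)
		root = bud->lnum < (*root)->lnum ? &(*root)->left
						 : &(*root)->right;
	bud->left = bud->right = NULL;
	*root = bud;
}

/* Same descent as ubifs_search_bud(): go left or right until lnum matches. */
static struct sketch_bud *sketch_bud_search(struct sketch_bud *p, int lnum)
{
	while (p) {
		if (lnum < p->lnum)
			p = p->left;
		else if (lnum > p->lnum)
			p = p->right;
		else
			return p;
	}
	return NULL;
}
```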
/**
* ubifs_get_wbuf - get the wbuf associated with a LEB, if there is one.
* @c: UBIFS file-system description object
* @lnum: logical eraseblock number to search
*
* This function returns the wbuf for @lnum or %NULL if there is not one.
*/
struct ubifs_wbuf *ubifs_get_wbuf(struct ubifs_info *c, int lnum)
{
struct rb_node *p;
struct ubifs_bud *bud;
int jhead;
if (!c->jheads)
return NULL;
spin_lock(&c->buds_lock);
p = c->buds.rb_node;
while (p) {
bud = rb_entry(p, struct ubifs_bud, rb);
if (lnum < bud->lnum)
p = p->rb_left;
else if (lnum > bud->lnum)
p = p->rb_right;
else {
jhead = bud->jhead;
spin_unlock(&c->buds_lock);
return &c->jheads[jhead].wbuf;
}
}
spin_unlock(&c->buds_lock);
return NULL;
}
/**
* next_log_lnum - get the number of the next log LEB, wrapping around.
* @c: UBIFS file-system description object
* @lnum: current log LEB
*/
static inline int next_log_lnum(const struct ubifs_info *c, int lnum)
{
lnum += 1;
if (lnum > c->log_last)
lnum = UBIFS_LOG_LNUM;
return lnum;
}
/**
* empty_log_bytes - calculate amount of empty space in the log.
* @c: UBIFS file-system description object
*/
static inline long long empty_log_bytes(const struct ubifs_info *c)
{
long long h, t;
h = (long long)c->lhead_lnum * c->leb_size + c->lhead_offs;
t = (long long)c->ltail_lnum * c->leb_size;
if (h >= t)
return c->log_bytes - h + t;
else
return t - h;
}
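The log is a circular area: next_log_lnum() wraps from @c->log_last back to UBIFS_LOG_LNUM, and empty_log_bytes() measures the gap from the head position to the tail LEB, going around the end of the log when the head is at or past the tail. The worked sketch below assumes a log of 4 LEBs (3 to 6) with 128 KiB LEBs; these numbers and the `sketch_log` structure are illustrative only.

```c
#include <assert.h>

/* Hypothetical subset of struct ubifs_info needed for the arithmetic. */
struct sketch_log {
	int log_first;       /* plays the role of UBIFS_LOG_LNUM */
	int log_last;        /* c->log_last */
	int leb_size;        /* c->leb_size */
	long long log_bytes; /* number of log LEBs * leb_size */
	int lhead_lnum, lhead_offs, ltail_lnum;
};

static int sketch_next_log_lnum(const struct sketch_log *l, int lnum)
{
	lnum += 1;
	if (lnum > l->log_last)
		lnum = l->log_first;
	return lnum;
}

static long long sketch_empty_log_bytes(const struct sketch_log *l)
{
	long long h = (long long)l->lhead_lnum * l->leb_size + l->lhead_offs;
	long long t = (long long)l->ltail_lnum * l->leb_size;

	/* Head at or past the tail: the free space wraps around the end. */
	if (h >= t)
		return l->log_bytes - h + t;
	return t - h;
}
```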
/**
* ubifs_add_bud - add bud LEB to the tree of buds and its journal head list.
* @c: UBIFS file-system description object
* @bud: the bud to add
*/
void ubifs_add_bud(struct ubifs_info *c, struct ubifs_bud *bud)
{
struct rb_node **p, *parent = NULL;
struct ubifs_bud *b;
struct ubifs_jhead *jhead;
spin_lock(&c->buds_lock);
p = &c->buds.rb_node;
while (*p) {
parent = *p;
b = rb_entry(parent, struct ubifs_bud, rb);
ubifs_assert(bud->lnum != b->lnum);
if (bud->lnum < b->lnum)
p = &(*p)->rb_left;
else
p = &(*p)->rb_right;
}
rb_link_node(&bud->rb, parent, p);
rb_insert_color(&bud->rb, &c->buds);
if (c->jheads) {
jhead = &c->jheads[bud->jhead];
list_add_tail(&bud->list, &jhead->buds_list);
} else
ubifs_assert(c->replaying && (c->vfs_sb->s_flags & MS_RDONLY));
/*
* Note, although this is a new bud, we account for its space now,
* before any data has been written to it. This guarantees a fixed
* mount time, because this bud will be read and scanned anyway.
*/
c->bud_bytes += c->leb_size - bud->start;
dbg_log("LEB %d:%d, jhead %d, bud_bytes %lld", bud->lnum,
bud->start, bud->jhead, c->bud_bytes);
spin_unlock(&c->buds_lock);
}
/**
* ubifs_create_buds_lists - create journal head buds lists for remount rw.
* @c: UBIFS file-system description object
*/
void ubifs_create_buds_lists(struct ubifs_info *c)
{
struct rb_node *p;
spin_lock(&c->buds_lock);
p = rb_first(&c->buds);
while (p) {
struct ubifs_bud *bud = rb_entry(p, struct ubifs_bud, rb);
struct ubifs_jhead *jhead = &c->jheads[bud->jhead];
list_add_tail(&bud->list, &jhead->buds_list);
p = rb_next(p);
}
spin_unlock(&c->buds_lock);
}
/**
* ubifs_add_bud_to_log - add a new bud to the log.
* @c: UBIFS file-system description object
* @jhead: journal head the bud belongs to
* @lnum: LEB number of the bud
* @offs: starting offset of the bud
*
* This function writes a reference node for the new bud LEB @lnum to the log
* and adds the bud to the buds tree. It also makes sure that enough space is
* left in the log and that the space in buds does not exceed the
* 'c->max_bud_bytes' limit. Returns zero in case of success, %-EAGAIN if a
* commit is required, and a negative error code in case of failure.
*/
int ubifs_add_bud_to_log(struct ubifs_info *c, int jhead, int lnum, int offs)
{
int err;
struct ubifs_bud *bud;
struct ubifs_ref_node *ref;
bud = kmalloc(sizeof(struct ubifs_bud), GFP_NOFS);
if (!bud)
return -ENOMEM;
ref = kzalloc(c->ref_node_alsz, GFP_NOFS);
if (!ref) {
kfree(bud);
return -ENOMEM;
}
mutex_lock(&c->log_mutex);
if (c->ro_media) {
err = -EROFS;
goto out_unlock;
}
/* Make sure we have enough space in the log */
if (empty_log_bytes(c) - c->ref_node_alsz < c->min_log_bytes) {
dbg_log("not enough log space - %lld, required %d",
empty_log_bytes(c), c->min_log_bytes);
ubifs_commit_required(c);
err = -EAGAIN;
goto out_unlock;
}
/*
* Make sure the amount of space in buds will not exceed the
* 'c->max_bud_bytes' limit, because we want to guarantee mount time
* limits.
*
* It is not necessary to hold @c->buds_lock when reading @c->bud_bytes
* because we are holding @c->log_mutex. All @c->bud_bytes changes take
* place when both @c->log_mutex and @c->buds_lock are locked.
*/
if (c->bud_bytes + c->leb_size - offs > c->max_bud_bytes) {
dbg_log("bud bytes %lld (%lld max), require commit",
c->bud_bytes, c->max_bud_bytes);
ubifs_commit_required(c);
err = -EAGAIN;
goto out_unlock;
}
/*
* If the journal is full enough, start a background commit. Note, it
* is OK to read 'c->cmt_state' without a spinlock because integer
* reads are atomic in the kernel.
*/
if (c->bud_bytes >= c->bg_bud_bytes &&
c->cmt_state == COMMIT_RESTING) {
dbg_log("bud bytes %lld (%lld max), initiate BG commit",
c->bud_bytes, c->max_bud_bytes);
ubifs_request_bg_commit(c);
}
bud->lnum = lnum;
bud->start = offs;
bud->jhead = jhead;
ref->ch.node_type = UBIFS_REF_NODE;
ref->lnum = cpu_to_le32(bud->lnum);
ref->offs = cpu_to_le32(bud->start);
ref->jhead = cpu_to_le32(jhead);
if (c->lhead_offs > c->leb_size - c->ref_node_alsz) {
c->lhead_lnum = next_log_lnum(c, c->lhead_lnum);
c->lhead_offs = 0;
}
if (c->lhead_offs == 0) {
/* Must ensure next log LEB has been unmapped */
err = ubifs_leb_unmap(c, c->lhead_lnum);
if (err)
goto out_unlock;
}
if (bud->start == 0) {
/*
* Before writing to the log a reference to an empty LEB, we
* have to make sure that LEB is mapped. Otherwise, after an
* unclean reboot we would risk referring to an LEB containing
* garbage, because the target LEB might have been unmapped
* but not yet physically erased.
*/
err = ubi_leb_map(c->ubi, bud->lnum, UBI_SHORTTERM);
if (err)
goto out_unlock;
}
dbg_log("write ref LEB %d:%d",
c->lhead_lnum, c->lhead_offs);
err = ubifs_write_node(c, ref, UBIFS_REF_NODE_SZ, c->lhead_lnum,
c->lhead_offs, UBI_SHORTTERM);
if (err)
goto out_unlock;
c->lhead_offs += c->ref_node_alsz;
ubifs_add_bud(c, bud);
mutex_unlock(&c->log_mutex);
kfree(ref);
return 0;
out_unlock:
mutex_unlock(&c->log_mutex);
kfree(ref);
kfree(bud);
return err;
}
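Before adding a bud, ubifs_add_bud_to_log() makes three decisions: require a commit if writing the reference node would leave less than @c->min_log_bytes of log space, require a commit if the new bud could push @c->bud_bytes over @c->max_bud_bytes, and otherwise proceed, also requesting a background commit once @c->bg_bud_bytes is reached while no commit is running. The pure function below restates only those checks; the names are hypothetical and it does not model locking, the commit state machine, or any I/O.

```c
#include <assert.h>

enum sketch_bud_decision {
	SKETCH_ADD,           /* go ahead and add the bud */
	SKETCH_ADD_BG_COMMIT, /* add, and also request a background commit */
	SKETCH_NEED_COMMIT,   /* caller would get -EAGAIN */
};

/* Hypothetical inputs mirroring the struct ubifs_info fields used. */
struct sketch_limits {
	long long empty_log;  /* empty_log_bytes(c) */
	int ref_node_alsz;    /* aligned size of one reference node */
	int min_log_bytes;
	long long bud_bytes, max_bud_bytes, bg_bud_bytes;
	int leb_size;
	int commit_resting;   /* non-zero if no commit is in progress */
};

static enum sketch_bud_decision
sketch_check_new_bud(const struct sketch_limits *s, int offs)
{
	/* The new reference node must leave min_log_bytes free in the log. */
	if (s->empty_log - s->ref_node_alsz < s->min_log_bytes)
		return SKETCH_NEED_COMMIT;
	/* The whole rest of the bud LEB is accounted up front. */
	if (s->bud_bytes + s->leb_size - offs > s->max_bud_bytes)
		return SKETCH_NEED_COMMIT;
	if (s->bud_bytes >= s->bg_bud_bytes && s->commit_resting)
		return SKETCH_ADD_BG_COMMIT;
	return SKETCH_ADD;
}
```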
/**
* remove_buds - remove used buds.
* @c: UBIFS file-system description object
*
* This function removes used buds from the buds tree and moves them to the
* @c->old_buds list. It does not remove the buds which are pointed to by
* journal heads.
*/
static void remove_buds(struct ubifs_info *c)
{
struct rb_node *p;
ubifs_assert(list_empty(&c->old_buds));
c->cmt_bud_bytes = 0;
spin_lock(&c->buds_lock);
p = rb_first(&c->buds);
while (p) {
struct rb_node *p1 = p;
struct ubifs_bud *bud;
struct ubifs_wbuf *wbuf;
p = rb_next(p);
bud = rb_entry(p1, struct ubifs_bud, rb);
wbuf = &c->jheads[bud->jhead].wbuf;
if (wbuf->lnum == bud->lnum) {
/*
* Do not remove buds which are pointed to by journal
* heads (non-closed buds).
*/
c->cmt_bud_bytes += wbuf->offs - bud->start;
dbg_log("preserve %d:%d, jhead %d, bud bytes %d, "
"cmt_bud_bytes %lld", bud->lnum, bud->start,
bud->jhead, wbuf->offs - bud->start,
c->cmt_bud_bytes);
bud->start = wbuf->offs;
} else {
c->cmt_bud_bytes += c->leb_size - bud->start;
dbg_log("remove %d:%d, jhead %d, bud bytes %d, "
"cmt_bud_bytes %lld", bud->lnum, bud->start,
bud->jhead, c->leb_size - bud->start,
c->cmt_bud_bytes);
rb_erase(p1, &c->buds);
list_del(&bud->list);
/*
* If the commit does not finish, the recovery will need
* to replay the journal, in which case the old buds
* must be unchanged. Do not release them until post
* commit i.e. do not allow them to be garbage
* collected.
*/
list_add(&bud->list, &c->old_buds);
}
}
spin_unlock(&c->buds_lock);
}
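remove_buds() splits the buds at commit start: a bud still under its journal head's write-buffer is kept but trimmed so that it starts at the current write position, while every other bud leaves the tree for @c->old_buds. In both cases the space being committed is added to @c->cmt_bud_bytes. The array-based sketch below reproduces only that accounting; the structure and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical flattened view of one bud and its journal head's wbuf. */
struct sketch_bud {
	int lnum, start;
	int wbuf_lnum, wbuf_offs; /* where this bud's journal head now writes */
	int removed;              /* set when the bud moves to old_buds */
};

static long long sketch_remove_buds(struct sketch_bud *buds, int n,
				    int leb_size)
{
	long long cmt_bud_bytes = 0;
	int i;

	for (i = 0; i < n; i++) {
		struct sketch_bud *b = &buds[i];

		if (b->wbuf_lnum == b->lnum) {
			/* Open bud: commit what was written, keep the rest. */
			cmt_bud_bytes += b->wbuf_offs - b->start;
			b->start = b->wbuf_offs;
		} else {
			/* Closed bud: its whole remaining space is committed. */
			cmt_bud_bytes += leb_size - b->start;
			b->removed = 1;
		}
	}
	return cmt_bud_bytes;
}
```

ubifs_log_end_commit() later subtracts this total from @c->bud_bytes.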
/**
* ubifs_log_start_commit - start commit.
* @c: UBIFS file-system description object
* @ltail_lnum: return new log tail LEB number
*
* The commit operation starts with writing a "commit start" node to the log,
* followed by reference nodes for all journal heads, which will define the new
* journal after the commit has finished. The commit start and reference nodes
* are written in one go to the nearest empty log LEB (hence, when the commit is
* finished, UBIFS may safely unmap all the previous log LEBs). This function
* returns zero in case of success and a negative error code in case of
* failure.
*/
int ubifs_log_start_commit(struct ubifs_info *c, int *ltail_lnum)
{
void *buf;
struct ubifs_cs_node *cs;
struct ubifs_ref_node *ref;
int err, i, max_len, len;
err = dbg_check_bud_bytes(c);
if (err)
return err;
max_len = UBIFS_CS_NODE_SZ + c->jhead_cnt * UBIFS_REF_NODE_SZ;
max_len = ALIGN(max_len, c->min_io_size);
buf = cs = kmalloc(max_len, GFP_NOFS);
if (!buf)
return -ENOMEM;
cs->ch.node_type = UBIFS_CS_NODE;
cs->cmt_no = cpu_to_le64(c->cmt_no + 1);
ubifs_prepare_node(c, cs, UBIFS_CS_NODE_SZ, 0);
/*
* Note, we do not lock 'c->log_mutex' because this is the commit start
* phase and we are exclusively using the log. And we do not lock the
* write-buffers because nobody can write to the file-system during
* this phase.
*/
len = UBIFS_CS_NODE_SZ;
for (i = 0; i < c->jhead_cnt; i++) {
int lnum = c->jheads[i].wbuf.lnum;
int offs = c->jheads[i].wbuf.offs;
if (lnum == -1 || offs == c->leb_size)
continue;
dbg_log("add ref to LEB %d:%d for jhead %d", lnum, offs, i);
ref = buf + len;
ref->ch.node_type = UBIFS_REF_NODE;
ref->lnum = cpu_to_le32(lnum);
ref->offs = cpu_to_le32(offs);
ref->jhead = cpu_to_le32(i);
ubifs_prepare_node(c, ref, UBIFS_REF_NODE_SZ, 0);
len += UBIFS_REF_NODE_SZ;
}
ubifs_pad(c, buf + len, ALIGN(len, c->min_io_size) - len);
/* Switch to the next log LEB */
if (c->lhead_offs) {
c->lhead_lnum = next_log_lnum(c, c->lhead_lnum);
c->lhead_offs = 0;
}
if (c->lhead_offs == 0) {
/* Must ensure next LEB has been unmapped */
err = ubifs_leb_unmap(c, c->lhead_lnum);
if (err)
goto out;
}
len = ALIGN(len, c->min_io_size);
dbg_log("writing commit start at LEB %d:0, len %d", c->lhead_lnum, len);
err = ubifs_leb_write(c, c->lhead_lnum, cs, 0, len, UBI_SHORTTERM);
if (err)
goto out;
*ltail_lnum = c->lhead_lnum;
c->lhead_offs += len;
if (c->lhead_offs == c->leb_size) {
c->lhead_lnum = next_log_lnum(c, c->lhead_lnum);
c->lhead_offs = 0;
}
remove_buds(c);
/*
* We have started the commit and now users may use the rest of the log
* for new writes.
*/
c->min_log_bytes = 0;
out:
kfree(buf);
return err;
}
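ubifs_log_start_commit() sizes its buffer for one commit start node plus one reference node per journal head, rounded up to the minimum I/O unit, and pads whatever the nodes actually written do not fill (journal heads without a current LEB are skipped). The sketch below reproduces that arithmetic. The node sizes (32 bytes for the CS node, 64 for a reference node) and the 2048-byte min I/O size are assumptions for illustration, and SKETCH_ALIGN, like the kernel's ALIGN(), assumes a power-of-two alignment.

```c
#include <assert.h>

/* Round x up to a multiple of a (a must be a power of two), like ALIGN(). */
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

#define SKETCH_CS_NODE_SZ  32 /* assumed UBIFS_CS_NODE_SZ */
#define SKETCH_REF_NODE_SZ 64 /* assumed UBIFS_REF_NODE_SZ */

/* Buffer size reserved for the CS node and all possible reference nodes. */
static int sketch_cs_max_len(int jhead_cnt, int min_io_size)
{
	int max_len = SKETCH_CS_NODE_SZ + jhead_cnt * SKETCH_REF_NODE_SZ;

	return SKETCH_ALIGN(max_len, min_io_size);
}

/* Padding appended after the CS node and @refs reference nodes. */
static int sketch_cs_pad_len(int refs, int min_io_size)
{
	int len = SKETCH_CS_NODE_SZ + refs * SKETCH_REF_NODE_SZ;

	return SKETCH_ALIGN(len, min_io_size) - len;
}
```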
/**
* ubifs_log_end_commit - end commit.
* @c: UBIFS file-system description object
* @ltail_lnum: new log tail LEB number
*
* This function is called when the commit operation has finished. It moves
* the log tail to its new position and subtracts the committed bud space
* from @c->bud_bytes. Obsolete log LEBs are unmapped later, by
* ubifs_log_post_commit(). Returns zero in case of success and a negative
* error code in case of failure.
*/
int ubifs_log_end_commit(struct ubifs_info *c, int ltail_lnum)
{
int err;
/*
* At this phase we have to lock 'c->log_mutex' because UBIFS allows FS
* writes during commit. It is only during the short "commit start"
* phase that writers are blocked.
*/
mutex_lock(&c->log_mutex);
dbg_log("old tail was LEB %d:0, new tail is LEB %d:0",
c->ltail_lnum, ltail_lnum);
c->ltail_lnum = ltail_lnum;
/*
* The commit is finished and from now on it must be guaranteed that
* there is always enough space for the next commit.
*/
c->min_log_bytes = c->leb_size;
spin_lock(&c->buds_lock);
c->bud_bytes -= c->cmt_bud_bytes;
spin_unlock(&c->buds_lock);
err = dbg_check_bud_bytes(c);
mutex_unlock(&c->log_mutex);
return err;
}
/**
* ubifs_log_post_commit - things to do after commit is completed.
* @c: UBIFS file-system description object
* @old_ltail_lnum: old log tail LEB number
*
* Release buds only after commit is completed, because they must be unchanged
* if recovery is needed.
*
* Unmap log LEBs only after commit is completed, because they may be needed for
* recovery.
*
* This function returns %0 on success and a negative error code on failure.
*/
int ubifs_log_post_commit(struct ubifs_info *c, int old_ltail_lnum)
{
int lnum, err = 0;
while (!list_empty(&c->old_buds)) {
struct ubifs_bud *bud;
bud = list_entry(c->old_buds.next, struct ubifs_bud, list);
err = ubifs_return_leb(c, bud->lnum);
if (err)
return err;
list_del(&bud->list);
kfree(bud);
}
mutex_lock(&c->log_mutex);
for (lnum = old_ltail_lnum; lnum != c->ltail_lnum;
lnum = next_log_lnum(c, lnum)) {
dbg_log("unmap log LEB %d", lnum);
err = ubifs_leb_unmap(c, lnum);
if (err)
goto out;
}
out:
mutex_unlock(&c->log_mutex);
return err;
}
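ubifs_log_post_commit() unmaps every log LEB from the old tail up to, but not including, the new tail, following the same wrap-around as next_log_lnum(). The sketch below collects those LEB numbers instead of unmapping them; the log range (LEBs 3 to 6) and all names are assumptions.

```c
#include <assert.h>

#define SKETCH_LOG_FIRST 3 /* assumed UBIFS_LOG_LNUM */
#define SKETCH_LOG_LAST  6 /* assumed c->log_last */

static int sketch_next_log_lnum(int lnum)
{
	return lnum + 1 > SKETCH_LOG_LAST ? SKETCH_LOG_FIRST : lnum + 1;
}

/*
 * Collect, in order, the LEBs that would be unmapped. Returns how many
 * were stored in @out (at most @max).
 */
static int sketch_lebs_to_unmap(int old_tail, int new_tail, int *out, int max)
{
	int lnum, n = 0;

	for (lnum = old_tail; lnum != new_tail && n < max;
	     lnum = sketch_next_log_lnum(lnum))
		out[n++] = lnum;
	return n;
}
```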
/**
* struct done_ref - references that have been done.
* @rb: rb-tree node
* @lnum: LEB number
*/
struct done_ref {
struct rb_node rb;
int lnum;
};
/**
* done_already - determine if a reference has been done already.
* @done_tree: rb-tree to store references that have been done
* @lnum: LEB number of reference
*
* This function returns %1 if the reference has been done already and %0 if
* not, in which case @lnum is added to @done_tree. A negative error code is
* returned if memory allocation fails.
*/
static int done_already(struct rb_root *done_tree, int lnum)
{
struct rb_node **p = &done_tree->rb_node, *parent = NULL;
struct done_ref *dr;
while (*p) {
parent = *p;
dr = rb_entry(parent, struct done_ref, rb);
if (lnum < dr->lnum)
p = &(*p)->rb_left;
else if (lnum > dr->lnum)
p = &(*p)->rb_right;
else
return 1;
}
dr = kzalloc(sizeof(struct done_ref), GFP_NOFS);
if (!dr)
return -ENOMEM;
dr->lnum = lnum;
rb_link_node(&dr->rb, parent, p);
rb_insert_color(&dr->rb, done_tree);
return 0;
}
/**
* destroy_done_tree - destroy the done tree.
* @done_tree: done tree to destroy
*/
static void destroy_done_tree(struct rb_root *done_tree)
{
struct rb_node *this = done_tree->rb_node;
struct done_ref *dr;
while (this) {
if (this->rb_left) {
this = this->rb_left;
continue;
} else if (this->rb_right) {
this = this->rb_right;
continue;
}
dr = rb_entry(this, struct done_ref, rb);
this = rb_parent(this);
if (this) {
if (this->rb_left == &dr->rb)
this->rb_left = NULL;
else
this->rb_right = NULL;
}
kfree(dr);
}
}
/**
* add_node - add a node to the consolidated log.
* @c: UBIFS file-system description object
* @buf: buffer to which to add
* @lnum: LEB number to which to write is passed and returned here
* @offs: offset to where to write is passed and returned here
* @node: node to add
*
* This function returns %0 on success and a negative error code on failure.
*/
static int add_node(struct ubifs_info *c, void *buf, int *lnum, int *offs,
void *node)
{
struct ubifs_ch *ch = node;
int len = le32_to_cpu(ch->len), remains = c->leb_size - *offs;
if (len > remains) {
int sz = ALIGN(*offs, c->min_io_size), err;
ubifs_pad(c, buf + *offs, sz - *offs);
err = ubifs_leb_change(c, *lnum, buf, sz, UBI_SHORTTERM);
if (err)
return err;
*lnum = next_log_lnum(c, *lnum);
*offs = 0;
}
memcpy(buf + *offs, node, len);
*offs += ALIGN(len, 8);
return 0;
}
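add_node() appends nodes to an LEB-sized buffer at 8-byte aligned offsets. When a node does not fit in the rest of the LEB, the buffer is padded to the min I/O size, written with ubifs_leb_change(), and packing restarts at offset 0 of the next log LEB. The sketch below keeps only the offset bookkeeping: it counts simulated flushes instead of writing, and the sizes in the test are illustrative assumptions.

```c
#include <assert.h>

#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Hypothetical packing state; "flushes" counts simulated LEB writes. */
struct sketch_packer {
	int leb_size, min_io_size;
	int offs, flushes;
	int last_flush_len; /* bytes the last flush would have written */
};

static void sketch_add_node(struct sketch_packer *p, int len)
{
	if (len > p->leb_size - p->offs) {
		/* Pad to the min I/O unit and "write" the current LEB. */
		p->last_flush_len = SKETCH_ALIGN(p->offs, p->min_io_size);
		p->flushes++;
		p->offs = 0;
	}
	/* Nodes start at 8-byte aligned offsets. */
	p->offs += SKETCH_ALIGN(len, 8);
}
```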
/**
* ubifs_consolidate_log - consolidate the log.
* @c: UBIFS file-system description object
*
* Repeated failed commits could cause the log to be full, but at least 1 LEB is
* needed for commit. This function rewrites the reference nodes in the log,
* omitting duplicates and failed CS nodes (only the first CS node is kept), and
* leaving no gaps.
*
* This function returns %0 on success and a negative error code on failure.
*/
int ubifs_consolidate_log(struct ubifs_info *c)
{
struct ubifs_scan_leb *sleb;
struct ubifs_scan_node *snod;
struct rb_root done_tree = RB_ROOT;
int lnum, err, first = 1, write_lnum, offs = 0;
void *buf;
dbg_rcvry("log tail LEB %d, log head LEB %d", c->ltail_lnum,
c->lhead_lnum);
buf = vmalloc(c->leb_size);
if (!buf)
return -ENOMEM;
lnum = c->ltail_lnum;
write_lnum = lnum;
while (1) {
sleb = ubifs_scan(c, lnum, 0, c->sbuf);
if (IS_ERR(sleb)) {
err = PTR_ERR(sleb);
goto out_free;
}
list_for_each_entry(snod, &sleb->nodes, list) {
switch (snod->type) {
case UBIFS_REF_NODE: {
struct ubifs_ref_node *ref = snod->node;
int ref_lnum = le32_to_cpu(ref->lnum);
err = done_already(&done_tree, ref_lnum);
if (err < 0)
goto out_scan;
if (err != 1) {
err = add_node(c, buf, &write_lnum,
&offs, snod->node);
if (err)
goto out_scan;
}
break;
}
case UBIFS_CS_NODE:
if (!first)
break;
err = add_node(c, buf, &write_lnum, &offs,
snod->node);
if (err)
goto out_scan;
first = 0;
break;
}
}
ubifs_scan_destroy(sleb);
if (lnum == c->lhead_lnum)
break;
lnum = next_log_lnum(c, lnum);
}
if (offs) {
int sz = ALIGN(offs, c->min_io_size);
ubifs_pad(c, buf + offs, sz - offs);
err = ubifs_leb_change(c, write_lnum, buf, sz, UBI_SHORTTERM);
if (err)
goto out_free;
offs = ALIGN(offs, c->min_io_size);
}
destroy_done_tree(&done_tree);
vfree(buf);
if (write_lnum == c->lhead_lnum) {
ubifs_err("log is too full");
return -EINVAL;
}
/* Unmap remaining LEBs */
lnum = write_lnum;
do {
lnum = next_log_lnum(c, lnum);
err = ubifs_leb_unmap(c, lnum);
if (err)
return err;
} while (lnum != c->lhead_lnum);
c->lhead_lnum = write_lnum;
c->lhead_offs = offs;
dbg_rcvry("new log head at %d:%d", c->lhead_lnum, c->lhead_offs);
return 0;
out_scan:
ubifs_scan_destroy(sleb);
out_free:
destroy_done_tree(&done_tree);
vfree(buf);
return err;
}
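Stripped of scanning and I/O, ubifs_consolidate_log() is a filter over the log's nodes from tail to head: keep only the first commit start node, keep a reference node only the first time its bud LEB is seen, and drop every other node. The sketch below applies that filter to an in-memory array. The node representation is hypothetical, and a linear search stands in for the rb-tree that done_already() uses, which is only reasonable for small inputs.

```c
#include <assert.h>

enum sketch_node_type { SK_CS, SK_REF, SK_OTHER };

struct sketch_node {
	enum sketch_node_type type;
	int lnum; /* bud LEB for SK_REF nodes, unused otherwise */
};

/* Linear-search replacement for done_already()'s rb-tree. */
static int sketch_seen(const int *done, int ndone, int lnum)
{
	int i;

	for (i = 0; i < ndone; i++)
		if (done[i] == lnum)
			return 1;
	return 0;
}

/* Copy the nodes to keep into @out; returns the number kept. */
static int sketch_consolidate(const struct sketch_node *in, int n,
			      struct sketch_node *out, int *done)
{
	int i, kept = 0, ndone = 0, first_cs = 1;

	for (i = 0; i < n; i++) {
		if (in[i].type == SK_REF) {
			if (sketch_seen(done, ndone, in[i].lnum))
				continue;
			done[ndone++] = in[i].lnum;
			out[kept++] = in[i];
		} else if (in[i].type == SK_CS && first_cs) {
			first_cs = 0;
			out[kept++] = in[i];
		}
	}
	return kept;
}
```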
#ifdef CONFIG_UBIFS_FS_DEBUG
/**
* dbg_check_bud_bytes - make sure bud bytes calculation is all right.
* @c: UBIFS file-system description object
*
* This function makes sure the amount of flash space used by closed buds
* ('c->bud_bytes') is correct. Returns zero in case of success and %-EINVAL
* in case of failure.
*/
static int dbg_check_bud_bytes(struct ubifs_info *c)
{
int i, err = 0;
struct ubifs_bud *bud;
long long bud_bytes = 0;
if (!(ubifs_chk_flags & UBIFS_CHK_GEN))
return 0;
spin_lock(&c->buds_lock);
for (i = 0; i < c->jhead_cnt; i++)
list_for_each_entry(bud, &c->jheads[i].buds_list, list)
bud_bytes += c->leb_size - bud->start;
if (c->bud_bytes != bud_bytes) {
ubifs_err("bad bud_bytes %lld, calculated %lld",
c->bud_bytes, bud_bytes);
err = -EINVAL;
}
spin_unlock(&c->buds_lock);
return err;
}
#endif /* CONFIG_UBIFS_FS_DEBUG */

fs/ubifs/lprops.c Normal file (1357 additions)

File diff suppressed because it is too large.

fs/ubifs/lpt.c Normal file (2243 additions)

File diff suppressed because it is too large.

fs/ubifs/lpt_commit.c Normal file (1648 additions)

File diff suppressed because it is too large.

fs/ubifs/master.c Normal file (387 additions)

@@ -0,0 +1,387 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Artem Bityutskiy (Битюцкий Артём)
* Adrian Hunter
*/
/* This file implements reading and writing the master node */
#include "ubifs.h"
/**
* scan_for_master - search the valid master node.
* @c: UBIFS file-system description object
*
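* This function scans the master node LEBs and searches for the latest master
* node, which must be the last node in both LEBs, at the same offset and with
* the same contents apart from the common header. Returns zero in case of
* success and a negative error code in case of failure.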
*/
static int scan_for_master(struct ubifs_info *c)
{
struct ubifs_scan_leb *sleb;
struct ubifs_scan_node *snod;
int lnum, offs = 0, nodes_cnt;
lnum = UBIFS_MST_LNUM;
sleb = ubifs_scan(c, lnum, 0, c->sbuf);
if (IS_ERR(sleb))
return PTR_ERR(sleb);
nodes_cnt = sleb->nodes_cnt;
if (nodes_cnt > 0) {
snod = list_entry(sleb->nodes.prev, struct ubifs_scan_node,
list);
if (snod->type != UBIFS_MST_NODE)
goto out;
memcpy(c->mst_node, snod->node, snod->len);
offs = snod->offs;
}
ubifs_scan_destroy(sleb);
lnum += 1;
sleb = ubifs_scan(c, lnum, 0, c->sbuf);
if (IS_ERR(sleb))
return PTR_ERR(sleb);
if (sleb->nodes_cnt != nodes_cnt)
goto out;
if (!sleb->nodes_cnt)
goto out;
snod = list_entry(sleb->nodes.prev, struct ubifs_scan_node, list);
if (snod->type != UBIFS_MST_NODE)
goto out;
if (snod->offs != offs)
goto out;
if (memcmp((void *)c->mst_node + UBIFS_CH_SZ,
(void *)snod->node + UBIFS_CH_SZ,
UBIFS_MST_NODE_SZ - UBIFS_CH_SZ))
goto out;
c->mst_offs = offs;
ubifs_scan_destroy(sleb);
return 0;
out:
ubifs_scan_destroy(sleb);
return -EINVAL;
}
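scan_for_master() accepts a master node only if both master LEBs agree: the same non-zero number of nodes, a master node as the last node in each, at the same offset, and identical bytes after the common header (the header itself is not compared). The sketch below restates those checks on hypothetical per-LEB scan summaries; the 24-byte header size is an assumption standing in for UBIFS_CH_SZ.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_CH_SZ 24 /* assumed size of the common header */

/* Hypothetical summary of what scanning one master LEB produced. */
struct sketch_mst_scan {
	int nodes_cnt;
	int last_is_master; /* last node is a master node */
	int last_offs;      /* offset of the last node */
	const unsigned char *last_node;
	int node_sz;        /* size of a master node */
};

/* Returns the agreed offset, or -1 if the two LEBs do not match. */
static int sketch_pick_master(const struct sketch_mst_scan *a,
			      const struct sketch_mst_scan *b)
{
	if (a->nodes_cnt == 0 || a->nodes_cnt != b->nodes_cnt)
		return -1;
	if (!a->last_is_master || !b->last_is_master)
		return -1;
	if (a->last_offs != b->last_offs)
		return -1;
	/* Compare everything after the common header. */
	if (memcmp(a->last_node + SKETCH_CH_SZ, b->last_node + SKETCH_CH_SZ,
		   a->node_sz - SKETCH_CH_SZ))
		return -1;
	return a->last_offs;
}
```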
/**
* validate_master - validate master node.
* @c: UBIFS file-system description object
*
* This function validates data which was read from master node. Returns zero
* if the data is all right and %-EINVAL if not.
*/
static int validate_master(const struct ubifs_info *c)
{
long long main_sz;
int err;
if (c->max_sqnum >= SQNUM_WATERMARK) {
err = 1;
goto out;
}
if (c->cmt_no >= c->max_sqnum) {
err = 2;
goto out;
}
if (c->highest_inum >= INUM_WATERMARK) {
err = 3;
goto out;
}
if (c->lhead_lnum < UBIFS_LOG_LNUM ||
c->lhead_lnum >= UBIFS_LOG_LNUM + c->log_lebs ||
c->lhead_offs < 0 || c->lhead_offs >= c->leb_size ||
c->lhead_offs & (c->min_io_size - 1)) {
err = 4;
goto out;
}
if (c->zroot.lnum >= c->leb_cnt || c->zroot.lnum < c->main_first ||
c->zroot.offs >= c->leb_size || c->zroot.offs & 7) {
err = 5;
goto out;
}
if (c->zroot.len < c->ranges[UBIFS_IDX_NODE].min_len ||
c->zroot.len > c->ranges[UBIFS_IDX_NODE].max_len) {
err = 6;
goto out;
}
if (c->gc_lnum >= c->leb_cnt || c->gc_lnum < c->main_first) {
err = 7;
goto out;
}
if (c->ihead_lnum >= c->leb_cnt || c->ihead_lnum < c->main_first ||
c->ihead_offs % c->min_io_size || c->ihead_offs < 0 ||
c->ihead_offs > c->leb_size || c->ihead_offs & 7) {
err = 8;
goto out;
}
main_sz = (long long)c->main_lebs * c->leb_size;
if (c->old_idx_sz & 7 || c->old_idx_sz >= main_sz) {
err = 9;
goto out;
}
if (c->lpt_lnum < c->lpt_first || c->lpt_lnum > c->lpt_last ||
c->lpt_offs < 0 || c->lpt_offs + c->nnode_sz > c->leb_size) {
err = 10;
goto out;
}
if (c->nhead_lnum < c->lpt_first || c->nhead_lnum > c->lpt_last ||
c->nhead_offs < 0 || c->nhead_offs % c->min_io_size ||
c->nhead_offs > c->leb_size) {
err = 11;
goto out;
}
if (c->ltab_lnum < c->lpt_first || c->ltab_lnum > c->lpt_last ||
c->ltab_offs < 0 ||
c->ltab_offs + c->ltab_sz > c->leb_size) {
err = 12;
goto out;
}
if (c->big_lpt && (c->lsave_lnum < c->lpt_first ||
c->lsave_lnum > c->lpt_last || c->lsave_offs < 0 ||
c->lsave_offs + c->lsave_sz > c->leb_size)) {
err = 13;
goto out;
}
if (c->lscan_lnum < c->main_first || c->lscan_lnum >= c->leb_cnt) {
err = 14;
goto out;
}
if (c->lst.empty_lebs < 0 || c->lst.empty_lebs > c->main_lebs - 2) {
err = 15;
goto out;
}
if (c->lst.idx_lebs < 0 || c->lst.idx_lebs > c->main_lebs - 1) {
err = 16;
goto out;
}
if (c->lst.total_free < 0 || c->lst.total_free > main_sz ||
c->lst.total_free & 7) {
err = 17;
goto out;
}
if (c->lst.total_dirty < 0 || (c->lst.total_dirty & 7)) {
err = 18;
goto out;
}
if (c->lst.total_used < 0 || (c->lst.total_used & 7)) {
err = 19;
goto out;
}
if (c->lst.total_free + c->lst.total_dirty +
c->lst.total_used > main_sz) {
err = 20;
goto out;
}
if (c->lst.total_dead + c->lst.total_dark +
c->lst.total_used + c->old_idx_sz > main_sz) {
err = 21;
goto out;
}
if (c->lst.total_dead < 0 ||
c->lst.total_dead > c->lst.total_free + c->lst.total_dirty ||
c->lst.total_dead & 7) {
err = 22;
goto out;
}
if (c->lst.total_dark < 0 ||
c->lst.total_dark > c->lst.total_free + c->lst.total_dirty ||
c->lst.total_dark & 7) {
err = 23;
goto out;
}
return 0;
out:
ubifs_err("bad master node at offset %d error %d", c->mst_offs, err);
dbg_dump_node(c, c->mst_node);
return -EINVAL;
}
/**
* ubifs_read_master - read master node.
* @c: UBIFS file-system description object
*
* This function finds and reads the master node during file-system mount. If
* no valid master node is found, it tries to recover one with
* ubifs_recover_master_node(). Returns zero in case of success and a negative
* error code in case of failure.
*/
int ubifs_read_master(struct ubifs_info *c)
{
int err, old_leb_cnt;
c->mst_node = kzalloc(c->mst_node_alsz, GFP_KERNEL);
if (!c->mst_node)
return -ENOMEM;
err = scan_for_master(c);
if (err) {
err = ubifs_recover_master_node(c);
if (err)
/*
* Note, we do not free 'c->mst_node' here because the
* unmount routine will take care of this.
*/
return err;
}
/* Make sure that the recovery flag is clear */
c->mst_node->flags &= cpu_to_le32(~UBIFS_MST_RCVRY);
c->max_sqnum = le64_to_cpu(c->mst_node->ch.sqnum);
c->highest_inum = le64_to_cpu(c->mst_node->highest_inum);
c->cmt_no = le64_to_cpu(c->mst_node->cmt_no);
c->zroot.lnum = le32_to_cpu(c->mst_node->root_lnum);
c->zroot.offs = le32_to_cpu(c->mst_node->root_offs);
c->zroot.len = le32_to_cpu(c->mst_node->root_len);
c->lhead_lnum = le32_to_cpu(c->mst_node->log_lnum);
c->gc_lnum = le32_to_cpu(c->mst_node->gc_lnum);
c->ihead_lnum = le32_to_cpu(c->mst_node->ihead_lnum);
c->ihead_offs = le32_to_cpu(c->mst_node->ihead_offs);
c->old_idx_sz = le64_to_cpu(c->mst_node->index_size);
c->lpt_lnum = le32_to_cpu(c->mst_node->lpt_lnum);
c->lpt_offs = le32_to_cpu(c->mst_node->lpt_offs);
c->nhead_lnum = le32_to_cpu(c->mst_node->nhead_lnum);
c->nhead_offs = le32_to_cpu(c->mst_node->nhead_offs);
c->ltab_lnum = le32_to_cpu(c->mst_node->ltab_lnum);
c->ltab_offs = le32_to_cpu(c->mst_node->ltab_offs);
c->lsave_lnum = le32_to_cpu(c->mst_node->lsave_lnum);
c->lsave_offs = le32_to_cpu(c->mst_node->lsave_offs);
c->lscan_lnum = le32_to_cpu(c->mst_node->lscan_lnum);
c->lst.empty_lebs = le32_to_cpu(c->mst_node->empty_lebs);
c->lst.idx_lebs = le32_to_cpu(c->mst_node->idx_lebs);
old_leb_cnt = le32_to_cpu(c->mst_node->leb_cnt);
c->lst.total_free = le64_to_cpu(c->mst_node->total_free);
c->lst.total_dirty = le64_to_cpu(c->mst_node->total_dirty);
c->lst.total_used = le64_to_cpu(c->mst_node->total_used);
c->lst.total_dead = le64_to_cpu(c->mst_node->total_dead);
c->lst.total_dark = le64_to_cpu(c->mst_node->total_dark);
c->calc_idx_sz = c->old_idx_sz;
if (c->mst_node->flags & cpu_to_le32(UBIFS_MST_NO_ORPHS))
c->no_orphs = 1;
if (old_leb_cnt != c->leb_cnt) {
/* The file system has been resized */
int growth = c->leb_cnt - old_leb_cnt;
if (c->leb_cnt < old_leb_cnt ||
c->leb_cnt < UBIFS_MIN_LEB_CNT) {
ubifs_err("bad leb_cnt on master node");
dbg_dump_node(c, c->mst_node);
return -EINVAL;
}
dbg_mnt("Auto resizing (master) from %d LEBs to %d LEBs",
old_leb_cnt, c->leb_cnt);
c->lst.empty_lebs += growth;
c->lst.total_free += growth * (long long)c->leb_size;
c->lst.total_dark += growth * (long long)c->dark_wm;
/*
* Reflect changes back onto the master node. N.B. the master
* node gets written immediately whenever mounting (or
* remounting) in read-write mode, so we do not need to write it
* here.
*/
c->mst_node->leb_cnt = cpu_to_le32(c->leb_cnt);
c->mst_node->empty_lebs = cpu_to_le32(c->lst.empty_lebs);
c->mst_node->total_free = cpu_to_le64(c->lst.total_free);
c->mst_node->total_dark = cpu_to_le64(c->lst.total_dark);
}
err = validate_master(c);
if (err)
return err;
err = dbg_old_index_check_init(c, &c->zroot);
return err;
}
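When the volume has more LEBs than the master node recorded, ubifs_read_master() treats the difference as new empty space: it adds the growth to the empty LEB count, to the free space, and to the dark space at @c->dark_wm bytes per LEB. The worked sketch below uses made-up figures (128 KiB LEBs, a 2 KiB dark watermark) and hypothetical names, and it omits the UBIFS_MIN_LEB_CNT check.

```c
#include <assert.h>

/* Hypothetical subset of the lprops statistics touched on resize. */
struct sketch_lst {
	int empty_lebs;
	long long total_free, total_dark;
};

/* Returns 0 on success, -1 if the volume shrank (an error in UBIFS). */
static int sketch_grow(struct sketch_lst *lst, int old_leb_cnt, int leb_cnt,
		       int leb_size, int dark_wm)
{
	int growth = leb_cnt - old_leb_cnt;

	if (growth < 0)
		return -1;
	lst->empty_lebs += growth;
	lst->total_free += growth * (long long)leb_size;
	lst->total_dark += growth * (long long)dark_wm;
	return 0;
}
```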
/**
* ubifs_write_master - write master node.
* @c: UBIFS file-system description object
*
* This function writes the master node. The caller has to take the
* @c->mst_mutex lock before calling this function. Returns zero in case of
* success and a negative error code in case of failure. The master node is
* written twice to enable recovery.
*/
int ubifs_write_master(struct ubifs_info *c)
{
int err, lnum, offs, len;
if (c->ro_media)
return -EINVAL;
lnum = UBIFS_MST_LNUM;
offs = c->mst_offs + c->mst_node_alsz;
len = UBIFS_MST_NODE_SZ;
if (offs + UBIFS_MST_NODE_SZ > c->leb_size) {
err = ubifs_leb_unmap(c, lnum);
if (err)
return err;
offs = 0;
}
c->mst_offs = offs;
c->mst_node->highest_inum = cpu_to_le64(c->highest_inum);
err = ubifs_write_node(c, c->mst_node, len, lnum, offs, UBI_SHORTTERM);
if (err)
return err;
lnum += 1;
if (offs == 0) {
err = ubifs_leb_unmap(c, lnum);
if (err)
return err;
}
err = ubifs_write_node(c, c->mst_node, len, lnum, offs, UBI_SHORTTERM);
return err;
}
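`ubifs_write_master()` appends each new master node after the previous one in `UBIFS_MST_LNUM` and mirrors it into the following LEB. When the next slot no longer fits, both LEBs are unmapped and writing restarts at offset 0. The sketch below isolates only the offset decision. `next_master_offs()` is a hypothetical helper, and the sizes used with it are made-up numbers.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: pick the offset for the next master node copy.
 * Copies are appended node_alsz bytes apart; when the next copy would
 * not fit before the end of the LEB, the LEB must be unmapped first
 * and writing restarts at offset 0.
 */
static int next_master_offs(int cur_offs, int node_alsz, int node_sz,
			    int leb_size, bool *need_unmap)
{
	int offs = cur_offs + node_alsz;

	assert(node_sz <= node_alsz && node_alsz <= leb_size);
	*need_unmap = false;
	if (offs + node_sz > leb_size) {
		*need_unmap = true;
		offs = 0;
	}
	return offs;
}
```

With a 500-byte node aligned to 512 bytes and a 2048-byte LEB, copies land at offsets 0, 512, 1024 and 1536. The next write then unmaps the LEB and starts over at 0.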

fs/ubifs/misc.h

@ -0,0 +1,342 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Authors: Artem Bityutskiy (Битюцкий Артём)
* Adrian Hunter
*/
/*
* This file contains miscellaneous helper functions.
*/
#ifndef __UBIFS_MISC_H__
#define __UBIFS_MISC_H__
/**
* ubifs_zn_dirty - check if znode is dirty.
* @znode: znode to check
*
* This helper function returns %1 if @znode is dirty and %0 otherwise.
*/
static inline int ubifs_zn_dirty(const struct ubifs_znode *znode)
{
return !!test_bit(DIRTY_ZNODE, &znode->flags);
}
/**
* ubifs_wake_up_bgt - wake up background thread.
* @c: UBIFS file-system description object
*/
static inline void ubifs_wake_up_bgt(struct ubifs_info *c)
{
if (c->bgt && !c->need_bgt) {
c->need_bgt = 1;
wake_up_process(c->bgt);
}
}
/**
* ubifs_tnc_find_child - find next child in znode.
* @znode: znode to search at
* @start: the zbranch index to start at
*
* This helper function looks for znode child starting at index @start. Returns
* the child or %NULL if no children were found.
*/
static inline struct ubifs_znode *
ubifs_tnc_find_child(struct ubifs_znode *znode, int start)
{
while (start < znode->child_cnt) {
if (znode->zbranch[start].znode)
return znode->zbranch[start].znode;
start += 1;
}
return NULL;
}
/**
* ubifs_inode - get UBIFS inode information by VFS 'struct inode' object.
* @inode: the VFS 'struct inode' pointer
*/
static inline struct ubifs_inode *ubifs_inode(const struct inode *inode)
{
return container_of(inode, struct ubifs_inode, vfs_inode);
}
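`ubifs_inode()` relies on the UBIFS inode embedding the VFS inode, so `container_of()` can step back from the member to the enclosing object. Below is a minimal user-space sketch of the same pattern. It uses a simplified `container_of` without the kernel's type check, and the structs `vfs_inode`/`fs_inode` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the VFS inode and the UBIFS inode. */
struct vfs_inode {
	unsigned long i_ino;
};

struct fs_inode {
	int flags;             /* file-system private data ... */
	struct vfs_inode vfs;  /* ... wrapping the embedded VFS inode */
};

/* Map a VFS inode pointer back to the file-system inode containing it. */
static struct fs_inode *fs_inode(const struct vfs_inode *inode)
{
	assert(inode != NULL);
	return container_of(inode, struct fs_inode, vfs);
}
```

Because the offset is computed with `offsetof`, the embedded member does not need to be the first field of the enclosing structure.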
/**
* ubifs_ro_mode - switch UBIFS to read-only mode.
* @c: UBIFS file-system description object
* @err: error code which is the reason for switching to R/O mode
*/
static inline void ubifs_ro_mode(struct ubifs_info *c, int err)
{
if (!c->ro_media) {
c->ro_media = 1;
ubifs_warn("switched to read-only mode, error %d", err);
dbg_dump_stack();
}
}
/**
* ubifs_compr_present - check if compressor was compiled in.
* @compr_type: compressor type to check
*
* This function returns %1 if the compressor of type @compr_type is present,
* and %0 if not.
*/
static inline int ubifs_compr_present(int compr_type)
{
ubifs_assert(compr_type >= 0 && compr_type < UBIFS_COMPR_TYPES_CNT);
return !!ubifs_compressors[compr_type]->capi_name;
}
/**
* ubifs_compr_name - get compressor name string by its type.
* @compr_type: compressor type
*
* This function returns the name string of the compressor of type @compr_type.
*/
static inline const char *ubifs_compr_name(int compr_type)
{
ubifs_assert(compr_type >= 0 && compr_type < UBIFS_COMPR_TYPES_CNT);
return ubifs_compressors[compr_type]->name;
}
/**
* ubifs_wbuf_sync - synchronize write-buffer.
* @wbuf: write-buffer to synchronize
*
* This is the same as 'ubifs_wbuf_sync_nolock()' but it does not assume
* that the write-buffer is already locked.
*/
static inline int ubifs_wbuf_sync(struct ubifs_wbuf *wbuf)
{
int err;
mutex_lock_nested(&wbuf->io_mutex, wbuf->jhead);
err = ubifs_wbuf_sync_nolock(wbuf);
mutex_unlock(&wbuf->io_mutex);
return err;
}
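`ubifs_wbuf_sync()` follows a common kernel convention: a `_nolock` variant expects the caller to hold the lock, and a plain-named wrapper takes and releases it around the call. The sketch below shows the same split with POSIX threads. The `wbuf` struct and its fields are hypothetical, and a plain pthread mutex has no counterpart to the lockdep subclass that `mutex_lock_nested()` passes (`wbuf->jhead`).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical write-buffer with its own lock and some pending bytes. */
struct wbuf {
	pthread_mutex_t io_mutex;
	int used;     /* bytes buffered but not yet written */
	int flushed;  /* total bytes "written" to the medium */
};

/* Caller must already hold wbuf->io_mutex. */
static int wbuf_sync_nolock(struct wbuf *wbuf)
{
	assert(wbuf->used >= 0);
	wbuf->flushed += wbuf->used;
	wbuf->used = 0;
	return 0;
}

/* Locking wrapper: take the mutex, call the _nolock variant, release. */
static int wbuf_sync(struct wbuf *wbuf)
{
	int err;

	pthread_mutex_lock(&wbuf->io_mutex);
	err = wbuf_sync_nolock(wbuf);
	pthread_mutex_unlock(&wbuf->io_mutex);
	return err;
}
```

Code paths that already hold the lock call the `_nolock` variant directly, which avoids a self-deadlock on the non-recursive mutex.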
/**
* ubifs_leb_unmap - unmap an LEB.
* @c: UBIFS file-system description object
* @lnum: LEB number to unmap
*
* This function returns %0 on success and a negative error code on failure.
*/
static inline int ubifs_leb_unmap(const struct ubifs_info *c, int lnum)
{
int err;
if (c->ro_media)
return -EROFS;
err = ubi_leb_unmap(c->ubi, lnum);
if (err) {
ubifs_err("unmap LEB %d failed, error %d", lnum, err);
return err;
}
return 0;
}
/**
* ubifs_leb_write - write to a LEB.
* @c: UBIFS file-system description object
* @lnum: LEB number to write
* @buf: buffer to write from
* @offs: offset within LEB to write to
* @len: length to write
* @dtype: data type
*
* This function returns %0 on success and a negative error code on failure.
*/
static inline int ubifs_leb_write(const struct ubifs_info *c, int lnum,
const void *buf, int offs, int len, int dtype)
{
int err;
if (c->ro_media)
return -EROFS;
err = ubi_leb_write(c->ubi, lnum, buf, offs, len, dtype);
if (err) {
ubifs_err("writing %d bytes at %d:%d, error %d",
len, lnum, offs, err);
return err;
}
return 0;
}
/**
* ubifs_leb_change - atomic LEB change.
* @c: UBIFS file-system description object
* @lnum: LEB number to write
* @buf: buffer to write from
* @len: length to write
* @dtype: data type
*
* This function returns %0 on success and a negative error code on failure.
*/
static inline int ubifs_leb_change(const struct ubifs_info *c, int lnum,
const void *buf, int len, int dtype)
{
int err;
if (c->ro_media)
return -EROFS;
err = ubi_leb_change(c->ubi, lnum, buf, len, dtype);
if (err) {
ubifs_err("changing %d bytes in LEB %d, error %d",
len, lnum, err);
return err;
}
return 0;
}
/**
* ubifs_encode_dev - encode device node IDs.
* @dev: UBIFS device node information
* @rdev: device IDs to encode
*
* This is a helper function which encodes the major/minor numbers of a device
* node into the UBIFS device node description. We use the standard Linux "new"
* and "huge" encodings. Returns the number of bytes the encoding occupies.
*/
static inline int ubifs_encode_dev(union ubifs_dev_desc *dev, dev_t rdev)
{
if (new_valid_dev(rdev)) {
dev->new = cpu_to_le32(new_encode_dev(rdev));
return sizeof(dev->new);
} else {
dev->huge = cpu_to_le64(huge_encode_dev(rdev));
return sizeof(dev->huge);
}
}
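The "new" encoding packs a device number into 32 bits. This sketch reproduces that layout as we understand it from `new_encode_dev()`/`new_decode_dev()` in `<linux/kdev_t.h>`, assuming a 12-bit major and a 20-bit minor. The `sketch_` functions are hypothetical, and the "huge" 64-bit fallback is not shown.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed 32-bit "new" layout: bits 0-7 hold the low 8 minor bits,
 * bits 8-19 the 12-bit major, bits 20-31 the upper 12 minor bits.
 */
static uint32_t sketch_new_encode_dev(unsigned int major, unsigned int minor)
{
	assert(major < (1u << 12) && minor < (1u << 20));
	return (minor & 0xff) | (major << 8) | ((minor & ~0xffu) << 12);
}

/* Inverse of the packing above. */
static void sketch_new_decode_dev(uint32_t dev, unsigned int *major,
				  unsigned int *minor)
{
	*major = (dev & 0xfff00) >> 8;
	*minor = (dev & 0xff) | ((dev >> 12) & 0xfff00);
}
```

Keeping the low minor byte and the major in the old 16-bit positions means small device numbers such as 8:1 encode to the same value (0x801) as the legacy format.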
/**
* ubifs_add_dirt - add dirty space to LEB properties.
* @c: the UBIFS file-system description object
* @lnum: LEB to add dirty space for
* @dirty: dirty space to add
*
* This is a helper function which increases the amount of dirty LEB space. Returns
* zero in case of success and a negative error code in case of failure.
*/
static inline int ubifs_add_dirt(struct ubifs_info *c, int lnum, int dirty)
{
return ubifs_update_one_lp(c, lnum, LPROPS_NC, dirty, 0, 0);
}
/**
* ubifs_return_leb - return LEB to lprops.
* @c: the UBIFS file-system description object
* @lnum: LEB to return
*
* This helper function clears the "taken" flag of a logical eraseblock in the
* lprops. Returns zero in case of success and a negative error code in case of
* failure.
*/
static inline int ubifs_return_leb(struct ubifs_info *c, int lnum)
{
return ubifs_change_one_lp(c, lnum, LPROPS_NC, LPROPS_NC, 0,
LPROPS_TAKEN, 0);
}
/**
* ubifs_idx_node_sz - return index node size.
* @c: the UBIFS file-system description object
* @child_cnt: number of children of this index node
*/
static inline int ubifs_idx_node_sz(const struct ubifs_info *c, int child_cnt)
{
return UBIFS_IDX_NODE_SZ + (UBIFS_BRANCH_SZ + c->key_len) * child_cnt;
}
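An index node is a fixed header followed by `child_cnt` branches, each of which is a fixed part plus a key of `c->key_len` bytes. The sketch below evaluates the same formula with assumed constants: a 28-byte index node header (24-byte common header, 2-byte child count, 2-byte level), 12-byte branch fields (lnum, offs, len) and 8-byte keys. These sizes are our assumptions, not values taken from this file.

```c
#include <assert.h>

/* Assumed sizes, for illustration only. */
#define SKETCH_IDX_NODE_SZ 28  /* index node header */
#define SKETCH_BRANCH_SZ   12  /* lnum + offs + len, key excluded */

/* Size in bytes of an index node with child_cnt branches. */
static int sketch_idx_node_sz(int key_len, int child_cnt)
{
	assert(key_len > 0 && child_cnt >= 0);
	return SKETCH_IDX_NODE_SZ + (SKETCH_BRANCH_SZ + key_len) * child_cnt;
}
```

Under these assumptions a full index node with a fanout of 8 occupies 28 + 20 * 8 = 188 bytes.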
/**
* ubifs_idx_branch - return pointer to an index branch.
* @c: the UBIFS file-system description object
* @idx: index node
* @bnum: branch number
*/
static inline
struct ubifs_branch *ubifs_idx_branch(const struct ubifs_info *c,
const struct ubifs_idx_node *idx,
int bnum)
{
return (struct ubifs_branch *)((void *)idx->branches +
(UBIFS_BRANCH_SZ + c->key_len) * bnum);
}
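Because the key length is only known at run time, `idx->branches` cannot be indexed as an ordinary C array. `ubifs_idx_branch()` instead steps over `UBIFS_BRANCH_SZ + c->key_len` bytes per branch. The sketch below shows the same byte arithmetic. `struct sketch_branch` and `sketch_idx_branch()` are hypothetical, with the key stored as a flexible array member after the fixed fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed part of a branch; the key bytes follow it directly. */
struct sketch_branch {
	uint32_t lnum;
	uint32_t offs;
	uint32_t len;
	uint8_t key[];   /* key_len bytes, known only at run time */
};

/* Return branch number bnum by stepping over variable-sized branches. */
static struct sketch_branch *sketch_idx_branch(void *branches, int key_len,
					       int bnum)
{
	size_t step = sizeof(struct sketch_branch) + (size_t)key_len;

	assert(key_len > 0 && bnum >= 0);
	return (struct sketch_branch *)((uint8_t *)branches + step * bnum);
}
```

With 8-byte keys each branch takes 20 bytes, so branch 2 starts 40 bytes into the array.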
/**
* ubifs_idx_key - return pointer to an index key.
* @c: the UBIFS file-system description object
* @idx: index node
*/
static inline void *ubifs_idx_key(const struct ubifs_info *c,
const struct ubifs_idx_node *idx)
{
return (void *)((struct ubifs_branch *)idx->branches)->key;
}
/**
* ubifs_reported_space - calculate reported free space.
* @c: the UBIFS file-system description object
* @free: amount of free space
*
* This function calculates the amount of free space which will be reported to
* user-space. User-space applications tend to expect that if the file-system
* (e.g., via the 'statfs()' call) reports that it has N bytes available, they
* are able to write a file of size N. UBIFS attaches node headers to each data
* node and it has to write index nodes as well. This introduces additional
* overhead, so UBIFS has to report slightly less free space to meet the
* above expectation.
*
* This function assumes free space is made up of uncompressed data nodes and
* full index nodes (one per data node, doubled because we always allow enough
* space to write the index twice).
*
* Note, the calculation is pessimistic, which means that most of the time
* UBIFS reports less space than it actually has.
*/
static inline long long ubifs_reported_space(const struct ubifs_info *c,
uint64_t free)
{
int divisor, factor;
divisor = UBIFS_MAX_DATA_NODE_SZ + (c->max_idx_node_sz << 1);
factor = UBIFS_MAX_DATA_NODE_SZ - UBIFS_DATA_NODE_SZ;
do_div(free, divisor);
return free * factor;
}
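The estimate charges every 4096-byte block of user data with one full data node plus two maximum-size index nodes, and credits only the payload. The sketch below reproduces that arithmetic with assumed sizes: a 48-byte data node header, a 4096-byte block, and `max_idx_node_sz` = 188. It uses plain 64-bit division in place of the kernel's `do_div()`. `sketch_reported_space()` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, for illustration only. */
#define SKETCH_DATA_NODE_SZ     48
#define SKETCH_MAX_DATA_NODE_SZ (SKETCH_DATA_NODE_SZ + 4096)

/*
 * Split free space into units of (full data node + 2 max index nodes)
 * and report only the data payload of each complete unit.
 */
static long long sketch_reported_space(uint64_t free_bytes,
				       int max_idx_node_sz)
{
	int divisor = SKETCH_MAX_DATA_NODE_SZ + (max_idx_node_sz << 1);
	int factor = SKETCH_MAX_DATA_NODE_SZ - SKETCH_DATA_NODE_SZ;

	assert(max_idx_node_sz > 0);
	return (long long)(free_bytes / divisor) * factor;
}
```

With these numbers each unit is 4520 bytes, so 1,000,000 free bytes contain 221 whole units and are reported as 221 * 4096 = 905,216 bytes. Partial units are discarded, which is one reason the estimate is pessimistic.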
/**
* ubifs_current_time - round current time to time granularity.
* @inode: inode
*/
static inline struct timespec ubifs_current_time(struct inode *inode)
{
return (inode->i_sb->s_time_gran < NSEC_PER_SEC) ?
current_fs_time(inode->i_sb) : CURRENT_TIME_SEC;
}
#endif /* __UBIFS_MISC_H__ */
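When `s_time_gran` is finer than one second, `ubifs_current_time()` defers to `current_fs_time()`. To our understanding, that function truncates the current time to the super block's granularity in the manner of the kernel's `timespec_trunc()`. A minimal user-space sketch of that truncation, assuming the granularity is given in nanoseconds; `sketch_trunc_time()` is hypothetical:

```c
#include <assert.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/*
 * Truncate a timestamp to gran nanoseconds: gran == 1 keeps full
 * precision, a whole second drops the nanoseconds, anything in between
 * rounds tv_nsec down to a multiple of gran.
 */
static struct timespec sketch_trunc_time(struct timespec t, long gran)
{
	assert(gran > 0);
	if (gran >= SKETCH_NSEC_PER_SEC)
		t.tv_nsec = 0;
	else if (gran > 1)
		t.tv_nsec -= t.tv_nsec % gran;
	return t;
}
```

Truncating (rather than rounding to nearest) guarantees the stored time never lies in the future relative to the moment it was sampled.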

fs/ubifs/orphan.c

@ -0,0 +1,958 @@
/*
* This file is part of UBIFS.
*
* Copyright (C) 2006-2008 Nokia Corporation.
*
* This program is free software; you can redistribute it and/or modify it
* under the terms of the GNU General Public License version 2 as published by
* the Free Software Foundation.
*
* This program is distributed in the hope that it will be useful, but WITHOUT
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
*
* You should have received a copy of the GNU General Public License along with
* this program; if not, write to the Free Software Foundation, Inc., 51
* Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*
* Author: Adrian Hunter
*/
#include "ubifs.h"
/*
* An orphan is an inode number whose inode node has been committed to the index
* with a link count of zero. That happens when an open file is deleted
* (unlinked) and then a commit is run. In the normal course of events the inode
* would be deleted when the file is closed. However in the case of an unclean
* unmount, orphans need to be accounted for. After an unclean unmount, the
* orphans' inodes must be deleted which means either scanning the entire index
* looking for them, or keeping a list on flash somewhere. This unit implements
* the latter approach.
*
* The orphan area is a fixed number of LEBs situated between the LPT area and
* the main area. The number of orphan area LEBs is specified when the file
* system is created. The minimum number is 1. The size of the orphan area
* should be large enough to hold the maximum number of orphans expected to
* exist at any one time.
*
* The number of orphans that can fit in a LEB is:
*
* (c->leb_size - UBIFS_ORPH_NODE_SZ) / sizeof(__le64)
*
* For example: a 15872-byte LEB can fit 1980 orphans so 1 LEB may be enough.
*
* Orphans are accumulated in a rb-tree. When an inode's link count drops to
* zero, the inode number is added to the rb-tree. It is removed from the tree
* when the inode is deleted. Any new orphans that are in the orphan tree when
* the commit is run are written to the orphan area in 1 or more orph nodes.
* If the orphan area is full, it is consolidated to make space. There is
* always enough space because validation prevents the user from creating more
* than the maximum number of orphans allowed.
*/
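The capacity formula above can be checked with a small calculation. The sketch assumes `UBIFS_ORPH_NODE_SZ` is 32 bytes (a 24-byte common header plus an 8-byte commit number), which reproduces the 1980-orphan figure for a 15872-byte LEB. It also answers the inverse sizing question of how many orphan LEBs a given maximum needs. Both functions are hypothetical helpers, not UBIFS code.

```c
#include <assert.h>

/* Assumed orphan node header size in bytes. */
#define SKETCH_ORPH_NODE_SZ 32

/* Number of 8-byte inode numbers that fit in one orphan-area LEB. */
static int sketch_orphans_per_leb(int leb_size)
{
	assert(leb_size > SKETCH_ORPH_NODE_SZ);
	return (leb_size - SKETCH_ORPH_NODE_SZ) / 8;
}

/* Orphan-area LEBs needed for max_orphans, rounded up, minimum 1. */
static int sketch_orphan_lebs(int leb_size, int max_orphans)
{
	int per_leb = sketch_orphans_per_leb(leb_size);
	int lebs = (max_orphans + per_leb - 1) / per_leb;

	return lebs > 0 ? lebs : 1;
}
```

For the example above, (15872 - 32) / 8 = 1980, so one LEB suffices up to 1980 orphans and a second LEB is needed from 1981 onward.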
#ifdef CONFIG_UBIFS_FS_DEBUG
static int dbg_check_orphans(struct ubifs_info *c);
#else
#define dbg_check_orphans(c) 0
#endif
/**
* ubifs_add_orphan - add an orphan.
* @c: UBIFS file-system description object
* @inum: orphan inode number
*
* Add an orphan. This function is called when an inode's link count drops to
* zero.
*/
int ubifs_add_orphan(struct ubifs_info *c, ino_t inum)
{
struct ubifs_orphan *orphan, *o;
struct rb_node **p, *parent = NULL;
orphan = kzalloc(sizeof(struct ubifs_orphan), GFP_NOFS);
if (!orphan)
return -ENOMEM;
orphan->inum = inum;
orphan->new = 1;
spin_lock(&c->orphan_lock);
if (c->tot_orphans >= c->max_orphans) {
spin_unlock(&c->orphan_lock);
kfree(orphan);
return -ENFILE;
}
p = &c->orph_tree.rb_node;
while (*p) {
parent = *p;