aha/drivers
Ed L. Cashin 9bb237b6a6 aoe: dynamically allocate a capped number of skbs when necessary
What this Patch Does

  Even before this recent series of 12 patches to 2.6.22-rc4, the aoe
  driver was reusing a small set of skbs that were allocated once and
  were only used for outbound AoE commands.

  The network layer cannot be allowed to put_page on the data that is
  still associated with a bio we haven't returned to the block layer,
  so the aoe driver (even before the patch under discussion) is still
  the owner of skbs that have been handed to the network layer for
  transmission.  We need to keep track of these skbs so that we can
  free them, but by tracking them, we can also easily re-use them.

  The new patch was a response to the behavior of certain network
  drivers.  We cannot reuse an skb that the network driver still has
  in its transmit ring.  A network driver can defer transmit ring
  cleanup and then use the state in the skb to determine how many data
  segments to clean up in its transmit ring.  The tg3 driver is one
  driver that behaves in this way.

  When the network driver defers cleanup of its transmit ring, the aoe
  driver can find itself in a situation where it would like to send an
  AoE command, and the AoE target is ready for more work, but the
  network driver still has all of the pre-allocated skbs.  In that
  case, the new patch just calls alloc_skb, as you'd expect.

  We don't want to get carried away, though.  We try not to do
  excessive allocation in the write path, so we cap the number of skbs
  we dynamically allocate.

  Calling it a "dynamic pool" is probably misleading.  We were already
  trying to use a small fixed-size set of pre-allocated skbs before
  this patch, and this patch just provides a little headroom (with a
  ceiling, though) to accommodate network drivers that hang onto skbs,
  by allocating when needed.  The d->skbpool_hd list of allocated skbs
  is necessary so that we can free them later.
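
  The behavior described above can be modeled in user space.  The
  sketch below is not the driver's code: struct buf stands in for an
  skb, the in_flight flag stands in for "the network driver still
  holds it", and every name and size (NPREALLOC, NDYNAMIC_MAX,
  pool_get, and so on) is a hypothetical chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sizes: a small fixed set plus capped headroom. */
enum { NPREALLOC = 2, NDYNAMIC_MAX = 2 };

struct buf {
	bool in_flight;		/* still held by the simulated network driver */
	struct buf *next;	/* link on the tracking list */
};

struct pool {
	struct buf *head;	/* every buffer we own, so we can free it later */
	int ndynamic;		/* buffers allocated beyond the initial set */
};

/* Allocate one buffer and put it on the tracking list. */
static struct buf *pool_add(struct pool *p)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b == NULL)
		return NULL;
	b->next = p->head;
	p->head = b;
	return b;
}

/* Pre-allocate the fixed set.  On failure, call pool_free(). */
static bool pool_init(struct pool *p)
{
	p->head = NULL;
	p->ndynamic = 0;
	for (int i = 0; i < NPREALLOC; i++)
		if (pool_add(p) == NULL)
			return false;
	return true;
}

/*
 * Prefer re-using a buffer the network side has released.  If all
 * are busy, allocate a new one, but only up to the cap.
 */
static struct buf *pool_get(struct pool *p)
{
	assert(p != NULL);
	for (struct buf *b = p->head; b != NULL; b = b->next) {
		if (!b->in_flight) {
			b->in_flight = true;
			return b;
		}
	}
	if (p->ndynamic >= NDYNAMIC_MAX)
		return NULL;	/* at the ceiling: caller must wait */

	struct buf *b = pool_add(p);

	if (b != NULL) {
		p->ndynamic++;
		b->in_flight = true;
	}
	return b;
}

/* The simulated network driver has finished with the buffer. */
static void pool_tx_done(struct buf *b)
{
	b->in_flight = false;
}

/* Free everything we ever allocated, via the tracking list. */
static void pool_free(struct pool *p)
{
	while (p->head != NULL) {
		struct buf *b = p->head;

		p->head = b->next;
		free(b);
	}
	p->ndynamic = 0;
}
```

  In this model, pool_get returns NULL once the pre-allocated buffers
  and the capped headroom are all in flight, and succeeds again as
  soon as pool_tx_done releases one.  The tracking list is what lets
  pool_free release buffers that were handed out and returned.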

  We didn't notice the need for this headroom until AoE targets got
  fast enough.

Alternatives

  If the network layer never did a put_page on the pages in the bios
  we get from the block layer, then it would be possible for us to
  hand skbs to the network layer and forget about them, allowing the
  network layer to free skbs itself (thereby calling our own
  skb->destructor callback function if we needed that).  In that case
  we could get rid of the pre-allocated skbs and also the
  d->skbpool_hd, instead just calling alloc_skb every time we wanted
  to transmit a packet.  The slab allocator would effectively maintain
  the list of skbs.
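
  As a rough user-space illustration of that alternative (again not
  kernel code), the sender below allocates a packet for every
  transmit and keeps no list.  The struct pkt type, its destructor
  field, and the functions send_alloc and netlayer_free are all
  hypothetical stand-ins; netlayer_free plays the role of the network
  layer freeing the packet and invoking the callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pkt {
	/* Called by the simulated network layer just before freeing. */
	void (*destructor)(struct pkt *);
};

static int completed;	/* counts destructor calls, for demonstration */

static void on_pkt_free(struct pkt *p)
{
	(void)p;
	completed++;
}

/* Allocate a fresh packet for each transmit; no pool, no tracking. */
static struct pkt *send_alloc(void)
{
	struct pkt *p = calloc(1, sizeof(*p));

	if (p != NULL)
		p->destructor = on_pkt_free;
	return p;
}

/* Stand-in for the network layer releasing a transmitted packet. */
static void netlayer_free(struct pkt *p)
{
	assert(p != NULL);
	if (p->destructor != NULL)
		p->destructor(p);
	free(p);
}
```

  The sender's bookkeeping disappears in this model, but every
  transmit now depends on an allocation succeeding.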

  Besides a loss of CPU cache locality, the main concern with that
  approach is the danger that it would increase the likelihood of
  deadlock when VM is trying to free pages by writing dirty data from
  the page cache through the aoe driver out to persistent storage on
  an AoE device.  Right now we have a situation where we have
  pre-allocation that corresponds to how much we use, which seems
  ideal.

  Of course, there's still the separate issue of receiving the packets
  that tell us that a write has successfully completed on the AoE
  target.  When memory is low and VM is using AoE to flush dirty data
  to free up pages, it would be perfect if there were a way for us to
  register a fast callback that could recognize write command
  completion responses.  But I don't think the current problems with
  the receive side of the situation are a justification for
  exacerbating the problem on the transmit side.

Signed-off-by: Ed L. Cashin <ecashin@coraid.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-08 09:22:32 -08:00
acorn/char
acpi ACPI: fix build warning 2008-02-07 04:24:01 -05:00
amba
ata ata_piix.c:piix_init_one() must be __devinit 2008-02-06 07:01:56 -05:00
atm
auxdisplay
base Driver core: Revert "Fix Firmware class name collision" 2008-02-07 11:31:46 -08:00
block aoe: dynamically allocate a capped number of skbs when necessary 2008-02-08 09:22:32 -08:00
bluetooth Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 2008-02-05 10:09:07 -08:00
cdrom Merge branch 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc 2008-02-07 09:02:26 -08:00
char tty_ioctl: drag screaming into compliance with the coding style 2008-02-08 09:22:25 -08:00
clocksource
connector
cpufreq [CPUFREQ] fix configuration help message 2008-02-06 22:57:58 -05:00
cpuidle Revert "cpuidle: build fix for non-x86" 2008-02-07 04:16:34 -05:00
crypto
dca
dio dio: ARRAY_SIZE() cleanup 2008-02-05 09:44:23 -08:00
dma async_tx: replace 'int_en' with operation preparation flags 2008-02-06 10:12:18 -07:00
edac drivers/edac/i3000: document type promotion 2008-02-07 08:42:23 -08:00
eisa
firewire
firmware dmi: Let drivers walk the DMI table 2008-02-07 20:39:40 -05:00
gpio gpio: handle pca953{4,5,6,7,8} too 2008-02-06 10:41:15 -08:00
hid
hwmon hwmon: (lm80) Add individual alarm files 2008-02-07 20:39:45 -05:00
i2c hwmon: Discard useless I2C driver IDs 2008-02-07 20:39:44 -05:00
ide aout: remove unnecessary inclusions of {asm, linux}/a.out.h 2008-02-08 09:22:30 -08:00
ieee1394 ieee1394: sbp2: fix bogus s/g access change 2008-02-02 13:48:16 +01:00
infiniband RDMA/nes: Add a driver for NetEffect RNICs 2008-02-04 20:20:45 -08:00
input mn10300: add the MN10300/AM33 architecture to the kernel 2008-02-08 09:22:30 -08:00
isdn drivers/isdn/hardware/eicon/debug.c: fix uninitialized var warning 2008-02-06 10:41:12 -08:00
leds leds: Add HP Jornada 6xx driver 2008-02-07 10:10:28 +00:00
lguest virtio: reset function 2008-02-04 23:50:03 +11:00
macintosh ppc: fix #ifdef-s in mediabay driver (take 2) 2008-02-06 02:57:50 +01:00
mca
md dm raid1: report fault status 2008-02-08 02:11:39 +00:00
media Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input 2008-02-07 12:57:44 -08:00
message drivers/message/: Spelling fixes 2008-02-03 17:21:01 +02:00
mfd ASIC3 driver 2008-02-07 08:42:23 -08:00
misc [SCSI] enclosure: add support for enclosure services 2008-02-07 18:04:10 -06:00
mmc
mtd Merge git://git.infradead.org/mtd-2.6 2008-02-07 10:20:31 -08:00
net mn10300: add the MN10300/AM33 architecture to the kernel 2008-02-08 09:22:30 -08:00
nubus nubus: kill drivers/nubus/nubus_syms.c 2008-02-05 09:44:23 -08:00
of [POWERPC] Create and hook up of_platform_device_shutdown 2008-02-06 16:29:59 +11:00
oprofile
parisc iommu sg merging: parisc: make iommu respect the segment size limits 2008-02-05 09:44:10 -08:00
parport mn10300: add the MN10300/AM33 architecture to the kernel 2008-02-08 09:22:30 -08:00
pci mn10300: add the MN10300/AM33 architecture to the kernel 2008-02-08 09:22:30 -08:00
pcmcia drivers/pcmcia: add missing pci_dev_get 2008-02-05 09:44:09 -08:00
pnp Merge branches 'release', 'bugzilla-6217', 'bugzilla-6629', 'bugzilla-6933', 'bugzilla-7186', 'bugzilla-8269', 'bugzilla-8570', 'bugzilla-9139', 'bugzilla-9277', 'bugzilla-9341', 'bugzilla-9444', 'bugzilla-9614', 'bugzilla-9643' and 'bugzilla-9644' into release 2008-02-07 03:09:43 -05:00
power Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 2008-02-07 09:45:58 -08:00
ps3 ps3: use symbolic names for video modes 2008-02-06 10:41:17 -08:00
rapidio
rtc rtc: at91sam9 RTC support (RTT and/or RTC) 2008-02-06 10:41:14 -08:00
s390 calibrate_delay() must be __cpuinit 2008-02-06 10:41:08 -08:00
sbus
scsi Convert SG from nopage to fault. 2008-02-07 19:09:22 -08:00
serial serial_core: bring mostly into line with coding style 2008-02-08 09:22:25 -08:00
sh
sn
spi spi: remove more dev->power.power_state usage 2008-02-06 10:41:11 -08:00
ssb drivers/ssb/: Spelling fixes 2008-02-03 17:30:25 +02:00
tc
telephony
thermal the generic thermal sysfs driver 2008-02-01 23:12:19 -05:00
uio uio: nopage 2008-02-06 10:41:07 -08:00
usb usb: net2280 can't have a function called show_registers() 2008-02-08 09:22:30 -08:00
video mn10300: add the MN10300/AM33 architecture to the kernel 2008-02-08 09:22:30 -08:00
virtio virtio: add missing #include <linux/delay.h> 2008-02-06 10:41:21 -08:00
w1 DS1WM: decouple host IRQ and INTR active state settings 2008-02-07 08:42:06 -08:00
watchdog drivers/watchdog/: Spelling fixes 2008-02-03 17:32:52 +02:00
xen
zorro
Kconfig Merge branches 'release' and 'menlo' into release 2008-02-07 03:18:04 -05:00
Makefile Merge branches 'release' and 'menlo' into release 2008-02-07 03:18:04 -05:00