This version requires the host and guest to have the same PAE status.
The NX capability is not offered to the guest yet.
Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The "len" field in the used ring for virtio indicates the number of
bytes *written* to the buffer. This means the guest doesn't have to
zero the buffers in advance as it always knows the used length.
Erroneously, the console and network example code puts the length
*read* into that field. The guest ignores it, but it's wrong.
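The rule can be sketched as follows. The structures are pared down and only loosely modelled on the virtio used ring. add_used() and fill_input() are hypothetical helpers for illustration, not the launcher's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_NUM 8 /* hypothetical ring size for the sketch */

/* Simplified used-ring element, modelled on struct vring_used_elem. */
struct used_elem {
	uint32_t id;  /* head of the descriptor chain being returned */
	uint32_t len; /* bytes the host WROTE into that chain */
};

struct used_ring {
	uint16_t idx; /* free-running index */
	struct used_elem ring[RING_NUM];
};

/* Hand a buffer back to the Guest. 'written' is the number of bytes
 * placed in it: for a buffer the host only read (e.g. an outgoing
 * network packet) that is 0, not the length read. */
static void add_used(struct used_ring *used, uint32_t head, uint32_t written)
{
	struct used_elem *e = &used->ring[used->idx % RING_NUM];

	e->id = head;
	e->len = written;
	used->idx++;
}

/* Hypothetical input path (e.g. console input): copy what fits and
 * return exactly how many bytes were written. */
static uint32_t fill_input(uint8_t *guest_buf, uint32_t cap,
			   const uint8_t *data, uint32_t avail)
{
	uint32_t n = avail < cap ? avail : cap;

	assert(n == 0 || (guest_buf != NULL && data != NULL));
	if (n)
		memcpy(guest_buf, data, n);
	return n;
}
```

With this convention the Guest can trust used->len and never needs to pre-zero its receive buffers.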
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
18 months ago, 5bbf89fc26 changed to loading
bzImages directly rather than manually ungzipping them, so we no longer
need libz.
Also, -m32 is useful for those on 64-bit platforms (and harmless on
32-bit).
Reported-by: Ron Minnich <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2088761152 (lguest: notify on empty) introduced
lguest support for the VIRTIO_F_NOTIFY_ON_EMPTY flag, but in fact it turned on
interrupts all the time.
Because we always process one buffer at a time, the inflight count is always 0
when we call trigger_irq, and so we always ignore VRING_AVAIL_F_NO_INTERRUPT
from the Guest.
It should be looking to see if there are more buffers in the Guest's queue:
if it's empty, then we force an interrupt.
This makes little difference, since we usually have an empty queue; but
that's the subject of another patch.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since the Launcher process runs the Guest, it doesn't have to be very
serious about its barriers: the Guest isn't running while we are (Guest
is UP).
Before we switch to using threads to service devices, we need to fix this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We pass the /dev/lguest fd everywhere; it's far neater to just make it
a global (in fact it already is one, hidden in the waker_fds struct).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We can't trust the values in the device descriptor table once the
Guest has booted, so keep local copies. The Guest could set them to
strange values and then cause us to segfault (they're 8-bit values, so
they can't make our pointers go too wild).
This becomes more important with the following patches which read them.
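A sketch of the snapshot idea. The descriptor is only loosely modelled on struct lguest_device_desc, and the "features then config" byte layout, new_desc(), snapshot_desc() and config_bytes() are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Loosely modelled on struct lguest_device_desc: these bytes live in
 * Guest-writable memory, so the Guest may change them at any time. */
struct device_desc {
	uint8_t type;
	uint8_t num_vq;
	uint8_t feature_len;
	uint8_t config_len;
	uint8_t status;
	uint8_t config[]; /* hypothetical layout: features, then config */
};

/* Launcher-private state: lengths copied before the Guest boots. */
struct device {
	struct device_desc *desc; /* shared, untrusted */
	uint8_t feature_len;      /* trusted local copies */
	uint8_t config_len;
};

static struct device_desc *new_desc(uint8_t type, uint8_t feature_len,
				    uint8_t config_len)
{
	struct device_desc *d = calloc(1, sizeof(*d) + feature_len + config_len);

	assert(d != NULL);
	d->type = type;
	d->feature_len = feature_len;
	d->config_len = config_len;
	return d;
}

static void snapshot_desc(struct device *dev, struct device_desc *desc)
{
	dev->desc = desc;
	dev->feature_len = desc->feature_len;
	dev->config_len = desc->config_len;
}

/* Uses the local copy, so a Guest that rewrites desc->feature_len
 * cannot move this pointer. */
static uint8_t *config_bytes(struct device *dev)
{
	return dev->desc->config + dev->feature_len;
}
```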
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Robert noted that we don't actually document that lguest is 32-bit only,
nor that PAE must be off (CONFIG_PAE is now prompted for if HIGHMEM is
set to "off").
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: lguest@ozlabs.org
Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>
Impact: barrier correctness in example launcher
I doubt either lguest user will complain about performance.
Reported-by: Christoph Hellwig <hch@infradead.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Makes all the warnings go away when compiling lguest on Ubuntu
Intrepid or later.
Signed-off-by: Timothy R Ansell <mithro@mithis.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch moves the initial guest page table creation code to the host,
so the launcher keeps working with PAE enabled configs.
Signed-off-by: Matias Zabaljauregui <zabaljauregui@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This doesn't really matter, since lguest is i386 only at the moment,
but we could actually choose a different value. (lguest doesn't have
a guaranteed ABI).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The Documentation/i386 and Documentation/x86_64 directories and their
contents have been moved into Documentation/x86. Fix references to
those files accordingly.
Signed-off-by: Uwe Hermann <uwe@hermann-uwe.de>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This shows up when trying to bridge:
tap0: received packet with own address as source address
As Max Krasnyansky points out, there's no reason to give the guest the
same MAC address as the TUN device.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Max Krasnyansky <maxk@qualcomm.com>
lguest uses a Waker process to break it out of the kernel (i.e.
actually running the guest) when a file descriptor needs attention.
Changing this from a process to a thread somewhat simplifies things:
it can directly access the fd_set of things to watch. More
importantly, it means that the Waker can see Guest memory correctly,
so /dev/vring file descriptors will work as anticipated (the
alternative is to actually mmap MAP_SHARED, but you can't do that with
/dev/zero).
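A sketch of a Waker thread that reads the shared fd_set directly. break_guest_out() is a hypothetical stand-in: in the real Launcher, the Guest is kicked out via the /dev/lguest fd. The sketch does a single wait so that it can be joined, whereas a real Waker loops.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Shared with the main Launcher thread: because the Waker is a thread,
 * it sees this fd_set (and Guest memory) directly. */
static fd_set watched_fds;
static int max_watched_fd;
static atomic_int wakeups;

/* Hypothetical stand-in for breaking the Guest out of the kernel. */
static void break_guest_out(void)
{
	atomic_fetch_add(&wakeups, 1);
}

/* One pass of the Waker: sleep until a watched fd needs attention,
 * then kick the Guest out so the main thread can service it. */
static void *waker_thread(void *unused)
{
	fd_set rfds = watched_fds; /* select() overwrites its argument */

	(void)unused;
	assert(max_watched_fd >= 0);
	if (select(max_watched_fd + 1, &rfds, NULL, NULL, NULL) > 0)
		break_guest_out();
	return NULL;
}
```

The Waker only detects readiness and never reads the data itself. Consuming input is left to the main thread.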
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Since the correct timeout value varies, use a heuristic which adjusts
the timeout depending on how many packets we've seen. This gives
slightly worse results, but doesn't need tweaking when GSO is
introduced.
Timeout value   Seconds  Xmit/Recv/Timeout
500 usec        19.1887  xmit 561141 recv 1 timeout 559657
Dynamic (278)   20.1974  xmit 214510 recv 5 timeout 214491 usec 278
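The message does not spell out the heuristic, so what follows is a hypothetical one, not lguest's. It shows the general shape only: shrink the timeout when the timer finds a backlog, grow it when the timer fires for nothing, and clamp to assumed bounds.

```c
#include <assert.h>

#define TIMEOUT_MIN_USEC 50   /* hypothetical bounds */
#define TIMEOUT_MAX_USEC 2500

/* Hypothetical: called when the re-check timer fires and finds 'pkts'
 * packets waiting. Several packets means we waited too long; none
 * means the timer fired for nothing, so back off. */
static unsigned adjust_timeout(unsigned timeout_usec, unsigned pkts)
{
	assert(timeout_usec > 0);
	if (pkts > 1)
		timeout_usec -= timeout_usec / 8;     /* shrink ~12.5% */
	else if (pkts == 0)
		timeout_usec += timeout_usec / 8 + 1; /* grow ~12.5% */
	if (timeout_usec < TIMEOUT_MIN_USEC)
		timeout_usec = TIMEOUT_MIN_USEC;
	if (timeout_usec > TIMEOUT_MAX_USEC)
		timeout_usec = TIMEOUT_MAX_USEC;
	return timeout_usec;
}
```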
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
virtio_ring has the ability to suppress notifications. This prevents
a guest exit for every packet, but we need to set a timer on packet
receipt to re-check if there were any remaining packets.
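The suppress-then-recheck cycle can be sketched as a small state machine. VRING_USED_F_NO_NOTIFY has the value from the Linux header. struct tx_queue, on_notify() and on_timer() are hypothetical, and packet handling is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_USED_F_NO_NOTIFY 1 /* value from linux/virtio_ring.h */

/* Hypothetical host-side state for the Guest->Host transmit queue. */
struct tx_queue {
	uint16_t used_flags; /* host-written used->flags, read by the Guest */
	bool timer_armed;    /* is the re-check timer pending? */
	unsigned pending;    /* packets the Guest has queued */
};

/* Guest kicked us: handle packets, suppress further kicks, and arm a
 * timer to pick up packets queued while kicks are off. */
static unsigned on_notify(struct tx_queue *q)
{
	unsigned done = q->pending;

	q->pending = 0;
	q->used_flags |= VRING_USED_F_NO_NOTIFY;
	q->timer_armed = true;
	return done;
}

/* Timer fired: drain anything queued silently. If nothing arrived,
 * re-enable notifications so an idle Guest can wake us again. A real
 * implementation must re-check the ring after clearing the flag, to
 * close the race with a Guest that queued just before seeing it. */
static unsigned on_timer(struct tx_queue *q)
{
	unsigned done = q->pending;

	assert(q->timer_armed);
	q->pending = 0;
	if (done == 0) {
		q->used_flags &= (uint16_t)~VRING_USED_F_NO_NOTIFY;
		q->timer_armed = false;
	}
	return done;
}
```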
Here are the times for 1G TCP Guest->Host with different timeout
settings (it matters because the TCP window doesn't grow big enough to
fill the entire buffer):
Timeout value   Seconds  Xmit/Recv/Timeout
None (before)   25.3784  xmit 7750233 recv 1
2500 usec       62.5119  xmit 207020 recv 2 timeout 207020
1000 usec       34.5379  xmit 207003 recv 2 timeout 207003
750 usec        29.2305  xmit 207002 recv 1 timeout 207002
500 usec        19.1887  xmit 561141 recv 1 timeout 559657
250 usec        20.0465  xmit 214128 recv 2 timeout 214110
100 usec        19.2583  xmit 561621 recv 1 timeout 560153
(Note that these values are sensitive to the GSO patches which come
later, and probably other traffic-related variables, so take with a
large grain of salt).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To simplify the transition to when we publish indices in the ring
(and make shuffling my patch queue easier), wrap them in a lg_last_avail()
macro.
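A sketch, assuming the macro simply wraps an existing per-queue field. The field name last_avail_idx and the helper next_avail_slot() are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

struct virtqueue {
	uint16_t last_avail_idx; /* next avail entry the host will consume */
};

/* Every reader and writer goes through the macro, so the index can
 * later move (e.g. into the shared ring) by changing one line. */
#define lg_last_avail(vq) ((vq)->last_avail_idx)

static uint16_t next_avail_slot(struct virtqueue *vq, uint16_t num)
{
	assert(num != 0 && (num & (num - 1)) == 0); /* power-of-two ring */
	return lg_last_avail(vq)++ % num;
}
```

Because the macro expands to an lvalue, it works for both reads and increments.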
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is a simple patch to add support for the virtio "hardware random
generator" to lguest. It gets about 1.2 MB/sec reading from /dev/hwrng
in the guest.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
If you've got a nice DHCP configuration which maps MAC
addresses to specific IP addresses, then you're going to
want to start your guest with one of those MAC addresses.
Also, in Fedora, we have persistent network interface naming
based on the MAC address, so with randomly assigned
addresses you're soon going to hit eth13. Who knows what
will happen then!
Allow assigning a MAC address to the network interface with
e.g.
--tunnet=bridge:eth0:00:FF:95:6B:DA:3D
or:
--tunnet=192.168.121.1:00:FF:95:6B:DA:3D
which is pretty unintelligible, but ...
(includes Rusty's minor rework)
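One way to peel the optional trailing MAC off the --tunnet argument is sketched below. split_trailing_mac() is a hypothetical helper, not necessarily how the patch parses the option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAC_STR_LEN 17 /* "xx:xx:xx:xx:xx:xx" */

/* Hypothetical helper: if 'arg' ends in ":<MAC>", parse the MAC into
 * 'mac', cut it off (leaving "bridge:eth0" or the IP) and return true.
 * Otherwise leave 'arg' untouched and return false. */
static bool split_trailing_mac(char *arg, unsigned char mac[6])
{
	size_t len = strlen(arg);
	unsigned int b[6];
	int consumed = 0;
	char *p;

	if (len < MAC_STR_LEN + 1)
		return false;
	p = arg + len - MAC_STR_LEN;
	if (p[-1] != ':')
		return false;
	if (sscanf(p, "%2x:%2x:%2x:%2x:%2x:%2x%n", &b[0], &b[1], &b[2],
		   &b[3], &b[4], &b[5], &consumed) != 6
	    || consumed != MAC_STR_LEN)
		return false;
	for (int i = 0; i < 6; i++)
		mac[i] = (unsigned char)b[i];
	p[-1] = '\0'; /* drop the separator and the MAC */
	assert(strlen(arg) == len - MAC_STR_LEN - 1);
	return true;
}
```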
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This is the lguest implementation of the VIRTIO_F_NOTIFY_ON_EMPTY feature.
It is currently only published for network devices, but it is turned on for
everyone.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This brings us closer to Real Life, where we'd examine the device
features once the driver has set the DRIVER_OK status bit.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Ron Minnich points out that a struct containing a single char does not
always have size sizeof(char); simplest to remove the structure to avoid
confusion.
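The point can be illustrated with a short sketch. The C standard permits trailing padding in any struct, and some ABIs round structure sizes up (the old ARM ABI is the usual example). The type names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Before: a one-byte field wrapped in a struct; the compiler may pad
 * it, so sizeof(struct wrapped) is only guaranteed to be >= 1. */
struct wrapped {
	char c;
};

/* After: use the byte directly; a byte is always exactly 1. */
typedef uint8_t plain;
static_assert(sizeof(plain) == 1, "uint8_t is one byte");

/* Bytes needed to hold 'n' entries laid out back to back. */
static size_t bytes_for_wrapped(size_t n) { return n * sizeof(struct wrapped); }
static size_t bytes_for_plain(size_t n) { return n * sizeof(plain); }
```

On x86 both functions return the same value, which is why the difference is easy to miss.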
Cc: "ron minnich" <rminnich@gmail.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Took some cycles to re-read the Lguest Journey end-to-end, fix some
rot and tighten some phrases.
Only comments change. No new jokes, but a couple of recycled old jokes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Mention the config options for the Virtio drivers and move the Virtualization
menu to the toplevel.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The lguest launcher appends a space to the kernel command line (if kernel
arguments are specified on its command line). This space is unneeded. More
importantly, this appended space will make Red Hat's nash script interpreter
(used in a Fedora style initramfs) add an empty argument to init's command
line. This empty argument will make kernel arguments like "init=/bin/bash"
fail (because the shell will try to execute a script with an empty name).
This could be considered a bug in nash, but is easily fixed in the lguest
launcher too.
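A sketch of building the command line without the trailing space: the separator goes before every argument except the first, instead of after each one. concat_args() is a hypothetical name, and the sketch truncates only at whole-argument boundaries.

```c
#include <assert.h>
#include <string.h>

/* Join argv[0..argc-1] into 'buf', separated by single spaces, with no
 * trailing space (nash would turn one into an empty argument). */
static void concat_args(char *buf, size_t size, int argc, char *argv[])
{
	size_t len = 0;

	assert(size > 0);
	buf[0] = '\0';
	for (int i = 0; i < argc; i++) {
		size_t n = strlen(argv[i]);
		size_t need = n + (i > 0 ? 1 : 0);

		if (len + need >= size) /* keep room for the NUL */
			break;
		if (i > 0)
			buf[len++] = ' ';
		memcpy(buf + len, argv[i], n);
		len += n;
		buf[len] = '\0';
	}
}
```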
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
A reset function solves three problems:
1) It allows us to renegotiate features, e.g. if we want to upgrade a
guest driver without rebooting the guest.
2) It gives us a clean way of shutting down virtqueues: after a reset,
we know that the buffers won't be used by the host, and
3) It helps the guest recover from messed-up drivers.
So we remove the ->shutdown hook, and the only way we now remove
feature bits is via reset.
We leave it to the driver to do the reset before it deletes queues:
the balloon driver, for example, needs to chat to the host in its
remove function.
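A host-side sketch of these semantics. VIRTIO_CONFIG_S_DRIVER_OK has the value from the Linux header, and writing 0 to the status byte is treated as a reset. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VIRTIO_CONFIG_S_DRIVER_OK 4 /* value from linux/virtio_config.h */

/* Hypothetical host-side device state. */
struct vdev {
	uint8_t status;          /* status bits written by the driver */
	uint32_t guest_features; /* feature bits the driver accepted */
	bool vq_live;            /* may the host still use queued buffers? */
};

/* Reset: forget status and features and stop using the Guest's
 * buffers, so the driver may then free its virtqueues safely. */
static void vdev_reset(struct vdev *d)
{
	d->status = 0;
	d->guest_features = 0;
	d->vq_live = false;
}

static void vdev_set_status(struct vdev *d, uint8_t status)
{
	if (status == 0) {
		vdev_reset(d);
		return;
	}
	d->status = status;
	if (status & VIRTIO_CONFIG_S_DRIVER_OK)
		d->vq_live = true;
}

/* Accepting features only adds bits: with ->shutdown gone, the only
 * way to drop a bit is a reset followed by renegotiation. */
static void vdev_ack_features(struct vdev *d, uint32_t offered, uint32_t wanted)
{
	d->guest_features |= offered & wanted;
}

/* Driver remove path in the order the text gives: talk to the host if
 * needed (the balloon case), reset, then delete the queues. */
static void driver_remove(struct vdev *d, void (*chat_to_host)(struct vdev *))
{
	if (chat_to_host != NULL)
		chat_to_host(d);
	vdev_reset(d);
	assert(!d->vq_live); /* buffers will not be touched again */
	/* ...delete the virtqueues here... */
}
```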
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The other side (host) can set the NO_NOTIFY flag as an optimization,
to say "no need to kick me when you add things". Make it clear that
this is advisory only; especially that we should always notify when
the ring is full.
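A Guest-side sketch of the kick decision. VRING_USED_F_NO_NOTIFY has the value from the Linux header. need_kick() is hypothetical, and the reasoning in its comment about a full ring is the commit's rule stated as code, not a quote.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VRING_USED_F_NO_NOTIFY 1 /* value from linux/virtio_ring.h */

/* Should the Guest kick the host after adding buffers? 'num_free' is
 * how many descriptors remain free in a ring of 'ring_size'. */
static bool need_kick(uint16_t used_flags, unsigned num_free, unsigned ring_size)
{
	assert(num_free <= ring_size);
	/* Ring full: we cannot make progress until the host consumes
	 * buffers, so never rely on it noticing without a kick. */
	if (num_free == 0)
		return true;
	/* Otherwise NO_NOTIFY is only an advisory hint we may honour. */
	return !(used_flags & VRING_USED_F_NO_NOTIFY);
}
```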
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Previously we used a type/len pair within the config space, but this
seems overkill. We now simply define a structure which represents the
layout in the config space: the config space can now only be extended
at the end.
The main driver-visible changes:
1) We indicate what fields are present with an explicit feature bit.
2) Virtqueues are explicitly numbered, and not in the config space.
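A sketch of the struct-plus-feature-bit scheme for a hypothetical device. The DEMO_F_* bit numbers and struct demo_config are invented for illustration and do not correspond to any real virtio device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_F_MAC 5 /* hypothetical feature bit numbers */
#define DEMO_F_MTU 6

/* The config space *is* this struct; new fields may only be appended,
 * each guarded by its own feature bit. */
struct demo_config {
	uint8_t mac[6]; /* valid iff DEMO_F_MAC */
	uint16_t mtu;   /* valid iff DEMO_F_MTU; appended later */
};

/* Read 'len' bytes at 'offset' from the raw config space. */
static void config_get(const uint8_t *space, size_t offset, void *buf, size_t len)
{
	assert(space != NULL && buf != NULL);
	memcpy(buf, space + offset, len);
}

static bool get_mac(const uint8_t *space, uint64_t features, uint8_t mac[6])
{
	if (!(features & (1ULL << DEMO_F_MAC)))
		return false; /* field not present: don't read it */
	config_get(space, offsetof(struct demo_config, mac), mac, 6);
	return true;
}
```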
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This patch makes use of pread() and pwrite() in the lguest launcher
to communicate the vcpu id to the lguest driver. The id is kept in
a thread-local variable, which means that in the future we can spawn
vcpus as threads. But right now, only the infrastructure is in place.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Share net is not supported, Rusty is an "idiot".
Signed-off-by: Sheela Sequeira <sheela.sequeira@gmail.com>
Reviewed-by: James Morris <jmorris@namei.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The virtio descriptor rings of size N-1 were nicely set up to be
aligned to an N-byte boundary. But as Anthony Liguori points out, the
free-running indices used by virtio require that the sizes be a power
of 2, otherwise we get problems on wrap (demonstrated with lguest).
So we replace the clever "2^n-1" scheme with a simple "align to page
boundary" scheme: this means that all virtio rings take at least two
pages, but it's safer than guessing cache alignment.
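Both points can be checked with a short sketch. The sizes follow the split-ring layout described here: 16-byte descriptors, an avail ring of u16 flags, idx and ring[num], and a used ring of u16 flags and idx plus num 8-byte {id, len} entries. vring_size_sketch() and ring_slot() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Descriptors and avail ring, padded to 'align', then the used ring. */
static size_t vring_size_sketch(unsigned num, size_t align)
{
	size_t desc = 16 * (size_t)num;
	size_t avail = 2 * (2 + (size_t)num);
	size_t used = 2 * 2 + 8 * (size_t)num;

	assert(align != 0 && (align & (align - 1)) == 0);
	return ((desc + avail + align - 1) & ~(align - 1)) + used;
}

/* Slot used by a free-running 16-bit index. */
static unsigned ring_slot(uint16_t idx, unsigned num)
{
	return idx % num;
}
```

With 255 entries, index 65535 (255 * 257) and the wrapped index 0 both map to slot 0, so the sequence breaks. With 256 entries, the slots stay consecutive across the wrap. Even a 1-entry ring needs 4096 + 12 bytes with page alignment, hence "at least two pages".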
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This seems like an obvious typo, but it has worked in the past because the
virtio blk frontend just ignores the length field on completion.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Went through the documentation doing typo and content fixes. This
patch contains only comment and whitespace changes.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Now that the kernel headers are clean for userspace export, we don't need
to typedef kernel types before including them. We also don't need
pci_ids.h (that was from an earlier virtio draft).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>