Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
@@ -229,6 +229,6 @@ KernelVersion: 4.1
|
|||||||
Contact: linux-mtd@lists.infradead.org
|
Contact: linux-mtd@lists.infradead.org
|
||||||
Description:
|
Description:
|
||||||
For a partition, the offset of that partition from the start
|
For a partition, the offset of that partition from the start
|
||||||
of the master device in bytes. This attribute is absent on
|
of the parent (another partition or a flash device) in bytes.
|
||||||
main devices, so it can be used to distinguish between
|
This attribute is absent on flash devices, so it can be used
|
||||||
partitions and devices that aren't partitions.
|
to distinguish them from partitions.
|
||||||
|
|||||||
@@ -1,22 +1,24 @@
|
|||||||
|
=========================
|
||||||
Dynamic DMA mapping Guide
|
Dynamic DMA mapping Guide
|
||||||
=========================
|
=========================
|
||||||
|
|
||||||
David S. Miller <davem@redhat.com>
|
:Author: David S. Miller <davem@redhat.com>
|
||||||
Richard Henderson <rth@cygnus.com>
|
:Author: Richard Henderson <rth@cygnus.com>
|
||||||
Jakub Jelinek <jakub@redhat.com>
|
:Author: Jakub Jelinek <jakub@redhat.com>
|
||||||
|
|
||||||
This is a guide to device driver writers on how to use the DMA API
|
This is a guide to device driver writers on how to use the DMA API
|
||||||
with example pseudo-code. For a concise description of the API, see
|
with example pseudo-code. For a concise description of the API, see
|
||||||
DMA-API.txt.
|
DMA-API.txt.
|
||||||
|
|
||||||
CPU and DMA addresses
|
CPU and DMA addresses
|
||||||
|
=====================
|
||||||
|
|
||||||
There are several kinds of addresses involved in the DMA API, and it's
|
There are several kinds of addresses involved in the DMA API, and it's
|
||||||
important to understand the differences.
|
important to understand the differences.
|
||||||
|
|
||||||
The kernel normally uses virtual addresses. Any address returned by
|
The kernel normally uses virtual addresses. Any address returned by
|
||||||
kmalloc(), vmalloc(), and similar interfaces is a virtual address and can
|
kmalloc(), vmalloc(), and similar interfaces is a virtual address and can
|
||||||
be stored in a "void *".
|
be stored in a ``void *``.
|
||||||
|
|
||||||
The virtual memory system (TLB, page tables, etc.) translates virtual
|
The virtual memory system (TLB, page tables, etc.) translates virtual
|
||||||
addresses to CPU physical addresses, which are stored as "phys_addr_t" or
|
addresses to CPU physical addresses, which are stored as "phys_addr_t" or
|
||||||
@@ -37,7 +39,7 @@ be restricted to a subset of that space. For example, even if a system
|
|||||||
supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU
|
supports 64-bit addresses for main memory and PCI BARs, it may use an IOMMU
|
||||||
so devices only need to use 32-bit DMA addresses.
|
so devices only need to use 32-bit DMA addresses.
|
||||||
|
|
||||||
Here's a picture and some examples:
|
Here's a picture and some examples::
|
||||||
|
|
||||||
CPU CPU Bus
|
CPU CPU Bus
|
||||||
Virtual Physical Address
|
Virtual Physical Address
|
||||||
@@ -98,7 +100,7 @@ microprocessor architecture. You should use the DMA API rather than the
|
|||||||
bus-specific DMA API, i.e., use the dma_map_*() interfaces rather than the
|
bus-specific DMA API, i.e., use the dma_map_*() interfaces rather than the
|
||||||
pci_map_*() interfaces.
|
pci_map_*() interfaces.
|
||||||
|
|
||||||
First of all, you should make sure
|
First of all, you should make sure::
|
||||||
|
|
||||||
#include <linux/dma-mapping.h>
|
#include <linux/dma-mapping.h>
|
||||||
|
|
||||||
@@ -107,6 +109,7 @@ can hold any valid DMA address for the platform and should be used
|
|||||||
everywhere you hold a DMA address returned from the DMA mapping functions.
|
everywhere you hold a DMA address returned from the DMA mapping functions.
|
||||||
|
|
||||||
What memory is DMA'able?
|
What memory is DMA'able?
|
||||||
|
========================
|
||||||
|
|
||||||
The first piece of information you must know is what kernel memory can
|
The first piece of information you must know is what kernel memory can
|
||||||
be used with the DMA mapping facilities. There has been an unwritten
|
be used with the DMA mapping facilities. There has been an unwritten
|
||||||
@@ -144,6 +147,7 @@ networking subsystems make sure that the buffers they use are valid
|
|||||||
for you to DMA from/to.
|
for you to DMA from/to.
|
||||||
|
|
||||||
DMA addressing limitations
|
DMA addressing limitations
|
||||||
|
==========================
|
||||||
|
|
||||||
Does your device have any DMA addressing limitations? For example, is
|
Does your device have any DMA addressing limitations? For example, is
|
||||||
your device only capable of driving the low order 24-bits of address?
|
your device only capable of driving the low order 24-bits of address?
|
||||||
@@ -166,7 +170,7 @@ style to do this even if your device holds the default setting,
|
|||||||
because this shows that you did think about these issues wrt. your
|
because this shows that you did think about these issues wrt. your
|
||||||
device.
|
device.
|
||||||
|
|
||||||
The query is performed via a call to dma_set_mask_and_coherent():
|
The query is performed via a call to dma_set_mask_and_coherent()::
|
||||||
|
|
||||||
int dma_set_mask_and_coherent(struct device *dev, u64 mask);
|
int dma_set_mask_and_coherent(struct device *dev, u64 mask);
|
||||||
|
|
||||||
@@ -175,12 +179,12 @@ If you have some special requirements, then the following two separate
|
|||||||
queries can be used instead:
|
queries can be used instead:
|
||||||
|
|
||||||
The query for streaming mappings is performed via a call to
|
The query for streaming mappings is performed via a call to
|
||||||
dma_set_mask():
|
dma_set_mask()::
|
||||||
|
|
||||||
int dma_set_mask(struct device *dev, u64 mask);
|
int dma_set_mask(struct device *dev, u64 mask);
|
||||||
|
|
||||||
The query for consistent allocations is performed via a call
|
The query for consistent allocations is performed via a call
|
||||||
to dma_set_coherent_mask():
|
to dma_set_coherent_mask()::
|
||||||
|
|
||||||
int dma_set_coherent_mask(struct device *dev, u64 mask);
|
int dma_set_coherent_mask(struct device *dev, u64 mask);
|
||||||
|
|
||||||
@@ -209,7 +213,7 @@ of your driver reports that performance is bad or that the device is not
|
|||||||
even detected, you can ask them for the kernel messages to find out
|
even detected, you can ask them for the kernel messages to find out
|
||||||
exactly why.
|
exactly why.
|
||||||
|
|
||||||
The standard 32-bit addressing device would do something like this:
|
The standard 32-bit addressing device would do something like this::
|
||||||
|
|
||||||
if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
|
if (dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))) {
|
||||||
dev_warn(dev, "mydev: No suitable DMA available\n");
|
dev_warn(dev, "mydev: No suitable DMA available\n");
|
||||||
@@ -225,7 +229,7 @@ than 64-bit addressing. For example, Sparc64 PCI SAC addressing is
|
|||||||
more efficient than DAC addressing.
|
more efficient than DAC addressing.
|
||||||
|
|
||||||
Here is how you would handle a 64-bit capable device which can drive
|
Here is how you would handle a 64-bit capable device which can drive
|
||||||
all 64-bits when accessing streaming DMA:
|
all 64-bits when accessing streaming DMA::
|
||||||
|
|
||||||
int using_dac;
|
int using_dac;
|
||||||
|
|
||||||
@@ -239,7 +243,7 @@ all 64-bits when accessing streaming DMA:
|
|||||||
}
|
}
|
||||||
|
|
||||||
If a card is capable of using 64-bit consistent allocations as well,
|
If a card is capable of using 64-bit consistent allocations as well,
|
||||||
the case would look like this:
|
the case would look like this::
|
||||||
|
|
||||||
int using_dac, consistent_using_dac;
|
int using_dac, consistent_using_dac;
|
||||||
|
|
||||||
@@ -260,7 +264,7 @@ uses consistent allocations, one would have to check the return value from
|
|||||||
dma_set_coherent_mask().
|
dma_set_coherent_mask().
|
||||||
|
|
||||||
Finally, if your device can only drive the low 24-bits of
|
Finally, if your device can only drive the low 24-bits of
|
||||||
address you might do something like:
|
address you might do something like::
|
||||||
|
|
||||||
if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
|
if (dma_set_mask(dev, DMA_BIT_MASK(24))) {
|
||||||
dev_warn(dev, "mydev: 24-bit DMA addressing not available\n");
|
dev_warn(dev, "mydev: 24-bit DMA addressing not available\n");
|
||||||
@@ -280,7 +284,7 @@ only provide the functionality which the machine can handle. It
|
|||||||
is important that the last call to dma_set_mask() be for the
|
is important that the last call to dma_set_mask() be for the
|
||||||
most specific mask.
|
most specific mask.
|
||||||
|
|
||||||
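The mask arguments in these hunks are built with DMA_BIT_MASK(). As a rough userspace illustration (not kernel code), the sketch below models what such a mask expresses: the low n address bits are set, and a buffer is reachable only if its last byte does not exceed the mask. The names `dma_bit_mask` and `addr_fits_mask` are hypothetical helpers invented for this sketch; a real driver would call dma_set_mask_and_coherent() or dma_set_mask() and leave the check to the DMA layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for DMA_BIT_MASK(n): a value with the low n bits set. */
static uint64_t dma_bit_mask(unsigned int n)
{
	assert(n >= 1 && n <= 64);
	/* Shifting a 64-bit value by 64 is undefined in C, so treat n == 64 separately. */
	return (n == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << n) - 1);
}

/*
 * Hypothetical helper: can a device limited to `mask` address every byte
 * of the buffer [addr, addr + size)?
 */
static bool addr_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
	uint64_t last;

	assert(size > 0);
	last = addr + size - 1;
	if (last < addr)	/* the range wrapped around the 64-bit space */
		return false;
	return last <= mask;
}
```

With a 24-bit mask, for instance, a buffer ending at 0xFFFFFF fits, while one that extends a single byte further does not. That is the 16MB ISA-style limit the text refers to.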
Here is pseudo-code showing how this might be done:
|
Here is pseudo-code showing how this might be done::
|
||||||
|
|
||||||
#define PLAYBACK_ADDRESS_BITS DMA_BIT_MASK(32)
|
#define PLAYBACK_ADDRESS_BITS DMA_BIT_MASK(32)
|
||||||
#define RECORD_ADDRESS_BITS DMA_BIT_MASK(24)
|
#define RECORD_ADDRESS_BITS DMA_BIT_MASK(24)
|
||||||
@@ -309,6 +313,7 @@ devices seems to be littered with ISA chips given a PCI front end,
|
|||||||
and thus retaining the 16MB DMA addressing limitations of ISA.
|
and thus retaining the 16MB DMA addressing limitations of ISA.
|
||||||
|
|
||||||
Types of DMA mappings
|
Types of DMA mappings
|
||||||
|
=====================
|
||||||
|
|
||||||
There are two types of DMA mappings:
|
There are two types of DMA mappings:
|
||||||
|
|
||||||
@@ -336,12 +341,14 @@ There are two types of DMA mappings:
|
|||||||
to memory is immediately visible to the device, and vice
|
to memory is immediately visible to the device, and vice
|
||||||
versa. Consistent mappings guarantee this.
|
versa. Consistent mappings guarantee this.
|
||||||
|
|
||||||
IMPORTANT: Consistent DMA memory does not preclude the usage of
|
.. important::
|
||||||
|
|
||||||
|
Consistent DMA memory does not preclude the usage of
|
||||||
proper memory barriers. The CPU may reorder stores to
|
proper memory barriers. The CPU may reorder stores to
|
||||||
consistent memory just as it may normal memory. Example:
|
consistent memory just as it may normal memory. Example:
|
||||||
if it is important for the device to see the first word
|
if it is important for the device to see the first word
|
||||||
of a descriptor updated before the second, you must do
|
of a descriptor updated before the second, you must do
|
||||||
something like:
|
something like::
|
||||||
|
|
||||||
desc->word0 = address;
|
desc->word0 = address;
|
||||||
wmb();
|
wmb();
|
||||||
@@ -377,16 +384,17 @@ Also, systems with caches that aren't DMA-coherent will work better
|
|||||||
when the underlying buffers don't share cache lines with other data.
|
when the underlying buffers don't share cache lines with other data.
|
||||||
|
|
||||||
|
|
||||||
Using Consistent DMA mappings.
|
Using Consistent DMA mappings
|
||||||
|
=============================
|
||||||
|
|
||||||
To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
|
To allocate and map large (PAGE_SIZE or so) consistent DMA regions,
|
||||||
you should do:
|
you should do::
|
||||||
|
|
||||||
dma_addr_t dma_handle;
|
dma_addr_t dma_handle;
|
||||||
|
|
||||||
cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);
|
cpu_addr = dma_alloc_coherent(dev, size, &dma_handle, gfp);
|
||||||
|
|
||||||
where device is a struct device *. This may be called in interrupt
|
where device is a ``struct device *``. This may be called in interrupt
|
||||||
context with the GFP_ATOMIC flag.
|
context with the GFP_ATOMIC flag.
|
||||||
|
|
||||||
Size is the length of the region you want to allocate, in bytes.
|
Size is the length of the region you want to allocate, in bytes.
|
||||||
@@ -415,7 +423,7 @@ exists (for example) to guarantee that if you allocate a chunk
|
|||||||
which is smaller than or equal to 64 kilobytes, the extent of the
|
which is smaller than or equal to 64 kilobytes, the extent of the
|
||||||
buffer you receive will not cross a 64K boundary.
|
buffer you receive will not cross a 64K boundary.
|
||||||
|
|
||||||
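To make the 64K boundary guarantee concrete, here is a small userspace sketch (not kernel code) of what "does not cross a boundary" means for a buffer. The function name `crosses_boundary` is an assumption made for this illustration, and the boundary is assumed to be a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: returns true if [addr, addr + size) spans more than
 * one `boundary`-aligned window. `boundary` must be a power of two.
 */
static bool crosses_boundary(uint64_t addr, uint64_t size, uint64_t boundary)
{
	uint64_t window_mask;

	assert(size > 0);
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);

	/* The first and last byte share a window iff their high bits match. */
	window_mask = ~(boundary - 1);
	return (addr & window_mask) != ((addr + size - 1) & window_mask);
}
```

A 64K buffer that starts on a 64K-aligned address stays inside one window, whereas the same buffer starting one page earlier would straddle two.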
To unmap and free such a DMA region, you call:
|
To unmap and free such a DMA region, you call::
|
||||||
|
|
||||||
dma_free_coherent(dev, size, cpu_addr, dma_handle);
|
dma_free_coherent(dev, size, cpu_addr, dma_handle);
|
||||||
|
|
||||||
@@ -430,7 +438,7 @@ a kmem_cache, but it uses dma_alloc_coherent(), not __get_free_pages().
|
|||||||
Also, it understands common hardware constraints for alignment,
|
Also, it understands common hardware constraints for alignment,
|
||||||
like queue heads needing to be aligned on N byte boundaries.
|
like queue heads needing to be aligned on N byte boundaries.
|
||||||
|
|
||||||
Create a dma_pool like this:
|
Create a dma_pool like this::
|
||||||
|
|
||||||
struct dma_pool *pool;
|
struct dma_pool *pool;
|
||||||
|
|
||||||
@@ -444,7 +452,7 @@ pass 0 for boundary; passing 4096 says memory allocated from this pool
|
|||||||
must not cross 4KByte boundaries (but at that time it may be better to
|
must not cross 4KByte boundaries (but at that time it may be better to
|
||||||
use dma_alloc_coherent() directly instead).
|
use dma_alloc_coherent() directly instead).
|
||||||
|
|
||||||
Allocate memory from a DMA pool like this:
|
Allocate memory from a DMA pool like this::
|
||||||
|
|
||||||
cpu_addr = dma_pool_alloc(pool, flags, &dma_handle);
|
cpu_addr = dma_pool_alloc(pool, flags, &dma_handle);
|
||||||
|
|
||||||
@@ -452,7 +460,7 @@ flags are GFP_KERNEL if blocking is permitted (not in_interrupt nor
|
|||||||
holding SMP locks), GFP_ATOMIC otherwise. Like dma_alloc_coherent(),
|
holding SMP locks), GFP_ATOMIC otherwise. Like dma_alloc_coherent(),
|
||||||
this returns two values, cpu_addr and dma_handle.
|
this returns two values, cpu_addr and dma_handle.
|
||||||
|
|
||||||
Free memory that was allocated from a dma_pool like this:
|
Free memory that was allocated from a dma_pool like this::
|
||||||
|
|
||||||
dma_pool_free(pool, cpu_addr, dma_handle);
|
dma_pool_free(pool, cpu_addr, dma_handle);
|
||||||
|
|
||||||
@@ -460,7 +468,7 @@ where pool is what you passed to dma_pool_alloc(), and cpu_addr and
|
|||||||
dma_handle are the values dma_pool_alloc() returned. This function
|
dma_handle are the values dma_pool_alloc() returned. This function
|
||||||
may be called in interrupt context.
|
may be called in interrupt context.
|
||||||
|
|
||||||
Destroy a dma_pool by calling:
|
Destroy a dma_pool by calling::
|
||||||
|
|
||||||
dma_pool_destroy(pool);
|
dma_pool_destroy(pool);
|
||||||
|
|
||||||
@@ -469,10 +477,11 @@ from a pool before you destroy the pool. This function may not
|
|||||||
be called in interrupt context.
|
be called in interrupt context.
|
||||||
|
|
||||||
DMA Direction
|
DMA Direction
|
||||||
|
=============
|
||||||
|
|
||||||
The interfaces described in subsequent portions of this document
|
The interfaces described in subsequent portions of this document
|
||||||
take a DMA direction argument, which is an integer and takes on
|
take a DMA direction argument, which is an integer and takes on
|
||||||
one of the following values:
|
one of the following values::
|
||||||
|
|
||||||
DMA_BIDIRECTIONAL
|
DMA_BIDIRECTIONAL
|
||||||
DMA_TO_DEVICE
|
DMA_TO_DEVICE
|
||||||
@@ -522,13 +531,14 @@ specifier. For receive packets, just the opposite, map/unmap them
|
|||||||
with the DMA_FROM_DEVICE direction specifier.
|
with the DMA_FROM_DEVICE direction specifier.
|
||||||
|
|
||||||
Using Streaming DMA mappings
|
Using Streaming DMA mappings
|
||||||
|
============================
|
||||||
|
|
||||||
The streaming DMA mapping routines can be called from interrupt
|
The streaming DMA mapping routines can be called from interrupt
|
||||||
context. There are two versions of each map/unmap, one which will
|
context. There are two versions of each map/unmap, one which will
|
||||||
map/unmap a single memory region, and one which will map/unmap a
|
map/unmap a single memory region, and one which will map/unmap a
|
||||||
scatterlist.
|
scatterlist.
|
||||||
|
|
||||||
To map a single region, you do:
|
To map a single region, you do::
|
||||||
|
|
||||||
struct device *dev = &my_dev->dev;
|
struct device *dev = &my_dev->dev;
|
||||||
dma_addr_t dma_handle;
|
dma_addr_t dma_handle;
|
||||||
@@ -545,7 +555,7 @@ To map a single region, you do:
|
|||||||
goto map_error_handling;
|
goto map_error_handling;
|
||||||
}
|
}
|
||||||
|
|
||||||
and to unmap it:
|
and to unmap it::
|
||||||
|
|
||||||
dma_unmap_single(dev, dma_handle, size, direction);
|
dma_unmap_single(dev, dma_handle, size, direction);
|
||||||
|
|
||||||
@@ -563,7 +573,7 @@ Using CPU pointers like this for single mappings has a disadvantage:
|
|||||||
you cannot reference HIGHMEM memory in this way. Thus, there is a
|
you cannot reference HIGHMEM memory in this way. Thus, there is a
|
||||||
map/unmap interface pair akin to dma_{map,unmap}_single(). These
|
map/unmap interface pair akin to dma_{map,unmap}_single(). These
|
||||||
interfaces deal with page/offset pairs instead of CPU pointers.
|
interfaces deal with page/offset pairs instead of CPU pointers.
|
||||||
Specifically:
|
Specifically::
|
||||||
|
|
||||||
struct device *dev = &my_dev->dev;
|
struct device *dev = &my_dev->dev;
|
||||||
dma_addr_t dma_handle;
|
dma_addr_t dma_handle;
|
||||||
@@ -593,7 +603,7 @@ error as outlined under the dma_map_single() discussion.
|
|||||||
You should call dma_unmap_page() when the DMA activity is finished, e.g.,
|
You should call dma_unmap_page() when the DMA activity is finished, e.g.,
|
||||||
from the interrupt which told you that the DMA transfer is done.
|
from the interrupt which told you that the DMA transfer is done.
|
||||||
|
|
||||||
With scatterlists, you map a region gathered from several regions by:
|
With scatterlists, you map a region gathered from several regions by::
|
||||||
|
|
||||||
int i, count = dma_map_sg(dev, sglist, nents, direction);
|
int i, count = dma_map_sg(dev, sglist, nents, direction);
|
||||||
struct scatterlist *sg;
|
struct scatterlist *sg;
|
||||||
@@ -617,13 +627,15 @@ Then you should loop count times (note: this can be less than nents times)
|
|||||||
and use sg_dma_address() and sg_dma_len() macros where you previously
|
and use sg_dma_address() and sg_dma_len() macros where you previously
|
||||||
accessed sg->address and sg->length as shown above.
|
accessed sg->address and sg->length as shown above.
|
||||||
|
|
||||||
To unmap a scatterlist, just call:
|
To unmap a scatterlist, just call::
|
||||||
|
|
||||||
dma_unmap_sg(dev, sglist, nents, direction);
|
dma_unmap_sg(dev, sglist, nents, direction);
|
||||||
|
|
||||||
Again, make sure DMA activity has already finished.
|
Again, make sure DMA activity has already finished.
|
||||||
|
|
||||||
PLEASE NOTE: The 'nents' argument to the dma_unmap_sg call must be
|
.. note::
|
||||||
|
|
||||||
|
The 'nents' argument to the dma_unmap_sg call must be
|
||||||
the _same_ one you passed into the dma_map_sg call,
|
the _same_ one you passed into the dma_map_sg call,
|
||||||
it should _NOT_ be the 'count' value _returned_ from the
|
it should _NOT_ be the 'count' value _returned_ from the
|
||||||
dma_map_sg call.
|
dma_map_sg call.
|
||||||
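The difference between 'nents' and the returned 'count' can be modeled in userspace. The sketch below is a toy, not the kernel's scatterlist implementation: `struct toy_sg`, `toy_map_sg` and `toy_unmap_sg` are invented names, and merging adjacent segments is only an assumption about what a mapping implementation might do. It shows why the map call can return fewer segments than it was given, and why the unmap call still has to walk the original number of entries.

```c
#include <assert.h>
#include <stdint.h>

/* Toy scatterlist entry; NOT the kernel's struct scatterlist. */
struct toy_sg {
	uint64_t addr;		/* CPU-side address of the segment */
	uint32_t len;		/* CPU-side length of the segment */
	uint64_t dma_addr;	/* filled in by toy_map_sg() */
	uint32_t dma_len;	/* filled in by toy_map_sg() */
};

/*
 * Toy mapping: identity-maps each segment and merges a segment into the
 * previous DMA segment when the two are contiguous. Returns the number
 * of DMA segments produced, which may be less than nents.
 */
static int toy_map_sg(struct toy_sg *sg, int nents)
{
	int count = 0;

	for (int i = 0; i < nents; i++) {
		if (count > 0 &&
		    sg[count - 1].dma_addr + sg[count - 1].dma_len == sg[i].addr) {
			sg[count - 1].dma_len += sg[i].len;
		} else {
			sg[count].dma_addr = sg[i].addr;
			sg[count].dma_len = sg[i].len;
			count++;
		}
	}
	return count;
}

/* Toy unmap: must visit every original entry, hence nents, not count. */
static void toy_unmap_sg(struct toy_sg *sg, int nents)
{
	for (int i = 0; i < nents; i++) {
		sg[i].dma_addr = 0;
		sg[i].dma_len = 0;
	}
}
```

With three entries of which the first two are contiguous, the toy map returns 2. The device would be programmed with those two DMA segments, but the unmap is still called with 3.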
@@ -638,11 +650,11 @@ properly in order for the CPU and device to see the most up-to-date and
|
|||||||
correct copy of the DMA buffer.
|
correct copy of the DMA buffer.
|
||||||
|
|
||||||
So, firstly, just map it with dma_map_{single,sg}(), and after each DMA
|
So, firstly, just map it with dma_map_{single,sg}(), and after each DMA
|
||||||
transfer call either:
|
transfer call either::
|
||||||
|
|
||||||
dma_sync_single_for_cpu(dev, dma_handle, size, direction);
|
dma_sync_single_for_cpu(dev, dma_handle, size, direction);
|
||||||
|
|
||||||
or:
|
or::
|
||||||
|
|
||||||
dma_sync_sg_for_cpu(dev, sglist, nents, direction);
|
dma_sync_sg_for_cpu(dev, sglist, nents, direction);
|
||||||
|
|
||||||
@@ -650,17 +662,19 @@ as appropriate.
|
|||||||
|
|
||||||
Then, if you wish to let the device get at the DMA area again,
|
Then, if you wish to let the device get at the DMA area again,
|
||||||
finish accessing the data with the CPU, and then before actually
|
finish accessing the data with the CPU, and then before actually
|
||||||
giving the buffer to the hardware call either:
|
giving the buffer to the hardware call either::
|
||||||
|
|
||||||
dma_sync_single_for_device(dev, dma_handle, size, direction);
|
dma_sync_single_for_device(dev, dma_handle, size, direction);
|
||||||
|
|
||||||
or:
|
or::
|
||||||
|
|
||||||
dma_sync_sg_for_device(dev, sglist, nents, direction);
|
dma_sync_sg_for_device(dev, sglist, nents, direction);
|
||||||
|
|
||||||
as appropriate.
|
as appropriate.
|
||||||
|
|
||||||
PLEASE NOTE: The 'nents' argument to dma_sync_sg_for_cpu() and
|
.. note::
|
||||||
|
|
||||||
|
The 'nents' argument to dma_sync_sg_for_cpu() and
|
||||||
dma_sync_sg_for_device() must be the same passed to
|
dma_sync_sg_for_device() must be the same passed to
|
||||||
dma_map_sg(). It is _NOT_ the count returned by
|
dma_map_sg(). It is _NOT_ the count returned by
|
||||||
dma_map_sg().
|
dma_map_sg().
|
||||||
@@ -671,7 +685,7 @@ dma_map_*() call till dma_unmap_*(), then you don't have to call the
|
|||||||
dma_sync_*() routines at all.
|
dma_sync_*() routines at all.
|
||||||
|
|
||||||
Here is pseudo code which shows a situation in which you would need
|
Here is pseudo code which shows a situation in which you would need
|
||||||
to use the dma_sync_*() interfaces.
|
to use the dma_sync_*() interfaces::
|
||||||
|
|
||||||
my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
|
my_card_setup_receive_buffer(struct my_card *cp, char *buffer, int len)
|
||||||
{
|
{
|
||||||
@@ -748,6 +762,7 @@ they are entirely deprecated. Some ports already do not provide these
|
|||||||
as it is impossible to correctly support them.
|
as it is impossible to correctly support them.
|
||||||
|
|
||||||
Handling Errors
|
Handling Errors
|
||||||
|
===============
|
||||||
|
|
||||||
DMA address space is limited on some architectures and an allocation
|
DMA address space is limited on some architectures and an allocation
|
||||||
failure can be determined by:
|
failure can be determined by:
|
||||||
@@ -755,7 +770,7 @@ failure can be determined by:
|
|||||||
- checking if dma_alloc_coherent() returns NULL or dma_map_sg returns 0
|
- checking if dma_alloc_coherent() returns NULL or dma_map_sg returns 0
|
||||||
|
|
||||||
- checking the dma_addr_t returned from dma_map_single() and dma_map_page()
|
- checking the dma_addr_t returned from dma_map_single() and dma_map_page()
|
||||||
by using dma_mapping_error():
|
by using dma_mapping_error()::
|
||||||
|
|
||||||
dma_addr_t dma_handle;
|
dma_addr_t dma_handle;
|
||||||
|
|
||||||
@@ -773,7 +788,8 @@ failure can be determined by:
|
|||||||
of a multiple page mapping attempt. These examples are applicable to
|
of a multiple page mapping attempt. These examples are applicable to
|
||||||
dma_map_page() as well.
|
dma_map_page() as well.
|
||||||
|
|
||||||
Example 1:
|
Example 1::
|
||||||
|
|
||||||
dma_addr_t dma_handle1;
|
dma_addr_t dma_handle1;
|
||||||
dma_addr_t dma_handle2;
|
dma_addr_t dma_handle2;
|
||||||
|
|
||||||
@@ -802,8 +818,12 @@ Example 1:
|
|||||||
dma_unmap_single(dma_handle1);
|
dma_unmap_single(dma_handle1);
|
||||||
map_error_handling1:
|
map_error_handling1:
|
||||||
|
|
||||||
Example 2: (if buffers are allocated in a loop, unmap all mapped buffers when
|
Example 2::
|
||||||
mapping error is detected in the middle)
|
|
||||||
|
/*
|
||||||
|
* if buffers are allocated in a loop, unmap all mapped buffers when
|
||||||
|
* mapping error is detected in the middle
|
||||||
|
*/
|
||||||
|
|
||||||
dma_addr_t dma_addr;
|
dma_addr_t dma_addr;
|
||||||
dma_addr_t array[DMA_BUFFERS];
|
dma_addr_t array[DMA_BUFFERS];
|
||||||
@@ -847,6 +867,7 @@ fails in the queuecommand hook. This means that the SCSI subsystem
|
|||||||
passes the command to the driver again later.
|
passes the command to the driver again later.
|
||||||
|
|
||||||
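The unwind-on-failure pattern from the examples in this section can be exercised in userspace. The following is a minimal sketch under stated assumptions: `toy_map`, `toy_unmap`, `toy_map_all` and the `TOY_MAPPING_ERROR` sentinel are all made up for illustration. Real driver code would test each handle with dma_mapping_error() instead of comparing against a sentinel.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DMA_BUFFERS   4
#define TOY_MAPPING_ERROR UINT64_MAX	/* toy sentinel, not a kernel value */

/* Number of mappings currently outstanding in this toy model. */
static int toy_live_mappings;

/* Toy stand-in for dma_map_single(): fails when index == fail_at. */
static uint64_t toy_map(int index, int fail_at)
{
	if (index == fail_at)
		return TOY_MAPPING_ERROR;
	toy_live_mappings++;
	return UINT64_C(0x1000) * (uint64_t)(index + 1);
}

/* Toy stand-in for dma_unmap_single(). */
static void toy_unmap(uint64_t handle)
{
	(void)handle;
	toy_live_mappings--;
}

/*
 * Map every buffer; if one mapping fails in the middle, unmap the ones
 * that already succeeded. Returns 0 on success, -1 on failure.
 */
static int toy_map_all(uint64_t handles[TOY_DMA_BUFFERS], int fail_at)
{
	int i;

	for (i = 0; i < TOY_DMA_BUFFERS; i++) {
		handles[i] = toy_map(i, fail_at);
		if (handles[i] == TOY_MAPPING_ERROR)
			goto unwind;
	}
	return 0;

unwind:
	/* i indexes the failed slot; walk back over the successful ones. */
	while (--i >= 0)
		toy_unmap(handles[i]);
	return -1;
}
```

The point of the counter is that a failed call leaves no mappings outstanding, which is the property the loop example is meant to guarantee.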
Optimizing Unmap State Space Consumption
|
Optimizing Unmap State Space Consumption
|
||||||
|
========================================
|
||||||
|
|
||||||
On many platforms, dma_unmap_{single,page}() is simply a nop.
|
On many platforms, dma_unmap_{single,page}() is simply a nop.
|
||||||
Therefore, keeping track of the mapping address and length is a waste
|
Therefore, keeping track of the mapping address and length is a waste
|
||||||
@@ -858,7 +879,7 @@ Actually, instead of describing the macros one by one, we'll
|
|||||||
transform some example code.
|
transform some example code.
|
||||||
|
|
||||||
1) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures.
|
1) Use DEFINE_DMA_UNMAP_{ADDR,LEN} in state saving structures.
|
||||||
Example, before:
|
Example, before::
|
||||||
|
|
||||||
struct ring_state {
|
struct ring_state {
|
||||||
struct sk_buff *skb;
|
struct sk_buff *skb;
|
||||||
@@ -866,7 +887,7 @@ transform some example code.
|
|||||||
__u32 len;
|
__u32 len;
|
||||||
};
|
};
|
||||||
|
|
||||||
after:
|
after::
|
||||||
|
|
||||||
struct ring_state {
|
struct ring_state {
|
||||||
struct sk_buff *skb;
|
struct sk_buff *skb;
|
||||||
@@ -875,23 +896,23 @@ transform some example code.
|
|||||||
};
|
};
|
||||||
|
|
||||||
2) Use dma_unmap_{addr,len}_set() to set these values.
|
2) Use dma_unmap_{addr,len}_set() to set these values.
|
||||||
Example, before:
|
Example, before::
|
||||||
|
|
||||||
ringp->mapping = FOO;
|
ringp->mapping = FOO;
|
||||||
ringp->len = BAR;
|
ringp->len = BAR;
|
||||||
|
|
||||||
after:
|
after::
|
||||||
|
|
||||||
dma_unmap_addr_set(ringp, mapping, FOO);
|
dma_unmap_addr_set(ringp, mapping, FOO);
|
||||||
dma_unmap_len_set(ringp, len, BAR);
|
dma_unmap_len_set(ringp, len, BAR);
|
||||||
|
|
||||||
3) Use dma_unmap_{addr,len}() to access these values.
|
3) Use dma_unmap_{addr,len}() to access these values.
|
||||||
Example, before:
|
Example, before::
|
||||||
|
|
||||||
dma_unmap_single(dev, ringp->mapping, ringp->len,
|
dma_unmap_single(dev, ringp->mapping, ringp->len,
|
||||||
DMA_FROM_DEVICE);
|
DMA_FROM_DEVICE);
|
||||||
|
|
||||||
after:
|
after::
|
||||||
|
|
||||||
dma_unmap_single(dev,
|
dma_unmap_single(dev,
|
||||||
dma_unmap_addr(ringp, mapping),
|
dma_unmap_addr(ringp, mapping),
|
||||||
@@ -903,6 +924,7 @@ separately, because it is possible for an implementation to only
|
|||||||
need the address in order to perform the unmap operation.
|
need the address in order to perform the unmap operation.
|
||||||
|
|
||||||
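One way to picture how the unmap-state macros save space is a pair of macro sets selected at compile time. The sketch below is a userspace model, not the kernel's definitions: every `TOY_`/`toy_` name, including the `TOY_NEED_UNMAP_STATE` switch, is hypothetical, and the real macros in linux/dma-mapping.h may differ in detail. Unlike the kernel examples above, the toy field macros carry their own semicolon so the struct stays valid C when they expand to nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up switch: set to 0 to model a platform where unmap is a nop. */
#ifndef TOY_NEED_UNMAP_STATE
#define TOY_NEED_UNMAP_STATE 1
#endif

#if TOY_NEED_UNMAP_STATE
/* State is needed: the fields exist and the accessors touch them. */
#define TOY_DEFINE_UNMAP_ADDR(name)		uint64_t name;
#define TOY_DEFINE_UNMAP_LEN(name)		uint32_t name;
#define toy_unmap_addr(ptr, name)		((ptr)->name)
#define toy_unmap_addr_set(ptr, name, val)	(((ptr)->name) = (val))
#define toy_unmap_len(ptr, name)		((ptr)->name)
#define toy_unmap_len_set(ptr, name, val)	(((ptr)->name) = (val))
#else
/* State is not needed: fields vanish and the accessors do nothing. */
#define TOY_DEFINE_UNMAP_ADDR(name)
#define TOY_DEFINE_UNMAP_LEN(name)
#define toy_unmap_addr(ptr, name)		(0)
#define toy_unmap_addr_set(ptr, name, val)	do { } while (0)
#define toy_unmap_len(ptr, name)		(0)
#define toy_unmap_len_set(ptr, name, val)	do { } while (0)
#endif

/* Toy ring entry; the two trailing fields only exist when state is needed. */
struct toy_ring_state {
	void *skb;
	TOY_DEFINE_UNMAP_ADDR(mapping)
	TOY_DEFINE_UNMAP_LEN(len)
};
```

Because callers only ever go through the accessor macros, the same driver source compiles in both configurations, and the structure shrinks to a single pointer when the state is not needed.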
Platform Issues
|
Platform Issues
|
||||||
|
===============
|
||||||
|
|
||||||
If you are just writing drivers for Linux and do not maintain
|
If you are just writing drivers for Linux and do not maintain
|
||||||
an architecture port for the kernel, you can safely skip down
|
an architecture port for the kernel, you can safely skip down
|
||||||
@@ -929,11 +951,12 @@ to "Closing".
|
|||||||
objects).
|
objects).
|
||||||
|
|
||||||
Closing
|
Closing
|
||||||
|
=======
|
||||||
|
|
||||||
This document, and the API itself, would not be in its current
|
This document, and the API itself, would not be in its current
|
||||||
form without the feedback and suggestions from numerous individuals.
|
form without the feedback and suggestions from numerous individuals.
|
||||||
We would like to specifically mention, in no particular order, the
|
We would like to specifically mention, in no particular order, the
|
||||||
following people:
|
following people::
|
||||||
|
|
||||||
Russell King <rmk@arm.linux.org.uk>
|
Russell King <rmk@arm.linux.org.uk>
|
||||||
Leo Dagum <dagum@barrel.engr.sgi.com>
|
Leo Dagum <dagum@barrel.engr.sgi.com>
|
||||||
|
|||||||
@@ -1,7 +1,8 @@
|
|||||||
|
============================================
|
||||||
Dynamic DMA mapping using the generic device
|
Dynamic DMA mapping using the generic device
|
||||||
============================================
|
============================================
|
||||||
|
|
||||||
James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
|
:Author: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
|
||||||
|
|
||||||
This document describes the DMA API. For a more gentle introduction
|
This document describes the DMA API. For a more gentle introduction
|
||||||
of the API (and actual examples), see Documentation/DMA-API-HOWTO.txt.
|
of the API (and actual examples), see Documentation/DMA-API-HOWTO.txt.
|
||||||
@@ -13,7 +14,7 @@ non-consistent platforms (this is usually only legacy platforms) you
|
|||||||
should only use the API described in part I.
|
should only use the API described in part I.
|
||||||
|
|
||||||
Part I - dma_API
|
Part I - dma_API
|
||||||
-------------------------------------
|
----------------
|
||||||
|
|
||||||
To get the dma_API, you must #include <linux/dma-mapping.h>. This
|
To get the dma_API, you must #include <linux/dma-mapping.h>. This
|
||||||
provides dma_addr_t and the interfaces described below.
|
provides dma_addr_t and the interfaces described below.
|
||||||
@@ -26,6 +27,8 @@ address space and the DMA address space.
|
|||||||
Part Ia - Using large DMA-coherent buffers
|
Part Ia - Using large DMA-coherent buffers
|
||||||
------------------------------------------
|
------------------------------------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void *
|
void *
|
||||||
dma_alloc_coherent(struct device *dev, size_t size,
|
dma_alloc_coherent(struct device *dev, size_t size,
|
||||||
dma_addr_t *dma_handle, gfp_t flag)
|
dma_addr_t *dma_handle, gfp_t flag)
|
||||||
@@ -51,10 +54,12 @@ consolidate your requests for consistent memory as much as possible.
|
|||||||
The simplest way to do that is to use the dma_pool calls (see below).
|
The simplest way to do that is to use the dma_pool calls (see below).
|
||||||
|
|
||||||
The flag parameter (dma_alloc_coherent() only) allows the caller to
|
The flag parameter (dma_alloc_coherent() only) allows the caller to
|
||||||
specify the GFP_ flags (see kmalloc()) for the allocation (the
|
specify the ``GFP_`` flags (see kmalloc()) for the allocation (the
|
||||||
implementation may choose to ignore flags that affect the location of
|
implementation may choose to ignore flags that affect the location of
|
||||||
the returned memory, like GFP_DMA).
|
the returned memory, like GFP_DMA).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void *
|
void *
|
||||||
dma_zalloc_coherent(struct device *dev, size_t size,
|
dma_zalloc_coherent(struct device *dev, size_t size,
|
||||||
dma_addr_t *dma_handle, gfp_t flag)
|
dma_addr_t *dma_handle, gfp_t flag)
|
||||||
@@ -62,6 +67,8 @@ dma_zalloc_coherent(struct device *dev, size_t size,
|
|||||||
Wraps dma_alloc_coherent() and also zeroes the returned memory if the
|
Wraps dma_alloc_coherent() and also zeroes the returned memory if the
|
||||||
allocation attempt succeeded.
|
allocation attempt succeeded.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
|
dma_free_coherent(struct device *dev, size_t size, void *cpu_addr,
|
||||||
dma_addr_t dma_handle)
|
dma_addr_t dma_handle)
|
||||||
@@ -88,6 +95,8 @@ not __get_free_pages(). Also, they understand common hardware constraints
|
|||||||
for alignment, like queue heads needing to be aligned on N-byte boundaries.
|
for alignment, like queue heads needing to be aligned on N-byte boundaries.
|
||||||
|
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct dma_pool *
|
struct dma_pool *
|
||||||
dma_pool_create(const char *name, struct device *dev,
|
dma_pool_create(const char *name, struct device *dev,
|
||||||
size_t size, size_t align, size_t alloc);
|
size_t size, size_t align, size_t alloc);
|
||||||
@@ -103,15 +112,20 @@ in bytes, and must be a power of two). If your device has no boundary
|
|||||||
crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated
|
crossing restrictions, pass 0 for alloc; passing 4096 says memory allocated
|
||||||
from this pool must not cross 4KByte boundaries.
|
from this pool must not cross 4KByte boundaries.
|
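The boundary rule described above can be illustrated with a small user-space sketch. This is not kernel code: `block_respects_boundary()` is a hypothetical helper, and it only assumes what the text states, namely that the boundary is either 0 (no restriction) or a power of two.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the kernel API: returns true when the
 * block [addr, addr + size) lies inside a single boundary-sized window.
 * boundary must be 0 (no restriction) or a power of two.
 */
static bool block_respects_boundary(uint64_t addr, size_t size, size_t boundary)
{
	if (boundary == 0 || size == 0)
		return true;
	/* The first and last byte must fall into the same window. */
	return (addr / boundary) == ((addr + size - 1) / boundary);
}
```

With a boundary of 4096, a 16-byte block at 0xff0 passes the check, while a 17-byte block at the same address fails because its last byte lands in the next 4 KByte window.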
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void *dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags,
|
void *
|
||||||
|
dma_pool_zalloc(struct dma_pool *pool, gfp_t mem_flags,
|
||||||
dma_addr_t *handle)
|
dma_addr_t *handle)
|
||||||
|
|
||||||
Wraps dma_pool_alloc() and also zeroes the returned memory if the
|
Wraps dma_pool_alloc() and also zeroes the returned memory if the
|
||||||
allocation attempt succeeded.
|
allocation attempt succeeded.
|
||||||
|
|
||||||
|
|
||||||
void *dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags,
|
::
|
||||||
|
|
||||||
|
void *
|
||||||
|
dma_pool_alloc(struct dma_pool *pool, gfp_t gfp_flags,
|
||||||
dma_addr_t *dma_handle);
|
dma_addr_t *dma_handle);
|
||||||
|
|
||||||
This allocates memory from the pool; the returned memory will meet the
|
This allocates memory from the pool; the returned memory will meet the
|
||||||
@@ -122,16 +136,20 @@ blocking. Like dma_alloc_coherent(), this returns two values: an
|
|||||||
address usable by the CPU, and the DMA address usable by the pool's
|
address usable by the CPU, and the DMA address usable by the pool's
|
||||||
device.
|
device.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void dma_pool_free(struct dma_pool *pool, void *vaddr,
|
void
|
||||||
|
dma_pool_free(struct dma_pool *pool, void *vaddr,
|
||||||
dma_addr_t addr);
|
dma_addr_t addr);
|
||||||
|
|
||||||
This puts memory back into the pool. The pool is what was passed to
|
This puts memory back into the pool. The pool is what was passed to
|
||||||
dma_pool_alloc(); the CPU (vaddr) and DMA addresses are what
|
dma_pool_alloc(); the CPU (vaddr) and DMA addresses are what
|
||||||
were returned when that routine allocated the memory being freed.
|
were returned when that routine allocated the memory being freed.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void dma_pool_destroy(struct dma_pool *pool);
|
void
|
||||||
|
dma_pool_destroy(struct dma_pool *pool);
|
||||||
|
|
||||||
dma_pool_destroy() frees the resources of the pool. It must be
|
dma_pool_destroy() frees the resources of the pool. It must be
|
||||||
called in a context which can sleep. Make sure you've freed all allocated
|
called in a context which can sleep. Make sure you've freed all allocated
|
||||||
@@ -141,6 +159,8 @@ memory back to the pool before you destroy it.
|
|||||||
Part Ic - DMA addressing limitations
|
Part Ic - DMA addressing limitations
|
||||||
------------------------------------
|
------------------------------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_set_mask_and_coherent(struct device *dev, u64 mask)
|
dma_set_mask_and_coherent(struct device *dev, u64 mask)
|
||||||
|
|
||||||
@@ -149,6 +169,8 @@ streaming and coherent DMA mask parameters if it is.
|
|||||||
|
|
||||||
Returns: 0 if successful and a negative error if not.
|
Returns: 0 if successful and a negative error if not.
|
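As a rough illustration of what a DMA mask expresses, the following user-space sketch builds an n-bit mask and tests whether a buffer is reachable under it. Both function names are hypothetical; `addr_bit_mask()` only mirrors the idea behind the kernel's `DMA_BIT_MASK()` macro, and the range check assumes a contiguous mask of the form 2^n - 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low n bits set (n >= 64 gives all ones). */
static uint64_t addr_bit_mask(unsigned int n)
{
	return n >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << n) - 1;
}

/*
 * Hypothetical helper: true when every byte of [addr, addr + size) can be
 * addressed under mask. Assumes mask is contiguous (2^n - 1).
 */
static bool range_fits_mask(uint64_t addr, uint64_t size, uint64_t mask)
{
	uint64_t last;

	if (size == 0)
		return true;
	last = addr + size - 1;
	if (last < addr)	/* address arithmetic wrapped around */
		return false;
	return last <= mask;
}
```

A device limited to 32-bit addressing would use `addr_bit_mask(32)`; a buffer whose last byte sits above 0xffffffff then fails the check, which is the situation the mask-setting calls let the kernel detect and work around.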
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_set_mask(struct device *dev, u64 mask)
|
dma_set_mask(struct device *dev, u64 mask)
|
||||||
|
|
||||||
@@ -157,6 +179,8 @@ parameters if it is.
|
|||||||
|
|
||||||
Returns: 0 if successful and a negative error if not.
|
Returns: 0 if successful and a negative error if not.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_set_coherent_mask(struct device *dev, u64 mask)
|
dma_set_coherent_mask(struct device *dev, u64 mask)
|
||||||
|
|
||||||
@@ -165,6 +189,8 @@ parameters if it is.
|
|||||||
|
|
||||||
Returns: 0 if successful and a negative error if not.
|
Returns: 0 if successful and a negative error if not.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
u64
|
u64
|
||||||
dma_get_required_mask(struct device *dev)
|
dma_get_required_mask(struct device *dev)
|
||||||
|
|
||||||
@@ -182,6 +208,8 @@ call to set the mask to the value returned.
|
|||||||
Part Id - Streaming DMA mappings
|
Part Id - Streaming DMA mappings
|
||||||
--------------------------------
|
--------------------------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
dma_addr_t
|
dma_addr_t
|
||||||
dma_map_single(struct device *dev, void *cpu_addr, size_t size,
|
dma_map_single(struct device *dev, void *cpu_addr, size_t size,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
@@ -193,12 +221,16 @@ The direction for both APIs may be converted freely by casting.
|
|||||||
However, the dma_ API uses a strongly typed enumerator for its
|
However, the dma_ API uses a strongly typed enumerator for its
|
||||||
direction:
|
direction:
|
||||||
|
|
||||||
|
======================= =============================================
|
||||||
DMA_NONE no direction (used for debugging)
|
DMA_NONE no direction (used for debugging)
|
||||||
DMA_TO_DEVICE data is going from the memory to the device
|
DMA_TO_DEVICE data is going from the memory to the device
|
||||||
DMA_FROM_DEVICE data is coming from the device to the memory
|
DMA_FROM_DEVICE data is coming from the device to the memory
|
||||||
DMA_BIDIRECTIONAL direction isn't known
|
DMA_BIDIRECTIONAL direction isn't known
|
||||||
|
======================= =============================================
|
||||||
|
|
||||||
Notes: Not all memory regions in a machine can be mapped by this API.
|
.. note::
|
||||||
|
|
||||||
|
Not all memory regions in a machine can be mapped by this API.
|
||||||
Further, contiguous kernel virtual space may not be contiguous as
|
Further, contiguous kernel virtual space may not be contiguous as
|
||||||
physical memory. Since this API does not provide any scatter/gather
|
physical memory. Since this API does not provide any scatter/gather
|
||||||
capability, it will fail if the user tries to map a non-physically
|
capability, it will fail if the user tries to map a non-physically
|
||||||
@@ -223,7 +255,9 @@ maps an I/O DMA address to a physical memory address). However, to be
|
|||||||
portable, device driver writers may *not* assume that such an IOMMU
|
portable, device driver writers may *not* assume that such an IOMMU
|
||||||
exists.
|
exists.
|
||||||
|
|
||||||
Warnings: Memory coherency operates at a granularity called the cache
|
.. warning::
|
||||||
|
|
||||||
|
Memory coherency operates at a granularity called the cache
|
||||||
line width. In order for memory mapped by this API to operate
|
line width. In order for memory mapped by this API to operate
|
||||||
correctly, the mapped region must begin exactly on a cache line
|
correctly, the mapped region must begin exactly on a cache line
|
||||||
boundary and end exactly on one (to prevent two separately mapped
|
boundary and end exactly on one (to prevent two separately mapped
|
||||||
@@ -255,6 +289,8 @@ are flushed from the processor) and once before the data may be
|
|||||||
accessed after being used by the device (to make sure any processor
|
accessed after being used by the device (to make sure any processor
|
||||||
cache lines are updated with data that the device may have changed).
|
cache lines are updated with data that the device may have changed).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
|
dma_unmap_single(struct device *dev, dma_addr_t dma_addr, size_t size,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
@@ -263,10 +299,13 @@ Unmaps the region previously mapped. All the parameters passed in
|
|||||||
must be identical to those passed in (and returned) by the mapping
|
must be identical to those passed in (and returned) by the mapping
|
||||||
API.
|
API.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
dma_addr_t
|
dma_addr_t
|
||||||
dma_map_page(struct device *dev, struct page *page,
|
dma_map_page(struct device *dev, struct page *page,
|
||||||
unsigned long offset, size_t size,
|
unsigned long offset, size_t size,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
|
dma_unmap_page(struct device *dev, dma_addr_t dma_address, size_t size,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
@@ -277,6 +316,8 @@ and <size> parameters are provided to do partial page mapping, it is
|
|||||||
recommended that you never use these unless you really know what the
|
recommended that you never use these unless you really know what the
|
||||||
cache width is.
|
cache width is.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
dma_addr_t
|
dma_addr_t
|
||||||
dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size,
|
dma_map_resource(struct device *dev, phys_addr_t phys_addr, size_t size,
|
||||||
enum dma_data_direction dir, unsigned long attrs)
|
enum dma_data_direction dir, unsigned long attrs)
|
||||||
@@ -289,6 +330,8 @@ API for mapping and unmapping for MMIO resources. All the notes and
|
|||||||
warnings for the other mapping APIs apply here. The API should only be
|
warnings for the other mapping APIs apply here. The API should only be
|
||||||
used to map device MMIO resources; mapping of RAM is not permitted.
|
used to map device MMIO resources; mapping of RAM is not permitted.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
|
dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
|
||||||
|
|
||||||
@@ -298,6 +341,8 @@ the returned DMA address with dma_mapping_error(). A non-zero return value
|
|||||||
means the mapping could not be created and the driver should take appropriate
|
means the mapping could not be created and the driver should take appropriate
|
||||||
action (e.g. reduce current DMA mapping usage or delay and try again later).
|
action (e.g. reduce current DMA mapping usage or delay and try again later).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_map_sg(struct device *dev, struct scatterlist *sg,
|
dma_map_sg(struct device *dev, struct scatterlist *sg,
|
||||||
int nents, enum dma_data_direction direction)
|
int nents, enum dma_data_direction direction)
|
||||||
@@ -316,7 +361,7 @@ critical that the driver do something, in the case of a block driver
|
|||||||
aborting the request or even oopsing is better than doing nothing and
|
aborting the request or even oopsing is better than doing nothing and
|
||||||
corrupting the filesystem.
|
corrupting the filesystem.
|
||||||
|
|
||||||
With scatterlists, you use the resulting mapping like this:
|
With scatterlists, you use the resulting mapping like this::
|
||||||
|
|
||||||
int i, count = dma_map_sg(dev, sglist, nents, direction);
|
int i, count = dma_map_sg(dev, sglist, nents, direction);
|
||||||
struct scatterlist *sg;
|
struct scatterlist *sg;
|
||||||
@@ -337,6 +382,8 @@ Then you should loop count times (note: this can be less than nents times)
|
|||||||
and use sg_dma_address() and sg_dma_len() macros where you previously
|
and use sg_dma_address() and sg_dma_len() macros where you previously
|
||||||
accessed sg->address and sg->length as shown above.
|
accessed sg->address and sg->length as shown above.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_unmap_sg(struct device *dev, struct scatterlist *sg,
|
dma_unmap_sg(struct device *dev, struct scatterlist *sg,
|
||||||
int nents, enum dma_data_direction direction)
|
int nents, enum dma_data_direction direction)
|
||||||
@@ -348,17 +395,26 @@ API.
|
|||||||
Note: <nents> must be the number you passed in, *not* the number of
|
Note: <nents> must be the number you passed in, *not* the number of
|
||||||
DMA address entries returned.
|
DMA address entries returned.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle, size_t size,
|
dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
|
||||||
|
size_t size,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle, size_t size,
|
dma_sync_single_for_device(struct device *dev, dma_addr_t dma_handle,
|
||||||
|
size_t size,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg, int nents,
|
dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
|
||||||
|
int nents,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg, int nents,
|
dma_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
|
||||||
|
int nents,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
|
|
||||||
Synchronise a single contiguous or scatter/gather mapping for the CPU
|
Synchronise a single contiguous or scatter/gather mapping for the CPU
|
||||||
@@ -367,7 +423,10 @@ as those passed into the single mapping API. With the sync_single API,
|
|||||||
you can use dma_handle and size parameters that aren't identical to
|
you can use dma_handle and size parameters that aren't identical to
|
||||||
those passed into the single mapping API to do a partial sync.
|
those passed into the single mapping API to do a partial sync.
|
||||||
|
|
||||||
Notes: You must do this:
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
You must do this:
|
||||||
|
|
||||||
- Before reading values that have been written by DMA from the device
|
- Before reading values that have been written by DMA from the device
|
||||||
(use the DMA_FROM_DEVICE direction)
|
(use the DMA_FROM_DEVICE direction)
|
||||||
@@ -378,6 +437,8 @@ Notes: You must do this:
|
|||||||
|
|
||||||
See also dma_map_single().
|
See also dma_map_single().
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
dma_addr_t
|
dma_addr_t
|
||||||
dma_map_single_attrs(struct device *dev, void *cpu_addr, size_t size,
|
dma_map_single_attrs(struct device *dev, void *cpu_addr, size_t size,
|
||||||
enum dma_data_direction dir,
|
enum dma_data_direction dir,
|
||||||
@@ -410,9 +471,9 @@ is identical to those of the corresponding function
|
|||||||
without the _attrs suffix. As a result dma_map_single_attrs()
|
without the _attrs suffix. As a result dma_map_single_attrs()
|
||||||
can generally replace dma_map_single(), etc.
|
can generally replace dma_map_single(), etc.
|
||||||
|
|
||||||
As an example of the use of the *_attrs functions, here's how
|
As an example of the use of the ``*_attrs`` functions, here's how
|
||||||
you could pass an attribute DMA_ATTR_FOO when mapping memory
|
you could pass an attribute DMA_ATTR_FOO when mapping memory
|
||||||
for DMA:
|
for DMA::
|
||||||
|
|
||||||
#include <linux/dma-mapping.h>
|
#include <linux/dma-mapping.h>
|
||||||
/* DMA_ATTR_FOO should be defined in linux/dma-mapping.h and
|
/* DMA_ATTR_FOO should be defined in linux/dma-mapping.h and
|
||||||
@@ -427,7 +488,7 @@ for DMA:
|
|||||||
|
|
||||||
Architectures that care about DMA_ATTR_FOO would check for its
|
Architectures that care about DMA_ATTR_FOO would check for its
|
||||||
presence in their implementations of the mapping and unmapping
|
presence in their implementations of the mapping and unmapping
|
||||||
routines, e.g.:
|
routines, e.g.::
|
||||||
|
|
||||||
void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr,
|
void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr,
|
||||||
size_t size, enum dma_data_direction dir,
|
size_t size, enum dma_data_direction dir,
|
||||||
@@ -437,10 +498,11 @@ void whizco_dma_map_sg_attrs(struct device *dev, dma_addr_t dma_addr,
|
|||||||
if (attrs & DMA_ATTR_FOO)
|
if (attrs & DMA_ATTR_FOO)
|
||||||
/* twizzle the frobnozzle */
|
/* twizzle the frobnozzle */
|
||||||
....
|
....
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
Part II - Advanced dma_ usage
|
Part II - Advanced dma usage
|
||||||
-----------------------------
|
----------------------------
|
||||||
|
|
||||||
Warning: These pieces of the DMA API should not be used in the
|
Warning: These pieces of the DMA API should not be used in the
|
||||||
majority of cases, since they cater for unlikely corner cases that
|
majority of cases, since they cater for unlikely corner cases that
|
||||||
@@ -450,6 +512,8 @@ If you don't understand how cache line coherency works between a
|
|||||||
processor and an I/O device, you should not be using this part of the
|
processor and an I/O device, you should not be using this part of the
|
||||||
API at all.
|
API at all.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void *
|
void *
|
||||||
dma_alloc_noncoherent(struct device *dev, size_t size,
|
dma_alloc_noncoherent(struct device *dev, size_t size,
|
||||||
dma_addr_t *dma_handle, gfp_t flag)
|
dma_addr_t *dma_handle, gfp_t flag)
|
||||||
@@ -468,6 +532,8 @@ only use this API if you positively know your driver will be
|
|||||||
required to work on one of the rare (usually non-PCI) architectures
|
required to work on one of the rare (usually non-PCI) architectures
|
||||||
that simply cannot make consistent memory.
|
that simply cannot make consistent memory.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
|
dma_free_noncoherent(struct device *dev, size_t size, void *cpu_addr,
|
||||||
dma_addr_t dma_handle)
|
dma_addr_t dma_handle)
|
||||||
@@ -476,6 +542,8 @@ Free memory allocated by the nonconsistent API. All parameters must
|
|||||||
be identical to those passed in (and returned by
|
be identical to those passed in (and returned by
|
||||||
dma_alloc_noncoherent()).
|
dma_alloc_noncoherent()).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_get_cache_alignment(void)
|
dma_get_cache_alignment(void)
|
||||||
|
|
||||||
@@ -483,11 +551,15 @@ Returns the processor cache alignment. This is the absolute minimum
|
|||||||
alignment *and* width that you must observe when either mapping
|
alignment *and* width that you must observe when either mapping
|
||||||
memory or doing partial flushes.
|
memory or doing partial flushes.
|
||||||
|
|
||||||
Notes: This API may return a number *larger* than the actual cache
|
.. note::
|
||||||
|
|
||||||
|
This API may return a number *larger* than the actual cache
|
||||||
line, but it will guarantee that one or more cache lines fit exactly
|
line, but it will guarantee that one or more cache lines fit exactly
|
||||||
into the width returned by this call. It will also always be a power
|
into the width returned by this call. It will also always be a power
|
||||||
of two for easy alignment.
|
of two for easy alignment.
|
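Because the returned value is always a power of two, rounding a length or checking an address against it reduces to bit masking. The sketch below is plain user-space C with hypothetical helper names; the `align` argument stands in for whatever `dma_get_cache_alignment()` would return on a given platform.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round size up to a multiple of align (a power of two). */
static size_t round_up_to_alignment(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: true when addr is a multiple of align (a power of two). */
static bool is_aligned_to(uintptr_t addr, size_t align)
{
	return (addr & (align - 1)) == 0;
}
```

For an alignment of 64, a 100-byte region would be padded to 128 bytes, and a buffer starting at 0x1004 would not qualify as aligned.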
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_cache_sync(struct device *dev, void *vaddr, size_t size,
|
dma_cache_sync(struct device *dev, void *vaddr, size_t size,
|
||||||
enum dma_data_direction direction)
|
enum dma_data_direction direction)
|
||||||
@@ -497,6 +569,8 @@ dma_alloc_noncoherent(), starting at virtual address vaddr and
|
|||||||
continuing on for size. Again, you *must* observe the cache line
|
continuing on for size. Again, you *must* observe the cache line
|
||||||
boundaries when doing this.
|
boundaries when doing this.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int
|
int
|
||||||
dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
|
dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr,
|
||||||
dma_addr_t device_addr, size_t size, int
|
dma_addr_t device_addr, size_t size, int
|
||||||
@@ -516,19 +590,19 @@ size is the size of the area (must be multiples of PAGE_SIZE).
|
|||||||
|
|
||||||
flags can be ORed together and are:
|
flags can be ORed together and are:
|
||||||
|
|
||||||
DMA_MEMORY_MAP - request that the memory returned from
|
- DMA_MEMORY_MAP - request that the memory returned from
|
||||||
dma_alloc_coherent() be directly writable.
|
dma_alloc_coherent() be directly writable.
|
||||||
|
|
||||||
DMA_MEMORY_IO - request that the memory returned from
|
- DMA_MEMORY_IO - request that the memory returned from
|
||||||
dma_alloc_coherent() be addressable using read()/write()/memcpy_toio() etc.
|
dma_alloc_coherent() be addressable using read()/write()/memcpy_toio() etc.
|
||||||
|
|
||||||
One or both of these flags must be present.
|
One or both of these flags must be present.
|
||||||
|
|
||||||
DMA_MEMORY_INCLUDES_CHILDREN - make the declared memory be allocated by
|
- DMA_MEMORY_INCLUDES_CHILDREN - make the declared memory be allocated by
|
||||||
dma_alloc_coherent() of any child devices of this one (for memory residing
|
dma_alloc_coherent() of any child devices of this one (for memory residing
|
||||||
on a bridge).
|
on a bridge).
|
||||||
|
|
||||||
DMA_MEMORY_EXCLUSIVE - only allocate memory from the declared regions.
|
- DMA_MEMORY_EXCLUSIVE - only allocate memory from the declared regions.
|
||||||
Do not allow dma_alloc_coherent() to fall back to system memory when
|
Do not allow dma_alloc_coherent() to fall back to system memory when
|
||||||
it's out of memory in the declared region.
|
it's out of memory in the declared region.
|
||||||
|
|
||||||
@@ -543,13 +617,15 @@ must be accessed using the correct bus functions. If your driver
|
|||||||
isn't prepared to handle this contingency, it should not specify
|
isn't prepared to handle this contingency, it should not specify
|
||||||
DMA_MEMORY_IO in the input flags.
|
DMA_MEMORY_IO in the input flags.
|
||||||
|
|
||||||
As a simplification for the platforms, only *one* such region of
|
As a simplification for the platforms, only **one** such region of
|
||||||
memory may be declared per device.
|
memory may be declared per device.
|
||||||
|
|
||||||
For reasons of efficiency, most platforms choose to track the declared
|
For reasons of efficiency, most platforms choose to track the declared
|
||||||
region only at the granularity of a page. For smaller allocations,
|
region only at the granularity of a page. For smaller allocations,
|
||||||
you should use the dma_pool() API.
|
you should use the dma_pool() API.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void
|
void
|
||||||
dma_release_declared_memory(struct device *dev)
|
dma_release_declared_memory(struct device *dev)
|
||||||
|
|
||||||
@@ -559,6 +635,8 @@ unconditionally having removed all the required structures. It is the
|
|||||||
driver's job to ensure that no parts of this memory region are
|
driver's job to ensure that no parts of this memory region are
|
||||||
currently in use.
|
currently in use.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void *
|
void *
|
||||||
dma_mark_declared_memory_occupied(struct device *dev,
|
dma_mark_declared_memory_occupied(struct device *dev,
|
||||||
dma_addr_t device_addr, size_t size)
|
dma_addr_t device_addr, size_t size)
|
||||||
@@ -592,9 +670,8 @@ option has a performance impact. Do not enable it in production kernels.
|
|||||||
If you boot the resulting kernel, it will contain code that does some bookkeeping
|
If you boot the resulting kernel, it will contain code that does some bookkeeping
|
||||||
about what DMA memory was allocated for which device. If this code detects an
|
about what DMA memory was allocated for which device. If this code detects an
|
||||||
error it prints a warning message with some details into your kernel log. An
|
error it prints a warning message with some details into your kernel log. An
|
||||||
example warning message may look like this:
|
example warning message may look like this::
|
||||||
|
|
||||||
------------[ cut here ]------------
|
|
||||||
WARNING: at /data2/repos/linux-2.6-iommu/lib/dma-debug.c:448
|
WARNING: at /data2/repos/linux-2.6-iommu/lib/dma-debug.c:448
|
||||||
check_unmap+0x203/0x490()
|
check_unmap+0x203/0x490()
|
||||||
Hardware name:
|
Hardware name:
|
||||||
@@ -637,6 +714,7 @@ details.
|
|||||||
The debugfs directory for the DMA-API debugging code is called dma-api/. In
|
The debugfs directory for the DMA-API debugging code is called dma-api/. In
|
||||||
this directory the following files can currently be found:
|
this directory the following files can currently be found:
|
||||||
|
|
||||||
|
=============================== ===============================================
|
||||||
dma-api/all_errors This file contains a numeric value. If this
|
dma-api/all_errors This file contains a numeric value. If this
|
||||||
value is not equal to zero the debugging code
|
value is not equal to zero the debugging code
|
||||||
will print a warning for every error it finds
|
will print a warning for every error it finds
|
||||||
@@ -657,23 +735,21 @@ this directory the following files can currently be found:
|
|||||||
one at system boot and be set by writing into
|
one at system boot and be set by writing into
|
||||||
this file
|
this file
|
||||||
|
|
||||||
dma-api/min_free_entries
|
dma-api/min_free_entries This read-only file can be read to get the
|
||||||
This read-only file can be read to get the
|
|
||||||
minimum number of free dma_debug_entries the
|
minimum number of free dma_debug_entries the
|
||||||
allocator has ever seen. If this value goes
|
allocator has ever seen. If this value goes
|
||||||
down to zero the code will disable itself
|
down to zero the code will disable itself
|
||||||
because it is no longer reliable.
|
because it is no longer reliable.
|
||||||
|
|
||||||
dma-api/num_free_entries
|
dma-api/num_free_entries The current number of free dma_debug_entries
|
||||||
The current number of free dma_debug_entries
|
|
||||||
in the allocator.
|
in the allocator.
|
||||||
|
|
||||||
dma-api/driver-filter
|
dma-api/driver-filter You can write a name of a driver into this file
|
||||||
You can write a name of a driver into this file
|
|
||||||
to limit the debug output to requests from that
|
to limit the debug output to requests from that
|
||||||
particular driver. Write an empty string to
|
particular driver. Write an empty string to
|
||||||
that file to disable the filter and see
|
that file to disable the filter and see
|
||||||
all errors again.
|
all errors again.
|
||||||
|
=============================== ===============================================
|
||||||
|
|
||||||
If you have this code compiled into your kernel it will be enabled by default.
|
If you have this code compiled into your kernel it will be enabled by default.
|
||||||
If you want to boot without the bookkeeping anyway you can provide
|
If you want to boot without the bookkeeping anyway you can provide
|
||||||
@@ -692,7 +768,10 @@ of preallocated entries is defined per architecture. If it is too low for you
|
|||||||
boot with 'dma_debug_entries=<your_desired_number>' to overwrite the
|
boot with 'dma_debug_entries=<your_desired_number>' to overwrite the
|
||||||
architectural default.
|
architectural default.
|
||||||
|
|
||||||
void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
|
::
|
||||||
|
|
||||||
|
void
|
||||||
|
debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
|
||||||
|
|
||||||
The dma-debug interface debug_dma_mapping_error() helps debug drivers that fail
|
The dma-debug interface debug_dma_mapping_error() helps debug drivers that fail
|
||||||
to check DMA mapping errors on addresses returned by dma_map_single() and
|
to check DMA mapping errors on addresses returned by dma_map_single() and
|
||||||
@@ -702,4 +781,3 @@ the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
|
|||||||
this flag is still set, prints a warning message that includes the call trace that
|
this flag is still set, prints a warning message that includes the call trace that
|
||||||
leads up to the unmap. This interface can be called from dma_mapping_error()
|
leads up to the unmap. This interface can be called from dma_mapping_error()
|
||||||
routines to enable DMA mapping error check debugging.
|
routines to enable DMA mapping error check debugging.
|
||||||
|
|
||||||
|
|||||||
@@ -1,16 +1,17 @@
|
|||||||
|
============================
|
||||||
DMA with ISA and LPC devices
|
DMA with ISA and LPC devices
|
||||||
============================
|
============================
|
||||||
|
|
||||||
Pierre Ossman <drzeus@drzeus.cx>
|
:Author: Pierre Ossman <drzeus@drzeus.cx>
|
||||||
|
|
||||||
This document describes how to do DMA transfers using the old ISA DMA
|
This document describes how to do DMA transfers using the old ISA DMA
|
||||||
controller. Even though ISA is more or less dead today, the LPC bus
|
controller. Even though ISA is more or less dead today, the LPC bus
|
||||||
uses the same DMA system, so it will be around for quite some time.
|
uses the same DMA system, so it will be around for quite some time.
|
||||||
|
|
||||||
Part I - Headers and dependencies
|
Headers and dependencies
|
||||||
---------------------------------
|
------------------------
|
||||||
|
|
||||||
To do ISA style DMA you need to include two headers:
|
To do ISA style DMA you need to include two headers::
|
||||||
|
|
||||||
#include <linux/dma-mapping.h>
|
#include <linux/dma-mapping.h>
|
||||||
#include <asm/dma.h>
|
#include <asm/dma.h>
|
||||||
@@ -23,8 +24,8 @@ this is not present on all platforms make sure you construct your
|
|||||||
Kconfig to be dependent on ISA_DMA_API (not ISA) so that nobody tries
|
Kconfig to be dependent on ISA_DMA_API (not ISA) so that nobody tries
|
||||||
to build your driver on unsupported platforms.
|
to build your driver on unsupported platforms.
|
||||||
|
|
||||||
Part II - Buffer allocation
|
Buffer allocation
|
||||||
---------------------------
|
-----------------
|
||||||
|
|
||||||
The ISA DMA controller has some very strict requirements on which
|
The ISA DMA controller has some very strict requirements on which
|
||||||
memory it can access so extra care must be taken when allocating
|
memory it can access so extra care must be taken when allocating
|
||||||
@@ -42,13 +43,13 @@ requirements you pass the flag GFP_DMA to kmalloc.
|
|||||||
|
|
||||||
Unfortunately the memory available for ISA DMA is scarce so unless you
|
Unfortunately the memory available for ISA DMA is scarce so unless you
|
||||||
allocate the memory during boot-up it's a good idea to also pass
|
allocate the memory during boot-up it's a good idea to also pass
|
||||||
__GFP_REPEAT and __GFP_NOWARN to make the allocator try a bit harder.
|
__GFP_RETRY_MAYFAIL and __GFP_NOWARN to make the allocator try a bit harder.
|
||||||
|
|
||||||
(This scarcity also means that you should allocate the buffer as
|
(This scarcity also means that you should allocate the buffer as
|
||||||
early as possible and not release it until the driver is unloaded.)
|
early as possible and not release it until the driver is unloaded.)
|
||||||
|
|
||||||
Part III - Address translation
|
Address translation
|
||||||
------------------------------
|
-------------------
|
||||||
|
|
||||||
To translate the virtual address to a bus address, use the normal DMA
|
To translate the virtual address to a bus address, use the normal DMA
|
||||||
API. Do _not_ use isa_virt_to_phys() even though it does the same
|
API. Do _not_ use isa_virt_to_phys() even though it does the same
|
||||||
@@ -61,8 +62,8 @@ Note: x86_64 had a broken DMA API when it came to ISA but has since
|
|||||||
been fixed. If your arch has problems then fix the DMA API instead of
|
been fixed. If your arch has problems then fix the DMA API instead of
|
||||||
reverting to the ISA functions.
|
reverting to the ISA functions.
|
||||||
|
|
||||||
Part IV - Channels
|
Channels
|
||||||
------------------
|
--------
|
||||||
|
|
||||||
A normal ISA DMA controller has 8 channels. The lower four are for
|
A normal ISA DMA controller has 8 channels. The lower four are for
|
||||||
8-bit transfers and the upper four are for 16-bit transfers.
|
8-bit transfers and the upper four are for 16-bit transfers.
|
||||||
@@ -80,8 +81,8 @@ The ability to use 16-bit or 8-bit transfers is _not_ up to you as a
|
|||||||
driver author but depends on what the hardware supports. Check your
|
driver author but depends on what the hardware supports. Check your
|
||||||
specs or test different channels.
|
specs or test different channels.
|
||||||
|
|
||||||
Part V - Transfer data
|
Transfer data
|
||||||
----------------------
|
-------------
|
||||||
|
|
||||||
Now for the good stuff, the actual DMA transfer. :)
|
Now for the good stuff, the actual DMA transfer. :)
|
||||||
|
|
||||||
@@ -112,7 +113,7 @@ Once the DMA transfer is finished (or timed out) you should disable
|
|||||||
the channel again. You should also check get_dma_residue() to make
|
the channel again. You should also check get_dma_residue() to make
|
||||||
sure that all data has been transferred.
|
sure that all data has been transferred.
|
||||||
|
|
||||||
Example:
|
Example::
|
||||||
|
|
||||||
int flags, residue;
|
int flags, residue;
|
||||||
|
|
||||||
@@ -141,8 +142,8 @@ if (residue != 0)
|
|||||||
|
|
||||||
release_dma_lock(flags);
|
release_dma_lock(flags);
|
||||||
|
|
||||||
Part VI - Suspend/resume
|
Suspend/resume
|
||||||
------------------------
|
--------------
|
||||||
|
|
||||||
It is the driver's responsibility to make sure that the machine isn't
|
It is the driver's responsibility to make sure that the machine isn't
|
||||||
suspended while a DMA transfer is in progress. Also, all DMA settings
|
suspended while a DMA transfer is in progress. Also, all DMA settings
|
||||||
|
|||||||
@@ -1,3 +1,4 @@
|
|||||||
|
==============
|
||||||
DMA attributes
|
DMA attributes
|
||||||
==============
|
==============
|
||||||
|
|
||||||
@@ -108,6 +109,7 @@ This is a hint to the DMA-mapping subsystem that it's probably not worth
|
|||||||
the time to try to allocate memory in a way that gives better TLB
|
the time to try to allocate memory in a way that gives better TLB
|
||||||
efficiency (AKA it's not worth trying to build the mapping out of larger
|
efficiency (AKA it's not worth trying to build the mapping out of larger
|
||||||
pages). You might want to specify this if:
|
pages). You might want to specify this if:
|
||||||
|
|
||||||
- You know that the accesses to this memory won't thrash the TLB.
|
- You know that the accesses to this memory won't thrash the TLB.
|
||||||
You might know that the accesses are likely to be sequential or
|
You might know that the accesses are likely to be sequential or
|
||||||
that they aren't sequential but it's unlikely you'll ping-pong
|
that they aren't sequential but it's unlikely you'll ping-pong
|
||||||
@@ -121,10 +123,11 @@ pages). You might want to specify this if:
|
|||||||
the mapping to have a short lifetime then it may be worth it to
|
the mapping to have a short lifetime then it may be worth it to
|
||||||
optimize allocation (avoid coming up with large pages) instead of
|
optimize allocation (avoid coming up with large pages) instead of
|
||||||
getting the slight performance win of larger pages.
|
getting the slight performance win of larger pages.
|
||||||
|
|
||||||
Setting this hint doesn't guarantee that you won't get huge pages, but it
|
Setting this hint doesn't guarantee that you won't get huge pages, but it
|
||||||
means that we won't try quite as hard to get them.
|
means that we won't try quite as hard to get them.
|
||||||
|
|
||||||
NOTE: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
|
.. note:: At the moment DMA_ATTR_ALLOC_SINGLE_PAGES is only implemented on ARM,
|
||||||
though ARM64 patches will likely be posted soon.
|
though ARM64 patches will likely be posted soon.
|
||||||
|
|
||||||
DMA_ATTR_NO_WARN
|
DMA_ATTR_NO_WARN
|
||||||
@@ -142,10 +145,10 @@ problem at all, depending on the implementation of the retry mechanism.
|
|||||||
So, this provides a way for drivers to avoid those error messages on calls
|
So, this provides a way for drivers to avoid those error messages on calls
|
||||||
where allocation failures are not a problem, and shouldn't bother the logs.
|
where allocation failures are not a problem, and shouldn't bother the logs.
|
||||||
|
|
||||||
NOTE: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
|
.. note:: At the moment DMA_ATTR_NO_WARN is only implemented on PowerPC.
|
||||||
|
|
||||||
DMA_ATTR_PRIVILEGED
|
DMA_ATTR_PRIVILEGED
|
||||||
------------------------------
|
-------------------
|
||||||
|
|
||||||
Some advanced peripherals such as remote processors and GPUs perform
|
Some advanced peripherals such as remote processors and GPUs perform
|
||||||
accesses to DMA buffers in both privileged "supervisor" and unprivileged
|
accesses to DMA buffers in both privileged "supervisor" and unprivileged
|
||||||
|
|||||||
@@ -1,9 +1,8 @@
|
|||||||
|
=====================
|
||||||
The Linux IPMI Driver
|
The Linux IPMI Driver
|
||||||
---------------------
|
=====================
|
||||||
Corey Minyard
|
|
||||||
<minyard@mvista.com>
|
:Author: Corey Minyard <minyard@mvista.com> / <minyard@acm.org>
|
||||||
<minyard@acm.org>
|
|
||||||
|
|
||||||
The Intelligent Platform Management Interface, or IPMI, is a
|
The Intelligent Platform Management Interface, or IPMI, is a
|
||||||
standard for controlling intelligent devices that monitor a system.
|
standard for controlling intelligent devices that monitor a system.
|
||||||
@@ -141,7 +140,7 @@ Addressing
|
|||||||
----------
|
----------
|
||||||
|
|
||||||
The IPMI addressing works much like IP addresses, you have an overlay
|
The IPMI addressing works much like IP addresses, you have an overlay
|
||||||
to handle the different address types. The overlay is:
|
to handle the different address types. The overlay is::
|
||||||
|
|
||||||
struct ipmi_addr
|
struct ipmi_addr
|
||||||
{
|
{
|
||||||
@@ -153,7 +152,7 @@ to handle the different address types. The overlay is:
|
|||||||
The addr_type determines what the address really is. The driver
|
The addr_type determines what the address really is. The driver
|
||||||
currently understands two different types of addresses.
|
currently understands two different types of addresses.
|
||||||
|
|
||||||
"System Interface" addresses are defined as:
|
"System Interface" addresses are defined as::
|
||||||
|
|
||||||
struct ipmi_system_interface_addr
|
struct ipmi_system_interface_addr
|
||||||
{
|
{
|
||||||
@@ -166,7 +165,7 @@ straight to the BMC on the current card. The channel must be
|
|||||||
IPMI_BMC_CHANNEL.
|
IPMI_BMC_CHANNEL.
|
||||||
|
|
||||||
Messages that are destined to go out on the IPMB bus use the
|
Messages that are destined to go out on the IPMB bus use the
|
||||||
IPMI_IPMB_ADDR_TYPE address type. The format is
|
IPMI_IPMB_ADDR_TYPE address type. The format is::
|
||||||
|
|
||||||
struct ipmi_ipmb_addr
|
struct ipmi_ipmb_addr
|
||||||
{
|
{
|
||||||
@@ -184,7 +183,7 @@ spec.
|
|||||||
Messages
|
Messages
|
||||||
--------
|
--------
|
||||||
|
|
||||||
Messages are defined as:
|
Messages are defined as::
|
||||||
|
|
||||||
struct ipmi_msg
|
struct ipmi_msg
|
||||||
{
|
{
|
||||||
@@ -208,7 +207,7 @@ block of data, even when receiving messages. Otherwise the driver
|
|||||||
will have no place to put the message.
|
will have no place to put the message.
|
||||||
|
|
||||||
Messages coming up from the message handler in kernelland will come in
|
Messages coming up from the message handler in kernelland will come in
|
||||||
as:
|
as::
|
||||||
|
|
||||||
struct ipmi_recv_msg
|
struct ipmi_recv_msg
|
||||||
{
|
{
|
||||||
@@ -246,6 +245,7 @@ and the user should not have to care what type of SMI is below them.
|
|||||||
|
|
||||||
|
|
||||||
Watching For Interfaces
|
Watching For Interfaces
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
When your code comes up, the IPMI driver may or may not have detected
|
When your code comes up, the IPMI driver may or may not have detected
|
||||||
if IPMI devices exist. So you might have to defer your setup until
|
if IPMI devices exist. So you might have to defer your setup until
|
||||||
@@ -256,6 +256,7 @@ and tell you when they come and go.
|
|||||||
|
|
||||||
|
|
||||||
Creating the User
|
Creating the User
|
||||||
|
^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
To use the message handler, you must first create a user using
|
To use the message handler, you must first create a user using
|
||||||
ipmi_create_user. The interface number specifies which SMI you want
|
ipmi_create_user. The interface number specifies which SMI you want
|
||||||
@@ -272,6 +273,7 @@ closing the device automatically destroys the user.
|
|||||||
|
|
||||||
|
|
||||||
Messaging
|
Messaging
|
||||||
|
^^^^^^^^^
|
||||||
|
|
||||||
To send a message from kernel-land, the ipmi_request_settime() call does
|
To send a message from kernel-land, the ipmi_request_settime() call does
|
||||||
pretty much all message handling. Most of the parameters are
|
pretty much all message handling. Most of the parameters are
|
||||||
@@ -321,6 +323,7 @@ though, since it is tricky to manage your own buffers.
|
|||||||
|
|
||||||
|
|
||||||
Events and Incoming Commands
|
Events and Incoming Commands
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
The driver takes care of polling for IPMI events and receiving
|
The driver takes care of polling for IPMI events and receiving
|
||||||
commands (commands are messages that are not responses, they are
|
commands (commands are messages that are not responses, they are
|
||||||
@@ -367,7 +370,7 @@ in the system. It discovers interfaces through a host of different
|
|||||||
methods, depending on the system.
|
methods, depending on the system.
|
||||||
|
|
||||||
You can specify up to four interfaces on the module load line and
|
You can specify up to four interfaces on the module load line and
|
||||||
control some module parameters:
|
control some module parameters::
|
||||||
|
|
||||||
modprobe ipmi_si.o type=<type1>,<type2>....
|
modprobe ipmi_si.o type=<type1>,<type2>....
|
||||||
ports=<port1>,<port2>... addrs=<addr1>,<addr2>...
|
ports=<port1>,<port2>... addrs=<addr1>,<addr2>...
|
||||||
@@ -437,7 +440,7 @@ default is one. Setting to 0 is useful with the hotmod, but is
|
|||||||
obviously only useful for modules.
|
obviously only useful for modules.
|
||||||
|
|
||||||
When compiled into the kernel, the parameters can be specified on the
|
When compiled into the kernel, the parameters can be specified on the
|
||||||
kernel command line as:
|
kernel command line as::
|
||||||
|
|
||||||
ipmi_si.type=<type1>,<type2>...
|
ipmi_si.type=<type1>,<type2>...
|
||||||
ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>...
|
ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>...
|
||||||
@@ -474,16 +477,22 @@ The driver supports a hot add and remove of interfaces. This way,
|
|||||||
interfaces can be added or removed after the kernel is up and running.
|
interfaces can be added or removed after the kernel is up and running.
|
||||||
This is done using /sys/module/ipmi_si/parameters/hotmod, which is a
|
This is done using /sys/module/ipmi_si/parameters/hotmod, which is a
|
||||||
write-only parameter. You write a string to this interface. The string
|
write-only parameter. You write a string to this interface. The string
|
||||||
has the format:
|
has the format::
|
||||||
|
|
||||||
<op1>[:op2[:op3...]]
|
<op1>[:op2[:op3...]]
|
||||||
The "op"s are:
|
|
||||||
|
The "op"s are::
|
||||||
|
|
||||||
add|remove,kcs|bt|smic,mem|i/o,<address>[,<opt1>[,<opt2>[,...]]]
|
add|remove,kcs|bt|smic,mem|i/o,<address>[,<opt1>[,<opt2>[,...]]]
|
||||||
You can specify more than one interface on the line. The "opt"s are:
|
|
||||||
|
You can specify more than one interface on the line. The "opt"s are::
|
||||||
|
|
||||||
rsp=<regspacing>
|
rsp=<regspacing>
|
||||||
rsi=<regsize>
|
rsi=<regsize>
|
||||||
rsh=<regshift>
|
rsh=<regshift>
|
||||||
irq=<irq>
|
irq=<irq>
|
||||||
ipmb=<ipmb slave addr>
|
ipmb=<ipmb slave addr>
|
||||||
|
|
||||||
and these have the same meanings as discussed above. Note that you
|
and these have the same meanings as discussed above. Note that you
|
||||||
can also use this on the kernel command line for a more compact format
|
can also use this on the kernel command line for a more compact format
|
||||||
for specifying an interface. Note that when removing an interface,
|
for specifying an interface. Note that when removing an interface,
|
||||||
@@ -496,7 +505,7 @@ The SMBus Driver (SSIF)
|
|||||||
The SMBus driver allows up to 4 SMBus devices to be configured in the
|
The SMBus driver allows up to 4 SMBus devices to be configured in the
|
||||||
system. By default, the driver will only register with something it
|
system. By default, the driver will only register with something it
|
||||||
finds in DMI or ACPI tables. You can change this
|
finds in DMI or ACPI tables. You can change this
|
||||||
at module load time (for a module) with:
|
at module load time (for a module) with::
|
||||||
|
|
||||||
modprobe ipmi_ssif.o
|
modprobe ipmi_ssif.o
|
||||||
addr=<i2caddr1>[,<i2caddr2>[,...]]
|
addr=<i2caddr1>[,<i2caddr2>[,...]]
|
||||||
@@ -535,7 +544,7 @@ the smb_addr parameter unless you have DMI or ACPI data to tell the
|
|||||||
driver what to use.
|
driver what to use.
|
||||||
|
|
||||||
When compiled into the kernel, the addresses can be specified on the
|
When compiled into the kernel, the addresses can be specified on the
|
||||||
kernel command line as:
|
kernel command line as::
|
||||||
|
|
||||||
ipmi_ssif.addr=<i2caddr1>[,<i2caddr2>[...]]
|
ipmi_ssif.addr=<i2caddr1>[,<i2caddr2>[...]]
|
||||||
ipmi_ssif.adapter=<adapter1>[,<adapter2>[...]]
|
ipmi_ssif.adapter=<adapter1>[,<adapter2>[...]]
|
||||||
@@ -565,7 +574,7 @@ Some users need more detailed information about a device, like where
|
|||||||
the address came from or the raw base device for the IPMI interface.
|
the address came from or the raw base device for the IPMI interface.
|
||||||
You can use the IPMI smi_watcher to catch the IPMI interfaces as they
|
You can use the IPMI smi_watcher to catch the IPMI interfaces as they
|
||||||
come or go, and to grab the information, you can use the function
|
come or go, and to grab the information, you can use the function
|
||||||
ipmi_get_smi_info(), which returns the following structure:
|
ipmi_get_smi_info(), which returns the following structure::
|
||||||
|
|
||||||
struct ipmi_smi_info {
|
struct ipmi_smi_info {
|
||||||
enum ipmi_addr_src addr_src;
|
enum ipmi_addr_src addr_src;
|
||||||
@@ -590,7 +599,7 @@ Watchdog
|
|||||||
|
|
||||||
A watchdog timer is provided that implements the Linux-standard
|
A watchdog timer is provided that implements the Linux-standard
|
||||||
watchdog timer interface. It has three module parameters that can be
|
watchdog timer interface. It has three module parameters that can be
|
||||||
used to control it:
|
used to control it::
|
||||||
|
|
||||||
modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type>
|
modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type>
|
||||||
preaction=<preaction type> preop=<preop type> start_now=x
|
preaction=<preaction type> preop=<preop type> start_now=x
|
||||||
@@ -635,7 +644,7 @@ watchdog device is closed. The default value of nowayout is true
|
|||||||
if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not.
|
if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not.
|
||||||
|
|
||||||
When compiled into the kernel, the kernel command line is available
|
When compiled into the kernel, the kernel command line is available
|
||||||
for configuring the watchdog:
|
for configuring the watchdog::
|
||||||
|
|
||||||
ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t>
|
ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t>
|
||||||
ipmi_watchdog.action=<action type>
|
ipmi_watchdog.action=<action type>
|
||||||
@@ -675,6 +684,7 @@ also get a bunch of OEM events holding the panic string.
|
|||||||
|
|
||||||
|
|
||||||
The field settings of the events are:
|
The field settings of the events are:
|
||||||
|
|
||||||
* Generator ID: 0x21 (kernel)
|
* Generator ID: 0x21 (kernel)
|
||||||
* EvM Rev: 0x03 (this event is formatted in IPMI 1.0 format)
|
* EvM Rev: 0x03 (this event is formatted in IPMI 1.0 format)
|
||||||
* Sensor Type: 0x20 (OS critical stop sensor)
|
* Sensor Type: 0x20 (OS critical stop sensor)
|
||||||
@@ -683,15 +693,17 @@ The field settings of the events are:
|
|||||||
* Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3)
|
* Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3)
|
||||||
* Event data 2: second byte of panic string
|
* Event data 2: second byte of panic string
|
||||||
* Event data 3: third byte of panic string
|
* Event data 3: third byte of panic string
|
||||||
|
|
||||||
See the IPMI spec for the details of the event layout. This event is
|
See the IPMI spec for the details of the event layout. This event is
|
||||||
always sent to the local management controller. It will handle routing
|
always sent to the local management controller. It will handle routing
|
||||||
the message to the right place.
|
the message to the right place.
|
||||||
|
|
||||||
Other OEM events have the following format:
|
Other OEM events have the following format:
|
||||||
Record ID (bytes 0-1): Set by the SEL.
|
|
||||||
Record type (byte 2): 0xf0 (OEM non-timestamped)
|
* Record ID (bytes 0-1): Set by the SEL.
|
||||||
byte 3: The slave address of the card saving the panic
|
* Record type (byte 2): 0xf0 (OEM non-timestamped)
|
||||||
byte 4: A sequence number (starting at zero)
|
* byte 3: The slave address of the card saving the panic
|
||||||
|
* byte 4: A sequence number (starting at zero)
|
||||||
The rest of the bytes (11 bytes) are the panic string. If the panic string
|
The rest of the bytes (11 bytes) are the panic string. If the panic string
|
||||||
is longer than 11 bytes, multiple messages will be sent with increasing
|
is longer than 11 bytes, multiple messages will be sent with increasing
|
||||||
sequence numbers.
|
sequence numbers.
|
||||||
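The chunking described above can be sketched in plain C. This is only an illustration of the record layout listed in the text, not the driver's code: the function name `build_panic_records`, the two macros, and the zero padding of a short final chunk are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OEM_RECORD_LEN  16  /* hypothetical name: bytes 0-15 of one record */
#define OEM_PAYLOAD_LEN 11  /* hypothetical name: panic-string bytes per record */

/*
 * Split panic_str into 16-byte OEM records and store them in records[].
 * Returns the number of records written (at most max_records).
 * Bytes 0-1 (record ID) are left as zero because the SEL sets them.
 * Assumption: a final chunk shorter than 11 bytes is padded with zeros.
 */
size_t build_panic_records(const char *panic_str, uint8_t slave_addr,
                           uint8_t records[][OEM_RECORD_LEN],
                           size_t max_records)
{
    size_t len = strlen(panic_str);
    size_t off = 0;
    size_t count = 0;

    while (off < len && count < max_records) {
        size_t chunk = len - off;
        uint8_t *rec = records[count];

        if (chunk > OEM_PAYLOAD_LEN)
            chunk = OEM_PAYLOAD_LEN;

        memset(rec, 0, OEM_RECORD_LEN);
        rec[2] = 0xf0;            /* record type: OEM non-timestamped */
        rec[3] = slave_addr;      /* slave address of the card saving the panic */
        rec[4] = (uint8_t)count;  /* sequence number, starting at zero */
        memcpy(&rec[5], panic_str + off, chunk);

        off += chunk;
        count++;
    }
    return count;
}
```

With a 26-byte panic string, the sketch produces three records carrying 11, 11 and 4 bytes of the string, with sequence numbers 0, 1 and 2.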
|
|||||||
@@ -1,8 +1,11 @@
|
|||||||
ChangeLog:
|
================
|
||||||
Started by Ingo Molnar <mingo@redhat.com>
|
|
||||||
Update by Max Krasnyansky <maxk@qualcomm.com>
|
|
||||||
|
|
||||||
SMP IRQ affinity
|
SMP IRQ affinity
|
||||||
|
================
|
||||||
|
|
||||||
|
ChangeLog:
|
||||||
|
- Started by Ingo Molnar <mingo@redhat.com>
|
||||||
|
- Update by Max Krasnyansky <maxk@qualcomm.com>
|
||||||
|
|
||||||
|
|
||||||
/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify
|
/proc/irq/IRQ#/smp_affinity and /proc/irq/IRQ#/smp_affinity_list specify
|
||||||
which target CPUs are permitted for a given IRQ source. It's a bitmask
|
which target CPUs are permitted for a given IRQ source. It's a bitmask
|
||||||
@@ -16,7 +19,7 @@ will be set to the default mask. It can then be changed as described above.
|
|||||||
Default mask is 0xffffffff.
|
Default mask is 0xffffffff.
|
||||||
|
|
||||||
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
Here is an example of restricting IRQ44 (eth1) to CPU0-3 then restricting
|
||||||
it to CPU4-7 (this is an 8-CPU SMP box):
|
it to CPU4-7 (this is an 8-CPU SMP box)::
|
||||||
|
|
||||||
[root@moon 44]# cd /proc/irq/44
|
[root@moon 44]# cd /proc/irq/44
|
||||||
[root@moon 44]# cat smp_affinity
|
[root@moon 44]# cat smp_affinity
|
||||||
@@ -39,6 +42,8 @@ As can be seen from the line above IRQ44 was delivered only to the first four
|
|||||||
processors (0-3).
|
processors (0-3).
|
||||||
Now let's restrict that IRQ to CPU(4-7).
|
Now let's restrict that IRQ to CPU(4-7).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
[root@moon 44]# echo f0 > smp_affinity
|
[root@moon 44]# echo f0 > smp_affinity
|
||||||
[root@moon 44]# cat smp_affinity
|
[root@moon 44]# cat smp_affinity
|
||||||
000000f0
|
000000f0
|
||||||
@@ -55,7 +60,7 @@ round-trip min/avg/max = 0.1/0.5/585.4 ms
|
|||||||
This time around IRQ44 was delivered only to the last four processors.
|
This time around IRQ44 was delivered only to the last four processors.
|
||||||
i.e. counters for CPU0-3 did not change.
|
i.e. counters for CPU0-3 did not change.
|
||||||
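The hexadecimal values written to smp_affinity carry one bit per CPU, so the mask for a contiguous CPU range can be computed rather than worked out by hand. The helper below is a minimal userspace sketch; the name `cpu_range_to_mask` is hypothetical, and it only covers CPUs 0-31, which matches the 8-digit form printed by ``cat smp_affinity`` in the example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Write the 8-digit hex mask for CPUs first..last (inclusive) into buf.
 * Returns 0 on success, -1 if the range is invalid, goes beyond CPU 31,
 * or buf is too small for 8 digits plus the terminating NUL.
 */
int cpu_range_to_mask(unsigned int first, unsigned int last,
                      char *buf, size_t buflen)
{
    uint32_t mask = 0;
    unsigned int cpu;

    if (first > last || last > 31 || buflen < 9)
        return -1;

    for (cpu = first; cpu <= last; cpu++)
        mask |= (uint32_t)1 << cpu;   /* bit N selects CPU N */

    snprintf(buf, buflen, "%08x", (unsigned int)mask);
    return 0;
}
```

Calling it with the range 4-7 yields the string ``000000f0``, the same value shown by ``cat smp_affinity`` above. For ranges beyond what a single 32-bit word can express, smp_affinity_list accepts the CPU numbers directly.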
|
|
||||||
Here is an example of limiting that same irq (44) to cpus 1024 to 1031:
|
Here is an example of limiting that same irq (44) to cpus 1024 to 1031::
|
||||||
|
|
||||||
[root@moon 44]# echo 1024-1031 > smp_affinity_list
|
[root@moon 44]# echo 1024-1031 > smp_affinity_list
|
||||||
[root@moon 44]# cat smp_affinity_list
|
[root@moon 44]# cat smp_affinity_list
|
||||||
|
|||||||
@@ -1,4 +1,6 @@
|
|||||||
irq_domain interrupt number mapping library
|
===============================================
|
||||||
|
The irq_domain interrupt number mapping library
|
||||||
|
===============================================
|
||||||
|
|
||||||
The current design of the Linux kernel uses a single large number
|
The current design of the Linux kernel uses a single large number
|
||||||
space where each separate IRQ source is assigned a different number.
|
space where each separate IRQ source is assigned a different number.
|
||||||
@@ -36,7 +38,9 @@ irq_domain also implements translation from an abstract irq_fwspec
|
|||||||
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
|
structure to hwirq numbers (Device Tree and ACPI GSI so far), and can
|
||||||
be easily extended to support other IRQ topology data sources.
|
be easily extended to support other IRQ topology data sources.
|
||||||
|
|
||||||
=== irq_domain usage ===
|
irq_domain usage
|
||||||
|
================
|
||||||
|
|
||||||
An interrupt controller driver creates and registers an irq_domain by
|
An interrupt controller driver creates and registers an irq_domain by
|
||||||
calling one of the irq_domain_add_*() functions (each mapping method
|
calling one of the irq_domain_add_*() functions (each mapping method
|
||||||
has a different allocator function, more on that later). The function
|
has a different allocator function, more on that later). The function
|
||||||
@@ -62,13 +66,19 @@ If the driver has the Linux IRQ number or the irq_data pointer, and
|
|||||||
needs to know the associated hwirq number (such as in the irq_chip
|
needs to know the associated hwirq number (such as in the irq_chip
|
||||||
callbacks) then it can be directly obtained from irq_data->hwirq.
|
callbacks) then it can be directly obtained from irq_data->hwirq.
|
||||||
|
|
||||||
=== Types of irq_domain mappings ===
|
Types of irq_domain mappings
|
||||||
|
============================
|
||||||
|
|
||||||
There are several mechanisms available for reverse mapping from hwirq
|
There are several mechanisms available for reverse mapping from hwirq
|
||||||
to Linux irq, and each mechanism uses a different allocation function.
|
to Linux irq, and each mechanism uses a different allocation function.
|
||||||
Which reverse map type should be used depends on the use case. Each
|
Which reverse map type should be used depends on the use case. Each
|
||||||
of the reverse map types is described below:
|
of the reverse map types is described below:
|
||||||
|
|
||||||
==== Linear ====
|
Linear
|
||||||
|
------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
irq_domain_add_linear()
|
irq_domain_add_linear()
|
||||||
irq_domain_create_linear()
|
irq_domain_create_linear()
|
||||||
|
|
||||||
@@ -89,7 +99,11 @@ accepts a more general abstraction 'struct fwnode_handle'.
|
|||||||
|
|
||||||
The majority of drivers should use the linear map.
|
The majority of drivers should use the linear map.
|
||||||
|
|
||||||
==== Tree ====
|
Tree
|
||||||
|
----
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
irq_domain_add_tree()
|
irq_domain_add_tree()
|
||||||
irq_domain_create_tree()
|
irq_domain_create_tree()
|
||||||
|
|
||||||
@@ -109,7 +123,11 @@ accepts a more general abstraction 'struct fwnode_handle'.
|
|||||||
|
|
||||||
Very few drivers should need this mapping.
|
Very few drivers should need this mapping.
|
||||||
|
|
||||||
==== No Map ===-
|
No Map
|
||||||
|
------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
irq_domain_add_nomap()
|
irq_domain_add_nomap()
|
||||||
|
|
||||||
The No Map mapping is to be used when the hwirq number is
|
The No Map mapping is to be used when the hwirq number is
|
||||||
@@ -121,7 +139,11 @@ Linux IRQ number into the hardware.
|
|||||||
|
|
||||||
Most drivers cannot use this mapping.
|
Most drivers cannot use this mapping.
|
||||||
|
|
||||||
==== Legacy ====
|
Legacy
|
||||||
|
------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
irq_domain_add_simple()
|
irq_domain_add_simple()
|
||||||
irq_domain_add_legacy()
|
irq_domain_add_legacy()
|
||||||
irq_domain_add_legacy_isa()
|
irq_domain_add_legacy_isa()
|
||||||
@@ -163,14 +185,17 @@ that the driver using the simple domain call irq_create_mapping()
|
|||||||
before any irq_find_mapping() since the latter will actually work
|
before any irq_find_mapping() since the latter will actually work
|
||||||
for the static IRQ assignment case.
|
for the static IRQ assignment case.
|
||||||
|
|
||||||
==== Hierarchy IRQ domain ====
|
Hierarchy IRQ domain
|
||||||
|
--------------------
|
||||||
|
|
||||||
On some architectures, there may be multiple interrupt controllers
|
On some architectures, there may be multiple interrupt controllers
|
||||||
involved in delivering an interrupt from the device to the target CPU.
|
involved in delivering an interrupt from the device to the target CPU.
|
||||||
Let's look at a typical interrupt delivering path on x86 platforms:
|
Let's look at a typical interrupt delivering path on x86 platforms::
|
||||||
|
|
||||||
Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
|
Device --> IOAPIC -> Interrupt remapping Controller -> Local APIC -> CPU
|
||||||
|
|
||||||
There are three interrupt controllers involved:
|
There are three interrupt controllers involved:
|
||||||
|
|
||||||
1) IOAPIC controller
|
1) IOAPIC controller
|
||||||
2) Interrupt remapping controller
|
2) Interrupt remapping controller
|
||||||
3) Local APIC controller
|
3) Local APIC controller
|
||||||
@@ -180,7 +205,8 @@ hardware architecture, an irq_domain data structure is built for each
|
|||||||
interrupt controller and those irq_domains are organized into hierarchy.
|
interrupt controller and those irq_domains are organized into hierarchy.
|
||||||
When building irq_domain hierarchy, the irq_domain near to the device is
|
When building irq_domain hierarchy, the irq_domain near to the device is
|
||||||
child and the irq_domain near to CPU is parent. So a hierarchy structure
|
child and the irq_domain near to CPU is parent. So a hierarchy structure
|
||||||
as below will be built for the example above.
|
as below will be built for the example above::
|
||||||
|
|
||||||
CPU Vector irq_domain (root irq_domain to manage CPU vectors)
|
CPU Vector irq_domain (root irq_domain to manage CPU vectors)
|
||||||
^
|
^
|
||||||
|
|
|
|
||||||
@@ -190,6 +216,7 @@ as below will be built for the example above.
|
|||||||
IOAPIC irq_domain (manage IOAPIC delivery entries/pins)
|
IOAPIC irq_domain (manage IOAPIC delivery entries/pins)
|
||||||
|
|
||||||
There are four major interfaces to use hierarchy irq_domain:
|
There are four major interfaces to use hierarchy irq_domain:
|
||||||
|
|
||||||
1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
|
1) irq_domain_alloc_irqs(): allocate IRQ descriptors and interrupt
|
||||||
controller related resources to deliver these interrupts.
|
controller related resources to deliver these interrupts.
|
||||||
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
|
2) irq_domain_free_irqs(): free IRQ descriptors and interrupt controller
|
||||||
@@ -199,7 +226,8 @@ There are four major interfaces to use hierarchy irq_domain:
|
|||||||
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
|
4) irq_domain_deactivate_irq(): deactivate interrupt controller hardware
|
||||||
to stop delivering the interrupt.
|
to stop delivering the interrupt.
|
||||||
|
|
||||||
Following changes are needed to support hierarchy irq_domain.
|
Following changes are needed to support hierarchy irq_domain:
|
||||||
|
|
||||||
1) a new field 'parent' is added to struct irq_domain; it's used to
|
1) a new field 'parent' is added to struct irq_domain; it's used to
|
||||||
maintain irq_domain hierarchy information.
|
maintain irq_domain hierarchy information.
|
||||||
2) a new field 'parent_data' is added to struct irq_data; it's used to
|
2) a new field 'parent_data' is added to struct irq_data; it's used to
|
||||||
@@ -223,6 +251,7 @@ software architecture.
|
|||||||
|
|
||||||
For an interrupt controller driver to support hierarchy irq_domain, it
|
For an interrupt controller driver to support hierarchy irq_domain, it
|
||||||
needs to:
|
needs to:
|
||||||
|
|
||||||
1) Implement irq_domain_ops.alloc and irq_domain_ops.free
|
1) Implement irq_domain_ops.alloc and irq_domain_ops.free
|
||||||
2) Optionally implement irq_domain_ops.activate and
|
2) Optionally implement irq_domain_ops.activate and
|
||||||
irq_domain_ops.deactivate.
|
irq_domain_ops.deactivate.
|
||||||
|
|||||||
@@ -1,4 +1,6 @@
|
|||||||
|
===============
|
||||||
What is an IRQ?
|
What is an IRQ?
|
||||||
|
===============
|
||||||
|
|
||||||
An IRQ is an interrupt request from a device.
|
An IRQ is an interrupt request from a device.
|
||||||
Currently they can come in over a pin, or over a packet.
|
Currently they can come in over a pin, or over a packet.
|
||||||
|
|||||||
@@ -1,3 +1,4 @@
|
|||||||
|
===================
|
||||||
Linux IOMMU Support
|
Linux IOMMU Support
|
||||||
===================
|
===================
|
||||||
|
|
||||||
@@ -9,11 +10,11 @@ This guide gives a quick cheat sheet for some basic understanding.
|
|||||||
|
|
||||||
Some Keywords
|
Some Keywords
|
||||||
|
|
||||||
DMAR - DMA remapping
|
- DMAR - DMA remapping
|
||||||
DRHD - DMA Remapping Hardware Unit Definition
|
- DRHD - DMA Remapping Hardware Unit Definition
|
||||||
RMRR - Reserved memory Region Reporting Structure
|
- RMRR - Reserved memory Region Reporting Structure
|
||||||
ZLR - Zero length reads from PCI devices
|
- ZLR - Zero length reads from PCI devices
|
||||||
IOVA - IO Virtual address.
|
- IOVA - IO Virtual address.
|
||||||
|
|
||||||
Basic stuff
|
Basic stuff
|
||||||
-----------
|
-----------
|
||||||
@@ -33,7 +34,7 @@ devices that need to access these regions. OS is expected to setup
|
|||||||
unity mappings for these regions for these devices to access these regions.
|
unity mappings for these regions for these devices to access these regions.
|
||||||
|
|
||||||
How is IOVA generated?
|
How is IOVA generated?
|
||||||
---------------------
|
----------------------
|
||||||
|
|
||||||
Well-behaved drivers call pci_map_*() before sending a command to a device
|
Well-behaved drivers call pci_map_*() before sending a command to a device
|
||||||
that needs to perform DMA. Once DMA is completed and mapping is no longer
|
that needs to perform DMA. Once DMA is completed and mapping is no longer
|
||||||
@@ -82,7 +83,7 @@ in ACPI.
|
|||||||
ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
|
ACPI: DMAR (v001 A M I OEMDMAR 0x00000001 MSFT 0x00000097) @ 0x000000007f5b5ef0
|
||||||
|
|
||||||
When DMAR is being processed and initialized by ACPI, prints DMAR locations
|
When DMAR is being processed and initialized by ACPI, prints DMAR locations
|
||||||
and any RMRR's processed.
|
and any RMRR's processed::
|
||||||
|
|
||||||
ACPI DMAR:Host address width 36
|
ACPI DMAR:Host address width 36
|
||||||
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
|
ACPI DMAR:DRHD (flags: 0x00000000)base: 0x00000000fed90000
|
||||||
@@ -98,6 +99,8 @@ PCI-DMA: Using DMAR IOMMU
|
|||||||
Fault reporting
|
Fault reporting
|
||||||
---------------
|
---------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
|
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
|
||||||
DMAR:[fault reason 05] PTE Write access is not set
|
DMAR:[fault reason 05] PTE Write access is not set
|
||||||
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
|
DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000
|
||||||
|
|||||||
@@ -1,5 +1,9 @@
|
|||||||
Linux 2.4.2 Secure Attention Key (SAK) handling
|
=========================================
|
||||||
18 March 2001, Andrew Morton
|
Linux Secure Attention Key (SAK) handling
|
||||||
|
=========================================
|
||||||
|
|
||||||
|
:Date: 18 March 2001
|
||||||
|
:Author: Andrew Morton
|
||||||
|
|
||||||
An operating system's Secure Attention Key is a security tool which is
|
An operating system's Secure Attention Key is a security tool which is
|
||||||
provided as protection against trojan password capturing programs. It
|
provided as protection against trojan password capturing programs. It
|
||||||
@@ -13,7 +17,7 @@ this sequence. It is only available if the kernel was compiled with
|
|||||||
sysrq support.
|
sysrq support.
|
||||||
|
|
||||||
The proper way of generating a SAK is to define the key sequence using
|
The proper way of generating a SAK is to define the key sequence using
|
||||||
`loadkeys'. This will work whether or not sysrq support is compiled
|
``loadkeys``. This will work whether or not sysrq support is compiled
|
||||||
into the kernel.
|
into the kernel.
|
||||||
|
|
||||||
SAK works correctly when the keyboard is in raw mode. This means that
|
SAK works correctly when the keyboard is in raw mode. This means that
|
||||||
@@ -25,22 +29,21 @@ What key sequence should you use? Well, CTRL-ALT-DEL is used to reboot
|
|||||||
the machine. CTRL-ALT-BACKSPACE is magical to the X server. We'll
|
the machine. CTRL-ALT-BACKSPACE is magical to the X server. We'll
|
||||||
choose CTRL-ALT-PAUSE.
|
choose CTRL-ALT-PAUSE.
|
||||||
|
|
||||||
In your rc.sysinit (or rc.local) file, add the command
|
In your rc.sysinit (or rc.local) file, add the command::
|
||||||
|
|
||||||
echo "control alt keycode 101 = SAK" | /bin/loadkeys
|
echo "control alt keycode 101 = SAK" | /bin/loadkeys
|
||||||
|
|
||||||
And that's it! Only the superuser may reprogram the SAK key.
|
And that's it! Only the superuser may reprogram the SAK key.
|
||||||
|
|
||||||
|
|
||||||
NOTES
|
.. note::
|
||||||
=====
|
|
||||||
|
|
||||||
1: Linux SAK is said to be not a "true SAK" as is required by
|
1. Linux SAK is said to be not a "true SAK" as is required by
|
||||||
systems which implement C2 level security. This author does not
|
systems which implement C2 level security. This author does not
|
||||||
know why.
|
know why.
|
||||||
|
|
||||||
|
|
||||||
2: On the PC keyboard, SAK kills all applications which have
|
2. On the PC keyboard, SAK kills all applications which have
|
||||||
/dev/console opened.
|
/dev/console opened.
|
||||||
|
|
||||||
Unfortunately this includes a number of things which you don't
|
Unfortunately this includes a number of things which you don't
|
||||||
@@ -49,38 +52,38 @@ NOTES
|
|||||||
Linux distributor about this!
|
Linux distributor about this!
|
||||||
|
|
||||||
You can identify processes which will be killed by SAK with the
|
You can identify processes which will be killed by SAK with the
|
||||||
command
|
command::
|
||||||
|
|
||||||
# ls -l /proc/[0-9]*/fd/* | grep console
|
# ls -l /proc/[0-9]*/fd/* | grep console
|
||||||
l-wx------ 1 root root 64 Mar 18 00:46 /proc/579/fd/0 -> /dev/console
|
l-wx------ 1 root root 64 Mar 18 00:46 /proc/579/fd/0 -> /dev/console
|
||||||
|
|
||||||
Then:
|
Then::
|
||||||
|
|
||||||
# ps aux|grep 579
|
# ps aux|grep 579
|
||||||
root 579 0.0 0.1 1088 436 ? S 00:43 0:00 gpm -t ps/2
|
root 579 0.0 0.1 1088 436 ? S 00:43 0:00 gpm -t ps/2
|
||||||
|
|
||||||
So `gpm' will be killed by SAK. This is a bug in gpm. It should
|
So ``gpm`` will be killed by SAK. This is a bug in gpm. It should
|
||||||
be closing standard input. You can work around this by finding the
|
be closing standard input. You can work around this by finding the
|
||||||
initscript which launches gpm and changing it thusly:
|
initscript which launches gpm and changing it thusly:
|
||||||
|
|
||||||
Old:
|
Old::
|
||||||
|
|
||||||
daemon gpm
|
daemon gpm
|
||||||
|
|
||||||
New:
|
New::
|
||||||
|
|
||||||
daemon gpm < /dev/null
|
daemon gpm < /dev/null
|
||||||
|
|
||||||
Vixie cron also seems to have this problem, and needs the same treatment.
|
Vixie cron also seems to have this problem, and needs the same treatment.
|
||||||
|
|
||||||
Also, one prominent Linux distribution has the following three
|
Also, one prominent Linux distribution has the following three
|
||||||
lines in its rc.sysinit and rc scripts:
|
lines in its rc.sysinit and rc scripts::
|
||||||
|
|
||||||
exec 3<&0
|
exec 3<&0
|
||||||
exec 4>&1
|
exec 4>&1
|
||||||
exec 5>&2
|
exec 5>&2
|
||||||
|
|
||||||
These commands cause *all* daemons which are launched by the
|
These commands cause **all** daemons which are launched by the
|
||||||
initscripts to have file descriptors 3, 4 and 5 attached to
|
initscripts to have file descriptors 3, 4 and 5 attached to
|
||||||
/dev/console. So SAK kills them all. A workaround is to simply
|
/dev/console. So SAK kills them all. A workaround is to simply
|
||||||
delete these lines, but this may cause system management
|
delete these lines, but this may cause system management
|
||||||
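The `ls -l /proc/[0-9]*/fd/* | grep console` check quoted in the SAK notes above can also be sketched in Python. This is only an illustrative sketch, not part of the patch: the function name `console_holders` and its `proc_root` parameter are assumptions introduced here so that the lookup can be exercised against a fake directory tree, and it matches descriptors that resolve exactly to `/dev/console` rather than grepping for the word "console".

```python
import os


def console_holders(proc_root="/proc", target="/dev/console"):
    """Return sorted (pid, fd) pairs whose descriptor is a symlink to target.

    Hypothetical helper: mirrors `ls -l /proc/[0-9]*/fd/*` by walking the
    numeric directories under proc_root and reading each fd symlink.
    """
    holders = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "net" or "self"
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to look
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor closed between listdir and readlink
            if link == target:
                holders.append((int(entry), int(fd)))
    return sorted(holders)


if __name__ == "__main__":
    for pid, fd in console_holders():
        print(f"pid {pid} holds /dev/console on fd {fd}")
```

Run as root to see descriptors of other users' processes; an unprivileged run only reports processes whose fd directories are readable.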
|
|||||||
@@ -1,7 +1,10 @@
|
|||||||
|
.. include:: <isonum.txt>
|
||||||
|
|
||||||
|
============
|
||||||
SM501 Driver
|
SM501 Driver
|
||||||
============
|
============
|
||||||
|
|
||||||
Copyright 2006, 2007 Simtec Electronics
|
:Copyright: |copy| 2006, 2007 Simtec Electronics
|
||||||
|
|
||||||
The Silicon Motion SM501 multimedia companion chip is a multifunction device
|
The Silicon Motion SM501 multimedia companion chip is a multifunction device
|
||||||
which may provide numerous interfaces including USB host controller USB gadget,
|
which may provide numerous interfaces including USB host controller USB gadget,
|
||||||
|
|||||||
@@ -1,10 +1,15 @@
|
|||||||
|
============================
|
||||||
|
A block layer cache (bcache)
|
||||||
|
============================
|
||||||
|
|
||||||
Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be
|
Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be
|
||||||
nice if you could use them as cache... Hence bcache.
|
nice if you could use them as cache... Hence bcache.
|
||||||
|
|
||||||
Wiki and git repositories are at:
|
Wiki and git repositories are at:
|
||||||
http://bcache.evilpiepirate.org
|
|
||||||
http://evilpiepirate.org/git/linux-bcache.git
|
- http://bcache.evilpiepirate.org
|
||||||
http://evilpiepirate.org/git/bcache-tools.git
|
- http://evilpiepirate.org/git/linux-bcache.git
|
||||||
|
- http://evilpiepirate.org/git/bcache-tools.git
|
||||||
|
|
||||||
It's designed around the performance characteristics of SSDs - it only allocates
|
It's designed around the performance characteristics of SSDs - it only allocates
|
||||||
in erase block sized buckets, and it uses a hybrid btree/log to track cached
|
in erase block sized buckets, and it uses a hybrid btree/log to track cached
|
||||||
@@ -37,17 +42,19 @@ to be flushed.
|
|||||||
|
|
||||||
Getting started:
|
Getting started:
|
||||||
You'll need make-bcache from the bcache-tools repository. Both the cache device
|
You'll need make-bcache from the bcache-tools repository. Both the cache device
|
||||||
and backing device must be formatted before use.
|
and backing device must be formatted before use::
|
||||||
|
|
||||||
make-bcache -B /dev/sdb
|
make-bcache -B /dev/sdb
|
||||||
make-bcache -C /dev/sdc
|
make-bcache -C /dev/sdc
|
||||||
|
|
||||||
make-bcache has the ability to format multiple devices at the same time - if
|
make-bcache has the ability to format multiple devices at the same time - if
|
||||||
you format your backing devices and cache device at the same time, you won't
|
you format your backing devices and cache device at the same time, you won't
|
||||||
have to manually attach:
|
have to manually attach::
|
||||||
|
|
||||||
make-bcache -B /dev/sda /dev/sdb -C /dev/sdc
|
make-bcache -B /dev/sda /dev/sdb -C /dev/sdc
|
||||||
|
|
||||||
bcache-tools now ships udev rules, and bcache devices are known to the kernel
|
bcache-tools now ships udev rules, and bcache devices are known to the kernel
|
||||||
immediately. Without udev, you can manually register devices like this:
|
immediately. Without udev, you can manually register devices like this::
|
||||||
|
|
||||||
echo /dev/sdb > /sys/fs/bcache/register
|
echo /dev/sdb > /sys/fs/bcache/register
|
||||||
echo /dev/sdc > /sys/fs/bcache/register
|
echo /dev/sdc > /sys/fs/bcache/register
|
||||||
@@ -60,16 +67,16 @@ slow devices as bcache backing devices without a cache, and you can choose to ad
|
|||||||
a caching device later.
|
a caching device later.
|
||||||
See 'ATTACHING' section below.
|
See 'ATTACHING' section below.
|
||||||
|
|
||||||
The devices show up as:
|
The devices show up as::
|
||||||
|
|
||||||
/dev/bcache<N>
|
/dev/bcache<N>
|
||||||
|
|
||||||
As well as (with udev):
|
As well as (with udev)::
|
||||||
|
|
||||||
/dev/bcache/by-uuid/<uuid>
|
/dev/bcache/by-uuid/<uuid>
|
||||||
/dev/bcache/by-label/<label>
|
/dev/bcache/by-label/<label>
|
||||||
|
|
||||||
To get started:
|
To get started::
|
||||||
|
|
||||||
mkfs.ext4 /dev/bcache0
|
mkfs.ext4 /dev/bcache0
|
||||||
mount /dev/bcache0 /mnt
|
mount /dev/bcache0 /mnt
|
||||||
@@ -81,13 +88,13 @@ Cache devices are managed as sets; multiple caches per set isn't supported yet
|
|||||||
but will allow for mirroring of metadata and dirty data in the future. Your new
|
but will allow for mirroring of metadata and dirty data in the future. Your new
|
||||||
cache set shows up as /sys/fs/bcache/<UUID>
|
cache set shows up as /sys/fs/bcache/<UUID>
|
||||||
|
|
||||||
ATTACHING
|
Attaching
|
||||||
---------
|
---------
|
||||||
|
|
||||||
After your cache device and backing device are registered, the backing device
|
After your cache device and backing device are registered, the backing device
|
||||||
must be attached to your cache set to enable caching. Attaching a backing
|
must be attached to your cache set to enable caching. Attaching a backing
|
||||||
device to a cache set is done thusly, with the UUID of the cache set in
|
device to a cache set is done thusly, with the UUID of the cache set in
|
||||||
/sys/fs/bcache:
|
/sys/fs/bcache::
|
||||||
|
|
||||||
echo <CSET-UUID> > /sys/block/bcache0/bcache/attach
|
echo <CSET-UUID> > /sys/block/bcache0/bcache/attach
|
||||||
|
|
||||||
@@ -97,7 +104,7 @@ your bcache devices. If a backing device has data in a cache somewhere, the
|
|||||||
important if you have writeback caching turned on.
|
important if you have writeback caching turned on.
|
||||||
|
|
||||||
If you're booting up and your cache device is gone and never coming back, you
|
If you're booting up and your cache device is gone and never coming back, you
|
||||||
can force run the backing device:
|
can force run the backing device::
|
||||||
|
|
||||||
echo 1 > /sys/block/sdb/bcache/running
|
echo 1 > /sys/block/sdb/bcache/running
|
||||||
|
|
||||||
@@ -110,7 +117,7 @@ but all the cached data will be invalidated. If there was dirty data in the
|
|||||||
cache, don't expect the filesystem to be recoverable - you will have massive
|
cache, don't expect the filesystem to be recoverable - you will have massive
|
||||||
filesystem corruption, though ext4's fsck does work miracles.
|
filesystem corruption, though ext4's fsck does work miracles.
|
||||||
|
|
||||||
ERROR HANDLING
|
Error Handling
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
Bcache tries to transparently handle IO errors to/from the cache device without
|
Bcache tries to transparently handle IO errors to/from the cache device without
|
||||||
@@ -134,25 +141,27 @@ the backing devices to passthrough mode.
|
|||||||
read some of the dirty data, though.
|
read some of the dirty data, though.
|
||||||
|
|
||||||
|
|
||||||
HOWTO/COOKBOOK
|
Howto/cookbook
|
||||||
--------------
|
--------------
|
||||||
|
|
||||||
A) Starting a bcache with a missing caching device
|
A) Starting a bcache with a missing caching device
|
||||||
|
|
||||||
If registering the backing device doesn't help, it's already there, you just need
|
If registering the backing device doesn't help, it's already there, you just need
|
||||||
to force it to run without the cache:
|
to force it to run without the cache::
|
||||||
|
|
||||||
host:~# echo /dev/sdb1 > /sys/fs/bcache/register
|
host:~# echo /dev/sdb1 > /sys/fs/bcache/register
|
||||||
[ 119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered
|
[ 119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered
|
||||||
|
|
||||||
Next, you try to register your caching device if it's present. However
|
Next, you try to register your caching device if it's present. However
|
||||||
if it's absent, or registration fails for some reason, you can still
|
if it's absent, or registration fails for some reason, you can still
|
||||||
start your bcache without its cache, like so:
|
start your bcache without its cache, like so::
|
||||||
|
|
||||||
host:/sys/block/sdb/sdb1/bcache# echo 1 > running
|
host:/sys/block/sdb/sdb1/bcache# echo 1 > running
|
||||||
|
|
||||||
Note that this may cause data loss if you were running in writeback mode.
|
Note that this may cause data loss if you were running in writeback mode.
|
||||||
|
|
||||||
|
|
||||||
B) Bcache does not find its cache
|
B) Bcache does not find its cache::
|
||||||
|
|
||||||
host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach
|
host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach
|
||||||
[ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set
|
[ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set
|
||||||
@@ -160,7 +169,8 @@ B) Bcache does not find its cache
|
|||||||
[ 1933.478179] : cache set not found
|
[ 1933.478179] : cache set not found
|
||||||
|
|
||||||
In this case, the caching device was simply not registered at boot
|
In this case, the caching device was simply not registered at boot
|
||||||
or disappeared and came back, and needs to be (re-)registered:
|
or disappeared and came back, and needs to be (re-)registered::
|
||||||
|
|
||||||
host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register
|
host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register
|
||||||
|
|
||||||
|
|
||||||
@@ -180,7 +190,8 @@ device is still available at an 8KiB offset. So either via a loopdev
|
|||||||
of the backing device created with --offset 8K, or any value defined by
|
of the backing device created with --offset 8K, or any value defined by
|
||||||
--data-offset when you originally formatted bcache with `make-bcache`.
|
--data-offset when you originally formatted bcache with `make-bcache`.
|
||||||
|
|
||||||
For example:
|
For example::
|
||||||
|
|
||||||
losetup -o 8192 /dev/loop0 /dev/your_bcache_backing_dev
|
losetup -o 8192 /dev/loop0 /dev/your_bcache_backing_dev
|
||||||
|
|
||||||
This should present your unmodified backing device data in /dev/loop0
|
This should present your unmodified backing device data in /dev/loop0
|
||||||
@@ -191,11 +202,14 @@ cache device without loosing data.
|
|||||||
|
|
||||||
E) Wiping a cache device
|
E) Wiping a cache device
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
host:~# wipefs -a /dev/sdh2
|
host:~# wipefs -a /dev/sdh2
|
||||||
16 bytes were erased at offset 0x1018 (bcache)
|
16 bytes were erased at offset 0x1018 (bcache)
|
||||||
they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
|
they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
|
||||||
|
|
||||||
After you boot back with bcache enabled, you recreate the cache and attach it:
|
After you boot back with bcache enabled, you recreate the cache and attach it::
|
||||||
|
|
||||||
host:~# make-bcache -C /dev/sdh2
|
host:~# make-bcache -C /dev/sdh2
|
||||||
UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045
|
UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045
|
||||||
Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1
|
Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1
|
||||||
@@ -209,15 +223,17 @@ first_bucket: 1
|
|||||||
[ 650.511912] bcache: run_cache_set() invalidating existing data
|
[ 650.511912] bcache: run_cache_set() invalidating existing data
|
||||||
[ 650.549228] bcache: register_cache() registered cache device sdh2
|
[ 650.549228] bcache: register_cache() registered cache device sdh2
|
||||||
|
|
||||||
start backing device with missing cache:
|
start backing device with missing cache::
|
||||||
|
|
||||||
host:/sys/block/md5/bcache# echo 1 > running
|
host:/sys/block/md5/bcache# echo 1 > running
|
||||||
|
|
||||||
attach new cache:
|
attach new cache::
|
||||||
|
|
||||||
host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach
|
host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach
|
||||||
[ 865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1
|
[ 865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1
|
||||||
|
|
||||||
|
|
||||||
F) Remove or replace a caching device
|
F) Remove or replace a caching device::
|
||||||
|
|
||||||
host:/sys/block/sda/sda7/bcache# echo 1 > detach
|
host:/sys/block/sda/sda7/bcache# echo 1 > detach
|
||||||
[ 695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7
|
[ 695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7
|
||||||
@@ -226,13 +242,15 @@ F) Remove or replace a caching device
|
|||||||
wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy
|
wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy
|
||||||
Ooops, it's disabled, but not unregistered, so it's still protected
|
Ooops, it's disabled, but not unregistered, so it's still protected
|
||||||
|
|
||||||
We need to go and unregister it:
|
We need to go and unregister it::
|
||||||
|
|
||||||
host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0
|
host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0
|
||||||
lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/
|
lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/
|
||||||
host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop
|
host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop
|
||||||
kernel: [ 917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered
|
kernel: [ 917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered
|
||||||
|
|
||||||
Now we can wipe it:
|
Now we can wipe it::
|
||||||
|
|
||||||
host:~# wipefs -a /dev/nvme0n1p4
|
host:~# wipefs -a /dev/nvme0n1p4
|
||||||
/dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
|
/dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81
|
||||||
|
|
||||||
@@ -252,15 +270,18 @@ if there are any active backing or caching devices left on it:
|
|||||||
|
|
||||||
1) Is it present in /dev/bcache* ? (there are times where it won't be)
|
1) Is it present in /dev/bcache* ? (there are times where it won't be)
|
||||||
|
|
||||||
If so, it's easy:
|
If so, it's easy::
|
||||||
|
|
||||||
host:/sys/block/bcache0/bcache# echo 1 > stop
|
host:/sys/block/bcache0/bcache# echo 1 > stop
|
||||||
|
|
||||||
2) But if your backing device is gone, this won't work:
|
2) But if your backing device is gone, this won't work::
|
||||||
|
|
||||||
host:/sys/block/bcache0# cd bcache
|
host:/sys/block/bcache0# cd bcache
|
||||||
bash: cd: bcache: No such file or directory
|
bash: cd: bcache: No such file or directory
|
||||||
|
|
||||||
In this case, you may have to unregister the dmcrypt block device that
|
In this case, you may have to unregister the dmcrypt block device that
|
||||||
references this bcache to free it up:
|
references this bcache to free it up::
|
||||||
|
|
||||||
host:~# dmsetup remove oldds1
|
host:~# dmsetup remove oldds1
|
||||||
bcache: bcache_device_free() bcache0 stopped
|
bcache: bcache_device_free() bcache0 stopped
|
||||||
bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered
|
bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered
|
||||||
@@ -269,7 +290,7 @@ This causes the backing bcache to be removed from /sys/fs/bcache and
|
|||||||
then it can be reused. This would be true of any block device stacking
|
then it can be reused. This would be true of any block device stacking
|
||||||
where bcache is a lower device.
|
where bcache is a lower device.
|
||||||
|
|
||||||
3) In other cases, you can also look in /sys/fs/bcache/:
|
3) In other cases, you can also look in /sys/fs/bcache/::
|
||||||
|
|
||||||
host:/sys/fs/bcache# ls -l */{cache?,bdev?}
|
host:/sys/fs/bcache# ls -l */{cache?,bdev?}
|
||||||
lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/
|
lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/
|
||||||
@@ -277,7 +298,8 @@ lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache
|
|||||||
lrwxrwxrwx 1 root root 0 Mar 5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/
|
lrwxrwxrwx 1 root root 0 Mar 5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/
|
||||||
|
|
||||||
The device names will show which UUID is relevant, cd in that directory
|
The device names will show which UUID is relevant, cd in that directory
|
||||||
and stop the cache:
|
and stop the cache::
|
||||||
|
|
||||||
host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop
|
host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop
|
||||||
|
|
||||||
This will free up bcache references and let you reuse the partition for
|
This will free up bcache references and let you reuse the partition for
|
||||||
@@ -285,7 +307,7 @@ other purposes.
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
TROUBLESHOOTING PERFORMANCE
|
Troubleshooting performance
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
Bcache has a bunch of config options and tunables. The defaults are intended to
|
Bcache has a bunch of config options and tunables. The defaults are intended to
|
||||||
@@ -301,11 +323,13 @@ want for getting the best possible numbers when benchmarking.
|
|||||||
raid stripe size to get the disk multiples that you would like.
|
raid stripe size to get the disk multiples that you would like.
|
||||||
|
|
||||||
For example: If you have a 64k stripe size, then the following offset
|
For example: If you have a 64k stripe size, then the following offset
|
||||||
would provide alignment for many common RAID5 data spindle counts:
|
would provide alignment for many common RAID5 data spindle counts::
|
||||||
|
|
||||||
64k * 2*2*2*3*3*5*7 bytes = 161280k
|
64k * 2*2*2*3*3*5*7 bytes = 161280k
|
||||||
|
|
||||||
That space is wasted, but for only 157.5MB you can grow your RAID 5
|
That space is wasted, but for only 157.5MB you can grow your RAID 5
|
||||||
volume to the following data-spindle counts without re-aligning:
|
volume to the following data-spindle counts without re-aligning::
|
||||||
|
|
||||||
3,4,5,6,7,8,9,10,12,14,15,18,20,21 ...
|
3,4,5,6,7,8,9,10,12,14,15,18,20,21 ...
|
||||||
|
|
||||||
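The alignment arithmetic in the example above can be checked with a short Python sketch. It is not part of the patch; the names `STRIPE_KIB`, `FACTORS` and `aligned_spindle_counts` are made up for illustration. The idea, as read from the text, is that an offset of stripe size times 2520 is a whole multiple of the full stripe width for every data-spindle count that divides 2520.

```python
from math import prod

STRIPE_KIB = 64                    # stripe size from the example, in KiB
FACTORS = (2, 2, 2, 3, 3, 5, 7)    # multiplier factors from the example

# 64k * 2*2*2*3*3*5*7 = 161280k
offset_kib = STRIPE_KIB * prod(FACTORS)
# Same offset expressed in MiB (the text calls it 157.5MB)
offset_mib = offset_kib / 1024


def aligned_spindle_counts(limit=21):
    """Data-spindle counts from 3 to limit that divide the multiplier.

    For each such count n, the offset is a whole number of full stripes
    (n * STRIPE_KIB), so the volume can grow to n data spindles without
    re-aligning.
    """
    multiplier = prod(FACTORS)
    return [n for n in range(3, limit + 1) if multiplier % n == 0]


if __name__ == "__main__":
    print(offset_kib, offset_mib)
    print(aligned_spindle_counts())
```

Counts such as 11, 13 or 16 are absent from the result because they do not divide 2520, which matches the gaps in the list quoted in the text.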
- Bad write performance
|
- Bad write performance
|
||||||
@@ -313,7 +337,7 @@ want for getting the best possible numbers when benchmarking.
|
|||||||
If write performance is not what you expected, you probably wanted to be
|
If write performance is not what you expected, you probably wanted to be
|
||||||
running in writeback mode, which isn't the default (not due to a lack of
|
running in writeback mode, which isn't the default (not due to a lack of
|
||||||
maturity, but simply because in writeback mode you'll lose data if something
|
maturity, but simply because in writeback mode you'll lose data if something
|
||||||
happens to your SSD)
|
happens to your SSD)::
|
||||||
|
|
||||||
# echo writeback > /sys/block/bcache0/bcache/cache_mode
|
# echo writeback > /sys/block/bcache0/bcache/cache_mode
|
||||||
|
|
||||||
@@ -325,11 +349,11 @@ want for getting the best possible numbers when benchmarking.
|
|||||||
accessed data out of your cache.
|
accessed data out of your cache.
|
||||||
|
|
||||||
But if you want to benchmark reads from cache, and you start out with fio
|
But if you want to benchmark reads from cache, and you start out with fio
|
||||||
writing an 8 gigabyte test file - so you want to disable that.
|
writing an 8 gigabyte test file - so you want to disable that::
|
||||||
|
|
||||||
# echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
|
# echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
|
||||||
|
|
||||||
To set it back to the default (4 mb), do
|
To set it back to the default (4 mb), do::
|
||||||
|
|
||||||
# echo 4M > /sys/block/bcache0/bcache/sequential_cutoff
|
# echo 4M > /sys/block/bcache0/bcache/sequential_cutoff
|
||||||
|
|
||||||
@@ -344,7 +368,7 @@ want for getting the best possible numbers when benchmarking.
|
|||||||
throttles traffic if the latency exceeds a threshold (it does this by
|
throttles traffic if the latency exceeds a threshold (it does this by
|
||||||
cranking down the sequential bypass).
|
cranking down the sequential bypass).
|
||||||
|
|
||||||
You can disable this if you need to by setting the thresholds to 0:
|
You can disable this if you need to by setting the thresholds to 0::
|
||||||
|
|
||||||
# echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
|
# echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
|
||||||
# echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us
|
# echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us
|
||||||
@@ -369,7 +393,7 @@ want for getting the best possible numbers when benchmarking.
|
|||||||
a fix for the issue there).
|
a fix for the issue there).
|
||||||
|
|
||||||
|
|
||||||
SYSFS - BACKING DEVICE
|
Sysfs - backing device
|
||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and
|
Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and
|
||||||
@@ -454,7 +478,8 @@ writeback_running
|
|||||||
still be added to the cache until it is mostly full; only meant for
|
still be added to the cache until it is mostly full; only meant for
|
||||||
benchmarking. Defaults to on.
|
benchmarking. Defaults to on.
|
||||||
|
|
||||||
SYSFS - BACKING DEVICE STATS:
|
Sysfs - backing device stats
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
There are directories with these numbers for a running total, as well as
|
There are directories with these numbers for a running total, as well as
|
||||||
versions that decay over the past day, hour and 5 minutes; they're also
|
versions that decay over the past day, hour and 5 minutes; they're also
|
||||||
@@ -463,14 +488,11 @@ aggregated in the cache set directory as well.
|
|||||||
bypassed
|
bypassed
|
||||||
Amount of IO (both reads and writes) that has bypassed the cache
|
Amount of IO (both reads and writes) that has bypassed the cache
|
||||||
|
|
||||||
cache_hits
|
cache_hits, cache_misses, cache_hit_ratio
|
||||||
cache_misses
|
|
||||||
cache_hit_ratio
|
|
||||||
Hits and misses are counted per individual IO as bcache sees them; a
|
Hits and misses are counted per individual IO as bcache sees them; a
|
||||||
partial hit is counted as a miss.
|
partial hit is counted as a miss.
|
||||||
|
|
||||||
cache_bypass_hits
|
cache_bypass_hits, cache_bypass_misses
|
||||||
cache_bypass_misses
|
|
||||||
Hits and misses for IO that is intended to skip the cache are still counted,
|
Hits and misses for IO that is intended to skip the cache are still counted,
|
||||||
but broken out here.
|
but broken out here.
|
||||||
|
|
||||||
@@ -482,7 +504,8 @@ cache_miss_collisions
|
|||||||
cache_readaheads
|
cache_readaheads
|
||||||
Count of times readahead occurred.
|
Count of times readahead occurred.
|
||||||
|
|
||||||
SYSFS - CACHE SET:
|
Sysfs - cache set
|
||||||
|
~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Available at /sys/fs/bcache/<cset-uuid>
|
Available at /sys/fs/bcache/<cset-uuid>
|
||||||
|
|
||||||
@@ -520,8 +543,7 @@ flash_vol_create
|
|||||||
Echoing a size to this file (in human readable units, k/M/G) creates a thinly
|
Echoing a size to this file (in human readable units, k/M/G) creates a thinly
|
||||||
provisioned volume backed by the cache set.
|
provisioned volume backed by the cache set.
|
||||||
|
|
||||||
io_error_halflife
|
io_error_halflife, io_error_limit
|
||||||
io_error_limit
|
|
||||||
These determine how many errors we accept before disabling the cache.
|
These determine how many errors we accept before disabling the cache.
|
||||||
Each error is decayed by the half life (in # ios). If the decaying count
|
Each error is decayed by the half life (in # ios). If the decaying count
|
||||||
reaches io_error_limit dirty data is written out and the cache is disabled.
|
reaches io_error_limit dirty data is written out and the cache is disabled.
|
||||||
@@ -545,7 +567,8 @@ unregister
|
|||||||
Detaches all backing devices and closes the cache devices; if dirty data is
|
Detaches all backing devices and closes the cache devices; if dirty data is
|
||||||
present it will disable writeback caching and wait for it to be flushed.
|
present it will disable writeback caching and wait for it to be flushed.
|
||||||
|
|
||||||
SYSFS - CACHE SET INTERNAL:
|
Sysfs - cache set internal
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
This directory also exposes timings for a number of internal operations, with
|
This directory also exposes timings for a number of internal operations, with
|
||||||
separate files for average duration, average frequency, last occurrence and max
|
separate files for average duration, average frequency, last occurrence and max
|
||||||
@@ -574,7 +597,8 @@ cache_read_races
|
|||||||
trigger_gc
|
trigger_gc
|
||||||
Writing to this file forces garbage collection to run.
|
Writing to this file forces garbage collection to run.
|
||||||
|
|
||||||
SYSFS - CACHE DEVICE:
|
Sysfs - Cache device
|
||||||
|
~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Available at /sys/block/<cdev>/bcache
|
Available at /sys/block/<cdev>/bcache
|
||||||
|
|
||||||
|
|||||||
@@ -1,12 +1,8 @@
|
|||||||
===============================================================
|
===================================================================
|
||||||
== BT8XXGPIO driver ==
|
A driver for a selfmade cheap BT8xx based PCI GPIO-card (bt8xxgpio)
|
||||||
== ==
|
===================================================================
|
||||||
== A driver for a selfmade cheap BT8xx based PCI GPIO-card ==
|
|
||||||
== ==
|
|
||||||
== For advanced documentation, see ==
|
|
||||||
== http://www.bu3sch.de/btgpio.php ==
|
|
||||||
===============================================================
|
|
||||||
|
|
||||||
|
For advanced documentation, see http://www.bu3sch.de/btgpio.php
|
||||||
|
|
||||||
A generic digital 24-port PCI GPIO card can be built out of an ordinary
|
A generic digital 24-port PCI GPIO card can be built out of an ordinary
|
||||||
Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The
|
Brooktree bt848, bt849, bt878 or bt879 based analog TV tuner card. The
|
||||||
@@ -17,9 +13,8 @@ The bt8xx chip does have 24 digital GPIO ports.
|
|||||||
These ports are accessible via 24 pins on the SMD chip package.
|
These ports are accessible via 24 pins on the SMD chip package.
|
||||||
|
|
||||||
|
|
||||||
==============================================
|
How to physically access the GPIO pins
|
||||||
== How to physically access the GPIO pins ==
|
======================================
|
||||||
==============================================
|
|
||||||
|
|
||||||
There are several ways to access these pins. One might unsolder the whole chip
|
There are several ways to access these pins. One might unsolder the whole chip
|
||||||
and put it on a custom PCI board, or one might only unsolder each individual
|
and put it on a custom PCI board, or one might only unsolder each individual
|
||||||
@@ -27,7 +22,7 @@ GPIO pin and solder that to some tiny wire. As the chip package really is tiny
|
|||||||
there are some advanced soldering skills needed in any case.
|
there are some advanced soldering skills needed in any case.
|
||||||
|
|
||||||
The physical pinouts are drawn in the following ASCII art.
|
The physical pinouts are drawn in the following ASCII art.
|
||||||
The GPIO pins are marked with G00-G23
|
The GPIO pins are marked with G00-G23::
|
||||||
|
|
||||||
G G G G G G G G G G G G G G G G G G
|
G G G G G G G G G G G G G G G G G G
|
||||||
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
|
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
|
||||||
|
|||||||
@@ -1,18 +1,16 @@
|
|||||||
=======================================================================
|
=============
|
||||||
README for btmrvl driver
|
btmrvl driver
|
||||||
=======================================================================
|
=============
|
||||||
|
|
||||||
|
|
||||||
All commands are used via debugfs interface.
|
All commands are used via debugfs interface.
|
||||||
|
|
||||||
=====================
|
Set/get driver configurations
|
||||||
Set/get driver configurations:
|
=============================
|
||||||
|
|
||||||
Path: /debug/btmrvl/config/
|
Path: /debug/btmrvl/config/
|
||||||
|
|
||||||
gpiogap=[n]
|
gpiogap=[n], hscfgcmd
|
||||||
hscfgcmd
|
These commands are used to configure the host sleep parameters::
|
||||||
These commands are used to configure the host sleep parameters.
|
|
||||||
bit 8:0 -- Gap
|
bit 8:0 -- Gap
|
||||||
bit 16:8 -- GPIO
|
bit 16:8 -- GPIO
|
||||||
|
|
||||||
@@ -23,7 +21,8 @@ hscfgcmd
|
|||||||
where Gap is the gap in milliseconds between wakeup signal and
|
where Gap is the gap in milliseconds between wakeup signal and
|
||||||
wakeup event, or 0xff for special host sleep setting.
|
wakeup event, or 0xff for special host sleep setting.
|
||||||
|
|
||||||
Usage:
|
Usage::
|
||||||
|
|
||||||
# Use SDIO interface to wake up the host and set GAP to 0x80:
|
# Use SDIO interface to wake up the host and set GAP to 0x80:
|
||||||
echo 0xff80 > /debug/btmrvl/config/gpiogap
|
echo 0xff80 > /debug/btmrvl/config/gpiogap
|
||||||
echo 1 > /debug/btmrvl/config/hscfgcmd
|
echo 1 > /debug/btmrvl/config/hscfgcmd
|
||||||
@@ -32,15 +31,16 @@ hscfgcmd
|
|||||||
echo 0x03ff > /debug/btmrvl/config/gpiogap
|
echo 0x03ff > /debug/btmrvl/config/gpiogap
|
||||||
echo 1 > /debug/btmrvl/config/hscfgcmd
|
echo 1 > /debug/btmrvl/config/hscfgcmd
|
||||||
|
|
||||||
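The values written to gpiogap in the usage examples above pack two fields into one 16-bit word. The Python sketch below shows that packing, assuming (from the 0xff80 and 0x03ff examples) that the GPIO field is the high byte and the gap is the low byte. The helper names are hypothetical and are not part of the driver.

```python
def encode_gpiogap(gpio, gap):
    """Pack GPIO (high byte) and gap (low byte) into one gpiogap value."""
    if not 0 <= gpio <= 0xFF or not 0 <= gap <= 0xFF:
        raise ValueError("gpio and gap must each fit in one byte")
    return (gpio << 8) | gap


def decode_gpiogap(value):
    """Split a gpiogap value back into (gpio, gap)."""
    return (value >> 8) & 0xFF, value & 0xFF


# GPIO field 0xff (the SDIO interface in the first example), gap 0x80.
sdio_value = encode_gpiogap(0xFF, 0x80)
print(hex(sdio_value))  # 0xff80
```

Decoding the second example, `decode_gpiogap(0x03FF)`, gives GPIO 3 with a gap of 0xff.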
psmode=[n]
|
psmode=[n], pscmd
|
||||||
pscmd
|
|
||||||
These commands are used to enable/disable auto sleep mode
|
These commands are used to enable/disable auto sleep mode
|
||||||
|
|
||||||
where the option is:
|
where the option is::
|
||||||
|
|
||||||
1 -- Enable auto sleep mode
|
1 -- Enable auto sleep mode
|
||||||
0 -- Disable auto sleep mode
|
0 -- Disable auto sleep mode
|
||||||
|
|
||||||
Usage:
|
Usage::
|
||||||
|
|
||||||
# Enable auto sleep mode
|
# Enable auto sleep mode
|
||||||
echo 1 > /debug/btmrvl/config/psmode
|
echo 1 > /debug/btmrvl/config/psmode
|
||||||
echo 1 > /debug/btmrvl/config/pscmd
|
echo 1 > /debug/btmrvl/config/pscmd
|
||||||
@@ -50,15 +50,16 @@ pscmd
|
|||||||
echo 1 > /debug/btmrvl/config/pscmd
|
echo 1 > /debug/btmrvl/config/pscmd
|
||||||
|
|
||||||
|
|
||||||
hsmode=[n]
|
hsmode=[n], hscmd
|
||||||
hscmd
|
|
||||||
These commands are used to enable host sleep or wake up firmware
|
These commands are used to enable host sleep or wake up firmware
|
||||||
|
|
||||||
where the option is:
|
where the option is::
|
||||||
|
|
||||||
1 -- Enable host sleep
|
1 -- Enable host sleep
|
||||||
0 -- Wake up firmware
|
0 -- Wake up firmware
|
||||||
|
|
||||||
Usage:
|
Usage::
|
||||||
|
|
||||||
# Enable host sleep
|
# Enable host sleep
|
||||||
echo 1 > /debug/btmrvl/config/hsmode
|
echo 1 > /debug/btmrvl/config/hsmode
|
||||||
echo 1 > /debug/btmrvl/config/hscmd
|
echo 1 > /debug/btmrvl/config/hscmd
|
||||||
@@ -68,12 +69,13 @@ hscmd
|
|||||||
echo 1 > /debug/btmrvl/config/hscmd
|
echo 1 > /debug/btmrvl/config/hscmd
|
||||||
|
|
||||||
|
|
||||||
======================
|
Get driver status
|
||||||
Get driver status:
|
=================
|
||||||
|
|
||||||
Path: /debug/btmrvl/status/
|
Path: /debug/btmrvl/status/
|
||||||
|
|
||||||
Usage:
|
Usage::
|
||||||
|
|
||||||
cat /debug/btmrvl/status/<args>
|
cat /debug/btmrvl/status/<args>
|
||||||
|
|
||||||
where the args are:
|
where the args are:
|
||||||
@@ -90,14 +92,17 @@ hsstate
|
|||||||
txdnldrdy
|
txdnldrdy
|
||||||
This command displays the value of Tx download ready flag.
|
This command displays the value of Tx download ready flag.
|
||||||
|
|
||||||
|
Issuing a raw hci command
|
||||||
=====================
|
=========================
|
||||||
|
|
||||||
Use hcitool to issue raw hci command, refer to hcitool manual
|
Use hcitool to issue raw hci command, refer to hcitool manual
|
||||||
|
|
||||||
Usage: Hcitool cmd <ogf> <ocf> [Parameters]
|
Usage::
|
||||||
|
|
||||||
|
Hcitool cmd <ogf> <ocf> [Parameters]
|
||||||
|
|
||||||
|
Interface Control Command::
|
||||||
|
|
||||||
Interface Control Command
|
|
||||||
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00 --Enable All interface
|
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x00 --Enable All interface
|
||||||
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01 --Enable Wlan interface
|
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x01 --Enable Wlan interface
|
||||||
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02 --Enable BT interface
|
hcitool cmd 0x3f 0x5b 0xf5 0x01 0x02 --Enable BT interface
|
||||||
@@ -105,13 +110,13 @@ Use hcitool to issue raw hci command, refer to hcitool manual
|
|||||||
hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01 --Disable Wlan interface
|
hcitool cmd 0x3f 0x5b 0xf5 0x00 0x01 --Disable Wlan interface
|
||||||
hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02 --Disable BT interface
|
hcitool cmd 0x3f 0x5b 0xf5 0x00 0x02 --Disable BT interface
|
||||||
|
|
||||||
=======================================================================
|
SD8688 firmware
|
||||||
|
===============
|
||||||
|
|
||||||
|
Images:
|
||||||
|
|
||||||
SD8688 firmware:
|
- /lib/firmware/sd8688_helper.bin
|
||||||
|
- /lib/firmware/sd8688.bin
|
||||||
/lib/firmware/sd8688_helper.bin
|
|
||||||
/lib/firmware/sd8688.bin
|
|
||||||
|
|
||||||
|
|
||||||
The images can be downloaded from:
|
The images can be downloaded from:
|
||||||
|
|||||||
@@ -1,8 +1,18 @@
|
|||||||
[ NOTE: The virt_to_bus() and bus_to_virt() functions have been
|
==========================================================
|
||||||
|
How to access I/O mapped memory from within device drivers
|
||||||
|
==========================================================
|
||||||
|
|
||||||
|
:Author: Linus
|
||||||
|
|
||||||
|
.. warning::
|
||||||
|
|
||||||
|
The virt_to_bus() and bus_to_virt() functions have been
|
||||||
superseded by the functionality provided by the PCI DMA interface
|
superseded by the functionality provided by the PCI DMA interface
|
||||||
(see Documentation/DMA-API-HOWTO.txt). They continue
|
(see Documentation/DMA-API-HOWTO.txt). They continue
|
||||||
to be documented below for historical purposes, but new code
|
to be documented below for historical purposes, but new code
|
||||||
must not use them. --davidm 00/12/12 ]
|
must not use them. --davidm 00/12/12
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
[ This is a mail message in response to a query on IO mapping, thus the
|
[ This is a mail message in response to a query on IO mapping, thus the
|
||||||
strange format for a "document" ]
|
strange format for a "document" ]
|
||||||
@@ -11,7 +21,7 @@ The AHA-1542 is a bus-master device, and your patch makes the driver give the
|
|||||||
controller the physical address of the buffers, which is correct on x86
|
controller the physical address of the buffers, which is correct on x86
|
||||||
(because all bus master devices see the physical memory mappings directly).
|
(because all bus master devices see the physical memory mappings directly).
|
||||||
|
|
||||||
However, on many setups, there are actually _three_ different ways of looking
|
However, on many setups, there are actually **three** different ways of looking
|
||||||
at memory addresses, and in this case we actually want the third, the
|
at memory addresses, and in this case we actually want the third, the
|
||||||
so-called "bus address".
|
so-called "bus address".
|
||||||
|
|
||||||
@@ -38,7 +48,7 @@ because the memory and the devices share the same address space, and that is
|
|||||||
not generally necessarily true on other PCI/ISA setups.
|
not generally necessarily true on other PCI/ISA setups.
|
||||||
|
|
||||||
Now, just as an example, on the PReP (PowerPC Reference Platform), the
|
Now, just as an example, on the PReP (PowerPC Reference Platform), the
|
||||||
CPU sees a memory map something like this (this is from memory):
|
CPU sees a memory map something like this (this is from memory)::
|
||||||
|
|
||||||
0-2 GB "real memory"
|
0-2 GB "real memory"
|
||||||
2 GB-3 GB "system IO" (inb/out and similar accesses on x86)
|
2 GB-3 GB "system IO" (inb/out and similar accesses on x86)
|
||||||
@@ -52,7 +62,7 @@ So when the CPU wants any bus master to write to physical memory 0, it
|
|||||||
has to give the master address 0x80000000 as the memory address.
|
has to give the master address 0x80000000 as the memory address.
|
||||||
|
|
||||||
So, for example, depending on how the kernel is actually mapped on the
|
So, for example, depending on how the kernel is actually mapped on the
|
||||||
PPC, you can end up with a setup like this:
|
PPC, you can end up with a setup like this::
|
||||||
|
|
||||||
physical address: 0
|
physical address: 0
|
||||||
virtual address: 0xC0000000
|
virtual address: 0xC0000000
|
||||||
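The PReP numbers quoted above (physical address 0 is seen by the kernel at virtual 0xC0000000 and by a bus master at 0x80000000) can be modelled as two fixed offsets. This is a userspace Python sketch, not kernel code: the functions only borrow the names of the kernel helpers for readability, and the constants apply to this one example mapping.

```python
# Offsets taken from the PReP example in the text; other platforms differ.
KERNEL_VIRT_BASE = 0xC0000000  # kernel virtual address that maps physical 0
BUS_MEMORY_BASE = 0x80000000   # bus address a master uses to reach physical 0


def virt_to_phys(virt):
    """Kernel virtual address -> physical address (model only)."""
    return virt - KERNEL_VIRT_BASE


def phys_to_bus(phys):
    """Physical address -> address as seen by a bus master (model only)."""
    return phys + BUS_MEMORY_BASE


def virt_to_bus(virt):
    """Kernel virtual address -> bus address, via the physical address."""
    return phys_to_bus(virt_to_phys(virt))


def bus_to_virt(bus):
    """Bus address -> kernel virtual address (inverse of virt_to_bus)."""
    return (bus - BUS_MEMORY_BASE) + KERNEL_VIRT_BASE


# All three values name the same memory cell, seen through different maps.
print(hex(virt_to_phys(0xC0000000)), hex(virt_to_bus(0xC0000000)))
```

On x86 both offsets for bus masters would be zero in this model, which is why passing a physical address to the controller happens to work there.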
@@ -61,7 +71,7 @@ PPC, you can end up with a setup like this:
|
|||||||
where all the addresses actually point to the same thing. It's just seen
|
where all the addresses actually point to the same thing. It's just seen
|
||||||
through different translations..
|
through different translations..
|
||||||
|
|
||||||
Similarly, on the Alpha, the normal translation is
|
Similarly, on the Alpha, the normal translation is::
|
||||||
|
|
||||||
physical address: 0
|
physical address: 0
|
||||||
virtual address: 0xfffffc0000000000
|
virtual address: 0xfffffc0000000000
|
||||||
@@ -70,7 +80,7 @@ Similarly, on the Alpha, the normal translation is
|
|||||||
(but there are also Alphas where the physical address and the bus address
|
(but there are also Alphas where the physical address and the bus address
|
||||||
are the same).
|
are the same).
|
||||||
|
|
||||||
Anyway, the way to look up all these translations, you do
|
Anyway, the way to look up all these translations, you do::
|
||||||
|
|
||||||
#include <asm/io.h>
|
#include <asm/io.h>
|
||||||
|
|
||||||
@@ -81,8 +91,8 @@ Anyway, the way to look up all these translations, you do
|
|||||||
|
|
||||||
Now, when do you need these?
|
Now, when do you need these?
|
||||||
|
|
||||||
You want the _virtual_ address when you are actually going to access that
|
You want the **virtual** address when you are actually going to access that
|
||||||
pointer from the kernel. So you can have something like this:
|
pointer from the kernel. So you can have something like this::
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* this is the hardware "mailbox" we use to communicate with
|
* this is the hardware "mailbox" we use to communicate with
|
||||||
@@ -104,7 +114,7 @@ pointer from the kernel. So you can have something like this:
|
|||||||
...
|
...
|
||||||
|
|
||||||
on the other hand, you want the bus address when you have a buffer that
|
on the other hand, you want the bus address when you have a buffer that
|
||||||
you want to give to the controller:
|
you want to give to the controller::
|
||||||
|
|
||||||
/* ask the controller to read the sense status into "sense_buffer" */
|
/* ask the controller to read the sense status into "sense_buffer" */
|
||||||
mbox.bufstart = virt_to_bus(&sense_buffer);
|
mbox.bufstart = virt_to_bus(&sense_buffer);
|
||||||
@@ -112,7 +122,7 @@ you want to give to the controller:
|
|||||||
mbox.status = 0;
|
mbox.status = 0;
|
||||||
notify_controller(&mbox);
|
notify_controller(&mbox);
|
||||||
|
|
||||||
And you generally _never_ want to use the physical address, because you can't
|
And you generally **never** want to use the physical address, because you can't
|
||||||
use that from the CPU (the CPU only uses translated virtual addresses), and
|
use that from the CPU (the CPU only uses translated virtual addresses), and
|
||||||
you can't use it from the bus master.
|
you can't use it from the bus master.
|
||||||
|
|
||||||
@@ -124,7 +134,9 @@ be remapped as measured in units of pages, a.k.a. the pfn (the memory
|
|||||||
management layer doesn't know about devices outside the CPU, so it
|
management layer doesn't know about devices outside the CPU, so it
|
||||||
shouldn't need to know about "bus addresses" etc).
|
shouldn't need to know about "bus addresses" etc).
|
||||||
|
|
||||||
NOTE NOTE NOTE! The above is only one part of the whole equation. The above
|
.. note::
|
||||||
|
|
||||||
|
The above is only one part of the whole equation. The above
|
||||||
only talks about "real memory", that is, CPU memory (RAM).
|
only talks about "real memory", that is, CPU memory (RAM).
|
||||||
|
|
||||||
There is a completely different type of memory too, and that's the "shared
|
There is a completely different type of memory too, and that's the "shared
|
||||||
@@ -137,20 +149,22 @@ whatever, and there is only one way to access it: the readb/writeb and
|
|||||||
related functions. You should never take the address of such memory, because
|
related functions. You should never take the address of such memory, because
|
||||||
there is really nothing you can do with such an address: it's not
|
there is really nothing you can do with such an address: it's not
|
||||||
conceptually in the same memory space as "real memory" at all, so you cannot
|
conceptually in the same memory space as "real memory" at all, so you cannot
|
||||||
just dereference a pointer. (Sadly, on x86 it _is_ in the same memory space,
|
just dereference a pointer. (Sadly, on x86 it **is** in the same memory space,
|
||||||
so on x86 it actually works to just dereference a pointer, but it's not
|
so on x86 it actually works to just dereference a pointer, but it's not
|
||||||
portable).
|
portable).
|
||||||
|
|
||||||
For such memory, you can do things like
|
For such memory, you can do things like:
|
||||||
|
|
||||||
|
- reading::
|
||||||
|
|
||||||
- reading:
|
|
||||||
/*
|
/*
|
||||||
* read first 32 bits from ISA memory at 0xC0000, aka
|
* read first 32 bits from ISA memory at 0xC0000, aka
|
||||||
* C000:0000 in DOS terms
|
* C000:0000 in DOS terms
|
||||||
*/
|
*/
|
||||||
unsigned int signature = isa_readl(0xC0000);
|
unsigned int signature = isa_readl(0xC0000);
|
||||||
|
|
||||||
- remapping and writing:
|
- remapping and writing::
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* remap framebuffer PCI memory area at 0xFC000000,
|
* remap framebuffer PCI memory area at 0xFC000000,
|
||||||
* size 1MB, so that we can access it: We can directly
|
* size 1MB, so that we can access it: We can directly
|
||||||
@@ -165,7 +179,8 @@ For such memory, you can do things like
|
|||||||
/* unmap when we unload the driver */
|
/* unmap when we unload the driver */
|
||||||
iounmap(baseptr);
|
iounmap(baseptr);
|
||||||
|
|
||||||
- copying and clearing:
|
- copying and clearing::
|
||||||
|
|
||||||
/* get the 6-byte Ethernet address at ISA address E000:0040 */
|
/* get the 6-byte Ethernet address at ISA address E000:0040 */
|
||||||
memcpy_fromio(kernel_buffer, 0xE0040, 6);
|
memcpy_fromio(kernel_buffer, 0xE0040, 6);
|
||||||
/* write a packet to the driver */
|
/* write a packet to the driver */
|
||||||
@@ -181,7 +196,7 @@ happy that your driver works ;)
|
|||||||
Note that kernel versions 2.0.x (and earlier) mistakenly called the
|
Note that kernel versions 2.0.x (and earlier) mistakenly called the
|
||||||
ioremap() function "vremap()". ioremap() is the proper name, but I
|
ioremap() function "vremap()". ioremap() is the proper name, but I
|
||||||
didn't think straight when I wrote it originally. People who have to
|
didn't think straight when I wrote it originally. People who have to
|
||||||
support both can do something like:
|
support both can do something like::
|
||||||
|
|
||||||
/* support old naming silliness */
|
/* support old naming silliness */
|
||||||
#if LINUX_VERSION_CODE < 0x020100
|
#if LINUX_VERSION_CODE < 0x020100
|
||||||
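The constant 0x020100 in the version check above is kernel 2.1.0 in the packed form used by LINUX_VERSION_CODE, where the major, minor and patch numbers occupy successive bytes. The Python sketch below reproduces that packing for illustration. The function names are hypothetical, and it assumes the classic layout `(major << 16) + (minor << 8) + patch`.

```python
def kernel_version(major, minor, patch):
    """Pack a version the way the classic KERNEL_VERSION() macro does."""
    return (major << 16) + (minor << 8) + patch


def needs_vremap_alias(version_code):
    """True for kernels older than 2.1.0, mirroring the #if in the text."""
    return version_code < 0x020100


print(hex(kernel_version(2, 1, 0)))  # 0x20100
```

So a 2.0.x kernel such as 2.0.36 packs to 0x020024 and takes the compatibility branch, while 2.1.0 and later do not.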
@@ -196,13 +211,10 @@ And the above sounds worse than it really is. Most real drivers really
|
|||||||
don't do all that complex things (or rather: the complexity is not so
|
don't do all that complex things (or rather: the complexity is not so
|
||||||
much in the actual IO accesses as in error handling and timeouts etc).
|
much in the actual IO accesses as in error handling and timeouts etc).
|
||||||
It's generally not hard to fix drivers, and in many cases the code
|
It's generally not hard to fix drivers, and in many cases the code
|
||||||
actually looks better afterwards:
|
actually looks better afterwards::
|
||||||
|
|
||||||
unsigned long signature = *(unsigned int *) 0xC0000;
|
unsigned long signature = *(unsigned int *) 0xC0000;
|
||||||
vs
|
vs
|
||||||
unsigned long signature = readl(0xC0000);
|
unsigned long signature = readl(0xC0000);
|
||||||
|
|
||||||
I think the second version actually is more readable, no?
|
I think the second version actually is more readable, no?
|
||||||
|
|
||||||
Linus
|
|
||||||
|
|
||||||
|
|||||||
@@ -1,7 +1,8 @@
|
|||||||
Cache and TLB Flushing
|
==================================
|
||||||
Under Linux
|
Cache and TLB Flushing Under Linux
|
||||||
|
==================================
|
||||||
|
|
||||||
David S. Miller <davem@redhat.com>
|
:Author: David S. Miller <davem@redhat.com>
|
||||||
|
|
||||||
This document describes the cache/tlb flushing interfaces called
|
This document describes the cache/tlb flushing interfaces called
|
||||||
by the Linux VM subsystem. It enumerates over each interface,
|
by the Linux VM subsystem. It enumerates over each interface,
|
||||||
@@ -28,7 +29,7 @@ Therefore when software page table changes occur, the kernel will
|
|||||||
invoke one of the following flush methods _after_ the page table
|
invoke one of the following flush methods _after_ the page table
|
||||||
changes occur:
|
changes occur:
|
||||||
|
|
||||||
1) void flush_tlb_all(void)
|
1) ``void flush_tlb_all(void)``
|
||||||
|
|
||||||
The most severe flush of all. After this interface runs,
|
The most severe flush of all. After this interface runs,
|
||||||
any previous page table modification whatsoever will be
|
any previous page table modification whatsoever will be
|
||||||
@@ -37,7 +38,7 @@ changes occur:
|
|||||||
This is usually invoked when the kernel page tables are
|
This is usually invoked when the kernel page tables are
|
||||||
changed, since such translations are "global" in nature.
|
changed, since such translations are "global" in nature.
|
||||||
|
|
||||||
2) void flush_tlb_mm(struct mm_struct *mm)
|
2) ``void flush_tlb_mm(struct mm_struct *mm)``
|
||||||
|
|
||||||
This interface flushes an entire user address space from
|
This interface flushes an entire user address space from
|
||||||
the TLB. After running, this interface must make sure that
|
the TLB. After running, this interface must make sure that
|
||||||
@@ -49,8 +50,8 @@ changes occur:
|
|||||||
page table operations such as what happens during
|
page table operations such as what happens during
|
||||||
fork, and exec.
|
fork, and exec.
|
||||||
|
|
||||||
3) void flush_tlb_range(struct vm_area_struct *vma,
|
3) ``void flush_tlb_range(struct vm_area_struct *vma,
|
||||||
unsigned long start, unsigned long end)
|
unsigned long start, unsigned long end)``
|
||||||
|
|
||||||
Here we are flushing a specific range of (user) virtual
|
Here we are flushing a specific range of (user) virtual
|
||||||
address translations from the TLB. After running, this
|
address translations from the TLB. After running, this
|
||||||
@@ -69,7 +70,7 @@ changes occur:
|
|||||||
call flush_tlb_page (see below) for each entry which may be
|
call flush_tlb_page (see below) for each entry which may be
|
||||||
modified.
|
modified.
|
||||||
|
|
||||||
4) void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)
|
4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)``
|
||||||
|
|
||||||
This time we need to remove the PAGE_SIZE sized translation
|
This time we need to remove the PAGE_SIZE sized translation
|
||||||
from the TLB. The 'vma' is the backing structure used by
|
from the TLB. The 'vma' is the backing structure used by
|
||||||
@@ -87,8 +88,8 @@ changes occur:
|
|||||||
|
|
||||||
This is used primarily during fault processing.
|
This is used primarily during fault processing.
|
||||||
|
|
||||||
5) void update_mmu_cache(struct vm_area_struct *vma,
|
5) ``void update_mmu_cache(struct vm_area_struct *vma,
|
||||||
unsigned long address, pte_t *ptep)
|
unsigned long address, pte_t *ptep)``
|
||||||
|
|
||||||
At the end of every page fault, this routine is invoked to
|
At the end of every page fault, this routine is invoked to
|
||||||
tell the architecture specific code that a translation
|
tell the architecture specific code that a translation
|
||||||
@@ -100,7 +101,7 @@ changes occur:
|
|||||||
translations for software managed TLB configurations.
|
translations for software managed TLB configurations.
|
||||||
The sparc64 port currently does this.
|
The sparc64 port currently does this.
|
||||||
|
|
||||||
6) void tlb_migrate_finish(struct mm_struct *mm)
|
6) ``void tlb_migrate_finish(struct mm_struct *mm)``
|
||||||
|
|
||||||
This interface is called at the end of an explicit
|
This interface is called at the end of an explicit
|
||||||
process migration. This interface provides a hook
|
process migration. This interface provides a hook
|
||||||
@@ -112,7 +113,7 @@ changes occur:
|
|||||||
|
|
||||||
Next, we have the cache flushing interfaces. In general, when Linux
|
Next, we have the cache flushing interfaces. In general, when Linux
|
||||||
is changing an existing virtual-->physical mapping to a new value,
|
is changing an existing virtual-->physical mapping to a new value,
|
||||||
the sequence will be in one of the following forms:
|
the sequence will be in one of the following forms::
|
||||||
|
|
||||||
1) flush_cache_mm(mm);
|
1) flush_cache_mm(mm);
|
||||||
change_all_page_tables_of(mm);
|
change_all_page_tables_of(mm);
|
||||||
@@ -143,7 +144,7 @@ and have no dependency on translation information.
|
|||||||
|
|
||||||
Here are the routines, one by one:
|
Here are the routines, one by one:
|
||||||
|
|
||||||
1) void flush_cache_mm(struct mm_struct *mm)
|
1) ``void flush_cache_mm(struct mm_struct *mm)``
|
||||||
|
|
||||||
This interface flushes an entire user address space from
|
This interface flushes an entire user address space from
|
||||||
the caches. That is, after running, there will be no cache
|
the caches. That is, after running, there will be no cache
|
||||||
@@ -152,7 +153,7 @@ Here are the routines, one by one:
|
|||||||
This interface is used to handle whole address space
|
This interface is used to handle whole address space
|
||||||
page table operations such as what happens during exit and exec.
|
page table operations such as what happens during exit and exec.
|
||||||
|
|
||||||
2) void flush_cache_dup_mm(struct mm_struct *mm)
|
2) ``void flush_cache_dup_mm(struct mm_struct *mm)``
|
||||||
|
|
||||||
This interface flushes an entire user address space from
|
This interface flushes an entire user address space from
|
||||||
the caches. That is, after running, there will be no cache
|
the caches. That is, after running, there will be no cache
|
||||||
@@ -164,8 +165,8 @@ Here are the routines, one by one:
|
|||||||
This option is separate from flush_cache_mm to allow some
|
This option is separate from flush_cache_mm to allow some
|
||||||
optimizations for VIPT caches.
|
optimizations for VIPT caches.
|
||||||
|
|
||||||
3) void flush_cache_range(struct vm_area_struct *vma,
|
3) ``void flush_cache_range(struct vm_area_struct *vma,
|
||||||
unsigned long start, unsigned long end)
|
unsigned long start, unsigned long end)``
|
||||||
|
|
||||||
Here we are flushing a specific range of (user) virtual
|
Here we are flushing a specific range of (user) virtual
|
||||||
addresses from the cache. After running, there will be no
|
addresses from the cache. After running, there will be no
|
||||||
@@ -181,7 +182,7 @@ Here are the routines, one by one:
|
|||||||
call flush_cache_page (see below) for each entry which may be
|
call flush_cache_page (see below) for each entry which may be
|
||||||
modified.
|
modified.
|
||||||
|
|
||||||
4) void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)
|
4) ``void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)``
|
||||||
|
|
||||||
This time we need to remove a PAGE_SIZE sized range
|
This time we need to remove a PAGE_SIZE sized range
|
||||||
from the cache. The 'vma' is the backing structure used by
|
from the cache. The 'vma' is the backing structure used by
|
||||||
@@ -202,7 +203,7 @@ Here are the routines, one by one:
|
|||||||
|
|
||||||
This is used primarily during fault processing.
|
This is used primarily during fault processing.
|
||||||
|
|
||||||
5) void flush_cache_kmaps(void)
|
5) ``void flush_cache_kmaps(void)``
|
||||||
|
|
||||||
This routine need only be implemented if the platform utilizes
|
This routine need only be implemented if the platform utilizes
|
||||||
highmem. It will be called right before all of the kmaps
|
highmem. It will be called right before all of the kmaps
|
||||||
@@ -214,8 +215,8 @@ Here are the routines, one by one:
|
|||||||
|
|
||||||
This routine should be implemented in asm/highmem.h
|
This routine should be implemented in asm/highmem.h
|
||||||
|
|
||||||
6) void flush_cache_vmap(unsigned long start, unsigned long end)
|
6) ``void flush_cache_vmap(unsigned long start, unsigned long end)``
|
||||||
void flush_cache_vunmap(unsigned long start, unsigned long end)
|
``void flush_cache_vunmap(unsigned long start, unsigned long end)``
|
||||||
|
|
||||||
Here in these two interfaces we are flushing a specific range
|
Here in these two interfaces we are flushing a specific range
|
||||||
of (kernel) virtual addresses from the cache. After running,
|
of (kernel) virtual addresses from the cache. After running,
|
||||||
@@ -243,7 +244,9 @@ size). This setting will force the SYSv IPC layer to only allow user
|
|||||||
processes to mmap shared memory at addresses which are a multiple of
|
processes to mmap shared memory at addresses which are a multiple of
|
||||||
this value.
|
this value.
|
||||||
|
|
||||||
NOTE: This does not fix shared mmaps, check out the sparc64 port for
|
.. note::
|
||||||
|
|
||||||
|
This does not fix shared mmaps, check out the sparc64 port for
|
||||||
one way to solve this (in particular SPARC_FLAG_MMAPSHARED).
|
one way to solve this (in particular SPARC_FLAG_MMAPSHARED).
|
||||||
|
|
||||||
Next, you have to solve the D-cache aliasing issue for all
|
Next, you have to solve the D-cache aliasing issue for all
|
||||||
@@ -255,8 +258,8 @@ physical page into its address space, by implication the D-cache
|
|||||||
aliasing problem has the potential to exist since the kernel already
|
aliasing problem has the potential to exist since the kernel already
|
||||||
maps this page at its virtual address.
|
maps this page at its virtual address.
|
||||||
|
|
||||||
void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)
|
``void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)``
|
||||||
void clear_user_page(void *to, unsigned long addr, struct page *page)
|
``void clear_user_page(void *to, unsigned long addr, struct page *page)``
|
||||||
|
|
||||||
These two routines store data in user anonymous or COW
|
These two routines store data in user anonymous or COW
|
||||||
pages. It allows a port to efficiently avoid D-cache alias
|
pages. It allows a port to efficiently avoid D-cache alias
|
||||||
@@ -276,14 +279,16 @@ maps this page at its virtual address.
|
|||||||
If D-cache aliasing is not an issue, these two routines may
|
If D-cache aliasing is not an issue, these two routines may
|
||||||
simply call memcpy/memset directly and do nothing more.
|
simply call memcpy/memset directly and do nothing more.
|
||||||
|
|
||||||
void flush_dcache_page(struct page *page)
|
``void flush_dcache_page(struct page *page)``
|
||||||
|
|
||||||
Any time the kernel writes to a page cache page, _OR_
|
Any time the kernel writes to a page cache page, _OR_
|
||||||
the kernel is about to read from a page cache page and
|
the kernel is about to read from a page cache page and
|
||||||
user space shared/writable mappings of this page potentially
|
user space shared/writable mappings of this page potentially
|
||||||
exist, this routine is called.
|
exist, this routine is called.
|
||||||
|
|
||||||
NOTE: This routine need only be called for page cache pages
|
.. note::
|
||||||
|
|
||||||
|
This routine need only be called for page cache pages
|
||||||
which can potentially ever be mapped into the address
|
which can potentially ever be mapped into the address
|
||||||
space of a user process. So for example, VFS layer code
|
space of a user process. So for example, VFS layer code
|
||||||
handling vfs symlinks in the page cache need not call
|
handling vfs symlinks in the page cache need not call
|
||||||
@@ -322,18 +327,19 @@ maps this page at its virtual address.
|
|||||||
made of this flag bit, and if set the flush is done and the flag
|
made of this flag bit, and if set the flush is done and the flag
|
||||||
bit is cleared.
|
bit is cleared.
|
||||||
|
|
||||||
IMPORTANT NOTE: It is often important, if you defer the flush,
|
.. important::
|
||||||
|
|
||||||
|
It is often important, if you defer the flush,
|
||||||
that the actual flush occurs on the same CPU
|
that the actual flush occurs on the same CPU
|
||||||
as did the cpu stores into the page to make it
|
as did the cpu stores into the page to make it
|
||||||
dirty. Again, see sparc64 for examples of how
|
dirty. Again, see sparc64 for examples of how
|
||||||
to deal with this.
|
to deal with this.
|
||||||
|
|
||||||
void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
|
``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
|
||||||
unsigned long user_vaddr,
|
unsigned long user_vaddr, void *dst, void *src, int len)``
|
||||||
void *dst, void *src, int len)
|
``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
|
||||||
void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
|
unsigned long user_vaddr, void *dst, void *src, int len)``
|
||||||
unsigned long user_vaddr,
|
|
||||||
void *dst, void *src, int len)
|
|
||||||
When the kernel needs to copy arbitrary data in and out
|
When the kernel needs to copy arbitrary data in and out
|
||||||
of arbitrary user pages (f.e. for ptrace()) it will use
|
of arbitrary user pages (f.e. for ptrace()) it will use
|
||||||
these two routines.
|
these two routines.
|
||||||
@@ -344,8 +350,9 @@ maps this page at its virtual address.
|
|||||||
likely that you will need to flush the instruction cache
|
likely that you will need to flush the instruction cache
|
||||||
for copy_to_user_page().
|
for copy_to_user_page().
|
||||||
|
|
||||||
void flush_anon_page(struct vm_area_struct *vma, struct page *page,
|
``void flush_anon_page(struct vm_area_struct *vma, struct page *page,
|
||||||
unsigned long vmaddr)
|
unsigned long vmaddr)``
|
||||||
|
|
||||||
When the kernel needs to access the contents of an anonymous
|
When the kernel needs to access the contents of an anonymous
|
||||||
page, it calls this function (currently only
|
page, it calls this function (currently only
|
||||||
get_user_pages()). Note: flush_dcache_page() deliberately
|
get_user_pages()). Note: flush_dcache_page() deliberately
|
||||||
@@ -354,7 +361,8 @@ maps this page at its virtual address.
|
|||||||
architectures). For incoherent architectures, it should flush
|
architectures). For incoherent architectures, it should flush
|
||||||
the cache of the page at vmaddr.
|
the cache of the page at vmaddr.
|
||||||
|
|
||||||
void flush_kernel_dcache_page(struct page *page)
|
``void flush_kernel_dcache_page(struct page *page)``
|
||||||
|
|
||||||
When the kernel needs to modify a user page is has obtained
|
When the kernel needs to modify a user page is has obtained
|
||||||
with kmap, it calls this function after all modifications are
|
with kmap, it calls this function after all modifications are
|
||||||
complete (but before kunmapping it) to bring the underlying
|
complete (but before kunmapping it) to bring the underlying
|
||||||
@@ -366,14 +374,16 @@ maps this page at its virtual address.
|
|||||||
the kernel cache for page (using page_address(page)).
|
the kernel cache for page (using page_address(page)).
|
||||||
|
|
||||||
|
|
||||||
void flush_icache_range(unsigned long start, unsigned long end)
|
``void flush_icache_range(unsigned long start, unsigned long end)``
|
||||||
|
|
||||||
When the kernel stores into addresses that it will execute
|
When the kernel stores into addresses that it will execute
|
||||||
out of (eg when loading modules), this function is called.
|
out of (eg when loading modules), this function is called.
|
||||||
|
|
||||||
If the icache does not snoop stores then this routine will need
|
If the icache does not snoop stores then this routine will need
|
||||||
to flush it.
|
to flush it.
|
||||||
|
|
||||||
void flush_icache_page(struct vm_area_struct *vma, struct page *page)
|
``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``
|
||||||
|
|
||||||
All the functionality of flush_icache_page can be implemented in
|
All the functionality of flush_icache_page can be implemented in
|
||||||
flush_dcache_page and update_mmu_cache. In the future, the hope
|
flush_dcache_page and update_mmu_cache. In the future, the hope
|
||||||
is to remove this interface completely.
|
is to remove this interface completely.
|
||||||
@@ -387,7 +397,8 @@ the kernel trying to do I/O to vmap areas must manually manage
|
|||||||
coherency. It must do this by flushing the vmap range before doing
|
coherency. It must do this by flushing the vmap range before doing
|
||||||
I/O and invalidating it after the I/O returns.
|
I/O and invalidating it after the I/O returns.
|
||||||
|
|
||||||
void flush_kernel_vmap_range(void *vaddr, int size)
|
``void flush_kernel_vmap_range(void *vaddr, int size)``
|
||||||
|
|
||||||
flushes the kernel cache for a given virtual address range in
|
flushes the kernel cache for a given virtual address range in
|
||||||
the vmap area. This is to make sure that any data the kernel
|
the vmap area. This is to make sure that any data the kernel
|
||||||
modified in the vmap range is made visible to the physical
|
modified in the vmap range is made visible to the physical
|
||||||
@@ -395,7 +406,8 @@ I/O and invalidating it after the I/O returns.
|
|||||||
Note that this API does *not* also flush the offset map alias
|
Note that this API does *not* also flush the offset map alias
|
||||||
of the area.
|
of the area.
|
||||||
|
|
||||||
void invalidate_kernel_vmap_range(void *vaddr, int size) invalidates
|
``void invalidate_kernel_vmap_range(void *vaddr, int size) invalidates``
|
||||||
|
|
||||||
the cache for a given virtual address range in the vmap area
|
the cache for a given virtual address range in the vmap area
|
||||||
which prevents the processor from making the cache stale by
|
which prevents the processor from making the cache stale by
|
||||||
speculatively reading data while the I/O was occurring to the
|
speculatively reading data while the I/O was occurring to the
|
||||||
|
|||||||
File diff suppressed because it is too large
@@ -1,9 +1,9 @@
|
|||||||
================
|
================
|
||||||
CIRCULAR BUFFERS
|
Circular Buffers
|
||||||
================
|
================
|
||||||
|
|
||||||
By: David Howells <dhowells@redhat.com>
|
:Author: David Howells <dhowells@redhat.com>
|
||||||
Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
:Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|
||||||
|
|
||||||
|
|
||||||
Linux provides a number of features that can be used to implement circular
|
Linux provides a number of features that can be used to implement circular
|
||||||
@@ -20,7 +20,7 @@ producer and just one consumer. It is possible to handle multiple producers by
|
|||||||
serialising them, and to handle multiple consumers by serialising them.
|
serialising them, and to handle multiple consumers by serialising them.
|
||||||
|
|
||||||
|
|
||||||
Contents:
|
.. Contents:
|
||||||
|
|
||||||
(*) What is a circular buffer?
|
(*) What is a circular buffer?
|
||||||
|
|
||||||
@@ -31,8 +31,8 @@ Contents:
|
|||||||
- The consumer.
|
- The consumer.
|
||||||
|
|
||||||
|
|
||||||
==========================
|
|
||||||
WHAT IS A CIRCULAR BUFFER?
|
What is a circular buffer?
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
First of all, what is a circular buffer? A circular buffer is a buffer of
|
First of all, what is a circular buffer? A circular buffer is a buffer of
|
||||||
@@ -60,9 +60,7 @@ buffer, provided that neither index overtakes the other. The implementer must
|
|||||||
be careful, however, as a region more than one unit in size may wrap the end of
|
be careful, however, as a region more than one unit in size may wrap the end of
|
||||||
the buffer and be broken into two segments.
|
the buffer and be broken into two segments.
|
||||||
|
|
||||||
|
Measuring power-of-2 buffers
|
||||||
============================
|
|
||||||
MEASURING POWER-OF-2 BUFFERS
|
|
||||||
============================
|
============================
|
||||||
|
|
||||||
Calculation of the occupancy or the remaining capacity of an arbitrarily sized
|
Calculation of the occupancy or the remaining capacity of an arbitrarily sized
|
||||||
@@ -71,13 +69,13 @@ modulus (divide) instruction. However, if the buffer is of a power-of-2 size,
|
|||||||
then a much quicker bitwise-AND instruction can be used instead.
|
then a much quicker bitwise-AND instruction can be used instead.
|
||||||
|
|
||||||
Linux provides a set of macros for handling power-of-2 circular buffers. These
|
Linux provides a set of macros for handling power-of-2 circular buffers. These
|
||||||
can be made use of by:
|
can be made use of by::
|
||||||
|
|
||||||
#include <linux/circ_buf.h>
|
#include <linux/circ_buf.h>
|
||||||
|
|
||||||
The macros are:
|
The macros are:
|
||||||
|
|
||||||
(*) Measure the remaining capacity of a buffer:
|
(#) Measure the remaining capacity of a buffer::
|
||||||
|
|
||||||
CIRC_SPACE(head_index, tail_index, buffer_size);
|
CIRC_SPACE(head_index, tail_index, buffer_size);
|
||||||
|
|
||||||
@@ -85,7 +83,7 @@ The macros are:
|
|||||||
can be inserted.
|
can be inserted.
|
||||||
|
|
||||||
|
|
||||||
(*) Measure the maximum consecutive immediate space in a buffer:
|
(#) Measure the maximum consecutive immediate space in a buffer::
|
||||||
|
|
||||||
CIRC_SPACE_TO_END(head_index, tail_index, buffer_size);
|
CIRC_SPACE_TO_END(head_index, tail_index, buffer_size);
|
||||||
|
|
||||||
@@ -94,14 +92,14 @@ The macros are:
|
|||||||
beginning of the buffer.
|
beginning of the buffer.
|
||||||
|
|
||||||
|
|
||||||
(*) Measure the occupancy of a buffer:
|
(#) Measure the occupancy of a buffer::
|
||||||
|
|
||||||
CIRC_CNT(head_index, tail_index, buffer_size);
|
CIRC_CNT(head_index, tail_index, buffer_size);
|
||||||
|
|
||||||
This returns the number of items currently occupying a buffer[2].
|
This returns the number of items currently occupying a buffer[2].
|
||||||
|
|
||||||
|
|
||||||
(*) Measure the non-wrapping occupancy of a buffer:
|
(#) Measure the non-wrapping occupancy of a buffer::
|
||||||
|
|
||||||
CIRC_CNT_TO_END(head_index, tail_index, buffer_size);
|
CIRC_CNT_TO_END(head_index, tail_index, buffer_size);
|
||||||
|
|
||||||
@@ -112,7 +110,7 @@ The macros are:
|
|||||||
Each of these macros will nominally return a value between 0 and buffer_size-1,
|
Each of these macros will nominally return a value between 0 and buffer_size-1,
|
||||||
however:
|
however:
|
||||||
|
|
||||||
[1] CIRC_SPACE*() are intended to be used in the producer. To the producer
|
(1) CIRC_SPACE*() are intended to be used in the producer. To the producer
|
||||||
they will return a lower bound as the producer controls the head index,
|
they will return a lower bound as the producer controls the head index,
|
||||||
but the consumer may still be depleting the buffer on another CPU and
|
but the consumer may still be depleting the buffer on another CPU and
|
||||||
moving the tail index.
|
moving the tail index.
|
||||||
@@ -120,7 +118,7 @@ however:
|
|||||||
To the consumer it will show an upper bound as the producer may be busy
|
To the consumer it will show an upper bound as the producer may be busy
|
||||||
depleting the space.
|
depleting the space.
|
||||||
|
|
||||||
[2] CIRC_CNT*() are intended to be used in the consumer. To the consumer they
|
(2) CIRC_CNT*() are intended to be used in the consumer. To the consumer they
|
||||||
will return a lower bound as the consumer controls the tail index, but the
|
will return a lower bound as the consumer controls the tail index, but the
|
||||||
producer may still be filling the buffer on another CPU and moving the
|
producer may still be filling the buffer on another CPU and moving the
|
||||||
head index.
|
head index.
|
||||||
@@ -128,14 +126,12 @@ however:
|
|||||||
To the producer it will show an upper bound as the consumer may be busy
|
To the producer it will show an upper bound as the consumer may be busy
|
||||||
emptying the buffer.
|
emptying the buffer.
|
||||||
|
|
||||||
[3] To a third party, the order in which the writes to the indices by the
|
(3) To a third party, the order in which the writes to the indices by the
|
||||||
producer and consumer become visible cannot be guaranteed as they are
|
producer and consumer become visible cannot be guaranteed as they are
|
||||||
independent and may be made on different CPUs - so the result in such a
|
independent and may be made on different CPUs - so the result in such a
|
||||||
situation will merely be a guess, and may even be negative.
|
situation will merely be a guess, and may even be negative.
|
||||||
|
|
||||||
|
Using memory barriers with circular buffers
|
||||||
===========================================
|
|
||||||
USING MEMORY BARRIERS WITH CIRCULAR BUFFERS
|
|
||||||
===========================================
|
===========================================
|
||||||
|
|
||||||
By using memory barriers in conjunction with circular buffers, you can avoid
|
By using memory barriers in conjunction with circular buffers, you can avoid
|
||||||
@@ -152,10 +148,10 @@ time, and only one thing should be emptying a buffer at any one time, but the
|
|||||||
two sides can operate simultaneously.
|
two sides can operate simultaneously.
|
||||||
|
|
||||||
|
|
||||||
THE PRODUCER
|
The producer
|
||||||
------------
|
------------
|
||||||
|
|
||||||
The producer will look something like this:
|
The producer will look something like this::
|
||||||
|
|
||||||
spin_lock(&producer_lock);
|
spin_lock(&producer_lock);
|
||||||
|
|
||||||
@@ -193,10 +189,10 @@ ordering between the read of the index indicating that the consumer has
|
|||||||
vacated a given element and the write by the producer to that same element.
|
vacated a given element and the write by the producer to that same element.
|
||||||
|
|
||||||
|
|
||||||
THE CONSUMER
|
The Consumer
|
||||||
------------
|
------------
|
||||||
|
|
||||||
The consumer will look something like this:
|
The consumer will look something like this::
|
||||||
|
|
||||||
spin_lock(&consumer_lock);
|
spin_lock(&consumer_lock);
|
||||||
|
|
||||||
@@ -235,8 +231,7 @@ prevents the compiler from tearing the store, and enforces ordering
|
|||||||
against previous accesses.
|
against previous accesses.
|
||||||
|
|
||||||
|
|
||||||
===============
|
Further reading
|
||||||
FURTHER READING
|
|
||||||
===============
|
===============
|
||||||
|
|
||||||
See also Documentation/memory-barriers.txt for a description of Linux's memory
|
See also Documentation/memory-barriers.txt for a description of Linux's memory
|
||||||
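As a rough user-space illustration of the power-of-2 measuring macros discussed in the circular-buffers hunks above, the sketch below re-defines CIRC_CNT() and CIRC_SPACE() locally, mirroring their definitions in include/linux/circ_buf.h, rather than including the kernel header. The `struct ring` type, the `ring_put()`/`ring_get()` helpers and the 8-slot size are assumptions made for this example only. No memory barriers or locks are shown, so it is only meaningful single-threaded.

```c
#include <assert.h>

/* Local copies of the kernel macros (see include/linux/circ_buf.h). */
#define CIRC_CNT(head, tail, size)   (((head) - (tail)) & ((size) - 1))
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

#define RING_SIZE 8 /* hypothetical size; must be a power of 2 */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0,
	      "RING_SIZE must be a power of 2 for the bitwise-AND trick");

/* Hypothetical single-threaded ring; a real one needs barriers. */
struct ring {
	int head;             /* next slot the producer writes */
	int tail;             /* next slot the consumer reads */
	char buf[RING_SIZE];
};

/* Store one item; returns 0 on success, -1 if the ring is full. */
static int ring_put(struct ring *r, char c)
{
	if (CIRC_SPACE(r->head, r->tail, RING_SIZE) < 1)
		return -1;
	r->buf[r->head] = c;
	r->head = (r->head + 1) & (RING_SIZE - 1);
	return 0;
}

/* Fetch one item; returns 0 on success, -1 if the ring is empty. */
static int ring_get(struct ring *r, char *c)
{
	if (CIRC_CNT(r->head, r->tail, RING_SIZE) < 1)
		return -1;
	*c = r->buf[r->tail];
	r->tail = (r->tail + 1) & (RING_SIZE - 1);
	return 0;
}
```

One slot is always left unused, so an 8-slot ring holds at most 7 items; this is what lets head == tail mean "empty" without a separate counter.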
|
|||||||
@@ -1,12 +1,16 @@
|
|||||||
|
========================
|
||||||
The Common Clk Framework
|
The Common Clk Framework
|
||||||
Mike Turquette <mturquette@ti.com>
|
========================
|
||||||
|
|
||||||
|
:Author: Mike Turquette <mturquette@ti.com>
|
||||||
|
|
||||||
This document endeavours to explain the common clk framework details,
|
This document endeavours to explain the common clk framework details,
|
||||||
and how to port a platform over to this framework. It is not yet a
|
and how to port a platform over to this framework. It is not yet a
|
||||||
detailed explanation of the clock api in include/linux/clk.h, but
|
detailed explanation of the clock api in include/linux/clk.h, but
|
||||||
perhaps someday it will include that information.
|
perhaps someday it will include that information.
|
||||||
|
|
||||||
Part 1 - introduction and interface split
|
Introduction and interface split
|
||||||
|
================================
|
||||||
|
|
||||||
The common clk framework is an interface to control the clock nodes
|
The common clk framework is an interface to control the clock nodes
|
||||||
available on various devices today. This may come in the form of clock
|
available on various devices today. This may come in the form of clock
|
||||||
@@ -35,10 +39,11 @@ is defined in struct clk_foo and pointed to within struct clk_core. This
|
|||||||
allows for easy navigation between the two discrete halves of the common
|
allows for easy navigation between the two discrete halves of the common
|
||||||
clock interface.
|
clock interface.
|
||||||
|
|
||||||
Part 2 - common data structures and api
|
Common data structures and api
|
||||||
|
==============================
|
||||||
|
|
||||||
Below is the common struct clk_core definition from
|
Below is the common struct clk_core definition from
|
||||||
drivers/clk/clk.c, modified for brevity:
|
drivers/clk/clk.c, modified for brevity::
|
||||||
|
|
||||||
struct clk_core {
|
struct clk_core {
|
||||||
const char *name;
|
const char *name;
|
||||||
@@ -59,7 +64,7 @@ struct clk. That api is documented in include/linux/clk.h.
|
|||||||
|
|
||||||
Platforms and devices utilizing the common struct clk_core use the struct
|
Platforms and devices utilizing the common struct clk_core use the struct
|
||||||
clk_ops pointer in struct clk_core to perform the hardware-specific parts of
|
clk_ops pointer in struct clk_core to perform the hardware-specific parts of
|
||||||
the operations defined in clk-provider.h:
|
the operations defined in clk-provider.h::
|
||||||
|
|
||||||
struct clk_ops {
|
struct clk_ops {
|
||||||
int (*prepare)(struct clk_hw *hw);
|
int (*prepare)(struct clk_hw *hw);
|
||||||
@@ -95,12 +100,13 @@ the operations defined in clk-provider.h:
|
|||||||
struct dentry *dentry);
|
struct dentry *dentry);
|
||||||
};
|
};
|
||||||
|
|
||||||
Part 3 - hardware clk implementations
|
Hardware clk implementations
|
||||||
|
============================
|
||||||
|
|
||||||
The strength of the common struct clk_core comes from its .ops and .hw pointers
|
The strength of the common struct clk_core comes from its .ops and .hw pointers
|
||||||
which abstract the details of struct clk from the hardware-specific bits, and
|
which abstract the details of struct clk from the hardware-specific bits, and
|
||||||
vice versa. To illustrate consider the simple gateable clk implementation in
|
vice versa. To illustrate consider the simple gateable clk implementation in
|
||||||
drivers/clk/clk-gate.c:
|
drivers/clk/clk-gate.c::
|
||||||
|
|
||||||
struct clk_gate {
|
struct clk_gate {
|
||||||
struct clk_hw hw;
|
struct clk_hw hw;
|
||||||
@@ -115,7 +121,7 @@ Nothing about clock topology or accounting, such as enable_count or
|
|||||||
notifier_count, is needed here. That is all handled by the common
|
notifier_count, is needed here. That is all handled by the common
|
||||||
framework code and struct clk_core.
|
framework code and struct clk_core.
|
||||||
|
|
||||||
Let's walk through enabling this clk from driver code:
|
Let's walk through enabling this clk from driver code::
|
||||||
|
|
||||||
struct clk *clk;
|
struct clk *clk;
|
||||||
clk = clk_get(NULL, "my_gateable_clk");
|
clk = clk_get(NULL, "my_gateable_clk");
|
||||||
@@ -123,7 +129,7 @@ Let's walk through enabling this clk from driver code:
|
|||||||
clk_prepare(clk);
|
clk_prepare(clk);
|
||||||
clk_enable(clk);
|
clk_enable(clk);
|
||||||
|
|
||||||
The call graph for clk_enable is very simple:
|
The call graph for clk_enable is very simple::
|
||||||
|
|
||||||
clk_enable(clk);
|
clk_enable(clk);
|
||||||
clk->ops->enable(clk->hw);
|
clk->ops->enable(clk->hw);
|
||||||
@@ -132,7 +138,7 @@ clk_enable(clk);
|
|||||||
[resolves struct clk gate with to_clk_gate(hw)]
|
[resolves struct clk gate with to_clk_gate(hw)]
|
||||||
clk_gate_set_bit(gate);
|
clk_gate_set_bit(gate);
|
||||||
|
|
||||||
And the definition of clk_gate_set_bit:
|
And the definition of clk_gate_set_bit::
|
||||||
|
|
||||||
static void clk_gate_set_bit(struct clk_gate *gate)
|
static void clk_gate_set_bit(struct clk_gate *gate)
|
||||||
{
|
{
|
||||||
@@ -143,22 +149,23 @@ static void clk_gate_set_bit(struct clk_gate *gate)
|
|||||||
writel(reg, gate->reg);
|
writel(reg, gate->reg);
|
||||||
}
|
}
|
||||||
|
|
||||||
Note that to_clk_gate is defined as:
|
Note that to_clk_gate is defined as::
|
||||||
|
|
||||||
#define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw)
|
#define to_clk_gate(_hw) container_of(_hw, struct clk_gate, hw)
|
||||||
|
|
||||||
This pattern of abstraction is used for every clock hardware
|
This pattern of abstraction is used for every clock hardware
|
||||||
representation.
|
representation.
|
||||||
|
|
||||||
Part 4 - supporting your own clk hardware
|
Supporting your own clk hardware
|
||||||
|
================================
|
||||||
|
|
||||||
When implementing support for a new type of clock it is only necessary to
|
When implementing support for a new type of clock it is only necessary to
|
||||||
include the following header:
|
include the following header::
|
||||||
|
|
||||||
#include <linux/clk-provider.h>
|
#include <linux/clk-provider.h>
|
||||||
|
|
||||||
To construct a clk hardware structure for your platform you must define
|
To construct a clk hardware structure for your platform you must define
|
||||||
the following:
|
the following::
|
||||||
|
|
||||||
struct clk_foo {
|
struct clk_foo {
|
||||||
struct clk_hw hw;
|
struct clk_hw hw;
|
||||||
@@ -166,14 +173,14 @@ struct clk_foo {
|
|||||||
};
|
};
|
||||||
|
|
||||||
To take advantage of your data you'll need to support valid operations
|
To take advantage of your data you'll need to support valid operations
|
||||||
for your clk:
|
for your clk::
|
||||||
|
|
||||||
struct clk_ops clk_foo_ops {
|
struct clk_ops clk_foo_ops {
|
||||||
.enable = &clk_foo_enable;
|
.enable = &clk_foo_enable;
|
||||||
.disable = &clk_foo_disable;
|
.disable = &clk_foo_disable;
|
||||||
};
|
};
|
||||||
|
|
||||||
Implement the above functions using container_of:
|
Implement the above functions using container_of::
|
||||||
|
|
||||||
#define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw)
|
#define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw)
|
||||||
|
|
||||||
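The container_of pattern used by to_clk_foo() can be tried outside the kernel with a minimal sketch like the one below. Everything here is a stand-in: `struct clk_hw` and `struct clk_ops` are reduced to the bare minimum, `container_of()` is re-defined locally with offsetof(), and the `enabled` field is a hypothetical substitute for a hardware gate bit. It is meant to show the shape of the pattern, not the framework's actual implementation.

```c
#include <stddef.h>

/* User-space stand-in for the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal stand-ins; the real types live in clk-provider.h. */
struct clk_hw {
	int unused;
};

struct clk_ops {
	int  (*enable)(struct clk_hw *hw);
	void (*disable)(struct clk_hw *hw);
};

/* Hypothetical hardware-specific clock wrapping the generic clk_hw. */
struct clk_foo {
	struct clk_hw hw;
	int enabled;          /* stands in for a hardware gate bit */
};

#define to_clk_foo(_hw) container_of(_hw, struct clk_foo, hw)

static int clk_foo_enable(struct clk_hw *hw)
{
	struct clk_foo *foo = to_clk_foo(hw);

	foo->enabled = 1;
	return 0;
}

static void clk_foo_disable(struct clk_hw *hw)
{
	struct clk_foo *foo = to_clk_foo(hw);

	foo->enabled = 0;
}

/* Designated initializers: '=' after the name, ',' between members. */
static const struct clk_ops clk_foo_ops = {
	.enable  = clk_foo_enable,
	.disable = clk_foo_disable,
};
```

The callbacks only ever receive a `struct clk_hw *`; recovering the enclosing `struct clk_foo` from it is what keeps the generic layer unaware of the hardware-specific data.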
@@ -194,41 +201,56 @@ mandatory, a cell marked as "n" implies that either including that
|
|||||||
callback is invalid or otherwise unnecessary. Empty cells are either
|
callback is invalid or otherwise unnecessary. Empty cells are either
|
||||||
optional or must be evaluated on a case-by-case basis.
|
optional or must be evaluated on a case-by-case basis.
|
||||||
|
|
||||||
clock hardware characteristics
|
.. table:: clock hardware characteristics
|
||||||
-----------------------------------------------------------
|
|
||||||
| gate | change rate | single parent | multiplexer | root |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|------|-------------|---------------|-------------|------|
|
| | gate | change rate | single parent | multiplexer | root |
|
||||||
.prepare | | | | | |
|
+================+======+=============+===============+=============+======+
|
||||||
.unprepare | | | | | |
|
|.prepare | | | | | |
|
||||||
| | | | | |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
.enable | y | | | | |
|
|.unprepare | | | | | |
|
||||||
.disable | y | | | | |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
.is_enabled | y | | | | |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
| | | | | |
|
|.enable | y | | | | |
|
||||||
.recalc_rate | | y | | | |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
.round_rate | | y [1] | | | |
|
|.disable | y | | | | |
|
||||||
.determine_rate | | y [1] | | | |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
.set_rate | | y | | | |
|
|.is_enabled | y | | | | |
|
||||||
| | | | | |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
.set_parent | | | n | y | n |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
.get_parent | | | n | y | n |
|
|.recalc_rate | | y | | | |
|
||||||
| | | | | |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
.recalc_accuracy| | | | | |
|
|.round_rate | | y [1]_ | | | |
|
||||||
| | | | | |
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
.init | | | | | |
|
|.determine_rate | | y [1]_ | | | |
|
||||||
-----------------------------------------------------------
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
[1] either one of round_rate or determine_rate is required.
|
|.set_rate | | y | | | |
|
||||||
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|
|.set_parent | | | n | y | n |
|
||||||
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|
|.get_parent | | | n | y | n |
|
||||||
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|
|.recalc_accuracy| | | | | |
|
||||||
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|
|.init | | | | | |
|
||||||
|
+----------------+------+-------------+---------------+-------------+------+
|
||||||
|
|
||||||
|
.. [1] either one of round_rate or determine_rate is required.
|
||||||
|
|
||||||
Finally, register your clock at run-time with a hardware-specific
|
Finally, register your clock at run-time with a hardware-specific
|
||||||
registration function. This function simply populates struct clk_foo's
|
registration function. This function simply populates struct clk_foo's
|
||||||
data and then passes the common struct clk parameters to the framework
|
data and then passes the common struct clk parameters to the framework
|
||||||
with a call to:
|
with a call to::
|
||||||
|
|
||||||
clk_register(...)
|
clk_register(...)
|
||||||
|
|
||||||
See the basic clock types in drivers/clk/clk-*.c for examples.
|
See the basic clock types in ``drivers/clk/clk-*.c`` for examples.
|
||||||
|
|
||||||
Part 5 - Disabling clock gating of unused clocks
|
Disabling clock gating of unused clocks
|
||||||
|
=======================================
|
||||||
|
|
||||||
Sometimes during development it can be useful to be able to bypass the
|
Sometimes during development it can be useful to be able to bypass the
|
||||||
default disabling of unused clocks. For example, if drivers aren't enabling
|
default disabling of unused clocks. For example, if drivers aren't enabling
|
||||||
@@ -239,7 +261,8 @@ are sorted out.
|
|||||||
To bypass this disabling, include "clk_ignore_unused" in the bootargs to the
|
To bypass this disabling, include "clk_ignore_unused" in the bootargs to the
|
||||||
kernel.
|
kernel.
|
||||||
|
|
||||||
Part 6 - Locking
|
Locking
|
||||||
|
=======
|
||||||
|
|
||||||
The common clock framework uses two global locks, the prepare lock and the
|
The common clock framework uses two global locks, the prepare lock and the
|
||||||
enable lock.
|
enable lock.
|
||||||
|
|||||||
@@ -114,7 +114,7 @@ The Slab Cache
|
|||||||
User Space Memory Access
|
User Space Memory Access
|
||||||
------------------------
|
------------------------
|
||||||
|
|
||||||
.. kernel-doc:: arch/x86/include/asm/uaccess_32.h
|
.. kernel-doc:: arch/x86/include/asm/uaccess.h
|
||||||
:internal:
|
:internal:
|
||||||
|
|
||||||
.. kernel-doc:: arch/x86/lib/usercopy_32.c
|
.. kernel-doc:: arch/x86/lib/usercopy_32.c
|
||||||
|
|||||||
@@ -1,9 +1,10 @@
|
|||||||
|
========
|
||||||
CPU load
|
CPU load
|
||||||
--------
|
========
|
||||||
|
|
||||||
Linux exports various bits of information via `/proc/stat' and
|
Linux exports various bits of information via ``/proc/stat`` and
|
||||||
`/proc/uptime' that userland tools, such as top(1), use to calculate
|
``/proc/uptime`` that userland tools, such as top(1), use to calculate
|
||||||
the average time system spent in a particular state, for example:
|
the average time system spent in a particular state, for example::
|
||||||
|
|
||||||
$ iostat
|
$ iostat
|
||||||
Linux 2.6.18.3-exp (linmac) 02/20/2007
|
Linux 2.6.18.3-exp (linmac) 02/20/2007
|
||||||
@@ -17,7 +18,7 @@ Here the system thinks that over the default sampling period the
|
|||||||
system spent 10.01% of the time doing work in user space, 2.92% in the
|
system spent 10.01% of the time doing work in user space, 2.92% in the
|
||||||
kernel, and was overall 81.63% of the time idle.
|
kernel, and was overall 81.63% of the time idle.
|
||||||
|
|
||||||
In most cases the `/proc/stat' information reflects the reality quite
|
In most cases the ``/proc/stat`` information reflects the reality quite
|
||||||
closely, however due to the nature of how/when the kernel collects
|
closely, however due to the nature of how/when the kernel collects
|
||||||
this data sometimes it can not be trusted at all.
|
this data sometimes it can not be trusted at all.
|
||||||
|
|
||||||
@@ -33,7 +34,7 @@ Example
|
|||||||
-------
|
-------
|
||||||
|
|
||||||
If we imagine the system with one task that periodically burns cycles
|
If we imagine the system with one task that periodically burns cycles
|
||||||
in the following manner:
|
in the following manner::
|
||||||
|
|
||||||
time line between two timer interrupts
|
time line between two timer interrupts
|
||||||
|--------------------------------------|
|
|--------------------------------------|
|
||||||
@@ -43,12 +44,12 @@ in the following manner:
|
|||||||
(only to be awaken quite soon)
|
(only to be awaken quite soon)
|
||||||
|
|
||||||
In the above situation the system will be 0% loaded according to the
|
In the above situation the system will be 0% loaded according to the
|
||||||
`/proc/stat' (since the timer interrupt will always happen when the
|
``/proc/stat`` (since the timer interrupt will always happen when the
|
||||||
system is executing the idle handler), but in reality the load is
|
system is executing the idle handler), but in reality the load is
|
||||||
closer to 99%.
|
closer to 99%.
|
||||||
|
|
||||||
One can imagine many more situations where this behavior of the kernel
|
One can imagine many more situations where this behavior of the kernel
|
||||||
will lead to quite erratic information inside `/proc/stat'.
|
will lead to quite erratic information inside ``/proc/stat``::
|
||||||
|
|
||||||
|
|
||||||
/* gcc -o hog smallhog.c */
|
/* gcc -o hog smallhog.c */
|
||||||
@@ -103,8 +104,8 @@ int main (void)
|
|||||||
References
|
References
|
||||||
----------
|
----------
|
||||||
|
|
||||||
http://lkml.org/lkml/2007/2/12/6
|
- http://lkml.org/lkml/2007/2/12/6
|
||||||
Documentation/filesystems/proc.txt (1.8)
|
- Documentation/filesystems/proc.txt (1.8)
|
||||||
|
|
||||||
|
|
||||||
Thanks
|
Thanks
|
||||||
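The sampling effect described in the CPU load hunks above can be reproduced with a toy model. The sketch below assumes a simplified world that is not taken from the kernel: time is split into fixed periods, the timer tick samples only at offset 0 of each period, and a single task runs during offsets [busy_start, busy_end) of every period. The `struct load` type and `model_load()` function are hypothetical names for this illustration.

```c
/* Result of the toy model, both values in percent. */
struct load {
	int sampled_pct;      /* load as seen by tick-based sampling */
	int actual_pct;       /* load actually consumed by the task */
};

/*
 * Hypothetical model: the tick fires at offset 0 of each period and
 * charges the whole period to whatever is running at that instant.
 */
static struct load model_load(int periods, int period,
			      int busy_start, int busy_end)
{
	int busy_ticks = 0;
	long busy_time = 0;
	struct load l;

	for (int p = 0; p < periods; p++) {
		/* The tick only sees what runs at offset 0. */
		if (busy_start == 0 && busy_end > 0)
			busy_ticks++;
		busy_time += busy_end - busy_start;
	}

	l.sampled_pct = 100 * busy_ticks / periods;
	l.actual_pct = (int)(100 * busy_time / ((long)periods * period));
	return l;
}
```

With a period of 100 units and a task that runs from offset 1 to offset 100, the model reports a sampled load of 0% against an actual load of 99%, which is the discrepancy the text describes.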
|
|||||||
@@ -1,3 +1,6 @@
|
|||||||
|
===========================================
|
||||||
|
How CPU topology info is exported via sysfs
|
||||||
|
===========================================
|
||||||
|
|
||||||
Export CPU topology info via sysfs. Items (attributes) are similar
|
Export CPU topology info via sysfs. Items (attributes) are similar
|
||||||
to /proc/cpuinfo output of some architectures:
|
to /proc/cpuinfo output of some architectures:
|
||||||
@@ -75,7 +78,8 @@ CONFIG_SCHED_BOOK and CONFIG_DRAWER are currently only used on s390, where
|
|||||||
they reflect the cpu and cache hierarchy.
|
they reflect the cpu and cache hierarchy.
|
||||||
|
|
||||||
For an architecture to support this feature, it must define some of
|
For an architecture to support this feature, it must define some of
|
||||||
these macros in include/asm-XXX/topology.h:
|
these macros in include/asm-XXX/topology.h::
|
||||||
|
|
||||||
#define topology_physical_package_id(cpu)
|
#define topology_physical_package_id(cpu)
|
||||||
#define topology_core_id(cpu)
|
#define topology_core_id(cpu)
|
||||||
#define topology_book_id(cpu)
|
#define topology_book_id(cpu)
|
||||||
@@ -85,14 +89,15 @@ these macros in include/asm-XXX/topology.h:
|
|||||||
#define topology_book_cpumask(cpu)
|
#define topology_book_cpumask(cpu)
|
||||||
#define topology_drawer_cpumask(cpu)
|
#define topology_drawer_cpumask(cpu)
|
||||||
|
|
||||||
The type of **_id macros is int.
|
The type of ``**_id macros`` is int.
|
||||||
The type of **_cpumask macros is (const) struct cpumask *. The latter
|
The type of ``**_cpumask macros`` is ``(const) struct cpumask *``. The latter
|
||||||
correspond with appropriate **_siblings sysfs attributes (except for
|
correspond with appropriate ``**_siblings`` sysfs attributes (except for
|
||||||
topology_sibling_cpumask() which corresponds with thread_siblings).
|
topology_sibling_cpumask() which corresponds with thread_siblings).
|
||||||
|
|
||||||
To be consistent on all architectures, include/linux/topology.h
|
To be consistent on all architectures, include/linux/topology.h
|
||||||
provides default definitions for any of the above macros that are
|
provides default definitions for any of the above macros that are
|
||||||
not defined by include/asm-XXX/topology.h:
|
not defined by include/asm-XXX/topology.h:
|
||||||
|
|
||||||
1) physical_package_id: -1
|
1) physical_package_id: -1
|
||||||
2) core_id: 0
|
2) core_id: 0
|
||||||
3) sibling_cpumask: just the given CPU
|
3) sibling_cpumask: just the given CPU
|
||||||
@@ -107,6 +112,7 @@ Additionally, CPU topology information is provided under
|
|||||||
/sys/devices/system/cpu and includes these files. The internal
|
/sys/devices/system/cpu and includes these files. The internal
|
||||||
source for the output is in brackets ("[]").
|
source for the output is in brackets ("[]").
|
||||||
|
|
||||||
|
=========== ==========================================================
|
||||||
kernel_max: the maximum CPU index allowed by the kernel configuration.
|
kernel_max: the maximum CPU index allowed by the kernel configuration.
|
||||||
[NR_CPUS-1]
|
[NR_CPUS-1]
|
||||||
|
|
||||||
@@ -122,6 +128,7 @@ source for the output is in brackets ("[]").
|
|||||||
|
|
||||||
present: CPUs that have been identified as being present in the
|
present: CPUs that have been identified as being present in the
|
||||||
system. [cpu_present_mask]
|
system. [cpu_present_mask]
|
||||||
|
=========== ==========================================================
|
||||||
|
|
||||||
The format for the above output is compatible with cpulist_parse()
|
The format for the above output is compatible with cpulist_parse()
|
||||||
[see <linux/cpumask.h>]. Some examples follow.
|
[see <linux/cpumask.h>]. Some examples follow.
|
||||||
@@ -129,7 +136,7 @@ The format for the above output is compatible with cpulist_parse()
|
|||||||
In this example, there are 64 CPUs in the system but cpus 32-63 exceed
|
In this example, there are 64 CPUs in the system but cpus 32-63 exceed
|
||||||
the kernel max which is limited to 0..31 by the NR_CPUS config option
|
the kernel max which is limited to 0..31 by the NR_CPUS config option
|
||||||
being 32. Note also that CPUs 2 and 4-31 are not online but could be
|
being 32. Note also that CPUs 2 and 4-31 are not online but could be
|
||||||
brought online as they are both present and possible.
|
brought online as they are both present and possible::
|
||||||
|
|
||||||
kernel_max: 31
|
kernel_max: 31
|
||||||
offline: 2,4-31,32-63
|
offline: 2,4-31,32-63
|
||||||
@@ -140,7 +147,7 @@ brought online as they are both present and possible.
|
|||||||
In this example, the NR_CPUS config option is 128, but the kernel was
|
In this example, the NR_CPUS config option is 128, but the kernel was
|
||||||
started with possible_cpus=144. There are 4 CPUs in the system and cpu2
|
started with possible_cpus=144. There are 4 CPUs in the system and cpu2
|
||||||
was manually taken offline (and is the only CPU that can be brought
|
was manually taken offline (and is the only CPU that can be brought
|
||||||
online.)
|
online.)::
|
||||||
|
|
||||||
kernel_max: 127
|
kernel_max: 127
|
||||||
offline: 2,4-127,128-143
|
offline: 2,4-127,128-143
|
||||||
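The `kernel_max`/`offline` examples above use the comma-and-range list format that the text says is compatible with cpulist_parse(). As a rough illustration of how a userspace tool might expand such a string, here is a minimal sketch. `parse_cpulist` is a hypothetical helper written for this note, not a kernel or library function, and it assumes only plain numbers and `a-b` ranges appear (no other syntax is handled).

```python
def parse_cpulist(text):
    """Expand a list such as "2,4-31,32-63" into a sorted list of CPU numbers.

    Hypothetical helper: handles only single numbers and "first-last" ranges.
    """
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # tolerate an empty string (no CPUs listed)
        first, sep, last = chunk.partition("-")
        lo = int(first)
        hi = int(last) if sep else lo  # a lone number is a one-element range
        if hi < lo:
            raise ValueError(f"descending range: {chunk!r}")
        cpus.update(range(lo, hi + 1))
    return sorted(cpus)


if __name__ == "__main__":
    # The "offline" value from the first example in the text.
    offline = parse_cpulist("2,4-31,32-63")
    print(len(offline), "CPUs offline, first few:", offline[:4])
```

Collecting into a set before sorting means overlapping or adjacent ranges (such as `4-31,32-63`) need no special handling.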
|
|||||||
@@ -1,4 +1,6 @@
|
|||||||
A brief CRC tutorial.
|
=================================
|
||||||
|
brief tutorial on CRC computation
|
||||||
|
=================================
|
||||||
|
|
||||||
A CRC is a long-division remainder. You add the CRC to the message,
|
A CRC is a long-division remainder. You add the CRC to the message,
|
||||||
and the whole thing (message+CRC) is a multiple of the given
|
and the whole thing (message+CRC) is a multiple of the given
|
||||||
@@ -8,7 +10,8 @@ remainder computed on the message+CRC is 0. This latter approach
|
|||||||
is used by a lot of hardware implementations, and is why so many
|
is used by a lot of hardware implementations, and is why so many
|
||||||
protocols put the end-of-frame flag after the CRC.
|
protocols put the end-of-frame flag after the CRC.
|
||||||
|
|
||||||
It's actually the same long division you learned in school, except that
|
It's actually the same long division you learned in school, except that:
|
||||||
|
|
||||||
- We're working in binary, so the digits are only 0 and 1, and
|
- We're working in binary, so the digits are only 0 and 1, and
|
||||||
- When dividing polynomials, there are no carries. Rather than add and
|
- When dividing polynomials, there are no carries. Rather than add and
|
||||||
subtract, we just xor. Thus, we tend to get a bit sloppy about
|
subtract, we just xor. Thus, we tend to get a bit sloppy about
|
||||||
@@ -40,7 +43,8 @@ throw the quotient bit away, but subtract the appropriate multiple of
|
|||||||
the polynomial from the remainder and we're back to where we started,
|
the polynomial from the remainder and we're back to where we started,
|
||||||
ready to process the next bit.
|
ready to process the next bit.
|
||||||
|
|
||||||
A big-endian CRC written this way would be coded like:
|
A big-endian CRC written this way would be coded like::
|
||||||
|
|
||||||
for (i = 0; i < input_bits; i++) {
|
for (i = 0; i < input_bits; i++) {
|
||||||
multiple = remainder & 0x80000000 ? CRCPOLY : 0;
|
multiple = remainder & 0x80000000 ? CRCPOLY : 0;
|
||||||
remainder = (remainder << 1 | next_input_bit()) ^ multiple;
|
remainder = (remainder << 1 | next_input_bit()) ^ multiple;
|
||||||
@@ -54,12 +58,12 @@ the remainder don't actually affect any decision-making until
|
|||||||
32 bits later. Thus, the first 32 cycles of this are pretty boring.
|
32 bits later. Thus, the first 32 cycles of this are pretty boring.
|
||||||
Also, to add the CRC to a message, we need a 32-bit-long hole for it at
|
Also, to add the CRC to a message, we need a 32-bit-long hole for it at
|
||||||
the end, so we have to add 32 extra cycles shifting in zeros at the
|
the end, so we have to add 32 extra cycles shifting in zeros at the
|
||||||
end of every message,
|
end of every message.
|
||||||
|
|
||||||
These details lead to a standard trick: rearrange merging in the
|
These details lead to a standard trick: rearrange merging in the
|
||||||
next_input_bit() until the moment it's needed. Then the first 32 cycles
|
next_input_bit() until the moment it's needed. Then the first 32 cycles
|
||||||
can be precomputed, and merging in the final 32 zero bits to make room
|
can be precomputed, and merging in the final 32 zero bits to make room
|
||||||
for the CRC can be skipped entirely. This changes the code to:
|
for the CRC can be skipped entirely. This changes the code to::
|
||||||
|
|
||||||
for (i = 0; i < input_bits; i++) {
|
for (i = 0; i < input_bits; i++) {
|
||||||
remainder ^= next_input_bit() << 31;
|
remainder ^= next_input_bit() << 31;
|
||||||
@@ -67,7 +71,8 @@ for (i = 0; i < input_bits; i++) {
|
|||||||
remainder = (remainder << 1) ^ multiple;
|
remainder = (remainder << 1) ^ multiple;
|
||||||
}
|
}
|
||||||
|
|
||||||
With this optimization, the little-endian code is particularly simple:
|
With this optimization, the little-endian code is particularly simple::
|
||||||
|
|
||||||
for (i = 0; i < input_bits; i++) {
|
for (i = 0; i < input_bits; i++) {
|
||||||
remainder ^= next_input_bit();
|
remainder ^= next_input_bit();
|
||||||
multiple = (remainder & 1) ? CRCPOLY : 0;
|
multiple = (remainder & 1) ? CRCPOLY : 0;
|
||||||
@@ -81,7 +86,8 @@ be bit-reversed) and next_input_bit().
|
|||||||
|
|
||||||
As long as next_input_bit is returning the bits in a sensible order, we don't
|
As long as next_input_bit is returning the bits in a sensible order, we don't
|
||||||
*have* to wait until the last possible moment to merge in additional bits.
|
*have* to wait until the last possible moment to merge in additional bits.
|
||||||
We can do it 8 bits at a time rather than 1 bit at a time:
|
We can do it 8 bits at a time rather than 1 bit at a time::
|
||||||
|
|
||||||
for (i = 0; i < input_bytes; i++) {
|
for (i = 0; i < input_bytes; i++) {
|
||||||
remainder ^= next_input_byte() << 24;
|
remainder ^= next_input_byte() << 24;
|
||||||
for (j = 0; j < 8; j++) {
|
for (j = 0; j < 8; j++) {
|
||||||
@@ -90,7 +96,8 @@ for (i = 0; i < input_bytes; i++) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
Or in little-endian:
|
Or in little-endian::
|
||||||
|
|
||||||
for (i = 0; i < input_bytes; i++) {
|
for (i = 0; i < input_bytes; i++) {
|
||||||
remainder ^= next_input_byte();
|
remainder ^= next_input_byte();
|
||||||
for (j = 0; j < 8; j++) {
|
for (j = 0; j < 8; j++) {
|
||||||
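The big-endian loops quoted in this hunk can be checked against each other outside the kernel. The following is a minimal sketch, not the kernel's implementation: it assumes `CRCPOLY` is the common CRC-32 polynomial 0x04C11DB7, starts from a zero remainder, and applies no final inversion. The function names are made up for this note. It transcribes the one-bit-at-a-time and eight-bits-at-a-time variants and illustrates the opening claim that message+CRC leaves a remainder of 0.

```python
CRCPOLY_BE = 0x04C11DB7  # assumption: standard CRC-32 polynomial, MSB-first
MASK32 = 0xFFFFFFFF      # Python ints are unbounded, so truncate to 32 bits


def crc32_be_bitwise(data, remainder=0):
    """Merge one input bit per step into the top bit of the remainder."""
    for byte in data:
        for bit in range(7, -1, -1):  # most significant bit first
            remainder ^= ((byte >> bit) & 1) << 31
            multiple = CRCPOLY_BE if remainder & 0x80000000 else 0
            remainder = ((remainder << 1) & MASK32) ^ multiple
    return remainder


def crc32_be_bytewise(data, remainder=0):
    """Merge eight input bits at once, then run eight shift steps."""
    for byte in data:
        remainder ^= byte << 24
        for _ in range(8):
            multiple = CRCPOLY_BE if remainder & 0x80000000 else 0
            remainder = ((remainder << 1) & MASK32) ^ multiple
    return remainder


if __name__ == "__main__":
    msg = b"123456789"
    crc = crc32_be_bytewise(msg)
    framed = msg + crc.to_bytes(4, "big")  # append the CRC, MSB first
    print(hex(crc), crc32_be_bitwise(msg) == crc, crc32_be_bytewise(framed))
```

The two functions differ only in when the input bits are XORed in. Because every step is linear over GF(2), merging a whole byte early gives the same remainder as merging its bits one at a time.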
|
|||||||
@@ -10,6 +10,7 @@ Contents:
|
|||||||
- Signature verification.
|
- Signature verification.
|
||||||
- Asymmetric key subtypes.
|
- Asymmetric key subtypes.
|
||||||
- Instantiation data parsers.
|
- Instantiation data parsers.
|
||||||
|
- Keyring link restrictions.
|
||||||
|
|
||||||
|
|
||||||
========
|
========
|
||||||
@@ -318,7 +319,8 @@ KEYRING LINK RESTRICTIONS
|
|||||||
=========================
|
=========================
|
||||||
|
|
||||||
Keyrings created from userspace using add_key can be configured to check the
|
Keyrings created from userspace using add_key can be configured to check the
|
||||||
signature of the key being linked.
|
signature of the key being linked. Keys without a valid signature are not
|
||||||
|
allowed to link.
|
||||||
|
|
||||||
Several restriction methods are available:
|
Several restriction methods are available:
|
||||||
|
|
||||||
@@ -327,9 +329,10 @@ Several restriction methods are available:
|
|||||||
- Option string used with KEYCTL_RESTRICT_KEYRING:
|
- Option string used with KEYCTL_RESTRICT_KEYRING:
|
||||||
- "builtin_trusted"
|
- "builtin_trusted"
|
||||||
|
|
||||||
The kernel builtin trusted keyring will be searched for the signing
|
The kernel builtin trusted keyring will be searched for the signing key.
|
||||||
key. The ca_keys kernel parameter also affects which keys are used for
|
If the builtin trusted keyring is not configured, all links will be
|
||||||
signature verification.
|
rejected. The ca_keys kernel parameter also affects which keys are used
|
||||||
|
for signature verification.
|
||||||
|
|
||||||
(2) Restrict using the kernel builtin and secondary trusted keyrings
|
(2) Restrict using the kernel builtin and secondary trusted keyrings
|
||||||
|
|
||||||
@@ -337,8 +340,10 @@ Several restriction methods are available:
|
|||||||
- "builtin_and_secondary_trusted"
|
- "builtin_and_secondary_trusted"
|
||||||
|
|
||||||
The kernel builtin and secondary trusted keyrings will be searched for the
|
The kernel builtin and secondary trusted keyrings will be searched for the
|
||||||
signing key. The ca_keys kernel parameter also affects which keys are used
|
signing key. If the secondary trusted keyring is not configured, this
|
||||||
for signature verification.
|
restriction will behave like the "builtin_trusted" option. The ca_keys
|
||||||
|
kernel parameter also affects which keys are used for signature
|
||||||
|
verification.
|
||||||
|
|
||||||
(3) Restrict using a separate key or keyring
|
(3) Restrict using a separate key or keyring
|
||||||
|
|
||||||
@@ -354,7 +359,51 @@ Several restriction methods are available:
|
|||||||
When the "chain" option is provided at the end of the string, the keys
|
When the "chain" option is provided at the end of the string, the keys
|
||||||
within the destination keyring will also be searched for signing keys.
|
within the destination keyring will also be searched for signing keys.
|
||||||
This allows for verification of certificate chains by adding each
|
This allows for verification of certificate chains by adding each
|
||||||
cert in order (starting closest to the root) to one keyring.
|
certificate in order (starting closest to the root) to a keyring. For
|
||||||
|
instance, one keyring can be populated with links to a set of root
|
||||||
|
certificates, with a separate, restricted keyring set up for each
|
||||||
|
certificate chain to be validated:
|
||||||
|
|
||||||
|
# Create and populate a keyring for root certificates
|
||||||
|
root_id=`keyctl add keyring root-certs "" @s`
|
||||||
|
keyctl padd asymmetric "" $root_id < root1.cert
|
||||||
|
keyctl padd asymmetric "" $root_id < root2.cert
|
||||||
|
|
||||||
|
# Create and restrict a keyring for the certificate chain
|
||||||
|
chain_id=`keyctl add keyring chain "" @s`
|
||||||
|
keyctl restrict_keyring $chain_id asymmetric key_or_keyring:$root_id:chain
|
||||||
|
|
||||||
|
# Attempt to add each certificate in the chain, starting with the
|
||||||
|
# certificate closest to the root.
|
||||||
|
keyctl padd asymmetric "" $chain_id < intermediateA.cert
|
||||||
|
keyctl padd asymmetric "" $chain_id < intermediateB.cert
|
||||||
|
keyctl padd asymmetric "" $chain_id < end-entity.cert
|
||||||
|
|
||||||
|
If the final end-entity certificate is successfully added to the "chain"
|
||||||
|
keyring, we can be certain that it has a valid signing chain going back to
|
||||||
|
one of the root certificates.
|
||||||
|
|
||||||
|
A single keyring can be used to verify a chain of signatures by
|
||||||
|
restricting the keyring after linking the root certificate:
|
||||||
|
|
||||||
|
# Create a keyring for the certificate chain and add the root
|
||||||
|
chain2_id=`keyctl add keyring chain2 "" @s`
|
||||||
|
keyctl padd asymmetric "" $chain2_id < root1.cert
|
||||||
|
|
||||||
|
# Restrict the keyring that already has root1.cert linked. The cert
|
||||||
|
# will remain linked by the keyring.
|
||||||
|
keyctl restrict_keyring $chain2_id asymmetric key_or_keyring:0:chain
|
||||||
|
|
||||||
|
# Attempt to add each certificate in the chain, starting with the
|
||||||
|
# certificate closest to the root.
|
||||||
|
keyctl padd asymmetric "" $chain2_id < intermediateA.cert
|
||||||
|
keyctl padd asymmetric "" $chain2_id < intermediateB.cert
|
||||||
|
keyctl padd asymmetric "" $chain2_id < end-entity.cert
|
||||||
|
|
||||||
|
If the final end-entity certificate is successfully added to the "chain2"
|
||||||
|
keyring, we can be certain that there is a valid signing chain going back
|
||||||
|
to the root certificate that was added before the keyring was restricted.
|
||||||
|
|
||||||
|
|
||||||
In all of these cases, if the signing key is found the signature of the key to
|
In all of these cases, if the signing key is found the signature of the key to
|
||||||
be linked will be verified using the signing key. The requested key is added
|
be linked will be verified using the signing key. The requested key is added
|
||||||
|
|||||||
@@ -1,4 +1,9 @@
|
|||||||
|
===================================
|
||||||
|
Dell Systems Management Base Driver
|
||||||
|
===================================
|
||||||
|
|
||||||
Overview
|
Overview
|
||||||
|
========
|
||||||
|
|
||||||
The Dell Systems Management Base Driver provides a sysfs interface for
|
The Dell Systems Management Base Driver provides a sysfs interface for
|
||||||
systems management software such as Dell OpenManage to perform system
|
systems management software such as Dell OpenManage to perform system
|
||||||
@@ -17,6 +22,7 @@ more information about the libsmbios project.
|
|||||||
|
|
||||||
|
|
||||||
System Management Interrupt
|
System Management Interrupt
|
||||||
|
===========================
|
||||||
|
|
||||||
On some Dell systems, systems management software must access certain
|
On some Dell systems, systems management software must access certain
|
||||||
management information via a system management interrupt (SMI). The SMI data
|
management information via a system management interrupt (SMI). The SMI data
|
||||||
@@ -24,7 +30,7 @@ buffer must reside in 32-bit address space, and the physical address of the
|
|||||||
buffer is required for the SMI. The driver maintains the memory required for
|
buffer is required for the SMI. The driver maintains the memory required for
|
||||||
the SMI and provides a way for the application to generate the SMI.
|
the SMI and provides a way for the application to generate the SMI.
|
||||||
The driver creates the following sysfs entries for systems management
|
The driver creates the following sysfs entries for systems management
|
||||||
software to perform these system management interrupts:
|
software to perform these system management interrupts::
|
||||||
|
|
||||||
/sys/devices/platform/dcdbas/smi_data
|
/sys/devices/platform/dcdbas/smi_data
|
||||||
/sys/devices/platform/dcdbas/smi_data_buf_phys_addr
|
/sys/devices/platform/dcdbas/smi_data_buf_phys_addr
|
||||||
@@ -43,6 +49,7 @@ a SMI using this driver:
|
|||||||
|
|
||||||
|
|
||||||
Host Control Action
|
Host Control Action
|
||||||
|
===================
|
||||||
|
|
||||||
Dell OpenManage supports a host control feature that allows the administrator
|
Dell OpenManage supports a host control feature that allows the administrator
|
||||||
to perform a power cycle or power off of the system after the OS has finished
|
to perform a power cycle or power off of the system after the OS has finished
|
||||||
@@ -69,12 +76,14 @@ power off host control action using this driver:
|
|||||||
|
|
||||||
|
|
||||||
Host Control SMI Type
|
Host Control SMI Type
|
||||||
|
=====================
|
||||||
|
|
||||||
The following table shows the value to write to host_control_smi_type to
|
The following table shows the value to write to host_control_smi_type to
|
||||||
perform a power cycle or power off host control action:
|
perform a power cycle or power off host control action:
|
||||||
|
|
||||||
|
=================== =====================
|
||||||
PowerEdge System Host Control SMI Type
|
PowerEdge System Host Control SMI Type
|
||||||
---------------- ---------------------
|
=================== =====================
|
||||||
300 HC_SMITYPE_TYPE1
|
300 HC_SMITYPE_TYPE1
|
||||||
1300 HC_SMITYPE_TYPE1
|
1300 HC_SMITYPE_TYPE1
|
||||||
1400 HC_SMITYPE_TYPE2
|
1400 HC_SMITYPE_TYPE2
|
||||||
@@ -87,5 +96,4 @@ PowerEdge System Host Control SMI Type
|
|||||||
1655MC HC_SMITYPE_TYPE2
|
1655MC HC_SMITYPE_TYPE2
|
||||||
700 HC_SMITYPE_TYPE3
|
700 HC_SMITYPE_TYPE3
|
||||||
750 HC_SMITYPE_TYPE3
|
750 HC_SMITYPE_TYPE3
|
||||||
|
=================== =====================
|
||||||
|
|
||||||
|
|||||||
@@ -1,6 +1,6 @@
|
|||||||
|
===========================================================================
|
||||||
Using physical DMA provided by OHCI-1394 FireWire controllers for debugging
|
Using physical DMA provided by OHCI-1394 FireWire controllers for debugging
|
||||||
---------------------------------------------------------------------------
|
===========================================================================
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
------------
|
------------
|
||||||
@@ -91,7 +91,7 @@ Step-by-step instructions for using firescope with early OHCI initialization:
|
|||||||
1) Verify that your hardware is supported:
|
1) Verify that your hardware is supported:
|
||||||
|
|
||||||
Load the firewire-ohci module and check your kernel logs.
|
Load the firewire-ohci module and check your kernel logs.
|
||||||
You should see a line similar to
|
You should see a line similar to::
|
||||||
|
|
||||||
firewire_ohci 0000:15:00.1: added OHCI v1.0 device as card 2, 4 IR + 4 IT
|
firewire_ohci 0000:15:00.1: added OHCI v1.0 device as card 2, 4 IR + 4 IT
|
||||||
... contexts, quirks 0x11
|
... contexts, quirks 0x11
|
||||||
@@ -113,7 +113,7 @@ Step-by-step instructions for using firescope with early OHCI initialization:
|
|||||||
stable connection and has matching connectors (there are small 4-pin and
|
stable connection and has matching connectors (there are small 4-pin and
|
||||||
large 6-pin FireWire ports) will do.
|
large 6-pin FireWire ports) will do.
|
||||||
|
|
||||||
If an driver is running on both machines you should see a line like
|
If an driver is running on both machines you should see a line like::
|
||||||
|
|
||||||
firewire_core 0000:15:00.1: created device fw1: GUID 00061b0020105917, S400
|
firewire_core 0000:15:00.1: created device fw1: GUID 00061b0020105917, S400
|
||||||
|
|
||||||
@@ -123,7 +123,7 @@ Step-by-step instructions for using firescope with early OHCI initialization:
|
|||||||
3) Test physical DMA using firescope:
|
3) Test physical DMA using firescope:
|
||||||
|
|
||||||
On the debug host, make sure that /dev/fw* is accessible,
|
On the debug host, make sure that /dev/fw* is accessible,
|
||||||
then start firescope:
|
then start firescope::
|
||||||
|
|
||||||
$ firescope
|
$ firescope
|
||||||
Port 0 (/dev/fw1) opened, 2 nodes detected
|
Port 0 (/dev/fw1) opened, 2 nodes detected
|
||||||
@@ -163,7 +163,7 @@ Step-by-step instructions for using firescope with early OHCI initialization:
|
|||||||
host loaded, reboot the debugged machine, booting the kernel which has
|
host loaded, reboot the debugged machine, booting the kernel which has
|
||||||
CONFIG_PROVIDE_OHCI1394_DMA_INIT enabled, with the option ohci1394_dma=early.
|
CONFIG_PROVIDE_OHCI1394_DMA_INIT enabled, with the option ohci1394_dma=early.
|
||||||
|
|
||||||
Then, on the debugging host, run firescope, for example by using -A:
|
Then, on the debugging host, run firescope, for example by using -A::
|
||||||
|
|
||||||
firescope -A System.map-of-debug-target-kernel
|
firescope -A System.map-of-debug-target-kernel
|
||||||
|
|
||||||
@@ -178,6 +178,7 @@ Step-by-step instructions for using firescope with early OHCI initialization:
|
|||||||
|
|
||||||
Notes
|
Notes
|
||||||
-----
|
-----
|
||||||
|
|
||||||
Documentation and specifications: http://halobates.de/firewire/
|
Documentation and specifications: http://halobates.de/firewire/
|
||||||
|
|
||||||
FireWire is a trademark of Apple Inc. - for more information please refer to:
|
FireWire is a trademark of Apple Inc. - for more information please refer to:
|
||||||
|
|||||||
@@ -1,18 +1,30 @@
|
|||||||
Purpose:
|
=============================================================
|
||||||
Demonstrate the usage of the new open sourced rbu (Remote BIOS Update) driver
|
Usage of the new open sourced rbu (Remote BIOS Update) driver
|
||||||
|
=============================================================
|
||||||
|
|
||||||
|
Purpose
|
||||||
|
=======
|
||||||
|
|
||||||
|
Document demonstrating the use of the Dell Remote BIOS Update driver.
|
||||||
for updating BIOS images on Dell servers and desktops.
|
for updating BIOS images on Dell servers and desktops.
|
||||||
|
|
||||||
Scope:
|
Scope
|
||||||
|
=====
|
||||||
|
|
||||||
This document discusses the functionality of the rbu driver only.
|
This document discusses the functionality of the rbu driver only.
|
||||||
It does not cover the support needed from applications to enable the BIOS to
|
It does not cover the support needed from applications to enable the BIOS to
|
||||||
update itself with the image downloaded in to the memory.
|
update itself with the image downloaded in to the memory.
|
||||||
|
|
||||||
Overview:
|
Overview
|
||||||
|
========
|
||||||
|
|
||||||
This driver works with Dell OpenManage or Dell Update Packages for updating
|
This driver works with Dell OpenManage or Dell Update Packages for updating
|
||||||
the BIOS on Dell servers (starting from servers sold since 1999), desktops
|
the BIOS on Dell servers (starting from servers sold since 1999), desktops
|
||||||
and notebooks (starting from those sold in 2005).
|
and notebooks (starting from those sold in 2005).
|
||||||
|
|
||||||
Please go to http://support.dell.com register and you can find info on
|
Please go to http://support.dell.com register and you can find info on
|
||||||
OpenManage and Dell Update packages (DUP).
|
OpenManage and Dell Update packages (DUP).
|
||||||
|
|
||||||
Libsmbios can also be used to update BIOS on Dell systems go to
|
Libsmbios can also be used to update BIOS on Dell systems go to
|
||||||
http://linux.dell.com/libsmbios/ for details.
|
http://linux.dell.com/libsmbios/ for details.
|
||||||
|
|
||||||
@@ -22,6 +34,7 @@ of physical pages having the BIOS image. In case of packetized the app
|
|||||||
using the driver breaks the image in to packets of fixed sizes and the driver
|
using the driver breaks the image in to packets of fixed sizes and the driver
|
||||||
would place each packet in contiguous physical memory. The driver also
|
would place each packet in contiguous physical memory. The driver also
|
||||||
maintains a link list of packets for reading them back.
|
maintains a link list of packets for reading them back.
|
||||||
|
|
||||||
If the dell_rbu driver is unloaded all the allocated memory is freed.
|
If the dell_rbu driver is unloaded all the allocated memory is freed.
|
||||||
|
|
||||||
The rbu driver needs to have an application (as mentioned above)which will
|
The rbu driver needs to have an application (as mentioned above)which will
|
||||||
@@ -30,7 +43,8 @@ inform the BIOS to enable the update in the next system reboot.
|
|||||||
The user should not unload the rbu driver after downloading the BIOS image
|
The user should not unload the rbu driver after downloading the BIOS image
|
||||||
or updating.
|
or updating.
|
||||||
|
|
||||||
The driver load creates the following directories under the /sys file system.
|
The driver load creates the following directories under the /sys file system::
|
||||||
|
|
||||||
/sys/class/firmware/dell_rbu/loading
|
/sys/class/firmware/dell_rbu/loading
|
||||||
/sys/class/firmware/dell_rbu/data
|
/sys/class/firmware/dell_rbu/data
|
||||||
/sys/devices/platform/dell_rbu/image_type
|
/sys/devices/platform/dell_rbu/image_type
|
||||||
@@ -41,17 +55,21 @@ The driver supports two types of update mechanism; monolithic and packetized.
|
|||||||
These update mechanism depends upon the BIOS currently running on the system.
|
These update mechanism depends upon the BIOS currently running on the system.
|
||||||
Most of the Dell systems support a monolithic update where the BIOS image is
|
Most of the Dell systems support a monolithic update where the BIOS image is
|
||||||
copied to a single contiguous block of physical memory.
|
copied to a single contiguous block of physical memory.
|
||||||
|
|
||||||
In case of packet mechanism the single memory can be broken in smaller chunks
|
In case of packet mechanism the single memory can be broken in smaller chunks
|
||||||
of contiguous memory and the BIOS image is scattered in these packets.
|
of contiguous memory and the BIOS image is scattered in these packets.
|
||||||
|
|
||||||
By default the driver uses monolithic memory for the update type. This can be
|
By default the driver uses monolithic memory for the update type. This can be
|
||||||
changed to packets during the driver load time by specifying the load
|
changed to packets during the driver load time by specifying the load
|
||||||
parameter image_type=packet. This can also be changed later as below
|
parameter image_type=packet. This can also be changed later as below::
|
||||||
|
|
||||||
echo packet > /sys/devices/platform/dell_rbu/image_type
|
echo packet > /sys/devices/platform/dell_rbu/image_type
|
||||||
|
|
||||||
In packet update mode the packet size has to be given before any packets can
|
In packet update mode the packet size has to be given before any packets can
|
||||||
be downloaded. It is done as below
|
be downloaded. It is done as below::
|
||||||
|
|
||||||
echo XXXX > /sys/devices/platform/dell_rbu/packet_size
|
echo XXXX > /sys/devices/platform/dell_rbu/packet_size
|
||||||
|
|
||||||
In the packet update mechanism, the user needs to create a new file having
|
In the packet update mechanism, the user needs to create a new file having
|
||||||
packets of data arranged back to back. It can be done as follows
|
packets of data arranged back to back. It can be done as follows
|
||||||
The user creates packets header, gets the chunk of the BIOS image and
|
The user creates packets header, gets the chunk of the BIOS image and
|
||||||
@@ -60,39 +78,52 @@ added together should match the specified packet_size. This makes one
|
|||||||
packet, the user needs to create more such packets out of the entire BIOS
|
packet, the user needs to create more such packets out of the entire BIOS
|
||||||
image file and then arrange all these packets back to back in to one single
|
image file and then arrange all these packets back to back in to one single
|
||||||
file.
|
file.
|
||||||
|
|
||||||
This file is then copied to /sys/class/firmware/dell_rbu/data.
|
This file is then copied to /sys/class/firmware/dell_rbu/data.
|
||||||
Once this file gets to the driver, the driver extracts packet_size data from
|
Once this file gets to the driver, the driver extracts packet_size data from
|
||||||
the file and spreads it across the physical memory in contiguous packet_sized
|
the file and spreads it across the physical memory in contiguous packet_sized
|
||||||
space.
|
space.
|
||||||
|
|
||||||
This method makes sure that all the packets get to the driver in a single operation.
|
This method makes sure that all the packets get to the driver in a single operation.
|
||||||
|
|
||||||
In monolithic update the user simply get the BIOS image (.hdr file) and copies
|
In monolithic update the user simply get the BIOS image (.hdr file) and copies
|
||||||
to the data file as is without any change to the BIOS image itself.
|
to the data file as is without any change to the BIOS image itself.
|
||||||
|
|
||||||
Do the steps below to download the BIOS image.
|
Do the steps below to download the BIOS image.
|
||||||
|
|
||||||
1) echo 1 > /sys/class/firmware/dell_rbu/loading
|
1) echo 1 > /sys/class/firmware/dell_rbu/loading
|
||||||
2) cp bios_image.hdr /sys/class/firmware/dell_rbu/data
|
2) cp bios_image.hdr /sys/class/firmware/dell_rbu/data
|
||||||
3) echo 0 > /sys/class/firmware/dell_rbu/loading
|
3) echo 0 > /sys/class/firmware/dell_rbu/loading
|
||||||
|
|
||||||
The /sys/class/firmware/dell_rbu/ entries will remain till the following is
|
The /sys/class/firmware/dell_rbu/ entries will remain till the following is
|
||||||
done.
|
done.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
echo -1 > /sys/class/firmware/dell_rbu/loading
|
echo -1 > /sys/class/firmware/dell_rbu/loading
|
||||||
|
|
||||||
Until this step is completed the driver cannot be unloaded.
|
Until this step is completed the driver cannot be unloaded.
|
||||||
|
|
||||||
Also echoing either mono, packet or init in to image_type will free up the
|
Also echoing either mono, packet or init in to image_type will free up the
|
||||||
memory allocated by the driver.
|
memory allocated by the driver.
|
||||||
|
|
||||||
If a user by accident executes steps 1 and 3 above without executing step 2;
|
If a user by accident executes steps 1 and 3 above without executing step 2;
|
||||||
it will make the /sys/class/firmware/dell_rbu/ entries disappear.
|
it will make the /sys/class/firmware/dell_rbu/ entries disappear.
|
||||||
The entries can be recreated by doing the following
|
|
||||||
|
The entries can be recreated by doing the following::
|
||||||
|
|
||||||
echo init > /sys/devices/platform/dell_rbu/image_type
|
echo init > /sys/devices/platform/dell_rbu/image_type
|
||||||
NOTE: echoing init in image_type does not change its original value.
|
|
||||||
|
.. note:: echoing init in image_type does not change its original value.
|
||||||
|
|
||||||
Also the driver provides /sys/devices/platform/dell_rbu/data readonly file to
|
Also the driver provides /sys/devices/platform/dell_rbu/data readonly file to
|
||||||
read back the image downloaded.
|
read back the image downloaded.
|
||||||
|
|
||||||
NOTE:
|
.. note::
|
||||||
|
|
||||||
This driver requires a patch for firmware_class.c which has the modified
|
This driver requires a patch for firmware_class.c which has the modified
|
||||||
request_firmware_nowait function.
|
request_firmware_nowait function.
|
||||||
|
|
||||||
Also after updating the BIOS image a user mode application needs to execute
|
Also after updating the BIOS image a user mode application needs to execute
|
||||||
code which sends the BIOS update request to the BIOS. So on the next reboot
|
code which sends the BIOS update request to the BIOS. So on the next reboot
|
||||||
the BIOS knows about the new image downloaded and it updates itself.
|
the BIOS knows about the new image downloaded and it updates itself.
|
||||||
|
|||||||
31
Documentation/devicetree/bindings/clock/img,boston-clock.txt
Normal file
@@ -0,0 +1,31 @@
|
|||||||
|
Binding for Imagination Technologies MIPS Boston clock sources.
|
||||||
|
|
||||||
|
This binding uses the common clock binding[1].
|
||||||
|
|
||||||
|
[1] Documentation/devicetree/bindings/clock/clock-bindings.txt
|
||||||
|
|
||||||
|
The device node must be a child node of the syscon node corresponding to the
|
||||||
|
Boston system's platform registers.
|
||||||
|
|
||||||
|
Required properties:
|
||||||
|
- compatible : Should be "img,boston-clock".
|
||||||
|
- #clock-cells : Should be set to 1.
|
||||||
|
Values available for clock consumers can be found in the header file:
|
||||||
|
<dt-bindings/clock/boston-clock.h>
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
system-controller@17ffd000 {
|
||||||
|
compatible = "img,boston-platform-regs", "syscon";
|
||||||
|
reg = <0x17ffd000 0x1000>;
|
||||||
|
|
||||||
|
clk_boston: clock {
|
||||||
|
compatible = "img,boston-clock";
|
||||||
|
#clock-cells = <1>;
|
||||||
|
};
|
||||||
|
};
|
||||||
|
|
||||||
|
uart0: uart@17ffe000 {
|
||||||
|
/* ... */
|
||||||
|
clocks = <&clk_boston BOSTON_CLK_SYS>;
|
||||||
|
};
|
||||||
@@ -3,10 +3,23 @@
|
|||||||
Required properties:
|
Required properties:
|
||||||
- compatible : should be one of the following:
|
- compatible : should be one of the following:
|
||||||
"altr,socfpga-denali-nand" - for Altera SOCFPGA
|
"altr,socfpga-denali-nand" - for Altera SOCFPGA
|
||||||
|
"socionext,uniphier-denali-nand-v5a" - for Socionext UniPhier (v5a)
|
||||||
|
"socionext,uniphier-denali-nand-v5b" - for Socionext UniPhier (v5b)
|
||||||
- reg : should contain registers location and length for data and reg.
|
- reg : should contain registers location and length for data and reg.
|
||||||
- reg-names: Should contain the reg names "nand_data" and "denali_reg"
|
- reg-names: Should contain the reg names "nand_data" and "denali_reg"
|
||||||
- interrupts : The interrupt number.
|
- interrupts : The interrupt number.
|
||||||
|
|
||||||
|
Optional properties:
|
||||||
|
- nand-ecc-step-size: see nand.txt for details. If present, the value must be
|
||||||
|
512 for "altr,socfpga-denali-nand"
|
||||||
|
1024 for "socionext,uniphier-denali-nand-v5a"
|
||||||
|
1024 for "socionext,uniphier-denali-nand-v5b"
|
||||||
|
- nand-ecc-strength: see nand.txt for details. Valid values are:
|
||||||
|
8, 15 for "altr,socfpga-denali-nand"
|
||||||
|
8, 16, 24 for "socionext,uniphier-denali-nand-v5a"
|
||||||
|
8, 16 for "socionext,uniphier-denali-nand-v5b"
|
||||||
|
- nand-ecc-maximize: see nand.txt for details
|
||||||
|
|
||||||
The device tree may optionally contain sub-nodes describing partitions of the
|
The device tree may optionally contain sub-nodes describing partitions of the
|
||||||
address space. See partition.txt for more detail.
|
address space. See partition.txt for more detail.
|
||||||
|
|
||||||
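A minimal sketch of a Denali NAND node that uses the optional ECC properties listed in this hunk. The unit address, register ranges and interrupt specifier are placeholders chosen for illustration and are not taken from a real board file; only the compatible string, the reg-names and the ECC values come from the binding text above.

```dts
/* Illustrative only: addresses and the interrupt specifier are placeholders. */
nand: nand@68000000 {
	compatible = "socionext,uniphier-denali-nand-v5a";
	reg = <0x68000000 0x20>, <0x68100000 0x1000>;
	reg-names = "nand_data", "denali_reg";
	interrupts = <0 65 4>;

	/* v5a: the step size must be 1024; strength may be 8, 16 or 24 */
	nand-ecc-step-size = <1024>;
	nand-ecc-strength = <16>;
};
```

For "altr,socfpga-denali-nand" the same node shape would instead need a step size of 512 and a strength of 8 or 15.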
|
|||||||
@@ -1,7 +1,7 @@
|
|||||||
Error location module
|
Error location module
|
||||||
|
|
||||||
Required properties:
|
Required properties:
|
||||||
- compatible: Must be "ti,am33xx-elm"
|
- compatible: Must be "ti,am3352-elm"
|
||||||
- reg: physical base address and size of the registers map.
|
- reg: physical base address and size of the registers map.
|
||||||
- interrupts: Interrupt number for the elm.
|
- interrupts: Interrupt number for the elm.
|
||||||
|
|
||||||
|
|||||||
@@ -5,7 +5,7 @@ the GPMC controller with a name of "nand".
|
|||||||
|
|
||||||
All timing relevant properties as well as generic gpmc child properties are
|
All timing relevant properties as well as generic gpmc child properties are
|
||||||
explained in a separate document - please refer to
|
explained in a separate document - please refer to
|
||||||
Documentation/devicetree/bindings/bus/ti-gpmc.txt
|
Documentation/devicetree/bindings/memory-controllers/omap-gpmc.txt
|
||||||
|
|
||||||
For NAND specific properties such as ECC modes or bus width, please refer to
|
For NAND specific properties such as ECC modes or bus width, please refer to
|
||||||
Documentation/devicetree/bindings/mtd/nand.txt
|
Documentation/devicetree/bindings/mtd/nand.txt
|
||||||
|
|||||||
@@ -5,7 +5,7 @@ child nodes of the GPMC controller with a name of "nor".
|
|||||||
|
|
||||||
All timing relevant properties as well as generic GPMC child properties are
|
All timing relevant properties as well as generic GPMC child properties are
|
||||||
explained in a separate document. Please refer to
|
explained in a separate document. Please refer to
|
||||||
Documentation/devicetree/bindings/bus/ti-gpmc.txt
|
Documentation/devicetree/bindings/memory-controllers/omap-gpmc.txt
|
||||||
|
|
||||||
Required properties:
|
Required properties:
|
||||||
- bank-width: Width of NOR flash in bytes. GPMC supports 8-bit and
|
- bank-width: Width of NOR flash in bytes. GPMC supports 8-bit and
|
||||||
@@ -28,7 +28,7 @@ Required properties:
|
|||||||
|
|
||||||
Optional properties:
|
Optional properties:
|
||||||
- gpmc,XXX Additional GPMC timings and settings parameters. See
|
- gpmc,XXX Additional GPMC timings and settings parameters. See
|
||||||
Documentation/devicetree/bindings/bus/ti-gpmc.txt
|
Documentation/devicetree/bindings/memory-controllers/omap-gpmc.txt
|
||||||
|
|
||||||
Optional properties for partition table parsing:
|
Optional properties for partition table parsing:
|
||||||
- #address-cells: should be set to 1
|
- #address-cells: should be set to 1
|
||||||
|
|||||||
@@ -5,7 +5,7 @@ the GPMC controller with a name of "onenand".
|
|||||||
|
|
||||||
All timing relevant properties as well as generic gpmc child properties are
|
All timing relevant properties as well as generic gpmc child properties are
|
||||||
explained in a separate document - please refer to
|
explained in a separate document - please refer to
|
||||||
Documentation/devicetree/bindings/bus/ti-gpmc.txt
|
Documentation/devicetree/bindings/memory-controllers/omap-gpmc.txt
|
||||||
|
|
||||||
Required properties:
|
Required properties:
|
||||||
|
|
||||||
|
|||||||
@@ -4,7 +4,12 @@ The GPMI nand controller provides an interface to control the
|
|||||||
NAND flash chips.
|
NAND flash chips.
|
||||||
|
|
||||||
Required properties:
|
Required properties:
|
||||||
- compatible : should be "fsl,<chip>-gpmi-nand"
|
- compatible : should be "fsl,<chip>-gpmi-nand", chip can be:
|
||||||
|
* imx23
|
||||||
|
* imx28
|
||||||
|
* imx6q
|
||||||
|
* imx6sx
|
||||||
|
* imx7d
|
||||||
- reg : should contain registers location and length for gpmi and bch.
|
- reg : should contain registers location and length for gpmi and bch.
|
||||||
- reg-names: Should contain the reg names "gpmi-nand" and "bch"
|
- reg-names: Should contain the reg names "gpmi-nand" and "bch"
|
||||||
- interrupts : BCH interrupt number.
|
- interrupts : BCH interrupt number.
|
||||||
@@ -13,6 +18,13 @@ Required properties:
|
|||||||
and GPMI DMA channel ID.
|
and GPMI DMA channel ID.
|
||||||
Refer to dma.txt and fsl-mxs-dma.txt for details.
|
Refer to dma.txt and fsl-mxs-dma.txt for details.
|
||||||
- dma-names: Must be "rx-tx".
|
- dma-names: Must be "rx-tx".
|
||||||
|
- clocks : clocks phandle and clock specifier corresponding to each clock
|
||||||
|
specified in clock-names.
|
||||||
|
- clock-names : The "gpmi_io" clock is always required. Which clocks are
|
||||||
|
exactly required depends on chip:
|
||||||
|
* imx23/imx28 : "gpmi_io"
|
||||||
|
* imx6q/sx : "gpmi_io", "gpmi_apb", "gpmi_bch", "gpmi_bch_apb", "per1_bch"
|
||||||
|
* imx7d : "gpmi_io", "gpmi_bch_apb"
|
||||||
|
|
||||||
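A minimal sketch of how the required properties above could look for an imx7d part, which according to the clock-names list needs "gpmi_io" and "gpmi_bch_apb". The node label, unit address, register ranges, interrupt number, DMA channel and clock specifiers are placeholders for illustration; a real board file would use the values from the SoC's own .dtsi.

```dts
/* Illustrative only: every numeric value and phandle target is a placeholder. */
gpmi: gpmi-nand@33002000 {
	compatible = "fsl,imx7d-gpmi-nand";
	reg = <0x33002000 0x2000>, <0x33004000 0x4000>;
	reg-names = "gpmi-nand", "bch";
	interrupts = <14>;
	dmas = <&dma_apbh 0>;
	dma-names = "rx-tx";

	/* one clocks entry per name, in the same order as clock-names */
	clocks = <&clks 0>, <&clks 1>;
	clock-names = "gpmi_io", "gpmi_bch_apb";
};
```

On imx23/imx28 only the "gpmi_io" entry would be listed, while imx6q/imx6sx would list all five names.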
Optional properties:
|
Optional properties:
|
||||||
- nand-on-flash-bbt: boolean to enable on flash bbt option if not
|
- nand-on-flash-bbt: boolean to enable on flash bbt option if not
|
||||||
|
|||||||
@@ -0,0 +1,18 @@
|
|||||||
|
* MTD SPI driver for Microchip 23K256 (and similar) serial SRAM
|
||||||
|
|
||||||
|
Required properties:
|
||||||
|
- #address-cells, #size-cells : Must be present if the device has sub-nodes
|
||||||
|
representing partitions.
|
||||||
|
- compatible : Must be one of "microchip,mchp23k256" or "microchip,mchp23lcv1024"
|
||||||
|
- reg : Chip-Select number
|
||||||
|
- spi-max-frequency : Maximum frequency of the SPI bus the chip can operate at
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
spi-sram@0 {
|
||||||
|
#address-cells = <1>;
|
||||||
|
#size-cells = <1>;
|
||||||
|
compatible = "microchip,mchp23k256";
|
||||||
|
reg = <0>;
|
||||||
|
spi-max-frequency = <20000000>;
|
||||||
|
};
|
||||||
@@ -12,7 +12,8 @@ tree nodes.
|
|||||||
|
|
||||||
The first part of NFC is NAND Controller Interface (NFI) HW.
|
The first part of NFC is NAND Controller Interface (NFI) HW.
|
||||||
Required NFI properties:
|
Required NFI properties:
|
||||||
- compatible: Should be "mediatek,mtxxxx-nfc".
|
- compatible: Should be one of "mediatek,mt2701-nfc",
|
||||||
|
"mediatek,mt2712-nfc".
|
||||||
- reg: Base physical address and size of NFI.
|
- reg: Base physical address and size of NFI.
|
||||||
- interrupts: Interrupts of NFI.
|
- interrupts: Interrupts of NFI.
|
||||||
- clocks: NFI required clocks.
|
- clocks: NFI required clocks.
|
||||||
@@ -141,7 +142,7 @@ Example:
|
|||||||
==============
|
==============
|
||||||
|
|
||||||
Required BCH properties:
|
Required BCH properties:
|
||||||
- compatible: Should be "mediatek,mtxxxx-ecc".
|
- compatible: Should be one of "mediatek,mt2701-ecc", "mediatek,mt2712-ecc".
|
||||||
- reg: Base physical address and size of ECC.
|
- reg: Base physical address and size of ECC.
|
||||||
- interrupts: Interrupts of ECC.
|
- interrupts: Interrupts of ECC.
|
||||||
- clocks: ECC required clocks.
|
- clocks: ECC required clocks.
|
||||||
|
|||||||
@@ -21,7 +21,7 @@ Optional NAND chip properties:
|
|||||||
|
|
||||||
- nand-ecc-mode : String, operation mode of the NAND ecc mode.
|
- nand-ecc-mode : String, operation mode of the NAND ecc mode.
|
||||||
Supported values are: "none", "soft", "hw", "hw_syndrome",
|
Supported values are: "none", "soft", "hw", "hw_syndrome",
|
||||||
"hw_oob_first".
|
"hw_oob_first", "on-die".
|
||||||
Deprecated values:
|
Deprecated values:
|
||||||
"soft_bch": use "soft" and nand-ecc-algo instead
|
"soft_bch": use "soft" and nand-ecc-algo instead
|
||||||
- nand-ecc-algo: string, algorithm of NAND ECC.
|
- nand-ecc-algo: string, algorithm of NAND ECC.
|
||||||
|
|||||||
@@ -1,29 +1,49 @@
|
|||||||
Representing flash partitions in devicetree
|
Flash partitions in device tree
|
||||||
|
===============================
|
||||||
|
|
||||||
Partitions can be represented by sub-nodes of an mtd device. This can be used
|
Flash devices can be partitioned into one or more functional ranges (e.g. "boot
|
||||||
|
code", "nvram", "kernel").
|
||||||
|
|
||||||
|
Different devices may be partitioned in different ways. Some may use a fixed
|
||||||
|
flash layout set at production time. Some may use an on-flash table that describes
|
||||||
|
the geometry and naming/purpose of each functional region. It is also possible
|
||||||
|
to see these methods mixed.
|
||||||
|
|
||||||
|
To assist system software in locating partitions, we allow describing which
|
||||||
|
method is used for a given flash device. To describe the method there should be
|
||||||
|
a subnode of the flash device that is named 'partitions'. It must have a
|
||||||
|
'compatible' property, which is used to identify the method to use.
|
||||||
|
|
||||||
|
We currently only document a binding for fixed layouts.
|
||||||
|
|
||||||
|
|
||||||
|
Fixed Partitions
|
||||||
|
================
|
||||||
|
|
||||||
|
Partitions can be represented by sub-nodes of a flash device. This can be used
|
||||||
on platforms which have strong conventions about which portions of a flash are
|
on platforms which have strong conventions about which portions of a flash are
|
||||||
used for what purposes, but which don't use an on-flash partition table such
|
used for what purposes, but which don't use an on-flash partition table such
|
||||||
as RedBoot.
|
as RedBoot.
|
||||||
|
|
||||||
The partition table should be a subnode of the mtd node and should be named
|
The partition table should be a subnode of the flash node and should be named
|
||||||
'partitions'. This node should have the following property:
|
'partitions'. This node should have the following property:
|
||||||
- compatible : (required) must be "fixed-partitions"
|
- compatible : (required) must be "fixed-partitions"
|
||||||
Partitions are then defined in subnodes of the partitions node.
|
Partitions are then defined in subnodes of the partitions node.
|
||||||
|
|
||||||
For backwards compatibility partitions as direct subnodes of the mtd device are
|
For backwards compatibility partitions as direct subnodes of the flash device are
|
||||||
supported. This use is discouraged.
|
supported. This use is discouraged.
|
||||||
NOTE: also for backwards compatibility, direct subnodes that have a compatible
|
NOTE: also for backwards compatibility, direct subnodes that have a compatible
|
||||||
string are not considered partitions, as they may be used for other bindings.
|
string are not considered partitions, as they may be used for other bindings.
|
||||||
|
|
||||||
#address-cells & #size-cells must both be present in the partitions subnode of the
|
#address-cells & #size-cells must both be present in the partitions subnode of the
|
||||||
mtd device. There are two valid values for both:
|
flash device. There are two valid values for both:
|
||||||
<1>: for partitions that require a single 32-bit cell to represent their
|
<1>: for partitions that require a single 32-bit cell to represent their
|
||||||
size/address (aka the value is below 4 GiB)
|
size/address (aka the value is below 4 GiB)
|
||||||
<2>: for partitions that require two 32-bit cells to represent their
|
<2>: for partitions that require two 32-bit cells to represent their
|
||||||
size/address (aka the value is 4 GiB or greater).
|
size/address (aka the value is 4 GiB or greater).
|
||||||
|
|
||||||
Required properties:
|
Required properties:
|
||||||
- reg : The partition's offset and size within the mtd bank.
|
- reg : The partition's offset and size within the flash
|
||||||
|
|
||||||
Optional properties:
|
Optional properties:
|
||||||
- label : The label / name for this partition. If omitted, the label is taken
|
- label : The label / name for this partition. If omitted, the label is taken
|
||||||
|
|||||||
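The fixed layout described in this hunk can be sketched as follows. The flash node name, partition names, labels and offsets are assumptions made for illustration; the 'partitions' subnode, its "fixed-partitions" compatible string and the cell properties follow the text above.

```dts
/* Illustrative only: node names, labels, offsets and sizes are made up. */
flash@0 {
	partitions {
		compatible = "fixed-partitions";
		/* one 32-bit cell each: all offsets and sizes are below 4 GiB */
		#address-cells = <1>;
		#size-cells = <1>;

		partition@0 {
			label = "boot code";
			reg = <0x0000000 0x100000>;	/* offset 0, size 1 MiB */
		};

		partition@100000 {
			label = "kernel";
			reg = <0x0100000 0x200000>;	/* offset 1 MiB, size 2 MiB */
		};
	};
};
```

A flash of 4 GiB or more would set both cell properties to <2> and write each reg entry as two cells for the offset followed by two cells for the size.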
@@ -11,6 +11,7 @@ Required properties:
|
|||||||
- reg-names: Names of the registers.
|
- reg-names: Names of the registers.
|
||||||
"amac_base": Address and length of the GMAC registers
|
"amac_base": Address and length of the GMAC registers
|
||||||
"idm_base": Address and length of the GMAC IDM registers
|
"idm_base": Address and length of the GMAC IDM registers
|
||||||
|
(required for NSP and Northstar2)
|
||||||
"nicpm_base": Address and length of the NIC Port Manager
|
"nicpm_base": Address and length of the NIC Port Manager
|
||||||
registers (required for Northstar2)
|
registers (required for Northstar2)
|
||||||
- interrupts: Interrupt number
|
- interrupts: Interrupt number
|
||||||
|
|||||||
@@ -1,24 +0,0 @@
|
|||||||
Broadcom GMAC Ethernet Controller Device Tree Bindings
|
|
||||||
-------------------------------------------------------------
|
|
||||||
|
|
||||||
Required properties:
|
|
||||||
- compatible: "brcm,bgmac-nsp"
|
|
||||||
- reg: Address and length of the GMAC registers,
|
|
||||||
Address and length of the GMAC IDM registers
|
|
||||||
- reg-names: Names of the registers. Must have both "gmac_base" and
|
|
||||||
"idm_base"
|
|
||||||
- interrupts: Interrupt number
|
|
||||||
|
|
||||||
Optional properties:
|
|
||||||
- mac-address: See ethernet.txt file in the same directory
|
|
||||||
|
|
||||||
Examples:
|
|
||||||
|
|
||||||
gmac0: ethernet@18022000 {
|
|
||||||
compatible = "brcm,bgmac-nsp";
|
|
||||||
reg = <0x18022000 0x1000>,
|
|
||||||
<0x18110000 0x1000>;
|
|
||||||
reg-names = "gmac_base", "idm_base";
|
|
||||||
interrupts = <GIC_SPI 147 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
status = "disabled";
|
|
||||||
};
|
|
||||||
@@ -9,7 +9,7 @@ the GPMC controller with an "ethernet" name.
|
|||||||
|
|
||||||
All timing relevant properties as well as generic GPMC child properties are
|
All timing relevant properties as well as generic GPMC child properties are
|
||||||
explained in a separate document. Please refer to
|
explained in a separate document. Please refer to
|
||||||
Documentation/devicetree/bindings/bus/ti-gpmc.txt
|
Documentation/devicetree/bindings/memory-controllers/omap-gpmc.txt
|
||||||
|
|
||||||
For the properties relevant to the ethernet controller connected to the GPMC
|
For the properties relevant to the ethernet controller connected to the GPMC
|
||||||
refer to the binding documentation of the device. For example, the documentation
|
refer to the binding documentation of the device. For example, the documentation
|
||||||
@@ -43,7 +43,7 @@ Required properties:
|
|||||||
|
|
||||||
Optional properties:
|
Optional properties:
|
||||||
- gpmc,XXX Additional GPMC timings and settings parameters. See
|
- gpmc,XXX Additional GPMC timings and settings parameters. See
|
||||||
Documentation/devicetree/bindings/bus/ti-gpmc.txt
|
Documentation/devicetree/bindings/memory-controllers/omap-gpmc.txt
|
||||||
|
|
||||||
Example:
|
Example:
|
||||||
|
|
||||||
|
|||||||
@@ -1,13 +1,20 @@
|
|||||||
* Broadcom Digital Timing Engine(DTE) based PTP clock driver
|
* Broadcom Digital Timing Engine(DTE) based PTP clock
|
||||||
|
|
||||||
Required properties:
|
Required properties:
|
||||||
- compatible: should be "brcm,ptp-dte"
|
- compatible: should contain the core compatibility string
|
||||||
|
and the SoC compatibility string. The SoC
|
||||||
|
compatibility string is to handle SoC specific
|
||||||
|
hardware differences.
|
||||||
|
Core compatibility string:
|
||||||
|
"brcm,ptp-dte"
|
||||||
|
SoC compatibility strings:
|
||||||
|
"brcm,iproc-ptp-dte" - for iproc based SoCs
|
||||||
- reg: address and length of the DTE block's NCO registers
|
- reg: address and length of the DTE block's NCO registers
|
||||||
|
|
||||||
Example:
|
Example:
|
||||||
|
|
||||||
ptp_dte: ptp_dte@180af650 {
|
ptp: ptp-dte@180af650 {
|
||||||
compatible = "brcm,ptp-dte";
|
compatible = "brcm,iproc-ptp-dte", "brcm,ptp-dte";
|
||||||
reg = <0x180af650 0x10>;
|
reg = <0x180af650 0x10>;
|
||||||
status = "okay";
|
status = "okay";
|
||||||
};
|
};
|
||||||
|
|||||||
@@ -2,7 +2,9 @@ Amlogic Meson PWM Controller
|
|||||||
============================
|
============================
|
||||||
|
|
||||||
Required properties:
|
Required properties:
|
||||||
- compatible: Shall contain "amlogic,meson8b-pwm" or "amlogic,meson-gxbb-pwm".
|
- compatible: Shall contain "amlogic,meson8b-pwm"
|
||||||
|
or "amlogic,meson-gxbb-pwm"
|
||||||
|
or "amlogic,meson-gxbb-ao-pwm"
|
||||||
- #pwm-cells: Should be 3. See pwm.txt in this directory for a description of
|
- #pwm-cells: Should be 3. See pwm.txt in this directory for a description of
|
||||||
the cells format.
|
the cells format.
|
||||||
|
|
||||||
|
|||||||
@@ -24,7 +24,7 @@ Example:
|
|||||||
compatible = "st,stm32-timers";
|
compatible = "st,stm32-timers";
|
||||||
reg = <0x40010000 0x400>;
|
reg = <0x40010000 0x400>;
|
||||||
clocks = <&rcc 0 160>;
|
clocks = <&rcc 0 160>;
|
||||||
clock-names = "clk_int";
|
clock-names = "int";
|
||||||
|
|
||||||
pwm {
|
pwm {
|
||||||
compatible = "st,stm32-pwm";
|
compatible = "st,stm32-pwm";
|
||||||
|
|||||||
@@ -8,6 +8,7 @@ Required Properties:
|
|||||||
- "renesas,pwm-r8a7791": for R-Car M2-W
|
- "renesas,pwm-r8a7791": for R-Car M2-W
|
||||||
- "renesas,pwm-r8a7794": for R-Car E2
|
- "renesas,pwm-r8a7794": for R-Car E2
|
||||||
- "renesas,pwm-r8a7795": for R-Car H3
|
- "renesas,pwm-r8a7795": for R-Car H3
|
||||||
|
- "renesas,pwm-r8a7796": for R-Car M3-W
|
||||||
- reg: base address and length of the registers block for the PWM.
|
- reg: base address and length of the registers block for the PWM.
|
||||||
- #pwm-cells: should be 2. See pwm.txt in this directory for a description of
|
- #pwm-cells: should be 2. See pwm.txt in this directory for a description of
|
||||||
the cells format.
|
the cells format.
|
||||||
|
|||||||
@@ -0,0 +1,22 @@
|
|||||||
|
Broadcom STB wake-up Timer
|
||||||
|
|
||||||
|
The Broadcom STB wake-up timer provides a 27 MHz resolution timer, with the
|
||||||
|
ability to wake up the system from low-power suspend/standby modes.
|
||||||
|
|
||||||
|
Required properties:
|
||||||
|
- compatible : should contain "brcm,brcmstb-waketimer"
|
||||||
|
- reg : the register start and length for the WKTMR block
|
||||||
|
- interrupts : The TIMER interrupt
|
||||||
|
- interrupt-parent: The phandle to the Always-On (AON) Power Management (PM) L2
|
||||||
|
interrupt controller node
|
||||||
|
- clocks : The phandle to the UPG fixed clock (27 MHz domain)
|
||||||
|
|
||||||
|
Example:
|
||||||
|
|
||||||
|
waketimer@f0411580 {
|
||||||
|
compatible = "brcm,brcmstb-waketimer";
|
||||||
|
reg = <0xf0411580 0x14>;
|
||||||
|
interrupts = <0x3>;
|
||||||
|
interrupt-parent = <&aon_pm_l2_intc>;
|
||||||
|
clocks = <&upg_fixed>;
|
||||||
|
};
|
||||||
@@ -1,14 +0,0 @@
|
|||||||
* Cortina Systems Gemini RTC
|
|
||||||
|
|
||||||
Gemini SoC real-time clock.
|
|
||||||
|
|
||||||
Required properties:
|
|
||||||
- compatible : Should be "cortina,gemini-rtc"
|
|
||||||
|
|
||||||
Examples:
|
|
||||||
|
|
||||||
rtc@45000000 {
|
|
||||||
compatible = "cortina,gemini-rtc";
|
|
||||||
reg = <0x45000000 0x100>;
|
|
||||||
interrupts = <17 IRQ_TYPE_LEVEL_HIGH>;
|
|
||||||
};
|
|
||||||
28
Documentation/devicetree/bindings/rtc/faraday,ftrtc010.txt
Normal file
@@ -0,0 +1,28 @@
|
|||||||
|
* Faraday Technology FTRTC010 Real Time Clock
|
||||||
|
|
||||||
|
This RTC appears in, for example, the Storlink Gemini family of
|
||||||
|
SoCs.
|
||||||
|
|
||||||
|
Required properties:
|
||||||
|
- compatible : Should be one of:
|
||||||
|
"faraday,ftrtc010"
|
||||||
|
"cortina,gemini-rtc", "faraday,ftrtc010"
|
||||||
|
|
||||||
|
Optional properties:
|
||||||
|
- clocks: when present should contain clock references to the
|
||||||
|
PCLK and EXTCLK clocks. Faraday calls the latter CLK1HZ and
|
||||||
|
says the clock should be 1 Hz, but implementers actually seem
|
||||||
|
to choose different clocks here, like Cortina who chose
|
||||||
|
32768 Hz (a typical low-power clock).
|
||||||
|
- clock-names: should name the clocks "PCLK" and "EXTCLK"
|
||||||
|
respectively.
|
||||||
|
|
||||||
|
Examples:
|
||||||
|
|
||||||
|
rtc@45000000 {
|
||||||
|
compatible = "cortina,gemini-rtc";
|
||||||
|
reg = <0x45000000 0x100>;
|
||||||
|
interrupts = <17 IRQ_TYPE_LEVEL_HIGH>;
|
||||||
|
clocks = <&foo 0>, <&foo 1>;
|
||||||
|
clock-names = "PCLK", "EXTCLK";
|
||||||
|
};
|
||||||
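A variant of the node above, sketched under the assumption that a Gemini board wants the second compatible form listed under "Required properties", that is, the SoC-specific string followed by the generic Faraday fallback. All other values are reused from the example in the binding, including the placeholder clock provider &foo.

```dts
/* Same node as in the binding example; only the compatible list differs. */
rtc@45000000 {
	compatible = "cortina,gemini-rtc", "faraday,ftrtc010";
	reg = <0x45000000 0x100>;
	interrupts = <17 IRQ_TYPE_LEVEL_HIGH>;
	clocks = <&foo 0>, <&foo 1>;
	clock-names = "PCLK", "EXTCLK";
};
```

Listing the generic string last lets a driver that only knows "faraday,ftrtc010" still bind to the device.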
@@ -1,17 +1,25 @@
|
|||||||
STM32 Real Time Clock
|
STM32 Real Time Clock
|
||||||
|
|
||||||
Required properties:
|
Required properties:
|
||||||
- compatible: "st,stm32-rtc".
|
- compatible: can be either "st,stm32-rtc" or "st,stm32h7-rtc", depending on
|
||||||
|
whether the device is compatible with stm32(f4/f7) or stm32h7.
|
||||||
- reg: address range of rtc register set.
|
- reg: address range of rtc register set.
|
||||||
- clocks: reference to the clock entry ck_rtc.
|
- clocks: can use up to two clocks, depending on part used:
|
||||||
|
- "rtc_ck": RTC clock source.
|
||||||
|
It is required on stm32(f4/f7) and stm32h7.
|
||||||
|
- "pclk": RTC APB interface clock.
|
||||||
|
It is not present on stm32(f4/f7).
|
||||||
|
It is required on stm32h7.
|
||||||
|
- clock-names: must be "rtc_ck" and "pclk".
|
||||||
|
It is required only on stm32h7.
|
||||||
- interrupt-parent: phandle for the interrupt controller.
|
- interrupt-parent: phandle for the interrupt controller.
|
||||||
- interrupts: rtc alarm interrupt.
|
- interrupts: rtc alarm interrupt.
|
||||||
- st,syscfg: phandle for pwrcfg, mandatory to disable/enable backup domain
|
- st,syscfg: phandle for pwrcfg, mandatory to disable/enable backup domain
|
||||||
(RTC registers) write protection.
|
(RTC registers) write protection.
|
||||||
|
|
||||||
Optional properties (to override default ck_rtc parent clock):
|
Optional properties (to override default rtc_ck parent clock):
|
||||||
- assigned-clocks: reference to the ck_rtc clock entry.
|
- assigned-clocks: reference to the rtc_ck clock entry.
|
||||||
- assigned-clock-parents: phandle of the new parent clock of ck_rtc.
|
- assigned-clock-parents: phandle of the new parent clock of rtc_ck.
|
||||||
|
|
||||||
Example:
|
Example:
|
||||||
|
|
||||||
@@ -25,3 +33,17 @@ Example:
|
|||||||
interrupts = <17 1>;
|
interrupts = <17 1>;
|
||||||
st,syscfg = <&pwrcfg>;
|
st,syscfg = <&pwrcfg>;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
rtc: rtc@58004000 {
|
||||||
|
compatible = "st,stm32h7-rtc";
|
||||||
|
reg = <0x58004000 0x400>;
|
||||||
|
clocks = <&rcc RTCAPB_CK>, <&rcc RTC_CK>;
|
||||||
|
clock-names = "pclk", "rtc_ck";
|
||||||
|
assigned-clocks = <&rcc RTC_CK>;
|
||||||
|
assigned-clock-parents = <&rcc LSE_CK>;
|
||||||
|
interrupt-parent = <&exti>;
|
||||||
|
interrupts = <17 1>;
|
||||||
|
interrupt-names = "alarm";
|
||||||
|
st,syscfg = <&pwrcfg>;
|
||||||
|
status = "disabled";
|
||||||
|
};
|
||||||
|
|||||||
@@ -1,13 +1,20 @@
|
|||||||
|
==================================
|
||||||
Digital Signature Verification API
|
Digital Signature Verification API
|
||||||
|
==================================
|
||||||
|
|
||||||
CONTENTS
|
:Author: Dmitry Kasatkin
|
||||||
|
:Date: 06.10.2011
|
||||||
|
|
||||||
|
|
||||||
|
.. CONTENTS
|
||||||
|
|
||||||
1. Introduction
|
1. Introduction
|
||||||
2. API
|
2. API
|
||||||
3. User-space utilities
|
3. User-space utilities
|
||||||
|
|
||||||
|
|
||||||
1. Introduction
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
The digital signature verification API provides a method to verify digital signatures.
|
The digital signature verification API provides a method to verify digital signatures.
|
||||||
Currently digital signatures are used by the IMA/EVM integrity protection subsystem.
|
Currently digital signatures are used by the IMA/EVM integrity protection subsystem.
|
||||||
@@ -17,7 +24,7 @@ GnuPG multi-precision integers (MPI) library. The kernel port provides
|
|||||||
memory allocation errors handling, has been refactored according to kernel
|
memory allocation errors handling, has been refactored according to kernel
|
||||||
coding style, and checkpatch.pl reported errors and warnings have been fixed.
|
coding style, and checkpatch.pl reported errors and warnings have been fixed.
|
||||||
|
|
||||||
Public key and signature consist of header and MPIs.
|
Public key and signature consist of header and MPIs::
|
||||||
|
|
||||||
struct pubkey_hdr {
|
struct pubkey_hdr {
|
||||||
uint8_t version; /* key format version */
|
uint8_t version; /* key format version */
|
||||||
@@ -43,9 +50,10 @@ Such approach insures that key or signature header could not be changed.
|
|||||||
It protects the timestamp from being changed and can be used for rollback
|
It protects the timestamp from being changed and can be used for rollback
|
||||||
protection.
|
protection.
|
||||||
|
|
||||||
2. API
|
API
|
||||||
|
===
|
||||||
|
|
||||||
API currently includes only 1 function:
|
API currently includes only 1 function::
|
||||||
|
|
||||||
digsig_verify() - digital signature verification with public key
|
digsig_verify() - digital signature verification with public key
|
||||||
|
|
||||||
@@ -67,7 +75,8 @@ API currently includes only 1 function:
|
|||||||
int digsig_verify(struct key *keyring, const char *sig, int siglen,
|
int digsig_verify(struct key *keyring, const char *sig, int siglen,
|
||||||
const char *data, int datalen);
|
const char *data, int datalen);
|
||||||
|
|
||||||
3. User-space utilities
|
User-space utilities
|
||||||
|
====================
|
||||||
|
|
||||||
The signing and key management utilities evm-utils provide functionality
|
The signing and key management utilities evm-utils provide functionality
|
||||||
to generate signatures, to load keys into the kernel keyring.
|
to generate signatures, to load keys into the kernel keyring.
|
||||||
@@ -75,7 +84,7 @@ Keys can be in PEM or converted to the kernel format.
|
|||||||
When the key is added to the kernel keyring, the keyid defines the name
|
When the key is added to the kernel keyring, the keyid defines the name
|
||||||
of the key: 5D2B05FC633EE3E8 in the example below.
|
of the key: 5D2B05FC633EE3E8 in the example below.
|
||||||
|
|
||||||
Here is example output of the keyctl utility.
|
Here is example output of the keyctl utility::
|
||||||
|
|
||||||
$ keyctl show
|
$ keyctl show
|
||||||
Session Keyring
|
Session Keyring
|
||||||
@@ -90,7 +99,3 @@ Session Keyring
|
|||||||
$ keyctl list 128198054
|
$ keyctl list 128198054
|
||||||
1 key in keyring:
|
1 key in keyring:
|
||||||
620789745: --alswrv 0 0 user: 5D2B05FC633EE3E8
|
620789745: --alswrv 0 0 user: 5D2B05FC633EE3E8
|
||||||
|
|
||||||
|
|
||||||
Dmitry Kasatkin
|
|
||||||
06.10.2011
|
|
||||||
|
|||||||
@@ -106,9 +106,6 @@ Kernel utility functions
|
|||||||
.. kernel-doc:: kernel/sys.c
|
.. kernel-doc:: kernel/sys.c
|
||||||
:export:
|
:export:
|
||||||
|
|
||||||
.. kernel-doc:: kernel/rcu/srcu.c
|
|
||||||
:export:
|
|
||||||
|
|
||||||
.. kernel-doc:: kernel/rcu/tree.c
|
.. kernel-doc:: kernel/rcu/tree.c
|
||||||
:export:
|
:export:
|
||||||
|
|
||||||
|
|||||||
@@ -1,5 +1,6 @@
|
|||||||
|
=================
|
||||||
The EFI Boot Stub
|
The EFI Boot Stub
|
||||||
---------------------------
|
=================
|
||||||
|
|
||||||
On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade
|
On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade
|
||||||
as a PE/COFF image, thereby convincing EFI firmware loaders to load
|
as a PE/COFF image, thereby convincing EFI firmware loaders to load
|
||||||
@@ -25,7 +26,8 @@ a certain sense it *IS* the boot loader.
|
|||||||
The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option.
|
The EFI boot stub is enabled with the CONFIG_EFI_STUB kernel option.
|
||||||
|
|
||||||
|
|
||||||
**** How to install bzImage.efi
|
How to install bzImage.efi
|
||||||
|
--------------------------
|
||||||
|
|
||||||
The bzImage located in arch/x86/boot/bzImage must be copied to the EFI
|
The bzImage located in arch/x86/boot/bzImage must be copied to the EFI
|
||||||
System Partition (ESP) and renamed with the extension ".efi". Without
|
System Partition (ESP) and renamed with the extension ".efi". Without
|
||||||
@@ -37,14 +39,16 @@ may not need to be renamed. Similarly for arm64, arch/arm64/boot/Image
|
|||||||
should be copied but not necessarily renamed.
|
should be copied but not necessarily renamed.
|
||||||
|
|
||||||
|
|
||||||
**** Passing kernel parameters from the EFI shell
|
Passing kernel parameters from the EFI shell
|
||||||
|
--------------------------------------------
|
||||||
|
|
||||||
Arguments to the kernel can be passed after bzImage.efi, e.g.
|
Arguments to the kernel can be passed after bzImage.efi, e.g.::
|
||||||
|
|
||||||
fs0:> bzImage.efi console=ttyS0 root=/dev/sda4
|
fs0:> bzImage.efi console=ttyS0 root=/dev/sda4
|
||||||
|
|
||||||
|
|
||||||
**** The "initrd=" option
|
The "initrd=" option
|
||||||
|
--------------------
|
||||||
|
|
||||||
Like most boot loaders, the EFI stub allows the user to specify
|
Like most boot loaders, the EFI stub allows the user to specify
|
||||||
multiple initrd files using the "initrd=" option. This is the only EFI
|
multiple initrd files using the "initrd=" option. This is the only EFI
|
||||||
@@ -54,7 +58,7 @@ kernel when it boots.
|
|||||||
The path to the initrd file must be an absolute path from the
|
The path to the initrd file must be an absolute path from the
|
||||||
beginning of the ESP, relative path names do not work. Also, the path
|
beginning of the ESP, relative path names do not work. Also, the path
|
||||||
is an EFI-style path and directory elements must be separated with
|
is an EFI-style path and directory elements must be separated with
|
||||||
backslashes (\). For example, given the following directory layout,
|
backslashes (\). For example, given the following directory layout::
|
||||||
|
|
||||||
fs0:>
|
fs0:>
|
||||||
Kernels\
|
Kernels\
|
||||||
@@ -66,7 +70,7 @@ fs0:>
|
|||||||
initrd-medium.img
|
initrd-medium.img
|
||||||
|
|
||||||
to boot with the initrd-large.img file if the current working
|
to boot with the initrd-large.img file if the current working
|
||||||
directory is fs0:\Kernels, the following command must be used,
|
directory is fs0:\Kernels, the following command must be used::
|
||||||
|
|
||||||
fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img
|
fs0:\Kernels> bzImage.efi initrd=\Kernels\initrd-large.img
|
||||||
|
|
||||||
@@ -76,7 +80,8 @@ which understands relative paths, whereas the rest of the command line
|
|||||||
is passed to bzImage.efi.
|
is passed to bzImage.efi.
|
||||||
|
|
||||||
|
|
||||||
**** The "dtb=" option
|
The "dtb=" option
|
||||||
|
-----------------
|
||||||
|
|
||||||
For the ARM and arm64 architectures, we also need to be able to provide a
|
For the ARM and arm64 architectures, we also need to be able to provide a
|
||||||
device tree to the kernel. This is done with the "dtb=" command line option,
|
device tree to the kernel. This is done with the "dtb=" command line option,
|
||||||
|
|||||||
@@ -1,4 +1,8 @@
|
|||||||
EISA bus support (Marc Zyngier <maz@wild-wind.fr.eu.org>)
|
================
|
||||||
|
EISA bus support
|
||||||
|
================
|
||||||
|
|
||||||
|
:Author: Marc Zyngier <maz@wild-wind.fr.eu.org>
|
||||||
|
|
||||||
This document groups random notes about porting EISA drivers to the
|
This document groups random notes about porting EISA drivers to the
|
||||||
new EISA/sysfs API.
|
new EISA/sysfs API.
|
||||||
@@ -37,13 +41,16 @@ The EISA infrastructure is made up of three parts :
|
|||||||
Every function/structure below lives in <linux/eisa.h>, which depends
|
Every function/structure below lives in <linux/eisa.h>, which depends
|
||||||
heavily on <linux/device.h>.
|
heavily on <linux/device.h>.
|
||||||
|
|
||||||
** Bus root driver :
|
Bus root driver
|
||||||
|
===============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int eisa_root_register (struct eisa_root_device *root);
|
int eisa_root_register (struct eisa_root_device *root);
|
||||||
|
|
||||||
The eisa_root_register function is used to declare a device as the
|
The eisa_root_register function is used to declare a device as the
|
||||||
root of an EISA bus. The eisa_root_device structure holds a reference
|
root of an EISA bus. The eisa_root_device structure holds a reference
|
||||||
to this device, as well as some parameters for probing purposes.
|
to this device, as well as some parameters for probing purposes::
|
||||||
|
|
||||||
struct eisa_root_device {
|
struct eisa_root_device {
|
||||||
struct device *dev; /* Pointer to bridge device */
|
struct device *dev; /* Pointer to bridge device */
|
||||||
@@ -56,22 +63,29 @@ struct eisa_root_device {
|
|||||||
struct resource eisa_root_res; /* ditto */
|
struct resource eisa_root_res; /* ditto */
|
||||||
};
|
};
|
||||||
|
|
||||||
node : used for eisa_root_register internal purpose
|
============= ======================================================
|
||||||
dev : pointer to the root device
|
node used for eisa_root_register internal purpose
|
||||||
res : root device I/O resource
|
dev pointer to the root device
|
||||||
bus_base_addr : slot 0 address on this bus
|
res root device I/O resource
|
||||||
slots : max slot number to probe
|
bus_base_addr slot 0 address on this bus
|
||||||
force_probe : Probe even when slot 0 is empty (no EISA mainboard)
|
slots max slot number to probe
|
||||||
dma_mask : Default DMA mask. Usually the bridge device dma_mask.
|
force_probe Probe even when slot 0 is empty (no EISA mainboard)
|
||||||
bus_nr : unique bus id, set by eisa_root_register
|
dma_mask Default DMA mask. Usually the bridge device dma_mask.
|
||||||
|
bus_nr unique bus id, set by eisa_root_register
|
||||||
|
============= ======================================================
|
||||||
|
|
||||||
** Driver :
|
Driver
|
||||||
|
======
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int eisa_driver_register (struct eisa_driver *edrv);
|
int eisa_driver_register (struct eisa_driver *edrv);
|
||||||
void eisa_driver_unregister (struct eisa_driver *edrv);
|
void eisa_driver_unregister (struct eisa_driver *edrv);
|
||||||
|
|
||||||
Clear enough ?
|
Clear enough ?
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct eisa_device_id {
|
struct eisa_device_id {
|
||||||
char sig[EISA_SIG_LEN];
|
char sig[EISA_SIG_LEN];
|
||||||
unsigned long driver_data;
|
unsigned long driver_data;
|
||||||
@@ -82,16 +96,18 @@ struct eisa_driver {
|
|||||||
struct device_driver driver;
|
struct device_driver driver;
|
||||||
};
|
};
|
||||||
|
|
||||||
id_table : an array of NULL terminated EISA id strings,
|
=============== ====================================================
|
||||||
|
id_table an array of NULL terminated EISA id strings,
|
||||||
followed by an empty string. Each string can
|
followed by an empty string. Each string can
|
||||||
optionally be paired with a driver-dependent value
|
optionally be paired with a driver-dependent value
|
||||||
(driver_data).
|
(driver_data).
|
||||||
|
|
||||||
driver : a generic driver, such as described in
|
driver a generic driver, such as described in
|
||||||
Documentation/driver-model/driver.txt. Only .name,
|
Documentation/driver-model/driver.txt. Only .name,
|
||||||
.probe and .remove members are mandatory.
|
.probe and .remove members are mandatory.
|
||||||
|
=============== ====================================================
|
||||||
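The id_table layout described above (signature strings with an optional driver_data value, closed by an empty string) can be illustrated with a small userspace sketch. Everything prefixed with mock_ is a hypothetical stand-in and not the kernel definition, the "XYZ0001" entry and both driver_data values are made up, and the signature length of 8 is an assumption. It only shows how such a table could be walked, not how the EISA core necessarily does it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_EISA_SIG_LEN 8	/* assumption: 7-character id plus NUL */

/* Userspace stand-in for struct eisa_device_id; not the kernel definition. */
struct mock_eisa_id {
	char sig[MOCK_EISA_SIG_LEN];
	unsigned long driver_data;
};

/*
 * Walk a table that ends with an empty signature string and return the
 * entry matching sig, or NULL when the end marker is reached first.
 */
static const struct mock_eisa_id *
mock_eisa_match(const struct mock_eisa_id *table, const char *sig)
{
	assert(table != NULL && sig != NULL);

	for (; table->sig[0] != '\0'; table++) {
		if (strcmp(table->sig, sig) == 0)
			return table;
	}
	return NULL;
}

/* Example table; "XYZ0001" and the driver_data values are invented. */
static const struct mock_eisa_id mock_ids[] = {
	{ "TCM5920", 0x10 },
	{ "XYZ0001", 0x20 },
	{ "" }	/* empty string terminates the list */
};
```

A caller would pass the signature read from the card, e.g. mock_eisa_match(mock_ids, "TCM5920"), and use the driver_data of the returned entry to select per-model behaviour.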
|
|
||||||
An example is the 3c59x driver :
|
An example is the 3c59x driver::
|
||||||
|
|
||||||
static struct eisa_device_id vortex_eisa_ids[] = {
|
static struct eisa_device_id vortex_eisa_ids[] = {
|
||||||
{ "TCM5920", EISA_3C592_OFFSET },
|
{ "TCM5920", EISA_3C592_OFFSET },
|
||||||
@@ -108,14 +124,15 @@ static struct eisa_driver vortex_eisa_driver = {
|
|||||||
}
|
}
|
||||||
};
|
};
|
||||||
|
|
||||||
** Device :
|
Device
|
||||||
|
======
|
||||||
|
|
||||||
The sysfs framework calls .probe and .remove functions upon device
|
The sysfs framework calls .probe and .remove functions upon device
|
||||||
discovery and removal (note that the .remove function is only called
|
discovery and removal (note that the .remove function is only called
|
||||||
when the driver is built as a module).
|
when the driver is built as a module).
|
||||||
|
|
||||||
Both functions are passed a pointer to a 'struct device', which is
|
Both functions are passed a pointer to a 'struct device', which is
|
||||||
encapsulated in a 'struct eisa_device' described as follows :
|
encapsulated in a 'struct eisa_device' described as follows::
|
||||||
|
|
||||||
struct eisa_device {
|
struct eisa_device {
|
||||||
struct eisa_device_id id;
|
struct eisa_device_id id;
|
||||||
@@ -127,55 +144,63 @@ struct eisa_device {
|
|||||||
struct device dev; /* generic device */
|
struct device dev; /* generic device */
|
||||||
};
|
};
|
||||||
|
|
||||||
id : EISA id, as read from device. id.driver_data is set from the
|
======== ============================================================
|
||||||
|
id EISA id, as read from device. id.driver_data is set from the
|
||||||
matching driver EISA id.
|
matching driver EISA id.
|
||||||
slot : slot number which the device was detected on
|
slot slot number which the device was detected on
|
||||||
state : set of flags indicating the state of the device. Current
|
state set of flags indicating the state of the device. Current
|
||||||
flags are EISA_CONFIG_ENABLED and EISA_CONFIG_FORCED.
|
flags are EISA_CONFIG_ENABLED and EISA_CONFIG_FORCED.
|
||||||
res : set of four 256-byte I/O regions allocated to this device
|
res set of four 256-byte I/O regions allocated to this device
|
||||||
dma_mask: DMA mask set from the parent device.
|
dma_mask DMA mask set from the parent device.
|
||||||
dev : generic device (see Documentation/driver-model/device.txt)
|
dev generic device (see Documentation/driver-model/device.txt)
|
||||||
|
======== ============================================================
|
||||||
|
|
||||||
You can get the 'struct eisa_device' from 'struct device' using the
|
You can get the 'struct eisa_device' from 'struct device' using the
|
||||||
'to_eisa_device' macro.
|
'to_eisa_device' macro.
|
||||||
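The relationship between the embedded generic device and its enclosing EISA device can be sketched in plain userspace C. The mock_ structures and the mock_to_eisa_device macro below are hypothetical stand-ins that only mirror the embedding idea; the macro is assumed to behave like a container_of-style conversion and is not copied from <linux/eisa.h>.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; the real structures carry many more members. */
struct mock_device {
	int placeholder;
};

struct mock_eisa_device {
	int slot;
	struct mock_device dev;	/* embedded generic device */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define mock_to_eisa_device(d) \
	((struct mock_eisa_device *)((char *)(d) - \
		offsetof(struct mock_eisa_device, dev)))

/* What a probe-style callback might do with the generic pointer it gets. */
static int mock_probe_slot(struct mock_device *dev)
{
	assert(dev != NULL);
	return mock_to_eisa_device(dev)->slot;
}
```

Because the generic device is embedded rather than pointed to, the conversion is pure pointer arithmetic and needs no lookup.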
|
|
||||||
** Misc stuff :
|
Misc stuff
|
||||||
|
==========
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void eisa_set_drvdata (struct eisa_device *edev, void *data);
|
void eisa_set_drvdata (struct eisa_device *edev, void *data);
|
||||||
|
|
||||||
Stores data into the device's driver_data area.
|
Stores data into the device's driver_data area.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void *eisa_get_drvdata (struct eisa_device *edev);
|
void *eisa_get_drvdata (struct eisa_device *edev);
|
||||||
|
|
||||||
Gets the pointer previously stored into the device's driver_data area.
|
Gets the pointer previously stored into the device's driver_data area.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int eisa_get_region_index (void *addr);
|
int eisa_get_region_index (void *addr);
|
||||||
|
|
||||||
Returns the region number (0 <= x < EISA_MAX_RESOURCES) of a given
|
Returns the region number (0 <= x < EISA_MAX_RESOURCES) of a given
|
||||||
address.
|
address.
|
||||||
|
|
||||||
** Kernel parameters :
|
Kernel parameters
|
||||||
|
=================
|
||||||
eisa_bus.enable_dev :
|
|
||||||
|
|
||||||
|
eisa_bus.enable_dev
|
||||||
A comma-separated list of slots to be enabled, even if the firmware
|
A comma-separated list of slots to be enabled, even if the firmware
|
||||||
set the card as disabled. The driver must be able to properly
|
set the card as disabled. The driver must be able to properly
|
||||||
initialize the device in such conditions.
|
initialize the device in such conditions.
|
||||||
|
|
||||||
eisa_bus.disable_dev :
|
eisa_bus.disable_dev
|
||||||
|
|
||||||
A comma-separated list of slots to be disabled, even if the firmware
|
A comma-separated list of slots to be disabled, even if the firmware
|
||||||
set the card as enabled. The driver won't be called to handle this
|
set the card as enabled. The driver won't be called to handle this
|
||||||
device.
|
device.
|
||||||
|
|
||||||
virtual_root.force_probe :
|
virtual_root.force_probe
|
||||||
|
|
||||||
Force the probing code to probe EISA slots even when it cannot find an
|
Force the probing code to probe EISA slots even when it cannot find an
|
||||||
EISA compliant mainboard (nothing appears on slot 0). Defaults to 0
|
EISA compliant mainboard (nothing appears on slot 0). Defaults to 0
|
||||||
(don't force), and set to 1 (force probing) when either
|
(don't force), and set to 1 (force probing) when either
|
||||||
CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set.
|
CONFIG_ALPHA_JENSEN or CONFIG_EISA_VLB_PRIMING are set.
|
||||||
|
|
||||||
** Random notes :
|
Random notes
|
||||||
|
============
|
||||||
|
|
||||||
Converting an EISA driver to the new API mostly involves *deleting*
|
Converting an EISA driver to the new API mostly involves *deleting*
|
||||||
code (since probing is now in the core EISA code). Unfortunately, most
|
code (since probing is now in the core EISA code). Unfortunately, most
|
||||||
@@ -194,9 +219,11 @@ routine.
|
|||||||
For example, switching your favorite EISA SCSI card to the "hotplug"
|
For example, switching your favorite EISA SCSI card to the "hotplug"
|
||||||
model is "the right thing"(tm).
|
model is "the right thing"(tm).
|
||||||
|
|
||||||
** Thanks :
|
Thanks
|
||||||
|
======
|
||||||
|
|
||||||
I'd like to thank the following people for their help:
|
I'd like to thank the following people for their help:
|
||||||
|
|
||||||
- Xavier Benigni for lending me a wonderful Alpha Jensen,
|
- Xavier Benigni for lending me a wonderful Alpha Jensen,
|
||||||
- James Bottomley, Jeff Garzik for getting this stuff into the kernel,
|
- James Bottomley, Jeff Garzik for getting this stuff into the kernel,
|
||||||
- Andries Brouwer for contributing numerous EISA ids,
|
- Andries Brouwer for contributing numerous EISA ids,
|
||||||
|
|||||||
@@ -134,6 +134,23 @@ use the boot option:
|
|||||||
fail_futex=
|
fail_futex=
|
||||||
mmc_core.fail_request=<interval>,<probability>,<space>,<times>
|
mmc_core.fail_request=<interval>,<probability>,<space>,<times>
|
||||||
|
|
||||||
|
o proc entries
|
||||||
|
|
||||||
|
- /proc/<pid>/fail-nth:
|
||||||
|
- /proc/self/task/<tid>/fail-nth:
|
||||||
|
|
||||||
|
Writing an integer N to this file makes the N-th call in the task fail.
|
||||||
|
Reading from this file returns an integer value. A value of '0' indicates
|
||||||
|
that the fault setup with a previous write to this file was injected.
|
||||||
|
A positive integer N indicates that the fault wasn't yet injected.
|
||||||
|
Note that this file enables all types of faults (slab, futex, etc).
|
||||||
|
This setting takes precedence over all other generic debugfs settings
|
||||||
|
like probability, interval, times, etc. But per-capability settings
|
||||||
|
(e.g. fail_futex/ignore-private) take precedence over it.
|
||||||
|
|
||||||
|
This feature is intended for systematic testing of faults in a single
|
||||||
|
system call. See an example below.
|
||||||
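The read-back rule for fail-nth (0 means the armed fault was injected, a positive N means it is still pending) can be captured in a small helper. This is a minimal sketch; the function name fail_nth_was_injected and its -1 return for unparsable input are assumptions made for illustration, not part of any kernel or libc interface.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Interpret the text read back from a fail-nth file.
 * Returns 1 if the previously armed fault was injected (value 0),
 * 0 if it is still pending (positive value), and -1 on unparsable
 * or negative input.
 */
static int fail_nth_was_injected(const char *text)
{
	char *end;
	long n;

	assert(text != NULL);
	n = strtol(text, &end, 10);
	if (end == text || n < 0)
		return -1;
	return n == 0;
}
```

The buffer passed in must be NUL-terminated, so a caller reading the file with read() or pread() would terminate it at the returned length first.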
|
|
||||||
How to add new fault injection capability
|
How to add new fault injection capability
|
||||||
-----------------------------------------
|
-----------------------------------------
|
||||||
|
|
||||||
@@ -278,3 +295,65 @@ allocation failure.
|
|||||||
# env FAILCMD_TYPE=fail_page_alloc \
|
# env FAILCMD_TYPE=fail_page_alloc \
|
||||||
./tools/testing/fault-injection/failcmd.sh --times=100 \
|
./tools/testing/fault-injection/failcmd.sh --times=100 \
|
||||||
-- make -C tools/testing/selftests/ run_tests
|
-- make -C tools/testing/selftests/ run_tests
|
||||||
|
|
||||||
|
Systematic faults using fail-nth
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
The following code systematically faults 0-th, 1-st, 2-nd and so on
|
||||||
|
capabilities in the socketpair() system call.
|
||||||
|
|
||||||
|
#include <sys/types.h>
|
||||||
|
#include <sys/stat.h>
|
||||||
|
#include <sys/socket.h>
|
||||||
|
#include <sys/syscall.h>
|
||||||
|
#include <fcntl.h>
|
||||||
|
#include <unistd.h>
|
||||||
|
#include <string.h>
|
||||||
|
#include <stdlib.h>
|
||||||
|
#include <stdio.h>
|
||||||
|
#include <errno.h>
|
||||||
|
|
||||||
|
int main()
|
||||||
|
{
|
||||||
|
int i, err, res, fail_nth, fds[2];
|
||||||
|
char buf[128];
|
||||||
|
|
||||||
|
system("echo N > /sys/kernel/debug/failslab/ignore-gfp-wait");
|
||||||
|
sprintf(buf, "/proc/self/task/%ld/fail-nth", syscall(SYS_gettid));
|
||||||
|
fail_nth = open(buf, O_RDWR);
|
||||||
|
for (i = 1;; i++) {
|
||||||
|
sprintf(buf, "%d", i);
|
||||||
|
write(fail_nth, buf, strlen(buf));
|
||||||
|
res = socketpair(AF_LOCAL, SOCK_STREAM, 0, fds);
|
||||||
|
err = errno;
|
||||||
|
pread(fail_nth, buf, sizeof(buf), 0);
|
||||||
|
if (res == 0) {
|
||||||
|
close(fds[0]);
|
||||||
|
close(fds[1]);
|
||||||
|
}
|
||||||
|
printf("%d-th fault %c: res=%d/%d\n", i, atoi(buf) ? 'N' : 'Y',
|
||||||
|
res, err);
|
||||||
|
if (atoi(buf))
|
||||||
|
break;
|
||||||
|
}
|
||||||
|
return 0;
|
||||||
|
}
|
||||||
|
|
||||||
|
An example output:
|
||||||
|
|
||||||
|
1-th fault Y: res=-1/23
|
||||||
|
2-th fault Y: res=-1/23
|
||||||
|
3-th fault Y: res=-1/12
|
||||||
|
4-th fault Y: res=-1/12
|
||||||
|
5-th fault Y: res=-1/23
|
||||||
|
6-th fault Y: res=-1/23
|
||||||
|
7-th fault Y: res=-1/23
|
||||||
|
8-th fault Y: res=-1/12
|
||||||
|
9-th fault Y: res=-1/12
|
||||||
|
10-th fault Y: res=-1/12
|
||||||
|
11-th fault Y: res=-1/12
|
||||||
|
12-th fault Y: res=-1/12
|
||||||
|
13-th fault Y: res=-1/12
|
||||||
|
14-th fault Y: res=-1/12
|
||||||
|
15-th fault Y: res=-1/12
|
||||||
|
16-th fault N: res=0/12
|
||||||
|
|||||||
@@ -1786,12 +1786,16 @@ pair provide additional information particular to the objects they represent.
|
|||||||
pos: 0
|
pos: 0
|
||||||
flags: 02
|
flags: 02
|
||||||
mnt_id: 9
|
mnt_id: 9
|
||||||
tfd: 5 events: 1d data: ffffffffffffffff
|
tfd: 5 events: 1d data: ffffffffffffffff pos:0 ino:61af sdev:7
|
||||||
|
|
||||||
where 'tfd' is a target file descriptor number in decimal form,
|
where 'tfd' is a target file descriptor number in decimal form,
|
||||||
'events' is events mask being watched and the 'data' is data
|
'events' is events mask being watched and the 'data' is data
|
||||||
associated with a target [see epoll(7) for more details].
|
associated with a target [see epoll(7) for more details].
|
||||||
|
|
||||||
|
The 'pos' is current offset of the target file in decimal form
|
||||||
|
[see lseek(2)], 'ino' and 'sdev' are inode and device numbers
|
||||||
|
where target file resides, all in hex format.
|
||||||
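A line in the extended 'tfd' format can be picked apart with sscanf(). The sketch below assumes the field order and spelling shown in the sample line above; struct epoll_tfd_entry and parse_epoll_tfd_line are hypothetical helper names, and the choice of C types for each field is an assumption rather than something the format guarantees.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of one "tfd:" line; the struct itself is a hypothetical helper. */
struct epoll_tfd_entry {
	int tfd;			/* target descriptor, decimal */
	unsigned int events;		/* event mask, hex */
	unsigned long long data;	/* user data, hex */
	long long pos;			/* file offset, decimal */
	unsigned long ino;		/* inode number, hex */
	unsigned int sdev;		/* device number, hex */
};

/* Returns 0 when all six fields were parsed, -1 otherwise. */
static int parse_epoll_tfd_line(const char *line, struct epoll_tfd_entry *e)
{
	int n;

	assert(line != NULL && e != NULL);
	n = sscanf(line,
		   "tfd: %d events: %x data: %llx pos:%lld ino:%lx sdev:%x",
		   &e->tfd, &e->events, &e->data, &e->pos, &e->ino, &e->sdev);
	return n == 6 ? 0 : -1;
}
```

A line in the older three-field format makes the helper return -1, so a reader that must handle both formats can fall back to a shorter pattern in that case.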
|
|
||||||
Fsnotify files
|
Fsnotify files
|
||||||
~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~
|
||||||
For inotify files the format is the following
|
For inotify files the format is the following
|
||||||
|
|||||||
@@ -1225,12 +1225,6 @@ The underlying reason for the above rules is to make sure, that a
|
|||||||
mount can be accurately replicated (e.g. umounting and mounting again)
|
mount can be accurately replicated (e.g. umounting and mounting again)
|
||||||
based on the information found in /proc/mounts.
|
based on the information found in /proc/mounts.
|
||||||
|
|
||||||
A simple method of saving options at mount/remount time and showing
|
|
||||||
them is provided with the save_mount_options() and
|
|
||||||
generic_show_options() helper functions. Please note, that using
|
|
||||||
these may have drawbacks. For more info see header comments for these
|
|
||||||
functions in fs/namespace.c.
|
|
||||||
|
|
||||||
Resources
|
Resources
|
||||||
=========
|
=========
|
||||||
|
|
||||||
|
|||||||
@@ -1,6 +1,9 @@
|
|||||||
|
===================================
|
||||||
Using flexible arrays in the kernel
|
Using flexible arrays in the kernel
|
||||||
Last updated for 2.6.32
|
===================================
|
||||||
Jonathan Corbet <corbet@lwn.net>
|
|
||||||
|
:Updated: Last updated for 2.6.32
|
||||||
|
:Author: Jonathan Corbet <corbet@lwn.net>
|
||||||
|
|
||||||
Large contiguous memory allocations can be unreliable in the Linux kernel.
|
Large contiguous memory allocations can be unreliable in the Linux kernel.
|
||||||
Kernel programmers will sometimes respond to this problem by allocating
|
Kernel programmers will sometimes respond to this problem by allocating
|
||||||
@@ -26,7 +29,7 @@ operation. It's also worth noting that flexible arrays do no internal
|
|||||||
locking at all; if concurrent access to an array is possible, then the
|
locking at all; if concurrent access to an array is possible, then the
|
||||||
caller must arrange for appropriate mutual exclusion.
|
caller must arrange for appropriate mutual exclusion.
|
||||||
|
|
||||||
The creation of a flexible array is done with:
|
The creation of a flexible array is done with::
|
||||||
|
|
||||||
#include <linux/flex_array.h>
|
#include <linux/flex_array.h>
|
||||||
|
|
||||||
@@ -40,14 +43,14 @@ argument is passed directly to the internal memory allocation calls. With
|
|||||||
the current code, using flags to ask for high memory is likely to lead to
|
the current code, using flags to ask for high memory is likely to lead to
|
||||||
notably unpleasant side effects.
|
notably unpleasant side effects.
|
||||||
|
|
||||||
It is also possible to define flexible arrays at compile time with:
|
It is also possible to define flexible arrays at compile time with::
|
||||||
|
|
||||||
DEFINE_FLEX_ARRAY(name, element_size, total);
|
DEFINE_FLEX_ARRAY(name, element_size, total);
|
||||||
|
|
||||||
This macro will result in a definition of an array with the given name; the
|
This macro will result in a definition of an array with the given name; the
|
||||||
element size and total will be checked for validity at compile time.
|
element size and total will be checked for validity at compile time.
|
||||||
|
|
||||||
Storing data into a flexible array is accomplished with a call to:
|
Storing data into a flexible array is accomplished with a call to::
|
||||||
|
|
||||||
int flex_array_put(struct flex_array *array, unsigned int element_nr,
|
int flex_array_put(struct flex_array *array, unsigned int element_nr,
|
||||||
void *src, gfp_t flags);
|
void *src, gfp_t flags);
|
||||||
@@ -63,7 +66,7 @@ running in some sort of atomic context; in this situation, sleeping in the
|
|||||||
memory allocator would be a bad thing. That can be avoided by using
|
memory allocator would be a bad thing. That can be avoided by using
|
||||||
GFP_ATOMIC for the flags value, but, often, there is a better way. The
|
GFP_ATOMIC for the flags value, but, often, there is a better way. The
|
||||||
trick is to ensure that any needed memory allocations are done before
|
trick is to ensure that any needed memory allocations are done before
|
||||||
entering atomic context, using:
|
entering atomic context, using::
|
||||||
|
|
||||||
int flex_array_prealloc(struct flex_array *array, unsigned int start,
|
int flex_array_prealloc(struct flex_array *array, unsigned int start,
|
||||||
unsigned int nr_elements, gfp_t flags);
|
unsigned int nr_elements, gfp_t flags);
|
||||||
@@ -73,7 +76,7 @@ defined by start and nr_elements has been allocated. Thereafter, a
|
|||||||
flex_array_put() call on an element in that range is guaranteed not to
|
flex_array_put() call on an element in that range is guaranteed not to
|
||||||
block.
|
block.
|
||||||
|
|
||||||
Getting data back out of the array is done with:
|
Getting data back out of the array is done with::
|
||||||
|
|
||||||
void *flex_array_get(struct flex_array *fa, unsigned int element_nr);
|
void *flex_array_get(struct flex_array *fa, unsigned int element_nr);
|
||||||
|
|
||||||
@@ -89,7 +92,7 @@ involving that number probably result from use of unstored array entries.
|
|||||||
Note that, if array elements are allocated with __GFP_ZERO, they will be
|
Note that, if array elements are allocated with __GFP_ZERO, they will be
|
||||||
initialized to zero and this poisoning will not happen.
|
initialized to zero and this poisoning will not happen.
|
||||||
|
|
||||||
Individual elements in the array can be cleared with:
|
Individual elements in the array can be cleared with::
|
||||||
|
|
||||||
int flex_array_clear(struct flex_array *array, unsigned int element_nr);
|
int flex_array_clear(struct flex_array *array, unsigned int element_nr);
|
||||||
|
|
||||||
@@ -97,7 +100,7 @@ This function will set the given element to FLEX_ARRAY_FREE and return
|
|||||||
zero. If storage for the indicated element is not allocated for the array,
|
zero. If storage for the indicated element is not allocated for the array,
|
||||||
flex_array_clear() will return -EINVAL instead. Note that clearing an
|
flex_array_clear() will return -EINVAL instead. Note that clearing an
|
||||||
element does not release the storage associated with it; to reduce the
|
element does not release the storage associated with it; to reduce the
|
||||||
allocated size of an array, call:
|
allocated size of an array, call::
|
||||||
|
|
||||||
int flex_array_shrink(struct flex_array *array);
|
int flex_array_shrink(struct flex_array *array);
|
||||||
|
|
||||||
@@ -106,12 +109,12 @@ This function works by scanning the array for pages containing nothing but
|
|||||||
FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work
|
FLEX_ARRAY_FREE bytes, so (1) it can be expensive, and (2) it will not work
|
||||||
if the array's pages are allocated with __GFP_ZERO.
|
if the array's pages are allocated with __GFP_ZERO.
|
||||||
|
|
||||||
It is possible to remove all elements of an array with a call to:
|
It is possible to remove all elements of an array with a call to::
|
||||||
|
|
||||||
void flex_array_free_parts(struct flex_array *array);
|
void flex_array_free_parts(struct flex_array *array);
|
||||||
|
|
||||||
This call frees all elements, but leaves the array itself in place.
|
This call frees all elements, but leaves the array itself in place.
|
||||||
Freeing the entire array is done with:
|
Freeing the entire array is done with::
|
||||||
|
|
||||||
void flex_array_free(struct flex_array *array);
|
void flex_array_free(struct flex_array *array);
|
||||||
|
|
||||||
|
|||||||
@@ -1,5 +1,6 @@
|
|||||||
|
================
|
||||||
Futex Requeue PI
|
Futex Requeue PI
|
||||||
----------------
|
================
|
||||||
|
|
||||||
Requeueing of tasks from a non-PI futex to a PI futex requires
|
Requeueing of tasks from a non-PI futex to a PI futex requires
|
||||||
special handling in order to ensure the underlying rt_mutex is never
|
special handling in order to ensure the underlying rt_mutex is never
|
||||||
@@ -20,7 +21,7 @@ implementation would wake the highest-priority waiter, and leave the
|
|||||||
rest to the natural wakeup inherent in unlocking the mutex
|
rest to the natural wakeup inherent in unlocking the mutex
|
||||||
associated with the condvar.
|
associated with the condvar.
|
||||||
|
|
||||||
Consider the simplified glibc calls:
|
Consider the simplified glibc calls::
|
||||||
|
|
||||||
/* caller must lock mutex */
|
/* caller must lock mutex */
|
||||||
pthread_cond_wait(cond, mutex)
|
pthread_cond_wait(cond, mutex)
|
||||||
@@ -53,7 +54,7 @@ In order to support PI-aware pthread_condvar's, the kernel needs to
|
|||||||
be able to requeue tasks to PI futexes. This support implies that
|
be able to requeue tasks to PI futexes. This support implies that
|
||||||
upon a successful futex_wait system call, the caller would return to
|
upon a successful futex_wait system call, the caller would return to
|
||||||
user space already holding the PI futex. The glibc implementation
|
user space already holding the PI futex. The glibc implementation
|
||||||
would be modified as follows:
|
would be modified as follows::
|
||||||
|
|
||||||
|
|
||||||
/* caller must lock mutex */
|
/* caller must lock mutex */
|
||||||
|
|||||||
@@ -1,14 +1,15 @@
|
|||||||
|
=========================
|
||||||
GCC plugin infrastructure
|
GCC plugin infrastructure
|
||||||
=========================
|
=========================
|
||||||
|
|
||||||
|
|
||||||
1. Introduction
|
Introduction
|
||||||
===============
|
============
|
||||||
|
|
||||||
GCC plugins are loadable modules that provide extra features to the
|
GCC plugins are loadable modules that provide extra features to the
|
||||||
compiler [1]. They are useful for runtime instrumentation and static analysis.
|
compiler [1]_. They are useful for runtime instrumentation and static analysis.
|
||||||
We can analyse, change and add further code during compilation via
|
We can analyse, change and add further code during compilation via
|
||||||
callbacks [2], GIMPLE [3], IPA [4] and RTL passes [5].
|
callbacks [2]_, GIMPLE [3]_, IPA [4]_ and RTL passes [5]_.
|
||||||
|
|
||||||
The GCC plugin infrastructure of the kernel supports all gcc versions from
|
The GCC plugin infrastructure of the kernel supports all gcc versions from
|
||||||
4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a
|
4.5 to 6.0, building out-of-tree modules, cross-compilation and building in a
|
||||||
@@ -21,56 +22,61 @@ and versions 4.8+ can only be compiled by a C++ compiler.
|
|||||||
Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and
|
Currently the GCC plugin infrastructure supports only the x86, arm, arm64 and
|
||||||
powerpc architectures.
|
powerpc architectures.
|
||||||
|
|
||||||
This infrastructure was ported from grsecurity [6] and PaX [7].
|
This infrastructure was ported from grsecurity [6]_ and PaX [7]_.
|
||||||
|
|
||||||
--
|
--
|
||||||
[1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html
|
|
||||||
[2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API
|
.. [1] https://gcc.gnu.org/onlinedocs/gccint/Plugins.html
|
||||||
[3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html
|
.. [2] https://gcc.gnu.org/onlinedocs/gccint/Plugin-API.html#Plugin-API
|
||||||
[4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html
|
.. [3] https://gcc.gnu.org/onlinedocs/gccint/GIMPLE.html
|
||||||
[5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html
|
.. [4] https://gcc.gnu.org/onlinedocs/gccint/IPA.html
|
||||||
[6] https://grsecurity.net/
|
.. [5] https://gcc.gnu.org/onlinedocs/gccint/RTL.html
|
||||||
[7] https://pax.grsecurity.net/
|
.. [6] https://grsecurity.net/
|
||||||
|
.. [7] https://pax.grsecurity.net/
|
||||||
|
|
||||||
|
|
||||||
2. Files
|
Files
|
||||||
========
|
=====
|
||||||
|
|
||||||
|
**$(src)/scripts/gcc-plugins**
|
||||||
|
|
||||||
$(src)/scripts/gcc-plugins
|
|
||||||
This is the directory of the GCC plugins.
|
This is the directory of the GCC plugins.
|
||||||
|
|
||||||
$(src)/scripts/gcc-plugins/gcc-common.h
|
**$(src)/scripts/gcc-plugins/gcc-common.h**
|
||||||
|
|
||||||
This is a compatibility header for GCC plugins.
|
This is a compatibility header for GCC plugins.
|
||||||
It should be always included instead of individual gcc headers.
|
It should be always included instead of individual gcc headers.
|
||||||
|
|
||||||
$(src)/scripts/gcc-plugin.sh
|
**$(src)/scripts/gcc-plugin.sh**
|
||||||
|
|
||||||
This script checks the availability of the included headers in
|
This script checks the availability of the included headers in
|
||||||
gcc-common.h and chooses the proper host compiler to build the plugins
|
gcc-common.h and chooses the proper host compiler to build the plugins
|
||||||
(gcc-4.7 can be built by either gcc or g++).
|
(gcc-4.7 can be built by either gcc or g++).
|
||||||
|
|
||||||
$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h
|
**$(src)/scripts/gcc-plugins/gcc-generate-gimple-pass.h,
|
||||||
$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h
|
$(src)/scripts/gcc-plugins/gcc-generate-ipa-pass.h,
|
||||||
$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h
|
$(src)/scripts/gcc-plugins/gcc-generate-simple_ipa-pass.h,
|
||||||
$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h
|
$(src)/scripts/gcc-plugins/gcc-generate-rtl-pass.h**
|
||||||
|
|
||||||
These headers automatically generate the registration structures for
|
These headers automatically generate the registration structures for
|
||||||
GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions
|
GIMPLE, SIMPLE_IPA, IPA and RTL passes. They support all gcc versions
|
||||||
from 4.5 to 6.0.
|
from 4.5 to 6.0.
|
||||||
They should be preferred to creating the structures by hand.
|
They should be preferred to creating the structures by hand.
|
||||||
|
|
||||||
|
|
||||||
3. Usage
|
Usage
|
||||||
========
|
=====
|
||||||
|
|
||||||
You must install the gcc plugin headers for your gcc version,
|
You must install the gcc plugin headers for your gcc version,
|
||||||
e.g., on Ubuntu for gcc-4.9:
|
e.g., on Ubuntu for gcc-4.9::
|
||||||
|
|
||||||
apt-get install gcc-4.9-plugin-dev
|
apt-get install gcc-4.9-plugin-dev
|
||||||
|
|
||||||
Enable a GCC plugin based feature in the kernel config:
|
Enable a GCC plugin based feature in the kernel config::
|
||||||
|
|
||||||
CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y
|
CONFIG_GCC_PLUGIN_CYC_COMPLEXITY = y
|
||||||
|
|
||||||
To compile only the plugin(s):
|
To compile only the plugin(s)::
|
||||||
|
|
||||||
make gcc-plugins
|
make gcc-plugins
|
||||||
|
|
||||||
|
|||||||
@@ -1,4 +1,9 @@
|
|||||||
Notes on the change from 16-bit UIDs to 32-bit UIDs:
|
===================================================
|
||||||
|
Notes on the change from 16-bit UIDs to 32-bit UIDs
|
||||||
|
===================================================
|
||||||
|
|
||||||
|
:Author: Chris Wing <wingc@umich.edu>
|
||||||
|
:Last updated: January 11, 2000
|
||||||
|
|
||||||
- kernel code MUST take into account __kernel_uid_t and __kernel_uid32_t
|
- kernel code MUST take into account __kernel_uid_t and __kernel_uid32_t
|
||||||
when communicating between user and kernel space in an ioctl or data
|
when communicating between user and kernel space in an ioctl or data
|
||||||
@@ -28,30 +33,34 @@ What's left to be done for 32-bit UIDs on all Linux architectures:
|
|||||||
uses the 32-bit UID system calls properly otherwise.
|
uses the 32-bit UID system calls properly otherwise.
|
||||||
|
|
||||||
This affects at least:
|
This affects at least:
|
||||||
iBCS on Intel
|
|
||||||
|
|
||||||
sparc32 emulation on sparc64
|
- iBCS on Intel
|
||||||
|
|
||||||
|
- sparc32 emulation on sparc64
|
||||||
(need to support whatever new 32-bit UID system calls are added to
|
(need to support whatever new 32-bit UID system calls are added to
|
||||||
sparc32)
|
sparc32)
|
||||||
|
|
||||||
- Validate that all filesystems behave properly.
|
- Validate that all filesystems behave properly.
|
||||||
|
|
||||||
At present, 32-bit UIDs _should_ work for:
|
At present, 32-bit UIDs _should_ work for:
|
||||||
ext2
|
|
||||||
ufs
|
- ext2
|
||||||
isofs
|
- ufs
|
||||||
nfs
|
- isofs
|
||||||
coda
|
- nfs
|
||||||
udf
|
- coda
|
||||||
|
- udf
|
||||||
|
|
||||||
Ioctl() fixups have been made for:
|
Ioctl() fixups have been made for:
|
||||||
ncpfs
|
|
||||||
smbfs
|
- ncpfs
|
||||||
|
- smbfs
|
||||||
|
|
||||||
Filesystems with simple fixups to prevent 16-bit UID wraparound:
|
Filesystems with simple fixups to prevent 16-bit UID wraparound:
|
||||||
minix
|
|
||||||
sysv
|
- minix
|
||||||
qnx4
|
- sysv
|
||||||
|
- qnx4
|
||||||
|
|
||||||
Other filesystems have not been checked yet.
|
Other filesystems have not been checked yet.
|
||||||
|
|
||||||
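The 16-bit UID wraparound mentioned above can be sketched in plain C. This is a user-space illustration only: the function names are hypothetical, and the actual fixups used by minix, sysv and qnx4 are not shown in this document. The value 65534 mirrors the kernel's default overflow UID and is treated here as an assumption.

```c
#include <stdint.h>

/* Assumed value: mirrors the kernel's default overflow UID. */
#define SKETCH_OVERFLOW_UID 65534u

/*
 * Map a 32-bit UID onto a 16-bit on-disk field.
 * Plain truncation would wrap (65536 becomes 0, i.e. root);
 * substituting a fixed overflow UID avoids that.
 */
static uint16_t sketch_uid_to_disk16(uint32_t uid)
{
	if (uid > 0xFFFFu)
		return (uint16_t)SKETCH_OVERFLOW_UID;
	return (uint16_t)uid;
}

/* What naive truncation does, for comparison. */
static uint16_t sketch_uid_truncate16(uint32_t uid)
{
	return (uint16_t)(uid & 0xFFFFu);
}
```

The point of the clamp is that a UID which does not fit is stored as a harmless placeholder rather than silently aliasing a low-numbered (possibly privileged) UID.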
@@ -69,9 +78,3 @@ What's left to be done for 32-bit UIDs on all Linux architectures:
|
|||||||
- make sure that the UID mapping feature of AX25 networking works properly
|
- make sure that the UID mapping feature of AX25 networking works properly
|
||||||
(it should be safe because it's always used a 32-bit integer to
|
(it should be safe because it's always used a 32-bit integer to
|
||||||
communicate between user and kernel)
|
communicate between user and kernel)
|
||||||
|
|
||||||
|
|
||||||
Chris Wing
|
|
||||||
wingc@umich.edu
|
|
||||||
|
|
||||||
last updated: January 11, 2000
|
|
||||||
|
|||||||
@@ -1,4 +1,9 @@
|
|||||||
Introduction:
|
==========================================================
|
||||||
|
Linux support for random number generator in i8xx chipsets
|
||||||
|
==========================================================
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
The hw_random framework is software that makes use of a
|
The hw_random framework is software that makes use of a
|
||||||
special hardware feature on your CPU or motherboard,
|
special hardware feature on your CPU or motherboard,
|
||||||
@@ -18,7 +23,8 @@ Introduction:
|
|||||||
which is used internally and exported by the /dev/urandom and
|
which is used internally and exported by the /dev/urandom and
|
||||||
/dev/random special files.
|
/dev/random special files.
|
||||||
|
|
||||||
Theory of operation:
|
Theory of operation
|
||||||
|
===================
|
||||||
|
|
||||||
CHARACTER DEVICE. Using the standard open()
|
CHARACTER DEVICE. Using the standard open()
|
||||||
and read() system calls, you can read random data from
|
and read() system calls, you can read random data from
|
||||||
@@ -44,12 +50,14 @@ Theory of operation:
|
|||||||
|
|
||||||
==========================================================================
|
==========================================================================
|
||||||
|
|
||||||
|
|
||||||
Hardware driver for Intel/AMD/VIA Random Number Generators (RNG)
|
Hardware driver for Intel/AMD/VIA Random Number Generators (RNG)
|
||||||
Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com>
|
- Copyright 2000,2001 Jeff Garzik <jgarzik@pobox.com>
|
||||||
Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com>
|
- Copyright 2000,2001 Philipp Rumpf <prumpf@mandrakesoft.com>
|
||||||
|
|
||||||
|
|
||||||
About the Intel RNG hardware, from the firmware hub datasheet:
|
About the Intel RNG hardware, from the firmware hub datasheet
|
||||||
|
=============================================================
|
||||||
|
|
||||||
The Firmware Hub integrates a Random Number Generator (RNG)
|
The Firmware Hub integrates a Random Number Generator (RNG)
|
||||||
using thermal noise generated from inherently random quantum
|
using thermal noise generated from inherently random quantum
|
||||||
@@ -59,27 +67,34 @@ About the Intel RNG hardware, from the firmware hub datasheet:
|
|||||||
access to our RNG for use as a security feature. At this time,
|
access to our RNG for use as a security feature. At this time,
|
||||||
the RNG is only to be used with a system in an OS-present state.
|
the RNG is only to be used with a system in an OS-present state.
|
||||||
|
|
||||||
Intel RNG Driver notes:
|
Intel RNG Driver notes
|
||||||
|
======================
|
||||||
|
|
||||||
* FIXME: support poll(2)
|
FIXME: support poll(2)
|
||||||
|
|
||||||
NOTE: request_mem_region was removed, for three reasons:
|
.. note::
|
||||||
1) Only one RNG is supported by this driver, 2) The location
|
|
||||||
used by the RNG is a fixed location in MMIO-addressable memory,
|
request_mem_region was removed, for three reasons:
|
||||||
|
|
||||||
|
1) Only one RNG is supported by this driver;
|
||||||
|
2) The location used by the RNG is a fixed location in
|
||||||
|
MMIO-addressable memory;
|
||||||
3) users with properly working BIOS e820 handling will always
|
3) users with properly working BIOS e820 handling will always
|
||||||
have the region in which the RNG is located reserved, so
|
have the region in which the RNG is located reserved, so
|
||||||
request_mem_region calls always fail for proper setups.
|
request_mem_region calls always fail for proper setups.
|
||||||
However, for people who use mem=XX, BIOS e820 information is
|
However, for people who use mem=XX, BIOS e820 information is
|
||||||
-not- in /proc/iomem, and request_mem_region(RNG_ADDR) can
|
**not** in /proc/iomem, and request_mem_region(RNG_ADDR) can
|
||||||
succeed.
|
succeed.
|
||||||
|
|
||||||
Driver details:
|
Driver details
|
||||||
|
==============
|
||||||
|
|
||||||
Based on:
|
Based on:
|
||||||
Intel 82802AB/82802AC Firmware Hub (FWH) Datasheet
|
Intel 82802AB/82802AC Firmware Hub (FWH) Datasheet
|
||||||
May 1999 Order Number: 290658-002 R
|
May 1999 Order Number: 290658-002 R
|
||||||
|
|
||||||
Intel 82802 Firmware Hub: Random Number Generator
|
Intel 82802 Firmware Hub:
|
||||||
|
Random Number Generator
|
||||||
Programmer's Reference Manual
|
Programmer's Reference Manual
|
||||||
December 1999 Order Number: 298029-001 R
|
December 1999 Order Number: 298029-001 R
|
||||||
|
|
||||||
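The character-device interface described under "Theory of operation" (standard open() and read() calls) might be exercised from user space roughly as follows. This is a minimal sketch: the helper name is hypothetical, and the device path is left as a parameter because the node exposed by the driver is not named in this excerpt.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes from a random-number character device.
 * Returns the number of bytes read, or -1 if the device could not
 * be opened or nothing could be read.
 */
static ssize_t sketch_read_random(const char *path, unsigned char *buf,
				  size_t len)
{
	size_t done = 0;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;

	while (done < len) {
		ssize_t n = read(fd, buf + done, len - done);

		if (n <= 0)
			break;	/* error or end of data: report what we have */
		done += (size_t)n;
	}

	close(fd);
	return done ? (ssize_t)done : -1;
}
```

A caller would pass the path of the device node it wants to sample, for example one of the special files mentioned in the introduction.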
|
|||||||
@@ -1,6 +1,9 @@
|
|||||||
|
===========================
|
||||||
Hardware Spinlock Framework
|
Hardware Spinlock Framework
|
||||||
|
===========================
|
||||||
|
|
||||||
1. Introduction
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
Hardware spinlock modules provide hardware assistance for synchronization
|
Hardware spinlock modules provide hardware assistance for synchronization
|
||||||
and mutual exclusion between heterogeneous processors and those not operating
|
and mutual exclusion between heterogeneous processors and those not operating
|
||||||
@@ -32,136 +35,206 @@ structure).
|
|||||||
A common hwspinlock interface makes it possible to have generic, platform-
|
A common hwspinlock interface makes it possible to have generic, platform-
|
||||||
independent, drivers.
|
independent, drivers.
|
||||||
|
|
||||||
2. User API
|
User API
|
||||||
|
========
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct hwspinlock *hwspin_lock_request(void);
|
struct hwspinlock *hwspin_lock_request(void);
|
||||||
- dynamically assign an hwspinlock and return its address, or NULL
|
|
||||||
|
Dynamically assign an hwspinlock and return its address, or NULL
|
||||||
in case an unused hwspinlock isn't available. Users of this
|
in case an unused hwspinlock isn't available. Users of this
|
||||||
API will usually want to communicate the lock's id to the remote core
|
API will usually want to communicate the lock's id to the remote core
|
||||||
before it can be used to achieve synchronization.
|
before it can be used to achieve synchronization.
|
||||||
|
|
||||||
Should be called from a process context (might sleep).
|
Should be called from a process context (might sleep).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct hwspinlock *hwspin_lock_request_specific(unsigned int id);
|
struct hwspinlock *hwspin_lock_request_specific(unsigned int id);
|
||||||
- assign a specific hwspinlock id and return its address, or NULL
|
|
||||||
|
Assign a specific hwspinlock id and return its address, or NULL
|
||||||
if that hwspinlock is already in use. Usually board code will
|
if that hwspinlock is already in use. Usually board code will
|
||||||
be calling this function in order to reserve specific hwspinlock
|
be calling this function in order to reserve specific hwspinlock
|
||||||
ids for predefined purposes.
|
ids for predefined purposes.
|
||||||
|
|
||||||
Should be called from a process context (might sleep).
|
Should be called from a process context (might sleep).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int of_hwspin_lock_get_id(struct device_node *np, int index);
|
int of_hwspin_lock_get_id(struct device_node *np, int index);
|
||||||
- retrieve the global lock id for an OF phandle-based specific lock.
|
|
||||||
|
Retrieve the global lock id for an OF phandle-based specific lock.
|
||||||
This function provides a means for DT users of a hwspinlock module
|
This function provides a means for DT users of a hwspinlock module
|
||||||
to get the global lock id of a specific hwspinlock, so that it can
|
to get the global lock id of a specific hwspinlock, so that it can
|
||||||
be requested using the normal hwspin_lock_request_specific() API.
|
be requested using the normal hwspin_lock_request_specific() API.
|
||||||
|
|
||||||
The function returns a lock id number on success, -EPROBE_DEFER if
|
The function returns a lock id number on success, -EPROBE_DEFER if
|
||||||
the hwspinlock device is not yet registered with the core, or other
|
the hwspinlock device is not yet registered with the core, or other
|
||||||
error values.
|
error values.
|
||||||
|
|
||||||
Should be called from a process context (might sleep).
|
Should be called from a process context (might sleep).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_lock_free(struct hwspinlock *hwlock);
|
int hwspin_lock_free(struct hwspinlock *hwlock);
|
||||||
- free a previously-assigned hwspinlock; returns 0 on success, or an
|
|
||||||
|
Free a previously-assigned hwspinlock; returns 0 on success, or an
|
||||||
appropriate error code on failure (e.g. -EINVAL if the hwspinlock
|
appropriate error code on failure (e.g. -EINVAL if the hwspinlock
|
||||||
is already free).
|
is already free).
|
||||||
|
|
||||||
Should be called from a process context (might sleep).
|
Should be called from a process context (might sleep).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_lock_timeout(struct hwspinlock *hwlock, unsigned int timeout);
|
int hwspin_lock_timeout(struct hwspinlock *hwlock, unsigned int timeout);
|
||||||
- lock a previously-assigned hwspinlock with a timeout limit (specified in
|
|
||||||
|
Lock a previously-assigned hwspinlock with a timeout limit (specified in
|
||||||
msecs). If the hwspinlock is already taken, the function will busy loop
|
msecs). If the hwspinlock is already taken, the function will busy loop
|
||||||
waiting for it to be released, but give up when the timeout elapses.
|
waiting for it to be released, but give up when the timeout elapses.
|
||||||
Upon a successful return from this function, preemption is disabled so
|
Upon a successful return from this function, preemption is disabled so
|
||||||
the caller must not sleep, and is advised to release the hwspinlock as
|
the caller must not sleep, and is advised to release the hwspinlock as
|
||||||
soon as possible, in order to minimize remote cores polling on the
|
soon as possible, in order to minimize remote cores polling on the
|
||||||
hardware interconnect.
|
hardware interconnect.
|
||||||
|
|
||||||
Returns 0 when successful and an appropriate error code otherwise (most
|
Returns 0 when successful and an appropriate error code otherwise (most
|
||||||
notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs).
|
notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs).
|
||||||
The function will never sleep.
|
The function will never sleep.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_lock_timeout_irq(struct hwspinlock *hwlock, unsigned int timeout);
|
int hwspin_lock_timeout_irq(struct hwspinlock *hwlock, unsigned int timeout);
|
||||||
- lock a previously-assigned hwspinlock with a timeout limit (specified in
|
|
||||||
|
Lock a previously-assigned hwspinlock with a timeout limit (specified in
|
||||||
msecs). If the hwspinlock is already taken, the function will busy loop
|
msecs). If the hwspinlock is already taken, the function will busy loop
|
||||||
waiting for it to be released, but give up when the timeout elapses.
|
waiting for it to be released, but give up when the timeout elapses.
|
||||||
Upon a successful return from this function, preemption and the local
|
Upon a successful return from this function, preemption and the local
|
||||||
interrupts are disabled, so the caller must not sleep, and is advised to
|
interrupts are disabled, so the caller must not sleep, and is advised to
|
||||||
release the hwspinlock as soon as possible.
|
release the hwspinlock as soon as possible.
|
||||||
|
|
||||||
Returns 0 when successful and an appropriate error code otherwise (most
|
Returns 0 when successful and an appropriate error code otherwise (most
|
||||||
notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs).
|
notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs).
|
||||||
The function will never sleep.
|
The function will never sleep.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_lock_timeout_irqsave(struct hwspinlock *hwlock, unsigned int to,
|
int hwspin_lock_timeout_irqsave(struct hwspinlock *hwlock, unsigned int to,
|
||||||
unsigned long *flags);
|
unsigned long *flags);
|
||||||
- lock a previously-assigned hwspinlock with a timeout limit (specified in
|
|
||||||
|
Lock a previously-assigned hwspinlock with a timeout limit (specified in
|
||||||
msecs). If the hwspinlock is already taken, the function will busy loop
|
msecs). If the hwspinlock is already taken, the function will busy loop
|
||||||
waiting for it to be released, but give up when the timeout elapses.
|
waiting for it to be released, but give up when the timeout elapses.
|
||||||
Upon a successful return from this function, preemption is disabled,
|
Upon a successful return from this function, preemption is disabled,
|
||||||
local interrupts are disabled and their previous state is saved at the
|
local interrupts are disabled and their previous state is saved at the
|
||||||
given flags placeholder. The caller must not sleep, and is advised to
|
given flags placeholder. The caller must not sleep, and is advised to
|
||||||
release the hwspinlock as soon as possible.
|
release the hwspinlock as soon as possible.
|
||||||
|
|
||||||
Returns 0 when successful and an appropriate error code otherwise (most
|
Returns 0 when successful and an appropriate error code otherwise (most
|
||||||
notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs).
|
notably -ETIMEDOUT if the hwspinlock is still busy after timeout msecs).
|
||||||
|
|
||||||
The function will never sleep.
|
The function will never sleep.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_trylock(struct hwspinlock *hwlock);
|
int hwspin_trylock(struct hwspinlock *hwlock);
|
||||||
- attempt to lock a previously-assigned hwspinlock, but immediately fail if
|
|
||||||
|
|
||||||
|
Attempt to lock a previously-assigned hwspinlock, but immediately fail if
|
||||||
it is already taken.
|
it is already taken.
|
||||||
|
|
||||||
Upon a successful return from this function, preemption is disabled so
|
Upon a successful return from this function, preemption is disabled so
|
||||||
the caller must not sleep, and is advised to release the hwspinlock as soon as
|
the caller must not sleep, and is advised to release the hwspinlock as soon as
|
||||||
possible, in order to minimize remote cores polling on the hardware
|
possible, in order to minimize remote cores polling on the hardware
|
||||||
interconnect.
|
interconnect.
|
||||||
|
|
||||||
Returns 0 on success and an appropriate error code otherwise (most
|
Returns 0 on success and an appropriate error code otherwise (most
|
||||||
notably -EBUSY if the hwspinlock was already taken).
|
notably -EBUSY if the hwspinlock was already taken).
|
||||||
The function will never sleep.
|
The function will never sleep.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_trylock_irq(struct hwspinlock *hwlock);
|
int hwspin_trylock_irq(struct hwspinlock *hwlock);
|
||||||
- attempt to lock a previously-assigned hwspinlock, but immediately fail if
|
|
||||||
|
|
||||||
|
Attempt to lock a previously-assigned hwspinlock, but immediately fail if
|
||||||
it is already taken.
|
it is already taken.
|
||||||
|
|
||||||
Upon a successful return from this function, preemption and the local
|
Upon a successful return from this function, preemption and the local
|
||||||
interrupts are disabled so the caller must not sleep, and is advised to
|
interrupts are disabled so the caller must not sleep, and is advised to
|
||||||
release the hwspinlock as soon as possible.
|
release the hwspinlock as soon as possible.
|
||||||
|
|
||||||
Returns 0 on success and an appropriate error code otherwise (most
|
Returns 0 on success and an appropriate error code otherwise (most
|
||||||
notably -EBUSY if the hwspinlock was already taken).
|
notably -EBUSY if the hwspinlock was already taken).
|
||||||
|
|
||||||
The function will never sleep.
|
The function will never sleep.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_trylock_irqsave(struct hwspinlock *hwlock, unsigned long *flags);
|
int hwspin_trylock_irqsave(struct hwspinlock *hwlock, unsigned long *flags);
|
||||||
- attempt to lock a previously-assigned hwspinlock, but immediately fail if
|
|
||||||
|
Attempt to lock a previously-assigned hwspinlock, but immediately fail if
|
||||||
it is already taken.
|
it is already taken.
|
||||||
|
|
||||||
Upon a successful return from this function, preemption is disabled,
|
Upon a successful return from this function, preemption is disabled,
|
||||||
the local interrupts are disabled and their previous state is saved
|
the local interrupts are disabled and their previous state is saved
|
||||||
at the given flags placeholder. The caller must not sleep, and is advised
|
at the given flags placeholder. The caller must not sleep, and is advised
|
||||||
to release the hwspinlock as soon as possible.
|
to release the hwspinlock as soon as possible.
|
||||||
|
|
||||||
Returns 0 on success and an appropriate error code otherwise (most
|
Returns 0 on success and an appropriate error code otherwise (most
|
||||||
notably -EBUSY if the hwspinlock was already taken).
|
notably -EBUSY if the hwspinlock was already taken).
|
||||||
The function will never sleep.
|
The function will never sleep.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void hwspin_unlock(struct hwspinlock *hwlock);
|
void hwspin_unlock(struct hwspinlock *hwlock);
|
||||||
- unlock a previously-locked hwspinlock. Always succeeds, and can be called
|
|
||||||
from any context (the function never sleeps). Note: code should _never_
|
Unlock a previously-locked hwspinlock. Always succeeds, and can be called
|
||||||
unlock an hwspinlock which is already unlocked (there is no protection
|
from any context (the function never sleeps).
|
||||||
against this).
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
code should **never** unlock an hwspinlock which is already unlocked
|
||||||
|
(there is no protection against this).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void hwspin_unlock_irq(struct hwspinlock *hwlock);
|
void hwspin_unlock_irq(struct hwspinlock *hwlock);
|
||||||
- unlock a previously-locked hwspinlock and enable local interrupts.
|
|
||||||
The caller should _never_ unlock an hwspinlock which is already unlocked.
|
Unlock a previously-locked hwspinlock and enable local interrupts.
|
||||||
|
The caller should **never** unlock an hwspinlock which is already unlocked.
|
||||||
|
|
||||||
Doing so is considered a bug (there is no protection against this).
|
Doing so is considered a bug (there is no protection against this).
|
||||||
Upon a successful return from this function, preemption and local
|
Upon a successful return from this function, preemption and local
|
||||||
interrupts are enabled. This function will never sleep.
|
interrupts are enabled. This function will never sleep.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void
|
void
|
||||||
hwspin_unlock_irqrestore(struct hwspinlock *hwlock, unsigned long *flags);
|
hwspin_unlock_irqrestore(struct hwspinlock *hwlock, unsigned long *flags);
|
||||||
- unlock a previously-locked hwspinlock.
|
|
||||||
The caller should _never_ unlock an hwspinlock which is already unlocked.
|
Unlock a previously-locked hwspinlock.
|
||||||
|
|
||||||
|
The caller should **never** unlock an hwspinlock which is already unlocked.
|
||||||
Doing so is considered a bug (there is no protection against this).
|
Doing so is considered a bug (there is no protection against this).
|
||||||
Upon a successful return from this function, preemption is reenabled,
|
Upon a successful return from this function, preemption is reenabled,
|
||||||
and the state of the local interrupts is restored to the state saved at
|
and the state of the local interrupts is restored to the state saved at
|
||||||
the given flags. This function will never sleep.
|
the given flags. This function will never sleep.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_lock_get_id(struct hwspinlock *hwlock);
|
int hwspin_lock_get_id(struct hwspinlock *hwlock);
|
||||||
- retrieve id number of a given hwspinlock. This is needed when an
|
|
||||||
|
Retrieve id number of a given hwspinlock. This is needed when an
|
||||||
hwspinlock is dynamically assigned: before it can be used to achieve
|
hwspinlock is dynamically assigned: before it can be used to achieve
|
||||||
mutual exclusion with a remote cpu, the id number should be communicated
|
mutual exclusion with a remote cpu, the id number should be communicated
|
||||||
to the remote task with which we want to synchronize.
|
to the remote task with which we want to synchronize.
|
||||||
|
|
||||||
Returns the hwspinlock id number, or -EINVAL if hwlock is null.
|
Returns the hwspinlock id number, or -EINVAL if hwlock is null.
|
||||||
|
|
||||||
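The request/free/get_id return conventions described above can be modelled in user space. The sketch below is not the kernel implementation: every `mock_` name is hypothetical, the bank size and the base id of 0 are assumptions, and no real locking takes place. It only mirrors the documented results (NULL when no lock is available or the id is already in use, -EINVAL for a NULL or already-free lock).

```c
#include <errno.h>
#include <stddef.h>

#define MOCK_NUM_LOCKS 4

struct mock_hwlock {
	int id;		/* global lock id */
	int in_use;	/* non-zero once assigned to a user */
};

/* A bank of locks whose global ids start at 0 (assumed base id). */
static struct mock_hwlock mock_bank[MOCK_NUM_LOCKS] = {
	{ 0, 0 }, { 1, 0 }, { 2, 0 }, { 3, 0 },
};

/* Dynamic assignment: first unused lock, or NULL if none is left. */
static struct mock_hwlock *mock_lock_request(void)
{
	for (int i = 0; i < MOCK_NUM_LOCKS; i++) {
		if (!mock_bank[i].in_use) {
			mock_bank[i].in_use = 1;
			return &mock_bank[i];
		}
	}
	return NULL;
}

/* Specific assignment: NULL if the id is unknown or already in use. */
static struct mock_hwlock *mock_lock_request_specific(unsigned int id)
{
	if (id >= MOCK_NUM_LOCKS || mock_bank[id].in_use)
		return NULL;
	mock_bank[id].in_use = 1;
	return &mock_bank[id];
}

/* Id lookup: -EINVAL for a NULL lock. */
static int mock_lock_get_id(struct mock_hwlock *hwlock)
{
	return hwlock ? hwlock->id : -EINVAL;
}

/* Release: -EINVAL if the lock is NULL or already free. */
static int mock_lock_free(struct mock_hwlock *hwlock)
{
	if (!hwlock || !hwlock->in_use)
		return -EINVAL;
	hwlock->in_use = 0;
	return 0;
}
```

In this model, a dynamically requested lock's id would be read with `mock_lock_get_id()` and handed to the remote side before the lock is used, which is the flow the text describes for `hwspin_lock_get_id()`.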
3. Typical usage
|
Typical usage
|
||||||
|
=============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/hwspinlock.h>
|
#include <linux/hwspinlock.h>
|
||||||
#include <linux/err.h>
|
#include <linux/err.h>
|
||||||
@@ -235,30 +308,43 @@ int hwspinlock_example2(void)
|
|||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
4. API for implementors
|
API for implementors
|
||||||
|
====================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_lock_register(struct hwspinlock_device *bank, struct device *dev,
|
int hwspin_lock_register(struct hwspinlock_device *bank, struct device *dev,
|
||||||
const struct hwspinlock_ops *ops, int base_id, int num_locks);
|
const struct hwspinlock_ops *ops, int base_id, int num_locks);
|
||||||
- to be called from the underlying platform-specific implementation, in
|
|
||||||
|
To be called from the underlying platform-specific implementation, in
|
||||||
order to register a new hwspinlock device (which is usually a bank of
|
order to register a new hwspinlock device (which is usually a bank of
|
||||||
numerous locks). Should be called from a process context (this function
|
numerous locks). Should be called from a process context (this function
|
||||||
might sleep).
|
might sleep).
|
||||||
|
|
||||||
Returns 0 on success, or appropriate error code on failure.
|
Returns 0 on success, or appropriate error code on failure.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int hwspin_lock_unregister(struct hwspinlock_device *bank);
|
int hwspin_lock_unregister(struct hwspinlock_device *bank);
|
||||||
- to be called from the underlying vendor-specific implementation, in order
|
|
||||||
|
To be called from the underlying vendor-specific implementation, in order
|
||||||
to unregister an hwspinlock device (which is usually a bank of numerous
|
to unregister an hwspinlock device (which is usually a bank of numerous
|
||||||
locks).
|
locks).
|
||||||
|
|
||||||
Should be called from a process context (this function might sleep).
|
Should be called from a process context (this function might sleep).
|
||||||
|
|
||||||
Returns 0 on success, or an appropriate error code on failure (e.g.
|
Returns 0 on success, or an appropriate error code on failure (e.g.
|
||||||
if the hwspinlock is still in use).
|
if the hwspinlock is still in use).
|
||||||
|
|
||||||
5. Important structs
|
Important structs
|
||||||
|
=================
|
||||||
|
|
||||||
struct hwspinlock_device is a device which usually contains a bank
|
struct hwspinlock_device is a device which usually contains a bank
|
||||||
of hardware locks. It is registered by the underlying hwspinlock
|
of hardware locks. It is registered by the underlying hwspinlock
|
||||||
implementation using the hwspin_lock_register() API.
|
implementation using the hwspin_lock_register() API.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* struct hwspinlock_device - a device which usually spans numerous hwspinlocks
|
* struct hwspinlock_device - a device which usually spans numerous hwspinlocks
|
||||||
* @dev: underlying device, will be used to invoke runtime PM api
|
* @dev: underlying device, will be used to invoke runtime PM api
|
||||||
@@ -276,7 +362,7 @@ struct hwspinlock_device {
|
|||||||
};
|
};
|
||||||
|
|
||||||
struct hwspinlock_device contains an array of hwspinlock structs, each
|
struct hwspinlock_device contains an array of hwspinlock structs, each
|
||||||
of which represents a single hardware lock:
|
of which represents a single hardware lock::
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* struct hwspinlock - this struct represents a single hwspinlock instance
|
* struct hwspinlock - this struct represents a single hwspinlock instance
|
||||||
@@ -294,9 +380,10 @@ When registering a bank of locks, the hwspinlock driver only needs to
|
|||||||
set the priv members of the locks. The rest of the members are set and
|
set the priv members of the locks. The rest of the members are set and
|
||||||
initialized by the hwspinlock core itself.
|
initialized by the hwspinlock core itself.
|
||||||
|
|
||||||
6. Implementation callbacks
|
Implementation callbacks
|
||||||
|
========================
|
||||||
|
|
||||||
There are three possible callbacks defined in 'struct hwspinlock_ops':
|
There are three possible callbacks defined in 'struct hwspinlock_ops'::
|
||||||
|
|
||||||
struct hwspinlock_ops {
|
struct hwspinlock_ops {
|
||||||
int (*trylock)(struct hwspinlock *lock);
|
int (*trylock)(struct hwspinlock *lock);
|
||||||
@@ -307,11 +394,11 @@ struct hwspinlock_ops {
|
|||||||
The first two callbacks are mandatory:
|
The first two callbacks are mandatory:
|
||||||
|
|
||||||
The ->trylock() callback should make a single attempt to take the lock, and
|
The ->trylock() callback should make a single attempt to take the lock, and
|
||||||
return 0 on failure and 1 on success. This callback may _not_ sleep.
|
return 0 on failure and 1 on success. This callback may **not** sleep.
|
||||||
|
|
||||||
The ->unlock() callback releases the lock. It always succeeds, and it, too,
|
The ->unlock() callback releases the lock. It always succeeds, and it, too,
|
||||||
may _not_ sleep.
|
may **not** sleep.
|
||||||
|
|
||||||
The ->relax() callback is optional. It is called by hwspinlock core while
|
The ->relax() callback is optional. It is called by hwspinlock core while
|
||||||
spinning on a lock, and can be used by the underlying implementation to force
|
spinning on a lock, and can be used by the underlying implementation to force
|
||||||
a delay between two successive invocations of ->trylock(). It may _not_ sleep.
|
a delay between two successive invocations of ->trylock(). It may **not** sleep.
|
||||||
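The callback contract above (->trylock() returns 1 on success and 0 on failure, ->relax() is optional and runs between attempts) can be illustrated with a user-space mock. All `mock_` names are hypothetical. As a simplification, the retry budget is a number of attempts rather than a timeout in msecs, so this is a sketch of the contract and not of the hwspinlock core itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct mock_lock {
	int taken;		/* stands in for the hardware lock bit */
	int relax_calls;	/* counts ->relax() invocations */
};

struct mock_ops {
	int (*trylock)(struct mock_lock *lock);	/* 1 on success, 0 on failure */
	void (*unlock)(struct mock_lock *lock);
	void (*relax)(struct mock_lock *lock);	/* optional, may be NULL */
};

static int mock_trylock(struct mock_lock *lock)
{
	if (lock->taken)
		return 0;
	lock->taken = 1;
	return 1;
}

static void mock_unlock(struct mock_lock *lock)
{
	lock->taken = 0;
}

static void mock_relax(struct mock_lock *lock)
{
	lock->relax_calls++;
}

static const struct mock_ops mock_hw_ops = {
	.trylock = mock_trylock,
	.unlock = mock_unlock,
	.relax = mock_relax,
};

/*
 * Core-side spin loop: retry ->trylock() up to max_tries times,
 * calling ->relax() between attempts when it is provided.
 * Returns 0 on success or -ETIMEDOUT when the budget runs out.
 */
static int mock_core_lock(const struct mock_ops *ops, struct mock_lock *lock,
			  unsigned int max_tries)
{
	/* The first two callbacks are mandatory. */
	assert(ops->trylock && ops->unlock);

	for (unsigned int i = 0; i < max_tries; i++) {
		if (ops->trylock(lock))
			return 0;
		if (ops->relax)
			ops->relax(lock);
	}
	return -ETIMEDOUT;
}
```

Note the inversion between layers: the callback reports success as 1, while the caller-facing function reports success as 0 and failure as a negative error code.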
|
|||||||
@@ -6,7 +6,6 @@ Contents:
|
|||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
:maxdepth: 2
|
:maxdepth: 2
|
||||||
:numbered:
|
|
||||||
|
|
||||||
input_uapi
|
input_uapi
|
||||||
input_kapi
|
input_kapi
|
||||||
|
|||||||
@@ -1,4 +1,5 @@
|
|||||||
Intel(R) TXT Overview:
|
=====================
|
||||||
|
Intel(R) TXT Overview
|
||||||
=====================
|
=====================
|
||||||
|
|
||||||
Intel's technology for safer computing, Intel(R) Trusted Execution
|
Intel's technology for safer computing, Intel(R) Trusted Execution
|
||||||
@@ -8,9 +9,10 @@ provide the building blocks for creating trusted platforms.
|
|||||||
Intel TXT was formerly known by the code name LaGrande Technology (LT).
|
Intel TXT was formerly known by the code name LaGrande Technology (LT).
|
||||||
|
|
||||||
Intel TXT in Brief:
|
Intel TXT in Brief:
|
||||||
o Provides dynamic root of trust for measurement (DRTM)
|
|
||||||
o Data protection in case of improper shutdown
|
- Provides dynamic root of trust for measurement (DRTM)
|
||||||
o Measurement and verification of launched environment
|
- Data protection in case of improper shutdown
|
||||||
|
- Measurement and verification of launched environment
|
||||||
|
|
||||||
Intel TXT is part of the vPro(TM) brand and is also available on some
|
Intel TXT is part of the vPro(TM) brand and is also available on some
|
||||||
non-vPro systems. It is currently available on desktop systems
|
non-vPro systems. It is currently available on desktop systems
|
||||||
@@ -24,16 +26,21 @@ which has been updated for the new released platforms.
|
|||||||
|
|
||||||
Intel TXT has been presented at various events over the past few
|
Intel TXT has been presented at various events over the past few
|
||||||
years, some of which are:
|
years, some of which are:
|
||||||
LinuxTAG 2008:
|
|
||||||
|
- LinuxTAG 2008:
|
||||||
http://www.linuxtag.org/2008/en/conf/events/vp-donnerstag.html
|
http://www.linuxtag.org/2008/en/conf/events/vp-donnerstag.html
|
||||||
TRUST2008:
|
|
||||||
|
- TRUST2008:
|
||||||
http://www.trust-conference.eu/downloads/Keynote-Speakers/
|
http://www.trust-conference.eu/downloads/Keynote-Speakers/
|
||||||
3_David-Grawrock_The-Front-Door-of-Trusted-Computing.pdf
|
3_David-Grawrock_The-Front-Door-of-Trusted-Computing.pdf
|
||||||
IDF, Shanghai:
|
|
||||||
http://www.prcidf.com.cn/index_en.html
|
|
||||||
IDFs 2006, 2007 (I'm not sure if/where they are online)
|
|
||||||
|
|
||||||
Trusted Boot Project Overview:
|
- IDF, Shanghai:
|
||||||
|
http://www.prcidf.com.cn/index_en.html
|
||||||
|
|
||||||
|
- IDFs 2006, 2007
|
||||||
|
(I'm not sure if/where they are online)
|
||||||
|
|
||||||
|
Trusted Boot Project Overview
|
||||||
=============================
|
=============================
|
||||||
|
|
||||||
Trusted Boot (tboot) is an open source, pre-kernel/VMM module that
|
Trusted Boot (tboot) is an open source, pre-kernel/VMM module that
|
||||||
@@ -87,11 +94,12 @@ Intel-provided firmware).
|
|||||||
How Does it Work?
|
How Does it Work?
|
||||||
=================
|
=================
|
||||||
|
|
||||||
o Tboot is an executable that is launched by the bootloader as
|
- Tboot is an executable that is launched by the bootloader as
|
||||||
the "kernel" (the binary the bootloader executes).
|
the "kernel" (the binary the bootloader executes).
|
||||||
o It performs all of the work necessary to determine if the
|
- It performs all of the work necessary to determine if the
|
||||||
platform supports Intel TXT and, if so, executes the GETSEC[SENTER]
|
platform supports Intel TXT and, if so, executes the GETSEC[SENTER]
|
||||||
processor instruction that initiates the dynamic root of trust.
|
processor instruction that initiates the dynamic root of trust.
|
||||||
|
|
||||||
- If tboot determines that the system does not support Intel TXT
|
- If tboot determines that the system does not support Intel TXT
|
||||||
or is not configured correctly (e.g. the SINIT AC Module was
|
or is not configured correctly (e.g. the SINIT AC Module was
|
||||||
incorrect), it will directly launch the kernel with no changes
|
incorrect), it will directly launch the kernel with no changes
|
||||||
@@ -99,12 +107,14 @@ o It performs all of the work necessary to determine if the
|
|||||||
- Tboot will output various information about its progress to the
|
- Tboot will output various information about its progress to the
|
||||||
terminal, serial port, and/or an in-memory log; the output
|
terminal, serial port, and/or an in-memory log; the output
|
||||||
locations can be configured with a command line switch.
|
locations can be configured with a command line switch.
|
||||||
o The GETSEC[SENTER] instruction will return control to tboot and
|
|
||||||
|
- The GETSEC[SENTER] instruction will return control to tboot and
|
||||||
tboot then verifies certain aspects of the environment (e.g. TPM NV
|
tboot then verifies certain aspects of the environment (e.g. TPM NV
|
||||||
lock, e820 table does not have invalid entries, etc.).
|
lock, e820 table does not have invalid entries, etc.).
|
||||||
o It will wake the APs from the special sleep state the GETSEC[SENTER]
|
- It will wake the APs from the special sleep state the GETSEC[SENTER]
|
||||||
instruction had put them in and place them into a wait-for-SIPI
|
instruction had put them in and place them into a wait-for-SIPI
|
||||||
state.
|
state.
|
||||||
|
|
||||||
- Because the processors will not respond to an INIT or SIPI when
|
- Because the processors will not respond to an INIT or SIPI when
|
||||||
in the TXT environment, it is necessary to create a small VT-x
|
in the TXT environment, it is necessary to create a small VT-x
|
||||||
guest for the APs. When they run in this guest, they will
|
guest for the APs. When they run in this guest, they will
|
||||||
@@ -112,8 +122,10 @@ o It will wake the APs from the special sleep state the GETSEC[SENTER]
|
|||||||
VMEXITs, and then disable VT and jump to the SIPI vector. This
|
VMEXITs, and then disable VT and jump to the SIPI vector. This
|
||||||
approach seemed like a better choice than having to insert
|
approach seemed like a better choice than having to insert
|
||||||
special code into the kernel's MP wakeup sequence.
|
special code into the kernel's MP wakeup sequence.
|
||||||
o Tboot then applies an (optional) user-defined launch policy to
|
|
||||||
|
- Tboot then applies an (optional) user-defined launch policy to
|
||||||
verify the kernel and initrd.
|
verify the kernel and initrd.
|
||||||
|
|
||||||
- This policy is rooted in TPM NV and is described in the tboot
|
- This policy is rooted in TPM NV and is described in the tboot
|
||||||
project. The tboot project also contains code for tools to
|
project. The tboot project also contains code for tools to
|
||||||
create and provision the policy.
|
create and provision the policy.
|
||||||
@@ -121,30 +133,34 @@ o Tboot then applies an (optional) user-defined launch policy to
|
|||||||
then any kernel will be launched.
|
then any kernel will be launched.
|
||||||
- Policy action is flexible and can include halting on failures
|
- Policy action is flexible and can include halting on failures
|
||||||
or simply logging them and continuing.
|
or simply logging them and continuing.
|
||||||
o Tboot adjusts the e820 table provided by the bootloader to reserve
|
|
||||||
|
- Tboot adjusts the e820 table provided by the bootloader to reserve
|
||||||
its own location in memory as well as to reserve certain other
|
its own location in memory as well as to reserve certain other
|
||||||
TXT-related regions.
|
TXT-related regions.
|
||||||
o As part of its launch, tboot DMA protects all of RAM (using the
|
- As part of its launch, tboot DMA protects all of RAM (using the
|
||||||
VT-d PMRs). Thus, the kernel must be booted with 'intel_iommu=on'
|
VT-d PMRs). Thus, the kernel must be booted with 'intel_iommu=on'
|
||||||
in order to remove this blanket protection and use VT-d's
|
in order to remove this blanket protection and use VT-d's
|
||||||
page-level protection.
|
page-level protection.
|
||||||
o Tboot will populate a shared page with some data about itself and
|
- Tboot will populate a shared page with some data about itself and
|
||||||
pass this to the Linux kernel as it transfers control.
|
pass this to the Linux kernel as it transfers control.
|
||||||
|
|
||||||
- The location of the shared page is passed via the boot_params
|
- The location of the shared page is passed via the boot_params
|
||||||
struct as a physical address.
|
struct as a physical address.
|
||||||
o The kernel will look for the tboot shared page address and, if it
|
|
||||||
|
- The kernel will look for the tboot shared page address and, if it
|
||||||
exists, map it.
|
exists, map it.
|
||||||
o As one of the checks/protections provided by TXT, it makes a copy
|
- As one of the checks/protections provided by TXT, it makes a copy
|
||||||
of the VT-d DMARs in a DMA-protected region of memory and verifies
|
of the VT-d DMARs in a DMA-protected region of memory and verifies
|
||||||
them for correctness. The VT-d code will detect if the kernel was
|
them for correctness. The VT-d code will detect if the kernel was
|
||||||
launched with tboot and use this copy instead of the one in the
|
launched with tboot and use this copy instead of the one in the
|
||||||
ACPI table.
|
ACPI table.
|
||||||
o At this point, tboot and TXT are out of the picture until a
|
- At this point, tboot and TXT are out of the picture until a
|
||||||
shutdown (S<n>)
|
shutdown (S<n>)
|
||||||
o In order to put a system into any of the sleep states after a TXT
|
- In order to put a system into any of the sleep states after a TXT
|
||||||
launch, TXT must first be exited. This is to prevent attacks that
|
launch, TXT must first be exited. This is to prevent attacks that
|
||||||
attempt to crash the system to gain control on reboot and steal
|
attempt to crash the system to gain control on reboot and steal
|
||||||
data left in memory.
|
data left in memory.
|
||||||
|
|
||||||
- The kernel will perform all of its sleep preparation and
|
- The kernel will perform all of its sleep preparation and
|
||||||
populate the shared page with the ACPI data needed to put the
|
populate the shared page with the ACPI data needed to put the
|
||||||
platform in the desired sleep state.
|
platform in the desired sleep state.
|
||||||
@@ -172,7 +188,7 @@ o In order to put a system into any of the sleep states after a TXT
|
|||||||
That's pretty much it for TXT support.
|
That's pretty much it for TXT support.
|
||||||
|
|
||||||
|
|
||||||
Configuring the System:
|
Configuring the System
|
||||||
======================
|
======================
|
||||||
|
|
||||||
This code works with 32bit, 32bit PAE, and 64bit (x86_64) kernels.
|
This code works with 32bit, 32bit PAE, and 64bit (x86_64) kernels.
|
||||||
@@ -181,7 +197,8 @@ In BIOS, the user must enable: TPM, TXT, VT-x, VT-d. Not all BIOSes
|
|||||||
allow these to be individually enabled/disabled and the screens in
|
allow these to be individually enabled/disabled and the screens in
|
||||||
which to find them are BIOS-specific.
|
which to find them are BIOS-specific.
|
||||||
|
|
||||||
grub.conf needs to be modified as follows:
|
grub.conf needs to be modified as follows::
|
||||||
|
|
||||||
title Linux 2.6.29-tip w/ tboot
|
title Linux 2.6.29-tip w/ tboot
|
||||||
root (hd0,0)
|
root (hd0,0)
|
||||||
kernel /tboot.gz logging=serial,vga,memory
|
kernel /tboot.gz logging=serial,vga,memory
|
||||||
|
|||||||
@@ -1,10 +1,17 @@
|
|||||||
|
========================
|
||||||
|
The io_mapping functions
|
||||||
|
========================
|
||||||
|
|
||||||
|
API
|
||||||
|
===
|
||||||
|
|
||||||
The io_mapping functions in linux/io-mapping.h provide an abstraction for
|
The io_mapping functions in linux/io-mapping.h provide an abstraction for
|
||||||
efficiently mapping small regions of an I/O device to the CPU. The initial
|
efficiently mapping small regions of an I/O device to the CPU. The initial
|
||||||
usage is to support the large graphics aperture on 32-bit processors where
|
usage is to support the large graphics aperture on 32-bit processors where
|
||||||
ioremap_wc cannot be used to statically map the entire aperture to the CPU
|
ioremap_wc cannot be used to statically map the entire aperture to the CPU
|
||||||
as it would consume too much of the kernel address space.
|
as it would consume too much of the kernel address space.
|
||||||
|
|
||||||
A mapping object is created during driver initialization using
|
A mapping object is created during driver initialization using::
|
||||||
|
|
||||||
struct io_mapping *io_mapping_create_wc(unsigned long base,
|
struct io_mapping *io_mapping_create_wc(unsigned long base,
|
||||||
unsigned long size)
|
unsigned long size)
|
||||||
@@ -18,7 +25,7 @@ A mapping object is created during driver initialization using
|
|||||||
|
|
||||||
With this mapping object, individual pages can be mapped either atomically
|
With this mapping object, individual pages can be mapped either atomically
|
||||||
or not, depending on the necessary scheduling environment. Of course, atomic
|
or not, depending on the necessary scheduling environment. Of course, atomic
|
||||||
maps are more efficient:
|
maps are more efficient::
|
||||||
|
|
||||||
void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
|
void *io_mapping_map_atomic_wc(struct io_mapping *mapping,
|
||||||
unsigned long offset)
|
unsigned long offset)
|
||||||
@@ -36,6 +43,8 @@ maps are more efficient:
|
|||||||
Note that the task may not sleep while holding this page
|
Note that the task may not sleep while holding this page
|
||||||
mapped.
|
mapped.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void io_mapping_unmap_atomic(void *vaddr)
|
void io_mapping_unmap_atomic(void *vaddr)
|
||||||
|
|
||||||
'vaddr' must be the value returned by the last
|
'vaddr' must be the value returned by the last
|
||||||
@@ -45,22 +54,28 @@ maps are more efficient:
|
|||||||
If you need to sleep while holding the lock, you can use the non-atomic
|
If you need to sleep while holding the lock, you can use the non-atomic
|
||||||
variant, although it may be significantly slower.
|
variant, although it may be significantly slower.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void *io_mapping_map_wc(struct io_mapping *mapping,
|
void *io_mapping_map_wc(struct io_mapping *mapping,
|
||||||
unsigned long offset)
|
unsigned long offset)
|
||||||
|
|
||||||
This works like io_mapping_map_atomic_wc except it allows
|
This works like io_mapping_map_atomic_wc except it allows
|
||||||
the task to sleep while holding the page mapped.
|
the task to sleep while holding the page mapped.
|
||||||
|
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void io_mapping_unmap(void *vaddr)
|
void io_mapping_unmap(void *vaddr)
|
||||||
|
|
||||||
This works like io_mapping_unmap_atomic, except it is used
|
This works like io_mapping_unmap_atomic, except it is used
|
||||||
for pages mapped with io_mapping_map_wc.
|
for pages mapped with io_mapping_map_wc.
|
||||||
|
|
||||||
At driver close time, the io_mapping object must be freed:
|
At driver close time, the io_mapping object must be freed::
|
||||||
|
|
||||||
void io_mapping_free(struct io_mapping *mapping)
|
void io_mapping_free(struct io_mapping *mapping)
|
||||||
|
|
||||||
Current Implementation:
|
Current Implementation
|
||||||
|
======================
|
||||||
|
|
||||||
The initial implementation of these functions uses existing mapping
|
The initial implementation of these functions uses existing mapping
|
||||||
mechanisms and so provides only an abstraction layer and no new
|
mechanisms and so provides only an abstraction layer and no new
|
||||||
|
|||||||
@@ -1,3 +1,7 @@
|
|||||||
|
==============================================
|
||||||
|
Ordering I/O writes to memory-mapped addresses
|
||||||
|
==============================================
|
||||||
|
|
||||||
On some platforms, so-called memory-mapped I/O is weakly ordered. On such
|
On some platforms, so-called memory-mapped I/O is weakly ordered. On such
|
||||||
platforms, driver writers are responsible for ensuring that I/O writes to
|
platforms, driver writers are responsible for ensuring that I/O writes to
|
||||||
memory-mapped addresses on their device arrive in the order intended. This is
|
memory-mapped addresses on their device arrive in the order intended. This is
|
||||||
@@ -8,7 +12,7 @@ critical section of code protected by spinlocks. This would ensure that
|
|||||||
subsequent writes to I/O space arrived only after all prior writes (much like a
|
subsequent writes to I/O space arrived only after all prior writes (much like a
|
||||||
memory barrier op, mb(), only with respect to I/O).
|
memory barrier op, mb(), only with respect to I/O).
|
||||||
|
|
||||||
A more concrete example from a hypothetical device driver:
|
A more concrete example from a hypothetical device driver::
|
||||||
|
|
||||||
...
|
...
|
||||||
CPU A: spin_lock_irqsave(&dev_lock, flags)
|
CPU A: spin_lock_irqsave(&dev_lock, flags)
|
||||||
@@ -25,7 +29,7 @@ CPU B: spin_unlock_irqrestore(&dev_lock, flags)
|
|||||||
...
|
...
|
||||||
|
|
||||||
In the case above, the device may receive newval2 before it receives newval,
|
In the case above, the device may receive newval2 before it receives newval,
|
||||||
which could cause problems. Fixing it is easy enough though:
|
which could cause problems. Fixing it is easy enough though::
|
||||||
|
|
||||||
...
|
...
|
||||||
CPU A: spin_lock_irqsave(&dev_lock, flags)
|
CPU A: spin_lock_irqsave(&dev_lock, flags)
|
||||||
|
|||||||
@@ -1,49 +1,50 @@
|
|||||||
|
=====================
|
||||||
I/O statistics fields
|
I/O statistics fields
|
||||||
---------------
|
=====================
|
||||||
|
|
||||||
Since 2.4.20 (and some versions before, with patches), and 2.5.45,
|
Since 2.4.20 (and some versions before, with patches), and 2.5.45,
|
||||||
more extensive disk statistics have been introduced to help measure disk
|
more extensive disk statistics have been introduced to help measure disk
|
||||||
activity. Tools such as sar and iostat typically interpret these and do
|
activity. Tools such as ``sar`` and ``iostat`` typically interpret these and do
|
||||||
the work for you, but in case you are interested in creating your own
|
the work for you, but in case you are interested in creating your own
|
||||||
tools, the fields are explained here.
|
tools, the fields are explained here.
|
||||||
|
|
||||||
In 2.4 now, the information is found as additional fields in
|
In 2.4 now, the information is found as additional fields in
|
||||||
/proc/partitions. In 2.6, the same information is found in two
|
``/proc/partitions``. In 2.6 and later, the same information is found in two
|
||||||
places: one is in the file /proc/diskstats, and the other is within
|
places: one is in the file ``/proc/diskstats``, and the other is within
|
||||||
the sysfs file system, which must be mounted in order to obtain
|
the sysfs file system, which must be mounted in order to obtain
|
||||||
the information. Throughout this document we'll assume that sysfs
|
the information. Throughout this document we'll assume that sysfs
|
||||||
is mounted on /sys, although of course it may be mounted anywhere.
|
is mounted on ``/sys``, although of course it may be mounted anywhere.
|
||||||
Both /proc/diskstats and sysfs use the same source for the information
|
Both ``/proc/diskstats`` and sysfs use the same source for the information
|
||||||
and so should not differ.
|
and so should not differ.
|
||||||
|
|
||||||
Here are examples of these different formats:
|
Here are examples of these different formats::
|
||||||
|
|
||||||
2.4:
|
2.4:
|
||||||
3 0 39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
3 0 39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
||||||
3 1 9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030
|
3 1 9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030
|
||||||
|
|
||||||
|
2.6+ sysfs:
|
||||||
2.6 sysfs:
|
|
||||||
446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
||||||
35486 38030 38030 38030
|
35486 38030 38030 38030
|
||||||
|
|
||||||
2.6 diskstats:
|
2.6+ diskstats:
|
||||||
3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
|
||||||
3 1 hda1 35486 38030 38030 38030
|
3 1 hda1 35486 38030 38030 38030
|
||||||
|
|
||||||
On 2.4 you might execute "grep 'hda ' /proc/partitions". On 2.6, you have
|
On 2.4 you might execute ``grep 'hda ' /proc/partitions``. On 2.6+, you have
|
||||||
a choice of "cat /sys/block/hda/stat" or "grep 'hda ' /proc/diskstats".
|
a choice of ``cat /sys/block/hda/stat`` or ``grep 'hda ' /proc/diskstats``.
|
||||||
|
|
||||||
The advantage of one over the other is that the sysfs choice works well
|
The advantage of one over the other is that the sysfs choice works well
|
||||||
if you are watching a known, small set of disks. /proc/diskstats may
|
if you are watching a known, small set of disks. ``/proc/diskstats`` may
|
||||||
be a better choice if you are watching a large number of disks because
|
be a better choice if you are watching a large number of disks because
|
||||||
you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
|
you'll avoid the overhead of 50, 100, or 500 or more opens/closes with
|
||||||
each snapshot of your disk statistics.
|
each snapshot of your disk statistics.
|
||||||
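For readers writing their own tool, a line in the diskstats format shown in the examples above could be split into its fields roughly as follows. This is a minimal userspace sketch, not part of the original text: `struct disk_stats` and `parse_diskstats_line()` are hypothetical names, and the parser assumes only the two layouts shown in this document (eleven fields for a disk, four for a partition).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container; f[0] holds field 1, f[10] holds field 11. */
struct disk_stats {
	unsigned int major, minor;
	char name[32];
	unsigned long long f[11];
};

/*
 * Parse one diskstats-style line: major, minor, device name, then the
 * statistics fields. Returns how many statistics fields were found
 * (11 for the disk example, 4 for the partition example), or -1 if the
 * major/minor/name prefix is missing. Fields not present stay zero.
 */
int parse_diskstats_line(const char *line, struct disk_stats *st)
{
	int n;

	memset(st, 0, sizeof(*st));
	n = sscanf(line,
		   "%u %u %31s %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		   &st->major, &st->minor, st->name,
		   &st->f[0], &st->f[1], &st->f[2], &st->f[3],
		   &st->f[4], &st->f[5], &st->f[6], &st->f[7],
		   &st->f[8], &st->f[9], &st->f[10]);
	if (n < 3)
		return -1;
	return n - 3;
}
```

Returning the field count lets the caller tell a disk line from a partition line without guessing from the device name.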
|
|
||||||
In 2.4, the statistics fields are those after the device name. In
|
In 2.4, the statistics fields are those after the device name. In
|
||||||
the above example, the first field of statistics would be 446216.
|
the above example, the first field of statistics would be 446216.
|
||||||
By contrast, in 2.6 if you look at /sys/block/hda/stat, you'll
|
By contrast, in 2.6+ if you look at ``/sys/block/hda/stat``, you'll
|
||||||
find just the eleven fields, beginning with 446216. If you look at
|
find just the eleven fields, beginning with 446216. If you look at
|
||||||
/proc/diskstats, the eleven fields will be preceded by the major and
|
``/proc/diskstats``, the eleven fields will be preceded by the major and
|
||||||
minor device numbers, and device name. Each of these formats provides
|
minor device numbers, and device name. Each of these formats provides
|
||||||
eleven fields of statistics, each meaning exactly the same things.
|
eleven fields of statistics, each meaning exactly the same things.
|
||||||
All fields except field 9 are cumulative since boot. Field 9 should
|
All fields except field 9 are cumulative since boot. Field 9 should
|
||||||
@@ -59,30 +60,40 @@ system-wide stats you'll have to find all the devices and sum them all up.
|
|||||||
|
|
||||||
Field 1 -- # of reads completed
|
Field 1 -- # of reads completed
|
||||||
This is the total number of reads completed successfully.
|
This is the total number of reads completed successfully.
|
||||||
|
|
||||||
Field 2 -- # of reads merged, field 6 -- # of writes merged
|
Field 2 -- # of reads merged, field 6 -- # of writes merged
|
||||||
Reads and writes which are adjacent to each other may be merged for
|
Reads and writes which are adjacent to each other may be merged for
|
||||||
efficiency. Thus two 4K reads may become one 8K read before it is
|
efficiency. Thus two 4K reads may become one 8K read before it is
|
||||||
ultimately handed to the disk, and so it will be counted (and queued)
|
ultimately handed to the disk, and so it will be counted (and queued)
|
||||||
as only one I/O. This field lets you know how often this was done.
|
as only one I/O. This field lets you know how often this was done.
|
||||||
|
|
||||||
Field 3 -- # of sectors read
|
Field 3 -- # of sectors read
|
||||||
This is the total number of sectors read successfully.
|
This is the total number of sectors read successfully.
|
||||||
|
|
||||||
Field 4 -- # of milliseconds spent reading
|
Field 4 -- # of milliseconds spent reading
|
||||||
This is the total number of milliseconds spent by all reads (as
|
This is the total number of milliseconds spent by all reads (as
|
||||||
measured from __make_request() to end_that_request_last()).
|
measured from __make_request() to end_that_request_last()).
|
||||||
|
|
||||||
Field 5 -- # of writes completed
|
Field 5 -- # of writes completed
|
||||||
This is the total number of writes completed successfully.
|
This is the total number of writes completed successfully.
|
||||||
|
|
||||||
Field 6 -- # of writes merged
|
Field 6 -- # of writes merged
|
||||||
See the description of field 2.
|
See the description of field 2.
|
||||||
|
|
||||||
Field 7 -- # of sectors written
|
Field 7 -- # of sectors written
|
||||||
This is the total number of sectors written successfully.
|
This is the total number of sectors written successfully.
|
||||||
|
|
||||||
Field 8 -- # of milliseconds spent writing
|
Field 8 -- # of milliseconds spent writing
|
||||||
This is the total number of milliseconds spent by all writes (as
|
This is the total number of milliseconds spent by all writes (as
|
||||||
measured from __make_request() to end_that_request_last()).
|
measured from __make_request() to end_that_request_last()).
|
||||||
|
|
||||||
Field 9 -- # of I/Os currently in progress
|
Field 9 -- # of I/Os currently in progress
|
||||||
The only field that should go to zero. Incremented as requests are
|
The only field that should go to zero. Incremented as requests are
|
||||||
given to the appropriate struct request_queue and decremented as they finish.
|
given to the appropriate struct request_queue and decremented as they finish.
|
||||||
|
|
||||||
Field 10 -- # of milliseconds spent doing I/Os
|
Field 10 -- # of milliseconds spent doing I/Os
|
||||||
This field increases so long as field 9 is nonzero.
|
This field increases so long as field 9 is nonzero.
|
||||||
|
|
||||||
Field 11 -- weighted # of milliseconds spent doing I/Os
|
Field 11 -- weighted # of milliseconds spent doing I/Os
|
||||||
This field is incremented at each I/O start, I/O completion, I/O
|
This field is incremented at each I/O start, I/O completion, I/O
|
||||||
merge, or read of these stats by the number of I/Os in progress
|
merge, or read of these stats by the number of I/Os in progress
|
||||||
@@ -97,7 +108,7 @@ introduced when changes collide, so (for instance) adding up all the
|
|||||||
read I/Os issued per partition should equal those made to the disks ...
|
read I/Os issued per partition should equal those made to the disks ...
|
||||||
but due to the lack of locking it may only be very close.
|
but due to the lack of locking it may only be very close.
|
||||||
|
|
||||||
In 2.6, there are counters for each CPU, which make the lack of locking
|
In 2.6+, there are counters for each CPU, which make the lack of locking
|
||||||
almost a non-issue. When the statistics are read, the per-CPU counters
|
almost a non-issue. When the statistics are read, the per-CPU counters
|
||||||
are summed (possibly overflowing the unsigned long variable they are
|
are summed (possibly overflowing the unsigned long variable they are
|
||||||
summed to) and the result given to the user. There is no convenient
|
summed to) and the result given to the user. There is no convenient
|
||||||
@@ -106,22 +117,25 @@ user interface for accessing the per-CPU counters themselves.
|
|||||||
Disks vs Partitions
|
Disks vs Partitions
|
||||||
-------------------
|
-------------------
|
||||||
|
|
||||||
There were significant changes between 2.4 and 2.6 in the I/O subsystem.
|
There were significant changes between 2.4 and 2.6+ in the I/O subsystem.
|
||||||
As a result, some statistic information disappeared. The translation from
|
As a result, some statistic information disappeared. The translation from
|
||||||
a disk address relative to a partition to the disk address relative to
|
a disk address relative to a partition to the disk address relative to
|
||||||
the host disk happens much earlier. All merges and timings now happen
|
the host disk happens much earlier. All merges and timings now happen
|
||||||
at the disk level rather than at both the disk and partition level as
|
at the disk level rather than at both the disk and partition level as
|
||||||
in 2.4. Consequently, you'll see a different statistics output on 2.6 for
|
in 2.4. Consequently, you'll see a different statistics output on 2.6+ for
|
||||||
partitions from that for disks. There are only *four* fields available
|
partitions from that for disks. There are only *four* fields available
|
||||||
for partitions on 2.6 machines. This is reflected in the examples above.
|
for partitions on 2.6+ machines. This is reflected in the examples above.
|
||||||
|
|
||||||
Field 1 -- # of reads issued
|
Field 1 -- # of reads issued
|
||||||
This is the total number of reads issued to this partition.
|
This is the total number of reads issued to this partition.
|
||||||
|
|
||||||
Field 2 -- # of sectors read
|
Field 2 -- # of sectors read
|
||||||
This is the total number of sectors requested to be read from this
|
This is the total number of sectors requested to be read from this
|
||||||
partition.
|
partition.
|
||||||
|
|
||||||
Field 3 -- # of writes issued
|
Field 3 -- # of writes issued
|
||||||
This is the total number of writes issued to this partition.
|
This is the total number of writes issued to this partition.
|
||||||
|
|
||||||
Field 4 -- # of sectors written
|
Field 4 -- # of sectors written
|
||||||
This is the total number of sectors requested to be written to
|
This is the total number of sectors requested to be written to
|
||||||
this partition.
|
this partition.
|
||||||
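Since the four partition fields are running totals, a monitoring tool normally works with the difference between two snapshots. The sketch below shows one way to do that; `struct part_stats`, `counter_delta()` and `avg_sectors_per_read()` are hypothetical names, and it assumes the tool's `unsigned long` has the same width as the counter the kernel reports.

```c
/* Hypothetical snapshot of the four per-partition fields. */
struct part_stats {
	unsigned long reads;            /* field 1 */
	unsigned long sectors_read;     /* field 2 */
	unsigned long writes;           /* field 3 */
	unsigned long sectors_written;  /* field 4 */
};

/*
 * Unsigned subtraction is modulo 2^N, so the result is still correct
 * if the counter wrapped around once between the two snapshots.
 */
unsigned long counter_delta(unsigned long prev, unsigned long cur)
{
	return cur - prev;
}

/* Average sectors per read over the interval; 0.0 if no reads were issued. */
double avg_sectors_per_read(const struct part_stats *prev,
			    const struct part_stats *cur)
{
	unsigned long reads = counter_delta(prev->reads, cur->reads);
	unsigned long sectors = counter_delta(prev->sectors_read,
					      cur->sectors_read);

	return reads ? (double)sectors / (double)reads : 0.0;
}
```

The same delta helper applies to the write fields; more than one wraparound between snapshots cannot be detected this way, so snapshots should be taken often enough to make that unlikely.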
@@ -149,16 +163,16 @@ to some (probably insignificant) inaccuracy.
|
|||||||
Additional notes
|
Additional notes
|
||||||
----------------
|
----------------
|
||||||
|
|
||||||
In 2.6, sysfs is not mounted by default. If your distribution of
|
In 2.6+, sysfs is not mounted by default. If your distribution of
|
||||||
Linux hasn't added it already, here's the line you'll want to add to
|
Linux hasn't added it already, here's the line you'll want to add to
|
||||||
your /etc/fstab:
|
your ``/etc/fstab``::
|
||||||
|
|
||||||
none /sys sysfs defaults 0 0
|
none /sys sysfs defaults 0 0
|
||||||
|
|
||||||
|
|
||||||
In 2.6, all disk statistics were removed from /proc/stat. In 2.4, they
|
In 2.6+, all disk statistics were removed from ``/proc/stat``. In 2.4, they
|
||||||
appear in both /proc/partitions and /proc/stat, although the ones in
|
appear in both ``/proc/partitions`` and ``/proc/stat``, although the ones in
|
||||||
/proc/stat take a very different format from those in /proc/partitions
|
``/proc/stat`` take a very different format from those in ``/proc/partitions``
|
||||||
(see proc(5), if your system has it.)
|
(see proc(5), if your system has it.)
|
||||||
|
|
||||||
-- ricklind@us.ibm.com
|
-- ricklind@us.ibm.com
|
||||||
|
|||||||
@@ -1,8 +1,10 @@
|
|||||||
|
=======================
|
||||||
IRQ-flags state tracing
|
IRQ-flags state tracing
|
||||||
|
=======================
|
||||||
|
|
||||||
started by Ingo Molnar <mingo@redhat.com>
|
:Author: started by Ingo Molnar <mingo@redhat.com>
|
||||||
|
|
||||||
the "irq-flags tracing" feature "traces" hardirq and softirq state, in
|
The "irq-flags tracing" feature "traces" hardirq and softirq state, in
|
||||||
that it gives interested subsystems an opportunity to be notified of
|
that it gives interested subsystems an opportunity to be notified of
|
||||||
every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
|
every hardirqs-off/hardirqs-on, softirqs-off/softirqs-on event that
|
||||||
happens in the kernel.
|
happens in the kernel.
|
||||||
@@ -14,7 +16,7 @@ CONFIG_PROVE_RWSEM_LOCKING will be offered on an architecture - these
|
|||||||
are locking APIs that are not used in IRQ context. (the one exception
|
are locking APIs that are not used in IRQ context. (the one exception
|
||||||
for rwsems is worked around)
|
for rwsems is worked around)
|
||||||
|
|
||||||
architecture support for this is certainly not in the "trivial"
|
Architecture support for this is certainly not in the "trivial"
|
||||||
category, because lots of low-level assembly code deals with irq-flags
|
category, because lots of low-level assembly code deals with irq-flags
|
||||||
state changes. But an architecture can be irq-flags-tracing enabled in a
|
state changes. But an architecture can be irq-flags-tracing enabled in a
|
||||||
rather straightforward and risk-free manner.
|
rather straightforward and risk-free manner.
|
||||||
@@ -41,7 +43,7 @@ irq-flags-tracing support:
|
|||||||
excluded from the irq-tracing [and lock validation] mechanism via
|
excluded from the irq-tracing [and lock validation] mechanism via
|
||||||
lockdep_off()/lockdep_on().
|
lockdep_off()/lockdep_on().
|
||||||
|
|
||||||
in general there is no risk from having an incomplete irq-flags-tracing
|
In general there is no risk from having an incomplete irq-flags-tracing
|
||||||
implementation in an architecture: lockdep will detect that and will
|
implementation in an architecture: lockdep will detect that and will
|
||||||
turn itself off. I.e. the lock validator will still be reliable. There
|
turn itself off. I.e. the lock validator will still be reliable. There
|
||||||
should be no crashes due to irq-tracing bugs. (except if the assembly
|
should be no crashes due to irq-tracing bugs. (except if the assembly
|
||||||
|
|||||||
@@ -1,5 +1,6 @@
|
|||||||
|
===========
|
||||||
ISA Drivers
|
ISA Drivers
|
||||||
-----------
|
===========
|
||||||
|
|
||||||
The following text is adapted from the commit message of the initial
|
The following text is adapted from the commit message of the initial
|
||||||
commit of the ISA bus driver authored by Rene Herman.
|
commit of the ISA bus driver authored by Rene Herman.
|
||||||
@@ -23,7 +24,7 @@ that all device creation has been made internal as well.
|
|||||||
|
|
||||||
The usage model this provides is nice, and has been acked from the ALSA
|
The usage model this provides is nice, and has been acked from the ALSA
|
||||||
side by Takashi Iwai and Jaroslav Kysela. The ALSA driver module_init's
|
side by Takashi Iwai and Jaroslav Kysela. The ALSA driver module_init's
|
||||||
now (for oldisa-only drivers) become:
|
now (for oldisa-only drivers) become::
|
||||||
|
|
||||||
static int __init alsa_card_foo_init(void)
|
static int __init alsa_card_foo_init(void)
|
||||||
{
|
{
|
||||||
@@ -47,11 +48,11 @@ parameter, indicating how many devices to create and call our methods
|
|||||||
with.
|
with.
|
||||||
|
|
||||||
The platform_driver callbacks are called with a platform_device param;
|
The platform_driver callbacks are called with a platform_device param;
|
||||||
the isa_driver callbacks are being called with a "struct device *dev,
|
the isa_driver callbacks are being called with a ``struct device *dev,
|
||||||
unsigned int id" pair directly -- with the device creation completely
|
unsigned int id`` pair directly -- with the device creation completely
|
||||||
internal to the bus it's much cleaner to not leak isa_dev's by passing
|
internal to the bus it's much cleaner to not leak isa_dev's by passing
|
||||||
them in at all. The id is the only thing we ever want other than the
|
them in at all. The id is the only thing we ever want other than the
|
||||||
struct device * anyways, and it makes for nicer code in the callbacks as
|
struct device anyways, and it makes for nicer code in the callbacks as
|
||||||
well.
|
well.
|
||||||
|
|
||||||
With this additional .match() callback ISA drivers have all options. If
|
With this additional .match() callback ISA drivers have all options. If
|
||||||
@@ -75,7 +76,7 @@ This exports only two functions; isa_{,un}register_driver().
|
|||||||
|
|
||||||
isa_register_driver() registers the struct device_driver, and then
|
isa_register_driver() registers the struct device_driver, and then
|
||||||
loops over the passed in ndev creating devices and registering them.
|
loops over the passed in ndev creating devices and registering them.
|
||||||
This causes the bus match method to be called for them, which is:
|
This causes the bus match method to be called for them, which is::
|
||||||
|
|
||||||
int isa_bus_match(struct device *dev, struct device_driver *driver)
|
int isa_bus_match(struct device *dev, struct device_driver *driver)
|
||||||
{
|
{
|
||||||
@@ -102,7 +103,7 @@ well.
|
|||||||
Then, if the driver did not provide a .match, it matches. If it did,
|
Then, if the driver did not provide a .match, it matches. If it did,
|
||||||
the driver match() method is called to determine a match.
|
the driver match() method is called to determine a match.
|
||||||
|
|
||||||
If it did _not_ match, dev->platform_data is reset to indicate this to
|
If it did **not** match, dev->platform_data is reset to indicate this to
|
||||||
isa_register_driver which can then unregister the device again.
|
isa_register_driver which can then unregister the device again.
|
||||||
|
|
||||||
If during all this, there's any error, or no devices matched at all
|
If during all this, there's any error, or no devices matched at all
|
||||||
|
|||||||
@@ -1,3 +1,4 @@
|
|||||||
|
==========================================================
|
||||||
ISA Plug & Play support by Jaroslav Kysela <perex@suse.cz>
|
ISA Plug & Play support by Jaroslav Kysela <perex@suse.cz>
|
||||||
==========================================================
|
==========================================================
|
||||||
|
|
||||||
|
|||||||
@@ -112,8 +112,8 @@ There are two possible methods of using Kdump.
|
|||||||
2) Or use the system kernel binary itself as dump-capture kernel and there is
|
2) Or use the system kernel binary itself as dump-capture kernel and there is
|
||||||
no need to build a separate dump-capture kernel. This is possible
|
no need to build a separate dump-capture kernel. This is possible
|
||||||
only with the architectures which support a relocatable kernel. As
|
only with the architectures which support a relocatable kernel. As
|
||||||
of today, i386, x86_64, ppc64, ia64 and arm architectures support relocatable
|
of today, i386, x86_64, ppc64, ia64, arm and arm64 architectures support
|
||||||
kernel.
|
relocatable kernel.
|
||||||
|
|
||||||
Building a relocatable kernel is advantageous from the point of view that
|
Building a relocatable kernel is advantageous from the point of view that
|
||||||
one does not have to build a second kernel for capturing the dump. But
|
one does not have to build a second kernel for capturing the dump. But
|
||||||
@@ -339,7 +339,7 @@ For arm:
|
|||||||
For arm64:
|
For arm64:
|
||||||
- Use vmlinux or Image
|
- Use vmlinux or Image
|
||||||
|
|
||||||
If you are using a uncompressed vmlinux image then use following command
|
If you are using an uncompressed vmlinux image then use following command
|
||||||
to load dump-capture kernel.
|
to load dump-capture kernel.
|
||||||
|
|
||||||
kexec -p <dump-capture-kernel-vmlinux-image> \
|
kexec -p <dump-capture-kernel-vmlinux-image> \
|
||||||
@@ -361,6 +361,12 @@ to load dump-capture kernel.
|
|||||||
--dtb=<dtb-for-dump-capture-kernel> \
|
--dtb=<dtb-for-dump-capture-kernel> \
|
||||||
--append="root=<root-dev> <arch-specific-options>"
|
--append="root=<root-dev> <arch-specific-options>"
|
||||||
|
|
||||||
|
If you are using an uncompressed Image, then use following command
|
||||||
|
to load dump-capture kernel.
|
||||||
|
|
||||||
|
kexec -p <dump-capture-kernel-Image> \
|
||||||
|
--initrd=<initrd-for-dump-capture-kernel> \
|
||||||
|
--append="root=<root-dev> <arch-specific-options>"
|
||||||
|
|
||||||
Please note, that --args-linux does not need to be specified for ia64.
|
Please note, that --args-linux does not need to be specified for ia64.
|
||||||
It is planned to make this a no-op on that architecture, but for now
|
It is planned to make this a no-op on that architecture, but for now
|
||||||
|
|||||||
@@ -1,27 +1,29 @@
|
|||||||
REDUCING OS JITTER DUE TO PER-CPU KTHREADS
|
==========================================
|
||||||
|
Reducing OS jitter due to per-cpu kthreads
|
||||||
|
==========================================
|
||||||
|
|
||||||
This document lists per-CPU kthreads in the Linux kernel and presents
|
This document lists per-CPU kthreads in the Linux kernel and presents
|
||||||
options to control their OS jitter. Note that non-per-CPU kthreads are
|
options to control their OS jitter. Note that non-per-CPU kthreads are
|
||||||
not listed here. To reduce OS jitter from non-per-CPU kthreads, bind
|
not listed here. To reduce OS jitter from non-per-CPU kthreads, bind
|
||||||
them to a "housekeeping" CPU dedicated to such work.
|
them to a "housekeeping" CPU dedicated to such work.
|
||||||
|
|
||||||
|
References
|
||||||
|
==========
|
||||||
|
|
||||||
REFERENCES
|
- Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
|
||||||
|
|
||||||
o Documentation/IRQ-affinity.txt: Binding interrupts to sets of CPUs.
|
- Documentation/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
|
||||||
|
|
||||||
o Documentation/cgroup-v1: Using cgroups to bind tasks to sets of CPUs.
|
- man taskset: Using the taskset command to bind tasks to sets
|
||||||
|
|
||||||
o man taskset: Using the taskset command to bind tasks to sets
|
|
||||||
of CPUs.
|
of CPUs.
|
||||||
|
|
||||||
o man sched_setaffinity: Using the sched_setaffinity() system
|
- man sched_setaffinity: Using the sched_setaffinity() system
|
||||||
call to bind tasks to sets of CPUs.
|
call to bind tasks to sets of CPUs.
|
||||||
|
|
||||||
o /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state,
|
- /sys/devices/system/cpu/cpuN/online: Control CPU N's hotplug state,
|
||||||
writing "0" to offline and "1" to online.
|
writing "0" to offline and "1" to online.
|
||||||
|
|
||||||
o In order to locate kernel-generated OS jitter on CPU N:
|
- In order to locate kernel-generated OS jitter on CPU N:
|
||||||
|
|
||||||
cd /sys/kernel/debug/tracing
|
cd /sys/kernel/debug/tracing
|
||||||
echo 1 > max_graph_depth # Increase the "1" for more detail
|
echo 1 > max_graph_depth # Increase the "1" for more detail
|
||||||
@@ -29,12 +31,17 @@ o In order to locate kernel-generated OS jitter on CPU N:
|
|||||||
# run workload
|
# run workload
|
||||||
cat per_cpu/cpuN/trace
|
cat per_cpu/cpuN/trace
|
||||||
|
|
||||||
|
kthreads
|
||||||
|
========
|
||||||
|
|
||||||
KTHREADS
|
Name:
|
||||||
|
ehca_comp/%u
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
Periodically process Infiniband-related work.
|
||||||
|
|
||||||
Name: ehca_comp/%u
|
|
||||||
Purpose: Periodically process Infiniband-related work.
|
|
||||||
To reduce its OS jitter, do any of the following:
|
To reduce its OS jitter, do any of the following:
|
||||||
|
|
||||||
1. Don't use eHCA Infiniband hardware, instead choosing hardware
|
1. Don't use eHCA Infiniband hardware, instead choosing hardware
|
||||||
that does not require per-CPU kthreads. This will prevent these
|
that does not require per-CPU kthreads. This will prevent these
|
||||||
kthreads from being created in the first place. (This will
|
kthreads from being created in the first place. (This will
|
||||||
@@ -46,26 +53,45 @@ To reduce its OS jitter, do any of the following:
|
|||||||
provisioned only on selected CPUs.
|
provisioned only on selected CPUs.
|
||||||
|
|
||||||
|
|
||||||
Name: irq/%d-%s
|
Name:
|
||||||
Purpose: Handle threaded interrupts.
|
irq/%d-%s
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
Handle threaded interrupts.
|
||||||
|
|
||||||
To reduce its OS jitter, do the following:
|
To reduce its OS jitter, do the following:
|
||||||
|
|
||||||
1. Use irq affinity to force the irq threads to execute on
|
1. Use irq affinity to force the irq threads to execute on
|
||||||
some other CPU.
|
some other CPU.
|
||||||
|
|
||||||
Name: kcmtpd_ctr_%d
|
Name:
|
||||||
Purpose: Handle Bluetooth work.
|
kcmtpd_ctr_%d
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
Handle Bluetooth work.
|
||||||
|
|
||||||
To reduce its OS jitter, do one of the following:
|
To reduce its OS jitter, do one of the following:
|
||||||
|
|
||||||
1. Don't use Bluetooth, in which case these kthreads won't be
|
1. Don't use Bluetooth, in which case these kthreads won't be
|
||||||
created in the first place.
|
created in the first place.
|
||||||
2. Use irq affinity to force Bluetooth-related interrupts to
|
2. Use irq affinity to force Bluetooth-related interrupts to
|
||||||
occur on some other CPU and furthermore initiate all
|
occur on some other CPU and furthermore initiate all
|
||||||
Bluetooth activity on some other CPU.
|
Bluetooth activity on some other CPU.
|
||||||
|
|
||||||
Name: ksoftirqd/%u
|
Name:
|
||||||
Purpose: Execute softirq handlers when threaded or when under heavy load.
|
ksoftirqd/%u
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
Execute softirq handlers when threaded or when under heavy load.
|
||||||
|
|
||||||
To reduce its OS jitter, each softirq vector must be handled
|
To reduce its OS jitter, each softirq vector must be handled
|
||||||
separately as follows:
|
separately as follows:
|
||||||
TIMER_SOFTIRQ: Do all of the following:
|
|
||||||
|
TIMER_SOFTIRQ
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Do all of the following:
|
||||||
|
|
||||||
1. To the extent possible, keep the CPU out of the kernel when it
|
1. To the extent possible, keep the CPU out of the kernel when it
|
||||||
is non-idle, for example, by avoiding system calls and by forcing
|
is non-idle, for example, by avoiding system calls and by forcing
|
||||||
both kernel threads and interrupts to execute elsewhere.
|
both kernel threads and interrupts to execute elsewhere.
|
||||||
@@ -76,34 +102,59 @@ TIMER_SOFTIRQ: Do all of the following:
|
|||||||
first one back online. Once you have onlined the CPUs in question,
|
first one back online. Once you have onlined the CPUs in question,
|
||||||
do not offline any other CPUs, because doing so could force the
|
do not offline any other CPUs, because doing so could force the
|
||||||
timer back onto one of the CPUs in question.
|
timer back onto one of the CPUs in question.
|
||||||
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ: Do all of the following:
|
|
||||||
|
NET_TX_SOFTIRQ and NET_RX_SOFTIRQ
|
||||||
|
---------------------------------
|
||||||
|
|
||||||
|
Do all of the following:
|
||||||
|
|
||||||
1. Force networking interrupts onto other CPUs.
|
1. Force networking interrupts onto other CPUs.
|
||||||
2. Initiate any network I/O on other CPUs.
|
2. Initiate any network I/O on other CPUs.
|
||||||
3. Once your application has started, prevent CPU-hotplug operations
|
3. Once your application has started, prevent CPU-hotplug operations
|
||||||
from being initiated from tasks that might run on the CPU to
|
from being initiated from tasks that might run on the CPU to
|
||||||
be de-jittered. (It is OK to force this CPU offline and then
|
be de-jittered. (It is OK to force this CPU offline and then
|
||||||
bring it back online before you start your application.)
|
bring it back online before you start your application.)
|
||||||
BLOCK_SOFTIRQ: Do all of the following:
|
|
||||||
|
BLOCK_SOFTIRQ
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Do all of the following:
|
||||||
|
|
||||||
1. Force block-device interrupts onto some other CPU.
|
1. Force block-device interrupts onto some other CPU.
|
||||||
2. Initiate any block I/O on other CPUs.
|
2. Initiate any block I/O on other CPUs.
|
||||||
3. Once your application has started, prevent CPU-hotplug operations
|
3. Once your application has started, prevent CPU-hotplug operations
|
||||||
from being initiated from tasks that might run on the CPU to
|
from being initiated from tasks that might run on the CPU to
|
||||||
be de-jittered. (It is OK to force this CPU offline and then
|
be de-jittered. (It is OK to force this CPU offline and then
|
||||||
bring it back online before you start your application.)
|
bring it back online before you start your application.)
|
||||||
IRQ_POLL_SOFTIRQ: Do all of the following:
|
|
||||||
|
IRQ_POLL_SOFTIRQ
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Do all of the following:
|
||||||
|
|
||||||
1. Force block-device interrupts onto some other CPU.
|
1. Force block-device interrupts onto some other CPU.
|
||||||
2. Initiate any block I/O and block-I/O polling on other CPUs.
|
2. Initiate any block I/O and block-I/O polling on other CPUs.
|
||||||
3. Once your application has started, prevent CPU-hotplug operations
|
3. Once your application has started, prevent CPU-hotplug operations
|
||||||
from being initiated from tasks that might run on the CPU to
|
from being initiated from tasks that might run on the CPU to
|
||||||
be de-jittered. (It is OK to force this CPU offline and then
|
be de-jittered. (It is OK to force this CPU offline and then
|
||||||
bring it back online before you start your application.)
|
bring it back online before you start your application.)
|
||||||
TASKLET_SOFTIRQ: Do one or more of the following:
|
|
||||||
|
TASKLET_SOFTIRQ
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Do one or more of the following:
|
||||||
|
|
||||||
1. Avoid use of drivers that use tasklets. (Such drivers will contain
|
1. Avoid use of drivers that use tasklets. (Such drivers will contain
|
||||||
calls to things like tasklet_schedule().)
|
calls to things like tasklet_schedule().)
|
||||||
2. Convert all drivers that you must use from tasklets to workqueues.
|
2. Convert all drivers that you must use from tasklets to workqueues.
|
||||||
3. Force interrupts for drivers using tasklets onto other CPUs,
|
3. Force interrupts for drivers using tasklets onto other CPUs,
|
||||||
and also do I/O involving these drivers on other CPUs.
|
and also do I/O involving these drivers on other CPUs.
|
||||||
SCHED_SOFTIRQ: Do all of the following:
|
|
||||||
|
SCHED_SOFTIRQ
|
||||||
|
-------------
|
||||||
|
|
||||||
|
Do all of the following:
|
||||||
|
|
||||||
1. Avoid sending scheduler IPIs to the CPU to be de-jittered,
|
1. Avoid sending scheduler IPIs to the CPU to be de-jittered,
|
||||||
for example, ensure that at most one runnable kthread is present
|
for example, ensure that at most one runnable kthread is present
|
||||||
on that CPU. If a thread that expects to run on the de-jittered
|
on that CPU. If a thread that expects to run on the de-jittered
|
||||||
@@ -120,7 +171,12 @@ SCHED_SOFTIRQ: Do all of the following:
|
|||||||
forcing both kernel threads and interrupts to execute elsewhere.
|
forcing both kernel threads and interrupts to execute elsewhere.
|
||||||
This further reduces the number of scheduler-clock interrupts
|
This further reduces the number of scheduler-clock interrupts
|
||||||
received by the de-jittered CPU.
|
received by the de-jittered CPU.
|
||||||
HRTIMER_SOFTIRQ: Do all of the following:
|
|
||||||
|
HRTIMER_SOFTIRQ
|
||||||
|
---------------
|
||||||
|
|
||||||
|
Do all of the following:
|
||||||
|
|
||||||
1. To the extent possible, keep the CPU out of the kernel when it
|
1. To the extent possible, keep the CPU out of the kernel when it
|
||||||
is non-idle. For example, avoid system calls and force both
|
is non-idle. For example, avoid system calls and force both
|
||||||
kernel threads and interrupts to execute elsewhere.
|
kernel threads and interrupts to execute elsewhere.
|
||||||
@@ -131,9 +187,15 @@ HRTIMER_SOFTIRQ: Do all of the following:
|
|||||||
back online. Once you have onlined the CPUs in question, do not
|
back online. Once you have onlined the CPUs in question, do not
|
||||||
offline any other CPUs, because doing so could force the timer
|
offline any other CPUs, because doing so could force the timer
|
||||||
back onto one of the CPUs in question.
|
back onto one of the CPUs in question.
|
||||||
RCU_SOFTIRQ: Do at least one of the following:
|
|
||||||
|
RCU_SOFTIRQ
|
||||||
|
-----------
|
||||||
|
|
||||||
|
Do at least one of the following:
|
||||||
|
|
||||||
1. Offload callbacks and keep the CPU in either dyntick-idle or
|
1. Offload callbacks and keep the CPU in either dyntick-idle or
|
||||||
adaptive-ticks state by doing all of the following:
|
adaptive-ticks state by doing all of the following:
|
||||||
|
|
||||||
a. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
|
a. CONFIG_NO_HZ_FULL=y and ensure that the CPU to be
|
||||||
de-jittered is marked as an adaptive-ticks CPU using the
|
de-jittered is marked as an adaptive-ticks CPU using the
|
||||||
"nohz_full=" boot parameter. Bind the rcuo kthreads to
|
"nohz_full=" boot parameter. Bind the rcuo kthreads to
|
||||||
@@ -142,8 +204,10 @@ RCU_SOFTIRQ: Do at least one of the following:
|
|||||||
when it is non-idle, for example, by avoiding system
|
when it is non-idle, for example, by avoiding system
|
||||||
calls and by forcing both kernel threads and interrupts
|
calls and by forcing both kernel threads and interrupts
|
||||||
to execute elsewhere.
|
to execute elsewhere.
|
||||||
|
|
||||||
2. Enable RCU to do its processing remotely via dyntick-idle by
|
2. Enable RCU to do its processing remotely via dyntick-idle by
|
||||||
doing all of the following:
|
doing all of the following:
|
||||||
|
|
||||||
a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
|
a. Build with CONFIG_NO_HZ=y and CONFIG_RCU_FAST_NO_HZ=y.
|
||||||
b. Ensure that the CPU goes idle frequently, allowing other
|
b. Ensure that the CPU goes idle frequently, allowing other
|
||||||
CPUs to detect that it has passed through an RCU quiescent
|
CPUs to detect that it has passed through an RCU quiescent
|
||||||
@@ -155,15 +219,20 @@ RCU_SOFTIRQ: Do at least one of the following:
|
|||||||
calls and by forcing both kernel threads and interrupts
|
calls and by forcing both kernel threads and interrupts
|
||||||
to execute elsewhere.
|
to execute elsewhere.
|
||||||
|
|
||||||
Name: kworker/%u:%d%s (cpu, id, priority)
|
Name:
|
||||||
Purpose: Execute workqueue requests
|
kworker/%u:%d%s (cpu, id, priority)
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
Execute workqueue requests
|
||||||
|
|
||||||
To reduce its OS jitter, do any of the following:
|
To reduce its OS jitter, do any of the following:
|
||||||
|
|
||||||
1. Run your workload at a real-time priority, which will allow
|
1. Run your workload at a real-time priority, which will allow
|
||||||
preempting the kworker daemons.
|
preempting the kworker daemons.
|
||||||
2. A given workqueue can be made visible in the sysfs filesystem
|
2. A given workqueue can be made visible in the sysfs filesystem
|
||||||
by passing the WQ_SYSFS to that workqueue's alloc_workqueue().
|
by passing the WQ_SYSFS to that workqueue's alloc_workqueue().
|
||||||
Such a workqueue can be confined to a given subset of the
|
Such a workqueue can be confined to a given subset of the
|
||||||
CPUs using the /sys/devices/virtual/workqueue/*/cpumask sysfs
|
CPUs using the ``/sys/devices/virtual/workqueue/*/cpumask`` sysfs
|
||||||
files. The set of WQ_SYSFS workqueues can be displayed using
|
files. The set of WQ_SYSFS workqueues can be displayed using
|
||||||
"ls /sys/devices/virtual/workqueue". That said, the workqueues
|
"ls /sys/devices/virtual/workqueue". That said, the workqueues
|
||||||
maintainer would like to caution people against indiscriminately
|
maintainer would like to caution people against indiscriminately
|
||||||
@@ -173,6 +242,7 @@ To reduce its OS jitter, do any of the following:
|
|||||||
to remove it, even if its addition was a mistake.
|
to remove it, even if its addition was a mistake.
|
||||||
3. Do any of the following needed to avoid jitter that your
|
3. Do any of the following needed to avoid jitter that your
|
||||||
application cannot tolerate:
|
application cannot tolerate:
|
||||||
|
|
||||||
a. Build your kernel with CONFIG_SLUB=y rather than
|
a. Build your kernel with CONFIG_SLUB=y rather than
|
||||||
CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
|
CONFIG_SLAB=y, thus avoiding the slab allocator's periodic
|
||||||
use of each CPU's workqueues to run its cache_reap()
|
use of each CPU's workqueues to run its cache_reap()
|
||||||
@@ -186,6 +256,7 @@ To reduce its OS jitter, do any of the following:
|
|||||||
be able to build your kernel with CONFIG_CPU_FREQ=n to
|
be able to build your kernel with CONFIG_CPU_FREQ=n to
|
||||||
avoid the CPU-frequency governor periodically running
|
avoid the CPU-frequency governor periodically running
|
||||||
on each CPU, including cs_dbs_timer() and od_dbs_timer().
|
on each CPU, including cs_dbs_timer() and od_dbs_timer().
|
||||||
|
|
||||||
WARNING: Please check your CPU specifications to
|
WARNING: Please check your CPU specifications to
|
||||||
make sure that this is safe on your particular system.
|
make sure that this is safe on your particular system.
|
||||||
d. As of v3.18, Christoph Lameter's on-demand vmstat workers
|
d. As of v3.18, Christoph Lameter's on-demand vmstat workers
|
||||||
@@ -222,9 +293,14 @@ To reduce its OS jitter, do any of the following:
|
|||||||
CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
|
CONFIG_PMAC_RACKMETER=n to disable the CPU-meter,
|
||||||
avoiding OS jitter from rackmeter_do_timer().
|
avoiding OS jitter from rackmeter_do_timer().
|
||||||
|
|
||||||
Name: rcuc/%u
|
Name:
|
||||||
Purpose: Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
|
rcuc/%u
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
Execute RCU callbacks in CONFIG_RCU_BOOST=y kernels.
|
||||||
|
|
||||||
To reduce its OS jitter, do at least one of the following:
|
To reduce its OS jitter, do at least one of the following:
|
||||||
|
|
||||||
1. Build the kernel with CONFIG_PREEMPT=n. This prevents these
|
1. Build the kernel with CONFIG_PREEMPT=n. This prevents these
|
||||||
kthreads from being created in the first place, and also obviates
|
kthreads from being created in the first place, and also obviates
|
||||||
the need for RCU priority boosting. This approach is feasible
|
the need for RCU priority boosting. This approach is feasible
|
||||||
@@ -244,9 +320,14 @@ To reduce its OS jitter, do at least one of the following:
|
|||||||
CPU, again preventing the rcuc/%u kthreads from having any work
|
CPU, again preventing the rcuc/%u kthreads from having any work
|
||||||
to do.
|
to do.
|
||||||
|
|
||||||
Name: rcuob/%d, rcuop/%d, and rcuos/%d
|
Name:
|
||||||
Purpose: Offload RCU callbacks from the corresponding CPU.
|
rcuob/%d, rcuop/%d, and rcuos/%d
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
Offload RCU callbacks from the corresponding CPU.
|
||||||
|
|
||||||
To reduce its OS jitter, do at least one of the following:
|
To reduce its OS jitter, do at least one of the following:
|
||||||
|
|
||||||
1. Use affinity, cgroups, or other mechanism to force these kthreads
|
1. Use affinity, cgroups, or other mechanism to force these kthreads
|
||||||
to execute on some other CPU.
|
to execute on some other CPU.
|
||||||
2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
|
2. Build with CONFIG_RCU_NOCB_CPU=n, which will prevent these
|
||||||
@@ -254,9 +335,14 @@ To reduce its OS jitter, do at least one of the following:
|
|||||||
note that this will not eliminate OS jitter, but will instead
|
note that this will not eliminate OS jitter, but will instead
|
||||||
shift it to RCU_SOFTIRQ.
|
shift it to RCU_SOFTIRQ.
|
||||||
|
|
||||||
Name: watchdog/%u
|
Name:
|
||||||
Purpose: Detect software lockups on each CPU.
|
watchdog/%u
|
||||||
|
|
||||||
|
Purpose:
|
||||||
|
Detect software lockups on each CPU.
|
||||||
|
|
||||||
To reduce its OS jitter, do at least one of the following:
|
To reduce its OS jitter, do at least one of the following:
|
||||||
|
|
||||||
1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
|
1. Build with CONFIG_LOCKUP_DETECTOR=n, which will prevent these
|
||||||
kthreads from being created in the first place.
|
kthreads from being created in the first place.
|
||||||
2. Boot with "nosoftlockup=0", which will also prevent these kthreads
|
2. Boot with "nosoftlockup=0", which will also prevent these kthreads
|
||||||
|
|||||||
@@ -1,13 +1,13 @@
|
|||||||
|
=====================================================================
|
||||||
Everything you never wanted to know about kobjects, ksets, and ktypes
|
Everything you never wanted to know about kobjects, ksets, and ktypes
|
||||||
|
=====================================================================
|
||||||
|
|
||||||
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
:Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||||||
|
:Last updated: December 19, 2007
|
||||||
|
|
||||||
Based on an original article by Jon Corbet for lwn.net written October 1,
|
Based on an original article by Jon Corbet for lwn.net written October 1,
|
||||||
2003 and located at http://lwn.net/Articles/51437/
|
2003 and located at http://lwn.net/Articles/51437/
|
||||||
|
|
||||||
Last updated December 19, 2007
|
|
||||||
|
|
||||||
|
|
||||||
Part of the difficulty in understanding the driver model - and the kobject
|
Part of the difficulty in understanding the driver model - and the kobject
|
||||||
abstraction upon which it is built - is that there is no obvious starting
|
abstraction upon which it is built - is that there is no obvious starting
|
||||||
place. Dealing with kobjects requires understanding a few different types,
|
place. Dealing with kobjects requires understanding a few different types,
|
||||||
@@ -47,6 +47,7 @@ approach will be taken, so we'll go back to kobjects.
|
|||||||
|
|
||||||
|
|
||||||
Embedding kobjects
|
Embedding kobjects
|
||||||
|
==================
|
||||||
|
|
||||||
It is rare for kernel code to create a standalone kobject, with one major
|
It is rare for kernel code to create a standalone kobject, with one major
|
||||||
exception explained below. Instead, kobjects are used to control access to
|
exception explained below. Instead, kobjects are used to control access to
|
||||||
@@ -65,7 +66,7 @@ their own, but are invariably found embedded in the larger objects of
|
|||||||
interest.)
|
interest.)
|
||||||
|
|
||||||
So, for example, the UIO code in drivers/uio/uio.c has a structure that
|
So, for example, the UIO code in drivers/uio/uio.c has a structure that
|
||||||
defines the memory region associated with a uio device:
|
defines the memory region associated with a uio device::
|
||||||
|
|
||||||
struct uio_map {
|
struct uio_map {
|
||||||
struct kobject kobj;
|
struct kobject kobj;
|
||||||
@@ -77,7 +78,7 @@ just a matter of using the kobj member. Code that works with kobjects will
|
|||||||
often have the opposite problem, however: given a struct kobject pointer,
|
often have the opposite problem, however: given a struct kobject pointer,
|
||||||
what is the pointer to the containing structure? You must avoid tricks
|
what is the pointer to the containing structure? You must avoid tricks
|
||||||
(such as assuming that the kobject is at the beginning of the structure)
|
(such as assuming that the kobject is at the beginning of the structure)
|
||||||
and, instead, use the container_of() macro, found in <linux/kernel.h>:
|
and, instead, use the container_of() macro, found in <linux/kernel.h>::
|
||||||
|
|
||||||
container_of(pointer, type, member)
|
container_of(pointer, type, member)
|
||||||
|
|
||||||
@@ -90,13 +91,13 @@ where:
|
|||||||
The return value from container_of() is a pointer to the corresponding
|
The return value from container_of() is a pointer to the corresponding
|
||||||
container type. So, for example, a pointer "kp" to a struct kobject
|
container type. So, for example, a pointer "kp" to a struct kobject
|
||||||
embedded *within* a struct uio_map could be converted to a pointer to the
|
embedded *within* a struct uio_map could be converted to a pointer to the
|
||||||
*containing* uio_map structure with:
|
*containing* uio_map structure with::
|
||||||
|
|
||||||
struct uio_map *u_map = container_of(kp, struct uio_map, kobj);
|
struct uio_map *u_map = container_of(kp, struct uio_map, kobj);
|
||||||
|
|
||||||
For convenience, programmers often define a simple macro for "back-casting"
|
For convenience, programmers often define a simple macro for "back-casting"
|
||||||
kobject pointers to the containing type. Exactly this happens in the
|
kobject pointers to the containing type. Exactly this happens in the
|
||||||
earlier drivers/uio/uio.c, as you can see here:
|
earlier drivers/uio/uio.c, as you can see here::
|
||||||
|
|
||||||
struct uio_map {
|
struct uio_map {
|
||||||
struct kobject kobj;
|
struct kobject kobj;
|
||||||
@@ -106,23 +107,25 @@ earlier drivers/uio/uio.c, as you can see here:
|
|||||||
#define to_map(map) container_of(map, struct uio_map, kobj)
|
#define to_map(map) container_of(map, struct uio_map, kobj)
|
||||||
|
|
||||||
where the macro argument "map" is a pointer to the struct kobject in
|
where the macro argument "map" is a pointer to the struct kobject in
|
||||||
question. That macro is subsequently invoked with:
|
question. That macro is subsequently invoked with::
|
||||||
|
|
||||||
struct uio_map *map = to_map(kobj);
|
struct uio_map *map = to_map(kobj);
|
||||||
|
|
||||||
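The back-casting idiom above can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel's own definition: container_of() is rebuilt here from offsetof(), the two structures are simplified stand-ins, and map_id_from_kobj() is a hypothetical helper added for illustration. The "id" field is placed before the kobject on purpose, to show that the conversion does not rely on the kobject being the first member.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel macro: subtract the member's offset
 * from the member pointer to recover the enclosing structure.
 */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins; the real definitions live in the kernel. */
struct kobject {
	const char *name;
};

struct uio_map {
	int id;              /* deliberately placed before kobj */
	struct kobject kobj; /* embedded, and not at offset zero */
};

#define to_map(map) container_of(map, struct uio_map, kobj)

/* Hypothetical helper: read the map's id given only its embedded kobject. */
static int map_id_from_kobj(struct kobject *kobj)
{
	struct uio_map *map;

	assert(kobj != NULL);
	map = to_map(kobj);
	return map->id;
}
```

Because the offset is computed by the compiler, the structure layout can change later without touching any caller of to_map().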
|
|
||||||
Initialization of kobjects
|
Initialization of kobjects
|
||||||
|
==========================
|
||||||
|
|
||||||
Code which creates a kobject must, of course, initialize that object. Some
|
Code which creates a kobject must, of course, initialize that object. Some
|
||||||
of the internal fields are set up with a (mandatory) call to kobject_init():
|
of the internal fields are set up with a (mandatory) call to kobject_init()::
|
||||||
|
|
||||||
void kobject_init(struct kobject *kobj, struct kobj_type *ktype);
|
void kobject_init(struct kobject *kobj, struct kobj_type *ktype);
|
||||||
|
|
||||||
The ktype is required for a kobject to be created properly, as every kobject
|
The ktype is required for a kobject to be created properly, as every kobject
|
||||||
must have an associated kobj_type. After calling kobject_init(), to
|
must have an associated kobj_type. After calling kobject_init(), to
|
||||||
register the kobject with sysfs, the function kobject_add() must be called:
|
register the kobject with sysfs, the function kobject_add() must be called::
|
||||||
|
|
||||||
int kobject_add(struct kobject *kobj, struct kobject *parent, const char *fmt, ...);
|
int kobject_add(struct kobject *kobj, struct kobject *parent,
|
||||||
|
const char *fmt, ...);
|
||||||
|
|
||||||
This sets up the parent of the kobject and the name for the kobject
|
This sets up the parent of the kobject and the name for the kobject
|
||||||
properly. If the kobject is to be associated with a specific kset,
|
properly. If the kobject is to be associated with a specific kset,
|
||||||
@@ -133,7 +136,7 @@ kset itself.
|
|||||||
|
|
||||||
As the name of the kobject is set when it is added to the kernel, the name
|
As the name of the kobject is set when it is added to the kernel, the name
|
||||||
of the kobject should never be manipulated directly. If you must change
|
of the kobject should never be manipulated directly. If you must change
|
||||||
the name of the kobject, call kobject_rename():
|
the name of the kobject, call kobject_rename()::
|
||||||
|
|
||||||
int kobject_rename(struct kobject *kobj, const char *new_name);
|
int kobject_rename(struct kobject *kobj, const char *new_name);
|
||||||
|
|
||||||
@@ -146,12 +149,12 @@ is being removed. If your code needs to call this function, it is
|
|||||||
incorrect and needs to be fixed.
|
incorrect and needs to be fixed.
|
||||||
|
|
||||||
To properly access the name of the kobject, use the function
|
To properly access the name of the kobject, use the function
|
||||||
kobject_name():
|
kobject_name()::
|
||||||
|
|
||||||
const char *kobject_name(const struct kobject * kobj);
|
const char *kobject_name(const struct kobject * kobj);
|
||||||
|
|
||||||
There is a helper function to both initialize and add the kobject to the
|
There is a helper function to both initialize and add the kobject to the
|
||||||
kernel at the same time, called surprisingly enough kobject_init_and_add():
|
kernel at the same time, called surprisingly enough kobject_init_and_add()::
|
||||||
|
|
||||||
int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
|
int kobject_init_and_add(struct kobject *kobj, struct kobj_type *ktype,
|
||||||
struct kobject *parent, const char *fmt, ...);
|
struct kobject *parent, const char *fmt, ...);
|
||||||
@@ -161,10 +164,11 @@ kobject_add() functions described above.
|
|||||||
|
|
||||||
|
|
||||||
Uevents
|
Uevents
|
||||||
|
=======
|
||||||
|
|
||||||
After a kobject has been registered with the kobject core, you need to
|
After a kobject has been registered with the kobject core, you need to
|
||||||
announce to the world that it has been created. This can be done with a
|
announce to the world that it has been created. This can be done with a
|
||||||
call to kobject_uevent():
|
call to kobject_uevent()::
|
||||||
|
|
||||||
int kobject_uevent(struct kobject *kobj, enum kobject_action action);
|
int kobject_uevent(struct kobject *kobj, enum kobject_action action);
|
||||||
|
|
||||||
@@ -180,11 +184,12 @@ hand.
|
|||||||
|
|
||||||
|
|
||||||
Reference counts
|
Reference counts
|
||||||
|
================
|
||||||
|
|
||||||
One of the key functions of a kobject is to serve as a reference counter
|
One of the key functions of a kobject is to serve as a reference counter
|
||||||
for the object in which it is embedded. As long as references to the object
|
for the object in which it is embedded. As long as references to the object
|
||||||
exist, the object (and the code which supports it) must continue to exist.
|
exist, the object (and the code which supports it) must continue to exist.
|
||||||
The low-level functions for manipulating a kobject's reference counts are:
|
The low-level functions for manipulating a kobject's reference counts are::
|
||||||
|
|
||||||
struct kobject *kobject_get(struct kobject *kobj);
|
struct kobject *kobject_get(struct kobject *kobj);
|
||||||
void kobject_put(struct kobject *kobj);
|
void kobject_put(struct kobject *kobj);
|
||||||
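To make the get/put pairing concrete, here is a rough userspace model of the pattern, assuming a plain (non-atomic) counter and a release callback reached through a type structure. Every name prefixed with my_ is hypothetical and only mirrors the shape of the kernel interfaces; the real kobject uses a kref and must be used with the kernel functions named above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct my_kobj;

/* Hypothetical analogue of a ktype: it carries the release callback. */
struct my_ktype {
	void (*release)(struct my_kobj *kobj);
};

struct my_kobj {
	int refcount; /* plain int for illustration; not safe for concurrent use */
	const struct my_ktype *ktype;
};

/* The creator holds the first reference. */
static void my_kobj_init(struct my_kobj *kobj, const struct my_ktype *ktype)
{
	kobj->refcount = 1;
	kobj->ktype = ktype;
}

static struct my_kobj *my_kobj_get(struct my_kobj *kobj)
{
	if (kobj)
		kobj->refcount++;
	return kobj;
}

/* Dropping the last reference triggers the type's release callback. */
static void my_kobj_put(struct my_kobj *kobj)
{
	if (!kobj)
		return;
	assert(kobj->refcount > 0);
	if (--kobj->refcount == 0 && kobj->ktype->release)
		kobj->ktype->release(kobj);
}

/* Containing object whose lifetime is controlled by the embedded counter. */
struct my_object {
	struct my_kobj kobj;
	int payload;
};

static int released_count; /* lets a caller observe that release ran */

static void my_object_release(struct my_kobj *kobj)
{
	struct my_object *obj = (struct my_object *)
		((char *)kobj - offsetof(struct my_object, kobj));

	released_count++;
	free(obj);
}

static const struct my_ktype my_object_ktype = {
	.release = my_object_release,
};

static struct my_object *my_object_create(int payload)
{
	struct my_object *obj = malloc(sizeof(*obj));

	if (!obj)
		return NULL;
	obj->payload = payload;
	my_kobj_init(&obj->kobj, &my_object_ktype);
	return obj;
}
```

In this model the containing object is freed only from the release callback, never directly by the code that created it, so a holder of an extra reference cannot be left with a dangling pointer.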
@@ -209,21 +214,24 @@ file Documentation/kref.txt in the Linux kernel source tree.
|
|||||||
|
|
||||||
|
|
||||||
Creating "simple" kobjects
|
Creating "simple" kobjects
|
||||||
|
==========================
|
||||||
|
|
||||||
Sometimes all that a developer wants is a way to create a simple directory
|
Sometimes all that a developer wants is a way to create a simple directory
|
||||||
in the sysfs hierarchy, and not have to mess with the whole complication of
|
in the sysfs hierarchy, and not have to mess with the whole complication of
|
||||||
ksets, show and store functions, and other details. This is the one
|
ksets, show and store functions, and other details. This is the one
|
||||||
exception where a single kobject should be created. To create such an
|
exception where a single kobject should be created. To create such an
|
||||||
entry, use the function:
|
entry, use the function::
|
||||||
|
|
||||||
struct kobject *kobject_create_and_add(char *name, struct kobject *parent);
|
struct kobject *kobject_create_and_add(char *name, struct kobject *parent);
|
||||||
|
|
||||||
This function will create a kobject and place it in sysfs in the location
|
This function will create a kobject and place it in sysfs in the location
|
||||||
underneath the specified parent kobject. To create simple attributes
|
underneath the specified parent kobject. To create simple attributes
|
||||||
associated with this kobject, use:
|
associated with this kobject, use::
|
||||||
|
|
||||||
int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
|
int sysfs_create_file(struct kobject *kobj, struct attribute *attr);
|
||||||
or
|
|
||||||
|
or::
|
||||||
|
|
||||||
int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
|
int sysfs_create_group(struct kobject *kobj, struct attribute_group *grp);
|
||||||
|
|
||||||
Both types of attributes used here, with a kobject that has been created
|
Both types of attributes used here, with a kobject that has been created
|
||||||
@@ -236,6 +244,7 @@ implementation of a simple kobject and attributes.
|
|||||||
|
|
||||||
|
|
||||||
ktypes and release methods
|
ktypes and release methods
|
||||||
|
==========================
|
||||||
|
|
||||||
One important thing still missing from the discussion is what happens to a
|
One important thing still missing from the discussion is what happens to a
|
||||||
kobject when its reference count reaches zero. The code which created the
|
kobject when its reference count reaches zero. The code which created the
|
||||||
@@ -257,7 +266,7 @@ is good practice to always use kobject_put() after kobject_init() to avoid
|
|||||||
errors creeping in.
|
errors creeping in.
|
||||||
|
|
||||||
This notification is done through a kobject's release() method. Usually
|
This notification is done through a kobject's release() method. Usually
|
||||||
such a method has a form like:
|
such a method has a form like::
|
||||||
|
|
||||||
void my_object_release(struct kobject *kobj)
|
void my_object_release(struct kobject *kobj)
|
||||||
{
|
{
|
||||||
@@ -281,7 +290,7 @@ leak in the kobject core, which makes people unhappy.
|
|||||||
|
|
||||||
Interestingly, the release() method is not stored in the kobject itself;
|
Interestingly, the release() method is not stored in the kobject itself;
|
||||||
instead, it is associated with the ktype. So let us introduce struct
|
instead, it is associated with the ktype. So let us introduce struct
|
||||||
kobj_type:
|
kobj_type::
|
||||||
|
|
||||||
struct kobj_type {
|
struct kobj_type {
|
||||||
void (*release)(struct kobject *kobj);
|
void (*release)(struct kobject *kobj);
|
||||||
@@ -306,6 +315,7 @@ automatically created for any kobject that is registered with this ktype.
|
|||||||
|
|
||||||
|
|
||||||
ksets
|
ksets
|
||||||
|
=====
|
||||||
|
|
||||||
A kset is merely a collection of kobjects that want to be associated with
|
A kset is merely a collection of kobjects that want to be associated with
|
||||||
each other. There is no restriction that they be of the same ktype, but be
|
each other. There is no restriction that they be of the same ktype, but be
|
||||||
@@ -335,13 +345,16 @@ kobject) in their parent.
|
|||||||
|
|
||||||
As a kset contains a kobject within it, it should always be dynamically
|
As a kset contains a kobject within it, it should always be dynamically
|
||||||
created and never declared statically or on the stack. To create a new
|
created and never declared statically or on the stack. To create a new
|
||||||
kset use:
|
kset use::
|
||||||
|
|
||||||
struct kset *kset_create_and_add(const char *name,
|
struct kset *kset_create_and_add(const char *name,
|
||||||
struct kset_uevent_ops *u,
|
struct kset_uevent_ops *u,
|
||||||
struct kobject *parent);
|
struct kobject *parent);
|
||||||
|
|
||||||
When you are finished with the kset, call:
|
When you are finished with the kset, call::
|
||||||
|
|
||||||
void kset_unregister(struct kset *kset);
|
void kset_unregister(struct kset *kset);
|
||||||
|
|
||||||
to destroy it. This removes the kset from sysfs and decrements its reference
|
to destroy it. This removes the kset from sysfs and decrements its reference
|
||||||
count. When the reference count goes to zero, the kset will be released.
|
count. When the reference count goes to zero, the kset will be released.
|
||||||
Because other references to the kset may still exist, the release may happen
|
Because other references to the kset may still exist, the release may happen
|
||||||
@@ -351,7 +364,7 @@ An example of using a kset can be seen in the
|
|||||||
samples/kobject/kset-example.c file in the kernel tree.
|
samples/kobject/kset-example.c file in the kernel tree.
|
||||||
|
|
||||||
If a kset wishes to control the uevent operations of the kobjects
|
If a kset wishes to control the uevent operations of the kobjects
|
||||||
associated with it, it can use the struct kset_uevent_ops to handle it:
|
associated with it, it can use the struct kset_uevent_ops to handle it::
|
||||||
|
|
||||||
struct kset_uevent_ops {
|
struct kset_uevent_ops {
|
||||||
int (*filter)(struct kset *kset, struct kobject *kobj);
|
int (*filter)(struct kset *kset, struct kobject *kobj);
|
||||||
@@ -386,6 +399,7 @@ added below the parent kobject.
|
|||||||
|
|
||||||
|
|
||||||
Kobject removal
|
Kobject removal
|
||||||
|
===============
|
||||||
|
|
||||||
After a kobject has been registered with the kobject core successfully, it
|
After a kobject has been registered with the kobject core successfully, it
|
||||||
must be cleaned up when the code is finished with it. To do that, call
|
must be cleaned up when the code is finished with it. To do that, call
|
||||||
@@ -409,6 +423,7 @@ called, and the objects in the former circle release each other.
|
|||||||
|
|
||||||
|
|
||||||
Example code to copy from
|
Example code to copy from
|
||||||
|
=========================
|
||||||
|
|
||||||
For a more complete example of using ksets and kobjects properly, see the
|
For a more complete example of using ksets and kobjects properly, see the
|
||||||
example programs samples/kobject/{kobject-example.c,kset-example.c},
|
example programs samples/kobject/{kobject-example.c,kset-example.c},
|
||||||
|
|||||||
@@ -1,9 +1,12 @@
|
|||||||
Title : Kernel Probes (Kprobes)
|
=======================
|
||||||
Authors : Jim Keniston <jkenisto@us.ibm.com>
|
Kernel Probes (Kprobes)
|
||||||
: Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com>
|
=======================
|
||||||
: Masami Hiramatsu <mhiramat@redhat.com>
|
|
||||||
|
|
||||||
CONTENTS
|
:Author: Jim Keniston <jkenisto@us.ibm.com>
|
||||||
|
:Author: Prasanna S Panchamukhi <prasanna.panchamukhi@gmail.com>
|
||||||
|
:Author: Masami Hiramatsu <mhiramat@redhat.com>
|
||||||
|
|
||||||
|
.. CONTENTS
|
||||||
|
|
||||||
1. Concepts: Kprobes, Jprobes, Return Probes
|
1. Concepts: Kprobes, Jprobes, Return Probes
|
||||||
2. Architectures Supported
|
2. Architectures Supported
|
||||||
@@ -18,13 +21,16 @@ CONTENTS
|
|||||||
Appendix A: The kprobes debugfs interface
|
Appendix A: The kprobes debugfs interface
|
||||||
Appendix B: The kprobes sysctl interface
|
Appendix B: The kprobes sysctl interface
|
||||||
|
|
||||||
1. Concepts: Kprobes, Jprobes, Return Probes
|
Concepts: Kprobes, Jprobes, Return Probes
|
||||||
|
=========================================
|
||||||
|
|
||||||
Kprobes enables you to dynamically break into any kernel routine and
|
Kprobes enables you to dynamically break into any kernel routine and
|
||||||
collect debugging and performance information non-disruptively. You
|
collect debugging and performance information non-disruptively. You
|
||||||
can trap at almost any kernel code address(*), specifying a handler
|
can trap at almost any kernel code address [1]_, specifying a handler
|
||||||
routine to be invoked when the breakpoint is hit.
|
routine to be invoked when the breakpoint is hit.
|
||||||
(*: some parts of the kernel code can not be trapped, see 1.5 Blacklist)
|
|
||||||
|
.. [1] some parts of the kernel code can not be trapped, see
|
||||||
|
:ref:`kprobes_blacklist`
|
||||||
|
|
||||||
There are currently three types of probes: kprobes, jprobes, and
|
There are currently three types of probes: kprobes, jprobes, and
|
||||||
kretprobes (also called return probes). A kprobe can be inserted
|
kretprobes (also called return probes). A kprobe can be inserted
|
||||||
@@ -40,8 +46,8 @@ registration function such as register_kprobe() specifies where
|
|||||||
the probe is to be inserted and what handler is to be called when
|
the probe is to be inserted and what handler is to be called when
|
||||||
the probe is hit.
|
the probe is hit.
|
||||||
|
|
||||||
There are also register_/unregister_*probes() functions for batch
|
There are also ``register_/unregister_*probes()`` functions for batch
|
||||||
registration/unregistration of a group of *probes. These functions
|
registration/unregistration of a group of ``*probes``. These functions
|
||||||
can speed up the unregistration process when you have to unregister
|
can speed up the unregistration process when you have to unregister
|
||||||
a lot of probes at once.
|
a lot of probes at once.
|
||||||
|
|
||||||
@@ -51,9 +57,10 @@ things that you'll need to know in order to make the best use of
|
|||||||
Kprobes -- e.g., the difference between a pre_handler and
|
Kprobes -- e.g., the difference between a pre_handler and
|
||||||
a post_handler, and how to use the maxactive and nmissed fields of
|
a post_handler, and how to use the maxactive and nmissed fields of
|
||||||
a kretprobe. But if you're in a hurry to start using Kprobes, you
|
a kretprobe. But if you're in a hurry to start using Kprobes, you
|
||||||
can skip ahead to section 2.
|
can skip ahead to :ref:`kprobes_archs_supported`.
|
||||||
|
|
||||||
1.1 How Does a Kprobe Work?
|
How Does a Kprobe Work?
|
||||||
|
-----------------------
|
||||||
|
|
||||||
When a kprobe is registered, Kprobes makes a copy of the probed
|
When a kprobe is registered, Kprobes makes a copy of the probed
|
||||||
instruction and replaces the first byte(s) of the probed instruction
|
instruction and replaces the first byte(s) of the probed instruction
|
||||||
@@ -75,7 +82,8 @@ After the instruction is single-stepped, Kprobes executes the
|
|||||||
"post_handler," if any, that is associated with the kprobe.
|
"post_handler," if any, that is associated with the kprobe.
|
||||||
Execution then continues with the instruction following the probepoint.
|
Execution then continues with the instruction following the probepoint.
|
||||||
|
|
||||||
1.2 How Does a Jprobe Work?
|
How Does a Jprobe Work?
|
||||||
|
-----------------------
|
||||||
|
|
||||||
A jprobe is implemented using a kprobe that is placed on a function's
|
A jprobe is implemented using a kprobe that is placed on a function's
|
||||||
entry point. It employs a simple mirroring principle to allow
|
entry point. It employs a simple mirroring principle to allow
|
||||||
@@ -113,9 +121,11 @@ more than eight function arguments, an argument of more than sixteen
|
|||||||
bytes, or more than 64 bytes of argument data, depending on
|
bytes, or more than 64 bytes of argument data, depending on
|
||||||
architecture).
|
architecture).
|
||||||
|
|
||||||
1.3 Return Probes
|
Return Probes
|
||||||
|
-------------
|
||||||
|
|
||||||
1.3.1 How Does a Return Probe Work?
|
How Does a Return Probe Work?
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
When you call register_kretprobe(), Kprobes establishes a kprobe at
|
When you call register_kretprobe(), Kprobes establishes a kprobe at
|
||||||
the entry to the function. When the probed function is called and this
|
the entry to the function. When the probed function is called and this
|
||||||
@@ -150,7 +160,8 @@ zero when the return probe is registered, and is incremented every
|
|||||||
time the probed function is entered but there is no kretprobe_instance
|
time the probed function is entered but there is no kretprobe_instance
|
||||||
object available for establishing the return probe.
|
object available for establishing the return probe.
|
||||||
|
|
||||||
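The accounting behind nmissed can be illustrated with a toy model. The sketch below is a userspace simplification under stated assumptions: a fixed budget of instances stands in for the kretprobe_instance objects, and the toy_ names and the in_use counter are hypothetical, not kernel API.

```c
#include <assert.h>

/* Toy model of a return probe's instance budget. */
struct toy_kretprobe {
	int maxactive; /* how many calls can be tracked at once */
	int in_use;    /* instances currently tracking an active call */
	int nmissed;   /* entries that found no free instance */
};

/*
 * Models entry to the probed function. Returns 1 if an instance was
 * claimed, or 0 if none was free, in which case the miss is counted.
 */
static int toy_on_entry(struct toy_kretprobe *rp)
{
	if (rp->in_use >= rp->maxactive) {
		rp->nmissed++;
		return 0;
	}
	rp->in_use++;
	return 1;
}

/* Models the probed function returning: the instance becomes free again. */
static void toy_on_return(struct toy_kretprobe *rp)
{
	assert(rp->in_use > 0);
	rp->in_use--;
}
```

With a budget of two, a third overlapping entry is counted as missed, and a later entry succeeds again once one of the tracked calls has returned. A nonzero nmissed therefore suggests that the budget was too small for the observed concurrency.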
1.3.2 Kretprobe entry-handler
|
Kretprobe entry-handler
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Kretprobes also provides an optional user-specified handler which runs
|
Kretprobes also provides an optional user-specified handler which runs
|
||||||
on function entry. This handler is specified by setting the entry_handler
|
on function entry. This handler is specified by setting the entry_handler
|
||||||
@@ -174,7 +185,10 @@ In case probed function is entered but there is no kretprobe_instance
|
|||||||
object available, then in addition to incrementing the nmissed count,
|
object available, then in addition to incrementing the nmissed count,
|
||||||
the user entry_handler invocation is also skipped.
|
the user entry_handler invocation is also skipped.
|
||||||
|
|
||||||
1.4 How Does Jump Optimization Work?
|
.. _kprobes_jump_optimization:
|
||||||
|
|
||||||
|
How Does Jump Optimization Work?
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
If your kernel is built with CONFIG_OPTPROBES=y (currently this flag
|
If your kernel is built with CONFIG_OPTPROBES=y (currently this flag
|
||||||
is automatically set 'y' on x86/x86-64, non-preemptive kernel) and
|
is automatically set 'y' on x86/x86-64, non-preemptive kernel) and
|
||||||
@@ -182,14 +196,16 @@ the "debug.kprobes_optimization" kernel parameter is set to 1 (see
|
|||||||
sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
|
sysctl(8)), Kprobes tries to reduce probe-hit overhead by using a jump
|
||||||
instruction instead of a breakpoint instruction at each probepoint.
|
instruction instead of a breakpoint instruction at each probepoint.
|
||||||
|
|
||||||
1.4.1 Init a Kprobe
|
Init a Kprobe
|
||||||
|
^^^^^^^^^^^^^
|
||||||
|
|
||||||
When a probe is registered, before attempting this optimization,
|
When a probe is registered, before attempting this optimization,
|
||||||
Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
|
Kprobes inserts an ordinary, breakpoint-based kprobe at the specified
|
||||||
address. So, even if it's not possible to optimize this particular
|
address. So, even if it's not possible to optimize this particular
|
||||||
probepoint, there'll be a probe there.
|
probepoint, there'll be a probe there.
|
||||||
|
|
||||||
1.4.2 Safety Check
|
Safety Check
|
||||||
|
^^^^^^^^^^^^
|
||||||
|
|
||||||
Before optimizing a probe, Kprobes performs the following safety checks:
|
Before optimizing a probe, Kprobes performs the following safety checks:
|
||||||
|
|
||||||
@@ -200,35 +216,40 @@ instructions.)
|
|||||||
|
|
||||||
- Kprobes analyzes the entire function and verifies that there is no
|
- Kprobes analyzes the entire function and verifies that there is no
|
||||||
jump into the optimized region. Specifically:
|
jump into the optimized region. Specifically:
|
||||||
|
|
||||||
- the function contains no indirect jump;
|
- the function contains no indirect jump;
|
||||||
- the function contains no instruction that causes an exception (since
|
- the function contains no instruction that causes an exception (since
|
||||||
the fixup code triggered by the exception could jump back into the
|
the fixup code triggered by the exception could jump back into the
|
||||||
optimized region -- Kprobes checks the exception tables to verify this);
|
optimized region -- Kprobes checks the exception tables to verify this);
|
||||||
and
|
|
||||||
- there is no near jump to the optimized region (other than to the first
|
- there is no near jump to the optimized region (other than to the first
|
||||||
byte).
|
byte).
|
||||||
|
|
||||||
- For each instruction in the optimized region, Kprobes verifies that
|
- For each instruction in the optimized region, Kprobes verifies that
|
||||||
the instruction can be executed out of line.
|
the instruction can be executed out of line.
|
||||||
|
|
||||||
1.4.3 Preparing Detour Buffer
|
Preparing Detour Buffer
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
Next, Kprobes prepares a "detour" buffer, which contains the following
|
Next, Kprobes prepares a "detour" buffer, which contains the following
|
||||||
instruction sequence:
|
instruction sequence:
|
||||||
|
|
||||||
- code to push the CPU's registers (emulating a breakpoint trap)
|
- code to push the CPU's registers (emulating a breakpoint trap)
|
||||||
- a call to the trampoline code which calls user's probe handlers.
|
- a call to the trampoline code which calls user's probe handlers.
|
||||||
- code to restore registers
|
- code to restore registers
|
||||||
- the instructions from the optimized region
|
- the instructions from the optimized region
|
||||||
- a jump back to the original execution path.
|
- a jump back to the original execution path.
|
||||||
|
|
||||||
1.4.4 Pre-optimization
|
Pre-optimization
|
||||||
|
^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
After preparing the detour buffer, Kprobes verifies that none of the
|
After preparing the detour buffer, Kprobes verifies that none of the
|
||||||
following situations exist:
|
following situations exist:
|
||||||
|
|
||||||
- The probe has either a break_handler (i.e., it's a jprobe) or a
|
- The probe has either a break_handler (i.e., it's a jprobe) or a
|
||||||
post_handler.
|
post_handler.
|
||||||
- Other instructions in the optimized region are probed.
|
- Other instructions in the optimized region are probed.
|
||||||
- The probe is disabled.
|
- The probe is disabled.
|
||||||
|
|
||||||
In any of the above cases, Kprobes won't start optimizing the probe.
|
In any of the above cases, Kprobes won't start optimizing the probe.
|
||||||
Since these are temporary situations, Kprobes tries to start
|
Since these are temporary situations, Kprobes tries to start
|
||||||
optimizing it again if the situation is changed.
|
optimizing it again if the situation is changed.
|
||||||
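The three pre-optimization conditions listed above amount to a simple predicate. The following is an illustrative userspace sketch only: struct toy_probe, its fields, and toy_can_start_optimizing() are hypothetical stand-ins, and the real check in the kernel operates on struct kprobe and its own bookkeeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a probe for the purpose of this check. */
struct toy_probe {
	void (*break_handler)(void); /* non-NULL for a jprobe */
	void (*post_handler)(void);
	bool disabled;
	int other_probes_in_region;  /* probes on other instructions in the region */
};

/* Returns true only if none of the blocking situations exists. */
static bool toy_can_start_optimizing(const struct toy_probe *p)
{
	assert(p != NULL);
	if (p->break_handler || p->post_handler)
		return false;
	if (p->other_probes_in_region > 0)
		return false;
	if (p->disabled)
		return false;
	return true;
}

/* An empty handler is enough to block optimization in this model. */
static void toy_empty_handler(void)
{
}
```

Since each condition can change over the probe's lifetime, a predicate of this kind would be evaluated again whenever a handler is set or cleared, a neighboring probe is added or removed, or the probe is enabled or disabled.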
@@ -240,21 +261,23 @@ Kprobes returns control to the original instruction path by setting
|
|||||||
the CPU's instruction pointer to the copied code in the detour buffer
|
the CPU's instruction pointer to the copied code in the detour buffer
|
||||||
-- thus at least avoiding the single-step.
|
-- thus at least avoiding the single-step.
|
||||||
|
|
||||||
1.4.5 Optimization
|
Optimization
|
||||||
|
^^^^^^^^^^^^
|
||||||
|
|
||||||
The Kprobe-optimizer doesn't insert the jump instruction immediately;
|
The Kprobe-optimizer doesn't insert the jump instruction immediately;
|
||||||
rather, it calls synchronize_sched() for safety first, because it's
|
rather, it calls synchronize_sched() for safety first, because it's
|
||||||
possible for a CPU to be interrupted in the middle of executing the
|
possible for a CPU to be interrupted in the middle of executing the
|
||||||
optimized region(*). As you know, synchronize_sched() can ensure
|
optimized region [3]_. As you know, synchronize_sched() can ensure
|
||||||
that all interruptions that were active when synchronize_sched()
|
that all interruptions that were active when synchronize_sched()
|
||||||
was called are done, but only if CONFIG_PREEMPT=n. So, this version
|
was called are done, but only if CONFIG_PREEMPT=n. So, this version
|
||||||
of kprobe optimization supports only kernels with CONFIG_PREEMPT=n.(**)
|
of kprobe optimization supports only kernels with CONFIG_PREEMPT=n [4]_.
|
||||||
|
|
||||||
After that, the Kprobe-optimizer calls stop_machine() to replace
|
After that, the Kprobe-optimizer calls stop_machine() to replace
|
||||||
the optimized region with a jump instruction to the detour buffer,
|
the optimized region with a jump instruction to the detour buffer,
|
||||||
using text_poke_smp().
|
using text_poke_smp().
|
||||||
|
|
||||||
1.4.6 Unoptimization
|
Unoptimization
|
||||||
|
^^^^^^^^^^^^^^
|
||||||
|
|
||||||
When an optimized kprobe is unregistered, disabled, or blocked by
|
When an optimized kprobe is unregistered, disabled, or blocked by
|
||||||
another kprobe, it will be unoptimized. If this happens before
|
another kprobe, it will be unoptimized. If this happens before
|
||||||
@@ -263,13 +286,13 @@ optimized list. If the optimization has been done, the jump is
|
|||||||
replaced with the original code (except for an int3 breakpoint in
|
replaced with the original code (except for an int3 breakpoint in
|
||||||
the first byte) by using text_poke_smp().
|
the first byte) by using text_poke_smp().
|
||||||
|
|
||||||
(*)Please imagine that the 2nd instruction is interrupted and then
|
.. [3] Please imagine that the 2nd instruction is interrupted and then
|
||||||
the optimizer replaces the 2nd instruction with the jump *address*
|
the optimizer replaces the 2nd instruction with the jump *address*
|
||||||
while the interrupt handler is running. When the interrupt
|
while the interrupt handler is running. When the interrupt
|
||||||
returns to original address, there is no valid instruction,
|
returns to original address, there is no valid instruction,
|
||||||
and it causes an unexpected result.
|
and it causes an unexpected result.
|
||||||
|
|
||||||
(**)This optimization-safety checking may be replaced with the
|
.. [4] This optimization-safety checking may be replaced with the
|
||||||
stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
|
stop-machine method that ksplice uses for supporting a CONFIG_PREEMPT=y
|
||||||
kernel.
|
kernel.
|
||||||
|
|
||||||
@@ -280,11 +303,17 @@ path by changing regs->ip and returning 1. However, when the probe
|
|||||||
is optimized, that modification is ignored. Thus, if you want to
|
is optimized, that modification is ignored. Thus, if you want to
|
||||||
tweak the kernel's execution path, you need to suppress optimization,
|
tweak the kernel's execution path, you need to suppress optimization,
|
||||||
using one of the following techniques:
|
using one of the following techniques:
|
||||||
|
|
||||||
- Specify an empty function for the kprobe's post_handler or break_handler.
|
- Specify an empty function for the kprobe's post_handler or break_handler.
|
||||||
|
|
||||||
or
|
or
|
||||||
|
|
||||||
- Execute 'sysctl -w debug.kprobes_optimization=n'
|
- Execute 'sysctl -w debug.kprobes_optimization=n'
|
||||||
|
|
||||||
1.5 Blacklist
|
.. _kprobes_blacklist:
|
||||||
|
|
||||||
|
Blacklist
|
||||||
|
---------
|
||||||
|
|
||||||
Kprobes can probe most of the kernel except itself. This means
|
Kprobes can probe most of the kernel except itself. This means
|
||||||
that there are some functions where kprobes cannot probe. Probing
|
that there are some functions where kprobes cannot probe. Probing
|
||||||
@@ -297,7 +326,10 @@ to specify a blacklisted function.
|
|||||||
Kprobes checks the given probe address against the blacklist and
|
Kprobes checks the given probe address against the blacklist and
|
||||||
rejects registering it, if the given address is in the blacklist.
|
rejects registering it, if the given address is in the blacklist.
|
||||||
|
|
||||||
2. Architectures Supported
|
.. _kprobes_archs_supported:
|
||||||
|
|
||||||
|
Architectures Supported
|
||||||
|
=======================
|
||||||
|
|
||||||
Kprobes, jprobes, and return probes are implemented on the following
|
Kprobes, jprobes, and return probes are implemented on the following
|
||||||
architectures:
|
architectures:
|
||||||
@@ -312,7 +344,8 @@ architectures:
|
|||||||
- mips
|
- mips
|
||||||
- s390
|
- s390
|
||||||
|
|
||||||
3. Configuring Kprobes
|
Configuring Kprobes
|
||||||
|
===================
|
||||||
|
|
||||||
When configuring the kernel using make menuconfig/xconfig/oldconfig,
|
When configuring the kernel using make menuconfig/xconfig/oldconfig,
|
||||||
ensure that CONFIG_KPROBES is set to "y". Under "General setup", look
|
ensure that CONFIG_KPROBES is set to "y". Under "General setup", look
|
||||||
@@ -331,7 +364,8 @@ it useful to "Compile the kernel with debug info" (CONFIG_DEBUG_INFO),
|
|||||||
so you can use "objdump -d -l vmlinux" to see the source-to-object
|
so you can use "objdump -d -l vmlinux" to see the source-to-object
|
||||||
code mapping.
|
code mapping.
|
||||||
|
|
||||||
4. API Reference
|
API Reference
|
||||||
|
=============
|
||||||
|
|
||||||
The Kprobes API includes a "register" function and an "unregister"
|
The Kprobes API includes a "register" function and an "unregister"
|
||||||
function for each type of probe. The API also includes "register_*probes"
|
function for each type of probe. The API also includes "register_*probes"
|
||||||
@@ -340,7 +374,10 @@ Here are terse, mini-man-page specifications for these functions and
|
|||||||
the associated probe handlers that you'll write. See the files in the
|
the associated probe handlers that you'll write. See the files in the
|
||||||
samples/kprobes/ sub-directory for examples.
|
samples/kprobes/ sub-directory for examples.
|
||||||
|
|
||||||
4.1 register_kprobe
|
register_kprobe
|
||||||
|
---------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
int register_kprobe(struct kprobe *kp);
|
int register_kprobe(struct kprobe *kp);
|
||||||
@@ -354,10 +391,11 @@ kp->fault_handler. Any or all handlers can be NULL. If kp->flags
|
|||||||
is set KPROBE_FLAG_DISABLED, that kp will be registered but disabled,
|
is set KPROBE_FLAG_DISABLED, that kp will be registered but disabled,
|
||||||
so its handlers aren't hit until enable_kprobe(kp) is called.
|
so its handlers aren't hit until enable_kprobe(kp) is called.
|
||||||
|
|
||||||
NOTE:
|
.. note::
|
||||||
|
|
||||||
1. With the introduction of the "symbol_name" field to struct kprobe,
|
1. With the introduction of the "symbol_name" field to struct kprobe,
|
||||||
the probepoint address resolution will now be taken care of by the kernel.
|
the probepoint address resolution will now be taken care of by the kernel.
|
||||||
The following will now work:
|
The following will now work::
|
||||||
|
|
||||||
kp.symbol_name = "symbol_name";
|
kp.symbol_name = "symbol_name";
|
||||||
|
|
||||||
@@ -377,7 +415,8 @@ Use "offset" with caution.
|
|||||||
|
|
||||||
register_kprobe() returns 0 on success, or a negative errno otherwise.
|
register_kprobe() returns 0 on success, or a negative errno otherwise.
|
||||||
|
|
||||||
User's pre-handler (kp->pre_handler):
|
User's pre-handler (kp->pre_handler)::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
#include <linux/ptrace.h>
|
#include <linux/ptrace.h>
|
||||||
int pre_handler(struct kprobe *p, struct pt_regs *regs);
|
int pre_handler(struct kprobe *p, struct pt_regs *regs);
|
||||||
@@ -386,7 +425,8 @@ Called with p pointing to the kprobe associated with the breakpoint,
|
|||||||
and regs pointing to the struct containing the registers saved when
|
and regs pointing to the struct containing the registers saved when
|
||||||
the breakpoint was hit. Return 0 here unless you're a Kprobes geek.
|
the breakpoint was hit. Return 0 here unless you're a Kprobes geek.
|
||||||
|
|
||||||
User's post-handler (kp->post_handler):
|
User's post-handler (kp->post_handler)::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
#include <linux/ptrace.h>
|
#include <linux/ptrace.h>
|
||||||
void post_handler(struct kprobe *p, struct pt_regs *regs,
|
void post_handler(struct kprobe *p, struct pt_regs *regs,
|
||||||
@@ -395,7 +435,8 @@ void post_handler(struct kprobe *p, struct pt_regs *regs,
|
|||||||
p and regs are as described for the pre_handler. flags always seems
|
p and regs are as described for the pre_handler. flags always seems
|
||||||
to be zero.
|
to be zero.
|
||||||
|
|
||||||
User's fault-handler (kp->fault_handler):
|
User's fault-handler (kp->fault_handler)::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
#include <linux/ptrace.h>
|
#include <linux/ptrace.h>
|
||||||
int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr);
|
int fault_handler(struct kprobe *p, struct pt_regs *regs, int trapnr);
|
||||||
@@ -405,7 +446,10 @@ architecture-specific trap number associated with the fault (e.g.,
|
|||||||
on i386, 13 for a general protection fault or 14 for a page fault).
|
on i386, 13 for a general protection fault or 14 for a page fault).
|
||||||
Returns 1 if it successfully handled the exception.
|
Returns 1 if it successfully handled the exception.
|
||||||
|
|
||||||
4.2 register_jprobe
|
register_jprobe
|
||||||
|
---------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
int register_jprobe(struct jprobe *jp)
|
int register_jprobe(struct jprobe *jp)
|
||||||
@@ -423,7 +467,10 @@ declaration must match.
|
|||||||
|
|
||||||
register_jprobe() returns 0 on success, or a negative errno otherwise.
|
register_jprobe() returns 0 on success, or a negative errno otherwise.
|
||||||
|
|
||||||
4.3 register_kretprobe
|
register_kretprobe
|
||||||
|
------------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
int register_kretprobe(struct kretprobe *rp);
|
int register_kretprobe(struct kretprobe *rp);
|
||||||
@@ -436,14 +483,17 @@ register_kretprobe(); see "How Does a Return Probe Work?" for details.
|
|||||||
register_kretprobe() returns 0 on success, or a negative errno
|
register_kretprobe() returns 0 on success, or a negative errno
|
||||||
otherwise.
|
otherwise.
|
||||||
|
|
||||||
User's return-probe handler (rp->handler):
|
User's return-probe handler (rp->handler)::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
#include <linux/ptrace.h>
|
#include <linux/ptrace.h>
|
||||||
int kretprobe_handler(struct kretprobe_instance *ri, struct pt_regs *regs);
|
int kretprobe_handler(struct kretprobe_instance *ri,
|
||||||
|
struct pt_regs *regs);
|
||||||
|
|
||||||
regs is as described for kprobe.pre_handler. ri points to the
|
regs is as described for kprobe.pre_handler. ri points to the
|
||||||
kretprobe_instance object, of which the following fields may be
|
kretprobe_instance object, of which the following fields may be
|
||||||
of interest:
|
of interest:
|
||||||
|
|
||||||
- ret_addr: the return address
|
- ret_addr: the return address
|
||||||
- rp: points to the corresponding kretprobe object
|
- rp: points to the corresponding kretprobe object
|
||||||
- task: points to the corresponding task struct
|
- task: points to the corresponding task struct
|
||||||
@@ -456,7 +506,10 @@ the architecture's ABI.
|
|||||||
|
|
||||||
The handler's return value is currently ignored.
|
The handler's return value is currently ignored.
|
||||||
|
|
||||||
4.4 unregister_*probe
|
unregister_*probe
|
||||||
|
------------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
void unregister_kprobe(struct kprobe *kp);
|
void unregister_kprobe(struct kprobe *kp);
|
||||||
@@ -466,11 +519,15 @@ void unregister_kretprobe(struct kretprobe *rp);
|
|||||||
Removes the specified probe. The unregister function can be called
|
Removes the specified probe. The unregister function can be called
|
||||||
at any time after the probe has been registered.
|
at any time after the probe has been registered.
|
||||||
|
|
||||||
NOTE:
|
.. note::
|
||||||
|
|
||||||
If the functions find an incorrect probe (e.g. an unregistered probe),
|
If the functions find an incorrect probe (e.g. an unregistered probe),
|
||||||
they clear the addr field of the probe.
|
they clear the addr field of the probe.
|
||||||
|
|
||||||
4.5 register_*probes
|
register_*probes
|
||||||
|
----------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
int register_kprobes(struct kprobe **kps, int num);
|
int register_kprobes(struct kprobe **kps, int num);
|
||||||
@@ -481,14 +538,19 @@ Registers each of the num probes in the specified array. If any
|
|||||||
error occurs during registration, all probes in the array, up to
|
error occurs during registration, all probes in the array, up to
|
||||||
the bad probe, are safely unregistered before the register_*probes
|
the bad probe, are safely unregistered before the register_*probes
|
||||||
function returns.
|
function returns.
|
||||||
- kps/rps/jps: an array of pointers to *probe data structures
|
|
||||||
|
- kps/rps/jps: an array of pointers to ``*probe`` data structures
|
||||||
- num: the number of the array entries.
|
- num: the number of the array entries.
|
||||||
|
|
||||||
NOTE:
|
.. note::
|
||||||
|
|
||||||
You have to allocate (or define) an array of pointers and set all
|
You have to allocate (or define) an array of pointers and set all
|
||||||
of the array entries before using these functions.
|
of the array entries before using these functions.
|
||||||
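The all-or-nothing behaviour described for register_*probes can be modelled in plain user-space C. This is only a sketch: ``struct fake_probe``, ``fake_register()``, ``fake_unregister()`` and ``fake_register_many()`` are hypothetical stand-ins invented for illustration, not the kernel's types or functions, and the error value merely mimics -EINVAL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a probe; NOT the kernel's struct kprobe. */
struct fake_probe {
	int valid;       /* 0 makes fake_register() fail */
	int registered;  /* nonzero while the probe is "armed" */
};

/* Hypothetical single-probe helpers, modelled on register/unregister. */
int fake_register(struct fake_probe *p)
{
	if (!p || !p->valid)
		return -22;  /* mimics -EINVAL */
	p->registered = 1;
	return 0;
}

void fake_unregister(struct fake_probe *p)
{
	p->registered = 0;
}

/* Register num probes; on failure, undo the ones registered so far. */
int fake_register_many(struct fake_probe **ps, int num)
{
	int i, ret;

	if (num <= 0)
		return -22;
	for (i = 0; i < num; i++) {
		ret = fake_register(ps[i]);
		if (ret < 0) {
			/* Roll back everything before the bad probe. */
			while (--i >= 0)
				fake_unregister(ps[i]);
			return ret;
		}
	}
	return 0;
}

/* Usage: the caller fills an array of pointers before the call. */
int demo_rollback(void)
{
	struct fake_probe a = { 1, 0 }, bad = { 0, 0 }, c = { 1, 0 };
	struct fake_probe *arr[] = { &a, &bad, &c };

	assert(fake_register_many(arr, 3) < 0);
	assert(!a.registered && !c.registered);  /* nothing left armed */
	return 0;
}
```

The point of the model is that the caller never has to clean up a partially registered array: either every entry is armed or none is.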
|
|
||||||
4.6 unregister_*probes
|
unregister_*probes
|
||||||
|
------------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
void unregister_kprobes(struct kprobe **kps, int num);
|
void unregister_kprobes(struct kprobe **kps, int num);
|
||||||
@@ -497,33 +559,41 @@ void unregister_jprobes(struct jprobe **jps, int num);
|
|||||||
|
|
||||||
Removes each of the num probes in the specified array at once.
|
Removes each of the num probes in the specified array at once.
|
||||||
|
|
||||||
NOTE:
|
.. note::
|
||||||
|
|
||||||
If the functions find some incorrect probes (e.g. unregistered
|
If the functions find some incorrect probes (e.g. unregistered
|
||||||
probes) in the specified array, they clear the addr field of those
|
probes) in the specified array, they clear the addr field of those
|
||||||
incorrect probes. However, other probes in the array are
|
incorrect probes. However, other probes in the array are
|
||||||
unregistered correctly.
|
unregistered correctly.
|
||||||
|
|
||||||
4.7 disable_*probe
|
disable_*probe
|
||||||
|
--------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
int disable_kprobe(struct kprobe *kp);
|
int disable_kprobe(struct kprobe *kp);
|
||||||
int disable_kretprobe(struct kretprobe *rp);
|
int disable_kretprobe(struct kretprobe *rp);
|
||||||
int disable_jprobe(struct jprobe *jp);
|
int disable_jprobe(struct jprobe *jp);
|
||||||
|
|
||||||
Temporarily disables the specified *probe. You can enable it again by using
|
Temporarily disables the specified ``*probe``. You can enable it again by using
|
||||||
enable_*probe(). You must specify the probe which has been registered.
|
enable_*probe(). You must specify the probe which has been registered.
|
||||||
|
|
||||||
4.8 enable_*probe
|
enable_*probe
|
||||||
|
-------------
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/kprobes.h>
|
#include <linux/kprobes.h>
|
||||||
int enable_kprobe(struct kprobe *kp);
|
int enable_kprobe(struct kprobe *kp);
|
||||||
int enable_kretprobe(struct kretprobe *rp);
|
int enable_kretprobe(struct kretprobe *rp);
|
||||||
int enable_jprobe(struct jprobe *jp);
|
int enable_jprobe(struct jprobe *jp);
|
||||||
|
|
||||||
Enables *probe which has been disabled by disable_*probe(). You must specify
|
Enables ``*probe`` which has been disabled by disable_*probe(). You must specify
|
||||||
the probe which has been registered.
|
the probe which has been registered.
|
||||||
|
|
||||||
5. Kprobes Features and Limitations
|
Kprobes Features and Limitations
|
||||||
|
================================
|
||||||
|
|
||||||
Kprobes allows multiple probes at the same address. Currently,
|
Kprobes allows multiple probes at the same address. Currently,
|
||||||
however, there cannot be multiple jprobes on the same function at
|
however, there cannot be multiple jprobes on the same function at
|
||||||
@@ -538,7 +608,7 @@ are discussed in this section.
|
|||||||
|
|
||||||
The register_*probe functions will return -EINVAL if you attempt
|
The register_*probe functions will return -EINVAL if you attempt
|
||||||
to install a probe in the code that implements Kprobes (mostly
|
to install a probe in the code that implements Kprobes (mostly
|
||||||
kernel/kprobes.c and arch/*/kernel/kprobes.c, but also functions such
|
kernel/kprobes.c and ``arch/*/kernel/kprobes.c``, but also functions such
|
||||||
as do_page_fault and notifier_call_chain).
|
as do_page_fault and notifier_call_chain).
|
||||||
|
|
||||||
If you install a probe in an inline-able function, Kprobes makes
|
If you install a probe in an inline-able function, Kprobes makes
|
||||||
@@ -602,6 +672,8 @@ explain it, we introduce some terminology. Imagine a 3-instruction
|
|||||||
sequence consisting of two 2-byte instructions and one 3-byte
|
sequence consisting of two 2-byte instructions and one 3-byte
|
||||||
instruction.
|
instruction.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
IA
|
IA
|
||||||
|
|
|
|
||||||
[-2][-1][0][1][2][3][4][5][6][7]
|
[-2][-1][0][1][2][3][4][5][6][7]
|
||||||
@@ -628,7 +700,8 @@ d) DCR must not straddle the border between functions.
|
|||||||
Anyway, these limitations are checked by the in-kernel instruction
|
Anyway, these limitations are checked by the in-kernel instruction
|
||||||
decoder, so you don't need to worry about that.
|
decoder, so you don't need to worry about that.
|
||||||
|
|
||||||
6. Probe Overhead
|
Probe Overhead
|
||||||
|
==============
|
||||||
|
|
||||||
On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
|
On a typical CPU in use in 2005, a kprobe hit takes 0.5 to 1.0
|
||||||
microseconds to process. Specifically, a benchmark that hits the same
|
microseconds to process. Specifically, a benchmark that hits the same
|
||||||
@@ -638,9 +711,10 @@ return-probe hit typically takes 50-75% longer than a kprobe hit.
|
|||||||
When you have a return probe set on a function, adding a kprobe at
|
When you have a return probe set on a function, adding a kprobe at
|
||||||
the entry to that function adds essentially no overhead.
|
the entry to that function adds essentially no overhead.
|
||||||
|
|
||||||
Here are sample overhead figures (in usec) for different architectures.
|
Here are sample overhead figures (in usec) for different architectures::
|
||||||
|
|
||||||
k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe
|
k = kprobe; j = jprobe; r = return probe; kr = kprobe + return probe
|
||||||
on same function; jr = jprobe + return probe on same function
|
on same function; jr = jprobe + return probe on same function::
|
||||||
|
|
||||||
i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
|
i386: Intel Pentium M, 1495 MHz, 2957.31 bogomips
|
||||||
k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40
|
k = 0.57 usec; j = 1.00; r = 0.92; kr = 0.99; jr = 1.40
|
||||||
@@ -651,10 +725,12 @@ k = 0.49 usec; j = 0.76; r = 0.80; kr = 0.82; jr = 1.07
|
|||||||
ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
|
ppc64: POWER5 (gr), 1656 MHz (SMT disabled, 1 virtual CPU per physical CPU)
|
||||||
k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
|
k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
|
||||||
|
|
||||||
6.1 Optimized Probe Overhead
|
Optimized Probe Overhead
|
||||||
|
------------------------
|
||||||
|
|
||||||
Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
|
Typically, an optimized kprobe hit takes 0.07 to 0.1 microseconds to
|
||||||
process. Here are sample overhead figures (in usec) for x86 architectures.
|
process. Here are sample overhead figures (in usec) for x86 architectures::
|
||||||
|
|
||||||
k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
|
k = unoptimized kprobe, b = boosted (single-step skipped), o = optimized kprobe,
|
||||||
r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.
|
r = unoptimized kretprobe, rb = boosted kretprobe, ro = optimized kretprobe.
|
||||||
|
|
||||||
@@ -664,7 +740,8 @@ k = 0.80 usec; b = 0.33; o = 0.05; r = 1.10; rb = 0.61; ro = 0.33
|
|||||||
x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
|
x86-64: Intel(R) Xeon(R) E5410, 2.33GHz, 4656.90 bogomips
|
||||||
k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30
|
k = 0.99 usec; b = 0.43; o = 0.06; r = 1.24; rb = 0.68; ro = 0.30
|
||||||
|
|
||||||
7. TODO
|
TODO
|
||||||
|
====
|
||||||
|
|
||||||
a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
|
a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
|
||||||
programming interface for probe-based instrumentation. Try it out.
|
programming interface for probe-based instrumentation. Try it out.
|
||||||
@@ -673,31 +750,37 @@ c. Support for other architectures.
|
|||||||
d. User-space probes.
|
d. User-space probes.
|
||||||
e. Watchpoint probes (which fire on data references).
|
e. Watchpoint probes (which fire on data references).
|
||||||
|
|
||||||
8. Kprobes Example
|
Kprobes Example
|
||||||
|
===============
|
||||||
|
|
||||||
See samples/kprobes/kprobe_example.c
|
See samples/kprobes/kprobe_example.c
|
||||||
|
|
||||||
9. Jprobes Example
|
Jprobes Example
|
||||||
|
===============
|
||||||
|
|
||||||
See samples/kprobes/jprobe_example.c
|
See samples/kprobes/jprobe_example.c
|
||||||
|
|
||||||
10. Kretprobes Example
|
Kretprobes Example
|
||||||
|
==================
|
||||||
|
|
||||||
See samples/kprobes/kretprobe_example.c
|
See samples/kprobes/kretprobe_example.c
|
||||||
|
|
||||||
For additional information on Kprobes, refer to the following URLs:
|
For additional information on Kprobes, refer to the following URLs:
|
||||||
http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe
|
|
||||||
http://www.redhat.com/magazine/005mar05/features/kprobes/
|
- http://www-106.ibm.com/developerworks/library/l-kprobes.html?ca=dgr-lnxw42Kprobe
|
||||||
http://www-users.cs.umn.edu/~boutcher/kprobes/
|
- http://www.redhat.com/magazine/005mar05/features/kprobes/
|
||||||
http://www.linuxsymposium.org/2006/linuxsymposium_procv2.pdf (pages 101-115)
|
- http://www-users.cs.umn.edu/~boutcher/kprobes/
|
||||||
|
- http://www.linuxsymposium.org/2006/linuxsymposium_procv2.pdf (pages 101-115)
|
||||||
|
|
||||||
|
|
||||||
Appendix A: The kprobes debugfs interface
|
The kprobes debugfs interface
|
||||||
|
=============================
|
||||||
|
|
||||||
|
|
||||||
With recent kernels (> 2.6.20) the list of registered kprobes is visible
|
With recent kernels (> 2.6.20) the list of registered kprobes is visible
|
||||||
under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at /sys/kernel/debug).
|
under the /sys/kernel/debug/kprobes/ directory (assuming debugfs is mounted at /sys/kernel/debug).
|
||||||
|
|
||||||
/sys/kernel/debug/kprobes/list: Lists all registered probes on the system
|
/sys/kernel/debug/kprobes/list: Lists all registered probes on the system::
|
||||||
|
|
||||||
c015d71a k vfs_read+0x0
|
c015d71a k vfs_read+0x0
|
||||||
c011a316 j do_fork+0x0
|
c011a316 j do_fork+0x0
|
||||||
@@ -725,17 +808,19 @@ change each probe's disabling state. This means that disabled kprobes (marked
|
|||||||
[DISABLED]) will not be enabled if you turn ON all kprobes with this knob.
|
[DISABLED]) will not be enabled if you turn ON all kprobes with this knob.
|
||||||
|
|
||||||
|
|
||||||
Appendix B: The kprobes sysctl interface
|
The kprobes sysctl interface
|
||||||
|
============================
|
||||||
|
|
||||||
/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.
|
/proc/sys/debug/kprobes-optimization: Turn kprobes optimization ON/OFF.
|
||||||
|
|
||||||
When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
|
When CONFIG_OPTPROBES=y, this sysctl interface appears and it provides
|
||||||
a knob to globally and forcibly turn jump optimization (see section
|
a knob to globally and forcibly turn jump optimization (see section
|
||||||
1.4) ON or OFF. By default, jump optimization is allowed (ON).
|
:ref:`kprobes_jump_optimization`) ON or OFF. By default, jump optimization
|
||||||
If you echo "0" to this file or set "debug.kprobes-optimization" to
|
is allowed (ON). If you echo "0" to this file or set
|
||||||
0 via sysctl, all optimized probes will be unoptimized, and any new
|
"debug.kprobes-optimization" to 0 via sysctl, all optimized probes will be
|
||||||
probes registered after that will not be optimized. Note that this
|
unoptimized, and any new probes registered after that will not be optimized.
|
||||||
knob *changes* the optimized state. This means that optimized probes
|
|
||||||
(marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be
|
Note that this knob *changes* the optimized state. This means that optimized
|
||||||
|
probes (marked [OPTIMIZED]) will be unoptimized ([OPTIMIZED] tag will be
|
||||||
removed). If the knob is turned on, they will be optimized again.
|
removed). If the knob is turned on, they will be optimized again.
|
||||||
|
|
||||||
|
|||||||
@@ -1,10 +1,25 @@
|
|||||||
|
===================================================
|
||||||
|
Adding reference counters (krefs) to kernel objects
|
||||||
|
===================================================
|
||||||
|
|
||||||
|
:Author: Corey Minyard <minyard@acm.org>
|
||||||
|
:Author: Thomas Hellstrom <thellstrom@vmware.com>
|
||||||
|
|
||||||
|
A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
|
||||||
|
presentation on krefs, which can be found at:
|
||||||
|
|
||||||
|
- http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
|
||||||
|
- http://www.kroah.com/linux/talks/ols_2004_kref_talk/
|
||||||
|
|
||||||
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
krefs allow you to add reference counters to your objects. If you
|
krefs allow you to add reference counters to your objects. If you
|
||||||
have objects that are used in multiple places and passed around, and
|
have objects that are used in multiple places and passed around, and
|
||||||
you don't have refcounts, your code is almost certainly broken. If
|
you don't have refcounts, your code is almost certainly broken. If
|
||||||
you want refcounts, krefs are the way to go.
|
you want refcounts, krefs are the way to go.
|
||||||
|
|
||||||
To use a kref, add one to your data structures like:
|
To use a kref, add one to your data structures like::
|
||||||
|
|
||||||
struct my_data
|
struct my_data
|
||||||
{
|
{
|
||||||
@@ -17,8 +32,11 @@ struct my_data
|
|||||||
|
|
||||||
The kref can occur anywhere within the data structure.
|
The kref can occur anywhere within the data structure.
|
||||||
|
|
||||||
|
Initialization
|
||||||
|
==============
|
||||||
|
|
||||||
You must initialize the kref after you allocate it. To do this, call
|
You must initialize the kref after you allocate it. To do this, call
|
||||||
kref_init like so:
|
kref_init like so::
|
||||||
|
|
||||||
struct my_data *data;
|
struct my_data *data;
|
||||||
|
|
||||||
@@ -29,18 +47,25 @@ kref_init as so:
|
|||||||
|
|
||||||
This sets the refcount in the kref to 1.
|
This sets the refcount in the kref to 1.
|
||||||
|
|
||||||
|
Kref rules
|
||||||
|
==========
|
||||||
|
|
||||||
Once you have an initialized kref, you must follow the following
|
Once you have an initialized kref, you must follow the following
|
||||||
rules:
|
rules:
|
||||||
|
|
||||||
1) If you make a non-temporary copy of a pointer, especially if
|
1) If you make a non-temporary copy of a pointer, especially if
|
||||||
it can be passed to another thread of execution, you must
|
it can be passed to another thread of execution, you must
|
||||||
increment the refcount with kref_get() before passing it off:
|
increment the refcount with kref_get() before passing it off::
|
||||||
|
|
||||||
kref_get(&data->refcount);
|
kref_get(&data->refcount);
|
||||||
|
|
||||||
If you already have a valid pointer to a kref-ed structure (the
|
If you already have a valid pointer to a kref-ed structure (the
|
||||||
refcount cannot go to zero) you may do this without a lock.
|
refcount cannot go to zero) you may do this without a lock.
|
||||||
|
|
||||||
2) When you are done with a pointer, you must call kref_put():
|
2) When you are done with a pointer, you must call kref_put()::
|
||||||
|
|
||||||
kref_put(&data->refcount, data_release);
|
kref_put(&data->refcount, data_release);
|
||||||
|
|
||||||
If this is the last reference to the pointer, the release
|
If this is the last reference to the pointer, the release
|
||||||
routine will be called. If the code never tries to get
|
routine will be called. If the code never tries to get
|
||||||
a valid pointer to a kref-ed structure without already
|
a valid pointer to a kref-ed structure without already
|
||||||
@@ -53,7 +78,7 @@ rules:
|
|||||||
structure must remain valid during the kref_get().
|
structure must remain valid during the kref_get().
|
||||||
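The get/put rules above can be tried out with a small user-space model. This is a sketch under loud assumptions: ``struct model_kref`` and the ``model_kref_*()`` helpers are hypothetical names built on C11 atomics and are not the kernel's kref implementation; only the counting discipline (init to 1, get before hand-off, release on the last put) is taken from the text.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space stand-in for struct kref; the kernel type differs. */
struct model_kref {
	atomic_int refcount;
};

void model_kref_init(struct model_kref *k)
{
	atomic_init(&k->refcount, 1);
}

void model_kref_get(struct model_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Returns 1 if this put dropped the last reference and ran release. */
int model_kref_put(struct model_kref *k, void (*release)(struct model_kref *))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

struct my_data {
	int payload;
	struct model_kref refcount;
};

int released;  /* counts release calls, for the demo only */

void data_release(struct model_kref *ref)
{
	/* Recover the containing object, as container_of() would. */
	struct my_data *data = (struct my_data *)
		((char *)ref - offsetof(struct my_data, refcount));

	free(data);
	released++;
}

int demo_handoff(void)
{
	struct my_data *data = malloc(sizeof(*data));
	int first, last;

	if (!data)
		return -1;
	model_kref_init(&data->refcount);  /* count = 1 (creator) */
	model_kref_get(&data->refcount);   /* count = 2, taken BEFORE hand-off */
	/* ... a second user would work here and finish with its own put ... */
	first = model_kref_put(&data->refcount, data_release);  /* 2 -> 1 */
	last = model_kref_put(&data->refcount, data_release);   /* 1 -> 0, frees */
	return first * 10 + last;
}
```

Because the extra reference is taken before the object is handed off, neither side can see the count reach zero while the other still holds a pointer.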
|
|
||||||
For example, if you allocate some data and then pass it to another
|
For example, if you allocate some data and then pass it to another
|
||||||
thread to process:
|
thread to process::
|
||||||
|
|
||||||
void data_release(struct kref *ref)
|
void data_release(struct kref *ref)
|
||||||
{
|
{
|
||||||
@@ -104,7 +129,7 @@ put needs no lock because nothing tries to get the data without
|
|||||||
already holding a pointer.
|
already holding a pointer.
|
||||||
|
|
||||||
Note that the "before" in rule 1 is very important. You should never
|
Note that the "before" in rule 1 is very important. You should never
|
||||||
do something like:
|
do something like::
|
||||||
|
|
||||||
task = kthread_run(more_data_handling, data, "more_data_handling");
|
task = kthread_run(more_data_handling, data, "more_data_handling");
|
||||||
if (task == ERR_PTR(-ENOMEM)) {
|
if (task == ERR_PTR(-ENOMEM)) {
|
||||||
@@ -124,14 +149,14 @@ bad style. Don't do it.
|
|||||||
There are some situations where you can optimize the gets and puts.
|
There are some situations where you can optimize the gets and puts.
|
||||||
For instance, if you are done with an object and enqueuing it for
|
For instance, if you are done with an object and enqueuing it for
|
||||||
something else or passing it off to something else, there is no reason
|
something else or passing it off to something else, there is no reason
|
||||||
to do a get then a put:
|
to do a get then a put::
|
||||||
|
|
||||||
/* Silly extra get and put */
|
/* Silly extra get and put */
|
||||||
kref_get(&obj->ref);
|
kref_get(&obj->ref);
|
||||||
enqueue(obj);
|
enqueue(obj);
|
||||||
kref_put(&obj->ref, obj_cleanup);
|
kref_put(&obj->ref, obj_cleanup);
|
||||||
|
|
||||||
Just do the enqueue. A comment about this is always welcome:
|
Just do the enqueue. A comment about this is always welcome::
|
||||||
|
|
||||||
enqueue(obj);
|
enqueue(obj);
|
||||||
/* We are done with obj, so we pass our refcount off
|
/* We are done with obj, so we pass our refcount off
|
||||||
@@ -142,7 +167,7 @@ instance, you have a list of items that are each kref-ed, and you wish
|
|||||||
to get the first one. You can't just pull the first item off the list
|
to get the first one. You can't just pull the first item off the list
|
||||||
and kref_get() it. That violates rule 3 because you are not already
|
and kref_get() it. That violates rule 3 because you are not already
|
||||||
holding a valid pointer. You must add a mutex (or some other lock).
|
holding a valid pointer. You must add a mutex (or some other lock).
|
||||||
For instance:
|
For instance::
|
||||||
|
|
||||||
static DEFINE_MUTEX(mutex);
|
static DEFINE_MUTEX(mutex);
|
||||||
static LIST_HEAD(q);
|
static LIST_HEAD(q);
|
||||||
@@ -182,7 +207,7 @@ static void put_entry(struct my_data *entry)
|
|||||||
The kref_put() return value is useful if you do not want to hold the
|
The kref_put() return value is useful if you do not want to hold the
|
||||||
lock during the whole release operation. Say you didn't want to call
|
lock during the whole release operation. Say you didn't want to call
|
||||||
kfree() with the lock held in the example above (since it is kind of
|
kfree() with the lock held in the example above (since it is kind of
|
||||||
pointless to do so). You could use kref_put() as follows:
|
pointless to do so). You could use kref_put() as follows::
|
||||||
|
|
||||||
static void release_entry(struct kref *ref)
|
static void release_entry(struct kref *ref)
|
||||||
{
|
{
|
||||||
@@ -205,18 +230,8 @@ of the free operations that could take a long time or might claim the
|
|||||||
same lock. Note that doing everything in the release routine is still
|
same lock. Note that doing everything in the release routine is still
|
||||||
preferred as it is a little neater.
|
preferred as it is a little neater.
|
||||||
|
|
||||||
|
|
||||||
Corey Minyard <minyard@acm.org>
|
|
||||||
|
|
||||||
A lot of this was lifted from Greg Kroah-Hartman's 2004 OLS paper and
|
|
||||||
presentation on krefs, which can be found at:
|
|
||||||
http://www.kroah.com/linux/talks/ols_2004_kref_paper/Reprint-Kroah-Hartman-OLS2004.pdf
|
|
||||||
and:
|
|
||||||
http://www.kroah.com/linux/talks/ols_2004_kref_talk/
|
|
||||||
|
|
||||||
|
|
||||||
The above example could also be optimized using kref_get_unless_zero() in
|
The above example could also be optimized using kref_get_unless_zero() in
|
||||||
the following way:
|
the following way::
|
||||||
|
|
||||||
static struct my_data *get_entry()
|
static struct my_data *get_entry()
|
||||||
{
|
{
|
||||||
@@ -254,8 +269,11 @@ Note that it is illegal to use kref_get_unless_zero without checking its
|
|||||||
return value. If you are sure (by already having a valid pointer) that
|
return value. If you are sure (by already having a valid pointer) that
|
||||||
kref_get_unless_zero() will return true, then use kref_get() instead.
|
kref_get_unless_zero() will return true, then use kref_get() instead.
|
||||||
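The "get unless zero" idea can be sketched in user space as a compare-and-swap loop. The names ``struct model_ref`` and ``model_get_unless_zero()`` are hypothetical, and this C11 model only illustrates the semantics; it is not how the kernel implements kref_get_unless_zero().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical user-space counter; not the kernel's struct kref. */
struct model_ref {
	atomic_int count;
};

/*
 * Take a reference only if the count is still nonzero.
 * Returns 1 on success, 0 if the object is already being torn down.
 */
int model_get_unless_zero(struct model_ref *r)
{
	int old = atomic_load(&r->count);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return 1;
	}
	return 0;
}

/* Usage: the return value must always be checked by the caller. */
int demo_lookup(void)
{
	struct model_ref live, dying;

	atomic_init(&live.count, 1);
	atomic_init(&dying.count, 0);
	assert(model_get_unless_zero(&live));    /* 1 -> 2 */
	assert(!model_get_unless_zero(&dying));  /* stays at 0 */
	return atomic_load(&live.count);
}
```

A plain increment would resurrect an object whose count had already hit zero; the loop refuses to do that, which is why ignoring the return value defeats the purpose.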
|
|
||||||
|
Krefs and RCU
|
||||||
|
=============
|
||||||
|
|
||||||
The function kref_get_unless_zero also makes it possible to use rcu
|
The function kref_get_unless_zero also makes it possible to use rcu
|
||||||
locking for lookups in the above example:
|
locking for lookups in the above example::
|
||||||
|
|
||||||
struct my_data
|
struct my_data
|
||||||
{
|
{
|
||||||
@@ -299,6 +317,3 @@ rcu grace period after release_entry_rcu was called. That can be accomplished
|
|||||||
by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu()
|
by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu()
|
||||||
before using kfree, but note that synchronize_rcu() may sleep for a
|
before using kfree, but note that synchronize_rcu() may sleep for a
|
||||||
substantial amount of time.
|
substantial amount of time.
|
||||||
|
|
||||||
|
|
||||||
Thomas Hellstrom <thellstrom@vmware.com>
|
|
||||||
|
|||||||
@@ -1,9 +1,9 @@
|
|||||||
|
==========================================
|
||||||
LDM - Logical Disk Manager (Dynamic Disks)
|
LDM - Logical Disk Manager (Dynamic Disks)
|
||||||
------------------------------------------
|
==========================================
|
||||||
|
|
||||||
Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>.
|
:Author: Originally Written by FlatCap - Richard Russon <ldm@flatcap.org>.
|
||||||
Last Updated by Anton Altaparmakov on 30 March 2007 for Windows Vista.
|
:Last Updated: Anton Altaparmakov on 30 March 2007 for Windows Vista.
|
||||||
|
|
||||||
Overview
|
Overview
|
||||||
--------
|
--------
|
||||||
@@ -37,24 +37,36 @@ Example
|
|||||||
-------
|
-------
|
||||||
|
|
||||||
Below we have a 50MiB disk, divided into seven partitions.
|
Below we have a 50MiB disk, divided into seven partitions.
|
||||||
N.B. The missing 1MiB at the end of the disk is where the LDM database is
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
The missing 1MiB at the end of the disk is where the LDM database is
|
||||||
stored.
|
stored.
|
||||||
|
|
||||||
Device | Offset Bytes Sectors MiB | Size Bytes Sectors MiB
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
-------+----------------------------+---------------------------
|
|Device || Offset Bytes | Sectors | MiB || Size Bytes | Sectors | MiB|
|
||||||
hda | 0 0 0 | 52428800 102400 50
|
+=======++==============+=========+=====++==============+=========+====+
|
||||||
hda1 | 51380224 100352 49 | 1048576 2048 1
|
|hda || 0 | 0 | 0 || 52428800 | 102400 | 50|
|
||||||
hda2 | 16384 32 0 | 6979584 13632 6
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
hda3 | 6995968 13664 6 | 10485760 20480 10
|
|hda1 || 51380224 | 100352 | 49 || 1048576 | 2048 | 1|
|
||||||
hda4 | 17481728 34144 16 | 4194304 8192 4
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
hda5 | 21676032 42336 20 | 5242880 10240 5
|
|hda2 || 16384 | 32 | 0 || 6979584 | 13632 | 6|
|
||||||
hda6 | 26918912 52576 25 | 10485760 20480 10
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
hda7 | 37404672 73056 35 | 13959168 27264 13
|
|hda3 || 6995968 | 13664 | 6 || 10485760 | 20480 | 10|
|
||||||
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
|
|hda4 || 17481728 | 34144 | 16 || 4194304 | 8192 | 4|
|
||||||
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
|
|hda5 || 21676032 | 42336 | 20 || 5242880 | 10240 | 5|
|
||||||
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
|
|hda6 || 26918912 | 52576 | 25 || 10485760 | 20480 | 10|
|
||||||
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
|
|hda7 || 37404672 | 73056 | 35 || 13959168 | 27264 | 13|
|
||||||
|
+-------++--------------+---------+-----++--------------+---------+----+
|
||||||
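The Sectors and MiB columns in the table follow directly from the byte values. As a minimal sketch, assuming 512-byte sectors (which is consistent with the figures shown) and MiB values truncated toward zero, the conversion can be reproduced as follows. The helper name `describe` is illustrative only and is not part of the LDM driver.

```python
SECTOR_SIZE = 512        # assumption: 512-byte sectors, consistent with the table
MIB = 1024 * 1024


def describe(offset_bytes, size_bytes):
    """Return (offset_sectors, offset_mib, size_sectors, size_mib).

    MiB values are truncated, matching how the table rounds them.
    """
    return (
        offset_bytes // SECTOR_SIZE,
        offset_bytes // MIB,
        size_bytes // SECTOR_SIZE,
        size_bytes // MIB,
    )


# hda3 from the table: offset 6995968 bytes, size 10485760 bytes
print(describe(6995968, 10485760))
```

Adding a partition's offset to its size gives the offset of the next partition on disk, for example 16384 + 6979584 = 6995968 for hda2 and hda3.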
|
|
||||||
The LDM Database may not store the partitions in the order that they appear on
|
The LDM Database may not store the partitions in the order that they appear on
|
||||||
disk, but the driver will sort them.
|
disk, but the driver will sort them.
|
||||||
|
|
||||||
When Linux boots, you will see something like:
|
When Linux boots, you will see something like::
|
||||||
|
|
||||||
hda: 102400 sectors w/32KiB Cache, CHS=50/64/32
|
hda: 102400 sectors w/32KiB Cache, CHS=50/64/32
|
||||||
hda: [LDM] hda1 hda2 hda3 hda4 hda5 hda6 hda7
|
hda: [LDM] hda1 hda2 hda3 hda4 hda5 hda6 hda7
|
||||||
@@ -65,13 +77,13 @@ Compiling LDM Support
|
|||||||
|
|
||||||
To enable LDM, choose the following two options:
|
To enable LDM, choose the following two options:
|
||||||
|
|
||||||
"Advanced partition selection" CONFIG_PARTITION_ADVANCED
|
- "Advanced partition selection" CONFIG_PARTITION_ADVANCED
|
||||||
"Windows Logical Disk Manager (Dynamic Disk) support" CONFIG_LDM_PARTITION
|
- "Windows Logical Disk Manager (Dynamic Disk) support" CONFIG_LDM_PARTITION
|
||||||
|
|
||||||
If you believe the driver isn't working as it should, you can enable the extra
|
If you believe the driver isn't working as it should, you can enable the extra
|
||||||
debugging code. This will produce a LOT of output. The option is:
|
debugging code. This will produce a LOT of output. The option is:
|
||||||
|
|
||||||
"Windows LDM extra logging" CONFIG_LDM_DEBUG
|
- "Windows LDM extra logging" CONFIG_LDM_DEBUG
|
||||||
|
|
||||||
N.B. The partition code cannot be compiled as a module.
|
N.B. The partition code cannot be compiled as a module.
|
||||||
|
|
||||||
|
|||||||
@@ -30,7 +30,8 @@ timeout is set through the confusingly named "kernel.panic" sysctl),
|
|||||||
to cause the system to reboot automatically after a specified amount
|
to cause the system to reboot automatically after a specified amount
|
||||||
of time.
|
of time.
|
||||||
|
|
||||||
=== Implementation ===
|
Implementation
|
||||||
|
==============
|
||||||
|
|
||||||
The soft and hard lockup detectors are built on top of the hrtimer and
|
The soft and hard lockup detectors are built on top of the hrtimer and
|
||||||
perf subsystems, respectively. A direct consequence of this is that,
|
perf subsystems, respectively. A direct consequence of this is that,
|
||||||
|
|||||||
@@ -1,8 +1,9 @@
|
|||||||
|
===========================================================
|
||||||
LZO stream format as understood by Linux's LZO decompressor
|
LZO stream format as understood by Linux's LZO decompressor
|
||||||
===========================================================
|
===========================================================
|
||||||
|
|
||||||
Introduction
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
This is not a specification. No specification seems to be publicly available
|
This is not a specification. No specification seems to be publicly available
|
||||||
for the LZO stream format. This document describes what input format the LZO
|
for the LZO stream format. This document describes what input format the LZO
|
||||||
@@ -14,6 +15,7 @@ Introduction
|
|||||||
for future bug reports.
|
for future bug reports.
|
||||||
|
|
||||||
Description
|
Description
|
||||||
|
===========
|
||||||
|
|
||||||
The stream is composed of a series of instructions, operands, and data. The
|
The stream is composed of a series of instructions, operands, and data. The
|
||||||
instructions consist of a few bits representing an opcode, and bits forming
|
instructions consist of a few bits representing an opcode, and bits forming
|
||||||
@@ -38,7 +40,7 @@ Description
|
|||||||
of bits in the operand. If the number of bits isn't enough to represent the
|
of bits in the operand. If the number of bits isn't enough to represent the
|
||||||
length, up to 255 may be added in increments by consuming more bytes with a
|
length, up to 255 may be added in increments by consuming more bytes with a
|
||||||
rate of at most 255 per extra byte (thus the compression ratio cannot exceed
|
rate of at most 255 per extra byte (thus the compression ratio cannot exceed
|
||||||
around 255:1). The variable length encoding using #bits is always the same :
|
around 255:1). The variable length encoding using #bits is always the same::
|
||||||
|
|
||||||
length = byte & ((1 << #bits) - 1)
|
length = byte & ((1 << #bits) - 1)
|
||||||
if (!length) {
|
if (!length) {
|
||||||
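The hunk above cuts off partway through the pseudo-code, so the following is only a hedged sketch of the variable-length rule as the prose describes it: when the masked bits are zero, the length starts at the mask value, each following zero byte adds 255, and the first non-zero byte is added and ends the run. The function name `decode_length` and its arguments are hypothetical, and the instruction-specific constant that is added afterwards is left to the caller.

```python
def decode_length(data, pos, bits):
    """Decode a variable-length field whose first byte is data[pos].

    Returns (length, next_pos). Sketch only: there are no bounds checks,
    so truncated input raises IndexError.
    """
    mask = (1 << bits) - 1
    length = data[pos] & mask
    pos += 1
    if length == 0:
        length = mask
        # Each zero byte contributes 255 to the length.
        while data[pos] == 0:
            length += 255
            pos += 1
        # The first non-zero byte is added and terminates the run.
        length += data[pos]
        pos += 1
    return length, pos
```

With a 4-bit field, the byte sequence `00 00 07` decodes to 15 + 255 + 7 = 277 and consumes three bytes.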
@@ -67,15 +69,19 @@ Description
|
|||||||
instruction may encode this distance (0001HLLL), it takes one LE16 operand
|
instruction may encode this distance (0001HLLL), it takes one LE16 operand
|
||||||
for the distance, thus requiring 3 bytes.
|
for the distance, thus requiring 3 bytes.
|
||||||
|
|
||||||
IMPORTANT NOTE : in the code some length checks are missing because certain
|
.. important::
|
||||||
instructions are called under the assumption that a certain number of bytes
|
|
||||||
follow because it has already been guaranteed before parsing the instructions.
|
In the code some length checks are missing because certain instructions
|
||||||
They just have to "refill" this credit if they consume extra bytes. This is
|
are called under the assumption that a certain number of bytes follow
|
||||||
an implementation design choice independent on the algorithm or encoding.
|
because it has already been guaranteed before parsing the instructions.
|
||||||
|
They just have to "refill" this credit if they consume extra bytes. This
|
||||||
|
is an implementation design choice independent of the algorithm or
|
||||||
|
encoding.
|
||||||
|
|
||||||
Byte sequences
|
Byte sequences
|
||||||
|
==============
|
||||||
|
|
||||||
First byte encoding :
|
First byte encoding::
|
||||||
|
|
||||||
0..17 : follow regular instruction encoding, see below. It is worth
|
0..17 : follow regular instruction encoding, see below. It is worth
|
||||||
noting that codes 16 and 17 will represent a block copy from
|
noting that codes 16 and 17 will represent a block copy from
|
||||||
@@ -91,7 +97,7 @@ Byte sequences
|
|||||||
state = 4 [ don't copy extra literals ]
|
state = 4 [ don't copy extra literals ]
|
||||||
skip byte
|
skip byte
|
||||||
|
|
||||||
Instruction encoding :
|
Instruction encoding::
|
||||||
|
|
||||||
0 0 0 0 X X X X (0..15)
|
0 0 0 0 X X X X (0..15)
|
||||||
Depends on the number of literals copied by the last instruction.
|
Depends on the number of literals copied by the last instruction.
|
||||||
@@ -156,6 +162,7 @@ Byte sequences
|
|||||||
distance = (H << 3) + D + 1
|
distance = (H << 3) + D + 1
|
||||||
|
|
||||||
Authors
|
Authors
|
||||||
|
=======
|
||||||
|
|
||||||
This document was written by Willy Tarreau <w@1wt.eu> on 2014/07/19 during an
|
This document was written by Willy Tarreau <w@1wt.eu> on 2014/07/19 during an
|
||||||
analysis of the decompression code available in Linux 3.16-rc5. The code is
|
analysis of the decompression code available in Linux 3.16-rc5. The code is
|
||||||
|
|||||||
@@ -1,5 +1,8 @@
|
|||||||
|
============================
|
||||||
The Common Mailbox Framework
|
The Common Mailbox Framework
|
||||||
Jassi Brar <jaswinder.singh@linaro.org>
|
============================
|
||||||
|
|
||||||
|
:Author: Jassi Brar <jaswinder.singh@linaro.org>
|
||||||
|
|
||||||
This document aims to help developers write client and controller
|
This document aims to help developers write client and controller
|
||||||
drivers for the API. But before we start, let us note that the
|
drivers for the API. But before we start, let us note that the
|
||||||
@@ -13,12 +16,15 @@ similar copies of code written for each platform. Having said that,
|
|||||||
nothing prevents the remote f/w to also be Linux based and use the
|
nothing prevents the remote f/w to also be Linux based and use the
|
||||||
same api there. However none of that helps us locally because we only
|
same api there. However none of that helps us locally because we only
|
||||||
ever deal at client's protocol level.
|
ever deal at client's protocol level.
|
||||||
|
|
||||||
Some of the choices made during implementation are the result of this
|
Some of the choices made during implementation are the result of this
|
||||||
peculiarity of this "common" framework.
|
peculiarity of this "common" framework.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Part 1 - Controller Driver (See include/linux/mailbox_controller.h)
|
Controller Driver (See include/linux/mailbox_controller.h)
|
||||||
|
==========================================================
|
||||||
|
|
||||||
|
|
||||||
Allocate mbox_controller and the array of mbox_chan.
|
Allocate mbox_controller and the array of mbox_chan.
|
||||||
Populate mbox_chan_ops, except peek_data() all are mandatory.
|
Populate mbox_chan_ops, except peek_data() all are mandatory.
|
||||||
@@ -30,12 +36,15 @@ the controller driver should set via 'txdone_irq' or 'txdone_poll'
|
|||||||
or neither.
|
or neither.
|
||||||
|
|
||||||
|
|
||||||
Part 2 - Client Driver (See include/linux/mailbox_client.h)
|
Client Driver (See include/linux/mailbox_client.h)
|
||||||
|
==================================================
|
||||||
|
|
||||||
|
|
||||||
The client might want to operate in blocking mode (synchronously
|
The client might want to operate in blocking mode (synchronously
|
||||||
send a message through before returning) or non-blocking/async mode (submit
|
send a message through before returning) or non-blocking/async mode (submit
|
||||||
a message and a callback function to the API and return immediately).
|
a message and a callback function to the API and return immediately).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct demo_client {
|
struct demo_client {
|
||||||
struct mbox_client cl;
|
struct mbox_client cl;
|
||||||
|
|||||||
@@ -1876,8 +1876,8 @@ There are some more advanced barrier functions:
|
|||||||
This makes sure that the death mark on the object is perceived to be set
|
This makes sure that the death mark on the object is perceived to be set
|
||||||
*before* the reference counter is decremented.
|
*before* the reference counter is decremented.
|
||||||
|
|
||||||
See Documentation/atomic_ops.txt for more information. See the "Atomic
|
See Documentation/core-api/atomic_ops.rst for more information. See the
|
||||||
operations" subsection for information on where to use these.
|
"Atomic operations" subsection for information on where to use these.
|
||||||
|
|
||||||
|
|
||||||
(*) lockless_dereference();
|
(*) lockless_dereference();
|
||||||
@@ -2584,7 +2584,7 @@ situations because on some CPUs the atomic instructions used imply full memory
|
|||||||
barriers, and so barrier instructions are superfluous in conjunction with them,
|
barriers, and so barrier instructions are superfluous in conjunction with them,
|
||||||
and in such cases the special barrier primitives will be no-ops.
|
and in such cases the special barrier primitives will be no-ops.
|
||||||
|
|
||||||
See Documentation/atomic_ops.txt for more information.
|
See Documentation/core-api/atomic_ops.rst for more information.
|
||||||
|
|
||||||
|
|
||||||
ACCESSING DEVICES
|
ACCESSING DEVICES
|
||||||
|
|||||||
@@ -2,13 +2,15 @@
|
|||||||
Memory Hotplug
|
Memory Hotplug
|
||||||
==============
|
==============
|
||||||
|
|
||||||
Created: Jul 28 2007
|
:Created: Jul 28 2007
|
||||||
Add description of notifier of memory hotplug Oct 11 2007
|
:Updated: Add description of notifier of memory hotplug: Oct 11 2007
|
||||||
|
|
||||||
This document is about memory hotplug including how-to-use and current status.
|
This document is about memory hotplug including how-to-use and current status.
|
||||||
Because Memory Hotplug is still under development, contents of this text will
|
Because Memory Hotplug is still under development, contents of this text will
|
||||||
be changed often.
|
be changed often.
|
||||||
|
|
||||||
|
.. CONTENTS
|
||||||
|
|
||||||
1. Introduction
|
1. Introduction
|
||||||
1.1 purpose of memory hotplug
|
1.1 purpose of memory hotplug
|
||||||
1.2. Phases of memory hotplug
|
1.2. Phases of memory hotplug
|
||||||
@@ -28,17 +30,20 @@ be changed often.
|
|||||||
8. Memory hotplug event notifier
|
8. Memory hotplug event notifier
|
||||||
9. Future Work List
|
9. Future Work List
|
||||||
|
|
||||||
Note(1): x86_64's has special implementation for memory hotplug.
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
|
(1) x86_64 has a special implementation for memory hotplug.
|
||||||
This text does not describe it.
|
This text does not describe it.
|
||||||
Note(2): This text assumes that sysfs is mounted at /sys.
|
(2) This text assumes that sysfs is mounted at /sys.
|
||||||
|
|
||||||
|
|
||||||
---------------
|
Introduction
|
||||||
1. Introduction
|
============
|
||||||
---------------
|
|
||||||
|
purpose of memory hotplug
|
||||||
|
-------------------------
|
||||||
|
|
||||||
1.1 purpose of memory hotplug
|
|
||||||
------------
|
|
||||||
Memory Hotplug allows users to increase/decrease the amount of memory.
|
Memory Hotplug allows users to increase/decrease the amount of memory.
|
||||||
Generally, there are two purposes.
|
Generally, there are two purposes.
|
||||||
|
|
||||||
@@ -53,9 +58,11 @@ hardware which supports memory power management.
|
|||||||
Linux memory hotplug is designed for both purposes.
|
Linux memory hotplug is designed for both purposes.
|
||||||
|
|
||||||
|
|
||||||
1.2. Phases of memory hotplug
|
Phases of memory hotplug
|
||||||
---------------
|
------------------------
|
||||||
There are 2 phases in Memory Hotplug.
|
|
||||||
|
There are 2 phases in Memory Hotplug:
|
||||||
|
|
||||||
1) Physical Memory Hotplug phase
|
1) Physical Memory Hotplug phase
|
||||||
2) Logical Memory Hotplug phase.
|
2) Logical Memory Hotplug phase.
|
||||||
|
|
||||||
@@ -70,7 +77,7 @@ management tables, and makes sysfs files for new memory's operation.
|
|||||||
If firmware supports notification of connection of new memory to OS,
|
If firmware supports notification of connection of new memory to OS,
|
||||||
this phase is triggered automatically. ACPI can notify this event. If not,
|
this phase is triggered automatically. ACPI can notify this event. If not,
|
||||||
"probe" operation by system administration is used instead.
|
"probe" operation by system administration is used instead.
|
||||||
(see Section 4.).
|
(see :ref:`memory_hotplug_physical_mem`).
|
||||||
|
|
||||||
Logical Memory Hotplug phase is to change memory state into
|
Logical Memory Hotplug phase is to change memory state into
|
||||||
available/unavailable for users. Amount of memory from user's view is
|
available/unavailable for users. Amount of memory from user's view is
|
||||||
@@ -86,8 +93,9 @@ phase by hand.
|
|||||||
phases can be executed in a seamless way.)
|
phases can be executed in a seamless way.)
|
||||||
|
|
||||||
|
|
||||||
1.3. Unit of Memory online/offline operation
|
Unit of Memory online/offline operation
|
||||||
------------
|
---------------------------------------
|
||||||
|
|
||||||
Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
|
Memory hotplug uses SPARSEMEM memory model which allows memory to be divided
|
||||||
into chunks of the same size. These chunks are called "sections". The size of
|
into chunks of the same size. These chunks are called "sections". The size of
|
||||||
a memory section is architecture dependent. For example, power uses 16MiB, ia64
|
a memory section is architecture dependent. For example, power uses 16MiB, ia64
|
||||||
@@ -97,43 +105,47 @@ Memory sections are combined into chunks referred to as "memory blocks". The
|
|||||||
size of a memory block is architecture dependent and represents the logical
|
size of a memory block is architecture dependent and represents the logical
|
||||||
unit upon which memory online/offline operations are to be performed. The
|
unit upon which memory online/offline operations are to be performed. The
|
||||||
default size of a memory block is the same as memory section size unless an
|
default size of a memory block is the same as memory section size unless an
|
||||||
architecture specifies otherwise. (see Section 3.)
|
architecture specifies otherwise. (see :ref:`memory_hotplug_sysfs_files`.)
|
||||||
|
|
||||||
To determine the size (in bytes) of a memory block please read this file:
|
To determine the size (in bytes) of a memory block please read this file:
|
||||||
|
|
||||||
/sys/devices/system/memory/block_size_bytes
|
/sys/devices/system/memory/block_size_bytes
|
||||||
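A small sketch of reading that file from a script. It assumes the file holds a hexadecimal number (the parser accepts it with or without a `0x` prefix); the `sysfs_root` parameter is a hypothetical convenience so the function can be pointed at a test directory.

```python
from pathlib import Path


def read_block_size(sysfs_root="/sys"):
    """Return the memory block size in bytes.

    Assumption: block_size_bytes contains a hexadecimal value.
    """
    path = Path(sysfs_root) / "devices/system/memory/block_size_bytes"
    return int(path.read_text().strip(), 16)
```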
|
|
||||||
|
|
||||||
-----------------------
|
Kernel Configuration
|
||||||
2. Kernel Configuration
|
====================
|
||||||
-----------------------
|
|
||||||
To use memory hotplug feature, kernel must be compiled with following
|
To use memory hotplug feature, kernel must be compiled with following
|
||||||
config options.
|
config options.
|
||||||
|
|
||||||
- For all memory hotplug
|
- For all memory hotplug:
|
||||||
Memory model -> Sparse Memory (CONFIG_SPARSEMEM)
|
- Memory model -> Sparse Memory (CONFIG_SPARSEMEM)
|
||||||
Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG)
|
- Allow for memory hot-add (CONFIG_MEMORY_HOTPLUG)
|
||||||
|
|
||||||
- To enable memory removal, the following are also necessary
|
- To enable memory removal, the following are also necessary:
|
||||||
Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE)
|
- Allow for memory hot remove (CONFIG_MEMORY_HOTREMOVE)
|
||||||
Page Migration (CONFIG_MIGRATION)
|
- Page Migration (CONFIG_MIGRATION)
|
||||||
|
|
||||||
- For ACPI memory hotplug, the following are also necessary
|
- For ACPI memory hotplug, the following are also necessary:
|
||||||
Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY)
|
- Memory hotplug (under ACPI Support menu) (CONFIG_ACPI_HOTPLUG_MEMORY)
|
||||||
This option can be kernel module.
|
- This option can be kernel module.
|
||||||
|
|
||||||
- As a related configuration, if your box has a feature of NUMA-node hotplug
|
- As a related configuration, if your box has a feature of NUMA-node hotplug
|
||||||
via ACPI, then this option is necessary too.
|
via ACPI, then this option is necessary too.
|
||||||
ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu)
|
|
||||||
|
- ACPI0004,PNP0A05 and PNP0A06 Container Driver (under ACPI Support menu)
|
||||||
(CONFIG_ACPI_CONTAINER).
|
(CONFIG_ACPI_CONTAINER).
|
||||||
|
|
||||||
This option can be kernel module too.
|
This option can be kernel module too.
|
||||||
|
|
||||||
|
|
||||||
--------------------------------
|
.. _memory_hotplug_sysfs_files:
|
||||||
3 sysfs files for memory hotplug
|
|
||||||
--------------------------------
|
sysfs files for memory hotplug
|
||||||
|
==============================
|
||||||
|
|
||||||
All memory blocks have their device information in sysfs. Each memory block
|
All memory blocks have their device information in sysfs. Each memory block
|
||||||
is described under /sys/devices/system/memory as
|
is described under /sys/devices/system/memory as:
|
||||||
|
|
||||||
/sys/devices/system/memory/memoryXXX
|
/sys/devices/system/memory/memoryXXX
|
||||||
(XXX is the memory block id.)
|
(XXX is the memory block id.)
|
||||||
@@ -145,43 +157,53 @@ the existence of one should not affect the hotplug capabilities of the memory
|
|||||||
block.
|
block.
|
||||||
|
|
||||||
For example, assume 1GiB memory block size. A device for a memory starting at
|
For example, assume 1GiB memory block size. A device for a memory starting at
|
||||||
0x100000000 is /sys/devices/system/memory/memory4
|
0x100000000 is /sys/devices/system/memory/memory4::
|
||||||
|
|
||||||
(0x100000000 / 1GiB = 4)
|
(0x100000000 / 1GiB = 4)
|
||||||
|
|
||||||
This device covers address range [0x100000000 ... 0x140000000)
|
This device covers address range [0x100000000 ... 0x140000000)
|
||||||
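The arithmetic in this example generalizes to any address and block size. A minimal sketch, with the hypothetical helper `block_for_address`, computes the block id by integer division and the half-open address range the block covers:

```python
def block_for_address(phys_addr, block_size):
    """Return (block_id, start, end), where [start, end) is the covered range."""
    block_id = phys_addr // block_size
    start = block_id * block_size
    return block_id, start, start + block_size


GIB = 1 << 30
block_id, start, end = block_for_address(0x100000000, GIB)
print(f"memory{block_id}: [{start:#x} ... {end:#x})")
# prints "memory4: [0x100000000 ... 0x140000000)"
```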
|
|
||||||
Under each memory block, you can see 5 files:
|
Under each memory block, you can see 5 files:
|
||||||
|
|
||||||
/sys/devices/system/memory/memoryXXX/phys_index
|
- /sys/devices/system/memory/memoryXXX/phys_index
|
||||||
/sys/devices/system/memory/memoryXXX/phys_device
|
- /sys/devices/system/memory/memoryXXX/phys_device
|
||||||
/sys/devices/system/memory/memoryXXX/state
|
- /sys/devices/system/memory/memoryXXX/state
|
||||||
/sys/devices/system/memory/memoryXXX/removable
|
- /sys/devices/system/memory/memoryXXX/removable
|
||||||
/sys/devices/system/memory/memoryXXX/valid_zones
|
- /sys/devices/system/memory/memoryXXX/valid_zones
|
||||||
|
|
||||||
|
=================== ============================================================
|
||||||
|
``phys_index`` read-only and contains memory block id, same as XXX.
|
||||||
|
``state`` read-write
|
||||||
|
|
||||||
|
- at read: contains online/offline state of memory.
|
||||||
|
- at write: user can specify "online_kernel",
|
||||||
|
|
||||||
'phys_index' : read-only and contains memory block id, same as XXX.
|
|
||||||
'state' : read-write
|
|
||||||
at read: contains online/offline state of memory.
|
|
||||||
at write: user can specify "online_kernel",
|
|
||||||
"online_movable", "online", "offline" command
|
"online_movable", "online", "offline" command
|
||||||
which will be performed on all sections in the block.
|
which will be performed on all sections in the block.
|
||||||
'phys_device' : read-only: designed to show the name of physical memory
|
``phys_device`` read-only: designed to show the name of physical memory
|
||||||
device. This is not well implemented now.
|
device. This is not well implemented now.
|
||||||
'removable' : read-only: contains an integer value indicating
|
``removable`` read-only: contains an integer value indicating
|
||||||
whether the memory block is removable or not
|
whether the memory block is removable or not
|
||||||
removable. A value of 1 indicates that the memory
|
removable. A value of 1 indicates that the memory
|
||||||
block is removable and a value of 0 indicates that
|
block is removable and a value of 0 indicates that
|
||||||
it is not removable. A memory block is removable only if
|
it is not removable. A memory block is removable only if
|
||||||
every section in the block is removable.
|
every section in the block is removable.
|
||||||
'valid_zones' : read-only: designed to show which zones this memory block
|
``valid_zones`` read-only: designed to show which zones this memory block
|
||||||
can be onlined to.
|
can be onlined to.
|
||||||
The first column shows it's default zone.
|
|
||||||
|
The first column shows its default zone.
|
||||||
|
|
||||||
"memory6/valid_zones: Normal Movable" shows this memoryblock
|
"memory6/valid_zones: Normal Movable" shows this memoryblock
|
||||||
can be onlined to ZONE_NORMAL by default and to ZONE_MOVABLE
|
can be onlined to ZONE_NORMAL by default and to ZONE_MOVABLE
|
||||||
by online_movable.
|
by online_movable.
|
||||||
|
|
||||||
"memory7/valid_zones: Movable Normal" shows this memoryblock
|
"memory7/valid_zones: Movable Normal" shows this memoryblock
|
||||||
can be onlined to ZONE_MOVABLE by default and to ZONE_NORMAL
|
can be onlined to ZONE_MOVABLE by default and to ZONE_NORMAL
|
||||||
by online_kernel.
|
by online_kernel.
|
||||||
|
=================== ============================================================
|
||||||
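As an illustration of the ``valid_zones`` description, the sketch below splits the file contents into the default zone (the first column) and any other zones the block can be onlined to. The function name `parse_valid_zones` is hypothetical, and the whitespace-separated format is assumed from the two examples given.

```python
def parse_valid_zones(text):
    """Split a valid_zones line into (default_zone, other_zones).

    Returns (None, []) when the file is empty.
    """
    zones = text.split()
    if not zones:
        return None, []
    return zones[0], zones[1:]


print(parse_valid_zones("Normal Movable\n"))
# prints "('Normal', ['Movable'])"
```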
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
NOTE:
|
|
||||||
These directories/files appear after physical memory hotplug phase.
|
These directories/files appear after physical memory hotplug phase.
|
||||||
|
|
||||||
If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed
|
If CONFIG_NUMA is enabled the memoryXXX/ directories can also be accessed
|
||||||
@@ -193,13 +215,14 @@ For example:
|
|||||||
A backlink will also be created:
|
A backlink will also be created:
|
||||||
/sys/devices/system/memory/memory9/node0 -> ../../node/node0
|
/sys/devices/system/memory/memory9/node0 -> ../../node/node0
|
||||||
|
|
||||||
|
.. _memory_hotplug_physical_mem:
|
||||||
|
|
||||||
--------------------------------
|
Physical memory hot-add phase
|
||||||
4. Physical memory hot-add phase
|
=============================
|
||||||
--------------------------------
|
|
||||||
|
Hardware(Firmware) Support
|
||||||
|
--------------------------
|
||||||
|
|
||||||
4.1 Hardware(Firmware) Support
|
|
||||||
------------
|
|
||||||
On x86_64/ia64 platform, memory hotplug by ACPI is supported.
|
On x86_64/ia64 platform, memory hotplug by ACPI is supported.
|
||||||
|
|
||||||
In general, the firmware (ACPI) which supports memory hotplug defines
|
In general, the firmware (ACPI) which supports memory hotplug defines
|
||||||
@@ -209,7 +232,8 @@ script. This will be done automatically.
|
|||||||
|
|
||||||
But scripts for memory hotplug are not contained in generic udev package(now).
|
But scripts for memory hotplug are not contained in generic udev package(now).
|
||||||
You may have to write it by yourself or online/offline memory by hand.
|
You may have to write it by yourself or online/offline memory by hand.
|
||||||
Please see "How to online memory", "How to offline memory" in this text.
|
Please see :ref:`memory_hotplug_how_to_online_memory` and
|
||||||
|
:ref:`memory_hotplug_how_to_offline_memory`.
|
||||||
|
|
||||||
If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004",
|
If firmware supports NUMA-node hotplug, and defines an object _HID "ACPI0004",
|
||||||
"PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI handler
|
"PNP0A05", or "PNP0A06", notification is asserted to it, and ACPI handler
|
||||||
@@ -217,8 +241,9 @@ calls hotplug code for all of objects which are defined in it.
|
|||||||
If memory device is found, memory hotplug code will be called.
|
If memory device is found, memory hotplug code will be called.
|
||||||
|
|
||||||
|
|
||||||
4.2 Notify memory hot-add event by hand
|
Notify memory hot-add event by hand
|
||||||
------------
|
-----------------------------------
|
||||||
|
|
||||||
On some architectures, the firmware may not notify the kernel of a memory
|
On some architectures, the firmware may not notify the kernel of a memory
|
||||||
hotplug event. Therefore, the memory "probe" interface is supported to
|
hotplug event. Therefore, the memory "probe" interface is supported to
|
||||||
explicitly notify the kernel. This interface depends on
|
explicitly notify the kernel. This interface depends on
|
||||||
@@ -229,35 +254,38 @@ notification.
|
|||||||
Probe interface is located at
|
Probe interface is located at
|
||||||
/sys/devices/system/memory/probe
|
/sys/devices/system/memory/probe
|
||||||
|
|
||||||
You can tell the physical address of new memory to the kernel by
|
You can tell the physical address of new memory to the kernel by::
|
||||||
|
|
||||||
% echo start_address_of_new_memory > /sys/devices/system/memory/probe
|
% echo start_address_of_new_memory > /sys/devices/system/memory/probe
|
||||||
|
|
||||||
Then, [start_address_of_new_memory, start_address_of_new_memory +
|
Then, [start_address_of_new_memory, start_address_of_new_memory +
|
||||||
memory_block_size] memory range is hot-added. In this case, hotplug script is
|
memory_block_size] memory range is hot-added. In this case, hotplug script is
|
||||||
not called (in current implementation). You'll have to online memory by
|
not called (in current implementation). You'll have to online memory by
|
||||||
yourself. Please see "How to online memory" in this text.
|
yourself. Please see :ref:`memory_hotplug_how_to_online_memory`.
|
||||||
|
|
||||||
|
|
||||||
------------------------------
|
Logical Memory hot-add phase
|
||||||
5. Logical Memory hot-add phase
|
============================
|
||||||
------------------------------
|
|
||||||
|
|
||||||
5.1. State of memory
|
State of memory
|
||||||
------------
|
---------------
|
||||||
To see (online/offline) state of a memory block, read 'state' file.
|
|
||||||
|
To see (online/offline) state of a memory block, read 'state' file::
|
||||||
|
|
||||||
% cat /sys/devices/system/memory/memoryXXX/state
|
% cat /sys/devices/system/memory/memoryXXX/state
|
||||||
|
|
||||||
|
|
||||||
If the memory block is online, you'll read "online".
|
- If the memory block is online, you'll read "online".
|
||||||
If the memory block is offline, you'll read "offline".
|
- If the memory block is offline, you'll read "offline".
|
||||||
|
|
||||||
|
|
||||||
5.2. How to online memory
|
.. _memory_hotplug_how_to_online_memory:
|
||||||
------------
|
|
||||||
|
How to online memory
|
||||||
|
--------------------
|
||||||
|
|
||||||
When the memory is hot-added, the kernel decides whether or not to "online"
|
When the memory is hot-added, the kernel decides whether or not to "online"
|
||||||
it according to the policy which can be read from "auto_online_blocks" file:
|
it according to the policy which can be read from "auto_online_blocks" file::
|
||||||
|
|
||||||
% cat /sys/devices/system/memory/auto_online_blocks
|
% cat /sys/devices/system/memory/auto_online_blocks
|
||||||
|
|
||||||
@@ -265,7 +293,7 @@ The default depends on the CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE kernel config
|
|||||||
option. If it is disabled the default is "offline" which means the newly added
|
option. If it is disabled the default is "offline" which means the newly added
|
||||||
memory is not in a ready-to-use state and you have to "online" the newly added
|
memory is not in a ready-to-use state and you have to "online" the newly added
|
||||||
memory blocks manually. Automatic onlining can be requested by writing "online"
|
memory blocks manually. Automatic onlining can be requested by writing "online"
|
||||||
to "auto_online_blocks" file:
|
to "auto_online_blocks" file::
|
||||||
|
|
||||||
% echo online > /sys/devices/system/memory/auto_online_blocks
|
% echo online > /sys/devices/system/memory/auto_online_blocks
|
||||||
|
|
||||||
@@ -277,7 +305,7 @@ online. User space tools can check their "state" files
|
|||||||
|
|
||||||
If the automatic onlining wasn't requested, failed, or some memory block was
|
If the automatic onlining wasn't requested, failed, or some memory block was
|
||||||
offlined it is possible to change the individual block's state by writing to the
|
offlined it is possible to change the individual block's state by writing to the
|
||||||
"state" file:
|
"state" file::
|
||||||
|
|
||||||
% echo online > /sys/devices/system/memory/memoryXXX/state
|
% echo online > /sys/devices/system/memory/memoryXXX/state
|
||||||
|
|
||||||
@@ -286,15 +314,17 @@ If the memory block doesn't belong to any zone an appropriate kernel zone
|
|||||||
(usually ZONE_NORMAL) will be used unless movable_node kernel command line
|
(usually ZONE_NORMAL) will be used unless movable_node kernel command line
|
||||||
option is specified when ZONE_MOVABLE will be used.
|
option is specified when ZONE_MOVABLE will be used.
|
||||||
|
|
||||||
You can explicitly request to associate it with ZONE_MOVABLE by
|
You can explicitly request to associate it with ZONE_MOVABLE by::
|
||||||
|
|
||||||
% echo online_movable > /sys/devices/system/memory/memoryXXX/state
|
% echo online_movable > /sys/devices/system/memory/memoryXXX/state
|
||||||
(NOTE: current limit: this memory block must be adjacent to ZONE_MOVABLE)
|
|
||||||
|
|
||||||
Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by:
|
.. note:: current limit: this memory block must be adjacent to ZONE_MOVABLE
|
||||||
|
|
||||||
|
Or you can explicitly request a kernel zone (usually ZONE_NORMAL) by::
|
||||||
|
|
||||||
% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
|
% echo online_kernel > /sys/devices/system/memory/memoryXXX/state
|
||||||
(NOTE: current limit: this memory block must be adjacent to ZONE_NORMAL)
|
|
||||||
|
.. note:: current limit: this memory block must be adjacent to ZONE_NORMAL
|
||||||
|
|
||||||
An explicit zone onlining can fail (e.g. when the range is already within
|
An explicit zone onlining can fail (e.g. when the range is already within
|
||||||
an existing and incompatible zone already).
|
an existing and incompatible zone already).
|
||||||
@@ -306,12 +336,12 @@ This may be changed in future.
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
------------------------
|
Logical memory remove
|
||||||
6. Logical memory remove
|
=====================
|
||||||
------------------------
|
|
||||||
|
Memory offline and ZONE_MOVABLE
|
||||||
|
-------------------------------
|
||||||
|
|
||||||
6.1 Memory offline and ZONE_MOVABLE
|
|
||||||
------------
|
|
||||||
Memory offlining is more complicated than memory online. Because memory offline
|
Memory offlining is more complicated than memory online. Because memory offline
|
||||||
has to make the whole memory block be unused, memory offline can fail if
|
has to make the whole memory block be unused, memory offline can fail if
|
||||||
the memory block includes memory which cannot be freed.
|
the memory block includes memory which cannot be freed.
|
||||||
@@ -343,15 +373,18 @@ creates ZONE_MOVABLE as following.
|
|||||||
Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ.
|
Size of memory not for movable pages (not for offline) is TOTAL - ZZZZ.
|
||||||
Size of memory for movable pages (for offline) is ZZZZ.
|
Size of memory for movable pages (for offline) is ZZZZ.
|
||||||
|
|
||||||
|
.. note::
|
||||||
|
|
||||||
Note: Unfortunately, there is no information to show which memory block belongs
|
Unfortunately, there is no information to show which memory block belongs
|
||||||
to ZONE_MOVABLE. This is TBD.
|
to ZONE_MOVABLE. This is TBD.
|
||||||
|
|
||||||
|
.. _memory_hotplug_how_to_offline_memory:
|
||||||
|
|
||||||
|
How to offline memory
|
||||||
|
---------------------
|
||||||
|
|
||||||
6.2. How to offline memory
|
|
||||||
------------
|
|
||||||
You can offline a memory block by using the same sysfs interface that was used
|
You can offline a memory block by using the same sysfs interface that was used
|
||||||
in memory onlining.
|
in memory onlining::
|
||||||
|
|
||||||
% echo offline > /sys/devices/system/memory/memoryXXX/state
|
% echo offline > /sys/devices/system/memory/memoryXXX/state
|
||||||
|
|
||||||
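Because offlining can fail when a block contains memory that cannot be freed, a caller may want to retry a bounded number of times. The sketch below is one way to do that, under stated assumptions: the helper name, the default of three attempts, the one-second pause, and the `MEM_SYSFS` override are all hypothetical choices rather than anything the kernel prescribes.

```shell
#!/bin/sh
# Root of the memory sysfs tree. The MEM_SYSFS override is an assumption of
# this sketch so the helper can be tried against a scratch directory.
MEM_SYSFS="${MEM_SYSFS:-/sys/devices/system/memory}"

# Try to offline block $1 up to $2 times (default 3, an arbitrary choice).
# Returns 0 once the state file reads "offline", 1 if every attempt failed.
offline_block() {
    block=$1
    tries=${2:-3}
    while [ "$tries" -gt 0 ]; do
        if printf '%s\n' offline > "$MEM_SYSFS/$block/state" 2>/dev/null &&
           [ "$(cat "$MEM_SYSFS/$block/state")" = "offline" ]; then
            return 0
        fi
        tries=$((tries - 1))
        # Pause before the next attempt, but not after the final failure.
        if [ "$tries" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}
```

A call such as `offline_block memoryXXX 5` leaves the decision about how long to keep retrying with the caller.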
@@ -367,22 +400,22 @@ able to offline it (or not). (For example, a page is referred to by some kernel
|
|||||||
internal call and released soon.)
|
internal call and released soon.)
|
||||||
|
|
||||||
Consideration:
|
Consideration:
|
||||||
Memory hotplug's design direction is to make the possibility of memory offlining
|
Memory hotplug's design direction is to make the possibility of memory
|
||||||
higher and to guarantee unplugging memory under any situation. But it needs
|
offlining higher and to guarantee unplugging memory under any situation. But
|
||||||
more work. Returning -EBUSY under some situation may be good because the user
|
it needs more work. Returning -EBUSY under some situation may be good because
|
||||||
can decide to retry more or not by himself. Currently, memory offlining code
|
the user can decide to retry more or not by himself. Currently, memory
|
||||||
does some amount of retry with 120 seconds timeout.
|
offlining code does some amount of retry with 120 seconds timeout.
|
||||||
|
|
||||||
|
Physical memory remove
|
||||||
|
======================
|
||||||
|
|
||||||
-------------------------
|
|
||||||
7. Physical memory remove
|
|
||||||
-------------------------
|
|
||||||
Need more implementation yet....
|
Need more implementation yet....
|
||||||
- Notification completion of remove works by OS to firmware.
|
- Notification completion of remove works by OS to firmware.
|
||||||
- Guard from remove if not yet.
|
- Guard from remove if not yet.
|
||||||
|
|
||||||
--------------------------------
|
Memory hotplug event notifier
|
||||||
8. Memory hotplug event notifier
|
=============================
|
||||||
--------------------------------
|
|
||||||
Hotplugging events are sent to a notification queue.
|
Hotplugging events are sent to a notification queue.
|
||||||
|
|
||||||
There are six types of notification defined in include/linux/memory.h:
|
There are six types of notification defined in include/linux/memory.h:
|
||||||
@@ -412,14 +445,14 @@ MEM_CANCEL_OFFLINE
|
|||||||
MEM_OFFLINE
|
MEM_OFFLINE
|
||||||
Generated after offlining memory is complete.
|
Generated after offlining memory is complete.
|
||||||
|
|
||||||
A callback routine can be registered by calling
|
A callback routine can be registered by calling::
|
||||||
|
|
||||||
hotplug_memory_notifier(callback_func, priority)
|
hotplug_memory_notifier(callback_func, priority)
|
||||||
|
|
||||||
Callback functions with higher values of priority are called before callback
|
Callback functions with higher values of priority are called before callback
|
||||||
functions with lower values.
|
functions with lower values.
|
||||||
|
|
||||||
A callback function must have the following prototype:
|
A callback function must have the following prototype::
|
||||||
|
|
||||||
int callback_func(
|
int callback_func(
|
||||||
struct notifier_block *self, unsigned long action, void *arg);
|
struct notifier_block *self, unsigned long action, void *arg);
|
||||||
@@ -427,7 +460,7 @@ A callback function must have the following prototype:
|
|||||||
The first argument of the callback function (self) is a pointer to the block
|
The first argument of the callback function (self) is a pointer to the block
|
||||||
of the notifier chain that points to the callback function itself.
|
of the notifier chain that points to the callback function itself.
|
||||||
The second argument (action) is one of the event types described above.
|
The second argument (action) is one of the event types described above.
|
||||||
The third argument (arg) passes a pointer to struct memory_notify.
|
The third argument (arg) passes a pointer to struct memory_notify::
|
||||||
|
|
||||||
struct memory_notify {
|
struct memory_notify {
|
||||||
unsigned long start_pfn;
|
unsigned long start_pfn;
|
||||||
@@ -437,15 +470,16 @@ struct memory_notify {
|
|||||||
int status_change_nid;
|
int status_change_nid;
|
||||||
}
|
}
|
||||||
|
|
||||||
start_pfn is start_pfn of online/offline memory.
|
- start_pfn is start_pfn of online/offline memory.
|
||||||
nr_pages is # of pages of online/offline memory.
|
- nr_pages is # of pages of online/offline memory.
|
||||||
status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
|
- status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
|
||||||
is (will be) set/clear, if this is -1, then nodemask status is not changed.
|
is (will be) set/clear, if this is -1, then nodemask status is not changed.
|
||||||
status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
|
- status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
|
||||||
is (will be) set/clear, if this is -1, then nodemask status is not changed.
|
is (will be) set/clear, if this is -1, then nodemask status is not changed.
|
||||||
status_change_nid is set node id when N_MEMORY of nodemask is (will be)
|
- status_change_nid is set node id when N_MEMORY of nodemask is (will be)
|
||||||
set/clear. It means a new(memoryless) node gets new memory by online and a
|
set/clear. It means a new(memoryless) node gets new memory by online and a
|
||||||
node loses all memory. If this is -1, then nodemask status is not changed.
|
node loses all memory. If this is -1, then nodemask status is not changed.
|
||||||
|
|
||||||
If status_change_nid* >= 0, callback should create/discard structures for the
|
If status_change_nid* >= 0, callback should create/discard structures for the
|
||||||
node if necessary.
|
node if necessary.
|
||||||
|
|
||||||
@@ -461,9 +495,9 @@ further processing of the notification queue.
|
|||||||
|
|
||||||
NOTIFY_STOP stops further processing of the notification queue.
|
NOTIFY_STOP stops further processing of the notification queue.
|
||||||
|
|
||||||
--------------
|
Future Work
|
||||||
9. Future Work
|
===========
|
||||||
--------------
|
|
||||||
- allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
|
- allowing memory hot-add to ZONE_MOVABLE. maybe we need some switch like
|
||||||
sysctl or new control file.
|
sysctl or new control file.
|
||||||
- showing memory block and physical device relationship.
|
- showing memory block and physical device relationship.
|
||||||
@@ -471,4 +505,3 @@ NOTIFY_STOP stops further processing of the notification queue.
|
|||||||
- support HugeTLB page migration and offlining.
|
- support HugeTLB page migration and offlining.
|
||||||
- memmap removing at memory offline.
|
- memmap removing at memory offline.
|
||||||
- physical remove memory.
|
- physical remove memory.
|
||||||
|
|
||||||
|
|||||||
@@ -1,7 +1,8 @@
|
|||||||
|
=================
|
||||||
MEN Chameleon Bus
|
MEN Chameleon Bus
|
||||||
=================
|
=================
|
||||||
|
|
||||||
Table of Contents
|
.. Table of Contents
|
||||||
=================
|
=================
|
||||||
1 Introduction
|
1 Introduction
|
||||||
1.1 Scope of this Document
|
1.1 Scope of this Document
|
||||||
@@ -19,37 +20,44 @@ Table of Contents
|
|||||||
4.3 Initializing the driver
|
4.3 Initializing the driver
|
||||||
|
|
||||||
|
|
||||||
1 Introduction
|
Introduction
|
||||||
===============
|
============
|
||||||
|
|
||||||
This document describes the architecture and implementation of the MEN
|
This document describes the architecture and implementation of the MEN
|
||||||
Chameleon Bus (called MCB throughout this document).
|
Chameleon Bus (called MCB throughout this document).
|
||||||
|
|
||||||
1.1 Scope of this Document
|
Scope of this Document
|
||||||
---------------------------
|
----------------------
|
||||||
|
|
||||||
This document is intended to be a short overview of the current
|
This document is intended to be a short overview of the current
|
||||||
implementation and does by no means describe the complete possibilities of MCB
|
implementation and does by no means describe the complete possibilities of MCB
|
||||||
based devices.
|
based devices.
|
||||||
|
|
||||||
1.2 Limitations of the current implementation
|
Limitations of the current implementation
|
||||||
----------------------------------------------
|
-----------------------------------------
|
||||||
|
|
||||||
The current implementation is limited to PCI and PCIe based carrier devices
|
The current implementation is limited to PCI and PCIe based carrier devices
|
||||||
that only use a single memory resource and share the PCI legacy IRQ. Not
|
that only use a single memory resource and share the PCI legacy IRQ. Not
|
||||||
implemented are:
|
implemented are:
|
||||||
|
|
||||||
- Multi-resource MCB devices like the VME Controller or M-Module carrier.
|
- Multi-resource MCB devices like the VME Controller or M-Module carrier.
|
||||||
- MCB devices that need another MCB device, like SRAM for a DMA Controller's
|
- MCB devices that need another MCB device, like SRAM for a DMA Controller's
|
||||||
buffer descriptors or a video controller's video memory.
|
buffer descriptors or a video controller's video memory.
|
||||||
- A per-carrier IRQ domain for carrier devices that have one (or more) IRQs
|
- A per-carrier IRQ domain for carrier devices that have one (or more) IRQs
|
||||||
per MCB device like PCIe based carriers with MSI or MSI-X support.
|
per MCB device like PCIe based carriers with MSI or MSI-X support.
|
||||||
|
|
||||||
2 Architecture
|
Architecture
|
||||||
===============
|
============
|
||||||
|
|
||||||
MCB is divided into 3 functional blocks:
|
MCB is divided into 3 functional blocks:
|
||||||
|
|
||||||
- The MEN Chameleon Bus itself,
|
- The MEN Chameleon Bus itself,
|
||||||
- drivers for MCB Carrier Devices and
|
- drivers for MCB Carrier Devices and
|
||||||
- the parser for the Chameleon table.
|
- the parser for the Chameleon table.
|
||||||
|
|
||||||
2.1 MEN Chameleon Bus
|
MEN Chameleon Bus
|
||||||
----------------------
|
-----------------
|
||||||
|
|
||||||
The MEN Chameleon Bus is an artificial bus system that attaches to a so
|
The MEN Chameleon Bus is an artificial bus system that attaches to a so
|
||||||
called Chameleon FPGA device found on some hardware produced by MEN Mikro
|
called Chameleon FPGA device found on some hardware produced by MEN Mikro
|
||||||
Elektronik GmbH. These devices are multi-function devices implemented in a
|
Elektronik GmbH. These devices are multi-function devices implemented in a
|
||||||
@@ -59,8 +67,9 @@ Table of Contents
|
|||||||
BAR, size in the FPGA, interrupt number and some other properties currently
|
BAR, size in the FPGA, interrupt number and some other properties currently
|
||||||
not handled by the MCB implementation.
|
not handled by the MCB implementation.
|
||||||
|
|
||||||
2.2 Carrier Devices
|
Carrier Devices
|
||||||
--------------------
|
---------------
|
||||||
|
|
||||||
A carrier device is just an abstraction for the real world physical bus the
|
A carrier device is just an abstraction for the real world physical bus the
|
||||||
Chameleon FPGA is attached to. Some IP Core drivers may need to interact with
|
Chameleon FPGA is attached to. Some IP Core drivers may need to interact with
|
||||||
properties of the carrier device (like querying the IRQ number of a PCI
|
properties of the carrier device (like querying the IRQ number of a PCI
|
||||||
@@ -70,8 +79,9 @@ Table of Contents
|
|||||||
implement the get_irq() method which can be translated into a hardware bus
|
implement the get_irq() method which can be translated into a hardware bus
|
||||||
query for the IRQ number the device should use.
|
query for the IRQ number the device should use.
|
||||||
|
|
||||||
2.3 Parser
|
Parser
|
||||||
-----------
|
------
|
||||||
|
|
||||||
The parser reads the first 512 bytes of a Chameleon device and parses the
|
The parser reads the first 512 bytes of a Chameleon device and parses the
|
||||||
Chameleon table. Currently the parser only supports the Chameleon v2 variant
|
Chameleon table. Currently the parser only supports the Chameleon v2 variant
|
||||||
of the Chameleon table but can easily be adapted to support an older or
|
of the Chameleon table but can easily be adapted to support an older or
|
||||||
@@ -81,36 +91,39 @@ Table of Contents
|
|||||||
MCB devices are registered at the MCB and thus at the driver core of the
|
MCB devices are registered at the MCB and thus at the driver core of the
|
||||||
Linux kernel.
|
Linux kernel.
|
||||||
|
|
||||||
3 Resource handling
|
Resource handling
|
||||||
====================
|
=================
|
||||||
|
|
||||||
The current implementation assigns exactly one memory and one IRQ resource
|
The current implementation assigns exactly one memory and one IRQ resource
|
||||||
per MCB device. But this is likely going to change in the future.
|
per MCB device. But this is likely going to change in the future.
|
||||||
|
|
||||||
3.1 Memory Resources
|
Memory Resources
|
||||||
---------------------
|
----------------
|
||||||
|
|
||||||
Each MCB device has exactly one memory resource, which can be requested from
|
Each MCB device has exactly one memory resource, which can be requested from
|
||||||
the MCB bus. This memory resource is the physical address of the MCB device
|
the MCB bus. This memory resource is the physical address of the MCB device
|
||||||
inside the carrier and is intended to be passed to ioremap() and friends. It
|
inside the carrier and is intended to be passed to ioremap() and friends. It
|
||||||
is already requested from the kernel by calling request_mem_region().
|
is already requested from the kernel by calling request_mem_region().
|
||||||
|
|
||||||
3.2 IRQs
|
IRQs
|
||||||
---------
|
----
|
||||||
|
|
||||||
Each MCB device has exactly one IRQ resource, which can be requested from the
|
Each MCB device has exactly one IRQ resource, which can be requested from the
|
||||||
MCB bus. If a carrier device driver implements the ->get_irq() callback
|
MCB bus. If a carrier device driver implements the ->get_irq() callback
|
||||||
method, the IRQ number assigned by the carrier device will be returned,
|
method, the IRQ number assigned by the carrier device will be returned,
|
||||||
otherwise the IRQ number inside the Chameleon table will be returned. This
|
otherwise the IRQ number inside the Chameleon table will be returned. This
|
||||||
number is suitable to be passed to request_irq().
|
number is suitable to be passed to request_irq().
|
||||||
|
|
||||||
4 Writing an MCB driver
|
Writing an MCB driver
|
||||||
=======================
|
=====================
|
||||||
|
|
||||||
|
The driver structure
|
||||||
|
--------------------
|
||||||
|
|
||||||
4.1 The driver structure
|
|
||||||
-------------------------
|
|
||||||
Each MCB driver has a structure to identify the device driver as well as
|
Each MCB driver has a structure to identify the device driver as well as
|
||||||
device ids which identify the IP Core inside the FPGA. The driver structure
|
device ids which identify the IP Core inside the FPGA. The driver structure
|
||||||
also contains callback methods which get executed on driver probe and
|
also contains callback methods which get executed on driver probe and
|
||||||
removal from the system.
|
removal from the system::
|
||||||
|
|
||||||
|
|
||||||
static const struct mcb_device_id foo_ids[] = {
|
static const struct mcb_device_id foo_ids[] = {
|
||||||
{ .device = 0x123 },
|
{ .device = 0x123 },
|
||||||
@@ -128,22 +141,22 @@ Table of Contents
|
|||||||
.id_table = foo_ids,
|
.id_table = foo_ids,
|
||||||
};
|
};
|
||||||
|
|
||||||
4.2 Probing and attaching
|
Probing and attaching
|
||||||
--------------------------
|
---------------------
|
||||||
|
|
||||||
When a driver is loaded and the MCB devices it services are found, the MCB
|
When a driver is loaded and the MCB devices it services are found, the MCB
|
||||||
core will call the driver's probe callback method. When the driver is removed
|
core will call the driver's probe callback method. When the driver is removed
|
||||||
from the system, the MCB core will call the driver's remove callback method.
|
from the system, the MCB core will call the driver's remove callback method::
|
||||||
|
|
||||||
|
|
||||||
static int foo_probe(struct mcb_device *mdev, const struct mcb_device_id *id);
|
static int foo_probe(struct mcb_device *mdev, const struct mcb_device_id *id);
|
||||||
static void foo_remove(struct mcb_device *mdev);
|
static void foo_remove(struct mcb_device *mdev);
|
||||||
|
|
||||||
4.3 Initializing the driver
|
Initializing the driver
|
||||||
----------------------------
|
-----------------------
|
||||||
|
|
||||||
When the kernel is booted or your foo driver module is inserted, you have to
|
When the kernel is booted or your foo driver module is inserted, you have to
|
||||||
perform driver initialization. Usually it is enough to register your driver
|
perform driver initialization. Usually it is enough to register your driver
|
||||||
module at the MCB core.
|
module at the MCB core::
|
||||||
|
|
||||||
|
|
||||||
static int __init foo_init(void)
|
static int __init foo_init(void)
|
||||||
{
|
{
|
||||||
@@ -157,7 +170,6 @@ Table of Contents
|
|||||||
}
|
}
|
||||||
module_exit(foo_exit);
|
module_exit(foo_exit);
|
||||||
|
|
||||||
The module_mcb_driver() macro can be used to reduce the above code.
|
The module_mcb_driver() macro can be used to reduce the above code::
|
||||||
|
|
||||||
|
|
||||||
module_mcb_driver(foo_driver);
|
module_mcb_driver(foo_driver);
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
=============================
|
=============================
|
||||||
NO-MMU MEMORY MAPPING SUPPORT
|
No-MMU memory mapping support
|
||||||
=============================
|
=============================
|
||||||
|
|
||||||
The kernel has limited support for memory mapping under no-MMU conditions, such
|
The kernel has limited support for memory mapping under no-MMU conditions, such
|
||||||
@@ -16,7 +16,7 @@ the CLONE_VM flag.
|
|||||||
The behaviour is similar between the MMU and no-MMU cases, but not identical;
|
The behaviour is similar between the MMU and no-MMU cases, but not identical;
|
||||||
and it's also much more restricted in the latter case:
|
and it's also much more restricted in the latter case:
|
||||||
|
|
||||||
(*) Anonymous mapping, MAP_PRIVATE
|
(#) Anonymous mapping, MAP_PRIVATE
|
||||||
|
|
||||||
In the MMU case: VM regions backed by arbitrary pages; copy-on-write
|
In the MMU case: VM regions backed by arbitrary pages; copy-on-write
|
||||||
across fork.
|
across fork.
|
||||||
@@ -24,14 +24,14 @@ and it's also much more restricted in the latter case:
|
|||||||
In the no-MMU case: VM regions backed by arbitrary contiguous runs of
|
In the no-MMU case: VM regions backed by arbitrary contiguous runs of
|
||||||
pages.
|
pages.
|
||||||
|
|
||||||
(*) Anonymous mapping, MAP_SHARED
|
(#) Anonymous mapping, MAP_SHARED
|
||||||
|
|
||||||
These behave very much like private mappings, except that they're
|
These behave very much like private mappings, except that they're
|
||||||
shared across fork() or clone() without CLONE_VM in the MMU case. Since
|
shared across fork() or clone() without CLONE_VM in the MMU case. Since
|
||||||
the no-MMU case doesn't support these, behaviour is identical to
|
the no-MMU case doesn't support these, behaviour is identical to
|
||||||
MAP_PRIVATE there.
|
MAP_PRIVATE there.
|
||||||
|
|
||||||
(*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE
|
(#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, !PROT_WRITE
|
||||||
|
|
||||||
In the MMU case: VM regions backed by pages read from file; changes to
|
In the MMU case: VM regions backed by pages read from file; changes to
|
||||||
the underlying file are reflected in the mapping; copied across fork.
|
the underlying file are reflected in the mapping; copied across fork.
|
||||||
@@ -56,7 +56,7 @@ and it's also much more restricted in the latter case:
|
|||||||
are visible in other processes (no MMU protection), but should not
|
are visible in other processes (no MMU protection), but should not
|
||||||
happen.
|
happen.
|
||||||
|
|
||||||
(*) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE
|
(#) File, MAP_PRIVATE, PROT_READ / PROT_EXEC, PROT_WRITE
|
||||||
|
|
||||||
In the MMU case: like the non-PROT_WRITE case, except that the pages in
|
In the MMU case: like the non-PROT_WRITE case, except that the pages in
|
||||||
question get copied before the write actually happens. From that point
|
question get copied before the write actually happens. From that point
|
||||||
@@ -66,7 +66,7 @@ and it's also much more restricted in the latter case:
|
|||||||
In the no-MMU case: works much like the non-PROT_WRITE case, except
|
In the no-MMU case: works much like the non-PROT_WRITE case, except
|
||||||
that a copy is always taken and never shared.
|
that a copy is always taken and never shared.
|
||||||
|
|
||||||
(*) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
|
(#) Regular file / blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
|
||||||
|
|
||||||
In the MMU case: VM regions backed by pages read from file; changes to
|
In the MMU case: VM regions backed by pages read from file; changes to
|
||||||
pages written back to file; writes to file reflected into pages backing
|
pages written back to file; writes to file reflected into pages backing
|
||||||
@@ -74,7 +74,7 @@ and it's also much more restricted in the latter case:
|
|||||||
|
|
||||||
In the no-MMU case: not supported.
|
In the no-MMU case: not supported.
|
||||||
|
|
||||||
(*) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
|
(#) Memory backed regular file, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
|
||||||
|
|
||||||
In the MMU case: As for ordinary regular files.
|
In the MMU case: As for ordinary regular files.
|
||||||
|
|
||||||
@@ -85,7 +85,7 @@ and it's also much more restricted in the latter case:
|
|||||||
as for the MMU case. If the filesystem does not provide any such
|
as for the MMU case. If the filesystem does not provide any such
|
||||||
support, then the mapping request will be denied.
|
support, then the mapping request will be denied.
|
||||||
|
|
||||||
(*) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
|
(#) Memory backed blockdev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
|
||||||
|
|
||||||
In the MMU case: As for ordinary regular files.
|
In the MMU case: As for ordinary regular files.
|
||||||
|
|
||||||
@@ -94,7 +94,7 @@ and it's also much more restricted in the latter case:
|
|||||||
truncate being called. The ramdisk driver could do this if it allocated
|
truncate being called. The ramdisk driver could do this if it allocated
|
||||||
all its memory as a contiguous array upfront.
|
all its memory as a contiguous array upfront.
|
||||||
|
|
||||||
(*) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
|
(#) Memory backed chardev, MAP_SHARED, PROT_READ / PROT_EXEC / PROT_WRITE
|
||||||
|
|
||||||
In the MMU case: As for ordinary regular files.
|
In the MMU case: As for ordinary regular files.
|
||||||
|
|
||||||
@@ -105,21 +105,20 @@ and it's also much more restricted in the latter case:
|
|||||||
provide any such support, then the mapping request will be denied.
|
provide any such support, then the mapping request will be denied.
|
||||||
|
|
||||||
|
|
||||||
============================
|
Further notes on no-MMU MMAP
|
||||||
FURTHER NOTES ON NO-MMU MMAP
|
|
||||||
============================
|
============================
|
||||||
|
|
||||||
(*) A request for a private mapping of a file may return a buffer that is not
|
(#) A request for a private mapping of a file may return a buffer that is not
|
||||||
page-aligned. This is because XIP may take place, and the data may not be
|
page-aligned. This is because XIP may take place, and the data may not be
|
||||||
page aligned in the backing store.
|
page aligned in the backing store.
|
||||||
|
|
||||||
(*) A request for an anonymous mapping will always be page aligned. If
|
(#) A request for an anonymous mapping will always be page aligned. If
|
||||||
possible the size of the request should be a power of two otherwise some
|
possible the size of the request should be a power of two otherwise some
|
||||||
of the space may be wasted as the kernel must allocate a power-of-2
|
of the space may be wasted as the kernel must allocate a power-of-2
|
||||||
granule but will only discard the excess if appropriately configured as
|
granule but will only discard the excess if appropriately configured as
|
||||||
this has an effect on fragmentation.
|
this has an effect on fragmentation.
|
||||||
|
|
||||||
(*) The memory allocated by a request for an anonymous mapping will normally
|
(#) The memory allocated by a request for an anonymous mapping will normally
|
||||||
be cleared by the kernel before being returned in accordance with the
|
be cleared by the kernel before being returned in accordance with the
|
||||||
Linux man pages (ver 2.22 or later).
|
Linux man pages (ver 2.22 or later).
|
||||||
|
|
||||||
@@ -145,24 +144,23 @@ FURTHER NOTES ON NO-MMU MMAP
|
|||||||
uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this
|
uClibc uses this to speed up malloc(), and the ELF-FDPIC binfmt uses this
|
||||||
to allocate the brk and stack region.
|
to allocate the brk and stack region.
|
||||||
|
|
||||||
(*) A list of all the private copy and anonymous mappings on the system is
|
(#) A list of all the private copy and anonymous mappings on the system is
|
||||||
visible through /proc/maps in no-MMU mode.
|
visible through /proc/maps in no-MMU mode.
|
||||||
|
|
||||||
(*) A list of all the mappings in use by a process is visible through
|
(#) A list of all the mappings in use by a process is visible through
|
||||||
/proc/<pid>/maps in no-MMU mode.
|
/proc/<pid>/maps in no-MMU mode.
|
||||||
|
|
||||||
(*) Supplying MAP_FIXED or requesting a particular mapping address will
|
(#) Supplying MAP_FIXED or requesting a particular mapping address will
|
||||||
result in an error.
|
result in an error.
|
||||||
|
|
||||||
(*) Files mapped privately usually have to have a read method provided by the
|
(#) Files mapped privately usually have to have a read method provided by the
|
||||||
driver or filesystem so that the contents can be read into the memory
|
driver or filesystem so that the contents can be read into the memory
|
||||||
allocated if mmap() chooses not to map the backing device directly. An
|
allocated if mmap() chooses not to map the backing device directly. An
|
||||||
error will result if they don't. This is most likely to be encountered
|
error will result if they don't. This is most likely to be encountered
|
||||||
with character device files, pipes, fifos and sockets.
|
with character device files, pipes, fifos and sockets.
|
||||||
|
|
||||||
|
|
||||||
==========================
|
Interprocess shared memory
|
||||||
INTERPROCESS SHARED MEMORY
|
|
||||||
==========================
|
==========================
|
||||||
|
|
||||||
Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
|
Both SYSV IPC SHM shared memory and POSIX shared memory are supported in NOMMU
|
||||||
@@ -170,8 +168,7 @@ mode. The former through the usual mechanism, the latter through files created
|
|||||||
on ramfs or tmpfs mounts.
|
on ramfs or tmpfs mounts.
|
||||||
|
|
||||||
|
|
||||||
=======
|
Futexes
|
||||||
FUTEXES
|
|
||||||
=======
|
=======
|
||||||
|
|
||||||
Futexes are supported in NOMMU mode if the arch supports them. An error will
|
Futexes are supported in NOMMU mode if the arch supports them. An error will
|
||||||
@@ -180,12 +177,11 @@ mappings made by a process or if the mapping in which the address lies does not
|
|||||||
support futexes (such as an I/O chardev mapping).
|
support futexes (such as an I/O chardev mapping).
|
||||||
|
|
||||||
|
|
||||||
=============
|
No-MMU mremap
|
||||||
NO-MMU MREMAP
|
|
||||||
=============
|
=============
|
||||||
|
|
||||||
The mremap() function is partially supported. It may change the size of a
|
The mremap() function is partially supported. It may change the size of a
|
||||||
mapping, and may move it[*] if MREMAP_MAYMOVE is specified and if the new size
|
mapping, and may move it [#]_ if MREMAP_MAYMOVE is specified and if the new size
|
||||||
of the mapping exceeds the size of the slab object currently occupied by the
|
of the mapping exceeds the size of the slab object currently occupied by the
|
||||||
memory to which the mapping refers, or if a smaller slab object could be used.
|
memory to which the mapping refers, or if a smaller slab object could be used.
|
||||||
|
|
||||||
@@ -200,11 +196,10 @@ a previously mapped object. It may not be used to create holes in existing
|
|||||||
mappings, move parts of existing mappings or resize parts of mappings. It must
|
mappings, move parts of existing mappings or resize parts of mappings. It must
|
||||||
act on a complete mapping.
|
act on a complete mapping.
|
||||||
|
|
||||||
[*] Not currently supported.
|
.. [#] Not currently supported.
|
||||||
|
|
||||||
|
|
||||||
============================================
|
Providing shareable character device support
|
||||||
PROVIDING SHAREABLE CHARACTER DEVICE SUPPORT
|
|
||||||
============================================
|
============================================
|
||||||
|
|
||||||
To provide shareable character device support, a driver must provide a
|
To provide shareable character device support, a driver must provide a
|
||||||
@@ -235,7 +230,7 @@ direct the call to the device-specific driver. Under such circumstances, the
|
|||||||
mapping request will be rejected if NOMMU_MAP_COPY is not specified, and a
|
mapping request will be rejected if NOMMU_MAP_COPY is not specified, and a
|
||||||
copy mapped otherwise.
|
copy mapped otherwise.
|
||||||
|
|
||||||
IMPORTANT NOTE:
|
.. important::
|
||||||
|
|
||||||
Some types of device may present a different appearance to anyone
|
Some types of device may present a different appearance to anyone
|
||||||
looking at them in certain modes. Flash chips can be like this; for
|
looking at them in certain modes. Flash chips can be like this; for
|
||||||
@@ -249,8 +244,7 @@ IMPORTANT NOTE:
|
|||||||
circumstances!
|
circumstances!
|
||||||
|
|
||||||
|
|
||||||
==============================================
|
Providing shareable memory-backed file support
|
||||||
PROVIDING SHAREABLE MEMORY-BACKED FILE SUPPORT
|
|
||||||
==============================================
|
==============================================
|
||||||
|
|
||||||
Provision of shared mappings on memory backed files is similar to the provision
|
Provision of shared mappings on memory backed files is similar to the provision
|
||||||
@@ -267,8 +261,7 @@ Memory backed devices are indicated by the mapping's backing device info having
|
|||||||
the memory_backed flag set.
|
the memory_backed flag set.
|
||||||
|
|
||||||
|
|
||||||
========================================
|
Providing shareable block device support
|
||||||
PROVIDING SHAREABLE BLOCK DEVICE SUPPORT
|
|
||||||
========================================
|
========================================
|
||||||
|
|
||||||
Provision of shared mappings on block device files is exactly the same as for
|
Provision of shared mappings on block device files is exactly the same as for
|
||||||
@@ -276,8 +269,7 @@ character devices. If there isn't a real device underneath, then the driver
|
|||||||
should allocate sufficient contiguous memory to honour any supported mapping.
|
should allocate sufficient contiguous memory to honour any supported mapping.
|
||||||
|
|
||||||
|
|
||||||
=================================
|
Adjusting page trimming behaviour
|
||||||
ADJUSTING PAGE TRIMMING BEHAVIOUR
|
|
||||||
=================================
|
=================================
|
||||||
|
|
||||||
NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
|
NOMMU mmap automatically rounds up to the nearest power-of-2 number of pages
|
||||||
@@ -288,4 +280,4 @@ allocator. In order to retain finer-grained control over fragmentation, this
|
|||||||
behaviour can either be disabled completely, or bumped up to a higher page
|
behaviour can either be disabled completely, or bumped up to a higher page
|
||||||
watermark where trimming begins.
|
watermark where trimming begins.
|
||||||
|
|
||||||
Page trimming behaviour is configurable via the sysctl `vm.nr_trim_pages'.
|
Page trimming behaviour is configurable via the sysctl ``vm.nr_trim_pages``.
|
||||||
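The power-of-2 rounding described above can be illustrated with a small user-space sketch. It is not the kernel's implementation: the 4096-byte page size, the helper names, and the watermark check (0 disables trimming, otherwise trim once the excess reaches the watermark) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size, not taken from the text */

/* Number of whole pages needed to hold len bytes. */
unsigned long pages_needed(size_t len)
{
	return (len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Smallest power of two that is >= pages (pages must be non-zero). */
unsigned long round_up_pow2(unsigned long pages)
{
	unsigned long n = 1;

	assert(pages != 0);
	while (n < pages)
		n <<= 1;
	return n;
}

/* Pages handed out beyond what the request actually needs. */
unsigned long excess_pages(size_t len)
{
	unsigned long need = pages_needed(len);

	return round_up_pow2(need) - need;
}

/*
 * Hypothetical watermark check modelled on vm.nr_trim_pages:
 * 0 disables trimming, otherwise trim when the excess reaches the watermark.
 */
int would_trim(size_t len, unsigned long nr_trim_pages)
{
	return nr_trim_pages != 0 && excess_pages(len) >= nr_trim_pages;
}
```

Under these assumptions a 5-page request is rounded up to 8 pages, leaving 3 excess pages, which would be trimmed with a watermark of 1 but kept with a watermark of 4.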
|
|||||||
@@ -1,16 +1,21 @@
|
|||||||
# NTB Drivers
|
===========
|
||||||
|
NTB Drivers
|
||||||
|
===========
|
||||||
|
|
||||||
NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
|
NTB (Non-Transparent Bridge) is a type of PCI-Express bridge chip that connects
|
||||||
the separate memory systems of two computers to the same PCI-Express fabric.
|
the separate memory systems of two or more computers to the same PCI-Express
|
||||||
Existing NTB hardware supports a common feature set, including scratchpad
|
fabric. Existing NTB hardware supports a common feature set: doorbell
|
||||||
registers, doorbell registers, and memory translation windows. Scratchpad
|
registers and memory translation windows, as well as non-common features like
|
||||||
registers are read-and-writable registers that are accessible from either side
|
scratchpad and message registers. Scratchpad registers are read-and-writable
|
||||||
of the device, so that peers can exchange a small amount of information at a
|
registers that are accessible from either side of the device, so that peers can
|
||||||
fixed address. Doorbell registers provide a way for peers to send interrupt
|
exchange a small amount of information at a fixed address. Message registers can
|
||||||
events. Memory windows allow translated read and write access to the peer
|
be utilized for the same purpose. Additionally they are provided with
|
||||||
memory.
|
special status bits to make sure the information isn't rewritten by another
|
||||||
|
peer. Doorbell registers provide a way for peers to send interrupt events.
|
||||||
|
Memory windows allow translated read and write access to the peer memory.
|
||||||
|
|
||||||
## NTB Core Driver (ntb)
|
NTB Core Driver (ntb)
|
||||||
|
=====================
|
||||||
|
|
||||||
The NTB core driver defines an api wrapping the common feature set, and allows
|
The NTB core driver defines an api wrapping the common feature set, and allows
|
||||||
clients interested in NTB features to discover NTB the devices supported by
|
clients interested in NTB features to discover NTB the devices supported by
|
||||||
@@ -18,7 +23,8 @@ hardware drivers. The term "client" is used here to mean an upper layer
|
|||||||
component making use of the NTB api. The term "driver," or "hardware driver,"
|
component making use of the NTB api. The term "driver," or "hardware driver,"
|
||||||
is used here to mean a driver for a specific vendor and model of NTB hardware.
|
is used here to mean a driver for a specific vendor and model of NTB hardware.
|
||||||
|
|
||||||
## NTB Client Drivers
|
NTB Client Drivers
|
||||||
|
==================
|
||||||
|
|
||||||
NTB client drivers should register with the NTB core driver. After
|
NTB client drivers should register with the NTB core driver. After
|
||||||
registering, the client probe and remove functions will be called appropriately
|
registering, the client probe and remove functions will be called appropriately
|
||||||
@@ -26,7 +32,90 @@ as ntb hardware, or hardware drivers, are inserted and removed. The
|
|||||||
registration uses the Linux Device framework, so it should feel familiar to
|
registration uses the Linux Device framework, so it should feel familiar to
|
||||||
anyone who has written a pci driver.
|
anyone who has written a pci driver.
|
||||||
|
|
||||||
### NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
|
NTB Typical client driver implementation
|
||||||
|
----------------------------------------
|
||||||
|
|
||||||
|
The primary purpose of NTB is to share some piece of memory between at least two
|
||||||
|
systems. So the NTB device features like Scratchpad/Message registers are
|
||||||
|
mainly used to perform the proper memory window initialization. Typically
|
||||||
|
there are two types of memory window interfaces supported by the NTB API:
|
||||||
|
inbound translation configured on the local ntb port and outbound translation
|
||||||
|
configured by the peer, on the peer ntb port. The first type is
|
||||||
|
depicted in the next figure:
|
||||||
|
|
||||||
|
Inbound translation:
|
||||||
|
Memory: Local NTB Port: Peer NTB Port: Peer MMIO:
|
||||||
|
____________
|
||||||
|
| dma-mapped |-ntb_mw_set_trans(addr) |
|
||||||
|
| memory | _v____________ | ______________
|
||||||
|
| (addr) |<======| MW xlat addr |<====| MW base addr |<== memory-mapped IO
|
||||||
|
|------------| |--------------| | |--------------|
|
||||||
|
|
||||||
|
So a typical scenario of the first type of memory window initialization looks like this:
|
||||||
|
1) allocate a memory region, 2) put translated address to NTB config,
|
||||||
|
3) somehow notify a peer device of performed initialization, 4) peer device
|
||||||
|
maps the corresponding outbound memory window so as to have access to the shared
|
||||||
|
memory region.
|
||||||
|
|
||||||
|
The second type of interface, which implies the shared windows being
|
||||||
|
initialized by a peer device, is depicted on the figure:
|
||||||
|
|
||||||
|
Outbound translation:
|
||||||
|
Memory: Local NTB Port: Peer NTB Port: Peer MMIO:
|
||||||
|
____________ ______________
|
||||||
|
| dma-mapped | | | MW base addr |<== memory-mapped IO
|
||||||
|
| memory | | |--------------|
|
||||||
|
| (addr) |<===================| MW xlat addr |<-ntb_peer_mw_set_trans(addr)
|
||||||
|
|------------| | |--------------|
|
||||||
|
|
||||||
|
Typical scenario of the second type interface initialization would be:
|
||||||
|
1) allocate a memory region, 2) somehow deliver a translated address to a peer
|
||||||
|
device, 3) peer puts the translated address to NTB config, 4) peer device maps
|
||||||
|
outbound memory window so to have access to the shared memory region.
|
||||||
|
|
||||||
|
As one can see the described scenarios can be combined in one portable
|
||||||
|
algorithm.
|
||||||
|
Local device:
|
||||||
|
1) Allocate memory for a shared window
|
||||||
|
2) Initialize memory window by translated address of the allocated region
|
||||||
|
(it may fail if local memory window initialization is unsupported)
|
||||||
|
3) Send the translated address and memory window index to a peer device
|
||||||
|
Peer device:
|
||||||
|
1) Initialize memory window with the retrieved address of the memory region
|
||||||
|
allocated by another device (it may fail if peer memory window
|
||||||
|
initialization is unsupported)
|
||||||
|
2) Map outbound memory window
|
||||||
|
|
||||||
|
In accordance with this scenario, the NTB Memory Window API can be used as
|
||||||
|
follows:
|
||||||
|
Local device:
|
||||||
|
1) ntb_mw_count(pidx) - retrieve number of memory ranges, which can
|
||||||
|
be allocated for memory windows between local device and peer device
|
||||||
|
of port with specified index.
|
||||||
|
2) ntb_mw_get_align(pidx, midx) - retrieve parameters restricting the
|
||||||
|
shared memory region alignment and size. Then memory can be properly
|
||||||
|
allocated.
|
||||||
|
3) Allocate physically contiguous memory region in compliance with
|
||||||
|
restrictions retrieved in 2).
|
||||||
|
4) ntb_mw_set_trans(pidx, midx) - try to set translation address of
|
||||||
|
the memory window with specified index for the defined peer device
|
||||||
|
(it may fail if local translated address setting is not supported)
|
||||||
|
5) Send translated base address (usually together with memory window
|
||||||
|
number) to the peer device using, for instance, scratchpad or message
|
||||||
|
registers.
|
||||||
|
Peer device:
|
||||||
|
1) ntb_peer_mw_set_trans(pidx, midx) - try to set the translated address
|
||||||
|
received from the other device (related to pidx) for the specified memory
|
||||||
|
window. It may fail if retrieved address, for instance, exceeds
|
||||||
|
maximum possible address or isn't properly aligned.
|
||||||
|
2) ntb_peer_mw_get_addr(widx) - retrieve MMIO address to map the memory
|
||||||
|
window so as to have access to the shared memory.
|
||||||
|
|
||||||
|
It is also worth noting that ntb_mw_count(pidx) should return the
|
||||||
|
same value as ntb_peer_mw_count() on the peer with port index - pidx.
|
||||||
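The alignment and size restrictions mentioned in the steps above can be sketched in plain C. This is a user-space illustration only: `struct mw_limits` and the three helpers are hypothetical names, not part of the NTB API, and they merely model the kind of checks a client might apply to the values a hardware driver reports before allocating a buffer or programming a received address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the restrictions reported for one memory window. */
struct mw_limits {
	uint64_t addr_align;	/* required alignment of the translated address */
	uint64_t size_align;	/* required granularity of the window size */
	uint64_t size_max;	/* largest window size the hardware accepts */
};

/* Round v up to the next multiple of align (align must be non-zero). */
uint64_t mw_round_up(uint64_t v, uint64_t align)
{
	assert(align != 0);
	return (v + align - 1) / align * align;
}

/* Size to allocate for a wanted byte count, or 0 if it cannot fit. */
uint64_t mw_buffer_size(const struct mw_limits *lim, uint64_t wanted)
{
	uint64_t size = mw_round_up(wanted, lim->size_align);

	return size <= lim->size_max ? size : 0;
}

/* Sanity check a peer could run on a received address before using it. */
int mw_addr_ok(const struct mw_limits *lim, uint64_t addr, uint64_t size)
{
	assert(lim->addr_align != 0 && lim->size_align != 0);
	if (addr % lim->addr_align != 0)
		return 0;
	if (size == 0 || size > lim->size_max)
		return 0;
	return size % lim->size_align == 0;
}
```

With 4 KiB alignment and a 1 MiB maximum, a 5000-byte request would be allocated as 8192 bytes, while an unaligned address such as 0x10001 would be rejected before any translation is programmed.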
|
|
||||||
|
NTB Transport Client (ntb\_transport) and NTB Netdev (ntb\_netdev)
|
||||||
|
------------------------------------------------------------------
|
||||||
|
|
||||||
The primary client for NTB is the Transport client, used in tandem with NTB
|
The primary client for NTB is the Transport client, used in tandem with NTB
|
||||||
Netdev. These drivers function together to create a logical link to the peer,
|
Netdev. These drivers function together to create a logical link to the peer,
|
||||||
@@ -37,7 +126,8 @@ Transport queue pair. Network data is copied between socket buffers and the
|
|||||||
Transport queue pair buffer. The Transport client may be used for other things
|
Transport queue pair buffer. The Transport client may be used for other things
|
||||||
besides Netdev, however no other applications have yet been written.
|
besides Netdev, however no other applications have yet been written.
|
||||||
|
|
||||||
### NTB Ping Pong Test Client (ntb\_pingpong)
|
NTB Ping Pong Test Client (ntb\_pingpong)
|
||||||
|
-----------------------------------------
|
||||||
|
|
||||||
The Ping Pong test client serves as a demonstration to exercise the doorbell
|
The Ping Pong test client serves as a demonstration to exercise the doorbell
|
||||||
and scratchpad registers of NTB hardware, and as an example simple NTB client.
|
and scratchpad registers of NTB hardware, and as an example simple NTB client.
|
||||||
@@ -64,7 +154,8 @@ Module Parameters:
|
|||||||
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
|
* dyndbg - It is suggested to specify dyndbg=+p when loading this module, and
|
||||||
then to observe debugging output on the console.
|
then to observe debugging output on the console.
|
||||||
|
|
||||||
### NTB Tool Test Client (ntb\_tool)
|
NTB Tool Test Client (ntb\_tool)
|
||||||
|
--------------------------------
|
||||||
|
|
||||||
The Tool test client serves for debugging, primarily, ntb hardware and drivers.
|
The Tool test client serves for debugging, primarily, ntb hardware and drivers.
|
||||||
The Tool provides access through debugfs for reading, setting, and clearing the
|
The Tool provides access through debugfs for reading, setting, and clearing the
|
||||||
@@ -74,48 +165,60 @@ The Tool does not currently have any module parameters.
|
|||||||
|
|
||||||
Debugfs Files:
|
Debugfs Files:
|
||||||
|
|
||||||
* *debugfs*/ntb\_tool/*hw*/ - A directory in debugfs will be created for each
|
* *debugfs*/ntb\_tool/*hw*/
|
||||||
|
A directory in debugfs will be created for each
|
||||||
NTB device probed by the tool. This directory is shortened to *hw*
|
NTB device probed by the tool. This directory is shortened to *hw*
|
||||||
below.
|
below.
|
||||||
* *hw*/db - This file is used to read, set, and clear the local doorbell. Not
|
* *hw*/db
|
||||||
|
This file is used to read, set, and clear the local doorbell. Not
|
||||||
all operations may be supported by all hardware. To read the doorbell,
|
all operations may be supported by all hardware. To read the doorbell,
|
||||||
read the file. To set the doorbell, write `s` followed by the bits to
|
read the file. To set the doorbell, write `s` followed by the bits to
|
||||||
set (eg: `echo 's 0x0101' > db`). To clear the doorbell, write `c`
|
set (eg: `echo 's 0x0101' > db`). To clear the doorbell, write `c`
|
||||||
followed by the bits to clear.
|
followed by the bits to clear.
|
||||||
* *hw*/mask - This file is used to read, set, and clear the local doorbell mask.
|
* *hw*/mask
|
||||||
|
This file is used to read, set, and clear the local doorbell mask.
|
||||||
See *db* for details.
|
See *db* for details.
|
||||||
* *hw*/peer\_db - This file is used to read, set, and clear the peer doorbell.
|
* *hw*/peer\_db
|
||||||
|
This file is used to read, set, and clear the peer doorbell.
|
||||||
See *db* for details.
|
See *db* for details.
|
||||||
* *hw*/peer\_mask - This file is used to read, set, and clear the peer doorbell
|
* *hw*/peer\_mask
|
||||||
|
This file is used to read, set, and clear the peer doorbell
|
||||||
mask. See *db* for details.
|
mask. See *db* for details.
|
||||||
* *hw*/spad - This file is used to read and write local scratchpads. To read
|
* *hw*/spad
|
||||||
|
This file is used to read and write local scratchpads. To read
|
||||||
the values of all scratchpads, read the file. To write values, write a
|
the values of all scratchpads, read the file. To write values, write a
|
||||||
series of pairs of scratchpad number and value
|
series of pairs of scratchpad number and value
|
||||||
(eg: `echo '4 0x123 7 0xabc' > spad`
|
(eg: `echo '4 0x123 7 0xabc' > spad`
|
||||||
# to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively).
|
# to set scratchpads `4` and `7` to `0x123` and `0xabc`, respectively).
|
||||||
* *hw*/peer\_spad - This file is used to read and write peer scratchpads. See
|
* *hw*/peer\_spad
|
||||||
|
This file is used to read and write peer scratchpads. See
|
||||||
*spad* for details.
|
*spad* for details.
|
||||||
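The command strings written to the doorbell and scratchpad files above follow a simple textual format. As a rough sketch, a test program might build them as follows; the helper names are hypothetical, and the exact formatting (four hex digits for doorbell bits) is chosen only to reproduce the examples in the text.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Build a doorbell command such as "s 0x0101" (set) or "c 0x0101" (clear).
 * Returns the snprintf() length, or -1 for an unknown operation.
 */
int fmt_db_cmd(char *buf, size_t len, char op, uint64_t bits)
{
	if (op != 's' && op != 'c')
		return -1;
	return snprintf(buf, len, "%c 0x%04" PRIx64, op, bits);
}

/* Build one scratchpad "number value" pair such as "4 0x123". */
int fmt_spad_pair(char *buf, size_t len, unsigned int idx, uint32_t val)
{
	return snprintf(buf, len, "%u 0x%" PRIx32, idx, val);
}
```

The resulting strings would then be written to the `db` or `spad` file in the same way as the `echo` examples; several scratchpad pairs can be joined with spaces.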
|
|
||||||
## NTB Hardware Drivers
|
NTB Hardware Drivers
|
||||||
|
====================
|
||||||
|
|
||||||
NTB hardware drivers should register devices with the NTB core driver. After
|
NTB hardware drivers should register devices with the NTB core driver. After
|
||||||
registering, clients probe and remove functions will be called.
|
registering, clients probe and remove functions will be called.
|
||||||
|
|
||||||
### NTB Intel Hardware Driver (ntb\_hw\_intel)
|
NTB Intel Hardware Driver (ntb\_hw\_intel)
|
||||||
|
------------------------------------------
|
||||||
|
|
||||||
The Intel hardware driver supports NTB on Xeon and Atom CPUs.
|
The Intel hardware driver supports NTB on Xeon and Atom CPUs.
|
||||||
|
|
||||||
Module Parameters:
|
Module Parameters:
|
||||||
|
|
||||||
* b2b\_mw\_idx - If the peer ntb is to be accessed via a memory window, then use
|
* b2b\_mw\_idx
|
||||||
|
If the peer ntb is to be accessed via a memory window, then use
|
||||||
this memory window to access the peer ntb. A value of zero or positive
|
this memory window to access the peer ntb. A value of zero or positive
|
||||||
starts from the first mw idx, and a negative value starts from the last
|
starts from the first mw idx, and a negative value starts from the last
|
||||||
mw idx. Both sides MUST set the same value here! The default value is
|
mw idx. Both sides MUST set the same value here! The default value is
|
||||||
`-1`.
|
`-1`.
|
||||||
* b2b\_mw\_share - If the peer ntb is to be accessed via a memory window, and if
|
* b2b\_mw\_share
|
||||||
|
If the peer ntb is to be accessed via a memory window, and if
|
||||||
the memory window is large enough, still allow the client to use the
|
the memory window is large enough, still allow the client to use the
|
||||||
second half of the memory window for address translation to the peer.
|
second half of the memory window for address translation to the peer.
|
||||||
* xeon\_b2b\_usd\_bar2\_addr64 - If using B2B topology on Xeon hardware, use
|
* xeon\_b2b\_usd\_bar2\_addr64
|
||||||
|
If using B2B topology on Xeon hardware, use
|
||||||
this 64 bit address on the bus between the NTB devices for the window
|
this 64 bit address on the bus between the NTB devices for the window
|
||||||
at BAR2, on the upstream side of the link.
|
at BAR2, on the upstream side of the link.
|
||||||
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_bar2\_addr64*.
|
* xeon\_b2b\_usd\_bar4\_addr64 - See *xeon\_b2b\_bar2\_addr64*.
|
||||||
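The indexing rule for b2b\_mw\_idx (non-negative values count from the first memory window, negative values from the last) can be sketched as below. This is an illustration with a hypothetical helper name, not the driver's code; the range check and the -1 error return are assumptions added for the example.

```c
#include <assert.h>

/*
 * Resolve a b2b_mw_idx-style value against the number of memory windows.
 * Non-negative values index from the first window, negative values from
 * the last. Returns the resolved index, or -1 if it falls out of range.
 */
int resolve_b2b_mw_idx(int b2b_mw_idx, int mw_count)
{
	int idx;

	assert(mw_count > 0);
	idx = b2b_mw_idx < 0 ? b2b_mw_idx + mw_count : b2b_mw_idx;
	return (idx >= 0 && idx < mw_count) ? idx : -1;
}
```

With four memory windows, the default of `-1` resolves to index 3 (the last window), which is why both sides have to agree on the value.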
|
|||||||
@@ -1,10 +1,12 @@
|
|||||||
|
===============================
|
||||||
Numa policy hit/miss statistics
|
Numa policy hit/miss statistics
|
||||||
|
===============================
|
||||||
|
|
||||||
/sys/devices/system/node/node*/numastat
|
/sys/devices/system/node/node*/numastat
|
||||||
|
|
||||||
All units are pages. Hugepages have separate counters.
|
All units are pages. Hugepages have separate counters.
|
||||||
|
|
||||||
|
=============== ============================================================
|
||||||
numa_hit A process wanted to allocate memory from this node,
|
numa_hit A process wanted to allocate memory from this node,
|
||||||
and succeeded.
|
and succeeded.
|
||||||
|
|
||||||
@@ -20,6 +22,7 @@ other_node A process ran on this node and got memory from another node.
|
|||||||
|
|
||||||
interleave_hit Interleaving wanted to allocate from this node
|
interleave_hit Interleaving wanted to allocate from this node
|
||||||
and succeeded.
|
and succeeded.
|
||||||
|
=============== ============================================================
|
||||||
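Each line of the numastat file is a counter name followed by a page count. A minimal parsing sketch, assuming that `name value` layout and using a hypothetical helper name, could look like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one "name value" line such as "numa_hit 12345".
 * Copies the counter name into name (at most name_len bytes including
 * the terminator) and stores the page count in *value.
 * Returns 1 on success, 0 if the line does not match or the name is too long.
 */
int parse_numastat_line(const char *line, char *name, size_t name_len,
			unsigned long long *value)
{
	char key[32];
	unsigned long long v;

	if (sscanf(line, "%31s %llu", key, &v) != 2)
		return 0;
	if (strlen(key) >= name_len)
		return 0;
	strcpy(name, key);
	*value = v;
	return 1;
}
```

A caller would read the file line by line (for example with `fgets()`) and pass each line to the helper; remember that the values are in pages, not bytes.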
|
|
||||||
For easier reading you can use the numastat utility from the numactl package
|
For easier reading you can use the numastat utility from the numactl package
|
||||||
(http://oss.sgi.com/projects/libnuma/). Note that it only works
|
(http://oss.sgi.com/projects/libnuma/). Note that it only works
|
||||||
|
|||||||
@@ -1,5 +1,8 @@
|
|||||||
|
=======================================
|
||||||
The padata parallel execution mechanism
|
The padata parallel execution mechanism
|
||||||
Last updated for 2.6.36
|
=======================================
|
||||||
|
|
||||||
|
:Last updated: for 2.6.36
|
||||||
|
|
||||||
Padata is a mechanism by which the kernel can farm work out to be done in
|
Padata is a mechanism by which the kernel can farm work out to be done in
|
||||||
parallel on multiple CPUs while retaining the ordering of tasks. It was
|
parallel on multiple CPUs while retaining the ordering of tasks. It was
|
||||||
@@ -9,7 +12,7 @@ those packets. The crypto developers made a point of writing padata in a
|
|||||||
sufficiently general fashion that it could be put to other uses as well.
|
sufficiently general fashion that it could be put to other uses as well.
|
||||||
|
|
||||||
The first step in using padata is to set up a padata_instance structure for
|
The first step in using padata is to set up a padata_instance structure for
|
||||||
overall control of how tasks are to be run:
|
overall control of how tasks are to be run::
|
||||||
|
|
||||||
#include <linux/padata.h>
|
#include <linux/padata.h>
|
||||||
|
|
||||||
@@ -24,7 +27,7 @@ The workqueue wq is where the work will actually be done; it should be
|
|||||||
a multithreaded queue, naturally.
|
a multithreaded queue, naturally.
|
||||||
|
|
||||||
To allocate a padata instance with the cpu_possible_mask for both
|
To allocate a padata instance with the cpu_possible_mask for both
|
||||||
cpumasks this helper function can be used:
|
cpumasks this helper function can be used::
|
||||||
|
|
||||||
struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq);
|
struct padata_instance *padata_alloc_possible(struct workqueue_struct *wq);
|
||||||
|
|
||||||
@@ -36,7 +39,7 @@ it is legal to supply a cpumask to padata that contains offline CPUs.
|
|||||||
Once an offline CPU in the user supplied cpumask comes online, padata
|
Once an offline CPU in the user supplied cpumask comes online, padata
|
||||||
is going to use it.
|
is going to use it.
|
||||||
|
|
||||||
There are functions for enabling and disabling the instance:
|
There are functions for enabling and disabling the instance::
|
||||||
|
|
||||||
int padata_start(struct padata_instance *pinst);
|
int padata_start(struct padata_instance *pinst);
|
||||||
void padata_stop(struct padata_instance *pinst);
|
void padata_stop(struct padata_instance *pinst);
|
||||||
@@ -48,7 +51,7 @@ padata cpumask contains no active CPU (flag not set).
|
|||||||
padata_stop clears the flag and blocks until the padata instance
|
padata_stop clears the flag and blocks until the padata instance
|
||||||
is unused.
|
is unused.
|
||||||
|
|
||||||
The list of CPUs to be used can be adjusted with these functions:
|
The list of CPUs to be used can be adjusted with these functions::
|
||||||
|
|
||||||
int padata_set_cpumasks(struct padata_instance *pinst,
|
int padata_set_cpumasks(struct padata_instance *pinst,
|
||||||
cpumask_var_t pcpumask,
|
cpumask_var_t pcpumask,
|
||||||
@@ -71,12 +74,12 @@ padata_add_cpu/padata_remove_cpu are used. cpu specifies the CPU to add or
|
|||||||
remove and mask is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL.
|
remove and mask is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL.
|
||||||
|
|
||||||
If a user is interested in padata cpumask changes, he can register to
|
If a user is interested in padata cpumask changes, he can register to
|
||||||
the padata cpumask change notifier:
|
the padata cpumask change notifier::
|
||||||
|
|
||||||
int padata_register_cpumask_notifier(struct padata_instance *pinst,
|
int padata_register_cpumask_notifier(struct padata_instance *pinst,
|
||||||
struct notifier_block *nblock);
|
struct notifier_block *nblock);
|
||||||
|
|
||||||
To unregister from that notifier:
|
To unregister from that notifier::
|
||||||
|
|
||||||
int padata_unregister_cpumask_notifier(struct padata_instance *pinst,
|
int padata_unregister_cpumask_notifier(struct padata_instance *pinst,
|
||||||
struct notifier_block *nblock);
|
struct notifier_block *nblock);
|
||||||
@@ -84,7 +87,7 @@ To unregister from that notifier:
|
|||||||
The padata cpumask change notifier notifies about changes of the usable
|
The padata cpumask change notifier notifies about changes of the usable
|
||||||
cpumasks, i.e. the subset of active CPUs in the user supplied cpumask.
|
cpumasks, i.e. the subset of active CPUs in the user supplied cpumask.
|
||||||
|
|
||||||
Padata calls the notifier chain with:
|
Padata calls the notifier chain with::
|
||||||
|
|
||||||
blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
|
blocking_notifier_call_chain(&pinst->cpumask_change_notifier,
|
||||||
notification_mask,
|
notification_mask,
|
||||||
@@ -95,7 +98,7 @@ is one of PADATA_CPU_SERIAL, PADATA_CPU_PARALLEL and cpumask is a pointer
|
|||||||
to a struct padata_cpumask that contains the new cpumask information.
|
to a struct padata_cpumask that contains the new cpumask information.
|
||||||
|
|
||||||
Actually submitting work to the padata instance requires the creation of a
|
Actually submitting work to the padata instance requires the creation of a
|
||||||
padata_priv structure:
|
padata_priv structure::
|
||||||
|
|
||||||
struct padata_priv {
|
struct padata_priv {
|
||||||
/* Other stuff here... */
|
/* Other stuff here... */
|
||||||
@@ -110,7 +113,7 @@ parallel() and serial() functions should be provided. Those functions will
|
|||||||
be called in the process of getting the work done as we will see
|
be called in the process of getting the work done as we will see
|
||||||
momentarily.
|
momentarily.
|
||||||
|
|
||||||
The submission of work is done with:
|
The submission of work is done with::
|
||||||
|
|
||||||
int padata_do_parallel(struct padata_instance *pinst,
|
int padata_do_parallel(struct padata_instance *pinst,
|
||||||
struct padata_priv *padata, int cb_cpu);
|
struct padata_priv *padata, int cb_cpu);
|
||||||
@@ -138,7 +141,7 @@ need not be completed during this call, but, if parallel() leaves work
|
|||||||
outstanding, it should be prepared to be called again with a new job before
|
outstanding, it should be prepared to be called again with a new job before
|
||||||
the previous one completes. When a task does complete, parallel() (or
|
the previous one completes. When a task does complete, parallel() (or
|
||||||
whatever function actually finishes the job) should inform padata of the
|
whatever function actually finishes the job) should inform padata of the
|
||||||
fact with a call to:
|
fact with a call to::
|
||||||
|
|
||||||
void padata_do_serial(struct padata_priv *padata);
|
void padata_do_serial(struct padata_priv *padata);
|
||||||
|
|
||||||
@@ -151,7 +154,7 @@ pains to ensure that tasks are completed in the order in which they were
|
|||||||
submitted.
|
submitted.
|
||||||
|
|
||||||
The one remaining function in the padata API should be called to clean up
|
The one remaining function in the padata API should be called to clean up
|
||||||
when a padata instance is no longer needed:
|
when a padata instance is no longer needed::
|
||||||
|
|
||||||
void padata_free(struct padata_instance *pinst);
|
void padata_free(struct padata_instance *pinst);
|
||||||
|
|
||||||
|
|||||||
File diff suppressed because it is too large
@@ -1,5 +1,6 @@
|
|||||||
|
====================
|
||||||
Percpu rw semaphores
|
Percpu rw semaphores
|
||||||
--------------------
|
====================
|
||||||
|
|
||||||
Percpu rw semaphores is a new read-write semaphore design that is
|
Percpu rw semaphores is a new read-write semaphore design that is
|
||||||
optimized for locking for reading.
|
optimized for locking for reading.
|
||||||
|
|||||||
@@ -1,10 +1,14 @@
|
|||||||
PHY SUBSYSTEM
|
=============
|
||||||
Kishon Vijay Abraham I <kishon@ti.com>
|
PHY subsystem
|
||||||
|
=============
|
||||||
|
|
||||||
|
:Author: Kishon Vijay Abraham I <kishon@ti.com>
|
||||||
|
|
||||||
This document explains the Generic PHY Framework along with the APIs provided,
|
This document explains the Generic PHY Framework along with the APIs provided,
|
||||||
and how-to-use.
|
and how-to-use.
|
||||||
|
|
||||||
1. Introduction
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
*PHY* is the abbreviation for physical layer. It is used to connect a device
|
*PHY* is the abbreviation for physical layer. It is used to connect a device
|
||||||
to the physical medium e.g., the USB controller has a PHY to provide functions
|
to the physical medium e.g., the USB controller has a PHY to provide functions
|
||||||
@@ -21,7 +25,8 @@ better code maintainability.
|
|||||||
This framework will be of use only to devices that use external PHY (PHY
|
This framework will be of use only to devices that use external PHY (PHY
|
||||||
functionality is not embedded within the controller).
|
functionality is not embedded within the controller).
|
||||||
|
|
||||||
2. Registering/Unregistering the PHY provider
|
Registering/Unregistering the PHY provider
|
||||||
|
==========================================
|
||||||
|
|
||||||
PHY provider refers to an entity that implements one or more PHY instances.
|
PHY provider refers to an entity that implements one or more PHY instances.
|
||||||
For the simple case where the PHY provider implements only a single instance of
|
For the simple case where the PHY provider implements only a single instance of
|
||||||
@@ -30,11 +35,14 @@ of_phy_simple_xlate. If the PHY provider implements multiple instances, it
|
|||||||
should provide its own implementation of of_xlate. of_xlate is used only for
|
should provide its own implementation of of_xlate. of_xlate is used only for
|
||||||
dt boot case.
|
dt boot case.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#define of_phy_provider_register(dev, xlate) \
|
#define of_phy_provider_register(dev, xlate) \
|
||||||
__of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate))
|
__of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate))
|
||||||
|
|
||||||
#define devm_of_phy_provider_register(dev, xlate) \
|
#define devm_of_phy_provider_register(dev, xlate) \
|
||||||
__devm_of_phy_provider_register((dev), NULL, THIS_MODULE, (xlate))
|
__devm_of_phy_provider_register((dev), NULL, THIS_MODULE,
|
||||||
|
(xlate))
|
||||||
|
|
||||||
of_phy_provider_register and devm_of_phy_provider_register macros can be used to
|
of_phy_provider_register and devm_of_phy_provider_register macros can be used to
|
||||||
register the phy_provider and it takes device and of_xlate as
|
register the phy_provider and it takes device and of_xlate as
|
||||||
@@ -47,11 +55,14 @@ nodes within extra levels for context and extensibility, in which case the low
|
|||||||
level of_phy_provider_register_full() and devm_of_phy_provider_register_full()
|
level of_phy_provider_register_full() and devm_of_phy_provider_register_full()
|
||||||
macros can be used to override the node containing the children.
|
macros can be used to override the node containing the children.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#define of_phy_provider_register_full(dev, children, xlate) \
|
#define of_phy_provider_register_full(dev, children, xlate) \
|
||||||
__of_phy_provider_register(dev, children, THIS_MODULE, xlate)
|
__of_phy_provider_register(dev, children, THIS_MODULE, xlate)
|
||||||
|
|
||||||
#define devm_of_phy_provider_register_full(dev, children, xlate) \
|
#define devm_of_phy_provider_register_full(dev, children, xlate) \
|
||||||
__devm_of_phy_provider_register_full(dev, children, THIS_MODULE, xlate)
|
__devm_of_phy_provider_register_full(dev, children,
|
||||||
|
THIS_MODULE, xlate)
|
||||||
|
|
||||||
void devm_of_phy_provider_unregister(struct device *dev,
|
void devm_of_phy_provider_unregister(struct device *dev,
|
||||||
struct phy_provider *phy_provider);
|
struct phy_provider *phy_provider);
|
||||||
@@ -60,14 +71,18 @@ void of_phy_provider_unregister(struct phy_provider *phy_provider);
|
|||||||
devm_of_phy_provider_unregister and of_phy_provider_unregister can be used to
|
devm_of_phy_provider_unregister and of_phy_provider_unregister can be used to
|
||||||
unregister the PHY.
|
unregister the PHY.
|
||||||
|
|
||||||
3. Creating the PHY
|
Creating the PHY
|
||||||
|
================
|
||||||
|
|
||||||
The PHY driver should create the PHY in order for other peripheral controllers
|
The PHY driver should create the PHY in order for other peripheral controllers
|
||||||
to make use of it. The PHY framework provides 2 APIs to create the PHY.
|
to make use of it. The PHY framework provides 2 APIs to create the PHY.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct phy *phy_create(struct device *dev, struct device_node *node,
|
struct phy *phy_create(struct device *dev, struct device_node *node,
|
||||||
const struct phy_ops *ops);
|
const struct phy_ops *ops);
|
||||||
struct phy *devm_phy_create(struct device *dev, struct device_node *node,
|
struct phy *devm_phy_create(struct device *dev,
|
||||||
|
struct device_node *node,
|
||||||
const struct phy_ops *ops);
|
const struct phy_ops *ops);
|
||||||
|
|
||||||
The PHY drivers can use one of the above 2 APIs to create the PHY by passing
|
The PHY drivers can use one of the above 2 APIs to create the PHY by passing
|
||||||
@@ -84,11 +99,15 @@ phy_ops to get back the private data.
|
|||||||
Before the controller can make use of the PHY, it has to get a reference to
|
Before the controller can make use of the PHY, it has to get a reference to
|
||||||
it. This framework provides the following APIs to get a reference to the PHY.
|
it. This framework provides the following APIs to get a reference to the PHY.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct phy *phy_get(struct device *dev, const char *string);
|
struct phy *phy_get(struct device *dev, const char *string);
|
||||||
struct phy *phy_optional_get(struct device *dev, const char *string);
|
struct phy *phy_optional_get(struct device *dev, const char *string);
|
||||||
struct phy *devm_phy_get(struct device *dev, const char *string);
|
struct phy *devm_phy_get(struct device *dev, const char *string);
|
||||||
struct phy *devm_phy_optional_get(struct device *dev, const char *string);
|
struct phy *devm_phy_optional_get(struct device *dev,
|
||||||
struct phy *devm_of_phy_get_by_index(struct device *dev, struct device_node *np,
|
const char *string);
|
||||||
|
struct phy *devm_of_phy_get_by_index(struct device *dev,
|
||||||
|
struct device_node *np,
|
||||||
int index);
|
int index);
|
||||||
|
|
||||||
phy_get, phy_optional_get, devm_phy_get and devm_phy_optional_get can
|
phy_get, phy_optional_get, devm_phy_get and devm_phy_optional_get can
|
||||||
@@ -111,22 +130,26 @@ the phy_init() and phy_exit() calls, and phy_power_on() and
|
|||||||
phy_power_off() calls are all NOP when applied to a NULL phy. The NULL
|
phy_power_off() calls are all NOP when applied to a NULL phy. The NULL
|
||||||
phy is useful in devices for handling optional phy devices.
|
phy is useful in devices for handling optional phy devices.
|
||||||
|
|
||||||
5. Releasing a reference to the PHY
|
Releasing a reference to the PHY
|
||||||
|
================================
|
||||||
|
|
||||||
When the controller no longer needs the PHY, it has to release the reference
|
When the controller no longer needs the PHY, it has to release the reference
|
||||||
to the PHY it has obtained using the APIs mentioned in the above section. The
|
to the PHY it has obtained using the APIs mentioned in the above section. The
|
||||||
PHY framework provides 2 APIs to release a reference to the PHY.
|
PHY framework provides 2 APIs to release a reference to the PHY.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void phy_put(struct phy *phy);
|
void phy_put(struct phy *phy);
|
||||||
void devm_phy_put(struct device *dev, struct phy *phy);
|
void devm_phy_put(struct device *dev, struct phy *phy);
|
||||||
|
|
||||||
Both these APIs are used to release a reference to the PHY and devm_phy_put
|
Both these APIs are used to release a reference to the PHY and devm_phy_put
|
||||||
destroys the devres associated with this PHY.
|
destroys the devres associated with this PHY.
|
||||||
|
|
||||||
6. Destroying the PHY
|
Destroying the PHY
|
||||||
|
==================
|
||||||
|
|
||||||
When the driver that created the PHY is unloaded, it should destroy the PHY it
|
When the driver that created the PHY is unloaded, it should destroy the PHY it
|
||||||
created using one of the following 2 APIs.
|
created using one of the following 2 APIs::
|
||||||
|
|
||||||
void phy_destroy(struct phy *phy);
|
void phy_destroy(struct phy *phy);
|
||||||
void devm_phy_destroy(struct device *dev, struct phy *phy);
|
void devm_phy_destroy(struct device *dev, struct phy *phy);
|
||||||
@@ -134,7 +157,8 @@ void devm_phy_destroy(struct device *dev, struct phy *phy);
|
|||||||
Both these APIs destroy the PHY and devm_phy_destroy destroys the devres
|
Both these APIs destroy the PHY and devm_phy_destroy destroys the devres
|
||||||
associated with this PHY.
|
associated with this PHY.
|
||||||
|
|
||||||
7. PM Runtime
|
PM Runtime
|
||||||
|
==========
|
||||||
|
|
||||||
This subsystem is pm runtime enabled. So while creating the PHY,
|
This subsystem is pm runtime enabled. So while creating the PHY,
|
||||||
pm_runtime_enable of the phy device created by this subsystem is called and
|
pm_runtime_enable of the phy device created by this subsystem is called and
|
||||||
@@ -150,7 +174,8 @@ There are exported APIs like phy_pm_runtime_get, phy_pm_runtime_get_sync,
|
|||||||
phy_pm_runtime_put, phy_pm_runtime_put_sync, phy_pm_runtime_allow and
|
phy_pm_runtime_put, phy_pm_runtime_put_sync, phy_pm_runtime_allow and
|
||||||
phy_pm_runtime_forbid for performing PM operations.
|
phy_pm_runtime_forbid for performing PM operations.
|
||||||
|
|
||||||
8. PHY Mappings
|
PHY Mappings
|
||||||
|
============
|
||||||
|
|
||||||
In order to get reference to a PHY without help from DeviceTree, the framework
|
In order to get reference to a PHY without help from DeviceTree, the framework
|
||||||
offers lookups which can be compared to clkdev that allow clk structures to be
|
offers lookups which can be compared to clkdev that allow clk structures to be
|
||||||
@@ -158,12 +183,15 @@ bound to devices. A lookup can be made be made during runtime when a handle to
|
|||||||
the struct phy already exists.
|
the struct phy already exists.
|
||||||
|
|
||||||
The framework offers the following API for registering and unregistering the
|
The framework offers the following API for registering and unregistering the
|
||||||
lookups.
|
lookups::
|
||||||
|
|
||||||
int phy_create_lookup(struct phy *phy, const char *con_id, const char *dev_id);
|
int phy_create_lookup(struct phy *phy, const char *con_id,
|
||||||
void phy_remove_lookup(struct phy *phy, const char *con_id, const char *dev_id);
|
const char *dev_id);
|
||||||
|
void phy_remove_lookup(struct phy *phy, const char *con_id,
|
||||||
|
const char *dev_id);
|
||||||
|
|
||||||
9. DeviceTree Binding
|
DeviceTree Binding
|
||||||
|
==================
|
||||||
|
|
||||||
The documentation for PHY dt binding can be found @
|
The documentation for PHY dt binding can be found @
|
||||||
Documentation/devicetree/bindings/phy/phy-bindings.txt
|
Documentation/devicetree/bindings/phy/phy-bindings.txt
|
||||||
|
|||||||
@@ -1,5 +1,6 @@
|
|||||||
|
======================
|
||||||
Lightweight PI-futexes
|
Lightweight PI-futexes
|
||||||
----------------------
|
======================
|
||||||
|
|
||||||
We are calling them lightweight for 3 reasons:
|
We are calling them lightweight for 3 reasons:
|
||||||
|
|
||||||
@@ -25,8 +26,8 @@ determinism and well-bound latencies. Even in the worst-case, PI will
|
|||||||
improve the statistical distribution of locking related application
|
improve the statistical distribution of locking related application
|
||||||
delays.
|
delays.
|
||||||
|
|
||||||
The longer reply:
|
The longer reply
|
||||||
-----------------
|
----------------
|
||||||
|
|
||||||
Firstly, sharing locks between multiple tasks is a common programming
|
Firstly, sharing locks between multiple tasks is a common programming
|
||||||
technique that often cannot be replaced with lockless algorithms. As we
|
technique that often cannot be replaced with lockless algorithms. As we
|
||||||
@@ -71,8 +72,8 @@ deterministic execution of the high-prio task: any medium-priority task
|
|||||||
could preempt the low-prio task while it holds the shared lock and
|
could preempt the low-prio task while it holds the shared lock and
|
||||||
executes the critical section, and could delay it indefinitely.
|
executes the critical section, and could delay it indefinitely.
|
||||||
|
|
||||||
Implementation:
|
Implementation
|
||||||
---------------
|
--------------
|
||||||
|
|
||||||
As mentioned before, the userspace fastpath of PI-enabled pthread
|
As mentioned before, the userspace fastpath of PI-enabled pthread
|
||||||
mutexes involves no kernel work at all - they behave quite similarly to
|
mutexes involves no kernel work at all - they behave quite similarly to
|
||||||
@@ -83,8 +84,8 @@ entering the kernel.
|
|||||||
|
|
||||||
To handle the slowpath, we have added two new futex ops:
|
To handle the slowpath, we have added two new futex ops:
|
||||||
|
|
||||||
FUTEX_LOCK_PI
|
- FUTEX_LOCK_PI
|
||||||
FUTEX_UNLOCK_PI
|
- FUTEX_UNLOCK_PI
|
||||||
|
|
||||||
If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to
|
If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to
|
||||||
TID fails], then FUTEX_LOCK_PI is called. The kernel does all the
|
TID fails], then FUTEX_LOCK_PI is called. The kernel does all the
|
||||||
|
|||||||
@@ -1,45 +1,57 @@
|
|||||||
|
=================================
|
||||||
Linux Plug and Play Documentation
|
Linux Plug and Play Documentation
|
||||||
by Adam Belay <ambx1@neo.rr.com>
|
=================================
|
||||||
last updated: Oct. 16, 2002
|
|
||||||
---------------------------------------------------------------------------------------
|
|
||||||
|
|
||||||
|
:Author: Adam Belay <ambx1@neo.rr.com>
|
||||||
|
:Last updated: Oct. 16, 2002
|
||||||
|
|
||||||
|
|
||||||
Overview
|
Overview
|
||||||
--------
|
--------
|
||||||
|
|
||||||
Plug and Play provides a means of detecting and setting resources for legacy or
|
Plug and Play provides a means of detecting and setting resources for legacy or
|
||||||
otherwise unconfigurable devices. The Linux Plug and Play Layer provides these
|
otherwise unconfigurable devices. The Linux Plug and Play Layer provides these
|
||||||
services to compatible drivers.
|
services to compatible drivers.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
The User Interface
|
The User Interface
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
The Linux Plug and Play user interface provides a means to activate PnP devices
|
The Linux Plug and Play user interface provides a means to activate PnP devices
|
||||||
for legacy and user level drivers that do not support Linux Plug and Play. The
|
for legacy and user level drivers that do not support Linux Plug and Play. The
|
||||||
user interface is integrated into sysfs.
|
user interface is integrated into sysfs.
|
||||||
|
|
||||||
In addition to the standard sysfs file the following are created in each
|
In addition to the standard sysfs file the following are created in each
|
||||||
device's directory:
|
device's directory:
|
||||||
id - displays a list of supported EISA IDs
|
- id - displays a list of supported EISA IDs
|
||||||
options - displays possible resource configurations
|
- options - displays possible resource configurations
|
||||||
resources - displays currently allocated resources and allows resource changes
|
- resources - displays currently allocated resources and allows resource changes
|
||||||
|
|
||||||
-activating a device
|
activating a device
|
||||||
|
^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
# echo "auto" > resources
|
# echo "auto" > resources
|
||||||
|
|
||||||
this will invoke the automatic resource config system to activate the device
|
this will invoke the automatic resource config system to activate the device
|
||||||
|
|
||||||
-manually activating a device
|
manually activating a device
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
# echo "manual <depnum> <mode>" > resources
|
# echo "manual <depnum> <mode>" > resources
|
||||||
|
|
||||||
<depnum> - the configuration number
|
<depnum> - the configuration number
|
||||||
<mode> - static or dynamic
|
<mode> - static or dynamic
|
||||||
static = for next boot
|
static = for next boot
|
||||||
dynamic = now
|
dynamic = now
|
||||||
|
|
||||||
-disabling a device
|
disabling a device
|
||||||
|
^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
# echo "disable" > resources
|
# echo "disable" > resources
|
||||||
|
|
||||||
@@ -47,19 +59,23 @@ this will invoke the automatic resource config system to activate the device
|
|||||||
EXAMPLE:
|
EXAMPLE:
|
||||||
|
|
||||||
Suppose you need to activate the floppy disk controller.
|
Suppose you need to activate the floppy disk controller.
|
||||||
1.) change to the proper directory, in my case it is
|
|
||||||
/driver/bus/pnp/devices/00:0f
|
1. change to the proper directory, in my case it is
|
||||||
|
/driver/bus/pnp/devices/00:0f::
|
||||||
|
|
||||||
# cd /driver/bus/pnp/devices/00:0f
|
# cd /driver/bus/pnp/devices/00:0f
|
||||||
# cat name
|
# cat name
|
||||||
PC standard floppy disk controller
|
PC standard floppy disk controller
|
||||||
|
|
||||||
2.) check if the device is already active
|
2. check if the device is already active::
|
||||||
|
|
||||||
# cat resources
|
# cat resources
|
||||||
DISABLED
|
DISABLED
|
||||||
|
|
||||||
- Notice the string "DISABLED". This means the device is not active.
|
- Notice the string "DISABLED". This means the device is not active.
|
||||||
|
|
||||||
3.) check the device's possible configurations (optional)
|
3. check the device's possible configurations (optional)::
|
||||||
|
|
||||||
# cat options
|
# cat options
|
||||||
Dependent: 01 - Priority acceptable
|
Dependent: 01 - Priority acceptable
|
||||||
port 0x3f0-0x3f0, align 0x7, size 0x6, 16-bit address decoding
|
port 0x3f0-0x3f0, align 0x7, size 0x6, 16-bit address decoding
|
||||||
@@ -72,17 +88,20 @@ Dependent: 02 - Priority acceptable
|
|||||||
irq 6
|
irq 6
|
||||||
dma 2 8-bit compatible
|
dma 2 8-bit compatible
|
||||||
|
|
||||||
4.) now activate the device
|
4. now activate the device::
|
||||||
|
|
||||||
# echo "auto" > resources
|
# echo "auto" > resources
|
||||||
|
|
||||||
5.) finally check if the device is active
|
5. finally check if the device is active::
|
||||||
|
|
||||||
# cat resources
|
# cat resources
|
||||||
io 0x3f0-0x3f5
|
io 0x3f0-0x3f5
|
||||||
io 0x3f7-0x3f7
|
io 0x3f7-0x3f7
|
||||||
irq 6
|
irq 6
|
||||||
dma 2
|
dma 2
|
||||||
|
|
||||||
also there are a series of kernel parameters:
|
also there are a series of kernel parameters::
|
||||||
|
|
||||||
pnp_reserve_irq=irq1[,irq2] ....
|
pnp_reserve_irq=irq1[,irq2] ....
|
||||||
pnp_reserve_dma=dma1[,dma2] ....
|
pnp_reserve_dma=dma1[,dma2] ....
|
||||||
pnp_reserve_io=io1,size1[,io2,size2] ....
|
pnp_reserve_io=io1,size1[,io2,size2] ....
|
||||||
@@ -92,6 +111,7 @@ pnp_reserve_mem=mem1,size1[,mem2,size2] ....
|
|||||||
|
|
||||||
The Unified Plug and Play Layer
|
The Unified Plug and Play Layer
|
||||||
-------------------------------
|
-------------------------------
|
||||||
|
|
||||||
All Plug and Play drivers, protocols, and services meet at a central location
|
All Plug and Play drivers, protocols, and services meet at a central location
|
||||||
called the Plug and Play Layer. This layer is responsible for the exchange of
|
called the Plug and Play Layer. This layer is responsible for the exchange of
|
||||||
information between PnP drivers and PnP protocols. Thus it automatically
|
information between PnP drivers and PnP protocols. Thus it automatically
|
||||||
@@ -101,64 +121,73 @@ significantly easier.
|
|||||||
The following functions are available from the Plug and Play Layer:
|
The following functions are available from the Plug and Play Layer:
|
||||||
|
|
||||||
pnp_get_protocol
|
pnp_get_protocol
|
||||||
- increments the number of uses by one
|
increments the number of uses by one
|
||||||
|
|
||||||
pnp_put_protocol
|
pnp_put_protocol
|
||||||
- decrements the number of uses by one
|
decrements the number of uses by one
|
||||||
|
|
||||||
pnp_register_protocol
|
pnp_register_protocol
|
||||||
- use this to register a new PnP protocol
|
use this to register a new PnP protocol
|
||||||
|
|
||||||
pnp_unregister_protocol
|
pnp_unregister_protocol
|
||||||
- use this function to remove a PnP protocol from the Plug and Play Layer
|
use this function to remove a PnP protocol from the Plug and Play Layer
|
||||||
|
|
||||||
pnp_register_driver
|
pnp_register_driver
|
||||||
- adds a PnP driver to the Plug and Play Layer
|
adds a PnP driver to the Plug and Play Layer
|
||||||
- this includes driver model integration
|
|
||||||
- returns zero for success or a negative error number for failure; count
|
this includes driver model integration
|
||||||
|
returns zero for success or a negative error number for failure; count
|
||||||
calls to the .add() method if you need to know how many devices bind to
|
calls to the .add() method if you need to know how many devices bind to
|
||||||
the driver
|
the driver
|
||||||
|
|
||||||
pnp_unregister_driver
|
pnp_unregister_driver
|
||||||
- removes a PnP driver from the Plug and Play Layer
|
removes a PnP driver from the Plug and Play Layer
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
Plug and Play Protocols
|
Plug and Play Protocols
|
||||||
-----------------------
|
-----------------------
|
||||||
|
|
||||||
This section contains information for PnP protocol developers.
|
This section contains information for PnP protocol developers.
|
||||||
|
|
||||||
The following Protocols are currently available in the computing world:
|
The following Protocols are currently available in the computing world:
|
||||||
- PNPBIOS: used for system devices such as serial and parallel ports.
|
|
||||||
- ISAPNP: provides PnP support for the ISA bus
|
- PNPBIOS:
|
||||||
- ACPI: among its many uses, ACPI provides information about system level
|
used for system devices such as serial and parallel ports.
|
||||||
|
- ISAPNP:
|
||||||
|
provides PnP support for the ISA bus
|
||||||
|
- ACPI:
|
||||||
|
among its many uses, ACPI provides information about system level
|
||||||
devices.
|
devices.
|
||||||
|
|
||||||
It is meant to replace the PNPBIOS. It is not currently supported by Linux
|
It is meant to replace the PNPBIOS. It is not currently supported by Linux
|
||||||
Plug and Play but it is planned to be in the near future.
|
Plug and Play but it is planned to be in the near future.
|
||||||
|
|
||||||
|
|
||||||
Requirements for a Linux PnP protocol:
|
Requirements for a Linux PnP protocol:
|
||||||
1.) the protocol must use EISA IDs
|
1. the protocol must use EISA IDs
|
||||||
2.) the protocol must inform the PnP Layer of a device's current configuration
|
2. the protocol must inform the PnP Layer of a device's current configuration
|
||||||
|
|
||||||
- the ability to set resources is optional but preferred.
|
- the ability to set resources is optional but preferred.
|
||||||
|
|
||||||
The following are PnP protocol related functions:
|
The following are PnP protocol related functions:
|
||||||
|
|
||||||
pnp_add_device
|
pnp_add_device
|
||||||
- use this function to add a PnP device to the PnP layer
|
use this function to add a PnP device to the PnP layer
|
||||||
- only call this function when all wanted values are set in the pnp_dev
|
|
||||||
|
only call this function when all wanted values are set in the pnp_dev
|
||||||
structure
|
structure
|
||||||
|
|
||||||
pnp_init_device
|
pnp_init_device
|
||||||
- call this to initialize the PnP structure
|
call this to initialize the PnP structure
|
||||||
|
|
||||||
pnp_remove_device
|
pnp_remove_device
|
||||||
- call this to remove a device from the Plug and Play Layer.
|
call this to remove a device from the Plug and Play Layer.
|
||||||
- it will fail if the device is still in use.
|
it will fail if the device is still in use.
|
||||||
- automatically will free mem used by the device and related structures
|
automatically will free mem used by the device and related structures
|
||||||
|
|
||||||
pnp_add_id
|
pnp_add_id
|
||||||
- adds an EISA ID to the list of supported IDs for the specified device
|
adds an EISA ID to the list of supported IDs for the specified device
|
||||||
|
|
||||||
For more information consult the source of a protocol such as
|
For more information consult the source of a protocol such as
|
||||||
/drivers/pnp/pnpbios/core.c.
|
/drivers/pnp/pnpbios/core.c.
|
||||||
@@ -167,12 +196,16 @@ For more information consult the source of a protocol such as
|
|||||||
|
|
||||||
Linux Plug and Play Drivers
|
Linux Plug and Play Drivers
|
||||||
---------------------------
|
---------------------------
|
||||||
|
|
||||||
This section contains information for Linux PnP driver developers.
|
This section contains information for Linux PnP driver developers.
|
||||||
|
|
||||||
The New Way
|
The New Way
|
||||||
...........
|
^^^^^^^^^^^
|
||||||
1.) first make a list of supported EISA IDS
|
|
||||||
ex:
|
1. first make a list of supported EISA IDS
|
||||||
|
|
||||||
|
ex::
|
||||||
|
|
||||||
static const struct pnp_id pnp_dev_table[] = {
|
static const struct pnp_id pnp_dev_table[] = {
|
||||||
/* Standard LPT Printer Port */
|
/* Standard LPT Printer Port */
|
||||||
{.id = "PNP0400", .driver_data = 0},
|
{.id = "PNP0400", .driver_data = 0},
|
||||||
@@ -183,36 +216,43 @@ static const struct pnp_id pnp_dev_table[] = {
|
|||||||
|
|
||||||
Please note that the character 'X' can be used as a wild card in the function
|
Please note that the character 'X' can be used as a wild card in the function
|
||||||
portion (last four characters).
|
portion (last four characters).
|
||||||
ex:
|
|
||||||
|
ex::
|
||||||
|
|
||||||
/* Unknown PnP modems */
|
/* Unknown PnP modems */
|
||||||
{ "PNPCXXX", UNKNOWN_DEV },
|
{ "PNPCXXX", UNKNOWN_DEV },
|
||||||
|
|
||||||
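The wildcard rule described above can be sketched in plain userspace C. This is only an illustration of the matching idea, not the kernel's implementation: the function name ``demo_pnp_id_matches`` is hypothetical, and the sketch assumes 7-character IDs made of a 3-character vendor prefix followed by the 4-character function portion.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative matcher (hypothetical helper, not a kernel function).
 * Assumes IDs are exactly 7 characters: a 3-character vendor prefix
 * followed by a 4-character function portion.
 */
static bool demo_pnp_id_matches(const char *dev_id, const char *table_id)
{
	size_t i;

	if (strlen(dev_id) != 7 || strlen(table_id) != 7)
		return false;

	/* The vendor prefix must match exactly. */
	if (memcmp(dev_id, table_id, 3) != 0)
		return false;

	/* In the function portion, 'X' in the table entry matches anything. */
	for (i = 3; i < 7; i++) {
		if (table_id[i] == 'X')
			continue;
		if (toupper((unsigned char)dev_id[i]) !=
		    toupper((unsigned char)table_id[i]))
			return false;
	}
	return true;
}
```

With this sketch, a table entry of "PNPCXXX" accepts a device ID such as "PNPC123" but rejects "PNP0400", because the first character of the function portion is not a wildcard.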
Supported PnP card IDs can optionally be defined.
|
Supported PnP card IDs can optionally be defined.
|
||||||
ex:
|
ex::
|
||||||
|
|
||||||
static const struct pnp_id pnp_card_table[] = {
|
static const struct pnp_id pnp_card_table[] = {
|
||||||
{ "ANYDEVS", 0 },
|
{ "ANYDEVS", 0 },
|
||||||
{ "", 0 }
|
{ "", 0 }
|
||||||
};
|
};
|
||||||
|
|
||||||
2.) Optionally define probe and remove functions. It may make sense not to
|
2. Optionally define probe and remove functions. It may make sense not to
|
||||||
define these functions if the driver already has a reliable method of detecting
|
define these functions if the driver already has a reliable method of detecting
|
||||||
the resources, such as the parport_pc driver.
|
the resources, such as the parport_pc driver.
|
||||||
ex:
|
|
||||||
|
ex::
|
||||||
|
|
||||||
static int
|
static int
|
||||||
serial_pnp_probe(struct pnp_dev * dev, const struct pnp_id *card_id, const
|
serial_pnp_probe(struct pnp_dev * dev, const struct pnp_id *card_id, const
|
||||||
struct pnp_id *dev_id)
|
struct pnp_id *dev_id)
|
||||||
{
|
{
|
||||||
. . .
|
. . .
|
||||||
|
|
||||||
ex:
|
ex::
|
||||||
|
|
||||||
static void serial_pnp_remove(struct pnp_dev * dev)
|
static void serial_pnp_remove(struct pnp_dev * dev)
|
||||||
{
|
{
|
||||||
. . .
|
. . .
|
||||||
|
|
||||||
consult /drivers/serial/8250_pnp.c for more information.
|
consult /drivers/serial/8250_pnp.c for more information.
|
||||||
|
|
||||||
3.) create a driver structure
|
3. create a driver structure
|
||||||
ex:
|
|
||||||
|
ex::
|
||||||
|
|
||||||
static struct pnp_driver serial_pnp_driver = {
|
static struct pnp_driver serial_pnp_driver = {
|
||||||
.name = "serial",
|
.name = "serial",
|
||||||
@@ -224,8 +264,9 @@ static struct pnp_driver serial_pnp_driver = {
|
|||||||
|
|
||||||
* name and id_table cannot be NULL.
|
* name and id_table cannot be NULL.
|
||||||
|
|
||||||
4.) register the driver
|
4. register the driver
|
||||||
ex:
|
|
||||||
|
ex::
|
||||||
|
|
||||||
static int __init serial8250_pnp_init(void)
|
static int __init serial8250_pnp_init(void)
|
||||||
{
|
{
|
||||||
@@ -233,12 +274,12 @@ static int __init serial8250_pnp_init(void)
|
|||||||
}
|
}
|
||||||
|
|
||||||
The Old Way
|
The Old Way
|
||||||
...........
|
^^^^^^^^^^^
|
||||||
|
|
||||||
A series of compatibility functions have been created to make it easy to convert
|
A series of compatibility functions have been created to make it easy to convert
|
||||||
ISAPNP drivers. They should serve as a temporary solution only.
|
ISAPNP drivers. They should serve as a temporary solution only.
|
||||||
|
|
||||||
They are as follows:
|
They are as follows::
|
||||||
|
|
||||||
struct pnp_card *pnp_find_card(unsigned short vendor,
|
struct pnp_card *pnp_find_card(unsigned short vendor,
|
||||||
unsigned short device,
|
unsigned short device,
|
||||||
|
|||||||
@@ -1,10 +1,13 @@
|
|||||||
Proper Locking Under a Preemptible Kernel:
|
===========================================================================
|
||||||
Keeping Kernel Code Preempt-Safe
|
Proper Locking Under a Preemptible Kernel: Keeping Kernel Code Preempt-Safe
|
||||||
Robert Love <rml@tech9.net>
|
===========================================================================
|
||||||
Last Updated: 28 Aug 2002
|
|
||||||
|
:Author: Robert Love <rml@tech9.net>
|
||||||
|
:Last Updated: 28 Aug 2002
|
||||||
|
|
||||||
|
|
||||||
INTRODUCTION
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
|
|
||||||
A preemptible kernel creates new locking issues. The issues are the same as
|
A preemptible kernel creates new locking issues. The issues are the same as
|
||||||
@@ -17,9 +20,10 @@ requires protecting these situations.
|
|||||||
|
|
||||||
|
|
||||||
RULE #1: Per-CPU data structures need explicit protection
|
RULE #1: Per-CPU data structures need explicit protection
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
Two similar problems arise. An example code snippet:
|
Two similar problems arise. An example code snippet::
|
||||||
|
|
||||||
struct this_needs_locking tux[NR_CPUS];
|
struct this_needs_locking tux[NR_CPUS];
|
||||||
tux[smp_processor_id()] = some_value;
|
tux[smp_processor_id()] = some_value;
|
||||||
@@ -35,6 +39,7 @@ You can also use put_cpu() and get_cpu(), which will disable preemption.
|
|||||||
|
|
||||||
|
|
||||||
RULE #2: CPU state must be protected.
|
RULE #2: CPU state must be protected.
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
Under preemption, the state of the CPU must be protected. This is arch-
|
Under preemption, the state of the CPU must be protected. This is arch-
|
||||||
@@ -52,6 +57,7 @@ However, fpu__restore() must be called with preemption disabled.
|
|||||||
|
|
||||||
|
|
||||||
RULE #3: Lock acquire and release must be performed by same task
|
RULE #3: Lock acquire and release must be performed by same task
|
||||||
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||||
|
|
||||||
|
|
||||||
A lock acquired in one task must be released by the same task. This
|
A lock acquired in one task must be released by the same task. This
|
||||||
@@ -61,12 +67,15 @@ like this, acquire and release the task in the same code path and
|
|||||||
have the caller wait on an event by the other task.
|
have the caller wait on an event by the other task.
|
||||||
|
|
||||||
|
|
||||||
SOLUTION
|
Solution
|
||||||
|
========
|
||||||
|
|
||||||
|
|
||||||
Data protection under preemption is achieved by disabling preemption for the
|
Data protection under preemption is achieved by disabling preemption for the
|
||||||
duration of the critical region.
|
duration of the critical region.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
preempt_enable() decrement the preempt counter
|
preempt_enable() decrement the preempt counter
|
||||||
preempt_disable() increment the preempt counter
|
preempt_disable() increment the preempt counter
|
||||||
preempt_enable_no_resched() decrement, but do not immediately preempt
|
preempt_enable_no_resched() decrement, but do not immediately preempt
|
||||||
@@ -89,7 +98,7 @@ So use this implicit preemption-disabling property only if you know that the
|
|||||||
affected codepath does not do any of this. Best policy is to use this only for
|
affected codepath does not do any of this. Best policy is to use this only for
|
||||||
small, atomic code that you wrote and which calls no complex functions.
|
small, atomic code that you wrote and which calls no complex functions.
|
||||||
|
|
||||||
Example:
|
Example::
|
||||||
|
|
||||||
cpucache_t *cc; /* this is per-CPU */
|
cpucache_t *cc; /* this is per-CPU */
|
||||||
preempt_disable();
|
preempt_disable();
|
||||||
@@ -102,7 +111,7 @@ Example:
|
|||||||
return 0;
|
return 0;
|
||||||
|
|
||||||
Notice how the preemption statements must encompass every reference of the
|
Notice how the preemption statements must encompass every reference of the
|
||||||
critical variables. Another example:
|
critical variables. Another example::
|
||||||
|
|
||||||
int buf[NR_CPUS];
|
int buf[NR_CPUS];
|
||||||
set_cpu_val(buf);
|
set_cpu_val(buf);
|
||||||
@@ -114,7 +123,8 @@ This code is not preempt-safe, but see how easily we can fix it by simply
|
|||||||
moving the spin_lock up two lines.
|
moving the spin_lock up two lines.
|
||||||
|
|
||||||
|
|
||||||
PREVENTING PREEMPTION USING INTERRUPT DISABLING
|
Preventing preemption using interrupt disabling
|
||||||
|
===============================================
|
||||||
|
|
||||||
|
|
||||||
It is possible to prevent a preemption event using local_irq_disable and
|
It is possible to prevent a preemption event using local_irq_disable and
|
||||||
|
|||||||
@@ -1,5 +1,18 @@
|
|||||||
|
=========================================
|
||||||
|
How to get printk format specifiers right
|
||||||
|
=========================================
|
||||||
|
|
||||||
|
:Author: Randy Dunlap <rdunlap@infradead.org>
|
||||||
|
:Author: Andrew Murray <amurray@mpc-data.co.uk>
|
||||||
|
|
||||||
|
|
||||||
|
Integer types
|
||||||
|
=============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
If variable is of Type, use printk format specifier:
|
If variable is of Type, use printk format specifier:
|
||||||
---------------------------------------------------------
|
------------------------------------------------------------
|
||||||
int %d or %x
|
int %d or %x
|
||||||
unsigned int %u or %x
|
unsigned int %u or %x
|
||||||
long %ld or %lx
|
long %ld or %lx
|
||||||
@@ -13,25 +26,29 @@ If variable is of Type, use printk format specifier:
|
|||||||
s64 %lld or %llx
|
s64 %lld or %llx
|
||||||
u64 %llu or %llx
|
u64 %llu or %llx
|
||||||
|
|
||||||
If <type> is dependent on a config option for its size (e.g., sector_t,
|
If <type> is dependent on a config option for its size (e.g., ``sector_t``,
|
||||||
blkcnt_t) or is architecture-dependent for its size (e.g., tcflag_t), use a
|
``blkcnt_t``) or is architecture-dependent for its size (e.g., ``tcflag_t``),
|
||||||
format specifier of its largest possible type and explicitly cast to it.
|
use a format specifier of its largest possible type and explicitly cast to it.
|
||||||
Example:
|
|
||||||
|
Example::
|
||||||
|
|
||||||
printk("test: sector number/total blocks: %llu/%llu\n",
|
printk("test: sector number/total blocks: %llu/%llu\n",
|
||||||
(unsigned long long)sector, (unsigned long long)blockcount);
|
(unsigned long long)sector, (unsigned long long)blockcount);
|
||||||
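The casting rule above can be tried outside the kernel with a minimal userspace sketch. It uses `snprintf()` in place of `printk()`, and `demo_sector_t` and `format_sector_info()` are hypothetical names introduced only for illustration; they are not kernel APIs.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a config-dependent kernel type such as sector_t. */
typedef unsigned long demo_sector_t;

/*
 * Format a sector/block pair as the text recommends: use the specifier of
 * the largest possible type (%llu) and cast each argument to it explicitly.
 * Returns the snprintf() result.
 */
int format_sector_info(char *out, size_t outsz,
                       demo_sector_t sector, demo_sector_t blockcount)
{
    return snprintf(out, outsz,
                    "test: sector number/total blocks: %llu/%llu\n",
                    (unsigned long long)sector,
                    (unsigned long long)blockcount);
}
```

Because the cast is explicit, the same format string stays correct whether the underlying type is 32 or 64 bits wide.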
|
|
||||||
Reminder: sizeof() result is of type size_t.
|
Reminder: ``sizeof()`` result is of type ``size_t``.
|
||||||
|
|
||||||
The kernel's printf does not support %n. For obvious reasons, floating
|
The kernel's printf does not support ``%n``. For obvious reasons, floating
|
||||||
point formats (%e, %f, %g, %a) are also not recognized. Use of any
|
point formats (``%e, %f, %g, %a``) are also not recognized. Use of any
|
||||||
unsupported specifier or length qualifier results in a WARN and early
|
unsupported specifier or length qualifier results in a WARN and early
|
||||||
return from vsnprintf.
|
return from vsnprintf.
|
||||||
|
|
||||||
Raw pointer value SHOULD be printed with %p. The kernel supports
|
Raw pointer value SHOULD be printed with %p. The kernel supports
|
||||||
the following extended format specifiers for pointer types:
|
the following extended format specifiers for pointer types:
|
||||||
|
|
||||||
Symbols/Function Pointers:
|
Symbols/Function Pointers
|
||||||
|
=========================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pF versatile_init+0x0/0x110
|
%pF versatile_init+0x0/0x110
|
||||||
%pf versatile_init
|
%pf versatile_init
|
||||||
@@ -41,80 +58,97 @@ Symbols/Function Pointers:
|
|||||||
%ps versatile_init
|
%ps versatile_init
|
||||||
%pB prev_fn_of_versatile_init+0x88/0x88
|
%pB prev_fn_of_versatile_init+0x88/0x88
|
||||||
|
|
||||||
For printing symbols and function pointers. The 'S' and 's' specifiers
|
For printing symbols and function pointers. The ``S`` and ``s`` specifiers
|
||||||
result in the symbol name with ('S') or without ('s') offsets. Where
|
result in the symbol name with (``S``) or without (``s``) offsets. Where
|
||||||
this is used on a kernel without KALLSYMS - the symbol address is
|
this is used on a kernel without KALLSYMS - the symbol address is
|
||||||
printed instead.
|
printed instead.
|
||||||
|
|
||||||
The 'B' specifier results in the symbol name with offsets and should be
|
The ``B`` specifier results in the symbol name with offsets and should be
|
||||||
used when printing stack backtraces. The specifier takes into
|
used when printing stack backtraces. The specifier takes into
|
||||||
consideration the effect of compiler optimisations which may occur
|
consideration the effect of compiler optimisations which may occur
|
||||||
when tail-call's are used and marked with the noreturn GCC attribute.
|
when tail-calls are used and marked with the noreturn GCC attribute.
|
||||||
|
|
||||||
On ia64, ppc64 and parisc64 architectures function pointers are
|
On ia64, ppc64 and parisc64 architectures function pointers are
|
||||||
actually function descriptors which must first be resolved. The 'F' and
|
actually function descriptors which must first be resolved. The ``F`` and
|
||||||
'f' specifiers perform this resolution and then provide the same
|
``f`` specifiers perform this resolution and then provide the same
|
||||||
functionality as the 'S' and 's' specifiers.
|
functionality as the ``S`` and ``s`` specifiers.
|
||||||
|
|
||||||
Kernel Pointers:
|
Kernel Pointers
|
||||||
|
===============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pK 0x01234567 or 0x0123456789abcdef
|
%pK 0x01234567 or 0x0123456789abcdef
|
||||||
|
|
||||||
For printing kernel pointers which should be hidden from unprivileged
|
For printing kernel pointers which should be hidden from unprivileged
|
||||||
users. The behaviour of %pK depends on the kptr_restrict sysctl - see
|
users. The behaviour of ``%pK`` depends on the ``kptr_restrict`` sysctl - see
|
||||||
Documentation/sysctl/kernel.txt for more details.
|
Documentation/sysctl/kernel.txt for more details.
|
||||||
|
|
||||||
Struct Resources:
|
Struct Resources
|
||||||
|
================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pr [mem 0x60000000-0x6fffffff flags 0x2200] or
|
%pr [mem 0x60000000-0x6fffffff flags 0x2200] or
|
||||||
[mem 0x0000000060000000-0x000000006fffffff flags 0x2200]
|
[mem 0x0000000060000000-0x000000006fffffff flags 0x2200]
|
||||||
%pR [mem 0x60000000-0x6fffffff pref] or
|
%pR [mem 0x60000000-0x6fffffff pref] or
|
||||||
[mem 0x0000000060000000-0x000000006fffffff pref]
|
[mem 0x0000000060000000-0x000000006fffffff pref]
|
||||||
|
|
||||||
For printing struct resources. The 'R' and 'r' specifiers result in a
|
For printing struct resources. The ``R`` and ``r`` specifiers result in a
|
||||||
printed resource with ('R') or without ('r') a decoded flags member.
|
printed resource with (``R``) or without (``r``) a decoded flags member.
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
Physical addresses types phys_addr_t:
|
Physical addresses types ``phys_addr_t``
|
||||||
|
========================================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pa[p] 0x01234567 or 0x0123456789abcdef
|
%pa[p] 0x01234567 or 0x0123456789abcdef
|
||||||
|
|
||||||
For printing a phys_addr_t type (and its derivatives, such as
|
For printing a ``phys_addr_t`` type (and its derivatives, such as
|
||||||
resource_size_t) which can vary based on build options, regardless of
|
``resource_size_t``) which can vary based on build options, regardless of
|
||||||
the width of the CPU data path. Passed by reference.
|
the width of the CPU data path. Passed by reference.
|
||||||
|
|
||||||
DMA addresses types dma_addr_t:
|
DMA addresses types ``dma_addr_t``
|
||||||
|
==================================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pad 0x01234567 or 0x0123456789abcdef
|
%pad 0x01234567 or 0x0123456789abcdef
|
||||||
|
|
||||||
For printing a dma_addr_t type which can vary based on build options,
|
For printing a ``dma_addr_t`` type which can vary based on build options,
|
||||||
regardless of the width of the CPU data path. Passed by reference.
|
regardless of the width of the CPU data path. Passed by reference.
|
||||||
|
|
||||||
Raw buffer as an escaped string:
|
Raw buffer as an escaped string
|
||||||
|
===============================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%*pE[achnops]
|
%*pE[achnops]
|
||||||
|
|
||||||
For printing raw buffer as an escaped string. For the following buffer
|
For printing raw buffer as an escaped string. For the following buffer::
|
||||||
|
|
||||||
1b 62 20 5c 43 07 22 90 0d 5d
|
1b 62 20 5c 43 07 22 90 0d 5d
|
||||||
|
|
||||||
a few examples show how the conversion would be done (the result string
|
a few examples show how the conversion would be done (the result string
|
||||||
without surrounding quotes):
|
without surrounding quotes)::
|
||||||
|
|
||||||
%*pE "\eb \C\a"\220\r]"
|
%*pE "\eb \C\a"\220\r]"
|
||||||
%*pEhp "\x1bb \C\x07"\x90\x0d]"
|
%*pEhp "\x1bb \C\x07"\x90\x0d]"
|
||||||
%*pEa "\e\142\040\\\103\a\042\220\r\135"
|
%*pEa "\e\142\040\\\103\a\042\220\r\135"
|
||||||
|
|
||||||
The conversion rules are applied according to an optional combination
|
The conversion rules are applied according to an optional combination
|
||||||
of flags (see string_escape_mem() kernel documentation for the
|
of flags (see :c:func:`string_escape_mem` kernel documentation for the
|
||||||
details):
|
details):
|
||||||
a - ESCAPE_ANY
|
|
||||||
c - ESCAPE_SPECIAL
|
- ``a`` - ESCAPE_ANY
|
||||||
h - ESCAPE_HEX
|
- ``c`` - ESCAPE_SPECIAL
|
||||||
n - ESCAPE_NULL
|
- ``h`` - ESCAPE_HEX
|
||||||
o - ESCAPE_OCTAL
|
- ``n`` - ESCAPE_NULL
|
||||||
p - ESCAPE_NP
|
- ``o`` - ESCAPE_OCTAL
|
||||||
s - ESCAPE_SPACE
|
- ``p`` - ESCAPE_NP
|
||||||
|
- ``s`` - ESCAPE_SPACE
|
||||||
|
|
||||||
By default ESCAPE_ANY_NP is used.
|
By default ESCAPE_ANY_NP is used.
|
||||||
|
|
||||||
ESCAPE_ANY_NP is the sane choice for many cases, in particular for
|
ESCAPE_ANY_NP is the sane choice for many cases, in particular for
|
||||||
@@ -122,7 +156,10 @@ Raw buffer as an escaped string:
|
|||||||
|
|
||||||
If the field width is omitted, only 1 byte will be escaped.
|
If the field width is omitted, only 1 byte will be escaped.
|
||||||
|
|
||||||
Raw buffer as a hex string:
|
Raw buffer as a hex string
|
||||||
|
==========================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%*ph 00 01 02 ... 3f
|
%*ph 00 01 02 ... 3f
|
||||||
%*phC 00:01:02: ... :3f
|
%*phC 00:01:02: ... :3f
|
||||||
@@ -131,9 +168,12 @@ Raw buffer as a hex string:
|
|||||||
|
|
||||||
For printing small buffers (up to 64 bytes long) as a hex string with
|
For printing small buffers (up to 64 bytes long) as a hex string with
|
||||||
a certain separator. For larger buffers consider using
|
a certain separator. For larger buffers consider using
|
||||||
print_hex_dump().
|
:c:func:`print_hex_dump`.
|
||||||
|
|
||||||
MAC/FDDI addresses:
|
MAC/FDDI addresses
|
||||||
|
==================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pM 00:01:02:03:04:05
|
%pM 00:01:02:03:04:05
|
||||||
%pMR 05:04:03:02:01:00
|
%pMR 05:04:03:02:01:00
|
||||||
@@ -141,53 +181,62 @@ MAC/FDDI addresses:
|
|||||||
%pm 000102030405
|
%pm 000102030405
|
||||||
%pmR 050403020100
|
%pmR 050403020100
|
||||||
|
|
||||||
For printing 6-byte MAC/FDDI addresses in hex notation. The 'M' and 'm'
|
For printing 6-byte MAC/FDDI addresses in hex notation. The ``M`` and ``m``
|
||||||
specifiers result in a printed address with ('M') or without ('m') byte
|
specifiers result in a printed address with (``M``) or without (``m``) byte
|
||||||
separators. The default byte separator is the colon (':').
|
separators. The default byte separator is the colon (``:``).
|
||||||
|
|
||||||
Where FDDI addresses are concerned the 'F' specifier can be used after
|
Where FDDI addresses are concerned the ``F`` specifier can be used after
|
||||||
the 'M' specifier to use dash ('-') separators instead of the default
|
the ``M`` specifier to use dash (``-``) separators instead of the default
|
||||||
separator.
|
separator.
|
||||||
|
|
||||||
For Bluetooth addresses the 'R' specifier shall be used after the 'M'
|
For Bluetooth addresses the ``R`` specifier shall be used after the ``M``
|
||||||
specifier to use reversed byte order suitable for visual interpretation
|
specifier to use reversed byte order suitable for visual interpretation
|
||||||
of Bluetooth addresses which are in the little endian order.
|
of Bluetooth addresses which are in the little endian order.
|
||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
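To make the output shapes of the ``M``, ``m`` and ``R`` variants concrete, here is a rough userspace emulation. It is only a sketch: `format_mac()` and its flag parameters are hypothetical, and the real conversion is done inside the kernel's vsnprintf rather than by a helper like this.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Illustration only. Writes a 6-byte address as lowercase hex.
 * with_sep != 0 mimics the colon-separated 'M' form, 0 the bare 'm' form;
 * reversed != 0 mimics the trailing 'R' (bytes printed last to first).
 * Returns the string length, or -1 if the buffer is too small.
 */
int format_mac(char *out, size_t outsz, const unsigned char mac[6],
               int with_sep, int reversed)
{
    size_t pos = 0;
    int i;

    if (outsz < 18) /* 12 hex digits + 5 separators + NUL */
        return -1;

    for (i = 0; i < 6; i++) {
        unsigned char b = reversed ? mac[5 - i] : mac[i];

        pos += (size_t)snprintf(out + pos, outsz - pos, "%02x", (unsigned int)b);
        if (with_sep && i < 5)
            out[pos++] = ':';
    }
    out[pos] = '\0';
    return (int)pos;
}
```

The address is taken by pointer here as well, mirroring the "passed by reference" convention of the kernel specifier.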
|
|
||||||
IPv4 addresses:
|
IPv4 addresses
|
||||||
|
==============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pI4 1.2.3.4
|
%pI4 1.2.3.4
|
||||||
%pi4 001.002.003.004
|
%pi4 001.002.003.004
|
||||||
%p[Ii]4[hnbl]
|
%p[Ii]4[hnbl]
|
||||||
|
|
||||||
For printing IPv4 dot-separated decimal addresses. The 'I4' and 'i4'
|
For printing IPv4 dot-separated decimal addresses. The ``I4`` and ``i4``
|
||||||
specifiers result in a printed address with ('i4') or without ('I4')
|
specifiers result in a printed address with (``i4``) or without (``I4``)
|
||||||
leading zeros.
|
leading zeros.
|
||||||
|
|
||||||
The additional 'h', 'n', 'b', and 'l' specifiers are used to specify
|
The additional ``h``, ``n``, ``b``, and ``l`` specifiers are used to specify
|
||||||
host, network, big or little endian order addresses respectively. Where
|
host, network, big or little endian order addresses respectively. Where
|
||||||
no specifier is provided the default network/big endian order is used.
|
no specifier is provided the default network/big endian order is used.
|
||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
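The difference between the ``I4`` and ``i4`` forms is only the zero padding of each octet. A small userspace sketch of that difference follows; `format_ipv4()` is a hypothetical helper, it assumes the four bytes are already in network order, and it does not model the ``h``, ``n``, ``b`` and ``l`` modifiers.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Illustration only. Formats four address bytes (assumed to be in network
 * order) as dotted decimal. leading_zeros != 0 mimics the 'i4' form,
 * 0 mimics the 'I4' form. Returns the snprintf() result.
 */
int format_ipv4(char *out, size_t outsz, const unsigned char addr[4],
                int leading_zeros)
{
    if (leading_zeros)
        return snprintf(out, outsz, "%03u.%03u.%03u.%03u",
                        (unsigned int)addr[0], (unsigned int)addr[1],
                        (unsigned int)addr[2], (unsigned int)addr[3]);

    return snprintf(out, outsz, "%u.%u.%u.%u",
                    (unsigned int)addr[0], (unsigned int)addr[1],
                    (unsigned int)addr[2], (unsigned int)addr[3]);
}
```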
|
|
||||||
IPv6 addresses:
|
IPv6 addresses
|
||||||
|
==============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pI6 0001:0002:0003:0004:0005:0006:0007:0008
|
%pI6 0001:0002:0003:0004:0005:0006:0007:0008
|
||||||
%pi6 00010002000300040005000600070008
|
%pi6 00010002000300040005000600070008
|
||||||
%pI6c 1:2:3:4:5:6:7:8
|
%pI6c 1:2:3:4:5:6:7:8
|
||||||
|
|
||||||
For printing IPv6 network-order 16-bit hex addresses. The 'I6' and 'i6'
|
For printing IPv6 network-order 16-bit hex addresses. The ``I6`` and ``i6``
|
||||||
specifiers result in a printed address with ('I6') or without ('i6')
|
specifiers result in a printed address with (``I6``) or without (``i6``)
|
||||||
colon-separators. Leading zeros are always used.
|
colon-separators. Leading zeros are always used.
|
||||||
|
|
||||||
The additional 'c' specifier can be used with the 'I' specifier to
|
The additional ``c`` specifier can be used with the ``I`` specifier to
|
||||||
print a compressed IPv6 address as described by
|
print a compressed IPv6 address as described by
|
||||||
http://tools.ietf.org/html/rfc5952
|
http://tools.ietf.org/html/rfc5952
|
||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
IPv4/IPv6 addresses (generic, with port, flowinfo, scope):
|
IPv4/IPv6 addresses (generic, with port, flowinfo, scope)
|
||||||
|
=========================================================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008
|
%pIS 1.2.3.4 or 0001:0002:0003:0004:0005:0006:0007:0008
|
||||||
%piS 001.002.003.004 or 00010002000300040005000600070008
|
%piS 001.002.003.004 or 00010002000300040005000600070008
|
||||||
@@ -195,33 +244,36 @@ IPv4/IPv6 addresses (generic, with port, flowinfo, scope):
|
|||||||
%pISpc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345
|
%pISpc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345
|
||||||
%p[Ii]S[pfschnbl]
|
%p[Ii]S[pfschnbl]
|
||||||
|
|
||||||
For printing an IP address without the need to distinguish whether it's
|
For printing an IP address without the need to distinguish whether it's
|
||||||
of type AF_INET or AF_INET6, a pointer to a valid 'struct sockaddr',
|
of type AF_INET or AF_INET6, a pointer to a valid ``struct sockaddr``,
|
||||||
specified through 'IS' or 'iS', can be passed to this format specifier.
|
specified through ``IS`` or ``iS``, can be passed to this format specifier.
|
||||||
|
|
||||||
The additional 'p', 'f', and 's' specifiers are used to specify port
|
The additional ``p``, ``f``, and ``s`` specifiers are used to specify port
|
||||||
(IPv4, IPv6), flowinfo (IPv6) and scope (IPv6). Ports have a ':' prefix,
|
(IPv4, IPv6), flowinfo (IPv6) and scope (IPv6). Ports have a ``:`` prefix,
|
||||||
flowinfo a '/' and scope a '%', each followed by the actual value.
|
flowinfo a ``/`` and scope a ``%``, each followed by the actual value.
|
||||||
|
|
||||||
In case of an IPv6 address the compressed IPv6 address as described by
|
In case of an IPv6 address the compressed IPv6 address as described by
|
||||||
http://tools.ietf.org/html/rfc5952 is being used if the additional
|
http://tools.ietf.org/html/rfc5952 is being used if the additional
|
||||||
specifier 'c' is given. The IPv6 address is surrounded by '[', ']' in
|
specifier ``c`` is given. The IPv6 address is surrounded by ``[``, ``]`` in
|
||||||
case of additional specifiers 'p', 'f' or 's' as suggested by
|
case of additional specifiers ``p``, ``f`` or ``s`` as suggested by
|
||||||
https://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-07
|
https://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-07
|
||||||
|
|
||||||
In case of IPv4 addresses, the additional 'h', 'n', 'b', and 'l'
|
In case of IPv4 addresses, the additional ``h``, ``n``, ``b``, and ``l``
|
||||||
specifiers can be used as well and are ignored in case of an IPv6
|
specifiers can be used as well and are ignored in case of an IPv6
|
||||||
address.
|
address.
|
||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
Further examples:
|
Further examples::
|
||||||
|
|
||||||
%pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789
|
%pISfc 1.2.3.4 or [1:2:3:4:5:6:7:8]/123456789
|
||||||
%pISsc 1.2.3.4 or [1:2:3:4:5:6:7:8]%1234567890
|
%pISsc 1.2.3.4 or [1:2:3:4:5:6:7:8]%1234567890
|
||||||
%pISpfc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345/123456789
|
%pISpfc 1.2.3.4:12345 or [1:2:3:4:5:6:7:8]:12345/123456789
|
||||||
|
|
||||||
UUID/GUID addresses:
|
UUID/GUID addresses
|
||||||
|
===================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pUb 00010203-0405-0607-0809-0a0b0c0d0e0f
|
%pUb 00010203-0405-0607-0809-0a0b0c0d0e0f
|
||||||
%pUB 00010203-0405-0607-0809-0A0B0C0D0E0F
|
%pUB 00010203-0405-0607-0809-0A0B0C0D0E0F
|
||||||
@@ -238,30 +290,39 @@ UUID/GUID addresses:
|
|||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
dentry names:
|
dentry names
|
||||||
|
============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pd{,2,3,4}
|
%pd{,2,3,4}
|
||||||
%pD{,2,3,4}
|
%pD{,2,3,4}
|
||||||
|
|
||||||
For printing dentry name; if we race with d_move(), the name might be
|
For printing dentry name; if we race with :c:func:`d_move`, the name might be
|
||||||
a mix of old and new ones, but it won't oops. %pd dentry is a safer
|
a mix of old and new ones, but it won't oops. ``%pd`` dentry is a safer
|
||||||
equivalent of %s dentry->d_name.name we used to use, %pd<n> prints
|
equivalent of ``%s`` ``dentry->d_name.name`` we used to use, ``%pd<n>`` prints
|
||||||
n last components. %pD does the same thing for struct file.
|
``n`` last components. ``%pD`` does the same thing for struct file.
|
||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
block_device names:
|
block_device names
|
||||||
|
==================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pg sda, sda1 or loop0p1
|
%pg sda, sda1 or loop0p1
|
||||||
|
|
||||||
For printing name of block_device pointers.
|
For printing name of block_device pointers.
|
||||||
|
|
||||||
struct va_format:
|
struct va_format
|
||||||
|
================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pV
|
%pV
|
||||||
|
|
||||||
For printing struct va_format structures. These contain a format string
|
For printing struct va_format structures. These contain a format string
|
||||||
and va_list as follows:
|
and va_list as follows::
|
||||||
|
|
||||||
struct va_format {
|
struct va_format {
|
||||||
const char *fmt;
|
const char *fmt;
|
||||||
@@ -275,7 +336,11 @@ struct va_format:
|
|||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
kobjects:
|
kobjects
|
||||||
|
========
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pO
|
%pO
|
||||||
|
|
||||||
Base specifier for kobject based structs. Must be followed with
|
Base specifier for kobject based structs. Must be followed with
|
||||||
@@ -311,30 +376,40 @@ kobjects:
|
|||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
struct clk:
|
|
||||||
|
struct clk
|
||||||
|
==========
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pC pll1
|
%pC pll1
|
||||||
%pCn pll1
|
%pCn pll1
|
||||||
%pCr 1560000000
|
%pCr 1560000000
|
||||||
|
|
||||||
For printing struct clk structures. '%pC' and '%pCn' print the name
|
For printing struct clk structures. ``%pC`` and ``%pCn`` print the name
|
||||||
(Common Clock Framework) or address (legacy clock framework) of the
|
(Common Clock Framework) or address (legacy clock framework) of the
|
||||||
structure; '%pCr' prints the current clock rate.
|
structure; ``%pCr`` prints the current clock rate.
|
||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
bitmap and its derivatives such as cpumask and nodemask:
|
bitmap and its derivatives such as cpumask and nodemask
|
||||||
|
=======================================================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%*pb 0779
|
%*pb 0779
|
||||||
%*pbl 0,3-6,8-10
|
%*pbl 0,3-6,8-10
|
||||||
|
|
||||||
For printing bitmap and its derivatives such as cpumask and nodemask,
|
For printing bitmap and its derivatives such as cpumask and nodemask,
|
||||||
%*pb output the bitmap with field width as the number of bits and %*pbl
|
``%*pb`` output the bitmap with field width as the number of bits and ``%*pbl``
|
||||||
output the bitmap as range list with field width as the number of bits.
|
output the bitmap as range list with field width as the number of bits.
|
||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
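The range-list form can be illustrated with a userspace sketch that walks the set bits of a single word and groups consecutive runs. `format_bit_ranges()` is a hypothetical name, and limiting the bitmap to one `unsigned long` is a simplification; kernel bitmaps may span many words.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Illustration only. Writes the set bits among the low nbits of 'bits' as a
 * comma-separated list of single bits and inclusive ranges.
 * Returns the string length, or -1 if the buffer is too small.
 */
int format_bit_ranges(char *out, size_t outsz, unsigned long bits,
                      unsigned int nbits)
{
    size_t pos = 0;
    unsigned int i = 0;
    int first = 1;

    if (outsz == 0)
        return -1;
    out[0] = '\0';
    if (nbits > sizeof(bits) * 8)
        nbits = (unsigned int)(sizeof(bits) * 8);

    while (i < nbits) {
        unsigned int start;
        int n;

        if (!((bits >> i) & 1UL)) {
            i++;
            continue;
        }
        /* Extend the run while the next bit is also set. */
        start = i;
        while (i + 1 < nbits && ((bits >> (i + 1)) & 1UL))
            i++;

        if (start == i)
            n = snprintf(out + pos, outsz - pos, "%s%u",
                         first ? "" : ",", start);
        else
            n = snprintf(out + pos, outsz - pos, "%s%u-%u",
                         first ? "" : ",", start, i);
        if (n < 0 || (size_t)n >= outsz - pos)
            return -1;
        pos += (size_t)n;
        first = 0;
        i++;
    }
    return (int)pos;
}
```

With the value 0x0779 the set bits are 0, 3 to 6 and 8 to 10, which is the grouping shown for ``%*pbl`` above.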
|
|
||||||
Flags bitfields such as page flags, gfp_flags:
|
Flags bitfields such as page flags, gfp_flags
|
||||||
|
=============================================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pGp referenced|uptodate|lru|active|private
|
%pGp referenced|uptodate|lru|active|private
|
||||||
%pGg GFP_USER|GFP_DMA32|GFP_NOWARN
|
%pGg GFP_USER|GFP_DMA32|GFP_NOWARN
|
||||||
@@ -343,16 +418,19 @@ Flags bitfields such as page flags, gfp_flags:
|
|||||||
For printing flags bitfields as a collection of symbolic constants that
|
For printing flags bitfields as a collection of symbolic constants that
|
||||||
would construct the value. The type of flags is given by the third
|
would construct the value. The type of flags is given by the third
|
||||||
character. Currently supported are [p]age flags, [v]ma_flags (both
|
character. Currently supported are [p]age flags, [v]ma_flags (both
|
||||||
expect unsigned long *) and [g]fp_flags (expects gfp_t *). The flag
|
expect ``unsigned long *``) and [g]fp_flags (expects ``gfp_t *``). The flag
|
||||||
names and print order depend on the particular type.
|
names and print order depend on the particular type.
|
||||||
|
|
||||||
Note that this format should not be used directly in TP_printk() part
|
Note that this format should not be used directly in :c:func:`TP_printk()` part
|
||||||
of a tracepoint. Instead, use the show_*_flags() functions from
|
of a tracepoint. Instead, use the ``show_*_flags()`` functions from
|
||||||
<trace/events/mmflags.h>.
|
<trace/events/mmflags.h>.
|
||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
Network device features:
|
Network device features
|
||||||
|
=======================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
%pNF 0x000000000000c000
|
%pNF 0x000000000000c000
|
||||||
|
|
||||||
@@ -360,12 +438,8 @@ Network device features:
|
|||||||
|
|
||||||
Passed by reference.
|
Passed by reference.
|
||||||
|
|
||||||
If you add other %p extensions, please extend lib/test_printf.c with
|
If you add other ``%p`` extensions, please extend lib/test_printf.c with
|
||||||
one or more test cases, if at all feasible.
|
one or more test cases, if at all feasible.
|
||||||
|
|
||||||
|
|
||||||
Thank you for your cooperation and attention.
|
Thank you for your cooperation and attention.
|
||||||
|
|
||||||
|
|
||||||
By Randy Dunlap <rdunlap@infradead.org> and
|
|
||||||
Andrew Murray <amurray@mpc-data.co.uk>
|
|
||||||
|
|||||||
@@ -1,4 +1,6 @@
|
|||||||
|
======================================
|
||||||
Pulse Width Modulation (PWM) interface
|
Pulse Width Modulation (PWM) interface
|
||||||
|
======================================
|
||||||
|
|
||||||
This provides an overview about the Linux PWM interface
|
This provides an overview about the Linux PWM interface
|
||||||
|
|
||||||
@@ -16,7 +18,7 @@ Users of the legacy PWM API use unique IDs to refer to PWM devices.
|
|||||||
|
|
||||||
Instead of referring to a PWM device via its unique ID, board setup code
|
Instead of referring to a PWM device via its unique ID, board setup code
|
||||||
should instead register a static mapping that can be used to match PWM
|
should instead register a static mapping that can be used to match PWM
|
||||||
consumers to providers, as given in the following example:
|
consumers to providers, as given in the following example::
|
||||||
|
|
||||||
static struct pwm_lookup board_pwm_lookup[] = {
|
static struct pwm_lookup board_pwm_lookup[] = {
|
||||||
PWM_LOOKUP("tegra-pwm", 0, "pwm-backlight", NULL,
|
PWM_LOOKUP("tegra-pwm", 0, "pwm-backlight", NULL,
|
||||||
@@ -40,7 +42,7 @@ New users should use the pwm_get() function and pass to it the consumer
|
|||||||
device or a consumer name. pwm_put() is used to free the PWM device. Managed
|
device or a consumer name. pwm_put() is used to free the PWM device. Managed
|
||||||
variants of these functions, devm_pwm_get() and devm_pwm_put(), also exist.
|
variants of these functions, devm_pwm_get() and devm_pwm_put(), also exist.
|
||||||
|
|
||||||
After being requested, a PWM has to be configured using:
|
After being requested, a PWM has to be configured using::
|
||||||
|
|
||||||
int pwm_apply_state(struct pwm_device *pwm, struct pwm_state *state);
|
int pwm_apply_state(struct pwm_device *pwm, struct pwm_state *state);
|
||||||
|
|
||||||
@@ -72,11 +74,14 @@ interface is provided to use the PWMs from userspace. It is exposed at
|
|||||||
pwmchipN, where N is the base of the PWM chip. Inside the directory you
|
pwmchipN, where N is the base of the PWM chip. Inside the directory you
|
||||||
will find:
|
will find:
|
||||||
|
|
||||||
npwm - The number of PWM channels this chip supports (read-only).
|
npwm
|
||||||
|
The number of PWM channels this chip supports (read-only).
|
||||||
|
|
||||||
export - Exports a PWM channel for use with sysfs (write-only).
|
export
|
||||||
|
Exports a PWM channel for use with sysfs (write-only).
|
||||||
|
|
||||||
unexport - Unexports a PWM channel from sysfs (write-only).
|
unexport
|
||||||
|
Unexports a PWM channel from sysfs (write-only).
|
||||||
|
|
||||||
The PWM channels are numbered using a per-chip index from 0 to npwm-1.
|
The PWM channels are numbered using a per-chip index from 0 to npwm-1.
|
||||||
|
|
||||||
@@ -84,21 +89,26 @@ When a PWM channel is exported a pwmX directory will be created in the
|
|||||||
pwmchipN directory it is associated with, where X is the number of the
|
pwmchipN directory it is associated with, where X is the number of the
|
||||||
channel that was exported. The following properties will then be available:
|
channel that was exported. The following properties will then be available:
|
||||||
|
|
||||||
period - The total period of the PWM signal (read/write).
|
period
|
||||||
|
The total period of the PWM signal (read/write).
|
||||||
Value is in nanoseconds and is the sum of the active and inactive
|
Value is in nanoseconds and is the sum of the active and inactive
|
||||||
time of the PWM.
|
time of the PWM.
|
||||||
|
|
||||||
duty_cycle - The active time of the PWM signal (read/write).
|
duty_cycle
|
||||||
|
The active time of the PWM signal (read/write).
|
||||||
Value is in nanoseconds and must be less than the period.
|
Value is in nanoseconds and must be less than the period.
|
||||||
|
|
||||||
polarity - Changes the polarity of the PWM signal (read/write).
|
polarity
|
||||||
|
Changes the polarity of the PWM signal (read/write).
|
||||||
Writes to this property only work if the PWM chip supports changing
|
Writes to this property only work if the PWM chip supports changing
|
||||||
the polarity. The polarity can only be changed if the PWM is not
|
the polarity. The polarity can only be changed if the PWM is not
|
||||||
enabled. Value is the string "normal" or "inversed".
|
enabled. Value is the string "normal" or "inversed".
|
||||||
|
|
||||||
enable - Enable/disable the PWM signal (read/write).
|
enable
|
||||||
0 - disabled
|
Enable/disable the PWM signal (read/write).
|
||||||
1 - enabled
|
|
||||||
|
- 0 - disabled
|
||||||
|
- 1 - enabled
|
||||||
|
|
||||||
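Since both ``period`` and ``duty_cycle`` are expressed in nanoseconds, a userspace tool usually has to convert a percentage into an absolute active time before writing it to sysfs. The following is a minimal sketch of that arithmetic; `pwm_duty_from_percent()` is a hypothetical helper, not part of the PWM API, and it follows the wording above conservatively by rejecting any value that would not be less than the period.

```c
#include <stdint.h>

/*
 * Illustration only. Computes the active time in nanoseconds for a given
 * period and duty percentage. Returns 0 on success and stores the result in
 * *duty_ns; returns -1 if the period is zero or the percentage would not
 * yield a duty cycle below the period.
 */
int pwm_duty_from_percent(uint64_t period_ns, unsigned int percent,
                          uint64_t *duty_ns)
{
    if (period_ns == 0 || percent >= 100)
        return -1;

    /* Split the division to avoid overflowing period_ns * percent. */
    *duty_ns = period_ns / 100 * percent + period_ns % 100 * percent / 100;
    return 0;
}
```

For a 1 ms period (1000000 ns) and a 25% duty, this yields 250000, the value one would then write to ``duty_cycle``.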
Implementing a PWM driver
|
Implementing a PWM driver
|
||||||
-------------------------
|
-------------------------
|
||||||
|
|||||||
@@ -1,7 +1,10 @@
|
|||||||
|
=================================
|
||||||
Red-black Trees (rbtree) in Linux
|
Red-black Trees (rbtree) in Linux
|
||||||
January 18, 2007
|
=================================
|
||||||
Rob Landley <rob@landley.net>
|
|
||||||
=============================
|
|
||||||
|
:Date: January 18, 2007
|
||||||
|
:Author: Rob Landley <rob@landley.net>
|
||||||
|
|
||||||
What are red-black trees, and what are they for?
|
What are red-black trees, and what are they for?
|
||||||
------------------------------------------------
|
------------------------------------------------
|
||||||
@@ -56,7 +59,7 @@ user of the rbtree code.
|
|||||||
Creating a new rbtree
|
Creating a new rbtree
|
||||||
---------------------
|
---------------------
|
||||||
|
|
||||||
Data nodes in an rbtree tree are structures containing a struct rb_node member:
|
Data nodes in an rbtree tree are structures containing a struct rb_node member::
|
||||||
|
|
||||||
struct mytype {
|
struct mytype {
|
||||||
struct rb_node node;
|
struct rb_node node;
|
||||||
@@ -78,7 +81,7 @@ Searching for a value in an rbtree
|
|||||||
Writing a search function for your tree is fairly straightforward: start at the
|
Writing a search function for your tree is fairly straightforward: start at the
|
||||||
root, compare each value, and follow the left or right branch as necessary.
|
root, compare each value, and follow the left or right branch as necessary.
|
||||||
|
|
||||||
Example:
|
Example::
|
||||||
|
|
||||||
struct mytype *my_search(struct rb_root *root, char *string)
|
struct mytype *my_search(struct rb_root *root, char *string)
|
||||||
{
|
{
|
||||||
@@ -110,7 +113,7 @@ The search for insertion differs from the previous search by finding the
|
|||||||
location of the pointer on which to graft the new node. The new node also
|
location of the pointer on which to graft the new node. The new node also
|
||||||
needs a link to its parent node for rebalancing purposes.
|
needs a link to its parent node for rebalancing purposes.
|
||||||
|
|
||||||
Example:
|
Example::
|
||||||
|
|
||||||
int my_insert(struct rb_root *root, struct mytype *data)
|
int my_insert(struct rb_root *root, struct mytype *data)
|
||||||
{
|
{
|
||||||
@@ -140,11 +143,11 @@ Example:
|
|||||||
Removing or replacing existing data in an rbtree
|
Removing or replacing existing data in an rbtree
|
||||||
------------------------------------------------
|
------------------------------------------------
|
||||||
|
|
||||||
To remove an existing node from a tree, call:
|
To remove an existing node from a tree, call::
|
||||||
|
|
||||||
void rb_erase(struct rb_node *victim, struct rb_root *tree);
|
void rb_erase(struct rb_node *victim, struct rb_root *tree);
|
||||||
|
|
||||||
Example:
|
Example::
|
||||||
|
|
||||||
struct mytype *data = my_search(&mytree, "walrus");
|
struct mytype *data = my_search(&mytree, "walrus");
|
||||||
|
|
||||||
@@ -153,7 +156,7 @@ Example:
|
|||||||
myfree(data);
|
myfree(data);
|
||||||
}
|
}
|
||||||
|
|
||||||
To replace an existing node in a tree with a new one with the same key, call:
|
To replace an existing node in a tree with a new one with the same key, call::
|
||||||
|
|
||||||
void rb_replace_node(struct rb_node *old, struct rb_node *new,
|
void rb_replace_node(struct rb_node *old, struct rb_node *new,
|
||||||
struct rb_root *tree);
|
struct rb_root *tree);
|
||||||
@@ -166,7 +169,7 @@ Iterating through the elements stored in an rbtree (in sort order)
|
|||||||
|
|
||||||
Four functions are provided for iterating through an rbtree's contents in
|
Four functions are provided for iterating through an rbtree's contents in
|
||||||
sorted order. These work on arbitrary trees, and should not need to be
|
sorted order. These work on arbitrary trees, and should not need to be
|
||||||
modified or wrapped (except for locking purposes):
|
modified or wrapped (except for locking purposes)::
|
||||||
|
|
||||||
struct rb_node *rb_first(struct rb_root *tree);
|
struct rb_node *rb_first(struct rb_root *tree);
|
||||||
struct rb_node *rb_last(struct rb_root *tree);
|
struct rb_node *rb_last(struct rb_root *tree);
|
||||||
@@ -184,7 +187,7 @@ which the containing data structure may be accessed with the container_of()
|
|||||||
macro, and individual members may be accessed directly via
|
macro, and individual members may be accessed directly via
|
||||||
rb_entry(node, type, member).
|
rb_entry(node, type, member).
|
||||||
|
|
||||||
Example:
|
Example::
|
||||||
|
|
||||||
struct rb_node *node;
|
struct rb_node *node;
|
||||||
for (node = rb_first(&mytree); node; node = rb_next(node))
|
for (node = rb_first(&mytree); node; node = rb_next(node))
|
||||||
@@ -241,7 +244,8 @@ user should have a single rb_erase_augmented() call site in order to limit
|
|||||||
compiled code size.
|
compiled code size.
|
||||||
|
|
||||||
|
|
||||||
Sample usage:
|
Sample usage
|
||||||
|
^^^^^^^^^^^^
|
||||||
|
|
||||||
Interval tree is an example of augmented rb tree. Reference -
|
Interval tree is an example of augmented rb tree. Reference -
|
||||||
"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
|
"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
|
||||||
@@ -259,7 +263,7 @@ This "extra information" stored in each node is the maximum hi
|
|||||||
information can be maintained at each node just by looking at the node
|
information can be maintained at each node just by looking at the node
|
||||||
and its immediate children. And this will be used in O(log n) lookup
|
and its immediate children. And this will be used in O(log n) lookup
|
||||||
for lowest match (lowest start address among all possible matches)
|
for lowest match (lowest start address among all possible matches)
|
||||||
with something like:
|
with something like::
|
||||||
|
|
||||||
struct interval_tree_node *
|
struct interval_tree_node *
|
||||||
interval_tree_first_match(struct rb_root *root,
|
interval_tree_first_match(struct rb_root *root,
|
||||||
@@ -303,7 +307,7 @@ interval_tree_first_match(struct rb_root *root,
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
Insertion/removal are defined using the following augmented callbacks:
|
Insertion/removal are defined using the following augmented callbacks::
|
||||||
|
|
||||||
static inline unsigned long
|
static inline unsigned long
|
||||||
compute_subtree_last(struct interval_tree_node *node)
|
compute_subtree_last(struct interval_tree_node *node)
|
||||||
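As a rough userspace illustration of the augmented-callback idea in the hunks above (not the kernel implementation): each node caches the largest interval end found anywhere in its subtree, and that cache can be rebuilt from the node itself plus its two immediate children. The struct `itnode` and both helper names are hypothetical, and plain child pointers stand in for the kernel's rb_node linkage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an interval tree node. */
struct itnode {
	unsigned long start;		/* interval start */
	unsigned long last;		/* interval end (inclusive) */
	unsigned long subtree_last;	/* largest 'last' in this subtree */
	struct itnode *left, *right;
};

/* Recompute the cached maximum from the node and its immediate children. */
static unsigned long itnode_compute_subtree_last(const struct itnode *node)
{
	unsigned long max;

	assert(node != NULL);
	max = node->last;
	if (node->left && node->left->subtree_last > max)
		max = node->left->subtree_last;
	if (node->right && node->right->subtree_last > max)
		max = node->right->subtree_last;
	return max;
}

/* Can any interval in this subtree end at or after 'start'? */
static int itnode_subtree_may_match(const struct itnode *node,
				    unsigned long start)
{
	return node && node->subtree_last >= start;
}
```

A lookup can use the second helper to skip a whole subtree when its cached maximum is below the query start, which is what keeps the search at O(log n).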
|
|||||||
@@ -1,6 +1,9 @@
|
|||||||
|
==========================
|
||||||
Remote Processor Framework
|
Remote Processor Framework
|
||||||
|
==========================
|
||||||
|
|
||||||
1. Introduction
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
Modern SoCs typically have heterogeneous remote processor devices in asymmetric
|
Modern SoCs typically have heterogeneous remote processor devices in asymmetric
|
||||||
multiprocessing (AMP) configurations, which may be running different instances
|
multiprocessing (AMP) configurations, which may be running different instances
|
||||||
@@ -26,38 +29,56 @@ remoteproc will add those devices. This makes it possible to reuse the
|
|||||||
existing virtio drivers with remote processor backends at a minimal development
|
existing virtio drivers with remote processor backends at a minimal development
|
||||||
cost.
|
cost.
|
||||||
|
|
||||||
2. User API
|
User API
|
||||||
|
========
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int rproc_boot(struct rproc *rproc)
|
int rproc_boot(struct rproc *rproc)
|
||||||
- Boot a remote processor (i.e. load its firmware, power it on, ...).
|
|
||||||
|
Boot a remote processor (i.e. load its firmware, power it on, ...).
|
||||||
|
|
||||||
If the remote processor is already powered on, this function immediately
|
If the remote processor is already powered on, this function immediately
|
||||||
returns (successfully).
|
returns (successfully).
|
||||||
|
|
||||||
Returns 0 on success, and an appropriate error value otherwise.
|
Returns 0 on success, and an appropriate error value otherwise.
|
||||||
Note: to use this function you should already have a valid rproc
|
Note: to use this function you should already have a valid rproc
|
||||||
handle. There are several ways to achieve that cleanly (devres, pdata,
|
handle. There are several ways to achieve that cleanly (devres, pdata,
|
||||||
the way remoteproc_rpmsg.c does this, or, if this becomes prevalent, we
|
the way remoteproc_rpmsg.c does this, or, if this becomes prevalent, we
|
||||||
might also consider using dev_archdata for this).
|
might also consider using dev_archdata for this).
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void rproc_shutdown(struct rproc *rproc)
|
void rproc_shutdown(struct rproc *rproc)
|
||||||
- Power off a remote processor (previously booted with rproc_boot()).
|
|
||||||
|
Power off a remote processor (previously booted with rproc_boot()).
|
||||||
In case @rproc is still being used by an additional user(s), then
|
In case @rproc is still being used by an additional user(s), then
|
||||||
this function will just decrement the power refcount and exit,
|
this function will just decrement the power refcount and exit,
|
||||||
without really powering off the device.
|
without really powering off the device.
|
||||||
|
|
||||||
Every call to rproc_boot() must (eventually) be accompanied by a call
|
Every call to rproc_boot() must (eventually) be accompanied by a call
|
||||||
to rproc_shutdown(). Calling rproc_shutdown() redundantly is a bug.
|
to rproc_shutdown(). Calling rproc_shutdown() redundantly is a bug.
|
||||||
Notes:
|
|
||||||
- we're not decrementing the rproc's refcount, only the power refcount.
|
.. note::
|
||||||
|
|
||||||
|
we're not decrementing the rproc's refcount, only the power refcount.
|
||||||
which means that the @rproc handle stays valid even after
|
which means that the @rproc handle stays valid even after
|
||||||
rproc_shutdown() returns, and users can still use it with a subsequent
|
rproc_shutdown() returns, and users can still use it with a subsequent
|
||||||
rproc_boot(), if needed.
|
rproc_boot(), if needed.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct rproc *rproc_get_by_phandle(phandle phandle)
|
struct rproc *rproc_get_by_phandle(phandle phandle)
|
||||||
- Find an rproc handle using a device tree phandle. Returns the rproc
|
|
||||||
|
Find an rproc handle using a device tree phandle. Returns the rproc
|
||||||
handle on success, and NULL on failure. This function increments
|
handle on success, and NULL on failure. This function increments
|
||||||
the remote processor's refcount, so always use rproc_put() to
|
the remote processor's refcount, so always use rproc_put() to
|
||||||
decrement it back once rproc isn't needed anymore.
|
decrement it back once rproc isn't needed anymore.
|
||||||
|
|
||||||
3. Typical usage
|
Typical usage
|
||||||
|
=============
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
#include <linux/remoteproc.h>
|
#include <linux/remoteproc.h>
|
||||||
|
|
||||||
@@ -82,12 +103,16 @@ int dummy_rproc_example(struct rproc *my_rproc)
|
|||||||
rproc_shutdown(my_rproc);
|
rproc_shutdown(my_rproc);
|
||||||
}
|
}
|
||||||
|
|
||||||
4. API for implementors
|
API for implementors
|
||||||
|
====================
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
struct rproc *rproc_alloc(struct device *dev, const char *name,
|
struct rproc *rproc_alloc(struct device *dev, const char *name,
|
||||||
const struct rproc_ops *ops,
|
const struct rproc_ops *ops,
|
||||||
const char *firmware, int len)
|
const char *firmware, int len)
|
||||||
- Allocate a new remote processor handle, but don't register
|
|
||||||
|
Allocate a new remote processor handle, but don't register
|
||||||
it yet. Required parameters are the underlying device, the
|
it yet. Required parameters are the underlying device, the
|
||||||
name of this remote processor, platform-specific ops handlers,
|
name of this remote processor, platform-specific ops handlers,
|
||||||
the name of the firmware to boot this rproc with, and the
|
the name of the firmware to boot this rproc with, and the
|
||||||
@@ -95,36 +120,54 @@ int dummy_rproc_example(struct rproc *my_rproc)
|
|||||||
|
|
||||||
This function should be used by rproc implementations during
|
This function should be used by rproc implementations during
|
||||||
initialization of the remote processor.
|
initialization of the remote processor.
|
||||||
|
|
||||||
After creating an rproc handle using this function, and when ready,
|
After creating an rproc handle using this function, and when ready,
|
||||||
implementations should then call rproc_add() to complete
|
implementations should then call rproc_add() to complete
|
||||||
the registration of the remote processor.
|
the registration of the remote processor.
|
||||||
|
|
||||||
On success, the new rproc is returned, and on failure, NULL.
|
On success, the new rproc is returned, and on failure, NULL.
|
||||||
|
|
||||||
Note: _never_ directly deallocate @rproc, even if it was not registered
|
.. note::
|
||||||
|
|
||||||
|
**never** directly deallocate @rproc, even if it was not registered
|
||||||
yet. Instead, when you need to unroll rproc_alloc(), use rproc_free().
|
yet. Instead, when you need to unroll rproc_alloc(), use rproc_free().
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void rproc_free(struct rproc *rproc)
|
void rproc_free(struct rproc *rproc)
|
||||||
- Free an rproc handle that was allocated by rproc_alloc.
|
|
||||||
|
Free an rproc handle that was allocated by rproc_alloc.
|
||||||
|
|
||||||
This function essentially unrolls rproc_alloc(), by decrementing the
|
This function essentially unrolls rproc_alloc(), by decrementing the
|
||||||
rproc's refcount. It doesn't directly free rproc; that would happen
|
rproc's refcount. It doesn't directly free rproc; that would happen
|
||||||
only if there are no other references to rproc and its refcount now
|
only if there are no other references to rproc and its refcount now
|
||||||
dropped to zero.
|
dropped to zero.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int rproc_add(struct rproc *rproc)
|
int rproc_add(struct rproc *rproc)
|
||||||
- Register @rproc with the remoteproc framework, after it has been
|
|
||||||
|
Register @rproc with the remoteproc framework, after it has been
|
||||||
allocated with rproc_alloc().
|
allocated with rproc_alloc().
|
||||||
|
|
||||||
This is called by the platform-specific rproc implementation, whenever
|
This is called by the platform-specific rproc implementation, whenever
|
||||||
a new remote processor device is probed.
|
a new remote processor device is probed.
|
||||||
|
|
||||||
Returns 0 on success and an appropriate error code otherwise.
|
Returns 0 on success and an appropriate error code otherwise.
|
||||||
Note: this function initiates an asynchronous firmware loading
|
Note: this function initiates an asynchronous firmware loading
|
||||||
context, which will look for virtio devices supported by the rproc's
|
context, which will look for virtio devices supported by the rproc's
|
||||||
firmware.
|
firmware.
|
||||||
|
|
||||||
If found, those virtio devices will be created and added, so as a result
|
If found, those virtio devices will be created and added, so as a result
|
||||||
of registering this remote processor, additional virtio drivers might get
|
of registering this remote processor, additional virtio drivers might get
|
||||||
probed.
|
probed.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
int rproc_del(struct rproc *rproc)
|
int rproc_del(struct rproc *rproc)
|
||||||
- Unroll rproc_add().
|
|
||||||
|
Unroll rproc_add().
|
||||||
|
|
||||||
This function should be called when the platform specific rproc
|
This function should be called when the platform specific rproc
|
||||||
implementation decides to remove the rproc device. It should
|
implementation decides to remove the rproc device. It should
|
||||||
_only_ be called if a previous invocation of rproc_add()
|
_only_ be called if a previous invocation of rproc_add()
|
||||||
@@ -135,17 +178,22 @@ int dummy_rproc_example(struct rproc *my_rproc)
|
|||||||
|
|
||||||
Returns 0 on success and -EINVAL if @rproc isn't valid.
|
Returns 0 on success and -EINVAL if @rproc isn't valid.
|
||||||
|
|
||||||
|
::
|
||||||
|
|
||||||
void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
|
void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type)
|
||||||
- Report a crash in a remoteproc
|
|
||||||
|
Report a crash in a remoteproc
|
||||||
|
|
||||||
This function must be called every time a crash is detected by the
|
This function must be called every time a crash is detected by the
|
||||||
platform specific rproc implementation. This should not be called from a
|
platform specific rproc implementation. This should not be called from a
|
||||||
non-remoteproc driver. This function can be called from atomic/interrupt
|
non-remoteproc driver. This function can be called from atomic/interrupt
|
||||||
context.
|
context.
|
||||||
|
|
||||||
5. Implementation callbacks
|
Implementation callbacks
|
||||||
|
========================
|
||||||
|
|
||||||
These callbacks should be provided by platform-specific remoteproc
|
These callbacks should be provided by platform-specific remoteproc
|
||||||
drivers:
|
drivers::
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* struct rproc_ops - platform-specific device handlers
|
* struct rproc_ops - platform-specific device handlers
|
||||||
@@ -179,7 +227,8 @@ the exact virtqueue index to look in is optional: it is easy (and not
|
|||||||
too expensive) to go through the existing virtqueues and look for new buffers
|
too expensive) to go through the existing virtqueues and look for new buffers
|
||||||
in the used rings.
|
in the used rings.
|
||||||
|
|
||||||
6. Binary Firmware Structure
|
Binary Firmware Structure
|
||||||
|
=========================
|
||||||
|
|
||||||
At this point remoteproc only supports ELF32 firmware binaries. However,
|
At this point remoteproc only supports ELF32 firmware binaries. However,
|
||||||
it is quite expected that other platforms/devices which we'd want to
|
it is quite expected that other platforms/devices which we'd want to
|
||||||
@@ -207,7 +256,7 @@ resource entries that publish the existence of supported features
|
|||||||
or configurations by the remote processor, such as trace buffers and
|
or configurations by the remote processor, such as trace buffers and
|
||||||
supported virtio devices (and their configurations).
|
supported virtio devices (and their configurations).
|
||||||
|
|
||||||
The resource table begins with this header:
|
The resource table begins with this header::
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* struct resource_table - firmware resource table header
|
* struct resource_table - firmware resource table header
|
||||||
@@ -229,7 +278,7 @@ struct resource_table {
|
|||||||
} __packed;
|
} __packed;
|
||||||
|
|
||||||
Immediately following this header are the resource entries themselves,
|
Immediately following this header are the resource entries themselves,
|
||||||
each of which begins with the following resource entry header:
|
each of which begins with the following resource entry header::
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* struct fw_rsc_hdr - firmware resource entry header
|
* struct fw_rsc_hdr - firmware resource entry header
|
||||||
@@ -252,7 +301,7 @@ is expected, where the firmware requests a resource, and once allocated,
|
|||||||
the host should provide back its details (e.g. address of an allocated
|
the host should provide back its details (e.g. address of an allocated
|
||||||
memory region).
|
memory region).
|
||||||
|
|
||||||
Here are the various resource types that are currently supported:
|
Here are the various resource types that are currently supported::
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* enum fw_resource_type - types of resource entries
|
* enum fw_resource_type - types of resource entries
|
||||||
@@ -286,7 +335,8 @@ We also expect that platform-specific resource entries will show up
|
|||||||
at some point. When that happens, we could easily add a new RSC_PLATFORM
|
at some point. When that happens, we could easily add a new RSC_PLATFORM
|
||||||
type, and hand those resources to the platform-specific rproc driver to handle.
|
type, and hand those resources to the platform-specific rproc driver to handle.
|
||||||
|
|
||||||
7. Virtio and remoteproc
|
Virtio and remoteproc
|
||||||
|
=====================
|
||||||
|
|
||||||
The firmware should provide remoteproc information about virtio devices
|
The firmware should provide remoteproc information about virtio devices
|
||||||
that it supports, and their configurations: a RSC_VDEV resource entry
|
that it supports, and their configurations: a RSC_VDEV resource entry
|
||||||
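The boot/shutdown pairing described in the User API hunk above can be modelled in userspace as a simple power refcount. This is only a sketch of the documented behaviour, not remoteproc code: `struct mock_rproc` and the `mock_rproc_*` functions are hypothetical names, and the real firmware loading and power sequencing are reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a remote processor handle; not struct rproc. */
struct mock_rproc {
	int power;	/* number of outstanding boot requests */
	bool running;	/* whether the (imaginary) core is powered */
};

/* Model of rproc_boot(): power on for the first user, else just count. */
static int mock_rproc_boot(struct mock_rproc *r)
{
	assert(r != NULL);
	if (r->power++ == 0)
		r->running = true;	/* firmware load and power-on would go here */
	return 0;
}

/* Model of rproc_shutdown(): power off only when the last user leaves. */
static void mock_rproc_shutdown(struct mock_rproc *r)
{
	assert(r != NULL);
	assert(r->power > 0);	/* a redundant shutdown is a caller bug */
	if (--r->power == 0)
		r->running = false;
}
```

The handle itself outlives the last shutdown in this model, mirroring the note that only the power refcount is decremented and a later boot on the same handle is allowed.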
|
|||||||
@@ -1,13 +1,13 @@
|
|||||||
|
===============================
|
||||||
rfkill - RF kill switch support
|
rfkill - RF kill switch support
|
||||||
===============================
|
===============================
|
||||||
|
|
||||||
1. Introduction
|
|
||||||
2. Implementation details
|
|
||||||
3. Kernel API
|
|
||||||
4. Userspace support
|
|
||||||
|
|
||||||
|
.. contents::
|
||||||
|
:depth: 2
|
||||||
|
|
||||||
1. Introduction
|
Introduction
|
||||||
|
============
|
||||||
|
|
||||||
The rfkill subsystem provides a generic interface to disabling any radio
|
The rfkill subsystem provides a generic interface to disabling any radio
|
||||||
transmitter in the system. When a transmitter is blocked, it shall not
|
transmitter in the system. When a transmitter is blocked, it shall not
|
||||||
@@ -21,17 +21,24 @@ aircraft.
|
|||||||
The rfkill subsystem has a concept of "hard" and "soft" block, which
|
The rfkill subsystem has a concept of "hard" and "soft" block, which
|
||||||
differ little in their meaning (block == transmitters off) but rather in
|
differ little in their meaning (block == transmitters off) but rather in
|
||||||
whether they can be changed or not:
|
whether they can be changed or not:
|
||||||
- hard block: read-only radio block that cannot be overridden by software
|
|
||||||
- soft block: writable radio block (need not be readable) that is set by
|
- hard block
|
||||||
|
read-only radio block that cannot be overridden by software
|
||||||
|
|
||||||
|
- soft block
|
||||||
|
writable radio block (need not be readable) that is set by
|
||||||
the system software.
|
the system software.
|
||||||
|
|
||||||
The rfkill subsystem has two parameters, rfkill.default_state and
|
The rfkill subsystem has two parameters, rfkill.default_state and
|
||||||
rfkill.master_switch_mode, which are documented in admin-guide/kernel-parameters.rst.
|
rfkill.master_switch_mode, which are documented in
|
||||||
|
admin-guide/kernel-parameters.rst.
|
||||||
|
|
||||||
|
|
||||||
2. Implementation details
|
Implementation details
|
||||||
|
======================
|
||||||
|
|
||||||
The rfkill subsystem is composed of three main components:
|
The rfkill subsystem is composed of three main components:
|
||||||
|
|
||||||
* the rfkill core,
|
* the rfkill core,
|
||||||
* the deprecated rfkill-input module (an input layer handler, being
|
* the deprecated rfkill-input module (an input layer handler, being
|
||||||
replaced by userspace policy code) and
|
replaced by userspace policy code) and
|
||||||
@@ -55,7 +62,8 @@ use the return value of rfkill_set_hw_state() unless the hardware actually
|
|||||||
keeps track of soft and hard block separately.
|
keeps track of soft and hard block separately.
|
||||||
|
|
||||||
|
|
||||||
3. Kernel API
|
Kernel API
|
||||||
|
==========
|
||||||
|
|
||||||
|
|
||||||
Drivers for radio transmitters normally implement an rfkill driver.
|
Drivers for radio transmitters normally implement an rfkill driver.
|
||||||
@@ -69,7 +77,7 @@ For some platforms, it is possible that the hardware state changes during
|
|||||||
suspend/hibernation, in which case it will be necessary to update the rfkill
|
suspend/hibernation, in which case it will be necessary to update the rfkill
|
||||||
core with the current state at resume time.
|
core with the current state at resume time.
|
||||||
|
|
||||||
To create an rfkill driver, driver's Kconfig needs to have
|
To create an rfkill driver, driver's Kconfig needs to have::
|
||||||
|
|
||||||
depends on RFKILL || !RFKILL
|
depends on RFKILL || !RFKILL
|
||||||
|
|
||||||
@@ -87,7 +95,8 @@ RFKill provides per-switch LED triggers, which can be used to drive LEDs
|
|||||||
according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
|
according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
|
||||||
|
|
||||||
|
|
||||||
5. Userspace support
|
Userspace support
|
||||||
|
=================
|
||||||
|
|
||||||
The recommended userspace interface to use is /dev/rfkill, which is a misc
|
The recommended userspace interface to use is /dev/rfkill, which is a misc
|
||||||
character device that allows userspace to obtain and set the state of rfkill
|
character device that allows userspace to obtain and set the state of rfkill
|
||||||
@@ -112,7 +121,7 @@ rfkill core framework.
|
|||||||
Additionally, each rfkill device is registered in sysfs and emits uevents.
|
Additionally, each rfkill device is registered in sysfs and emits uevents.
|
||||||
|
|
||||||
rfkill devices issue uevents (with an action of "change"), with the following
|
rfkill devices issue uevents (with an action of "change"), with the following
|
||||||
environment variables set:
|
environment variables set::
|
||||||
|
|
||||||
RFKILL_NAME
|
RFKILL_NAME
|
||||||
RFKILL_STATE
|
RFKILL_STATE
|
||||||
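The hard/soft distinction from the introduction hunk above can be sketched as two flags, of which software may write only one. This is a minimal userspace model under that assumption; `struct mock_rfkill` and the `mock_rfkill_*` helpers are hypothetical and do not correspond to the kernel's rfkill structures or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one rfkill switch; not the kernel's struct rfkill. */
struct mock_rfkill {
	bool hard;	/* read-only for software */
	bool soft;	/* set by system software */
};

/* Software may change only the soft block. */
static void mock_rfkill_set_soft(struct mock_rfkill *rk, bool blocked)
{
	assert(rk != NULL);
	rk->soft = blocked;
}

/* The transmitter must stay off if either block is active. */
static bool mock_rfkill_blocked(const struct mock_rfkill *rk)
{
	assert(rk != NULL);
	return rk->hard || rk->soft;
}
```

Clearing the soft block in this model does not unblock a transmitter whose hard block is set, which is the sense in which a hard block cannot be overridden by software.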
|
|||||||
@@ -1,7 +1,9 @@
|
|||||||
Started by Paul Jackson <pj@sgi.com>
|
====================
|
||||||
|
|
||||||
The robust futex ABI
|
The robust futex ABI
|
||||||
--------------------
|
====================
|
||||||
|
|
||||||
|
:Author: Started by Paul Jackson <pj@sgi.com>
|
||||||
|
|
||||||
|
|
||||||
Robust_futexes provide a mechanism that is used in addition to normal
|
Robust_futexes provide a mechanism that is used in addition to normal
|
||||||
futexes, for kernel assist of cleanup of held locks on task exit.
|
futexes, for kernel assist of cleanup of held locks on task exit.
|
||||||
@@ -32,7 +34,7 @@ probably causing deadlock or other such failure of the other threads
|
|||||||
waiting on the same locks.
|
waiting on the same locks.
|
||||||
|
|
||||||
A thread that anticipates possibly using robust_futexes should first
|
A thread that anticipates possibly using robust_futexes should first
|
||||||
issue the system call:
|
issue the system call::
|
||||||
|
|
||||||
asmlinkage long
|
asmlinkage long
|
||||||
sys_set_robust_list(struct robust_list_head __user *head, size_t len);
|
sys_set_robust_list(struct robust_list_head __user *head, size_t len);
|
||||||
@@ -91,7 +93,7 @@ that lock using the futex mechanism.
|
|||||||
When a thread has invoked the above system call to indicate it
|
When a thread has invoked the above system call to indicate it
|
||||||
anticipates using robust_futexes, the kernel stores the passed in 'head'
|
anticipates using robust_futexes, the kernel stores the passed in 'head'
|
||||||
pointer for that task. The task may retrieve that value later on by
|
pointer for that task. The task may retrieve that value later on by
|
||||||
using the system call:
|
using the system call::
|
||||||
|
|
||||||
asmlinkage long
|
asmlinkage long
|
||||||
sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
|
sys_get_robust_list(int pid, struct robust_list_head __user **head_ptr,
|
||||||
@@ -135,6 +137,7 @@ manipulating this list), the user code must observe the following
|
|||||||
protocol on 'lock entry' insertion and removal:
|
protocol on 'lock entry' insertion and removal:
|
||||||
|
|
||||||
On insertion:
|
On insertion:
|
||||||
|
|
||||||
1) set the 'list_op_pending' word to the address of the 'lock entry'
|
1) set the 'list_op_pending' word to the address of the 'lock entry'
|
||||||
to be inserted,
|
to be inserted,
|
||||||
2) acquire the futex lock,
|
2) acquire the futex lock,
|
||||||
@@ -143,6 +146,7 @@ On insertion:
|
|||||||
4) clear the 'list_op_pending' word.
|
4) clear the 'list_op_pending' word.
|
||||||
|
|
||||||
On removal:
|
On removal:
|
||||||
|
|
||||||
1) set the 'list_op_pending' word to the address of the 'lock entry'
|
1) set the 'list_op_pending' word to the address of the 'lock entry'
|
||||||
to be removed,
|
to be removed,
|
||||||
2) remove the lock entry for this lock from the 'head' list,
|
2) remove the lock entry for this lock from the 'head' list,
|
||||||
|
|||||||
@@ -1,4 +1,8 @@
|
|||||||
Started by: Ingo Molnar <mingo@redhat.com>
|
========================================
|
||||||
|
A description of what robust futexes are
|
||||||
|
========================================
|
||||||
|
|
||||||
|
:Started by: Ingo Molnar <mingo@redhat.com>
|
||||||
|
|
||||||
Background
|
Background
|
||||||
----------
|
----------
|
||||||
@@ -163,7 +167,7 @@ Implementation details
|
|||||||
----------------------
|
----------------------
|
||||||
|
|
||||||
The patch adds two new syscalls: one to register the userspace list, and
|
The patch adds two new syscalls: one to register the userspace list, and
|
||||||
one to query the registered list pointer:
|
one to query the registered list pointer::
|
||||||
|
|
||||||
asmlinkage long
|
asmlinkage long
|
||||||
sys_set_robust_list(struct robust_list_head __user *head,
|
sys_set_robust_list(struct robust_list_head __user *head,
|
||||||
@@ -185,7 +189,7 @@ straightforward. The kernel doesn't have any internal distinction between
|
|||||||
robust and normal futexes.
|
robust and normal futexes.
|
||||||
|
|
||||||
If a futex is found to be held at exit time, the kernel sets the
|
If a futex is found to be held at exit time, the kernel sets the
|
||||||
following bit of the futex word:
|
following bit of the futex word::
|
||||||
|
|
||||||
#define FUTEX_OWNER_DIED 0x40000000
|
#define FUTEX_OWNER_DIED 0x40000000
|
||||||
|
|
||||||
@@ -193,7 +197,7 @@ and wakes up the next futex waiter (if any). User-space does the rest of
|
|||||||
the cleanup.
|
the cleanup.
|
||||||
|
|
||||||
Otherwise, robust futexes are acquired by glibc by putting the TID into
|
Otherwise, robust futexes are acquired by glibc by putting the TID into
|
||||||
the futex field atomically. Waiters set the FUTEX_WAITERS bit:
|
the futex field atomically. Waiters set the FUTEX_WAITERS bit::
|
||||||
|
|
||||||
#define FUTEX_WAITERS 0x80000000
|
#define FUTEX_WAITERS 0x80000000
|
||||||
|
|
||||||
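To make the bit layout quoted above concrete, here is a small userspace sketch that packs and decodes a futex word. The two flag values are the ones shown in the hunks above; the 30-bit TID mask is taken from the kernel's futex header rather than from this text, and the `futex_*` helper names are hypothetical. The constants are redefined locally so the example builds without kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values quoted in the text above. */
#define FUTEX_WAITERS		0x80000000u
#define FUTEX_OWNER_DIED	0x40000000u
/* Assumption: the remaining low 30 bits hold the owner TID. */
#define FUTEX_TID_MASK		0x3fffffffu

/* Build the word a lock owner would store: just its TID, no flags. */
static uint32_t futex_lock_word(uint32_t tid)
{
	assert((tid & ~FUTEX_TID_MASK) == 0);	/* TID must fit in 30 bits */
	return tid;
}

static uint32_t futex_owner_tid(uint32_t word)
{
	return word & FUTEX_TID_MASK;
}

static bool futex_has_waiters(uint32_t word)
{
	return (word & FUTEX_WAITERS) != 0;
}

static bool futex_owner_died(uint32_t word)
{
	return (word & FUTEX_OWNER_DIED) != 0;
}

/* Model of the exit-time marking described above: set the bit, keep the rest. */
static uint32_t futex_mark_owner_died(uint32_t word)
{
	return word | FUTEX_OWNER_DIED;
}
```

A waiter that wakes up and sees `futex_owner_died()` return true knows the previous owner exited while holding the lock, and can run whatever recovery the application defines.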
|
|||||||