Linux Device Drivers, 2nd Edition
By Alessandro Rubini & Jonathan Corbet
2nd Edition, June 2001
Chapter 13: mmap and DMA

Contents:
Memory Management in Linux
The mmap Device Operation
The kiobuf Interface
Direct Memory Access and Bus Mastering
Backward Compatibility
Quick Reference

This chapter delves into the area of Linux memory management, with an emphasis on techniques that are useful to the device driver writer. The material in this chapter is somewhat advanced, and not everybody will need a grasp of it. Nonetheless, many tasks can only be done through digging more deeply into the memory management subsystem; it also provides an interesting look into how an important part of the kernel works.
The material in this chapter is divided into three sections. The first covers the implementation of the mmap system call, which allows the mapping of device memory directly into a user process's address space. We then cover the kernel kiobuf mechanism, which provides direct access to user memory from kernel space. The kiobuf system may be used to implement "raw I/O" for certain kinds of devices. The final section covers direct memory access (DMA) I/O operations, which essentially provide peripherals with direct access to system memory.
Of course, all of these techniques require an understanding of how Linux memory management works, so we start with an overview of that subsystem.
Memory Management in Linux
Address Types
Figure 13-1. Address types used in Linux
The following is a list of the address types used in Linux:

- User virtual addresses
These are the regular addresses seen by user-space programs. User addresses are either 32 or 64 bits in length, depending on the underlying hardware architecture, and each process has its own virtual address space.
- Physical addresses
The addresses used between the processor and the system's memory. Physical addresses are 32- or 64-bit quantities; even 32-bit systems can use 64-bit physical addresses in some situations.
- Bus addresses
The addresses used between peripheral buses and memory. Often they are the same as the physical addresses used by the processor, but that is not necessarily the case. Bus addresses are highly architecture dependent, of course.
- Kernel logical addresses
These make up the normal address space of the kernel. These addresses map most or all of main memory, and are often treated as if they were physical addresses. On most architectures, logical addresses and their associated physical addresses differ only by a constant offset. Logical addresses use the hardware's native pointer size, and thus may be unable to address all of physical memory on heavily equipped 32-bit systems. Logical addresses are usually stored in variables of type unsigned long or void *. Memory returned from kmalloc has a logical address.
- Kernel virtual addresses
These differ from logical addresses in that they do not necessarily have a direct mapping to physical addresses. All logical addresses are kernel virtual addresses; memory allocated by vmalloc also has a virtual address (but no direct physical mapping). The function kmap, described later in this chapter, also returns virtual addresses. Virtual addresses are usually stored in pointer variables.
If you have a logical address, the macro __pa() (defined in <asm/page.h>) will return its associated physical address. Physical addresses can be mapped back to logical addresses with __va(), but only for low-memory pages.
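A minimal sketch of the round trip (kbuf is a hypothetical kmalloc'd buffer):

void *kbuf = kmalloc(4096, GFP_KERNEL);   /* logical address */
unsigned long phys = __pa(kbuf);          /* logical -> physical */
void *same = __va(phys);                  /* physical -> logical; same == kbuf */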
High and Low Memory
The difference between logical and kernel virtual addresses is highlighted on 32-bit systems that are equipped with large amounts of memory. With 32 bits, it is possible to address 4 GB of memory. Linux on 32-bit systems has, until recently, been limited to substantially less memory than that, however, because of the way it sets up the virtual address space. The system was unable to handle more memory than it could set up logical addresses for, since it needed directly mapped kernel addresses for all memory.
The limitation has been addressed by splitting the kernel's world into two kinds of memory:

- Low memory
Memory for which logical addresses exist in kernel space. On almost every system you will likely encounter, all memory is low memory.
- High memory
Memory for which logical addresses do not exist, because the system contains more memory than can be addressed with 32 bits.
We will point out high-memory limitations as we come to them in this chapter.
The Memory Map and struct page
Historically, the kernel has used logical addresses to refer to explicit pages of memory. The addition of high-memory support, however, has exposed an obvious problem with that approach -- logical addresses are not available for high memory. Thus kernel functions that deal with memory are increasingly using pointers to struct page instead. This data structure is used to keep track of just about everything the kernel needs to know about physical memory; there is one struct page for each physical page on the system. Some of the fields of this structure include the following:
- atomic_t count;
The number of references there are to this page. When the count drops to zero, the page is returned to the free list.
- wait_queue_head_t wait;
A list of processes waiting on this page. Processes can wait on a page when a kernel function has locked it for some reason; drivers need not normally worry about this field.
- void *virtual;
The kernel virtual address of the page, if it is mapped; NULL, otherwise. Low-memory pages are always mapped; high-memory pages usually are not.
- unsigned long flags;
A set of bit flags describing the status of the page. These include PG_locked, which indicates that the page has been locked in memory, and PG_reserved, which prevents the memory management system from working with the page at all.
Translation between struct page pointers and addresses is done with the following functions and macros:

- struct page *virt_to_page(void *kaddr);
This macro, defined in <asm/page.h>, takes a kernel logical address and returns its associated struct page pointer. Since it requires a logical address, it will not work with memory from vmalloc or high memory.
- void *page_address(struct page *page);
Returns the kernel virtual address of this page, if such an address exists. For high memory, that address exists only if the page has been mapped.
- #include <linux/highmem.h>
- void *kmap(struct page *page);
- void kunmap(struct page *page);
kmap returns a kernel virtual address for any page in the system. For low-memory pages, it just returns the logical address of the page; for high-memory pages, kmap creates a special mapping. Mappings created with kmap should always be freed with kunmap; a limited number of such mappings is available, so it is better not to hold on to them for too long. kmap calls are additive, so if two or more functions both call kmap on the same page the right thing happens. Note also that kmap can sleep if no mappings are available.
We will see some uses of these functions when we get into the example code later in this chapter.
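In the meantime, here is a minimal sketch of the usage pattern (the function and its arguments are hypothetical):

#include <linux/highmem.h>
#include <linux/string.h>

/* Copy "len" bytes (len <= PAGE_SIZE) out of a page that may live in
 * high memory. kmap may sleep, so this must not be called from
 * interrupt context. */
static void copy_out_of_page(struct page *page, void *dest, size_t len)
{
    void *vaddr = kmap(page);  /* logical address, or a fresh mapping */
    memcpy(dest, vaddr, len);
    kunmap(page);              /* release the mapping promptly */
}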
Page Tables
The Linux kernel manages three levels of page tables in order to map virtual addresses to physical addresses. The multiple levels allow the memory range to be sparsely populated; modern systems will spread a process out across a large range of virtual memory. It makes sense to do things that way; it allows for runtime flexibility in how things are laid out.
Figure 13-2. The three levels of Linux page tables
- Page Directory (PGD)
The top-level page table. The PGD is an array of pgd_t items, each of which points to a second-level page table. Each process has its own page directory, and there is one for kernel space as well. You can think of the page directory as a page-aligned array of pgd_ts.
- Page mid-level Directory (PMD)
The second-level table. The PMD is a page-aligned array of pmd_t items. A pmd_t is a pointer to the third-level page table. Two-level processors have no physical PMD; they declare their PMD as an array with a single element, whose value is the PMD itself -- we'll see in a while how this is handled in C and how the compiler optimizes this level away.
- Page Table
The bottom-level table: a page-aligned array of items, each of which is called a Page Table Entry. The kernel uses the pte_t type for the entries; a pte_t contains the physical address of the data page.
The types introduced in this list are defined in <asm/page.h>, which must be included by every source file that plays with paging.
Irrespective of the mechanisms used by the CPU, the Linux software implementation is based on three-level page tables, and the following symbols are used to access them. Both <asm/page.h> and <asm/pgtable.h> must be included for all of them to be accessible.
- PTRS_PER_PGD
- PTRS_PER_PMD
- PTRS_PER_PTE
The size of each table. Two-level processors set PTRS_PER_PMD to 1, to avoid dealing with the middle level.
- unsigned pgd_val(pgd_t pgd)
- unsigned pmd_val(pmd_t pmd)
- unsigned pte_val(pte_t pte)
These three macros are used to retrieve the unsigned value from the typed data item. The actual type used varies depending on the underlying architecture and kernel configuration options; it is usually either unsigned long or, on 32-bit processors supporting high memory, unsigned long long. SPARC64 processors use unsigned int. The macros help in using strict data typing in source code without introducing computational overhead.
- pgd_t * pgd_offset(struct mm_struct * mm, unsigned long address)
- pmd_t * pmd_offset(pgd_t * dir, unsigned long address)
- pte_t * pte_offset(pmd_t * dir, unsigned long address)
These inline functions[50] are used to retrieve the pgd, pmd, and pte entries associated with address. Page-table lookup begins with a pointer to struct mm_struct. The pointer associated with the memory map of the current process is current->mm, while the pointer to kernel space is described by &init_mm. Two-level processors define pmd_offset(dir,add) as (pmd_t *)dir, thus folding the pmd over the pgd. Functions that scan page tables are always declared as inline, and the compiler optimizes out any pmd lookup.
- struct page *pte_page(pte_t pte)
This function returns a pointer to the struct page entry for the page in this page-table entry. Code that deals with page tables will generally want to use pte_page rather than pte_val, since pte_page deals with the processor-dependent format of the page-table entry and returns the struct page pointer, which is usually what's needed.
- pte_present(pte_t pte)
This macro returns a boolean value indicating whether the data page is currently present in memory. It is the most used of several functions that access the low bits in the pte -- the bits that are discarded by pte_page.
Each process in the system has a struct mm_struct structure, which contains its page tables and a great many other things. It also contains a spinlock called page_table_lock, which should be held while traversing or modifying the page tables.
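Pulling these pieces together, a minimal sketch of a page-table walk (the function name is hypothetical, and error checks such as pgd_none and pmd_none are omitted for brevity):

static struct page *user_addr_to_page(unsigned long address)
{
    struct mm_struct *mm = current->mm;
    pgd_t *pgd;
    pmd_t *pmd;
    pte_t *pte;
    struct page *page = NULL;

    spin_lock(&mm->page_table_lock);
    pgd = pgd_offset(mm, address);
    pmd = pmd_offset(pgd, address);    /* folded away on two-level CPUs */
    pte = pte_offset(pmd, address);
    if (pte_present(*pte))
        page = pte_page(*pte);
    spin_unlock(&mm->page_table_lock);
    return page;
}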
Virtual Memory Areas
Although paging sits at the lowest level of memory management, something more is necessary before you can use the computer's resources efficiently. The kernel needs a higher-level mechanism to handle the way a process sees its memory. This mechanism is implemented in Linux by means of virtual memory areas, which are typically referred to as areas or VMAs.
A process's memory map can be examined by looking at /proc/<pid>/maps. Here are a couple of examples:

morgana.root# cat /proc/1/maps   # look at init
08048000-0804e000 r-xp 00000000 08:01 51297  /sbin/init          # text
0804e000-08050000 rw-p 00005000 08:01 51297  /sbin/init          # data
08050000-08054000 rwxp 00000000 00:00 0                          # zero-mapped bss
40000000-40013000 r-xp 00000000 08:01 39003  /lib/ld-2.1.3.so    # text
40013000-40014000 rw-p 00012000 08:01 39003  /lib/ld-2.1.3.so    # data
40014000-40015000 rw-p 00000000 00:00 0                          # bss for ld.so
4001b000-40108000 r-xp 00000000 08:01 39006  /lib/libc-2.1.3.so  # text
40108000-4010c000 rw-p 000ec000 08:01 39006  /lib/libc-2.1.3.so  # data
4010c000-40110000 rw-p 00000000 00:00 0                          # bss for libc.so
bfffe000-c0000000 rwxp fffff000 00:00 0                          # zero-mapped stack

morgana.root# rsh wolf head /proc/self/maps  #### alpha-axp: static ecoff
000000011fffe000-0000000120000000 rwxp 0000000000000000 00:00 0      # stack
0000000120000000-0000000120014000 r-xp 0000000000000000 08:03 2844   # text
0000000140000000-0000000140002000 rwxp 0000000000014000 08:03 2844   # data
0000000140002000-0000000140008000 rwxp 0000000000000000 00:00 0      # bss

The fields in each line are as follows:
start-end perm offset major:minor inode image.
- start
- end
The beginning and ending virtual addresses for this memory area.
- perm
A bit mask with the memory area's read, write, and execute permissions. This field describes what the process is allowed to do with pages belonging to the area. The last character in the field is either p for "private" or s for "shared."
- offset
Where the memory area begins in the file that it is mapped to. An offset of zero means that the first page of the memory area corresponds to the first page of the file.
- major
- minor
The major and minor numbers of the device holding the file that has been mapped. For device mappings, the major and minor numbers refer to the disk partition holding the device special file that was opened by the user, and not the device itself.
- inode
The inode number of the mapped file.
- image
The name of the file (usually an executable image) that has been mapped.
A driver that implements the mmap method needs to fill a VMA structure in the address space of the process mapping the device. The driver writer should therefore have at least a minimal understanding of VMAs in order to use them.
Let's look at the most important fields in struct vm_area_struct (defined in <linux/mm.h>). These fields may be used by device drivers in their mmap implementation. Note that the kernel maintains lists and trees of VMAs to optimize area lookup, and several fields of vm_area_struct are used to maintain this organization. VMAs thus can't be created at will by a driver, or the structures will break. The main fields of VMAs are as follows (note the similarity between these fields and the /proc output we just saw):
- unsigned long vm_start;
- unsigned long vm_end;
The virtual address range covered by this VMA. These fields are the first two fields shown in /proc/*/maps.
- struct file *vm_file;
A pointer to the struct file structure associated with this area (if any).
- unsigned long vm_pgoff;
The offset of the area in the file, in pages. When a file or device is mapped, this is the file position of the first page mapped in this area.
- unsigned long vm_flags;
A set of flags describing this area. The flags of most interest to device driver writers are VM_IO and VM_RESERVED. VM_IO marks a VMA as being a memory-mapped I/O region. Among other things, the VM_IO flag will prevent the region from being included in process core dumps. VM_RESERVED tells the memory management system not to attempt to swap out this VMA; it should be set in most device mappings.
- struct vm_operations_struct *vm_ops;
A set of functions that the kernel may invoke to operate on this memory area. Its presence indicates that the memory area is a kernel "object," like the struct file we have been using throughout the book.
- void *vm_private_data;
A field that may be used by the driver to store its own information.
The vm_operations_struct, in turn, holds the methods that may be applied to a memory area; they are declared as follows:

- void (*open)(struct vm_area_struct *vma);
Called whenever a new reference to the VMA is made (at fork time, for example) so that the subsystem implementing the area can adjust reference counts and the like. It is not invoked when the VMA is first created by mmap; the driver's mmap method is called instead.
- void (*close)(struct vm_area_struct *vma);
Called when an area is destroyed. Note that there is no usage count associated with VMAs; the area is opened and closed exactly once by each process that uses it.
- void (*unmap)(struct vm_area_struct *vma, unsigned long addr, size_t len);
The kernel calls this method to "unmap" part or all of an area.
- void (*protect)(struct vm_area_struct *vma, unsigned long, size_t, unsigned int newprot);
- int (*sync)(struct vm_area_struct *vma, unsigned long, size_t, unsigned int flags);
- struct page *(*nopage)(struct vm_area_struct *vma, unsigned long address, int write_access);
Invoked by the page-fault handler when a process tries to access a page that belongs to a valid VMA but is not currently in memory; this method is covered in detail later in this chapter.
- struct page *(*wppage)(struct vm_area_struct *vma, unsigned long address, struct page *page);
- int (*swapout)(struct page *page, struct file *file);
Called when a page is selected to be swapped out. Device drivers will not, as a rule, need to implement swapout; device mappings are not something that the kernel can just write to disk.
That concludes our overview of Linux memory management data structures. With that out of the way, we can now proceed to the implementation of the mmap system call.
The mmap Device Operation
Memory mapping is one of the most interesting features of modern Unix systems. As far as drivers are concerned, memory mapping can be used to provide user programs with direct access to device memory.
For example, here are a few of the memory areas of an X server process, which maps /dev/mem to get at the video card:

cat /proc/731/maps
08048000-08327000 r-xp 00000000 08:01 55505  /usr/X11R6/bin/XF86_SVGA
08327000-08369000 rw-p 002de000 08:01 55505  /usr/X11R6/bin/XF86_SVGA
40015000-40019000 rw-s fe2fc000 08:01 10778  /dev/mem
40131000-40141000 rw-s 000a0000 08:01 10778  /dev/mem
40141000-40941000 rw-s f4000000 08:01 10778  /dev/mem
...

The regions being mapped correspond to these I/O memory ranges on the host:

000a0000-000bffff : Video RAM area
f4000000-f4ffffff : Matrox Graphics, Inc. MGA G200 AGP
fe2fc000-fe2fffff : Matrox Graphics, Inc. MGA G200 AGP

The system call is declared as follows (as described in the mmap(2) manual page):
mmap (caddr_t addr, size_t len, int prot, int flags, int fd, off_t offset)

On the other hand, the file operation is declared as
int (*mmap) (struct file *filp, struct vm_area_struct *vma);

Using remap_page_range
The job of building new page tables to map a range of physical addresses is handled by remap_page_range, which has the following prototype:
int remap_page_range(unsigned long virt_add, unsigned long phys_add, unsigned long size, pgprot_t prot);
- virt_add
The user virtual address where remapping should begin. The function builds page tables for the virtual address range between virt_add and virt_add+size.
- phys_add
The physical address to which the virtual address should be mapped. The function affects physical addresses from phys_add to phys_add+size.
- size
The dimension, in bytes, of the area being remapped.
- prot
The "protection" requested for the new VMA. The driver can (and should) use the value found in vma->vm_page_prot.
The arguments to remap_page_range are fairly straightforward, and most of them are already provided to you in the VMA when your mmap method is called. The one complication has to do with caching: usually, references to device memory should not be cached by the processor. Often the system BIOS will set things up properly, but it is also possible to disable caching of specific VMAs via the protection field. Unfortunately, disabling caching at this level is highly processor dependent. The curious reader may wish to look at the function pgprot_noncached from drivers/char/mem.c to see what's involved. We won't discuss the topic further here.
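As a hedged sketch of the x86 case only (the full, multiarchitecture logic lives in pgprot_noncached in drivers/char/mem.c), a driver might adjust the protection before remapping:

#if defined(__i386__)
    /* Set the page-level cache-disable and write-through bits. */
    pgprot_val(vma->vm_page_prot) |= _PAGE_PCD | _PAGE_PWT;
#endif
    remap_page_range(vma->vm_start, offset,
                     vma->vm_end - vma->vm_start, vma->vm_page_prot);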
A Simple Implementation
The following code (derived from drivers/char/mem.c) shows how the task is performed in a typical module called simple (Simple Implementation Mapping Pages with Little Enthusiasm):

#include <linux/mm.h>

int simple_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;

    if (offset >= __pa(high_memory) || (filp->f_flags & O_SYNC))
        vma->vm_flags |= VM_IO;
    vma->vm_flags |= VM_RESERVED;

    if (remap_page_range(vma->vm_start, offset,
                         vma->vm_end - vma->vm_start, vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}

The /dev/mem code checks to see if the requested offset (stored in vma->vm_pgoff) is beyond physical memory; if so, the VM_IO VMA flag is set to mark the area as being I/O memory. The VM_RESERVED flag is always set to keep the system from trying to swap this area out. Then it is just a matter of calling remap_page_range to create the necessary page tables.
Adding VMA Operations
As we have seen, the vm_area_struct structure contains a set of operations that may be applied to the VMA. Now we'll look at providing those operations in a simple way; a more detailed example will follow later on.
Here, we will provide open and close operations for our VMA. These operations will be called anytime a process opens or closes the VMA; in particular, the open method will be invoked anytime a process forks and creates a new reference to the VMA. The open and close VMA methods are called in addition to the processing performed by the kernel, so they need not reimplement any of the work done there. They exist as a way for drivers to do any additional processing that they may require.
void simple_vma_open(struct vm_area_struct *vma)
{
    MOD_INC_USE_COUNT;
}

void simple_vma_close(struct vm_area_struct *vma)
{
    MOD_DEC_USE_COUNT;
}

static struct vm_operations_struct simple_remap_vm_ops = {
    open:  simple_vma_open,
    close: simple_vma_close,
};

int simple_remap_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long offset = VMA_OFFSET(vma);

    if (offset >= __pa(high_memory) || (filp->f_flags & O_SYNC))
        vma->vm_flags |= VM_IO;
    vma->vm_flags |= VM_RESERVED;

    if (remap_page_range(vma->vm_start, offset,
                         vma->vm_end - vma->vm_start, vma->vm_page_prot))
        return -EAGAIN;

    vma->vm_ops = &simple_remap_vm_ops;
    simple_vma_open(vma);
    return 0;
}

Mapping Memory with nopage
Although remap_page_range works well for many, if not most, driver mmap implementations, sometimes it is necessary to be a little more flexible. In such situations, an implementation using the nopage VMA method may be called for.
The nopage method, remember, has the following prototype:
struct page *(*nopage)(struct vm_area_struct *vma, unsigned long address, int write_access);

When a user process attempts to access a page in a VMA that is not present in memory, the associated nopage function is called. An implementation of nopage must also increment the usage count for the page it returns, which is done with the get_page macro:

get_page(struct page *pageptr);

One situation in which the nopage approach is useful can be brought about by the mremap system call, which is used by applications to change the bounding addresses of a mapped region. If the driver wants to be able to deal with mremap, the previous implementation won't work correctly, because there's no way for the driver to know that the mapped region has changed.
An implementation of /dev/mem using nopage looks like the following:
struct page *simple_vma_nopage(struct vm_area_struct *vma,
                unsigned long address, int write_access)
{
    struct page *pageptr;
    unsigned long physaddr = address - vma->vm_start + VMA_OFFSET(vma);

    pageptr = virt_to_page(__va(physaddr));
    get_page(pageptr);
    return pageptr;
}

/* The vm_operations structure wiring in the nopage method (this
   definition is implied by the code below). */
static struct vm_operations_struct simple_nopage_vm_ops = {
    open:   simple_vma_open,
    close:  simple_vma_close,
    nopage: simple_vma_nopage,
};

int simple_nopage_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long offset = VMA_OFFSET(vma);

    if (offset >= __pa(high_memory) || (filp->f_flags & O_SYNC))
        vma->vm_flags |= VM_IO;
    vma->vm_flags |= VM_RESERVED;

    vma->vm_ops = &simple_nopage_vm_ops;
    simple_vma_open(vma);
    return 0;
}

Note that this implementation will work for ISA memory regions but not for those on the PCI bus. PCI memory is mapped above the highest system memory, and there are no entries in the system memory map for those addresses. Because there is thus no struct page to return a pointer to, nopage cannot be used in these situations; you must, instead, use remap_page_range.
Remapping Specific I/O Regions
All the examples we've seen so far are reimplementations of /dev/mem; they remap physical addresses into user space. The typical driver, however, wants to map only the small address range that applies to its peripheral device, not all of memory. In order to map to user space only a subset of the whole memory range, the driver needs only to play with the offsets. The following lines will do the trick for a driver mapping a region of simple_region_size bytes, beginning at physical address simple_region_start (which should be page aligned).
unsigned long off = vma->vm_pgoff << PAGE_SHIFT;
unsigned long physical = simple_region_start + off;
unsigned long vsize = vma->vm_end - vma->vm_start;
unsigned long psize = simple_region_size - off;

if (vsize > psize)
    return -EINVAL; /* spans too high */
remap_page_range(vma->vm_start, physical, vsize, vma->vm_page_prot);

Note that the user process can always use mremap to extend its mapping, possibly past the end of the physical device area. If your driver has no nopage method, it will never be notified of this extension, and the additional area will map to the zero page. As a driver writer, you may well want to prevent this sort of behavior; mapping the zero page onto the end of your region is not an explicitly bad thing to do, but it is highly unlikely that the programmer wanted that to happen.
The simplest way to prevent extension of the mapping is to implement a nopage method that always causes a bus signal to be sent to the faulting process:

struct page *simple_nopage(struct vm_area_struct *vma,
                           unsigned long address, int write_access)
{
    return NOPAGE_SIGBUS; /* send a SIGBUS */
}

Of course, a more thorough implementation could check to see if the faulting address is within the device area, and perform the remapping if that is the case. Once again, however, nopage will not work with PCI memory areas, so extension of PCI mappings is not possible.

Remapping RAM

In Linux, a page of physical addresses is marked as "reserved" in the memory map to indicate that it is not available for memory management. On the PC, for example, the range between 640 KB and 1 MB is marked as reserved, as are the pages that host the kernel code itself.
A consequence is that remap_page_range won't let you remap conventional RAM pages -- such as those obtained with get_free_page -- into user space; the zero page gets mapped in instead. Here is what happens when mapper tries to read a page of conventional memory through /dev/mem:

morgana.root# ./mapper /dev/mem 0x10000 0x1000 | od -Ax -t x1
mapped "/dev/mem" from 65536 to 69632
000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
001000

Remapping RAM with the nopage method
The way to map real RAM to user space is to use vm_ops->nopage to deal with page faults one at a time. A sample implementation is part of the scullp module, introduced in Chapter 7, "Getting Hold of Memory".
int scullp_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct inode *inode = INODE_FROM_F(filp);

    /* refuse to map if order is not 0 */
    if (scullp_devices[MINOR(inode->i_rdev)].order)
        return -ENODEV;

    /* don't do anything here: "nopage" will fill the holes */
    vma->vm_ops = &scullp_vm_ops;
    vma->vm_flags |= VM_RESERVED;
    vma->vm_private_data = scullp_devices + MINOR(inode->i_rdev);
    scullp_vma_open(vma);
    return 0;
}

The open and close methods simply keep track of the mapping count (and the module usage count) and are defined as follows:
void scullp_vma_open(struct vm_area_struct *vma)
{
    ScullP_Dev *dev = scullp_vma_to_dev(vma);
    dev->vmas++;
    MOD_INC_USE_COUNT;
}

void scullp_vma_close(struct vm_area_struct *vma)
{
    ScullP_Dev *dev = scullp_vma_to_dev(vma);
    dev->vmas--;
    MOD_DEC_USE_COUNT;
}

Most of the work is then performed by nopage:

struct page *scullp_vma_nopage(struct vm_area_struct *vma,
                unsigned long address, int write)
{
    unsigned long offset;
    ScullP_Dev *ptr, *dev = scullp_vma_to_dev(vma);
    struct page *page = NOPAGE_SIGBUS;
    void *pageptr = NULL; /* default to "missing" */

    down(&dev->sem);
    offset = (address - vma->vm_start) + VMA_OFFSET(vma);
    if (offset >= dev->size) goto out; /* out of range */

    /*
     * Now retrieve the scullp device from the list, then the page.
     * If the device has holes, the process receives a SIGBUS when
     * accessing the hole.
     */
    offset >>= PAGE_SHIFT; /* offset is a number of pages */
    for (ptr = dev; ptr && offset >= dev->qset;) {
        ptr = ptr->next;
        offset -= dev->qset;
    }
    if (ptr && ptr->data) pageptr = ptr->data[offset];
    if (!pageptr) goto out; /* hole or end-of-file */
    page = virt_to_page(pageptr);

    /* got it, now increment the count */
    get_page(page);
out:
    up(&dev->sem);
    return page;
}

The scullp device now works as expected, as you can see in this sample output from the mapper utility. Here we send a directory listing of /dev (which is long) to the scullp device, and then use the mapper utility to look at pieces of that listing with mmap.
morgana% ls -l /dev > /dev/scullp
morgana% ./mapper /dev/scullp 0 140
mapped "/dev/scullp" from 0 to 140
total 77
-rwxr-xr-x    1 root     root        26689 Mar  2  2000 MAKEDEV
crw-rw-rw-    1 root     root      14,  14 Aug 10 20:55 admmidi0
morgana% ./mapper /dev/scullp 8192 200
mapped "/dev/scullp" from 8192 to 8392
0
crw-------    1 root     root     113,   1 Mar 26  1999 cum1
crw-------    1 root     root     113,   2 Mar 26  1999 cum2
crw-------    1 root     root     113,   3 Mar 26  1999 cum3
Although it's rarely necessary, it's interesting to see how a driver can map a virtual address to user space using mmap. A true virtual address, remember, is an address returned by a function like vmalloc or kmap -- that is, a virtual address mapped in the kernel page tables. The code in this section is taken from scullv, which is the module that works like scullp but allocates its storage through vmalloc.
Most of the work of vmalloc is building page tables to access allocated pages as a continuous address range. The nopage method, instead, must pull the page tables back apart in order to return a struct page pointer to the caller. Therefore, the nopage implementation for scullv must scan the page tables to retrieve the page map entry associated with the page.
Here is the heart of the scullv version of nopage:

pgd_t *pgd;
pmd_t *pmd;
pte_t *pte;
unsigned long lpage;

/*
 * After scullv lookup, "page" is now the address of the page
 * needed by the current process. Since it's a vmalloc address,
 * first retrieve the unsigned long value to be looked up
 * in page tables.
 */
lpage = VMALLOC_VMADDR(pageptr);
spin_lock(&init_mm.page_table_lock);
pgd = pgd_offset(&init_mm, lpage);
pmd = pmd_offset(pgd, lpage);
pte = pte_offset(pmd, lpage);
page = pte_page(*pte);
spin_unlock(&init_mm.page_table_lock);

/* got it, now increment the count */
get_page(page);
out:
up(&dev->sem);
return page;

Based on this discussion, you might also want to map addresses returned by ioremap to user space. This mapping is easily accomplished because you can use remap_page_range directly, without implementing methods for virtual memory areas. In other words, remap_page_range is already usable for building new page tables that map I/O memory to user space; there's no need to look in the kernel page tables built by vmalloc as we did in scullv.
The kiobuf Interface
As of version 2.3.12, the Linux kernel supports an I/O abstraction called the kernel I/O buffer, or kiobuf. The kiobuf interface is intended to hide much of the complexity of the virtual memory system from device drivers (and other parts of the system that do I/O). Many features are planned for kiobufs, but their primary use in the 2.4 kernel is to facilitate the mapping of user-space buffers into the kernel.
The kiobuf Structure
Any code that works with kiobufs must include <linux/iobuf.h>. This file defines struct kiobuf, which is the heart of the kiobuf interface. This structure describes an array of pages that make up an I/O operation; its fields include the following:
- int nr_pages;
The number of pages in this kiobuf.
- int length;
The number of bytes of data in the buffer.
- int offset;
The offset to the first valid byte in the buffer.
- struct page **maplist;
An array of page structures, one for each page of data in the kiobuf
A kiobuf must be initialized before use; when one is allocated in isolation, that is done with kiobuf_init:

void kiobuf_init(struct kiobuf *iobuf);

Usually kiobufs are allocated in groups as part of a kernel I/O vector, or kiovec. A kiovec can be allocated and initialized in one step with a call to alloc_kiovec:
int alloc_kiovec(int nr, struct kiobuf **iovec);

When your code is done with the kiovec, it should be returned to the system with free_kiovec:

void free_kiovec(int nr, struct kiobuf **);

The kernel provides a pair of functions for locking and unlocking the pages mapped in a kiovec:
int lock_kiovec(int nr, struct kiobuf *iovec[], int wait);
int unlock_kiovec(int nr, struct kiobuf *iovec[]);

Mapping User-Space Buffers and Raw I/O
Unix systems have long provided a "raw" interface to some devices -- block devices in particular -- which performs I/O directly from a user-space buffer and avoids copying data through the kernel. In some cases much improved performance can be had in this manner, especially if the data being transferred will not be used again in the near future. For example, disk backups typically read a great deal of data from the disk exactly once, then forget about it. Running the backup via a raw interface will avoid filling the system buffer cache with useless data.
In this section, we add a raw I/O capability to the sbull sample block driver. When kiobufs are available, sbull actually registers two devices. The block sbull device was examined in detail in Chapter 12, "Loading Block Drivers". What we didn't see in that chapter was a second, char device (called sbullr), which provides raw access to the RAM-disk device. Thus, /dev/sbull0 and /dev/sbullr0 access the same memory; the former using the traditional, buffered mode and the latter providing raw access via the kiobuf mechanism.
The sbullr device supports only block-aligned I/O in multiples of the hardware sector size:

# define SBULLR_SECTOR 512  /* insist on this */
# define SBULLR_SECTOR_MASK (SBULLR_SECTOR - 1)
# define SBULLR_SECTOR_SHIFT 9

The read and write methods simply locate the device and delegate the real work to a common transfer function:

ssize_t sbullr_read(struct file *filp, char *buf, size_t size, loff_t *off)
{
    Sbull_Dev *dev = sbull_devices +
                MINOR(filp->f_dentry->d_inode->i_rdev);
    return sbullr_transfer(dev, buf, size, off, READ);
}

ssize_t sbullr_write(struct file *filp, const char *buf, size_t size,
                loff_t *off)
{
    Sbull_Dev *dev = sbull_devices +
                MINOR(filp->f_dentry->d_inode->i_rdev);
    return sbullr_transfer(dev, (char *) buf, size, off, WRITE);
}

static int sbullr_transfer (Sbull_Dev *dev, char *buf, size_t count,
                loff_t *offset, int rw)
{
    struct kiobuf *iobuf;
    int result;

    /* Only block alignment and size allowed */
    if ((*offset & SBULLR_SECTOR_MASK) || (count & SBULLR_SECTOR_MASK))
        return -EINVAL;
    if ((unsigned long) buf & SBULLR_SECTOR_MASK)
        return -EINVAL;

    /* Allocate an I/O vector */
    result = alloc_kiovec(1, &iobuf);
    if (result)
        return result;

    /* Map the user I/O buffer and do the I/O. */
    result = map_user_kiobuf(rw, iobuf, (unsigned long) buf, count);
    if (result) {
        free_kiovec(1, &iobuf);
        return result;
    }
    spin_lock(&dev->lock);
    result = sbullr_rw_iovec(dev, iobuf, rw,
                *offset >> SBULLR_SECTOR_SHIFT,
                count >> SBULLR_SECTOR_SHIFT);
    spin_unlock(&dev->lock);

    /* Clean up and return. */
    unmap_kiobuf(iobuf);
    free_kiovec(1, &iobuf);
    if (result > 0)
        *offset += result << SBULLR_SECTOR_SHIFT;
    return result << SBULLR_SECTOR_SHIFT;
}

After doing a couple of sanity checks, the code creates a kiovec (containing a single kiobuf) with alloc_kiovec. It then uses that kiovec to map in the user buffer by calling map_user_kiobuf:
int map_user_kiobuf(int rw, struct kiobuf *iobuf, unsigned long address,
                    size_t len);

The result of a successful call is that the pages making up the user buffer at the given address are mapped into the kiobuf; once the I/O is complete, the buffer is released with unmap_kiobuf. The actual transfer is performed, one sector at a time, by sbullr_rw_iovec:

static int sbullr_rw_iovec(Sbull_Dev *dev, struct kiobuf *iobuf, int rw,
                int sector, int nsectors)
{
    struct request fakereq;
    struct page *page;
    int offset = iobuf->offset, ndone = 0, pageno, result;

    /* Perform I/O on each sector */
    fakereq.sector = sector;
    fakereq.current_nr_sectors = 1;
    fakereq.cmd = rw;

    for (pageno = 0; pageno < iobuf->nr_pages; pageno++) {
        page = iobuf->maplist[pageno];
        while (ndone < nsectors) {
            /* Fake up a request structure for the operation */
            fakereq.buffer = (void *) (kmap(page) + offset);
            result = sbull_transfer(dev, &fakereq);
            kunmap(page);
            if (result == 0)
                return ndone;
            /* Move on to the next one */
            ndone++;
            fakereq.sector++;
            offset += SBULLR_SECTOR;
            if (offset >= PAGE_SIZE) {
                offset = 0;
                break;
            }
        }
    }
    return ndone;
}

Although kiobufs remain controversial in the kernel development community, there is interest in using them in a wider range of contexts. There is, for example, a patch that implements Unix pipes with kiobufs -- data is copied directly from one process's address space to the other with no buffering in the kernel at all. A patch also exists that makes it easy to use a kiobuf to map kernel virtual memory into a process's address space, thus eliminating the need for a nopage implementation as shown earlier.
Direct Memory Access and Bus Mastering
Direct memory access, or DMA, is the advanced topic that completes our overview of memory issues. DMA is the hardware mechanism that allows peripheral components to transfer their I/O data directly to and from main memory without the need for the system processor to be involved in the transfer. Use of this mechanism can greatly increase throughput to and from a device, because a great deal of computational overhead is eliminated.
Overview of a DMA Data Transfer
Data transfer can be triggered in two ways: either the software asks for data (via a function such as read) or the hardware asynchronously pushes it to the system. In the asynchronous case -- typical, for example, of data acquisition devices -- the steps involved can be summarized as follows:
1. The hardware raises an interrupt to announce that new data has arrived.
2. The interrupt handler allocates a buffer and tells the hardware where to transfer its data.
3. The peripheral device writes the data to the buffer and raises another interrupt when it's done.
4. The handler dispatches the new data, wakes any relevant process, and takes care of housekeeping.
A variant of the asynchronous approach is often seen with network cards. These cards often expect to see a circular buffer (often called a DMA ring buffer) established in memory shared with the processor; each incoming packet is placed in the next available buffer in the ring, and an interrupt is signaled. The driver then passes the network packets to the rest of the kernel, and places a new DMA buffer in the ring.
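To make the idea concrete, here is a minimal sketch of the receive path; every name in it is hypothetical, since the real ring layout is dictated by the card's hardware interface:

#define RING_SIZE 16

struct rx_ring {
    void *buf[RING_SIZE];  /* buffers currently owned by the card */
    int next;              /* next slot the card will fill */
};

/* Called from the interrupt handler when the card reports a full buffer. */
static void rx_ring_service(struct rx_ring *ring)
{
    void *full = ring->buf[ring->next];

    /* Replenish the slot first, so the card never runs out of buffers... */
    ring->buf[ring->next] = dev_alloc_rx_buffer();   /* hypothetical */
    ring->next = (ring->next + 1) % RING_SIZE;

    /* ...then hand the filled buffer to the rest of the kernel. */
    dev_pass_packet_upward(full);                    /* hypothetical */
}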
Allocating the DMA Buffer
This section covers the allocation of DMA buffers at a low level; we will introduce a higher-level interface shortly, but it is still a good idea to understand the material presented here.
Not all of memory is reachable by every device: peripherals on the ISA bus, in particular, can address only the lowest 16 MB (24-bit bus addresses). For devices with this kind of limitation, memory should be allocated from the DMA zone by adding the GFP_DMA flag to the kmalloc or get_free_pages call. When this flag is present, only memory that can be addressed with 24 bits will be allocated.
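A minimal sketch, assuming a device limited to 24-bit ISA-style addressing:

/* GFP_DMA restricts the allocation to the DMA zone (the low 16 MB
 * on the PC). */
char *dmabuf = kmalloc(4096, GFP_KERNEL | GFP_DMA);
if (!dmabuf)
    return -ENOMEM;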
Do-it-yourself allocation
We have seen how get_free_pages (and therefore kmalloc) can't return more than 128 KB (or, more generally, 32 pages) of consecutive memory space. But the request is prone to fail even when the allocated buffer is less than 128 KB, because system memory becomes fragmented over time.[52]
One alternative is to set memory aside at boot time: if you boot a system with 32 MB of RAM and the command-line argument mem=31M, the kernel never touches the top megabyte, and a driver can claim it once the system is up:

dmabuf = ioremap( 0x1F00000 /* 31M */, 0x100000 /* 1M */);

We're not going to show the code here, but you'll find it in misc-modules/allocator.c; the code is thoroughly commented and designed to be called by other modules. Unlike every other source accompanying this book, the allocator is covered by the GPL. The reason we decided to put the source under the GPL is that it is neither particularly beautiful nor particularly clever, and if someone is going to use it, we want to be sure that the source is released with the module.
Bus Addresses
At the lowest level (again, we'll look at a higher-level solution shortly), the Linux kernel provides a portable solution by exporting the following functions, defined in <asm/io.h>:
unsigned long virt_to_bus(volatile void * address);
void * bus_to_virt(unsigned long address);

DMA on the PCI Bus
The 2.4 kernel includes a flexible mechanism that supports PCI DMA (also known as bus mastering). It handles the details of buffer allocation and can deal with setting up the bus hardware for multipage transfers on hardware that supports them. This code also takes care of situations in which a buffer lives in a non-DMA-capable zone of memory, though only on some platforms and at a computational cost (as we will see later).
Drivers that use the following functions should include <linux/pci.h>.
Dealing with difficult hardware
The first question that must be answered before performing DMA is whether the given device is capable of such operation on the current host. Many PCI devices fail to implement the full 32-bit bus address space, often because they are modified versions of old ISA hardware. The Linux kernel will attempt to work with such devices, but it is not always possible.
The function pci_dma_supported should be called for any device that has addressing limitations:
int pci_dma_supported(struct pci_dev *pdev, dma_addr_t mask);

Here, mask is a simple bit mask describing which address bits the device can use. If the return value is nonzero, DMA is possible, and your driver should set the dma_mask field in the PCI device structure to the mask value. For a device that can use only 16-bit addresses, you might use a call like:

if (pci_dma_supported (pdev, 0xffff))
    pdev->dma_mask = 0xffff;
else {
    card->use_dma = 0; /* We'll have to live without DMA */
    printk (KERN_WARNING "mydev: DMA not supported\n");
}

As of kernel 2.4.3, a new function, pci_set_dma_mask, performs the check and the assignment in one step:

int pci_set_dma_mask(struct pci_dev *pdev, dma_addr_t mask);

For devices that can handle 32-bit addresses, there is no need to call pci_dma_supported.
DMA mappings
A DMA mapping is a combination of allocating a DMA buffer and generating an address for that buffer that is accessible by the device. In many cases, getting that address involves a simple call to virt_to_bus; some hardware, however, requires that mapping registers be set up in the bus hardware as well. Mapping registers are an equivalent of virtual memory for peripherals. On systems where these registers are used, peripherals have a relatively small, dedicated range of addresses to which they may perform DMA. Those addresses are remapped, via the mapping registers, into system RAM. Mapping registers have some nice features, including the ability to make several distributed pages appear contiguous in the device's address space. Not all architectures have mapping registers, however; in particular, the popular PC platform has no mapping registers.
The DMA mapping sets up a new type, dma_addr_t, to represent bus addresses. Variables of type dma_addr_t should be treated as opaque by the driver; the only allowable operations are to pass them to the DMA support routines and to the device itself.
- Consistent DMA mappings
These exist for the life of the driver. A consistently mapped buffer must be simultaneously available to both the CPU and the peripheral (other types of mappings, as we will see later, can be available only to one or the other at any given time). The buffer should also, if possible, not have caching issues that could cause one not to see updates made by the other.
- Streaming DMA mappings
These are set up for a single operation. Some architectures allow for significant optimizations when streaming mappings are used, as we will see, but these mappings also are subject to a stricter set of rules in how they may be accessed. The kernel developers recommend the use of streaming mappings over consistent mappings whenever possible. There are two reasons for this recommendation. The first is that, on systems that support them, each DMA mapping uses one or more mapping registers on the bus. Consistent mappings, which have a long lifetime, can monopolize these registers for a long time, even when they are not being used. The other reason is that, on some hardware, streaming mappings can be optimized in ways that are not available to consistent mappings.
The two mapping types must be manipulated in different ways; it's time to look at the details.
Setting up consistent DMA mappings
A driver can set up a consistent mapping with a call to pci_alloc_consistent:
void *pci_alloc_consistent(struct pci_dev *pdev, size_t size,
                           dma_addr_t *bus_addr);

This function handles both the allocation and the mapping of the buffer. The return value is the kernel virtual address of the buffer, which may be used by the driver; the associated bus address is returned in bus_addr. When the buffer is no longer needed, it is returned to the system with pci_free_consistent:

void pci_free_consistent(struct pci_dev *pdev, size_t size,
                         void *cpu_addr, dma_addr_t bus_addr);

Note that this function requires that both the CPU address and the bus address be provided.
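A minimal sketch of the allocate/use/free cycle (pdev and the buffer size are assumptions):

dma_addr_t bus_addr;
void *cpu_addr;

cpu_addr = pci_alloc_consistent(pdev, 4096, &bus_addr);
if (!cpu_addr)
    return -ENOMEM;
/* Program bus_addr into the device; the CPU reads and writes the
 * buffer through cpu_addr. */
/* ... */
pci_free_consistent(pdev, 4096, cpu_addr, bus_addr);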
Setting up streaming DMA mappings
Streaming mappings have a more complicated interface than the consistent variety, for a number of reasons. These mappings expect to work with a buffer that has already been allocated by the driver, and thus have to deal with addresses that they did not choose. On some architectures, streaming mappings can also have multiple, discontiguous pages and multipart "scatter-gather" buffers. When setting up a streaming mapping, you must tell the kernel in which direction the data will be moving, using one of the following symbols:
- PCI_DMA_TODEVICE
- PCI_DMA_FROMDEVICE
These two symbols should be reasonably self-explanatory. If data is being sent to the device (in response, perhaps, to a write system call), PCI_DMA_TODEVICE should be used; data going to the CPU, instead, will be marked with PCI_DMA_FROMDEVICE.
- PCI_DMA_BIDIRECTIONAL
If data can move in either direction, use PCI_DMA_BIDIRECTIONAL.
- PCI_DMA_NONE
This symbol is provided only as a debugging aid. Attempts to use buffers with this "direction" will cause a kernel panic.

When you have a single buffer to transfer, map it with pci_map_single:
dma_addr_t pci_map_single(struct pci_dev *pdev, void *buffer,
                          size_t size, int direction);

The return value is the bus address that you may pass to the device, or NULL if something goes wrong. Once the transfer is complete, the mapping should be deleted with pci_unmap_single:
void pci_unmap_single(struct pci_dev *pdev, dma_addr_t bus_addr,
                      size_t size, int direction);

Here, the size and direction arguments must match those used to map the buffer.
There are some important rules that apply to streaming DMA mappings:

First, the buffer must be used only for a transfer that matches the direction value given when it was mapped. Moreover, once a buffer has been mapped, it belongs to the device, not the processor; until the buffer has been unmapped, the driver should not touch its contents.

Second, consider what happens if the buffer to be mapped is in a region of memory that is not accessible to the device. Some architectures will simply fail in this case, but others will create a bounce buffer. The bounce buffer is just a separate region of memory that is accessible to the device. If a buffer is mapped with a direction of PCI_DMA_TODEVICE, and a bounce buffer is required, the contents of the original buffer will be copied as part of the mapping operation. Clearly, changes to the original buffer after the copy will not be seen by the device. Similarly, PCI_DMA_FROMDEVICE bounce buffers are copied back to the original buffer by pci_unmap_single; the data from the device is not present until that copy has been done.
Occasionally a driver will need to access the contents of a streaming DMA buffer without unmapping it; a call has been provided to make that possible:

void pci_sync_single(struct pci_dev *pdev, dma_addr_t bus_addr,
                     size_t size, int direction);

This function should be called before the processor accesses a PCI_DMA_FROMDEVICE buffer, and after an access to a PCI_DMA_TODEVICE buffer.
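A brief sketch of the pattern for a receive buffer that stays mapped across transfers (all names are assumptions):

/* The buffer was mapped earlier with:
 *   bus_addr = pci_map_single(pdev, buffer, size, PCI_DMA_FROMDEVICE);
 * The device has now deposited data; synchronize before looking at it. */
pci_sync_single(pdev, bus_addr, size, PCI_DMA_FROMDEVICE);
first_byte = ((unsigned char *) buffer)[0];  /* now safe to read */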
Scatter-gather mappings
Scatter-gather mappings are a special case of streaming DMA mappings. Suppose you have several buffers, all of which need to be transferred to or from the device. This situation can come about in several ways, including from a readv or writev system call, a clustered disk I/O request, or a list of pages in a mapped kernel I/O buffer. You could simply map each buffer in turn and perform the required operation, but there are advantages to mapping the whole list at once.
One reason is that some smart devices can accept a scatterlist of array pointers and lengths and transfer them all in one DMA operation; for example, "zero-copy'' networking is easier if packets can be built in multiple pieces. Linux is likely to take much better advantage of such devices in the future. Another reason to map scatterlists as a whole is to take advantage of systems that have mapping registers in the bus hardware. On such systems, physically discontiguous pages can be assembled into a single, contiguous array from the device's point of view. This technique works only when the entries in the scatterlist are equal to the page size in length (except the first and last), but when it does work it can turn multiple operations into a single DMA and speed things up accordingly.
So now you're convinced that mapping of scatterlists is worthwhile in some situations. The first step in mapping a scatterlist is to create and fill in an array of struct scatterlist describing the buffers to be transferred. This structure is architecture dependent, and is described in <asm/scatterlist.h>. It will always contain two fields, however:
- char *address;
The address of a buffer used in the scatter/gather operation
- unsigned int length;
The length of that buffer.

To map a scatter-gather DMA operation, your driver should set the address and length fields for each buffer to be transferred, then call:

int pci_map_sg(struct pci_dev *pdev, struct scatterlist *list,
               int nents, int direction);

The return value is the number of DMA buffers to transfer; it may be less than nents, since the mapping code may be able to coalesce adjacent entries. Your driver should perform a transfer for each buffer returned; the bus address and length of each are retrieved with two accessor macros:
- dma_addr_t sg_dma_address(struct scatterlist *sg);
Returns the bus (DMA) address from this scatterlist entry.
- unsigned int sg_dma_len(struct scatterlist *sg);
Returns the length of this buffer. Remember that the address and length of the buffers as seen by the device may differ from what was originally passed in to pci_map_sg.
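Putting the pieces together, a hedged sketch of a scatter-gather transmit (the buffer arrays and the device-programming helper are hypothetical):

struct scatterlist sglist[NBUFS];
int i, count;

/* Describe each buffer to be transferred. */
for (i = 0; i < NBUFS; i++) {
    sglist[i].address = bufs[i];
    sglist[i].length = lens[i];
}

count = pci_map_sg(pdev, sglist, NBUFS, PCI_DMA_TODEVICE);

/* Queue one transfer per mapped entry, using the accessor macros. */
for (i = 0; i < count; i++)
    dev_queue_transfer(dev, sg_dma_address(&sglist[i]),
                       sg_dma_len(&sglist[i]));  /* hypothetical */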
Once the transfer is complete, a scatter-gather mapping is unmapped with a call to pci_unmap_sg:
void pci_unmap_sg(struct pci_dev *pdev, struct scatterlist *list,
                  int nents, int direction);

Scatter-gather mappings are streaming DMA mappings, and the same access rules apply to them as to the single variety. If you must access a mapped scatter-gather list, you must synchronize it first:
void pci_dma_sync_sg(struct pci_dev *pdev, struct scatterlist *sg,
                     int nents, int direction);

How different architectures support PCI DMA
As we stated at the beginning of this section, DMA is a very hardware-specific operation. The PCI DMA interface we have just described attempts to abstract out as many hardware dependencies as possible. There are still some things that show through, however.
- M68K
- S/390
- Super-H
These architectures do not support the PCI bus as of 2.4.0.
- IA-32 (x86)
- MIPS
- PowerPC
- ARM
These platforms support the PCI DMA interface, but it is mostly a false front. There are no mapping registers in the bus interface, so scatterlists cannot be combined and virtual addresses cannot be used. There is no bounce buffer support, so mapping of high-memory addresses cannot be done. The mapping functions on the ARM architecture can sleep, which is not the case for the other platforms.
- IA-64
The Itanium architecture also lacks mapping registers. This 64-bit architecture can easily generate addresses that PCI peripherals cannot use, though. The PCI interface on this platform thus implements bounce buffers, allowing any address to be (seemingly) used for DMA operations.
- Alpha
- MIPS64
- SPARC
These architectures support an I/O memory management unit. As of 2.4.0, the MIPS64 port does not actually make use of this capability, so its PCI DMA implementation looks like that of the IA-32. The Alpha and SPARC ports, though, can do full-buffer mapping with proper scatter-gather support.
A simple PCI DMA example
The actual form of DMA operations on the PCI bus is very dependent on the device being driven. Thus, this example does not apply to any real device; instead, it is part of a hypothetical driver called dad (DMA Acquisition Device). A driver for this device might define a transfer function like this:
int dad_transfer(struct dad_dev *dev, int write, void *buffer,
                 size_t count)
{
    dma_addr_t bus_addr;
    unsigned long flags;

    /* Map the buffer for DMA */
    dev->dma_dir = (write ? PCI_DMA_TODEVICE : PCI_DMA_FROMDEVICE);
    dev->dma_size = count;
    bus_addr = pci_map_single(dev->pci_dev, buffer, count,
                              dev->dma_dir);
    dev->dma_addr = bus_addr;

    /* Set up the device (writeb and writel take the value first,
       then the register address) */
    writeb(DAD_CMD_DISABLEDMA, dev->registers.command);
    writeb(write ? DAD_CMD_WR : DAD_CMD_RD, dev->registers.command);
    writel(cpu_to_le32(bus_addr), dev->registers.addr);
    writel(cpu_to_le32(count), dev->registers.len);

    /* Start the operation */
    writeb(DAD_CMD_ENABLEDMA, dev->registers.command);
    return 0;
}

When the operation is complete, the device will, presumably, raise an interrupt; the interrupt handler then unmaps the buffer:

void dad_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct dad_dev *dev = (struct dad_dev *) dev_id;

    /* Make sure it's really our device interrupting */

    /* Unmap the DMA buffer */
    pci_unmap_single(dev->pci_dev, dev->dma_addr, dev->dma_size,
                     dev->dma_dir);

    /* Only now is it safe to access the buffer, copy to user, etc. */
    ...
}

A quick look at SBus
SPARC-based systems have traditionally included a Sun-designed bus called the SBus. This bus is beyond the scope of this chapter, but a quick mention is worthwhile. There is a set of functions (declared in <asm/sbus.h>) for performing DMA mappings on the SBus; they have names like sbus_alloc_consistent and sbus_map_sg. In other words, the SBus DMA API looks almost exactly like the PCI interface. A detailed look at the function definitions will be required before working with DMA on the SBus, but the concepts will match those discussed earlier for the PCI bus.
DMA for ISA Devices
The ISA bus allows for two kinds of DMA transfers: native DMA and ISA bus master DMA. Native DMA uses standard DMA-controller circuitry on the motherboard to drive the signal lines on the ISA bus. ISA bus master DMA, on the other hand, is handled entirely by the peripheral device. The latter type of DMA is rarely used and doesn't require discussion here because it is similar to DMA for PCI devices, at least from the driver's point of view. An example of an ISA bus master is the 1542 SCSI controller, whose driver is drivers/scsi/aha1542.c in the kernel sources.
Three entities are involved in a DMA data transfer on the ISA bus:

- The 8237 DMA controller (DMAC)
The controller holds information about the DMA transfer, such as the direction, the memory address, and the size of the transfer. It also contains a counter that tracks the status of ongoing transfers. When the controller receives a DMA request signal, it gains control of the bus and drives the signal lines so that the device can read or write its data.
- The peripheral device
The device must activate the DMA request signal when it's ready to transfer data. The actual transfer is managed by the DMAC; the hardware device sequentially reads or writes data onto the bus when the controller strobes the device.
- The device driver
The driver has little to do; it provides the DMA controller with the direction, bus address, and size of the transfer. It also talks to its peripheral to prepare it for transferring the data and responds to the interrupt when the DMA transfer is over.
Registering DMA usage
You should be used to kernel registries -- we've already seen them for I/O ports and interrupt lines. The DMA channel registry is similar to the others. After <asm/dma.h> has been included, the following functions can be used to obtain and release ownership of a DMA channel:
int request_dma(unsigned int channel, const char *name);
void free_dma(unsigned int channel);

The channel argument is a number between 0 and 7 or, more precisely, a positive number less than MAX_DMA_CHANNELS. As with I/O ports and interrupt lines, the channel is best requested at open time rather than at module initialization, so that the resource can be shared among devices that are not used concurrently. Here is how a driver for our hypothetical dad device might request its resources at open time:

int dad_open (struct inode *inode, struct file *filp)
{
    struct dad_device *my_device;

    /* ... */
    if ( (error = request_irq(my_device.irq, dad_interrupt,
                              SA_INTERRUPT, "dad", NULL)) )
        return error; /* or implement blocking open */

    if ( (error = request_dma(my_device.dma, "dad")) ) {
        free_irq(my_device.irq, NULL);
        return error; /* or implement blocking open */
    }
    /* ... */
    return 0;
}

The close implementation that matches the open just shown looks like this:
void dad_close (struct inode *inode, struct file *filp)
{
    struct dad_device *my_device;

    /* ... */
    free_dma(my_device.dma);
    free_irq(my_device.irq, NULL);
    /* ... */
}

Here is how the /proc/dma file looks on a system with a sound card installed:

merlino% cat /proc/dma
 1: Sound Blaster8
 4: cascade

Talking to the DMA controller
After registration, the main part of the driver's job consists of configuring the DMA controller for proper operation. This task is not trivial, but fortunately the kernel exports all the functions needed by the typical driver.
The driver needs to configure the DMA controller either when read or write is called, or when preparing for asynchronous transfers. This latter task is performed either at open time or in response to an ioctl command, depending on the driver and the policy it implements. The code shown here is the code that is typically called by the read or write device methods.
This subsection provides a quick overview of the internals of the DMA controller so you will understand the code introduced here. If you want to learn more, we'd urge you to read <asm/dma.h> and some hardware manuals describing the PC architecture. In particular, we don't deal with the issue of 8-bit versus 16-bit data transfers. If you are writing device drivers for ISA device boards, you should find the relevant information in the hardware manuals for the devices.
The DMA controller is a shared resource, and confusion could arise if more than one processor attempts to program it simultaneously. For that reason, the controller is protected by a spinlock, called dma_spin_lock. Drivers should not manipulate the lock directly, however; two functions have been provided to do that for you:
- unsigned long claim_dma_lock();
Acquires the DMA spinlock. This function also blocks interrupts on the local processor; the return value is a set of flags describing the previous interrupt state, which must be passed back when the lock is released.
- void release_dma_lock(unsigned long flags);
Returns the DMA spinlock and restores the previous interrupt status.
The information that must be loaded into the controller is made up of three items: the RAM address, the number of atomic items to transfer, and the direction of the transfer. The following functions program that information and control the channel:

- void set_dma_mode(unsigned int channel, char mode);
Indicates whether the channel must read from the device (DMA_MODE_READ) or write to it (DMA_MODE_WRITE).
- void set_dma_addr(unsigned int channel, unsigned int addr);
Assigns the address of the DMA buffer. The function stores the 24 least significant bits of addr in the controller; addr must be a bus address.
- void set_dma_count(unsigned int channel, unsigned int count);
Assigns the number of bytes to transfer.
- void disable_dma(unsigned int channel);
A DMA channel must be disabled within the controller before it is configured.
- void enable_dma(unsigned int channel);
This function tells the controller that the DMA channel contains valid data.
- int get_dma_residue(unsigned int channel);
Returns the number of bytes still to be transferred on the channel; the value is 0 after a successful completion, which is how a driver can tell whether a transfer has finished.
- void clear_dma_ff(unsigned int channel)
Clears the DMA flip-flop. The flip-flop controls access to the controller's 16-bit registers, which are accessed by two consecutive 8-bit operations; it must be cleared before the registers are programmed.
Using these functions, a driver can implement a function like the following to prepare for a DMA transfer:

int dad_dma_prepare(int channel, int mode, unsigned int buf,
                    unsigned int count)
{
    unsigned long flags;

    flags = claim_dma_lock();
    disable_dma(channel);
    clear_dma_ff(channel);
    set_dma_mode(channel, mode);
    set_dma_addr(channel, virt_to_bus(buf));
    set_dma_count(channel, count);
    enable_dma(channel);
    release_dma_lock(flags);

    return 0;
}

A function like the next one, then, is used to check for successful completion of DMA:

int dad_dma_isdone(int channel)
{
    int residue;
    unsigned long flags = claim_dma_lock ();
    residue = get_dma_residue(channel);
    release_dma_lock(flags);
    return (residue == 0);
}
The only thing that remains to be done is to configure the device board. This device-specific task usually consists of reading or writing a few I/O ports. Devices differ in significant ways. For example, some devices expect the programmer to tell the hardware how big the DMA buffer is, and sometimes the driver has to read a value that is hardwired into the device. For configuring the board, the hardware manual is your only friend.
Backward Compatibility
As with other parts of the kernel, both memory mapping and DMA have seen a number of changes over the years. This section describes the things a driver writer must take into account in order to write portable code.
Changes to Memory Management
The 2.3 development series saw major changes in the way memory management worked. The 2.2 kernel was quite limited in the amount of memory it could use, especially on 32-bit processors. With 2.4, those limits have been lifted; Linux is now able to manage all the memory that the processor is able to address. Some things have had to change to make all this possible; overall, however, the scale of the changes at the API level is surprisingly small.
The virt_to_page macro did not exist in 2.2; a driver that must compile on older kernels can define its own version in terms of the old MAP_NR macro:

#ifdef MAP_NR
#define virt_to_page(page) (mem_map + MAP_NR(page))
#endif

The get_page macro is also a recent addition; on older kernels the page's reference count had to be incremented directly:

#ifndef get_page
#  define get_page(p) atomic_inc(&(p)->count)
#endif

There have also been changes to the various vm_ops methods stored in the VMA:
The nopage and wppage methods returned unsigned long (i.e., a logical address) in 2.2, rather than struct page *.
There was, of course, no high-memory support in older kernels. All memory had logical addresses, and the kmap and kunmap functions did not exist.
The 2.0 kernel did not export init_mm to modules; code that needed it had to dig it out of the task table (process 0, the idle process, runs with the kernel's memory map):

static struct mm_struct *init_mm_ptr;
#define init_mm (*init_mm_ptr) /* to avoid ifdefs later */

static void retrieve_init_mm_ptr(void)
{
    struct task_struct *p;

    for (p = current ; (p = p->next_task) != current ; )
        if (p->pid == 0)
            break;

    init_mm_ptr = p->mm;
}

The 2.0 kernel also lacked the distinction between logical and physical addresses, so the __va and __pa macros did not exist. There was no need for them at that time.
Finally, the 2.0 version of the driver mmap method, like most others, had a struct inode argument; the method's prototype was
int (*mmap)(struct inode *inode, struct file *filp,
            struct vm_area_struct *vma);

Changes to DMA
The PCI DMA interface as described earlier did not exist prior to kernel 2.3.41. Before then, DMA was handled in a more direct -- and system-dependent -- way. Buffers were "mapped" by calling virt_to_bus, and there was no general interface for handling bus-mapping registers.
Quick Reference
This chapter introduced the following symbols related to memory handling. The list doesn't include the symbols introduced in the first section, as that section is a huge list in itself and those symbols are rarely useful to device drivers.
- #include <linux/mm.h>
All the functions and structures related to memory management are prototyped and defined in this header.
- int remap_page_range(unsigned long virt_add, unsigned long phys_add, unsigned long size, pgprot_t prot);
- struct page *virt_to_page(void *kaddr);
- void *page_address(struct page *page);
These macros convert between kernel logical addresses and their associated memory map entries. page_address only works for low-memory pages, or high-memory pages that have been explicitly mapped.
- void *__va(unsigned long physaddr);
- unsigned long __pa(void *kaddr);
These macros convert between kernel logical addresses and physical addresses.
- void *kmap(struct page *page);
- void kunmap(struct page *page);
kmap returns a kernel virtual address that is mapped to the given page, creating the mapping if need be. kunmap deletes the mapping for the given page.
- #include <linux/iobuf.h>
- void kiobuf_init(struct kiobuf *iobuf);
- int alloc_kiovec(int number, struct kiobuf **iobuf);
- void free_kiovec(int number, struct kiobuf **iobuf);
These functions handle the allocation, initialization, and freeing of kernel I/O buffers. kiobuf_init initializes a single kiobuf, but is rarely used; alloc_kiovec, which allocates and initializes a vector of kiobufs, is usually used instead. A vector of kiobufs is freed with free_kiovec.
- int lock_kiovec(int nr, struct kiobuf *iovec[], int wait);
- int unlock_kiovec(int nr, struct kiobuf *iovec[]);
These functions lock a kiovec in memory, and release it. They are unnecessary when using kiobufs for I/O to user-space memory.
- int map_user_kiobuf(int rw, struct kiobuf *iobuf, unsigned long address, size_t len);
- void unmap_kiobuf(struct kiobuf *iobuf);
map_user_kiobuf maps a buffer in user space into the given kernel I/O buffer; unmap_kiobuf undoes that mapping.
- #include <asm/io.h>
- unsigned long virt_to_bus(volatile void * address);
- void * bus_to_virt(unsigned long address);
These functions convert between kernel virtual and bus addresses. Bus addresses must be used to talk to peripheral devices.
- #include <linux/pci.h>
- int pci_dma_supported(struct pci_dev *pdev, dma_addr_t mask);
- void *pci_alloc_consistent(struct pci_dev *pdev, size_t size, dma_addr_t *bus_addr);
- void pci_free_consistent(struct pci_dev *pdev, size_t size, void *cpuaddr, dma_addr_t bus_addr);
These functions allocate and free consistent DMA mappings, for a buffer that will last the lifetime of the driver.
- PCI_DMA_TODEVICE
- PCI_DMA_FROMDEVICE
- PCI_DMA_BIDIRECTIONAL
- PCI_DMA_NONE
These symbols are used to tell the streaming mapping functions the direction in which data will be moving to or from the buffer.
- dma_addr_t pci_map_single(struct pci_dev *pdev, void *buffer, size_t size, int direction);
- void pci_unmap_single(struct pci_dev *pdev, dma_addr_t bus_addr, size_t size, int direction);
- void pci_sync_single(struct pci_dev *pdev, dma_addr_t bus_addr, size_t size, int direction);
These functions create, destroy, and synchronize a single streaming DMA mapping.
- struct scatterlist { /* ... */ };
- dma_addr_t sg_dma_address(struct scatterlist *sg);
- unsigned int sg_dma_len(struct scatterlist *sg);
The scatterlist structure describes an I/O operation that involves more than one buffer. The macros sg_dma_address and sg_dma_len may be used to extract bus addresses and buffer lengths to pass to the device when implementing scatter-gather operations.
- pci_map_sg(struct pci_dev *pdev, struct scatterlist *list, int nents, int direction);
- pci_unmap_sg(struct pci_dev *pdev, struct scatterlist *list, int nents, int direction);
- pci_dma_sync_sg(struct pci_dev *pdev, struct scatterlist *sg, int nents, int direction)
pci_map_sg maps a scatter-gather operation, and pci_unmap_sg undoes that mapping. If the buffers must be accessed while the mapping is active, pci_dma_sync_sg may be used to synchronize things.
- /proc/dma
A file containing a textual snapshot of the allocated channels in the ISA DMA controllers.
- #include <asm/dma.h>
This header defines or prototypes all the functions and macros related to DMA. It must be included to use any of the following symbols.
- int request_dma(unsigned int channel, const char *name);
- void free_dma(unsigned int channel);
These functions access the DMA registry. Registration must be performed before using ISA DMA channels.
- unsigned long claim_dma_lock();
- void release_dma_lock(unsigned long flags);
These functions acquire and release the DMA spinlock, which must be held prior to calling the other ISA DMA functions described later in this list. They also disable and reenable interrupts on the local processor.
- void set_dma_mode(unsigned int channel, char mode);
- void set_dma_addr(unsigned int channel, unsigned int addr);
- void set_dma_count(unsigned int channel, unsigned int count);
These functions are used to program DMA information in the DMA controller. addr is a bus address.
- void disable_dma(unsigned int channel);
- void enable_dma(unsigned int channel);
A DMA channel must be disabled during configuration. These functions change the status of the DMA channel.
- int get_dma_residue(unsigned int channel);
- void clear_dma_ff(unsigned int channel)
get_dma_residue returns the number of bytes still to be transferred on the channel; clear_dma_ff clears the flip-flop that controls access to the controller's 16-bit registers.
© 2001, O'Reilly & Associates, Inc.