[home] [purchase] [table of contents] [errata] [change log] [report error] [authors] [search]

change log

This page summarizes the major changes that occurred in the Linux source tree relative to what is described in the book. Changes that do not affect the book are not described.

stable kernel (v2.4.xx)

v2.4.18
  • The stack area for a newly created process has been moved from the end of region 4 to the end of region 3. For example, with a page size of 16KB, the starting address of the stack is now 0x60000fffffffc000.
  • The stack and data pages are no longer executable by default. This is intended to improve protection against (simple) stack overflow attacks. It also has the nice effect that the lazy execute bit scheme described at the bottom of page 204 is no longer needed.
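The example address can be reproduced with a little arithmetic. The C sketch below assumes the IA-64 Linux page-table layout: a three-level table with 8-byte entries, so each level resolves page_shift - 3 address bits, and a top-level directory shared by the eight regions, whose region number (address bits 63..61) uses up three of its index bits. A region can then map 2^(4*page_shift - 12) bytes. The sketch further assumes that the stack starts at the last page of that mapped part of region 3. The function names are illustrative only.

```c
#include <stdint.h>

/* Assumptions: three-level page table, 8-byte entries, and a top-level
 * directory shared by the 8 regions (3 of its index bits = region). */

/* The region number is held in address bits 63..61. */
uint64_t region_base(unsigned region)
{
    return (uint64_t)region << 61;
}

unsigned region_of(uint64_t addr)
{
    return (unsigned)(addr >> 61);
}

/* Mappable size of one region, minus one page:
 * page offset + 3 levels of (page_shift - 3) index bits,
 * less the 3 top-level index bits consumed by the region number. */
uint64_t region_map_limit(unsigned page_shift)
{
    unsigned mapped_bits = 4 * page_shift - 12;
    return ((uint64_t)1 << mapped_bits) - ((uint64_t)1 << page_shift);
}

/* Assumed stack start: last page of the mapped part of region 3. */
uint64_t stack_start(unsigned page_shift)
{
    return region_base(3) + region_map_limit(page_shift);
}
```

With 16KB pages (page_shift = 14) a region maps 2^44 bytes, so stack_start(14) is 0x6000000000000000 + 2^44 - 0x4000 = 0x60000fffffffc000.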

development kernel (v2.5.xx)

v2.5.68-v2.5.69
  • The ia64 kernel has been updated to use the generic device DMA API instead of the PCI DMA interface. For backwards-compatibility, the PCI DMA interface remains present and is mapped onto the generic DMA API.
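  • Interrupt handlers are now expected to return a value of type irqreturn_t: IRQ_HANDLED if the handler serviced the interrupt and IRQ_NONE otherwise. This is intended to catch unhandled interrupts, i.e., interrupts that no handler claimed.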
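The following user-space simulation (not kernel code) sketches why a return value helps on a shared interrupt line. It mirrors the kernel convention that IRQ_NONE is 0 and IRQ_HANDLED is 1; the device structure, the handler, the dispatcher, and the unhandled_count statistic are hypothetical. The dispatcher ORs together the results of all handlers on the line, and if the result is still IRQ_NONE, nobody claimed the interrupt.

```c
#include <stddef.h>

typedef int irqreturn_t;       /* mirrors the kernel's definition */
#define IRQ_NONE    (0)        /* interrupt was not from this device */
#define IRQ_HANDLED (1)        /* interrupt was serviced */

/* Hypothetical device: does it have an interrupt pending? */
struct fake_dev { int pending; };

typedef irqreturn_t (*handler_t)(struct fake_dev *dev);

irqreturn_t fake_handler(struct fake_dev *dev)
{
    if (!dev->pending)
        return IRQ_NONE;       /* not ours: tell the dispatcher */
    dev->pending = 0;          /* acknowledge the device */
    return IRQ_HANDLED;
}

unsigned long unhandled_count; /* hypothetical statistic */

/* Run the handler for every device sharing the line and count the
 * interrupt as unhandled if none of them claims it. */
irqreturn_t dispatch_shared_irq(handler_t h, struct fake_dev *devs, size_t n)
{
    irqreturn_t ret = IRQ_NONE;
    for (size_t i = 0; i < n; i++)
        ret |= h(&devs[i]);
    if (ret == IRQ_NONE)
        unhandled_count++;
    return ret;
}
```

Before this change, a handler had no way to say "not mine", so a stuck or misrouted interrupt could go unnoticed.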
v2.5.65-v2.5.67
  • To support non-linear file-mappings, the PTE-interface has been extended by a new constant (PTE_FILE_MAX_BITS) and three routines: pte_file(), pgoff_to_pte(), and pte_to_pgoff(). The idea here is that when a page for a non-linear file-mapping needs to be relinquished, its file-offset is encoded (as a page-number) in the page-table entry (PTE) that was mapping the page. These "file PTEs" are guaranteed to have the present bit cleared. Furthermore, platforms need to reserve a bit in the PTE so that file PTEs can be distinguished from swap PTEs. The platform-independent part of the kernel uses routine pte_file() to tell file PTEs from swap PTEs. Note that pte_file() must not be used on a PTE which has the present bit set. Routine pgoff_to_pte() converts a file-offset (scaled by the page-size) into a file PTE. Conversely, pte_to_pgoff() returns the file-offset (scaled by page-size) encoded in a file PTE. The PTE_FILE_MAX_BITS constant tells the platform-independent part of the kernel how many bits in the PTE are available for encoding the file-offset.
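As an illustration only, here is a C sketch of the three routines for a made-up 64-bit PTE layout (not the IA-64 layout): bit 0 is the present bit, bit 1 is the reserved bit that marks a file PTE, and the remaining 62 bits hold the page offset, so PTE_FILE_MAX_BITS is 62 in this sketch. The pte_t wrapper, the bit macros, and the helper pte_present() are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PTE layout used only for this sketch. */
typedef struct { uint64_t val; } pte_t;

#define PTE_PRESENT        (1ULL << 0)  /* page is resident */
#define PTE_FILE_BIT       (1ULL << 1)  /* non-present PTE holds a file offset */
#define PTE_FILE_SHIFT     2
#define PTE_FILE_MAX_BITS  62           /* 64 bits minus the two flag bits */

int pte_present(pte_t pte) { return (pte.val & PTE_PRESENT) != 0; }

/* Tell file PTEs from swap PTEs; only valid on non-present PTEs. */
int pte_file(pte_t pte)
{
    assert(!pte_present(pte));
    return (pte.val & PTE_FILE_BIT) != 0;
}

/* Encode a file offset (in pages, < 2^PTE_FILE_MAX_BITS) as a file PTE.
 * The present bit stays clear. */
pte_t pgoff_to_pte(uint64_t pgoff)
{
    pte_t pte = { (pgoff << PTE_FILE_SHIFT) | PTE_FILE_BIT };
    return pte;
}

/* Recover the file offset (in pages) from a file PTE. */
uint64_t pte_to_pgoff(pte_t pte)
{
    return pte.val >> PTE_FILE_SHIFT;
}
```

A swap PTE in this layout simply leaves bit 1 clear, which is exactly the distinction pte_file() tests.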
v2.5.61-v2.5.64
v2.5.60
  • The fadvise64() system call has been added. In Andrew Morton's words: "The main reason for wanting this syscall is to provide userspace with the ability to explicitly shoot down pagecache when streaming large files. This is what O_STREAMING does, only posix_fadvise() is standards-based, and harder to use; posix_fadvise() also subsumes readahead()."
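A minimal user-level sketch of the streaming use case: after writing a chunk of a file, the program tells the kernel that it will not need those cached pages again. It uses the standard posix_fadvise() interface, which glibc implements on top of the fadvise64() system call on Linux. The temporary file and the function name are only there to make the example self-contained. POSIX_FADV_DONTNEED is advice, so the kernel is free to honor it only partially.

```c
#define _XOPEN_SOURCE 600      /* for posix_fadvise() and mkstemp() */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write one chunk to a temporary file, then advise the kernel that the
 * page-cache pages for that range will not be needed again.
 * Returns 0 on success, -1 on setup failure, or an error number. */
int stream_chunk_and_drop(void)
{
    char path[] = "/tmp/fadviseXXXXXX";
    char buf[4096];
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);              /* file disappears once fd is closed */

    memset(buf, 'x', sizeof buf);
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
        close(fd);
        return -1;
    }
    /* offset 0, length sizeof buf: the chunk that was just streamed */
    int err = posix_fadvise(fd, 0, sizeof buf, POSIX_FADV_DONTNEED);
    close(fd);
    return err;                /* 0 or an errno value, not -1/errno */
}
```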
v2.5.52-v2.5.59
  • A light-weight system call infrastructure has been added to the ia64 linux kernel. The infrastructure is described in Documentation/ia64/fsys.txt. Something trivial such as getpid() can execute in as little as 35 cycles on Itanium 2, while fully preserving syscall semantics (i.e., you can strace it, single-step it, etc.).
  • Kernel register ar.k6 now contains the kernel-virtual address of the current task rather than its physical address.
  • The format of the exception-table entries was changed: instead of gp-relative values, the kernel now uses place-relative values (sometimes misleadingly called "ip-relative" values). The entries are generated with the TAG-. construct, i.e., the distance from the entry itself to label TAG, since "." denotes the current location. This has the advantage of making it possible to recover the original value without any additional information (such as a global pointer). It also makes it easier to support multiple, replicated kernel images (which is useful on NUMA machines).
  • A new routine deactivate_mm() has been added to the address-space number management interface.
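The following C sketch shows why place-relative entries need no extra context. Each field stores the distance from the field itself to its target (what TAG-. computes in assembly), and decoding adds the field's own address back. The pair of 32-bit offsets reflects the general idea; the struct, its field names, and the helper functions are assumptions of this sketch, not the kernel's definitions. Targets must lie within +/-2GB of the entry.

```c
#include <stdint.h>

/* Hypothetical entry: both fields are place-relative 32-bit offsets. */
struct ex_entry {
    int32_t insn;   /* offset from &insn to the faulting instruction */
    int32_t cont;   /* offset from &cont to the continuation point */
};

/* Encode: store (target - address of the field), like "TAG - ." */
void set_rel(int32_t *field, const void *target)
{
    *field = (int32_t)((intptr_t)target - (intptr_t)field);
}

/* Decode: add the field's own address back; no gp or base needed. */
const void *get_rel(const int32_t *field)
{
    return (const void *)((intptr_t)field + *field);
}
```

Because decoding depends only on the entry's own address, copying a table together with the code it points into (as with a replicated kernel image) leaves every entry valid.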
v2.5.46-v2.5.52
  • The restart_syscall() system call has been added. This is a helper system call which is intended for kernel use only (user level can invoke it, too, but without much of a useful effect). Its purpose is to allow restarting certain system calls that are not truly idempotent. For example, nanosleep() may have to be restarted after receiving a signal that does not get delivered to user level. For the restart, it is obviously not possible to use the old delay argument, as otherwise the system call would sleep for too long. A system call can trigger a restart via this new system call by returning the kernel-internal error code -ERESTART_RESTARTBLOCK. Caveat: this kind of restart cannot be nested!
  • The remap_file_pages() syscall has been added. This is useful primarily for architectures with a limited virtual address space (such as x86), because it makes it possible to map an arbitrary portion of a huge (e.g., >4GB) file. For example, a database could create a huge shared memory segment and then map in just the portion it currently needs (yeah, segmentation all over again; just say no and get yourself a 64-bit machine...).
  • The set_tid_address() system call has been added. It is used by the Native POSIX Thread Library (NPTL) to establish the thread-id of a newly started process (started via execve()).
  • For clone2(), CLONE_SETTID got split into CLONE_CHILD_SETTID and CLONE_PARENT_SETTID with separate thread-id pointer arguments. This makes it possible to store the new thread id in the parent in a different place than in the child.
  • The following module-related system calls have been removed: create_module(), get_kernel_syms(), and query_module(). The reason these could be removed is that the entire module loader has been moved from user space into the kernel itself (the primary motivation was to avoid some nasty race conditions).
  • The security() system call has been removed again. It was perceived to be too much of a hook (which also made it difficult to emulate the system call).
  • schedule_tail() once again needs to be called on single-processor kernels, too.
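The nanosleep() case of restart_syscall() can be illustrated with a user-space simulation. Time comes from a fake clock, and the restart_block structure, its fields, and the sim_ functions are hypothetical stand-ins rather than the kernel's definitions; only the value 516 for ERESTART_RESTARTBLOCK is taken from the kernel's errno.h. The point is that the first invocation converts the relative delay into an absolute deadline and saves it, so the restart sleeps only for the remaining time.

```c
/* Hypothetical simulation; not kernel code. */
#define ERESTART_RESTARTBLOCK 516          /* kernel-internal error code */

unsigned long fake_now;                    /* simulated clock, in ticks */
unsigned long interrupt_at;                /* simulated signal arrival (0 = none) */
unsigned long slept;                       /* total simulated ticks spent asleep */

struct restart_block {
    long (*fn)(struct restart_block *);    /* what the restart should call */
    unsigned long deadline;                /* absolute wake-up time */
};

/* Sleep until the saved deadline, or until the simulated signal arrives. */
static long do_sleep_until(struct restart_block *rb)
{
    if (interrupt_at && interrupt_at < rb->deadline) {
        slept += interrupt_at - fake_now;
        fake_now = interrupt_at;
        interrupt_at = 0;                  /* signal consumed */
        return -ERESTART_RESTARTBLOCK;     /* request a restart */
    }
    slept += rb->deadline - fake_now;
    fake_now = rb->deadline;
    return 0;
}

long sim_nanosleep(struct restart_block *rb, unsigned long delay)
{
    rb->deadline = fake_now + delay;       /* relative delay -> absolute time */
    rb->fn = do_sleep_until;
    return do_sleep_until(rb);
}

/* Resume from the saved state instead of re-running with the old delay. */
long sim_restart_syscall(struct restart_block *rb)
{
    return rb->fn(rb);
}
```

For a 50-tick sleep starting at tick 100 and interrupted at tick 130, the restart wakes at tick 150; reusing the original delay would have woken at tick 180.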
v2.5.45
  • The epoll interface has been added to the kernel. This interface consists of the three system calls epoll_create(), epoll_ctl(), and epoll_wait(). It is intended to be a more scalable replacement for poll() (and select()).
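A minimal sketch of the three calls, watching the read end of a pipe. The pipe is just a convenient file descriptor for the example, and the function name is illustrative. The size argument of epoll_create() is a hint that later kernels ignore, but it must still be positive.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns the number of ready descriptors reported by epoll_wait(),
 * or -1 on error. */
int epoll_pipe_demo(void)
{
    int fds[2], ep, n = -1;
    struct epoll_event ev, out[4];

    if (pipe(fds) < 0)
        return -1;
    ep = epoll_create(1);               /* size hint must be > 0 */
    if (ep < 0)
        goto close_pipe;

    ev.events = EPOLLIN;                /* interested in readability */
    ev.data.fd = fds[0];                /* handed back by epoll_wait() */
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) < 0)
        goto close_all;

    if (write(fds[1], "x", 1) == 1)     /* make the read end ready */
        n = epoll_wait(ep, out, 4, 1000);   /* wait at most 1000 ms */
    if (n == 1 && (out[0].data.fd != fds[0] || !(out[0].events & EPOLLIN)))
        n = -1;
close_all:
    close(ep);
close_pipe:
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Unlike poll(), the interest set is registered once with epoll_ctl() and kept in the kernel, so each epoll_wait() does not have to pass the whole descriptor list again.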
v2.5.36-v2.5.44
  • The lookup_dcookie() system call has been added. This is a helper system call for profiling tools such as oprofile.
v2.5.31-v2.5.35
  • Support for huge pages has been added via the alloc_hugepages() and free_hugepages() system calls. These were introduced by Rohit Seth of Intel and make it possible to allocate non-paged huge pages. The size of a huge page is platform-specific. For example, on x86 it is either 2MB or 4MB, and on IA-64 it can be configured to a size in the range from 256KB up to 4GB (on Itanium 2).
  • A new routine local_irqs_disabled() has been added to the interrupt masking interface. The routine tests whether delivery of interrupts is presently disabled (masked).
  • Improved POSIX-threading support. Ingo Molnar and Ulrich Drepper have worked out a new thread library (libpthread) which fully supports POSIX semantics on top of a minimal set of kernel extensions. So far, the extensions consist of (i) a new exit_group() system call, (ii) new task iterators for_each_process and do_each_thread()/while_each_thread(), and (iii) the following new clone2() flags:
    CLONE_SETTLS:
    If this flag is set, an additional argument is passed to clone2() which specifies the new thread-local pointer for the child task. This flag requires kernel support to ensure atomicity of setting the thread pointer (tp aka r13 on IA-64).
    CLONE_SETTID:
    If set, an additional argument is passed to clone2(), which is used to return the task id of the child task.
    CLONE_CLEARTID:
    If set, the word through which the task id of the child task was returned (see CLONE_SETTID) is cleared to 0 when the child terminates. This is used by user-level thread-libraries to detect when it is safe to re-use the stack of a terminated thread.
    CLONE_DETACHED:
    If set, the child does not send the child-exit signal (SIGCHLD) when it exits.
v2.5.19-v2.5.30
  • The interrupt masking interface has been cleaned up and the old global interrupt masking routines cli(), sti(), save_flags(), save_flags_cli(), and restore_flags() have been removed from the MP version of the kernel. The UP version is scheduled to drop these routines as well at some point before v2.6 is released. Usually, cli() and sti() need to be replaced with an explicit spinlock and local interrupt masking (e.g., spin_lock_irqsave(&lock, flags) ... spin_unlock_irqrestore(&lock, flags)).
v2.5.18
  • The TLB-shootdown kernel interface is being revised. The new interface is defined in include/asm/tlb.h and consists of routines tlb_gather_mmu(), tlb_start_vma(), tlb_remove_tlb_entry(), tlb_remove_page(), tlb_end_vma(), and tlb_finish_mmu(). The old TLB-flush routines are being deprecated, though most of them are still needed as of v2.5.18.
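The batching idea behind the gather interface can be illustrated with a user-space simulation. The sim_ names, the batch size, and the counters are hypothetical, and the per-vm-area hooks (tlb_start_vma(), tlb_end_vma()) and tlb_remove_tlb_entry() are left out. What it shows is the ordering: pages unmapped during the walk are queued rather than freed, and a queue is released only after a TLB flush, either when the batch fills up or in the final finish call, so no CPU can keep using a stale translation to a page that has already been freed.

```c
#include <stddef.h>

#define GATHER_MAX 8                    /* hypothetical batch size */

struct sim_gather {
    int pages[GATHER_MAX];              /* pages waiting to be freed */
    size_t nr;                          /* number of queued pages */
    int flushes;                        /* simulated TLB flushes */
    int freed;                          /* pages actually released */
};

/* Flush the TLB first, then release the queued pages. */
static void sim_flush_and_free(struct sim_gather *g)
{
    if (g->nr == 0)
        return;
    g->flushes++;
    g->freed += (int)g->nr;
    g->nr = 0;
}

void sim_tlb_gather_mmu(struct sim_gather *g)
{
    *g = (struct sim_gather){ .nr = 0 };
}

void sim_tlb_remove_page(struct sim_gather *g, int page)
{
    g->pages[g->nr++] = page;           /* defer the free */
    if (g->nr == GATHER_MAX)
        sim_flush_and_free(g);          /* batch full: flush early */
}

void sim_tlb_finish_mmu(struct sim_gather *g)
{
    sim_flush_and_free(g);
}
```

Unmapping ten pages this way costs two flushes instead of ten.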
v2.5.15-v2.5.17
  • Many of the PTE-related routines now use page-frame numbers again (instead of page-descriptor pointers). Specifically, VALID_PAGE() has been replaced by virt_addr_valid(), and mk_pte() and mk_pte_phys() have been replaced by pfn_pte(). Other new routines are pfn_valid(), page_to_pfn(), pfn_to_page(), and pte_pfn().
  • Improved hotplug CPU support by changing CLONE_PID to CLONE_IDLETASK. The latter forces a process id (pid) of 0.
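A simplified user-space model of the page-frame-number conversions follows. It borrows the kernel's routine names, but everything else is an assumption: 16KB pages, a flat mem_map array with one descriptor per frame starting at frame 0, and a made-up PTE layout that keeps the frame number directly above the page offset with protection bits below it. Real PTE layouts and mem_map origins are platform-specific.

```c
#include <stdint.h>

#define PAGE_SHIFT 14                          /* assumption: 16KB pages */
#define PROT_MASK  ((1ULL << PAGE_SHIFT) - 1)  /* hypothetical prot bits */

struct page { unsigned long flags; };          /* hypothetical descriptor */
struct page mem_map[1024];                     /* one entry per frame, frame 0 first */

unsigned long page_to_pfn(const struct page *p)
{
    return (unsigned long)(p - mem_map);       /* index = frame number */
}

struct page *pfn_to_page(unsigned long pfn)
{
    return &mem_map[pfn];
}

int pfn_valid(unsigned long pfn)
{
    return pfn < sizeof mem_map / sizeof mem_map[0];
}

/* Hypothetical PTE: frame number above the page offset, prot below. */
uint64_t pfn_pte(unsigned long pfn, uint64_t prot)
{
    return ((uint64_t)pfn << PAGE_SHIFT) | (prot & PROT_MASK);
}

unsigned long pte_pfn(uint64_t pte)
{
    return (unsigned long)(pte >> PAGE_SHIFT);
}
```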
v2.5.14
  • The IA-64 version of local_irq_restore() was changed so that it restores only the psr.i bit (other psr bits remain unchanged).
v2.5.10
  • Flag CLONE_SYSVSEM has been added to the clone2() system call so that the System V semaphore undo lists can be shared across threads in a process.
v2.5.2
  • A new facility has been introduced which supports per-task filesystem namespaces. To support this, the clone2() flag CLONE_NEWNS has been added.
  • A new, more scalable task scheduler has been developed by Ingo Molnar. This has the following effects:
    • The platform-specific context-switch routine (switch_to()) is now called with interrupts turned off (the old scheduler called it with interrupts enabled).
    • The platform-specific PROC_CHANGE_PENALTY constant has been removed because it is no longer needed.
    • Routine smp_send_reschedule_all() has been added.
    • The init_tasks array and the cpu_now_booting variable have been removed and replaced with a single task_for_booting_cpu variable.
    • The processor member in the task structure has been renamed to cpu.
v2.5.3
  • A bunch of new system calls were introduced to handle extended filesystem attributes. Namely: setxattr(), lsetxattr(), fsetxattr(), getxattr(), lgetxattr(), fgetxattr(), listxattr(), llistxattr(), flistxattr(), removexattr(), lremovexattr(), fremovexattr().
  • The task flag need_resched should no longer be tested directly. Instead, routine need_resched() should be used.
  • Routines flush_tlb_range() and flush_cache_range() now take a vm-area pointer instead of an mm-pointer as the first argument.
  • Routines remap_page_range() and io_remap_page_range() now take a vm-area pointer as a new first argument.
  • The routine smp_migrate_task() has been added to the multiprocessor support interface. This routine migrates an existing task to a specific CPU.
  • The variable cache_decay_ticks has been added to the multiprocessor support interface. It expresses the duration (in clock ticks) for which the caches of an idle task are to be considered "hot". This parameter affects affinity-decisions of the task scheduler.
  • It is now possible to (partially) order initialization calls. This is achieved by classifying each initializer with one of 7 macros: early_arch_initcall, mem_initcall, subsys_initcall, arch_initcall, fs_initcall, device_initcall, late_initcall. The order listed here corresponds to the order in which the classes are executed. Within a class, the execution order remains undefined. Initializers declared via the __initcall macro are treated like device_initcall.
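The ordering rule can be simulated in a few lines of C, assuming each initializer is tagged with its class index (early_arch_initcall first, late_initcall last, in the order listed above). The kernel achieves this at link time by placing each class in its own section; here a sort over a table of hypothetical entries stands in for the linker. The insertion sort happens to be stable, but that is only one valid outcome, since the order within a class is undefined.

```c
#include <stddef.h>

enum initcall_class {              /* execution order, first to last */
    EARLY_ARCH, MEM, SUBSYS, ARCH, FS, DEVICE, LATE
};

struct initcall {
    enum initcall_class cls;
    int id;                        /* hypothetical initializer identifier */
};

/* Order initializers by class (insertion sort). */
void order_initcalls(struct initcall *v, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        struct initcall key = v[i];
        size_t j = i;
        while (j > 0 && v[j - 1].cls > key.cls) {
            v[j] = v[j - 1];
            j--;
        }
        v[j] = key;
    }
}
```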
v2.5.4
  • Linux now supports preemption inside the kernel. To build such a kernel, CONFIG_PREEMPT needs to be turned on. Not all platforms support this option; in particular, IA-64 does not yet support kernel preemption. A consequence of this change is that the spinlock interface now uses a prefix of _raw_ (e.g., _raw_spin_lock() instead of just spin_lock()). This renaming was done so that the platform-independent part of Linux can implement spin_lock() etc. differently depending on whether or not kernel preemption is enabled.
  • Yet another thread data structure has been introduced. The new structure is called thread_info and is intended to encapsulate all state needed during kernel entry and exit. Also, on x86 and some other platforms, the task structure has been moved out of the memory area containing the thread_info and the kernel stack. However, on IA-64, the task structure remains in the old place and the thread_info follows directly above it.
v2.5.5
  • New page-table management hooks have been added which make it possible to place page tables in high memory. The hooks created for this purpose are called pmd_populate_kernel(), pmd_alloc_one_kernel(), pte_free_kernel(), pte_offset_kernel(), pte_offset_map(), pte_offset_map_nested(), and pte_unmap_nested(). Another effect of this change is that PTE pages are now referred to via a page descriptor pointer (struct page *) instead of a direct kernel identity-mapped address.
  • A new routine called flush_icache_user_range() has been added to the memory coherency kernel interface (page 201). This routine is used to ensure that the i- and d-caches are coherent for a portion of a user-level page.
  • The third argument to the switch_to() routine has been dropped. This has become possible because the new scheduler never needs to refer back to the previously executing task.
v2.5.8
  • The cache flushing routines are now declared in and the TLB flushing routines in . This change was made to resolve circular include dependencies.
  • A new routine called flush_tlb_kernel_range() has been added to the memory coherency kernel interface (page 201).

Last modified: June 30, 2005. copyright © 2001-2005 david mosberger. all rights reserved.