| v2.5.68-v2.5.69 |
- The ia64 kernel has been updated to use the generic device
DMA API instead of the PCI DMA interface. For
backwards-compatibility, the PCI DMA interface remains
present and is mapped onto the generic DMA API.
- Interrupt handlers are now expected to return a value
of type irqreturn_t. This is intended to catch
unhandled interrupts.
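
The return-value convention can be illustrated with a small userspace model. This is only a sketch: the enum values mirror the kernel's IRQ_NONE and IRQ_HANDLED, while demo_dev, demo_handler() and dispatch_irq() are hypothetical names invented for the example and are not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's irqreturn_t values. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

typedef irqreturn_t (*irq_handler_t)(int irq, void *dev_id);

/* Hypothetical device state: nonzero when the device raised the interrupt. */
struct demo_dev { int pending; };

static irqreturn_t demo_handler(int irq, void *dev_id)
{
    struct demo_dev *dev = dev_id;

    (void)irq;
    if (!dev->pending)
        return IRQ_NONE;        /* not ours: tell the dispatcher */
    dev->pending = 0;           /* acknowledge the device */
    return IRQ_HANDLED;
}

/* Calls every handler sharing the line; returns 1 if nobody claimed it. */
static int dispatch_irq(int irq, irq_handler_t *handlers, void **dev_ids,
                        size_t n)
{
    int handled = 0;

    assert(handlers != NULL && dev_ids != NULL);
    for (size_t i = 0; i < n; i++)
        handled |= (handlers[i](irq, dev_ids[i]) == IRQ_HANDLED);
    return !handled;
}
```

A dispatcher that sees only IRQ_NONE from every handler on a line can count the event as unhandled, which is the kind of detection the new return type is meant to enable.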
|
| v2.5.65-v2.5.67 |
- To support non-linear file-mappings, the PTE-interface has
been extended by a new constant (PTE_FILE_MAX_BITS)
and three routines: pte_file(), pgoff_to_pte(),
and pte_to_pgoff(). The idea here is that when a
page for a non-linear file-mapping needs to be relinquished,
its file-offset is encoded (as a page-number) in the
page-table entry (PTE) that was mapping the page. These
"file PTEs" are guaranteed to have the present bit
cleared. Furthermore, platforms need to reserve a bit in the
PTE so that file PTEs can be distinguished from swap PTEs.
The platform-independent part of the kernel uses routine
pte_file() to tell file PTEs from swap PTEs. Note
that pte_file() must not be used on a PTE which has
the present bit set. Routine
pgoff_to_pte() converts a file-offset (scaled by the
page-size) into a file PTE. Conversely, pte_to_pgoff()
returns the file-offset (scaled by page-size) encoded
in a file PTE. The PTE_FILE_MAX_BITS constant tells
the platform-independent part of the kernel how many bits in
the PTE are available for encoding the file-offset.
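
A minimal userspace sketch of the file-PTE routines follows. The bit layout (bit 0 as the present bit, bit 1 as the file marker, the remaining bits holding the page offset) is an assumption made for illustration and is not the actual IA-64 layout; the DEMO_ constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PTE layout, for illustration only:
 * bit 0 = present, bit 1 = file marker, bits 2..63 = page offset. */
typedef struct { uint64_t val; } pte_t;

#define DEMO_PRESENT       (UINT64_C(1) << 0)
#define DEMO_FILE          (UINT64_C(1) << 1)
#define DEMO_FILE_SHIFT    2
#define PTE_FILE_MAX_BITS  62

static int pte_present(pte_t pte)
{
    return (pte.val & DEMO_PRESENT) != 0;
}

/* Only meaningful when the present bit is clear. */
static int pte_file(pte_t pte)
{
    assert(!pte_present(pte));
    return (pte.val & DEMO_FILE) != 0;
}

/* Encode a file offset (in pages) into a non-present file PTE. */
static pte_t pgoff_to_pte(uint64_t pgoff)
{
    pte_t pte;

    assert(pgoff < (UINT64_C(1) << PTE_FILE_MAX_BITS));
    pte.val = (pgoff << DEMO_FILE_SHIFT) | DEMO_FILE;
    return pte;
}

/* Recover the file offset (in pages) from a file PTE. */
static uint64_t pte_to_pgoff(pte_t pte)
{
    return pte.val >> DEMO_FILE_SHIFT;
}
```

In this model a swap PTE is any non-present value with the file marker clear, so pte_file() is enough to tell the two kinds apart.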
|
| v2.5.61-v2.5.64 |
|
| v2.5.60 |
- The fadvise64() system call has been added. In
Andrew Morton's words: "The main reason for wanting this
syscall is to provide userspace with the ability to explicitly
shoot down pagecache when streaming large files. This is what
O_STREAMING does, only posix_fadvise() is
standards-based, and harder to use; posix_fadvise()
also subsumes readahead()."
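
From user level, the "shoot down pagecache" use case might look like the sketch below, which goes through the standard posix_fadvise() wrapper. The helper name drop_file_cache() is hypothetical, and the fsync() call is an assumption based on the fact that dirty pages cannot simply be discarded.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Ask the kernel to drop cached pages for the whole file behind fd.
 * Returns 0 on success or an errno-style error number. */
static int drop_file_cache(int fd)
{
    assert(fd >= 0);
    /* Dirty pages cannot be dropped, so write them back first. */
    if (fsync(fd) != 0)
        return errno;
    /* Offset 0 with length 0 covers the file from start to end.
     * posix_fadvise() returns the error number directly. */
    return posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
}
```

A streaming application could call such a helper periodically on the region it has already consumed, passing an explicit offset and length instead of the whole file.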
|
| v2.5.52-v2.5.59 |
- A light-weight system call infrastructure has been added
to the ia64 Linux kernel. The infrastructure is
described in Documentation/ia64/fsys.txt. Something
trivial such as getpid() can execute in as little as
35 cycles on Itanium 2, while fully preserving syscall
semantics (i.e., you can strace it, single-step it, etc.).
- Kernel register ar.k6 now contains the kernel-virtual
address of the current task, rather than its physical
address.
- The format of the exception-table entries was changed: instead
of gp-relative values, the kernel now uses place-relative
values (sometimes misleadingly called "ip-relative" values).
The entries are generated with the TAG-. construct,
which has the advantage of making it possible to recover the original
value without any additional info (such as a global
pointer). It also makes it easier to support multiple,
replicated kernel images (which is useful on NUMA machines).
- A new routine deactivate_mm() has been added to
the address-space number management interface.
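
The place-relative encoding used for the exception-table entries can be sketched in userspace C as follows. The struct layout and the helper names encode_rel() and decode_rel() are assumptions for illustration; the point is only that each field stores "target minus the address of the field itself", which is what TAG-. evaluates to.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical entry: each field holds "target - address of this field". */
struct rel_entry {
    int32_t addr;   /* offset to the faulting instruction */
    int32_t cont;   /* offset to the fix-up (continuation) code */
};

/* Hypothetical image holding a table entry and the code it refers to. */
struct demo_image {
    struct rel_entry entry;
    char code[64];
};

/* Encode a pointer relative to the field that will hold it. */
static int32_t encode_rel(const int32_t *field, const void *target)
{
    intptr_t d = (intptr_t)((uintptr_t)target - (uintptr_t)field);

    assert(d >= INT32_MIN && d <= INT32_MAX);
    return (int32_t)d;
}

/* Recover the pointer using nothing but the field's own address. */
static const void *decode_rel(const int32_t *field)
{
    return (const void *)((uintptr_t)field + (uintptr_t)(intptr_t)*field);
}
```

Because decoding needs no global pointer, a byte-for-byte copy of a struct demo_image decodes to addresses inside the copy, which is the property that helps with replicated kernel images.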
|
| v2.5.46-v2.5.52 |
- The restart_syscall() system call has been added. This
is a helper system call which is intended for kernel-use
only (user-level can invoke it, too, but without much of
a useful effect). The idea of this syscall is to allow restarting
of certain system calls that are not truly idempotent. For example,
nanosleep() may have to be restarted after a signal was received
that does not get delivered to user-level. For the restart, it is
obviously not possible to use the old delay argument, as otherwise
the system call would sleep for too long. A system call can trigger
restart via this new system call by returning the kernel-internal
error-code -ERESTART_RESTARTBLOCK. Caveat: this kind of
restart cannot be nested!
- The remap_file_pages() syscall has been added. This is useful
primarily for virtual-address-limited architectures (such as x86),
because it allows mapping an arbitrary portion of a huge (e.g., >4GB)
file. For example, a database could create a huge shared memory
segment and then map in just the portion it currently needs
(yeah, segmentation all over again; just say no and get yourself
a 64-bit machine...).
- The set_tid_address() system call has been added. It is used
by the Native POSIX Thread Library (NPTL) to establish the
thread-id of a newly started process (started via execve()).
- For clone2(), CLONE_SETTID got split into
CLONE_CHILD_SETTID and CLONE_PARENT_SETTID
with separate thread-id pointer arguments. This makes it possible
to store the new thread id in the parent in a different place than
in the child.
- The following module-related system calls have been removed:
create_module(), get_kernel_syms(), and
query_module(). The reason these could be removed is that
the entire module-loader has been moved from user-space into the
kernel itself (primary motivation is to avoid some nasty race
conditions).
- The security() system call has been removed again. It was
perceived to be too much of a hook (which also made it difficult
to emulate the system call).
- schedule_tail() once again needs to be called on single-processor
kernels, too.
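
The restart mechanism behind restart_syscall() can be modelled in userspace as shown below. This is a simplified sketch: the simulated clock, the single global restart slot, and the demo_ function names are assumptions for illustration. Only the error-code name and its value are taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define ERESTART_RESTARTBLOCK 516   /* kernel-internal error code */

/* Hypothetical per-task restart state: one slot, hence no nesting. */
struct restart_block {
    long (*fn)(struct restart_block *);
    long expire;                    /* absolute wake-up time, in ticks */
};

static long now;                    /* simulated clock, in ticks */
static long signal_at = -1;         /* tick at which a signal arrives */
static struct restart_block restart;

/* Sleep until the absolute time expire, unless a signal cuts it short. */
static long do_sleep_until(long expire)
{
    if (signal_at >= now && signal_at < expire) {
        now = signal_at;            /* woken early by the signal */
        signal_at = -1;
        return -ERESTART_RESTARTBLOCK;
    }
    now = expire;
    return 0;
}

/* Restart handler: reuses the saved absolute expiry, not the old delay. */
static long nanosleep_restart(struct restart_block *rb)
{
    return do_sleep_until(rb->expire);
}

static long demo_nanosleep(long delay)
{
    assert(delay >= 0);
    restart.fn = nanosleep_restart;
    restart.expire = now + delay;
    return do_sleep_until(restart.expire);
}

/* Stand-in for restart_syscall(): re-enter through the saved handler. */
static long demo_restart_syscall(void)
{
    return restart.fn(&restart);
}
```

With the clock at tick 100, a 50-tick sleep interrupted at tick 130 resumes through demo_restart_syscall() and ends at tick 150 rather than tick 180, because the restart handler works from the saved expiry time.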
|
| v2.5.45 |
- The epoll interface has been added to the kernel.
This interface consists of the three system calls
epoll_create(), epoll_ctl(), and
epoll_wait(). It is intended to be a more scalable
replacement for poll() (and select()).
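
A minimal user-level sketch of the three calls is given below. It registers the read end of a pipe, makes it readable, and waits for the event. The helper name epoll_pipe_demo() is hypothetical; the epoll calls themselves are the standard Linux ones.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe with epoll, makes it readable, and
 * returns the number of ready descriptors reported (or -1 on error). */
static int epoll_pipe_demo(void)
{
    int pfd[2], epfd, n = -1;
    struct epoll_event ev = { 0 }, out;

    if (pipe(pfd) != 0)
        return -1;
    epfd = epoll_create(1);             /* size hint; must be positive */
    if (epfd >= 0) {
        ev.events = EPOLLIN;            /* interested in readability */
        ev.data.fd = pfd[0];
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
            write(pfd[1], "x", 1) == 1) {
            n = epoll_wait(epfd, &out, 1, 1000);    /* wait up to 1 s */
            assert(n <= 0 || out.data.fd == pfd[0]);
        }
        close(epfd);
    }
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

Unlike poll(), the interest set is registered once with epoll_ctl() and kept in the kernel, so each epoll_wait() call only reports descriptors that are actually ready.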
|
| v2.5.36-v2.5.44 |
- The lookup_dcookie() system call has been added.
This is a helper system call for profiling tools such
as oprofile.
|
| v2.5.31-v2.5.35 |
- Support for huge pages has been added via the
alloc_hugepages() and free_hugepages()
system calls. These were introduced by Rohit Seth of
Intel and make it possible to allocate non-paged huge pages. The
size of a huge page is platform-specific. For example,
on x86 it is either 2MBytes or 4MBytes and on
IA-64 it can be configured to a size in the range from
256KB up to 4GB (on Itanium 2).
- A new routine local_irqs_disabled() has been added to
the interrupt masking interface. The routine tests whether
delivery of interrupts is presently disabled (masked).
- Improved POSIX-threading support. Ingo Molnar and Ulrich Drepper
have worked out a new thread library (libpthread) which
fully supports POSIX semantics on top of a minimal set of kernel
extensions. So far, the extensions consist of (i) a new
exit_group() system call, (ii) new task iterators
for_each_process() and
do_each_thread()/while_each_thread(),
and (iii) the following new
clone2() flags:
- CLONE_SETTLS: If this flag is set, an additional argument is passed to
clone2() which specifies the new thread-local pointer for
the child task. This flag requires kernel support to ensure atomicity
of setting the thread pointer (tp aka r13 on IA-64).
- CLONE_SETTID: If set, an additional argument is passed to clone2(), which
is used to return the task id of the child task.
- CLONE_CLEARTID: If set, the word through which the task id of the child task was
returned (see CLONE_SETTID) is cleared to 0 when the child
terminates. This is used by user-level thread-libraries to detect
when it is safe to re-use the stack of a terminated thread.
- CLONE_DETACHED: If set, the child does not send the child-exit signal
(SIGCHLD) when it exits.
|
| v2.5.19-v2.5.30 |
- The interrupt masking interface has been cleaned up and the
old global interrupt masking routines
cli(), sti(), save_flags(),
save_flags_cli(), and restore_flags()
have been removed from the MP-version of the kernel.
The routines are scheduled to be removed from the UP-version
as well at some point before v2.6 is released. Usually,
cli() and sti() need to be replaced with an
explicit spinlock and local interrupt masking (e.g.,
spin_lock_irqsave(lock, flags)/spin_unlock_irqrestore(lock, flags)).
|
| v2.5.18 |
- The TLB-shootdown kernel interface is being revised. The new interface
is defined in include/asm/tlb.h and consists of routines
tlb_gather_mmu(), tlb_start_vma(),
tlb_remove_tlb_entry(), tlb_remove_page(),
tlb_end_vma(), and tlb_finish_mmu(). The old TLB-flush
routines are being deprecated, though most of them are still needed
as of v2.5.18.
|
| v2.5.15-v2.5.17 |
- Many of the PTE-related routines now use page-frame numbers again
(instead of page-descriptor pointers). Specifically,
VALID_PAGE() has been replaced by virt_addr_valid(), and
mk_pte() and mk_pte_phys() have been replaced by
pfn_pte(). Other new routines are pfn_valid(),
page_to_pfn(), pfn_to_page(), and pte_pfn().
- Improved hotplug CPU support by changing CLONE_PID to
CLONE_IDLETASK. The latter forces a process id (pid) of 0.
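
How the page-frame-number routines relate to each other can be sketched with a flat userspace model. The page size, the eight-entry mem_map array, and the PTE layout (frame number shifted above the protection bits) are assumptions for illustration and do not reflect any particular platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 14          /* assumed 16KB pages */
#define DEMO_NR_PAGES   8           /* assumed size of physical memory */

struct page { int flags; };
static struct page mem_map[DEMO_NR_PAGES];  /* one descriptor per frame */

typedef struct { uint64_t val; } pte_t;

static int pfn_valid(unsigned long pfn)
{
    return pfn < DEMO_NR_PAGES;
}

static unsigned long page_to_pfn(const struct page *pg)
{
    return (unsigned long)(pg - mem_map);
}

static struct page *pfn_to_page(unsigned long pfn)
{
    assert(pfn_valid(pfn));
    return &mem_map[pfn];
}

/* Build a PTE from a frame number and protection bits (low bits only). */
static pte_t pfn_pte(unsigned long pfn, uint64_t prot)
{
    pte_t pte;

    assert(prot < (UINT64_C(1) << DEMO_PAGE_SHIFT));
    pte.val = ((uint64_t)pfn << DEMO_PAGE_SHIFT) | prot;
    return pte;
}

static unsigned long pte_pfn(pte_t pte)
{
    return (unsigned long)(pte.val >> DEMO_PAGE_SHIFT);
}
```

In this model, pfn_pte(page_to_pfn(pg), prot) plays the role that mk_pte(pg, prot) used to play, and pfn_to_page(pte_pfn(pte)) gets back to the page descriptor.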
|
| v2.5.14 |
- The IA-64 version of local_irq_restore() was changed
so that it restores only the psr.i
bit (other psr bits remain unchanged).
|
| v2.5.10 |
- Flag CLONE_SYSVSEM has been added to the
clone2() system call so that the System V
semaphore undo lists can be shared across threads in a process.
|
| v2.5.2 |
- A new facility has been introduced which supports per-task
filesystem namespaces. To support this, the clone2() flag
CLONE_NEWNS has been added.
- A new, more scalable task scheduler has been developed by
Ingo Molnar. This has the following effects:
- The platform-specific context-switch routine (switch_to())
is now called with interrupts turned off (the old scheduler
called it with interrupts enabled).
- The platform-specific PROC_CHANGE_PENALTY constant
has been removed because it is no longer needed.
- Routine smp_send_reschedule_all() has been added.
- The init_tasks array and the cpu_now_booting variable
have been removed and replaced with a single
task_for_booting_cpu variable.
- The processor member in the task structure has been renamed
to cpu.
|
| v2.5.3 |
- A bunch of new system calls were introduced to handle extended
filesystem attributes. Namely: setxattr(), lsetxattr(),
fsetxattr(), getxattr(), lgetxattr(),
fgetxattr(), listxattr(), llistxattr(),
flistxattr(), removexattr(), lremovexattr(),
fremovexattr().
- The task flag need_resched should no longer be tested
directly. Instead, routine need_resched() should be used.
- Routines flush_tlb_range() and flush_cache_range()
now take a vm-area pointer instead of an mm-pointer as the
first argument.
- Routines remap_page_range() and io_remap_page_range()
now take a vm-area pointer as a new first argument.
- The routine smp_migrate_task() has been added to the
multiprocessor support interface. This routine migrates an existing
task to a specific CPU.
- The variable cache_decay_ticks has been added to
the multiprocessor support interface. It expresses the duration
(in clock ticks) for which the caches of an idle task are
to be considered "hot". This parameter affects affinity-decisions
of the task scheduler.
- It is now possible to (partially) order initialization calls.
This is achieved by classifying each initializer with one
of 7 macros: early_arch_initcall, mem_initcall,
subsys_initcall, arch_initcall,
fs_initcall, device_initcall,
late_initcall. The order listed here corresponds to
the order in which the classes are executed.
Within a class, execution-order remains undefined. Initializers
declared via the __initcall macro are treated like
device_initcall.
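
The class-by-class ordering of initialization calls can be modelled as in the sketch below. The kernel itself relies on linker sections rather than a table walked at run time, so this is only an illustration of the resulting order; the LVL_ names, run_initcalls(), and the three sample initializers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One level per initcall class, in execution order. */
enum initcall_level {
    LVL_EARLY_ARCH, LVL_MEM, LVL_SUBSYS, LVL_ARCH,
    LVL_FS, LVL_DEVICE, LVL_LATE, LVL_COUNT
};

typedef int (*initcall_t)(void);

struct initcall {
    enum initcall_level level;
    initcall_t fn;
};

/* Trace buffer so the execution order can be inspected afterwards. */
static char trace[8];
static size_t trace_len;

static int record(char c)
{
    assert(trace_len < sizeof trace);
    trace[trace_len++] = c;
    return 0;
}

static int init_fs(void)   { return record('f'); }
static int init_mem(void)  { return record('m'); }
static int init_late(void) { return record('l'); }

/* Runs initializers class by class; returns how many were called. */
static size_t run_initcalls(const struct initcall *calls, size_t n)
{
    size_t ran = 0;

    assert(calls != NULL || n == 0);
    for (int lvl = 0; lvl < LVL_COUNT; lvl++)
        for (size_t i = 0; i < n; i++)
            if (calls[i].level == (enum initcall_level)lvl) {
                calls[i].fn();
                ran++;
            }
    return ran;
}
```

Registering init_late, init_fs and init_mem in that order still runs them as mem, fs, late. Two initializers in the same class would run in table order here, whereas the kernel leaves that order undefined.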
|
| v2.5.4 |
- Linux now supports preemption inside the kernel. To build
such a kernel, CONFIG_PREEMPT needs to be turned on.
Not all platforms support this option. In particular, IA-64
does not yet support kernel preemption. A consequence of
this change is that the platform-specific spinlock interface now uses a
prefix of "_raw_" (e.g., _raw_spin_lock() instead
of just spin_lock()). This renaming was done so that
the platform-independent part of Linux can implement
spin_lock() etc. differently, depending on whether or
not kernel preemption is enabled.
- Yet another thread data structure has been introduced. The new
structure is called thread_info and is intended to
encapsulate all state needed during kernel entry and exit. Also,
on x86 and some other platforms, the task structure has been
moved out of the memory area containing the thread_info
and the kernel stack. However, on IA-64, the task structure remains
in the old place and the thread_info follows directly
above.
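
The layering of spin_lock() on top of the platform-specific raw lock can be sketched in userspace as follows. This is a model under stated assumptions: demo_raw_spin_lock() stands in for _raw_spin_lock(), the lock is built on a C11 atomic_flag, and the global preempt_count replaces what would be per-task state in a real kernel.

```c
#include <assert.h>
#include <stdatomic.h>

#define CONFIG_PREEMPT 1    /* set to 0 to model a non-preemptible kernel */

typedef struct { atomic_flag locked; } spinlock_t;

static int preempt_count;   /* per-task in a real kernel; global here */

/* Stand-in for the platform-specific _raw_spin_lock(). */
static void demo_raw_spin_lock(spinlock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->locked,
                                             memory_order_acquire))
        ;                   /* spin until the holder releases the lock */
}

/* Stand-in for the platform-specific _raw_spin_unlock(). */
static void demo_raw_spin_unlock(spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}

/* Platform-independent wrapper: disables preemption around the raw lock. */
static void spin_lock(spinlock_t *lock)
{
#if CONFIG_PREEMPT
    preempt_count++;        /* models preempt_disable() */
#endif
    demo_raw_spin_lock(lock);
}

static void spin_unlock(spinlock_t *lock)
{
    demo_raw_spin_unlock(lock);
#if CONFIG_PREEMPT
    assert(preempt_count > 0);
    preempt_count--;        /* models preempt_enable() */
#endif
}
```

With CONFIG_PREEMPT set to 0 the wrappers collapse to the raw operations, which is why the platform code only has to supply the "_raw_" variants.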
|
| v2.5.5 |
- New page-table management hooks have been added which make it
possible to place page tables in high memory. The hooks created
for this purpose are called pmd_populate_kernel(),
pmd_alloc_one_kernel(), pte_free_kernel(),
pte_offset_kernel(), pte_offset_map(),
pte_offset_map_nested(), and pte_unmap_nested().
Another effect of this change is that PTE pages are now referred
to via a page descriptor pointer (struct page *) instead
of a direct kernel identity-mapped address.
- A new routine called flush_icache_user_range() has been
added to the memory coherency kernel interface (page 201). This
routine is used to ensure that i- and d-caches are coherent for
a portion of a user-level page.
- The third argument to the switch_to() routine has been dropped.
This has become possible because the new scheduler never needs
to refer back to the previously executing task.
|
| v2.5.8 |
- The cache flushing routines are now declared in
and the TLB flushing routines in . This change was
made to resolve circular include dependencies.
- A new routine called flush_tlb_kernel_range() has been
added to the memory coherency kernel interface (page 201).
|