Xv6-RISC-V Reading Notes

Posted 2024-04-24 | Updated 2026-04-03 | Category: Xv6 | Word count: 6.4k | Reading time ≈ 23 min

Original reference

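Chapter 1 System Interfaces

Unix utilities lab

Lab description

Booting the system

sleep

Key point: implement it with the sleep interface declared in user/user.h; the argument is in ticks (roughly 1/10 second each).

pingpong

Key point: communicate through the pipe interface.

primes

Goal: the main process writes the numbers 2 through 35 into a pipe.

Each process reads a number n from its pipe (the first number read is prime), creates a child process, and writes the remaining numbers that are not multiples of n into the child's pipe. It then waits for the child to exit. The child repeats the parent's steps; once the input, which ends at 35, is used up, no further child is created.

find

Key point: get familiar with reading file attributes and joining paths.

xargs

Key point: implement it with the exec interface, which requires building a new argument array.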

Chapter 2 System Organization

RISC-V has three modes in which the CPU can execute instructions: machine mode, supervisor mode, and user mode.

An application can execute only user-mode instructions and is said to be running in user space, while the software in supervisor mode can also execute privileged instructions and is said to be running in kernel space.

CPUs provide a special instruction (RISC-V provides the ecall instruction) that switches the CPU from user mode to supervisor mode and enters the kernel at an entry point specified by the kernel.

In one design, the entire operating system resides in the kernel, so that the implementations of all system calls run in supervisor mode. This organization is called a monolithic kernel.

OS designers can minimize the amount of operating system code that runs in supervisor mode, and execute the bulk of the operating system in user mode. This kernel organization is called a microkernel.

The kernel source files and what they do:

File	Description
bio.c	Disk block cache for the file system.
console.c	Connect to the user keyboard and screen.
entry.S	Very first boot instructions.
exec.c	exec() system call.
file.c	File descriptor support.
fs.c	File system.
kalloc.c	Physical page allocator.
kernelvec.S	Handle traps from kernel, and timer interrupts.
log.c	File system logging and crash recovery.
main.c	Control initialization of other modules during boot.
pipe.c	Pipes.
plic.c	RISC-V interrupt controller.
printf.c	Formatted output to the console.
proc.c	Processes and scheduling.
sleeplock.c	Locks that yield the CPU.
spinlock.c	Locks that don’t yield the CPU.
start.c	Early machine-mode boot code.
string.c	C string and byte-array library.
swtch.S	Thread switching.
syscall.c	Dispatch system calls to handling function.
sysfile.c	File-related system calls.
sysproc.c	Process-related system calls.
trampoline.S	Assembly code to switch between user and kernel.
trap.c	C code to handle and return from traps and interrupts.
uart.c	Serial-port console device driver.
virtio_disk.c	Disk device driver.
vm.c	Manage page tables and address spaces.

Process virtual address space layout

Address Region

MAXVA   <--------->
        trampoline
        <--------->
        trapframe
        <--------->
        heap
        <--------->
        user stack
        <--------->
        user text
        and data
        (instructions come first,
        followed by global variables)
0       <--------->

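RISC-V pointers are 64 bits wide, but the hardware uses only the low 39 bits when looking up a virtual address in the page table, and xv6 uses only 38 of those. The highest usable address is therefore 2^38 - 1 = 0x3f_ffff_ffff, one below MAXVA (kernel/riscv.h:363).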
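
The arithmetic can be checked with a small host-side sketch. The expression follows the Sv39 layout described above (three 9-bit page-table indexes plus a 12-bit page offset, minus the one bit xv6 gives up). It uses an unsigned 64-bit constant for portability, and the helper name max_user_va is made up for this note.

```c
#include <stdint.h>

// Sv39 uses 9 + 9 + 9 index bits plus a 12-bit page offset = 39 bits.
// xv6 leaves the top bit unused, so MAXVA is 2^38: one past the last
// valid virtual address.
#define MAXVA (1ULL << (9 + 9 + 9 + 12 - 1))

// Hypothetical helper: the highest usable virtual address.
static uint64_t max_user_va(void)
{
    return MAXVA - 1;
}
```

Dividing MAXVA by 2^30 gives 256, which is where the 256 GB figure for the user address space comes from.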

Important per-process state is kept in struct proc (kernel/proc.h:85).

RISC-V ecall instruction raises the hardware privilege level and changes the program counter to a kernel-defined entry point.

When the system call completes, the kernel switches back to the user stack and returns to user space by calling the sret instruction, which lowers the hardware privilege level and resumes executing user instructions just after the system call instruction.

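Boot sequence:

The boot loader loads the kernel into memory at 0x80000000.
In machine mode, the CPU jumps to _entry (kernel/entry.S:7), which sets up a stack and jumps to C code.
The C entry function start (kernel/start.c:21) arranges the switch to supervisor mode, configures the timer interrupt, and jumps to main.
main (kernel/main.c:11) initializes devices and subsystems and creates the first process.
userinit (kernel/proc.c:233) sets up that first process; its code loads SYS_exec (kernel/syscall.h:8) into register a7 and re-enters the kernel.
The kernel's system call handler syscall (kernel/syscall.c:132) then starts /init.
When the system call completes, control returns to init (user/init.c), which creates a new console device file and opens file descriptors 0, 1, and 2.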

To prepare for that switch, start sets the previous privilege mode to supervisor in the register mstatus, sets the return address to main by writing main’s address into the register mepc, disables virtual address translation in supervisor mode by writing 0 into the page-table register satp, and delegates all interrupts and exceptions to supervisor mode.

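System call lab

System call tracing

Add a trace system call that takes one integer argument, a mask saying which system calls to trace. When a traced system call returns, print <pid>: syscall <call name> -> <return value>. The mask applies to the calling process and to any children it later forks, but not to other processes.

Key points

In user/trace.c, after calling trace(x), call trace(0) to clear the process's mask;
In user/user.h, add the user-facing prototype int trace(int sys_mask);, and add a trace entry to user/usys.pl so the assembly stub is generated;
In kernel/syscall.h, add the number #define SYS_trace 22, and in kernel/sysproc.c add the implementation uint64 sys_trace(void);
Add the printing logic to syscall. Note that allocproc hands out slots from the shared struct proc proc[NPROC]; array, so the mask left by a previous owner must be cleared on allocation;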
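
The check that syscall needs can be sketched on the host as follows. This is not the lab solution itself: it assumes the mask has bit n set when system call number n should be reported (as in trace(1 << SYS_fork)), and the helper names should_trace and format_trace are invented for illustration.

```c
#include <stddef.h>
#include <stdio.h>

// Assumption: bit n of the mask is set when syscall number n is traced.
static int should_trace(int mask, int syscall_num)
{
    return (mask >> syscall_num) & 1;
}

// Build one trace line in the lab's format:
// "<pid>: syscall <call name> -> <return value>".
static int format_trace(char *buf, size_t size, int pid,
                        const char *name, long ret)
{
    return snprintf(buf, size, "%d: syscall %s -> %ld", pid, name, ret);
}
```

In the kernel, the same test would run right after the handler returns, using the mask stored in the current struct proc.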

Sysinfo

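Add a sysinfo system call that reports the number of bytes of free memory (freemem) and the number of processes in use (nproc). A user test program, sysinfotest, calls the interface and prints "sysinfotest: OK" if everything works.

Key points

Use copyout to copy the data from kernel space to user space.

Exercises

Add a system call that returns the amount of free memory available.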

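Chapter 3 Page Tables

Creating an address space

The central data structure is pagetable_t kernel_pagetable (kernel/vm.c:1), and the central function is walk, which finds the PTE for a virtual address.

main calls kvminit (kernel/vm.c:54) to create the kernel page table; kvminit calls kvmmake, which maps virtual addresses to physical addresses through kvmmap, mappages, and walk.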
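
To make the lookup done by walk more concrete, here is a host-side sketch of how a Sv39 virtual address splits into three 9-bit page-table indexes and a 12-bit offset. The constants follow the three-level layout; the function names px and page_offset are chosen for this note rather than copied from the kernel.

```c
#include <stdint.h>

#define PGSHIFT 12   // bits of offset within a page
#define PXMASK 0x1FF // 9 bits of index per page-table level

// Index into the page table at `level` (2 = root, 0 = leaf) for address va.
static uint64_t px(int level, uint64_t va)
{
    return (va >> (PGSHIFT + 9 * level)) & PXMASK;
}

// Byte offset of va within its 4096-byte page.
static uint64_t page_offset(uint64_t va)
{
    return va & ((1ULL << PGSHIFT) - 1);
}
```

A walk starts at the root table with px(2, va), follows the PTE it finds to the next table, and repeats for levels 1 and 0.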

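main calls kvminithart (kernel/vm.c:62) to install the kernel page table. It writes the address of the root page table into the satp register and flushes the TLB before and after doing so.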

The RISC-V has an instruction sfence.vma that flushes the current CPU’s TLB. Xv6 executes sfence.vma in kvminithart after reloading the satp register

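Physical memory allocation

The allocator is defined in kalloc.c (kernel/kalloc.c:1). Each element of the free-page list is a struct run.

main calls kinit (kernel/kalloc.c:27) to initialize the allocator. kinit fills the free list kmem.freelist (kernel/kalloc.c:21) with every page between the end of the kernel and PHYSTOP.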
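
The free-list idea can be sketched outside the kernel. In the sketch below a small static arena stands in for physical memory, and the names arena, NPAGES, page_alloc, page_free, and arena_init are made up; locking is left out, whereas the real allocator protects its list with a spinlock.

```c
#include <stddef.h>

#define PGSIZE 4096
#define NPAGES 4 // assumption: a tiny arena standing in for physical memory

struct run {
    struct run *next;
};

// Each page doubles as a list node while it is free.
static union page {
    struct run run;
    unsigned char bytes[PGSIZE];
} arena[NPAGES];

static struct run *freelist;

// Push one page onto the free list; the node lives inside the page itself.
static void page_free(void *pa)
{
    struct run *r = (struct run *)pa;
    r->next = freelist;
    freelist = r;
}

// Pop one page from the free list, or return NULL when none are left.
static void *page_alloc(void)
{
    struct run *r = freelist;
    if (r)
        freelist = r->next;
    return r;
}

// Hand every page of the arena to the allocator, the way kinit does for
// the range between the end of the kernel and PHYSTOP.
static void arena_init(void)
{
    freelist = NULL;
    for (size_t i = 0; i < NPAGES; i++)
        page_free(&arena[i]);
}
```

Because the list node is stored in the free page itself, the allocator needs no extra memory for bookkeeping.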

Process address space

Each process has its own page table.

MAXVA   <--------->
        trampoline  RX--
        <--------->
        trapframe   R-W-
        <--------->
        unused
        <--------->
        heap        R-WU
        <--------->
        stack       R-WU
        <--------->
        guard page
        <--------->
        data        R-WU
        <--------->
        unused
        <--------->
        text        RX-U
0       <--------->

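A process's user memory starts at virtual address zero and can grow up to MAXVA (kernel/riscv.h:360), which allows up to 256 GB of memory.

The text at address zero is mapped without write permission, so a buggy program that tries to write to address zero triggers a page fault.

sbrk

sbrk is the system call a process uses to grow or shrink its memory. It is implemented by growproc (kernel/proc.c:260).

exec

exec is a system call that replaces a process's user address space with data read from a file, called a binary or executable file. exec (kernel/exec.c:23) reads and parses a file in ELF format, which consists of an ELF header, struct elfhdr, followed by a sequence of program section headers, struct proghdr. Each program section header describes a part of the program that must be loaded into memory.

The program section headers for /init look like this: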

$ objdump -p user/_init 

user/_init:     file format elf64-little

Program Header:
0x70000003 off    0x0000000000006bac vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**0
         filesz 0x0000000000000033 memsz 0x0000000000000000 flags r--
    LOAD off    0x0000000000001000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**12
         filesz 0x0000000000001000 memsz 0x0000000000001000 flags r-x
    LOAD off    0x0000000000002000 vaddr 0x0000000000001000 paddr 0x0000000000001000 align 2**12
         filesz 0x0000000000000010 memsz 0x0000000000000030 flags rw-
   STACK off    0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**4
         filesz 0x0000000000000000 memsz 0x0000000000000000 flags rw-

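Note that filesz may be smaller than memsz. The difference covers variables whose initial value is 0: they need not be stored in the file, but the loader must allocate memsz bytes and zero the part beyond filesz.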
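
A minimal sketch of that loading rule, written as ordinary host C. The helper load_segment is hypothetical; it only illustrates copying filesz bytes and zero-filling up to memsz, and it rejects headers where memsz is smaller than filesz.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: place one loadable segment into `mem`.
// Copies filesz bytes from the file image and zero-fills up to memsz.
// Returns 0 on success, -1 if the header is inconsistent.
static int load_segment(unsigned char *mem, const unsigned char *file,
                        size_t filesz, size_t memsz)
{
    if (memsz < filesz)
        return -1; // a segment cannot need less memory than it stores
    memcpy(mem, file, filesz);
    memset(mem + filesz, 0, memsz - filesz);
    return 0;
}
```

For the rw- segment in the listing above, filesz is 0x10 and memsz is 0x30, so the last 0x20 bytes would be zero-filled.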

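Next, exec copies the argument list onto the new stack and sets up the stack pointer and the PC. Finally, it frees the old page table and switches to the new one.

Exercises

Parse the RISC-V device tree to find out how much physical memory the machine has.
Write a user program that calls sbrk(1) and compare the page table before and after the call. How much space did the kernel allocate? What does the PTE for the new memory contain?
Modify the source so that the kernel uses super pages.
On Unix systems, if the executable handled by exec starts with #!, the rest of the first line names an interpreter that is run in place of the file. Modify the source so that the kernel supports this feature.
Implement address space layout randomization for the kernel.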

Chapter 4 Traps and System Calls

The RISC-V trap mechanism

Each RISC-V CPU has a set of control registers that the kernel writes to tell the CPU how to handle traps, and that the kernel can read to find out about a trap that has occurred.

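riscv.h (kernel/riscv.h:1) contains the definitions xv6 uses. The most important registers are:

stvec: the kernel writes the address of its trap handler here.
sepc: when a trap occurs, the processor saves the pc here; on return from the trap, sret restores the pc from this register.
scause: the processor stores a number here that describes the reason for the trap.
sscratch: the trap handler uses sscratch to avoid overwriting user registers before it has saved them.
sstatus: the SIE bit controls whether device interrupts are enabled; the SPP bit records whether the trap came from user mode or supervisor mode.

Machine mode has an equivalent set of registers; xv6 uses them only for the special case of timer interrupts.

What the hardware does for every trap type (other than timer interrupts):

If the trap is a device interrupt and the SIE bit of sstatus is clear, skip all the following steps.
Clear the SIE bit of sstatus to disable interrupts.
Copy the pc to sepc.
Save the current mode (user or supervisor) in the SPP bit of sstatus.
Set scause to reflect the trap's cause.
Switch to supervisor mode.
Copy stvec to the pc.
Resume execution at the new pc.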
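
The steps can be written down as a toy model in C. This is only a sketch for reasoning about the sequence: the struct hart and the function take_trap are invented here, and real hardware keeps SIE and SPP as bits inside sstatus rather than as separate fields.

```c
#include <stdint.h>

// A toy model of the supervisor-mode state involved in taking a trap.
struct hart {
    uint64_t pc, sepc, scause, stvec;
    int sie;        // models sstatus.SIE: are interrupts enabled?
    int spp;        // models sstatus.SPP: 0 = from user, 1 = from supervisor
    int supervisor; // current mode: 0 = user, 1 = supervisor
};

// Returns 1 if the trap was taken, 0 if it was a masked device interrupt.
static int take_trap(struct hart *h, uint64_t cause, int is_device_intr)
{
    if (is_device_intr && !h->sie)
        return 0;           // nothing happens while interrupts are disabled
    h->sie = 0;             // disable further interrupts
    h->sepc = h->pc;        // remember where to resume
    h->spp = h->supervisor; // remember the previous mode
    h->scause = cause;      // record why the trap happened
    h->supervisor = 1;      // enter supervisor mode
    h->pc = h->stvec;       // continue at the kernel's handler
    return 1;
}
```

Note what the model does not touch: no page-table switch, no stack switch, and no general-purpose registers saved. That work is left to kernel software.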

Note that the CPU doesn’t switch to the kernel page table, doesn’t switch to a stack in the kernel, and doesn’t save any registers other than the pc.

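Traps from user space

The path of a trap from user space is: uservec (kernel/trampoline.S:21) -> usertrap (kernel/trap.c:37) -> usertrapret (kernel/trap.c:90) -> userret (kernel/trampoline.S:101).

xv6 points stvec at the trampoline page, which contains uservec. Every process maps the trampoline page at the address TRAMPOLINE.

uservec saves the 32 user registers into the trapframe structure at address TRAPFRAME, switches satp to the kernel page table, and then calls usertrap.

usertrap (kernel/trap.c:37) determines the cause of the trap and handles it. It first sets stvec to kernelvec so that traps taken in the kernel are handled correctly, then saves sepc. If the trap is a system call, syscall handles it; if it is a device interrupt, devintr handles it; otherwise it is an exception, and the kernel kills the faulting process. On the way out, usertrap calls usertrapret.

The first step in returning to user space is the call to usertrapret (kernel/trap.c:90). It sets stvec back to uservec, whose mapped address can be computed from TRAMPOLINE, trampoline, and uservec (note that inside the kernel, trampoline and uservec are physical addresses). It then restores the saved user pc into sepc, and finally calls userret with a0 set to the user page table.

userret (kernel/trampoline.S:101) switches satp to the user page table and restores the 32 user registers. Finally it executes sret to return to user space.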

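System call flow

Take exec, the first system call in initcode.S, as an example.

initcode.S places the arguments for exec in registers a0 and a1 and the system call number in a7. The ecall instruction traps into the kernel and causes uservec, usertrap, and syscall to run; syscall uses the number to index the syscalls array (kernel/syscall.c:107) and find the handler.

System call arguments

The kernel trap code saves the user registers in the current process's trap frame. The kernel functions argint, argaddr, and argfd return the n-th argument; they are built on argraw (kernel/syscall.c:34).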
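
The dispatch-by-number pattern, with arguments and return value carried in the saved registers, might be sketched like this. It is a simplified host-side model: the trapframe holds only three registers, and sys_add and sys_first are stand-in handlers invented for the example, not xv6 system calls.

```c
#include <stddef.h>
#include <stdint.h>

// Toy trapframe: only the registers this sketch needs.
struct trapframe {
    uint64_t a0, a1, a7;
};

typedef uint64_t (*syscall_fn)(struct trapframe *);

// Two stand-in handlers (hypothetical). Each reads its arguments from the
// saved registers, the way argint and argaddr do through argraw.
static uint64_t sys_add(struct trapframe *tf) { return tf->a0 + tf->a1; }
static uint64_t sys_first(struct trapframe *tf) { return tf->a0; }

// Table indexed by system call number; slot 0 is unused.
static syscall_fn syscalls[] = {
    [1] = sys_add,
    [2] = sys_first,
};

// Look up the handler named by a7 and leave the result in a0, which is
// where user code finds the return value after the trap returns.
static void dispatch(struct trapframe *tf)
{
    uint64_t num = tf->a7;
    size_t n = sizeof(syscalls) / sizeof(syscalls[0]);
    if (num > 0 && num < n && syscalls[num])
        tf->a0 = syscalls[num](tf);
    else
        tf->a0 = (uint64_t)-1; // unknown system call
}
```

Overwriting a0 in the trapframe works because userret later restores the user registers from that same structure.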

Some arguments are passed as user addresses; fetchstr (kernel/syscall.c:25) copies a string supplied by the user into the kernel.

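Traps from kernel space

The path of a trap from kernel space is: kernelvec (kernel/kernelvec.S:12) -> kerneltrap (kernel/trap.c:135) -> back to kernelvec (kernel/kernelvec.S:12).

If the trap is not a device interrupt, it is an exception, and the xv6 kernel panics.

The processor always disables interrupts when it starts to take a trap, and xv6 does not enable them again until after it has set stvec.

Page faults

If a page fault happens in user space, the kernel kills the faulting process. If it happens in the kernel, the kernel panics.

Many kernels use page faults to implement copy-on-write (COW). The basic plan for COW fork is for the parent and child to initially share all physical pages, but to map them all read-only. When a process writes to such a page, a store page fault occurs. The kernel then allocates a new page, copies the data into it, and remaps the faulting address to the copy. An important optimization: if the faulting page is referenced only by the faulting process, no copy is needed.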
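
As a rough illustration of the COW bookkeeping, here is a host-side sketch with a per-page reference count. Everything in it (struct page, struct mapping, cow_share, cow_fault, and the use of malloc for new pages) is an assumption made for the example; xv6 itself does not implement COW.

```c
#include <stdlib.h>
#include <string.h>

#define PGSIZE 4096

struct page {
    int refcount; // how many mappings point at this page
    unsigned char data[PGSIZE];
};

struct mapping {
    struct page *pg;
    int writable;
};

// fork: share the parent's page read-only instead of copying it.
static void cow_share(struct mapping *parent, struct mapping *child)
{
    parent->writable = 0;
    child->pg = parent->pg;
    child->writable = 0;
    parent->pg->refcount++;
}

// Store page fault on a COW mapping. Returns 0 on success, -1 if no
// memory is available for the copy.
static int cow_fault(struct mapping *m)
{
    if (m->pg->refcount == 1) {
        m->writable = 1; // sole owner: no copy needed
        return 0;
    }
    struct page *copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return -1;
    memcpy(copy->data, m->pg->data, PGSIZE);
    copy->refcount = 1;
    m->pg->refcount--; // this mapping no longer references the shared page
    m->pg = copy;
    m->writable = 1;
    return 0;
}
```

The refcount == 1 branch is the optimization mentioned above: the last remaining owner simply regains write permission.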

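Another widely used feature is lazy allocation. When an application calls sbrk to ask for more memory, the kernel only records the new size; it does not allocate physical memory or create PTEs. The pages are allocated and mapped only when the new addresses are first accessed.

Yet another widely used feature is demand paging. When a large program starts, the kernel does not load all of its data into memory; it only sets up the user address space and marks the PTEs invalid. On a page fault, the kernel reads the page's contents from disk and maps it into user space.

To cope with programs that need more memory than the hardware has RAM, the operating system can implement paging to disk.

Exercises

Set up the kernel page table so that the kernel can use user-space addresses directly;
Implement lazy memory allocation;
Implement COW fork;
Is there a way to eliminate the TRAPFRAME page mapping in every user address space? For example, could uservec be changed to push the 32 user registers onto the kernel stack, or to store them in the proc structure?
Is there a way to eliminate the TRAMPOLINE page mapping?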

Interrupts and device drivers

In xv6, the kernel trap code recognizes device interrupts and hands them to devintr (kernel/trap.c:178).

Many device drivers execute code in two contexts: a top half that runs in a process’s kernel thread, and a bottom half that executes at interrupt time.

Console input

The console driver (kernel/console.c) is a simple illustration of driver structure.

The UART hardware that the driver talks to is a 16550 chip [13] emulated by QEMU. On a real computer, a 16550 would manage an RS232 serial link connecting to a terminal or other computer. When running QEMU, it’s connected to your keyboard and display.

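The UART's base address is 0x10000000 (UART0, kernel/memlayout.h:21); the UART0 registers are defined in kernel/uart.c:22.

xv6's main calls consoleinit (kernel/console.c:182), which in turn calls uartinit (kernel/uart.c:53) to initialize the UART hardware.

The shell reads xv6 commands through the file descriptor opened in init.c (user/init.c:19). A read system call on it reaches consoleread (kernel/console.c:80) through the kernel. consoleread waits while interrupts deliver input into cons.buf; once a whole line has arrived, it copies the buffered data to the user and returns to user space.

When the user types a character, the UART hardware raises an interrupt to the processor, which activates xv6's trap handler. The handler calls devintr (kernel/trap.c:178), which asks the PLIC hardware unit which device interrupted; if it was the UART, devintr calls uartintr.

uartintr (kernel/uart.c:176) reads any waiting input characters from the UART hardware and hands them to consoleintr (kernel/console.c:136). consoleintr accumulates input in cons.buf and, as soon as a whole line has arrived, wakes up consoleread.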
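
The line-buffering handshake between the interrupt side and the read side can be sketched with a circular buffer and three indexes. This is a simplified host-side model: the field names r, w, and e and the functions cons_putc and cons_read are assumptions for illustration, there is no locking, and where the real consoleread would sleep this version just returns 0.

```c
#define INPUT_BUF_SIZE 128

struct cons {
    char buf[INPUT_BUF_SIZE];
    unsigned r; // read index: next byte the reader will take
    unsigned w; // write index: end of the last complete line
    unsigned e; // edit index: end of what has been typed so far
};

// Interrupt side: store one character. Returns 1 when a complete line has
// become readable (the point at which the reader would be woken up).
static int cons_putc(struct cons *c, char ch)
{
    if (c->e - c->r >= INPUT_BUF_SIZE)
        return 0; // buffer full: drop the character
    c->buf[c->e++ % INPUT_BUF_SIZE] = ch;
    if (ch == '\n' || c->e - c->r == INPUT_BUF_SIZE) {
        c->w = c->e; // commit the line
        return 1;
    }
    return 0;
}

// Read side: copy committed bytes, stopping after a newline.
// Returns the number of bytes copied (0 if no complete line is available).
static int cons_read(struct cons *c, char *dst, int n)
{
    int got = 0;
    while (got < n && c->r != c->w) {
        char ch = c->buf[c->r++ % INPUT_BUF_SIZE];
        dst[got++] = ch;
        if (ch == '\n')
            break;
    }
    return got;
}
```

Keeping a separate edit index is what lets the reader ignore a partially typed line until it is finished.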

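Console output

A write system call on a file descriptor connected to the console eventually reaches uartputc (kernel/uart.c:87). For each character, uartstart is called to start the device transmitting.

Concurrency in drivers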

These calls acquire a lock, which protects the console driver’s data structures from concurrent access.

Timer interrupts

Xv6 uses timer interrupts to maintain its clock and to enable it to switch among compute-bound processes; the yield calls in usertrap and kerneltrap cause this switching.

RISC-V requires that timer interrupts be taken in machine mode, not supervisor mode.

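The code in start.c sets things up to receive timer interrupts (kernel/start.c:63). Part of the job is to program the CLINT hardware (core-local interruptor) to generate an interrupt after a certain delay. Finally, start sets mtvec to timervec and enables timer interrupts.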

A timer interrupt can occur at any point when user or kernel code is executing; there’s no way for the kernel to disable timer interrupts during critical operations.

The basic strategy is for the handler to ask the RISC-V to raise a “software interrupt” and immediately return.

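The machine-mode interrupt handler is timervec (kernel/kernelvec.S:95). It mainly reprograms the CLINT's MTIMECMP register for the next interrupt, writes 2 to sip to raise a supervisor software interrupt, and returns immediately.

Exercises

Modify uart.c to not use interrupts at all; console.c may need changes as well.
Add a driver for an Ethernet card.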

Chapter 6 Locking

Xv6 uses a number of concurrency control techniques, depending on the situation; many more are possible. This chapter focuses on a widely used technique: the lock.

Races

Locks

On the RISC-V this instruction is amoswap r, a. amoswap reads the value at the memory address a, writes the contents of register r to that address, and puts the value it read into r.

It performs this sequence atomically, using special hardware to prevent any other CPU from using the memory address between the read and the write.

Xv6’s acquire (kernel/spinlock.c:22) uses the portable C library call __sync_lock_test_and_set, which boils down to the amoswap instruction; the return value is the old (swapped) contents of lk->locked.

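acquire (kernel/spinlock.c:22) compiles down to the RISC-V instruction amoswap.w.aq a0, a0, (s1). Once the lock has been acquired, lk->locked is 1.

release (kernel/spinlock.c:47) compiles down to the RISC-V instruction amoswap.w zero, zero, (s1). Once the lock has been released, lk->locked is 0.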
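
A user-space sketch of the same test-and-set pattern, using the GCC builtins mentioned above. It leaves out what the kernel version also does (disabling interrupts and checking whether the current CPU already holds the lock), and spin_try is a hypothetical non-blocking variant added only to make the behaviour easy to observe.

```c
// A minimal test-and-set spinlock built on GCC's __sync builtins.
struct spinlock {
    volatile unsigned locked; // 0 = free, 1 = held
};

static void spin_acquire(struct spinlock *lk)
{
    // Atomically write 1 and fetch the old value; keep looping while the
    // old value shows that someone else held the lock.
    while (__sync_lock_test_and_set(&lk->locked, 1) != 0)
        ;
    __sync_synchronize(); // keep critical-section accesses after the acquire
}

static void spin_release(struct spinlock *lk)
{
    __sync_synchronize();             // keep critical-section accesses before the release
    __sync_lock_release(&lk->locked); // atomically store 0
}

// Hypothetical non-blocking variant: returns 1 if the lock was taken.
static int spin_try(struct spinlock *lk)
{
    if (__sync_lock_test_and_set(&lk->locked, 1) != 0)
        return 0;
    __sync_synchronize();
    return 1;
}
```

The two barrier calls mark the edges of the critical section, so neither the compiler nor the CPU may move protected loads and stores outside it.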

Using locks

A hard part about using locks is deciding how many locks to use and which data and invariants each lock should protect.

Re-entrant locks

It might appear that some deadlocks and lock-ordering challenges could be avoided by using re-entrant locks, which are also called recursive locks. The idea is that if the lock is held by a process and if that process attempts to acquire the lock again, then the kernel could just allow this (since the process already has the lock), instead of calling panic, as the xv6 kernel does.

Locks and interrupt handlers

Some xv6 spinlocks protect data that is used by both threads and interrupt handlers.

To avoid this situation, if a spinlock is used by an interrupt handler, a CPU must never hold that lock with interrupts enabled.

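For example, the timer interrupt handler clockintr increments ticks (kernel/trap.c:164), while the kernel thread in sys_sleep (kernel/sysproc.c:59) reads ticks. The lock tickslock serializes the two accesses.

Instruction and memory ordering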

It is natural to think of programs executing in the order in which source code statements appear. That’s a reasonable mental model for single-threaded code, but is incorrect when multiple threads interact through shared memory.

To tell the hardware and compiler not to re-order, xv6 uses __sync_synchronize() in both acquire (kernel/spinlock.c:22) and release (kernel/spinlock.c:47). __sync_synchronize() is a memory barrier: it tells the compiler and CPU to not reorder loads or stores across the barrier.

Sleep locks

Xv6 provides such locks in the form of sleep-locks. acquiresleep (kernel/sleeplock.c:22) yields the CPU while waiting

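Exercises

Comment out the calls to acquire and release in kalloc (kernel/kalloc.c:69). What problems appear? If you see none, why not?
Comment out the locking in kfree instead (after restoring the locks in kalloc). What problems appear?
Modify kalloc.c so that memory allocation is more parallel and CPUs do not have to wait for each other.
Write code using POSIX threads. For example, implement a parallel hash table and measure whether the number of puts/gets scales with the number of cores.
Implement a subset of pthreads in xv6: a user-level thread library that lets one user process have multiple threads and arranges for those threads to run in parallel on different CPUs. Come up with a design that correctly handles a thread making a blocking system call and a thread changing its shared address space.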

Scheduling

Multiplexing

Xv6 multiplexes by switching each CPU from one process to another in two situations. First, xv6’s sleep and wakeup mechanism switches when a process waits for device or pipe I/O to complete, or waits for a child to exit, or waits in the sleep system call. Second, xv6 periodically forces a switch to cope with processes that compute for long periods without sleeping.

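Context switching

swtch just saves and restores sets of 32 RISC-V registers, called contexts.

When a process wants to give up the CPU, its kernel thread calls swtch to save its own context and return to the scheduler's context. Each context is held in a struct context (kernel/proc.h:2), which is itself contained in the process's struct proc or the CPU's struct cpu.

swtch (kernel/swtch.S:3) saves only the callee-saved registers; the C compiler generates code in the caller to save the caller-saved registers on the stack.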

When swtch returns, it returns to the instructions pointed to by the restored ra register, that is, the instruction from which the new thread previously called swtch.

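Scheduling

After the call to swtch, execution continues on the scheduler's stack. The scheduler continues its loop, finds a process that can run, and calls swtch again to switch to it.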

We just saw that xv6 holds p->lock across calls to swtch: the caller of swtch must already hold the lock, and control of the lock passes to the switched-to code.

mycpu and myproc

Xv6 maintains a struct cpu for each CPU (kernel/proc.h:22), which records the process currently running on that CPU (if any), saved registers for the CPU’s scheduler thread, and the count of nested spinlocks needed to manage interrupt disabling.

RISC-V numbers its CPUs, giving each a hartid. Xv6 ensures that each CPU’s hartid is stored in that CPU’s tp register while in the kernel. This allows mycpu to use tp to index an array of cpu structures to find the right one.

It would be more convenient if xv6 could ask the RISC-V hardware for the current hartid whenever needed, but RISC-V allows that only in machine mode, not in supervisor mode.

The return value of myproc is safe to use even if interrupts are enabled: if a timer interrupt moves the calling process to a different CPU, its struct proc pointer will stay the same.

Sleep and wakeup

Sleep and wakeup are often called sequence coordination or conditional synchronization mechanisms.

Pipes

A more complex example that uses sleep and wakeup to synchronize producers and consumers is xv6’s implementation of pipes. Each pipe is represented by a struct pipe, which contains a lock and a data buffer.
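
The data-buffer half of such a pipe can be sketched as a ring buffer with running read and write counts. This is a simplified model under stated assumptions: the field names nread and nwrite, the PIPESIZE of 512, and the helpers pipe_put and pipe_get are chosen for the example, and the lock and the sleep/wakeup calls are left out, so a full or empty pipe returns a short count instead of blocking.

```c
#define PIPESIZE 512

struct pipe {
    char data[PIPESIZE];
    unsigned nread;  // total number of bytes read so far
    unsigned nwrite; // total number of bytes written so far
};

// Producer: copy in as many bytes as fit. Returns the number written.
// (A real writer would sleep here when the buffer is full.)
static int pipe_put(struct pipe *pi, const char *src, int n)
{
    int i = 0;
    while (i < n && pi->nwrite - pi->nread < PIPESIZE)
        pi->data[pi->nwrite++ % PIPESIZE] = src[i++];
    return i;
}

// Consumer: copy out as many bytes as are available. Returns the number read.
// (A real reader would sleep here when the buffer is empty.)
static int pipe_get(struct pipe *pi, char *dst, int n)
{
    int i = 0;
    while (i < n && pi->nread != pi->nwrite)
        dst[i++] = pi->data[pi->nread++ % PIPESIZE];
    return i;
}
```

Because the counts only ever grow, the buffer is full when nwrite - nread equals PIPESIZE and empty when the two counts are equal.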

Let’s suppose that calls to piperead and pipewrite happen simultaneously on two different CPUs.

Process locks

The lock associated with each process (p->lock) is the most complex lock in xv6. A simple way to think about p->lock is that it must be held while reading or writing any of the following struct proc fields: p->state, p->chan, p->killed, p->xstate, and p->pid.

Exercises

File system

Overview

The xv6 file system implementation is organized in seven layers:

<---------------->
File descriptor
<---------------->
Pathname
<---------------->
Directory
<---------------->
Inode
<---------------->
Logging
<---------------->
Buffer cache
<---------------->
Disk
<---------------->

Disk hardware traditionally presents the data on the disk as a numbered sequence of 512-byte blocks (also called sectors): sector 0 is the first 512 bytes, sector 1 is the next, and so on.

The file system does not use block 0 (it holds the boot sector). Block 1 is called the superblock; it contains metadata about the file system (the file system size in blocks, the number of data blocks, the number of inodes, and the number of blocks in the log). Blocks starting at 2 hold the log. After the log are the inodes, with multiple inodes per block. After those come bitmap blocks tracking which data blocks are in use. The remaining blocks are data blocks

xv6 file system layout

0       <---------------->
        boot
1       <---------------->
        super
2       <---------------->
        log

        <---------------->
        inodes


        <---------------->
        bit map
        <---------------->
        data

        ....

        data
        <---------------->

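Buffer cache layer

The code is in bio.c. The buffer cache has two jobs:

Synchronize access to disk blocks, so that only one copy of a block is in memory and only one kernel thread at a time uses that copy.
Cache popular blocks so that they need not be re-read from the slow disk.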

The main interface exported by the buffer cache consists of bread and bwrite. A kernel thread must release a buffer by calling brelse when it is done with it.

bread (kernel/bio.c:93) calls bget to get a buffer for the given sector (kernel/bio.c:97).

When the caller is done with a buffer, it must call brelse to release it.

Logging layer

One of the most interesting problems in file system design is crash recovery.

Xv6 solves the problem of crashes during file-system operations with a simple form of logging.

Once the system call has logged all of its writes, it writes a special commit record to the disk indicating that the log contains a complete operation. At that point the system call copies the writes to the on-disk file system data structures. After those writes have completed, the system call erases the log on disk.

Block allocator

File and directory content is stored in disk blocks, which must be allocated from a free pool. Xv6’s block allocator maintains a free bitmap on disk, with one bit per block.
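
A one-bit-per-block allocator might look like the sketch below. It keeps the bitmap in an ordinary array rather than in disk blocks, and the names NBLOCKS, block_alloc, and block_free are invented for the example.

```c
#include <stdint.h>

#define NBLOCKS 64 // assumption: a tiny disk, for illustration only

static uint8_t bitmap[NBLOCKS / 8]; // bit b set = block b is in use

// Find a free block, mark it used, and return its number.
// Returns -1 when every block is taken.
static int block_alloc(void)
{
    for (int b = 0; b < NBLOCKS; b++) {
        uint8_t m = (uint8_t)(1u << (b % 8));
        if ((bitmap[b / 8] & m) == 0) {
            bitmap[b / 8] |= m;
            return b;
        }
    }
    return -1;
}

// Mark block b free again. Returns -1 for a bad block number or a block
// that was not allocated, 0 otherwise.
static int block_free(int b)
{
    if (b < 0 || b >= NBLOCKS)
        return -1;
    uint8_t m = (uint8_t)(1u << (b % 8));
    if ((bitmap[b / 8] & m) == 0)
        return -1; // freeing a free block indicates a bug
    bitmap[b / 8] &= (uint8_t)~m;
    return 0;
}
```

In a real file system the bitmap lives on disk, so each update goes through the buffer cache and the log.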

Inode layer

The term inode can have one of two related meanings. It might refer to the on-disk data structure containing a file’s size and list of data block numbers. Or “inode” might refer to an in-memory inode.

On-disk representation

dinode
|----------------|
type
|----------------|
major
|----------------|
minor
|----------------|
nlink
|----------------|
size
|----------------|
address 1           -->     data0_1
|----------------|
...
|----------------|
address 12          -->     data0_12
|----------------|
indirect            -->     indirect block
|----------------|          |----------------|
                            address 1           -->     data1_1
                            |----------------|
                            ...
                            |----------------|
                            address 256         -->     data1_256
                            |----------------|
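
From the diagram, a file can have 12 direct blocks plus 256 blocks reached through the indirect block. The sketch below works out where a given file block number lands; it assumes 1024-byte blocks and 4-byte block addresses (which gives the 256 entries shown above), and locate_block is a made-up helper.

```c
#include <stdint.h>

#define BSIZE 1024                           // assumed block size in bytes
#define NDIRECT 12                           // direct addresses in the dinode
#define NINDIRECT (BSIZE / sizeof(uint32_t)) // 256 addresses per indirect block
#define MAXFILE (NDIRECT + NINDIRECT)        // largest file, in blocks

// Where does file block bn live? Returns 0 for a direct slot in the
// dinode, 1 for a slot in the indirect block, and -1 if bn lies beyond
// the maximum file size. On success *slot receives the index.
static int locate_block(uint32_t bn, uint32_t *slot)
{
    if (bn < NDIRECT) {
        *slot = bn;
        return 0;
    }
    bn -= NDIRECT;
    if (bn < NINDIRECT) {
        *slot = bn;
        return 1;
    }
    return -1;
}
```

Under these assumptions the largest file is 268 blocks, or 274432 bytes.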

Directory layer

A directory is implemented internally much like a file.

Path names

File descriptor layer

All the open files in the system are kept in a global file table, the ftable.

The functions sys_link and sys_unlink edit directories, creating or removing references to inodes. They are another good example of the power of using transactions.

Exercises

Appendix

CH.1 Unix utilities lab code

sleep

#include "kernel/types.h"
#include "user/user.h"

int main(int argc, char *argv[])
{
    int ticks;

    if (argc <= 1)
    {
        fprintf(2, "sleep: need one arg\n");
        exit(1); // report the usage error to the caller
    }

    // The argument is in seconds; one tick is about 1/10 second.
    ticks = 10 * atoi(argv[1]);
    if (ticks < 0)
    {
        ticks = 0;
    }

    sleep(ticks);
    exit(0);
}

pingpong

#include "kernel/types.h"
#include "user/user.h"

int main(int argc, char *argv[])
{
    int p1[2]; // parent->child
    int p2[2]; // child->parent
    pipe(p1);
    pipe(p2);
    if (fork() == 0)
    {
        char buffer[5];
        close(p1[1]);
        close(p2[0]);
        read(p1[0], buffer, sizeof(buffer));
        printf("%d: received %s\n", getpid(), buffer);
        write(p2[1], "pong", 5);
        close(p1[0]);
        close(p2[1]);
    }
    else
    {
        char buffer[5];
        close(p1[0]);
        close(p2[1]);
        write(p1[1], "ping", 5);
        read(p2[0], buffer, sizeof(buffer));
        printf("%d: received %s\n", getpid(), buffer);
        close(p1[1]);
        close(p2[0]);
    }
    exit(0);
}

primes

#include "kernel/types.h"
#include "user/user.h"

#define MAX_NUMBER 35

int main()
{
    int in[2]; // pipe feeding the current stage
    int n, prime;
    pipe(in);

    // Feed the numbers 2 through 35, then close the write end so that
    // readers see end-of-file once the pipe is drained.
    for (int i = 2; i <= MAX_NUMBER; ++i)
        write(in[1], &i, sizeof(i));
    close(in[1]);

    // The first number a stage reads is always prime.
    while (read(in[0], &prime, sizeof(prime)) == sizeof(prime))
    {
        printf("%d prime %d\n", getpid(), prime);
        int out[2]; // this stage -> next stage
        pipe(out);

        if (fork() == 0)
        {
            // child: becomes the next stage and reads from `out`
            close(in[0]);
            close(out[1]);
            in[0] = out[0];
            continue;
        }

        // parent: forward everything that is not a multiple of prime
        close(out[0]);
        while (read(in[0], &n, sizeof(n)) == sizeof(n))
            if (n % prime != 0)
                write(out[1], &n, sizeof(n));

        // closing the write end tells the child that no more numbers follow
        close(in[0]);
        close(out[1]);
        wait(0);
        exit(0);
    }

    // last stage: the input was empty, so there are no more primes
    close(in[0]);
    exit(0);
}

find

#include "kernel/types.h"
#include "user/user.h"
#include "kernel/fs.h"
#include "kernel/stat.h"

// Print path if its last component is exactly name.
void cmp_file(char *path, char *name)
{
    char *base = path;

    // base ends up pointing just past the last '/'
    for (char *p = path; *p; ++p)
        if (*p == '/')
            base = p + 1;

    if (strcmp(base, name) == 0)
        printf("%s\n", path);
}

void find(char *path, char *patern)
{
    int fd;
    struct dirent de;
    struct stat st;
    char buf[512], *p;

    if ((fd = open(path, 0)) < 0)
    {
        fprintf(2, "find: cannot open %s\n", path);
        return;
    }

    if (fstat(fd, &st) < 0)
    {
        fprintf(2, "find: cannot stat %s\n", path);
        close(fd);
        return;
    }

    switch (st.type)
    {
    case T_DEVICE:
    case T_FILE:
        cmp_file(path, patern);
        break;

    case T_DIR:
        if (strlen(path) + 1 + DIRSIZ + 1 > sizeof(buf))
        {
            printf("find: path too long\n");
            break;
        }

        strcpy(buf, path);
        p = buf + strlen(buf);
        if (*(p - 1) != '/')
            *p++ = '/';
        while (read(fd, &de, sizeof(de)) == sizeof(de))
        {
            if (de.inum == 0 || strcmp(".", de.name) == 0 || strcmp("..", de.name) == 0)
                continue;
            memmove(p, de.name, DIRSIZ);
            p[DIRSIZ] = 0;
            find(buf, patern);
        }
        break;
    }

    close(fd);
}

int main(int argc, char *argv[])
{
    if (argc <= 2)
    {
        fprintf(2, "find: need more args\n");
        exit(1);
    }
    find(argv[1], argv[2]);
    exit(0);
}

xargs

#include "kernel/types.h"
#include "kernel/param.h"
#include "user/user.h"

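int main(int argc, char *argv[])
{
    char buffer[512];
    char *newarg[MAXARG];
    int i;
    int new_idx;

    // Copy the command and its fixed arguments, skipping options such as
    // "-n 1". Two slots stay free for the input line and the terminator.
    for (i = 1, new_idx = 0; i < argc && new_idx < MAXARG - 2; ++i)
        if (argv[i][0] == '-')
            i += 1;
        else
            newarg[new_idx++] = argv[i];

    if (new_idx == 0)
    {
        fprintf(2, "xargs: need a command\n");
        exit(1);
    }

    // Run the command once per input line, with the line as the last argument.
    while (*gets(buffer, sizeof(buffer)) != '\0')
    {
        int len = strlen(buffer);
        if (buffer[len - 1] == '\n')
            buffer[len - 1] = '\0';
        newarg[new_idx] = buffer;
        newarg[new_idx + 1] = 0; // exec expects a null-terminated argv

        if (fork() == 0)
        {
            exec(newarg[0], newarg);
            // exec returns only on failure
            fprintf(2, "xargs: exec %s failed\n", newarg[0]);
            exit(1);
        }
        wait(0);
    }

    exit(0);
}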
