OS
KB
- Scheduler:
- simplicity
- fairness
- responsiveness
- Low waiting time
- Flexible priority
- predicatability
- Write-back cache: only write back if the entry is going to be replaced.
- Architecture
- Monolithic OS: where the entire OS is working in kernel space and is alone in supervisor mode
- Modular OS: some part is implemented in modules that can be loaded at runtime
- Micro OS: kernel is broken down into separate processes
- Memory layout of a process
- Stack <------> Heap, Data, Text
- Process Control Block (PCB)
- PID
- process state: Start, Ready (from Start or from interrupted), Running, Waiting (for I/O etc.), Terminated
- Privileges
- Pointer to parent process
- Program counter
- CPU registers
- CPU scheduling info
- Memory info (page table, memory limits, segment table)
- Accounting info (CPU slices, time limits...)
- IO status info
- What is a file descriptor?
- Unix-y system
- Name for a unit of I/O "resource"
- Abstraction designed by an API standard
- Why is system call expensive?
- Thread vs process
- Threads share code, data, heap, and files, but not registers, and stack
- Kernel level threads
- I/O device
- Block device -- hard disk, USB cameras etc.
- Char devices -- serial ports, keyboards etc.
- DMA (direct memory access)
- Process scheduling
- PCBs are maintained in Process Scheduling Queues, and one queue for each process state.
- PCBs of the same state is placed in the same queue
- Job queue (all), ready queue (device queue -- for waiting ones).
- Context switch -- stores and restores the state of context of a CPU in a PCB -- essential for multi-tasking OS.
- Some hardware systems employ two or more sets of processor registers to avoid the cost
- Hyper-threading: basically cheating to gain performance from the I/O blocking in programs.
- What does
fork
do?
- https://www.geeksforgeeks.org/fork-system-call/
fork()
creates a new process by duplicating the calling process
- What are duplicated?
- Same PC (program counter)
- Same registers
- Same open files
- Returned an integer value
- negative: error
- zero: inside child process
- positive: PID
unistd.h
- virtual memory
- versus physical memory
- Each process has its own virtual memory, limited only by the width of pointer
- Contains both code and data (stack, heap)
malloc
, free
are the main APIs used, also includes calloc
and realloc
calloc
will initialize the allocated memory to 0, while malloc
doesn't
calloc
takes two parameters: num
of blocks and size per block.
realloc
: can be used to change the allocated size given an existing pointer.
- OS (kernel) is responsible for managing the mapping between virtual memory and physical memory
- Only a subset of addresses in the virtual memory is in VM, other used ones might be cached in disk.
- Page algorithm.
- Use less main memory, isolate processes, simplify programming
- Implementation: HW+SW
- Demand paging/segmentation
- If CPU try to use a page not available in the main memory, a memory access fault interrupt (page fault) is generated
- OS will try to find the required page in the logical address space.
- What is "logical address space"?
- Swapping: swapping a process out means removing all its pages from memory or mark them so. It will happen if you suspend a process from execution.
- Thrashing: if a process kept swapping pages in and out
- TLB: Translation Look-aside Buffer
- Each process will have a page table of PTEs (page table entries)
- PTE: frame number, valid/invalid bit, dirty bit, protection bit etc.
- The problem is where to place page table
- Main memory: still a bit slow
- TLB is a special cache for storing some queries -> most recently used PTE
- If TLB miss, the page number is used to index the process's full page table
- MMU (Memory Management Unit) is a special hardware that manages many aspects of paging, and include TLB
- Frames vs Pages
- The Physical Address Space is conceptually divided into a number of fixed-size blocks, called frames.
- The Logical address Space is also splitted into fixed-size blocks, called pages.
- Page Size = Frame Size
- Multi-level/hierarchical paging
- Virtual address can be split in to Level 1 offset + Level 1 offset + Level 3 offset + final offset.
- First, chop up the page table into page-sized units; then, if an entire page of page-table entries (PTEs) is invalid, don’t allocate that page of the page table at all.
- Difference between virtual memory and caches?
- IPC mechanisms
- https://opensource.com/article/19/4/interprocess-communication-linux-storage
- Shared files
- File lock API
flock
, fcntl
- Shared memory
- API:
- System V: legacy
- POSIX:
mmap
-- combines speed with persistence
- Protection: semaphores
- Message passing
- Pipes: file descriptor based
- Pipes provide a unidirectional interprocess communication channel.
- A pipe is created using pipe(2), which returns two file descriptors, one referring to the read end of the pipe, the other referring to the write end.
- https://toroid.org/unix-pipe-implementation
- Allocate buffers, API for reading/writing from/to the buffer, and locking mechanism for access control (atomicity)
- Reading or writing pipe data is atomic if the size of data written is not greater than
PIPE_BUF
.
- The atomicity of a pipe operation becomes important when more than one process is involved (FIFOS). For example, if the number of bytes written to a pipe exceeds the atomic limit for a single operation, and multiple processes are writing to the pipe, the data will be "interleaved" or "chunked". In other words, one process may insert data into the pipeline between the writes of another.
- Sockets:
- Network protocols: have packeting overhead and are bound to even more resources (e.g. ports, loopback interface
- Unix domain socket: avoid some checks and routing. Access is controlled by file permission.
- RPC
- Signals
- Virtualization
- Hypervisor: control the actual hardware.
- Hypervisor + Machine = Host
- Full vs Para virtualization
- Xen & KVM are two hypervisor available in linux.
- Xen kernel needs to be installed and booted into.
- KVM is a module of Linux.
- Type-1 hypervisor: Bare metal hypervisor that runs on hardware (e.g. Hyper-V and ESXI Server)
- Type-2 hypervisor: e.g. VMware
- RAID: Redundant Array of Independent Disks https://en.wikipedia.org/wiki/RAID. RAID levels are basically a recipes of different types of trade-offs between reliability, performance, availability, and capacity. https://www.prepressure.com/library/technology/raid has some great illustrations.
- select、poll、epoll 之间的区别
- epoll 可以理解为 event poll,其实不是轮询!
- What is deadlock?
- What is starvation?
- Task is consistently waiting for resource
- How does
sleep
system call works?
- The process will be suspended by kernel and enter the pending queue
- The time is not guaranteed
- How is
time.sleep
implemented in Python?
- On linux, and most other POSIX platforms, it will generally use select. See the 3.3 source.
- Signals?
- Signals are software interrupts sent to a program to indicate that an important event has occurred
SIGHUP
: hang up
SIGINT
: Ctrl-C
SIGQUIT
: Ctrl-D
SIGFPE
: float point exception
SIGKILL
: must quit now!
SIGALRM
: used for timers
SIGTERM
: sent to kill by default
- Trap is signal handler
- https://stackoverflow.com/questions/529034/python-pass-or-sleep-for-long-running-processes
- Describe the process of Linux booting?
- BIOS: testing hardware, retrieve basic system information from CMOS, find MBR (Master Boot Record) on devices, execute section 0 in MBR
- MBR: Find a primary bootable partition, and execute that partition (second level MBR, e.g. LILO and GRUB)
- GRUB: bootloader code and partition table -- Locate installed OS, pass control to the user-selected OS
- Kernel: initialize devices and load
initrd
(init ramdisk) module; Mounts root FS
- Init: execute
/sbin/init
, start init process, starts swapping, check FS ... runs boot scripts ... starts system at run level
- Run-level: e.g. single user mode, X11 etc.
- How to terminate a process?
- How does timer/timing work?
- Low latency https://softwareengineering.stackexchange.com/questions/183723/low-latency-unix-linux
- kernel bypass: moves some of the TCP/IP stack from the kernel to the network card https://blog.cloudflare.com/kernel-bypass/
- Thread pinning
- Turn off hyper-threading
- What are the cost of abstractions?
- Socket: no delay, fast open, fast ack...
- DMA..
- Super large RAM to avoid I/O as much as possible -> in-memory database
- Lock-free, bitmap...
- Shared memory to avoid overhead of other types of IPC
- Don't share L3
- Turn of some interrupts
High-level
https://medium.com/cracking-the-data-science-interview/how-operating-systems-work-10-concepts-you-should-know-as-a-developer-8d63bb38331f
- Three key elements of an OS
- Abstractions: process, thread, file, socket, memory
- Mechanisms: create, schedule, open, write, allocate
- Policies: LRU, EDF
GRADUATE COURSE: Intro. to Computer Systems Research
- Concurrency
- Mutex: Mesa (Processes and Monitors)
- High-performance Web server
- C10K problem
- Event-driven, async I/O
- Low-level: TCP flow-control (buffer), cong-control (packet loss), slow-start (estimate bandwidth)
- Persistent connections
- Concurrency Control & Recovery in DB
- Motivation for concurrency: Performance (multi-core, I/O overlapping)
- Programming model: Transaction: ACID
- Serializability test
- 2 Phase Locking
- Data structures: Volatile/Non-volatile, buffer pool, paging, WAL
- STEAL/NO-STEAL: flush before committing?
- FORCE/NO-FORCE: flush every time immediately after committing?
- Logging: UNDO/REDO, Physical/Logical
- Recovery: Use log and checkpoint
- Distributed Transactions
- Why distribution? Partition of workload, Reliability by replication
- Coordinator + Participants
- DT Log on stable storage
- 2PC: network failure tolerant
- 3PC: Non-blocking
- Distributed Computation:
- Clocks, Event Ordering, and Global Predicate Computation
- Stable RTT based scheme to sync physical clock
- Lamport clock: can’t capture causality, hard to replay
- Vector clock: better than above — capturing causality
- Consensus protocol: Paxos
- Distributed File System
- Spanner
- Baseline: simplest approach to implement distributed transactions in a sharded setting is to layer 2PC over Paxos