OS

KB

Scheduler:
- simplicity
- fairness
- responsiveness
- Low waiting time
- Flexible priority
- predicatability
Write-back cache: only write back if the entry is going to be replaced.
Architecture
- Monolithic OS: where the entire OS is working in kernel space and is alone in supervisor mode
- Modular OS: some part is implemented in modules that can be loaded at runtime
- Micro OS: kernel is broken down into separate processes
Memory layout of a process
- Stack <------> Heap, Data, Text
Process Control Block (PCB)
- PID
- process state: Start, Ready (from Start or from interrupted), Running, Waiting (for I/O etc.), Terminated
- Privileges
- Pointer to parent process
- Program counter
- CPU registers
- CPU scheduling info
- Memory info (page table, memory limits, segment table)
- Accounting info (CPU slices, time limits...)
- IO status info
What is a file descriptor?
- Unix-y system
- Name for a unit of I/O "resource"
- Abstraction designed by an API standard
Why is system call expensive?
Thread vs process
- Threads share code, data, heap, and files, but not registers, and stack
Kernel level threads
- https://www.quora.com/Are-POSIX-threads-user-level-threads-or-kernel-level-threads-Can-they-be-combination-of-both
- https://stackoverflow.com/questions/29122619/why-do-modern-osses-linux-windows-solaris-implement-a-one-one-thread-model
  - many to one -- blocking
  - one to one -- overhead
  - many to many -- complexity
I/O device
- Block device -- hard disk, USB cameras etc.
- Char devices -- serial ports, keyboards etc.
DMA (direct memory access)
- CPU can grant I/O module the authority to read from or write to memory directly
- Needs special hardware called DMA controller
- https://stackoverflow.com/questions/3851677/what-is-the-difference-between-dma-and-memory-mapped-io
- RDMA -- high-throughput, low-latency networking, esp. useful in computer clusters
Process scheduling
- PCBs are maintained in Process Scheduling Queues, and one queue for each process state.
- PCBs of the same state is placed in the same queue
- Job queue (all), ready queue (device queue -- for waiting ones).
- Context switch -- stores and restores the state of context of a CPU in a PCB -- essential for multi-tasking OS.
  - Some hardware systems employ two or more sets of processor registers to avoid the cost
Hyper-threading: basically cheating to gain performance from the I/O blocking in programs.
What does fork do?
- https://www.geeksforgeeks.org/fork-system-call/
- fork() creates a new process by duplicating the calling process
- What are duplicated?
  - Same PC (program counter)
  - Same registers
  - Same open files
- Returned an integer value
  - negative: error
  - zero: inside child process
  - positive: PID
- unistd.h
virtual memory
- versus physical memory
- Each process has its own virtual memory, limited only by the width of pointer
- Contains both code and data (stack, heap)
- malloc, free are the main APIs used, also includes calloc and realloc
  - calloc will initialize the allocated memory to 0, while malloc doesn't
  - calloc takes two parameters: num of blocks and size per block.
  - realloc: can be used to change the allocated size given an existing pointer.
- OS (kernel) is responsible for managing the mapping between virtual memory and physical memory
  - Only a subset of addresses in the virtual memory is in VM, other used ones might be cached in disk.
  - Page algorithm.
- Use less main memory, isolate processes, simplify programming
- Implementation: HW+SW
  - Demand paging/segmentation
    - If CPU try to use a page not available in the main memory, a memory access fault interrupt (page fault) is generated
    - OS will try to find the required page in the logical address space.
      - What is "logical address space"?
- Swapping: swapping a process out means removing all its pages from memory or mark them so. It will happen if you suspend a process from execution.
- Thrashing: if a process kept swapping pages in and out
- TLB: Translation Look-aside Buffer
  - Each process will have a page table of PTEs (page table entries)
    - PTE: frame number, valid/invalid bit, dirty bit, protection bit etc.
  - The problem is where to place page table
    - Main memory: still a bit slow
    - TLB is a special cache for storing some queries -> most recently used PTE
      - If TLB miss, the page number is used to index the process's full page table
  - MMU (Memory Management Unit) is a special hardware that manages many aspects of paging, and include TLB
- Frames vs Pages
  - The Physical Address Space is conceptually divided into a number of fixed-size blocks, called frames.
  - The Logical address Space is also splitted into fixed-size blocks, called pages.
  - Page Size = Frame Size
- Multi-level/hierarchical paging
  - Virtual address can be split in to Level 1 offset + Level 1 offset + Level 3 offset + final offset.
  - First, chop up the page table into page-sized units; then, if an entire page of page-table entries (PTEs) is invalid, don’t allocate that page of the page table at all.
Difference between virtual memory and caches?
IPC mechanisms
- https://opensource.com/article/19/4/interprocess-communication-linux-storage
- Shared files
  - File lock API flock, fcntl
- Shared memory
  - API:
    - System V: legacy
    - POSIX: mmap -- combines speed with persistence
  - Protection: semaphores
- Message passing
- Pipes: file descriptor based
  - Pipes provide a unidirectional interprocess communication channel.
  - A pipe is created using pipe(2), which returns two file descriptors, one referring to the read end of the pipe, the other referring to the write end.
  - https://toroid.org/unix-pipe-implementation
    - Allocate buffers, API for reading/writing from/to the buffer, and locking mechanism for access control (atomicity)
    - Reading or writing pipe data is atomic if the size of data written is not greater than PIPE_BUF.
  - The atomicity of a pipe operation becomes important when more than one process is involved (FIFOS). For example, if the number of bytes written to a pipe exceeds the atomic limit for a single operation, and multiple processes are writing to the pipe, the data will be "interleaved" or "chunked". In other words, one process may insert data into the pipeline between the writes of another.
- Sockets:
  - Network protocols: have packeting overhead and are bound to even more resources (e.g. ports, loopback interface
  - Unix domain socket: avoid some checks and routing. Access is controlled by file permission.
    - https://medium.com/swlh/getting-started-with-unix-domain-sockets-4472c0db4eb1
- RPC
- Signals
Virtualization
- Hypervisor: control the actual hardware.
- Hypervisor + Machine = Host
  - Many VMs: guests
- Full vs Para virtualization
- Xen & KVM are two hypervisor available in linux.
  - Xen kernel needs to be installed and booted into.
  - KVM is a module of Linux.
  - Type-1 hypervisor: Bare metal hypervisor that runs on hardware (e.g. Hyper-V and ESXI Server)
  - Type-2 hypervisor: e.g. VMware
RAID: Redundant Array of Independent Disks https://en.wikipedia.org/wiki/RAID. RAID levels are basically a recipes of different types of trade-offs between reliability, performance, availability, and capacity. https://www.prepressure.com/library/technology/raid has some great illustrations.
select、poll、epoll 之间的区别
- epoll 可以理解为 event poll，其实不是轮询!

What is deadlock?
What is starvation?
- Task is consistently waiting for resource
  - e.g. due to Low priority
How does sleep system call works?
- The process will be suspended by kernel and enter the pending queue
- The time is not guaranteed
- How is time.sleep implemented in Python?
  - On linux, and most other POSIX platforms, it will generally use select. See the 3.3 source.
Signals?
- Signals are software interrupts sent to a program to indicate that an important event has occurred
- SIGHUP: hang up
- SIGINT: Ctrl-C
- SIGQUIT: Ctrl-D
- SIGFPE: float point exception
- SIGKILL: must quit now!
- SIGALRM: used for timers
- SIGTERM: sent to kill by default
- Trap is signal handler
https://stackoverflow.com/questions/529034/python-pass-or-sleep-for-long-running-processes
Describe the process of Linux booting?
- BIOS: testing hardware, retrieve basic system information from CMOS, find MBR (Master Boot Record) on devices, execute section 0 in MBR
- MBR: Find a primary bootable partition, and execute that partition (second level MBR, e.g. LILO and GRUB)
- GRUB: bootloader code and partition table -- Locate installed OS, pass control to the user-selected OS
- Kernel: initialize devices and load initrd (init ramdisk) module; Mounts root FS
- Init: execute /sbin/init, start init process, starts swapping, check FS ... runs boot scripts ... starts system at run level
- Run-level: e.g. single user mode, X11 etc.
How to terminate a process?
- https://unix.stackexchange.com/questions/124127/kill-all-descendant-processes
How does timer/timing work?
Low latency https://softwareengineering.stackexchange.com/questions/183723/low-latency-unix-linux
- kernel bypass: moves some of the TCP/IP stack from the kernel to the network card https://blog.cloudflare.com/kernel-bypass/
- Thread pinning
- Turn off hyper-threading
- What are the cost of abstractions?
- Socket: no delay, fast open, fast ack...
- DMA..
- Super large RAM to avoid I/O as much as possible -> in-memory database
- Lock-free, bitmap...
- Shared memory to avoid overhead of other types of IPC
- Don't share L3
- Turn of some interrupts

dynamic_cast vs static_cast
- https://stackoverflow.com/questions/28002/regular-cast-vs-static-cast-vs-dynamic-cast
https://tests4geeks.com/blog/cpp-interview-questions/#q1
RAII (Resource acquisition is initialization)
- Associate resource lifecycle with object lifecycle to enable safe, easy resource management

High-level

https://medium.com/cracking-the-data-science-interview/how-operating-systems-work-10-concepts-you-should-know-as-a-developer-8d63bb38331f

Three key elements of an OS
- Abstractions: process, thread, file, socket, memory
- Mechanisms: create, schedule, open, write, allocate
- Policies: LRU, EDF

GRADUATE COURSE: Intro. to Computer Systems Research

Concurrency
- Mutex: Mesa (Processes and Monitors)
High-performance Web server
- C10K problem
- Event-driven, async I/O
- Low-level: TCP flow-control (buffer), cong-control (packet loss), slow-start (estimate bandwidth)
- Persistent connections
Concurrency Control & Recovery in DB
- Motivation for concurrency: Performance (multi-core, I/O overlapping)
- Programming model: Transaction: ACID
  - Serializability test
  - 2 Phase Locking
- Data structures: Volatile/Non-volatile, buffer pool, paging, WAL
  - STEAL/NO-STEAL: flush before committing?
  - FORCE/NO-FORCE: flush every time immediately after committing?
- Logging: UNDO/REDO, Physical/Logical
- Recovery: Use log and checkpoint
Distributed Transactions
- Why distribution? Partition of workload, Reliability by replication
- Coordinator + Participants
- DT Log on stable storage
- 2PC: network failure tolerant
- 3PC: Non-blocking
Distributed Computation:
- Clocks, Event Ordering, and Global Predicate Computation
- Stable RTT based scheme to sync physical clock
- Lamport clock: can’t capture causality, hard to replay
- Vector clock: better than above — capturing causality
Consensus protocol: Paxos
Distributed File System
- Spanner
  - Baseline: simplest approach to implement distributed transactions in a sharded setting is to layer 2PC over Paxos