Distributed
Main
- etcd distributed lock mechanism: https://juejin.im/post/6883866765890322445
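The etcd lock recipe boils down to: every waiter creates a key (under a lease) and receives a monotonically increasing create revision; whoever holds the smallest live revision owns the lock. A minimal in-memory sketch of that ordering logic (not the real etcd client API):

```python
import itertools

class MiniEtcdLock:
    """Sketch of etcd's lock ordering: the client whose key has the
    smallest live create revision holds the lock."""
    _rev = itertools.count(1)      # stands in for etcd's global revision counter

    def __init__(self):
        self.waiters = {}          # revision -> client name

    def acquire(self, client):
        rev = next(MiniEtcdLock._rev)   # "create a key", get its revision
        self.waiters[rev] = client
        return rev

    def holder(self):
        # The lock owner is the waiter with the smallest create revision.
        return self.waiters[min(self.waiters)] if self.waiters else None

    def release(self, rev):
        # Deleting the key (or lease expiry) hands the lock to the
        # next-smallest revision.
        self.waiters.pop(rev, None)
```

In real etcd, each waiter additionally watches for the deletion of the key with the next-smaller revision, which is also what keeps the herd effect at bay.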
- Herd effect: https://blog.csdn.net/aazhzhu/article/details/89967346
- Distributed file system
- NFS: mounts a remote directory into the local file system
- HDFS (Hadoop Distributed File System): fault-tolerant via replication, so data does not live on a single remote node
- CAP theorem: a distributed system cannot simultaneously guarantee all three of
- Consistency
- Availability
- Partition tolerance
- LSM tree (log-structured merge tree)?
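To answer my own question: an LSM tree buffers writes in an in-memory memtable, flushes it as an immutable sorted run when full, and serves reads by checking the memtable first and then the runs from newest to oldest. A toy sketch (real LSM trees add a WAL, bloom filters, and compaction):

```python
class TinyLSM:
    """Minimal LSM-tree sketch: memtable + immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []             # flushed sorted runs, oldest first
        self.limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            # Flush: freeze the memtable as a sorted, immutable run.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):     # newest run shadows older ones
            for k, v in run:
                if k == key:
                    return v
        return None
```

The key property: writes are always sequential (append a sorted run), which is why LSM trees back write-heavy stores like LevelDB, RocksDB, and HBase.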
- Using Zookeeper as a service registry: with more than ~1000 services, the herd effect can cause a network storm of watch notifications. How to optimize?
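The standard optimization is the sequential-ephemeral-node recipe: instead of every client watching the same znode (so one change notifies everyone), each client creates a sequential node and watches only its immediate predecessor, so each release wakes exactly one client. A sketch of that watch-targeting logic:

```python
class HerdFreeQueue:
    """Sketch of ZooKeeper's herd-effect avoidance: each client holds a
    sequential node and watches only the node just before its own."""

    def __init__(self):
        self.seq = 0
        self.nodes = []            # live sequential (ephemeral) nodes

    def join(self):
        self.seq += 1              # ZooKeeper assigns this suffix atomically
        self.nodes.append(self.seq)
        return self.seq

    def watch_target(self, my_seq):
        # Watch only the immediate predecessor, never the shared parent:
        # its deletion notifies exactly one waiter, not all of them.
        smaller = [n for n in self.nodes if n < my_seq]
        return max(smaller) if smaller else None   # None => I am at the front

    def leave(self, seq):
        self.nodes.remove(seq)     # ephemeral node also vanishes on disconnect
```

With N waiters, a release triggers 1 notification instead of N, which is exactly what prevents the notification storm at 1000+ services.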
- How does etcd's service discovery mechanism work?
- https://www.slideshare.net/SreenivasMakam/service-discovery-using-etcd-consul-and-kubernetes
- Abstract IP/port with service
- Health checking
- Load balancing
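The three bullets above fit together: a service registers its address under a logical name with a TTL lease and keeps heartbeating; discovery returns only instances whose lease is still live (that is the health check), and the client balances across them. A toy registry in that spirit (names and TTL are illustrative, not etcd's real API):

```python
import time

class Registry:
    """Toy lease-based service registry (etcd/Consul flavor)."""

    def __init__(self, ttl=5.0):
        self.ttl = ttl
        self.entries = {}          # (name, addr) -> lease expiry timestamp

    def register(self, name, addr, now=None):
        now = time.monotonic() if now is None else now
        self.entries[(name, addr)] = now + self.ttl   # grant/renew lease

    heartbeat = register           # renewing a lease == re-registering

    def discover(self, name, now=None):
        now = time.monotonic() if now is None else now
        # Health check by lease expiry: instances that stopped
        # heartbeating simply fall out of the result.
        return [a for (n, a), exp in self.entries.items()
                if n == name and exp > now]
```

A client would then pick one of the returned addresses (round-robin, random, etc.) — that is where load balancing plugs in.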
- https://blog.csdn.net/u010523770/article/details/78347309 — a good article on consistency problems in distributed systems
- Briefly describe Flink's state mechanism
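In short: Flink partitions operator state by key, snapshots it atomically when a checkpoint barrier passes through, and on failure restores the last snapshot and replays events after it. A much-simplified sketch of that lifecycle (the class and method names here are my own, not Flink's API):

```python
import copy

class KeyedCounter:
    """Sketch of Flink-style keyed state with checkpoint/restore."""

    def __init__(self):
        self.state = {}            # keyed state: key -> count

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        return self.state[key]

    def checkpoint(self):
        # In Flink, a checkpoint barrier arriving at the operator
        # triggers this consistent snapshot of its state.
        return copy.deepcopy(self.state)

    def restore(self, snapshot):
        # On failure: roll state back to the snapshot; the source then
        # replays events after the checkpoint, giving exactly-once
        # state semantics.
        self.state = copy.deepcopy(snapshot)
```

The snapshot-then-replay pair is the core idea; Flink's actual mechanism (asynchronous barrier snapshots, RocksDB state backends) builds on it.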
Paxos
Paxos Made Live (Chubby): https://static.googleusercontent.com/media/research.google.com/en//archive/paxos_made_live.pdf
- The network might drop messages between replicas
- I feel the “multi-” part (Multi-Paxos) can be elegantly decomposed from the rest of the system
- Also, it seems I have been implementing the distinguished proposer in the wrong way
- " Paxos must force future coordinators to select that same value in order to ensure continued agreement” — How to force, I don’t quite understand
- The reason why this works is subtle (and I don’t really understand)
- We use a catch-up mechanism to enable lagging replicas to catch up with leading replicas
- It is also possible to do “chaining” of multiple messages in Paxos, which we don’t plan to implement
- How to test: failure injection
- Our client-server protocol also lacks a proper sequence number (the client can implement this trivially)
https://stackoverflow.com/questions/34281075/paxos-phase-2a-message-loss
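On the “force future coordinators” question above: the mechanism is phase 1 itself. An acceptor's promise carries the highest-numbered value it has already accepted, and the proposer is obliged to re-propose the highest-ballot such value from its majority instead of its own. A single-decree sketch:

```python
class Acceptor:
    """Single-decree Paxos acceptor (sketch)."""

    def __init__(self):
        self.promised = -1
        self.accepted = None       # (ballot, value) or None

    def prepare(self, ballot):
        # Promise not to accept lower ballots, and report what (if
        # anything) was already accepted — this is the "forcing" data.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    # Phase 1: gather promises from a majority.
    promises = [a.prepare(ballot) for a in acceptors]
    granted = [acc for ok, acc in promises if ok]
    if len(granted) <= len(acceptors) // 2:
        return None
    # The forcing rule: if any promise reported an accepted value, the
    # proposer must adopt the highest-ballot one instead of its own.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    if sum(a.accept(ballot, value) for a in acceptors) > len(acceptors) // 2:
        return value
    return None
```

So a later coordinator never needs to be told the chosen value out of band; the adoption rule guarantees it cannot propose anything that contradicts a value a majority may already have chosen.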
Google Spanner: Google's Globally-Distributed Database