Database

Main

MySQL optimization
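- Avoid queries without LIMIT (unbounded result sets).
- Read/write splitting: send writes to the primary instance and reads to replica instances.
- Query cache + application cache (Redis/memcached). MySQL's built-in query cache was removed in MySQL 8.0.
- Sharding (a distributed DB avoids this hassle)
  - Horizontal partitioning
  - Two forms: splitting rows across multiple databases, and splitting rows across multiple tables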
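Below is a minimal sketch of routing rows when both databases and tables are split. It assumes a hypothetical users table, sharded by `user_id` modulo into 4 databases with 8 tables each. The names `user_db_N` / `users_N` and the counts are made up for illustration. Modulo routing is only one option. It makes adding shards later painful, which is one reason range routing or consistent hashing are also used.

```python
# Hypothetical layout: 4 databases x 8 tables each = 32 physical shards.
# Shard key: user_id. Modulo routing is an assumption, not the only choice.
NUM_DBS = 4
TABLES_PER_DB = 8


def route(user_id: int) -> tuple:
    """Map a user_id to (database name, table name) for a horizontally split users table."""
    slot = user_id % (NUM_DBS * TABLES_PER_DB)  # global shard slot, 0..31
    db_index = slot // TABLES_PER_DB            # which database
    table_index = slot % TABLES_PER_DB          # which table inside that database
    return f"user_db_{db_index}", f"users_{table_index}"


if __name__ == "__main__":
    print(route(42))  # ('user_db_1', 'users_2')
```

With this layout, user 42 lands in slot 10, which is database 1, table 2. Every query for that user must be sent to that one physical table.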
Redis
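- Expire (invalidate) the cached entry on write
- Handles far more concurrent requests than the database
- How to avoid cache penetration (e.g. a DDoS that floods requests for keys that don't exist, so every request falls through to the DB)?
  - If the DB returns nothing, still cache the empty result, with a short expiration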
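The "cache the empty result" idea can be sketched as follows. This is an in-memory stand-in for Redis (a dict with expiry times), not a Redis client. `NullCachingCache`, the sentinel and the TTL values are hypothetical. With Redis itself you would typically store a placeholder value using `SET key value EX seconds` with a short expiry.

```python
import time
from typing import Callable, Optional

_MISSING = object()  # sentinel stored in the cache meaning "the DB has no such row"


class NullCachingCache:
    """In-memory stand-in for Redis that also caches 'not found' results briefly."""

    def __init__(self, load: Callable[[str], Optional[str]],
                 ttl: float = 300.0, null_ttl: float = 30.0) -> None:
        self._load = load          # loader that hits the database; returns None if absent
        self._ttl = ttl            # TTL (seconds) for real values
        self._null_ttl = null_ttl  # much shorter TTL for empty results
        self._store = {}           # key -> (value or _MISSING, expires_at)
        self.db_hits = 0           # how many lookups reached the database

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:  # unexpired cache hit
            return None if entry[0] is _MISSING else entry[0]
        self.db_hits += 1
        value = self._load(key)
        if value is None:
            # Cache the miss too, so a flood of lookups for a nonexistent key stops reaching the DB.
            self._store[key] = (_MISSING, now + self._null_ttl)
        else:
            self._store[key] = (value, now + self._ttl)
        return value


if __name__ == "__main__":
    db = {"user:1": "alice"}
    cache = NullCachingCache(db.get)
    for _ in range(1000):
        cache.get("user:999")
    print(cache.db_hits)  # 1
```

Keep the TTL for empty results short. Otherwise, a row inserted later stays "missing" from the cache's point of view for too long.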
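MySQL indexes mainly use two data structures: B-Tree indexes and hash indexes. A hash index is backed by a hash table. So when the workload is mostly single-record lookups, a hash index gives the fastest queries. For most other scenarios (range queries, sorting, prefix matching), a B-Tree index is recommended. Explicit hash indexes are supported by the MEMORY engine; InnoDB builds B+ tree indexes and uses hashing only internally, through its adaptive hash index.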
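This sketch illustrates why hash indexes suit point lookups while B-Tree indexes suit most other queries. A Python dict plays the hash index. A sorted list searched with `bisect` plays the ordered (B-Tree-like) index. The rows are made-up sample data. The dict can only answer "equal to" in O(1); a range query on it would have to check every key.

```python
import bisect

rows = [(7, "g"), (3, "c"), (9, "i"), (1, "a"), (5, "e")]  # (id, payload), made-up data

# Hash index: key -> payload. O(1) average point lookup, but keys are unordered.
hash_index = {row_id: payload for row_id, payload in rows}

# Ordered index (what a B-Tree gives you): keys kept in sorted order.
sorted_keys = sorted(row_id for row_id, _ in rows)


def point_lookup(key):
    """Equality lookup: the hash index's strength."""
    return hash_index.get(key)


def range_lookup(lo, hi):
    """Return ids with lo <= id <= hi via binary search on the ordered index."""
    start = bisect.bisect_left(sorted_keys, lo)
    end = bisect.bisect_right(sorted_keys, hi)
    return sorted_keys[start:end]


if __name__ == "__main__":
    print(point_lookup(5))     # e
    print(range_lookup(3, 7))  # [3, 5, 7]
```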
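MyISAM is better suited to read-heavy tables, while InnoDB is better suited to write-heavy tables.
The quality of the data model directly affects database performance.
ACID (atomicity, consistency, isolation, durability)
- Atomicity: all operations in a transaction either all succeed, or all fail and are rolled back. If the transaction succeeds, its operations must be fully applied to the database; if it fails, it must leave the database unaffected.
- Consistency: a transaction must move the database from one consistent state to another. The database must be in a consistent state both before and after the transaction executes.
- Isolation: when multiple users access the database concurrently (e.g. operating on the same table), the transaction the database opens for each user must not be interfered with by other transactions' operations. Concurrent transactions must be isolated from one another.
  - Example: transactions between bank accounts A, B and C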
    - A -> B (A -= 10, B += 10)
    - A -> C (A -= 20, C += 20)
  - Isolation determines how transaction integrity is visible to other users and systems.
  - https://medium.com/@huynhquangthao/mysql-testing-isolation-levels-650a0d0fae75
    - How can READ UNCOMMITTED go wrong? A transaction can read another transaction's uncommitted changes (a dirty read). In the real world this is dangerous: the other transaction can fail and roll back, or we may see only an intermediate state of it. This easily leads to data inconsistency.
    - How can READ COMMITTED go wrong? Every row you read has been committed. But other transactions can commit between two reads inside the current transaction, so re-running the same SELECT can return changed values (non-repeatable read) or a different number of rows (phantom read).
    - How can REPEATABLE READ go wrong? Repeated plain SELECTs return the same snapshot. But in InnoDB, writes (UPDATE/DELETE) act on the latest committed data, so you can modify "unseen" rows that another transaction committed after your snapshot!
- Durability: once a transaction is committed, its changes to the data are permanent. Committed operations are not lost even if the database system fails.
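Atomicity can be demonstrated with the A, B, C account example. The snippet uses Python's built-in `sqlite3` module rather than MySQL so that it runs without a server. The starting balances and the `CHECK (balance >= 0)` constraint are assumptions, added to force the second transfer to fail halfway through. The rollback behavior is what matters here.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
                 "balance INTEGER NOT NULL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 25), ("B", 0), ("C", 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically: both UPDATEs commit, or neither does."""
    with conn:  # commits on success, rolls back if the block raises
        # Credit first, so a failing debit leaves a partial write that must be undone.
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "A", "B", 10)      # A: 25 -> 15, B: 0 -> 10
    try:
        transfer(conn, "A", "C", 20)  # A would go to -5: CHECK fails
    except sqlite3.IntegrityError:
        pass
    print(balances(conn))  # {'A': 15, 'B': 10, 'C': 0}
```

The A -> C transfer credits C before the failing debit of A, yet C ends at 0. The rollback undid the partial write.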
Join
- https://www.geeksforgeeks.org/sql-join-set-1-inner-left-right-and-full-joins/
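- LEFT JOIN returns all rows from the left table, plus the rows from the right table whose join fields match.
  - For left rows with no matching row on the right side, the right-side columns in the result set are NULL.
- RIGHT JOIN returns all rows from the right table, plus the rows from the left table whose join fields match.
- INNER JOIN (equi-join) returns only the rows whose join fields match in both tables.
- LEFT JOIN is also known as LEFT OUTER JOIN.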
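The difference between INNER JOIN and LEFT JOIN can be checked on a small made-up schema, again using `sqlite3` for a self-contained run. RIGHT JOIN is left out because SQLite only supports it from version 3.39. A right join is the mirror image of a left join, so swapping the table order turns it into a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (student_id INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO enrollment VALUES (1, 'DB'), (1, 'OS'), (3, 'DB');
""")

# Only students with at least one matching enrollment row.
inner = conn.execute("""
    SELECT s.name, e.course FROM student s
    INNER JOIN enrollment e ON e.student_id = s.id
    ORDER BY s.id, e.course
""").fetchall()

# Every student; Bob has no enrollment, so his course column is NULL (None).
left = conn.execute("""
    SELECT s.name, e.course FROM student s
    LEFT JOIN enrollment e ON e.student_id = s.id
    ORDER BY s.id, e.course
""").fetchall()

print(inner)  # [('Ann', 'DB'), ('Ann', 'OS'), ('Cid', 'DB')]
print(left)   # [('Ann', 'DB'), ('Ann', 'OS'), ('Bob', None), ('Cid', 'DB')]
```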
MySQL MVCC?
- https://medium.com/@ajones1_999/understanding-mysql-multiversion-concurrency-control-6b52f1bd5b7e
- MySQL’s mechanism for allowing you to simultaneously read and write from the same row is called “Multi-version Concurrency Control”.
  - writes create new versions of rows, reads see the version that was current when they started.
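  - In InnoDB, this version marker is the transaction ID.
  - So, in addition to the columns you may have updated, a write operation also marks the row with its transaction ID.
  - Every time MySQL writes data into a row, it also writes an entry into the rollback segment (undo log). Older versions of the row can be reconstructed from it, both for readers and for rollback.
- InnoDB: SELECT COUNT(*) FROM table has to scan the table (in practice the smallest available index). InnoDB keeps no stored row count, because concurrent transactions may see different numbers of rows at the same time.
- MyISAM was MySQL's default storage engine before MySQL 5.5; since 5.5, the default is InnoDB.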
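The following is a toy version of the MVCC idea from the notes above. Writes append new versions tagged with a transaction ID, and a reader sees only versions from transactions at or before its snapshot. This is a heavy simplification, assumed purely for illustration. Real InnoDB keeps only the latest version in the row and reconstructs older ones from the undo log. It also decides visibility with a read view that tracks transactions that are still active.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    value: str
    created_by: int  # transaction id of the writer


@dataclass
class MVCCRow:
    """Toy MVCC row: writes never overwrite in place, they append a new version."""
    versions: List[Version] = field(default_factory=list)

    def write(self, txid: int, value: str) -> None:
        self.versions.append(Version(value, txid))

    def read(self, snapshot_txid: int) -> Optional[str]:
        # Simplified visibility rule: a version is visible if its writer's id <= the snapshot id.
        visible = [v for v in self.versions if v.created_by <= snapshot_txid]
        if not visible:
            return None
        return max(visible, key=lambda v: v.created_by).value


if __name__ == "__main__":
    row = MVCCRow()
    row.write(10, "balance=100")
    reader_snapshot = 10            # a reader starts here
    row.write(11, "balance=90")     # a concurrent writer commits a newer version
    print(row.read(reader_snapshot))  # balance=100 (the reader is not blocked and sees its snapshot)
    print(row.read(11))               # balance=90
```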
Redis:
- zset
  - https://stackoverflow.com/questions/29800178/what-is-a-zset-in-redis-database
  - How Redis zset works: https://juejin.im/post/6844904033589657607
- https://redis.io/topics/data-types-intro
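- Interview question: what are the differences between MongoDB and Redis?
  - Reference answer:
    - Memory management: Redis keeps all data in memory and periodically writes it to disk. When memory runs out, it can evict data using a configured LRU policy. MongoDB keeps data in memory through Linux mmap (in its older MMAPv1 storage engine). When memory is insufficient, only hot data stays in memory and the rest lives on disk.
    - Data structures: Redis supports a rich set of data structures, including hash, set, list, etc. MongoDB's data structure is simpler (documents), but it supports rich data expression and indexing. It is the most similar to a relational database and has a very rich query language.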
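- Implementing scheduled tasks with Redis
  - Requirement: run tasks at different intervals for different users, e.g. push a related SMS to a user 24 hours after they register.
    - Use crontab? Too heavy and basically unrealistic: we can't create a cron job on the server for every user.
    - Poll on a timer? Frequent I/O and too inefficient (pull is not good!)
  - Redis, which we use all the time, lets us set expiration times on keys, so there should be an event notification on expiry (keyspace notifications).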
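- Architecture: single-threaded command execution with event-driven I/O (I/O multiplexing). Every command runs atomically without locking, which might be why Redis can offer such a large set of data structures.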
- rehash
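  - What is rehashing? The initial size of a hash table (dictht) is 4. As more and more keys are added, the hash table grows. When does Redis resize a hash table? Redis can resize hash tables...
  - Redis is single-threaded. When there are many keys, rehashing all key-value pairs in one go would be a huge amount of computation. That would hurt server performance and could even stop the server from serving requests for a while. So instead of completing the whole rehash in one step, Redis rehashes incrementally, over many steps.
    - Somewhat like garbage collection...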
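Keyspace notifications can deliver expiry events. You enable them with `CONFIG SET notify-keyspace-events Ex` and then subscribe to `__keyevent@0__:expired`. However, the expired event fires when Redis actually detects and deletes the expired key, which can lag behind the TTL. Pub/Sub messages are also lost if no subscriber is connected. A commonly used alternative, swapped in here rather than taken from the notes, is a delay queue on a zset. The score is the due time; workers fetch due members with `ZRANGEBYSCORE` and claim each one with `ZREM`. The sketch uses a hypothetical in-memory `FakeZSet` instead of a real Redis client. Timestamps are in seconds and the task names are made up.

```python
import bisect
from typing import Dict, List, Tuple


class FakeZSet:
    """In-memory stand-in for one Redis sorted set: members ordered by score."""

    def __init__(self) -> None:
        self._entries: List[Tuple[float, str]] = []  # kept sorted by (score, member)
        self._scores: Dict[str, float] = {}          # member -> score

    def zadd(self, member: str, score: float) -> None:
        self.zrem(member)  # re-adding a member just updates its score
        self._scores[member] = score
        bisect.insort(self._entries, (score, member))

    def zrem(self, member: str) -> bool:
        score = self._scores.pop(member, None)
        if score is None:
            return False
        self._entries.remove((score, member))
        return True

    def zrangebyscore(self, lo: float, hi: float) -> List[str]:
        return [m for s, m in self._entries if lo <= s <= hi]


def schedule(queue: FakeZSet, task_id: str, due_at: float) -> None:
    """Roughly: ZADD delayed_tasks <due_at> <task_id>"""
    queue.zadd(task_id, due_at)


def poll_due(queue: FakeZSet, now: float) -> List[str]:
    """Roughly: ZRANGEBYSCORE delayed_tasks -inf <now>, then ZREM each member to claim it."""
    due = queue.zrangebyscore(float("-inf"), now)
    # Only the caller whose ZREM succeeds runs the task; with real Redis and several
    # workers, this keeps two workers from executing the same task.
    return [task for task in due if queue.zrem(task)]


if __name__ == "__main__":
    q = FakeZSet()
    day = 24 * 3600
    schedule(q, "sms:user1", 0 + day)     # user1 registered at t=0
    schedule(q, "sms:user2", 3600 + day)  # user2 registered at t=3600
    print(poll_due(q, day))         # ['sms:user1']
    print(poll_due(q, day + 3600))  # ['sms:user2']
```

Workers still poll, but each poll is a single cheap range query over only the due items. This differs from scanning every user.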
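Here is a toy model of incremental rehashing, loosely based on how Redis's dict works. During a rehash there are two tables. Every lookup or insert migrates one bucket from the old table to the new one, so the cost is spread over many operations. The class name, the "load factor >= 1" trigger and the doubling rule are simplifications assumed here, not Redis's exact resize policy.

```python
class IncrementalDict:
    """Toy chained hash table with two tables and step-by-step bucket migration."""

    def __init__(self):
        self.ht = [[[] for _ in range(4)], None]  # ht[0] starts with 4 buckets; ht[1] only during rehash
        self.rehash_idx = -1                      # next ht[0] bucket to migrate; -1 = not rehashing
        self.used = 0                             # number of stored keys

    @staticmethod
    def _bucket(table, key):
        return table[hash(key) % len(table)]

    def _rehash_step(self):
        """Migrate a single bucket from ht[0] to ht[1]."""
        if self.rehash_idx == -1:
            return
        old, new = self.ht
        for k, v in old[self.rehash_idx]:
            self._bucket(new, k).append((k, v))
        old[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(old):  # migration finished: the new table becomes ht[0]
            self.ht = [new, None]
            self.rehash_idx = -1

    def _tables(self):
        return [t for t in self.ht if t is not None]

    def get(self, key):
        self._rehash_step()
        for table in self._tables():  # during a rehash a key may live in either table
            for k, v in self._bucket(table, key):
                if k == key:
                    return v
        return None

    def set(self, key, value):
        self._rehash_step()
        for table in self._tables():
            bucket = self._bucket(table, key)
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)  # update in place
                    return
        # New keys go into ht[1] while rehashing, so ht[0] only shrinks.
        target = self.ht[1] if self.ht[1] is not None else self.ht[0]
        self._bucket(target, key).append((key, value))
        self.used += 1
        # Simplified trigger: start a rehash into a table twice as big at load factor 1.
        if self.ht[1] is None and self.used >= len(self.ht[0]):
            self.ht[1] = [[] for _ in range(len(self.ht[0]) * 2)]
            self.rehash_idx = 0


if __name__ == "__main__":
    d = IncrementalDict()
    for i in range(100):
        d.set(f"k{i}", i)
    print(d.get("k42"), len(d.ht[0]))
```

No single operation ever moves more than one bucket. That is the point: the pause a huge one-shot rehash would cause is traded for slightly slower operations while a rehash is in progress.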
NoSQL:
- https://www.mongodb.com/nosql-explained
- MongoDB is a document DB: data is stored in a JSON-like format (BSON).
- KV: Redis
- Wide-column: rows are not required to have the same columns (somewhere between relational and document?), e.g. HBase. Can store user profile data (no schema migration needed?).
- Graph database: e.g. Neo4j. Stores nodes and relationships, and handles graph-algorithm queries (e.g. for data mining) better.

Index

https://www.essentialsql.com/what-is-a-database-index/

Indexes are related to specific tables and consist of one or more keys.
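An index is a structure that sorts the values of one or more columns of a database table.
Keys are simply the values we want to look up in the index. They are based on the table's columns.
Index: compare it with the index at the back of a book!
The structure most commonly used to store a database index (e.g. in InnoDB) is a B+ tree.
- In a B+ tree, the key values are split into many smaller sorted piles (pages). Upper-level nodes hold only separator keys that route a search to the right pile. The leaf pages are linked in key order, so range scans can walk from one leaf to the next.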
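This is a two-level sketch of the "many smaller piles" idea. Sorted keys are cut into small leaf pages, and an upper level keeps one separator key per page. A lookup binary-searches the separators and then a single page. `PAGE_SIZE` is a hypothetical tiny page size. A real B+ tree has more levels, stores pages on disk and splits pages on insert; none of that is modeled here.

```python
import bisect

PAGE_SIZE = 4  # hypothetical keys per leaf page (real pages hold hundreds)


def build_leaves(keys):
    """Split sorted keys into small fixed-size leaf 'piles'."""
    keys = sorted(keys)
    return [keys[i:i + PAGE_SIZE] for i in range(0, len(keys), PAGE_SIZE)]


def build_separators(leaves):
    """Upper level: the smallest key of every leaf after the first."""
    return [leaf[0] for leaf in leaves[1:]]


def search(leaves, separators, key):
    """Pick the leaf via the separators, then binary-search inside that one leaf."""
    leaf = leaves[bisect.bisect_right(separators, key)]
    i = bisect.bisect_left(leaf, key)
    return i < len(leaf) and leaf[i] == key


def range_scan(leaves, separators, lo, hi):
    """Find the leaf that would hold lo, then walk leaves left to right (like linked B+ tree leaves)."""
    out = []
    for leaf in leaves[bisect.bisect_right(separators, lo):]:
        for k in leaf:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
    return out


if __name__ == "__main__":
    leaves = build_leaves(range(0, 40, 3))  # [[0, 3, 6, 9], [12, 15, 18, 21], [24, 27, 30, 33], [36, 39]]
    seps = build_separators(leaves)         # [12, 24, 36]
    print(search(leaves, seps, 15))         # True
    print(range_scan(leaves, seps, 10, 30)) # [12, 15, 18, 21, 24, 27, 30]
```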

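Three normal forms of database design (https://www.cnblogs.com/linjiqin/archive/2012/04/01/2428695.html):

First normal form (every column stays atomic): how strictly to apply 1NF depends on the system's actual requirements. For example, suppose a system needs an "address" attribute. Normally it would be enough to make "address" a single column. But if the system frequently accesses the "city" part of the address, the address should be split into province, city, street address, and so on. That makes operating on any one part of the address much easier.
Second normal form (every column is related to the primary key): every column in the table must depend on the whole primary key, not on just part of it (this mainly concerns composite primary keys). In other words, a table should store only one kind of data; different kinds of data must not be stored in the same table.
- Split the table along the parts of the composite primary key
Third normal form (every column depends directly on the primary key, not indirectly): every column must depend directly on the primary key, not transitively through another column.
- Split tables to eliminate redundant data, even when there is no composite primary key

Note: the three overlap, and each builds on the previous one.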
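Here is a small worked example of 2NF and 3NF. It uses a hypothetical enrollment table with the composite key (student_id, course_id). Partial dependencies are split out for 2NF, and the transitive dependency course -> teacher -> office is split out for 3NF. Joining the pieces back reproduces the original rows without loss. The table, columns and data are invented for illustration.

```python
# Hypothetical denormalized table; composite key = (student_id, course_id).
enrollments = [
    # student_id, course_id, student_name, course_name, teacher, teacher_office, grade
    (1, "DB", "Ann", "Databases", "Lee", "R101", 90),
    (1, "OS", "Ann", "Operating Systems", "Kim", "R202", 85),
    (2, "DB", "Bob", "Databases", "Lee", "R101", 78),
]

# 2NF: student_name depends only on student_id, and course_name/teacher only on course_id
# (partial dependencies on the composite key), so each moves to its own table.
students = {sid: name for sid, _, name, *_ in enrollments}
courses = {cid: (cname, teacher) for _, cid, _, cname, teacher, _, _ in enrollments}
grades = {(sid, cid): grade for sid, cid, *_, grade in enrollments}  # depends on the whole key

# 3NF: teacher_office depends on teacher, not directly on course_id
# (course_id -> teacher -> office is transitive), so it gets its own table too.
offices = {teacher: office for *_, teacher, office, _ in enrollments}


def rebuild():
    """Join the normalized tables back together."""
    rows = []
    for (sid, cid), grade in grades.items():
        cname, teacher = courses[cid]
        rows.append((sid, cid, students[sid], cname, teacher, offices[teacher], grade))
    return rows


if __name__ == "__main__":
    print(sorted(rebuild()) == sorted(enrollments))  # True: the decomposition is lossless
```

After the split, "Databases", "Lee" and "R101" are each stored once. Changing Lee's office therefore touches one row instead of every enrollment in Lee's courses.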
