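- Why use Redis rather than a map as a cache?
- Why is Redis fast, or rather, why is it faster than comparable products such as memcached?
- How does Redis handle expired keys?
- Since Redis is memory-based, how does it persist data?
- How does Redis synchronize data, how is master-replica synchronization implemented, and how does MySQL implement it?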
II. Why not a map
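- A map lives on the Java heap, so it is bounded by the heap size and prone to OOM, and it leaves expiration and persistence for you to solve.
- Redis offers a rich set of data structures.
- Redis is a cache server by design, so it can be shared by many application instances as a distributed cache.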
III. Why so fast
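Simple dynamic string (SDS). Compared with C strings:
SDS reduces the number of memory allocations through space preallocation. When a string has to grow, Redis allocates not only the space the modification needs but also extra unused space (the free attribute).
SDS is binary safe: every SDS API treats the data in the buf array as raw bytes and relies on the recorded length rather than a terminating null byte.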
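The preallocation and binary-safety points can be illustrated with a small model. This is only a sketch: the class `SimpleDynamicString` and its `allocations` counter are hypothetical names, and the growth rule (double below 1 MB, add 1 MB above it) is assumed to follow the policy Redis applies when an SDS grows.

```python
SDS_MAX_PREALLOC = 1024 * 1024  # assumed 1 MB preallocation threshold


class SimpleDynamicString:
    """Toy model of an SDS: used length, free space, and a byte buffer."""

    def __init__(self, data: bytes = b""):
        self.len = len(data)
        self.free = 0
        self.buf = bytearray(data)
        self.allocations = 0  # how many times the buffer had to grow

    def append(self, data: bytes) -> None:
        needed = len(data)
        if needed > self.free:
            new_len = self.len + needed
            # Reserve extra free space so later appends need no allocation.
            extra = new_len if new_len < SDS_MAX_PREALLOC else SDS_MAX_PREALLOC
            self.buf.extend(bytes(new_len + extra - len(self.buf)))
            self.free = len(self.buf) - self.len
            self.allocations += 1
        self.buf[self.len:self.len + needed] = data
        self.len += needed
        self.free -= needed

    def value(self) -> bytes:
        # The length comes from the header, so embedded zero bytes survive.
        return bytes(self.buf[:self.len])
```

Appending `b"ab"` and then `b"c"` to an empty string grows the buffer only once, because the first append already left free space behind.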
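- Hash table (dict): the bucket index is computed by masking the hash with a value derived from the table size (hash & sizemask, where sizemask is the size minus one). Java's HashMap uses the same idea with (n - 1) & hash.
- The dict keeps two tables: one holds the live data and the other is reserved for resizing. Unlike HashMap, it rehashes progressively, so the dict stays usable while it grows.
- Skip list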
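The two-table design mentioned in the list above can be sketched as follows. This is a simplified illustration, not the Redis implementation: `IncrementalDict` is a hypothetical name, one bucket is migrated per operation, and the table grows once the number of entries reaches the table size.

```python
class IncrementalDict:
    """Toy hash table with two bucket arrays and step-by-step rehashing."""

    def __init__(self, size: int = 4):
        # tables[1] exists only while a rehash is in progress
        self.tables = [[[] for _ in range(size)], None]
        self.rehash_idx = -1  # -1 means no rehash in progress
        self.used = 0

    def _index(self, key, table) -> int:
        # Table sizes are powers of two, so masking replaces the modulo.
        return hash(key) & (len(table) - 1)

    def _rehash_step(self) -> None:
        if self.rehash_idx == -1:
            return
        old, new = self.tables
        while self.rehash_idx < len(old) and not old[self.rehash_idx]:
            self.rehash_idx += 1  # skip buckets that are already empty
        if self.rehash_idx < len(old):
            for key, value in old[self.rehash_idx]:
                new[self._index(key, new)].append((key, value))
            old[self.rehash_idx] = []
            self.rehash_idx += 1
        if self.rehash_idx >= len(old):
            self.tables = [new, None]  # the new table becomes the live one
            self.rehash_idx = -1

    def _live_tables(self):
        return [t for t in self.tables if t is not None]

    def set(self, key, value) -> None:
        self._rehash_step()
        if self.rehash_idx == -1 and self.used >= len(self.tables[0]):
            self.tables[1] = [[] for _ in range(len(self.tables[0]) * 2)]
            self.rehash_idx = 0
        for table in self._live_tables():
            bucket = table[self._index(key, table)]
            for i, (existing, _) in enumerate(bucket):
                if existing == key:
                    bucket[i] = (key, value)
                    return
        # During a rehash, new keys go straight into the new table.
        target = self.tables[1] if self.rehash_idx != -1 else self.tables[0]
        target[self._index(key, target)].append((key, value))
        self.used += 1

    def get(self, key, default=None):
        self._rehash_step()
        for table in self._live_tables():
            for existing, value in table[self._index(key, table)]:
                if existing == key:
                    return value
        return default
```

Reads and writes consult both tables while a rehash is running, which is what lets the structure keep serving requests during growth.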
IV. How to deal with expired keys
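- A key's time to live can be set with the EXPIRE or PEXPIRE command.
- A key's expiration time (an absolute timestamp) can be set with the EXPIREAT or PEXPIREAT command.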
- PERSIST removes the expiration time.
- TTL (Time To Live) returns the remaining time to live in seconds.
- PTTL returns the remaining time to live in milliseconds.
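- Timed deletion (memory-friendly, CPU-unfriendly): a timer deletes each key as soon as its expiration time arrives.
- Lazy deletion (very CPU-friendly, very memory-unfriendly): every time a key is fetched from the keyspace, check whether it has expired and delete it if so.
- Periodic deletion (a compromise): delete expired keys at regular intervals, limiting how long and how often the deletion runs.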
Redis combines lazy deletion with periodic deletion, so a key that has reached its expiration time is not necessarily deleted right away.
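A minimal sketch of how lazy and periodic deletion work together, assuming a plain in-process store. `ExpiringStore`, `active_expire_cycle` and the sample size of 20 are illustrative choices, not Redis internals.

```python
import random
import time


class ExpiringStore:
    """Toy key-value store with lazy and periodic expiration."""

    def __init__(self, clock=time.monotonic):
        self.data = {}
        self.expires = {}  # key -> absolute expiration timestamp
        self.clock = clock

    def set(self, key, value, ttl=None):
        self.data[key] = value
        if ttl is None:
            self.expires.pop(key, None)
        else:
            self.expires[key] = self.clock() + ttl

    def _expire_if_needed(self, key) -> bool:
        deadline = self.expires.get(key)
        if deadline is not None and deadline <= self.clock():
            del self.data[key]
            del self.expires[key]
            return True
        return False

    def get(self, key):
        # Lazy deletion: a key is checked only when it is accessed.
        self._expire_if_needed(key)
        return self.data.get(key)

    def active_expire_cycle(self, sample_size=20) -> int:
        # Periodic deletion: inspect a bounded random sample per run.
        keys = list(self.expires)
        sample = random.sample(keys, min(sample_size, len(keys)))
        return sum(self._expire_if_needed(key) for key in sample)
```

Between the moment a key expires and the moment it is touched or sampled, it still occupies memory, which is the delay the paragraph above describes.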
volatile-lru -> Evict using approximated LRU among the keys with an expire set (least recently used: among keys that have an expiration set, evict the one used least recently)
allkeys-lru -> Evict any key using approximated LRU
volatile-lfu -> Evict using approximated LFU among the keys with an expire set (least frequently used: among keys that have an expiration set, evict the one used least often)
allkeys-lfu -> Evict any key using approximated LFU
volatile-random -> Remove a random key among the ones with an expire set.
allkeys-random -> Remove a random key, any key
volatile-ttl -> Remove the key with the nearest expire time (minor TTL)
noeviction -> Don't evict anything, just return an error on write operations
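Below are the memory management notes from the Redis configuration file. They show that the default eviction policy is noeviction (maxmemory-policy noeviction): nothing is evicted and write commands get an error.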
############################## MEMORY MANAGEMENT ################################
# Set a memory usage limit to the specified amount of bytes.
# When the memory limit is reached Redis will try to remove keys
# according to the eviction policy selected (see maxmemory-policy).
# If Redis can't remove keys according to the policy, or if the policy is
# set to 'noeviction', Redis will start to reply with errors to commands
# that would use more memory, like SET, LPUSH, and so on, and will continue
# to reply to read-only commands like GET.
# This option is usually useful when using Redis as an LRU or LFU cache, or to
# set a hard memory limit for an instance (using the 'noeviction' policy).
# WARNING: If you have replicas attached to an instance with maxmemory on,
# the size of the output buffers needed to feed the replicas are subtracted
# from the used memory count, so that network problems / resyncs will
# not trigger a loop where keys are evicted, and in turn the output
# buffer of replicas is full with DELs of keys evicted triggering the deletion
# of more keys, and so forth until the database is completely emptied.
# In short... if you have replicas attached it is suggested that you set a lower
# limit for maxmemory so that there is some free RAM on the system for replica
# output buffers (but this is not needed if the policy is 'noeviction').
# maxmemory <bytes>
### Describes the available eviction policies
# MAXMEMORY POLICY: how Redis will select what to remove when maxmemory
# is reached. You can select among five behaviors:
# LRU means Least Recently Used
# LFU means Least Frequently Used
# Both LRU, LFU and volatile-ttl are implemented using approximated
# randomized algorithms.
# Note: with any of the above policies, Redis will return an error on write
# operations, when there are no suitable keys for eviction.
# At the date of writing these commands are: set setnx setex append
# incr decr rpush lpush rpushx lpushx linsert lset rpoplpush sadd
# sinter sinterstore sunion sunionstore sdiff sdiffstore zadd zincrby
# zunionstore zinterstore hset hsetnx hmset hincrby incrby decrby
# getset mset msetnx exec sort
# The default is:
# maxmemory-policy noeviction
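### The LRU, LFU and TTL eviction algorithms are approximate rather than exact: by default 5 randomly sampled keys are compared. It also explains why not 3 (inaccurate) and why not 10 (costs more CPU)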
# LRU, LFU and minimal TTL algorithms are not precise algorithms but approximated
# algorithms (in order to save memory), so you can tune it for speed or
# accuracy. For default Redis will check five keys and pick the one that was
# used less recently, you can change the sample size using the following
# configuration directive.
# The default of 5 produces good enough results. 10 Approximates very closely
# true LRU but costs more CPU. 3 is faster but not very accurate.
# maxmemory-samples 5
### Starting from Redis 5, a replica ignores the maxmemory limit by default
# Starting from Redis 5, by default a replica will ignore its maxmemory setting
# (unless it is promoted to master after a failover or manually). It means
# that the eviction of keys will be just handled by the master, sending the
# DEL commands to the replica as keys evict in the master side.
# This behavior ensures that masters and replicas stay consistent, and is usually
# what you want, however if your replica is writable, or you want the replica to have
# a different memory setting, and you are sure all the writes performed to the
# replica are idempotent, then you may change this default (but be sure to understand
# what you are doing).
# Note that since the replica by default does not evict, it may end using more
# memory than the one set via maxmemory (there are certain buffers that may
# be larger on the replica, or data structures may sometimes take more memory and so
# forth). So make sure you monitor your replicas and make sure they have enough
# memory to never hit a real out-of-memory condition before the master hits
# the configured maxmemory setting.
# replica-ignore-maxmemory yes
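The sampling idea behind `maxmemory-samples` in the configuration comments above can be sketched like this. It is a simplification under stated assumptions: the function names are hypothetical, a key count stands in for the byte limit of `maxmemory`, and `last_access` is an assumed map from key to last access time.

```python
import random


def pick_eviction_candidate(last_access, samples=5, rng=random):
    """Return the least recently used key among a random sample of keys."""
    candidates = rng.sample(list(last_access), min(samples, len(last_access)))
    return min(candidates, key=lambda key: last_access[key])


def evict_until_fits(store, last_access, max_keys, samples=5, rng=random):
    """Evict sampled LRU victims until the store holds at most max_keys keys."""
    evicted = []
    while len(store) > max_keys:
        victim = pick_eviction_candidate(last_access, samples, rng)
        del store[victim]
        del last_access[victim]
        evicted.append(victim)
    return evicted
```

A larger sample moves the result closer to true LRU at the cost of more comparisons per eviction, which matches the trade-off the comments describe for 3, 5 and 10.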
V. How to persist
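- RDB (Redis Database): saves all data at a point in time to an RDB file.
- AOF (Append-Only File): whenever the Redis server executes a write command, the command is saved to the AOF file.
RDB (Redis Database) persistence can be run manually or periodically according to the server configuration. The resulting RDB file is a compressed binary file from which Redis can restore the database.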
save 900 1 # after 900 seconds (15 minutes), if at least 1 key changed
save 300 10 # after 300 seconds (5 minutes), if at least 10 keys changed
save 60 10000 # after 60 seconds (1 minute), if at least 10000 keys changed
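The save-point rule can be expressed as a small check, sketched here under the assumption that the server tracks a counter of changes since the last save (`dirty`) and the time of the last save. `SavePoint` and `should_snapshot` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavePoint:
    seconds: int
    changes: int


# Mirrors the three "save" lines shown above.
DEFAULT_SAVE_POINTS = (SavePoint(900, 1), SavePoint(300, 10), SavePoint(60, 10000))


def should_snapshot(dirty, last_save, now, save_points=DEFAULT_SAVE_POINTS):
    """Return True if any save point is satisfied.

    dirty: number of changes since the last successful save
    last_save / now: timestamps in seconds
    """
    elapsed = now - last_save
    return any(dirty >= sp.changes and elapsed > sp.seconds for sp in save_points)
```

A single satisfied save point is enough to trigger a snapshot; both of its conditions (elapsed time and number of changes) must hold together.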
################################ SNAPSHOTTING ################################
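### Describes the conditions (save points) that trigger an RDB save; if no save point is met, no RDB save is triggered
### Note: because saving depends on these conditions, in-memory data is not written to disk in real time, so data can be lost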
# Save the DB on disk:
# save <seconds> <changes>
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
# Note: you can disable saving completely by commenting out all "save" lines.
# It is also possible to remove all the previously configured save
# points by adding a save directive with a single empty string argument
# like in the following example:
# save ""
# By default Redis will stop accepting writes if RDB snapshots are enabled
# (at least one save point) and the latest background save failed.
# This will make the user aware (in a hard way) that data is not persisting
# on disk properly, otherwise chances are that no one will notice and some
# disaster will happen.
# If the background saving process will start working again Redis will
# automatically allow writes again.
# However if you have setup your proper monitoring of the Redis server
# and persistence, you may want to disable this feature so that Redis will
# continue to work as usual even if there are problems with disk,
# permissions, and so forth.
stop-writes-on-bgsave-error yes
# Compress string objects using LZF when dump .rdb databases?
# For default that's set to 'yes' as it's almost always a win.
# If you want to save some CPU in the saving child set it to 'no' but
# the dataset will likely be bigger if you have compressible values or keys.
rdbcompression yes
# Since version 5 of RDB a CRC64 checksum is placed at the end of the file.
# This makes the format more resistant to corruption but there is a performance
# hit to pay (around 10%) when saving and loading RDB files, so you can disable it
# for maximum performances.
# RDB files created with checksum disabled have a checksum of zero that will
# tell the loading code to skip the check.
rdbchecksum yes
### Settings for the RDB file name and the directory it is saved in
# The filename where to dump the DB
dbfilename dump.rdb
# The working directory.
# The DB will be written inside this directory, with the filename specified
# above using the 'dbfilename' configuration directive.
# The Append Only File will also be created inside this directory.
# Note that you must specify a directory here, not a file name.
dir /var/lib/redis
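AOF (Append-Only File) records every write operation the server receives. When the server restarts, it replays these commands to rebuild the original data. Each write is appended to the end of the file in the Redis protocol format.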
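- Command append: the command is written to the aof_buf buffer.
- File write: flushAppendOnlyFile is called and decides whether to write the aof_buf buffer to the AOF file.
- File sync: decides whether the data in the in-memory buffers should actually be flushed to disk.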
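appendfsync always # fsync after every write to the AOF file; slowest but safest
appendfsync everysec # fsync once per second; this is the default AOF policy
appendfsync no # never call fsync explicitly; the operating system decides when to flush. Fastest, but writes still in the OS buffer are lost if the machine crashes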
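The three policies differ only in when `fsync` is called after the buffer is written. The sketch below illustrates that with the standard `os` module; `AppendOnlyFile`, `feed` and `flush` are hypothetical names, and the commands are plain bytes rather than real Redis protocol encoding.

```python
import os
import time


class AppendOnlyFile:
    """Toy AOF writer supporting the always / everysec / no fsync policies."""

    def __init__(self, path, policy="everysec", clock=time.monotonic):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.policy = policy
        self.clock = clock
        self.buffer = bytearray()  # plays the role of aof_buf
        self.last_fsync = clock()
        self.fsync_calls = 0

    def feed(self, command: bytes) -> None:
        # Command append: only the in-memory buffer is touched.
        self.buffer += command

    def flush(self) -> None:
        # File write: hand the buffer to the operating system.
        if self.buffer:
            os.write(self.fd, bytes(self.buffer))
            self.buffer.clear()
        # File sync: force the data to disk according to the policy.
        now = self.clock()
        due = self.policy == "everysec" and now - self.last_fsync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.fd)
            self.last_fsync = now
            self.fsync_calls += 1

    def close(self) -> None:
        os.close(self.fd)
```

With `no`, the data still reaches the file through `os.write`; what is skipped is the explicit request to push it from the OS cache to the disk.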
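- While the child process rewrites the AOF, the main process can keep serving command requests.
- The child process holds its own copy of the main process's data. Using a child process rather than a thread keeps the data safe without using locks.
- Because the main process keeps executing writes during the rewrite, the database can diverge from the copy the child is rewriting. To solve this inconsistency, Redis adds an AOF rewrite buffer. It is enabled after the child is forked, and after executing each write command the main process appends that command to both the AOF buffer and the AOF rewrite buffer.
- In other words, while the child performs the AOF rewrite, the main process does three things:
  - executes command requests from clients
  - appends write commands to the AOF buffer, and therefore to the existing AOF file
  - appends write commands to the AOF rewrite buffer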
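The role of the rewrite buffer can be sketched with a single-process model. This is only an illustration: `RewritableLog` is a hypothetical name, a dictionary copy stands in for the forked child's view of the data, and only SET-style commands are modelled.

```python
class RewritableLog:
    """Toy model of AOF rewriting with a rewrite buffer."""

    def __init__(self):
        self.db = {}
        self.aof = []            # existing AOF, one command per entry
        self.rewrite_buf = None  # active only while a rewrite is running
        self._snapshot = None

    def set(self, key, value):
        self.db[key] = value
        command = ("SET", key, value)
        self.aof.append(command)              # normal AOF path
        if self.rewrite_buf is not None:
            self.rewrite_buf.append(command)  # also kept for the new file

    def begin_rewrite(self):
        # Stands in for fork(): the child sees the data as of this moment.
        self._snapshot = dict(self.db)
        self.rewrite_buf = []

    def finish_rewrite(self):
        # One command per key from the snapshot, then the buffered writes.
        new_aof = [("SET", key, value) for key, value in self._snapshot.items()]
        new_aof.extend(self.rewrite_buf)
        self.aof = new_aof
        self.rewrite_buf = None
        self._snapshot = None


def replay(commands):
    """Rebuild a database by replaying SET commands in order."""
    db = {}
    for _, key, value in commands:
        db[key] = value
    return db
```

Replaying the rewritten log yields the same database as the live one, even though writes arrived while the rewrite was in progress.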
############################## APPEND ONLY MODE ###############################
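### Explains why AOF exists; it can be enabled together with RDB, and when both are enabled Redis loads the AOF file first when restoring data
### The configuration file shows that AOF is disabled by default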
# By default Redis asynchronously dumps the dataset on disk. This mode is
# good enough in many applications, but an issue with the Redis process or
# a power outage may result into a few minutes of writes lost (depending on
# the configured save points).
# The Append Only File is an alternative persistence mode that provides
# much better durability. For instance using the default data fsync policy
# (see later in the config file) Redis can lose just one second of writes in a
# dramatic event like a server power outage, or a single write if something
# wrong with the Redis process itself happens, but the operating system is
# still running correctly.
# AOF and RDB persistence can be enabled at the same time without problems.
# If the AOF is enabled on startup Redis will load the AOF, that is the file
# with the better durability guarantees.
# Please check http://redis.io/topics/persistence for more information.
appendonly no
# The name of the append only file (default: "appendonly.aof")
appendfilename "appendonly.aof"
### Controls how often data is flushed to disk; the default is once per second, meaning at most about one second of writes is lost
# The fsync() call tells the Operating System to actually write data on disk
# instead of waiting for more data in the output buffer. Some OS will really flush
# data on disk, some other OS will just try to do it ASAP.
# Redis supports three different modes:
# no: don't fsync, just let the OS flush the data when it wants. Faster.
# always: fsync after every write to the append only log. Slow, Safest.
# everysec: fsync only one time every second. Compromise.
# The default is "everysec", as that's usually the right compromise between
# speed and data safety. It's up to you to understand if you can relax this to
# "no" that will let the operating system flush the output buffer when
# it wants, for better performances (but if you can live with the idea of
# some data loss consider the default persistence mode that's snapshotting),
# or on the contrary, use "always" that's very slow but a bit safer than
# everysec.
# More details please check the following article:
# http://antirez.com/post/redis-persistence-demystified.html
# If unsure, use "everysec".
# appendfsync always
appendfsync everysec
# appendfsync no
# When the AOF fsync policy is set to always or everysec, and a background
# saving process (a background save or AOF log background rewriting) is
# performing a lot of I/O against the disk, in some Linux configurations
# Redis may block too long on the fsync() call. Note that there is no fix for
# this currently, as even performing fsync in a different thread will block
# our synchronous write(2) call.
# In order to mitigate this problem it's possible to use the following option
# that will prevent fsync() from being called in the main process while a
# BGSAVE or BGREWRITEAOF is in progress.
# This means that while another child is saving, the durability of Redis is
# the same as "appendfsync none". In practical terms, this means that it is
# possible to lose up to 30 seconds of log in the worst scenario (with the
# default Linux settings).
# If you have latency problems turn this to "yes". Otherwise leave it as
# "no" that is the safest pick from the point of view of durability.
no-appendfsync-on-rewrite no
### Sets the conditions that trigger an AOF rewrite
# Automatic rewrite of the append only file.
# Redis is able to automatically rewrite the log file implicitly calling
# BGREWRITEAOF when the AOF log size grows by the specified percentage.
# This is how it works: Redis remembers the size of the AOF file after the
# latest rewrite (if no rewrite has happened since the restart, the size of
# the AOF at startup is used).
# This base size is compared to the current size. If the current size is
# bigger than the specified percentage, the rewrite is triggered. Also
# you need to specify a minimal size for the AOF file to be rewritten, this
# is useful to avoid rewriting the AOF file even if the percentage increase
# is reached but it is still pretty small.
# Specify a percentage of zero in order to disable the automatic AOF
# rewrite feature.
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
### Some optimizations for loading persisted data from the AOF
# An AOF file may be found to be truncated at the end during the Redis
# startup process, when the AOF data gets loaded back into memory.
# This may happen when the system where Redis is running
# crashes, especially when an ext4 filesystem is mounted without the
# data=ordered option (however this can't happen when Redis itself
# crashes or aborts but the operating system still works correctly).
# Redis can either exit with an error when this happens, or load as much
# data as possible (the default now) and start if the AOF file is found
# to be truncated at the end. The following option controls this behavior.
# If aof-load-truncated is set to yes, a truncated AOF file is loaded and
# the Redis server starts emitting a log to inform the user of the event.
# Otherwise if the option is set to no, the server aborts with an error
# and refuses to start. When the option is set to no, the user requires
# to fix the AOF file using the "redis-check-aof" utility before to restart
# the server.
# Note that if the AOF file will be found to be corrupted in the middle
# the server will still exit with an error. This option only applies when
# Redis will try to read more data from the AOF file but not enough bytes
# will be found.
aof-load-truncated yes
# When rewriting the AOF file, Redis is able to use an RDB preamble in the
# AOF file for faster rewrites and recoveries. When this option is turned
# on the rewritten AOF file is composed of two different stanzas:
# [RDB file][AOF tail]
# When loading Redis recognizes that the AOF file starts with the "REDIS"
# string and loads the prefixed RDB file, and continues loading the AOF
# tail.
aof-use-rdb-preamble yes
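The automatic rewrite rule described for `auto-aof-rewrite-percentage` and `auto-aof-rewrite-min-size` in the configuration above can be written as a short check. The function name and parameter names are hypothetical, and the integer arithmetic is an assumption about how the growth percentage is computed.

```python
def should_rewrite_aof(current_size, base_size, percentage=100,
                       min_size=64 * 1024 * 1024):
    """Decide whether an automatic AOF rewrite should start.

    current_size: current AOF size in bytes
    base_size: AOF size after the latest rewrite (or at startup)
    percentage: growth threshold; 0 disables automatic rewrites
    min_size: the AOF must be larger than this before a rewrite is considered
    """
    if percentage == 0 or current_size <= min_size:
        return False
    base = base_size or 1  # avoid dividing by zero for an empty base
    growth = current_size * 100 // base - 100
    return growth >= percentage
```

With the defaults shown above, a file that was 65 MB after the last rewrite triggers a new rewrite once it reaches 130 MB.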
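- When the SAVE or BGSAVE command creates an RDB file, the keys in the database are checked and expired keys are not written to the file.
- When an RDB file is loaded, its keys are checked as well and expired keys are skipped.
- Master versus replica: a master checks for expired keys both when generating and when loading an RDB file, so expired keys are neither written nor loaded. A replica does not check expiration when loading, so every key is loaded.
- If a key has expired but has not yet been removed by lazy or periodic deletion, the AOF file is not affected by it (the key stays in the file). Once the expired key is deleted, a DEL command is appended to record the deletion explicitly.
- When the AOF file is rewritten, the keys in the database are checked and expired keys are not written to the new file.
- Master versus replica: the master deletes expired keys and explicitly sends a DEL command to its replicas. A replica does not delete expired keys on its own.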
VI. How to sync data
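- The master accepts write requests (it can also serve reads, but is usually used only for writes).
- Replicas accept read requests.
- A replica's data is copied from the master. Master and replica hold the same data, possibly with some delay.
- Read/write splitting (the master handles writes, replicas handle reads).
- High availability (if one replica goes down, the others keep serving requests and the service is unaffected).
- Higher concurrency (every replica can serve reads, so read QPS goes up).
- Sync: bring the replica's database state up to the master's database state.
- Command propagation: when the master's database state is modified and master and replica become inconsistent, the master propagates the write commands so that both return to a consistent state.
- Initial sync, complete: the replica has never replicated any master and synchronizes the full dataset from the master.
- Initial sync, incomplete: the replica has never replicated any master and has synchronized only part of the data from the master.
- Resync after a disconnection, following a complete initial sync and during command propagation: replication between master and replica is interrupted by a network problem; the replica reconnects automatically and continues replicating the master.
- The replica runs the SLAVEOF command (REPLICAOF since Redis 5) with the master's IP address and port and tries to open a socket connection to the master.
- It sends PING to check that the master can process commands, and sends credentials if authentication is required.
- The master records the replica's IP address and port, and the connection is established.
- The replica sends the PSYNC command to the master.
- On receiving PSYNC, the master runs BGSAVE to generate an RDB file in the background, and uses a buffer to record every write command executed from that point on.
- When BGSAVE finishes, the master sends the RDB file to the replica. The replica receives and loads the RDB file, bringing its database to the state the master had when BGSAVE started.
- The master sends all buffered write commands to the replica, which executes them so that the data becomes eventually consistent.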
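The full synchronization steps above (snapshot, buffer, replay) can be sketched in a few lines. `Master`, `Replica` and their methods are hypothetical names; a dictionary copy stands in for the RDB file, and no networking is modelled.

```python
class Master:
    """Toy master that buffers writes while a snapshot is being transferred."""

    def __init__(self):
        self.db = {}
        self.pending = None  # buffered writes; None when no sync is running

    def write(self, key, value):
        self.db[key] = value
        if self.pending is not None:
            self.pending.append((key, value))

    def start_full_sync(self):
        # Stands in for BGSAVE: capture the state and start buffering writes.
        self.pending = []
        return dict(self.db)

    def drain_buffer(self):
        commands, self.pending = self.pending, None
        return commands


class Replica:
    def __init__(self):
        self.db = {}

    def load_snapshot(self, snapshot):
        self.db = dict(snapshot)

    def apply(self, commands):
        for key, value in commands:
            self.db[key] = value
```

After the snapshot is loaded the replica reflects the master as of the moment the snapshot was taken; applying the buffered commands closes the remaining gap.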
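- The replication offsets of the master and the replica
- The master's replication backlog buffer
- The server's run ID
- Each time the master propagates N bytes, it adds N to its own replication offset.
- Each time a replica receives N bytes from the master, it adds N to its own replication offset.
The run ID is used simply to check whether the IDs match. If they differ, the master the replica was replicating before the disconnection and the master it is now connected to are two different servers, and a full resynchronization is performed.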
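How the offset, the backlog and the run ID combine into a resync decision can be sketched as follows. This is a simplified model: `ReplicationBacklog` and its `psync` method are hypothetical, the backlog capacity is arbitrary, and the replica is assumed to report the number of bytes it has already received.

```python
from collections import deque


class ReplicationBacklog:
    """Toy master-side state for deciding between partial and full resync."""

    def __init__(self, run_id, capacity=1024):
        self.run_id = run_id
        self.offset = 0                    # master replication offset
        self.buf = deque(maxlen=capacity)  # fixed-size FIFO of recent bytes

    def propagate(self, data: bytes) -> None:
        self.buf.extend(data)
        self.offset += len(data)

    def psync(self, replica_run_id, replica_offset):
        oldest = self.offset - len(self.buf)  # first offset still in the backlog
        same_master = replica_run_id == self.run_id
        in_backlog = oldest <= replica_offset <= self.offset
        if not (same_master and in_backlog):
            return ("FULLRESYNC", None)
        missing = list(self.buf)[replica_offset - oldest:]
        return ("CONTINUE", bytes(missing))
```

A replica that was offline only briefly gets just the bytes it missed; one that fell behind further than the backlog reaches, or that last followed a different master, has to start over with a full resync.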
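During the command propagation phase, a replica by default sends the command REPLCONF ACK <replication_offset> to the master once per second. This serves to:
- check the network state between master and replica
- support the min-slaves options (min-slaves-to-write and min-slaves-max-lag)
- detect lost commands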
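The three uses of the acknowledgement can be sketched as one check on the master side. Everything here is illustrative: `check_replicas` is a hypothetical function, `acks` is an assumed map from replica name to its last acknowledged offset and the time that acknowledgement arrived, and the thresholds play the role of the min-slaves options.

```python
def check_replicas(master_offset, acks, now, min_replicas=1, max_lag=10):
    """Evaluate replica acknowledgements.

    acks: replica name -> (acked_offset, time_of_last_ack)
    Returns (accept_writes, missing), where missing maps each lagging
    replica to the number of bytes it has not acknowledged yet.
    """
    # Network state: a replica is healthy if it acknowledged recently enough.
    healthy = [name for name, (_, seen) in acks.items() if now - seen <= max_lag]
    # min-slaves style guard: refuse writes with too few healthy replicas.
    accept_writes = len(healthy) >= min_replicas
    # Lost commands: an acked offset below the master's means data is missing.
    missing = {
        name: master_offset - offset
        for name, (offset, _) in acks.items()
        if offset < master_offset
    }
    return accept_writes, missing
```

When a replica's acknowledged offset trails the master's, the master knows which bytes to resend from its backlog.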
