Just come back from MYSQL Sunday sessions of 2010 Oracle Open World. Share some thoughts while memory is still fresh, the two Facebook sessions are pretty good. Harrison shared some insight of how Facebook using MYSQL replication from operation perspective. Mark's session is similar to the one he did in MYSQL Conf this April.
Global Transaction IDs
We recently experienced some network issue between our Data Center and Cloud provider. Finding the right binlog position for slaves in cloud after network issue some time can be tricky. And in many cases, the relylog is totally toasted, we need to change master to previous position based on exec_master_log_pos. The thing is we can't guarantee the integrity of slave data because some corrupted statement could get applied. In most case, the slave SQL thread stopped because it reads invalid statement from corrupted relylog. Who's to say there's no corrupted 'valid' statement applied?
Well, we are still considered lucky in this case as master is operational and all binlogs are in good shape. Image if Master crashes and we need to prompt one slave to new master and sync other slaves to the new master. Theoretically we need to find a slave that closest to master and prompt it to master and then sync the rest slaves. It’s easy saying than to be done in real world crashes. Because MYSQL didn’t provide an easy mapping between master BINLOG position and slave Rely LOG position. On top of that, each slave maintain its own rely log that means the rely log number and position can’t be shared globally among slaves. This makes the tasks to sync all slaves with new master particularly difficult if not impossible. What Facebook and Google MYSQL team was using is a Global Transaction ID (similar to SCN in Oracle), they use this Global Transaction ID to map to rely log and binlog position, this makes slave master synchronization a lot easier. Unfortunately, this Global Transaction ID only exists in Mysql Google patch and Facebook patch. Not in main line MySQL.
DRBD Master
The second thought is derived from first, switch between master and slaves is not an easy task especially with multi-slaves setup. It’s particularly difficult when master crashes instead of graceful shutdown. Even with Global Transaction ID it’s still involve manual work and prolonged downtime. So some users actually setup protected masters using DRBD. http://www.drbd.org/ DRBD is a pair of DB in a cluster that has block to block mirroring. If case of crashes, the other one can quickly take over with minimum data loss.
Thus we got master protected and guaranteed. We can then build slaves against the cluster. Greatly reduced the need to synchronize slaves as master pretty much crash free.
Of course this is not solution for everything, by eliminating the need to switch between master and slave it also loss some advantage of master slave rotation. For example, we can’t run expensive DDL like ALTER TABLE on slaves and switch to master and do the same DDL on former master. Because in this case, slave is not in the same league as master.
Seconds behind master
I think most of you already noticed this is not a reliable indicator to determine if slave is running behind master. The problem is because this indicator is not a heartbeat indicator. It’s only updated when there’s update events received by slaves. So Slave will happily register 0 seconds behind master even after it’s not getting any event for a long time. In some case it might be legitimate that there’s really no event on master and a lot of time it’s because we have network issue between master and slave, the connection simply got dropped. The solution is update a heartbeat table and check the difference manually. MySQL 5.5 has built in Heartbeat mechanism which definitely a big plus for MYSQL replication.