posted by Sakila The Librarian
on
Mon 07 Sep 2009 12:11 UTC
Tags:
Over-the-Top Tales from the Trenches.Motto: Bringing order to the chaos of every day DBA life.So you have got your nice MySQL Master-Slave replication pair setup. Everything is sweet, then the master dies/restarts or you have a slightly extended network outage.Your monitoring software (in our case Avail) fires off a page and you are rudely interrupted from reading the Pythian blog.These real world interruptions, what can I say... it pays the bills.Anyway being the rounded DBA or enlightened DBA as Babette would say, you are capable of handling any type of database. You log into the machine and check out why the slave threw an error or if your monitoring is slow, why the slave is lagging by 2 hours.You run SHOW SLAVE STATUS\\Gmysql> show slave status \\G*************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: x.x.x.x Master_User: rep Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin.000480 Read_Master_Log_Pos: 690470773 Relay_Log_File: db2-relay-bin.000028 Relay_Log_Pos: 683977007 Relay_Master_Log_File: mysql-bin.000480 Slave_IO_Running: Yes Slave_SQL_Running: No Replicate_Do_DB: Replicate_Ignore_DB: avail,avail Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table:Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted \t\t(you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted \t\t(you can check this by running 'mysqlbinlog' on the relay log), a network problem, \t\tor a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, \t\tyou will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Skip_Counter: 0 Exec_Master_Log_Pos: 126 Relay_Log_Space: 690471192 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: NULL1 row in set (0.00 sec)The bits which are important are in red.Now the normal procedure for this kind of error is to get the Exec_Master_Log_Pos and tell the slave to restart from there. Using a command like: