MySQL High Availability : 1. Replication 2. Cluster 3. DRBD (Distributed Replicated Block Device) and Heartbeat (3rd party solution) Replication: - Simple Case: Single master - Multiple Slaves - Asynchronous - No guarantee about status of replication of data Advantages: - Can be done over slower links - Can spread load across multiple slaves for heavy read env - Can be used for backup solutions : * online (slave has active data) * offline (can bring down slave, take file system backup and restart to catchup with master) Disadvantages: - Writes can not be spread. You can use multi-master config to address this. - No write guarantee and availability of data at slave. This may be needed for an web app. MySQL Cluster: - Synchronous - Can read/write from any node and info will be distributed to other nodes. - Max 255 nodes per cluster Advantages: - Gurantee of data copy after transaction commit. (Transaction may be aborted during failure) - Provides automatic failover between nodes. - Recommended for very high availability: e.g. Telecom and Banks Disadvantages: - Supported only on limited platforms (which of them ?) Tip: You can replicate one cluster to another cluster across geographical locations DRBD : Distributed Replicated Block Device - DRBD is distributed storage system on Linux. - Uses DRBD logical block device (which uses primary/secondary physical device blocks across network) - free GNU software - similar to RAID but software based - implemented as kernel level and user level software. - defines its own cluster management process to promote a secondary node to primary node during failure. - often used with Heartbeat cluster manager DRBD with MySQL: - Uses virtual block device replicated from primary server to secondary server - Only provides synchronous write replication of datablocks. Secondary server not available for read while datablocks are being replicated. Hence there is no load balancing. Memcached: - simple yet highly-scalable key-based cache that stores data and objects - Application should look into memcached cache first. Load from MySQL only if needed Heartbeat: - software solution on Linux. Use it for automatic failover. - Typically used with MySQL replication or DRBD solution - Can support automated IP failover for replication or DRBD Note: MySQL not yet integrated with SunCluster solution Tip: Use dedicated network interfaces for replication nodes (for DRBD or otherwise) like cross-over ethernet cable for 1-1 connection between servers. Tip: Linux Supports bonded ethernet network interfaces. There is one virtual bond0 network device to which multiple real devices (eth1, eth2, eth3) are attached. If one fails, it uses other ethernet device to communicate with other node. Tip: MySQL Load Balancer : Based on MySQL Proxy. Initial release load balances the reads only. When used with replication, it internally maintains info about master/slave and removes the slaves from available list which lag behind the master too much. This is proxy + monitor plugin which periodically runs commands on slaves to findout the replication status. Replication: There are two core types of replication format: - Statement Based Replication (SBR), which replicates entire SQL statements - Row Based Replication (RBR), which replicates only the changed rows. - Mixed Based Replication (MBR), which is the default mode within MySQL 5.1.12 and later. Tip: mysqlbinlog utility can be used to find out position on binary log. Tip: stop slave; DROP DATABASE my_database; load data from master; start slave Example Replication Setup: Here is what I had to do to set up replication with LOAD DATA FROM MASTER: 1. Edit the my.cnf file on the slave and on the master server: master my.cnf: [mysqld] server-id = 1 log-bin slave my.cnf: [mysqld] server-id = 2 master-host = master-host.net master-user = repl master-password = repl_pass replicate-do-db = database1 replicate-do-db = database2 log-warnings Following options are only applicable to master: --binlog-do-db=db1 --binlog-ignore-db=db1 --binlog_format = row|statement|mixed Following only to slave: --replicate-do-db=db1 --replicate-ignore-db=db1 --replicate-do-table=db1.t1 --replicate-do-table=db1.t2 --replicate-rewrite-db=src_db->renamed_db Effect of Options: Option --binlog-do-db (for master) and --replicate-do-db (for slave) : Stmt based logging : The current database (as selected by USE) and the Update table should match for this to do binlog/replication. Row based logging: Default database has no effect. Only table updates belonging to specified database, will be logged/replicated. Option --binlog-ignore-db and --replicate-ignore-db : Stmt based logging: Do not replicate current stmt if current db (as selected by USE) matches the specified db. Row based logging: Default db has no effect. Don't replicate tables belonging to specified database. Option --replicate-do-table=db_name.tbl_name Replicates only specified tables. Useful even in cross-db updates. Option --replicate-rewrite-db=from_name->to_name Useful only in stmt based logging and single DB updates. Changes the default database (selected by USE) to the specified name. Name translation is done first before evaluating replicate-* rules. Other master specific options: auto_increment_increment, auto_increment_offset Other slave specific startup options: --log-slave-updates --log-warnings[=level] (Default:1) disable this by: --skip-log-warnings --read-only --relay-log=file_name The default basename is host_name-relay-bin. --relay-log-index=file_name Filtering Rules: Rules are biased towards accepting the update at slave. For efficiency reasons, the first match of db level options will be accepted first. This means, you can't say "I want all of this db, except these tables." At slave: - Translate as per --replicate-rewrite-db option if specified. - Match replicate-do-db ? Do the update & exit; - Match replicate-ignore-db? Ignore update & exit; - Match replicate-do-table ? Do & exit; - Match replicate-ignore-table ? Ignore & exit; - Match replicate-do-table ? Do & exit; - Match replicate-wild-do-table ? Do & exit; - Match replicate-wild-ignore-table ? Ignore & exit; - If more tables involved in current stmt, Go back to table level checking for other tables. - If there is no match so far, we should Do & Exit -- However -- If there were replicate-do[-wild]-table option specified, then replication is restricted to only those tables; So Ignore & Exit; end 2. Restart both the slave and the master. 3. On the master do: GRANT SUPER,REPLICATION CLIENT,REPLICATION SLAVE,RELOAD ON *.* TO repl@"%" IDENTIFIED BY 'repl_pass'; 4. On the slave do: LOAD DATA FROM MASTER; start slave; // or slave start ??? The slave should now download the databases specified with replicate-do-db in the slaves my.cnf, and be replicating new data. FAQ: How to check if incident event writtent to binlog ? [ SHOW BINLOG EVENTS [ FROM 0|73|etc] ; ] How to use skip_gap_event option ?; To skip gap event at master: RESTORE FROM 'image_file_name' SKIP_GAP_EVENT; To skip gap event at slave: SET GLOBAL sql_slave_skip_counter=1; START SLAVE; Tip: For the greatest durability and consistency in a replication setup using InnoDB with transactions: use innodb_flush_log_at_trx_commit=1 and sync_binlog=1 in the master my.cnf file. What about replication across storage engines ? How to replay the binlog ? Tip: Multi-level replication: You do not have to enable binary logging on the slave for replication to be enabled. However, if you enable binary logging on the slave then you can use the binary log for data backups and crash recovery on the slave, and also use the slave as part of a more complex replication topology (for example, where the slave acts as a master to other slaves). Get Master Replication Info: mysql> FLUSH TABLES WITH READ LOCK; For InnoDB tables, note that FLUSH TABLES WITH READ LOCK also blocks COMMIT operations. Leave the client running otherwise locks will be released. # Use the SHOW MASTER STATUS statement to determine the current binary log filename and offset on the master: mysql > SHOW MASTER STATUS; +---------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +---------------+----------+--------------+------------------+ | mysql-bin.003 | 73 | test | manual,mysql | +---------------+----------+--------------+------------------+ [ SHOW BINLOG EVENTS [ FROM 0|73|etc] ; ] # Snapshot using mysqldump: shell> mysqldump --all-databases --lock-all-tables >dbdump.db An alternative to using a bare dump, is to use the --master-data option, which automatically appends the CHANGE MASTER statement required on the slave to start the replication process. shell> mysqldump --all-databases --master-data >dbdump.db Creating a Data Snapshot Using Raw Data Files If you are using InnoDB tables, you should use the InnoDB Hot Backup tool (nonfree) to obtain a consistent snapshot. To create a raw data snapshot of MyISAM tables you can use standard copy tools such as cp For InnoDB, all tables in all databases are stored in a single file unless you have the innodb_file_per_table option enabled You may want to specifically exclude the following files from your archive: * Files relating to the mysql database. * The master.info file. * The master's binary log files. * Any relay log files. To get the most consistent results with a raw data snapshot you should shut down the server during the process, as below: 1. Acquire a read lock and get the master's status. 2. In a separate session, shut down the MySQL server: shell> mysqladmin shutdown 3. Make a copy of the MySQL data files. Examples are shown below for common ways to do this - you need to choose only one of them: shell> tar cf /tmp/db.tar ./data shell> zip -r /tmp/db.zip ./data shell> rsync --recursive ./data /tmp/dbdata 4. Start up the MySQL instance on the master. Tip: Raw data is useless without the master status (position id) if this needs to be reused to replay later binlog events. Note: If you are not using innodb, you need not shutdown server; just get read locks, master status; then copy data files; then free the lock from client: mysql> UNLOCK TABLES; Once you have created the archive or copy of the database, you will need to copy the files to each slave before starting the slave replication process. Introducing Additional Slaves to an Existing Replication Environment: - You can tar/untar an existing slave and start a new slave. - You should use explicit options to avoid problems: --relay-log (existing_slave_hostname-relay-bin) --relay-index-log (existing_slave_hostname-relay-bin.index) - copy master.info , relay.info from existing to new slave. - define new server-id for new slave (in my.cnf) - start slave Setting the Master Configuration on the Slave mysql> CHANGE MASTER TO -> MASTER_HOST='master_host_name', -> MASTER_USER='replication_user_name', -> MASTER_PASSWORD='replication_password', -> MASTER_LOG_FILE='recorded_log_file_name', -> MASTER_LOG_POS=recorded_log_position; Above command creates/updates master.info file at slave. Note Replication cannot use Unix socket files. You must be able to connect to the master MySQL server using TCP/IP. Tip: Until 5.1.4 (and earlier) only statement based replication was done. From 5.1.8 (and later) mixed-format logging can be enabled. From 5.1.12, the mixed-format logging is the default. Starting with MySQL 5.1.20, the binary logging format is determined in part by the storage engine being used and the statement being executed. Once the determination of the possible logging formats required by the statement is complete it is compared to the current BINLOG_FORMAT setting. Tip: The master.info file parameters always override the my.cnf values. You should use: "change master to ..." command to change the master.info contents. Checking Replication Status mysql> SHOW SLAVE STATUS\G On the master, you can check the status of slaves by examining the list of running processes. Slaves execute the Binlog Dump command: mysql> SHOW PROCESSLIST \G; *************************** 4. row *************************** Id: 10 User: root Host: slave1:58371 db: NULL Command: Binlog Dump Time: 777 State: Has sent all binlog to slave; waiting for binlog to be updated Info: NULL f you have used the --report-host option, then the SHOW SLAVE HOSTS statement will show basic information about connected slaves: mysql> SHOW SLAVE HOSTS; +-----------+--------+------+-------------------+-----------+ | Server_id | Host | Port | Rpl_recovery_rank | Master_id | +-----------+--------+------+-------------------+-----------+ | 10 | slave1 | 3306 | 0 | 1 | +-----------+--------+------+-------------------+-----------+ 1 row in set (0.00 sec) mysql> STOP SLAVE IO_THREAD; If necessary, you can start either the IO_THREAD or SQL_THREAD threads individually. How to do Point-in-time recovery: Restore the previous night (or earlier snapshot) backup. Then replay binlog upto specified point in time: shell> mysqlbinlog --stop-date="2005-04-20 9:59:59" \ /var/log/mysql/bin.123456 | mysql -u root -p shell> mysqlbinlog /var/log/mysql/bin.123456 > /tmp/mysql_restore.sql You can also use log positions (--start-position and --stop-position) to exactly replay. First you can examine the log positions by dumping them in mysql_restore.sql file and then run: shell> mysqlbinlog --start-position="368315" /var/log/mysql/bin.123456 \ | mysql -u root -p Tip: The binlog stores begin; commit tx info as well. Tip: setting sync_binlog to 1 ensures the binlog is sync'ed to disk on each write. Tip: you should set --innodb_support_xa to 1 to keep innodb and binlog in sync. only at tx commit, the binlog writes it's events to it's log. After restart, if the server finds the binlog is shorter than expected (as compared to Innodb), then it issues error. Replication should be restarted with fresh copy of master data in that case. Tip: The replication master can issue "Gap" event to tell the slave that manual intervention may be required. In that case, SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1; START SLAVE; will continue replication at slave. Don't know all finer details. Tip: binlog need not be enabled on slaves. But must be enabled on master Cluster Replication: - Row based logging must be enabled. set binlog_format=ROW; - SQL node storage engine receives all data node change events and able to populate it's binlog to serve to it's clients. - mysql.ndb_apply_status table must be replicated if you replicate between clusters. This is the table which makes sure the clusters are really in sync. Tip: It is possible to setup 2 replication channels. When 1 replication channel fails, the other one should be manually started (or from application). Tip: In general all data is in memory for clusters. Now it is possible to create table with data in disk (from 5.1.6) using some tablespaces, etc. However all indexes are still in memory. Tip: NDB Cluster supports only the READ COMMITTED transaction isolation level. Tip: LOAD DATA FROM MASTER is not supported in MySQL Cluster. Tip: create index t1_b_idx on t1(b); b is integer. This creates "ordered" index. Tip: MySQL Cluster cannot produce a binlog for tables having BLOB columns but no primary key. Question : Cluster Transaction does it get sync'ed to disk ? No durable commits on disk. Commits are replicated, but there is no guarantee that logs are flushed to disk on commit. All machines used in the cluster must have the same architecture. That is, all machines hosting nodes must be either big-endian or little-endian, and you cannot use a mixture of both. Backup Strategies For Cluster and Otherwise: - Logical Backup : works for all engines. saved in text. - Engine specific solution: inno backup (hot backup utility), ndb backup? * Online versus offline backups * Snapshot backups. e.g. Veritas, LVM (linux logical volume manager) * Full versus incremental backups: Full backup + server's binlog flush logs; cerates new bin log. * Point-in-time recovery: restore backup + selectively apply binlog * Vendor based solutions: ibbackup for innodb Cluster uses cluster utilities for creating/restoring backup Local/Remote Backup: - mysqldump logical backup, can be local or remote. This is online non-blocking backup. (respects isolation level) - mysqlhotcopy - perl script - local copy only for MyISAM and ARCHIVE Online Backup with MySQL. What is it ? Tip: The maximum number of data nodes is 48. The total maximum number of nodes in a MySQL Cluster is 256 This number includes all SQL nodes (MySQL Servers), API nodes (applications accessing the cluster other than MySQL servers), data nodes, and management servers. Tip: Circular replication is supported for Cluster replication starting 5.1.18 Events are properly tagged wrongly tagged with originating server id hence the events are not reapplied to same server again. #### Beginning with MySQL 5.1.24, you should execute the following statement before starting circular replication: mysql> SET GLOBAL SLAVE_EXEC_MODE = 'IDEMPOTENT'; ### Cluster replication and primary keys. In MySQL 5.1.6, only those NDB tables having explicit primary keys could be replicated. This limitation was lifted in MySQL 5.1.7. However, in the event of a node failure, errors in replication of NDB tables without primary keys can still occur, due to the possibility of duplicate rows being inserted in such cases. For this reason, it is highly recommended that all NDB tables being replicated have primary keys. Tip: How to use Blackhole engine for replication ? configure blackhole engine as slave and get binlog from master; Then configure other slaves which pull selected DBs from blackhole engine. This way, blackhole engine acts like relay server. Question: What is this auto increment ? it is column type. Tip: How do you compare Replication and Cluster ? Think of high web serving application for Replication. Think of telecom and bank and HA OLTP load for cluster. Q Ap: What is the best method to load data into the slave, use mysqldump or use "load data from master" LOAD DATA FROM MASTER is deprecated, the best solution is to use mysqldump Q from Manuel: With master/slave replication you mentioned "automatic failover", how can you do that? You can use clustering and HA software such as Linux HA to control the activity of the master. The software can switch over to the slave in case of issues. The client application will get access to the new master using a virtual IP address. Setting up cluster: Node1 : mgmt server Node2: datanode Node3: datanode SQL node can be started anywhere (does not matter). The config.ini : [NDBD DEFAULT] NoOfReplicas=2 [MYSQLD DEFAULT] [NDB_MGMD DEFAULT] [TCP DEFAULT] # default port number for mgmt node is 2202 #portnumber=2202 # Managment Server [NDB_MGMD] # the IP of THIS SERVER HostName=urd03 DataDir=/home/thava/mysql/repo/5/install/cluster # default port number for data node is 1186 # Storage Engines [NDBD] # the IP of the FIRST SERVER (STORAGE NODE) HostName=urd04 DataDir=/home/thava/mysql/repo/5/install/node1 [NDBD] # the IP of the SECOND SERVER (STORAGE NODE) HostName=urd05 DataDir=/home/thava/mysql/repo/5/install/node2 # 2 MySQL Clients # I personally leave this blank to allow rapid changes of the mysql clients; # you can enter the hostnames of the above two servers here. I suggest you dont. [MYSQLD] The my.cnf file: [client] port=3366 socket=/tmp/mysqlthava.sock [mysqld] ndbcluster ndb-connectstring=urd03 port=3366 socket=/tmp/mysqlthava.sock log-bin [mysql_cluster] # the IP of the MANAGMENT (THIRD) SERVER ndb-connectstring=urd03 Do the following: ndb_mgmd -f $PWD/config.ini # This starts mgmt server at node 1 # start data nodes from node2 and node3 ndbd --connect-string="urd03:1186" --initial # next time no --initial # start mysqld mysqld_safe & shell> ndb_mgm show Cluster Rolling Upgrade/Downgrade - supported. You should do it manually one node at a time: - First upgrade/downgrade mgmt node - All data nodes - SQL nodes mysql> show engine ndb status; # displays ndb status It is important to keep in mind that the MySQL grant tables use the MyISAM storage engine. In other words, changes in users and their privileges do not automatically propagate between SQL nodes. It should be done manually. Replication: Step1: Bringup master with following my.cnf: [client] port=3366 socket=/tmp/mysql-thava.sock [mysqld] port=3366 socket=/tmp/mysql-thava.sock datadir=/home/thava/mysql/repo/5/install/hostA-data log-bin=mysql-bin server-id=1 sync_binlog=1 # innodb_flush_log_at_trx_commit=1 Step 2: If you want to create separate user for replication , do this: mysql> GRANT REPLICATION SLAVE, RELOAD, SUPER, REPLICATION CLIENT ON *.* TO 'slave'@'%' IDENTIFIED BY 'slavepass'; FLUSH PRIVILEGES; (or you can use root account) Step 3: mysql> flush tables with read lock ; Query OK, 0 rows affected (0.01 sec) mysql> show master status; +------------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +------------------+----------+--------------+------------------+ | mysql-bin.000002 | 1063| | | +------------------+----------+--------------+------------------+ 1 row in set (0.00 sec) mysql> unlock tables; Step 4: Bringup Slave with different server-id in my.cnf file: (minimize slave options in .cnf file. Use change master command instead): [mysqld] master-host=hostA #These options apply to my.cnf: LOG-SLAVE-UPDATES, etc. stop slave; # if needed. CHANGE MASTER TO MASTER_HOST='urd01', MASTER_USER='slave', MASTER_PASSWORD='slavepass', MASTER_PORT=3366; Load data from master; # cluster tables won't be loaded. CHANGE MASTER TO MASTER_HOST='urd01', MASTER_USER='slave', MASTER_PASSWORD='slavepass', MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=1063, MASTER_PORT=3366; start slave; # now this will catchup from now on. Step 5: Partitioning : - Is a layer on top of most storage engines - Have restrictions on type of partitioning supported depending on SE - CSV, MERGE engines don't support partitioning - NDBCLUSTER supports only KEY based partitioning. In general, it is not possible to control partitioning style in cluster tables. - In 5.1, all partitions of a table should belong to single SE Example: CREATE TABLE ti (id INT, amount DECIMAL(7,2), tr_date DATE) ENGINE=INNODB PARTITION BY HASH( MONTH(tr_date) ) PARTITIONS 6; PARTITION BY KEY(joined) PARTITIONS 6; PARTITION BY RANGE( YEAR(joined) ) ( PARTITION p0 VALUES LESS THAN (1960), PARTITION p1 VALUES LESS THAN (1970), PARTITION p2 VALUES LESS THAN (1980), PARTITION p3 VALUES LESS THAN (1990), PARTITION p4 VALUES LESS THAN MAXVALUE ); PARTITION BY LIST(val)( PARTITION mypart VALUES IN (1,3,5), PARTITION anotherPart VALUES IN (2,4,6) ); ; 17.2.1. RANGE Partitioning 17.2.2. LIST Partitioning 17.2.3. HASH Partitioning 17.2.4. KEY Partitioning 17.2.5. Subpartitioning 17.2.6. How MySQL Partitioning Handles NULL Values