Backup Pointers:
- Architecture of the online backup system (from Rafal): http://forge.mysql.com/wiki/OnlineBackup#Architecture_of_online_backup_system
- MySQL Internals: MySQL Backup (list of WLs etc.): http://forge.mysql.com/wiki/MySQL_Internals_MySQL_Backup
- MySQL Backup Reference Manual: http://dev.mysql.com/doc/mysql-backup/en/index.html
- Replication & Backup Team: https://inside.mysql.com/wiki/Replication

Misc Extra:
- Q3 Goals: https://intranet.mysql.com/~lthalmann/ltplan.pl?action=report_2009_q3
- FAQ for Replication & Backup Team: https://intranet.mysql.com/~lthalmann/ltplan.pl?file=faq.org
- Team Goals: https://intranet.mysql.com/~lthalmann/ltplan.pl?file=projects.org&section=Team-Goals-2009-Q3
- Sprint Proposal: https://intranet.mysql.com/%7Elthalmann/ltplan/sprint_proposal.php
- Pushbuild 2 pushes status: http://pb2.norway.sun.com/web.py?branch=mysql-6.0-backup&template=show_pushes
- Backup Quality Report: https://inside.mysql.com/wiki/BackupQualityReport

Notes from developer manual:

Logical backup:
    select * from yourtable into outfile 'mytable.txt';
To restore:
    load data infile 'mytable.txt' into table yourtable;
The mysqlimport program is just a convenience wrapper that reads my.cnf and runs the LOAD DATA INFILE statement through the mysql client.

Physical backup: how you do a physical backup depends highly on the storage engine:
- For the Memory storage engine, it is better to go with a logical backup.
- For the MyISAM storage engine, copy the files (offline), or use mysqlhotcopy
  (a Perl script to back up a db or table; done online after locking the server;
  supports the MyISAM and Archive engines only). To restore, copy the files back
  to the data directory.
- For InnoDB, use the ibbackup tool (not bundled by default, commercial); use
  ibbackup for restore as well.
- For NDB, use the START BACKUP command (in the ndb_mgm client); ndb_restore for restore. (???)

* Full versus incremental backups
  The server's binary log can be used for incremental backups. Even though MyISAM
  does not support ACID transactions, the binary log can still be used for
  incremental backup and recovery. (?)
* Point-in-time recovery
* Backup scheduling, compression, and encryption
* Table maintenance

How to do an incremental backup?
- Enable binary logging by starting the server with the --log-bin option.
- To recover, apply the relevant binlogs on top of the last full backup, e.g. by
  piping mysqlbinlog output to the mysql client.

What are the different types of logs in MySQL?
  Binary log, general query log, slow query log, error log.
Note: To display a binlog, use mysqlbinlog.
    show variables like '%log%';

    mysqldump --tab=/path/to/some/dir --opt db_name
    mysqldump db_name > backup-file.sql
You can read the dump file back into the server like this:
    shell> mysql db_name < backup-file.sql
Or like this:
    shell> mysql -e "source /path-to-backup/backup-file.sql" db_name
    mysqlhotcopy db_name /path/to/some/dir   (supports MyISAM and Archive engines only)

How to dump the schema only (not data)?
    mysqldump --no-data
// mysqldump writes to stdout in text format.

User Documentation

Reference Manual (notes): http://dev.mysql.com/doc/mysql-backup/en/index.html

-- The BACKUP DATABASE and RESTORE SQL statements.
-- The mysqlbackup program displays info about a backup image file:

shell> mysqlbackup /usr/local/mysql/data/test.bak
Image path: '/usr/local/mysql/data/test.bak'
Image size: 12305 bytes
Image compression: none
Image version: 1
Creation time: 2009-09-08 18:08:43 UTC
Server version: 5.4.4 (5.4.4-beta)
Server byte order: little-endian
-- BACKUP {DATABASE | SCHEMA} { * | db_name [, db_name] ... } TO 'image_file_name'
       [WITH COMPRESSION [COMPRESSION_ALGORITHM [=] algorithm_name]]

BACKUP DATABASE * TO '/tmp/all.backup';
BACKUP DATABASE world TO '/tmp/world.backup';
BACKUP DATABASE db1, db2 TO '/tmp/db1-db2.backup';

mysql> BACKUP DATABASE test TO '/tmp/world.backup';
+-----------+
| backup_id |
+-----------+
|         8 |
+-----------+

Note: backup_id is a server-assigned id; use it to look up the backup_history
table, which contains an entry whose backup_id column has the same value (see
the lookup sketch below).

mysql> RESTORE FROM '/tmp/mybackupfile';

-- BACKUP DATABASE currently does not back up the mysql or INFORMATION_SCHEMA databases.
-- BACKUP DATABASE saves privileges for the backed-up databases in the backup
   image file. The privileges are stored in the form of GRANT statements.
-- Only privileges at the database level or below (table, column, routine) are
   saved. Global privileges are not saved because they are not specific to the
   databases included in the backup.
-- GRANT SELECT, INSERT ON db1.* TO 'someuser'@'localhost'
   The privileges specified by this statement are restored only if the
   'someuser'@'localhost' account exists.
-- Since the "mysql" database is not backed up by the BACKUP command, you can
   back it up using mysqldump:
   shell> mysqldump --databases mysql > path/mysql-db.sql
   mysql> BACKUP DATABASE * TO 'path/other-dbs.bak';
   shell> mysql -u root -p < path/mysql-db.sql
   mysql> RESTORE FROM 'path/other-dbs.bak';
-- The backup system chooses from among the backup engines available to it:
   * There is a default blocking backup engine.
   * A consistent-snapshot engine implements the same kind of backup as that
     made by mysqldump --single-transaction.
   A backup engine is the component that provides backup/restore drivers for a
   storage engine.
-- A backup operation begins at time t1 and ends at time t2, producing a backup
   image that contains the backup (database) state at a time t, where
   t1 < t < t2. The time t is called the validity point of the backup image. It
   is the time at which all storage engines are synchronized for the backup.
   Restoring this image restores the state to what it was at time t.
   Consistency of the backup means that these constraints must hold:
   * Data from transactional tables is included only for committed transactions.
   * Data from nontransactional tables is included only for completed statements.
   * Referential integrity is maintained between all backed-up tables within a
     given backup image.
-- While a backup or restore operation is in progress, modifying the structure
   of database objects is not allowed. The following statements are blocked:
     DROP DATABASE/TABLE/VIEW/FUNCTION/PROCEDURE/EVENT/TRIGGER/INDEX/USER/TABLESPACE
     CREATE DATABASE/TABLE/VIEW/FUNCTION/PROCEDURE/EVENT/TRIGGER/INDEX
     ALTER DATABASE/TABLE/VIEW/FUNCTION/PROCEDURE/EVENT/TABLESPACE
     RENAME TABLE/USER
     GRANT/REVOKE
     TRUNCATE/OPTIMIZE/REPAIR TABLE
   Currently, all statements that change metadata are blocked, even for database
   or table objects that are not included in the backup, but this will be improved.
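The backup_id returned by BACKUP DATABASE can be checked against the history log
described in the next section. A minimal lookup sketch (column names are taken
from the backup_history definition below; the id is the one returned above; a
successful run ends in backup_state 'complete' per the ENUM definition):

    mysql> SELECT backup_state, validity_point_time, total_bytes, backup_file
             FROM mysql.backup_history
            WHERE backup_id = 8;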
-- The server maintains backup_history and backup_progress tables in the mysql
   database; they contain metadata indicating backup status and progress.
-- You can set the log_backup_output system variable to 'TABLE,FILE' to enable
   logging to both (or start the server with --log_backup_output=TABLE,FILE):

mysql> show variables like '%backup%';
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| backup_history_log       | ON                                                      |
| backup_history_log_file  | /home/thava/repo/backup/install/var/backup_history.log  |
| backup_progress_log      | ON                                                      |
| backup_progress_log_file | /home/thava/repo/backup/install/var/backup_progress.log |
| backup_wait_timeout      | 50                                                      |
| backupdir                | /home/thava/repo/backup/install/var                     |
| log_backup_output        | TABLE                                                   |
| secure_backup_file_priv  |                                                         |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.01 sec)

mysql> set log_backup_output='TABLE,FILE';

The backup_history table has this structure:

CREATE TABLE backup_history (
  backup_id BIGINT UNSIGNED NOT NULL,
  process_id INT UNSIGNED NOT NULL,
  binlog_start_pos INT UNSIGNED NOT NULL,
  binlog_file CHAR(64) NOT NULL,
  backup_state ENUM('complete', 'starting', 'validity point', 'running',
                    'error', 'cancel') NOT NULL,
  operation ENUM('backup', 'restore') NOT NULL,
  error_num INT NOT NULL,
  num_objects INT UNSIGNED NOT NULL,
  total_bytes BIGINT UNSIGNED NOT NULL,
  validity_point_time DATETIME NOT NULL,
  start_time DATETIME NOT NULL,
  stop_time DATETIME NOT NULL,
  host_or_server_name CHAR(30) NOT NULL,
  username CHAR(30) NOT NULL,
  backup_file CHAR(100) NOT NULL,
  backup_file_path VARCHAR(512) NOT NULL,
  user_comment VARCHAR(200) NOT NULL,
  command VARCHAR(512) NOT NULL,
  engines VARCHAR(100) NOT NULL
) ENGINE=CSV CHARSET=utf8;

-- FLUSH Syntax
   FLUSH [NO_WRITE_TO_BINLOG | LOCAL] flush_option [, flush_option] ...
   The FLUSH statement has the following MySQL Backup-related flush_option value:
     BACKUP LOGS
   Example: flush backup logs;
   (Used just to sync outstanding writes to the backup log files.) Closes and
   reopens the backup log files. This option was added in MySQL 5.4.4.

-- PURGE BACKUP LOGS [ TO id | BEFORE datetime_expr ]
   PURGE BACKUP LOGS TO 143;
     // remove entries from the backup logs whose backup_id is less than 143
   PURGE BACKUP LOGS BEFORE '2009-08-31 14:32:19';
     // remove entries before this time

-- RESTORE Syntax
   RESTORE FROM 'image_file_name' [option] ...
   option = {OVERWRITE | SKIP_GAP_EVENT}

-- Server options:
   --log_backup_output=TABLE,FILE (or NONE)
   --mysql-backup[={0|1}]          /* set to 1 to enable the backup/restore commands */
   --secure-backup-file-priv=path  /* backup image file must reside in this dir
                                      for backup/restore to work (example below) */
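How the last two server options combine in practice; a sketch assuming the
semantics described above (the directory and database are illustrative, and the
exact error behavior for a rejected path is not shown here):

    shell> mysqld --mysql-backup=1 --secure-backup-file-priv=/var/backups &
    mysql> BACKUP DATABASE test TO '/var/backups/test.bak';
             -- accepted: image is inside the allowed directory
    mysql> BACKUP DATABASE test TO '/tmp/test.bak';
             -- rejected: outside secure-backup-file-priv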
-- Session system variables:

-----
Notes from sql/backup/kernel.cc:

Structure of the MySQL Backup System

              |        Online Backup Module
          (1) |   +---------------+       +------------------------+
  Server ---->|   | Backup Kernel | <-->  | Backup/Restore drivers |
              |   +---------------+  (3)  +------------------------+
  Core   <----|   | Stream layer  |
          (2) |   +---------------+
              |        (4) ^  (5) |
              |            |      |
              +------------|------|----------------------------
                           |      v
                    Backup Stream Library

The backup stream library is autonomous, so that external applications can link
against it to be able to read backup images created by the system. The format
of backup images is described @ref stream_format "here".

The components of the system communicate with each other using well-defined interfaces:
-# @ref KernelAPI "Backup Kernel API"
-# Object Services API and Backup Log API
-# Backup Driver API
-# @ref streamlib "Backup Stream Library API"
-# Backup Stream Library Callback API

Backup kernel API. This is how a backup is performed using the context object:

  Backup_restore_ctx context(thd);   // create context instance
  Backup_info *info= context.prepare_for_backup(location, orig_loc); // prepare for backup
  // select objects to back up
  info->add_all_dbs(); or info->add_dbs();
  info->close();          // indicate that selection is done
  context.do_backup();    // perform backup
  context.close();        // explicit clean-up
  } // if code jumps here, context destructor will do the clean-up automatically

Similar code is used for restore (a bit simpler, as selective restores are not
supported yet):

  Backup_restore_ctx context(thd);   // create context instance
  Restore_info *info= context.prepare_for_restore(location, orig_loc); // prepare for restore
  context.do_restore();   // perform restore
  context.close();        // explicit clean-up
  } // if code jumps here, context destructor will do the clean-up automatically

------
Note:
-- backup uses its own backup::Mem_allocator class.
-- Heavy use of String, LEX_STRING.
-- obs is the backup services namespace, mostly defined in si_objects.cc.

Notes on kernel.cc functions:

Backup_restore_ctx::Backup_restore_ctx(thd)   // constructor
{
  Save thd->options to m_thd_options;
  MYSQL_BACKUP_LOG *log_handler = logger.get_backup_history_log_file_handler();
}

Notes on the Backup_restore_ctx class:

// Prepare for backup and restore operations.
Backup_restore_ctx::prepare(backupdir, location string)
{
  if a backup is already running, give an error;
  mem_alloc = new Mem_allocator();
  obs::bml_get(m_thd);   // Get backup meta lock, which blocks all DDL operations.
}

// Prepare actions specific to backup only.
Backup_info* Backup_restore_ctx::prepare_for_backup(...)
{
  Logger::init();
  Report starting time;
  prepare();   // Do the common prepare defined above.
  Output_stream *s= new Output_stream(...);   // Create output stream
  // Create all info about what you are backing up. Also called the "backup catalogue".
  Backup_info *info= new Backup_info(*this, m_thd);
  // Remember whether the binlog is turned on. We will store the validity
  // point's binlog pos in the image.
  info->flags |= (is the binlog turned on?);
  info->save_start_time(now);
}

Restore_info* Backup_restore_ctx::prepare_for_restore(...)
{
  // Block replication from starting.
  obs::block_replication(TRUE, "RESTORE");
  // Refuse to run restore on a slave:
  if (obs::is_slave()) give error;
  Report start time;
  prepare();   // Do the common prepare actions defined earlier.
  // Open input stream
  Input_stream *s= new Input_stream(*this, &m_path);
  // Create all info about what you are restoring. Also called the "restore catalogue".
  Restore_info *info= new Restore_info(*this, m_thd);   // reports errors
  m_catalog= info;
  info->save_start_time(when);
  // Read header from input stream
  read_header(*info, *s);
  // Move to next chunk of input stream
  ret= s->next_chunk();
  // Read remaining catalogue info (after the header) from the input stream.
  ret= read_catalog(*info, *s);
  // Move to next chunk
  ret= s->next_chunk();
  // Disable new slave connections for replication
  obs::disable_slave_connections();
  // Write a "RESTORE_EVENT" to the binlog (also called an "incident" or GAP event),
  // so that existing slaves stop replication after hitting it.
  obs::write_incident_event();
  // Turn off the binlog during restore:
  obs::engage_binlog(FALSE);
}

------
Backup command pseudo code:

sql_parse:
  Lex->db_list    => contains the list of databases.
  Lex->value_list => contains list of items ???

mysql_execute_command("backup database test to '/home/thava/tmp/test.bak2'");

execute_backup_command()
{
  trans_commit_implicit();   // Implicit tx.
  close_thread_tables();     // Release metadata locks.
  If not in locked tables mode, thd->mdl_context.release_all_locks();   // ???
  bml_enter(thd);            // Check (and acquire?) backup meta lock.
  thd->backup_in_progress = sql_command;   // backup or restore
  Backup_restore_ctx context(thd);   // Initialize backup history log file handler.

  From backup/kernel.cc, to report an error do the following:
    DBUG_RETURN(send_error(context, ER_BACKUP_CONTEXT_CREATE));   // normal error
    fatal_error(report_killed() ? ER_QUERY_INTERRUPTED : ER_BACKUP_LOGGER_INIT);   // fatal error
    m_log.report_error(ER_BACKUP_CANNOT_INCLUDE_DB, db_name.c_ptr());

  Backup_info *info= context.prepare_for_backup(...)
  {
    If another backup/restore operation is running, report an error;
    Check if the backup location is valid.
    mem_alloc = new Mem_allocator();   // Separate mem allocator for backup
    obs::bml_get(m_thd);   // Get backup meta lock (blocks DDL); si_objects.cc
    Output_stream *s = new Output_stream(...);
    Backup_info *info= new Backup_info(...)
    {
      Create a linked list of backup engines: default, ConsistentSnapshot, nodata;
        // used to select a backup engine later
      Mark and remember that the server binlog is enabled -- we will store the
        VP's binlog pos in the image.
      Mark the start time.
    }
  } // End prepare_for_backup

  info->add_dbs()
  {
    Get the list of databases (specified) and add it to the backup info.
    For each db:
      // Execute a query using run_service_interface_sql() to get info about
      // the DB from information schema:
      obs::Obj *obj= get_database_stub(thd, &db_name);
      check_db_existence()
      {
        run query: "show create database test"; & confirm
      }
      Db *db= add_db(obj)
      {
        check_access(m_thd, BACKUP_ACL, ...);   // Check BACKUP privileges for the user.
        obs::check_user_access(m_thd, name)
        {
          // Check if the user has access to all DB objects, such as tables,
          // triggers, events and routines:
          Get the number of (db) objects by running a select query without
          privilege elevation and with privilege elevation. If the results
          differ, the user lacks privileges.
        }
        Db *db= Image_info::add_db(obs::Obj*, pos);
      }
      add_db_items(*db)
      {
        Obj_iterator *it= get_db_tables(m_thd, &db.name())
        {
          // si_objects.cc: run a select query and get info:
          create_row_set_iterator(thd, ..query...);
        }
        For each table t
        {
          add_table(db, obj)
          {
            backup::Snapshot_info *snap= find_backup_engine(t)
            {
              Get the storage engine reference for this table.
              // See backup_info.cc: get_storage_engine().
              // It opens the table as a temp table to get SE info !!!
              If it is a partitioned table, consider the SE of the underlying
                table; native drivers don't support partitioning.
              Get the handlerton from the SE reference.
              hton->get_backup_engine provides the native backup engine.
              Go through the list of backup engines and select the appropriate one.
            }
            Image_info::add_table();   // Backup_info uses Image_info.
          }
          obj= find_tablespace_for_table();
          If there is a tablespace for this table, add the tablespace.
        } // End for each table
        Add all stored procedures & functions for the database:
          get_db_stored_procedures() && add_objects(db, BSTREAM_IT_SPROC, ...);
          get_db_stored_functions() && add_objects(db, BSTREAM_IT_SFUNC, ...);
        Add all views defined in this database.
        Add all DB events in this database.
        Add all DB triggers in this database.
        Add all privileges information (using information_schema.schema_privileges).
      } // End add_db_items()
  } // End Backup_info::add_dbs

  if (info->db_count == 0)
    context.report_error(ER_BACKUP_NOTHING_TO_BACKUP);

  context.do_backup()
  {
    write_preamble(info, output_stream)
    {
      bstream_wr_header();
        // flags | start_time | total_snapshots | server_version | snapshot_descriptors
        // One snapshot descriptor per storage engine; includes total no. of tables, etc.
      bstream_wr_catalogue();   // List of db names, table names, etc.
      bstream_wr_metadata()
      {
        For each global db item (including databases) do
        {
          A single entry has the following format:
            item_entry: type flags catalog_pos [extra_data(unused)] [object_metadata]
            object_metadata: create_stmt_string_for_table, etc.
            catalog_pos: the position index (offset?) within the catalogue header.
          Write: type, flags, catalog_pos;
            [ bstream_wr_item_def() => bstream_wr_meta_item() ]
          Run query: show create database $dbname and write it to the output stream;
          First item, database (global item): create database test /* charset latin */;
          Second item, table t: "use test; create table t (i integer) engine=MyISAM ..."
          Item privileges are currently not written.
        }
      } // End bstream_wr_metadata()
    } // End write_preamble

    write_table_data()
    {
      Create Backup_pump() for the "MyISAM" snapshot.
      snap.get_backup_driver() => returns the backup driver as per the backup
        engine API. (See myisam_backup_engine.cc.)
      Add this pump to the scheduler: sch.add(p)
      {
        p->begin()
        {
          // begin pumping from Backup::begin()
          Allocate hash of tables, memory, etc.
          Unless the env variable MYISAM_BACKUP_NO_INDEX is set to 1, mark the
            flag to back up the index as well.
          Set state to dumping data/index files;
        }
      }
      Loop: Scheduler::step()
      {
        // Call the next backup engine's pump() method to get all data.
        pump() { m_bw->get_buf(); m_drv->get_data(); }
      }
      // Start activating 'at begin' drivers; call sch.add() for them.
      Prepare for VP;   // See WL#4610 for the refined commit blocker -- which is TODO.
      Block commits: block_commits(thd)
      {
        // Step 1: Global read lock
        lock_global_read_lock(thd);   // lock mutex &LOCK_global_read_lock
        // Use the global read lock to block commits:
        make_global_read_lock_block_commit();
      }
      sch.prepare() { pump->prepare() => m_drv->prelock(); }
      sch.step()    { m_drv->get_data(); }
      // VP creation start
      sch.lock();
      save_vp_info() { save timestamp; binlog position; }
      sch.unlock()  { m_drv->unlock() { ... Backup::kill_locking_thread(); ... } }
      unblock_commits();
      report_vp_info();
      // Finish phase
      while sch.finish_count > 0:
        sch.step() { }
    } // End write_table_data()

    Save end time;
    write_summary() { write 0, vptime, endtime, binlog pos, binlog group }
  } // End context.do_backup()

  context.close();
} // End execute_backup_command

Note: Use the debug flag "backup" to see all these steps (see the sketch below).

Backup_image := Preamble | [table_data]* | Summary
Preamble     := Header | Catalogue | Metadata
Header       := flags | start_time | total_snapshots | server_version | snapshot_descriptors
Catalogue    := Charset_info | User_info | Tablespace_info | list_of_dbnames | list_table_info | list_dbitem_info
Summary contains the validity point information plus some log info for point-in-time recovery.
Chunk contains snapshots ???
Total number of snapshots <= 256 !
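To actually watch those steps, a debug build of the server can be started with
the "backup" DBUG keyword mentioned above. A sketch using standard DBUG option
syntax (the trace file path is arbitrary):

    shell> mysqld --debug=d,backup:t:o,/tmp/mysqld.trace

Here d selects which DBUG_PRINT keywords are printed, t adds function call/exit
tracing, and o redirects the trace output to the given file.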
Sample stack from a BACKUP DATABASE statement (note run_service_interface_sql()
executing SHOW CREATE DATABASE through Ed_connection):

(gdb) where
#0  mysql_execute_command (thd=0x95acf18) at sql_parse.cc:1797
#1  0x083b9d17 in Execute_sql_statement::execute_server_code (this=0xb7445128, thd=0x95acf18) at sql_prepare.cc:2933
#2  0x083b94c6 in Prepared_statement::execute_server_runnable (this=0xb7444e48, server_runnable=0xb7445128) at sql_prepare.cc:3437
#3  0x083ba27f in Ed_connection::execute_direct (this=0xb74451dc, server_runnable=0xb7445128) at sql_prepare.cc:3967
#4  0x083ba3ad in Ed_connection::execute_direct (this=0xb74451dc, sql_text={str = 0x9625820 "SHOW CREATE DATABASE `test`", length = 27}) at sql_prepare.cc:3933
#5  0x0852113a in run_service_interface_sql (thd=0x95acf18, ed_connection=0xb74451dc, query=0xb74451cc) at si_objects.cc:211
#6  0x08521d6c in obs::check_db_existence (thd=0x95acf18, db_name=0x96257b4) at si_objects.cc:2928
#7  0x08872923 in Backup_info::add_dbs (this=0x9622f80, thd=0x95acf18, dbs=@0x95ae69c) at backup_info.cc:686
#8  0x0885b6db in execute_backup_command (thd=0x95acf18, lex=0x95ade60, backupdir=0xb7445af8, overwrite=false, skip_gap_event=false) at kernel.cc:234
#9  0x08308288 in mysql_execute_command (thd=0x95acf18) at sql_parse.cc:2432
#10 0x08310c7e in mysql_parse (thd=0x95acf18, inBuf=0x95daa10 "backup database test to '/home/thava/tmp/test.bak2'", length=51, found_semicolon=0xb74469c8) at sql_parse.cc:5950
#11 0x08311830 in dispatch_command (command=COM_QUERY, thd=0x95acf18, packet=0x95ceaf1 "backup database test to '/home/thava/tmp/test.bak2'", packet_length=51) at sql_parse.cc:1062
#12 0x08312d9c in do_command (thd=0x95acf18) at sql_parse.cc:744
#13 0x082ff357 in handle_one_connection (arg=0x95acf18) at sql_connect.cc:1163
#14 0xb7f5450f in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#15 0xb7d5c7ee in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb)

-----
Backup FAQ:

1) What operations are blocked during the backup process?
   All DDLs are blocked (via the bml_enter()/bml_get() calls).
   All DMLs on the backed-up database's tables are blocked by the driver
   acquiring read locks on the tables from the prelock() stage onward.
   All writes are blocked during the create-VP lock() stage.

Questions:
- Backup synchronization ???
- Where is myisamlog used? There is no myisam.log file in our system.
- Does 'flush logs' result in a checkpoint? It does increment (and rotate) the
  binary log.
- What is the significance of XID_EVENT in the binary log?
  It is used before the commit of a transaction involving tx-capable SE tables.
- What is the significance of TABLE_MAP_EVENT in the binary log?
  Used when different definitions are used on master and slave.
- During replication, are only committed txns sent to slaves? (delayed relay
  log?) Otherwise the "rollback" statement would be very tricky.
- Does MyISAM suspend checkpoint operations during the entire backup process?

All about the (server) binary log:
- Statements are stored in the form of "events" that describe the modifications.
- The binary log also contains information about how long each statement that
  updated data took, plus:
  * Information about the state of the server that is needed to reproduce
    statements correctly
  * Error codes
  * Metadata needed for the maintenance of the binary log itself (for example,
    rotate events)
- Purpose: replication or recovery.
- The log consists of a set of binary log files, plus an index file.
+ The header bytes provide information about the type of event, when it was
  generated, by which server, and so forth.
+ The data bytes provide information specific to the type of event.
  o The first event is a descriptor event that describes the format version of
    the file (the format used to write events in the file).
  o The remaining events are interpreted according to that version.
  o The final event is a log-rotation event that specifies the next binary log
    filename.
* The index file is a text file that lists the current binary log files.
- Currently, there are three binary log format versions.
    HOSTNAME-bin.0000101, HOSTNAME-bin.0000102, HOSTNAME-bin.0000103, etc.
    HOSTNAME-bin.index
- Log event structure (for the latest version, v4):

+=====================================+
| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    |
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    |
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
|        +----------------------------+
|        | extra_headers    19 : x-19 |
+=====================================+
| event  | fixed part        x : y    |
| data   +----------------------------+
|        | variable part              |
+=====================================+
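The fixed 19-byte header above maps directly onto a small decoder. A minimal
standalone sketch (a hypothetical helper, not server code; binlog integers are
stored little-endian):

    #include <cstddef>
    #include <cstdint>

    // Decoded v4 event header; offsets and sizes follow the layout above.
    struct Event_header
    {
      uint32_t timestamp;       // offset  0, 4 bytes
      uint8_t  type_code;       // offset  4, 1 byte
      uint32_t server_id;       // offset  5, 4 bytes
      uint32_t event_length;    // offset  9, 4 bytes (whole event, header included)
      uint32_t next_position;   // offset 13, 4 bytes (offset of the next event)
      uint16_t flags;           // offset 17, 2 bytes
    };

    static uint32_t le32(const unsigned char *p)
    {
      return (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
             ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
    }

    static uint16_t le16(const unsigned char *p)
    {
      return (uint16_t) (p[0] | (p[1] << 8));
    }

    // Decode the 19 mandatory header bytes of one event. Note: a binlog file
    // itself starts with a 4-byte magic sequence (0xfe 'b' 'i' 'n') before
    // the first event.
    bool decode_event_header(const unsigned char *buf, size_t len, Event_header *h)
    {
      if (len < 19)
        return false;                    // truncated header
      h->timestamp=     le32(buf + 0);
      h->type_code=     buf[4];
      h->server_id=     le32(buf + 5);
      h->event_length=  le32(buf + 9);
      h->next_position= le32(buf + 13);
      h->flags=         le16(buf + 17);
      return true;
    }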
How recovery is done using checkpoints and transaction logs:

Transaction recovery procedures generally make use of deferred-write and
write-through techniques.

When the recovery procedure uses deferred write (deferred update), transaction
operations do not immediately update the physical database; only the
transaction log is updated. The database is physically updated only after the
transaction reaches its commit point, using the transaction log information.
The recovery process for all transactions started and committed before the
failure follows these steps:
1. Identify the last checkpoint in the transaction log.
2. For a transaction that started and committed before the last checkpoint,
   nothing needs to be done, because the data is already saved.
3. For a transaction that performed a commit operation after the last
   checkpoint, the DBMS uses the transaction log records to redo the
   transaction and update the database, using the "after" values in the
   transaction log.
4. For any transaction with a ROLLBACK operation after the last checkpoint, or
   that was left active (with neither a COMMIT nor a ROLLBACK) before the
   failure occurred, nothing needs to be done, because the database was never
   updated; the transaction will be rerun.

When the recovery procedure uses write-through (immediate update), the database
is immediately updated by transaction operations during the transaction's
execution, even before the transaction reaches its commit point. The recovery
process follows these steps:
Cases 1-3 are as above in the deferred-write case.
4. For any transaction with a ROLLBACK operation after the last checkpoint, or
   that was left active (with neither a COMMIT nor a ROLLBACK) before the
   failure occurred, the DBMS uses the transaction log records to roll back
   (undo) the operations, using the "before" values in the transaction log.
   (No need for a separate undo log; the same log serves as the "undo" log!!!)

Note: While a checkpoint is being taken, no updates to the database are allowed.
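The two variants differ only in step 4. A compact illustrative sketch of the
write-through variant (the record layout and helper signatures are invented for
illustration; real DBMS logs are far more involved):

    #include <functional>
    #include <string>
    #include <vector>

    // Illustrative log record: one update to `item` by transaction `tx`,
    // with its before/after images.
    struct Log_record
    {
      long tx;
      std::string item;
      std::string before_val, after_val;
    };

    enum class Tx_outcome { COMMITTED, ROLLED_BACK_OR_ACTIVE };

    // Write-through recovery over the records written after the last
    // checkpoint: redo committed transactions forward using "after" values
    // (step 3), undo the rest backward using "before" values (step 4).
    void recover(const std::vector<Log_record> &log,
                 const std::function<Tx_outcome(long)> &outcome,
                 const std::function<void(const std::string&,
                                          const std::string&)> &write)
    {
      for (const Log_record &r : log)                      // redo pass, forward
        if (outcome(r.tx) == Tx_outcome::COMMITTED)
          write(r.item, r.after_val);

      for (auto it= log.rbegin(); it != log.rend(); ++it)  // undo pass, backward
        if (outcome(it->tx) == Tx_outcome::ROLLED_BACK_OR_ACTIVE)
          write(it->item, it->before_val);
    }

In the deferred-write variant, the undo pass disappears entirely, because
uncommitted transactions never touched the database.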
==================

Notes on the MySQL backup Driver class (backup_engine.h):

class Driver
{
  /// Construct from a list of tables. The list is stored for future use.
  Driver(const Table_list &tables) :m_tables(tables) {};

  // buf_size is the minimum buffer size the backup kernel will provide to
  // the driver. After return from the begin() call, the driver should be
  // ready to serve requests for sending/receiving data.
  virtual result_t begin(const size_t buf_size) =0;

  // Finalize the backup/restore process. What is next?
  virtual result_t end() =0;

  /// Cancel the ongoing backup/restore process.
  virtual result_t cancel() =0;

protected:
  const Table_list &m_tables;
};

Driver phases (mainly from the perspective of the driver):

Phase 1: Idle
  After creation of the driver instance and before the begin() call.
Phase 2: Initial transfer
  When initial data is sent, before the driver can create a VP. "At end"
  drivers send the majority of their data in this phase.
Phase 3: Waiting for prelock
  The driver waits for other drivers to finish their initial transfer. This
  phase is ended by a call to the prelock() method.
Phase 4: Preparing for lock
  The driver does the preparations necessary (if any) to be able to instantly
  create a VP upon request from the kernel.
Phase 5: Waiting for lock
  The driver waits for other drivers to finish their preparations. The phase
  is finished by a call to the @c lock() method.
Phase 6: Synchronization
  The validity point is created inside the lock() method. For synchronization
  reasons, data in all tables being backed up should be frozen during this
  phase. The phase is ended by a call to the @c unlock() method.
Phase 7: Final transfer
  The final backup image data (if any) is sent to the kernel. "At begin"
  drivers send all their data in this phase. This phase is ended by a call to
  the end() method.

Note: In each phase except the synchronization phase (6), the kernel polls at
least one driver using the get_data() method. Thus a driver has a chance to
send data in each phase of the backup process. For example, when waiting in
phase 3 or 5, a driver can send a log recording the changes which happen during
that time. A driver informs the kernel about finishing the initial transfer
phase (2) or the lock preparation phase (4) via the value returned from the
get_data() method.

class Backup_driver : public Driver
{
public:
  /* Return values for get_data():
     OK         - New data in buffer.
     READY      - OK + end of initial transfer OR prepare-lock.
     DONE       - OK + end of final transfer.
     PROCESSING - Request accepted but not complete. Buffer in use.
     BUSY       - Request not accepted. Retry later. Buffer not used.
     ERROR      - Error condition.
  */
  virtual result_t get_data(Buffer &buf) =0;

  /* Prepare for synchronization. Will wait for the lock() call.
     Return values for prelock():
     READY - The driver is ready for synchronization, i.e. it can accept the
             following lock() call.
     OK    - The driver is preparing for synchronization. The kernel should
             call get_data() and wait until the driver is ready.
     ERROR
  */
  virtual result_t prelock() { return READY; };

  // Create the VP inside the lock() method.
  // Returns an error code, or OK upon success.
  // Note: the global block-commit is active now.
  // All engines should create the VP on this latest state.
  virtual result_t lock() =0;

  // Flag the end of VP creation using the unlock() call:
  virtual result_t unlock() =0;   /* Returns OK or ERROR */

  virtual size_t size() =0;        /* Return estimated size. */
  virtual size_t init_size() =0;   /* Return the initial size() estimate. */
};

Note: The buffer is allocated by the kernel and given to the driver.

struct Buffer
{
  size_t size;     /* in/out param. Total size; driver sets it to the used size. */
  uint table_num;  /* table number to which the data belongs */
  bool last;       /* TRUE if this is the last block of data in the stream */
  byte *data;      /* pointer to the data area ... */
};
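Putting the protocol together: a minimal sketch of an "at end" style backup
driver following the phases and return codes above. The fill_*/freeze_* helpers
are stand-ins, not real MySQL APIs, and the real MyISAM driver in
myisam_backup_engine.cc is far more involved:

    // Sketch only: assumes the Driver/Backup_driver declarations and the
    // result_t codes (OK, READY, DONE, ERROR) described above.
    class Toy_backup_driver : public Backup_driver
    {
    public:
      Toy_backup_driver(const Table_list &tables)
        : Backup_driver(tables), m_initial_done(FALSE), m_final_done(FALSE) {}

      virtual result_t begin(const size_t)   // phase 1 -> 2: ready to send data
      { return OK; }

      virtual result_t get_data(Buffer &buf)
      {
        buf.table_num= 0;                    // stream #0: shared data (see below)
        buf.last= FALSE;
        if (!m_initial_done)                 // phase 2: send the bulk of the data
        {
          buf.size= fill_table_data(buf);    // stand-in: copy rows into buf.data
          m_initial_done= TRUE;
          return READY;                      // tells kernel: initial transfer done
        }
        if (!m_final_done)                   // phase 7: send changes since VP prep
        {
          buf.size= fill_change_log(buf);    // stand-in
          m_final_done= TRUE;
          return OK;
        }
        buf.size= 0;
        buf.last= TRUE;
        return DONE;                         // end of final transfer
      }

      virtual result_t prelock() { return READY; }   // phase 4: nothing to prepare
      virtual result_t lock()                        // phase 6: create the VP
      { return freeze_tables() ? OK : ERROR; }
      virtual result_t unlock()                      // end of VP creation
      { unfreeze_tables(); return OK; }
      virtual result_t end()    { return OK; }
      virtual result_t cancel() { return OK; }
      virtual size_t size()      { return 0; }   // size estimates omitted here
      virtual size_t init_size() { return 0; }

    private:
      bool m_initial_done, m_final_done;
      size_t fill_table_data(Buffer&);       // hypothetical helpers
      size_t fill_change_log(Buffer&);
      bool   freeze_tables();
      void   unfreeze_tables();
    };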
For example, consider backing up three tables t1, t2 and t3. Data blocks
produced by a backup driver are divided into four streams:
  #0: shared data
  #1: data for table t1
  #2: data for table t2
  #3: data for table t3

Backup image format, Obj instances, etc. (image_info.h):

struct st_bstream_image_header
{
  unsigned int version;   /* Image's format version number. */
  server_version;
  /* flags indicate (a) whether the image was created big- or little-endian,
     (b) whether the binlog position is stored in the image or not. */
  unsigned int flags;     /* Image options. */
  time_t start_time;
  time_t end_time;
  time_t vp_time;
  /** Position of the last binlog entry at the VP time. */
  struct st_bstream_binlog_info binlog_info;
  /** Number of table data snapshots in the image. One snapshot per driver. */
  unsigned short int snap_count;
  struct st_bstream_snapshot_info snapshot[256];
};

/* What kinds of items are saved in the image? */
enum_bstream_item_type { charset, user, privilege, db, table, view,
                         stored proc, func, event, triggers, tablespace };

/* Basic item info structure. */
struct st_bstream_item_info
{
  item type;         /* db, table, etc. */
  name of the item;
  long int pos;      /* position of the item in the image's catalogue */
};

/* DB-specific item info: nothing but the basic item info.
   The same style is used for tablespace info, etc. */
struct st_bstream_db_info
{
  struct st_bstream_item_info base;   /**< The base of the info class. */
};

/** Describes a per-database item.
    @note The fields snap_num and pos are used for triggers only and are NULL
    otherwise. They contain the coordinates of the table on which the trigger
    fires. All other per-db items are not tied to another per-db item; a
    trigger, however, is tied to a table (another per-db item), hence it needs
    special treatment. So trigger info contains:
      basic item info (name, db catalogue pos number);
      associated db info;
      snapshot number of the associated table;
      position of the table within the snapshot.
    What about the associated table's basic info (catalogue pos no?) ???? */
struct st_bstream_dbitem_info
{
  struct st_bstream_item_info base;   /**< Data common to all items. */
  struct st_bstream_db_info *db;      /**< Database to which this item belongs. */
  unsigned short int snap_num;        /**< Snapshot where the table data is stored. */
  unsigned long int pos;              /**< Table position within the snapshot. */
};

class Image_info : public st_bstream_image_header
{
  /* The following class types are defined only for the purpose of the backup image: */
  class Obj, Ts, Db, Table;
  class Dbobj;   /* any per-db obj other than table: func, event, triggers, ... */
  /* We need these iterators to list the image contents! */
  class Iterator, Ts_iterator, Db_iterator, Dbobj_iterator;

  ulonglong data_size;   /* How much table data is saved in the image? */

  /* What is the diff between master binlog info and the one in the image header? */
  st_bstream_binlog_info master_binlog_info;

  /* Count getters for obj, table, view, routine, priv, db, ts and snapshots: */
  ulong object_count(); ulong table_count(); ...

  /* Find obj given its name: db, ts */
  Db* find_db(const String&) const;
  Ts* find_ts(const String&) const;

  /* Get obj pointer given its pos number: db, ts, db_object, table. */
  Db* get_db(uint pos);
  Ts* get_ts(uint pos);
  Dbobj* get_db_object(uint db_num, ulong pos);
  Table* get_table(ushort snap_num, ulong pos) const;

  /* Iterators for enumerating the contents of the archive: dbs, ts, db_objects */
  Db_iterator* get_dbs();
  Ts_iterator* get_tablespaces();
  Dbobj_iterator* get_db_objects(Db& db);

  /* All info about snapshots: */
  Snapshot_info *m_snap[MAX_SNAP_COUNT];

  /* Routines to save start time, end time, vp time, binlog & master binlog pos. */
  /* Routines to get vp time, end time. */

protected:
  /* Routines to: add a snapshot; add Db* at uint pos; add Ts* at pos;
     add a db object given obj_type, obs::Obj*, at pos: */
  Dbobj* add_db_object(Db &db, const obj_type type, obs::Obj *obj, ulong pos);
  Table* add_table(Db &db, obs::Obj *tbl, Snapshot_info &snap, ulong pos);

protected:
  /* Implementation specific */
  MEM_ROOT mem_root;
  class Tables;   /* implements the Table_list interface */
  ...
private:
  Map m_dbs;      /* list of dbs */
  Map m_ts_map;   /* list of Ts */
};

Note:
- class Image_info::Tables implements the tables list:
    class Image_info::Tables : public Table_list, public Map;
- The list is implemented using a map. Insertion/retrieval happens using the
  pos in the map.
- There is the following typedef for Base defined within this Tables class:
    typedef Map Base;
- Table_list is an abstract class which enforces that operator[](pos) returns a
  Table_ref (which contains basic table name info).

class Snapshot_info
{
public:
  - Enum for snapshot type: Native, Default, CS, No data.
  - Get snapshot type: type().
  - version_t version();   // Get snapshot version (int type).
                           // This is separate from the image version (int).
  - char *name();          // For the "%s backup/restore driver" pattern.
  - Get table count.
  - Determine if a given table can be saved in this snapshot:
      bool accept(Table_ref&, storage_engine_ref);
  - Create backup/restore driver: get_backup_driver(Backup_driver*&) etc.
  - Get table given pos: Image_info::Table* get_table(ulong pos) const;
    - Is this a pos within the snapshot, or a pos in the image catalogue ???
protected:
  - Add table at a given pos: int add_table(Image_info::Table& t, ulong pos);
private:
  Image_info::Tables m_tables;   // list of tables in this snapshot
};   // Note: this is a pure virtual class !!!

class Image_info::Obj : public Sql_alloc
{
  // Image allocation is done using MEM_ROOT. Dummy destructors!
  obj_type type();   // Same as enum_bstream_item_type:
                     // charset, user, privilege, db, table, view, stored proc,
                     // func, event, triggers, tablespace
  // info() returns a pointer to the item info:
  st_bstream_item_info* info();
  // Pointer to the obs::Obj instance, if it is known:
  obs::Obj *m_obj_ptr;
  // Description of the obj (defn??):
  char* describe(describe_buf&) const =0;
protected:
  Obj(obs::Obj *obj_ptr);   // given an obs::Obj
};

// Definition of tablespace
class Image_info::Ts : public st_bstream_ts_info, public Image_info::Obj, public Db_ref;
Note:
- A tablespace is a per-DB item.
- Tablespace implements the generic Image_info::Obj.
- Tablespace implements the ts item info class.

// Definition of database
class Image_info::Db : public st_bstream_db_info, public Image_info::Obj, public Db_ref;
Note:
- Db also contains functions to add/get an obj given its pos. Internally it
  uses Map m_objs; to store the list of Dbobj's. This pos seems to be just a
  pos within the Db rather than a pos in the catalogue or snapshot.

/** Specialization of Image_info::Obj for storing a per-database object.
    @note For tables, there is the dedicated class @c Image_info::Table. */
class Image_info::Dbobj : public st_bstream_dbitem_info, public Image_info::Obj, public Table_ref;
Note:
- Why does it implement Table_ref ???
  Many Dbobj's are not associated with a table; the trigger is the exception,
  being associated with a table. It looks like a hack, since Table_ref provides
  name()-style basic functions which are reused here even for non-tables!

/// Specialization of @c Image_info::Obj for storing info about a table.
class Image_info::Table : public st_bstream_table_info, public Image_info::Obj, public Table_ref
{
  const Db &m_db;        /* associated database */
  // Note: Db_ref just provides name() etc. basic functions.
  //       Db is a class which is ready for storing in the image.
  Table *next_table;     /* can be used to create a linked list in db */
  TABLE_LIST *m_table;   /* if not NULL, points at the opened table */
  .....
  .....
};

/** Base class for all iterators. An iterator is used as follows:

      Iterator_X it;
      Image_info::Obj *obj;
      while ((obj= it++))
      { }

    This is an abstract class. Derived iterators must define the @c get_ptr()
    and @c next() methods, which are used to implement @c operator++(). */
class Image_info::Iterator
{
  Obj* operator++(int);
protected:
  const Image_info &m_info;   <== Note this.
  ....
};

/* Used to iterate over all tablespaces stored in a backup image. */
class Image_info::Ts_iterator : public Image_info::Iterator
{
  Ts_iterator(const Image_info&);
protected:
  uint pos;   // position in the iterator, starting from 0
  ....
};

Similarly:
  class Image_info::Db_iterator : public Image_info::Iterator;
  class Image_info::Dbobj_iterator : public Image_info::Db_iterator;
    Note the constructor: Dbobj_iterator(Image_info &info, const Db &db);

Note: void Image_info::count_object(const enum_bstream_item_type type);
increments the counter for the given type:

  case BSTREAM_IT_SPROC:
  case BSTREAM_IT_SFUNC:
  case BSTREAM_IT_EVENT:
  case BSTREAM_IT_TRIGGER:
    m_routine_count++;   // wow!!!
    break;

ulong Image_info::routine_count()     { return m_routine_count; }
ulong Image_info::db_count() const    { return m_dbs.count(); }
ulong Image_info::ts_count() const    { return m_ts_map.count(); }
ulong Image_info::table_count() const { return m_table_count; }
ulong Image_info::view_count() const  { return m_view_count; }
...

// This pos happens to be the same as the pos in the image catalogue,
// since databases are stored first in the catalogue.
Image_info::Db* Image_info::get_db(uint pos) const { return m_dbs[pos]; }
// This pos starts from 0. Not an absolute pos in the catalogue.
Image_info::Ts* Image_info::get_ts(uint pos) const { return m_ts_map[pos]; }

Note: How to get an instance of obs::Obj? Examples:
  obs::Obj *db_obj=  obs::get_database(&name_str);
  obs::Obj *tbl_obj= obs::get_table(&db->name(), &name_str);

Performance Test Results: https://inside.mysql.com/wiki/PerformanceResults-backup

Efficiency of the backup operation vs mysqldump vs plain copy vs xtrabackup
(xtrabackup is Percona's backup utility):

  Engine   MySQL Backup   Copy        mysqldump   XtraBackup
  MyISAM   6.72 s/GB      6.29 s/GB   29 s/GB     6.30 s
  InnoDB   52 s           ??          28 s        ??

Restore:
  MyISAM: almost the same time as backup.
  InnoDB: takes ~25 times as long as backup.

Large number of DB objects:
-- Results for backing up a database with a big number of DB objects do not
   depend much on the engine, since it is mostly metadata.
-- Restore is 2 to 4 times slower.
-- Dump is around 20 times slower for backup!!! (why ???)
-- Restore from a dump is almost as efficient as native backup.

Impact of the backup operation on the workload:
- During backup there may be a 50% performance drop for InnoDB.
- For MyISAM there may be around a 30% performance drop.

Note: There are no obvious and striking findings in the backup performance
report. The comparison charts are a bit confusing. It would be good to plot the
throughput of one vs. the other.
The restore gives deterministic figures because it is intrusive. The restore
may take longer in the future if it is made non-intrusive. InnoDB engine
statistics about locks/waiting time may be useful to record ??

The parent worklog for online backup testing is WL#4208.

Notes on WL#4406 (Online Backup: Performance Testing):
- Implementor is Ranger.
- Trigger is not mentioned in the WL but is tested.
- The idea is to establish a baseline for performance testing. The figures will
  be compared with file copying time.
- Testing with different database sizes is included.
- Testing with a high amount of metadata is not included (but is tested?).
- The effect of online queries on backup performance is also to be tested.
- Good comprehensive goals.

WL#4220: Backup Stress Testing
- Created by Chuck. Includes perf testing WL#4406 and WL#4218.
- Invariant testing (tx consistency; only for InnoDB?) (patch queued; completed?)

Endurance Testing (WL#4922)
- Highly parallel execution must not fail. (This may expose any backup
  starvation problem.)

Question: Where is the backup_restore.pl script from WL#4044 (large data test
by Elena)?

WL#5214: Test MySQL backup with different database sizes
- Though WL#4406 covers some part of it, the focus of 4406 is general
  performance, whereas this one should focus on database size only (and hence
  can do a better job in this area).
- Should also handle large metadata.
- It would be better to produce a throughput-vs-data-size graph as a result.

MyISAM Backup:
- See worklog 866 for how MyISAM handles backup.