Contents:
    Pseudo_Code_Of_ibbackup
    Case_1_Backup
    Case_2_ApplyLog
    MySQL_64Bit_DataTypes
    Compiler_Flags
    Zlib_Compression_Interface
    InnoDB_Utilities
    Important_Global_Variables
    InnoDB_DoubleWrite:
        http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/
        http://dimitrik.free.fr/blog/archives/2009/08/mysql-performance-innodb-doublewrite-buffer-impact.html
    How_To_Check_fsync
    innodb_support_xa
    Innodb_Group_Commit
    InnoDB_Code
    Debugging_ibbackup
    mysqlbackup_Implementation

==============================================================================
Pseudo_Code_Of_ibbackup
==============================================================================

Background:

Initializing the system tablespace:

    system_tablespace = back_tablespace_new(spaceid=0, 0, FALSE, FALSE) { }

Data structures:

    typedef struct {
        char                *filename;
        ulint                size_in_mb;
        ulint                space_id;
        ulint                zip_size;   /* compressed page size in bytes */
        back_filestatus_t    status;     /* either exists or DROPPED state */
    } back_datafile_t;

    typedef struct {
        ulint                max_count;
        ulint                count;
        back_datafile_t     *items;
    } back_datafilearray_t;

    typedef struct back_tablespace {
        ulint                    space_id;
        ulint                    file_format;          /* 0 - Antelope */
        ulint                    zip_size;             /* compressed page size in bytes,
                                                          or 0 = uncompressed */
        back_datafilearray_t     data_files;
        back_datafilearray_t     backup_data_files;
        ibool                    is_auto_extending;
        ulint                    max_auto_extend_size;
        ibool                    is_compressed;
        struct back_tablespace  *next;
    } back_tablespace_t;

==============================================================================
Global variables:

    back_first_tablespace ==> first (system) tablespace;
    back_last_tablespace  ==> (updated for per-file tablespaces as well?)

    /* All per-table datafiles collected during backup */
    back_datafilearray_t back_collected_datafiles;

==============================================================================
Case_1_Backup

ibbackup ./etc/my.cnf backup.cnf

main:
    dict_ind_init();    // init some dict structures so InnoDB low-level routines can be used.
    Initialize the global variable fil_system, which is the tablespace memory cache.
    Initialize 2 buffer pool frames;
    Init the data file array;
    Init system tablespace info: back_tablespace_t *system_tablespace;
        [ temporarily sets autoextending & compressed to FALSE, but may change later. ]

    Read 6 parameters from my.cnf and then backup.cnf:

        parse_my_cnf_file(
            system_tablespace,
            &system_tablespace->data_files,   /* ibdata1:32M;ibdata2:32M:autoextend */
            FALSE,
            my_cnf,
            &back_datadir,
            &back_innodb_data_home_dir,
            &back_log_dir,
            &back_n_log_files,
            &back_log_file_size );

    get_pertable_tablespaces(create_backup_database_dirs, is_src_data_incremental):
        collect_pertable_filenames(filenames, create_db_dir, is_src_differential):
            if (back_datadir)
                // Get it from the original data dir.
                collect_pertable_tablespaces(filenames, ...)
            else
                // Get it from the backup dir. Used in apply-log etc.
                back_collect_filenames(back_back_datadir,
                                       uncompress_option ? filter_ibz : filter_ibd,
                                       excluded_names[in], filenames[out]);

    Note: The above function is idempotent for "backup"; i.e. when called multiple
    times, only the new per-table files are identified and tablespaces created for
    them. For operations such as "apply-log" the above function is *not* idempotent:
    a new tablespace is always created for each file found.

    The tablespace contains space->data_files and space->backup_data_files.
    For a backup operation both are updated; for an apply-log operation only
    backup_data_files is updated. (See the sketch below.)
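To make the structures above concrete, here is a minimal sketch of how a discovered
datafile might be appended to a back_datafilearray_t, and how backup vs. apply-log
would pick the target arrays. The helper names and the grow-by-doubling policy are
assumptions for illustration, not the real ibbackup code; it relies only on the
struct definitions above and InnoDB's ulint.

    #include <stdlib.h>

    /* Hypothetical helper: append one entry, growing the array by doubling. */
    static void
    datafilearray_append(back_datafilearray_t *arr, const back_datafile_t *file)
    {
            if (arr->count == arr->max_count) {
                    ulint            new_max = arr->max_count ? arr->max_count * 2 : 8;
                    back_datafile_t *p = realloc(arr->items, new_max * sizeof(*p));

                    if (p == NULL) {
                            abort();        /* out of memory */
                    }
                    arr->items = p;
                    arr->max_count = new_max;
            }
            arr->items[arr->count++] = *file;
    }

    /* Per the note above: backup updates both arrays, apply-log only the
       backup_data_files array. */
    static void
    tablespace_add_datafile(back_tablespace_t *space, const back_datafile_t *file,
                            int is_backup_operation)
    {
            if (is_backup_operation) {
                    datafilearray_append(&space->data_files, file);
            }
            datafilearray_append(&space->backup_data_files, file);
    }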
Q:  Since compressed tablespaces are not compressed again, how does apply-log
    differentiate when collecting the file names of all existing tablespaces?

collect_pertable_tablespaces(filenames):        /* From datadir */
    Any file with an .ibd suffix is a per-table tablespace.
    Note: ibdata1 and ibdata2 do not have the .ibd suffix.

    Get the tablespace id from the .ibd file:
        Read the first page; the space id is available in both the tablespace
        header and the page header of that page; if the two values don't match,
        error out.

    Get the compressed page size (0 = uncompressed) from the first page of the
    tablespace:
        Get the flags from the (first page) tablespace header;
        Look up the compressed page size using dict_table_flags_to_zip_size(flags);

    Create a new per-table tablespace:
        /* For this it again reads the first page of the tablespace! Why?! */
        back_tablespace_t *space = back_tablespace_new(...);

    Get the file format id:
        trx_sys_read_pertable_file_format_id(pathname-pertable, &format_id);
        The file format is 4 bytes at offset page+54.
        mach_read_from_4(ptr) (mach0data.ic:182): assembles a 32-bit value from
        4 bytes, most-significant byte first (big-endian).
        File format = 0 ==> Antelope.

    Populate space->data_files and space->backup_data_files;

    Note: Basically we initialize tablespaces for *all* per-table data files
    even before starting!

system_tablespace_read_format_id(system-table-space);   // Antelope

make_backup():

    check_backup_directories():
        Verify that the backup dir is not a subdir of the source datadir!
        back_are_files_the_same():
            On Windows, check accordingly.
            On Unix, check the stat() syscall result and see if the inodes are the same.

    back_look_for_checkpoint_pos():
        Looks for a checkpoint in the first log file and sets the start of the
        log copying accordingly.
        back_read_cp_info():
            Open ib_logfile0; read 3*512 + 512 bytes. This is guaranteed to
            contain both the 1st and 2nd checkpoint fields.
            Reads the checkpoint info needed in hot backup.
        Find out the log number (i.e. N in ib_logfileN) that holds the last
        checkpoint info.

    back_doublewrite_init();   // Init info about the doublewrite buffer.

    recv_sys_create();   // Initialize the local recovery system, used to parse
                         // log data and see whether the scanned log is corrupt.
        recv_sys->parse_start_lsn = backup_start_checkpoint; etc.
        // Initialize the start LSN as the last checkpoint.

    back_copy_log();

    For each tablespace:
        backup_tablespace(incremental?, start_lsn_incremental, space);

    back_suspend();   // At the end.

    back_check_log_scanned_far_enough();   // Copy remaining log if needed.
        Get new checkpoint info;
        back_copy_log();   // Copy logs from where we left off last time.

    // We must by now have copied at least up to the latest checkpoint;
    // otherwise the backup will be unusable, because we would be missing the
    // log between our last parsed LSN and the latest checkpoint. i.e.

    verify (back_up_log_end_lsn >= last_cp_lsn); else error;

    Assert(back_back_log_file_offset == back_up_log_end_lsn - back_up_log_start_lsn);

    At the end of ibbackup_logfile, append a 512-byte block with the following
    details: back_up_log_start_lsn, back_up_log_end_lsn, back_up_start_checkpoint,
    LOG_END_MAGIC.
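A minimal sketch of reading that trailing 512-byte block back, the way apply-log
does later (see Case_2_ApplyLog below). The field offsets (start_lsn at 0, end_lsn
at 8, start_checkpoint at 16, magic at +24, partial-backup flag at +28) and the
big-endian byte order are taken from these notes; the helper names are made up.

    #include <stdint.h>

    #define BACK_LOG_END_MAGIC 542632761UL    /* 0x2057eb39, per the apply-log notes */

    /* InnoDB-style big-endian reads (equivalents of mach_read_from_4 / mach_read_ull). */
    static uint32_t be_read_4(const unsigned char *b)
    {
            return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
                 | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }

    static uint64_t be_read_8(const unsigned char *b)
    {
            return ((uint64_t)be_read_4(b) << 32) | be_read_4(b + 4);
    }

    /* Parse the last 512 bytes of ibbackup_logfile.
       Returns 0 on success, -1 if the magic number does not match. */
    static int parse_log_end_mark(const unsigned char end_mark[512],
                                  uint64_t *start_lsn, uint64_t *end_lsn,
                                  uint64_t *start_checkpoint, uint32_t *is_partial)
    {
            if (be_read_4(end_mark + 24) != BACK_LOG_END_MAGIC) {
                    return -1;                  /* not a valid end mark */
            }
            *start_lsn        = be_read_8(end_mark + 0);
            *end_lsn          = be_read_8(end_mark + 8);
            *start_checkpoint = be_read_8(end_mark + 16);
            *is_partial       = be_read_4(end_mark + 28);
            return 0;
    }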
==================== End Backup Logic ======================================

From log/log0recv.c (around line 747):

    /** @return TRUE if success */
    UNIV_INTERN
    ibool
    recv_read_cp_info_for_backup(
    /*=========================*/
        const byte*     hdr,        /*!< in: buffer containing the log group header */
        ib_uint64_t*    lsn,        /*!< out: checkpoint lsn */
        ulint*          offset,     /*!< out: checkpoint offset in the log group */
        ulint*          fsp_limit,  /*!< out: fsp limit of space 0, 1000000000 if the
                                         database is running with < version 3.23.50
                                         of InnoDB */
        ib_uint64_t*    cp_no,      /*!< out: checkpoint number */
        ib_uint64_t*    first_header_lsn)
                                    /*!< out: lsn of the start of the first log file */

The log starts with a header that looks like: +check_sum_for_header+...
Note: the checksummed header region is 288 bytes long. Verify that the checksum
matches. To do that: ut_fold_binary(buf, checkpoint_checksum_1 offset);

The initial 8 bytes of the checkpoint record are the checkpoint number.
In the example session: the checkpoint number is 26 and the checkpoint LSN is 81683811.

If the size of the ib_logfile0 file and the my.cnf parameter do not match,
give an error and abort!

In the example session with 2 rows with UNIQUE_THAVA_DATA:
    log at offset         (0x9d50)   : 40288
    data at offset        (0xc8120)  : 819,488
    double write data     (0x128120) : 1,212,704

/* The back_doublewrite_init function initializes the "back_doublewrite" struct
   with the location of the doublewrite buffer in the system tablespace.
   If this function fails, it does not return. */
static void back_doublewrite_init(...)

    Page 5 of the system tablespace is the transaction system header. It contains
    info about the doublewrite buffer. The last 200 bytes of this page contain the
    doublewrite buffer header. There are 2 blocks; each block is 64 consecutive
    pages dedicated to the doublewrite buffer. Tablespace page 5 contains pointers
    to all of this.

recv_sys_create(): initialize the recovery system.
    Init a red-black tree with 5 MB of memory, i.e. 5*1024*1024.
    Init start_lsn = backup_start_checkpoint (= 81683811  how?!)
    Init the ibbackup log name = "ibbackup_logfile"; (the original = ib_logfile0)

back_copy_log(): copy from the checkpoint LSN onwards.
    Open ib_logfile0 for reading;
    Read BACK_LOG_COPY_SEG_SIZE chunks of 1 MB each.
    recv_scan_log_seg_for_backup():   // log/log0recv.c
        log_block_get_data_len(log_block), which is 355 bytes!
        scanned_checkpoint_no is 19!  how?!
    While copying log records we also parse them to detect corrupted log records.
    Why?! Redundant?
    Copy the chunk to ibbackup_logfile;
    posix_fadvise() call on the output log file; (why for each 1 MB?!)

Standard_Breakpoints:
    back_look_for_checkpoint_pos();

Todo: Remember to remove the many fflush(stdout) calls from the code.
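A minimal standalone sketch of the chunked copy in back_copy_log() described above:
read 1 MB segments, append them to the output file, and hint to the kernel after each
segment. This shows only the I/O pattern; the parsing done by
recv_scan_log_seg_for_backup() is omitted, and the constant name is assumed.

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define COPY_SEG_SIZE (1024 * 1024)   /* 1 MB, like BACK_LOG_COPY_SEG_SIZE */

    /* Copy src_fd to dst_fd in 1 MB segments; returns 0 on success, -1 on error. */
    static int copy_log_segments(int src_fd, int dst_fd)
    {
            char   *buf = malloc(COPY_SEG_SIZE);
            off_t   written = 0;
            ssize_t n;

            if (buf == NULL) {
                    return -1;
            }
            while ((n = read(src_fd, buf, COPY_SEG_SIZE)) > 0) {
                    if (write(dst_fd, buf, (size_t)n) != n) {
                            free(buf);
                            return -1;
                    }
                    written += n;
                    /* Hint that we will not reread the range just written.
                       Note DONTNEED only drops clean pages, so without a flush
                       first it may have little effect - possibly what the
                       "(why for each 1 MB?!)" question above is about. */
                    posix_fadvise(dst_fd, written - n, n, POSIX_FADV_DONTNEED);
            }
            free(buf);
            return (n < 0) ? -1 : 0;
    }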
==============================================================================
Case_2_ApplyLog

main {
    Initialize 2 buffer pool frames for apply-log;
    the second frame will be needed in btr_page_reorganize_low().

    recover_backup() {
        Open ibbackup_logfile; it is only 2*1024 bytes!
        size = read the file size: 2K;
        Subtract 512 bytes: size -= 512;
        Read the last 512 bytes of ibbackup_logfile into log_end_mark;
        it contains: start_lsn, end_lsn, start_checkpoint (each 8 bytes).
            ibbackup: *start-lsn* : 15390720, *end-lsn* : 15392144,
            ibbackup: *start-checkpoint* : 15391113.
        (end_mark + 24) should hold BACK_UP_LOG_END_NEW_FORMAT_MAGIC_N;
        it is 542632761 : 0x2057eb39.
        end_mark + 28 : 4 bytes indicating a partial backup;

        recv_sys_create();
        Recovery scan size is 4 pages; set buf_start_lsn;
        fil_space_create('ibdata1') : internal tablespace object creation;
        Load single-table tablespaces info;
        fil_open_log_and_system_tablespace_files() : do the real open of ibdata1, etc;

        While not finished: {
            Read a log segment from ibbackup_logfile: from offset 0 to (2K - 512).
            Read min(remaining_size, 4 pages), which here is 2K - 512 = 1536.

            recv_scan_log_recs(.. buf, buflen, start_lsn, ..., &scanned_up_to); {
                cur_lsn = start_lsn;
                For each LogBlock in (buf .. buf+len):   // process one LogBlock
                {
                    no = log_block_get_hdr_no(buf);  ==> yields 30061.
                    Note: it is the first 4 bytes in the log block header!

                    Map LSN to log block header number:
                        cur_lsn_to_hdr_no = (lsn / 512) + 1;  it is > 0 and <= 1G.
                    Q: Does this mean my LSN can be inaccurate to within 512 bytes????

                    The cur_lsn's header number and the current buffer's header
                    number should match; else break the loop.

                    If the checksum of the LogBlock does not match, break the loop.

                    If the LogBlock has the flush bit (the MSB) set, it is the
                    first LogBlock; then set contiguous_lsn to cur_lsn.

                    If (the LogBlock's checkpoint number < the recovery system's
                    checkpoint number), break the loop to skip this LogBlock.

                    data_len = log_block_get_data_len(log_block);  i.e. 512
                    scanned_lsn += data_len;  i.e. 15390720 + 512 = 15391232

                    recv_sys_add_to_parsing_buf(log_block, scanned_lsn) {
                        /* It is for "adding" to an existing buffer. */
                        Can add only if recv_sys->parse_start_lsn > 0;
                        Q: When was recv_sys initialized?
                           It was initialized by recover_backup() in back0back.c as:
                           recv_sys->parse_start_lsn = back_up_start_checkpoint;
                    }
                }
            }

            If recv_sys->heap size becomes higher than limit-memory:
                recv_apply_log_recs_for_backup();
        }

        Report the binlog number from TRX_SYS_PAGE_NO in ibdata1;
    }
}

Note: No single log record can be greater than 500 KB.
      The log records operate on rows?

==============================================================================
Zlib_Compression_Interface:

compress2(dest, destlen, source, srclen, compression-level):
    z_stream stream;
    set up the stream with input & output;
    deflateInit(&stream);
    deflate(&stream);
    deflateEnd(&stream);
    That's it!
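A caller-side companion to the summary above; this is plain zlib usage
(compressBound() to size the output buffer, then compress2()), not ibbackup code,
though the compressed .ibz backups mentioned earlier presumably go through this
same interface. Link with -lz.

    #include <stdio.h>
    #include <stdlib.h>
    #include <zlib.h>

    int main(void)
    {
            const char src[]    = "example page contents, repeated repeated repeated";
            uLong      src_len  = (uLong)sizeof(src);
            uLongf     dest_len = compressBound(src_len);   /* worst-case output size */
            Bytef     *dest     = malloc(dest_len);
            int        rc;

            if (dest == NULL) {
                    return 1;
            }
            rc = compress2(dest, &dest_len, (const Bytef *)src, src_len, 1);  /* level 1 = fastest */
            if (rc != Z_OK) {
                    fprintf(stderr, "compress2 failed: %d\n", rc);
                    free(dest);
                    return 1;
            }
            printf("compressed %lu bytes to %lu bytes\n",
                   (unsigned long)src_len, (unsigned long)dest_len);
            free(dest);
            return 0;
    }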
==============================================================================
MySQL_64Bit_DataTypes:

Notes:
    mysys makes heavy use of size_t; there are no int32 or int64 types!
    Probably it never needs the datatype to be specifically 32 or 64 bits?!!!
    sql/ has only sql_class.cc referring to uint64; otherwise there is no int64 type!
    Only mysqld.cc contains _WIN32 and _WIN64 (both are defined on 64-bit Windows).
    Otherwise the check is mostly #ifdef __WIN__

==============================================================================
Compiler_Flags

Windows Flags:

    WIN32 API: the most important API on Windows for all core services: the
    graphical user interface; access to system resources such as memory and
    devices; displaying graphics and formatted text; audio, video, networking,
    and security. The core DLLs of Win32 are kernel32.dll, user32.dll, and
    gdi32.dll. Win32 was introduced with Windows NT; it provides almost all
    Windows internal services.

    64-bit Windows also uses the same-named DLLs; pointers are 64-bit by default.

    cmake only defines WIN32 for use within CMakeLists.txt; we define the
    following variables (_D...) for C programs internally from CMakeLists.txt.
    Note: all the flags set for win32 are also set for win64.

    General:
        _WIN32 : Automatically set by compilers. Signifies the Win32 API is available.
        WIN32  : Set by the MSVC++ compiler. (And also by a Windows SDK header file?)

    In innobackup-c:
        Flag        32-bit/64-bit/both
        __WIN__     both
        _WINDOWS    both
        _WIN64      64-bit

    Note: InnoDB code ultimately uses __WIN__ in its code base.

    In ibbackup there is include/win/inttypes.h:
        Defines "x", "lX" etc. for fscanf macros on WIN64 systems.
    include/win/stdint.h:
        Defines int8_t, int64_t, etc., intended for MS VC++.
        It also defines ssize_t as signed size_t: long or long long.

    Windows considerations:
        The maximum length for a path is MAX_PATH, which is defined as 260 characters.
        File I/O functions in the Windows API convert "/" to "\".
        InnoDB defines OS_FILE_PATH_SEPARATOR.

=============================================================================
InnoDB_Utilities:

    ut_a(X) : utility for assert, e.g. ut_a(p != NULL);
    ut_align(const void* ptr, ulint align_no);   /* rounds up alignment */
    ut_align_down(...);                          /* rounds down alignment */
    hash_node_t;
    UT_LIST_NODE_T(type);
    back_strarr_t *;   [ array of strings! ]
    os_file_create_simple_no_error_handling(...);
    os_file_get_last_error(TRUE);
    ut_print_timestamp(stderr);
    char* os_file_dirname(..) : returns the parent dirname of the path!
        See also Win32 APIs: PathCanonicalize(), GetFullPathName().
    mem_free_func(); ==> intelligent memory allocation internals used from
        innodb/include/mem0mem.ic !!!

==============================================================================
Important_Global_Variables:

    fil_system : tablespace memory cache. innodb/fil/fil0fil.c
                 "fil" stands for "file" system?!

    Page size = 16 KB

    #define OS_FILE_LOG_BLOCK_SIZE  512
    #define LOG_CHECKPOINT_1        OS_FILE_LOG_BLOCK_SIZE
            /* first checkpoint field in the log header */
    #define LOG_CHECKPOINT_2        (3 * OS_FILE_LOG_BLOCK_SIZE)
            /* second checkpoint field in the log header */

    typedef byte* page_t; !!!
    There is no structure for the page header! There is a sequence of OFFSETS
    defined for the page structure, e.g.:
        unsigned long long page_lsn = mach_read_ull(page + FIL_PAGE_LSN);
    Note: InnoDB usually writes MSB first (i.e. big-endian order!!!)

    include/fil0fil.h defines:
        #define FIL_PAGE_LSN 16
        LSN of the newest modification to the page! For a data page this makes
        sense. What about a log page???

    More about fil0fil.h:
        - Low-level file system.
        - What is a page number? Is it a number within the tablespace or within
          the file?
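A minimal sketch of the "offsets, not structs" access pattern with mach_read-style
big-endian helpers. FIL_PAGE_LSN = 16 is from the notes above; the other two offsets
(page number at 4, space id at 34) are quoted from memory of fil0fil.h and should be
double-checked against the header.

    #include <stdint.h>
    #include <stdio.h>

    typedef unsigned char byte;

    #define FIL_PAGE_OFFSET                    4   /* page number within the space */
    #define FIL_PAGE_LSN                      16   /* newest modification LSN (8 bytes) */
    #define FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID  34   /* space id (4 bytes) */

    /* InnoDB stores multi-byte fields MSB first (big-endian). */
    static uint32_t mach_read_from_4(const byte *b)
    {
            return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
                 | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    }

    static uint64_t mach_read_ull(const byte *b)
    {
            return ((uint64_t)mach_read_from_4(b) << 32) | mach_read_from_4(b + 4);
    }

    /* Usage against a 16 KB page image read from an .ibd or ibdata file. */
    static void dump_page_header(const byte *page)
    {
            printf("page_no=%u space_id=%u lsn=%llu\n",
                   (unsigned)mach_read_from_4(page + FIL_PAGE_OFFSET),
                   (unsigned)mach_read_from_4(page + FIL_PAGE_ARCH_LOG_NO_OR_SPACE_ID),
                   (unsigned long long)mach_read_ull(page + FIL_PAGE_LSN));
    }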
    typedef struct {
        uint4   pageno;    /* page number - within the file or within the space??? */
        uint2   boffset;   /* byte offset within the page */
    } thava_space_addr_t;

    typedef struct my_page {
        ulint       checksum;       /* checksum of the page (since 4.0.14) */
        ulint       page_offset;    /* page offset inside the space */
        fil_addr_t  previous;       /* offset or fil_addr_t */
        fil_addr_t  next;           /* offset or fil_addr_t */
        dulint      page_lsn;       /* lsn of the end of the newest modification
                                       log record to the page */
        ulint       page_type;      /* file page type */
        dulint      file_flush_lsn; /* the file has been flushed to disk at least
                                       up to this lsn */
        int         space_id;       /* space id of the page */
        char        data[];         /* will grow */
        /* page trailer: */
        ulint       page_lsn_low;   /* the last 4 bytes of page_lsn */
        ulint       checksum2;      /* page checksum, or checksum magic, or 0 */
    }

    #define LOG_START_LSN  (16 * 512)   /* = 8K; counting of LSNs starts from this
                                           value. This is less than 1 InnoDB page
                                           size. Why? */

    LogBlock header contains:
        Header number                          (4 bytes)
        Data length                            (2 bytes)
        Offset to first mtr log record group   (2 bytes)
        Log block checkpoint no                (4 bytes) : this is the "next
                                               checkpoint no" when the log is written.
        Total header size = 12 bytes.

    Log file header size = 4 * 512 = 2 KB.
    Log file header contents:
         0 - Log group ID (4 bytes)
         4 - LSN of the start of data in this log file (8 bytes)
        12 - Log file number (4 bytes) (only in an archived log file)
        16 - 32-byte string, e.g. "Created by ibbackup ..."

    Log start LSN = 8K.

    log_group_calc_size_offset() : converts a real offset in the log file to an LSN
        [ by subtracting LOG_FILE_HDR_SIZE n times ];
    log_group_calc_real_offset() : converts from an LSN to a real offset in the file
        [ by adding LOG_FILE_HDR_SIZE ];
    log_calc_where_lsn_is()      : converts a real LSN into (fileno, offset);
        Main inputs: lsn, first_header_lsn

==============================================================================
InnoDB_DoubleWrite

http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/

Innodb Double Write

One of the very interesting techniques InnoDB uses is the technique called
doublewrite. It means InnoDB writes data twice when it performs tablespace
writes; writes to log files are done only once.

So why is doublewrite needed? It is needed to achieve data safety in case of
partial page writes. InnoDB does not log full pages to the log files, but uses
what is called "physiological" logging, which means log records contain the
page number for the operation as well as the operation data (i.e. update the
row) and log sequence information. Such a logging structure is great as it
requires less data to be written to the log, however it requires pages to be
internally consistent. It does not matter which page version it is: it could
be the "current" version, in which case InnoDB will skip the page update
operation, or the "former" one, in which case InnoDB will perform the update.
If the page is inconsistent, recovery can't proceed.

So how does doublewrite work? You can think of it as one more short-term log
file allocated inside the InnoDB tablespace; it contains space for 100 pages
(1.6 MB). When InnoDB flushes pages from the InnoDB buffer pool it does so in
batches of multiple pages. So several pages are written to the doublewrite
buffer (sequentially), fsync() is called to ensure they make it to disk, then
the pages are written to their real locations and fsync() is called a second
time. Now on recovery InnoDB checks the doublewrite buffer contents and the
pages in their original locations. If a page is inconsistent in the
doublewrite buffer it is simply discarded; if it is inconsistent in the
tablespace it is recovered from the doublewrite buffer.
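A conceptual sketch of the write ordering described above: the batch goes to the
doublewrite area first, then to the pages' home locations, with an fsync() barrier
after each step so a torn write can only ever affect one of the two copies. The
offsets are made up and this is not InnoDB's actual flush code.

    #include <unistd.h>

    #define PAGE_SIZE     16384
    #define DBLWR_OFFSET  (64UL * PAGE_SIZE)   /* made-up location of the doublewrite area */

    /* Flush a batch of dirty pages with doublewrite; returns 0 on success. */
    static int flush_batch_with_doublewrite(int fd,
                                            const unsigned char *pages,
                                            const unsigned long *page_nos,
                                            int n_pages)
    {
            int i;

            /* 1. Write all pages sequentially into the doublewrite area. */
            for (i = 0; i < n_pages; i++) {
                    if (pwrite(fd, pages + (size_t)i * PAGE_SIZE, PAGE_SIZE,
                               (off_t)(DBLWR_OFFSET + (unsigned long)i * PAGE_SIZE)) != PAGE_SIZE) {
                            return -1;
                    }
            }
            if (fsync(fd) != 0) {           /* doublewrite copies are durable */
                    return -1;
            }

            /* 2. Now write each page to its real location. */
            for (i = 0; i < n_pages; i++) {
                    if (pwrite(fd, pages + (size_t)i * PAGE_SIZE, PAGE_SIZE,
                               (off_t)(page_nos[i] * PAGE_SIZE)) != PAGE_SIZE) {
                            return -1;
                    }
            }
            return fsync(fd);               /* home copies are durable */
    }

On recovery, a page whose home copy is torn can be restored from its doublewrite
copy; a torn doublewrite copy is simply ignored, because the home copy is still the
old, consistent version.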
Q: Where does the doublewrite buffer live? In the InnoDB system tablespace,
   or in the tablespace being written to?

How much does the doublewrite buffer affect MySQL performance? Even though
doublewrite requires each page to be written twice, its overhead is far less
than double. The write to the doublewrite buffer is sequential, so it is
pretty cheap. It also allows InnoDB to save on fsync()s: instead of calling
fsync() for each page write, InnoDB submits multiple page writes and calls
fsync() once, which allows the operating system to optimize the order in which
writes are executed and to use multiple devices in parallel. This optimization
could be used without doublewrite though; it was just implemented at the same
time. So in general I would expect no more than a 5-10% performance loss due
to the use of doublewrite.

Can you disable doublewrite? If you do not care about your data (i.e. slaves
on RAID0), or if your file system guarantees that no partial page writes can
exist, you can disable doublewrite by setting innodb_doublewrite=0. It is
however not worth the trouble in most cases.

==============================================================================
How_To_Check_fsync:

http://www.mysqlperformanceblog.com/2006/05/03/group-commit-and-real-fsync/

Check whether your OS is doing a real fsync. You should know this anyway if
you care about your data safety. This can be done, for example, by using
SysBench:

    sysbench --test=fileio --file-fsync-freq=1 --file-num=1 \
             --file-total-size=16384 --file-test-mode=rndwr

This writes and fsyncs the same page, and you should see how many requests/sec
it is doing. You might also want to check diskTest from
http://www.faemalia.net/mysqlUtils/ which does some extra tests for fsync()
correctness.

==============================================================================
What is innodb_support_xa?

The parameter provides consistency between the binlog and the InnoDB
transaction log.

==============================================================================
Innodb_Group_Commit:

What is InnoDB group commit? For each InnoDB transaction, 1 fsync (or 2 if XA
is on) is done. If there are N concurrent commits, group commit combines their
fsyncs. If the number of fsyncs remains proportional to the number of
transactions, group commit is broken.

==============================================================================
InnoDB_Code

log0log.c : log_init() {
    OS_FILE_LOG_BLOCK_SIZE = 512 bytes
    log_sys->lsn = LOG_START_LSN = 16 * 512 = 8 K;
    LOG_BUFFER_SIZE >= 4 pages, i.e. 64 KB; typically this will be at least
    32 MB or so.
    Allocate log_sys->buf = LOG_BUFFER_SIZE bytes;
    log_sys->max_buf_free = (LOG_BUFFER_SIZE/2) - 4 pages (i.e. 64 KB) - 4*512;
    log_sys->checkpoint_buf = allocate 512 bytes;

    log_block_init(log_sys->buf, log_sys->lsn (i.e. 8K)):
        no = lsn / 512 = 16;
        log_block_set_hdr_no(buf, 16):
            write 16 (4 bytes) into mem location buf + 0 (offset of the block hdr no!)
        log_block_set_data_len(buf, hdr_size = 12 bytes):
            write 12 (2 bytes) into mem location buf + 4 (offset of the hdr data length)
        log_block_set_first_rec_group(buf, 0):
            write 0 (2 bytes) into buf + 6;   // for mini-transaction log records???

    log_sys->buf_free = LOG_BLOCK_HDR_SIZE = 12 bytes;
    log_sys->lsn = 8 KB + 12 bytes;
}

log_calc_where_lsn_is(first_header_lsn, lsn to search, N_logfiles, filesize) {
    LSN is approximately := amount of real log content + 8K + 12 bytes
    + log file header size (as if the entire log lived in a single log file).
    LSN points to the next byte to write; ????
}
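A rough sketch of the conversion that log_calc_where_lsn_is() performs, under
simplifying assumptions (lsn >= first_header_lsn, equal-sized files,
LOG_FILE_HDR_SIZE = 2 KB per file as noted above); the real function also has to
handle wrap-around of the circular log, so treat this only as the arithmetic idea.

    #include <stdint.h>

    #define LOG_FILE_HDR_SIZE  (4 * 512)    /* 2 KB header per log file, per the notes */

    /* Map an LSN to (file number, byte offset within that file).
       first_header_lsn is the LSN at the start of data in log file 0. */
    static void calc_where_lsn_is(uint64_t lsn,
                                  uint64_t first_header_lsn,
                                  uint64_t file_size,
                                  unsigned n_files,
                                  unsigned *file_no,
                                  uint64_t *offset_in_file)
    {
            uint64_t capacity = file_size - LOG_FILE_HDR_SIZE;  /* payload bytes per file */
            uint64_t delta    = lsn - first_header_lsn;         /* assumes lsn >= first_header_lsn */

            *file_no        = (unsigned)((delta / capacity) % n_files);
            *offset_in_file = LOG_FILE_HDR_SIZE + delta % capacity;
    }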
==============================================================================
Debugging_ibbackup

==============================================================================
mysqlbackup_Implementation

    incremental-backup apply will be similar to ibbackup.
    Won't delete unknown files.
    --pipe won't be supported.

ibbackup_Implementation:

    make_backup():
        Look for the checkpoint position;
        init info about the location of the doublewrite buffer in the system
        tablespace;
        init the recovery system in order to parse the log while we back it up;

        back_copy_log():
            Loop: for each MB of log segment:
                recv_scan_log_seg_for_backup():   // Here we read the segments.
                    recv_scan_log_recs();         // Here we parse to make sure it is OK.
                Apply posix_fadvise for each 1 MB;
            End of copying log;

        Start copying data:
            For each tablespace in all tablespaces linked from back_first_tablespace:
                backup_tablespace();
            If it is the last tablespace, check whether there are new per-table
            datafiles; if there are no more per-table datafiles, we are at the end.

        If suspend-at-end is enabled, back_suspend():
            Until the suspend file is deleted, do every second:
                keep scanning the log, i.e. back_check_log_scanned_far_enough();
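A minimal sketch of that suspend-at-end loop: poll for the suspend file once per
second and keep the log scan going until the file disappears. The file name handling
and the scan callback are placeholders, not the real ibbackup interface.

    #include <sys/stat.h>
    #include <unistd.h>

    /* Stub standing in for back_check_log_scanned_far_enough(); the real one
       copies any log written since the last pass.  Returns 0 on success. */
    static int check_log_scanned_far_enough(void)
    {
            return 0;
    }

    /* Block until the user deletes suspend_file, scanning the log once a second
       so the copied log never falls far behind the running server. */
    static int suspend_until_file_removed(const char *suspend_file)
    {
            struct stat st;

            while (stat(suspend_file, &st) == 0) {      /* file still exists */
                    if (check_log_scanned_far_enough() != 0) {
                            return -1;
                    }
                    sleep(1);
            }
            return 0;
    }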