New job assessment: MySQL online backup script

These is one of the proposed solutions for the job assessment commented in a previous post.

_How could you backup all the data of a MySQL installation without taking the DB down at all. The DB is 100G and is write intensive. Provide script. _

My solution to this problem will be "Online backup using LVM snapshots in a replication slave". It implies that a series of previous decisions have been taken:

LVM snapshots: Database data files must be on a LVM volume, which will allow the creation of snapshots. An alternative could be use Btrfs snapshots.

Advantages:

* Almost online backup: The DB will continue working while copying the data files.


* Simple to setup and without cost.


* You can even start a separated mysql instance (with RW snapshots in LVM2) to perform any task.

Drawbacks:

* To ensure the data file integrity the tables must be locked while creating the snapshot.

The snapshot creation is fast (~2s), but this means a lock anyway.

* All datafiles must be in the same Logical Volume, to ensure an atomic snapshot.


* The snapshot has an overhead that will decrease the write/read performance

(it is said that up to a 6x overhead). This is because any Logical Extend modified must be copied. After a while this overhead will be reduced because changes are usually made in the same Logical Extends.

Replication slaves: (see official documentation about replication backups and backup raw data in slaves ). Backup operations will be executed in a slave instead of in the master.

Advantages:

* Will avoid the performance impact in the Master mysql server,

since "the slave can be paused and shut down without affecting the running operation of the master".

* Any backup technique can be done. In fact, using this second method should be enough.


* Simple to setup.


* If there is already a MySQL cluster it will use existent infrastructure.

Drawbacks:

* This is more an architectonic solution or backup policy than a backupscript.


* If there are logical inconsistencies in the slave, they are included in the backup.

So, I will suppose that:

This is the backup script (tested with xfs) that must be executed on a Slave replication server:

(in github)

[sourcecode language="python"]

!/usr/bin/python

-- coding: utf-8 --

Requirements

- Data files must be in lvm

- Optionally in xfs (xfs_freeze)

- User must have LOCK TABLES and RELOAD privilieges::

grant LOCK TABLES, RELOAD on .

to backupuser@localhost

identified by 'backupassword';

import MySQLdb import sys import os from datetime import datetime

DB Configuration

MYSQLHOST = "localhost" # Where the slave is MYSQLPORT = 3306 MYSQLUSER = "backupuser" MYSQLPASSWD = "backupassword" MYSQL_DB = "appdb"

Datafiles location and LVM information

DATAFILESPATH = "/mysql/data" # do not add / at the end DATAFILESLV = "/dev/datavg/datalv" SNAPSHOT_SIZE = "10G" # tune de size as needed.

SNAPSHOT_MOUNTPOINT = "/mysql/data.snapshot" # do not add / at the end

Backup target conf

BACKUP_DESTINATION = "/mysql/data.backup"

----------------------------------------------------------------

Commands

Avoids sudo ask the password

SUDO = "SUDO_ASKPASS=/bin/true /usr/bin/sudo -A "

SUDO = "sudo" LVCREATECMD = "%s /sbin/lvcreate" % SUDO LVREMOVECMD = "%s /sbin/lvremove" % SUDO MOUNTCMD = "%s /bin/mount" % SUDO UMOUNTCMD = "%s /bin/umount" % SUDO

There is a bug in this command with the locale, we set LANG=

XFSFREEZECMD = "LANG= %s /usr/sbin/xfs_freeze" % SUDO

RSYNC_CMD = "%s /usr/bin/rsync" % SUDO

----------------------------------------------------------------

MySQL functions

def mysqlconnect(): dbconn = MySQLdb.connect (host = MYSQLHOST, port = MYSQLPORT, user = MYSQLUSER, passwd = MYSQLPASSWD, db = MYSQLDB) return dbconn

def mysqllocktables(dbconn): sqlcmd = "FLUSH TABLES WITH READ LOCK"

print "Locking tables: %s" % sqlcmd

cursor = dbconn.cursor()
cursor.execute(sqlcmd)
cursor.close()

def mysqlunlocktables(dbconn): sqlcmd = "UNLOCK TABLES"

print "Unlocking tables: %s" % sqlcmd

cursor = dbconn.cursor()
cursor.execute(sqlcmd)
cursor.close()

----------------------------------------------------------------

LVM operations

class FailedLvmOperation(Exception): pass

Get the fs type with a common shell script

def getfstype(fspath): p = os.popen('mount | grep $(df %s |grep /dev |'\ 'cut -f 1 -d " ") | cut -f 3,5 -d " "' % fspath) (fsmountpoint, fstype) = p.readline().strip().split(' ') p.close() return (fsmountpoint, fstype)

def lvmcreatesnapshot(): # XFS filesystem supports freezing. That is convenient before the snapshot (fsmountpoint, fstype) = getfstype(DATAFILESPATH) if fstype == 'xfs': print "Freezing '%s'" % fsmountpoint os.system('%s -f %s' % (XFSFREEZECMD, fs_mountpoint))

newlv_name = "%s_backup_%ilv" % \
                (DATA_FILES_LV.split('/')[-1], os.getpid())
cmdline = "%s --snapshot %s -L%s --name %s" % \
    (LVCREATE_CMD, DATA_FILES_LV, SNAPSHOT_SIZE, newlv_name)

print "Creating the snapshot backup LV '%s' from '%s'" % \
        (newlv_name, DATA_FILES_LV)
print " # %s" % cmdline

ret = os.system(cmdline)

# Always unfreeze!!
if fs_type == 'xfs':
    print "Unfreezing '%s'" % fs_mountpoint
    os.system('%s -u %s' % (XFS_FREEZE_CMD, fs_mountpoint))

if ret != 0: raise FailedLvmOperation

# Return the path to the device
return '/'.join(DATA_FILES_LV.split('/')[:-1]+[newlv_name])

def lvmremovesnapshot(lvname): cmdline = "%s -f %s" % \ (LVREMOVECMD, lv_name)

print "Removing the snapshot backup LV '%s'" % lv_name
print " # %s" % cmdline

ret = os.system(cmdline)
if ret != 0:
    raise FailedLvmOperation

----------------------------------------------------------------

Mount the filesystem

class FailedMountOperation(Exception): pass

def mountsnapshot(lvname): # XFS filesystem needs nouuid option to mount snapshots (fsmountpoint, fstype) = getfstype(DATAFILESPATH) if fstype == 'xfs': cmdline = "%s -o nouuid %s %s" % \ (MOUNTCMD, lvname, SNAPSHOTMOUNTPOINT) else: cmdline = "%s %s %s" % (MOUNTCMD, lvname, SNAPSHOT_MOUNTPOINT)

print "Mounting the snapshot backup LV '%s' on '%s'" % \
        (lv_name, SNAPSHOT_MOUNTPOINT)
print " # %s" % cmdline

ret = os.system(cmdline)
if ret != 0:
    raise FailedMountOperation

def umountsnapshot(lvname): cmdline = "%s %s" % (UMOUNTCMD, SNAPSHOTMOUNTPOINT)

print "Unmounting the snapshot backup LV '%s' from '%s'" % \
        (lv_name, SNAPSHOT_MOUNTPOINT)
print " # %s" % cmdline

ret = os.system(cmdline)
if ret != 0:
    raise FailedMountOperation

----------------------------------------------------------------

Perform the backup process. For instance, an rsync

class FailedBackupOperation(Exception): pass

def dobackup(): cmdline = "%s --progress -av %s/ %s" % \ (RSYNCCMD, DATAFILESPATH, BACKUP_DESTINATION)

print "Executing the backup"
print " # %s" % cmdline

ret = os.system(cmdline)
if ret != 0:
    raise FailedBackupOperation

def main(): dbconn = mysqlconnect() mysqllock_tables(dbconn)

start_time = datetime.now()
# Critical, tables are locked!
snapshotlv = ''
try:
    snapshotlv = lvm_create_snapshot()
except:
    print "Backup failed."
    raise
finally:
    mysql_unlock_tables(dbconn)
    dbconn.close()
    print "Tables had been locked for %s" % str(datetime.now()-start_time)

try:
    mount_snapshot(snapshotlv)
    do_backup()
    umount_snapshot(snapshotlv)
    lvm_remove_snapshot(snapshotlv)
except:
    print "Backup failed. Snapshot LV '%s' still exists. " % snapshotlv
    raise

print "Backup completed. Elapsed time %s" % str(datetime.now()-start_time)

if name == 'main': main() [/sourcecode]