Man page for mk-parallel-restore

August 24, 2007 – 5:36 pm

MK-PARALLEL-RESTORE


Section: User Contributed Perl Documentation (1)
Updated: 2008-06-01
Index
Return to Main Contents

 

NAME

mk-parallel-restore - Load files into MySQL in parallel.
 

SYNOPSIS


mk-parallel-restore /path/to/files
mk-parallel-restore –tab /path/to/files

 

DESCRIPTION


mk-parallel-restore is a way to load SQL or delimited-file dumps into MySQL
in parallel at high speed. It is especially designed for restoring files
dumped by mk-parallel-dump. It automatically
detects whether a file contains SQL or delimited data from the filename
extension, and either shells out to "mysql" or executes "LOAD DATA INFILE"
with the file. On UNIX-like systems, it will even make a FIFO to decompress
gzipped files for "LOAD DATA INFILE".

By default it discovers all files in the directory you specify on the command
line. It uses the file’s parent directory as the database name and the file’s
name (up to the first dot) as the table name. It can deal with files named
like the following:


dir/tbl.sql
dir/tbl.txt
dir/tbl.csv
dir/tbl.trg

It is also happy with files that look like this, where "EXT" is one of the
extensions just listed.


dir/tbl.EXT.000
dir/tbl.EXT.000.gz

By default, it loads "SQL" files first, if they exist, then loads "CSV" or
"TXT" files next, in order of the numbers in the filename extension as just
shown. This makes it easy for you to reload a table’s definition followed by
its data, in case you dumped them into separate files (as happens with
"mysqldump"’s "–tab" option). It loads "TRG" files, which create
triggers, last. Creating the triggers before loading data might keep the data
from being restored correctly. Files that are named 00_views.sql are loaded
even later, after all the parallel restores are finished; dependencies among
views and tables make them hard to restore one at a time. See
mk-parallel-dump for details on how data is dumped.

Exit status is 0 if everything went well, 1 if any files failed, and any
other value indicates an internal error.
 

OUTPUT


Output depends on verbosity. When “–test” is given, output includes
commands that would be executed.

When “–verbose” is 0, there is normally no output unless there’s an error.

When “–verbose” is 1, there is one line of output for the entire job,
showing how many tables were processed, how many files were loaded with what
status, how much time elapsed, and how much time the parallel load jobs added
up to. If any files were skipped, the filenames are printed to the output.

When “–verbose” is 2, there’s one line of output per table, showing extra
data such as how many threads were running when each table finished loading:


DATABASE TABLE FILES TIME STATUS THREADS
sakila language 2 0.07 0 2
sakila film_actor 2 0.07 0 2
sakila actor 2 0.06 0 2
sakila payment 2 0.07 0 2
sakila transport_backup 2 0.05 0 2
sakila country 2 0.08 0 2
sakila film 2 0.05 0 2
sakila rental 2 0.07 0 2

 

SPEED OF PARALLEL LOADING


User-contributed benchmarks are welcome. See
<http://www.paragon-cs.com/wordpress/?p=52> for one user’s experiences.
 

DOWNLOADING


You can download Maatkit from the Sourceforge website at
<http://sourceforge.net/projects/maatkit>, or you can get any of the tools
easily with a command like the following:


wget http://www.maatkit.org/get/toolname
or
wget http://www.maatkit.org/trunk/toolname

Where "toolname" can be replaced with the name (or fragment of a name) of any
of the Maatkit tools. Once downloaded, they’re ready to run; no installation is
needed. The first URL gets the latest released version of the tool, and the
second gets the latest trunk code from Subversion.
 

OPTIONS



–askpass


Prompt for a password when connecting to MySQL.
–basedir


type: string

Directory where FIFO files will be created.

–bulkinsbufsize


type: int

Set bulk_insert_buffer_size before each "LOAD DATA INFILE".

Has no effect without “–tab”.

–charset


short form: -A; type: string; default: BINARY

Sets the connection, database, and "LOAD DATA INFILE" character set.

The default is "BINARY", which is the safest value to use for "LOAD DATA
INFILE"
. Has no effect without “–tab”.

–commit


Commit after each load via "LOAD DATA INFILE".
–createdb


Create databases if they don’t exist.
–csv


Files are in CSV format (implies “–tab”).

Changes “–tab” options so the following "LOAD DATA INFILE" statement is used:


LOAD DATA INFILE <filename> INTO TABLE <table>
FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ‘\"’
LINES TERMINATED BY ‘\n’;

–database


short form: -D; type: string

Load all files into this database.

Overrides the database which is normally specified by the directory in which the
files live. Does not specify a default database for the connection.

–databases


short form: -d; type: hash

Restore only this comma-separated list of databases.

–dbregex


type: string

Restore only databases whose names match this regex.

–defaults-file


short form: -F; type: string

Only read mysql options from the given file. You must give an absolute
pathname.

–disablekeys


negatable: yes

Execute "ALTER TABLE DISABLE KEYS" before each table.

–fifo


negatable: yes; default: yes

Stream files into a FIFO for –tab.

Load compressed tab-separated files by piping them into a FIFO and using the
FIFO with "LOAD DATA INFILE", instead of by decompressing the files on disk.
Sets “–umask” to 0.

–host


short form: -h; type: string

Connect to host.

–ignore


short form: -i

Adds the "IGNORE" modifier to "LOAD DATA INFILE".

–ignoredb


short form: -g; type: Hash

Ignore this comma-separated list of databases.

–ignoretbl


short form: -n; type: Hash

Ignore this comma-separated list of table names.

Table names may be qualified with the database name.

–local


short form: -L

Uses the "LOCAL" option to "LOAD DATA INFILE".

If you enable this option, the files are read locally by the client library, not
by the server.

–locktables


negatable: yes

Lock tables before "LOAD DATA INFILE".

–noautovalon0


negatable: yes

Set SQL "NO_AUTO_VALUE_ON_ZERO" before "LOAD DATA INFILE".

–nobinlog


negatable: yes

Set "SQL_LOG_BIN=0" before "LOAD DATA INFILE".

This prevents large loads from being logged to the server’s binary log.

–noforeignkeys


negatable: yes

Set "FOREIGN_KEY_CHECKS=0" before "LOAD DATA INFILE".

–nouniquechecks


negatable: yes

Set "UNIQUE_CHECKS=0" before "LOAD DATA INFILE".

–numthread


short form: -m; type: int

Specifies the number of parallel processes to run.

The default is 2 (this is mk-parallel-restore after all — 1 is not parallel).
On GNU/Linux machines, the default is the number of times ‘processor’ appears in
/proc/cpuinfo. On Windows, the default is read from the environment. In any
case, the default is at least 2, even when there’s only a single processor.

–password


short form: -p; type: string

Password to use when connecting.

–port


short form: -P; type: int

Port number to use for connection.

–quiet


short form: -q

Sets “–verbose” to 0.

–replace


short form: -r

Adds the "REPLACE" modifier to "LOAD DATA INFILE".

–setvars


type: string; default: wait_timeout=10000

Set these MySQL variables.

Specify any variables you want to be set immediately after connecting to MySQL.
These will be included in a "SET" command.

–socket


short form: -S; type: string

Socket file to use for connection.

–tab


short form: -T

Load tab-separated files with "LOAD DATA INFILE".

This is similar to what "mysqlimport" does, but more flexible.
Enables the following options, unless they are specifically disabled:
“–commit”, “–disablekeys”, “–noautovalon0”, “–nobinlog”,
“–nouniquechecks”, “–noforeignkeys”.

–tables


short form: -t; type: hash

Restore only this comma-separated list of table names.

Table names may be qualified with the database name.

–tblregex


type: string

Restore only tables whose names match this regex.

–test


Print commands instead of executing them.
–truncate


Run "TRUNCATE TABLE" before "LOAD DATA INFILE".

This will delete all rows from a table before loading the first tab-delimited
file into it.

–umask


type: string

Set the program’s "umask" to this octal value.

This is useful when you want created files (such as FIFO files) to be readable
or writable by other users (for example, the MySQL server itself).

–user


short form: -u; type: string

User for login if not current user.

–verbose


short form: -v; cumulative: yes; default: 1

Verbosity; can specify multiple times.

Repeatedly specifying it increments the verbosity. Default is 1 if not
specified. See “OUTPUT”.

–wait


short form: -w; type: time; default: 5m

Wait limit when server is down.

If the MySQL server crashes during loading, waits until the server comes back
and then continues with the rest of the files. "mk-parallel-restore" will
check the server every second until this time is exhausted, at which point it
will give up and exit.


 

ENVIRONMENT


The environment variable "MKDEBUG" enables verbose debugging output in all of
the Maatkit tools:


MKDEBUG=1 mk-….

 

SYSTEM REQUIREMENTS


You need Perl, DBI, DBD::mysql, and some core packages that ought to be
installed in any reasonably new version of Perl.
 

BUGS


Please use the Sourceforge bug tracker, forums, and mailing lists to request
support or report bugs: <http://sourceforge.net/projects/maatkit/>.

Please include the complete command-line used to reproduce the problem you are
seeing, the version of all MySQL servers involved, the complete output of the
tool when run with “–version”, and if possible, debugging output produced by
running with the "MKDEBUG=1" environment variable.
 

COPYRIGHT, LICENSE AND WARRANTY


This program is copyright (c) 2007 Baron Schwartz. Feedback and improvements
are welcome.

THIS PROGRAM IS PROVIDEDAS ISAND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 2; OR the Perl Artistic License. On UNIX and similar
systems, you can issue `man perlgpl’ or `man perlartistic’ to read these
licenses.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA.
 

AUTHOR


Baron Schwartz.
 

SEE ALSO


See also mk-parallel-dump.
 

VERSION


This manual page documents Ver 1.0.7 Distrib 1972 $Revision: 1970 $.



 

Index



NAME

SYNOPSIS

DESCRIPTION

OUTPUT

SPEED OF PARALLEL LOADING

DOWNLOADING

OPTIONS

ENVIRONMENT

SYSTEM REQUIREMENTS

BUGS

COPYRIGHT, LICENSE AND WARRANTY

AUTHOR

SEE ALSO

VERSION



Post a Comment