Linux: How do I find files in Linux (using locate or find)? [CentOS, RHEL, Redhat, Fedora]

March 24, 2008 – 2:03 am

There are few ways to find files in Linux. Each have their own advantages and disadvantages. Some are distribution specific and some are not installed by default. Let us start with fastest way to find files using locate and secondly we will talk about using find: a very powerful and utility which is available in almost all distributions to help users find files in Linux.

  1. locate
    • Usage: locate <what you are looking for>
    • Example: locate whois
    • Advantages: Looks through database therefore is really fast
    • Disadvantages: database has to be updated in order to find new files.
  2. find
    • Usage: find <path> –name “<what you are looking for>”
    • Example: find / –name “whois”
    • Advantages: No database to use; lots of advance features
    • Disadvantages: Can be slow; requires more typing even for simple search

Both of the above methods used for finding files in Linux can have many parameters. You can read the man page of locate or man page for find to read more ways to running the commands to find files.

Man page for ???

September 26, 2007 – 4:38 pm

man


Section: User Commands (1)
Updated: September 19, 2005
Index
Return to Main Contents

 

NAME

man - format and display the on-line manual pages
 

SYNOPSIS

man

[-acdfFhkKtwW]

[--path]

[-m

system]

[-p

string]

[-C

config_file]

[-M

pathlist]

[-P

pager]

[-B

browser]

[-H

htmlpager]

[-S

section_list]

[section]

name …


 

DESCRIPTION

man

formats and displays the on-line manual pages. If you specify
section,

man

only looks in that section of the manual.
name

is normally the name of the manual page, which is typically the name
of a command, function, or file.
However, if
name

contains a slash
(/)

then
man

interprets it as a file specification, so that you can do
man ./foo.5

or even
man /cd/foo/bar.1.gz.

See below for a description of where
man

looks for the manual page files.

 
 

OPTIONS


-C config_file


Specify the configuration file to use; the default is
/etc/man.config.

(See
man.config(5).)

-M path


Specify the list of directories to search for man pages.
Separate the directories with colons. An empty list is the same as
not specifying
-M

at all. See
SEARCH PATH FOR MANUAL PAGES.

-P pager


Specify which pager to use.
This option overrides the
MANPAGER

environment variable, which in turn overrides the
PAGER

variable. By default,
man

uses
/usr/bin/less -is.

-B


Specify which browser to use on HTML files.
This option overrides the
BROWSER

environment variable. By default,
man

uses
/usr/bin/lynx,

-H


Specify a command that renders HTML files as text.
This option overrides the
HTMLPAGER

environment variable. By default,
man

uses
/usr/bin/lynx-dump,

-S section_list


List is a colon separated list of manual sections to search.
This option overrides the
MANSECT

environment variable.

-a


By default,
man

will exit after displaying the first manual page it
finds. Using this option forces
man

to display all the manual pages that match
name,

not just the first.

-c


Reformat the source man page, even when an up-to-date cat page exists.
This can be meaningful if the cat page was formatted for a screen
with a different number of columns, or if the preformatted page
is corrupted.
-d


Don’t actually display the man pages, but do print gobs of debugging
information.
-D


Both display and print debugging info.
-f


Equivalent to
whatis.

-F or –preformat


Format only - do not display.
-h


Print a help message and exit.
-k


Equivalent to
apropos.

-K


Search for the specified string in *all* man pages. Warning: this is
probably very slow! It helps to specify a section.
(Just to give a rough idea, on my machine this takes about a minute
per 500 man pages.)
-m system


Specify an alternate set of man pages to search based on the system
name given.
-p string


Specify the sequence of preprocessors to run before
nroff

or
troff.

Not all installations will have a full set of preprocessors.
Some of the preprocessors and the letters used to designate them are:
eqn (e), grap (g), pic (p), tbl (t), vgrind (v), refer (r).
This option overrides the
MANROFFSEQ

environment variable.

-t


Use
/usr/bin/groff -Tps -mandoc

to format the manual page, passing the output to
stdout.

The default output format of
/usr/bin/groff -Tps -mandoc

is Postscript, refer to the manual page of
/usr/bin/groff -Tps -mandoc

for ways to pick an alternate format.


Depending on the selected format and the availability of printing
devices, the output
may need to be passed through some filter or another before being
printed.


-w or –path


Don’t actually display the man pages, but do print the location(s) of
the files that would be formatted or displayed. If no argument is given:
display (on stdout) the list of directories that is searched by
man

for man pages. If
manpath

is a link to man, then "manpath" is equivalent to "man –path".

-W


Like -w, but print file names one per line, without additional information.
This is useful in shell commands like

man -aW man | xargs ls -l



 

CAT PAGES

Man will try to save the formatted man pages, in order to save
formatting time the next time these pages are needed.
Traditionally, formatted versions of pages in DIR/manX are
saved in DIR/catX, but other mappings from man dir to cat dir
can be specified in
/etc/man.config.

No cat pages are saved when the required cat directory does not exist.
No cat pages are saved when they are formatted for a line length
different from 80.
No cat pages are saved when man.config contains the line NOCACHE.

It is possible to make
man

suid to a user man. Then, if a cat directory
has owner man and mode 0755 (only writable by man), and the cat files
have owner man and mode 0644 or 0444 (only writable by man, or not
writable at all), no ordinary user can change the cat pages or put
other files in the cat directory. If
man

is not made suid, then a cat directory should have mode 0777
if all users should be able to leave cat pages there.

The option
-c

forces reformatting a page, even if a recent cat page exists.


 

HTML PAGES

Man will find HTML pages if they live in directories named as

expected to be ".html", thus a valid name for an HTML version of the
ls(1)

man page would be
/usr/share/man/htmlman1/ls.1.html.


 

SEARCH PATH FOR MANUAL PAGES

man

uses a sophisticated method of finding manual page files, based on the
invocation options and environment variables, the
/etc/man.config

configuration file, and some built in conventions and heuristics.

First of all, when the
name

argument to
man

contains a slash
(/),

man

assumes it is a file specification itself,
and there is no searching involved.

But in the normal case where
name

doesn’t contain a slash,
man

searches a variety of directories for a file that could be a manual page
for the topic named.

If you specify the
-M pathlist

option,
pathlist

is a colon-separated list of the directories that
man

searches.

If you don’t specify
-M

but set the
MANPATH

environment variable, the value of that variable is the list of the
directories that
man

searches.

If you don’t specify an explicit path list with
-M

or
MANPATH,

man

develops its own path list based on the contents of the configuration
file
/etc/man.config.

The
MANPATH

statements in the configuration file identify particular directories to
include in the search path.

Furthermore, the
MANPATH_MAP

statements add to the search path depending on your command search path
(i.e. your
PATH

environment variable). For each directory that may be in the command
search path, a
MANPATH_MAP

statement specifies a directory that should be added to the search
path for manual page files.
man

looks at the
PATH

variable and adds the corresponding directories to the manual page
file search path. Thus, with the proper use of
MANPATH_MAP,

when you issue the command
man xyz,

you get a manual page for the program that would run if you issued the
command
xyz.

In addition, for each directory in the command search path (we’ll call
it a "command directory") for which you do
not

have a
MANPATH_MAP

statement,
man

automatically looks for a manual page directory "nearby"
namely as a subdirectory in the command directory itself or
in the parent directory of the command directory.

You can disable the automatic "nearby" searches by including a
NOAUTOPATH

statement in
/etc/man.config.

In each directory in the search path as described above,
man

searches for a file named
topic.section,

with an optional suffix on the section number and
possibly a compression suffix.
If it doesn’t find such a file, it then looks in any subdirectories
named
manN

or
catN

where
N

is the manual section number.
If the file is in a
catN

subdirectory,
man

assumes it is a formatted manual page file (cat page). Otherwise,
man

assumes it is unformatted. In either case, if the filename has a
known compression suffix (like
.gz),

man

assumes it is gzipped.

If you want to see where (or if)
man

would find the manual page for a particular topic, use the
–path (-w)

option.


 

ENVIRONMENT


MANPATH


If
MANPATH

is set,
man

uses it as the path to search for manual page files. It overrides the
configuration file and the automatic search path, but is overridden by
the
-M

invocation option. See
SEARCH PATH FOR MANUAL PAGES.

MANPL


If
MANPL

is set, its value is used as the display page length.
Otherwise, the entire man page will occupy one (long) page.

MANROFFSEQ


If
MANROFFSEQ

is set, its value is used to determine the set of preprocessors run
before running
nroff

or
troff.

By default, pages are passed through
the tbl preprocessor before
nroff.

MANSECT


If
MANSECT

is set, its value is used to determine which manual sections to search.

MANWIDTH


If
MANWIDTH

is set, its value is used as the width manpages should be displayed.
Otherwise the pages may be displayed over the whole width of your
screen.

MANPAGER


If
MANPAGER

is set, its value is used as the name of the program to use to display
the man page. If not, then
PAGER

is used. If that has no value either,
/usr/bin/less -is

is used.

BROWSER


The name of a browser to use for displaying HTML manual pages. If
it is not set, /usr/bin/lynx is used.
HTMLPAGER


The command to use for rendering HTML manual pages as text. If
it is not set, /usr/bin/lynx -dump is used.
LANG


If
LANG

is set, its value defines the name of the subdirectory where man
first looks for man pages. Thus, the command `LANG=dk man 1 foo’
will cause man to look for the foo man page in …/dk/man1/foo.1,
and if it cannot find such a file, then in ./man-pages/foo.1,
where … is a directory on the search path.

NLSPATH, LC_MESSAGES, LANG


The environment variables
NLSPATH

and
LC_MESSAGES

(or
LANG

when the latter does not exist)
play a role in locating the message catalog.
(But the English messages are compiled in, and for English no catalog
is required.)
Note that programs like
col(1)

called by man also use e.g. LC_CTYPE.

PATH


PATH

helps determine the search path for manual page files. See
SEARCH PATH FOR MANUAL PAGES.

SYSTEM


SYSTEM

is used to get the default alternate system name (for use
with the
-m

option).


 

BUGS

The
-t

option only works if a troff-like program is installed.

If you see blinking \255 or <AD> instead of hyphens,
put `LESSCHARSET=latin1′ in your environment.
 

TIPS

If you add the line



  (global-set-key [(f1)] (lambda () (interactive) (manual-entry (current-word))))


to your
.emacs

file, then hitting F1 will give you the man page for the library call
at the current cursor position.

To get a plain text version of a man page, without backspaces
and underscores, try



  # man foo | col -b > foo.mantxt
 

AUTHOR

John W. Eaton was the original author of
man.

Zeyd M. Ben-Halim released man 1.2, and Andries Brouwer followed up with versions 1.3 thru 1.5p.
Federico Lucifredi <flucifredi@acm.org> is the current maintainer.
 

SEE ALSO

apropos(1), whatis(1), less(1), groff(1), man.config(5).



 

Index



NAME

SYNOPSIS

DESCRIPTION

OPTIONS

CAT PAGES

HTML PAGES

SEARCH PATH FOR MANUAL PAGES

ENVIRONMENT

BUGS

TIPS

AUTHOR

SEE ALSO



Man page for read???

September 26, 2007 – 10:20 am

READELF


Section: GNU Development Tools (1)
Updated: 2006-03-31
Index
Return to Main Contents

 

NAME

readelf - Displays information about ELF files.
 

SYNOPSIS


readelf [-a|--all]

        [-h|--file-header]

        [-l|--program-headers|--segments]

        [-S|--section-headers|--sections]

        [-g|--section-groups]

        [-t|--section-details]

        [-e|--headers]

        [-s|--syms|--symbols]

        [-n|--notes]

        [-r|--relocs]

        [-u|--unwind]

        [-d|--dynamic]

        [-V|--version-info]

        [-A|--arch-specific]

        [-D|--use-dynamic]

        [-x <number or name>|--hex-dump=<number or name>]

        [-w[liaprmfFsoR]|

         –debug-dump[=line,=info,=abbrev,=pubnames,=aranges,=macro,=frames,=frames-interp,=str,=loc,=Ranges]]

        [-I|-histogram]

        [-v|--version]

        [-W|--wide]

        [-H|--help]

        elffile
 

DESCRIPTION


readelf displays information about one or more ELF format object
files. The options control what particular information to display.

elffile… are the object files to be examined. 32-bit and
64-bit ELF files are supported, as are archives containing ELF files.

This program performs a similar function to objdump but it
goes into more detail and it exists independently of the BFD
library, so if there is a bug in BFD then readelf will not be
affected.
 

OPTIONS


The long and short forms of options, shown here as alternatives, are
equivalent. At least one option besides -v or -H must be
given.


-a

–all

Equivalent to specifiying –file-header,
–program-headers, –sections, –symbols,
–relocs, –dynamic, –notes and
–version-info.

-h

–file-header

Displays the information contained in the ELF header at the start of the
file.

-l

–program-headers


–segments

Displays the information contained in the file’s segment headers, if it
has any.

-S

–sections


–section-headers

Displays the information contained in the file’s section headers, if it
has any.

-g

–section-groups

Displays the information contained in the file’s section groups, if it
has any.

-t

–section-details

Displays the detailed section information. Implies -S.

-s

–symbols


–syms

Displays the entries in symbol table section of the file, if it has one.

-e

–headers

Display all the headers in the file. Equivalent to -h -l -S.

-n

–notes

Displays the contents of the NOTE segments and/or sections, if any.

-r

–relocs

Displays the contents of the file’s relocation section, if it has one.

-u

–unwind

Displays the contents of the file’s unwind section, if it has one. Only
the unwind sections for IA64 ELF files are currently supported.

-d

–dynamic

Displays the contents of the file’s dynamic section, if it has one.

-V

–version-info

Displays the contents of the version sections in the file, it they
exist.

-A

–arch-specific

Displays architecture-specific information in the file, if there
is any.

-D

–use-dynamic

When displaying symbols, this option makes readelf use the
symbol table in the file’s dynamic section, rather than the one in the
symbols section.

-x <number or name>

–hex-dump=<number or name>

Displays the contents of the indicated section as a hexadecimal dump.
A number identifies a particular section by index in the section table;
any other string identifies all sections with that name in the object file.

-w[liaprmfFsoR]

–debug-dump[=line,=info,=abbrev,=pubnames,=aranges,=macro,=frames,=frames-interp,=str,=loc,=Ranges]

Displays the contents of the debug sections in the file, if any are
present. If one of the optional letters or words follows the switch
then only data found in those specific sections will be dumped.

-I

–histogram

Display a histogram of bucket list lengths when displaying the contents
of the symbol tables.

-v

–version

Display the version number of readelf.

-W

–wide

Don’t break output lines to fit into 80 columns. By default
readelf breaks section header and segment listing lines for
64-bit ELF files, so that they fit into 80 columns. This option causes
readelf to print each section header resp. each segment one a
single line, which is far more readable on terminals wider than 80 columns.

-H

–help

Display the command line options understood by readelf.

@file


Read command-line options from file. The options read are
inserted in place of the original @file option. If file
does not exist, or cannot be read, then the option will be treated
literally, and not removed.

Options in file are separated by whitespace. A whitespace
character may be included in an option by surrounding the entire
option in either single or double quotes. Any character (including a
backslash) may be included by prefixing the character to be included
with a backslash. The file may itself contain additional
@file options; any such options will be processed recursively.


 

SEE ALSO


objdump(1), and the Info entries for binutils.
 

COPYRIGHT


Copyright (c) 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc.

Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.1
or any later version published by the Free Software Foundation;
with no Invariant Sections, with no Front-Cover Texts, and with no
Back-Cover Texts. A copy of the license is included in the
section entitled “GNU Free Documentation License”.



 

Index



NAME

SYNOPSIS

DESCRIPTION

OPTIONS

SEE ALSO

COPYRIGHT



Man page for mk-parallel-dump

August 24, 2007 – 5:38 pm

MK-PARALLEL-DUMP


Section: User Contributed Perl Documentation (1)
Updated: 2008-06-01
Index
Return to Main Contents

 

NAME

mk-parallel-dump - Dump sets of MySQL tables in parallel.
 

SYNOPSIS


mk-parallel-dump
mk-parallel-dump –tab –basedir /path/to/backups/
mk-parallel-dump –sets order,profile,session –settable meta.backupset

 

DESCRIPTION


mk-parallel-dump connects to a MySQL server, finds database and table names,
and dumps them in parallel for speed. It can be used in several pre-packaged
ways, or as a generic wrapper to call some program in parallel, passing it
parameters for each table. It supports backup sets and dumping only tables that
have changed since the last dump.

To dump all tables to gzipped files in the current directory, each database with
its own directory, with a global read lock, flushing and recording binary log
positions, each table in a single file:


mk-parallel-dump

To dump tables elsewhere:


mk-parallel-dump –basedir /path/to/elsewhere

To dump to tab-separated files with "SELECT INTO OUTFILE", each table with
separate data and SQL files:


mk-parallel-dump –tab

To dump one or more backup sets (see “BACKUP SETS”):


mk-parallel-dump –sets set1,set2,set3 –settable meta.backupset

To “write your own command line,” use "–" to indicate where the arguments for
mk-parallel-dump stop and where the arguments for "mysqldump" (or any other
program) begin. The following example shows "mysqldump", and aside from
simpler options to "mysqldump", is basically what happens when you specify no
arguments at all:


mk-parallel-dump — mysqldump –skip-lock-tables ‘%D’ ‘%N’ \
\| gzip –fast -c - \> ‘%D.%N.gz’

The "%" modifiers are macros (see “MACROS”). The "–skip-lock-tables"
argument is very important in that last example, because otherwise both
mk-parallel-dump and "mysqldump" will lock tables, so "mysqldump" will hang,
waiting for the locks. Notice the shell metacharacters "|" and ">" are
escaped so the shell won’t interpret them, and they’ll get passed through to the
generated command-line.

There’s no reason you can’t use mk-parallel-dump to do other tasks in
parallel, such as "OPTIMIZE TABLE":


mk-parallel-dump –noflushlock –nolocktables — mysqlcheck –optimize ‘%D’ ‘%N’

When you use built-in defaults, mk-parallel-dump will relay these arguments
on to every forked copy of "mysqldump": “–defaults-file”, “–host”,
“–port”, “–socket”, “–user”, “–password”. If you write your own
command-line, you will need to specify them manually.

If you specify the “–tab” option, mk-parallel-dump creates separate files
that hold views and triggers, so they can be restored correctly (this is not
currently possible with the "mysqldump" from MySQL AB, which will restore
triggers before restoring data). Otherwise it does not back up your entire
database; it dumps tables and data only. It does not dump view definitions
or stored routines. However, if you dump the "mysql" database, you’ll be
dumping the stored routines anyway.

Exit status is 0 if everything went well, 1 if any chunks failed, and any
other value indicates an internal error.

mk-parallel-dump doesn’t clean out any destination directories before
dumping into them. You can move away the old destination, then remove it
after a successful dump, with a shell script like the following:


#!/bin/sh
CNT=`ls | grep -c old`;
if [ -d default ]; then mv default default.old.$CNT;
mk-parallel-dump
if [ $? != 0 ]
then
echo "There were errors, not purging old sets."
else
echo "No errors during dump, purging old sets."
rm -rf default.old.*
fi

 

BACKUP SETS


Backup sets are groups of logically related tables you want to backup together.
You specify a set by inserting the table names into a table in the MySQL server
from which you’re dumping, and then naming it in the “–sets” option.
mk-parallel-dump always works a set at a time; if you don’t specify a set, it
auto-discovers tables, filters them with the various command-line options
(“–databases”, etc) and considers them the default set.

The table that stores backup sets should have at least these columns: setname,
priority, db, tbl. The following is a suggested table structure:


CREATE TABLE backupset (
setname CHAR(10) NOT NULL,
priority INT NOT NULL DEFAULT 0,
db CHAR(64) NOT NULL,
tbl CHAR(64) NOT NULL,
ts TIMESTAMP NOT NULL,
PRIMARY KEY(setname, db, tbl),
KEY(setname, priority, db, tbl)
);

Entries are ordered by priority, db, and tbl. Priority 0 tables are dumped
first, not last. If it looks like tables are dumped in the wrong order, it’s
probably because they’re being dumped asynchronously. The output is printed
when the dump finishes, not when it starts.

If you specify “–age”, mk-parallel-dump expects the "ts" column to
exist, and will update the column to the current date and time when it
successfully dumps a table.

Don’t use "default" as a set name. It is used when you don’t specify any
sets and when you want all tables not explicitly assigned to a set to be
dumped (see “–defaultset”).

Set names may contain only lowercase letters, numbers, and underscores.
 

CHUNKS


mk-parallel-dump can break your tables into chunks when dumping, and put
approximately the amount of data you specify into each chunk. This is useful to
avoid enormous files for restoration, which can not only take a long time but
may be a lot of extra work for transactional storage engines like InnoDB. A
huge file can create a huge rollback segment in your tablespace.

To dump in chunks, specify the “–chunksize” option. This option is an
integer with an optional suffix. Without the suffix, it’s the number of rows
you want in each chunk. With the suffix, it’s the approximate size of the data.

mk-parallel-dump tries to use index statistics to calculate where the
boundaries between chunks should be. If the values are not evenly distributed,
some chunks can have a lot of rows, and others may have very few or even none.
Some chunks can exceed the size you want.

When you specify the size with a suffix, the allowed suffixes are k, M and G,
for kibibytes, mebibytes, and gibibytes, respectively. mk-parallel-dump
doesn’t know anything about data size. It asks MySQL (via "SHOW TABLE STATUS")
how long an average row is in the table, and converts your option to a number
of rows.

Not all tables can be broken into chunks. mk-parallel-dump looks for an
index whose leading column is numeric (integers, real numbers, and date and time
types). It prefers the primary key if its first column is chunk-able.
Otherwise it chooses the first chunk-able column in the table.

Generating a series of "WHERE" clauses to divide a table into evenly-sized
chunks is difficult. If you have any ideas on how to improve the algorithm,
please write to the author (see “BUGS”).
 

MACROS


mk-parallel-dump can insert "%" variables into arguments. The available macros
are as follows:


MACRO MEANING
===== =================
%S The backup set
%D The database name
%N The table name
%C The chunk number
%W The WHERE clause

You can place a number between the "%" and the letter. The macro replacement
then assumes it’s a digit and pads it with leading zeroes (in practice, this is
only useful for %C).
 

OUTPUT


Output depends on verbosity. When “–test” is given, output includes
commands that would be executed.

When “–verbose” is 0, there is normally no output unless there’s an error.

When “–verbose” is 1, there is one line of output for each backup set,
showing the set, how many tables and chunks were dumped with what status, how
much time elapsed, and how much time the parallel dump jobs added up to. A
final line shows sums for all sets, unless there is only one set.

When “–verbose” is 2, there is also one line of output for each table.
Each line is printed when a forked “child” process ends and is removed from
the list of children. The output shows the backup set, database, table,
seconds spent dumping, the exit status of the forked dump process, and number
of current processes (including the one just reaped; so this typically shows
“how many are running in parallel”). A status of 0 indicates success:


SET DATABASE TABLE TIME STATUS THREADS
default mysql db 0 0 4
default mysql columns_priv 0 0 4
default mysql help_category 0 0 3

 

SPEED OF PARALLEL DUMPS


How much faster is it to dump in parallel? That depends on your hardware and
data. You may be able dump files twice as fast, or more if you have lots of
disks and CPUs. Here are some user-contributed figures.

The following table is for a 3.6GHz Xeon machine with 4 processors and a RAID-10
array of 15k disks, directly attached to the server with a fibre channel. Most
of the space is in one huge table that wasn’t dumped in parallel:


COMMAND SIZE TIME
————————– —– —-
mk-parallel-dump 1.4GB 269
mysqldump 1.4GB 345

On the same machine, in a database with lots of roughly equal-sized tables:


COMMAND SIZE TIME
————————– —– —-
mk-parallel-dump 117MB 7
mysqldump 117MB 37

It doesn’t always work that well. A dual 2.80GHz Xeon server with a RAID-5
array of three 7200RPM SATA disk drives running MySQL 5.0.38 on GNU/Linux
achieved the following dump times:


COMMAND SIZE TIME
————————– —– —-
mk-parallel-dump 3.0GB 2596
mysqldump | gzip –fast 3.0GB 3195

While dumping two threads in parallel, this machine was at an average of 74%
CPU utilization and 12% I/O wait. This machine doesn’t have enough disks and
CPUs to do that many things at once, so it’s not going to speed up much.

Dumping lots of tiny tables by forking of lots of "mysqldump" processes isn’t
usually much faster, because of the overhead of starting "mysqldump",
connecting, inspecting the table, and dumping it. Note that tab-separated
dumps are typically much faster and don’t suffer as much from the effects of
many tiny tables, because they’re not done via "mysqldump".

See also <http://www.paragon-cs.com/wordpress/?p=52> for a test of parallel
dumping and restoring.
 

DOWNLOADING


You can download Maatkit from the Sourceforge website at
<http://sourceforge.net/projects/maatkit>, or you can get any of the tools
easily with a command like the following:


wget http://www.maatkit.org/get/toolname
or
wget http://www.maatkit.org/trunk/toolname

Where "toolname" can be replaced with the name (or fragment of a name) of any
of the Maatkit tools. Once downloaded, they’re ready to run; no installation is
needed. The first URL gets the latest released version of the tool, and the
second gets the latest trunk code from Subversion.
 

OPTIONS


Some options can be disabled by prefixing them with "–no", such as
"–no-gzip".


–age


Specifies how ‘old’ a table must be before mk-parallel-dump will consider
it.

When “–sets” is not specified, mk-parallel-dump uses "SHOW TABLE STATUS"
instead of "SHOW TABLES" to get a list of tables in each database, and compares
the time to the "Update_time" column in the output. If the "Update_time"
column is not "NULL" and is older than the specified interval ago, it will not
be dumped. Thus, it means “dump tables that have changed since X amount of
time” (presumably the last regular backup). This means the table will always be
dumped if it uses InnoDB or another storage engine that doesn’t report the
"Update_time".

When “–sets” is specified, the “–settable” table determines when a table
was last dumped, and the meaning of "–age" reverses; it becomes “dump tables
not dumped in X amount of time.”

–basedir


The directory in which files will be stored. If you use pre-canned options,
such as “–tab”, mk-parallel-dump knows what the eventual filenames will
be, and can place all the files in this directory. It will also create any
parent directories that don’t exist, if needed (see also “–umask”).

The default is the current working directory.

If you write your own command line, mk-parallel-dump cannot know which
arguments in the command line are filenames, and thus doesn’t know the
eventual destination of the dump files. It does not try to create parent
directories in this case.

–binlogpos


Dump binary log positions from both "SHOW MASTER STATUS" and "SHOW SLAVE
STATUS"
, whichever can be retrieved from the server. The data is dumped to a
file named 00_master_data.sql. This is done for each backup set.

The file also contains details of each table dumped, including the WHERE clauses
used to dump it in chunks.

This option is enabled by default.

–charset


Enables character set settings in Perl and MySQL. If the value is "utf8", sets
Perl’s binmode on STDOUT to utf8, passes the "mysql_enable_utf8" option to
DBD::mysql, and runs "SET NAMES UTF8" after connecting to MySQL. Any other
value sets binmode on STDOUT without the utf8 layer, and runs "SET NAMES" after
connecting to MySQL.
–chunksize


Specifies that the table should be dumped in segments of approximately the size
given. The syntax is either a plain integer, which is interpreted as a number
of rows per chunk, or an integer with a suffix of G, M, or k, which is
interpreted as the size of the data to be dumped in each chunk. See “CHUNKS
for more details.
–csv


Changes “–tab” options so the dump file is in comma-separated values
(CSV) format. The SELECT INTO OUTFILE statement looks like the following, and
can be re-loaded with the same options:


SELECT * INTO OUTFILE %D.%N.%6C.txt
FIELDS TERMINATED BY ‘,’ OPTIONALLY ENCLOSED BY ‘\"’
LINES TERMINATED BY ‘\n’ FROM %D.%N;

This option implies “–tab”.

–databases


Dump this comma-separated list of databases.
–dbregex


Dump only databases whose names match this Perl regular expression.
–defaultset


When “–sets” is given, this option makes mk-parallel-dump dump a
"default" set consisting of tables not explicitly included in any set.
–defaults-file


Only read default options from the given file. You must give an absolute
pathname.
–flushlock


Lock all tables globally with "FLUSH TABLES WITH READ LOCK". This is enabled
by default, unless you’re dumping sets (see “–sets”). This lock is taken
once, at the beginning of the whole process, and is never released.

If you want to lock only the tables you’re dumping, use “–locktables”.

–flushlog


Execute "FLUSH LOGS" after locking and before dumping master/slave binary log
positions. This is done for each backup set.

This option is NOT enabled by default because it causes the MySQL server to
rotate its error log, potentially overwriting error messages.

–gzip


Compresses files with gzip. This is enabled by default unless your platform is
Win32. By default, this causes the standard SQL dumps to be piped to gzip’s
"STDIN" and the result is redirected to the destination file. If this option
isn’t enabled, by default "mysqldump"’s "–result-file" parameter is used to
direct the dump to the destination file. When using “–tab”, this option
causes gzip to be called separately on each resulting file after it is dumped
(because "SELECT INTO OUTFILE" cannot be directed to a pipe).
–help


Displays a help message.
–host


Connect to host.
–ignoredb


Do not dump this comma-separated list of databases.
–ignoreengine


Do not dump any data for this comma-separated list of storage engines. The
schema file will be dumped as usual.

The default value is "FEDERATED,MRG_MyISAM". This prevents dumping data for
Federated tables and Merge tables.

–ignoretbl


Do not dump this comma-separated list of table names. Table names may be
qualified with the database name.
–locktables


Disables “–flushlock” (unless it was explicitly set) and locks tables with
"LOCK TABLES READ". Enabled by default when “–sets” is specified. The
lock is taken and released with every set of tables dumped.
–losslessfp


Wraps double and float types with a call to "FORMAT()" with 17 digits of
precision. According to the comments in Google’s patches, this will give
lossless dumping and reloading in most cases. (I shamelessly stole this
technique from them. I don’t know enough about floating-point math to have an
opinion).

This works only with “–tab”.

–numthread


Specifies the number of parallel processes to run. The default is 2 (this is
mk-parallel-dump, after all — 1 is not parallel). On GNU/Linux machines,
the default is the number of times ‘processor’ appears in /proc/cpuinfo. On
Windows, the default is read from the environment. In any case, the default is
at least 2, even when there’s only a single processor.
–password


Password to use when connecting.
–port


Port number to use for connection.
–quiet


Sets “–verbose” to 0.
–sets


Dump this comma-separated list of backup sets, in order. Requires
“–settable”. See “BACKUP SETS”. The special "default" set is
reserved; don’t use it as a set name.
–setperdb


Specifies that each database is a separate backup set. Each set is named the
same as the database. Implies “–locktables”.
–settable


Specifies the table in which backup sets are kept. It may be given in
database.table form.
–setvars


Specify any variables you want to be set immediately after connecting to MySQL.
These will be included in a "SET" command.
–socket


Socket file to use for connection.
–stopslave


Issue "STOP SLAVE" on server before dumping data. This ensures that the data
is not changing during the dump. Issues "START SLAVE" after the dump is
complete.

If the slave is not running, throws an error and exits. This is to prevent
possibly bad things from happening if the slave is not running because of a
problem, or because someone intentionally stopped the slave for maintenance or
some other purpose.

–tab


Dump via "SELECT INTO OUTFILE", which is similar to what "mysqldump" does with
the "–tab" option, but you’re not constrained to a single database at a time.

Before you use this option, make sure you know what "SELECT INTO OUTFILE" does!
I recommend using it only if you’re running mk-parallel-dump on the same
machine as the MySQL server, but there is no protection if you don’t.

The files will be gzipped after dumping if “–gzip” is enabled. This option
sets “–umask” to zero so auto-created directories are writable by the MySQL
server.

Triggers are dumped into ".trg" files, and views are postponed until the end of
the dump, then dumped all together into the "00_views.sql" file. This allows
restoring data before the triggers, which is important for restoring data
accurately. Views must be postponed until the end and dumped together so they
can be restored correctly; interdependencies between views and tables may
prevent correct restoration otherwise.

–tables


Dump this comma-separated list of table names. Table names may be qualified
with the database name.
–tblregex


Dump only tables whose names match this Perl regular expression.
–test


Print commands instead of executing them.
–umask


Set the program’s "umask" to this octal value. This is useful when you want
created files and directories to be readable or writable by other users (for
example, the MySQL server itself).
–user


User for login if not current user.
–verbose


Sets the verbosity; repeatedly specifying it increments the verbosity.
Default is 1 if not specified. See “OUTPUT”.
–version


Output version information and exit.
–wait


If the MySQL server crashes during dumping, waits until the server comes back
and then continues with the rest of the tables. "mk-parallel-dump" will
check the server every second until this time is exhausted, at which point it
will give up and exit.

This implements Peter Zaitsev’s “safe dump” request: sometimes a dump on a
server that has corrupt data will kill the server. mk-parallel-dump will
wait for the server to restart, then keep going. It’s hard to say which table
killed the server, so no tables will be retried. Tables that were being
concurrently dumped when the crash happened will not be retried. No additional
locks will be taken after the server restarts; it’s assumed this behavior is
useful only on a server you’re not trying to dump while it’s in production.


 

ENVIRONMENT


The environment variable "MKDEBUG" enables verbose debugging output in all of
the Maatkit tools:


MKDEBUG=1 mk-….

 

SYSTEM REQUIREMENTS


You need Perl, DBI, DBD::mysql, and some core packages that ought to be
installed in any reasonably new version of Perl.

This program works best on GNU/Linux. Filename quoting might not work well on
Microsoft Windows if you have spaces or funny characters in your database or
table names.
 

BUGS


Please use the Sourceforge bug tracker, forums, and mailing lists to request
support or report bugs: <http://sourceforge.net/projects/maatkit/>.

Please include the complete command-line used to reproduce the problem you are
seeing, the version of all MySQL servers involved, the complete output of the
tool when run with “–version”, and if possible, debugging output produced by
running with the "MKDEBUG=1" environment variable.
 

COPYRIGHT, LICENSE AND WARRANTY


This program is copyright (c) 2007 Baron Schwartz. Feedback and improvements
are welcome.

THIS PROGRAM IS PROVIDEDAS ISAND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 2; OR the Perl Artistic License. On UNIX and similar
systems, you can issue `man perlgpl’ or `man perlartistic’ to read these
licenses.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA.
 

AUTHOR


Baron Schwartz.
 

SEE ALSO


See also mk-parallel-restore.
 

VERSION


This manual page documents Ver 1.0.8 Distrib 1972 $Revision: 1970 $.



 

Index



NAME

SYNOPSIS

DESCRIPTION

BACKUP SETS

CHUNKS

MACROS

OUTPUT

SPEED OF PARALLEL DUMPS

DOWNLOADING

OPTIONS

ENVIRONMENT

SYSTEM REQUIREMENTS

BUGS

COPYRIGHT, LICENSE AND WARRANTY

AUTHOR

SEE ALSO

VERSION



Man page for mk-visual-explain

August 24, 2007 – 5:38 pm

MK-VISUAL-EXPLAIN


Section: User Contributed Perl Documentation (1)
Updated: 2008-06-01
Index
Return to Main Contents

 

NAME

mk-visual-explain - Format EXPLAIN output as a tree.
 

SYNOPSIS


mk-visual-explain <file_containing_explain_output>
mk-visual-explain -c <file_containing_query>
mysql -e "explain select * from mysql.user" | mk-visual-explain

 

DESCRIPTION


mk-visual-explain reverse-engineers MySQL’s EXPLAIN output into a query
execution plan, which it then formats as a left-deep tree — the same way the
plan is represented inside MySQL. It is possible to do this by hand, or to read
EXPLAIN’s output directly, but it requires patience and expertise. Many people
find a tree representation more understandable.

You can pipe input into mk-visual-explain or specify a filename at the
command line, including the magical ‘-’ filename, which will read from standard
input. It can do two things with the input: parse it for something that looks
like EXPLAIN output, or connect to a MySQL instance and run EXPLAIN on the
input.

When parsing its input, mk-visual-explain understands three formats: tabular
like that shown in the mysql command-line client, vertical like that created by
using the \G line terminator in the mysql command-line client, and tab
separated. It ignores any lines it doesn’t know how to parse.

When executing the input, mk-visual-explain replaces everything in the input
up to the first SELECT keyword with ‘EXPLAIN SELECT,’ and then executes the
result. You must specify “–connect” to execute the input as a query.

Either way, it builds a tree from the result set and prints it to standard
output. For the following query,


select * from sakila.film_actor join sakila.film using(film_id);

mk-visual-explain generates this query plan:


JOIN
+- Bookmark lookup
| +- Table
| | table film_actor
| | possible_keys idx_fk_film_id
| +- Index lookup
| key film_actor->idx_fk_film_id
| possible_keys idx_fk_film_id
| key_len 2
| ref sakila.film.film_id
| rows 2
+- Table scan
rows 952
+- Table
table film
possible_keys PRIMARY

The query plan is left-deep, depth-first search, and the tree’s root is the
output node — the last step in the execution plan. In other words, read it
like this:


1


Table scan the ‘film’ table, which accesses an estimated 952 rows.
2


For each row, find matching rows by doing an index lookup into the
film_actor->idx_fk_film_id index with the value from sakila.film.film_id, then a
bookmark lookup into the film_actor table.

For more information on how to read EXPLAIN output, please see
<http://dev.mysql.com/doc/en/explain.html>, and this talk titled “Query
Optimizer Internals and What’s New in the MySQL 5.2 Optimizer,” from Timour
Katchaounov, one of the MySQL developers:
<http://conferences.oreillynet.com/presentations/mysql07/katchaounov_timour.pdf>.
 

DOWNLOADING


You can download Maatkit from the Sourceforge website at
<http://sourceforge.net/projects/maatkit>, or you can get any of the tools
easily with a command like the following:


wget http://www.maatkit.org/get/toolname
or
wget http://www.maatkit.org/trunk/toolname

Where "toolname" can be replaced with the name (or fragment of a name) of any
of the Maatkit tools. Once downloaded, they’re ready to run; no installation is
needed. The first URL gets the latest released version of the tool, and the
second gets the latest trunk code from Subversion.
 

OPTIONS



–askpass


Prompt for a password when connecting to MySQL.
–charset


Enables character set settings in Perl and MySQL. If the value is "utf8", sets
Perl’s binmode on STDOUT to utf8, passes the "mysql_enable_utf8" option to
DBD::mysql, and runs "SET NAMES UTF8" after connecting to MySQL. Any other
value sets binmode on STDOUT without the utf8 layer, and runs "SET NAMES" after
connecting to MySQL.
–clusterpk


Assume that PRIMARY KEY index accesses don’t need to do a bookmark lookup to
retrieve rows. This is the case for InnoDB.
–connect


Treat input as a query, and obtain EXPLAIN output by connecting to a MySQL
instance and running EXPLAIN on the query. When this option is given,
mk-visual-explain uses the other connection-specific options such as “–user” to
connect to the MySQL instance. If you have a .my.cnf file, it will read it, so
you may not need to specify any connection-specific options.
–database


Connect to this database.
–defaults-file


If you specify this option, only this file is read for MySQL default options;
otherwise all the default files will be read.
–format


Controls the output formatting. The default is a terse pretty-printed tree.
The valid values are:


value meaning
===== =======
tree Pretty-printed terse tree.
dump Data::Dumper output (see L<Data::Dumper> for more).

–help


Displays a help message.
–host


MySQL host to connect to.
–password


The password to use when connecting.
–port


The port number to use for the connection.
–setvars


Specify any variables you want to be set immediately after connecting to MySQL.
These will be included in a "SET" command.
–socket


The socket file to use for the connection.
–user


MySQL user account to use for the connection.
–version


Output version information and exit.

 

MODULES


This program is actually a runnable module, not just an ordinary Perl script.
In fact, there are two modules embedded in it. This makes unit testing easy,
but it also makes it easy for you to use the parsing and tree-building
functionality if you want.

The ExplainParser package accepts a string and parses whatever it thinks looks
like EXPLAIN output from it. The synopsis is as follows:


require "mk-visual-explain";
my $p = ExplainParser->new();
my $rows = $p->parse("some text");
# $rows is an arrayref of hashrefs.

The ExplainTree package accepts a set of rows and turns it into a tree. For
convenience, you can also have it delegate to ExplainParser and parse text for
you. Here’s the synopsis:


require "mk-visual-explain";
my $e = ExplainTree->new();
my $tree = $e->parse("some text", \%options);
my $output = $e->pretty_print($tree);
print $tree;

 

ALGORITHM


This section explains the algorithm that converts EXPLAIN into a tree. You may
be interested in reading this if you want to understand EXPLAIN more fully, or
trying to figure out how this works, but otherwise this section will probably
not make your life richer.

The tree can be built by examining the id, select_type, and table columns of
each row. Here’s what I know about them:

The id column is the sequential number of the select. This does not indicate
nesting; it just comes from counting SELECT from the left of the SQL statement.
It’s like capturing parentheses in a regular expression. A UNION RESULT row
doesn’t have an id, because it isn’t a SELECT. The source code actually refers
to UNIONs as a fake_lex, as I recall.

If two adjacent rows have the same id value, they are joined with the standard
single-sweep multi-join method.

The select_type column tells a) that a new sub-scope has opened b) what kind
of relationship the row has to the previous row c) what kind of operation the
row represents.


*

SIMPLE means there are no subqueries or unions in the whole query.
*

PRIMARY means there are, but this is the outermost SELECT.
*

[DEPENDENT] UNION means this result is UNIONed with the previous result (not
row; a result might encompass more than one row).
*

UNION RESULT terminates a set of UNIONed results.
*

[DEPENDENT|UNCACHEABLE] SUBQUERY means a new sub-scope is opening. This is the
kind of subquery that happens in a WHERE clause, SELECT list or whatnot; it does
not return a so-called “derived table.”
*

DERIVED is a subquery in the FROM clause.

Tables that are JOINed all have the same select_type. For example, if you JOIN
three tables inside a dependent subquery, they’ll all say the same thing:
DEPENDENT SUBQUERY.

The table column usually specifies the table name or alias, but may also say
<derivedN> or <unionN,N…N>. If it says <derivedN>, the row represents an
access to the temporary table that holds the result of the subquery whose id is
N. If it says <unionN,..N> it’s the same thing, but it refers to the results it
UNIONs together.

Finally, order matters. If a row’s id is less than the one before it, I think
that means it is dependent on something other than the one before it. For
example,


explain select
(select 1 from sakila.film),
(select 2 from sakila.film_actor),
(select 3 from sakila.actor);


| id | select_type | table |
+—-+————-+————+
| 1 | PRIMARY | NULL |
| 4 | SUBQUERY | actor |
| 3 | SUBQUERY | film_actor |
| 2 | SUBQUERY | film |

If the results were in order 2-3-4, I think that would mean 3 is a subquery of
2, 4 is a subquery of 3. As it is, this means 4 is a subquery of the neares
previous recent row with a smaller id, which is 1. Likewise for 3 and 2.

This structure is hard to programatically build into a tree for the same reason
it’s hard to understand by inspection: there are both forward and backward
references. <derivedN> is a forward reference to selectN, while <unionM,N> is a
backward reference to selectM and selectN. That makes recursion and other
tree-building algorithms hard to get right (NOTE: after implementation, I now
see how it would be possible to deal with both forward and backward references,
but I have no motivation to change something that works). Consider the
following:


select * from (
select 1 from sakila.actor as actor_1
union
select 1 from sakila.actor as actor_2
) as der_1
union
select * from (
select 1 from sakila.actor as actor_3
union all
select 1 from sakila.actor as actor_4
) as der_2;


| id | select_type | table |
+——+————–+————+
| 1 | PRIMARY | <derived2> |
| 2 | DERIVED | actor_1 |
| 3 | UNION | actor_2 |
| NULL | UNION RESULT | <union2,3> |
| 4 | UNION | <derived5> |
| 5 | DERIVED | actor_3 |
| 6 | UNION | actor_4 |
| NULL | UNION RESULT | <union5,6> |
| NULL | UNION RESULT | <union1,4> |

This would be a lot easier to work with if it looked like this (I’ve
bracketed the id on rows I moved):


| id | select_type | table |
+——+————–+————+
| [1] | UNION RESULT | <union1,4> |
| 1 | PRIMARY | <derived2> |
| [2] | UNION RESULT | <union2,3> |
| 2 | DERIVED | actor_1 |
| 3 | UNION | actor_2 |
| 4 | UNION | <derived5> |
| [5] | UNION RESULT | <union5,6> |
| 5 | DERIVED | actor_3 |
| 6 | UNION | actor_4 |

In fact, why not re-number all the ids, so the PRIMARY row becomes 2, and so on?
That would make it even easier to read. Unfortunately that would also have the
effect of destroying the meaning of the id column, which I think is important to
preserve in the final tree. Also, though it makes it easier to read, it doesn’t
make it easier to manipulate programmatically; so it’s fine to leave them
numbered as they are.

The goal of re-ordering is to make it easier to figure out which rows are
children of which rows in the execution plan. Given the reordered list and some
row whose table is <union…> or <derived>, it is easy to find the beginning of
the slice of rows that should be child nodes in the tree: you just look for the
first row whose ID is the same as the first number in the table.

The next question is how to find the last row that should be a child node of a
UNION or DERIVED. I’ll start with DERIVED, because the solution makes UNION
easy.

Consider how MySQL numbers the SELECTs sequentially according to their position
in the SQL, left-to-right. Since a DERIVED table encloses everything within it
in a scope, which becomes a temporary table, there are only two things to think
about: its child subqueries and unions (if any), and its next siblings in the
scope that encloses it. Its children will all have an id greater than it does,
by definition, so any later rows with a smaller id terminate the scope.

Here’s an example. The middle derived table here has a subquery and a UNION to
make it a little more complex for the example.


explain select 1
from (
select film_id from sakila.film limit 1
) as der_1
join (
select film_id, actor_id, (select count(*) from sakila.rental) as r
from sakila.film_actor limit 1
union all
select 1, 1, 1 from sakila.film_actor as dummy
) as der_2 using (film_id)
join (
select actor_id from sakila.actor limit 1
) as der_3 using (actor_id);

Here’s the output of EXPLAIN:


| id | select_type | table |
| 1 | PRIMARY | <derived2> |
| 1 | PRIMARY | <derived6> |
| 1 | PRIMARY | <derived3> |
| 6 | DERIVED | actor |
| 3 | DERIVED | film_actor |
| 4 | SUBQUERY | rental |
| 5 | UNION | dummy |
| NULL | UNION RESULT | <union3,5> |
| 2 | DERIVED | film |

The siblings all have id 1, and the middle one I care about is derived3.
(Notice MySQL doesn’t execute them in the order I defined them, which is fine).
Now notice that MySQL prints out the rows in the opposite order I defined the
subqueries: 6, 3, 2. It always seems to do this, and there might be other
methods of finding the scope boundaries including looking for the lower boundary
of the next largest sibling, but this is a good enough heuristic. I am forced
to rely on it for non-DERIVED subqueries, so I rely on it here too. Therefore,
I decide that everything greater than or equal to 3 belongs to the DERIVED
scope.

The rule for UNION is simple: they consume the entire enclosing scope, and to
find the component parts of each one, you find each part’s beginning as referred
to in the <unionN,…> definition, and its end is either just before the next
one, or if it’s the last part, the end is the end of the scope.

This is only simple because UNION consumes the entire scope, which is either the
entire statement, or the scope of a DERIVED table. This is because a UNION
cannot be a sibling of another UNION or a table, DERIVED or not. (Try writing
such a statement if you don’t see it intuitively). Therefore, you can just find
the enclosing scope’s boundaries, and the rest is easy. Notice in the example
above, the UNION is over <union3,5>, which includes the row with id 4 — it
includes every row between 3 and 5.

Finally, there are non-derived subqueries to deal with as well. In this case I
can’t look at siblings to find the end of the scope as I did for DERIVED. I
have to trust that MySQL executes depth-first. Here’s an example:


explain
select actor_id,
(
select count(film_id)
+ (select count(*) from sakila.film)
from sakila.film join sakila.film_actor using(film_id)
where exists(
select * from sakila.actor
where sakila.actor.actor_id = sakila.film_actor.actor_id
)
)
from sakila.actor;


| id | select_type | table |
| 1 | PRIMARY | actor |
| 2 | SUBQUERY | film |
| 2 | SUBQUERY | film_actor |
| 4 | DEPENDENT SUBQUERY | actor |
| 3 | SUBQUERY | film |

In order, the tree should be built like this:


*

See row 1.
*

See row 2. It’s a higher id than 1, so it’s a subquery, along with every other
row whose id is greater than 2.
*

Inside this scope, see 2 and 2 and JOIN them. See 4. It’s a higher id than 2,
so it’s again a subquery; recurse. After that, see 3, which is also higher;
recurse.

But the only reason the nested subquery didn’t include select 3 is because
select 4 came first. In other words, if EXPLAIN looked like this,


| id | select_type | table |
| 1 | PRIMARY | actor |
| 2 | SUBQUERY | film |
| 2 | SUBQUERY | film_actor |
| 3 | SUBQUERY | film |
| 4 | DEPENDENT SUBQUERY | actor |

I would be forced to assume upon seeing select 3 that select 4 is a subquery
of it, rather than just being the next sibling in the enclosing scope. If this
is ever wrong, then the algorithm is wrong, and I don’t see what could be done
about it.

UNION is a little more complicated than just “the entire scope is a UNION,”
because the UNION might itself be inside an enclosing scope that’s only
indicated by the first item inside the UNION. There are only three kinds of
enclosing scopes: UNION, DERIVED, and SUBQUERY. A UNION can’t enclose a UNION,
and a DERIVED has its own “scope markers,” but a SUBQUERY can wholly enclose a
UNION, like this strange example on the empty table t1:


explain select * from t1 where not exists(
(select t11.i from t1 t11) union (select t12.i from t1 t12));


| id | select_type | table | Extra |
+——+————–+————+——————————–+
| 1 | PRIMARY | t1 | const row not found |
| 2 | SUBQUERY | NULL | No tables used |
| 3 | SUBQUERY | NULL | no matching row in const table |
| 4 | UNION | t12 | const row not found |
| NULL | UNION RESULT | <union2,4> | |

The UNION’s backward references might make it look like the UNION encloses the
subquery, but studying the query makes it clear this isn’t the case. So when a
UNION’s first row says SUBQUERY, it is this special case.

By the way, I don’t fully understand this query plan; there are 4 numbered
SELECT in the plan, but only 3 in the query. The parens around the UNIONs are
meaningful. Removing them will make the EXPLAIN different. Please tell me how
and why this works if you know.

Armed with this knowledge, it’s possible to use recursion to turn the
parent-child relationship between all the rows into a tree representing the
execution plan.

MySQL prints the rows in execution order, even the forward and backward
references. At any given scope, the rows are processed as a left-deep tree.
MySQL does not do “bushy” execution plans. It begins with a table, finds a
matching row in the next table, and continues till the last table, when it emits
a row. When it runs out, it backtracks till it can find the next row and
repeats. There are subtleties of course, but this is the basic plan. This is
why MySQL transforms all RIGHT OUTER JOINs into LEFT OUTER JOINs and cannot do
FULL OUTER JOIN.

This means in any given scope, say


| id | select_type | table |
| 1 | SIMPLE | tbl1 |
| 1 | SIMPLE | tbl2 |
| 1 | SIMPLE | tbl3 |

The execution plan looks like a depth-first traversal of this tree:


JOIN
/ \
JOIN tbl3
/ \
tbl1 tbl2

The JOIN might not be a JOIN. It might be a subquery, for example. This comes
from the type column of EXPLAIN. The documentation says this is a “join type,”
but I think “access type” is more accurate, because it’s “how MySQL accesses
rows.”

mk-visual-explain decorates the tree significantly more than just turning
rows into nodes. Each node may get a series of transformations that turn it
into a subtree of more than one node. For example, an index scan not marked
with ‘Using index’ must do a bookmark lookup into the table rows; that is a
three-node subtree. However, after the above node-ordering and scoping stuff,
the rest of the process is pretty simple.
 

SEE ALSO


See also mk-query-profiler.
 

ENVIRONMENT


The environment variable "MKDEBUG" enables verbose debugging output in all of
the Maatkit tools:


MKDEBUG=1 mk-….

 

BUGS


Please use the Sourceforge bug tracker, forums, and mailing lists to request
support or report bugs: <http://sourceforge.net/projects/maatkit/>.

Please include the complete command-line used to reproduce the problem you are
seeing, the version of all MySQL servers involved, the complete output of the
tool when run with “–version”, and if possible, debugging output produced by
running with the "MKDEBUG=1" environment variable.
 

SYSTEM REQUIREMENTS


You need Perl, DBI, DBD::mysql, and some core packages that ought to be
installed in any reasonably new version of Perl.
 

AUTHOR


Baron “Xaprb” Schwartz.
 

COPYRIGHT, LICENSE AND WARRANTY


This program is copyright (c) 2007 Baron Schwartz.
Feedback and improvements are welcome.

THIS PROGRAM IS PROVIDEDAS ISAND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 2; OR the Perl Artistic License. On UNIX and similar
systems, you can issue `man perlgpl’ or `man perlartistic’ to read these
licenses.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA.
 

VERSION


This manual page documents Ver 1.0.8 Distrib 1972 $Revision: 1970 $.



 

Index



NAME

SYNOPSIS

DESCRIPTION

DOWNLOADING

OPTIONS

MODULES

ALGORITHM

SEE ALSO

ENVIRONMENT

BUGS

SYSTEM REQUIREMENTS

AUTHOR

COPYRIGHT, LICENSE AND WARRANTY

VERSION



Man page for mk-query-profiler

August 24, 2007 – 5:38 pm

MK-QUERY-PROFILER


Section: User Contributed Perl Documentation (1)
Updated: 2008-06-01
Index
Return to Main Contents

 

NAME

mk-query-profiler - Execute SQL statements and print statistics, or measure
activity caused by other processes.
 

SYNOPSIS


mk-query-profiler can profile the (semicolon-terminated, blank-line
separated) queries in a file:


mk-query-profiler queries.sql
cat queries.sql | mk-query-profiler
mk-query-profiler -vv queries.sql
mk-query-profiler -v –separate –only 2,5,6 queries.sql
mk-query-profiler –tab queries.sql > results.csv

It can also just observe what happens in the server:


mk-query-profiler –external

Or it can run shell commands from a file and measure the result:


mk-query-profiler –external commands.txt
mk-query-profiler –external - < commands.txt

Read “HOW TO INTERPRET” to learn what it all means.
 

DOWNLOADING


You can download Maatkit from the Sourceforge website at
<http://sourceforge.net/projects/maatkit>, or you can get any of the tools
easily with a command like the following:


wget http://www.maatkit.org/get/toolname
or
wget http://www.maatkit.org/trunk/toolname

Where "toolname" can be replaced with the name (or fragment of a name) of any
of the Maatkit tools. Once downloaded, they’re ready to run; no installation is
needed. The first URL gets the latest released version of the tool, and the
second gets the latest trunk code from Subversion.
 

OPTIONS



–allowcache


short form: -a

Let MySQL query cache cache the queries executed.

By default this is disabled. When enabled, cache profiling information is added
to the printout. See <http://dev.mysql.com/doc/en/query-cache.html> for more
information about the query cache.

–askpass


Prompt for a password when connecting to MySQL.
–calibrate


short form: -c; negatable: yes; default: yes

Try to compensate for "SHOW STATUS".

Measure and compensate for the “cost of observation” caused by running SHOW
STATUS. Only works reliably on a quiet server; on a busy server, other
processes can cause the calibration to be wrong.

–charset


short form: -A; type: string

Default character set.

Enables character set settings in Perl and MySQL. If the value is "utf8", sets
Perl’s binmode on STDOUT to utf8, passes the "mysql_enable_utf8" option to
DBD::mysql, and runs "SET NAMES UTF8" after connecting to MySQL. Any other
value sets binmode on STDOUT without the utf8 layer, and runs "SET NAMES" after
connecting to MySQL.

–database


short form: -D; type: string

Database to use for connection.

–defaults-file


short form: -F; type: string

Only read mysql options from the given file. You must give an absolute
pathname.

–external


short form: -e

Calibrate, then pause while an external program runs.

This is typically useful while you run an external program. When you press
[enter] mk-query-profiler will stop sleeping and take another measurement, then
print statistics as usual.

When there is a filename on the command line, mk-query-profiler executes
each line in the file as a shell command. If you give - as the filename,
mk-query-profiler reads from STDIN.

Output from shell commands is printed to STDOUT and terminated with __BEGIN__,
after which mk-query-profiler prints its own output.

–flush


short form: -f; cumulative: yes

Flush tables. Specify twice to do between every query.

Calls FLUSH TABLES before profiling. If you are executing queries from a
batch file, specifying –flush twice will cause mk-query-profiler to call
FLUSH TABLES between every query, not just once at the beginning. Default is
not to flush at all. See <http://dev.mysql.com/doc/en/flush.html> for more
information.

–host


short form: -h; type: string

Connect to host.

–innodb


short form: -i; negatable: yes; default: yes

Show InnoDB statistics.

–only


short form: -n; type: hash

Only show statistics for this comma-separated list of queries or commands.

–password


short form: -p; type: string

Password to use for connection.

–port


short form: -P; type: int

Port number to use for connection.

–separate


short form: -s

Print stats separately for each query.

The default is to show only the summary of the entire batch. See also
“–verbose”.

–session


negatable: yes; default: yes

Use session "SHOW STATUS" and "SHOW VARIABLES".

Disabled if the server version doesn’t support it.

–setvars


type: string; default: wait_timeout=10000

Set these MySQL variables.

Specify any variables you want to be set immediately after connecting to MySQL.
These will be included in a "SET" command.

–socket


short form: -S; type: string

Socket file to use for connection.

–tab


short form: -t

Print tab-separated values instead of whitespace-aligned columns.

–user


short form: -u; type: string

User for login if not current user.

–verbose


short form: -v; cumulative: yes; default: 0

Verbosity; specify multiple times for more detailed output.

When “–tab” is given, prints variables that don’t change. Otherwise
increasing the level of verbosity includes extra sections in the output.

–verify


short form: -r

Verify nothing else is accessing the server.

This is a weak verification; it simply calibrates twice (see “–calibrate”)
and verifies that the cost of observation remains constant.


 

DESCRIPTION


mk-query-profiler reads a file containing one or more SQL
statements or shell commands, executes them, and analyzes the output of SHOW STATUS afterwards.
It then prints statistics about how the batch performed. For example, it can
show how many table scans the batch caused, how many page reads, how many
temporary tables, and so forth.

All command-line arguments are optional, but you must either specify a file
containing the batch to profile as the last argument, or specify that you’re
profiling an external program with the “–external” option, or provide
input to STDIN.

If the file contains multiple statements, they must be separated by blank
lines. If you don’t do that, mk-query-profiler won’t be able to split the
file into individual queries, and MySQL will complain about syntax errors.

If the MySQL server version is before 5.0.2, you should make sure the server
is completely unused before trying to profile a batch. Prior to this version,
SHOW STATUS showed only global status variables, so other queries will
interfere and produce false results. mk-query-profiler will try to detect
if anything did interfere, but there can be no guarantees.

Prior to MySQL 5.0.2, InnoDB status variables are not available, and prior to
version 5.0.3, InnoDB row lock status variables are not available.
mk-query-profiler will omit any output related to these variables if they’re not
available.

For more information about SHOW STATUS, read the relevant section of the MySQL
manual at
<http://dev.mysql.com/doc/en/server-status-variables.html>
 

HOW TO INTERPRET


 

TAB-SEPARATED OUTPUT


If you specify “–tab”, you will get the raw output of SHOW STATUS in
tab-separated format, convenient for opening with a spreadsheet. This is not
the default output, but it’s so much easier to describe that I’ll cover it
first.


*

Most of the command-line options for controlling verbosity and such are
ignored in –tab mode.
*

The variable names you see in MySQL, such as ‘Com_select’, are kept —
there are no euphimisms, so you have to know your MySQL variables.
*

The columns are Variable_name, Before, After1…AfterN, Calibration.
The Variable_name column is just what it sounds like. Before is the result
from the first run of SHOW STATUS. After1, After2, etc are the results of
running SHOW STATUS after each query in the batch. Finally, the last column
is the result of running SHOW STATUS just after the last AfterN column, so you
can see how much work SHOW STATUS itself causes.
*

If you specify “–verbose”, output includes every variable
mk-query-profiler measures. If not (default) it only includes variables where
there was some difference from one column to the next.

 

NORMAL OUTPUT


If you don’t specify –tab, you’ll get a report formatted for human
readability. This is the default output format.

mk-query-profiler can output a lot of information, as you’ve seen if you
ran the examples in the “SYNOPSIS”. What does it all mean?

First, there are two basic groups of information you might see: per-query and
summary. If your batch contains only one query, these will be the same and
you’ll only see the summary. You can recognize the difference by looking for
centered, all-caps, boxed-in section headers. Externally profiled commands will
have EXTERNAL, individually profiled queries will have QUERY, and summary will
say SUMMARY.

Next, the information in each section is grouped into subsections, headed by
an underlined title. Each of these sections has varying information in it.
Which sections you see depends on command-line arguments and your MySQL
version. I’ll explain each section briefly. If you really want to know where
the numbers come from, read
<http://dev.mysql.com/doc/en/server-status-variables.html>.

You need to understand which numbers are insulated from other queries and
which are not. This depends on your MySQL version. Version 5.0.2 introduced
the concept of session status variables, so you can see information about only
your own connection. However, many variables aren’t session-ized, so when you
have MySQL 5.0.2 or greater, you will actually see a mix of session and global
variables. That means other queries happening at the same time will pollute
some of your results. If you have MySQL versions older than 5.0.2, you won’t
have ANY connection-specific stats, so your results will be polluted by other
queries no matter what. Because of the mixture of session and global
variables, by far the best way to profile is on a completely quiet server
where nothing else is interfering with your results.

While explaining the results in the sections that follow, I’ll refer to a
value as “protected” if it comes from a session-specific variable and can be
relied upon to be accurate even on a busy server. Just keep in mind, if
you’re not using MySQL 5.0.2 or newer, your results will be inaccurate unless
you’re running against a totally quiet server, even if I label it as
“protected.”
 

Overall stats


This section shows the overall elapsed time for the query, as measured by
Perl, and the optimizer cost as reported by MySQL.

If you’re viewing separate query statistics, this is all you’ll see. If
you’re looking at a summary, you’ll also see a breakdown of the questions the
queries asked the server.

The execution time is not totally reliable, as it includes network round-trip
time, Perl’s own execution time, and so on. However, on a low-latency
network, this should be fairly negligible, giving you a reasonable measure of
the query’s time, especially for queries longer than a few tenths of a second.

The optimizer cost comes from the Last_query_cost variable, and is protected
from other connections in MySQL 5.0.7 and greater. It is not available before
5.0.1.

The total number of questions is not protected, but the breakdown of
individual question types is, because it comes from the Com_ status variables.
 

Table and index accesses


This section shows you information about the batch’s table and index-level
operations (as opposed to row-level operations, which will be in the next
section). The “Table locks acquired” and “Temp files” values are unprotected,
but everything else in this section is protected.

The “Potential filesorts” value is calculated as the number of times a query had
both a scan sort (Sort_scan) and created a temporary table (Created_tmp_tables).
There is no Sort_filesort or similar status value, so it’s a best guess at
whether a query did a filesort. It should be fairly accurate.

If you specified “–allowcache”, you’ll see statistics on the query cache.
These are unprotected.
 

Row operations


These values are all about the row-level operations your batch caused. For
example, how many rows were inserted, updated, or deleted. You’ll also see
row-level index access statistics, such as how many times the query sought and
read the next entry in an index.

Depending on your MySQL version, you’ll either see one or two columns of
information in this section. The one headed “Handler” is all from the
Handler_ variables, and those statistics are protected. If your MySQL version
supports it, you’ll also see a column headed “InnoDB,” which is unprotected.
 

I/O Operations


This section gives information on I/O operations your batch caused, both in
memory and on disk. Unless you have MySQL 5.0.2 or greater, you’ll only see
information on the key cache. Otherwise, you’ll see a lot of information on
InnoDB’s I/O operations as well, such as how many times the query was able to
satisfy a read from the buffer pool and how many times it had to go to the
disk.

None of the information in this section is protected.
 

InnoDB Data Operations


This section only appears when you’re querying MySQL 5.0.2 or newer. None of
the information is protected. You’ll see statistics about how many pages were
affected, how many operations took place, and how many bytes were affected.
 

ENVIRONMENT


The environment variable "MKDEBUG" enables verbose debugging output in all of
the Maatkit tools:


MKDEBUG=1 mk-….

 

BUGS


Please use the Sourceforge bug tracker, forums, and mailing lists to request
support or report bugs: <http://sourceforge.net/projects/maatkit/>.

Please include the complete command-line used to reproduce the problem you are
seeing, the version of all MySQL servers involved, the complete output of the
tool when run with “–version”, and if possible, debugging output produced by
running with the "MKDEBUG=1" environment variable.
 

SYSTEM REQUIREMENTS


You need Perl, DBI, DBD::mysql, and some core modules.
 

SEE ALSO


See also mk-profile-compact.
 

LICENSE


This program is copyright (c) 2007 Baron Schwartz.
Feedback and improvements are welcome.

THIS PROGRAM IS PROVIDEDAS ISAND WITHOUT ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.

This program is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free Software
Foundation, version 2; OR the Perl Artistic License. On UNIX and similar
systems, you can issue `man perlgpl’ or `man perlartistic’ to read these
licenses.

You should have received a copy of the GNU General Public License along with
this program; if not, write to the Free Software Foundation, Inc., 59 Temple
Place, Suite 330, Boston, MA 02111-1307 USA.
 

AUTHOR


Baron Schwartz.
 

ACKNOWLEDGEMENTS


I was inspired by the wonderful mysqlreport utility available at
<http://www.hackmysql.com/>.

Other contributors: Bart van Bragt.

Thanks to all who have helped.
 

VERSION


This manual page documents Ver 1.1.10 Distrib 1972 $Revision: 1970 $.



 

Index



NAME

SYNOPSIS

DOWNLOADING

OPTIONS

DESCRIPTION

HOW TO INTERPRET

TAB-SEPARATED OUTPUT

NORMAL OUTPUT

Overall stats

Table and index accesses

Row operations

I/O Operations

InnoDB Data Operations

ENVIRONMENT

BUGS

SYSTEM REQUIREMENTS

SEE ALSO

LICENSE

AUTHOR

ACKNOWLEDGEMENTS

VERSION



Man page for mk-slave-prefetch

August 24, 2007 – 5:38 pm

MK-SLAVE-PREFETCH


Section: User Contributed Perl Documentation (1)
Updated: 2008-06-01
Index
Return to Main Contents

 

NAME

mk-slave-prefetch - Pipeline relay logs on a MySQL slave to pre-warm caches.
 

SYNOPSIS


mk-slave-prefetch
mk-slave-prefetch –statistics > /path/to/saved/statistics
mk-slave-prefetch /path/to/saved/statistics

 

DESCRIPTION


mk-slave-prefetch reads the slave’s relay log slightly ahead of where the
slave’s SQL thread is reading, converts statements into "SELECT", and
executes them. In theory, this should help alleviate the effects of the
slave’s single-threaded SQL execution. It will help take advantage of
multiple CPUs and disks by pre-reading the data from disk, so the data is
already in the cache when the slave SQL thread executes the un-modified
version of the statement.

Statements that can’t be converted into "SELECT" are ignored. However, there
is always a chance of bugs. It would be a very good idea to connect as a
read-only user. Here is an example of how to grant the necessary privileges:


GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE ON *.*
TO ‘prefetch’@'%’ IDENTIFIED BY ’sp33dmeup!’;

"mk-slave-prefetch" learns how long it takes statements to execute, and doesn’t
try to execute those that take a very long time. You can ask it to print what
it has learned after it executes. You can also specify a filename on the
command line. The file should contain the statistics printed by a previous
run. These will be used to pre-populate the statistics so it doesn’t have to
re-learn.

This program is based on concepts I heard Paul Tuckfield explain at the November
2006 MySQL Camp un-conference. However, the code is my own work. I have not
seen any other implementation of Paul’s idea.
 

DOES IT WORK?


Does it work? Does it actually speed up the slave?

That depends on your workl