|
|
 |
 |
Go to the first, previous, next, last section, table of contents.
MySQL Cluster uses the new NDB Cluster storage engine to enable running
several MySQL servers in a cluster.
The NDB Cluster storage engine is available in BitKeeper from MySQL
release 4.1.2, and in binary releases from MySQL-Max 4.1.3.
Currently supported operating systems are Linux, Mac OS X, and Solaris.
We are working to make NDB Cluster run on all operating systems
supported by MySQL, including Windows.
This chapter represents work in progress.
Other documents describing MySQL Cluster can be found at
http://www.mysql.com/cluster/.
You may also wish to subscribe to the MySQL Cluster mailing list.
See http://lists.mysql.com/. You may also find the MySQL forums
at http://forums.mysql.com/to be useful.
MySQL Cluster is a new technology to enable clustering of in-memory
databases in a share-nothing system. The share-nothing architecture allows
the system to work with very inexpensive hardware, and without any specific
requirements on hardware or software. It also does not have any single
point of failure because each component has its own memory and disk.
MySQL Cluster integrates the standard MySQL server with an
in-memory clustered storage engine called NDB. In our
documentation, the term NDB refers to the
part of the setup that is specific to the storage engine, whereas
MySQL Cluster refers to the combination
of MySQL and the new storage engine.
A MySQL Cluster consists of a set of computers, each running a number of processes
including MySQL servers, storage nodes for NDB Cluster, management servers
and (possibly) specialized data access programs. All these programs work
together to form MySQL Cluster. When data is stored in the NDB Cluster storage
engine, the tables are stored in the storage nodes for NDB Cluster.
Such tables are directly accessible from all other MySQL servers in
the cluster. Thus, in a payroll application storing data in a cluster, if one
application updates the salary of an employee, all
other MySQL servers that query this data can see the change immediately.
The data stored in the storage nodes for MySQL Cluster can be mirrored;
the cluster can handle failures of individual storage nodes with no other impact than
that a number of transactions are aborted due to losing the transaction
state. Since transactional applications are expected to handle transaction
failure, this should not be a source of problems.
By bringing MySQL Cluster to the Open Source world, MySQL makes clustered
data management with high availability, high performance, and scalability
available to all who need it.
NDB is an in-memory storage engine offering high-availability
and data-persistence features.
The NDB storage engine can be configured with a range of failover and
load-balancing options, but it is easiest to start with the storage
engine at the cluster level. MySQL Cluster's NDB storage engine
contains a complete set of data, dependent only on other data within
the cluster itself.
We will now describe how to set up a MySQL Cluster consisting of an
NDB storage engine and some MySQL servers.
The cluster portion of MySQL Cluster is currently configured independently
of the MySQL servers. In a MySQL Cluster, each part of the cluster
is considered to be a node.
Note: A node is in many contexts a computer, but for
MySQL Cluster it is a process. There can be any number of nodes on
a single computer.
Each node has a type, and there can be multiple nodes of each type
in a MySQL Cluster. In a minimal MySQL Cluster configuration, there
will be at least three nodes:
-
The management (
MGM) node: The role of this type of node is to
manage the other nodes within the MySQL Cluster, such as providing
configuration data, starting and stopping nodes, running backup, and so forth.
Because this node type manages the configuration of the other nodes, a
node of this type should be started first, before any other
node. With a running cluster, the MGM node does not necessarily have to
be running all the time. A MGM node is started with the command
ndb_mgmd; for this reason NDB_MGMD is provided as an alias for
MGM when configuring the cluster.
-
The storage or database (
DB) node: This is the type of node that
manages and stores the database itself. There are as many DB nodes
as you have replicas times the number of fragments. For example, with
two replicas, each having two fragments, you will need four DB nodes.
It is not necessary to have more than one replica, so a minimal
MySQL Cluster may contain just one DB node. A DB node is started with the
command ndbd, and so NDBD is provided as an alias for
DB when configuring the cluster.
-
The client (
API) node: This is the client node that will access
the cluster. In the case of MySQL Cluster, a client node is a traditional
MySQL server that uses the NDB Cluster storage engine, enabling
access to clustered tables. Basically, the MySQL server acts as a client
of the NDB cluster. Applications using the NDB API directly are also
considered API nodes. Since a MySQL server is typically started with the
command mysqld or mysqld_safe, MYSQLD is provided as
an alias for API when configuring the cluster.
Cluster processes are also referred to as cluster nodes. Configuration of
the cluster involves configuring each individual node in the cluster and
setting up individual communication links between nodes. MySQL Cluster
is currently designed with the intention that storage nodes are homogenous
in terms of processor power, memory space, and communication bandwidth. In
addition, in order to provide a single point of configuration, all configuration
data for the cluster as a whole is located in one configuration file.
The management server manages the cluster configuration file and the
cluster log. Each node in the cluster retrieves the configuration data
from the management server, and so requires a way to determine
where the management server resides. When interesting events occur in
the storage nodes, the nodes transfer information about these events to the
management server, which then writes the information to the cluster log.
In addition, there can be any number of clients to the cluster. These
are of two types.
-
Standard MySQL clients: These are no different for MySQL Cluster than
they are for standard (non-Cluster) MySQL. In other words, MySQL Cluster can
be accessed from existing MySQL applications written in PHP, Perl, C, C++, Java,
Python, Ruby, and so on.
-
Management clients: These clients connect to the management
server and provide commands for starting and stopping nodes gracefully,
starting and stopping message tracing (debug versions only), showing node versions
and status, starting and stopping backups, and so on.
A MySQL server that is part of a MySQL Cluster differs in only one
respect from a normal (non-clustered) MySQL server, employing the
NDBCLUSTER) storage engine. This engine is also referred to
simply as NDB, and the two forms of the name are synonomous.
In order to avoid unnecessary resources allocation, the server is
configured by default with the NDB storage engine disabled. To
enable NDB, you will need to modify the server's `my.cnf'
configuration file.
Since the MySQL server is a part of the cluster, it will also need to
know how to access an MGM node in order to obtain the cluster
configuration data. The default behavior is to look for the MGM node on
localhost. However, should you need to specify its location
elsewhere, this is can be done in `my.cnf' or on the MySQL server
command line. Before the NDB storage engine can be used, at
least one MGM node must be operational, as well as any desired DB
nodes.
NDB, the Cluster storage engine, is available in binary
distributions beginning with MySQL-Max 4.1.3 for Linux, Mac OS X, and
Solaris. It is not yet supported on Windows, but we intend to make it
available for win32 platforms in the near future.
If you choose to build from a source tarball or the MySQL 4.1 BitKeeper
tree, be sure to use the --with-ndbcluster option when running
configure. You can instead use the
BUILD/compile-pentium-max build script. Note that this script
includes OpenSSL, so you must have or obtain OpenSSL to build
successfully; otherwise you will need to modify
compile-pentium-max to exclude this requirement. Of course, you
can also just follow the standard instructions for compiling your own
binaries, then perform the usual tests and installation procedure.
See section 2.8.3 Installing from the Development Source Tree.
In the next few sections, we assume that you are already familiar with
installing MySQL, and here we cover only the differences between
configuring MySQL Cluster and configuring MySQL without clustering.
(See section 2 Installing MySQL if you require more information about the
latter.)
You will find Cluster configuration easiest if you have already have
all MGM and DB nodes running first; this is likely to be the most
time-consuming part of the configuration. Editing the `my.cnf'
file is fairly straightforward, and this section will cover only any
differences from configuring MySQL without clustering.
In order to familiarise you with the basics, we will describe the
simplest possible configuration for a functional MySQL Cluster. After
this, you should be able to design your desired setup from the
information provided in the other relevant sections of this chapter.
First, you need to create a configuration directory, for example
`/var/lib/mysql-cluster', by executing the following command as
root:
shell> mkdir /var/lib/mysql-cluster
In this directory, create a file named `config.ini' with the
following information, substituting appropriate values for
HostName and DataDir as necessary for your system.
# file "config.ini" - showing minimal setup consisting of 1 DB node,
# 1 management server, and 3 MySQL servers.
# The empty default sections are not required, and are shown only for
# the sake of completeness.
# Storage nodes are required to provide a hostname but MySQL Servers
# are not.
# If you don't know the hostname for your machine, use localhost.
# The DataDir parameter also has a default value, but it is recommended to
# set it explicitly.
# NDBD, MYSQLD, and NDB_MGMD are aliases for DB, API, and MGM respectively
#
[NDBD DEFAULT]
NoOfReplicas= 1
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
[NDB_MGMD]
HostName= myhost.example.com
[NDBD]
HostName= myhost.example.com
DataDir= /var/lib/mysql-cluster
[MYSQLD]
[MYSQLD]
[MYSQLD]
You can now start the management server as follows:
shell> cd /var/lib/mysql-cluster
shell> ndb_mgmd
Then start a single DB node by running ndbd. When starting
ndbd for a given DB node for the very first time, you should
use the --initial option:
shell> ndbd --initial
For subsequent ndbd starts, you will generally not want to use
--initial:
shell> ndbd
This is because the --initial option will delete all existing
data and log files (as well as all table metadata) for this storage
node and create new ones.
By default, ndbd will look for the management server at
localhost at port 1186. (Prior to MySQL 4.1.8, the default port
was 2200.)
Note: If you have installed MySQL from a binary tarball, you
will need to specify the path of the ndb_mgmd and ndbd
servers explicitly. (Normally, these will be found in
`/usr/local/mysql/bin'.)
Finally, go to the MySQL data directory (usually `/var/lib/mysql'
or `/usr/local/mysql/data'), and make sure that the `my.cnf'
file contains the option necessary to enable the NDB storage
engine:
[mysqld]
ndbcluster
You can now start the MySQL server as usual:
shell> mysqld_safe --user=mysql &
Wait a moment to make sure the MySQL server is running properly. If you
see the notice mysql ended, check the server's `.err' file
to find out what went wrong.
If all has gone well so far, you now can start using the cluster:
shell> mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 4.1.7
Type 'help;' or '\h' for help. Type '\c' to clear the buffer.
mysql> SHOW ENGINES;
+------------+---------+------------------------------------------------------------+
| Engine | Support | Comment |
+------------+---------+------------------------------------------------------------+
...
| NDBCLUSTER | DEFAULT | Clustered, fault-tolerant, memory-based tables |
| NDB | YES | Alias for NDBCLUSTER |
...
mysql> USE test;
Database changed
mysql> CREATE TABLE ctest (i INT) ENGINE=NDBCLUSTER;
Query OK, 0 rows affected (0.09 sec)
mysql> SHOW CREATE TABLE ctest \G
*************************** 1. row ***************************
Table: ctest
Create Table: CREATE TABLE `ctest` (
`i` int(11) default NULL
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
To check that your nodes were set up properly, start the management
client as shown:
shell> ndb_mgm
You can then use the SHOW command from within the management
client in order to obtain a report on the cluster's status:
NDB> SHOW
Cluster Configuration
---------------------
[ndbd(NDB)] 1 node(s)
id=2 @127.0.0.1 (Version: 3.5.3, Nodegroup: 0, Master)
[ndb_mgmd(MGM)] 1 node(s)
id=1 @127.0.0.1 (Version: 3.5.3)
[mysqld(API)] 3 node(s)
id=3 @127.0.0.1 (Version: 3.5.3)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)
At this point, you have successfully set up a working MySQL Cluster.
You can now store data in the cluster by using any table created with
ENGINE=NDBCLUSTER or its alias ENGINE=NDB.
Configuring MySQL Cluster requires working with two files:
-
`my.cnf': Specifies options for all MySQL Cluster executables. This
file, with which you should be familiar with from previous work with
MySQL, must be accessible by each executable running in the cluster.
-
`config.ini': This file is read only by the MySQL Cluster
management server, which then distributes the information contained in
this file to all processes participating in the cluster.
`config.ini' contains a description of each node involved in the
cluster. This includes configuration parameters for storage nodes and
configuration parameters for connections between all nodes in the
cluster.
We are continuously making improvements in Cluster configuration and
attempting to simplify this process. While we strive to maintain
backwards compatibility, there may be times when introduce an
incompatible change. In such cases we will try to let Cluster users
know in advance if a change is not backwards compatible. If you
find such a change which we have not documented, please use our
Bugs Database to report it.
In order to support MySQL Cluster, you will need to update
`my.cnf' as shown in the example below.
From version 4.1.8 some simplifications in `my.cnf' were made,
including new sections for the ndbcluster executables. However,
these should not be confused with those occurring in `config.ini'
files. As always, you may specify these parameters when invoking those
executables from the command line.
# my.cnf
# example additions to my.cnf for MySQL Cluster
# (valid from 4.1.8)
# enable ndbcluster storage engine, and provide connectstring for
# management server host (default port is 1186)
[mysqld]
ndbcluster
ndb-connectstring=ndb_mgmd.mysql.com
# provide connectstring for management server host (default port: 1186)
[ndbd]
connect-string=ndb_mgmd.mysql.com
# provide connectstring for management server host (default port: 1186)
[ndb_mgm]
connect-string=ndb_mgmd.mysql.com
# provide location of cluster configuration file
[ndb_mgmd]
config-file=/etc/config.ini
(For more information on connectstrings, see
section 16.3.4.2 The MySQL Cluster connectstring.)
# my.cnf
# example additions to my.cnf for MySQL Cluster
# (will work on all versions)
# enable ndbcluster storage engine, and provide connectstring for management
# server host to the default port 2200
[mysqld]
ndbcluster
ndb-connectstring=ndb_mgmd.mysql.com:2200
Also starting with MySQL 4.1.8, the `my.cnf' file supports a
separate [mysql_cluster] section for settings to be read by and
affecting all executables in the cluster:
# cluster-specific settings
[mysql_cluster]
ndb-connectstring=ndb_mgmd.mysql.com:2200
Currently the configuration file is in INI format, and is named
`config.ini' by default. It is read by ndb_mgmd at
startup and it can be placed anywhere. Its location and name are
specified by using --config-file=[<path>]<filename> on the
command line with ndb_mgmd. If the configuration file is not
specified, ndb_mgmd will by default try to read a
`config.ini' located in the current working directory.
Default values are defined for most parameters, and can also be
specified in `config.ini'. To create a default value section,
simply add the word DEFAULT to the section name. For example, DB
nodes are configured using [DB] sections. If all DB nodes use the
same data memory size, and this is not the same as the default size,
create a [DB DEFAULT] section containing a DataMemory
line to specify the default data memory size for all DB nodes.
The INI format consists of sections preceded by section headings
(surrounded by square brackets), followed by the appropriate parameter
names and values. One deviation from the standard format is that the
parameter name and value can be separated by a colon (`:') as well
as the equals sign (`='); another is that sections are not
uniquely identified by name. Instead, unique entries (such as two
different nodes of the same type) are identified by a unique ID.
At a minimum, the configuration file must define the computers and
nodes involved in the cluster and on which computers these nodes are
located. An example of a simple configuration file for a cluster
consisting of one management server, two storage nodes and two MySQL
servers is shown below:
# file "config.ini" - 2 DB nodes and 2 mysqld
# This file is placed in the startup directory of ndb_mgmd,
# i.e., the management server.
# The first MySQL Server can be started from any host and the second
# can only be started at the host mysqld_5.mysql.com
# NDBD, MYSQLD, and NDB_MGMD are aliases for DB, API, and MGM respectively
#
[NDBD DEFAULT]
NoOfReplicas= 2
DataDir= /var/lib/mysql-cluster
[NDB_MGMD]
Hostname= ndb_mgmd.mysql.com
DataDir= /var/lib/mysql-cluster
[NDBD]
HostName= ndbd_2.mysql.com
[NDBD]
HostName= ndbd_3.mysql.com
[MYSQLD]
[MYSQLD]
HostName= mysqld_5.mysql.com
There are six different sections in this configuration file:
-
[COMPUTER]: Defines the computers in the cluster.
-
[DB|NDBD]: Defines the cluster's storage nodes.
-
[API|MYSQLD]: Defines the cluster's MySQL server nodes.
-
[MGM|NDB_MGMD]: Defines the management server node in the cluster.
-
[TCP]: Defines TCP/IP connections between nodes in the cluster,
with TCP/IP being the default connection protocol.
-
[SHM]: Defines shared-memory connections between nodes. This
type of connection is available only in binaries
that have been built with --with-ndb-shm.
Note that each node has its own section in the `config.ini'. For
instance, since this cluster has two storage nodes, the configuration
file contains two sections defining these nodes. (In the example above,
these sections are labelled with [NDBD], but either or both of
them could have been labelled with [DB] instead.)
One can define DEFAULT values for each section. As of MySQL
4.1.5, all parameter names are case insensitive.
With the exception of the MySQL Cluster management server
(ndb_mgmd), each node making up a MySQL Cluster requires a
connectstring which points to the management server's location.
This is used in establishing a connection to the management server as
well as in performing other tasks depending on the node's role in the
cluster. The syntax for a connectstring is as follows:
<connectstring> := [<nodeid-specification>,]<host-specification>[,<host-specification>]
<nodeid-specification> := nodeid=<id>
<host-specification> := <host>[:<port>]
<id> is an integer larger than 1 identifying a node in config.ini
<port> is an integer referring to a regular unix port
<host> is a string which is a valid Internet host address
example 1 (long): "nodeid=2,myhost1:1100,myhost2:1100,192.168.0.3:1200"
example 2 (short): "myhost1"
All nodes will use localhost:1186 as the default
connectstring value if none is provided. If <port> is
omitted from the connectstring, the default port is 1186.
(Note: Prior to MySQL 4.1.8, the default port was 2200.) This
port should always be available on the network, since it has been
assigned by IANA for this purpose (see
http://www.iana.org/assignments/port-numbers for details).
By listing multiple <host-specification> values, it is possible
to designate several redundant management servers. A cluster node will
attempt to contact succesive management servers on each host in the
order specified, until a successful connection has been established.
There are a number of different ways to specify the connectstring:
-
Each executable has its own command line option which enables
specifying the management server at startup. (See the documentation
for the respective executable.)
-
Beginning with MySQL 4.1.8, it is also possible to set the
connectstring to be used by all nodes in the cluster by placing
it in a [mysql-cluster] section in the management server's
`my.cnf' file.
-
For backwards compatibility, two other options are available, using the same
syntax:
-
Set the
NDB_CONNECTSTRING environment variable to contain the
connectstring.
-
Write the
connectstring for each executable into a text file
named `Ndb.cfg' and place this file in the executable's startup
directory.
The recommended method for specifying the connectstring is to
set it on the command line or `my.cnf' fiel for each executable.
The [COMPUTER] section has no real significance other than serving as
a way to avoid the need of defining host names for each node in the system.
All parameters mentioned here are required.
[COMPUTER]Id
-
This is an internal identity in the configuration file. Later on in the
file one refers to this computer by the ID. It is an integer.
[COMPUTER]HostName
-
This is the host name of the computer. It is also possible to use an
IP address rather than the host name.
The [MGM] section (or its alias [NDB_MGMD]) is used to
configure the behavior of the management server. Either the
ExecuteOnComputer or HostName parameter must be present.
All other parameters can be omitted and if so will assume their
default values.
[MGM]Id
-
Each node in the cluster has a unique identity represented by an
integer value between 1 and 63 inclusive. This ID is used for
addressing the node by all internal cluster messages.
[MGM]ExecuteOnComputer
-
This refers to one of the computers defined in the
[COMPUTER]
section.
[MGM]PortNumber
-
This is the port number on which the management server listens for
configuration requests and management commands.
[MGM]LogDestination
-
This parameter specifies where to send cluster logging information.
There are three options in this regard:
CONSOLE, SYSLOG, and
FILE.
-
CONSOLE outputs the log to stdout:
CONSOLE
-
SYSLOG sends the log to a syslog facility, possible
values being one of auth, authpriv, cron,
daemon, ftp, kern, lpr, mail,
news, syslog, user, uucp, local0,
local1, local2, local3, local4,
local5, local6, or local7. (Note: Not
every facility is necessarily supported by every operating system.)
SYSLOG:facility=syslog
-
FILE pipes the cluster log ouput to a regular file on the same
machine. The following values can be specified:
-
filename: The name of the logfile.
-
maxsize: The maximum size to which the file can grow before
logging rolls over to a new file. When this occurs, the old logfile is
renamed by appending `.x' to the filename, where x is the
next number not yet used with this name.
-
maxfiles: The maximum number of logfiles.
FILE:filename=cluster.log,maxsize=1000000,maxfiles=6
It is possible to specify multiple log destinations as shown here,
using a semicolon-delimited string:
CONSOLE;SYSLOG:facility=local0;FILE:filename=/var/log/mgmd
The default value for the FILE parameter is
FILE:filename=ndb_<id>_cluster.log,maxsize=1000000,maxfiles=6,
where <id> is the ID of the node.
[MGM]ArbitrationRank
-
This parameter is used to define which nodes can act as arbitrators.
Only MGM nodes and API nodes can be arbitrators and can take one of the
following values:
-
0: The node will never be used as an arbitrator.
-
1: The node has high priority; that is, it will be preferred as
an arbitrator over low-priority nodes.
-
2: Indicates a low-priority node which be used as an arbtrator
only if a node with a higher priority is not available for that
purpose.
Normally, the management server should be configured as arbitrator by
setting its ArbitrationRank to 1 (the default) and that of all
API or server nodes to 0.
[MGM]ArbitrationDelay
-
An integer value which causes the management server's responses to
arbitration requests to be delayed by that number of milliseconds.
By default, this value is 0; it is normally not necessary to change it.
[MGM]DataDir
-
This sets the directory where output files from the management server
will be placed. These files include cluster log files, process output
files, and the daemon's pid file. (For log files, this can be
overridden by setting the
FILE parameter for
[MGM]LogDestination as discussed previously in this section.)
The [DB] section (or its alias [NDBD]) is used to configure
the behavior of the storage nodes. There are many parameters specified that
controls the buffer sizes, pool sizes, timeout parameters and so forth. The
only mandatory parameter is either ExecuteOnComputer or
HostName and the parameter NoOfReplicas which need to be
defined in the [DB DEFAULT] section. Most parameters should be set in
the [DB DEFAULT] section. Only parameters explicitly stated as
possible to have local values are allowed to be changed in the [DB]
section. HostName, Id and ExecuteOnComputer needs to be
defined in the local [DB] section.
The Id value (that is, the identification of the storage node)
can now be allocated when the node is started. It is still possible to
assign a node ID in the configuration file.
For each parameter it is possible to use k, M, or G as a suffix to indicate
units of 1024, 1024*1024, or 1024*1024*1024. For example, 100k means
102400. Parameters and values are currently case sensitive.
[DB]Id
-
This identity is the node ID used as the address of the node in all
cluster internal messages. This is an integer between 1 and 63. Each
node in the cluster has a unique identity.
[DB]ExecuteOnComputer
-
This is referring to one of the computers defined in the computer section.
[DB]HostName
-
This parameter is similar to specifying a computer to execute on. It defines
the host name of the computer the storage node is to reside on. Either this
parameter or
ExecuteOnComputer is required.
[DB]ServerPort
-
Each node in the cluster will use one port as the port other nodes use to
connect the transporters to each other. This port is used also for non-TCP
transporters in the connection setup phase. The default port will be
calculated to ensure that no nodes on the same computer receive the same
port number.
[DB]NoOfReplicas
-
This parameter can be set only in the
[DB DEFAULT] section because it is a
global parameter. It defines the number of replicas for each table
stored in the cluster. This parameter also specifies the size of node
groups. A node group is a set of nodes that all store the same information.
Node groups are formed implicitly. The first node group is formed by the
storage nodes with the lowest node identities. And the next by the next
lowest node identities. As an example presume we have 4 storage nodes and
NoOfReplicas is set to 2. The four storage nodes have node IDs 2, 3,
4 and 5. Then the first node group will be formed by node 2 and node 3. The
second node group will be formed by node 4 and node 5. It is important to
configure the cluster in such a manner such that nodes in the same node
groups are not placed on the same computer. This would cause a single HW
failure to cause a cluster crash.
If no node identities are provided then the order of the storage nodes will
be the determining factor for the node group. The actual node group assigned
will be printed by the SHOW command in the management client.
There is no default value and the maximum number is 4.
[DB]DataDir
-
This parameter specifies the directory where trace files, log files, pid files
and error logs are placed.
[DB]FileSystemPath
-
This parameter specifies the directory where all files created for metadata,
REDO logs, UNDO logs and data files are placed. The default value is to use
the same directory as the
DataDir. The directory must be created before
starting the ndbd process.
If you use the recommended directory hierarchy, you will use a directory
`/var/lib/mysql-cluster'. Under this directory a directory `ndb_2_fs'
will be created (if node ID was 2) which will be the file system for that
node.
[DB]BackupDataDir
-
It is possible also to specify the directory where backups will be placed.
By default, the directory
FileSystemPath/`BACKUP' will be choosen.
DataMemory and IndexMemory are the parameters that specify
the size of memory segments used to store the actual records and their
indexes. It is important to understand how DataMemory and
IndexMemory are used to understand how to set these parameters.
For most uses, they need to be updated to reflect the usage of the
cluster.
[DB]DataMemory
-
This parameter is one of the most important parameters because it defines the
space available to store the actual records in the database. The entire
DataMemory will be allocated in memory so it is important that the
machine contains enough memory to handle the DataMemory size.
The DataMemory is used to store two things. It stores the actual
records. Each record is currently of fixed size. So VARCHAR columns are
stored as fixed size columns. There is an overhead on each record of 16
bytes normally. Additionally each record is stored in a 32KB page with
128 byte page overhead. There will also be a small amount of waste for
each page because records are only stored in one page. The maximum record
size for the columns currently is 8052 bytes.
The DataMemory is also used to store ordered indexes. Ordered
indexes uses about 10 bytes per record. Each record in the table is always
represented in the ordered index.
The DataMemory consists of 32KB pages. These pages are allocated
to partitions of the tables. Each table is normally partitioned with the
same number of partitions as there are storage nodes in the cluster. Thus
for each node there are the same number of partitions (=fragments) as the
NoOfReplicas is set to. Once a page has been allocated to a
partition it is currently not possible to bring it back to the pool of
free pages. The method to restore pages to the pool is by deleting the
table. Performing a node recovery also will compress the partition because
all records are inserted into an empty partition from another live node.
Another important aspect is that the DataMemory also contains UNDO
information for records. For each update of a record a copy record is
allocated in the DataMemory. Also each copy record will also have an
instance in the ordered indexes of the table. Unique hash indexes are
updated only when the unique index columns are updated and in that case a new
entry in the index table is inserted and at commit the old entry is
deleted. Thus it is necessary also to allocate memory to be able to handle
the largest transactions which are performed in the cluster.
Performing large transactions has no advantage in MySQL Cluster other than
the consistency of using transactions which is the whole idea of
transactions. It is not faster and consumes large amounts of memory.
The default DataMemory size is 80MB. The minimum size is 1MB.
There is no maximum size, but in reality the
maximum size has to be adapted so that the process doesn't start swapping
when using the maximum size of the memory.
[DB]IndexMemory
-
The
IndexMemory is the parameter that controls the amount of storage
used for hash indexes in MySQL Cluster. Hash indexes are always used for
primary key indexes, unique indexes, and unique constraints. Actually
when defining a primary key and a unique index there will be two indexes
created in MySQL Cluster. One index is a hash index which is used for all
tuple accesses and also for lock handling. It is also used to ensure unique
constraints.
The size of the hash index is 25 bytes plus the size of the primary key.
For primary keys larger than 32 bytes another 8 bytes is added for some
internal references.
Thus for a table defined as
CREATE TABLE example
(
a INT NOT NULL,
b INT NOT NULL,
c INT NOT NULL,
PRIMARY KEY(a),
UNIQUE(b)
) ENGINE=NDBCLUSTER;
We will have 12 bytes overhead (having no nullable columns saves 4 bytes of
overhead) plus 12 bytes of data per record. In addition we will have two
ordered indexes on a and b consuming about 10 bytes each per record. We
will also have a primary key hash index in the base table with roughly 29
bytes per record. The unique constraint is implemented by a separate table
with b as primary key and a as a column. This table will consume another 29
bytes of index memory per record in the table and also 12 bytes of overhead
plus 8 bytes of data in the record part.
Thus for one million records, we will need 58MB of index memory to handle
the hash indexes for the primary key and the unique constraint. For the
DataMemory part we will need 64MB of memory to handle the
records of the base table and the unique index table plus the two ordered
index tables.
The conclusion is that hash indexes takes up a fair amount of memory space
but in return they provide very fast access to the data. They are also used
in MySQL Cluster to handle uniqueness constraints.
Currently the only partitioning algorithm is hashing and the ordered
indexes are local to each node and can thus not be used to handle
uniqueness constraints in the general case.
An important point for both IndexMemory and DataMemory is
that the total database size is the sum of all DataMemory and
IndexMemory in each node group. Each node group is used to store
replicated information, so if there are four nodes with 2 replicas there
will be two node groups and thus the total DataMemory available is
2*DataMemory in each of the nodes.
Another important point is about changes of DataMemory and
IndexMemory. First of all, it is highly recommended to have the same
amount of DataMemory and IndexMemory in all nodes. Since data
is distributed evenly over all nodes in the cluster the size available is
no better than the smallest sized node in the cluster times the number of
node groups.
DataMemory and IndexMemory can be changed, but it is dangerous to
decrease them because that can easily lead to a node that will not be able to
restart or even a cluster not being able to restart since there is not
enough memory space for the tables needed to restore into the starting
node. Increasing them should be quite okay, but it is recommended that such
upgrades are performed in the same manner as a software upgrade where first
the configuration file is updated, then the management server is restarted
and then one storage node at a time is restarted by command.
More IndexMemory is not used due to updates but inserts are inserted
immediately and deletes are not deleted until the transaction is committed.
The default IndexMemory size is 18MB. The minimum size is 1MB.
The next three parameters are important because they affect the number of
parallel transactions and the sizes of transactions that can be handled by
the system. MaxNoOfConcurrentTransactions sets the number of
parallel transactions possible in a node and
MaxNoOfConcurrentOperations sets the number of records that can be
in update phase or locked simultaneously.
Both of these parameters and particularly
MaxNoOfConcurrentOperations are likely targets for users setting
specific values and not using the default value. The default value is set
for systems using small transactions and to ensure not using too much
memory in the default case.
[DB]MaxNoOfConcurrentTransactions
-
For each active transaction in the cluster there needs to be also a
transaction record in one of the nodes in the cluster. The role of
transaction coordination is spread among the nodes and thus the total
number of transactions records in the cluster is the amount in one times
the number of nodes in the cluster.
Actually transaction records are allocated to MySQL servers, normally there
is at least one transaction record allocated in the cluster per connection
that uses or have used a table in the cluster. Thus one should ensure that
there is more transaction records in the cluster than there are concurrent
connections to all MySQL servers in the cluster.
This parameter has to be the same in all nodes in the cluster.
Changing this parameter is never safe and can cause a cluster crash. When a
node crashes one of the node (actually the oldest surviving node) will
build up the transaction state of all transactions ongoing in the crashed
node at the time of the crash. It is thus important that this node has as
many transaction records as the failed node.
The default value for this parameter is 4096.
[DB]MaxNoOfConcurrentOperations
-
This parameter is likely to be subject for change by users. Users
performing only short, small transactions don't need to set this parameter very
high. Applications desiring to be able to perform rather large transactions
involving many records need to set this parameter higher.
For each transaction that updates data in the cluster it is required to
have operation records. There are operation records both in the transaction
coordinator and in the nodes where the actual updates are performed.
The operation records contain state information needed to be able to find
UNDO records for rollback, lock queues, and much other state information.
To dimension the cluster to handle transactions where one million records are
updated simultaneously one should set this parameter to one million divided
by the number of nodes. Thus for a cluster with four storage nodes one
should set this parameter to 250000.
Also read queries which set locks use up operation records. Some extra
space is allocated in the local nodes to cater for cases where the
distribution is not perfect over the nodes.
When queries translate into using the unique hash index there will actually
be two operation records used per record in the transaction. The first one
represents the read in the index table and the second handles the operation
on the base table.
The default value for this parameter is 32768.
This parameter actually handles two parts that can be configured separately.
The first part specifies how many operation records are to be placed in the
transaction coordinator part. The second part specifies how many operation
records that are to be used in the local database part.
If a very big transaction is performed on a 8-node cluster then this will
need as many operation records in the transaction coordinator as there are
reads, updates, deletes involved in the transaction. The transaction will
however spread the operation records of the actual reads, updates, and
inserts over all eight nodes. Thus if it is necessary to configure the
system for one very big transaction then it is a good idea to configure
those separately.
MaxNoOfConcurrentOperations will always be used
to calculate the number of operation records in the transaction coordinator
part of the node.
It is also important to have an idea of the memory requirements for those
operation records. In MySQL 4.1.5, operation records consume
about 1KB per record. This figure will shrink in future 5.x versions.
[DB]MaxNoOfLocalOperations
-
By default this parameter is calculated as 1.1 *
MaxNoOfConcurrentOperations
which fits systems with many simultaneous, not very large transactions. If
the configuration needs to handle one very large transaction at a time and
there are many nodes then it is a good idea to configure this separately.
The next set of parameters are used for temporary storage in the midst of
executing a part of a query in the cluster. All of these records will have
been released when the query part is completed and is waiting for the
commit or rollback.
Most of the defaults for these parameters will be okay for most users. Some
high-end users might want to increase those to enable more parallelism in
the system and some low-end users might want to decrease them to save
memory.
[DB]MaxNoOfConcurrentIndexOperations
-
For queries using a unique hash index another set of operation records are
temporarily used in the execution phase of the query. This parameter sets
the size of this pool. Thus this record is only allocated while executing a
part of a query, as soon as this part has been executed the record is
released. The state needed to handle aborts and commits is handled by the
normal operation records where the pool size is set by the parameter
MaxNoOfConcurrentOperations.
The default value of this parameter is 8192. Only in rare cases of
extremely high parallelism using unique hash indexes should this parameter
be necessary to increase. To decrease could be performed for memory savings
if the DBA is certain that such high parallelism is not occurring in the
cluster.
[DB]MaxNoOfFiredTriggers
-
The default value of
MaxNoOfFiredTriggers is 4000. Normally this
value should be sufficient for most systems. In some cases it could be
decreased if the DBA feels certain the parallelism in the cluster is not so
high.
This record is used when an operation is performed that affects a unique
hash index. Updating a column that is part of a unique hash index or
inserting/deleting a record in a table with unique hash indexes will fire
an insert or delete in the index table. This record is used to represent
this index table operation while its waiting for the original operation
that fired it to complete. Thus it is short lived but can still need a
fair amount of records in its pool for temporary situations with many
parallel write operations on a base table containing a set of unique hash
indexes.
[DB]TransactionBufferMemory
-
This parameter is also used for keeping fired operations to update index
tables. This part keeps the key and column information for the fired
operations. It should be very rare that this parameter needs to be updated.
Also normal read and write operations use a similar buffer. This buffer is
even more short term in its usage so this is a compile time parameter set to
4000*128 bytes (500KB). The parameter is
ZATTRBUF_FILESIZE in
Dbtc.hpp. A similar buffer for key info exists which contains 4000*16
bytes, 62.5KB of buffer space. The parameter in this case is
ZDATABUF_FILESIZE in Dbtc.hpp. Dbtc is the module for
handling the transaction coordination.
Similar parameters exist in the Dblqh module taking care of the reads
and updates where the data is located. In `Dblqh.hpp' with
ZATTRINBUF_FILESIZE set to 10000*128 bytes (1250KB) and
ZDATABUF_FILE_SIZE, set to 10000*16 bytes (roughly 156KB) of buffer
space. No known instances of that any of those compile time limits haven't
been big enough has been reported so far or discovered by any of our
extensive test suites.
The default size of the TransactionBufferMemory is 1MB.
[DB]MaxNoOfConcurrentScans
-
This parameter is used to control the amount of parallel scans that can be
performed in the cluster. Each transaction coordinator can handle the
amount of parallel scans defined by this parameter. Each scan query is
performed by scanning all partitions in parallel. Each partition scan will
use a scan record in the node where the partition is located. The number of
those records is the size of this parameter times the number of nodes so
that the cluster should be able to sustain maximum number of scans in
parallel from all nodes in the cluster.
Scans are performed in two cases. The first case is when no hash or ordered
indexes exists to handle the query. In this case the query is executed by
performing a full table scan. The second case is when there is no hash
index to support the query but there is an ordered index. Using the ordered
index means executing a parallel range scan. Since the order is only kept
on the local partitions it is necessary to perform the index scan on all
partitions.
The default value of
MaxNoOfConcurrentScans is 256. The maximum value
is 500.
This parameter will always specify the number of scans possible in the
transaction coordinator. If the number of local scan records is not provided
it is calculated as the product of MaxNoOfConcurrentScans and the number
of storage nodes in the system.
[DB]MaxNoOfLocalScans
-
Possible to specify the number of local scan records if many scans are not
fully parallelized.
[DB]BatchSizePerLocalScan
-
This parameter is used to calculate the number of lock records which needs
to be there to handle many concurrent scan operations.
The default value is 64 and this value has a strong connection to the
ScanBatchSize defined in the API nodes.
[DB]LongMessageBuffer
-
This is an internal buffer used for message passing internally in the node
and for messages between nodes in the system. It is highly unlikely that
anybody would need to change this parameter but for configurability it is
now configurable. By default it is set to 1MB.
[DB]NoOfFragmentLogFiles
-
This is an important parameter that states the size of the REDO
log files in the node. REDO log files are organized in a ring such that it
is important that the tail and the head doesn't meet. When the tail and
head have come to close the each other the node will start aborting all
updating transactions because there is no room for the log records.
REDO log records aren't removed until three local checkpoints have
completed since the log record was inserted. The speed of checkpoint is
controlled by a set of other parameters so these parameters are all glued
together.
The default parameter value is 8, which means 8 sets of 4 16MB files. Thus in
total 512MB. Thus the unit is 64MB of REDO log space. In high
update scenarios this parameter needs to be set very high. Test cases
where it has been necessary to set it to over 300 have been performed.
If the checkpointing is slow and there are so many writes to the database
that the log files are full and the log tail cannot be cut for recovery
reasons then all updating transactions will be aborted with internal error
code 410 which will be translated to
Out of log file space temporarily.
This condition will prevail until a checkpoint has completed and the log tail
can be moved forward.
[DB]MaxNoOfSavedMessages
-
This parameter sets the maximum number of trace files that will be kept
before overwriting old trace files. Trace files are generated when the node
crashes for some reason.
The default is 25 trace files.
The next set of parameters defines the pool sizes for metadata objects. It
is necessary to define the maximum number of attributes, tables, indexes,
and trigger objects used by indexes, events and replication between
clusters.
[DB]MaxNoOfAttributes
-
This parameter defines the number of attributes that can be defined in the
cluster.
The default value of this parameter is 1000. The minimum value is 32 and there is
no maximum. Each attribute consumes around 200 bytes of storage in each node
because metadata is fully replicated in the servers.
[DB]MaxNoOfTables
-
A table object is allocated for each table, for each unique hash index, and
for each ordered index. This parameter sets the maximum number of table
objects in the cluster.
For each attribute that has a
BLOB data type an extra table is used to
store most of the BLOB data. These tables also must be taken into account
when defining the number of tables.
The default value of this parameter is 128. The minimum is 8 and the
maximum is 1600. Each table object consumes around 20KB in each node.
[DB]MaxNoOfOrderedIndexes
-
For each ordered index in the cluster, objects are allocated to describe what
it is indexing and its storage parts. By default each index defined will have
an ordered index also defined. Unique indexes and primary key indexes have
both an ordered index and a hash index.
The default value of this parameter is 128. Each object consumes around 10KB
of data per node.
[DB]MaxNoOfUniqueHashIndexes
-
For each unique index (not for primary keys) a special table is allocated that
maps the unique key to the primary key of the indexed table. By default there
will be an ordered index also defined for each unique index. To avoid this,
you must use the
USING HASH option in the unique index definition.
The default value is 64. Each index will consume around 15KB per node.
[DB]MaxNoOfTriggers
-
For each unique hash index an internal update, insert and delete trigger is
allocated. Thus three triggers per unique hash index. Ordered indexes use
only one trigger object. Backups also use three trigger objects for each
normal table in the cluster. When replication between clusters is supported
it will also use internal triggers.
This parameter sets the maximum number of trigger objects in the cluster.
The default value of this parameter is 768.
[DB]MaxNoOfIndexes
-
This parameter was deprecated in MySQL 4.1.5. Now you should use
MaxNoOfOrderedIndexes and MaxNoOfUniqueHashIndexes instead.
This parameter is only used by unique hash indexes. There needs to be one
record in this pool for each unique hash index defined in the cluster.
The default value of this parameter is 128.
There is a set of boolean parameters affecting the behavior of storage
nodes. Boolean parameters can be specified to true by setting it to Y or 1
and to false by setting it to N or 0.
[DB]LockPagesInMainMemory
-
For a number of operating systems such as Solaris and Linux it is possible
to lock a process into memory and avoid all swapping problems. This is an
important feature to provide real-time characteristics of the cluster.
The default is that this feature is not enabled.
[DB]StopOnError
-
This parameter states whether the process is to exit on error condition or
whether it is perform an automatic restart.
The default is that this feature is enabled.
[DB]Diskless
-
In the internal interfaces it is possible to set tables as diskless tables
meaning that the tables are not checkpointed to disk and no logging occur.
They only exist in main memory. The tables will still exist after a crash
but not the records in the table.
This feature makes the entire cluster
Diskless, in this case even
the tables doesn't exist anymore after a crash. Enabling this feature can
be done by either setting it to Y or 1.
When this feature is enabled, backups will be performed but will not be
stored because there is no "disk". In future releases it is likely to make
the backup diskless a separate configurable parameter.
The default is that this feature is not enabled.
[DB]RestartOnErrorInsert
-
This feature is only accessible when building the debug version where it is
possible to insert errors in the execution of various code parts to test
failure cases.
The default is that this feature is not enabled.
There are quite a few parameters specifying timeouts and time intervals
between various actions in the storage nodes. Most of the timeouts are
specified in milliseconds with a few exceptions which will be mentioned
below.
[DB]TimeBetweenWatchDogCheck
-
To ensure that the main thread doesn't get stuck in an eternal loop
somewhere there is a watch dog thread which checks the main thread. This
parameter states the number of milliseconds between each check. After three
checks and still being in the same state the process is stopped by the
watch dog thread.
This parameter can easily be changed and can be different in the nodes
although there seems to be little reason for such a difference.
The default timeout is 4000 milliseconds (4 seconds).
[DB]StartPartialTimeout
-
This parameter specifies the time that the cluster will wait for all
storage nodes to come up before the algorithm to start the cluster is
invocated. This time out is used to avoid starting only a partial cluster
if possible.
The default value is 30000 milliseconds (30 seconds). 0 means eternal time
out. Thus only start if all nodes are available.
[DB]StartPartitionedTimeout
-
If the cluster is ready start after waiting
StartPartialTimeout but
is still in a possibly partitioned state one waits until also this timeout
has passed.
The default timeout is 60000 milliseconds (60 seconds).
[DB]StartFailureTimeout
-
If the start is not completed within the time specified by this parameter
the node start will fail. Setting this parameter to 0 means no time out is
applied on the time to start the cluster.
The default value is 60000 milliseconds (60 seconds). For storage nodes
containing large data sets this parameter needs to be increased because it
could very well take 10-15 minutes to perform a node restart of a storage
node with a few gigabytes of data.
[DB]HeartbeatIntervalDbDb
-
One of the main methods of discovering failed nodes is by heartbeats. This
parameter states how often heartbeat signals are sent and how often to
expect to receive them. After missing three heartbeat intervals in a row,
the node is declared dead. Thus the maximum time of discovering a failure
through the heartbeat mechanism is four times the heartbeat interval.
The default heartbeat interval is 1500 milliseconds (1.5 seconds). This
parameter must not be changed drastically. If one node uses 5000
milliseconds and the node watching it uses 1000 milliseconds then obviously
the node will be declared dead very quickly. So this parameter can be
changed in small steps during an online software upgrade but not in large
steps.
[DB]HeartbeatIntervalDbApi
-
In a similar manner each storage node sends heartbeats to each of the
connected MySQL servers to ensure that they behave properly. If a MySQL
server doesn't send a heartbeat in time (same algorithm as for storage node
with three heartbeats missed causing failure) it is declared down and all
ongoing transactions will be completed and all resources will be released
and the MySQL server cannot reconnect until the completion of all
activities started by the previous MySQL instance has been completed.
The default interval is 1500 milliseconds. This interval can be different
in the storage node because each storage node independently of all other
storage nodes watches the MySQL servers connected to it.
[DB]TimeBetweenLocalCheckpoints
-
This parameter is an exception in that it doesn't state any time to wait
before starting a new local checkpoint. This parameter is used to ensure
that in a cluster where not so many updates are taking place that we don't
perform local checkpoints. In most clusters with high update rates it is
likely that a new local checkpoint is started immediately after the
previous was completed.
The size of all write operations executed since the start of the previous
local checkpoints is added. This parameter is specified as the logarithm
of the number of words. So the default value 20 means 4MB of write
operations, 21 would mean 8MB and so forth up until the maximum value
31 which means 8GB of write operations.
All the write operations in the cluster are added together. Setting it to
6 or lower means that local checkpoints will execute continuosly without
any wait between them independent of the workload in the cluster.
[DB]TimeBetweenGlobalCheckpoints
-
When a transaction is committed it is committed in main memory in all nodes
where mirrors of the data existed. The log records of the transaction are
not forced to disk as part of the commit however. The reasoning here is
that having the transaction safely committed in at least two independent
computers should be meeting standards of durability.
At the same time it is also important to ensure that even the worst of
cases when the cluster completely crashes is handled properly. To ensure
this all transactions in a certain interval is put into a global
checkpoint. A global checkpoint is very similar to a grouped commit of
transactions. An entire group of transactions is sent to disk. Thus as part
of the commit the transaction was put into a global checkpoint group. Later
this groups log records are forced to disk and then the entire group of
transaction is safely committed also on all computers disk storage as well.
This parameter states the interval between global checkpoints. The default
time is 2000 milliseconds.
[DB]TimeBetweenInactiveTransactionAbortCheck
-
Time-out handling is performed by checking each timer on each transaction
every period of time in accordance with this parameter. Thus if this
parameter is set to 1000 milliseconds, then every transaction will be
checked for timeout once every second.
The default for this parameter is 1000 milliseconds (1 second).
[DB]TransactionInactiveTimeout
-
If the transaction is currently not performing any queries but is waiting
for further user input, this parameter states the maximum time that the
user can wait before the transaction is aborted.
The default for this parameter is no timeout. For a real-time database
that needs to control that no transaction keeps locks for a too long time
this parameter should be set to a much smaller value. The unit is
milliseconds.
[DB]TransactionDeadlockDetectionTimeout
-
When a transactioon is involved in executing a query it waits for other
nodes. If the other nodes doesn't respond it could depend on three things.
First, the node could be dead, second the operation could have entered a
lock queue and finally the node requested to perform the action could be
heavily overloaded. This timeout parameter states how long the
transaction coordinator will wait until it aborts the transaction when
waiting for query execution of another node.
Thus this parameter is important both for node failure handling and for
deadlock detection. Setting it too high would cause a non-desirable
behavior at deadlocks and node failures.
The default time out is 1200 milliseconds (1.2 seconds).
[DB]NoOfDiskPagesToDiskAfterRestartTUP
-
When executing a local checkpoint the algorithm sends all data pages to
disk during the local checkpoint. Simply sending them there as quickly as
possible will cause unnecessary load on both processors, networks, and
disks. Thus to control the write speed this parameter specifies how many
pages per 100 milliseconds is to be written. A page is here defined as 8KB.
The unit this parameter is specified in is thus 80KB per
second. So setting it to 20 means writing 1.6MB of data pages to disk
per second during a local checkpoint. Also writing of UNDO log records for
data pages is part of this sum. Writing of index pages (see IndexMemory to
understand what index pages are used for) and their UNDO log records is
handled by the parameter
NoOfDiskPagesToDiskAfterRestartACC. This
parameter handles the limitation of writes from the DataMemory.
So this parameter specifies how quickly local checkpoints will be executed.
This parameter is important in connection with NoOfFragmentLogFiles,
DataMemory, IndexMemory.
The default value is 40 (3.2MB of data pages per second).
[DB]NoOfDiskPagesToDiskAfterRestartACC
-
This parameter has the same unit as
NoOfDiskPagesToDiskAfterRestartTUP but limits the speed of writing
index pages from IndexMemory.
The default value of this parameter is 20 (1.6MB per second).
[DB]NoOfDiskPagesToDiskDuringRestartTUP
-
This parameter specifies the same things as
NoOfDiskPagesToDiskAfterRestartTUP and
NoOfDiskPagesToDiskAfterRestartACC, only it does it for local
checkpoints executed in the node as part of a local checkpoint when the
node is restarting. As part of all node restarts a local checkpoint is
always performed. Since during a node restart it is possible to use a
higher speed of writing to disk because fewer activities are performed in the
node due to the restart phase.
This parameter handles the DataMemory part.
The default value is 40 (3.2MB per second).
[DB]NoOfDiskPagesToDiskDuringRestartACC
-
During Restart for
IndexMemory part of local checkpoint.
The default value is 20 (1.6MB per second).
[DB]ArbitrationTimeout
-
This parameter specifies the time that the storage node will wait for a
response from the arbitrator when sending an arbitration message in the
case of a split network.
The default value is 1000 milliseconds (1 second).
A number of new configuration parameters were introduced in MySQL 4.1.5.
These correspond to values that
previously were compile time parameters. The main reason for this is to
enable the advanced user to have more control of the size of the process
and adjust various buffer sizes according to his needs.
All of these buffers are used as front-ends to the file system when
writing log records of various kinds to disk. If the node runs with
Diskless then these parameters can most definitely be set to their
minimum values because all disk writes are faked as okay by the file system
abstraction layer in the NDB storage engine.
[DB]UndoIndexBuffer
-
This buffer is used during local checkpoints. The
NDB storage engine
uses a recovery scheme based on a consistent checkpoint together with an
operational REDO log. In order to produce a consistent checkpoint without
blocking the entire system for writes, UNDO logging is done while performing
the local checkpoint. The UNDO logging is only activated on one fragment of
one table at a time. This optimization is possible because tables are
entirely stored in main memory.
This buffer is used for the updates on the primary key hash index. Inserts
and deletes rearrange the hash index and the NDB storage engine writes
UNDO log records that map all physical changes to an index page such that
they can be undone at a system restart. It also logs all active insert
operations at the start of a local checkpoint for the fragment.
Reads and updates only set lock bits and update a header in the hash index
entry. These changes are handled by the page write algorithm to ensure that
these operations need no UNDO logging.
This buffer is 2MB by default. The minimum value is 1MB. For most
applications this is good enough. Applications doing extremely heavy inserts
and deletes together with large transactions using large primary keys
might need to extend this buffer.
If this buffer is too small, the NDB storage engine issues the internal
error code 677 which will be translated into "Index UNDO buffers overloaded".
[DB]UndoDataBuffer
-
This buffer has exactly the same role as the
UndoIndexBuffer but is
used for the data part. This buffer is used during local checkpoint of a
fragment and inserts, deletes, and updates use the buffer.
Since these UNDO log entries tend to be bigger and more things are logged,
the buffer is also bigger by default. It is set to 16MB by default.
For some applications this might be too conservative and they might want
to decrease this size, the minimum size is 1MB. It should be rare that
applications need to increase this buffer size. If there is a need for this
it is a good idea to check if the disks can actually handle the load that
the update activity in the database causes. If they cannot then no size of
this buffer will be big enough.
If this buffer is too small and gets congested, the NDB storage engine
issues the internal error code 891 which will be translated to
"Data UNDO buffers overloaded".
[DB]RedoBuffer
-
All update activities also need to be logged. This enables a replay of these
updates at system restart. The recovery algorithm uses a consistent
checkpoint produced by a "fuzzy" checkpoint of the data together with UNDO
logging of the pages. Then it applies the REDO log to play back all changes
up until the time that will be restored in the system restart.
This buffer is 8MB by default. The minimum value is 1MB.
If this buffer is too small, the
NDB storage engine issues the internal
error code 1221 which will be translated into "REDO log buffers overloaded".
For cluster management, it is important to be able to control the
amount of log messages sent to stdout for various event types. The possible
events will be listed in this manual soon. There are 16 levels possible
from level 0 to level 15. Setting event reporting to level 15 means
receiving all event reports of that category and setting it to 0 means
getting no event reports in that category.
The reason why most defaults are set to 0 and thus not causing any output
to stdout is that the same message is sent to the cluster log in the
management server. Only the startup message is by default generated to
stdout.
A similar set of levels can be set in management client to define what
levels to record in the cluster log.
[DB]LogLevelStartup
-
Events generated during startup of the process.
The default level is 1.
[DB]LogLevelShutdown
-
Events generated as part of graceful shutdown of a node.
The default level is 0.
[DB]LogLevelStatistic
-
Statistical events such as how many primary key reads, updates, inserts and
many other statistical information of buffer usage, and so forth.
The default level is 0.
[DB]LogLevelCheckpoint
-
Events generated by local and global checkpoints.
The default level is 0.
[DB]LogLevelNodeRestart
-
Events generated during node restart.
The default level is 0.
[DB]LogLevelConnection
-
Events generated by connections between nodes in the cluster.
The default level is 0.
[DB]LogLevelError
-
Events generated by errors and warnings in the cluster. These are errors
not causing a node failure but still considered worth reporting.
The default level is 0.
[DB]LogLevelInfo
-
Events generated for information about state of cluster and so forth.
The default level is 0.
There is a set of parameters defining memory buffers that are set aside for
online backup execution.
[DB]BackupDataBufferSize
-
When executing a backup there are two buffers used for sending data to the
disk. This buffer is used to fill in data recorded by scanning the tables
in the node. When filling this to a certain level the pages are sent to
disk. This level is specified by the
BackupWriteSize parameter.
When sending data to the disk, the backup can continue filling this buffer
until it runs out of buffer space. When running out of buffer space, it will
simply stop the scan and wait until some disk writes return and thus free
up memory buffers to use for further scanning.
The default value is 2MB.
[DB]BackupLogBufferSize
-
This parameter has a similar role but instead used for writing a log of all
writes to the tables during execution of the backup. The same principles
apply for writing those pages as for
BackupDataBufferSize except
that when this part runs out of buffer space, it causes the backup to
fail due to lack of backup buffers. Thus the size of this buffer must be
big enough to handle the load caused by write activities during the backup
execution.
The default parameter should be big enough. Actually it is more likely that
a backup failure is caused by a disk not able to write as quickly as it
should. If the disk subsystem is not dimensioned for the write load caused
by the applications this will create a cluster which will have great
difficulties to perform the desired actions.
It is important to dimension the nodes in such a manner that the processors
becomes the bottleneck rather than the disks or the network connections.
The default value is 2MB.
[DB]BackupMemory
-
This parameter is simply the sum of the two previous, the
BackupDataBufferSize and BackupLogBufferSize.
The default value is 4MB.
[DB]BackupWriteSize
-
This parameter specifies the size of the write messages to disk for the log
and data buffer used for backups.
The default value is 32KB.
The [API] section (with its alias [MYSQLD]) defines the
behavior of the MySQL server. No parameter is mandatory. If no computer or
host name is provided, then any host can use this API node.
[API]Id
-
This identity is the node ID used as the address of the node in all cluster
internal messages. This is an integer between 1 and 63. Each node in the
cluster must have a unique identity.
[API]ExecuteOnComputer
-
This is referring to one of the computers defined in the computer section.
[API]ArbitrationRank
-
This parameter is used to define which nodes can act as an arbitrator. MGM
nodes and API nodes can be arbitrators. 0 means it isn't used as
arbitrator, 1 high priority and 2 low priority. A normal configuration uses
the management server as arbitrator setting the ArbitrationRank to 1 (which
is the default) and setting all APIs to 0.
[API]ArbitrationDelay
-
If setting this to anything else than 0 it means that the management server
will delay responses to the arbitration requests. Default is no delay and
this should not be necessary to change.
[API]BatchByteSize
-
For queries that get translated into full table scans or range scans on
indexes, it is important for best performance to fetch records in properly
sized batches. It is possible to set the proper size both in terms of
number of records and in terms of bytes. Real batch size will be limited
by both parameters.
Performance of queries can vary more than 40% due to how this parameter is
set. In future releases, the MySQL Server will make educated guesses on
what to set these parameters to, based on the query type.
This parameter is measured in bytes and is by default equal to 32KB.
[API]BatchSize
-
This parameter is measured in number of records and is by default set to
64. The maximum size is 992.
[API]MaxScanBatchSize
-
The batch size is the size of each batch sent from each storage node. Most
scans are performed in parallel so to protect the MySQL Server from getting
too much data from many nodes in parallel, this parameter sets a limit to
the total batch size over all nodes.
The default value of this parameter is set to 256KB. Its maximum size
is 16MB.
TCP/IP is the default transport mechanism for establishing connections in
MySQL Cluster. It
is actually not necessary to define any connection because there will be a
one connection setup between each of the storage nodes, between each
storage node, and all MySQL server nodes and between each storage node and
the management server.
It is only necessary to define a connection if it is necessary to change
the default values of the connection. In that case it is necessary to
define at least NodeId1, NodeId2 and the parameters to
change.
It is also possible to change the default values by setting the parameters
in the [TCP DEFAULT] section.
[TCP]NodeId1
-
[TCP]NodeId2
-
To identify a connection between two nodes it is necessary to provide the
node identity for both of them in
NodeId1 and NodeId2.
[TCP]SendBufferMemory
-
TCP transporters use a buffer all messages before performing the send call
to the operating system. When this buffer reaches 64KB it sends the
buffer, the buffer is also sent when a round of messages have been
executed. To handle temporary overload situations it is also possible to
define a bigger send buffer. The default size of the send buffer is 256KB.
[TCP]SendSignalId
-
To be able to retrace a distributed message diagram it is necessary to
identify each message with an identity. By setting this parameter to Y
these message identities are also transported over the network. This
feature is not enabled by default.
[TCP]Checksum
-
This parameter is also a Y/N parameter which is not enabled by default.
When enabled all messages are checksummed before put into the send buffer.
This feature enables control that messages are not corrupted while waiting
in the send buffer. It is also a double check that the transport mechanism
haven't corrupted the data.
[TCP]PortNumber
-
This is the port number to use for listening to connections from other
nodes. This port number should be specified in the
[TCP DEFAULT] section
normally.
This parameter should no longer be used. Use the parameter ServerPort on
storage nodes instead.
[TCP]ReceiveBufferMemory
-
This parameter specifies the size of the buffer used when receiving data
from the TCP/IP socket. There is seldom any need to change this parameter
from its default value of 64KB. One possible reason could be to save memory.
Shared memory segments are currently supported only for special builds of
MySQL Cluster using the configure parameter --with-ndb-shm. Its
implementation will most likely change. When defining shared memory as the
connection method it is necessary to define at least NodeId1,
NodeId2 and ShmKey. All other parameters have default values
that will work out fine in most cases.
[SHM]NodeId1
-
[SHM]NodeId2
-
To identify a connection between two nodes it is necessary to provide the
node identity for both of them in
NodeId1 and NodeId2.
[SHM]ShmKey
-
When setting up shared memory segments an identifier is used to uniquely
identify the shared memory segment to use for the communication. This is an
integer which does not have a default value.
[SHM]ShmSize
-
Each connection has a shared memory segment where messages between the
nodes are put by the sender and read by the reader. This segment has a size
defined by this parameter. Default value is 1MB.
[SHM]SendSignalId
-
To be able to retrace a distributed message diagram it is necessary to
identify each message with an identity. By setting this parameter to Y
these message identities are also transported over the network. This
feature is not enabled by default.
[SHM]Checksum
-
This parameter is also a Y/N parameter which is not enabled by default.
When enabled all messages are checksummed before put into the send buffer.
This feature enables control that messages are not corrupted while waiting
in the send buffer. It is also a double check that the transport mechanism
haven't corrupted the data.
SCI Transporters as connection between nodes in the MySQL Cluster is only
supported for special builds of MySQL Cluster using the configure
parameter --with-ndb-sci=/your/path/to/SCI. The path should point
to a directory that contains at least a lib and a include directory where
SISCI libraries and header files are provided.
It is strongly recommended to only use SCI Transporters for communication
between ndbd processes. Also using SCI Transporters will mean that the
ndbd process will never sleep so use SCI Transporters only for machines
with at least 2 CPU's which are dedicated for use by ndbd process(es). There
should be at least 1 CPU per ndbd process in this case and in addition at
least one more is needed to also handle OS activities.
[SCI]NodeId1
-
[SCI]NodeId2
-
To identify a connection between two nodes it is necessary to provide the
node identity for both of them in
NodeId1 and NodeId2.
[SCI]Host1SciId0
-
This identifies the SCI node id on the first node identified by NodeId1.
[SCI]Host1SciId1
-
It is possible to set-up SCI Transporters for fail-over between two SCI
cards which then should use separate networks between the nodes. This
identifies the node id and the second SCI card to be used on the first
node.
[SCI]Host2SciId0
-
This identifies the SCI node id on the second node identified by NodeId2.
[SCI]Host2SciId1
-
It is possible to set-up SCI Transporters for fail-over between two SCI
cards which then should use separate networks between the nodes. This
identifies the node id and the second SCI card to be used on the second
node.
[SCI]SharedBufferSize
-
Each SCI transporter has a shared memory segment between the two nodes. With
this segment set to the default 1 MB most applications should be ok. Smaller
sizes such as 256 kB has problems when performing many parallel inserts.
If the buffer is too small it can cause crashes of the ndbd process.
[SCI]SendLimit
-
A small buffer in front of the SCI media buffers up messages before sending
them over the SCI network. By default this is set to 8kB. Most benchmark
measurements shows that tops are reached at 64 kB but 16kB reaches within a
few percent of the performance and for all MySQL Cluster benchmarks it was
no measurable difference in increasing it beyond 8kB.
[SCI]SendSignalId
-
To be able to retrace a distributed message diagram it is necessary to
identify each message with an identity. By setting this parameter to Y
these message identities are also transported over the network. This
feature is not enabled by default.
[SCI]Checksum
-
This parameter is also a Y/N parameter which is not enabled by default.
When enabled all messages are checksummed before put into the send buffer.
This feature enables control that messages are not corrupted while waiting
in the send buffer. It is also a double check that the transport mechanism
haven't corrupted the data.
There are four processes that are important to know about when using MySQL
Cluster. We will cover how to work with those processes, which options to
use when starting and so forth.
mysqld is the traditional MySQL server process. To be used with
MySQL Cluster it needs to be built with support for the NDB Cluster storage
engine. If the mysqld binary has been built in such a manner, the
NDB Cluster storage engine is still disabled by default.
To enable the NDB Cluster storage engine there are two ways. Either use
--ndbcluster as a startup option when starting mysqld or
insert a line with ndbcluster in the [mysqld] section of your
`my.cnf' file.
An easy way to verify that your server runs with support for the
NDB Cluster storage engine is to issue the command
SHOW ENGINES from a mysql client.
You should see YES for the row listing NDBCLUSTER.
If you see NO, you are not running a mysqld that is compiled
with NDB Cluster support enabled. If you see DISABLED, then you
need to enable it in the `my.cnf' configuration file.
The MySQL server needs to know how to get the configuration of the cluster.
To access this configuration, it needs to know three things:
-
Its own node ID in the cluster.
-
The hostname (or IP address) where the management server resides.
-
The port on which it can connect to the management server.
The node ID can be skipped from MySQL 4.1.5 on, because a node ID can be
dynamically allocated.
The mysqld parameter ndb-connectstring is used to specify the
connectstring either when starting mysqld or in `my.cnf'.
See also section 16.3.4.2 The MySQL Cluster connectstring for more info on connectstrings.
shell> mysqld --ndb-connectstring=ndb_mgmd.mysql.com:1186
ndb_mgmd.mysql.com is the host where the management server resides,
and it is listening to port 1186.
With this setup the MySQL server will be a full citizen of MySQL Cluster
and will be fully aware of all storage nodes in the cluster and their
status. It will setup connection to all storage nodes and will be
able to use any storage node as a transaction coordinator and to
access their data for reading and updating.
ndbd is the process that is used to handle all the data in the
tables using the NDB Cluster storage engine. This is the process that
contains all the logic of distributed transaction handling, node recovery,
checkpointing to disk, online backup, and lots of other functionality.
In a cluster there is a set of ndbd processes cooperating in
handling the data. These processes can execute on the same computer or on
different computers, in a completely configurable manner.
Before MySQL 4.1.5, ndbd process should start from a separate
directory. The reason for this was that ndbd generates a set of log
files in its starting directory.
In MySQL 4.1.5, this was changed such that the files are placed in the
directory specified by DataDir in the configuration file. Thus
ndbd can be started from anywhere.
These log files are (the 2 is the node ID):
It is recommended to not use a directory mounted through NFS because in some
environments that can cause problems with the lock on the pid-file remaining
even after the process has stopped.
Also when starting the ndbd process it may be necessary to specify
the hostname of the management server and the port it is listening to,
optionally one may specify node ID the process is to use,
see section 16.3.4.2 The MySQL Cluster connectstring.
shell> ndbd --connect-string="nodeid=2;host=ndb_mgmd.mysql.com:1186"
When ndbd starts it will actually start two processes. The starting
process is called the "angel" and its only job is to discover when the
execution process has completed, and then restart the ndbd process if
configured to do so. Thus if one attempts to kill ndbd through the
kill command in Unix, it is necessary to kill both processes. A
more proper way to handle the stopping of ndbd processes is to use
the management client and stop the process from there.
The execution process uses one thread for all activities in reading,
writing, and scanning data and all other activities. This thread is designed
with asynchronous programming so it can easily handle thousands of
concurrent activites. In addition there is a watch-dog thread supervising
the execution thread to ensure it doesn't stop in an eternal loop or other
problem. There is a pool of threads handling file I/O. Each thread can
handle one open file. In addition threads can be used for connection
activities of the transporters in the ndbd process. Thus in a system
that performs a large number of activities including update activities the
ndbd process will consume up to about 2 CPUs if allowed to. Thus in
a large SMP box with many CPUs it is recommended to use several
ndbd processes which are configured to be part of different node
groups.
The management server is the process that reads the configuration file of
the cluster and distributes this information to all nodes in the cluster
requesting it. It also maintains the log of cluster activities.
Management clients can connect to the management server and use commands to
check status of the cluster in various aspects.
As of MySQL 4.1.5, it is no longer necessary to specify a connectstring
when starting the management server. However, if you are using several
management servers, a connectstring should be provided and each node in
the cluster should specify its nodeid explicitly.
The following files are created or used by ndb_mgmd in its starting
directory of ndb_mgmd. From MySQL 4.1.5, the log and PID files will
be placed in the DataDir specified in the configuration file:
-
`config.ini' is the configuration file of the cluster. This is created
by the user and read by the management server. How to write this file is
described in the section
MySQL Cluster Configuration.
-
`ndb_1_cluster.log' (was `cluster.log' in version 4.1.3) is the
file where events in the cluster are reported.
Examples of events are checkpoints started and completed, node failures and
nodes starting, levels of memory usage passed and so forth. The events
reported are described in the section section 16.5 Management of MySQL Cluster.
-
`ndb_1_out.log' (was `node1.out' in version 4.1.3) is the file used
for stdout and stderr when executing the
management server as a daemon process. 1 in this context is the node ID.
-
`ndb_1.pid' (was `node1.pid' in version 4.1.3) is the PID file used
when executing the management server as a daemon process. 1 in this context
is the node ID.
-
`ndb_1_cluster.log.1' (was `cluster.log.1' in version 4.1.3) when
the cluster log becomes bigger than one million
bytes then the file is renamed to this name where 1 is the number
of the cluster log file, so if 1, 2, and 3 already exists the next will be
having the number 4 instead.
The final important process to know about is the management client. This
process is not needed to run the cluster. Its value lies in its ability to
check status of the cluster, start backups, and perform other management
activities. It does so by providing access to a set of commands.
Actually the management client is using a C API that is used to access the
management server so for advanced users it is also possible to program
dedicated management processes which can do similar things as the
management client can do.
When starting the management client, it is necessary to state the hostname
and port of the management server as in the example below. The default is
localhost as host and port number 1186 (was 2200 prior to version 4.1.8).
shell> ndb_mgm localhost 1186
All MySQL Cluster executables (except mysqld) takes the
following options as of 4.1.8. If you're running an earlier version
please read carefully, as we have made changes in some of these
switches in order to make them consistent between the different
executables and with mysqld. (For example: -V was
-v in earlier versions.) Note also that you can use the
-? option to see what is supported in your version.
-?, --usage, --help
-
Prints a short description of the available command options.
-V, --version
-
Prints the version number of the
ndbd process. The version number is
the MySQL Cluster version number. It is important because at
startup the MySQL Cluster processes verifies that the versions of the
processes in the cluster can co-exist in the cluster. It is also important
for online software upgrade of MySQL Cluster (see section
Software Upgrade of MySQL Cluster).
-c connect_string (not ndb_mgmd), --connect-string connect_string
-
Set the connect string to the management server as a command option.
(for backwards compatability reasons the
ndb_mgmd does not take
the -c option until 5.0, as it currently specifies the config file).
Available with ndb_mgm from 4.1.8.
shell> ndbd --connect-string="nodeid=2;host=ndb_mgmd.mysql.com:1186"
--debug[=options]
-
This can only be used for versions compiled with debug information. It is
used to enable printouts from debug calls in the same manner as for the
mysqld process.
--ndbcluster
-
If the binary includes support for the
NDB Cluster storage engine the
default disabling of support for the NDB storage engine can be
overruled by using this option. Using the NDB Cluster storage engine
is necessary for using MySQL Cluster.
--skip-ndbcluster
-
Disable the
NDB Cluster storage engine. This is disabled by default
for binaries where it is included. So this option only applies if the
server was configured to use the NDB Cluster storage engine.
--ndb-connectstring=connect_string
-
When using the
NDB storage engine, it is possible to point out the
management server that distributes the cluster configuration by setting the
connect string option.
For common options see section 16.4.5 Command Options for MySQL Cluster Processes.
-d, --daemon
-
Instructs
ndbd to execute as a daemon process. From MySQL 4.1.5 on, this
is the default behavior.
--nodaemon
-
Instructs
ndbd not to start as a daemon process. Useful when
ndbd is debugged and one wants printouts on the screen.
--initial
-
Instructs
ndbd to perform an initial start. An initial start will
erase any files created by earlier ndbd instances for recovery. It
will also recreate recovery log files which on some operating systems can
take a substantial amount of time.
An initial start is only to be used at the very first start of the
ndbd process. It removes all files from the file system and
creates all REDO log files. When performing a software upgrade which has
changed the file contents on any files it is also necessary to use this
option when restarting the node with a new software version of
ndbd. Finally it could also be used as a final resort if for some
reason the node restart or system restart doesn't work. In this case be
aware that destroying the contents of the file system means that this node
can no longer be used to restore data.
This option does not affect any backup files created.
The previous possibility to use -i for this option was removed to
ensure that this option is not used by mistake.
--nostart
-
Instructs
ndbd not to automatically start. ndbd will connect
to the management server and get the configuration and initialise
communication objects. It will not start the execution engine until
requested to do so by the management server. The management server can
request by command issued by the management client.
For common options see section 16.4.5 Command Options for MySQL Cluster Processes.
-f filename (from 4.1.8), --config-file=filename, -c filename (obsolete from 5.0)
-
Instructs the management server which file to use as configuration file.
This option must be specified. The file name defaults to
config.ini.
-d, --daemon
-
Instructs
ndb_mgmd to start as a daemon process. This is the
default behavior.
-nodaemon
-
Instructs the management server not to start as a daemon process.
For common options see section 16.4.5 Command Options for MySQL Cluster Processes.
[host_name [port_num]]
-
To start the management client it is necessary to specify where the
management server resides. This means specifying the hostname and the port.
The default hostname is
localhost and the default port is 1186
(was 2200 prior to version 4.1.8).
--try-reconnect=number
-
If the connection to the management server is broken it is possible to
perform only a specified amount of retries before reporting a fault code to
the user. The default is that it keeps retrying every 5 seconds until it
succeeds.
Managing a MySQL Cluster involves a number of activities.
The first activity is to configure and startup MySQL Cluster.
This is covered by the sections section 16.3 MySQL Cluster Configuration
and section 16.4 Process Management in MySQL Cluster.
This section covers how to manage a running MySQL Cluster.
There are essentially two ways of actively managing a running MySQL Cluster.
The first is by commands entered into the management client where status of
cluster can be checked, log levels changed, backups started and stopped and
nodes can be stopped and started. The second method is to study the output
in the cluster log. The cluster log is directed towards the
`ndb_2_cluster.log' in the DataDir directory of the management
server. The cluster log contains event reports generated from the
ndbd processes in the cluster. It is also possible to send the
cluster log entries to a Unix system log.
In addition to the central configuration file, the cluster
may also be controlled through a command line interface. The command
line interface is available through a separate management client process.
This is the main administrative interface to a running cluster.
The management client has the following basic commands.
Below, <id> denotes either a database node id (e.g. 21) or the
keyword ALL that indicates that the command should be applied
to all database nodes in the cluster.
HELP
-
Prints information on all available commands.
SHOW
-
Prints information on the status of the cluster.
<id> START
-
Start a database node identified with
<id> or all database nodes.
<id> STOP
-
Stop a database node identified with
<id> or all database nodes.
<id> RESTART [-N] [-I]
-
Restart a database node identified with
<id> or all database nodes.
<id> STATUS
-
Displays status information for database node identified with
<id>
(or ALL database nodes).
ENTER SINGLE USER MODE <id>
-
Enters single user mode where only the API with node
<id> is
allowed to access the database system.
EXIT SINGLE USER MODE
-
Exists single user mode allowing all APIs to access the database system.
QUIT
-
Terminates the management client.
SHUTDOWN
-
Shuts down all cluster nodes (except mysql servers) and exits.
Commands for the event logs are given in the next section and commands
for backup and restore are given in a separate section on these topics.
MySQL Cluster has two event logs, the cluster log and the node log.
-
The cluster log is a log of the whole cluster and this log can have
multiple destinations (file, management server console window or
syslog).
-
The node log is a log that is local to each database node and is
written to the console window of the database node. The two logs can
be set to log different subsets of the list of events.
Note: The cluster log is the recommended log. The node log is only
intended to be used during application development or for debugging
application code.
Each reportable event has the following properties:
-
Category (STARTUP, SHUTDOWN, STATISTICS, CHECKPOINT, NODERESTART,
CONNECTION, ERROR, INFO)
-
Priority (1-15 where 1 - Most important, 15 - Least important)
-
Severity (ALERT, CRITICAL, ERROR, WARNING, INFO, DEBUG)
The two logs (the cluster log and the node log) can be filtered on
these properties.
The following management commands are related to the cluster log:
CLUSTERLOG ON
-
Turn cluster log on.
CLUSTERLOG OFF
-
Turn cluster log off.
CLUSTERLOG INFO
-
Information about cluster log settings.
<id> CLUSTERLOG <category>=<threshold>
-
Log category events with priority less than or equal to threshold in
the cluster log.
CLUSTERLOG FILTER <severity>
-
Toggles cluster logging of the specified severity type on/off.
The following table describes the default setting (for all database
nodes) of the cluster log category threshold. If an event has a
priority with a value lower than or equal to the priority threshold,
then it is reported in the cluster log.
Note that the events are reported per database node and that the
thresholds can be set differently on different nodes.
| Category | Default threshold (All database nodes)
|
| STARTUP | 7
|
| SHUTDOWN | 7
|
| STATISTICS | 7
|
| CHECKPOINT | 7
|
| NODERESTART | 7
|
| CONNECTION | 7
|
| ERROR | 15
|
| INFO | 7
|
The threshold is used to filter events within each category.
For example: a STARTUP event with a priority of 3 is never sent
unless the threshold for STARTUP is changed to 3 or lower.
Only events with priority 3 or lower are sent if the threshold is 3.
The event severities are (corresponds to UNIX syslog levels):
| 1 | ALERT | A condition that should be corrected immediately, such as a corrupted system database
|
| 2 | CRITICAL | Critical conditions, such as device errors or out of resources
|
| 3 | ERROR | Conditions that should be corrected, such as configuration errors
|
| 4 | WARNING | Conditions that are not error conditions but might require handling
|
| 5 | INFO | Informational messages
|
| 6 | DEBUG | Messages used during development of NDB Cluster
|
Syslog's LOG_EMERG and LOG_NOTICE are not used/mapped.
The event severities can be turned on or off. If the severity is on
then all events with priority less than or equal to the category
thresholds are logged. If the severity is off then no events belonging
to the severity are logged.
All reportable events are listed below.
| Event | Category | Priority | Severity | Description
|
| DB nodes connected | CONNECTION | 8 | INFO |
|
| DB nodes disconnected | CONNECTION | 8 | INFO |
|
| Communication closed | CONNECTION | 8 | INFO | API & DB nodes connection closed
|
| Communication opened | CONNECTION | 8 | INFO | API & DB nodes connection opened
|
| Global checkpoint started | CHECKPOINT | 9 | INFO | Start of a GCP, i.e., REDO log is written to disk
|
| Global checkpoint completed | CHECKPOINT | 10 | INFO | GCP finished
|
| Local checkpoint started | CHECKPOINT | 7 | INFO | Start of local check pointing, i.e., data is written to disk. LCP Id and GCI Ids (keep and oldest restorable)
|
| Local checkpoint completed | CHECKPOINT | 8 | INFO | LCP finished
|
| LCP stopped in calc keep GCI | CHECKPOINT | 0 | ALERT | LCP stopped!
|
| Local checkpoint fragment completed | CHECKPOINT | 11 | INFO | A LCP on a fragment has been completed
|
| Report undo log blocked | CHECKPOINT | 7 | INFO | Reports undo logging blocked due buffer near to overflow
|
| DB node start phases initiated | STARTUP | 1 | INFO | NDB Cluster starting
|
| DB node all start phases completed | STARTUP | 1 | INFO | NDB Cluster started
|
| Internal start signal received STTORRY | STARTUP | 15 | INFO | Internal start signal to blocks received after restart finished
|
| DB node start phase X completed | STARTUP | 4 | INFO | A start phase has completed
|
| Node has been successfully included into the cluster | STARTUP | 3 | INFO | President node, own node and dynamic id is shown
|
| Node has been refused to be included into the cluster | STARTUP | 8 | INFO |
|
| DB node neighbours | STARTUP | 8 | INFO | Show left and right DB nodes neighbours
|
| DB node shutdown initiated | STARTUP | 1 | INFO |
|
| DB node shutdown aborted | STARTUP | 1 | INFO |
|
| New REDO log started | STARTUP | 10 | INFO | GCI keep X, newest restorable GCI Y
|
| New log started | STARTUP | 10 | INFO | Log part X, start MB Y, stop MB Z
|
| Undo records executed | STARTUP | 15 | INFO |
|
| Completed copying of dictionary information | NODERESTART | 8 | INFO |
|
| Completed copying distribution information | NODERESTART | 8 | INFO |
|
| Starting to copy fragments | NODERESTART | 8 | INFO |
|
| Completed copying a fragment | NODERESTART | 10 | INFO |
|
| Completed copying all fragments | NODERESTART | 8 | INFO |
|
| Node failure phase completed | NODERESTART | 8 | ALERT | Reports node failure phases
|
| Node has failed, node state was X | NODERESTART | 8 | ALERT | Reports that a node has failed
|
| Report whether an arbitrator is found or not | NODERESTART | 6 | INFO | 7 different cases
|
| | | | - President restarts arbitration thread [state=X]
|
| | | | - Prepare arbitrator node X [ticket=Y]
|
| | | | - Receive arbitrator node X [ticket=Y]
|
| | | | - Started arbitrator node X [ticket=Y]
|
| | | | - Lost arbitrator node X - process failure [state=Y]
|
| | | | - Lost arbitrator node X - process exit [state=Y]
|
| | | | - Lost arbitrator node X <error msg>[state=Y]
|
| Report arbitrator results | NODERESTART | 2 | ALERT | 8 different results
|
| | | | - Arbitration check lost - less than 1/2 nodes left
|
| | | | - Arbitration check won - node group majority
|
| | | | - Arbitration check lost - missing node group
|
| | | |