Last modified: Mon Sep 6 13:48:47 JST 2010
Welcome to the pgpool-II page
What is pgpool-II?

pgpool-II is a middleware that works between PostgreSQL servers and a PostgreSQL database client. It provides the following features.
pgpool-II saves connections to the PostgreSQL servers, and reuses them whenever a new connection with the same properties (i.e. username, database, protocol version) comes in. It reduces connection overhead, and improves the system's overall throughput.

pgpool-II can manage multiple PostgreSQL servers. Using the replication function enables creating a realtime backup on 2 or more physical disks, so that the service can continue without stopping servers in case of a disk failure.

If a database is replicated, executing a SELECT query on any server will return the same result. pgpool-II takes advantage of the replication feature to reduce the load on each PostgreSQL server by distributing SELECT queries among multiple servers, improving the system's overall throughput. At best, performance improves proportionally to the number of PostgreSQL servers. Load balancing works best in a situation where there are many users executing many queries at the same time.

There is a limit on the maximum number of concurrent connections with PostgreSQL, and new connections are rejected once this limit is reached. Raising the maximum number of connections, however, increases resource consumption and affects overall system performance. pgpool-II also has a limit on the maximum number of connections, but extra connections are queued instead of returning an error immediately.

Using the parallel query function, data can be divided among multiple servers, so that a query can be executed on all the servers concurrently to reduce the overall execution time. Parallel query works best when searching large-scale data.

pgpool-II talks PostgreSQL's backend and frontend protocol, and relays connections between them. Therefore, a database application (frontend) thinks that pgpool-II is the actual PostgreSQL server, and the server (backend) sees pgpool-II as one of its clients. Because pgpool-II is transparent to both the server and the client, an existing database application can be used with pgpool-II almost without a change to its sources. There are some restrictions on using SQL via pgpool-II. See Restrictions for more details.

Supported Platforms

pgpool-II works on Linux, Solaris, FreeBSD, and most UNIX-like architectures. Windows is not supported. Supported PostgreSQL server versions are 6.4 and higher. To use parallel query, however, 7.4 or higher must be used. If you are using PostgreSQL 7.3 or earlier, some pgpool-II functionality is restricted. You must also make sure that the major versions of all PostgreSQL servers are identical. In addition to this, hardware architectures and OSs must be identical if you want to use online recovery.

pgpool-II Installation

pgpool-II can be downloaded from the pgpool development page. Packages are also provided for various platforms including CentOS, RedHat Enterprise Linux, Fedora and Debian. The source code of pgpool-II can be downloaded from the pgpool development page. Installing pgpool-II from source code requires gcc 2.9 or higher and GNU make. pgpool-II also links against the libpq library, so the libpq library and its development headers must be installed on the machine used to build pgpool-II. Additionally, the OpenSSL library and its development headers must be present in order to enable OpenSSL support in pgpool-II.
Configuring pgpool-II

Configuration files for pgpool-II are pgpool.conf and pcp.conf.
Special character | Description |
%d | Backend ID of a detached node. |
%h | Hostname of a detached node. |
%H | Hostname of the new master node. |
%p | Port number of a detached node. |
%D | Database cluster directory of a detached node. |
%M | Old master node ID. |
%m | New master node ID. |
%% | '%' character |
When a failover is performed, pgpool kills all its child processes, which in turn terminates all active sessions to pgpool. pgpool then invokes failover_command and waits for its completion. After this, pgpool starts new child processes and is ready to accept connections from clients again.
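For illustration only, a failover_command entry might pass these placeholders to a site-specific script; the script path and the choice of arguments are hypothetical (a concrete script for streaming replication appears later in this document):

failover_command = '/usr/local/bin/failover.sh %d %h %p %D %m %M %H'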
This parameter specifies a command to run when a node is attached. pgpool-II replaces the following special characters with backend information.
Special character | Description |
%d | Backend ID of an attached node. |
%h | Hostname of an attached node. |
%p | Port number of an attached node. |
%D | Database cluster path of an attached node. |
%M | Old master node |
%m | New master node |
%% | '%' character |
If true, trigger a failover when writing to the backend communication socket fails. This is the same behavior as pgpool-II 2.2.x or earlier. If set to false, pgpool will report an error and disconnect the session. Note, however, that pgpool will still fail over when connecting to a backend fails or when pgpool detects an administrative shutdown of the postmaster. You need to reload pgpool.conf if you change the value.
pgpool-II ignores white space at the beginning of SQL queries while in load balance mode. This is useful with APIs like DBI/DBD::Pg which add white space against the user's will. You need to reload pgpool.conf if you change the value.
Produces SQL log messages when true. This is similar to the log_statement parameter in PostgreSQL. It produces logs even if the debug option was not passed to pgpool-II at startup. You need to reload pgpool.conf if you change the value.
Similar to log_statement, except that it prints logs for each DB node separately. This is useful, for example, if you want to make sure whether replication is working. You need to reload pgpool.conf if you change the value.
If true, ps command status will show the client's hostname instead of an IP address. Also, if log_connections is enabled, hostname will be logged. You need to reload pgpool.conf if you change the value.
If true, all incoming connections will be printed to the log. You need to reload pgpool.conf if you change the value.
If true, use pool_hba.conf for client authentication. See Setting up pool_hba.conf for client authentication. You need to restart pgpool-II if you change the value.
Specifies the host name of the PostgreSQL backend. The empty string ('') means pgpool-II uses a UNIX domain socket.

Multiple backends can be specified by adding a number at the end of the parameter name (e.g. backend_hostname0). This number is referred to as "DB node ID", and it starts from 0. The backend which was given the DB node ID of 0 will be called "Master DB". When multiple backends are defined, the service can be continued even if the Master DB is down (not true in some modes). In this case, the youngest alive DB node ID becomes the new Master DB.

If you plan to use only one PostgreSQL server, specify it by backend_hostname0.
This parameter can be added by reloading the configuration file. However, an existing value cannot be updated this way; you must restart pgpool-II.
Specifies the port number of the backends. Multiple backends can be specified by adding a number at the end of the parameter name (e.g. backend_port0). If you plan to use only one PostgreSQL server, specify it by backend_port0.
This parameter can be added by reloading the configuration file. However, an existing value cannot be updated this way; you must restart pgpool-II.
Specifies the load balance ratio for the backends. Multiple backends can be specified by adding a number at the end of the parameter name (e.g. backend_weight0). If you plan to use only one PostgreSQL server, specify it by backend_weight0. In the raw mode, set it to 1.
This parameter can be added by reloading the configuration file. However, an existing value cannot be updated this way; you must restart pgpool-II. In pgpool-II 2.2.6/2.3 or later, you can change the value by reloading the configuration file. The change takes effect on newly connected client sessions. This is useful if you want to prevent any query from being sent to slaves while performing administrative work in master/slave mode.
Specifies the database cluster directory of the backends. Multiple backends can be specified by adding a number at the end of the parameter name (e.g. backend_data_directory0). If you do not plan to use online recovery, you do not need to specify this parameter.
This parameter can be added by reloading the configuration file. However, an existing value cannot be updated this way; you must restart pgpool-II.
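Putting the four backend parameters together, a two-node definition in pgpool.conf might look like the sketch below; the hostnames, ports and directories are only placeholders:

backend_hostname0 = 'host1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/data/pgsql0'
backend_hostname1 = 'host2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/data/pgsql1'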
If true, enable SSL support for both the frontend and backend connections. Note that ssl_key and ssl_cert must also be set in order for SSL to work in the frontend connections.
The default is that SSL is off. Note that OpenSSL support must also have been configured at compilation time, as mentioned in the installation section.
The pgpool-II daemon must be restarted when updating SSL related settings.
The path to the private key file to use for incoming frontend connections.
There is no default value for this option, and if left unset SSL will be disabled for incoming frontend connections.
The path to the public x509 certificate file to use for incoming frontend connections.
There is no default value for this option, and if left unset SSL will be disabled for incoming frontend connections.
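A minimal frontend SSL setup might look like the following sketch; the key and certificate paths are placeholders:

ssl = true
ssl_key = '/usr/local/etc/server.key'
ssl_cert = '/usr/local/etc/server.cert'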
Debug message verbosity level. 0 means no debug messages; 1 or greater produces more verbose messages. Default value is 0.
Failover in the raw Mode
Failover can be performed in the raw mode if multiple servers are defined. pgpool-II usually accesses the backend specified by backend_hostname0 during normal operation. If backend_hostname0 fails for some reason, pgpool-II tries to access the backend specified by backend_hostname1. If that fails, pgpool-II tries backend_hostname2, 3 and so on.
In the connection pool mode, all functions of the raw mode plus the connection pooling function can be used. To enable this mode, set the configuration parameters for the raw mode, plus the parameters below.
The maximum number of cached connections in each pgpool-II child process. pgpool-II reuses a cached connection if an incoming connection is for the same database with the same username. If not, pgpool-II creates a new connection to the backend. If the number of cached connections exceeds max_pool, the oldest connection is discarded and that slot is used for the new connection.

Default value is 4. Please be aware that the number of connections from pgpool-II processes to the backends may reach num_init_children * max_pool.
This parameter can only be set at server start.
Cached connections expiration time in seconds. The expired cached connection will be disconnected. Default is 0, which means the cached connections will not be disconnected.
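As a rough sketch, a connection pool section of pgpool.conf could look like this; the values are only examples, and with them pgpool-II may open up to 32 * 4 = 128 backend connections:

num_init_children = 32
max_pool = 4
connection_life_time = 0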
Specifies the SQL commands sent to the backends when exiting a session, to reset the connection. Multiple commands can be specified by delimiting them with ";". The default is the following, but it can be changed to suit your system.
reset_query_list = 'ABORT; DISCARD ALL'
Commands differ among PostgreSQL versions. Here are the recommended settings.
PostgreSQL version | reset_query_list value |
7.1 or before | ABORT |
7.2 to 8.2 | ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT |
8.3 or later | ABORT; DISCARD ALL |
You need to reload pgpool.conf upon modification of this directive.
Failover in the Connection Pool Mode
Failover in the connection pool mode is the same as in the raw mode.
This mode enables data replication between the backends. The configuration parameters below must be set in addition to everything above.
Setting true enables replication mode. Default is false.
When set to true, SELECT queries will be distributed to each backend for load balance. Default is false.
When set to true, if the backends return different numbers of affected tuples for an INSERT/UPDATE/DELETE, the backends whose result differs from the most frequent one are degenerated. If set to false, the session is terminated and the backends are not degenerated. Default is false.
When set to true, if the backends return different packet kinds, the backends whose result differs from the most frequent one are degenerated. A typical case is a SELECT statement that is part of a transaction while replicate_select is set to true, and the SELECT returns a different number of rows among backends. Statements other than SELECT might trigger this too; for example, one backend succeeded in an UPDATE while others failed. Also note that pgpool does NOT examine the content of records returned by SELECT. If set to false, the session is terminated and the backends are not degenerated. Default is false.
Specify a comma separated list of function names that do not update the database. SELECTs using functions not specified in this list are not load balanced, and are replicated among backends in replication mode. In master/slave mode, such SELECTs are sent to the master (primary) only.
Specify a comma separated list of function names that do update the database. SELECTs using functions specified in this list are not load balanced, and are replicated among backends in replication mode. In master/slave mode, such SELECTs are sent to the master (primary) only.
You should not make those lists nonempty at the same time.
Prior to pgpool-II 3.0, nextval() and setval() were the functions known to write to the database. You can emulate this behavior by using white_function_list and black_function_list:

white_function_list = ''
black_function_list = 'nextval,setval'
When set to true, pgpool-II replicates SELECTs in replication mode. If false, pgpool-II only sends them to Master DB. Default is false.
replicate_select, load_balance_mode, and whether the SELECT query is inside an explicit transaction block together determine how replication works. The combinations are shown below.
SELECT is inside a transaction block | Y | Y | Y | N | N | N | Y | N |
replicate_select is true | Y | Y | N | N | Y | Y | N | N |
load_balance_mode is true | Y | N | N | N | Y | N | Y | Y |
results(R:replication, M: send only to master, L: load balance) | R | R | M | M | R | R | M | L |
When replicating a table with a SERIAL data type, the SERIAL column value may differ between the backends. This problem can be avoided by locking the table explicitly (although the transactions' parallelism will be lost). To achieve this, however,

INSERT INTO ...

must be changed to:

BEGIN;
LOCK TABLE ...
INSERT INTO ...
COMMIT;

Setting insert_lock to true makes pgpool-II automatically add the above queries each time an INSERT is executed (if already in a transaction, it simply adds LOCK TABLE ....). As of pgpool-II 2.2, it automatically detects whether the table has SERIAL columns, so only tables having SERIAL columns are locked.
pgpool-II 3.0 or later uses row lock against the sequence relation, rather than table lock. This will minimize lock conflict with VACUUM(including autovacuum).
You might also want finer control (see the example just below):

1. set insert_lock to true, and add /*NO INSERT LOCK*/ at the beginning of an INSERT statement for which you do not want to acquire the table lock;
2. set insert_lock to false, and add /*INSERT LOCK*/ at the beginning of an INSERT statement for which you want to acquire the table lock.
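For example, with insert_lock set to true, an individual INSERT can opt out of the lock via the comment described above; the table name t1 is only an example:

/*NO INSERT LOCK*/INSERT INTO t1 VALUES (1);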
Default value is false. If insert_lock is enabled, the regression tests for PostgreSQL 8.0 will fail in transactions, privileges, rules and alter_table. The reason for this is that pgpool-II tries to LOCK the VIEW for the rule test, and produces the following error message:

! ERROR: current transaction is aborted, commands ignored until end of transaction block

For example, the transactions test tries to INSERT into a table which does not exist, and pgpool-II causes PostgreSQL to acquire the lock before that. The transaction is aborted, and the following INSERT statement produces the error message above.
This parameter specifies the PostgreSQL username used for online recovery. It can be changed while pgpool-II is in service.
This parameter specifies the PostgreSQL password used for online recovery. It can be changed while pgpool-II is in service.
This parameter specifies a command to run at the first stage of online recovery. The command file must be placed in the database cluster directory for security reasons. For example, if recovery_1st_stage_command = 'sync-command', then pgpool-II executes $PGDATA/sync-command. Note that pgpool-II accepts connections and queries while recovery_1st_stage_command is executed; you can retrieve and update data during this stage.

It can be changed while pgpool-II is in service.
This parameter specifies a command to run at the second stage of online recovery. The command file must be placed in the database cluster directory for security reasons. For example, if recovery_2nd_stage_command = 'sync-command', then pgpool-II executes $PGDATA/sync-command. Note that pgpool-II does not accept connections and queries while recovery_2nd_stage_command is executed. Thus, if a client stays connected for a long time, the command is not executed: pgpool-II waits until all clients have closed their connections, and the command runs only when no client is connected to pgpool-II.

It can be changed while pgpool-II is in service.

pgpool does not accept connections during the second stage. If a client connects to pgpool during recovery processing, it has to wait until the recovery is over.
This parameter specifies the recovery timeout in seconds. If the timeout is reached, pgpool cancels online recovery and accepts connections again. 0 means no wait.

It can be changed while pgpool-II is in service.
Similar to client_idle_limit, but only takes effect in the second stage of recovery. A connection to a client that has been idle for client_idle_limit_in_recovery seconds since its last query is disconnected. This is useful to prevent the pgpool recovery from being disturbed by a lazy client, or by a TCP/IP connection between the client and pgpool that has accidentally gone down. If -1 is specified, clients are disconnected immediately. The default value for client_idle_limit_in_recovery is 0, which means the functionality is turned off.

If your clients are very busy, pgpool-II cannot enter the second recovery stage whatever the value of client_idle_limit_in_recovery. In this case you can set client_idle_limit_in_recovery to -1 so that pgpool-II immediately disconnects such busy clients before entering the second stage.
You need to reload pgpool.conf if you change client_idle_limit_in_recovery.
Specify the table name used for large object replication control. If it is specified, pgpool will lock the table specified by lobj_lock_table, generate a large object id by looking into the pg_largeobject system catalog, and then call lo_create to create the large object. This guarantees that the same large object id is used on all DB nodes in replication mode. Note that PostgreSQL 8.0 or earlier does not have lo_create, so this functionality will not work there.

A call to the libpq function lo_creat() will trigger this functionality. Large object creation through the Java API (JDBC driver), the PHP API (pg_lo_create, or similar APIs in PHP libraries such as PDO) and equivalent APIs in various programming languages are known to use a similar protocol, and should therefore work as well.

The following large object creation operations will not work:
It does not matter what lobj_lock_table's schema is, but it should be writable by any user. Here is an example to create such a table:
CREATE TABLE public.my_lock_table (); GRANT ALL ON public.my_lock_table TO PUBLIC;
The table specified by lobj_lock_table must be created beforehand. If you create the table in template1, any database created afterward will have the table.
If lobj_lock_table is set to an empty string (''), the functionality is disabled (and large object replication will not work). The default value for lobj_lock_table is ''.
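Continuing the example above, the corresponding pgpool.conf setting could look like this (assuming the lock table created earlier is visible through the search path):

lobj_lock_table = 'my_lock_table'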
Condition for Load Balancing
For the query to be load balanced, all the requirements below must be met:
Note that you can suppress load balancing by inserting arbitrary comments just in front of the SELECT query:
/*REPLICATION*/ SELECT ...
Please refer to replicate_select as well. See also a flow chart.
Note: the JDBC driver has an autocommit option. If autocommit is false, the JDBC driver sends "BEGIN" and "COMMIT" internally, so pgpool cannot do load balancing. You need to call setAutoCommit(true) to enable autocommit.
Failover in the Replication Mode
pgpool-II degenerates a dead backend and continues the service. The service can be continued if there is at least one backend alive.
This mode is for using pgpool-II with another master/slave replication system (such as Slony-I or Streaming Replication), which is responsible for the actual data replication. The DB nodes' information (backend_hostname, backend_port, backend_weight, and backend_data_directory if you need online recovery) must be set, in the same way as in the replication mode.

In addition to that, set master_slave_mode and load_balance_mode to true. pgpool-II will then send queries that need to be replicated to the Master DB, and other queries will be load balanced if possible.
In master/slave mode, DDL and DML for temporary tables can be executed on the master only. SELECTs can be forced to run on the master as well, but for this you need to put the comment /*NO LOAD BALANCE*/ before the SELECT statement.
In the master/slave mode, replication_mode must be set to false, and master_slave_mode to true.
The master/slave mode has a 'master_slave_sub_mode'. The default is 'slony', which is suitable for Slony-I. You can also set it to 'stream', which should be used if you want to work with PostgreSQL's built-in replication system (Streaming Replication). The sample configuration file for Slony-I is pgpool.conf.sample-master-slave, and pgpool.conf.sample-stream is for Streaming Replication.
Please restart pgpool-II if you change the above parameters.
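Summarizing the above, a master/slave configuration sketch for use with Streaming Replication could be:

replication_mode = false
master_slave_mode = true
master_slave_sub_mode = 'stream'
load_balance_mode = true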
You can use white_function_list and black_function_list to control load balancing in master/slave mode. See white_function_list for more details.
As stated above, pgpool-II can work with Streaming Replication, which is available since PostgreSQL 9.0, if master_slave_mode is enabled and master_slave_sub_mode is 'stream' (note that pgpool-II currently assumes that Streaming Replication is used together with Hot Standby). The following directives can be used with this mode:
Specify the maximum tolerated replication delay of the standby against the primary server, in bytes. If the delay exceeds delay_threshold, pgpool-II does not send SELECTs to the standby server; it sends them to the primary server instead, even if load balance mode is enabled. If delay_threshold is 0 or health checking is disabled, the delay check is not performed. The default value for delay_threshold is 0. You need to reload pgpool.conf if you change this directive.
Specifies how to log the replication delay. If 'none' is specified, no log is written. If 'always', the delay is logged every time health checking is performed. If 'if_over_threshold' is specified, the log is written only when the delay exceeds delay_threshold. The default value for log_standby_delay is 'none'. You need to reload pgpool.conf if you change this directive.
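For example, to stop sending SELECTs to a standby that lags by more than roughly 10MB of WAL, and to log only when that happens, you could set something like the following (the threshold value is arbitrary):

delay_threshold = 10000000
log_standby_delay = 'if_over_threshold'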
You could check the replication delay by using "show pool_status" command as well.
In master/slave mode with streaming replication, if the primary or a standby node goes down, you can use the failover functionality of pgpool-II. Automatic detaching of the failed node works without any additional settings. With streaming replication, the standby node checks for a "trigger file" and, if it exists, the standby stops continuous recovery and goes into normal PostgreSQL mode. Using this, you can let the standby take over when the primary goes down.
Caution: If you plan to use multiple standby nodes, we recommend to set delay_threshold to prevent any query directed to other standby node from retrieving older data.
If a second standby takes over when the first standby has already taken over as primary, you would get bogus data from the second standby. We recommend not planning this kind of configuration.
How to prepare failover configuration is as follows.
$ cd /usr/local/pgsql/bin
$ cat failover_stream.sh
#! /bin/sh
# Failover command for streaming replication.
# This script assumes that DB node 0 is primary, and 1 is standby.
#
# If standby goes down, do nothing. If primary goes down, create a
# trigger file so that standby takes over as the primary node.
#
# Arguments: $1: failed node id. $2: new master hostname. $3: path to
# trigger file.

failed_node=$1
new_master=$2
trigger_file=$3

# Do nothing if standby goes down.
if [ $failed_node = 1 ]; then
	exit 0;
fi

# Create the trigger file.
/usr/bin/ssh -T $new_master /bin/touch $trigger_file

exit 0;

$ chmod 755 failover_stream.sh
failover_command = '/usr/local/src/pgsql/9.0-beta/bin/failover_stream.sh %d %H /tmp/trigger_file0'
standby_mode = 'on'
primary_conninfo = 'host=name of primary_host user=postgres'
trigger_file = '/tmp/trigger_file0'
wal_level = hot_standby
max_wal_senders = 1
host replication postgres 192.168.0.10/32 trust
Start the primary and standby PostgreSQL servers to initiate Streaming Replication. If the primary node goes down, the standby will automatically start as a normal PostgreSQL server and will be ready to accept write queries.
While using Streaming Replication and Hot Standby, it is important to determine which queries can be sent to the primary or the standby, and which ones should not be sent to the standby. pgpool-II's Streaming Replication mode carefully takes care of this. In this chapter we explain how pgpool-II accomplishes it.
We distinguish which query should be sent to which node by looking at the query itself.
In an explicit transaction:
Commands which cannot be executed on the standby, such as INSERT, are sent to the primary. After that, even SELECTs are sent to the primary node, because the SELECT might want to see the result of the INSERT immediately. This behavior continues until the transaction closes or aborts.
In the extended protocol, it is possible to determine while parsing the query whether it can be sent to the standby in load balance mode. The rules are the same as for the non-extended protocol. For example, INSERTs are sent to the primary node, and the subsequent bind, describe and execute messages are sent to the primary node as well.
[Note: If the parse of a SELECT is sent to the standby node due to load balancing, and then a DML statement (such as INSERT) is sent to pgpool-II, the SELECT must be executed on the primary node. For this reason we re-parse the SELECT on the primary node.]
Finally, queries that pgpool-II's parser regards as erroneous are sent to the primary node.
In master/slave mode with streaming replication, you can use online recovery. Only the standby node can be recovered. You cannot recover the primary node. To recover the primary node, you have to stop all DB nodes and pgpool-II then restore from the backup.
recovery_user = 'postgres'
recovery_password = 't-ishii'
recovery_1st_stage_command = 'basebackup.sh'
recovery_2nd_stage_command = ''
# cd pgpool-II-x.x.x/sql/pgpool-recovery
# make
# make install
# psql -f pgpool-recovery.sql template1
That's it. Now you should be able to use pcp_recovery_node (as long as the standby node is stopped) or push the "recovery" button of pgpoolAdmin to perform online recovery.
This mode enables parallel execution of queries. Tables are divided, and data is distributed to each node. Moreover, the replication and the load balance functions can be used at the same time. In parallel mode, replication_mode and load_balance_mode are set to true in pgpool.conf, master_slave is set to false, and parallel_mode is set to true. When you change this parameter, please restart pgpool-II.
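A sketch of the parallel mode settings described above:

replication_mode = true
load_balance_mode = true
master_slave_mode = false
parallel_mode = true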
Configuring the System DB
To use the parallel mode, the System DB must be configured properly. The System DB maintains rules, stored in a database table, to choose an appropriate backend to which partitioned data is sent. The System DB does not need to be created on the same host as pgpool-II. The System DB's configuration is done in pgpool.conf.
The hostname where the System DB is created. Specifying the empty string ('') means the System DB is at the same host as pgpool-II, and will be connected via UNIX domain socket.
The port number for the System DB
The partitioning rules and other information will be defined in the database specified here. Default value is 'pgpool'.

The partitioning rules and other information will be defined in the schema specified here. Default value is 'pgpool_catalog'.
The username to connect to the System DB.
The password for the System DB. If no password is set, set the empty string ('').
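A System DB configuration sketch in pgpool.conf, using the default database and schema names; the user is a placeholder and the parameter names are assumed to match those shipped in pgpool.conf.sample:

system_db_hostname = ''
system_db_port = 5432
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgpool'
system_db_password = ''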
The path to a PEM format file containing one or more CA root certificates, which can be used to verify the backend server certificate. This is analogous to the -CAfile option of the OpenSSL verify(1) command.

The default value for this option is unset, so no verification takes place. Verification will still occur if this option is not set but a value has been given for ssl_ca_cert_dir.
The path to a directory containing PEM format CA certificate files, which can be used to verify the backend server certificate. This is analogous to the -CApath option of the OpenSSL verify(1) command.

The default value for this option is unset, so no verification takes place. Verification will still occur if this option is not set but a value has been given for ssl_ca_cert.
Initial Configuration of the System DB
First, create the database and schema specified in the pgpool.conf file. A sample script can be found in $prefix/share/system_db.sql. If you have specified a different database name or schema, change them accordingly in the script.
psql -f $prefix/share/system_db.sql pgpool
Registering a Partitioning Rule
The rules for data partitioning must be registered in the pgpool_catalog.dist_def table.
CREATE TABLE pgpool_catalog.dist_def(
	dbname TEXT,                                               -- database name
	schema_name TEXT,                                          -- schema name
	table_name TEXT,                                           -- table name
	col_name TEXT NOT NULL CHECK (col_name = ANY (col_list)),  -- partitioning key column name
	col_list TEXT[] NOT NULL,                                  -- names of table attributes
	type_list TEXT[] NOT NULL,                                 -- types of table attributes
	dist_def_func TEXT NOT NULL,                               -- name of the partitioning rule function
	PRIMARY KEY (dbname, schema_name, table_name)
);
Registering a Replication Rule
If a replicated (non-partitioned) table is used together with a table registered in the partitioning rules in a single SQL statement (a JOIN, for example), information about the replicated table must be registered beforehand in the table named pgpool_catalog.replicate_def.
CREATE TABLE pgpool_catalog.replicate_def(
	dbname TEXT,                 -- database name
	schema_name TEXT,            -- schema name
	table_name TEXT,             -- table name
	col_list TEXT[] NOT NULL,    -- names of table attributes
	type_list TEXT[] NOT NULL,   -- types of table attributes
	PRIMARY KEY (dbname, schema_name, table_name)
);
Example for partitioning pgbench tables
In this example, the accounts table is partitioned, while the branches table and the tellers table are replicated. The accounts table and the branches table are assumed to be joined by bid, so a replication rule is registered for the branches table. If the three tables (accounts, branches and tellers) are to be joined, a replication rule must also be registered for the tellers table beforehand.
INSERT INTO pgpool_catalog.dist_def VALUES (
	'pgpool',
	'public',
	'accounts',
	'aid',
	ARRAY['aid','bid','abalance','filler'],
	ARRAY['integer','integer','integer','character(84)'],
	'pgpool_catalog.dist_def_accounts'
);

INSERT INTO pgpool_catalog.replicate_def VALUES (
	'pgpool',
	'public',
	'branches',
	ARRAY['bid','bbalance','filler'],
	ARRAY['integer','integer','character(84)']
);
The partitioning rule function (here, pgpool_catalog.dist_def_accounts) takes a value of the partitioning key column and returns the corresponding DB node ID. Note that node IDs must start from 0. Below is an example of such a function for pgbench.
CREATE OR REPLACE FUNCTION pgpool_catalog.dist_def_accounts (val ANYELEMENT) RETURNS INTEGER AS '
SELECT CASE WHEN $1 >= 1 and $1 <= 30000 THEN 0
	WHEN $1 > 30000 and $1 <= 60000 THEN 1
	ELSE 2
END;
' LANGUAGE sql;
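Given the function above, the node for a particular key value can be checked by calling it directly; for example, aid 35000 falls in the second range and maps to DB node 1:

SELECT pgpool_catalog.dist_def_accounts(35000);   -- returns 1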
Just like pg_hba.conf with PostgreSQL, pgpool supports a similar client authentication function using a configuration file called "pool_hba.conf".
When you install pgpool, pool_hba.conf.sample will be installed in "/usr/local/etc", which is the default directory for configuration files. Copy pool_hba.conf.sample as pool_hba.conf and edit it if necessary. By default, pool_hba authentication is enabled. See "6. Setting up pgpool.conf" for more detail.
The format of pool_hba.conf file follows very closely pg_hba.conf's format.
local    DATABASE  USER  METHOD  [OPTION]
host     DATABASE  USER  CIDR-ADDRESS  METHOD  [OPTION]
See "pool_hba.conf.sample" for details of each field.
Here are the limitations of pool_hba.
Though "hostssl" is invalid but pgpool-II 2.3 or later supports SSL. See SSL for more details.
Since pgpool does not know anything about users in the backend server, database name is simply checked against the entries in the DATABASE field of pool_hba.conf.
This is the same reason as in the "samegroup" described above. A user name is simply checked against the entries in the USER field of pool_hba.conf.
pgpool currently does not support IPv6.
Again, this is the same reason as in the "samegroup" described above. pgpool does not hold user/password information.
To use md5 authentication, you need to register your name and password in "pool_passwd". See Authentication / Access Controls for more details.
Note that everything described in this section concerns client authentication between a client and pgpool; a client still has to go through the authentication process with PostgreSQL. As far as pool_hba is concerned, it does not matter whether the user name and/or database name given by a client (i.e. psql -U testuser testdb) really exist in the backend. pool_hba only cares whether a match is found in pool_hba.conf.
PAM authentication is supported using user information on the host where pgpool is executed. To enable PAM support in pgpool, specify "--with-pam" option to configure:
configure --with-pam
To enable PAM authentication, you need to create a service-configuration file for pgpool in the system's PAM configuration directory (which is usually at "/etc/pam.d"). A sample service-configuration file is installed as "share/pgpool.pam" under the install directory.
The query cache can be used in all pgpool-II modes. Enable it in pgpool.conf as follows.
enable_query_cache = true
Moreover, please make the following tables in the System DB.
CREATE TABLE pgpool_catalog.query_cache (
	hash TEXT,
	query TEXT,
	value bytea,
	dbname TEXT,
	create_time TIMESTAMP WITH TIME ZONE,
	PRIMARY KEY(hash, dbname)
);
However, if you use a schema other than "pgpool_catalog" (the one used in this example), rewrite the statement accordingly.
All the backends and the System DB (if necessary) must be started before starting pgpool-II.
pgpool [-c][-f config_file][-a hba_file][-F pcp_config_file][-n][-D][-d]
-c --clear-cache | deletes query cache |
-f config_file --config-file config-file | specifies pgpool.conf |
-a hba_file --hba-file hba_file | specifies pool_hba.conf |
-F pcp_config_file --pcp-password-file | specifies pcp.conf |
-n --no-daemon | no daemon mode (terminal is not detached) |
-D --discard-status | Discard pgpool_status file and do not restore previous status |
-d --debug | debug mode |
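For example, to start pgpool-II in the foreground with debugging enabled, pointing at configuration files under the default configuration directory (the paths are only examples):

$ pgpool -n -d -f /usr/local/etc/pgpool.conf -F /usr/local/etc/pcp.conf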
pgpool [-f config_file][-F pcp_config_file] [-m {s[mart]|f[ast]|i[mmediate]}] stop
-m s[mart] --mode s[mart] |
waits for clients to disconnect, and shutdown (default) |
-m f[ast] --mode f[ast] |
does not wait for clients; shutdown immediately |
-m i[mmediate] --mode i[mmediate] |
the same as '-m f' |
pgpool records the backend status into the [logdir]/pgpool_status file. When pgpool restarts, it reads this file and restores the backend status. This prevents data differences among DB nodes which might be caused by the following scenario:
If, for some reason, the stopped DB has been synced with the active DB by hand, you can safely remove pgpool_status before starting pgpool.
pgpool-II can reload configuration files without restarting it.
pgpool [-c][-f config_file][-a hba_file][-F pcp_config_file] reload
-f config_file --config-file config-file | specifies pgpool.conf |
-a hba_file --hba-file hba_file | specifies pool_hba.conf |
-F pcp_config_file --pcp-password-file | specifies pcp.conf |
Please note that some configuration items cannot be changed by reloading. Also, configuration changes take effect only after a new session starts.
pgpool-II can provide information via SHOW commands. SHOW is a real SQL statement, but pgpool-II intercepts this command if it asks for specific pgpool-II information. The available items are:
"pool_status" SQL statement was already available in previous releases, but the other ones appear in release 3.0.
"SHOW pool_status" sends back a list of configuration parameters with their name, value, and description. Here is an excerpt of the result:
benchs2=# show pool_status;
             item              |              value              |                           description
-------------------------------+---------------------------------+------------------------------------------------------------------
 listen_addresses              | 127.0.0.1                       | host name(s) or IP address(es) to listen to
 port                          | 9999                            | pgpool accepting port number
 socket_dir                    | /tmp                            | pgpool socket directory
 num_init_children             | 5                               | # of children initially pre-forked
 child_life_time               | 300                             | if idle for this seconds, child exits
"SHOW pool_nodes" sends back a list of all configured nodes. It displays the hostname, the port, the status, and the weight (only meaningful if you use the load balancing mode). The possible values in the status column are explained in the pcp_node_info reference.
benchs2=# show pool_nodes;
  hostname   | port | status | lb_weight
-------------+------+--------+-----------
 127.0.0.1   | 5432 | 2      | 0.5
 192.168.1.7 | 5432 | 3      | 0.5
(2 rows)
"SHOW pool_processes" sends back a list of all pgPool-II processes waiting for connections and dealing with a connection.
It has 6 columns:
It'll always return num_init_children lines.
benchs2=# show pool_processes;
 pool_pid |     start_time      | database | username  |     create_time     | pool_counter
----------+---------------------+----------+-----------+---------------------+--------------
 8465     | 2010-08-14 08:35:40 |          |           |                     |
 8466     | 2010-08-14 08:35:40 | benchs   | guillaume | 2010-08-14 08:35:43 | 1
 8467     | 2010-08-14 08:35:40 |          |           |                     |
 8468     | 2010-08-14 08:35:40 |          |           |                     |
 8469     | 2010-08-14 08:35:40 |          |           |                     |
(5 lines)
"SHOW pool_pools" sends back a list of pools handled by pgPool-II. their name, value, and description. Here is an excerpt of the result:
It has 11 columns:
It'll always return num_init_children*max_pool lines.
 pool_pid |     start_time      | pool_id | backend_id | database | username  |     create_time     | majorversion | minorversion | pool_counter | pool_backendpid | pool_connected
----------+---------------------+---------+------------+----------+-----------+---------------------+--------------+--------------+--------------+-----------------+----------------
 8465     | 2010-08-14 08:35:40 | 0       | 0          |          |           |                     |              |              |              |                 |
 8465     | 2010-08-14 08:35:40 | 1       | 0          |          |           |                     |              |              |              |                 |
 8465     | 2010-08-14 08:35:40 | 2       | 0          |          |           |                     |              |              |              |                 |
 8465     | 2010-08-14 08:35:40 | 3       | 0          |          |           |                     |              |              |              |                 |
 8466     | 2010-08-14 08:35:40 | 0       | 0          | benchs   | guillaume | 2010-08-14 08:35:43 | 3            | 0            | 1            | 8473            | 1
 8466     | 2010-08-14 08:35:40 | 1       | 0          |          |           |                     |              |              |              |                 |
 8466     | 2010-08-14 08:35:40 | 2       | 0          |          |           |                     |              |              |              |                 |
 8466     | 2010-08-14 08:35:40 | 3       | 0          |          |           |                     |              |              |              |                 |
 8467     | 2010-08-14 08:35:40 | 0       | 0          |          |           |                     |              |              |              |                 |
 8467     | 2010-08-14 08:35:40 | 1       | 0          |          |           |                     |              |              |              |                 |
 8467     | 2010-08-14 08:35:40 | 2       | 0          |          |           |                     |              |              |              |                 |
 8467     | 2010-08-14 08:35:40 | 3       | 0          |          |           |                     |              |              |              |                 |
 8468     | 2010-08-14 08:35:40 | 0       | 0          |          |           |                     |              |              |              |                 |
 8468     | 2010-08-14 08:35:40 | 1       | 0          |          |           |                     |              |              |              |                 |
 8468     | 2010-08-14 08:35:40 | 2       | 0          |          |           |                     |              |              |              |                 |
 8468     | 2010-08-14 08:35:40 | 3       | 0          |          |           |                     |              |              |              |                 |
 8469     | 2010-08-14 08:35:40 | 0       | 0          |          |           |                     |              |              |              |                 |
 8469     | 2010-08-14 08:35:40 | 1       | 0          |          |           |                     |              |              |              |                 |
 8469     | 2010-08-14 08:35:40 | 2       | 0          |          |           |                     |              |              |              |                 |
 8469     | 2010-08-14 08:35:40 | 3       | 0          |          |           |                     |              |              |              |                 |
(20 lines)
"SHOW pool_version" displays a string containing the pgPool-II release number. Here is an example of it:
benchs2=# show pool_version;
 pool_version
------------------------
 3.0-dev (umiyameboshi)
(1 line)
In replication mode, pgpool-II can synchronize a database and attach a node while in service. We call this "online recovery".
A recovery target node must be detached before doing online recovery. If you wish to add a PostgreSQL server dynamically, add the backend_hostname and related parameters and reload pgpool.conf. pgpool-II registers the new node as a detached node.
Caution: Stop autovacuum on the master node (the first node which is up and running). Autovacuum may change the contents of the database, which might cause inconsistency after online recovery if it is running.
If the PostgreSQL server to be recovered has already been started, you need to shut down its PostgreSQL process first.
pgpool-II performs online recovery in two separate phases. Clients may have to wait a few seconds or minutes to connect to pgpool-II while the recovery node synchronizes its database. It follows these steps:
We call the first step of data sync "first stage". We synchronize data in the first stage. In the first stage, you can update or retrieve all tables concurrently.
You can specify a script executed at the first stage. pgpool-II passes three arguments to the script.
We synchronize data. We call it "second stage". Before entering the second stage, pgpool-II waits until all clients have disconnected. It blocks any connection until finishing the second stage. After all connections are disconnected, merge updated data between the first stage and the second stage. We perform final data synchronization.
Note that there is a restriction on online recovery. If pgpool-II itself runs on multiple hosts, online recovery does not work correctly, because pgpool-II has to stop all clients during the second stage of online recovery: with several pgpool hosts, only the one that received the online recovery request can block connections.
Set the following parameters for online recovery in pgpool.conf.
You need to install the C language function for online recovery to "template1" database of all backend nodes. Source code is in pgpool-II tar ball.
pgpool-II-x.x.x/sql/pgpool-recovery/
Change directory there and do "make install".
% cd pgpool-II-x.x.x/sql/pgpool-recovery/
% make install
Then, install SQL function.
% cd pgpool-II-x.x.x/sql/pgpool-recovery/
% psql -f pgpool-recovery.sql template1
We must deploy the data sync scripts and a remote start script into the database cluster directory ($PGDATA). Sample script files are available in the pgpool-II-x.x.x/sample directory.
We explain how to do online recovery with Point In Time Recovery (PITR), which is available in PostgreSQL 8.2 and later. Note that all PostgreSQL servers involved need to have PITR enabled.
We prepare a script to take a base backup on the master node and copy it to the recovery target node in the first stage. The script is named "copy-base-backup" in this example. Here is the sample script.
#! /bin/sh
DATA=$1
RECOVERY_TARGET=$2
RECOVERY_DATA=$3

psql -c "select pg_start_backup('pgpool-recovery')" postgres
echo "restore_command = 'scp $HOSTNAME:/data/archive_log/%f %p'" > /data/recovery.conf
tar -C /data -zcf pgsql.tar.gz pgsql
psql -c 'select pg_stop_backup()' postgres
scp pgsql.tar.gz $RECOVERY_TARGET:$RECOVERY_DATA
The script generates the following recovery.conf.
restore_command = 'scp master:/data/archive_log/%f %p'
Then, we prepare a script to switch the XLOG in the second stage. The script is named "pgpool_recovery_pitr" in this example. It forces a switch of the transaction log. For this purpose, pg_switch_xlog could be used; however, it may return *before* the switch has actually been done, which might lead to a failure of online recovery. pgpool-II provides a safer function called "pgpool_switch_xlog" which waits until the transaction log switch has actually completed. pgpool_switch_xlog is installed by the procedure described in the Installing C functions section.
Here is the sample script.
#! /bin/sh
# Online recovery 2nd stage script
#
datadir=$1			# master database cluster
DEST=$2				# hostname of the DB node to be recovered
DESTDIR=$3			# database cluster of the DB node to be recovered
port=5432			# PostgreSQL port number
archdir=/data/archive_log	# archive log directory

# Force to flush current value of sequences to xlog
psql -p $port -t -c 'SELECT datname FROM pg_database WHERE NOT datistemplate AND datallowconn' template1|
while read i
do
  if [ "$i" != "" ];then
    psql -p $port -c "SELECT setval(oid, nextval(oid)) FROM pg_class WHERE relkind = 'S'" $i
  fi
done

psql -p $port -c "SELECT pgpool_switch_xlog('$archdir')" template1
The while loop in the script forces the current values of the sequences of all databases on the master node to be written to the transaction log, so that they propagate to the recovery target node.
We deploy these scripts into $PGDATA.
Finally, we edit pgpool.conf.
recovery_1st_stage_command = 'copy-base-backup' recovery_2nd_stage_command = 'pgpool_recovery_pitr'
We have finished preparing online recovery by PITR.
pgpool_remote_start
This script starts up the postmaster process on a remote host. pgpool-II executes it in the following way.
% pgpool_remote_start remote_host remote_datadir

remote_host:    Hostname of a recovery target.
remote_datadir: Database cluster path of a recovery target.
In the sample script, we start the postmaster process over ssh, so you need to be able to connect over ssh without a password.
If you recover with PITR, the base backup needs to be expanded first. Then the postmaster automatically starts up, performs the PITR recovery, and finally accepts connections.
#! /bin/sh
DEST=$1
DESTDIR=$2
PGCTL=/usr/local/pgsql/bin/pg_ctl

# Expand a base backup
ssh -T $DEST 'cd /data/; tar zxf pgsql.tar.gz' 2>/dev/null 1>/dev/null < /dev/null
# Startup PostgreSQL server
ssh -T $DEST $PGCTL -w -D $DESTDIR start 2>/dev/null 1>/dev/null < /dev/null &
PostgreSQL 7.4 does not have PITR, so rsync is used to do online recovery instead. In the "sample" directory of the pgpool-II tar ball, there is a recovery script named "pgpool_recovery" which uses the rsync command. pgpool-II calls the script with three arguments.
% pgpool_recovery datadir remote_host remote_datadir

datadir:        Database cluster path of a master node.
remote_host:    Hostname of a recovery target node.
remote_datadir: Database cluster path of a recovery target node.
The script copies physical files with rsync over ssh, so you need to be able to connect over ssh without a password.
Note of rsync:
If you use pgpool_recovery, add the following lines into pgpool.conf.
recovery_1st_stage_command = 'pgpool_recovery' recovery_2nd_stage_command = 'pgpool_recovery'
To perform online recovery, use the pcp_recovery_node command or operate through pgpoolAdmin.
Note that you need to pass a large enough number as the first argument of pcp_recovery_node: it is a timeout parameter in seconds. If you use pgpoolAdmin, set the "_PGPOOL2_PCP_TIMEOUT" parameter to a large enough number in pgmgt.conf.php.
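As a hedged example, an invocation using the common argument order shown in the PCP command section below (the timeout of 86400 seconds, the password and the node ID are placeholders):

$ pcp_recovery_node 86400 localhost 9898 postgres hogehoge 1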
This section describes troubles and their workarounds while you are using pgpool-II.
pgpool-II's health checking detects DB node failures.
2010-07-23 16:42:57 ERROR: pid 20031: health check failed. 1 th host foo at port 5432 is down
2010-07-23 16:42:57 LOG: pid 20031: set 1 th backend down status
2010-07-23 16:42:57 LOG: pid 20031: starting degeneration. shutdown host foo(5432)
2010-07-23 16:42:58 LOG: pid 20031: failover_handler: set new master node: 0
2010-07-23 16:42:58 LOG: pid 20031: failover done. shutdown host foo(5432)
The log shows that DB node 1 (host foo) went down and was disconnected, and that DB node 0 then became the new master. Check DB node 1, remove the cause of the failure, and after that perform online recovery against DB node 1 if possible.
2010-07-26 18:43:24 LOG: pid 24161: ProcessFrontendResponse: failed to read kind from frontend. frontend abnormally exited
This log indicates that the frontend program disconnected from pgpool-II without following the proper procedure. The possible causes are: bugs in the application, forced termination of the application, or a temporary network failure. Even if such events happen, there is no possibility of DB destruction or broken data consistency. It is only advised that you check the applications and the network if the message keeps appearing in the log.
You may get this error when pgpool-II operates in replication mode.
2010-07-22 14:18:32 ERROR: pid 9966: kind mismatch among backends. Possible last query was: "FETCH ALL FROM c;" kind details are: 0[T] 1[E: cursor "c" does not exist]
pgpool-II waits for responses from the DB nodes after sending an SQL command to them. This message indicates that not all DB nodes returned the same kind of response. You will see the SQL statement which possibly caused the error after "Possible last query was:", followed by the response kinds. If a response indicates an error, the error message from PostgreSQL is shown. Here "0[T]" says that DB node 0 responded with "T" (starting to send a row description), while "1[E" indicates that DB node 1 returned an error with the message "cursor "c" does not exist".
Caution: You may see this error when operating in master/slave mode as well. For example, even in that mode, a SET command is basically sent to all DB nodes to keep them in the same state.
Check the database and resync them using online recovery if you find databases are out of sync.
In replication mode, pgpool-II detects differences in the number of rows affected by INSERT/UPDATE/DELETE.
2010-07-22 11:49:28 ERROR: pid 30710: pgpool detected difference of the number of inserted, updated or deleted tuples. Possible last query was: "update t1 set i = 1;"
2010-07-22 11:49:28 LOG: pid 30710: ReadyForQuery: Degenerate backends: 1
2010-07-22 11:49:28 LOG: pid 30710: ReadyForQuery: Affected tuples are: 0 1
In the example above, the number of rows updated by "update t1 set i = 1" differed among DB nodes. The next lines indicate that DB node 1 was degenerated (disconnected), and that the number of affected rows was 0 for DB node 0, while it was 1 for DB node 1.
Stop the DB node which is suspected of having wrong data and perform online recovery.
pgpool-II 2.3.2 or later supports large object replication if backend is PostgreSQL 8.1 or later. For this, you need to enable lobj_lock_table directive in pgpool.conf. Large object replication using backend function lo_import is not supported, however.
Creating/inserting/updating/deleting temporary tables is always executed on the master (primary). With pgpool-II 3.0 or later, SELECTs on these tables are executed on the master as well. However, if the temporary table name is used as a literal in a SELECT, there is no way to detect it, and the SELECT will be load balanced. That will cause a "table not found" error, or will find another table having the same name. To avoid the problem, use the /*NO LOAD BALANCE*/ SQL comment.
Sample SELECT which causes a problem: SELECT 't1'::regclass::oid;
psql's \d command uses literal table names. pgpool-II 3.0 or later checks whether the SELECT includes access to system catalogs, and always sends such queries to the master, thus avoiding the problem.
There is no guarantee that the data, which returns different values each time even though the same query was executed (e.g. random number, transaction ID, OID, SERIAL, sequence), will be replicated correctly on multiple backends.
For SERIAL, enabling insert_lock will help replicating data. insert_lock also helps SELECT setval() and SELECT nextval().
In pgpool-II 2.3 or later, INSERT/UPDATE using CURRENT_TIMESTAMP, CURRENT_DATE or now() can be replicated. INSERT/UPDATE on tables using CURRENT_TIMESTAMP, CURRENT_DATE or now() as their DEFAULT values can also be replicated. This is done by replacing those functions with constants fetched from the master at query execution time. There are a few limitations, however:
CREATE TABLE rel1( d1 date DEFAULT CURRENT_DATE + 1 )

is treated the same as:

CREATE TABLE rel1( d1 date DEFAULT CURRENT_DATE )

Please note that if the column type is not a temporal one, rewriting is not performed. Here is such an example:
foo bigint default (date_part('epoch'::text,('now'::text)::timestamp(3) with time zone) * (1000)::double precision)
CREATE TABLE rel1( c1 int, c2 timestamp default now() )

We can replicate

INSERT INTO rel1(c1) VALUES(1)

since this is turned into

INSERT INTO rel1(c1, c2) VALUES(1, '2009-01-01 23:59:59.123456+09')

However,

INSERT INTO rel1(c1) SELECT 1

cannot be transformed, and thus cannot be properly replicated in the current implementation.
Tables created by CREATE TEMP TABLE will be deleted at the end of the session by specifying DISCARD ALL in reset_query_list if you are using PostgreSQL 8.3 or later.

For 8.2.x or earlier, tables created by CREATE TEMP TABLE will not be deleted after exiting a session. This is because of connection pooling: from the backend's point of view, the session still appears to be alive. To avoid this, you must explicitly drop the temporary tables with DROP TABLE, or use CREATE TEMP TABLE ... ON COMMIT DROP inside a transaction block.
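For example, on PostgreSQL 8.2 or earlier, a temporary table can be made to disappear with the transaction instead of lingering in the pooled session; the table and column names here are only placeholders:

BEGIN;
CREATE TEMP TABLE tmp_work (id integer) ON COMMIT DROP;
-- use tmp_work here
COMMIT;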
Here are the queries which cannot be processed by pgpool-II
You cannot use DEFAULT with the partitioning key column. For example, if the column x in the table t is the partitioning key column,

INSERT INTO t(x) VALUES (DEFAULT);

is invalid. Also, functions cannot be used as the value either:

INSERT INTO t(x) VALUES (func());

Constant values must be used to INSERT with the partitioning key. SELECT INTO and INSERT INTO ... SELECT are also not supported.
Data consistency between the backends may be lost if the partitioning key column values are updated. pgpool-II does not re-partition the updated data.
A transaction cannot be rolled back if a query has caused an error on some backends due to the constraint violation.
If a function is called in the WHERE clause, the query might not be executed correctly, e.g.:

UPDATE branches set bid = 100 where bid = (select max(bid) from branches);
Likewise, if a function is called in the WHERE clause of a SELECT ... FOR UPDATE, the query might not be executed correctly, e.g.:

SELECT * FROM branches where bid = (select max(bid) from branches) FOR UPDATE;
COPY BINARY is not supported. Copying from files is also not supported. Only COPY FROM STDIN and COPY TO STDOUT are supported.
To update the partitioning rule, pgpool-II must be restarted in order to read them from the System DB.
SELECT statements executed inside a transaction block will be executed in a separate transaction. Here is an example:

BEGIN;
INSERT INTO t(a) VALUES (1);
SELECT * FROM t ORDER BY a; <-- INSERT above is not visible from this SELECT statement
END;
The same definition will be created on all the backends for views and rules.
SELECT * FROM a, b where a.i = b.i

JOINs like the above will be executed within one backend, and then merged with the results returned by each backend. Views and rules that join across the nodes cannot be created. However, a VIEW that joins tables whose partitioned data resides in the same node can be created. Such a VIEW must be registered in the pgpool_catalog.dist_def table, along with a col_name and a dist_def_func; these are used when an INSERT on the view is executed.
The same definition will be created on all the backends for functions. Joins across the nodes, and data on the other nodes cannot be manipulated inside the functions.
The extended query protocol used by JDBC drivers, etc. is not supported. The simple query protocol must be used.
Natural joins are not supported. An "ON join condition" or "USING (join_column)" clause is required.
The USING clause is converted to an ON clause by the query rewrite process. Therefore, when "*" is used in the target list, columns with the same name appear more than once.
pgpool-II does not translate between different multi-byte characters. The encoding for the client, backend and System DB must be the same.
pgpool-II cannot process multi-statement queries.
Deadlocks across the backends cannot be detected. For example:
(tellers table is partitioned using the following rule)
  tid <= 10  --> node 0
  tid >= 10  --> node 1

A) BEGIN;
B) BEGIN;
A) SELECT * FROM tellers WHERE tid = 11 FOR UPDATE;
B) SELECT * FROM tellers WHERE tid = 1 FOR UPDATE;
A) SELECT * FROM tellers WHERE tid = 1 FOR UPDATE;
B) SELECT * FROM tellers WHERE tid = 11 FOR UPDATE;
In the case above, a single node cannot detect the deadlock, so pgpool-II will still wait for the response. This phenomenon occurs with other queries that acquire row level locks.
Also, if a deadlock occurs in one node, transaction states in each node will not be consistent. Therefore, pgpool-II terminates the process if a deadlock is detected.
pool_read_kind: kind does not match between master(84) slot[1] (69)
Objects in a schema other than public must be fully qualified like:
schema.object
pgpool-II cannot resolve the correct schema when the path was set as follows:
set search_path = xxx
and the schema name was omitted in a query.
Limitations in the parallel mode
A table or column name cannot start with pool_, because such names are used internally when rewriting the query.
Only one (1) partitioning key column can be defined in one partitioning rule. Conditions like 'x or y' are not supported.
libpq is linked while building pgpool-II. The libpq version must be 3.0; building pgpool-II with libpq version 2.0 will fail. Also, the System DB must be PostgreSQL 7.4 or later.
Currently, the query cache must be deleted manually. pgpool-II does not automatically invalidate old query cache entries when the data is updated.
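As a sketch of manual deletion, assuming the query cache is stored in a table named pgpool_catalog.query_cache in the System DB (table and column names are assumptions, not confirmed by this manual), old entries can be removed with an ordinary SQL statement such as:

DELETE FROM pgpool_catalog.query_cache;                        -- drop all cached results
DELETE FROM pgpool_catalog.query_cache WHERE dbname = 'bench'; -- or only those for one database (column name assumed)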
PCP commands are UNIX commands which manipulate pgpool-II over the network.
* pcp_node_count - retrieves the number of nodes
* pcp_node_info - retrieves the node information
* pcp_proc_count - retrieves the process list
* pcp_proc_info - retrieves the process information
* pcp_systemdb_info - retrieves the System DB information
* pcp_detach_node - detaches a node from pgpool-II
* pcp_attach_node - attaches a node to pgpool-II
* pcp_stop_pgpool - stops pgpool-II
There are five arguments common to all of the PCP commands. They give information about pgpool-II and authentication. Extra arguments may be needed for some commands.
e.g.) $ pcp_node_count 10 localhost 9898 postgres hogehoge

First argument - timeout value in seconds. PCP disconnects if pgpool-II does not respond within this many seconds.
Second argument - pgpool-II's hostname
Third argument - PCP port number
Fourth argument - PCP username
Fifth argument - PCP password
PCP usernames and passwords must be ones declared in pcp.conf in the $prefix/etc directory. The -F option can be used when starting pgpool-II if pcp.conf is placed somewhere else. The password does not need to be in md5 format when passing it to the PCP commands.
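For reference, an entry in pcp.conf consists of a username and an md5-encrypted password separated by a colon; the hash can be generated with the pg_md5 command shipped with pgpool-II (the values below are illustrative):

$ pg_md5 hogehoge
<md5 hash is printed>

# pcp.conf
postgres:<md5 hash printed above>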
All PCP commands display the results to the standard output.
Format: pcp_node_count _timeout_ _host_ _port_ _userid_ _passwd_
Displays the total number of nodes defined in pgpool.conf. It does not distinguish between node statuses (attached/detached); ALL nodes are counted.
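e.g.) In a two-node configuration the output is simply the node count (values are illustrative):

$ pcp_node_count 10 localhost 9898 postgres hogehoge
2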
Format: pcp_node_info _timeout_ _host_ _port_ _userid_ _passwd_ _nodeid_
Displays the information on the given node ID. The output example is as follows:
$ pcp_node_info 10 localhost 9898 postgres hogehoge 0
host1 5432 1 1073741823.500000

The result is in the following order:
1. hostname
2. port number
3. status
4. load balance weight

Status is represented by a digit from 0 to 3.
0 - This state is only used during initialization. PCP will never display it.
1 - Node is up. No connections yet.
2 - Node is up. Connections are pooled.
3 - Node is down.
The load balance weight is displayed in normalized format.
The --verbose option can help understand the output. For example:
$ pcp_node_info --verbose 10 localhost 9898 postgres hogehoge 0
Hostname: host1
Port    : 5432
Status  : 1
Weight  : 0.5
Specifying an invalid node ID will result in an error with exit status 12, and BackendError will be displayed.
Format: pcp_proc_count _timeout_ _host_ _port_ _userid_ _passwd_
Displays the list of pgpool-II child process IDs. If there is more than one process, the IDs will be delimited by white space.
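e.g.) With three child processes the output might look as follows (PIDs are illustrative):

$ pcp_proc_count 10 localhost 9898 postgres hogehoge
3815 3816 3817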
Format: pcp_proc_info _timeout_ _host_ _port_ _userid_ _passwd_ _processid_
Displays the information on the given pgpool-II child process ID. The output example is as follows:
$ pcp_proc_info 10 localhost 9898 postgres hogehoge 3815
postgres_db postgres 1150769932 1150767351 3 0 1 1467 1
postgres_db postgres 1150769932 1150767351 3 0 1 1468 1

The result is in the following order:
1. connected database name
2. connected username
3. process start-up timestamp
4. connection created timestamp
5. protocol major version
6. protocol minor version
7. connection-reuse counter
8. PostgreSQL backend process id
9. 1 if the frontend is connected, 0 if not

If there is no connection to the backends, nothing will be displayed. If there are multiple connections, each connection's information is displayed on a separate line. Timestamps are displayed in EPOCH format.
The --verbose option can help understand the output. For example:
$ pcp_proc_info --verbose 10 localhost 9898 postgres hogehoge 3815
Database     : postgres_db
Username     : postgres
Start time   : 1150769932
Creation time: 1150767351
Major        : 3
Minor        : 0
Counter      : 1
PID          : 1467
Connected    : 1
Database     : postgres_db
Username     : postgres
Start time   : 1150769932
Creation time: 1150767351
Major        : 3
Minor        : 0
Counter      : 1
PID          : 1468
Connected    : 1
Specifying an invalid process ID will result in an error with exit status 12, and BackendError will be displayed.
Format: pcp_systemdb_info _timeout_ _host_ _port_ _userid_ _passwd_
Displays the System DB information. The output example is as follows:
$ pcp_systemdb_info 10 localhost 9898 postgres hogehoge
localhost 5432 yamaguti '' pgpool_catalog pgpool 3
yamaguti public accounts aid 4 aid bid abalance filler integer integer integer character(84) dist_def_accounts
yamaguti public branches bid 3 bid bbalance filler integer integer character(84) dist_def_branches
yamaguti public tellers bid 4 tid bid tbalance filler integer integer integer character(84) dist_def_tellers

First, the System DB information is displayed on the first line. The result is in the following order:
1. hostname
2. port number
3. username
4. password ('' for no password)
5. schema name
6. database name
7. number of partitioning rules defined

Second, the partitioning rules are displayed on the following lines. If there are multiple definitions, each definition is displayed on a separate line. The result is in the following order:
1. targeted partitioning database name
2. targeted partitioning schema name
3. targeted partitioning table name
4. partitioning key column name
5. number of columns in the targeted table
6. column names (as many as given in 5.)
7. column types (as many as given in 5.)
8. partitioning rule function name
If the System DB is not defined (i.e. pgpool-II is not in parallel mode and the query cache is disabled), this command results in an error with exit status 12, and BackendError will be displayed.
Format: pcp_detach_node [-g] _timeout_ _host_ _port_ _userid_ _passwd_ _nodeid_
Detaches the given node from pgpool-II. If -g is given, pgpool-II waits until all clients have disconnected (unless client_idle_limit_in_recovery is -1 or recovery_timeout expires).
Format: pcp_attach_node _timeout_ _host_ _port_ _userid_ _passwd_ _nodeid_
Attaches the given node to pgpool-II.
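e.g.) Detaching node 1 gracefully and re-attaching it later (the node ID is illustrative):

$ pcp_detach_node -g 10 localhost 9898 postgres hogehoge 1
$ pcp_attach_node 10 localhost 9898 postgres hogehoge 1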
Format: pcp_stop_pgpool _timeout_ _host_ _port_ _userid_ _passwd_ _mode_
Terminates the pgpool-II process with the given shutdown mode. The available modes are as follows:

s - smart mode
f - fast mode
i - immediate mode
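e.g.) Stopping pgpool-II in smart mode:

$ pcp_stop_pgpool 10 localhost 9898 postgres hogehoge s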
If the pgpool-II process does not exist, this command results in an error with exit status 8, and ConnectionError will be displayed.
* Currently, there is no difference between the fast and immediate modes. pgpool-II terminates all the processes whether or not there are clients connected to the backends.
A PCP command exits with status 0 when everything goes well. If an error occurs, it exits with one of the following error statuses.
UNKNOWNERR   1   Unknown Error (should not occur)
EOFERR       2   EOF Error
NOMEMERR     3   Memory shortage
READERR      4   Error while reading from the server
WRITEERR     5   Error while writing to the server
TIMEOUTERR   6   Timeout
INVALERR     7   Argument(s) to the PCP command was invalid
CONNERR      8   Server connection error
NOCONNERR    9   No connection exists
SOCKERR      10  Socket error
HOSTERR      11  Hostname resolution error
BACKENDERR   12  PCP process error on the server (specifying an invalid ID, etc.)
AUTHERR      13  Authorization failure
pgpool-II version 2.0.x contains extensive modifications compared with version 1.x. Please note that the following is not compatible with the version 1.x documentation.
A parallel execution engine is built into pgpool-II. The engine sends the same query to each node and transmits the results to the frontend in the order in which the nodes respond.
This section explains the query rewriting that pgpool-II performs in parallel mode.
In parallel mode, a retrieval query (SELECT) sent by a client goes through roughly two steps of processing:

Analysis of the query
Rewriting of the query

These two steps are explained in turn below.
Introduction
A retrieval query sent by a client is passed through the SQL parser and then analyzed using the information registered in the System DB. The analysis evaluates the query in terms of transitions of an "execution status", which indicates where a given set of data can be acquired or processed. For example, the data of a table registered in the pgpool_catalog.dist_def table is divided among the nodes, so the whole data set must be acquired from all nodes. Conversely, the whole data set of a table registered in the pgpool_catalog.replicate_def table can be acquired from any single node. The state in which data must be processed on all nodes is called state P, and the state in which data can be processed on a single node is called state L. There is also a special state S, which means that processing must be performed after the data has been acquired from all nodes; sorting is one example, since for a table registered in pgpool_catalog.dist_def the sort can only be executed after the data has been acquired from all nodes.
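As an illustration (accounts and branches are hypothetical tables; accounts is assumed to be registered in pgpool_catalog.dist_def and branches in pgpool_catalog.replicate_def):

SELECT * FROM branches;               -- state L: any single node can answer
SELECT * FROM accounts;               -- state P: the rows must be collected from all nodes
SELECT * FROM accounts ORDER BY aid;  -- state S: sorting is done after the rows are collected from all nodes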
A retrieval query is analyzed in the order of processing described below, and the execution status changes as the analysis proceeds. Once the execution status becomes S at some step, it remains S in the following steps. Where the query is processed is determined by the final execution status of the outermost SELECT.
The relation between the final execution status of the SELECT and where it is processed is as follows.
Execution status | Processed place
L | The query is sent to one of the nodes.
P | The same query is sent to all nodes and the results are returned to the client through the parallel execution engine.
S | The query is processed on the System DB and the result is returned to the client.
The rules above also apply to subqueries. In the simple query below, if P1-table is registered in the pgpool_catalog.dist_def table of the System DB (i.e. its data is divided), the final execution status of the subquery becomes P, and consequently the execution status of the SELECT that contains the subquery also becomes P.
SELECT * FROM (SELECT * FROM P1-table) as P2-table;
Next, the way the execution status changes is explained concretely, starting with the execution status of the FROM clause.
Execution status of FROM Clause
For a retrieval query (SELECT), the data set is defined by the FROM clause, and its execution status is one of P, L or S. When only one table is specified in the FROM clause, the execution status of that table simply becomes the execution status of the whole data set defined by the FROM clause. When two or more tables or subqueries appear in the FROM clause, the execution status is decided according to the JOIN method, as shown in the following table.
JOIN type  | LEFT OUTER JOIN | RIGHT OUTER JOIN | FULL OUTER JOIN | others
left\right |  P    L    S    |  P    L    S     |  P    L    S    |  P    L    S
P          |  S    P    S    |  S    S    S     |  S    S    S    |  S    P    S
L          |  S    L    S    |  P    L    S     |  S    L    S    |  P    L    S
S          |  S    S    S    |  S    S    S     |  S    S    S    |  S    S    S
In the following example, assume that P1-table is in state P and that L1-table and L2-table are in state L. According to the table above, joining P1-table (left) and L1-table (right) yields state P; that result (state P) is then joined with L2-table (state L), which again yields state P, so the execution status of the FROM clause is P.
SELECT * FROM P1-table,L1-table,L2-table;
Change in execution status by TARGETLIST and WHERE clause
In a basic query, the execution status of the FROM clause is inherited. However, the execution status of the TARGETLIST and the WHERE clause changes in the following cases.
When the final execution status of a subquery is P or S, the execution status of the TARGETLIST or WHERE clause containing it becomes S. In the following example, the table used by the subquery is in state P, so the final execution status of the subquery is P. Therefore, regardless of the execution status of L1-table, the execution status of the WHERE clause becomes S, and this query is executed on the System DB.
SELECT * FROM L1-table where L1-table.column IN (SELECT * FROM P1-table);
When the FROM clause is in state P and the TARGETLIST contains an aggregate function, the status changes to S, because the aggregation can only be performed after the data has been acquired. Under specific conditions, an optimization for aggregate functions is applied (described later).
A column that does not belong to the tables or subqueries defined in the FROM clause may be used in the WHERE clause. This happens in a correlated subquery like the following.
SELECT * FROM L1-table WHERE L1-table.col1 IN (SELECT * FROM P1-table WHERE P1-table.col = L1-table.col1);
In the subquery above, L1-table.col1 is an outer reference to L1-table. In this case, the execution status of the subquery's WHERE clause becomes S.
Change in execution status by GROUP BY, HAVING, ORDER BY and LIMIT OFFSET
When the execution status of the WHERE clause is P and the query contains a GROUP BY clause, a HAVING clause, an ORDER BY clause or a LIMIT/OFFSET predicate, the execution status changes to S. A query with no GROUP BY clause inherits the execution status of the WHERE clause, and when there is no HAVING clause the execution status of the GROUP BY clause is inherited. The same applies to the ORDER BY clause and the LIMIT/OFFSET predicate.
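For example, assuming accounts is a hypothetical table registered in pgpool_catalog.dist_def (state P):

SELECT * FROM accounts WHERE bid = 1;          -- no GROUP BY/ORDER BY/LIMIT: the final status stays P
SELECT * FROM accounts ORDER BY aid LIMIT 10;  -- ORDER BY and LIMIT change the final status to S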
When UNION, EXCEPT, and INTERSECT are used
The execution status of a query that uses UNION, EXCEPT or INTERSECT depends on the final execution status of the left and right SELECT statements. If both are in state L, the combined query is in state L. If both are in state P and the operation is UNION ALL, the combined query is in state P. For any other combination, it is in state S.
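For example, assuming L1-table and L2-table are in state L and P1-table and P2-table are in state P:

SELECT col FROM L1-table UNION SELECT col FROM L2-table;      -- both sides L: final status L
SELECT col FROM P1-table UNION ALL SELECT col FROM P2-table;  -- both sides P and UNION ALL: final status P
SELECT col FROM P1-table UNION SELECT col FROM P2-table;      -- any other combination: final status S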
Acquisition of the final execution status of SELECT
If every execution status is L, the final execution status of the SELECT is L; if every status is P, it is P; otherwise, it is S. For L, the query is load-balanced among the nodes when loadbalance_mode in pgpool.conf is true, and is sent to the MASTER otherwise. For P, parallel processing is done with the parallel execution engine. For S, the query rewriting described in the next phase is performed.
The query is rewritten using the execution status acquired in the analysis phase. As an example, the rewriting is explained with a query that uses P1-table (state P) and L1-table (state L).
SELECT P1-table.col, L1-table.col FROM P1-table,L1-table where P1-table.col = L1-table.col order by P1-table.col;
In this query, the ORDER BY clause puts the whole query into state S, while the FROM clause, the WHERE clause and the TARGETLIST are in state P. The query is rewritten as follows.
SELECT P1-table.col, L1-table.col FROM dblink(select pool_parallel(SELECT P1-table.col, L1-table.col FROM P1-table,L1-table where P1-table.col = L1-table.col)) order by P1-table.col;
Here, dblink transmits the query to pgpool-II, and pool_parallel is a function that sends its argument query to the parallel execution engine. Note that the above is only an illustration, not an actually executable query.
As in the example above, when the FROM clause, the WHERE clause and the TARGETLIST are all in state P, they are bundled together and processed in parallel as one query.
Consider the following example.
SELECT L1-table.col FROM L1-table WHERE L1-table.col % 2 = 0 AND L1-table.col IN (SELECT P1-table FROM P1-table) ;
In this example, the FROM clause and the TARGETLIST are in state L, but the WHERE clause is in state S because it contains a subquery in state P. The query is rewritten as follows.
SELECT L1-table.col FROM dblink(SELECT pool_loadbalance(SELECT L1-table.col FROM L1-table WHERE L1-table.col % 2 = 0 AND TRUE)) WHERE L1-table.col %2 = 0 AND L1-table.col IN ( SELECT P1-Table FROM dblink(select pool_parallel(SELECT P1-table FROM P1-table)) ) ;
Here, pool_loadbalance is a function that transmits its argument query to one of the nodes.
For aggregation queries (aggregate functions and GROUP BY), the aggregation is first performed on each node and the partial results are then re-aggregated on the System DB; this reduces the load on the System DB and improves performance.
First, let us look at how pgpool-II actually rewrites such a query.
A query whose FROM clause is in state P and whose TARGETLIST contains count(*) is rewritten as follows.
select count(*) from P1-table;

-> rewritten into:

SELECT sum(pool_c$1) as count
  FROM dblink(select pool_parallel('select count(*) from P1-table'))
       AS pool_$1g (pool_c$1 bigint);
Query rewriting like the above is performed under the following conditions.
When a query is analyzed in parallel mode, the column names and types are needed. Therefore, when an expression or a function is used in the TARGETLIST of a subquery, an alias and a type name (given via a cast) are required. Note that an expression or function without a cast is processed as the text type. When the aggregate rewriting described above is applied, count becomes the bigint type and sum becomes the numeric type. min and max are calculated as a date type when the argument is a date type, and as numeric otherwise. avg is processed as sum/count.
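For example (table and column names are illustrative), an expression in a subquery's TARGETLIST needs both an alias and a cast:

SELECT total FROM (SELECT (abalance + 100)::bigint AS total FROM accounts) AS t;

Without the AS total alias and the ::bigint cast, the expression would be handled as the text type.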
The final execution status of a SELECT and a rough indication of the expected performance are as follows.
Execution status | Performance
L | There is no performance deterioration compared with a standalone node, apart from pgpool-II's overhead, because the parallel query mechanism is not used.
P | Parallel processing is fast, especially for sequential scans. Also, because partitioning makes each table smaller, the data is more likely to fit in cache.
S | When the aggregate-function rewriting can be applied, it is fast.
A tutorial for pgpool-II is available.