Python functions are Python modules that can be created using the standard CREATE FUNCTION syntax:
CREATE FUNCTION funcname(args)
RETURNS return_type LANGUAGE python AS
$python$
import Postgres
def main(*args):
    return value
$python$;
When a Python function is called from SQL, the function’s module is imported and its entry point, normally main, is invoked.
Note
The term “Python function” is often used in reference to an SQL function object whose language is python.
The interface module, Postgres, provides all of the Python types and interface functions created by pg-python. In order to provide convenient access to often used objects, some additional built-ins are provided by pg-python.
- @pytypes
  Alias to Postgres.pytypes.
- prepare(sql_statement)
  Alias to Postgres.Statement.
- proc(procedure_id)
  Function that takes pg_catalog.regprocedure text or a function Oid and returns the corresponding Postgres.Function.
- sqleval(sql_expr, *args)
  Alias to Postgres.eval.
- sqlexec(str_of_sql_statements)
  Execute multiple SQL statements. Alias to Postgres.execute.
- xact()
  Context manager that starts and ends an internal subtransaction. Alias to Postgres.Transaction.
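A brief sketch combining a few of these built-ins inside a function body; the demo_table relation and the expression are illustrative only:
CREATE FUNCTION builtins_demo() RETURNS int LANGUAGE python AS
$python$
ins = prepare("INSERT INTO demo_table VALUES ($1)")

def main():
    # sqleval evaluates a single SQL expression, with parameters.
    two = sqleval('1 + $1', 1)
    # xact() runs the insert inside a subtransaction.
    with xact():
        ins(two)
    return two
$python$;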
Normal functions are executed by invoking the main entry point defined by the function module. If it does not exist, a Python AttributeError will be raised as a Postgres error.
The main entry point must be able to take the same number of parameters as declared by the CREATE FUNCTION statement that created the Postgres function. All arguments are given as positional arguments to the main object.
Tip
main does not have to be a regular function object. Any callable object will suffice.
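For instance, a sketch where the entry point is an instance of a class implementing __call__; the Greeter class is purely illustrative:
CREATE FUNCTION greet(name text) RETURNS text LANGUAGE python AS
$python$
class Greeter(object):
    def __call__(self, name):
        return "Hello, " + str(name)

# Any callable object can serve as the entry point.
main = Greeter()
$python$;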
The following example shows the creation of a simple addition function:
CREATE FUNCTION add(n1 int, n2 int)
RETURNS int LANGUAGE python AS
$python$
"""
Return the sum of the two parameters.
"""
@pytypes
def main(n1, n2):
    return n1 + n2
$python$;
SELECT add(1,2);
-- add
-------
-- 3
--(1 row)
Note
The above example makes use of the pytypes decorator. This wrapper converts a few of the common data objects into standard Python objects. See Postgres.pytypes for more details.
Set returning functions are very much like normal functions. The difference is that the result is expected to be an iterable of objects that can be coerced into the result type.
The exhaustion of the iterable will signal the termination of the set produced by the function:
CREATE FUNCTION simple_strings()
RETURNS SETOF text LANGUAGE python AS
$python$
"""
Return an iterable of strings.
"""
def main():
    return ("Hello", "World", "!")
$python$;
SELECT s FROM simple_strings() AS t(s);
-- s
---------
-- Hello
-- World
-- !
--(3 rows)
Any iterable can be returned. Notably, main can be a generator function, yielding the objects to be coerced.
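For example, a sketch of a generator-based set returning function; the function name and limit parameter are illustrative:
CREATE FUNCTION count_to(n int) RETURNS SETOF int LANGUAGE python AS
$python$
@pytypes
def main(n):
    # Each yielded object is coerced into the result type;
    # exhausting the generator terminates the set.
    for i in range(1, n + 1):
        yield i
$python$;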
Trigger functions, like normal functions, are Python modules. However, unlike normal functions, trigger functions don’t use the main entry point. Rather, trigger functions have an entry point for each possible event type, with respect to orientation, timing, and manipulation. Multiple entry points can be defined in order to allow the trigger function to handle multiple event types.
Trigger returning functions are used by CREATE TRIGGER statements to designate the procedure that should be executed when the described event occurs.
Trigger functions are declared in the same fashion as normal functions, using CREATE FUNCTION:
CREATE FUNCTION funcname()
RETURNS TRIGGER
LANGUAGE python AS
$python$
"""
A trigger function that does nothing.
"""
def before_insert(td, new):
    pass
$python$;
When a trigger function’s entry point is invoked, it is given at least one argument, the Postgres.TriggerData describing the event. Additional arguments given to the entry point, if any, depend on the entry point; each entry point takes the number of arguments that suits the event.
Row triggers can be fired for each row that is inserted, updated, or deleted. The additional arguments given to a row trigger entry point are records of the target table’s type, a subclass of Postgres.types.record. These arguments provide access to the row being inserted, updated, or deleted.
Row Trigger Entry Points:
- def before_insert(td, new):
  Executed for each row, before the insertion is actually made. The new argument is the row that will be inserted. This entry point can provide a substitute row or suppress the manipulation entirely.
- def before_update(td, old, new):
  Executed for each row, before the update is actually made. The old argument is the row that is being updated into the new argument. This entry point can provide a substitute row or suppress the manipulation entirely.
- def before_delete(td, old):
  Executed for each row, before the row is actually removed. The old argument is the row being removed. This entry point can only suppress the manipulation.
- def after_insert(td, new):
  Executed for each row, after the insert occurs. The new argument is the row that was inserted. None must be returned.
- def after_update(td, old, new):
  Executed for each row, after the update occurs. The old argument is the row that was altered into the new argument. None must be returned.
- def after_delete(td, old):
  Executed for each row, after the delete occurs. The old argument is the row that was removed from the table. None must be returned.
The AFTER entry points require that None be returned. If None is not returned, the procedural language will throw an error.
When a manipulation should not proceed, it can be suppressed by a trigger. Suppression can only be enacted by triggers that are timed before the manipulation occurs. Specifically, before_insert, before_update, and before_delete are the only entry points that can cause suppression.
In order to signal that suppression should occur, the called entry point will need to raise the Postgres.StopEvent exception:
CREATE TABLE single_column(t text);
INSERT INTO single_column VALUES ('removed'), ('kept');
CREATE FUNCTION suppress_removal()
RETURNS TRIGGER LANGUAGE python AS
$python$
from Postgres import StopEvent
def before_delete(td, old):
    if old['t'] == 'kept':
        # Raising StopEvent suppresses the removal of this row.
        raise StopEvent
$python$;
CREATE TRIGGER suppress_it BEFORE DELETE ON single_column
FOR EACH ROW EXECUTE PROCEDURE suppress_removal();
DELETE FROM single_column;
SELECT t FROM single_column;
-- t
--------
-- kept
--(1 row)
When a different row should take the place of the one being inserted or updated, substitution can be performed by returning the replacement row. Substitution can only be used with row triggers that are timed before the manipulation. Specifically, only before_insert and before_update can perform row substitution. If None is returned, the manipulation will continue unaltered:
CREATE TABLE two_columns(i int, t text);
CREATE FUNCTION substitute_row()
RETURNS TRIGGER LANGUAGE python AS
$python$
def before_insert(td, new):
    if new['i'] == 0:
        # .replace() creates a *new* record with `t` set to 'newval'
        return new.replace(t = 'newval')
    # Otherwise, None is returned by default, which causes
    # the manipulation to continue.
$python$;
CREATE TRIGGER execsub BEFORE INSERT ON two_columns
FOR EACH ROW EXECUTE PROCEDURE substitute_row();
INSERT INTO two_columns VALUES (0, 'text');
SELECT i, t FROM two_columns;
-- i | t
-----+--------
-- 0 | newval
--(1 row)
The object that is returned by the entry point will be coerced into a record of the composite type of the trigger’s table. This means that a simple Python tuple or dictionary could be returned in order to supply the replacement. See Composite Types for more information.
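For instance, a sketch of an entry point supplying the replacement as a plain tuple, reusing the two_columns table from the previous example:
def before_insert(td, new):
    # The tuple is coerced into the composite type of the
    # trigger's table, just as a record object would be.
    return (new['i'], 'tuple-supplied value')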
Tip
The replace and transform methods on records are often the most convenient mechanisms to create the appropriate substitute. See the Postgres.types.record reference documentation for more details on how to use these methods.
Like row triggers, trigger functions called with statement orientation are given a Postgres.TriggerData object as the first argument. Statement triggers timed after the event are given one or two additional None objects, two in the case of update manipulations. These additional arguments exist to accommodate future functionality.
The following example shows a statement trigger that can handle multiple event types:
CREATE TABLE monitored_table(i int, t text);
CREATE FUNCTION watch_table()
RETURNS TRIGGER LANGUAGE python AS
$python$
import Postgres
def after_truncate_statement(td, olds):
    Postgres.notify('was_truncated')

def after_delete_statement(td, olds):
    Postgres.notify('removed_data')

def after_insert_statement(td, news):
    Postgres.notify('new_data')
$python$;
CREATE TRIGGER watch_trigger AFTER INSERT OR DELETE OR TRUNCATE ON monitored_table
FOR EACH STATEMENT EXECUTE PROCEDURE watch_table();
LISTEN was_truncated;
LISTEN new_data;
LISTEN removed_data;
INSERT INTO monitored_table VALUES (0, 'text'), (1, 'more text');
DELETE FROM monitored_table;
TRUNCATE monitored_table;
Statement Trigger Entry Points:
- def before_insert_statement(td):
  Executed before each INSERT statement.
- def before_update_statement(td):
  Executed before each UPDATE statement.
- def before_delete_statement(td):
  Executed before each DELETE statement.
- def before_truncate_statement(td):
  Executed before each TRUNCATE statement.
- def after_insert_statement(td, news):
  Executed after each INSERT statement. The news argument is always None.
- def after_update_statement(td, olds, news):
  Executed after each UPDATE statement. The olds and news arguments are always None.
- def after_delete_statement(td, olds):
  Executed after each DELETE statement. The olds argument is always None.
- def after_truncate_statement(td, olds):
  Executed after each TRUNCATE statement. The olds argument is always None.
Note
Stateful functions are an advanced feature of pg-python.
Stateful functions are entry points decorated with Postgres.Stateful. When an entry point is decorated with Postgres.Stateful, the callable should return a state object. The state object is normally a generator object, but it can be any object that supports the generator protocol:
CREATE FUNCTION counting() RETURNS int LANGUAGE python AS
$python$
"""
It counts.
"""
from Postgres import Stateful
@Stateful
def main():
    i = 0
    while True:
        yield i
        i = i + 1
$python$;
SELECT counting(), counting(), col FROM (
SELECT 'one'::text AS col UNION ALL
SELECT 'two'::text AS col) AS t(col);
-- counting | counting | col
------------+----------+-----
-- 0 | 0 | one
-- 1 | 1 | two
--(2 rows)
The example shows that state is not shared by multiple calls: each independent invocation of the function has its own state object. For state shared across all invocations, there is no need to use the stateful feature; a module global can be referenced and updated as needed.
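For instance, a sketch of such shared state using a module global; the function itself is illustrative only:
CREATE FUNCTION call_count() RETURNS int LANGUAGE python AS
$python$
# The module body runs once when the function is loaded, so this
# global is shared by every invocation in the session.
calls = 0

def main():
    global calls
    calls = calls + 1
    return calls
$python$;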
Stateful functions get new arguments every time they are invoked; the send method is called with a Python tuple of the new parameters. With generators, these new parameters are produced by the yield expression:
CREATE FUNCTION counting_and_more(more int) RETURNS int LANGUAGE python AS
$python$
"""
It counts, and it adds more.
"""
from Postgres import Stateful
@pytypes
@Stateful
def main(more):
    i = 0
    while True:
        i = i + 1 + more
        args = (yield i)
        more = int(args[0])
$python$;
SELECT counting_and_more(j), counting_and_more(k) FROM
(VALUES (0, 1), (0, 1), (1, 10)) AS g(j,k);
-- counting_and_more | counting_and_more
---------------------+-------------------
-- 1 | 2
-- 2 | 4
-- 4 | 15
--(3 rows)
Stateful can be used with entry points in Trigger Returning Functions and Set Returning Functions. However, in the case of SRFs, it’s very unlikely to offer useful functionality, as state is already managed. In the case of TRFs, it’s primarily useful in generic triggers, where data and routines specialized for the target relation can be easily re-used.
Note
DO-statements were introduced in PostgreSQL 9.0; earlier versions do not support this feature.
pg-python supports DO statements by executing the code in a temporary module object. Unlike functions, no entry points are used:
CREATE TEMP TABLE tmp (i int, t text);
DO $python$
ins = prepare("INSERT INTO tmp VALUES ($1, $2)")
with xact():
    ins(1, 'A simple sentence.')
    ins(2, 'Another simple sentence.')
$python$ LANGUAGE python;
SELECT * FROM tmp;
-- i | t
-----+--------------------------
-- 1 | A simple sentence.
-- 2 | Another simple sentence.
--(2 rows)
Inline execution is very useful for Python as it allows further initialization of the language’s environment without a custom function. In particular, the Postgres.preload function can be used from a DO-statement to make sure that specific procedures are ready for execution well before they are used.
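A sketch of such a DO-statement follows; the exact signature of Postgres.preload is an assumption here (regprocedure text identifying the functions to load):
DO $python$
import Postgres
# Assumption: preload accepts regprocedure text for each function
# that should be loaded ahead of its first call.
Postgres.preload('add(int,int)')
$python$ LANGUAGE python;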
All python SQL functions are Python modules. Once loaded, function modules exist in sys.modules as regular modules do. The primary differences between regular Python modules and function modules are that a function module’s source is stored inside the database, and that __func__, a Postgres.Function object, is present in the module’s dictionary. This object represents the Postgres function and provides the functionality needed to create the module object itself; it is also the module’s __loader__.
A function module’s attributes:
- __name__
  The function’s Oid as a string object. Used as the sys.modules key.
- __loader__
  The module’s PEP-302 loader.
- __func__
  The module’s Postgres.Function object.
- __file__
  The qualified regprocedure representation of the SQL function.
When a function is loaded, all of the necessary information is cached for re-use. The function module exists in sys.modules and it contains __func__. That Postgres.Function object contains all the information needed to properly construct the arguments upon execution and what to do with the returned object. However, this information can change. If a function is replaced or a dependency is modified after the function was loaded, the function needs to be completely reloaded to reflect these changes. Notably, altering a table whose type is referenced by a function will cause it to be reloaded. This can be problematic if the function’s module body initializes temporary tables or other session objects for subsequent use. If a function’s dependencies are subject to change, precautions may need to be taken in the module body in order to avoid errors caused by already initialized objects.
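One such precaution, sketched below, guards temp-table initialization in the module body; the table name is illustrative, and 42P07 is the duplicate_table SQLSTATE:
import Postgres

# Module body: executed on every load *and* reload of the module.
try:
    with xact():
        sqlexec("CREATE TEMP TABLE fn_cache (k text, v text)")
except Postgres.Exception as err:
    # On reload, the table already exists; ignore only that case.
    if err.code != '42P07':
        raise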
When parameters are given to a function or rows are returned by a prepared statement, Postgres.Object instances are used to represent the data. The only exception is that Python’s None is used to represent SQL NULLs. Some of these data objects have custom interfaces for convenient access to parts of the data. For example, Postgres.types.date objects have properties for getting the day, month, and year parts of the date.
Data objects can make use of the operators that exist in the database. Python does not support arbitrary operator methods, so only some operators are directly accessible from Python. However, the few methods it does support allow for date-time arithmetic, concatenation, and comparisons.
Note
Operators are greedy in the sense that they will implicitly convert a Python object to the type of the other operand before performing the operation. This only occurs when one of the operands is not a Postgres.Object.
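A sketch of this greedy conversion in practice, assuming the date type’s SQL arithmetic carries over as described:
from Postgres.types import date
d = date('2000-01-01')
# The Python int operand is implicitly converted before the
# operation, mirroring SQL's date + integer arithmetic.
assert d + 30 == date('2000-01-31')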
Data objects are local to the backend context, and most Python modules are written to expect Python types. Data objects should be converted to Python types before passing them to most external functions. However, many Postgres interfaces are capable of taking data objects, so conversion is not a necessity. In situations where data objects are not wanted, the Postgres.pytypes decorator and the Postgres.iterpytypes processor can be used to handle many of the common conversion cases. These tools will convert Postgres integers and strings to Python longs and strings.
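For instance, a sketch using Postgres.iterpytypes; it is assumed here to wrap an iterable of rows, converting the data objects in each one:
import Postgres
stmt = prepare("SELECT i FROM generate_series(1, 3) AS g(i)")
total = 0
for row in Postgres.iterpytypes(stmt.rows()):
    # The values are now standard Python objects.
    total = total + row[0]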
Arrays support the standard sequence interfaces provided by many of the built-in Python types. The notable restriction is that only steps of one are supported by the __getitem__ interface.
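A brief sketch of the sequence interfaces, relying on the greedy operator conversion noted earlier:
from Postgres.types import int4
a = int4.Array([1, 2, 3, 4])
assert len(a) == 4   # standard sequence length
assert a[0] == 1     # the int operand is converted for the comparison
sub = a[1:3]         # slices are supported, but only with steps of one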
Arrays implement a specialized constructor that allows multi-dimension arrays to be built from balanced Python list objects. The constructor is very particular about array boundaries and requires that dimensions be represented using a list object. This requirement is made to allow the distinction between an array’s dimension and an element. By using Python list objects for dimension boundaries, other types–such as Python tuples–can be used to represent the elements of composite arrays:
from Postgres.types import int4
i4 = int4.Array([[1,2],[3,4]])
assert i4[0] == (1,2)
assert i4[1] == (3,4)
The Python object interface provided for arrays takes on Python semantics. Index errors will be raised when requested items do not exist, and a subarray or element will be returned unless a slice is requested.
Postgres arrays are quite different from Python sequences. While the Python sequence interfaces are implemented to be consistent with the usual Python semantics, additional array interfaces are implemented to provide access methods that are consistent with the interfaces used in SQL:
from Postgres.types import int4
int4array = int4.Array
i = int4array([
    [[1,2],[3,4]],
    [[5,6],[7,8]],
])
# i[1][1][1] in SQL
assert i.sql_get_element((1,1,1)) == 1
# i[1][1][2] in SQL
assert i.sql_get_element((1,1,2)) == 2
# i[2][1][1] in SQL
assert i.sql_get_element((2,1,1)) == 5
The sql_get_element method doesn’t raise index errors when the boundary is crossed by an index. It is consistent with how single elements are accessed using indirection in SQL. None is returned in cases where the requested element does not exist.
See Postgres.Array for more information about the array interfaces.
Composite types are supported by subclasses of the Postgres.types.record type. Composite types are fairly interesting objects as they are both a sequence and a mapping. The composite type composite_test is used throughout the following examples:
CREATE TYPE composite_test AS (i int, t text);
It can be bound in Python using regtype; subsequent examples will assume composite to have been bound:
import Postgres
from Postgres.types import regtype
composite = Postgres.Type(regtype('composite_test'))
Being a sequence and a mapping, records can be created from a sequence or a dictionary object. When a dictionary is used to provide the fields, absent attributes will be initialized to NULL. However, when a sequence is given, all attributes must be present or an exception will be raised:
from_dict = composite({'t': 't specified, but no i'})
from_seq = composite((1234, 't'))
The Postgres string representation is also acceptable:
from_str = composite('(1234,"t")')
Attributes of the records can be accessed by index or by name using Postgres.types.record.__getitem__:
r = composite((1234, 'text'))
assert r["i"] == r[0] == 1234
assert r["t"] == r[1] == 'text'
Like Python tuples, records are immutable. So, in order to get a record with different fields, a new record must be created. This can be achieved by instantiating the type with the new fields, or by using the record’s manipulation methods. There are two manipulation methods, replace and transform–both return a new record object:
x = composite((1,'test'))
# the keywords identify the attributes to replace
y = x.replace(t = 'newval')
assert y == composite((1,'newval'))
Tip
These methods are particularly useful with Trigger Returning Functions.
See Postgres.types.record for more information.
Database access is provided by Postgres.Statement objects. Each created instance represents a single SQL statement that can take positional parameters and return rows.
Statement objects have multiple methods that can execute the statement. The simplest form of execution is performed by calling a statement object (__call__). When this execution method is used, the entire result set is returned as a Python list:
import Postgres
# prepare is a built-in pointing at Postgres.Statement
stmt = prepare("SELECT i FROM (VALUES (1), (2), (3)) AS g(i) WHERE i > $1")
# positional parameters are given as arguments
r = stmt(1)
Postgres.WARNING(str(list(map(tuple, r))))
# WARNING: [(int4('2'),), (int4('3'),)]
# CONTEXT: [__inline_executor__ while executing main]
Internally, a cursor is allocated when __call__ is used, but all the rows are fetched and stored in the list. Most execution methods will return an actual Postgres.Cursor object. The cursor that’s created will be configured to suit the request. When the rows execution method is used, the returned cursor can only be used as an iterable producing individual rows; the seek and read methods are not usable:
import Postgres
stmt = prepare("SELECT 1")
for x in stmt.rows():
    Postgres.WARNING(str(x[0]))
The rows execution method should be used when the result set is being processed on a row-by-row basis. When a large result set is produced, it is particularly advantageous in comparison to __call__, as the rows are not accumulated in a Python list.
Scrollable cursors can be created using the declare(...) execution method. This will return a cursor whose seek and read methods are usable. The cursor’s __next__ method is usable, but in order to guarantee exact positioning, no read-ahead is performed:
import Postgres
stmt = prepare("SELECT i FROM generate_series($1, $2) AS g(i)")
cur = stmt.declare(1, 100)
first10 = cur.read(10)
cur.seek(50)
last50 = cur.read()
Manipulation statements without a RETURNING clause do not return rows, and invoking a method that returns a row-producing cursor on such a statement will result in an exception. When these kinds of statements need to be executed, __call__ and first should be used. In the case of first, only the row count is returned. For __call__, a tuple containing the command tag and the row count is returned:
import Postgres
sqlexec("CREATE TEMP TABLE simple_data (t text, i int)")
ins = prepare("INSERT INTO simple_data (t,i) VALUES ($1, $2)")
# __call__
cmdtag, rowcount = ins('text', 1)
# first
rowcount_from_first = ins.first('other text', 2)
Postgres.WARNING("%s %d rows" %(cmdtag, rowcount))
When DML statements need to be executed for a sequence of parameters, the load_rows and load_chunks methods can be used. These methods take an iterable that ultimately produces the arguments to execute the statement with. Upon exhausting the given iterator, these methods will return the total row count. Notably, these methods cannot be used with statements that return rows–an exception will be raised by either method if the statement returns rows. An example of load_rows:
import Postgres
sqlexec("CREATE TEMP TABLE load_rows_test (t text, i int)")
ins = prepare("INSERT INTO load_rows_test (t,i) VALUES ($1, $2)")
# load_rows
count = ins.load_rows([
    ('first', 1),
    ('second', 2),
    ('third', 3),
])
Postgres.WARNING("%d rows" %(count,))
# WARNING: 3 rows
# CONTEXT: [__inline_executor__ while executing main]
Contents of the temporary table:
SELECT * FROM load_rows_test;
-- t | i
----------+---
-- first | 1
-- second | 2
-- third | 3
--(3 rows)
See Postgres.Statement and Postgres.Cursor for further information about the database access interfaces.
When a database error is caused by use of a Postgres interface, a Postgres.Exception is raised. After a database error occurs, access to the database is prohibited until the failed transaction state is corrected. The transaction state cannot be corrected within the function unless the error occurs inside a Postgres.Transaction block:
CREATE TEMP TABLE pkviolation (s SERIAL PRIMARY KEY);
INSERT INTO pkviolation VALUES (DEFAULT);
DO $python$
import Postgres
insert_stmt = prepare("INSERT INTO pkviolation VALUES ($1)")
try:
    with xact():
        insert_stmt(1)
except Postgres.Exception as x:
    if x.code != '23505':
        raise
    Postgres.WARNING("caught unique violation")

insert_stmt(12)
$python$ LANGUAGE python;
-- DO
SELECT * FROM pkviolation;
-- s
------
-- 1
-- 12
--(2 rows)
Postgres.Transaction context managers can only be used once, so xact should normally be directly invoked by the with-statement, with xact():.
Note
It is important that exceptions are caught outside of the transaction block. Attempting to interact with the database while in a failed transaction will result in another exception being raised.
Postgres errors can be raised from Python using Postgres.ereport. However, the Postgres.ERROR wrapper is more convenient, so direct use of ereport is not recommended. This can be used to construct custom errors with more useful information than the more verbose Python exception output:
import Postgres
Postgres.ERROR(
    message = 'application error message',
    code = 'AP001',
    # Don't include the usual PL context (traceback).
    inhibit_pl_context = True,
)
By default, all exceptions include the traceback and additional context identifying the Postgres function that raised the exception. The ereport interface allows this information to be suppressed in order to provide the programmer with greater control over the reported error message.