How Do I Determine The Language For A Function?

This is the third in a series of “How do I do X in Greenplum database” and today’s topic is How do I determine the language for a function.

Inline SQL
A common way to use a function is to transform data directly in the SELECT portion of a query. Here is a quick example.

CREATE TABLE person
(id int NOT NULL,
 fname text,
 lname text,
 company_name text)
DISTRIBUTED BY (id);

--sample data
INSERT INTO person VALUES (1, 'Jon', 'Roberts', null);
INSERT INTO person VALUES (2, null, 'Smith', null);
INSERT INTO person VALUES (3, null, null, 'ABC Company');

Now the use case is to retrieve the “name” of the person but it might be a combination of the three text columns. A business decision was made to do format the name as follows:

SELECT CASE WHEN company_name IS NULL AND fname IS NOT NULL AND lname IS NOT NULL THEN lname || ', ' || fname
            WHEN fname IS NULL AND lname IS NOT NULL THEN lname
            WHEN company_name IS NOT NULL AND fname IS NOT NULL AND lname IS NOT NULL THEN company_name || ': ' || lname || ', ' || fname
            WHEN company_name IS NOT NULL AND fname IS NULL AND lname IS NOT NULL THEN company_name || ': ' || lname 
            WHEN company_name IS NOT NULL AND fname IS NULL AND lname IS  NULL THEN company_name END as name
FROM person;

     name     
--------------
 Roberts, Jon
 ABC Company
 Smith
(3 rows)

It would be much easier and consistent to add this logic to a function to be executed so it centralizes the logic to a single place and makes writing SQL a bit easier. This is when using the language “SQL” is preferred. Here is an example and notice that parameters are referred to $1, $2, $3 rather than the name of the parameter. The SQL language does not support named variables.

CREATE OR REPLACE function fn_get_name(p_fname text, p_lname text, p_company_name text) RETURNS text AS
$$
SELECT CASE WHEN $3 IS NULL AND $1 IS NOT NULL AND $2 IS NOT NULL THEN $2 || ', ' || $1
            WHEN $1 IS NULL AND $2 IS NOT NULL THEN $2
            WHEN $3 IS NOT NULL AND $1 IS NOT NULL AND $2 IS NOT NULL THEN $3 || ': ' || $2 || ', ' || $1
            WHEN $3 IS NOT NULL AND $1 IS NULL AND $2 IS NOT NULL THEN $3 || ': ' || $2 
            WHEN $3 IS NOT NULL AND $1 IS NULL AND $2 IS  NULL THEN $3 END as name
$$
LANGUAGE sql;

And now using this function.

SELECT fn_get_name(fname, lname, company_name) AS name FROM person;
     name     
--------------
 Roberts, Jon
 ABC Company
 Smith
(3 rows)

Transformation
The next use case is when transforming data and PL/pgSQL does a great job at this. You can reference parmeters by name, declare variables, handle errors, etc. The code looks a lot like Oracle PL/SQL too.

CREATE OR REPLACE FUNCTION fn_update_person() RETURNS void AS
$$
DECLARE
    v_rowcount int;
BEGIN
    UPDATE person
    SET fname = initcap(fname),
        lname = initcap(lname),
        company_name = initcap(company_name);

    GET DIAGNOSTICS v_rowcount = ROW_COUNT;

    RAISE INFO '% Rows Updated.', v_rowcount;

END;
$$
LANGUAGE plpgsql VOLATILE;

Notice how I’m not looping through the data or issuing a commit after so many records. There are two reasons for this. First, Greenplum is designed for Big Data so manipulating lots of data in a single transaction is ideal. Secondly, a function in Greenplum is executed in a single transaction. You can’t create a transaction within a function because executing a function is by default in a transaction.

Summary
I hope this helps. When transforming data inline, use the SQL language and use the more robust PL/pgSQL language when transforming data.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.