
Checking for “Skew”

Shared Nothing / MPP Database Architecture
The Greenplum database is a “Shared Nothing” database in which data is spread across many segments located on multiple segment hosts. No two segments store the same rows, and data is distributed based on a distribution key.

For example, let’s say you have a customer table in Greenplum with a distribution key of (customer_id). If your Greenplum cluster has 16 segments, your data would be physically located like this:

Segment Customer ID
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 11
11 12
12 13
13 14
14 15
15 16

When a query such as “SELECT COUNT(*) FROM customer” is executed, 16 processes are created (one for each segment) and the rows are counted in parallel across all segments. This is the design behind a Shared Nothing, or Massively Parallel Processing (MPP), database.
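
For reference, here is a minimal sketch of how such a table could be declared; the column list is hypothetical, but DISTRIBUTED BY is how Greenplum sets the distribution key:

CREATE TABLE customer
(
customer_id int,
customer_name text
)
DISTRIBUTED BY (customer_id);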

Skew
So what is skew? In my customer table example, I have even distribution. That is, each segment has the same number of rows. If I distributed instead by state_code and all of my customers are in Tennessee, then all of the data would be in a single segment and thus be skewed.

Skewed tables slow down processing significantly because queries against them can’t take advantage of the entire cluster.

Do I Need to Check for Skew?
If you set your distribution key to the logical or physical Primary Key of the table, checking for skew isn’t needed: hashing a unique key spreads rows evenly across the segments.

There are times when you don’t have a PK on a table, or you want to set the distribution key to something else because you commonly join two very large tables together. For example, the customer and orders tables may perform better if both are distributed by customer_id so that the joined data is co-located (see the sketch below). However, if the majority of your orders come from a single customer, the orders table will be skewed.
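
Here is a sketch under those assumptions (the orders columns are made up for illustration). Because both tables hash on customer_id, matching rows land on the same segment and the join can run without moving data between segments:

CREATE TABLE orders
(
order_id int,
customer_id int,
order_date date
)
DISTRIBUTED BY (customer_id);

SELECT c.customer_id, COUNT(*)
FROM customer c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id;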

Checking for Skew Method 1
This is the most common and straightforward way to check for skew.

SELECT gp_segment_id, COUNT(*)
FROM customer
GROUP BY gp_segment_id
ORDER BY 1;

This counts the rows per segment and gives you a good indication of the skew. The downside is that it physically counts the rows of one table at a time, which takes a while if you have a large number of tables.
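
If your installation includes the gp_toolkit schema, it provides a similar per-table check; like Method 1, it has to scan the table, so it carries the same cost. The skccoeff column is the coefficient of variation of row counts across segments, so higher values mean more skew:

SELECT skcnamespace, skcrelname, skccoeff
FROM gp_toolkit.gp_skew_coefficients
ORDER BY skccoeff DESC;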

Checking for Skew Method 2
I developed this second approach, which looks at the file sizes for each table on each segment and then outputs only the tables that have at least one segment with more than 20% more bytes than expected. For example, on a 16-segment cluster each segment should hold 1/16 = 0.0625 of a table’s bytes, so a table is flagged when any one segment holds more than 1.2 × 0.0625 = 0.075 of the total.

Execute the following script in your database.

CREATE OR REPLACE FUNCTION fn_create_db_files() RETURNS void AS
$$
DECLARE 
        v_function_name text := 'fn_create_db_files';
        v_location int;
        v_sql text;
        v_db_oid text;
        v_num_segments numeric;
        v_skew_amount numeric;
BEGIN
        v_location := 1000;
        SELECT oid INTO v_db_oid 
        FROM pg_database 
        WHERE datname = current_database();

        v_location := 2000;
        v_sql := 'DROP VIEW IF EXISTS vw_file_skew';

        v_location := 2100;
        EXECUTE v_sql;
        
        v_location := 2200;
        v_sql := 'DROP EXTERNAL TABLE IF EXISTS db_files';

        v_location := 2300;
        EXECUTE v_sql;

        v_location := 3000;
        v_sql := 'CREATE EXTERNAL WEB TABLE db_files ' ||
                '(segment_id int, relfilenode text, filename text, ' ||
                'size numeric) ' ||
                'execute E''ls -l $GP_SEG_DATADIR/base/' || v_db_oid || 
                ' | ' ||
                'grep gpadmin | ' ||
                E'awk {''''print ENVIRON["GP_SEGMENT_ID"] "\\t" $9 "\\t" ' ||
                'ENVIRON["GP_SEG_DATADIR"] "/' || v_db_oid || 
                E'/" $9 "\\t" $5''''}'' on all ' || 'format ''text''';

        v_location := 3100;
        EXECUTE v_sql;

        v_location := 4000;
        SELECT count(*) INTO v_num_segments 
        FROM gp_segment_configuration 
        WHERE preferred_role = 'p' 
        AND content >= 0;

        v_location := 4100;
        v_skew_amount := 1.2*(1/v_num_segments);
        
        v_location := 4200;
        v_sql := 'CREATE OR REPLACE VIEW vw_file_skew AS ' ||
                 'SELECT schema_name, ' ||
                 'table_name, ' ||
                 'max(size)/sum(size) as largest_segment_percentage, ' ||
                 'sum(size) as total_size ' ||
                 'FROM	( ' ||
                 'SELECT n.nspname AS schema_name, ' ||
                 '      c.relname AS table_name, ' ||
                 '      sum(db.size) as size ' ||
                 '      FROM db_files db ' ||
                 '      JOIN pg_class c ON ' ||
                 '      split_part(db.relfilenode, ''.'', 1) = c.relfilenode ' ||
                 '      JOIN pg_namespace n ON c.relnamespace = n.oid ' ||
                 '      WHERE c.relkind = ''r'' ' ||
                 '      GROUP BY n.nspname, c.relname, db.segment_id ' ||
                 ') as sub ' ||
                 'GROUP BY schema_name, table_name ' ||
                 'HAVING sum(size) > 0 and max(size)/sum(size) > ' || 
                 v_skew_amount::text || ' ' || 
                 'ORDER BY largest_segment_percentage DESC, schema_name, ' ||
                 'table_name';

        v_location := 4300;
        EXECUTE v_sql; 

EXCEPTION
        WHEN OTHERS THEN
                RAISE EXCEPTION '(%:%:%)', v_function_name, v_location, sqlerrm;
END;
$$
language plpgsql;

Now execute the function so it will create the necessary objects in your database.

SELECT fn_create_db_files();

Now select from the view to see the tables that are skewed.

SELECT * FROM vw_file_skew ORDER BY 3 DESC;
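
If a table shows up in the view, the fix is usually a better distribution key. As a sketch (order_id here is just a stand-in for a higher-cardinality column), Greenplum redistributes the rows when the key changes:

ALTER TABLE orders SET DISTRIBUTED BY (order_id);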

Summary
When tuning a database, a Greenplum DBA typically only needs to worry about skew rather than tuning individual queries as in other databases. Once the tables are created with good distribution keys, the tuning job is 90% or more done. If you think one particular table isn’t distributed well, use Method 1. If you need to validate an entire database quickly, use Method 2.

Autonomous Transactions for Logging v2

Here is another way to do autonomous transactions for logging, this one tolerant of a segment host failure.

Step 1

DROP EXTERNAL TABLE IF EXISTS error_log;

CREATE WRITABLE EXTERNAL WEB TABLE error_log
(
function_name text,
location int,
error_message text
)
EXECUTE '/data/log_error.sh'
FORMAT 'CSV' DISTRIBUTED RANDOMLY;

Step 2
Create /data/log_error.sh on every segment host and be sure to chmod 755 the file once created.

#!/bin/bash
# Read one CSV row from stdin and insert it into my_errors on the master (mdw)
read in

function_name=$(echo "$in" | awk '{split($0,arr,",")} END{print arr[1]}')
location=$(echo "$in" | awk '{split($0,arr,",")} END{print arr[2]}')
error_message=$(echo "$in" | awk '{split($0,arr,",")} END{print arr[3]}')

psql -h mdw -c "INSERT INTO my_errors VALUES ('$function_name', $location, '$error_message')"

Note: I’ll admit that there is probably a more efficient way to parse $in but this works.
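
One possible simplification, with the same limitation that commas inside the quoted error message aren’t handled, is to let read do the splitting:

#!/bin/bash
# Split the incoming CSV row on commas directly into three variables
IFS=',' read -r function_name location error_message

psql -h mdw -c "INSERT INTO my_errors VALUES ('$function_name', $location, '$error_message')"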

Step 3
Create the my_errors table.

DROP TABLE IF EXISTS my_errors;

CREATE TABLE my_errors
(
function_name text,
location int,
error_message text
)
DISTRIBUTED RANDOMLY;

Step 4
Insert some data into the error_log external web table; the write isn’t influenced by the transaction.

BEGIN;

INSERT INTO error_log VALUES ('fn_build_mart', 7500, 'ERROR:  division by zero');

ROLLBACK;

Step 5
Verify that the data landed in the my_errors table even though the transaction was rolled back. The ROLLBACK can’t undo it because the external web table handed the row to log_error.sh, which inserted it over its own psql connection.

SELECT * FROM my_errors;