Dropping Corrupt Tables in Greenplum

Greenplum is a Massively Parallel Processing database which means the data is distributed across multiple segments on multiple hosts. Each segment process stores a catalog of objects and it is possible to get this out of sync because of hardware failures or killing processes. For example, if you are vacuuming a table and then decide to kill the process on the Master host, the segments may not get the signal properly and rollback the transaction. This can happen when using “kill -9” rather than just “kill” to stop a process.

You may get this error message if a table is corrupt:

gpadmin=# select * from corrupt;
ERROR:  relation with OID 109802 does not exist  (seg0 slice1 gpdbsne:40000 pid=78502)

So you decide to drop the table and start over but that doesn’t work either.

gpadmin=# drop table corrupt;
ERROR:  table "corrupt" does not exist  (seg0 gpdbsne:40000 pid=78502)

The easy fix for this is to use “if exists” in the drop command.

gpadmin=# drop table if exists corrupt;
DROP TABLE
Time: 10.423 ms

Now the table is gone and you can recreate it. Next time, don’t use kill -9 and don’t kill vacuum commands.

More Details
If you are wondering how I managed to corrupt this table, here are the steps.
1. Create the table

gpadmin=# create table corrupt (id int, foo text, bar text) distributed by (id);
CREATE TABLE
Time: 31.195 ms

2. Insert some data

insert into corrupt select i as id, 'foo_' || i, 'bar_' || i from generate_series(1,100) as i;
INSERT 0 100
Time: 46.419 ms
gpadmin=# analyze corrupt;
ANALYZE
Time: 108.193 ms
gpadmin=# \q

3. Connect to a segment directly. I’m using a single node for this test.

[gpadmin@gpdbsne ~]$ ps -ef | grep postgres | grep M
gpadmin   78213      1  0 09:49 ?        00:00:00 /usr/local/greenplum-db-4.3.13.0/bin/postgres -D /data/primary/gpseg1 -p 40001 -b 3 -z 2 --silent-mode=true -i -M mirrorless -C 1
gpadmin   78214      1  0 09:49 ?        00:00:00 /usr/local/greenplum-db-4.3.13.0/bin/postgres -D /data/primary/gpseg0 -p 40000 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0
gpadmin   78239      1  0 09:49 ?        00:00:00 /usr/local/greenplum-db-4.3.13.0/bin/postgres -D /data/master/gpseg-1 -p 5432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E

I see that the segments are using ports 40,000 and 40,001 while the master is using 5432. I’ll connect to seg0 on port 40,000 and then drop the table. This only drops the object in this one segment. I’ll also do a quick query to make sure I only see half of the data because I have 2 segments.

PGOPTIONS='-c gp_session_role=utility' psql -p 40000
psql (8.2.15)
Type "help" for help.

gpadmin=# select count(*) from corrupt;
 count 
-------
    50
(1 row)

Time: 3.485 ms

4. Corrupt the database by dropping the table only on one segment.

drop table corrupt;
DROP TABLE
Time: 9.410 ms
gpadmin=# \q

5. Confirm that the table is corrupt.

[gpadmin@gpdbsne ~]$ psql
Timing is on.
psql (8.2.15)
Type "help" for help.

gpadmin=# select * from corrupt;
ERROR:  relation with OID 109802 does not exist  (seg0 slice1 gpdbsne:40000 pid=78502)

And again, the fix:

gpadmin=# drop table if exists corrupt;
DROP TABLE
Time: 10.423 ms

Leave a Reply

Your email address will not be published. Required fields are marked *