Greenplum Single Node Installation

Step 1
Download a CentOS 6 VM from http://virtual-machine.org/.

Step 2
Download the latest Greenplum binaries for Red Hat Enterprise Linux 6 from http://network.pivotal.io (GPDB Download).

Step 3
Start the virtual machine with VMware Fusion or something similar.
Memory: 8GB
Cores: 4
Disk: 50GB
You can use less memory and fewer cores, but the more you give the VM, the better it will perform. You might have to expand the VM disk space when using the VirtualMachine.org VM.

Step 4
Configure the operating system.

hostname gpdbsne

Add the hostname to /etc/sysconfig/network too.
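
On CentOS 6 that means the HOSTNAME line in /etc/sysconfig/network should read:

HOSTNAME=gpdbsne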

Turn off the firewall and disable SELinux.

chkconfig iptables off
service iptables stop
setenforce 0    # same effect as: echo 0 > /selinux/enforce
vi /etc/selinux/config    # set SELINUX=disabled so the change survives a reboot
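
You can confirm SELinux is off with getenforce; it should report Permissive now and Disabled after the reboot at the end of this step.

getenforce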

Edit the /etc/hosts file.

echo "127.0.0.1 gpdbsne gpdbsne.localdomain" >> /etc/hosts

I also like to get the IP address of this host and add it to the /etc/hosts file on my local machine.

ifconfig
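
For example, if ifconfig reports 192.168.1.50 (a placeholder address; yours will differ), run this on your workstation:

echo "192.168.1.50 gpdbsne" | sudo tee -a /etc/hosts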

Install ntp, unzip, openssh-clients, and ed, then enable and start the NTP service.

yum install -y ntp unzip openssh-clients ed
chkconfig ntpd on
ntpdate pool.ntp.org
/etc/init.d/ntpd start

Add the following to the end of your /etc/sysctl.conf file.

kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 0
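
These settings are read at boot, but if you want them applied immediately rather than waiting for the reboot at the end of this step, load the file with sysctl:

sysctl -p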

Add this to your /etc/security/limits.conf file.

* soft nofile 65536
* hard nofile 65536
* soft nproc 131072
* hard nproc 131072

Remove all lines from /etc/security/limits.d/90-nproc.conf

echo "" > /etc/security/limits.d/90-nproc.conf

There are other configuration changes you would make on a real cluster, involving the XFS filesystem, but for an SNE these can be skipped. This setup is intended for development and testing purposes only.

Restart the VM so these changes take effect.

shutdown -r now

Step 5
Copy the installer to the VM.

scp greenplum-db-4.3.6.2-build-1-RHEL5-x86_64.zip root@gpdbsne:/root/

Step 6
ssh to the VM and run the installer.

ssh root@gpdbsne
unzip greenplum-db-4.3.6.2-build-1-RHEL5-x86_64.zip
/bin/bash greenplum-db-4.3.6.2-build-1-RHEL5-x86_64.bin

--Accept the license agreement
--Accept the default installation directory

Step 7
For a multi-node cluster, the next step would be to use gpseginstall, but this isn’t needed for a single node installation. Instead, you have to manually create the gpadmin account and get the host ready for the next step.

useradd gpadmin
passwd gpadmin
chown -R gpadmin:gpadmin /usr/local/greenplum-db-4.3.6.2/
mkdir -p /data/master
mkdir /data/primary
chown -R gpadmin:gpadmin /data
su - gpadmin
echo "source /usr/local/greenplum-db/greenplum_path.sh" >> .bashrc
echo "export MASTER_DATA_DIRECTORY=/data/master/gpseg-1" >> .bashrc
source .bashrc
echo "gpdbsne" > hostfile
gpssh-exkeys -f hostfile
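
To verify that passwordless ssh is working, run a quick command across the hostfile with gpssh (installed alongside gpssh-exkeys):

gpssh -f hostfile -e 'hostname'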

Step 8
Create an initialization file called gp_init_config so you can initialize the database.

ARRAY_NAME="Greenplum"
MACHINE_LIST_FILE=./hostfile
SEG_PREFIX=gpseg
PORT_BASE=50000
declare -a DATA_DIRECTORY=(/data/primary /data/primary )
MASTER_HOSTNAME=gpdbsne
MASTER_DIRECTORY=/data/master
MASTER_PORT=5432
TRUSTED_SHELL=ssh
CHECK_POINT_SEGMENTS=8
ENCODING=UNICODE
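
Each entry in DATA_DIRECTORY becomes one primary segment on the host, so the file above creates two segments. For example, to run four segments instead (assuming the VM has the memory and cores for it), the only change would be:

declare -a DATA_DIRECTORY=(/data/primary /data/primary /data/primary /data/primary)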

Step 9
Initialize the database.

gpinitsystem -c ~/gp_init_config
--select Y to continue
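
Once initialization finishes, you can sanity-check the new cluster with gpstate, another utility shipped with the Greenplum binaries:

gpstate -s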

Now create the default database and configure it to allow external connections.

psql -c "create database gpadmin" template1
psql -c "alter user gpadmin password 'changeme'"
echo "host all all 0.0.0.0/0 md5" >> /data/master/gpseg-1/pg_hba.conf
gpstop -u
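
gpstop -u reloads the configuration files, including pg_hba.conf, without restarting the database. You can then test the gpadmin login with psql (use the changeme password if prompted; local connections may already be trusted by the entries gpinitsystem generated):

psql -h gpdbsne -U gpadmin -c "select version();"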

Complete!
Now you can connect to Greenplum with the gpadmin account and the password is changeme. The default port is 5432. A great client tool is pgAdmin III v1.14.3.
http://www.postgresql.org/ftp/pgadmin3/release/v1.14.3/

5 thoughts on “Greenplum Single Node Installation”

  1. Suraj

    I have 3 VMs and am planning to use one as the master and 2 as segment hosts.
    What additional steps would be required to install Greenplum on a multi-node cluster?

    1. Jon Post author

      The install guide has detailed instructions on how to install a multi-node cluster but it doesn’t cover a single node installation. That is why I created this blog post to cover just a single node.

      So look at the install guide but here is a quick list of things you’ll need to change:
      – Follow the XFS recommendations in the install guide.
      – Step 4 needs to be done on all nodes.
      – The /etc/hosts file should have the real IP address, not 127.0.0.1 and it should contain all hosts in the cluster.
      – The /data/primary directories need to be created on the segment hosts and /data/master on the master node.
      – The hostfile in Step 7 should contain all segment hosts in the cluster.
      – You may want to change step 8 to have more data directories. One directory equals one segment per host.
      – Step 9 is run on the master only.

  2. Suraj

    I’m getting the error below when executing this command during installation:
    gpssh-exkeys -f hostfile

    Traceback (most recent call last):
    File "/usr/local/greenplum-db/./bin/gpssh-exkeys", line 525, in
    (primary, aliases, ipaddrs) = socket.gethostbyaddr(hostname)
    socket.gaierror: [Errno -2] Name or service not known

    Below is the content of /etc/hosts on the master machine:

    127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

    I’m installing on a cluster having 2 nodes.
    hostfile content:
    -master
    -segment1
    -segment2

    1. Jon Post author

      I worked with Suraj offline to resolve the issue. The problem was name resolution. In general, make sure all of the nodes can resolve and communicate with one another. The best way to do that is to create one hosts file, place it in /etc/ on every node, and use it everywhere.
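
      For example, a shared /etc/hosts for the three hosts above might look like this (the addresses are placeholders):

      192.168.1.10 master
      192.168.1.11 segment1
      192.168.1.12 segment2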

