Here are some tips for configuring Pivotal HDB (based on Apache HAWQ) with Ambari.
A Hadoop cluster is typically configured with JBOD (just a bunch of disks), so utilize all of the data disks for temp space.
Here is an example of the “HAWQ Master Temp Directories” entry when the Master and Standby nodes each have 8 disks:
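A sketch of what that comma-separated entry could look like, assuming the eight disks are mounted at /data1 through /data8 (adjust the paths to match your actual mount points):

```
/data1/hawq/mastertmp,/data2/hawq/mastertmp,/data3/hawq/mastertmp,/data4/hawq/mastertmp,/data5/hawq/mastertmp,/data6/hawq/mastertmp,/data7/hawq/mastertmp,/data8/hawq/mastertmp
```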
Here is an example of the “HAWQ Segment Temp Directories” entry when each Data Node has 8 disks:
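A sketch of the corresponding segment entry, again assuming mount points /data1 through /data8 (an assumption; use your own paths):

```
/data1/hawq/segmenttmp,/data2/hawq/segmenttmp,/data3/hawq/segmenttmp,/data4/hawq/segmenttmp,/data5/hawq/segmenttmp,/data6/hawq/segmenttmp,/data7/hawq/segmenttmp,/data8/hawq/segmenttmp
```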
Set vm.overcommit_memory to 2 so the kernel uses strict overcommit accounting.
VM Overcommit Ratio (based on RAM per node):
2GB – 64GB: set the overcommit ratio to 50
>= 64GB of RAM: set the overcommit ratio to 100
Swap space (based on RAM per node):
2GB – 8GB: set swap space equal to RAM
8GB – 64GB: set swap space to 0.5 * RAM
>= 64GB: set swap space to 4GB
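The kernel settings above can be applied with sysctl on every node; a minimal sketch, assuming a host with 64GB of RAM or more (overcommit ratio 100):

```shell
# Persist the settings across reboots, then load them now
echo "vm.overcommit_memory = 2" >> /etc/sysctl.conf
echo "vm.overcommit_ratio = 100" >> /etc/sysctl.conf
sysctl -p
```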
Segment Memory Usage Limit
Step 1: Calculate total memory: (RAM * overcommit_ratio / 100) + swap
Step 2: Calculate total memory used by other activities (2GB for OS, 2GB for Data Node, 2GB for Node Manager, 1GB for PXF)
Step 3: Subtract other memory from total memory to get the value for the Segment Memory Usage Limit
Example: 256GB RAM, 4GB swap, Overcommit Ratio: 100
Using Yarn: ((256 * 1) + 4) – 7 = 253
Using Default Resource Manager: (256 * 1) – 7 = 249
Example: 64GB RAM, 32GB swap, Overcommit Ratio: 50
Using Yarn: ((64 * 0.5) + 32) – 7 = 57
Using Default Resource Manager: (64 * 0.5) – 7 = 25
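The three steps can be sketched in shell arithmetic; the numbers below reproduce the first example (256GB RAM, overcommit ratio 100, 4GB swap, using Yarn):

```shell
RAM_GB=256            # physical RAM
OVERCOMMIT_RATIO=100  # vm.overcommit_ratio
SWAP_GB=4             # swap space (added when using Yarn)
OTHER_GB=7            # 2 OS + 2 Data Node + 2 Node Manager + 1 PXF

# Step 1: total memory
TOTAL_GB=$(( RAM_GB * OVERCOMMIT_RATIO / 100 + SWAP_GB ))
# Steps 2-3: subtract memory used by other activities
SEGMENT_LIMIT_GB=$(( TOTAL_GB - OTHER_GB ))
echo "${SEGMENT_LIMIT_GB}GB"
```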
ipc.client.connect.timeout = 300000
ipc.client.connection.maxidletime = 3600000
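These two IPC settings belong in HDFS's core-site.xml (set them through the Ambari HDFS configs rather than editing the file by hand); the resulting entries look roughly like this:

```xml
<property>
  <name>ipc.client.connect.timeout</name>
  <value>300000</value>
</property>
<property>
  <name>ipc.client.connection.maxidletime</name>
  <value>3600000</value>
</property>
```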
Optional HAWQ hawq-site.xml
hawq_rm_stmt_vseg_memory = 1gb
By default, this is set to 128mb which is great for a high level of concurrency. If you need to utilize more memory in the cluster for each query, you can increase this value considerably. Here are the acceptable values:
128mb, 256mb, 512mb, 1gb, 2gb, 4gb, 8gb, 16gb
Alternatively, you can set this at the session level instead of the entire database.
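For example, to raise the value for a single session only (the 2gb value here is just an illustration; pick any value from the list above):

```sql
-- Applies only to the current session; other sessions keep the server-wide setting
SET hawq_rm_stmt_vseg_memory = '2gb';
```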
Operating System gpadmin account
Log into the Master and Standby nodes and execute the following:
echo "source /usr/local/hawq/greenplum_path.sh" >> ~/.bashrc
Now set the database password. Below, I am using ‘password’ as the password so set this based on your organization’s password policy. By default, gpadmin doesn’t have a password set at all.
psql -c "alter user gpadmin password 'password'"
Enable encrypted password authentication. This assumes you are using the default /data/hawq/master path. Adjust if needed. This allows you to connect to the database remotely with an encrypted password.
echo "host all all 0.0.0.0/0 md5" >> /data/hawq/master/pg_hba.conf
hawq stop cluster -u -a