How use GNU parallel and GNU SQL with \copy in PostgreSQL

parallel-processing postgresql sqlbulkcopy

Question

I want to do a bulk load to a PostgreSQL database, there are several files and are pretty big. I just read in Using GNU Parallel With split about the GNU Parallel and GNU SQL, and It looks fantastic, Could some one help me with an example of using GNU Parallel, GNU SQL and \copy or COPY for doing a bulk load to PostgreSQL?

Popular Answer

Both GNU parallel and pg_bulkload are cool, but not available in most default installations. I think this task can be achieved by using '&' (background sub-shell) and 'wait' in a shell script, which invokes multi \COPY operations simultaneously. Here is an example:

#!/bin/bash

PG_USER_NAME=bizusr
PG_DB_NAME=bizdb
BCP_DIR=/data/biz/bcp/input

do_bcp()
{
    TABLE_NAME=$1
    echo "`date` $$ copy $TABLE_NAME begin"
    psql -q -U $PG_USER_NAME -d $PG_DB_NAME << EOF
-- SET DATESTYLE TO 'ISO,YMD'; -- you may need this when dealing with timestamps
\COPY $TABLE_NAME FROM '${BCP_DIR}/${TABLE_NAME}.bcp' WITH (FORMAT CSV, DELIMITER '|');
EOF
    echo "`date` $$ copy $TABLE_NAME done"
}

echo "`date` $$ parallel copy started"
for BCP_FILE in `ls ${BCP_DIR}/*.bcp`; do
    TABLE_NAME=`echo $BCP_FILE|awk -F"/" '{print $NF}'|sed -e s/\.bcp$//`
    do_bcp $TABLE_NAME &
done

wait
echo "`date` $$ parallel copy finished"


Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow
Is this KB legal? Yes, learn why