I want to do a bulk load into a PostgreSQL database; there are several files and they are pretty big. I just read in Using GNU Parallel With split about GNU Parallel and GNU SQL, and they look fantastic. Could someone help me with an example of using GNU Parallel, GNU SQL, and \copy or COPY for doing a bulk load into PostgreSQL?
Both GNU parallel and pg_bulkload are cool, but neither is available in most default installations. I think this task can be achieved by using '&' (background sub-shells) and 'wait' in a shell script, which runs multiple \COPY operations simultaneously. Here is an example:
#!/bin/bash
PG_USER_NAME=bizusr
PG_DB_NAME=bizdb
BCP_DIR=/data/biz/bcp/input

# Load one table from its .bcp file via psql's client-side \COPY.
do_bcp()
{
    TABLE_NAME=$1
    echo "$(date) $$ copy $TABLE_NAME begin"
    psql -q -U "$PG_USER_NAME" -d "$PG_DB_NAME" << EOF
-- SET DATESTYLE TO 'ISO,YMD'; -- you may need this when dealing with timestamps
\COPY $TABLE_NAME FROM '${BCP_DIR}/${TABLE_NAME}.bcp' WITH (FORMAT CSV, DELIMITER '|');
EOF
    echo "$(date) $$ copy $TABLE_NAME done"
}

echo "$(date) $$ parallel copy started"
# One background \COPY per file; the table name is the file name minus .bcp.
for BCP_FILE in "${BCP_DIR}"/*.bcp; do
    TABLE_NAME=$(basename "$BCP_FILE" .bcp)
    do_bcp "$TABLE_NAME" &
done
wait    # block until every background \COPY has finished
echo "$(date) $$ parallel copy finished"