rwxr--r--
/dev/blog

Bez Hermoso, Software Engineer @ Square

I had a CSV of bank transactions and I wanted to append a running total on each entry. Here is how I achieved it using only awk.

This is what the CSV file looks like:

Posted Date,Reference Number,Payee,Address,Amount
04/25/2017,<TRANSACTION ID>,"SEAFOOD CITY SUPERM SOUTH SAN FRACA","SOUTH SAN FRA CA ",-1.77
04/25/2017,<TRANSACTION ID>,"SAFEWAY STORE 00028787 MILBRAE CA","MILBRAE       CA ",-5.67
04/24/2017,<TRANSACTION ID>,"SQ *EGGETTES-MILLBRAE Millbrae CA","Millbrae      CA ",-23.00
04/24/2017,<TRANSACTION ID>,"SEAFOOD CITY SUPERM SOUTH SAN FRACA","SOUTH SAN FRA CA ",-23.13
04/24/2017,<TRANSACTION ID>,"WALGREENS #2939 SAN BRUNO CA","SAN BRUNO     CA ",-2.04
04/24/2017,<TRANSACTION ID>,"REDBOX *DVD RENTAL 866-733-2693 IL","866-733-2693  IL ",-5.44

...

Here is the command:

 
$ awk -F, 'BEGIN { sum=0 } NR>1 { sum+=$5; print $0 "," sum }' transactions.csv \
  | tee transactions-with-running-sum.csv \
  | column -s'","' -t

We pass awk’s output to tee so that we can stream the output to both a file and standard output. Here, the altered CSV data is saved as transactions-with-running-sum.csv, while we display a better-looking version to our terminal:

 
04/25/2017  <TRANSACTION_ID>  SEAFOOD CITY SUPERM SOUTH SAN FRACA       SOUTH SAN FRA CA   -1.77    -1.77
04/25/2017  <TRANSACTION_ID>  SAFEWAY STORE 00028787 MILBRAE CA         MILBRAE       CA   -5.67    -7.44
04/24/2017  <TRANSACTION_ID>  SQ *EGGETTES-MILLBRAE Millbrae CA         Millbrae      CA   -23.00   -30.44
04/24/2017  <TRANSACTION_ID>  SEAFOOD CITY SUPERM SOUTH SAN FRACA       SOUTH SAN FRA CA   -23.13   -53.57
...

See man awk, man tee, and man column

comments powered by Disqus