Giving zcat more CPUs

Troubleshooting for me generally starts with parsing logs, and what better way to do that than with a combination of zcat and grep? Here is my most recent example:

zcat messages*.gz | grep "Username = whoisit" | grep "Duration:" >/tmp/matches

This works well on old single-CPU servers, but on a multi-CPU, hyperthreaded server zcat still uses only one CPU, which it pegs at 100%.

top - 12:44:33 up 110 days, 20:03,  2 users,  load average: 1.06, 1.17, 1.16
Tasks: 169 total,   2 running, 167 sleeping,   0 stopped,   0 zombie
Cpu(s): 17.2%us,  2.6%sy,  0.0%ni, 76.6%id,  3.6%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1921968k total,  1846908k used,    75060k free,     5304k buffers
Swap:  4193272k total,        0k used,  4193272k free,  1659364k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
29319 root      20   0  4428  616  432 R 99.7  0.0   0:10.84 gzip
29320 root      20   0  100m  892  748 S 19.3  0.0   0:02.06 grep

A number of web pages suggest using pigz. However, if you look closely at its documentation, it can parallelize only compression. For decompression it is of little help, as the documentation explains:

Decompression can’t be parallelized, at least not without specially
prepared deflate streams for that purpose. As a result, pigz uses a
single thread (the main thread) for decompression, but will create
three other threads for reading, writing, and check calculation,
which can speed up decompression under some circumstances. Parallel
decompression can be turned off by specifying one process ( -dp 1 or
-tp 1 ).

This is where GNU Parallel steps in. It's basically like running a “for” loop that assigns one file to each process, each on its own CPU.
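
To make the “for” loop comparison concrete, here is a minimal sketch of the same idea in plain shell, using background jobs and wait. It is only an illustration, not how GNU Parallel works internally. The function name grep_logs_bg and the temporary directory layout are my own assumptions, and unlike GNU Parallel this version starts every job at once rather than limiting them to the number of CPUs.

```shell
#!/bin/sh
# Sketch of "a for loop with one process per file" in plain shell.
# grep_logs_bg is a hypothetical helper name used only for illustration.
grep_logs_bg() {
    # $1 = directory holding the compressed logs
    tmpdir=$(mktemp -d) || return 1
    i=0
    for f in "$1"/messages*.gz; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        i=$((i + 1))
        # Each pipeline runs in the background as its own set of processes,
        # so the kernel is free to schedule it on any idle CPU.
        zcat "$f" | grep "Username = whoisit" | grep "Duration:" \
            > "$tmpdir/$(printf '%06d' "$i").out" &
    done
    wait                                 # block until every background job is done
    # Zero-padded names make the glob return results in input order.
    cat "$tmpdir"/*.out 2>/dev/null
    rm -rf "$tmpdir"
}
```

Calling grep_logs_bg /var/log >/tmp/matches would then collect the matches from every messages*.gz file in that directory, with /var/log standing in for wherever your logs live.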

You can install GNU Parallel as shown below; otherwise, you can select from a number of excuses for not installing GNU Parallel.

(wget pi.dk/3 -qO - ||  curl pi.dk/3/) | bash

This requires that you allow outbound TCP port 11371 (the PGP keyserver port) on your internet connection so that the installer can check the signature on the download.

Once GNU Parallel is installed, your command becomes:

ls mess*.gz | parallel -k 'zcat {}|grep "Username = whoisit" | grep "Duration:"' >/tmp/match
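
If you would rather not pipe ls into parallel, or want to leave some CPUs free, a variant along these lines may help. It is a sketch under assumptions: search_logs is a hypothetical wrapper name of mine, and the job count is whatever you pass in. The -j, -k and ::: options are standard GNU Parallel features, but check them against the version you have installed.

```shell
#!/bin/sh
# Sketch: the same search, wrapped in a function with a job limit.
# search_logs is a hypothetical helper name, not part of GNU Parallel.
search_logs() {
    # $1 = directory holding the .gz logs, $2 = maximum simultaneous jobs
    if ! command -v parallel >/dev/null 2>&1; then
        echo "GNU Parallel is not installed" >&2
        return 127
    fi
    # ::: hands the file names to parallel as arguments, so ls is not needed.
    # -j caps the number of jobs; -k keeps the output in input order.
    parallel -j "$2" -k \
        'zcat {} | grep "Username = whoisit" | grep "Duration:"' \
        ::: "$1"/mess*.gz
}
```

For example, search_logs /var/log 4 >/tmp/match would run at most four zcat pipelines at a time, again with /var/log as a placeholder path.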

Now when you look at top, it's got the awesome goodness of 6 CPUs at work.

top - 13:49:49 up 110 days, 21:08,  2 users,  load average: 7.70, 7.30, 4.98
Tasks: 186 total,   8 running, 178 sleeping,   0 stopped,   0 zombie
Cpu(s): 91.0%us,  7.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  1.2%si,  0.0%st
Mem:   1921968k total,  1847476k used,    74492k free,     3400k buffers
Swap:  4193272k total,      176k used,  4193096k free,  1650776k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31654 root      20   0  4424  608  432 R 89.4  0.0   0:57.74 gzip
31657 root      20   0  4424  608  432 R 89.1  0.0   0:57.16 gzip
31648 root      20   0  4424  608  432 R 88.4  0.0   1:05.91 gzip
31666 root      20   0  4424  612  432 R 88.4  0.0   0:17.95 gzip
31660 root      20   0  4424  608  432 R 87.4  0.0   0:53.46 gzip
31651 root      20   0  4424  608  432 R 87.1  0.0   1:03.67 gzip
31649 root      20   0  100m  896  752 S 10.6  0.0   0:07.86 grep
31655 root      20   0  100m  892  752 S 10.6  0.0   0:06.72 grep
31658 root      20   0  100m  896  752 S 10.6  0.0   0:06.70 grep
31667 root      20   0  100m  896  752 S 10.6  0.0   0:02.10 grep
31661 root      20   0  100m  896  752 S 10.3  0.0   0:06.40 grep
31652 root      20   0  100m  896  752 S 10.0  0.0   0:07.58 grep
