Say, what’s this blog thing here?

Oh yeah, I have a blog. Well, we’ll see what happens, but I need a place to stash a few things that keep getting lost in my bash histories across various servers.

For starters, here’s a quick find line that will get everything in a directory that’s

a) older than a month and
b) not from the 1st of the month

find /directory/subdirectory/* -maxdepth 1 -mtime +30 -ls | awk '$9 != 1 { print $11 }' | parallel -P 500 rm -rf {}

Now, why would you want to do this? I did because one of my DBAs had been doing full nightly backups to our primary storage for years without ever trimming them. So we had terabytes of old (largely unimportant) stuff on relatively expensive and scarce storage. I talked to him, and he asked me to keep the data per the parameters above. Our company retention policy for backups is 30 days, hence the very short window.

So let’s break this down. The find command is a little tricky in and of itself. First you want to do a find across multiple directories, but not go too deep. That’s the /directory/subdirectory/* bit: the shell expands it into one starting point per entry, and -maxdepth 1 lets find look only one level below each of those. Then -mtime +30 keeps only things modified more than 30 days ago, and -ls lists them in ls -l style. That looks like this:

[root@server ~]# find /mnt/mysql/* -maxdepth 1 -mtime +30 -ls
234199799 35514 -rw-r--r-- 1 root root 30984020 May 27 2013 /mnt/mysql/mysql2/clustrix-job_manager-201305270010.tar.gz
234181155 2690 -rw-r--r-- 1 root root 2187282 May 25 2013 /mnt/mysql/mysql2/wip-201305250100.tar.gz
354760343 98 -rw-r--r-- 1 root root 27568 Jun 15 2013 /mnt/mysql/mysql2/clustrix-probes-201306150011.tar.gz
257679475 26 -rw-r--r-- 1 root root 1374 Jun 2 2013 /mnt/mysql/mysql2/clustrix-gbrowse_login-201306020001.tar.gz
395629190 45898 -rw-r--r-- 1 root root 31066606 Jun 22 2013 /mnt/mysql/mysql2/clustrix-job_manager-201306220010.tar.gz
357046308 8874 -rw-r--r-- 1 root root 5904716 Jun 12 2013 /mnt/mysql/mysql2/clustrix-ror-201306120009.tar.gz
259828901 26 -rw-r--r-- 1 root root 6290 Jun 2 2013 /mnt/mysql/mysql2/clustrix-smith_lemur-201306020011.tar.gz
354511053 242 -rw-r--r-- 1 root root 75127 Jun 16 2013 /mnt/mysql/mysql2/clustrix-parts-201306160004.tar.gz
356484133 730 -rw-r--r-- 1 root root 449046 Jun 18 2013 /mnt/mysql/mysql2/clustrix-mad-201306180013.tar.gz
498553872 134570 -rw-r--r-- 1 root root 117480405 Mar 30 2012 /mnt/mysql/wiki-db/wiki-201203300855.tar.gz
3097297952 198146 -rw-r--r-- 1 root root 173105695 Nov 2 2013 /mnt/mysql/wiki-db/wiki-201311020855.tar.gz
244561447 167874 -rw-r--r-- 1 root root 114123533 Feb 25 2012 /mnt/mysql/wiki-db/wiki-201202250855.tar.gz
356159441 190722 -rw-r--r-- 1 root root 166592128 Jul 15 2013 /mnt/mysql/wiki-db/wiki-201307150855.tar.gz
4132816213 188730 -rw-r--r-- 1 root root 128309505 Oct 3 2012 /mnt/mysql/wiki-db/wiki-201210030855.tar.gz
2347163665 253386 -rw-r--r-- 1 root root 172236350 Oct 18 2013 /mnt/mysql/wiki-db/wiki-201310180855.tar.gz
636357288 204698 -rw-r--r-- 1 root root 166829424 Jul 22 2013 /mnt/mysql/wiki-db/wiki-201307220855.tar.gz
3504126126 155002 -rw-r--r-- 1 root root 135339049 Apr 4 2013 /mnt/mysql/wiki-db/wiki-201304042055.tar.gz
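If you want to see the depth behaviour without pointing find at real backups, here is a minimal sketch on a scratch directory. The directory layout and file names (db1, nested, old.tar.gz and so on) are made up for illustration, and it assumes mktemp -d is available.

```shell
# Scratch tree with hypothetical names; nothing here touches real backups.
scratch=$(mktemp -d)
mkdir -p "$scratch/db1/nested/deeper"
touch -t 201305270010 "$scratch/db1/old.tar.gz"                      # far older than 30 days
touch -t 201305270010 "$scratch/db1/nested/deeper/too-deep.tar.gz"   # old, but three levels down
touch "$scratch/db1/new.tar.gz"                                      # modified just now

# The glob makes $scratch/db1 a starting point (depth 0), so -maxdepth 1
# reaches its direct children and nothing under nested/.
found=$(find "$scratch"/* -maxdepth 1 -mtime +30 -print)
echo "$found"

rm -rf "$scratch"
```

Only old.tar.gz should be printed: new.tar.gz is too recent, and too-deep.tar.gz sits below the depth limit.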

At that point, we pipe it into the awk statement, which checks the 9th field (the day of the month) and makes sure it’s not 1 (exactly 1, so the 17th and the 21st still get caught), and then prints the 11th field (the path). So you get something like this:

[root@server ~]# find /mnt/mysql/* -maxdepth 1 -mtime +30 -ls | awk '$9 != 1 { print $11 }'
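To check the filter on its own, here is a small sketch that feeds awk a few canned lines instead of live find output. Two lines are copied from the listing above; the third, dated the 1st, is hypothetical and only there to show something being skipped. The test is written as a plain numeric comparison, $9 != 1, because in awk a bare /1/ is shorthand for matching the whole line, so comparing a field against it really compares against 0 or 1.

```shell
# Two lines from the listing above, plus one hypothetical line dated the 1st.
sample='234199799 35514 -rw-r--r-- 1 root root 30984020 May 27 2013 /mnt/mysql/mysql2/clustrix-job_manager-201305270010.tar.gz
354511053 242 -rw-r--r-- 1 root root 75127 Jun 16 2013 /mnt/mysql/mysql2/clustrix-parts-201306160004.tar.gz
111111111 26 -rw-r--r-- 1 root root 1374 Jun 1 2013 /mnt/mysql/mysql2/hypothetical-201306010001.tar.gz'

# $9 is the day of the month, $11 is the path.
kept=$(printf '%s\n' "$sample" | awk '$9 != 1 { print $11 }')
echo "$kept"
```

The two real paths should come out and the hypothetical one should not. Keep in mind that printing $11 assumes paths without spaces, which holds for these backup names.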

Now, why do that whole /directory/subdirectory/* thing? It’s for parallel. If you’re not familiar with parallel, go ahead and install it and read the man page. Yeah, I’m that kind of Unix admin. Well, sometimes. Basically, it takes lines of input and runs a command against each one, many at a time. It’s quite useful (if dangerous) with rsync, rm, and other commands of that ilk.

So what this one does is run up to 500 jobs at once (the -P 500), each one being the command (rm -rf) against a line of input from the pipeline ({}). I tend to use a very high job count, because what I often run into is that it’ll torch the easy stuff quickly and then chug along on the 2-10 highly nested directories. That way, I don’t have to wait for all the easy (i.e., not very nested, relatively large file) stuff to go through before it starts working on the highly nested tiny file stuff. I’m usually running this on a 16-core machine, for reference. YMMV.

Please note that I do NOT attach that rm -rf {} bit before I’ve tested the first two parts of the command!
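One way to do that testing is a dry run against a scratch directory. This is a rough sketch, not what I ran on the real storage: the file names are invented, the final stage is a plain while loop that only prints what would be removed, and it assumes mktemp -d is available and that find’s -ls output puts the day in field 9 and the path in field 11, as in the listing above.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/db1"
touch -t 201306010001 "$scratch/db1/keep-201306010001.tar.gz"   # the 1st: should survive
touch -t 201306170001 "$scratch/db1/drop-201306170001.tar.gz"   # the 17th: should be listed
touch "$scratch/db1/recent.tar.gz"                              # newer than 30 days

# Same find and awk stages, but the last stage only prints.
plan=$(find "$scratch"/* -maxdepth 1 -mtime +30 -ls |
  awk '$9 != 1 { print $11 }' |
  while IFS= read -r path; do echo "would remove: $path"; done)
echo "$plan"

rm -rf "$scratch"
```

Once the printed list looks right, the echo stage is the part that gets swapped for the real parallel rm -rf {}.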



About kcarlile
Twitter: @overclockdlemon
