Breaking Down the Monster III

So, finishing this off.

It-sa bunch-a case lines!

Write first:

 

echo $1 $2 "filesize: "$3 "totalsize: "$4"G" "filesperdir: "$5
case $1 in
	write)
        if [ $2 = scality ]; then
            filecount=$totfilecount
            time scalitywrite
            exit 0
        fi
        

So if it’s a Scality (or other pure object storage), it’s simple. Just run the write and time it, which will output the info you need. OTHERWISE…

#Chunk file groups into folders if count is too high
	if [ $totfilecount -ge 10000 ]; then
	    for dir in `seq 1 $foldercount`; do
	        createdir $fspath/$dir
	    done
	    time for dir in `seq 1 $foldercount`; do
	        path=$fspath/$dir
		filecount=$(( $totfilecount / $foldercount ))
	        writefiles
	    done
	else
	    path=$fspath
            createdir $path
            filecount=$totfilecount
            time writefiles
	fi
	;;

 

Do what the comment says: chunk the files into folders, since when you write to a filesystem, the number of files per directory makes a big difference. Make sure you create the directories before you try to write to them, and then time how long it takes to write all of them. If the total file count is below the critical number, just write the files and time that.
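To make the chunking concrete, here's the arithmetic with the defaults that get set further down (52428800 KB test set, 5120 files per directory) and a 1M file size:

# filesize     = 1024 KB (from the 1M argument)
# totfilecount = 52428800 / 1024 = 51200 files
# foldercount  = 51200 / 5120 = 10 directories
# so 5120 files land in each of $fspath/1 through $fspath/10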

Neeeext….

 

read) #in order read
	sync; echo 1 > /proc/sys/vm/drop_caches
        if [ $2 = scality ]; then
            filecount=$totfilecount
            time scalityread
            exit 0
        fi
	if [ $totfilecount -ge 10000 ]; then
		time for dir in `seq 1 $foldercount`; do
			path=$fspath/$dir
			filecount=$(( $totfilecount / $foldercount ))
			readfiles
		done
	else
		path=$fspath
		filecount=$totfilecount
		time readfiles
	fi
	;;

That sync line is how you clear the filesystem cache (as root) on a Linux system. This is important for benchmarking, because let me tell you, 6.4GB/sec is not a speed that most network storage systems can reach. Again, we split it and time all of the reads, or we just straight up time the reads if the file count is low enough. This routine reads files in the order they were written.
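For reference, these are the values you can echo into drop_caches (as root; the sync first flushes dirty pages so there's nothing left to write back):

sync
echo 1 > /proc/sys/vm/drop_caches   # drop the page cache (cached file data)
echo 2 > /proc/sys/vm/drop_caches   # drop dentries and inodes
echo 3 > /proc/sys/vm/drop_caches   # drop both

The script only needs 1, since it's cached file data that would inflate the read numbers.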

 

	rm) #serial remove files
        if [ $2 = scality ]; then
            time for i in `seq 1 $totfilecount`; do
                curl -s -X DELETE http://localhost:81/proxy/bparc/$fspath/$i-$suffix > /dev/null
            done
            exit 0
        fi
		if [ $totfilecount -ge 10000 ]; then
			time for i in `seq 1 $foldercount`; do
				rm -f $fspath/$i/*-$suffix
				rmdir $fspath/$i
			done
		elif [ -d $fspath/$3 ]; then 
			time rm -f $fspath/*-$suffix
		fi
	;;

Similar to the other two routines: if it's object-based storage, do something completely different; otherwise remove based on the file path and the count of files.

 

	parrm) #parallel remove files
		time ls $fspath | parallel -N 64 rm -rf $fspath/{}
	;;

This one is remarkably simple. Just run parallel against an ls of the top-level directory and hand each entry to rm -rf. The {} is where parallel drops each line it reads from stdin. Strictly speaking, -N 64 sets the maximum number of arguments per rm invocation rather than a thread count; the number of simultaneous jobs is controlled with -j or -P, and defaults to one per CPU core.
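Here's a toy illustration of those flags outside the script (nothing real gets removed, it just echoes):

# {} is replaced with each input line read from stdin
# -j (or -P) sets how many jobs run at once; the default is one per CPU core
# -N sets the maximum number of arguments handed to each invocation
seq 1 8 | parallel -j 4 echo "pretend rm of item {}"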

 

This one’s kind of neat:

	shufread) #shuffled read
		sync; echo 1 > /proc/sys/vm/drop_caches
		if [ $totfilecount -ge 10000 ]; then
			folderarray=(`shuf -i 1-$foldercount`)
			time for dir in ${folderarray[*]}; do
				path=$fspath/$dir
				filecount=$(( $totfilecount / $foldercount ))
				shufreadfiles
			done
		else
			path=$fspath
			filecount=$totfilecount
			time shufreadfiles
		fi
	;;
	

I needed a way to do random reads over the files I'd written, in order to simulate that on filesystems with little caching (i.e., make the drives do a lot of random seeks).


At first, I tried writing the file paths to a file, then reading that, but that has waaaay too much latency when you're doing performance testing. So, after some digging, I found the shuf command, which shuffles a list; the -i flag has it generate and shuffle a numeric range for you. I tossed the result into an array, and then it proceeds like the read section.
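Here's the idea in isolation:

# shuf -i LO-HI prints the integers LO through HI in random order
folderarray=(`shuf -i 1-5`)   # e.g. 3 1 5 2 4
for dir in ${folderarray[*]}; do
	echo "would read directory $dir"
done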

 

	*) usage && exit 1;;
esac
echo '------------------------'

Fairly self-explanatory. I tossed in an echo with some characters to keep the output clean if you're running the command inside a for loop.
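For example, if the script were saved as ddcompare.sh (the name is just a guess on my part), a wrapper loop might look like this, with the separator closing out each run:

for size in 4K 1M 100M; do
	./ddcompare.sh write tier1 $size
	./ddcompare.sh read tier1 $size
	./ddcompare.sh rm tier1 $size
done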

And that’s it!


Breaking down that monster

Or should I use Beast? No, this isn’t an XtremIO. (sorry, I just got back from EMCWorld 2015. The marketing gobbledygook is still strong in me.)

So, the first part of the script, like many others, is a function (cleverly called usage), followed by the snippet that calls the function:


usage () {
	echo "Command syntax is $(basename $0) [write|read|shufread|rm|parrm] [test|tier1|tier2|gpfs|localscratch|localssd|object]"
        echo "[filesizeG|M|K] [totalsize in GB] (optional) [file count per directory] (optional)"
}

if [ "$#" -lt 3 ]; then
	usage
	exit 1
fi

Not much to see here if you already know what functions are and how they're formatted in bash. Basically, if a name is followed by () { and closed with }, it's a function, and you can call it like a script inside the main script. The code isn't executed until it's called by name. You can even pass it arguments; more on that later.
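If you've never seen one, a minimal function looks like this:

greet () {
	echo "hello, $1"   # $1 here is the function's first argument, not the script's
}
greet world   # prints "hello, world"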

Next, we come to a case block:


case $2 in
	test) fspath=/mnt/dmtest/scicomp/scicompsys/ddcompare/$3 ;;
	tier1) fspath=/mnt/node-64-dm11/ddcompare/$3 ;;
	tier2) fspath=/mnt/node-64-tier2/ddcompare/$3 ;;
	gpfs) fspath=/gpfs1/nlsata/ddcompare/$3 ;;
        localscratch) fspath=/scratch/carlilek/ddcompare/$3 ;;
        localssd) fspath=/ssd/ddcompare/$3 ;;
        object) fspath=/srttest/ddcompare/$3 ;;
	*) usage && exit 1;;
esac

This checks the second variable and sets the base path to be used in the testing. Note that object will be used differently than the rest, because all of the rest are file storage paths. Object ain’t.

Then, we set the size of the files (or objects) to be written, read, or deleted:


case $3 in
	*G) filesize=$(( 1024 * 1024 * `echo $3 | tr -d G`));;
	*M) filesize=$(( 1024 * `echo $3 | tr -d M` ));;
	*K) filesize=`echo $3 | tr -d K`;;
	*) usage && exit 1;;
esac

Note that I should probably be using the newer command substitution style, $( ), here rather than backticks. I'll get around to it at some point.

The bizarre $(( blah op blah )) setup is how you do math in bash. Really.
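As a quick sketch, the same G-to-KB conversion written with $( ) instead of backticks:

filesize=$(( 1024 * 1024 * $(echo 2G | tr -d G) ))
echo $filesize   # 2097152 (KB, i.e. 2G)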

The next few bits are all prepping how many files to write to a given subdirectory, how big the files are, etc.


#set the suffix for file names
suffix=$3

#set the total size of the test set
if [ ! -z $4 ]; then
	totalsize=$(( 1024 * 1024 * $4 ))
else
	totalsize=52428800 #The size of the test set in kb
fi
	
#set the number of files in subdirectories
if [ ! -z $5 ]; then
	filesperdir=$5
else
	filesperdir=5120 #Number of files per subdirectory for large file counts
fi

#set up variables for dd commands
if [ $filesize -ge 1024 ]; then
	blocksize=1048576
else
	blocksize=$(( $filesize * 1024 ))
fi

#set up variables for subdirectories
totfilecount=$(( $totalsize / $filesize ))
blockcount=$(( $filesize * 1024 / $blocksize ))
if [ $filesperdir -le $totfilecount ]; then
	foldercount=$(( $totfilecount / $filesperdir ))
fi

OK, I’ll get into the meat of the code in my next post. But I’m done now.

The first of several benchmarking scripts

I’m currently a file storage administrator, specializing in EMC Isilon. We have a rather large install (~60 heterogeneous nodes, ~4PB) as well as some smaller systems, an HPC dedicated GPFS filer from DDN, and an object based storage system from Scality. Obviously, all of these things have different performance characteristics, including the differing tiers of Isilon.

I’ve been benchmarking the various systems using the script below. I’ll walk through the various parts of the script. To date, this is probably one of my more ambitious attempts with Bash, and it would probably work better in Python, but I haven’t learned that yet. 😉


#!/bin/bash
usage () {
	echo "Command syntax is $(basename $0) [write|read|shufread|rm|parrm] [test|tier1|tier2|gpfs|localscratch|localssd|object]"
        echo "[filesizeG|M|K] [totalsize in GB] (optional) [file count per directory] (optional)"
}

if [ "$#" -lt 3 ]; then
	usage
	exit 1
fi

#CHANGE THESE PATHS TO FIT YOUR ENVIRONMENT
#set paths
case $2 in
	test) fspath=/mnt/dmtest/scicomp/scicompsys/ddcompare/$3 ;;
	tier1) fspath=/mnt/node-64-dm11/ddcompare/$3 ;;
	tier2) fspath=/mnt/node-64-tier2/ddcompare/$3 ;;
	gpfs) fspath=/gpfs1/nlsata/ddcompare/$3 ;;
        localscratch) fspath=/scratch/carlilek/ddcompare/$3 ;;
        localssd) fspath=/ssd/ddcompare/$3 ;;
        object) fspath=/srttest/ddcompare/$3 ;;
	*) usage && exit 1;;
esac

#some math to get the filesize in kilobytes
case $3 in
	*G) filesize=$(( 1024 * 1024 * `echo $3 | tr -d G`));;
	*M) filesize=$(( 1024 * `echo $3 | tr -d M` ));;
	*K) filesize=`echo $3 | tr -d K`;;
	*) usage && exit 1;;
esac	

#set the suffix for file names
suffix=$3

#set the total size of the test set
if [ ! -z $4 ]; then
	totalsize=$(( 1024 * 1024 * $4 ))
else
	totalsize=52428800 #The size of the test set in kb
fi
	
#set the number of files in subdirectories
if [ ! -z $5 ]; then
	filesperdir=$5
else
	filesperdir=5120 #Number of files per subdirectory for large file counts
fi

#set up variables for dd commands
if [ $filesize -ge 1024 ]; then
	blocksize=1048576
else
	blocksize=$(( $filesize * 1024 ))
fi

#set up variables for subdirectories
totfilecount=$(( $totalsize / $filesize ))
blockcount=$(( $filesize * 1024 / $blocksize ))
if [ $filesperdir -le $totfilecount ]; then
	foldercount=$(( $totfilecount / $filesperdir ))
fi

#debug output
#echo $fspath
#echo filecount $totfilecount
#echo totalsize $totalsize KB
#echo filesize $filesize KB
#echo blockcount $blockcount
#echo blocksize $blocksize bytes

#defines output of time in realtime seconds to one decimal place
TIMEFORMAT=%1R

#creates directory to write to
createdir () {
	if [ ! -d $1 ]; then
		mkdir -p $1
	fi
}

#write test
writefiles () {
	#echo WRITE
	for i in `seq 1 $filecount`; do 
		#echo -n .
		dd if=/dev/zero of=$path/$i-$suffix bs=$blocksize count=$blockcount 2> /dev/null
	done
}

#read test
readfiles () {
	#echo READ
	for i in `seq 1 $filecount`; do 
		#echo -n .
		dd if=$path/$i-$suffix of=/dev/null bs=$blocksize 2> /dev/null
		#dd if=$path/$i-$suffix of=/dev/null bs=$blocksize
	done
}

#shuffled read test
shufreadfiles () {
	#echo SHUFFLE READ
	filearray=(`shuf -i 1-$filecount`)
	for i in ${filearray[*]}; do 
		#echo -n .
		#echo $path/$i-$suffix
		dd if=$path/$i-$suffix of=/dev/null bs=$blocksize 2> /dev/null
		#dd if=$path/$i-$suffix of=/dev/null bs=$blocksize
	done
}

#ObjectWrite
scalitywrite () {
    for i in `seq 1 $filecount`; do
        dd if=/dev/zero bs=$blocksize count=$blockcount 2> /dev/null | curl -s -X PUT http://localhost:81/proxy/bparc$fspath/$i-$suffix -T- > /dev/null
    done
}

#ObjectRead
scalityread () {
    for i in `seq 1 $filecount`; do
        curl -s -X GET http://localhost:81/proxy/bparc/$fspath/$i-$suffix > /dev/null
    done
}

#Do the work based on the work type

echo $1 $2 "filesize: "$3 "totalsize: "$4"G" "filesperdir: "$5
case $1 in
	write) 
        if [ $2 = scality ]; then
            filecount=$totfilecount
            time scalitywrite
            exit 0
        fi
        #Chunk file groups into folders if count is too high
	    if [ $totfilecount -ge 10000 ]; then
			for dir in `seq 1 $foldercount`; do
				createdir $fspath/$dir
			done
			time for dir in `seq 1 $foldercount`; do
				path=$fspath/$dir
				filecount=$(( $totfilecount / $foldercount ))
				writefiles
			done
		else
			path=$fspath
            createdir $path
			filecount=$totfilecount
			time writefiles
		fi
	;;
	read) #in order read
		sync; echo 1 > /proc/sys/vm/drop_caches
        if [ $2 = scality ]; then
            filecount=$totfilecount
            time scalityread
            exit 0
        fi
		if [ $totfilecount -ge 10000 ]; then
			time for dir in `seq 1 $foldercount`; do
				path=$fspath/$dir
				filecount=$(( $totfilecount / $foldercount ))
				readfiles
			done
		else
			path=$fspath
			filecount=$totfilecount
			time readfiles
		fi
	;;
	rm) #serial remove files
        if [ $2 = scality ]; then
            time for i in `seq 1 $totfilecount`; do
                curl -s -X DELETE http://localhost:81/proxy/bparc/$fspath/$i-$suffix > /dev/null
            done
            exit 0
        fi
		if [ $totfilecount -ge 10000 ]; then
			time for i in `seq 1 $foldercount`; do
				rm -f $fspath/$i/*-$suffix
				rmdir $fspath/$i
			done
		elif [ -d $fspath/$3 ]; then 
			time rm -f $fspath/*-$suffix
		fi
	;;
	parrm) #parallel remove files
		time ls $fspath | parallel -N 64 rm -rf $fspath/{}
	;;
	shufread) #shuffled read
		sync; echo 1 > /proc/sys/vm/drop_caches
		if [ $totfilecount -ge 10000 ]; then
			folderarray=(`shuf -i 1-$foldercount`)
			time for dir in ${folderarray[*]}; do
				path=$fspath/$dir
				filecount=$(( $totfilecount / $foldercount ))
				shufreadfiles
			done
		else
			path=$fspath
			filecount=$totfilecount
			time shufreadfiles
		fi
	;;
		
	*) usage && exit 1;;
esac
echo '------------------------'

I’ll break this all down in my next post.

Repairing quotas after you delete them all

As I mentioned in my earlier post, I managed to delete the quotas on one of my Isilon clusters by accident. Still haven’t figured out exactly how it happened, but it happened.

By a happy coincidence, we do dumps of our quota lists on a daily basis (I recommend you do too). The command you could use for this is:

isilon-1# isi quota quotas list --format csv

From there, I cut it down to an output that looks like:

type,path,hard-threshold

and tossed that into a file called quotadata.txt.
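The trimming itself can be whatever you like; something along these lines works, but the field numbers below are placeholders, since the CSV column order depends on your OneFS version, so check the header of your own dump first:

isi quota quotas list --format csv | cut -d, -f1,2,5 > quotadata.txt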

Then I used this script:

#!/bin/bash
OIFS=$IFS
IFS=','
INPUT=./quotadata.txt
[ ! -f $INPUT ] && { echo "$INPUT not found"; exit 1; }
while read TYPE QPATH SIZE
do
	type=$TYPE
	qpath=$QPATH
	size=$SIZE
	if [ -z $size ]; then
		isi quota quotas create $qpath $type
	else
		isi quota quotas create $qpath $type --hard-threshold $size --container=yes
	fi
done <$INPUT
IFS=$OIFS
isi quota quotas list

It threw a bunch of errors about stdin not being a tty, but those are safely ignored (and you could probably fix them with some kind of flag on the isi quota quotas commands).

In any case, that put all my quotas back. At that point, it was a (relatively) simple matter of running the quotascan job. 

As usual with my Isilon posts, this is in regards to OneFS 7.1.0. 

Moving data between quota protected directories on Isilon

Updated version of the script here: https://unscrupulousmodifier.wordpress.com/2015/10/08/moving-data-between-quota-protected-directories-on-isilon-take-ii/

In the current versions of Isilon OneFS, it is impossible to move files and directories between two directories with quotas on them (regardless of the directory quota type; even if it’s advisory, it won’t allow it). This is really annoying, and although I’ve put in a feature request for it, who knows if it will ever be fixed. So I wrote this script that will make a note of the quota location and threshold (if it’s a hard threshold), remove the quota, move the items, and reapply the quotas.

#!/bin/bash

#Tests whether there is a valid path
testexist () {
        if [ ! -r $1 ]; then
                echo "$1 is an invalid path. Please try again."
                exit
        fi
}

#Iterates through path backwards to find most closely related quota
findquota () {
        RIGHTPATH=0
        i=`echo $1 | awk -F'/' '{print NF}'` #define quantity of fields
        while [ $RIGHTPATH -eq 0 ]; do
                QUOTA=`echo $1 | cut -d"/" -f "1-$i"`
                if [ -n "`isi quota list | grep $QUOTA`" ]; then
                        RIGHTPATH=1
                fi
                i=$(($i-1))
        done
        echo $QUOTA
}

testquota () {
        if [ "$1" = "-" ]; then
                echo "No hard directory quota on this directory."
                exit
        fi
}

if [[ $# -ne 2 ]]; then
        #Gets paths from user
        echo "Enter source:"
        read SOURCE
        echo "Enter target:"
        read TARGET
else
        SOURCE=$1
        TARGET=$2
fi

testexist $SOURCE
testexist $TARGET

#Verifies paths with user
echo "Moving $SOURCE to $TARGET. Is this correct? (y/n)"
read ANSWER
if [ $ANSWER != 'y' ] ; then
        exit
fi

#Defines quotas
SOURCEQUOTA=$(findquota $SOURCE)
TARGETQUOTA=$(findquota $TARGET)

#Gets size of hard threshold from quota
SOURCETHRESH=$(isi quota view $SOURCEQUOTA directory | awk -F" : " '$1~/Hard Threshold/ {print $2}')
TARGETTHRESH=$(isi quota view $TARGETQUOTA directory | awk -F" : " '$1~/Hard Threshold/ {print $2}')
testquota $SOURCETHRESH
testquota $TARGETTHRESH

echo $SOURCEQUOTA $SOURCETHRESH
echo $TARGETQUOTA $TARGETTHRESH

isi quota quotas delete --type=directory --path=$SOURCEQUOTA -f
isi quota quotas delete --type=directory --path=$TARGETQUOTA -f

isi quota quotas view $SOURCEQUOTA directory
isi quota quotas view $TARGETQUOTA directory

mv $SOURCE $TARGET

isi quota quotas create $SOURCEQUOTA directory --hard-threshold=$SOURCETHRESH --container=yes
isi quota quotas create $TARGETQUOTA directory --hard-threshold=$TARGETTHRESH --container=yes

isi quota quotas view $SOURCEQUOTA directory
isi quota quotas view $TARGETQUOTA directory

Here’s how I use it:

bash /ifs/data/scripts/qmv /ifs/source/path /ifs/target/path

First I’ve got some functions in there:
testexist (): test if it’s a sane path
findquota (): find the quota info for the given path
testquota (): check if it’s a hard quota. If it’s not, the script fails, because that’s all we use around here. Feel free to fix it up and post something better in the comments.

Then we get to the bit where, if it's not given two arguments for source and target, it asks for them. It then tests that the source and target both exist. Please note that this script expects a fully qualified path to the thing you want to move for the source, and the directory you want to move it into for the target (i.e., source=/ifs/data/somethingdir/something and target=/ifs/data/otherdir, not target=/ifs/data/otherdir/something).

Of course, there’s a bit of error checking you’ll pretty much start ignoring and answering y to all the time…

Then we find the quotas for the directories. What the findquota () function does is it iterates back through the path until it finds an actual quota on it. I think this will break if you have nested quotas, but again, feel free to fix it up and let me know. It’ll then throw out which quota applies. Once it’s found both the quota paths, it saves the hard threshold in a variable. Now we’ve got variables for the source quota directory, the target quota directory, and both of their hard thresholds.
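To see the trimming in action on a made-up path, each pass of the loop lops one component off the end until isi quota list matches something:

echo /ifs/data/projects/subdir | awk -F'/' '{print NF}'   # 5 (the leading / makes an empty first field)
echo /ifs/data/projects/subdir | cut -d"/" -f1-5          # /ifs/data/projects/subdir
echo /ifs/data/projects/subdir | cut -d"/" -f1-4          # /ifs/data/projects
echo /ifs/data/projects/subdir | cut -d"/" -f1-3          # /ifs/data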

From there, it’s an easy move to delete the quotas, move the actual data, and then put the quotas back.

Don't forget to use the --container=yes flag on those isi quota quotas create commands if you don't want to show your end users the entire size of the filesystem.

**** Please note, and I found this after I made this post… if you comment out the echo $QUOTA line in the findquota () function, it kinda breaks the whole script. And then deletes all of your quotas without asking you. So, uh, don’t comment that out. That echo is what populates the $SOURCEQUOTA and $TARGETQUOTA variables. ****

This script works as of OneFS 7.1. I make no guarantees they won’t switch around the isi commands again in their quest to make commands as long and convoluted as possible.

Say, what’s this blog thing here?

Oh yeah, I have a blog. Well, we’ll see what happens, but I need a place to stash a few things that keep getting lost in my bash histories across various servers.

For starters, here's a quick find line that will get everything in a directory that's

a) older than a month and
b) not from the 1st of the month

find /directory/subdirectory/* -maxdepth 1 -mtime +30 -ls | awk '$9 != /1/ { print $11 }' | parallel -P 500 rm -rf {}

Now, why would you want to do this? I did because one of my DBAs had been doing full nightly backups to our primary storage for years without ever trimming them. So we had terabytes of old (and largely unimportant) stuff on relatively expensive and scarce storage. I talked to him, and he asked me to keep the data per the parameters above. Our company retention policy for backups is 30 days, hence the very short window.

So let's break this down. The find command is a little tricky in and of itself. First you want to do a find across multiple directories, but not go too deep; that's the /directory/subdirectory/* bit, with a maxdepth of 1 and a modification time of more than 30 days. Then list that. It looks like this:

[root@server ~]# find /mnt/mysql/* -maxdepth 1 -mtime +30 -ls
....
234199799 35514 -rw-r--r-- 1 root root 30984020 May 27 2013 /mnt/mysql/mysql2/clustrix-job_manager-201305270010.tar.gz
234181155 2690 -rw-r--r-- 1 root root 2187282 May 25 2013 /mnt/mysql/mysql2/wip-201305250100.tar.gz
354760343 98 -rw-r--r-- 1 root root 27568 Jun 15 2013 /mnt/mysql/mysql2/clustrix-probes-201306150011.tar.gz
257679475 26 -rw-r--r-- 1 root root 1374 Jun 2 2013 /mnt/mysql/mysql2/clustrix-gbrowse_login-201306020001.tar.gz
395629190 45898 -rw-r--r-- 1 root root 31066606 Jun 22 2013 /mnt/mysql/mysql2/clustrix-job_manager-201306220010.tar.gz
357046308 8874 -rw-r--r-- 1 root root 5904716 Jun 12 2013 /mnt/mysql/mysql2/clustrix-ror-201306120009.tar.gz
259828901 26 -rw-r--r-- 1 root root 6290 Jun 2 2013 /mnt/mysql/mysql2/clustrix-smith_lemur-201306020011.tar.gz
354511053 242 -rw-r--r-- 1 root root 75127 Jun 16 2013 /mnt/mysql/mysql2/clustrix-parts-201306160004.tar.gz
356484133 730 -rw-r--r-- 1 root root 449046 Jun 18 2013 /mnt/mysql/mysql2/clustrix-mad-201306180013.tar.gz
....
498553872 134570 -rw-r--r-- 1 root root 117480405 Mar 30 2012 /mnt/mysql/wiki-db/wiki-201203300855.tar.gz
3097297952 198146 -rw-r--r-- 1 root root 173105695 Nov 2 2013 /mnt/mysql/wiki-db/wiki-201311020855.tar.gz
244561447 167874 -rw-r--r-- 1 root root 114123533 Feb 25 2012 /mnt/mysql/wiki-db/wiki-201202250855.tar.gz
356159441 190722 -rw-r--r-- 1 root root 166592128 Jul 15 2013 /mnt/mysql/wiki-db/wiki-201307150855.tar.gz
4132816213 188730 -rw-r--r-- 1 root root 128309505 Oct 3 2012 /mnt/mysql/wiki-db/wiki-201210030855.tar.gz
2347163665 253386 -rw-r--r-- 1 root root 172236350 Oct 18 2013 /mnt/mysql/wiki-db/wiki-201310180855.tar.gz
636357288 204698 -rw-r--r-- 1 root root 166829424 Jul 22 2013 /mnt/mysql/wiki-db/wiki-201307220855.tar.gz
3504126126 155002 -rw-r--r-- 1 root root 135339049 Apr 4 2013 /mnt/mysql/wiki-db/wiki-201304042055.tar.gz

At that point, we pipe it into the awk statement, which checks the 9th field (the day of the month in the -ls output) and makes sure it's not 1 (and only 1, i.e. not 17 or 21), and then prints the 11th field, which is the path. So you get something like this:

[root@server ~]# find /mnt/mysql/* -maxdepth 1 -mtime +30 -ls | awk '$9 != /1/ { print $11 }'
....
/mnt/mysql/mysql2/clustrix-genie-201306140014.tar.gz
/mnt/mysql/mysql2/clustrix-job_manager-201306150010.tar.gz
/mnt/mysql/mysql2/clustrix-genie-201306220014.tar.gz
/mnt/mysql/mysql2/clustrix-scheduleit-201305241800.tar.gz
/mnt/mysql/mysql2/clustrix-looger_lemur-201306180002.tar.gz
/mnt/mysql/mysql2/clustrix-scheduleit-201306021201.tar.gz
/mnt/mysql/mysql2/clustrix-geci_lemur-201306180002.tar.gz
/mnt/mysql/mysql2/clustrix-ahrens_lemur-201306100015.tar.gz
/mnt/mysql/mysql2/clustrix-campus_security-201306040600.tar.gz
/mnt/mysql/mysql2/clustrix-galaxy-201306200013.tar.gz
/mnt/mysql/mysql2/clustrix-qstatworld-201306070015.tar.gz
/mnt/mysql/mysql2/clustrix-zhang_lemur-201305270012.tar.gz
/mnt/mysql/mysql2/clustrix-gbrowse_login-201305310001.tar.gz
....
/mnt/mysql/wiki-db/wiki-201211162055.tar.gz
/mnt/mysql/wiki-db/wiki-201203300855.tar.gz
/mnt/mysql/wiki-db/wiki-201311020855.tar.gz
/mnt/mysql/wiki-db/wiki-201202250855.tar.gz
/mnt/mysql/wiki-db/wiki-201307150855.tar.gz
/mnt/mysql/wiki-db/wiki-201210030855.tar.gz
/mnt/mysql/wiki-db/wiki-201310180855.tar.gz
/mnt/mysql/wiki-db/wiki-201307220855.tar.gz
/mnt/mysql/wiki-db/wiki-201304042055.tar.gz

Now, why do that whole /directory/subdirectory/* thing? It’s for parallel. If you’re not familiar with parallel, go ahead and install it and read the man page. Yeah, I’m that kind of unix admin. Well, sometimes. Basically, it takes an input and multithreads it. It’s quite useful (if dangerous) with rsync, rm, and other commands of that ilk.

So what this one does is it parallelizes over 500 threads (the -P 500) the command (rm -rf) against the input from the pipeline ({}). I tend to use a very high threadcount, because what I often run into is that it’ll torch the easy stuff quickly and then chug along on the 2-10 highly nested directories. That way, I don’t have to wait for all the easy (ie, not very nested, relatively large file) stuff to go through before it starts working on the highly nested tiny file stuff. I’m usually running this on a 16 core machine, for reference. YMMV.

Please note that I do NOT attach that rm -rf {} bit before I’ve tested the first two parts of the command!
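One easy way to do that test, if your version of parallel has it, is the --dry-run flag, which prints the commands it would run without actually executing them:

find /directory/subdirectory/* -maxdepth 1 -mtime +30 -ls | awk '$9 != /1/ { print $11 }' | parallel --dry-run -P 500 rm -rf {}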

nowarrantyorguaranteeisexpressedorimpliedalldatalossresultingfromyouruseofthesecommandsispurelyyourresponsibilityauthorwillnotcomeandrecoveryourdataifyoublowitupbyaccident