When using find, you sometimes want to run a command on each file it finds. To recursively remove all PDF files under the current directory, you can run:

$ find . -name '*.pdf' -exec rm '{}' ';'

Some people use uppercase letters for the extension, so you may want to match both spellings:

$ find . \( -name '*.pdf' -or -name '*.PDF' \) -exec rm '{}' ';'

Sometimes, though, you want to run the command once on all files together. For instance, to find the combined size of all PDF files, you can pass them all to du:

$ find . \( -name '*.pdf' -or -name '*.PDF' \) -print0 | xargs -0 du -sch

Here, -print0 and -0 make sure that paths are separated by NUL characters. NUL is the one byte that cannot appear in a path, so names containing spaces or newlines leave no room for ambiguity.
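
To make the ambiguity concrete, here is a small sketch that counts how many arguments xargs hands to the command for a single file whose name contains a space. The temporary directory, the file name and the variable names are made up for illustration, and it assumes the temporary directory's own path contains no whitespace.

```shell
# Hypothetical demo directory with one awkwardly named file.
demo_dir=$(mktemp -d)
touch "$demo_dir/annual report.pdf"

# Whitespace-delimited input: xargs splits the path at the space.
plain_count=$(find "$demo_dir" -name '*.pdf' | xargs sh -c 'echo "$#"' sh)

# NUL-delimited input: the path arrives as a single argument.
nul_count=$(find "$demo_dir" -name '*.pdf' -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "without -print0/-0: $plain_count argument(s)"
echo "with -print0/-0:    $nul_count argument(s)"

rm -r "$demo_dir"
```

The first pipeline sees more than one argument for the one file, which is exactly the kind of mix-up that -print0 and -0 rule out.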

Now something weird happened today. I was using find and du in order to calculate the amount of space that each file type occupied in a directory and its subdirectories, but the numbers weren’t adding up. In fact, even if I didn’t consider the different file extensions, numbers were still odd. Like this:

$ find . -type f -print0 | xargs -0 du -sch
[lots of files...]
152M    total

$ du -sh .
378M    .

If space isn’t being used by files, where has it gone?

It turns out that xargs may invoke the command you pass it several times, so that no single command line grows too long and fails. It’s right there in the manual:

The command line for command is built up until it reaches a system-defined limit (unless the -n and -L options are used). The specified command will be invoked as many times as necessary to use up the list of input items.
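
The batching is easy to reproduce on a small scale. In this sketch, -n 2 stands in for the system limit by capping each invocation at two items; the items a to e and the variable name are placeholders.

```shell
# Five NUL-separated items, at most two per invocation (-n 2),
# so echo runs three times instead of once.
batches=$(printf '%s\0' a b c d e | xargs -0 -n 2 echo)

# Each line of output comes from a separate echo invocation.
printf '%s\n' "$batches"
```

With du -c in place of echo, every one of those invocations would print its own "total" line.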

So, in my case, du was called twice, and each call printed its own total. The 152M I saw earlier was just the second one:

$ find . -type f -print0 | xargs -0 du -sch | grep total
225M    total
152M    total

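Added together, these give 377M, in line with the 378M from du -sh . (which also counts the directories themselves). One workaround is to run a Ruby script that sums the file sizes itself, here for MP4 files, and prints the result in MiB: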

$ ruby -e 'puts Dir.glob("**/*.mp4").map{|x| File.stat(x).size}.sum / 1024.0**2'
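
A shell-only alternative is to let du report each file separately and do the addition yourself, so it no longer matters how many times du gets invoked. The sketch below is one way to do that, not the only one. The function name total_kib is made up, it uses find's -exec ... {} + form (which batches arguments much like xargs does), and it assumes no file name contains a newline, since awk reads du's output line by line.

```shell
# Hypothetical helper: total disk usage, in KiB, of all regular files
# under the directory given as $1 (default: current directory).
total_kib() {
    # du -k prints one "size<TAB>path" line per file, however many times
    # find has to invoke it; awk adds up the first column.
    find "${1:-.}" -type f -exec du -k '{}' + |
        awk '{ sum += $1 } END { printf "%d\n", sum }'
}

total_kib .
```

With GNU du specifically, find . -type f -print0 | du -ch --files0-from=- | tail -n 1 should avoid the batching altogether, because du reads the NUL-separated names itself; --files0-from is a GNU extension, so it may not be available elsewhere.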