How to sanitize your PDF metadata?

As is often the case, there's more than one way to do it.

One strategy is to rely on qpdf: start from a new, empty PDF document (--empty) and add all the pages (1-z) of the input PDF file to it. This drops the document-level metadata while leaving the page contents untouched. Note that other document-level information, such as bookmarks, is dropped as well:

qpdf --empty --pages in.pdf 1-z -- out.pdf

A brute-force approach is to rasterize the whole document, e.g. with ImageMagick. Every page becomes an image, so the text is no longer selectable or searchable:

convert -density 300x300 -compress lzw in.pdf out.pdf

A more selective approach relies on pdftk to blank out specific metadata fields, for example only 'Title' and 'Author':

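# Dump the metadata, mark the InfoValue lines that follow an Author or Title key,
# blank the marked values, then feed the result back via update_info (- reads stdin).
pdftk in.pdf dump_data |
sed -r -e '/InfoKey/ {  N ; s/(InfoKey: (Author|Title))\n(InfoValue.*)/\1\nRedact\3/g }' |
sed -e 's/Redact\(InfoValue:\)\s.*/\1\ /g' |
pdftk in.pdf update_info - output out.pdf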
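
To see what such a filter does without touching a real file, it can be run on a hand-written sample laid out like the text that pdftk's dump_data prints. This is only a minimal sketch: the sample values are made up, redact_info is a hypothetical helper name, and the two sed calls are condensed into one as an illustration rather than a drop-in replacement.

```shell
# Hypothetical helper: blank the InfoValue line that follows an
# Author or Title key, leaving every other key/value pair alone.
redact_info() {
  sed -E '/^InfoKey: (Author|Title)$/ { N; s/(\nInfoValue:).*/\1 /; }'
}

# Made-up sample in the InfoBegin / InfoKey / InfoValue layout.
sample='InfoBegin
InfoKey: Author
InfoValue: Jane Doe
InfoBegin
InfoKey: Producer
InfoValue: SomeTool 1.0
InfoBegin
InfoKey: Title
InfoValue: Draft report'

# Run the filter and show the result.
redacted=$(printf '%s\n' "$sample" | redact_info)
printf '%s\n' "$redacted"
```

The Author and Title values come out empty while the Producer entry passes through unchanged; the filtered text is what would then be handed to update_info.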

Finally, a more generic tool for dissecting various file formats is also available: hachoir.