[Tip] Finding and fixing corrupted PDF files

Version 2-alpha of "CorruptedPDFinder - Recursive finder of corrupted PDF files" (released Nov 29, 2015) is available here:


CorruptedPDFinder works relatively well for scanning a large set of PDF files on one drive, or within one branch, per session. It sorts PDF files into 'OK', 'Corrupt', 'Warning' (may be corrupt), 'Password', and 'Other' (unknown, or not found). It's an alpha version and lacks a right-click menu for copying results (selected file names), but overall it's a time saver.

It flagged some apparent false positives, mostly PDF files that contain lots of graphics: most of the flagged files turned out to open and display OK. So before deleting any file marked 'Corrupt' or 'Warning', try opening and viewing it. I used Adobe Reader for that, zooming out to 6.25% and holding down the Page Down key to "scan" quickly through the entire file while watching for Adobe error messages.

CorruptedPDFinder will also flag files that have a .pdf extension but are not actually PDF files. Overall, it can be quicker to first search a large set of PDF files and weed out any that lack the internal start-of-file and end-of-file markers, %PDF-1 and %%EOF. Then view the anomalous files in ZTreeWin to identify the incomplete PDF files (which are corrupt) and any other types of files that need a correct file-name extension.
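The marker check described above can be sketched in Python (standard library only). This is rough triage, not a validator, and the function names are hypothetical. It assumes a file's first 1,024 bytes should contain "%PDF-" and its last 1,024 bytes should contain "%%EOF"; both window sizes are my own lenient choices, because some PDF writers put stray bytes before the header or after the final marker. It checks for "%PDF-" rather than "%PDF-1" so that PDF 2.0 files also pass. A file that passes can still be damaged internally, and a file that fails may still open.

```python
import os
from pathlib import Path

HEAD_WINDOW = 1024  # bytes searched for the "%PDF-" header (lenient assumption)
TAIL_WINDOW = 1024  # bytes searched for the "%%EOF" marker at the end (assumption)


def check_markers(path):
    """Return a list of problems for one file; an empty list means both markers were found."""
    problems = []
    with open(path, "rb") as f:
        head = f.read(HEAD_WINDOW)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - TAIL_WINDOW))
        tail = f.read()
    if b"%PDF-" not in head:
        problems.append("no %PDF- header")
    if b"%%EOF" not in tail:
        problems.append("no %%EOF marker")
    return problems


def scan(root):
    """Yield (path, problems) for every *.pdf under root (recursively) whose markers look wrong."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pdf":
            try:
                problems = check_markers(path)
            except OSError as exc:
                problems = [f"unreadable: {exc}"]
            if problems:
                yield path, problems
```

For example, `for path, problems in scan(r"D:\Docs"): print(path, "-", ", ".join(problems))` prints one line per suspect file, which can then be inspected in ZTreeWin. A missing "%PDF-" header usually points to a wrong extension; a missing "%%EOF" usually points to a truncated download or copy.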

CorruptedPDFinder 2-alpha is not fast, but it's much faster than loading hundreds or thousands of individual PDF files into a PDF viewer to see whether any display failures or error messages appear. On a slower computer, it helps to limit each session to at most 5,000 files, and depending on CPU speed, even that could take 10 to 15 hours to complete.

After scanning large numbers of files, the program's display will sometimes "freeze" (while CPU usage temporarily soars) when scrolling or sorting the results or resizing the window. To avoid that, I'd recommend clicking each of the "Results categories" at the top of the CorruptedPDFinder window and taking a screenshot of each (shorter) list of results. Then close the program and let the CPU cool down. Also, I noticed that the results show 'not found' for PDF files whose combined path plus file name exceeds about 258 characters.
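Two small helpers may make the session-size and path-length advice easier to act on. They are sketched in Python as hypothetical functions, not part of CorruptedPDFinder. The first counts PDF files under each top-level folder, so you can pick branches that stay under roughly 5,000 files per session. The second lists PDF files whose full path is longer than about 258 characters, the approximate limit I observed (the classic Windows MAX_PATH limit is 260 characters). On Windows, walking over-long paths may itself fail or skip entries unless long-path support is enabled, so treat this as a sketch under that assumption.

```python
from collections import Counter
from pathlib import Path

SESSION_LIMIT = 5000  # suggested per-session maximum from the tip above
PATH_LIMIT = 258      # approximate path length at which 'not found' appeared


def pdf_counts_by_folder(root):
    """Count *.pdf files under each immediate subfolder of root ('.' = files directly in root)."""
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() == ".pdf":
            rel = path.relative_to(root)
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[top] += 1
    return counts


def long_pdf_paths(root, limit=PATH_LIMIT):
    """Return (length, full path) for each *.pdf whose absolute path is longer than limit."""
    hits = []
    for path in Path(root).resolve().rglob("*"):
        # No is_file() check here: stat() may fail on over-long Windows paths,
        # which would hide exactly the files this function is meant to find.
        if path.suffix.lower() == ".pdf":
            full = str(path)
            if len(full) > limit:
                hits.append((len(full), full))
    return sorted(hits, reverse=True)  # longest paths first
```

For example, `for name, n in pdf_counts_by_folder(r"D:\Docs").most_common(): print(name, n, "(split further)" if n > SESSION_LIMIT else "")` shows which branches are small enough for one session. Files reported by `long_pdf_paths` can be moved to a shallower folder or renamed before scanning.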

For a large set of PDF files, consider deleting any duplicates before running CorruptedPDFinder. I'd definitely recommend "SearchMyFiles", free from NirSoft. It rapidly searches the disks, drive partitions, or branches that you select and, in a single session, locates all duplicate files of your specified file types within your file-size limits:


SearchMyFiles quickly produces a sortable list of binary-identical files (regardless of file names), and those files can then be easily managed in ZTreeWin for renaming, moving, deleting, etc. (To copy a file name that's listed in SearchMyFiles, highlight it and press Alt+Enter, Tab, Ctrl+C, Escape.)
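To show what "binary-identical regardless of file names" means in practice, here is a minimal Python sketch (standard library only) of one common approach. It is not how SearchMyFiles works internally. Files are first grouped by size, since only equal-sized files can be identical, and only those candidates are hashed with SHA-256; files with equal digests are treated as duplicates. The names `find_duplicates` and `sha256_of` are hypothetical. A cautious variant could confirm each group byte-for-byte with `filecmp.cmp(a, b, shallow=False)` before deleting anything.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file's contents, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots, suffix=".pdf", min_size=1):
    """Return lists of paths whose contents are identical; file names are ignored."""
    by_size = defaultdict(list)
    seen = set()
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix.lower() == suffix:
                real = path.resolve()
                if real in seen:
                    continue  # same file reached again via overlapping roots
                seen.add(real)
                size = real.stat().st_size
                if size >= min_size:
                    by_size[size].append(real)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_digest = defaultdict(list)
        for path in same_size:
            by_digest[sha256_of(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Grouping by size first means most files are never read at all, which is what makes this kind of search fast on large disks.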

Finally, when a corrupt or possibly corrupt PDF file is identified, try viewing it in "PDF-XChange Editor 6.0". I find that many "corrupt" PDF files that will not load in Adobe Reader will load OK in PDF-XChange Editor. Sometimes the file loads, PDF-XChange Editor warns that there's "a problem" with the file, and it offers to "Resave" the file in order to repair it. Afterwards, the resaved PDF file will often open without errors in Adobe Reader, too. I've also repaired several "corrupt" PDF files simply by saving them in PDF-XChange Editor (which gave no warning of problems when viewing them); Adobe Reader, which had previously failed to open those files with error messages, then opened them OK.

PDF-XChange Editor 6.0 has replaced "PDF-XChange Viewer", and Editor v6 is now freeware for non-commercial use:



