
[Tip] Finding and fixing corrupted PDF files

By: Bill Kingsbury       
Date: Apr 09,2016 at 10:16

Version 2-alpha of "CorruptedPDFinder - Recursive finder of corrupted PDF files" (released Nov 29, 2015) is available here:

http://sourceforge.net/projects/corruptedpdfinder/
http://sourceforge.net/projects/corruptedpdfinder/files/

CorruptedPDFinder works relatively well for scanning a large set of PDF files, on one drive or within one branch per session. It classifies PDF files as 'OK', 'Corrupt', 'Warning' (may be corrupt), 'Password', or 'Other' (unknown, or not found). It's an alpha version, and it lacks a right-click menu for copying results (selected file names), but overall it's a time saver.

It produced a number of apparent false positives, mostly for PDF files that contain lots of graphics: most of the files it flagged were found to open and display OK. So, before deleting any files identified as 'Corrupt' or 'Warning', try opening and viewing each one. I used Adobe Reader for that task, zooming out to 6.25% and holding down the Page-down key to quickly "scan" through the entire file while watching for any Adobe error messages.

CorruptedPDFinder will also flag files that have a PDF extension but are not actually PDF files. Overall, it can be quicker to first search a large set of PDF files and weed out any that lack the internal start-of-file and end-of-file markers: %PDF-1 and %%EOF. Then view the anomalous files in ZTreeWin to identify incomplete (corrupt) PDF files, or files of other types that need a correct file-name extension.
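
For anyone who prefers to script that marker check, here is a minimal Python sketch of the idea. It is not how CorruptedPDFinder or ZTreeWin work internally. The function names, the 1024-byte search windows, and the reason strings are all assumptions chosen for illustration. It only tests for the two markers, so a file that passes can still be damaged in the middle.

```python
import os

HEADER = b"%PDF-1"   # start-of-file marker named in the post
TRAILER = b"%%EOF"   # end-of-file marker named in the post
WINDOW = 1024        # assumed search window at each end of the file


def check_pdf_markers(path, window=WINDOW):
    """Return (has_header, has_trailer) for one file."""
    with open(path, "rb") as f:
        head = f.read(window)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Re-read only the last `window` bytes (or the whole file if smaller).
        f.seek(max(0, size - window))
        tail = f.read()
    return HEADER in head, TRAILER in tail


def find_suspect_pdfs(root):
    """Walk `root` and list (path, reason) for *.pdf files lacking a marker."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(".pdf"):
                continue
            path = os.path.join(dirpath, name)
            try:
                has_header, has_trailer = check_pdf_markers(path)
            except OSError:
                suspects.append((path, "unreadable"))
                continue
            if not has_header:
                suspects.append((path, "no %PDF-1 header"))
            elif not has_trailer:
                suspects.append((path, "no %%EOF marker"))
    return suspects
```

Calling find_suspect_pdfs() on a branch returns a short list to inspect by hand. Files reported with no header are candidates for a wrong extension; files with a header but no %%EOF are candidates for truncation. Shortening HEADER to b"%PDF-" would also accept PDF 2.0 files, which begin with %PDF-2.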

CorruptedPDFinder 2-Alpha is not "fast", but it's much faster than loading hundreds or thousands of individual PDF files into a PDF viewer to see whether any display failures or error messages occur. On a slower computer it helps to limit the number of PDF files scanned in one session to 5,000 at most. Depending on the CPU speed, even that could take as long as 10 to 15 hours to complete. After scanning large numbers of files, the program's display will sometimes "freeze" (while CPU usage temporarily soars) when scrolling or sorting the displayed results, or when resizing the window. To avoid that problem, I'd recommend clicking each of the "Results categories" at the top of the CorruptedPDFinder window and taking a screenshot of each (shorter) list of results. Then close the program, and let the CPU cool down... Also, I noticed that the results will indicate 'not found' for PDF files whose combined path plus file-name length exceeds about 258 characters.
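
To see in advance which files are likely to come back as 'not found' because of their path length, something like the following Python sketch could be run first. The function name and the default limit of 258 are assumptions; the limit is taken from the rough figure observed above, not from any documented setting of CorruptedPDFinder.

```python
import os


def find_long_paths(root, limit=258, suffix=".pdf"):
    """Return (length, full_path) for files whose full path exceeds `limit`."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.lower().endswith(suffix):
                continue
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) > limit:
                hits.append((len(full), full))
    # Longest paths first, so the worst offenders are at the top.
    return sorted(hits, reverse=True)
```

Files listed by this check can be renamed or moved to a shallower branch before the scan.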

For a large set of PDF files, consider deleting any duplicate files before using CorruptedPDFinder. I'd definitely recommend "SearchMyFiles", free from Nirsoft. It will rapidly search the disks and drive partitions (or branches) that you select, and locate all of the duplicate files (of your specified file types, within your file-size limits) in a single session:

http://www.nirsoft.net/utils/search_my_files.html

SearchMyFiles quickly produces a sortable list of binary-identical files (regardless of file names), which can then be easily managed using ZTreeWin for renaming, moving, deleting, etc. (To copy a filename that's listed in SearchMyFiles, highlight it and then use: Alt-Enter, Tab, Ctrl-(C)opy, Escape.)
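
To illustrate what "binary-identical regardless of file names" means, here is a small Python sketch that groups files by size and then by SHA-256 digest. This is only one common way to detect identical content, and it is an assumption that it resembles what SearchMyFiles does internally. The function names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root, suffix=".pdf"):
    """Return lists of paths under `root` whose contents are identical."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffix):
                path = os.path.join(dirpath, name)
                try:
                    by_size[os.path.getsize(path)].append(path)
                except OSError:
                    continue  # skip files that vanish or cannot be read

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            try:
                by_hash[file_digest(path)].append(path)
            except OSError:
                continue
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means only files that could possibly match are hashed, which matters on a large collection. The sketch only reports groups; deciding which copy to keep is left to the user.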

Finally, when a corrupt or possibly-corrupt PDF file is identified, try viewing it in "PDF-XChange Editor 6.0". I find that many "corrupt" PDF files that will not load in Adobe Reader will load OK in PDF-XChange Editor. Sometimes the file will load and PDF-XChange Editor will warn that there's "a problem" with it, offering to "Resave" the file in order to repair it. The resaved PDF file will then often open without errors in Adobe Reader, too... Also, I've repaired several "corrupt" PDF files simply by saving them in PDF-XChange Editor (which gave no warning of problems when viewing them); afterwards those files opened OK in Adobe Reader, which had previously generated error messages while failing to open them.

PDF-XChange Editor 6.0 has replaced the "PDF-XChange Viewer", and the Editor v6 is now freeware, for non-commercial use:

http://www.tracker-software.com/product/pdf-xchange-editor


Bill


Some related articles...


Bulk Testing and Printing Large Numbers of PDFs (Raw Version - Batch Tutorial)

by Ray Woodcock - Posted on December 3, 2014
https://raywoodcockslatest.wordpress.com/2014/12/03/test-pdfs/


Bulk/Batch Testing and Printing Large Numbers of PDFs (Simplified)

by Ray Woodcock - Posted on April 11, 2015
https://raywoodcockslatest.wordpress.com/2015/04/11/simpler-pdf-test/


Methods of Repairing Corrupted or Damaged PDFs

by Ray Woodcock - Posted on December 4, 2014
https://raywoodcockslatest.wordpress.com/2014/12/04/pdf-repair/


