Usage of regular expressions and batch processing

I use regular expressions for redaction. But I have to do it manually for each pattern (regular expression). I would like to avoid this and I am looking for a better solution.

I am looking for a way to use regular expressions during batch processing.
I see two possible ways:
a) Put the regular expression inside the word list, but how can I do this?
b) JavaScript, but how does it work?


Hans Hermann Krost


3 Answers

Regular Expressions can't be used with the Search and Remove Text command.

It is possible to do it with JS, but it is a complex task, especially if you want to find text that is longer than a single word. The issue is that JS can only access one word at a time from the text of a PDF, so if you try to match a whole phrase against a RegExp it becomes very complex to keep track of where the phrase is located and whether it's a match.
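To give an idea of the bookkeeping involved, here is a minimal sketch of one way to handle multi-word matches. This is not the approach from any of my scripts, just an illustration: it glues the words of a page back into one string, remembers where each word starts, runs the RegExp on that string, and maps each match back to word indexes. The function name findPhraseWords is made up for this example. The sketch assumes getPageNthWord with bStrip set to false returns each word together with its trailing punctuation and spacing. How Acrobat splits something like 123-45-6789 into words is also an assumption, so test it on your own files.

```javascript
// Sketch: find the indexes of all words on one page that are touched by a
// match of `pattern`, where a match may span several words.
// doc is the Acrobat Doc object (`this` in the console or in an action).
function findPhraseWords(doc, page, pattern) {
    var text = "";
    var starts = []; // starts[i] = offset of word i inside text
    var numWords = doc.getPageNumWords(page);
    for (var i = 0; i < numWords; i++) {
        starts.push(text.length);
        // bStrip = false keeps the punctuation and spacing after each word
        text += doc.getPageNthWord(page, i, false);
    }

    // Work on a global copy so exec() walks through every match
    var re = new RegExp(pattern.source, "g" + (pattern.ignoreCase ? "i" : ""));
    var hits = [];
    var m;
    while ((m = re.exec(text)) !== null) {
        if (m[0].length === 0) { // an empty match would loop forever
            re.lastIndex++;
            continue;
        }
        var from = m.index;
        var to = m.index + m[0].length;
        for (var w = 0; w < numWords; w++) {
            var wordEnd = (w + 1 < numWords) ? starts[w + 1] : text.length;
            var overlaps = starts[w] < to && wordEnd > from;
            // hits stays sorted, so checking the last entry avoids duplicates
            if (overlaps && hits[hits.length - 1] !== w) {
                hits.push(w);
            }
        }
    }
    return hits;
}
```

Each returned index can then be passed to getPageNthWordQuads to get the area to cover with a Redact annotation. Matches that run across a page break are not handled here.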

If you only want to match single words, it's less complex (but still not trivial). You basically scan all the words in the file, looking for matches against your regular expression. When a match is found you place a Redaction annotation on top of that word, and at the end you apply the redactions.
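A minimal sketch of that single-word scan could look like the following. The function name markMatchingWords and the nine-digit pattern in the usage comment are only placeholders for this example. The Doc methods it calls (getPageNumWords, getPageNthWord, getPageNthWordQuads, addAnnot, applyRedactions) are from the Acrobat JavaScript API, but treat the whole thing as a starting point and try it on copies of your files first.

```javascript
// Sketch: place a Redact annotation on every single word matching `pattern`.
// doc is the Acrobat Doc object (`this` in the console or in an action).
// Returns the number of words that were marked.
function markMatchingWords(doc, pattern) {
    var count = 0;
    for (var p = 0; p < doc.numPages; p++) {
        var numWords = doc.getPageNumWords(p);
        for (var i = 0; i < numWords; i++) {
            // bStrip = true drops surrounding punctuation and white space
            var word = doc.getPageNthWord(p, i, true);
            pattern.lastIndex = 0; // keeps a /g pattern from skipping words
            if (pattern.test(word)) {
                doc.addAnnot({
                    type: "Redact",
                    page: p,
                    quads: doc.getPageNthWordQuads(p, i)
                });
                count++;
            }
        }
    }
    return count;
}

// Possible use inside Acrobat (placeholder pattern: a 9-digit number):
//   var n = markMatchingWords(this, /^\d{9}$/);
//   if (n > 0) this.applyRedactions({ bKeepMarks: false, bShowConfirmation: false });
```

Run as a batch sequence or action, the same script is executed against each file in turn, which is what removes the manual step per pattern.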

I have developed several such scripts in the past, for example for removing Social Security Numbers or Credit Card Numbers from PDF files.



Visit my custom-made PDF scripts website: http://try67.blogspot.com
Contact me personally: try6767@gmail.com


Gilad D (try67)   

I do not think there is a tutorial on this.

See Automating Redaction with Acrobat JavaScript by Thom Parker on how to use JavaScript for redaction. It includes an example using the RegExp object. The example is not integrated into a full script, but it is not hard to make the script fully automatic. I have written a script that searches for and marks multiple locations on a page and within a PDF as items to redact, and then applies the redactions. The script covered both removing text and replacing it with text.

Note that one needs Acrobat 9 or later to be able to apply redactions as part of a script.
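A script can guard against older versions before it tries to apply anything. The following is only a sketch: the helper name applyMarkedRedactions is made up, and it assumes the version number comes from app.viewerVersion.

```javascript
// Sketch: apply the marked redactions only when the viewer is new enough.
// doc is the Acrobat Doc object; viewerVersion would be app.viewerVersion.
// Returns true only if the redactions were actually applied.
function applyMarkedRedactions(doc, viewerVersion) {
    if (viewerVersion < 9) {
        return false; // applyRedactions is not available before Acrobat 9
    }
    var applied = doc.applyRedactions({
        bKeepMarks: false,        // remove the marks once they are applied
        bShowConfirmation: false  // no dialog, so it can run unattended
    });
    return applied === true;
}

// Possible use inside Acrobat:
//   applyMarkedRedactions(this, app.viewerVersion);
```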


George Kaiser   

An alternative might be Redax by Appligent. It is considered the industry standard when it comes to redaction. I can't say offhand whether it supports a list of regular expressions, but it would be worthwhile checking.

Hope this can help.

Max Wyss.


Max Wyss   

