Presentation and Style Tags
Posted: 2008-02-26 15:00:11
I'm very new to NWP and crafting Regular Expressions. I've seen a couple of cheat sheets for handling the text part, but I do not see anything for finding/replacing based on presentational properties like Bold or "Style1"... I see I can apply a property to the text in the find/replace fields, but if I were to hand-code the beast, how would I do something like this?
FIND: "Any bold text"
REPLACE: "±b±Any bold text±/b±"
I deal with a lot of documents from dozens of sources. No author does the same thing anywhere. I get period+space+space, tabs to control indents, double returns to control paragraph spacing and the dreaded "I really mean this multi-bam!!!!!"
I'm writing a macro to clean the text: strip extra spaces, extra returns and tabs, and put phone numbers, dates, etc., into the correct text format. (And no one, I mean no one, gets more than a single bam, dammit!)
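To make the cleanup pass concrete, here is a rough sketch of it in Python rather than Nisus macro language. The function name clean_text and the order of the steps are my own assumptions, not anything built into NWP, and it leaves out phone numbers and dates because those depend on the house style. The regular expressions are plain enough that they should carry over to a find/replace dialog with little change.

```python
import re

def clean_text(text: str) -> str:
    """Hypothetical cleanup pass: whitespace, returns, tabs, repeated bangs."""
    # Normalize line endings so the later patterns only deal with "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop tabs or spaces used to fake a paragraph indent.
    text = re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)
    # Any remaining tab becomes a single space.
    text = text.replace("\t", " ")
    # period+space+space (and any other run of spaces) becomes one space.
    text = re.sub(r" {2,}", " ", text)
    # Remove spaces left dangling at the end of a line.
    text = re.sub(r" +$", "", text, flags=re.MULTILINE)
    # Double (or more) returns used for paragraph spacing become one return.
    text = re.sub(r"\n{2,}", "\n", text)
    # No one gets more than a single bam.
    text = re.sub(r"!{2,}", "!", text)
    return text

if __name__ == "__main__":
    messy = "\tI really mean this.  Multi-bam!!!!!\n\n\nNext\tparagraph. "
    print(clean_text(messy))
```

The order matters a little: tabs are converted to spaces before runs of spaces are collapsed, so a tab sitting next to a space does not leave a double space behind.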
The documents carry styles and manually applied presentational formatting that are generally a mess, so I find it helps to attack the text first, then convert simple presentational properties like bold-words to ±b±bold-words±/b±, then convert my hand-rolled markup to proper style tags in InDesign. (I like using the "±" because it's never used in text and it does not get in the way of people who know much more about this stuff than I do.)
I produce a number of publications that have different sets of Styles, but there are times when a particular piece of content will share text with another publication and have a different presentation. That's why I like having my own simple markup. The author generally makes a string of text bold for a reason, but getting that property to hold up in InDesign requires some reformatting. I realize I could map one set of styles to another, but the original documents have very little consistency to them. I regularly end up with bold-word, plain-space, bold-word, or with a paragraph style that's supposed to be italic but is now overridden.
My goals:
1) Clean the text and make it AP Style.
2) Honor the author's presentational properties, so long as it's style appropriate. (I call it a victory whenever I get a document that's not in all-caps.)
3) Have clean strings of text that include markup that's translated to consistent and correct tags.
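For the third goal, this is roughly how I picture the last step, again sketched in Python and not as a Nisus macro. It assumes the bold text has already been wrapped as ±b±...±/b±. The names merge_bold_runs, to_tags, TAGS, pub_a and pub_b are all made up for illustration, and the tag strings are placeholders to be swapped for whatever the InDesign import actually expects for each publication.

```python
import re

# Matches one hand-rolled bold span; non-greedy so adjacent spans stay separate.
BOLD_SPAN = re.compile(r"±b±(.*?)±/b±", re.DOTALL)

# Hypothetical per-publication tag pairs (placeholders, not verified
# against any particular InDesign setup).
TAGS = {
    "pub_a": ("<CharStyle:Bold>", "<CharStyle:>"),
    "pub_b": ("<CharStyle:Emphasis>", "<CharStyle:>"),
}

def merge_bold_runs(text: str) -> str:
    """Join bold-word plain-space bold-word into one bold run."""
    return re.sub(r"±/b±( +)±b±", r"\1", text)

def to_tags(text: str, publication: str) -> str:
    """Swap the ± markup for the tag pair of the given publication."""
    open_tag, close_tag = TAGS[publication]
    merged = merge_bold_runs(text)
    # A function replacement avoids backslash surprises in the tag strings.
    return BOLD_SPAN.sub(lambda m: open_tag + m.group(1) + close_tag, merged)

if __name__ == "__main__":
    sample = "Call ±b±before±/b± ±b±noon±/b± on Friday."
    print(to_tags(sample, "pub_a"))
    print(to_tags(sample, "pub_b"))
```

Keeping the tag pairs in one table is the point of the hand-rolled markup: the same cleaned text can go to two publications with different presentation by changing only the table entry.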