6 Mar 2013 eMBee   » (Journeyer)

vim tips: capitalize the first word of every sentence

To solve this problem I found this useful solution, but discovered that it didn't cover all the cases I had.

s/\v(\U)([^\.]*\.)/\u\1\L\2/g

To start with, \U does not mean every letter that is not uppercase, but every character from the whole set that is not uppercase. So it includes spaces, punctuation and everything else. If the sentence doesn't start at the beginning of the line, this causes the expression to match " hello world." including the leading space, so \u is applied to the space and the sentence stays uncapitalized, which is not quite what we want. To match only lowercase letters use \l. But even that does not really work, because now an already capitalized "Hello world." is matched from its second letter as "ello world.", and we get "HEllo world." as the result. Again, not what we are looking for. Unfortunately, until someone can suggest a method to skip already capitalized sentences, we have to fall back to \w, which also matches the capital letter at the start, where upper-casing it again does no harm.
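The difference between the three character classes is easier to see by running them. What follows is only a rough Python approximation of the Vim substitution, not the Vim command itself: Python's re module has no \U, \l or \u, so [^A-Z], [a-z] and [0-9A-Za-z_] stand in for \U, \l and \w, and the helper name capitalize_first is made up for this sketch.

```python
import re

# Python stand-ins for the Vim character classes (an assumption of this sketch).
NOT_UPPER = r"[^A-Z]"     # like Vim's \U: any character that is not an uppercase letter
LOWER = r"[a-z]"          # like Vim's \l: lowercase letters only
WORD = r"[0-9A-Za-z_]"    # like Vim's \w: word characters


def capitalize_first(text, first_char):
    # Roughly mimics the original substitution on a single line:
    # group 1 is the first character, group 2 runs up to and including a period.
    pattern = re.compile("(" + first_char + r")([^.]*\.)")
    # Upper-case group 1 (Vim: \u\1) and lower-case group 2 (Vim: \L\2).
    return pattern.sub(lambda m: m.group(1).upper() + m.group(2).lower(), text)


# The space before "second" is taken as the "first character" of sentence two.
print(capitalize_first("hello world. second one.", NOT_UPPER))
# -> Hello world. second one.

# The match starts at the "e" of an already capitalized sentence.
print(capitalize_first("Hello world. second one.", LOWER))
# -> HEllo world. Second one.

# Word characters include the capital H, so upper-casing it changes nothing.
print(capitalize_first("Hello world. second one.", WORD))
# -> Hello world. Second one.
```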

Next, the expression only excludes periods, but not question marks, exclamation marks or other sentence-ending characters. We can extend this by simply including the respective characters: [^\.?!:;]. We also do not need to enforce the terminating character, so we can simply drop it. What we really want to match is the beginning of the sentence; we don't care about the end.
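To illustrate the effect of the wider character set and the dropped terminator, here is a small Python sketch. It is an approximation under stated assumptions, not the Vim command: [0-9A-Za-z_] stands in for Vim's \w, group 2 is left unchanged, and the names PERIOD_ONLY, ANY_END and upper_first are invented for the example.

```python
import re

# [0-9A-Za-z_] stands in for Vim's \w (an assumption of this sketch).
PERIOD_ONLY = re.compile(r"([0-9A-Za-z_])([^.]*\.)")   # sentence must end in a period
ANY_END = re.compile(r"([0-9A-Za-z_])([^.?!:;]*)")     # stops before . ? ! : ; or at the end


def upper_first(pattern, text):
    # Upper-case group 1 and keep group 2 as it is.
    return pattern.sub(lambda m: m.group(1).upper() + m.group(2), text)


text = "is it done? yes it is. no period here"

# Only periods end a sentence, and a sentence without one is never matched.
print(upper_first(PERIOD_ONLY, text))
# -> Is it done? yes it is. no period here

# Every listed terminator ends a sentence, and no terminator is required.
print(upper_first(ANY_END, text))
# -> Is it done? Yes it is. No period here
```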

Also, unless the text is in all uppercase, lower-casing the second group could be counterproductive, as it would affect upper-cased acronyms etc. that are already there.
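The acronym problem can be shown with another small Python approximation. As before, this is a sketch and not Vim itself: [0-9A-Za-z_] stands in for \w, str.lower() plays the role of \L, and the function names are made up for the example.

```python
import re

# First word character of a sentence, then everything up to the next terminator.
SENTENCE = re.compile(r"([0-9A-Za-z_])([^.?!:;]*)")


def with_lowercasing(text):
    # Like a replacement of \u\1\L\2: the rest of the sentence is lower-cased.
    return SENTENCE.sub(lambda m: m.group(1).upper() + m.group(2).lower(), text)


def without_lowercasing(text):
    # Like a replacement of \u\1\2: the rest of the sentence is left alone.
    return SENTENCE.sub(lambda m: m.group(1).upper() + m.group(2), text)


text = "the NASA probe sent HTML. really!"

print(with_lowercasing(text))
# -> The nasa probe sent html. Really!

print(without_lowercasing(text))
# -> The NASA probe sent HTML. Really!
```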

Lastly, we want to capture sentences spanning multiple lines, otherwise every line gets matched as a separate sentence. This is achieved by prefixing the collection with \_, which makes it match line breaks as well.

Putting it all together we get:

s/\v(\w)(\_[^\.?!:;]*)/\u\1\2/g
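As a last check, the effect of \_ on wrapped sentences can be approximated in Python. This is a sketch under assumptions: [0-9A-Za-z_] stands in for \w, and because a negated set in Python matches line breaks while Vim's [^...] does not, \n is added to the excluded characters to imitate the version without \_. The names SINGLE_LINE, MULTI_LINE and upper_first are invented here.

```python
import re

# Imitates the pattern without \_: a line break also ends the match.
SINGLE_LINE = re.compile(r"([0-9A-Za-z_])([^.?!:;\n]*)")
# Imitates the pattern with \_: the match may continue across line breaks.
MULTI_LINE = re.compile(r"([0-9A-Za-z_])([^.?!:;]*)")


def upper_first(pattern, text):
    # Upper-case the first character of each match and keep the rest unchanged.
    return pattern.sub(lambda m: m.group(1).upper() + m.group(2), text)


text = "this sentence spans\ntwo lines. and this one follows."

# Every line is treated as a new sentence, so "two" is wrongly capitalized.
print(upper_first(SINGLE_LINE, text))
# -> This sentence spans
#    Two lines. And this one follows.

# The wrapped sentence is matched as a whole, so "two" stays lowercase.
print(upper_first(MULTI_LINE, text))
# -> This sentence spans
#    two lines. And this one follows.
```

In Vim itself the substitution only acts on the current line unless it is given a range, so to process a whole buffer it would normally be prefixed with %.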

Syndicated 2013-03-06 06:09:00 (Updated 2013-03-06 06:12:07) from DevLog
