I type in “typewriter-style” — that is to say, I prefer to leave two spaces after the period at the end of sentences. In HTML, a series of whitespace characters is rendered in the browser as a single space, so you typically don’t see two spaces following a sentence unless extra measures are taken to preserve them.
I use WordPress for this blog and I noticed that, indeed, the two spaces are preserved. However, there is an issue. If a line happens to break at the end of a sentence, the second space will be carried over to the next line. (See image above.) In this article, I will explain what is happening and present a possible solution.
John Jetmore has a write-up explaining what is happening. Credit to him for the featured image that I used for this article, as well. It looks like the problem actually lies with the TinyMCE editor and not with WordPress itself. While I’ve read that TinyMCE has been modified to fix this issue, I am still experiencing it, so maybe the fix hasn’t made its way back to WordPress yet.
If there are multiple “spaces” in a row, TinyMCE leaves the first one alone and replaces the rest of them with the non-breaking space character (Unicode U+00A0, which is encoded in UTF-8 as the two bytes C2 A0). The web browser treats this the same as the &nbsp; character entity, and the solution works fine, unless a line happens to wrap at the end of a sentence as mentioned above.
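To make the byte-level picture concrete, here is a small standalone sketch (not taken from TinyMCE or WordPress) showing that the non-breaking space is code point U+00A0, that its UTF-8 encoding is the bytes C2 A0, and that decoding the &nbsp; entity yields the same character. The variable names are illustrative only, and the "\u{...}" escape assumes PHP 7.0 or later.

```php
<?php
// The non-breaking space as a Unicode code point (PHP 7.0+ escape syntax).
$nbsp = "\u{00A0}";

// Its UTF-8 encoding is the two bytes C2 A0.
echo bin2hex($nbsp), "\n";

// Decoding the HTML entity produces the same two bytes.
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($decoded === $nbsp);

// What a sentence boundary looks like after TinyMCE has processed two
// typed spaces: a regular space followed by one non-breaking space.
$stored = "First sentence. " . $nbsp . "Second sentence.";
echo bin2hex(substr($stored, 15, 3)), "\n"; // the space plus the two nbsp bytes
```

The last line prints the three bytes that follow the period, which is exactly the sequence the filter described later has to look for.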
Not wanting to dig into how TinyMCE works or have to deal with updating previous posts, I decided to look at writing a WordPress filter that would just correct the output. One way to go is to switch the behavior around. TinyMCE puts the regular space first and adds non-breaking spaces after it. If you put non-breaking spaces first and add a regular space at the end, it actually works fine in the browser even on lines that break at a sentence. So, something needs to be done to switch the spaces around.
The magic comes down to this line of code:
return preg_replace("|([?!.\)]) (\xc2\xa0)+|", '\\1&nbsp; ', $the_content);
Here, I use a regular expression to look for the characters “?”, “!”, “.”, and “)” (characters that may end a sentence), followed by a regular space and then one or more non-breaking spaces. I then replace this with the same punctuation mark, followed by an HTML non-breaking space and then a regular space, so the order of the two kinds of space is reversed. Note that a run of several non-breaking spaces is collapsed into a single one. Requiring that the match start with a sentence-ending character allows for the regular behavior to be preserved in the case that a sequence of multiple spaces occurs in a non-sentence-looking context (for example, a code snippet).
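For context, here is a minimal sketch of how that line might be wrapped in a WordPress filter. It assumes the replacement puts the non-breaking space first and the regular space last, as described earlier in this article. The function name swap_sentence_spaces is hypothetical, and hooking the_content at the default priority is an assumption rather than something I have tuned. The add_filter call is guarded so the snippet also runs outside WordPress.

```php
<?php
/**
 * Hypothetical filter callback: move the regular space after the
 * non-breaking space(s) at the end of a sentence.
 */
function swap_sentence_spaces($the_content)
{
    // Match sentence-ending punctuation, one regular space, then one or
    // more UTF-8 non-breaking spaces (bytes C2 A0). Replace with the
    // punctuation, a single &nbsp; entity, and a regular space.
    return preg_replace("|([?!.\)]) (\xc2\xa0)+|", '\\1&nbsp; ', $the_content);
}

// Register the callback only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('the_content', 'swap_sentence_spaces');
}

// Standalone check: a sentence boundary is rewritten...
$sentence = "It works. \xc2\xa0Next sentence.";
echo swap_sentence_spaces($sentence), "\n";

// ...while extra spaces that do not follow punctuation are left alone.
$code_like = "\$a \xc2\xa0= 1;";
var_dump(swap_sentence_spaces($code_like) === $code_like);
```

Because the regular space now comes last, the browser can break the line after it, and the non-breaking space stays attached to the end of the previous sentence instead of indenting the next line.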
I’ve updated this blog with this filter and everything appears to be working fine at the moment. I plan to publish a WordPress plugin that implements this fix, after I do a little more testing to make sure that there is no unusual behavior.