How to Replace Multiple Strings of Characters with an ‘Enter’

bee-mascot

In certain situations it is necessary to replace strings of characters from a large document. An example of such a situation is having a badly formatted list of email addresses. You need the email addresses but the way they are stored in the file is useless. And to reformat the document manually would take a very long time.

An Example of Wrong Formatting

Creating a badly formatted document can happen to anyone. It may be a badly written PHP script or a wrongheaded managerial decision from the past that can generate a less than useful list of data.

A recent example of this was a badly formatted list of email addresses that I had to reformat.

The original list of email addresses looked like this:

You have to imagine that there were thousands of business email addresses collected over many years.

The list in its then current version was virtually useless.

The owner of the email database needed the list to look like this instead:

Basically, all extra characters had to be removed. Plus, carriage return and line feed characters (otherwise known as CRLF or an Enter) had to be added after each email address.

Reformatting the List

Reformatting the list manually would take days, if not weeks. So it had to be done programmatically.

The easiest way to reformat a list of data where there are recurring formatting errors is to use regex, which is an abbreviation for regular expressions.

Regular expressions are used in pattern matching of strings in find and replace operations. I have already written a post about the finding part. Today’s post will be focused on the replacing part.

Neither Notepad, nor WordPad — the native Windows text editors — are able to use regular expressions to find and replace strings of characters.

The two tools I like to use most for this kind of work, is Notepad++ and Open Office. Both software tools are completely free and open-source. Additionally, Notepad++ also comes in a portable version.

Tip

In case you would like to have a complete office suite in portable version, try Libre Office. It is also good to know that the regex used in Libre Office is exactly the same as the one used in Open Office.

Using Regex

Notepad++

In Notepad++, click Search in the menu bar and select Replace in the drop-down menu.

Alternatively, use the Ctrl+H keyboard shortcut to open the Replace tab of the Find, Replace, Find in Files, Mark window.

npp-find-and-replace-using-regex

Notepad++ — Find and replace using regular expression

In the Search Mode list, select Regular expression.

Then enter the following regular expressions into the correct boxes — Find what: ', \d* => ' and Replace with: \r\n.

Finally, press the Replace All button.

Open Office Writer

In Open Office Writer, click Edit in the menu bar and select Find & Replace in the drop-down menu.

Alternatively, use the Ctrl+F keyboard shortcut to open the Find & Replace window.

openoffice-find-and-replace-using-regex

Open Office Writer — Search and replace using regular expression

In the Find & Replace window, click on the More Options button.

In the drop-down menu tick the Regular expressions box.

Then enter the following regular expressions into the correct boxes — Search for: ', [:digit:]{1,} => ' and Replace with: \n.

Finally, press the Replace All button.

In a second, the regular expression matches all occurrences of the defined pattern of strings in the document and makes the correct replacements.

You will still need to manually delete the remaining characters in the first and last line of the results because those didn’t match the defined expression.

openoffice-regex-operation-results

Manually remove remaining extra characters

Of course, this fact applies to the Notepad++ results as well.

But that’s a lot less work than extracting thousands of email addresses manually.

Further Reading on Regex

Regular expressions can be useful in many different scenarios that arise on a daily basis.

Unless you are aware of the breadth of their possible uses, you might end up doing tedious work manually.

To understand how I arrived at the regular expressions used in the above examples, read more about how to use regex both in Notepad++ and in Open Office Writer.

Liked this post?

Subscribe to our newsletter to receive early notification of new posts and deals:

Next Post »« Previous Post

Leave a Reply

Your email address will not be published. Required fields are marked *