Codebox Software

Readable Regular Expressions Library

This library helps Java developers write more readable and robust regular expressions. It has a fluent API that lets the developer build up the text of a regular expression using code that reads like English, rather than by producing something that resembles a cruel practical joke.

Because the text of the expression is generated by the library, the developer doesn't have to worry about matching brackets or making sure that all special characters are correctly escaped (avoiding backslash hell). In addition, since the expression is built up using Java method calls rather than a dense string of text, the developer can use all the normal techniques for improving code readability, such as indentation and commenting.
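To illustrate the escaping problem mentioned above, here is a minimal sketch that uses only the standard java.util.regex package, not this library. The class name EscapingDemo and its helper method are made up for this illustration.

```java
import java.util.regex.Pattern;

public class EscapingDemo {
    // Returns true only if the whole input matches the expression.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // An unescaped period matches any character, so "abc" is accepted.
        System.out.println(matches("a.c", "abc"));
        // The regex escape \. must itself be escaped in Java source: "\\."
        System.out.println(matches("a\\.c", "abc"));
        System.out.println(matches("a\\.c", "a.c"));
        // Pattern.quote escapes an entire literal in one call.
        System.out.println(matches(Pattern.quote("a.c"), "abc"));
    }
}
```

Forgetting a single backslash here produces an expression that still compiles but quietly matches more than intended, which is the kind of mistake that generated expression text avoids.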

For example, the following expression can be used to match email addresses:

[_\\-A-Za-z0-9]+(\\.[_\\-A-Za-z0-9]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*\\.[a-zA-Z]{2,}
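For reference, the expression above can be exercised with the standard java.util.regex API. This is a small sketch rather than part of the library; the class name EmailRegexDemo, the isEmail helper and the sample addresses are assumptions made for the example.

```java
import java.util.regex.Pattern;

public class EmailRegexDemo {
    // The hand-written expression from the text, as a Java string literal.
    static final Pattern EMAIL = Pattern.compile(
        "[_\\-A-Za-z0-9]+(\\.[_\\-A-Za-z0-9]+)*@[A-Za-z0-9]+(\\.[A-Za-z0-9]+)*\\.[a-zA-Z]{2,}");

    // matches() requires the whole input to match, not just a substring.
    static boolean isEmail(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmail("john.smith@example.com")); // well-formed address
        System.out.println(isEmail(".john@example.com"));      // leading period in the name
        System.out.println(isEmail("john@example"));           // no TLD after the domain
    }
}
```

The expression works, but reading it back requires decoding every bracket and backslash by hand.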

Using the library, the same expression is built up as follows:

final Token ALPHA_NUM = anyOneOf(range('A','Z'), range('a','z'), range('0','9'));
final Token ALPHA_NUM_HYPHEN_UNDERSCORE = anyOneOf(characters('_','-'), range('A','Z'), range('a','z'), range('0','9'));

String regexText = RegExBuilder.build(
    // Before the '@' symbol we can have letters, numbers, underscores and hyphens anywhere
    oneOrMore().of(
        ALPHA_NUM_HYPHEN_UNDERSCORE
    ),
    zeroOrMore().of(
        text("."), // Periods are also allowed in the name, but not as the initial character
        oneOrMore().of(
            ALPHA_NUM_HYPHEN_UNDERSCORE
        )
    ),
    text("@"),
    // Everything else is the domain name - only letters, numbers and periods here
    oneOrMore().of(
        ALPHA_NUM
    ),
    zeroOrMore().of(
        text("."), // Periods must not be the first character in the domain
        oneOrMore().of(
            ALPHA_NUM
        )
    ),
    text("."), // At least one period is required
    atLeast(2).of( // Period must be followed by at least 2 letters (this is the TLD)
        anyLetter()
    )
);

The library is open source and available on GitHub.