Java Regular Expression Interview Questions

In this article, we will go through the most frequently asked Java regular expression interview questions and answers. Even if you are not a great fan of interview questions, this post offers some interesting exercises on regular expressions.

Java Regular Expression Interview Questions and Answers

What are the classes in Java that help to deal with regular expressions?

Java has a dedicated package, java.util.regex, with three classes that help you work with regular expressions. Following is a brief description of each.

  • Pattern – a compiled representation of a regex. You get a new instance by calling the static ‘compile’ method, which takes the regular expression as its first argument.
  • PatternSyntaxException – an unchecked exception thrown when a regular expression pattern contains a syntax error.
  • Matcher – the engine that interprets a pattern and performs match operations on an input string. You get a new instance by calling a Pattern object’s matcher method.
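The following sketch shows how the three classes fit together: Pattern.compile builds the Pattern, pattern.matcher(input) creates the Matcher, and an invalid regex raises a PatternSyntaxException. The class and method names (RegexClassesDemo, containsMatch, isInvalidRegex) are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexClassesDemo {
    // Returns true if the regex occurs anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);   // compile the regex once
        Matcher matcher = pattern.matcher(input);   // bind it to an input string
        return matcher.find();                      // search for a match
    }

    // Returns true if compiling the regex throws PatternSyntaxException.
    static boolean isInvalidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return false;
        } catch (PatternSyntaxException e) {
            // The exception is unchecked, so catching it is optional;
            // getDescription() explains what is wrong with the pattern.
            System.out.println("Invalid regex: " + e.getDescription());
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsMatch("Java", "I like Java"));
        System.out.println(isInvalidRegex("(unclosed"));   // unbalanced parenthesis
    }
}
```

Because PatternSyntaxException is unchecked, code that compiles only hard-coded, known-good patterns usually does not catch it; catching it makes sense when the regex comes from user input.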

What is a metacharacter? How is it different from an ordinary character?

A metacharacter is a character that has a special meaning to the regular expression engine, so the engine does not match it literally the way it matches an ordinary character. For example, . matches any single character rather than only a dot. Other examples of metacharacters are ^, $, *, +, etc.

How can we make sure that a metacharacter is treated as an ordinary character?

Sometimes the string or set of strings you are writing a regular expression for contains a special character, such as the dot (.), that must be matched as an ordinary character. In such cases, you need to escape it with a backslash so that it is treated as a regular character and not a metacharacter: the regex \. matches only a literal dot. In Java source code, the backslash itself must be escaped inside a string literal, so the sequence is written as "\\.". Another way to achieve the same effect is to put the character inside square brackets, e.g. [.]. The following program will help you understand the concept better.
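Here is a minimal sketch of such a program, assuming the input "a.b"; the class name EscapeDemo is hypothetical. It tries both techniques, the escaped dot and the dot inside a character class.

```java
public class EscapeDemo {
    // Regex \. written as "\\." in a Java string literal: the dot is escaped
    static final String ESCAPED = "a\\.b";
    // Regex [.]: inside a character class the dot is an ordinary character
    static final String BRACKETED = "a[.]b";

    public static void main(String[] args) {
        System.out.println("a.b".matches(ESCAPED));    // true
        System.out.println("a.b".matches(BRACKETED));  // true
    }
}
```

Both patterns reject "axb", whereas the unescaped regex a.b would accept it, because there the dot matches any character.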

The above program will print true for both print statements.

What is a backreference in Java regex?

To explain the concept better, let us first see what capturing groups and quantifiers are in regular expressions. Quantifiers let you specify how many times a character (or group) may occur.

Quantifier   Description
?            Zero or one occurrence (the preceding character or group is optional)
*            Zero or more occurrences of the preceding character or capturing group
+            One or more occurrences of the preceding character or capturing group
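A short sketch of the three quantifiers using String.matches(), which succeeds only when the whole string matches the regex. The sample words are illustrative only and the class name QuantifierDemo is hypothetical.

```java
public class QuantifierDemo {
    public static void main(String[] args) {
        // ? : the 'u' is optional, so both spellings match
        System.out.println("color".matches("colou?r"));   // true
        System.out.println("colour".matches("colou?r"));  // true
        // * : zero or more 'b' characters, so "ac" matches
        System.out.println("ac".matches("ab*c"));         // true
        // + : at least one 'b' is required
        System.out.println("ac".matches("ab+c"));         // false
        System.out.println("abbc".matches("ab+c"));       // true
        // A quantifier applied to a capturing group: "ab" one or more times
        System.out.println("ababab".matches("(ab)+"));    // true
    }
}
```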

A capturing group lets you treat multiple characters as a single unit; you create one by placing the characters inside parentheses. A backreference matches the same text that was previously matched by a capturing group, so you can repeat that text without writing it again in the regex passed to Pattern.compile(). A backreference is written as a backslash followed by the number of the group it refers to, e.g. \1 for the first group. In the following example, we find the start and end positions of each occurrence of the substring “cdcd” in a given string.
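A minimal sketch of such a program: the regex (cd)\1 captures "cd" as group 1 and then requires the same text again, so it matches "cdcd". The input string below is hypothetical, chosen so that the matches fall at indices 10 and 17; the class and method names are also illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackreferenceDemo {
    // (cd) is capturing group 1; \1 (written "\\1" in a Java literal) must
    // match the same text that group 1 captured, so the regex matches "cdcd".
    static final Pattern DOUBLED_CD = Pattern.compile("(cd)\\1");

    // Returns "start-end" for every match; start() is inclusive, end() exclusive.
    static List<String> matchPositions(String input) {
        List<String> positions = new ArrayList<>();
        Matcher matcher = DOUBLED_CD.matcher(input);
        while (matcher.find()) {
            positions.add(matcher.start() + "-" + matcher.end());
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical input
        System.out.println(matchPositions("abcdefghijcdcdxyzcdcd")); // [10-14, 17-21]
    }
}
```

The backreference pays off when the group is more general: with (\w{2})\1 the same code would find any two-character sequence that is immediately repeated, such as "abab".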

Counting from 0, the first occurrence of the substring “cdcd” starts at index 10 and ends before index 14, and the second occurrence starts at index 17 and ends before index 21.

What are predefined character classes?

Predefined character classes are useful shorthand notations for commonly used sets of characters.

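Predefined Character Class   Description
.                            Any character (by default, line terminators are not matched)
\d                           A digit: [0-9]
\s                           A whitespace character: [ \t\n\x0B\f\r]
\w                           A word character: [a-zA-Z_0-9]
\D                           A non-digit character: [^0-9]
\W                           A non-word character: [^\w]
\S                           A non-whitespace character: [^\s]

In a Java string literal, each backslash must be doubled, e.g. "\\d".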
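A quick sketch exercising a few of these classes; the sample strings are illustrative and the class name PredefinedClassDemo is hypothetical. The last line shows that, without the DOTALL flag, . does not match a newline.

```java
public class PredefinedClassDemo {
    public static void main(String[] args) {
        System.out.println("2024".matches("\\d+"));          // true: digits only
        System.out.println("user_42".matches("\\w+"));       // true: letters, digits, underscore
        System.out.println("a b".matches("\\S+"));           // false: contains a space
        System.out.println("a1b2c3".replaceAll("\\D", ""));  // 123: non-digits removed
        // By default . does not match a line terminator such as \n
        System.out.println("line1\nline2".matches(".+"));    // false
    }
}
```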

How can we validate a given String that denotes an IPv4 address?

An IPv4 address consists of four numbers separated by dots (.), each representing one byte. Each number can therefore take a value from 0 to 2^8 - 1 (i.e., 255). So, we can frame the pattern for a single byte as follows:

  • A single digit (0 to 9): [0-9] or \d
  • Two digits (10 to 99): [1-9][0-9] or [1-9]\d
  • Combined, these two cases become [1-9]?\d (the final \d must not be optional, or the pattern would also match an empty string)
  • 100 to 199: 1[0-9][0-9] or 1\d\d
  • 200 to 249: 2[0-4][0-9] or 2[0-4]\d
  • 250 to 255: 25[0-5]

This byte pattern is repeated four times with three dots in between. Wrapping the whole expression in word boundaries (\b) prevents a search from matching inside a longer number. So, the final regex looks like:

\b((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\b

To check that an entire string is an IPv4 address (and not just that it contains one), use String.matches() or Matcher.matches(), which succeed only if the whole input matches.
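A minimal validator sketch, assuming the goal is to accept a whole string rather than find an address inside text. It builds the byte alternative once (written [1-9]?\d so that an empty byte cannot match) and uses Matcher.matches() in place of the \b anchors. The class name Ipv4Validator is hypothetical.

```java
import java.util.regex.Pattern;

public class Ipv4Validator {
    // One byte: 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros)
    private static final String BYTE = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    // Three "byte." groups followed by a final byte
    private static final Pattern IPV4 = Pattern.compile("(" + BYTE + "\\.){3}" + BYTE);

    // matches() requires the whole string to match, so no \b or ^...$ is needed
    static boolean isValid(String address) {
        return IPV4.matcher(address).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.0.1"));      // true
        System.out.println(isValid("255.255.255.255"));  // true
        System.out.println(isValid("256.1.1.1"));        // false: 256 is out of range
        System.out.println(isValid("1.2.3"));            // false: only three bytes
        System.out.println(isValid("1.2.3.4.5"));        // false: five numbers
    }
}
```

Note that this pattern rejects leading zeros such as "01.2.3.4"; whether that is desirable depends on the input format you need to accept.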

Write a regex to validate a new username. The criteria for a valid username are as follows:

  • It should start with an English letter, followed by alphanumeric characters.
  • No special characters are allowed.
  • The username length should be at least 3 and should not exceed 20.

The regex for the above criteria can be formed as follows: ^[A-Za-z][A-Za-z0-9]{2,19}$

where:

  • ^ – denotes the start of the name.
  • [A-Za-z] – enforces the first criterion: the name must start with a letter.
  • [A-Za-z0-9] – the alphanumeric characters that may follow. If the question had asked for word characters instead of alphanumeric ones, we could have used \w. However, by definition that predefined character class also includes the underscore, so we do not use it here.
  • {2,19} – sets the lower and upper limits on the number of occurrences. Since the first character is already matched separately, we use 2,19 instead of 3,20 to satisfy the length criterion.
  • $ – denotes the end of the name, so a longer input is rejected instead of being partially matched.
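A sketch of the username check in Java; the class name UsernameValidator and the sample names are hypothetical. Matcher.matches() already requires the whole input to match, so the ^ and $ anchors are redundant here, but they keep the regex correct if it is ever used with find().

```java
import java.util.regex.Pattern;

public class UsernameValidator {
    // A letter, then 2 to 19 letters or digits: 3 to 20 characters in total
    private static final Pattern USERNAME = Pattern.compile("^[A-Za-z][A-Za-z0-9]{2,19}$");

    static boolean isValid(String username) {
        return USERNAME.matcher(username).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("java8"));   // true
        System.out.println(isValid("ab"));      // false: too short
        System.out.println(isValid("8java"));   // false: starts with a digit
        System.out.println(isValid("java_8"));  // false: underscore not allowed
    }
}
```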

Given an input string with alphanumeric characters, extract the integers into a list and print them out in the order of occurrence.

The given string can contain positive or negative numbers, so the regex to use is: -?\d+ This regular expression means that the leading ‘-’ is optional and is followed by one or more digits ([0-9]). The full program for extracting the numbers is as follows:
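A minimal sketch of such a program, using Matcher.find() in a loop so the numbers come out in order of occurrence. The sample input and the names ExtractIntegers and extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractIntegers {
    // Optional leading minus sign, then one or more digits
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    static List<Integer> extract(String input) {
        List<Integer> numbers = new ArrayList<>();
        Matcher matcher = INTEGER.matcher(input);
        while (matcher.find()) {                 // matches are found left to right
            numbers.add(Integer.parseInt(matcher.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        System.out.println(extract("abc12de-34fg5h")); // [12, -34, 5]
    }
}
```

Integer.parseInt throws NumberFormatException for values outside the int range; for very long digit runs, Long.parseLong or BigInteger would be the safer choice.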


For a given input string, how can you programmatically correct the number of spaces between words? Essentially, we need to replace runs of two or more spaces with a single space character.

To achieve this, we can use String.replaceAll() with a regex that matches one or more spaces, such as " +". The String class also has a trim() method, which removes any leading and trailing whitespace. Here is the Java program to do it:
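A minimal sketch combining trim() and replaceAll(); the class name NormalizeSpaces and the sample sentence are hypothetical. The brackets in the printed output only make the absence of leading and trailing spaces visible.

```java
public class NormalizeSpaces {
    static String normalize(String input) {
        // trim() drops leading/trailing whitespace; " +" then collapses each
        // run of one or more spaces between words into a single space
        return input.trim().replaceAll(" +", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  Regex   is  fun   ") + "]"); // [Regex is fun]
    }
}
```

Using "\\s+" instead of " +" would also collapse tabs and line breaks into a single space, which may or may not be wanted.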


Given an input string, describe how you can find the number of words in the string.

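The String class has a method, String.split(String regex), which takes a regular expression and splits the string around its matches, treating them as delimiters. Splitting on \s+ (one or more whitespace characters) and counting the pieces gives the number of words. The code below demonstrates how easily the word count can be done.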
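A minimal word-count sketch based on split("\\s+"); the class name WordCount is hypothetical. Two edge cases need care: leading whitespace would produce an empty first token, so the input is trimmed first, and "".split(...) returns a one-element array, so blank input is handled separately.

```java
public class WordCount {
    static int countWords(String input) {
        String trimmed = input.trim();
        // "".split(...) returns [""], so blank input is handled separately
        if (trimmed.isEmpty()) {
            return 0;
        }
        // Split on runs of whitespace; trimming first avoids a leading empty token
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  How many   words are here?  ")); // 5
    }
}
```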

