Introduction
A regular expression (regex) is a sequence of characters that specifies a search pattern. It can be as simple as a single character or as complicated as a multi-part pattern.
Regular expressions can be used to search, edit, and replace text. They are widely used to define constraints on strings, for example in email and password validation. They also power search operations in search engines, the search-and-replace dialogs of Google Docs, word processors, and text editors, text search in emails, and so on.
In short, regular expressions are a fantastic way to search and manipulate text in a file. A single regular expression can often replace dozens of lines of code.
Java Regular Expression
The Java regular expression API (Application Programming Interface) is used to define string patterns for searching and editing text in Java.
To work with regular expressions in Java, the java.util.regex package has to be imported. The package consists of three classes and one interface:
i) Pattern class: It is the compiled representation of a regular expression and is used to define patterns.
Method | Description |
static Pattern compile(String regex) | It compiles the given regular expression and returns the instance of the pattern. |
Matcher matcher(CharSequence input) | It creates a matcher that matches the given input with the pattern. |
static boolean matches(String regex, CharSequence input) | It compiles the regular expression and matches the given input with the pattern. |
String[] split(CharSequence input) | It splits the given input string around matches of the given pattern. |
String pattern() | It returns the regular expression from which this pattern was compiled. |
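The methods in the table above can be tried out with a small sketch like the following. The class name PatternMethodsDemo, the helper splitOnCommas, and the sample inputs are made up for illustration; only the Pattern and Matcher calls come from java.util.regex.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMethodsDemo {
    // Hypothetical helper: splits a line on commas, ignoring spaces around each comma.
    static String[] splitOnCommas(String line) {
        Pattern comma = Pattern.compile("\\s*,\\s*");
        return comma.split(line);
    }

    public static void main(String[] args) {
        // compile() builds a reusable Pattern; pattern() returns its source text.
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(digits.pattern());                  // \d+

        // matcher() binds the compiled pattern to one input.
        Matcher matcher = digits.matcher("12345");
        System.out.println(matcher.matches());                 // true

        // The static matches() compiles and matches in a single call.
        System.out.println(Pattern.matches("\\d+", "12a45"));  // false

        // split() cuts the input around every match of the pattern.
        System.out.println(Arrays.toString(splitOnCommas("red , green,blue"))); // [red, green, blue]
    }
}
```

If the same pattern is applied to many inputs, compiling it once with compile() and reusing the Pattern is usually preferable to calling the static matches() each time.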
ii) Matcher class: It is used to perform match operations on a character sequence.
Method | Description |
boolean matches() | It tests whether the entire input sequence matches the pattern. |
boolean find() | It finds the next subsequence of the input that matches the pattern. Calling it repeatedly finds multiple occurrences. |
int start() | The starting index of the matched subsequence is returned. |
int end() | The offset after the last character of the matched subsequence is returned. |
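The difference between matches() and find() is easy to miss, so here is a minimal sketch. The class MatcherMethodsDemo and the helper spans are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    // Hypothetical helper: collects "start-end" for every match.
    // The end value is exclusive, exactly as returned by Matcher.end().
    static List<String> spans(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        // matches() succeeds only if the whole input matches the pattern.
        System.out.println(digits.matcher("42").matches());       // true
        System.out.println(digits.matcher("room 42").matches());  // false
        // find() succeeds if any part of the input matches the pattern.
        System.out.println(digits.matcher("room 42").find());     // true
        System.out.println(spans("\\d+", "a1b22c333"));           // [1-2, 3-5, 6-9]
    }
}
```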
iii) PatternSyntaxException class: It is an unchecked exception that indicates a syntax error in a regular expression pattern.
Method | Description |
String getDescription() | It returns the description of the error. |
int getIndex() | It returns the error index. |
String getMessage() | It returns a multi-line string containing: i) the description of the syntax error and its index, ii) the incorrect regular expression pattern, and iii) a visual indication of the error index within the pattern. |
String getPattern() | It returns the incorrect regular expression pattern. |
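As a rough illustration of these methods, the sketch below tries to compile a pattern and reports the error details if compilation fails. The class SyntaxErrorDemo and the helper isValidRegex are hypothetical names, and the exact description and index text depend on the JDK, so they are printed rather than assumed.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {
    // Hypothetical helper: returns true if the regex compiles, false otherwise.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Print the details exposed by the exception.
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false, the bracket is never closed
    }
}
```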
iv) MatchResult interface: It is used to represent the result of a match operation. The match boundaries, groups, and group boundaries can be seen but not modified through a MatchResult.
Method | Description |
int end() | The offset after the last character matched is returned. |
int end(int group) | It accepts an integer representing a particular group and returns the offset after the last character of the subsequence captured by that group. |
String group() | The input subsequence matched by the previous match is returned. |
String group(int group) | The input subsequence captured by the given group during the previous match operation is returned. |
int groupCount() | The number of capturing groups in this match result’s pattern is returned. |
int start() | The start index of the match is returned. |
int start(int group) | The start index of the subsequence captured by the given group during the match is returned. |
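A short sketch of how a MatchResult might be read is shown below. The class MatchResultDemo, the helper firstPair, and the "key=value" input are assumptions made for this example; the snapshot is obtained with Matcher.toMatchResult().

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchResultDemo {
    // Hypothetical helper: finds the first "key=value" pair (numeric value)
    // and returns a read-only snapshot of the match, or null if there is none.
    static MatchResult firstPair(String input) {
        Matcher matcher = Pattern.compile("(\\w+)=(\\d+)").matcher(input);
        return matcher.find() ? matcher.toMatchResult() : null;
    }

    public static void main(String[] args) {
        MatchResult result = firstPair("set width=120 now");
        System.out.println(result.group());       // width=120
        System.out.println(result.group(1));      // width
        System.out.println(result.group(2));      // 120
        System.out.println(result.groupCount());  // 2
        System.out.println(result.start() + " to " + result.end());    // 4 to 13
        System.out.println(result.start(2) + " to " + result.end(2));  // 10 to 13
    }
}
```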
Regular Expression Patterns
The first parameter of Pattern.compile() is a pattern that describes what is being searched for. A pattern consists of a simple character or a combination of simple characters and special characters.
Character Class
A character class is a set of characters enclosed in square brackets [ ]. It specifies which characters can be considered for the match.
Character Class | Description |
[abc] | Find a character from the options given inside the bracket. (simple) |
[^abc] | Find a character not mentioned in the bracket. (negation) |
[0-9] | Find a character from 0 – 9. (range) |
[a-c[e-f]] | Find a character from a,b,c,e,f. (union) |
[a-c&&[b-c]] | Find b or c. (intersection) |
[a-c&&[^b-c]] | Find only a. (subtraction) |
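Each row of the table can be checked against a single character, as in this small sketch. The class CharacterClassDemo and the helper matchesChar are hypothetical names.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Hypothetical helper: true if the one-character string 'ch' belongs to the class.
    static boolean matchesChar(String charClass, String ch) {
        return Pattern.matches(charClass, ch);
    }

    public static void main(String[] args) {
        System.out.println(matchesChar("[abc]", "b"));          // true  (simple)
        System.out.println(matchesChar("[^abc]", "b"));         // false (negation)
        System.out.println(matchesChar("[0-9]", "7"));          // true  (range)
        System.out.println(matchesChar("[a-c[e-f]]", "e"));     // true  (union)
        System.out.println(matchesChar("[a-c[e-f]]", "d"));     // false (d is in neither range)
        System.out.println(matchesChar("[a-c&&[b-c]]", "a"));   // false (intersection keeps b and c)
        System.out.println(matchesChar("[a-c&&[^b-c]]", "a"));  // true  (subtraction keeps only a)
    }
}
```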
Metacharacters
A regular expression consists of ordinary alphanumeric characters and special characters. The special characters, which carry a special meaning instead of matching themselves, are called metacharacters and are used to perform certain tasks.
They are considered to be the building blocks of any regular expression and are used in regular expressions to define the search criteria and text manipulations. They cause the compiled regular expression to be interpreted in a special way.
Metacharacter | Description |
| | Find a match for any one of the patterns separated by |. |
. | Find just one instance of any character. |
^ | Finds a match at the beginning of a string. |
$ | Finds a match at the end of a string. |
\d | Find a digit. |
\D | Find a non-digit. |
\s | Find a whitespace character. |
\S | Find a non-whitespace character. |
\b | Find a match at a word boundary, that is, at the beginning or end of a word. |
\uxxxx | Find the Unicode character specified by the hexadecimal number xxxx. |
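The sketch below exercises several of these metacharacters. The class MetacharacterDemo, the helper containsWord, and the sample strings are made up for illustration. Note that every backslash is doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class MetacharacterDemo {
    // Hypothetical helper: true if 'word' appears in 'text' as a whole word.
    static boolean containsWord(String text, String word) {
        // \b marks a word boundary; Pattern.quote() treats 'word' as literal text.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("cat|dog", "dog"));    // true, alternation
        System.out.println(Pattern.matches("c.t", "cut"));        // true, . stands for one character
        System.out.println(Pattern.compile("^Java").matcher("Java regex").find());   // true
        System.out.println(Pattern.compile("regex$").matcher("Java regex").find());  // true
        System.out.println(Pattern.matches("\\s", " "));          // true, whitespace
        System.out.println(Pattern.matches("\\S", " "));          // false, not a non-whitespace
        System.out.println(Pattern.matches("\\u0041", "A"));      // true, U+0041 is A
        System.out.println(containsWord("a cat sat", "cat"));     // true
        System.out.println(containsWord("concatenate", "cat"));   // false, no boundary around cat
    }
}
```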
Quantifiers
Quantifiers specify how many times the preceding character, group, or character class must occur.
Quantifiers | Description |
n+ | Matches any string that contains at least one n. |
n* | Matches any string that contains zero or more occurrences of n. |
n? | Matches any string that contains zero or one occurrence of n. |
n{x} | Matches any string in which n occurs exactly x times. |
n{x,y} | Matches any string in which n occurs at least x times but not more than y times. |
n{x,} | Matches any string in which n occurs x or more times. |
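The counted quantifiers can be checked with Pattern.matches(), which requires the whole input to match. This is a minimal sketch; the class QuantifierBoundsDemo and the helper repeatsOfA are hypothetical names.

```java
import java.util.regex.Pattern;

class QuantifierBoundsDemo {
    // Hypothetical helper: true if 'input' consists only of the letter a,
    // repeated as often as the quantifier allows (for example "{2,4}").
    static boolean repeatsOfA(String quantifier, String input) {
        return Pattern.matches("a" + quantifier, input);
    }

    public static void main(String[] args) {
        System.out.println(repeatsOfA("{3}", "aaa"));       // true, exactly three
        System.out.println(repeatsOfA("{3}", "aa"));        // false, only two
        System.out.println(repeatsOfA("{2,4}", "aaaa"));    // true, the upper bound is inclusive
        System.out.println(repeatsOfA("{2,4}", "aaaaa"));   // false, five is too many
        System.out.println(repeatsOfA("{2,}", "aaaaaa"));   // true, two or more
        System.out.println(repeatsOfA("{2,}", "a"));        // false, only one
    }
}
```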
How to write a regular expression?
A regular expression is written using metacharacters, character classes, and quantifiers. A backslash “\” is used to escape characters like “+” and “.”, which already have a predefined meaning in regular expressions, so that they are matched literally.
Example: regular expression for an email ID
Consider the example codingNinjas2021@gmail.com, which splits into five parts: codingNinjas2021 / @ / gmail / . / com
Let’s write the regular expression part by part:
- The first part can have lowercase/uppercase letters, digits, underscores, hyphens, and dots: [a-zA-Z0-9_\-.]+ This means the characters in this set should occur at least once.
- The second part is “@”: [@] This means @ should occur exactly once.
- The third part has lowercase letters, as in gmail, yahoo, etc.: [a-z]+ This means a lowercase letter should occur at least once.
- The fourth part is “.”: [\.]
- The fifth part has two to four lowercase letters, as in “in” or “com”: [a-z]{2,4} This means a lowercase letter should occur at least twice and at most four times.
regular expression: [a-zA-Z0-9_\-.]+[@][a-z]+[\.][a-z]{2,4}
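To see the email pattern in action, here is a minimal sketch. It uses the regular expression from this section written without spaces inside the character class (a space inside [ ] would itself be matched literally). The class EmailRegexDemo, the helper isValidEmail, and the sample addresses are made up for illustration, and this simple pattern is far from a complete email validator.

```java
import java.util.regex.Pattern;

class EmailRegexDemo {
    // The five parts: local part, @, domain name, dot, and top-level domain.
    // Backslashes are doubled because this is a Java string literal.
    static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9_\\-.]+[@][a-z]+[\\.][a-z]{2,4}");

    // Hypothetical helper: true if the whole input matches the email pattern.
    static boolean isValidEmail(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("codingNinjas2021@gmail.com")); // true
        System.out.println(isValidEmail("codingNinjas2021gmail.com"));  // false, no @
        System.out.println(isValidEmail("user@gmail.c"));               // false, last part too short
        System.out.println(isValidEmail("user@Gmail.com"));             // false, uppercase in the domain
    }
}
```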
Example Programs:
1. Example for quantifiers
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // it returns true as a or b or c occurs at least once
        System.out.println(Pattern.matches("[abc]+", "aaa"));
        // it returns false as z and t are not matching patterns
        System.out.println(Pattern.matches("[abc]+", "aazzta"));
        // it returns true as a or b or c occur 0 or more times
        System.out.println(Pattern.matches("[abc]*", "ab"));
        // it returns true as a or b or c occurs at most once
        System.out.println(Pattern.matches("[abc]?", "a"));
        // it returns false as a or b or c occurs more than once
        System.out.println(Pattern.matches("[abc]?", "aabbbcc"));
    }
}
Output:
true
false
true
true
false
2. Example for metacharacters
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // it returns true as a digit occurs once
        System.out.println(Pattern.matches("\\d", "1"));
        // it returns false as a digit occurs more than once
        System.out.println(Pattern.matches("\\d", "123"));
        // it returns false as it is a combination of digits and characters
        System.out.println(Pattern.matches("\\d", "123abc"));
        // it returns false as a non-digit occurs more than once
        System.out.println(Pattern.matches("\\D", "abc"));
        // it returns false as the input consists of digits
        System.out.println(Pattern.matches("\\D", "123"));
        // it returns true as a non-digit occurs once
        System.out.println(Pattern.matches("\\D", "m"));
        // it returns true as every character is a non-digit and \D* allows zero or more of them
        System.out.println(Pattern.matches("\\D*", "code"));
    }
}
Output:
true
false
false
false
false
true
true
3. Program to find out if there are any occurrences of the word “coding” in a sentence.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Create a pattern to be searched
        // the second parameter in Pattern.compile() method is a flag which indicates that the search should be case-insensitive
        Pattern pattern = Pattern.compile("coding", Pattern.CASE_INSENSITIVE);
        // Create a matcher to search the pattern
        Matcher matcher = pattern.matcher("Coding Ninjas");
        // matcher.find() used to find if there is an occurrence of the pattern
        boolean matchFound = matcher.find();
        if (matchFound) {
            System.out.println("Match found from " + matcher.start() + " and ends at " + (matcher.end() - 1));
        } else {
            System.out.println("Match not found");
        }
    }
}
Output:
Match found from 0 and ends at 5
4. Program to find out if there are any occurrences of a regular expression in a sentence.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Create a pattern to be searched: c followed by zero or more o
        Pattern pattern = Pattern.compile("co*");
        // Create a matcher to search the pattern
        Matcher matcher = pattern.matcher("Learn to code from coding ninjas");
        // matcher.find() moves to the next occurrence of the pattern on each call
        while (matcher.find()) {
            System.out.println("Pattern found from " + matcher.start() +
                " to " + (matcher.end() - 1));
        }
    }
}
Output:
Pattern found from 9 to 10
Pattern found from 19 to 20
5. Replace words
Problem Statement: A program to replace the word “Python” with “Java” in the given sentence.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "Python";
        String sentence = "Python Regular Expression";
        String replace = "Java";
        System.out.println("Before replacing: " + sentence);
        // Create a pattern to be searched
        Pattern pattern = Pattern.compile(regex);
        // Create a matcher to search the pattern
        Matcher matcher = pattern.matcher(sentence);
        // Replace the pattern with the replace text
        sentence = matcher.replaceAll(replace);
        System.out.println("After replacing: " + sentence);
    }
}
Output:
Before replacing: Python Regular Expression
After replacing: Java Regular Expression
6. Password validation
Problem Statement: A program to check if the password is valid. A password is considered valid if it is 8 to 20 characters long, contains at least one digit, one uppercase letter, one lowercase letter, and one special character, and does not contain any whitespace.
regular expression for password: ^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,20}$
The explanation for regular expression:
i) ^: anchors the match at the start of the string
ii) (?=.*[0-9]): a digit must occur at least once
iii) (?=.*[a-z]): a lowercase letter must occur at least once
iv) (?=.*[A-Z]): an uppercase letter must occur at least once
v) (?=.*[@#$%^&+=]): a special character from this set must occur at least once
vi) (?=\S+$): no whitespace is allowed anywhere in the string
vii) .{8,20}: the length should be between 8 and 20 characters
viii) $: anchors the match at the end of the string
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String password = "codingNinjas1@";
        // regular expression for password
        String regex = "^(?=.*[0-9])" + "(?=.*[a-z])(?=.*[A-Z])" + "(?=.*[@#$%^&+=])" + "(?=\\S+$).{8,20}$";
        // Create a pattern to be searched
        Pattern pattern = Pattern.compile(regex);
        // A null password is invalid; stop here so that matcher() is never called with null
        if (password == null) {
            System.out.println("Invalid password.");
            return;
        }
        // Create a matcher to search the pattern
        Matcher matcher = pattern.matcher(password);
        // check if password is valid
        if (matcher.matches()) {
            System.out.println("Valid Password");
        } else {
            System.out.println("Invalid password.");
        }
    }
}
Output:
Valid Password
Frequently Asked Questions
A regular expression is written in Java using character classes, metacharacters, and quantifiers.
Regular expressions can be used to search for a pattern in text, replace words in text, and check whether input such as a phone number, email, or password is valid.
\\ is an escape sequence used to insert a single backslash character into a Java string. In a regular expression such as \d, the extra backslash (written "\\d" in Java source) is required for the code to compile.
In Java, there are various types of Java expressions like constant expressions, integral expressions, floating expressions, regular expressions, relational expressions, etc.
A backreference in a regular expression identifies a previously matched group and looks for the exact pattern again. Backreferences can be used while looking for adjacent, repeated words in some text.
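As a rough sketch of the backreference idea, the following code looks for adjacent repeated words. The pattern \b(\w+)\s+\1\b, the class BackreferenceDemo, and the helper repeatedWords are assumptions chosen for this illustration: \1 refers back to whatever the first group (\w+) captured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // A word, some whitespace, then the same word again (\1 repeats group 1).
    static final Pattern REPEATED = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Hypothetical helper: returns every word that is immediately repeated.
    static List<String> repeatedWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = REPEATED.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group(1));
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(repeatedWords("This is is a test of the the regex")); // [is, the]
        System.out.println(repeatedWords("No repeats here"));                    // []
    }
}
```

The comparison is case-sensitive here, so "The the" would not count as a repeat unless the pattern is compiled with Pattern.CASE_INSENSITIVE.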
Key Takeaways
So these are the key points related to regular expressions in Java. In this blog, we ran you through the following:
- What is a regular expression?
- Regular expressions in Java
- Regular Expression pattern
- Character Class
- Metacharacters
- Quantifiers
- How to write a regular expression?
- Sample programs
- Some common questions related to Java regular expression
By Hari Sapna Nair