Tutorial On Java Regular Expressions


Introduction 

A regular expression (regex) is a sequence of characters that specifies a search pattern. A regular expression can be a complicated pattern or just a single character.

Regular expressions can be used to search, edit, and replace text. They are widely used to define constraints on strings, for example in email and password validation. They are also used for search operations in search engines, in the search-and-replace dialogs of Google Docs, word processors, and text editors, for finding text in emails, and so on.

In short, regular expressions are a powerful way to search and manipulate text. A single regular expression can often replace dozens of lines of code.

Java Regular Expression

The Java regular expression API (Application Programming Interface) is used to define string patterns for searching and editing text in Java.

To work with regular expressions in Java, the java.util.regex package has to be imported. The package consists of the following three classes and one interface:

i) Pattern class: It is the compiled representation of a regular expression and is used to define patterns.

Method                                                    Description
static Pattern compile(String regex)                      Compiles the given regular expression and returns a Pattern instance.
Matcher matcher(CharSequence input)                       Creates a Matcher that matches the given input against this pattern.
static boolean matches(String regex, CharSequence input)  Compiles the regular expression and tests whether the entire input matches it.
String[] split(CharSequence input)                        Splits the given input around matches of this pattern.
String pattern()                                          Returns the regular expression from which this pattern was compiled.
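
As a rough illustration of the methods in the table, the sketch below compiles one pattern and reuses it. The class name PatternMethodsDemo, the helper splitCsv, and the sample strings are made up for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMethodsDemo {

    // Hypothetical pattern for this sketch: a comma with optional whitespace around it.
    static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits a comma-separated line, ignoring whitespace around the commas.
    static String[] splitCsv(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitCsv("red , green,blue"))); // [red, green, blue]
        System.out.println(COMMA.pattern());                               // \s*,\s*
        System.out.println(Pattern.matches("[a-z]+", "java"));             // true
        Matcher matcher = COMMA.matcher("a,b");
        System.out.println(matcher.find());                                // true
    }
}
```

Compiling the pattern once and keeping it in a constant avoids recompiling the same expression on every call.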

ii) Matcher Class: It is used to perform match operations on a character sequence.

Method             Description
boolean matches()  Tests whether the entire input matches the pattern.
boolean find()     Searches for the next occurrence of the pattern in the input; calling it repeatedly finds every occurrence.
int start()        Returns the start index of the matched subsequence.
int end()          Returns the offset just after the last character of the matched subsequence.
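
The difference between matches() and find() is easy to miss, so here is a small sketch. The class name MatcherMethodsDemo, the helper digitRuns, and the inputs are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {

    // Returns "start-end" for every run of digits; end is exclusive, as in Matcher.end().
    static List<String> digitRuns(String text) {
        Matcher matcher = Pattern.compile("\\d+").matcher(text);
        List<String> spans = new ArrayList<>();
        while (matcher.find()) {
            spans.add(matcher.start() + "-" + matcher.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(digits.matcher("2021").matches());      // true: the whole input is digits
        System.out.println(digits.matcher("year 2021").matches()); // false: "year " is not digits
        System.out.println(digits.matcher("year 2021").find());    // true: a run of digits exists
        System.out.println(digitRuns("a12b345"));                   // [1-3, 4-7]
    }
}
```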

iii) PatternSyntaxException class: It indicates a syntax error in a regular expression pattern.

Method                   Description
String getDescription()  Returns the description of the error.
int getIndex()           Returns the index of the error within the pattern.
String getMessage()      Returns a multi-line string containing (i) the description of the syntax error and its index, (ii) the incorrect regular expression pattern, and (iii) a visual indication of the error index within the pattern.
String getPattern()      Returns the incorrect regular expression pattern.
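
One way this exception might be handled is sketched below. The class name SyntaxErrorDemo and the helper isValidRegex are hypothetical, and the exact description text printed depends on the JDK.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo {

    // Returns true if the regular expression compiles, false if it has a syntax error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            System.out.println("Description: " + e.getDescription());
            System.out.println("Index: " + e.getIndex());
            System.out.println("Pattern: " + e.getPattern());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("[a-z]+")); // true
        System.out.println(isValidRegex("[a-z"));   // false: the character class is never closed
    }
}
```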

iv) MatchResult interface: It is used to represent the result of a match operation. The match boundaries, groups, and group boundaries can be seen but not modified through a MatchResult.

Method                   Description
int end()                Returns the offset after the last character matched.
int end(int group)       Returns the offset after the last character of the subsequence captured by the given group.
String group()           Returns the input subsequence matched by the previous match.
String group(int group)  Returns the input subsequence captured by the given group during the previous match operation.
int groupCount()         Returns the number of capturing groups in this match result’s pattern.
int start()              Returns the start index of the match.
int start(int group)     Returns the start index of the subsequence captured by the given group during the match.
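
A brief sketch of reading groups and their boundaries through a MatchResult follows. It obtains the result with Matcher.toMatchResult(); the class name MatchResultDemo, the helper firstPair, and the key=value input are assumptions for this example.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchResultDemo {

    // Finds the first "key=value" pair and returns a snapshot of the match, or null if none.
    static MatchResult firstPair(String text) {
        Matcher matcher = Pattern.compile("(\\w+)=(\\w+)").matcher(text);
        return matcher.find() ? matcher.toMatchResult() : null;
    }

    public static void main(String[] args) {
        MatchResult result = firstPair("set lang=java now");
        System.out.println(result.group());                         // lang=java
        System.out.println(result.group(1));                        // lang
        System.out.println(result.group(2));                        // java
        System.out.println(result.groupCount());                    // 2
        System.out.println(result.start() + " " + result.end());    // 4 13
        System.out.println(result.start(2) + " " + result.end(2));  // 9 13
    }
}
```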

Regular Expression Patterns

The first parameter of Pattern.compile() is a pattern that describes what is being searched for. A pattern consists of a simple character or a combination of simple characters and special characters.

Character Class

A character class is a set of characters enclosed in square brackets [ ]. It specifies which characters are accepted at that position in the match.

Character class  Description
[abc]            Matches a, b, or c. (simple)
[^abc]           Matches any character except a, b, or c. (negation)
[0-9]            Matches any digit from 0 to 9. (range)
[a-c[e-f]]       Matches a, b, c, e, or f. (union)
[a-c&&[b-c]]     Matches b or c. (intersection)
[a-c&&[^b-c]]    Matches only a. (subtraction)
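
The character classes can be checked one character at a time, as in the following sketch. The class name CharacterClassDemo and its matches helper are invented for illustration.

```java
import java.util.regex.Pattern;

class CharacterClassDemo {

    // True if the whole input (here a single character) is matched by the character class.
    static boolean matches(String characterClass, String ch) {
        return Pattern.matches(characterClass, ch);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc]", "b"));          // true  (simple)
        System.out.println(matches("[^abc]", "b"));         // false (negation)
        System.out.println(matches("[0-9]", "7"));          // true  (range)
        System.out.println(matches("[a-c[e-f]]", "e"));     // true  (union)
        System.out.println(matches("[a-c[e-f]]", "d"));     // false (d is in neither range)
        System.out.println(matches("[a-c&&[b-c]]", "a"));   // false (the intersection is b or c)
        System.out.println(matches("[a-c&&[^b-c]]", "a"));  // true  (subtraction leaves only a)
        System.out.println(matches("[a-c&&[^b-c]]", "b"));  // false
    }
}
```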

Metacharacters

A regular expression consists of ordinary characters, such as letters and digits, and special characters. The characters with special meanings are called metacharacters.

They are the building blocks of a regular expression: they define the search criteria and cause the compiled regular expression to be interpreted in a special way.

Metacharacter  Description
|              Matches either the pattern on its left or the pattern on its right.
.              Matches any single character (by default, any character except a line terminator).
^              Matches at the beginning of a string.
$              Matches at the end of a string.
\d             Matches a digit.
\D             Matches a non-digit.
\s             Matches a whitespace character.
\S             Matches a non-whitespace character.
\b             Matches a word boundary, that is, the position at the beginning or end of a word.
\uxxxx         Matches the Unicode character specified by the hexadecimal number xxxx.
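
Several of these metacharacters can be combined, as in this sketch. The class name MetacharacterDemo, the helper countPets, and the sample sentence are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetacharacterDemo {

    // Counts whole-word occurrences of "cat" or "dog"; \b keeps "catalog" from counting.
    static int countPets(String text) {
        Matcher matcher = Pattern.compile("\\b(cat|dog)\\b").matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPets("cat, dog and catalog"));            // 2
        System.out.println(Pattern.matches("^h.t$", "hot"));              // true: . stands for any one character
        System.out.println(Pattern.matches("\\u0041\\d\\s\\S", "A1 x"));  // true: A, digit, whitespace, non-whitespace
    }
}
```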

Quantifiers

A quantifier specifies how many times the preceding element (n in the table below) must occur.

Quantifier  Description
n+          Matches one or more occurrences of n.
n*          Matches zero or more occurrences of n.
n?          Matches zero or one occurrence of n.
n{x}        Matches exactly x occurrences of n.
n{x,y}      Matches at least x and at most y occurrences of n.
n{x,}       Matches x or more occurrences of n.
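
The counted quantifiers are worth a quick check, because both bounds of n{x,y} are inclusive. The sketch below uses a made-up class QuantifierBoundsDemo and helper twoToFourAs.

```java
import java.util.regex.Pattern;

class QuantifierBoundsDemo {

    // True if the input consists of two, three, or four letters 'a' and nothing else.
    static boolean twoToFourAs(String input) {
        return Pattern.matches("a{2,4}", input);
    }

    public static void main(String[] args) {
        System.out.println(twoToFourAs("a"));                    // false: fewer than 2
        System.out.println(twoToFourAs("aa"));                   // true: the lower bound is included
        System.out.println(twoToFourAs("aaaa"));                 // true: the upper bound is included
        System.out.println(twoToFourAs("aaaaa"));                // false: more than 4
        System.out.println(Pattern.matches("a{3}", "aaa"));      // true: exactly 3
        System.out.println(Pattern.matches("a{2,}", "aaaaaa"));  // true: 2 or more
    }
}
```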

How to write a regular expression?
A regular expression is written using metacharacters, character classes, and quantifiers. A backslash (“\”) is placed before characters like “+” and “.”, which already have a predefined meaning in a regular expression, to match them literally.
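
To see what escaping changes in practice, consider the sketch below. The class name EscapeDemo, the helper splitVersion, and the inputs are hypothetical; Pattern.quote() is shown as an alternative that escapes a whole string at once.

```java
import java.util.regex.Pattern;

class EscapeDemo {

    // Splits a dotted version string such as "1.8.0" into its parts.
    static String[] splitVersion(String version) {
        // "\\." in Java source is the regular expression \. which matches a literal dot.
        return version.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println(Pattern.matches("1+1", "1+1"));                 // false: + is a quantifier here
        System.out.println(Pattern.matches("1\\+1", "1+1"));               // true: \+ is a literal plus sign
        System.out.println(Pattern.matches(Pattern.quote("1+1"), "1+1"));  // true: quote() escapes the whole string
        System.out.println(splitVersion("1.8.0").length);                  // 3
    }
}
```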

Example: regular expression for an email ID

Consider the example codingNinjas2021@gmail.com, which splits into the parts codingNinjas2021 / @ / gmail / . / com

Let’s write the regular expression part by part:

  • The first part can contain lowercase and uppercase letters, digits, underscores, hyphens, and dots: [a-zA-Z0-9_\-\.]+ This means the characters in this set should occur at least once.
  • The second part is “@”: [@] This means @ should occur once.
  • The third part contains lowercase letters, like gmail or yahoo: [a-z]+ This means a lowercase letter should occur at least once.
  • The fourth part is “.”: [\.]
  • The fifth part contains two or three lowercase letters, like in or com: [a-z]{2,3} This means a lowercase letter should occur either twice or thrice.

Regular expression: [a-zA-Z0-9_\-\.]+[@][a-z]+[\.][a-z]{2,3}
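
A minimal sketch of how such an expression might be used is shown below. The class name EmailCheckDemo and the helper looksLikeEmail are made up for illustration. The pattern is written without spaces inside the brackets, since a space inside a character class is matched literally, and the last part is limited to two or three letters as described in the list. It is a simplified check, not a complete email validator.

```java
import java.util.regex.Pattern;

class EmailCheckDemo {

    // Backslashes are doubled because the pattern is written as a Java string literal.
    static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9_\\-\\.]+[@][a-z]+[\\.][a-z]{2,3}");

    // matches() requires the whole input to fit the pattern.
    static boolean looksLikeEmail(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeEmail("codingNinjas2021@gmail.com")); // true
        System.out.println(looksLikeEmail("codingNinjas2021gmail.com"));  // false: no @
        System.out.println(looksLikeEmail("coding ninjas@gmail.com"));    // false: contains a space
        System.out.println(looksLikeEmail("user@gmail.c"));               // false: last part is too short
    }
}
```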

Example Programs:

1. Example for quantifiers

import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        // returns true as a, b, or c occurs at least once
        System.out.println(Pattern.matches("[abc]+", "aaa"));

        // returns false as z and t are not in the character class
        System.out.println(Pattern.matches("[abc]+", "aazzta"));

        // returns true as a, b, or c occurs zero or more times
        System.out.println(Pattern.matches("[abc]*", "ab"));

        // returns true as a, b, or c occurs at most once
        System.out.println(Pattern.matches("[abc]?", "a"));

        // returns false as a, b, or c occurs more than once
        System.out.println(Pattern.matches("[abc]?", "aabbbcc"));
    }
}

Output:
true
false
true
true
false

2. Example for metacharacters

import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        // returns true as the input is exactly one digit
        System.out.println(Pattern.matches("\\d", "1"));

        // returns false as the input has more than one digit
        System.out.println(Pattern.matches("\\d", "123"));

        // returns false as the input is a combination of digits and letters
        System.out.println(Pattern.matches("\\d", "123abc"));

        // returns false as the input has more than one non-digit
        System.out.println(Pattern.matches("\\D", "abc"));

        // returns false as the input consists of digits
        System.out.println(Pattern.matches("\\D", "123"));

        // returns true as the input is exactly one non-digit
        System.out.println(Pattern.matches("\\D", "m"));

        // returns true as \D* allows zero or more non-digits
        System.out.println(Pattern.matches("\\D*", "code"));
    }
}

Output:
true
false
false
false
false
true
true

3. Program to find out if there are any occurrences of the word “coding” in a sentence.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        // Create a pattern to be searched.
        // The second parameter of Pattern.compile() is a flag which indicates
        // that the search should be case-insensitive.
        Pattern pattern = Pattern.compile("coding", Pattern.CASE_INSENSITIVE);

        // Create a matcher to search for the pattern
        Matcher matcher = pattern.matcher("Coding Ninjas");

        // matcher.find() checks whether there is an occurrence of the pattern
        boolean matchFound = matcher.find();

        if (matchFound) {
            // matcher.end() is the offset after the match, so subtract 1 for the last index
            System.out.println("Match found from " + matcher.start() + " and ends at " + (matcher.end() - 1));
        } else {
            System.out.println("Match not found");
        }
    }
}

Output:
Match found from 0 and ends at 5

4. Program to find out if there are any occurrences of a regular expression in a sentence.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        // Create a pattern to be searched: a 'c' followed by zero or more 'o'
        Pattern pattern = Pattern.compile("co*");

        // Create a matcher to search for the pattern
        Matcher matcher = pattern.matcher("Learn to code from coding ninjas");

        // Each call to matcher.find() moves to the next occurrence of the pattern
        while (matcher.find()) {
            System.out.println("Pattern found from " + matcher.start() +
                               " to " + (matcher.end() - 1));
        }
    }
}

Output:

Pattern found from 9 to 10
Pattern found from 19 to 20

5. Replace words

Problem Statement: A program to replace the word “Python” with “Java” in the given sentence.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String regex = "Python";
        String sentence = "Python Regular Expression";
        String replace = "Java";

        System.out.println("Before replacing: " + sentence);

        // Create a pattern to be searched
        Pattern pattern = Pattern.compile(regex);

        // Create a matcher to search for the pattern
        Matcher matcher = pattern.matcher(sentence);

        // Replace every match of the pattern with the replacement text
        sentence = matcher.replaceAll(replace);

        System.out.println("After replacing: " + sentence);
    }
}
 
Output:

Before replacing: Python Regular Expression
After replacing: Java Regular Expression

6. Password validation

Problem Statement: A program to check if the password is valid. A password is considered valid if it is 8 to 20 characters long, contains at least one digit, one uppercase letter, one lowercase letter, and one special character, and does not contain any whitespace.

Regular expression for password: ^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{8,20}$

Explanation of the regular expression:
i) ^: represents the start of the string
ii) (?=.*[0-9]): a digit must occur at least once
iii) (?=.*[a-z]): a lowercase letter must occur at least once
iv) (?=.*[A-Z]): an uppercase letter must occur at least once
v) (?=.*[@#$%^&+=]): a special character from this set must occur at least once
vi) (?=\S+$): no whitespace is allowed
vii) .{8,20}: the length should be between 8 and 20 characters
viii) $: represents the end of the string

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {

    public static void main(String[] args) {
        String password = "codingNinjas1@";

        // Regular expression for the password
        String regex = "^(?=.*[0-9])" + "(?=.*[a-z])(?=.*[A-Z])" + "(?=.*[@#$%^&+=])" + "(?=\\S+$).{8,20}$";

        // Create a pattern to be searched
        Pattern pattern = Pattern.compile(regex);

        // A null password is invalid; return early so that matcher() is never called on null
        if (password == null) {
            System.out.println("Invalid password.");
            return;
        }

        // Create a matcher to search for the pattern
        Matcher matcher = pattern.matcher(password);

        // Check if the password is valid
        if (matcher.matches()) {
            System.out.println("Valid Password");
        } else {
            System.out.println("Invalid password.");
        }
    }
}


Output:
Valid Password

Frequently Asked Questions

How do you make a Java regular expression?

A regular expression in Java is written using character classes, metacharacters, and quantifiers, and is compiled with Pattern.compile().

What is the use of the regular expression in Java?

Regular expressions can be used to search a pattern from the text, replace words in the text, find if the input is valid in the case of phone numbers, emails, passwords, etc.

What does \\ mean in Java?

In a Java string literal, \\ is an escape sequence used to insert a single backslash character in the text. A regular expression such as \d must therefore be written as "\\d" in Java source; the extra backslash is required for the code to compile.

What are the types of Java expressions?

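In Java, there are various types of expressions in the language sense, like constant expressions, integral expressions, floating-point expressions, relational expressions, etc. A regular expression is different: it is a pattern written as a string and interpreted by the classes in java.util.regex, not a kind of Java language expression.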

What is a backreference in regular expressions?

A backreference in a regular expression identifies a previously matched group and looks for the exact pattern again. Backreferences can be used while looking for adjacent, repeated words in some text.
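
As a small sketch of the repeated-word case, the pattern below uses \1 to refer back to the first capturing group. The class name BackreferenceDemo, the helper firstRepeatedWord, and the sample sentences are assumptions for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {

    // \1 refers back to whatever group 1 captured, so this finds a word
    // that is immediately followed by the same word.
    static final Pattern REPEATED_WORD = Pattern.compile("\\b(\\w+)\\s+\\1\\b");

    // Returns the first repeated word, or null if there is none.
    static String firstRepeatedWord(String text) {
        Matcher matcher = REPEATED_WORD.matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRepeatedWord("This is is a test")); // is
        System.out.println(firstRepeatedWord("No repeats here"));   // null
    }
}
```

The word boundaries (\b) keep the pattern from matching the "is" inside "This".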

Key Takeaways

So these are the key points related to regular expressions in Java. In this blog, we ran you through the following:

  • What is a regular expression?
  • Regular expressions in Java
  • Regular Expression pattern
  • Character Class
  • Metacharacters
  • Quantifiers
  • How to write a regular expression?
  • Sample programs
  • Some common questions related to Java regular expression

By Hari Sapna Nair