Regular Expression Matching

vaishnavi pandey
Last Updated: May 13, 2022

Introduction

A regular expression is a string of letters or character sequence that defines a search pattern. When it comes to validating text input, pattern matching is essential. Patterns are highly adaptable, allowing us to create our regular expressions to validate data.

To learn more about regular expressions in Java, you can head to the Tutorial On Java Regular Expressions blog.

This article is all about solving a simple problem on regular expression matching. Such questions are persistent in job interviews. So, let's get started.

But before directly jumping on the solution approaches, we suggest you try out this problem yourself on CodeStudio.

Problem statement

The problem states that “An input string s and a pattern p is given. Implement regular expression matching with support for '.' and '*'. Where '.' matches a single character and '*' matches zero or more characters before it.”

One thing to note here is that the matching should take place throughout the input string.

Example: Let the string s is: "bbbaacd" and the pattern given is p = "b*a*c." We have to check whether the pattern p matches with string s.

The answer is YES. Let’s see how.

  • In the pattern string, the first asterisk denotes that the preceding character, i.e. b can be taken any number of times. It'll be taken three times to match with the string. 

The pattern becomes “bbba*c.”.

  • The second asterisk denotes that the preceding character, i.e. a can be taken any number of times. It'll be taken two times to match with the string.

Now, the pattern becomes “bbbaac.”.

  • Now, the dot(.) can be used to denote any single character. For the pattern to match, we'll take d in place of that dot.

Finally, the pattern becomes "bbbaacd". It matches with the string.

 

Approach 1: Recursion

The problem would be simpler if asterisk * is not present. We just have to examine each character of the string from left to right to see if it matches the pattern.

When a star appears, we may need to look at several distinct text suffixes to determine if they match the rest of the pattern. A simple approach to describe this relationship is with a recursive solution.

Implementation

#include<bits/stdc++.h>
using namespace std;

bool isRegexMatches(string s, string p)
{
    if(p.size()==0) return s.size() == 0;
       


    if(p.size()>1 && p[1]=='*')
    {

        // * matches zero character of s
        if(isRegexMatches(s,p.substr(2)))
          {
                return true;
          }

        // * matches 1 or more character 
        if(s.length() > 0 && (s[0]==p[0] || p[0]=='.'))
          {
                return isRegexMatches(s.substr(1),p);
          }
        return false;
    }
    else
    {
        if(s.size()>0 && (s[0]==p[0] || p[0]=='.'))
            {
                return isRegexMatches(s.substr(1),p.substr(1));
            }
        return false;
    }
}

int main()
{
    int testCases;
    cout<<"Enter the number of testcases:";
    cin>>testCases;
    while(testCases--)
    {
        string s,p;
    cout<<"Enter the source string:";
    cin>>s;
    cout<<"Enter the pattern:";
    cin>>p;
    
    if(isRegexMatches(s,p))
        cout<<"Matching successful!!"<<endl;
    else
        cout<<"No match found!!"<<endl
    }
  
    return 0;
}

 

 

Output

Enter the number of test cases:
4
Enter the String:
ab
Enter the Pattern:
.*
Matching Successful!!
Enter the String:
aab
Enter the Pattern:
c*a*b
Matching Successful!!
Enter the String:
aa
Enter the Pattern:
a
No match found!!
Enter the String:
Mississippi
Enter the Pattern:
Mis*is*p.
No match found!!

Time Complexity

In the worst case, the time complexity will be O((T+P)2T+P/2).

Reason: The call to the function match will be made (i, i+j) times, and strings of the order O(T - i) and O(P - 2 * j) will be made.

Space Complexity

The space complexity will also be O((T+P)2T+P/2).

Reason: In every function call, we’ll create strings as described above, possibly creating duplicates.

Approach 2: Dynamic Programming

A dynamic programming approach is used when we have problems that can be decomposed into similar sub-problems. In this paradigm, the results of the subproblems are stored to reuse later. These algorithms are primarily used for optimisation.

To apply the DP technique, we need to find out if this problem can be divided into similar subproblems or not. 

 

 

In the above diagram, we can see how the problem contains similar subproblems.

Let’s move on to the implementation of this approach.

Implementation

We will use the above recursive approach, but the subproblem with string [ i: ], pattern [ j: ] will only ever be solved once; we will use the dp array instead. This will save expensive string-building operations and allow us to cache the intermediate results.

The top-down dynamic programming implementation is given below:

import java.util.Scanner;

 

public class RegexMatching1 {

enum Result {

TRUE, FALSE

}

 

Result[][] memo;

 

public boolean isMatch(String text, String pattern) {

memo = new Result[text.length() + 1][pattern.length() + 1];

return dp(00, text, pattern);

}

 

public boolean dp(int i, int j, String text, String pattern) {

if (memo[i][j] != null) {

return memo[i][j] == Result.TRUE;

}

boolean ans;

if (j == pattern.length()) {

ans = i == text.length();

else {

boolean first_match = (i < text.length()

&& (pattern.charAt(j) == text.charAt(i) || pattern.charAt(j) == '.'));

 

if (j + 1 < pattern.length() && pattern.charAt(j + 1) == '*') {

ans = (dp(i, j + 2, text, pattern) || first_match && dp(i + 1, j, text, pattern));

else {

ans = first_match && dp(i + 1, j + 1, text, pattern);

}

}

memo[i][j] = ans ? Result.TRUE : Result.FALSE;

return ans;

}

 

public static void main(String args[]) {

String s, p;

int test;

Scanner sc = new Scanner(System.in);

 

System.out.println("Enter the number of test cases:");

test = sc.nextInt();

while (test > 0) {

System.out.println("Enter the String:");

s = sc.next();

System.out.println("Enter the Pattern:");

p = sc.next();

RegexMatching r = new RegexMatching();

if (r.isMatch(s, p)) {

System.out.println("Matching Successful!!");

else {

System.out.println("No match found!!");

}

test--;

}

}

}

 

The bottom-up variation of the above code is given below:

import java.util.Scanner;

public class RegexMatching1 
{
public boolean isMatch(String text, String pattern) {
        boolean[][] dp = new boolean[text.length() + 1][pattern.length() + 1];
        dp[text.length()][pattern.length()] = true;

        for (int i = text.length(); i >= 0; i--)
        {
            for (int j = pattern.length() - 1; j >= 0; j--)
            {
                boolean first_match = (i < text.length() && (pattern.charAt(j) == text.charAt(i) ||pattern.charAt(j) == '.'));
                if (j + 1 < pattern.length() && pattern.charAt(j+1) == '*')
                {
                    dp[i][j] = dp[i][j+2] || first_match && dp[i+1][j];
                } 
                else 
                {
                    dp[i][j] = first_match && dp[i+1][j+1];
                }
            }
        }
        return dp[0][0];
    }

public static void main(String args[]) {
String s, p;
int test;
Scanner sc = new Scanner(System.in);

System.out.println("Enter the number of test cases:");
test = sc.nextInt();
while (test > 0) {
System.out.println("Enter the String:");
s = sc.next();
System.out.println("Enter the Pattern:");
p = sc.next();
RegexMatching r = new RegexMatching();
if (r.isMatch(s, p)) {
System.out.println("Matching Successful!!");
else {
System.out.println("No match found!!");
}
test--;
}
}
}

 

Output

Enter the number of test cases:

4

Enter the String:

aab

Enter the Pattern:

c*a*b

Matching Successful!!

Enter the String:

aa

Enter the Pattern:

a

No match found!!

Enter the String:

ab

Enter the Pattern:

.*

Matching Successful!!

Enter the String:

Mississippi

Enter the Pattern:

Mis*is*p.

No match found!!

Time Complexity

The time complexity will be O(T*P).

Reason: Every call to the function will be processed once, and thus the complete program will take O(T*P). 

Space Complexity

The space complexity will also be O(T*P).

Reason: The memory space utilised in this process will be the function calls which will take O(T*P) space.

Now, let’s move on to some of the frequently asked questions on regular expressions.

Frequently asked questions

  1. What is a regular expression?
     A regular expression is a string of letters or character sequence that defines a search pattern. 
     
  2. What purpose do regular expressions serve?
    Regular expressions come in handy for defining filters. To make a filter more specific or general, regular expressions contain characters that establish a pattern of text to be matched.
     
  3. What are some of the uses of regular expressions?
    Data validation, data scraping (particularly online scraping), data wrangling, simple parsing, the creation of syntax highlighting systems, and various other activities are all typical applications of regular expressions.

Key Takeaways

In this article, we ran you through the regular expression matching problem. We saw a recursive and dynamic programming approach to solve the problem. 

Regex is used in Google Analytics for URL matching and most significant editors like Sublime, Notepad++, Brackets, Google Docs, and Microsoft Word for search and replace.

Various languages provide support for regular expressions, and Java is one such language. To learn more about Java regular expressions, you can visit this blog. 

Happy Reading!

 

 

Was this article helpful ?
0 upvotes