Wildcard Matching

Shreya Deep
Last Updated: May 13, 2022

Introduction

Matching two strings means that they both should be equal to each other. If we’re given a pattern, it might be containing a few temporary characters denoted by some symbols, which can be replaced with a single or a bunch of alphabet characters. So to match a pattern with a given text, you need to efficiently replace those temporary characters with suitable characters that can lead to the exact given text.

In this blog, we’ll learn how to solve a problem named wildcard matching where we have to determine whether a pattern can be matched to a given text or not. 

Let’s understand the problem statement in-depth to get a better understanding.

Problem Statement

Given a text and a wildcard pattern, determine whether that pattern can be matched with the text or not. The matching should cover the entire text. Return true if the pattern can be equated to the text otherwise, false. 

Note:   

  • The wildcard pattern can include characters ‘?’ and ‘*.’ 
  • ‘?’: matches any single character.
  • ‘*’: matches any sequence of characters (including the empty line)
  • Each occurrence of ‘?’ can be replaced with any character, and each occurrence of ‘*’ can be replaced with a sequence of characters such that the wildcard pattern becomes identical to the input string after replacement.


For example: 

INPUT : Text - “adceb”  Pattern - “*a*b”

OUTPUT:  true

Explanation: the ‘*’ at the 0th position of the pattern can be replaced by an empty sequence, and the ‘*’ at the 2nd position can be replaced by “dce.”

INPUT : Text - “acdcb”  Pattern - “a*c?b”

OUTPUT:  false

Explanation: No perfect replacement can be done. 

Recommendation:  Please solve it on “CodeStudio”, before moving on to the solution.

Solution Approach 

First, let’s see the different types of characters in the pattern p.

There can be 3 cases:

  • Case 1: if p[i] == ‘?’
    • Replace the ‘?’ with the current character in the text string and move to the next character in both the text and the pattern.
       
  • Case 2: if p[i] == some alphabet
    • See if the alphabet is equal to the current character in the text string. If it is similar, move to the next character in both the text and the pattern. Otherwise, return false.
       
  • Case 3: if p[i]==’*’ 
    • There are two choices - 
      • We can match this ‘*’ to an empty sequence and move on to the next character in the pattern.
      • Match ‘*’ with a sequence of characters starting with the current character of the text string. Therefore, move to the next character in the text string, and the pattern string stays at the same position.

Approach 1: Recursion

Implementation of the above idea using recursion is as follows
 

#include<iostream>
#include <bits/stdc++.h>
using namespace std;

bool isMatch(string s, string p, int i, int j){
//If you have reached the end of the pattern and you also reached the end of the text, then it's a match. Else it's not. 
if (j == p.size())
    return (i == s.size());
//Whenever the text string is empty or we're at the end of the string, the 
//pattern string can only be matched if it equals '*" because the '*' can be 
//replaced by an empty string.
if (i == s.size())
{
    for (int k = j; k < p.size(); k++)
        if (p[k] != '*'return false;

    return true;
}
//If we have a same character or a '?' they simply match so move ahead in both pattern and text.
if ((p[j] == s[i]) || p[j] == '?')
{
    return isMatch(s,p, i + 1, j + 1);
}
if (p[j] == '*')
{
    return isMatch(s, p, i , j + 1) ||    //Ignore the pattern and match with next character in pattern
          isMatch(s, p, i+ 1, j);      //Match multiple characters in string with '*'
}
return false;
}

int main(){
    string s,p;
    cin>>s>>p;
    if(isMatch(s,p,0,0)){
        cout<<"YES"<<endl;
    }
    else{
        cout<<"NO"<<endl;
    }
    return 0;
}

Input

s= adceb

p = *a*b

Output

YES

Time complexity

O(2n), where n is the length of string p and m is the length of string s.

Space complexity

O(n*m), where n is the length of string p and m is the length of string s.

Reason: We’re storing the values of dp for each i and j from 0 to n and 0 to m, respectively. 

Approach 2: Dynamic Programming

We can solve this problem using dynamic programming. So, it’s suggested that you revise this topic once.

If the length of the text string is m and pattern length is n, make a dp table of size (n+1)*(m+1), where dp[i][j] would denote whether the subpattern from index 0 to i-1 can match the substring of text from 0 to j-1. 

The answer for dp[i][j] will depend on the smaller subproblems.

The steps are :

  • Declare a 2-D bool vector named dp of size (n+1)*(m+1) and initialise the values to 0;
  • Base cases:
    • dp[0][0] = 1 because empty strings always match.
    • Whenever the text string is empty, the pattern string can only be matched if it equals ‘*” because the ‘*’ can be replaced by an empty string. So, if text string s ==” “, if p==’*’, we can write for all i from 0 to n dp[i][0] = 1. Else, dp[i][0] = 0.
    • If the pattern string is empty and the text string is not empty, it can never be matched. Therefore, for all i from 0 to m, dp[0][i] = 0;
       
  • Start from i=1 to n and j=1 to n. Recurrence relation is: 
    • If p[i-1] == ‘*’, replace this with an empty sequence and check for dp[i][j-1] or replace this with current character s[j-1] move ahead in s and check for dp[i-1][j].  
    • Else If p[i-1] ==’?’, replace this with the current character and check for the rest, i.e; check for dp[i-1][j-1]. 
    • Else p[i-1] is a character other than the above two, just check if that character is equal to s[j-1] or not. If it is, check for dp[i-1][j-1], otherwise dp[i][j] = false.
       
  • Return dp[n-1][m-1].

Implementation of the above discussed approach:

#include<iostream>
#include <bits/stdc++.h>
using namespace std;

bool isMatch(string s, string p) {
    int m = s.length(); //length of string s
    int n = p.length(); //length of string p
    vector<vector<bool>> dp(n+1, vector<bool>(m+1false)); //declaration of dp vector
    dp[0][0] = true
    bool flag = true;
    for (int i = 1; i < dp.size(); ++i) {
        if (p[i-1] != '*'){
            flag = false;
        }
        dp[i][0] = flag;
    }
    for (int i = 1; i <= p.size(); ++i) {
        for (int j = 1; j <= s.size(); ++j) {
            if (p[i-1] == '*') {
                if (dp[i-1][j] || dp[i][j-1]){
                    dp[i][j] = true;
                }       
            }
            else if (p[i-1] == '?') {
                if (dp[i-1][j-1] == true){
                    dp[i][j] = true;
                }
            }
            else {
                if(dp[i-1][j-1] == true && p[i-1] == s[j-1]) {
                    dp[i][j] = true;
                }
            }
        }
    }
    return dp[dp.size()-1][dp[0].size()-1];
}

int main(){
    string s,p;
    cin>>s>>p;
    if(isMatch(s,p)){
        cout<<"YES"<<endl;
    }
    else{
        cout<<"NO"<<endl;
    }
    return 0;
}

 

Input

s= adceb

p = *a*b

Output

true

 

Time complexity

O(n*m), where n is the length of string p and m is the length of string s.

Reason: Because we’re calculating the value of dp for each i and j from 0 to n and 0 to m, respectively. 

Space complexity

O(n*m), where n is the length of string p and m is the length of string s.

Reason: We’re storing the values of dp for each i and j from 0 to n and 0 to m, respectively. 

Frequently asked questions

  1. What is dynamic programming?
    Answer: Dynamic programming is an optimization method used in various programming problems
     
  2. Where is dynamic programming used?
    Answer: Dynamic programming is used in problems where the solution depends on smaller overlapping subproblems. We use it to memorize the results so that they can easily be used later when needed.
     
  3. What are overlapping subproblems?
    Answer: A problem has overlapping subproblems if it can be divided into smaller subproblems that are reused multiple times.
     
  4. Does dynamic programming have overlapping subproblems?
    Answer:  Yes, dynamic programming is only used in problems where there are overlapping subproblems.
     
  5. What is wildcard pattern matching?
    Answer: Matching a pattern string with a text string by replacing ‘*’ and ‘?’ characters is called wildcard pattern matching.
     
  6. Where can I submit my “Longest increasing path in a matrix” code?
    Answer: You can submit your code on CodeStudio and get it accepted here right away.
     
  7. Are there more Data Structures and Algorithms problems in CodeStudio?
    Answer: Yes, CodeStudio is a platform that provides both practice coding questions and commonly asked interview questions. The more we’ll practice, the better our chances are of getting into a dream company of ours.

Key takeaways

In this article, we’ve discussed the wildcard matching problem. Here an effective method has been used, which is called dynamic programming. This is a crucial topic, and there are numerous exciting problems related to this topic. Some of these are Tower of Hanoiminimum cost to convert to the given string, and rod cutting problem

I would suggest you solve them to gain more confidence on this topic. These questions are asked during various coding contests as well as placements tests

To practice more such problems, Codestudio is a one-stop destination. This platform will help you acquire effective coding techniques and give you an overview of student interview experience in various product-based companies.

Happy Coding!

 

Was this article helpful ?
0 upvotes

Comments

No comments yet

Be the first to share what you think