Tag Validator

Posted: 23 Mar, 2021
Difficulty: Moderate

PROBLEM STATEMENT

Try Problem

You are given a string 'S' representing a code file. You have to check whether it is a valid code file or not.

The File Is Valid If-
1) The file can be represented as \<TAG> CONTENT \</TAG>

2) TAG is an uppercase English letter word of length 1 to 9.

3) CONTENT may contain other valid closed tags, CDATA, letters, digits, ‘>’, ‘/’, ‘]’, ‘[’, ‘!’  and ‘ ’.

4) CONTENT does not contain any unmatched ‘<’.

5) Each start tag must have a matching end-tag. And they must be balanced.

6) CDATA can be represented as \<![CDATA[DATA\_CONTENT]]> where, DATA\_CONTENT can be any string. 

7) DATA\_CONTENT is ignored and not parsed even if it contains a valid tag it is considered as a string and not as a tag. 

For Example :

S = ”<SPAN>Hello <B>Ninja </B> </SPAN>”
S = start_tag | CONTENT | end_tag
start_tag=<SPAN>
CONTENT=Hello <B> Ninja</B>
end_tag=</SPAN>

CONTENT also have a closed <B> tag. 

Answer is True
Input Format :
The first line of input contains an integer ‘T’ denoting the number of test cases to run. Then the test case follows.

The first line of each test case contains a non-empty string 'S'.
Output Format :
For each test case print “True” if it is a valid code file, else print “False” without quotes.

Output for each query is printed in a separate line.
Note :
You don’t need to print anything. It has already been taken care of. Just implement the given function.
Constraints :
1 <= T <= 5
1 <= N <= 10^5 

Where ‘N’, is the size of String 'S' and the given string 'S' contains only letters, digits, ‘<’, ‘>’, ‘/’, ‘]’, ‘[’, ‘!’  and ‘ ’.

Time Limit: 1 sec
Approach 1

We will maintain a stack of all the start tags. And iterate through ‘S’. If S[i]== ‘<’ than it must be a start of a tag, and S[i+1]==’/’ then it must be an end of the tag. If S[i]= ‘<’ and S[i+1] = ‘!’ it is a CDATA block and we will ignore it. All the content must be wrapped in some tag. 

If for any ‘<’ we are not able to find ‘>’ it is an invalid file. or tag name is not valid or the end tag is missing for the start tag then it is an invalid file.

The algorithm will be-

  1. We initially initialize i to 0 and N to the length of S.
  2. We also make use of the data structure ‘stack’. Initially, stack will be empty.
  3. Now While i < N
    1. If S[i]= ‘<’
      1. If it is a end tag
        1. If not valid tag stack.back()!=tag()
          1. Return False
        2. Else, we pop the element of the stack
      2. If it is a CDATA block
        1. If not a valid block()
          1. Return False
        2. Ignore till the end of block
      3. If it is a start tag
        1. If not valid tag
          1. Return False
        2. stack.Push(tag)
    2. ELSE
      1. If stack.empty()
        1. Return False
    3. If i<N and stack.empty()
      1. Return False
  4. We finally return ‘True’.
Try Problem