Update appNew update is available. Click here to update.

Input Buffering in Compiler Design

Vaibhav Agarwal
Last Updated: Feb 2, 2023
Compiler Design


This article will discuss in detail the input buffering in compiler design. To understand the input buffering in compiler design, a few terms need to be understood. We will discuss those terms before moving to input buffering in compiler design. 

Input Buffering in Compiler Design

Lexical Analyser

The lexical analyzer's main purpose is to read the source program's input characters, arrange them into lexemes, and output a sequence of tokens for each lexeme in the source program.

When the lexical analyzer finds a lexeme part of an identifier, it must add it to the symbol table.

The lexical analyzer not only recognizes lexemes but also does pre-processing on the source text, such as deleting comments and white spaces.

Also see, Lexical Analysis in Compiler Design


A lexeme is a sequence of characters in the source program that fits the pattern for a token and is recognized as an instance of that token by the lexical analyzer. 


A Token is a pair that consists of a token name and a value for an optional attribute.

The token name is a symbol that denotes the type of lexical unit.

Lexeme - Token


A pattern is a description of the various forms that a token's lexemes can take. The pattern for a keyword as a token is just a series of characters that make up the term..

Input Buffering 

One or more characters beyond the following lexeme must be searched up to guarantee that the correct lexeme is identified.

As a result, a two-buffer system is used to safely manage huge lookaheads.

Using sentinels to mark the buffer end has been embraced as a technique for speeding up the lexical analyzer process.

Three general techniques for implementing a lexical analyzer

  • With the help of a lexical-analyzer generator, you can: The generator does this by providing methods for reading and buffering the input.
  • By building the lexical analyzer in a traditional systems programming language and reading the input with that language's I/O tools.
  • Programming the lexical analyzer in assembly language and explicitly controlling the input reading.


Buffer Pairs 

Specialized buffering techniques decrease the overhead required to process an input character in moving characters.buffer pairs

Buffer Pairs 

  • It consists of two buffers, each of which has an N-character size and is alternately reloaded.
  • There are two pointers: lexemeBegin and forward.
  • Lexeme Begin denotes the start of the current lexeme, which has yet to be discovered.
  • Forward scans until it finds a match for a pattern.
  • When a lexeme is discovered, lexeme begin is set to the character immediately after the newly discovered lexeme, and forward is set to the character at the right end of the lexeme.
  • The collection of characters between two points is the current lexeme.



Sentinels are used to performing a check each time the forward pointer is shifted to guarantee that one-half of the buffer has not gone off. If it's finished, the other half will need to be reloaded.

As a result, each advance of the forward pointer necessitates two checks at the ends of the buffer halves. Test 1: Check for the buffer's end. Test 2: To figure out which character is being read. By expanding each buffer in half to store a sentinel character at the end, sentinel reduces the two checks to one.

The sentinel is a unique character that isn't included in the source code. (The character of serves as a sentinel.)

Input Buffering in compiler design 

To identify tokens, Lexical Analysis must visit secondary memory each time. It takes a long time and costs a lot of money. As a result, the input strings are buffered before being examined by Lexical Analysis.

Lexical analysis reads the input string one character at a time from left to right to detect tokens. To scan tokens, it employs two pointers.

  • The Begin Pointer (bptr) is a pointer that points to the start of the string to be read.
  • Look Ahead Pointer(lptr) continues its hunt for the token's end.


Sample Example 

Example: For the statement int a,b; 

input buffering
  • Both points begin at the start of the string that is saved in the buffer.
compiler design
  • The Look Ahead Pointer examines the buffer until it finds the token.
buffer pairs in compiler design
  • Before the token ("int") can be identified, the character ("blank space") beyond the token ("int") must be checked.
buffering procedure
  • Both pointers will be set to the next token ('a') after processing the token ("int"), and this procedure will be continued throughout the program.
bptr vs Iptr

Two portions of a buffer can be separated. If you move the look Ahead cursor halfway through the first half, the second half will be filled with fresh characters to read. If you shift the look Ahead cursor to the right end of the second half's buffer, the first half will be filled with new characters, and so on.

Input Buffering

Advantages and Disadvantages of Input Buffering


  • It usually just does one test to determine if the forward pointer is pointing to an eof.
  • It only runs further tests until it reaches the halfway point of the buffer or eof.
  • The average number of tests per input character is extremely close to 1 since N input characters are encountered between eofs.


  • The majority of the time, this approach works effectively, however, the amount of lookahead is restricted.
  • In circumstances when the forward pointer must travel a distance greater than the buffer length, this restricted lookahead may make it difficult to detect tokens.
  • It must wait until the character that follows the right parenthesis decides whether the DECLARE is a keyword or an array name.

Frequently Asked Questions

What do you mean by input buffering?

The input area or input block are other names for the input buffer. The input buffer, when referring to computer memory, is a region that stores all incoming data before sending it to the CPU for processing. The process of storing all incoming data is called input buffering.

What is input buffering in compiler design?

The technique of input buffering when used in designing compilers is called input buffering in compiler design. It is a specialized buffering technique, which can decrease the amount of overhead, which is needed to process an input character in transferring characters.

What are the commonly used buffering methods?

The different types of buffering methods used in input buffering are the One Buffer Scheme and Two Buffer Scheme. 


In this article, we discussed input buffering in compiler design in detail. Hope it helps you understand the input buffering. input buffering in compiler design are important in Compiler design If you find this blog helpful, please upvote. 

Recommended Reading:

Refer to our Guided Path on CodeStudio to upskill yourself in Data Structures and AlgorithmsCompetitive ProgrammingJavaScriptSystem DesignMachine learning, and many more! If you want to test your competency in coding, you may check out the mock test series and participate in the contests hosted on CodeStudio! But if you have just started your learning process and are looking for questions asked by tech giants like Amazon, Microsoft, Uber, etc; you must look at the problemsinterview experiences, and interview bundle for placement preparations.

Nevertheless, you may consider our paid courses to give your career an edge over others!

Do upvote our blogs if you find them helpful and engaging!

Happy Learning!!

Was this article helpful ?
1 upvote