Every second, a huge amount of data is generated, some of it structured and some of it unstructured. Before going further, let’s look at what exactly this huge data is.
What is Big Data?
Big Data is data, but of enormous volume. The term refers to collections of data that are huge and keep growing exponentially with time. These data sets are so large and complex that traditional data management tools cannot store or process them efficiently.
Examples of Big Data:
- Social Media: Roughly 500 terabytes of data are generated per day in the databases of social media sites such as Facebook and Instagram, mainly in the form of photos, videos, comments and more.
- New York Stock Exchange: About one terabyte of new data is generated per day in the New York Stock Exchange.
- Jet Engine: A jet engine generates roughly 10 terabytes of data in less than 30 minutes. Thousands of flights per day add up to petabytes of data.
Types Of Big Data:
There are three forms of Big Data:
Structured Data: Data that can be stored, accessed and processed easily because it follows a consistent, fixed format with well-defined columns, as in a database table. For example, an employee table in a database is structured data.
Unstructured Data: Data whose form and structure are unknown. An organisation may hold such data without knowing how to access it efficiently, because it has no predefined structure. Examples include audio, video, images and free text, or a mix of these.
Semi-Structured Data: Data that has some organising form but is not stored in tables. It inherits a few properties of structured data without following the RDBMS model. For example: an XML file or a comma-separated file.
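The three forms can be illustrated with a small Python sketch. The employee names, the XML layout and the byte string are made up for illustration and are not taken from any real data set; only standard-library modules are used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same fixed columns, like an employee table.
# (Hypothetical sample rows.)
employees = [
    {"id": 1, "name": "Asha", "department": "Sales"},
    {"id": 2, "name": "Ravi", "department": "Finance"},
]

# Semi-structured (XML): tags describe the data, but elements may differ in shape.
# Here the second employee has no <phone> element.
xml_text = """
<employees>
  <employee id="1"><name>Asha</name><phone>555-0100</phone></employee>
  <employee id="2"><name>Ravi</name></employee>
</employees>
""".strip()
root = ET.fromstring(xml_text)
xml_employees = [
    {"id": e.get("id"), "name": e.findtext("name"), "phone": e.findtext("phone")}
    for e in root.findall("employee")
]

# Semi-structured (comma-separated): the header line names the fields.
csv_text = "id,name\n1,Asha\n2,Ravi\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Unstructured: raw bytes with no schema, for example the start of an image file.
# Without a specialised decoder, little more than the size is known.
photo_bytes = b"\x89PNG\r\n\x1a\n" + bytes(16)

print(employees[0]["department"])   # direct column access
print(xml_employees[1]["phone"])    # missing element is reported as None
print(csv_rows[0]["name"])          # values arrive as strings
print(len(photo_bytes), "bytes of unstructured data")
```

Note how the structured rows can be read by column name straight away, the XML and CSV need a parsing step before they look like rows, and the raw bytes expose no fields at all.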
The Five V’s of Big Data:
Initially, Big Data was defined using only three V’s (Volume, Velocity and Variety), but it is now commonly defined using five.
The five V’s are Volume, Velocity, Variety, Veracity and Value:
- Volume refers to the huge amount of data that is generated every second.
- The data can come from many sources, such as customer logs, financial transactions, social media and different devices.
- Hadoop, a distributed system, is widely used to store and process such amounts of data, a task that was quite challenging in earlier times.
- Size is the defining characteristic: data can only be called big data if its volume is huge.
- Volume can be considered the base characteristic; it almost doubles every 40 months.
Individual items range from files of a few kilobytes to videos of many megabytes. Mobile traffic, which was modest a few years earlier, was expected to reach 40,000 exabytes by 2020.
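To get a feel for what doubling every 40 months means, here is a rough calculation in Python. It assumes smooth exponential growth, which is a simplification, and the function name projected_volume is a hypothetical helper invented for this sketch.

```python
def projected_volume(initial_volume: float, months: float,
                     doubling_months: float = 40.0) -> float:
    """Volume after `months`, assuming it doubles every `doubling_months`.

    Assumption: growth is a smooth exponential, V(t) = V0 * 2 ** (t / T).
    """
    return initial_volume * 2 ** (months / doubling_months)


# Starting from 1 unit of data (any unit: terabytes, petabytes, ...):
for months in (40, 80, 120):
    print(months, "months ->", projected_volume(1.0, months), "units")
```

Under this assumption the volume grows eightfold in 120 months, that is, in ten years.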
- Velocity is the speed at which data is created or accumulated.
- The speed of data creation affects the speed at which it must be processed, which ultimately affects the client using it.
- Data flows in from sources like machines, networks, social media and mobile phones.
- The flow is continuous, and this continuous flow determines the potential of the data, which is the reason to invest time and energy in it.
- Sampling can help to cope with the velocity of data.
For example, there are almost 3.5 billion searches on Google per day, and the number of Facebook and Instagram users is increasing at a rapid rate.
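The point about sampling can be sketched in code. The article does not name a sampling technique, so the choice here is an assumption: reservoir sampling, a well-known way to keep a fixed-size uniform random sample from a stream whose length is not known in advance. The stream of integers stands in for real events.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Keep a uniform random sample of at most k items from a stream.

    Only k items are held in memory, however long the stream is.
    """
    sample: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical stream: 100,000 event ids arriving one by one.
events = iter(range(100_000))
sample = reservoir_sample(events, k=100, rng=random.Random(42))
print(len(sample), "events kept out of 100,000")
```

The sample can then be analysed in place of the full stream when processing every event in real time is too expensive.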
- Variety is the nature of the data, which may be structured, unstructured or semi-structured.
- It also refers to the different sources of data, and these sources have changed with the passage of time.
- It is crucial to be able to store these different kinds of data.
- Data may be present in the form of PDFs, files, photos, videos etc.
- Veracity is the degree of trustworthiness of the data.
- Data can be inconsistent or uncertain: the data that is available can be messy, and its quality can be difficult to control.
- Most of the data we get is unstructured, so it is important to filter out unnecessary information and process only the rest.
- Big data is also variable, because of the many dimensions that result from multiple data types and sources.
A bulk of data may create confusion, while a small amount of data may convey complete or only partial information.
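Filtering out untrustworthy records might look like the following minimal sketch. The field names (user, amount) and the validity rule are assumptions made up for this example; real quality rules depend on the data at hand.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def is_trustworthy(record: Record) -> bool:
    """Hypothetical rule: a non-empty user and a non-negative numeric amount."""
    user = (record.get("user") or "").strip()
    amount = record.get("amount")
    if not user or amount is None:
        return False
    try:
        return float(amount) >= 0
    except ValueError:
        # The amount is not a number at all, e.g. "n/a".
        return False


# Made-up messy input: missing users, non-numeric and negative amounts.
raw_records: List[Record] = [
    {"user": "asha", "amount": "120.50"},
    {"user": "", "amount": "15"},
    {"user": "ravi", "amount": "n/a"},
    {"user": "meena", "amount": None},
    {"user": "john", "amount": "-3"},
    {"user": "lee", "amount": "42"},
]

clean_records = [r for r in raw_records if is_trustworthy(r)]
print(len(clean_records), "of", len(raw_records), "records kept")
```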
- Value is the last of the five V’s, coming after the other four.
- Value sits at the top of the pyramid. A bulk of data with no value is of no good to a company; it must be turned into useful information.
- Data in itself is of no use; it must be converted into something valuable by extracting information from it.
- This is what data scientists do: they first clean the raw data and then convert it into information, retrieving the most useful parts.
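The step from raw data to information can be sketched as a tiny clean-then-summarise pipeline. The input format (one "product, amount" pair per line) and the function name revenue_by_product are hypothetical choices for illustration only.

```python
from collections import defaultdict
from typing import Dict, List


def revenue_by_product(raw_lines: List[str]) -> Dict[str, float]:
    """Clean raw 'product, amount' lines, then total the amount per product."""
    totals: Dict[str, float] = defaultdict(float)
    for line in raw_lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue  # cleaning: skip malformed lines
        product, amount_text = parts
        try:
            amount = float(amount_text)
        except ValueError:
            continue  # cleaning: skip non-numeric amounts
        if product:
            # Normalise case so "Laptop" and "laptop" count as one product.
            totals[product.lower()] += amount
    return dict(totals)


# Made-up raw input containing noise.
raw = ["Laptop, 900", "laptop,1100", "phone, 300", "broken line", "Phone, abc", ",50"]
summary = revenue_by_product(raw)
print(summary)
```

The raw lines on their own say little; the per-product totals are the kind of information a business can act on.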
Importance of Big Data
Big Data is something of a revolution in the field of Information Technology, and its use is growing every year. It is characterised by high volume, variety, velocity and value. With the help of Big Data, we can perform multiple operations on a single platform. Big Data helps organisations to work with their data efficiently and to use it to find new opportunities. In particular, it helps with:
- Cost Reduction: When it comes to storing large amounts of data, technologies such as Hadoop help to store it and thus reduce the cost.
- Better and Faster Decision Making: With the advancement of technologies like Hadoop, the ability to analyse new sources of data has increased, and decision making has improved as a result.
- New Products and Services: With the ability to learn about customers’ needs and satisfaction through analytics, companies can give customers what they want.
Real-life Benefits of Big data
With the growth of Big Data, there has been enormous growth in other industries as well. In the banking sector in particular, tools like Apache Hive are used to query data and get results in a very short span of time. Big Data is a revolution in the education field as well, as it opens new options for research and development. It is also useful for knowing customers’ needs in advance. Job opportunities have increased with Big Data, with titles like Big Data Analyst, Big Data Engineer, Business Intelligence Consultant, Solution Architect etc.
With the advancement of Big Data, there is a revolution in different fields, including education and banking, which has led to intense competition and increased demand for data professionals. It can be considered the driving force behind sectors including marketing, sales, analytics and research. Every characteristic of Big Data is equally important in its evolution. I hope this blog helped you understand Big Data and the five V’s of Big Data properly.
By Deepak Jain