# Weather Forecasting - Application of ML

## Introduction

Traditional weather forecasting relies on complex physics-based models that simulate many atmospheric conditions over long periods. However, these models are sensitive to small disturbances in the weather system, which can lead to inaccurate predictions.

This article will explain how the weather can be predicted using Machine Learning concepts. Machine Learning is a technique for building a model from a training dataset; the model is a formula that predicts the output by assigning a weight to each input variable.

So let's get started.

### Prerequisite

• Basic knowledge of pandas, NumPy, Matplotlib, seaborn, statsmodels, tqdm_notebook, and itertools.product.
• Familiarity with the ARIMA and SARIMAX concepts.

## Implementation

### Essential libraries

These are some essential libraries that will help in preparing the model.

### Dataset Exploration

I will use the GlobalLandTemperaturesByMajorCity dataset available on Kaggle.

This is the first important step in making the model.

Below are the dataset structure and the first five entries. Let us also check the total number of distinct cities in the dataset, and then select one city for forecasting purposes.
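The loading step is not shown in the article, so here is a minimal sketch. The file path is an assumption (wherever you saved the Kaggle download); a tiny synthetic stand-in with the same columns keeps the snippet runnable without the file:

```python
import numpy as np
import pandas as pd
from pathlib import Path

csv = Path("GlobalLandTemperaturesByMajorCity.csv")  # assumed path to the Kaggle download
if csv.exists():
    data = pd.read_csv(csv)
else:
    # Synthetic stand-in with the same columns, so the snippet runs anywhere.
    dates = pd.date_range("1900-01-01", periods=24, freq="MS")
    data = pd.DataFrame({
        "dt": dates.astype(str).tolist() * 2,
        "AverageTemperature": np.random.default_rng(0).normal(25, 5, 48),
        "City": ["Ahmadabad"] * 24 + ["Karachi"] * 24,
    })

print(data.head())                 # dataset structure and first five rows
n_cities = data["City"].nunique()  # distinct cities (100 in the full dataset)
print(n_cities)
```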

We can see that we have a total of 100 distinct cities in our dataset.

### Preprocessing, Advanced Visualization, Stationary Check

In this part, we will process the raw data, extract some helpful information, and modify it accordingly.

As we saw, there are 100 different cities in our dataset. You can select any one of them. In the below model, I will perform forecasting on Ahmadabad city.

We store all the data for Ahmadabad city in the ‘adi_data’ variable. Below are the first five entries of this subset and its total size. In the preprocessing step, we must check for NaN values in the dataset. If a large share of the values are NaN, we should fill them with a standard imputation method; otherwise, we can simply drop those rows.

We can see that a tiny portion of the dataset has NaN values, so we will drop all those NaN values. After removing the NaN values from the dataset, let us check the size of the remaining dataset.
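A sketch of the city selection and NaN handling described above; the values are synthetic, and the variable name `adi_data` follows the article:

```python
import numpy as np
import pandas as pd

# Small synthetic stand-in for the full table (hypothetical values).
dates = pd.date_range("1990-01-01", periods=36, freq="MS")
data = pd.DataFrame({
    "dt": dates.astype(str),
    "AverageTemperature": 27 + 6 * np.sin(2 * np.pi * (dates.month - 4) / 12),
    "City": "Ahmadabad",
})
data.loc[5, "AverageTemperature"] = np.nan  # inject a gap, as the real data has

# Keep only the rows for the chosen city.
adi_data = data[data["City"] == "Ahmadabad"].copy()
print(adi_data.shape)                                # total size before cleaning
print(adi_data["AverageTemperature"].isna().sum())   # how many NaN values

# Only a tiny portion is NaN, so drop those rows rather than imputing.
adi_data = adi_data.dropna(subset=["AverageTemperature"])
print(adi_data.shape)                                # remaining size
```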

There is a sufficient amount of data left to perform forecasting. Now reset the index of the dataset; since the old index column has no use here, I will drop it. Then convert the “dt” column to a proper date-time format using the “to_datetime” function.
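The index reset and date conversion can be sketched like this (the values are made up for illustration):

```python
import pandas as pd

# Stand-in frame after NaN rows were dropped (note the gaps in the index).
adi_data = pd.DataFrame({"dt": ["1992-01-01", "1992-02-01", "1992-04-01"],
                         "AverageTemperature": [20.1, 23.4, 31.0]},
                        index=[10, 11, 13])

adi_data = adi_data.reset_index(drop=True)       # drop=True discards the old index instead of keeping it as a column
adi_data["dt"] = pd.to_datetime(adi_data["dt"])  # convert strings to a proper date-time type
print(adi_data.dtypes)
```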

This is a modified version of our dataset. For easier use, let us make separate columns for year, month, weekday, and day.
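A minimal sketch of deriving the calendar columns; the column names are my guesses at what the article used:

```python
import pandas as pd

# Tiny stand-in with the date-time column already converted.
adi_data = pd.DataFrame({"dt": pd.to_datetime(["1992-01-01", "1992-02-01"]),
                         "AverageTemperature": [20.1, 23.4]})

# Derive the separate calendar columns from the date-time column.
adi_data["year"] = adi_data["dt"].dt.year
adi_data["month"] = adi_data["dt"].dt.month
adi_data["weekday"] = adi_data["dt"].dt.day_name()
adi_data["day"] = adi_data["dt"].dt.day
print(adi_data)
```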

The dataset looks like this after making those changes. Let us look at the data types in which our data is stored. Next, we will draw a scatter plot of the average temperature values in the dataset.
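The scatter plot step might look like this (synthetic temperatures; the Agg backend is used so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly temperatures standing in for the real column.
dates = pd.date_range("1900-01-01", periods=240, freq="MS")
adi_data = pd.DataFrame({"dt": dates,
                         "AverageTemperature": 27 + 6 * np.sin(2 * np.pi * (dates.month - 4) / 12)})

fig, ax = plt.subplots(figsize=(10, 4))
ax.scatter(adi_data["dt"], adi_data["AverageTemperature"], s=4)
ax.set_xlabel("Year")
ax.set_ylabel("Average temperature (°C)")
fig.savefig("avg_temp_scatter.png")
```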

The output below shows that the scatter plot has many overlapping values, so it is challenging to analyze the dataset. Hence, we will check whether there are any patterns in the dataset by plotting the average temperature for every decade.

I am defining a time-series helper function; it takes a starting and an ending year as input and returns a list of all the average temperatures, together with their timestamps, between those years.

Using the time-series function and Matplotlib, we will plot the average temperature for four decades, starting in 1896 and ending in 1936.
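A sketch of the helper and the decade plot described above; the function name `time_series` and the synthetic data are assumptions:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly series covering 1896-1936.
dates = pd.date_range("1896-01-01", "1936-12-01", freq="MS")
adi_data = pd.DataFrame({"dt": dates,
                         "year": dates.year,
                         "AverageTemperature": 27 + 6 * np.sin(2 * np.pi * (dates.month - 4) / 12)})

def time_series(start_year, end_year):
    """Return the average temperatures and their timestamps between two years."""
    mask = (adi_data["year"] >= start_year) & (adi_data["year"] < end_year)
    sel = adi_data[mask]
    return sel["AverageTemperature"].tolist(), sel["dt"].tolist()

fig, ax = plt.subplots(figsize=(10, 4))
for start in range(1896, 1936, 10):          # four decades, one colour each
    temps, times = time_series(start, start + 10)
    ax.plot(times, temps, label=f"{start}s")
ax.legend()
ax.set_ylabel("Average temperature (°C)")
fig.savefig("decade_plot.png")
```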

The graph below shows the average temperature plot for four decades; each color represents a decade. From this graph, we can see that the temperature follows a similar pattern every year.

While using ARIMA models, we must ensure that our time-series data is stationary. To check whether the selected time series is stationary, we plot the autocorrelation and partial autocorrelation graphs. The graph above suggests that the time series is not stationary. Nonetheless, performing the AD Fuller test on the entire dataset tells us that it is stationary, but that is only because we are looking at the whole dataset at once; if we analyze a single decade, it is clear that the data is not stationary over that period.

### Preparing the Model

For preparing the model, I will consider the time series from 1992 to 2013.

Here is the plot of average temperature versus year. Let us split the dataset (1992–2013) into training and testing data.
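A minimal way to do the split; the 90/10 ratio is my assumption, not necessarily the article's:

```python
import numpy as np
import pandas as pd

# Synthetic 1992-2013 monthly series.
dates = pd.date_range("1992-01-01", "2013-12-01", freq="MS")
series = pd.Series(27 + 6 * np.sin(2 * np.pi * (dates.month - 4) / 12), index=dates)

# Hold out the last 10% of months for testing.
split = int(len(series) * 0.9)
train, test = series[:split], series[split:]
print(len(train), len(test))
```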

Plotting the train and test data. Here comes the main algorithm: the ARIMA model. ARIMA stands for AutoRegressive Integrated Moving Average; it is fitted by an optimization procedure that maximizes the likelihood function.

First, we will fit zero-differentiated ARIMA models. In the code below, d = 0 is set for the zero-differentiated ARIMA model.

The result for the zero-differentiated ARIMA model. Now let us perform the same operation with d = 1, which corresponds to first-differentiated ARIMA models.

The result for the first-differentiated ARIMA model. Let us combine the results from both operations (the zero- and first-differentiated models) and pick the best model from the combined results.

Till now, we have successfully created the model using ARIMA.

### Forecasting

Here comes the final part: we will now forecast the test period using the prepared model.

Let us plot our results.

The plot below shows our forecast: the black line and dotted lines show the actual temperature, and the red line shows the predicted temperature.

## FAQs

1. What is SARIMAX?
Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors is an updated version of the ARIMA model.

2. What is the general use of ARIMA?
It's a model used in statistics and econometrics to measure events that happen over a period of time.

3. What are time series models?
Time series models are used to forecast future events based on previous events that have been observed (and data collected) at regular time intervals. Time series analysis is a useful business forecasting technique.

### Key Takeaways

In this blog, we covered:

• Importance of Machine Learning in weather forecasting.
• Use of ARIMA and SARIMAX.
• Exploring, preprocessing, and visualizing the dataset.


Happy Coding! 