Data Visualization Using Bokeh

Mayank Goyal
Last Updated: May 13, 2022

Introduction

We all know results matter, but they will have little impact if they are not communicated properly. Statistical analysis loses much of its value if it cannot be communicated with the help of graphs, charts, and plots. People like to visualize numbers, and they enjoy seeing results; that is why data visualization became an integral part of statistical analysis. That is where Bokeh comes into the picture.

Bokeh is an open-source Python library for creating interactive visualizations; it helps us build beautiful charts and plots ranging from simple to complex ones. Unlike other Python data visualization libraries such as Matplotlib and seaborn, Bokeh renders its plots using HTML and JavaScript.

Features

  •  Flexibility

            Supports a wide range of styling techniques and layouts, even for complex visualizations.

  •  Interactivity

            One of its essential features: Bokeh creates non-static plots, letting users interact with the data.

  •  Sharable

            Bokeh visualizations can be embedded into Flask and Django apps.

  •  Productivity

            It works with other Python tools such as pandas and Jupyter Notebook.
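As a rough illustration of the embedding mentioned under "Sharable", the sketch below uses components() from bokeh.embed to turn a plot into two HTML strings that a Flask or Django template could render. The plot data and title are made up for the example, and the hosting page is assumed to load the matching BokehJS files itself.

```python
from bokeh.embed import components
from bokeh.plotting import figure

# a small plot with made-up data
p = figure(title="Embedded plot")
p.line([1, 2, 3], [4, 6, 5])

# components() returns a <script> block and a <div> placeholder as strings;
# a web framework template can insert both into its page
script, div = components(p)
```

In a web app, script and div would typically be passed to the template as variables and rendered without escaping.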

 

Basics Of Bokeh

The most crucial aspect of Bokeh is that it provides a simple, intuitive interface for those who do not wish to be distracted by the intricate details of how it works. At the same time, Bokeh gives access to its more sophisticated features to those who want finer control.
 

Thus, Bokeh provides two interfaces that we can use:

  • The primary interface for data scientists, i.e., bokeh.plotting
  • The low-level bokeh.models interface for application developers.

 

bokeh.plotting

It is the primary, high-level interface, which focuses on relating data to glyphs. Glyphs are the basic building blocks of Bokeh plots: vectorized graphical shapes that represent data, such as lines, rectangles, squares, wedges, or the circles of a scatter plot. The interface also provides functionality to customize our visualization.

 

The core of bokeh.plotting is the figure() function, which returns a figure object with methods for adding different kinds of glyphs to a plot.

 

bokeh.models

It is a low-level interface where a user controls how Bokeh creates all elements of our visualization; it provides excellent flexibility to application developers.
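To give a feel for the difference, here is a minimal sketch that assembles a line plot directly from bokeh.models objects instead of calling figure(). The data values are made up, and the choice of ranges and axes is only one possible arrangement, not the only way to use this interface.

```python
from bokeh.models import ColumnDataSource, DataRange1d, Line, LinearAxis, Plot

# data lives in a ColumnDataSource; glyphs refer to its columns by name
source = ColumnDataSource(data=dict(x=[1, 2, 3, 4, 5], y=[10, 11, 12, 14, 15]))

# every element is created and attached by hand
plot = Plot(x_range=DataRange1d(), y_range=DataRange1d())
renderer = plot.add_glyph(source, Line(x="x", y="y", line_width=2))
plot.add_layout(LinearAxis(), "below")
plot.add_layout(LinearAxis(), "left")
```

With bokeh.plotting, the ranges, axes, grids, and default tools are set up for us, which is why it is the usual starting point.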

Basic Steps In Bokeh

The most basic steps for visualization with Bokeh's bokeh.plotting interface are:

 

Preparing Data

Data is necessary for visualization. There are various ways to provide data to Bokeh, such as Python lists, NumPy arrays, a ColumnDataSource, and so on.
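A minimal sketch of these options, using made-up values: the same data is passed once directly as a list and a NumPy array, and once through a ColumnDataSource whose columns are referenced by name.

```python
import numpy as np
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

x_list = [1, 2, 3, 4, 5]                      # plain Python list
y_array = np.array([10, 11, 12, 14, 15])      # NumPy array

# a ColumnDataSource maps column names to sequences of equal length
source = ColumnDataSource(data=dict(x=x_list, y=y_array))

p = figure()
p.line(x_list, y_array)              # sequences passed directly
p.line("x", "y", source=source)      # column names looked up in the source
```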

 

Calling figure() function

figure() helps us create a plot with default options; we can customize our properties for better visualization.
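For instance, a figure can be given a title, axis labels, and a reduced set of tools when it is created, and styled further afterwards. The specific values below are arbitrary choices for illustration.

```python
from bokeh.plotting import figure

# options passed at creation time
p = figure(
    title="Customized figure",
    x_axis_label="x",
    y_axis_label="y",
    tools="pan,wheel_zoom,reset",
)

# properties can also be changed after the figure exists
p.title.text_font_size = "14pt"
p.background_fill_color = "whitesmoke"
```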

 

Adding Renderers

We can use different types of glyphs to represent data. Basic glyphs include scatter markers, lines, bars, rectangles, and many more; line(), for example, creates a line. Renderers have plenty of options that help us specify visual attributes such as color, legend labels, and widths.
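As a small sketch of those options, the snippet below adds two renderers to one figure, a dashed line and a set of scatter markers, each with its own color and legend label. The data and the labels are invented for the example.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
y = [10, 11, 12, 14, 15]

p = figure()

# each glyph method returns the renderer it added to the plot
line = p.line(x, y, legend_label="Trend", line_width=3,
              color="navy", line_dash="dashed")
points = p.scatter(x, y, legend_label="Readings", size=10,
                   color="firebrick", alpha=0.6)
```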

 

show() or save()

show() displays the plot in our browser or notebook, while save() writes it to an HTML file.
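A minimal sketch of the save() path, assuming we want a standalone HTML file that loads BokehJS from the CDN. The temporary directory and the file name plot.html are arbitrary choices for the example.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

p = figure(title="Saved plot")
p.line([1, 2, 3], [4, 6, 5])

# write the plot to an HTML file instead of opening it in a browser
out_path = os.path.join(tempfile.mkdtemp(), "plot.html")
save(p, filename=out_path, resources=CDN, title="Saved plot")
```

Opening the resulting file in a browser shows the same interactive plot that show() would display.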

 

Implementations

Let us see some of the implementations of its basic steps:

Line Glyphs:
 

 Code:  

from bokeh.plotting import figure, output_notebook, show

 

# output to notebook

output_notebook()

 

x = [1, 2, 3, 4, 5]
y = [10,11,12,14,15]

# creating a new plot
p = figure(title="EXAMPLE OF LINE GLYPHS", x_axis_label="x", y_axis_label="y")

# adding line renderer
p.line(x, y, legend_label="Temp.", line_width=5)

# show the results
show(p)

 

Output:

 

Scatter Plots:

 Code:

from bokeh.plotting import figure, output_notebook, show
  
# output to notebook
output_notebook()
  
# create figure
# width/height replace the older plot_width/plot_height arguments
p = figure(width=400, height=400)
  
# adding circle renderer with size, color and alpha
p.circle([1, 2, 3, 4], [4, 7, 6, 3], size=10, color="black", alpha=0.5)
  
# show the results
show(p)

 

Output

 

Bars And Rectangles

 

Code

from bokeh.plotting import figure, output_notebook, show
  
# output to notebook
output_notebook()
  

p = figure(width=400, height=400)
p.vbar(x=[1, 4, 7], width=1.5, bottom=0,
      top=[1.2, 2.5, 3.7], color="blue")

show(p)

 

 

Output

Ellipses:

 

Code

from math import pi

from bokeh.plotting import figure, output_file, show
p = figure(width=600, height=600)
p.ellipse(x=[1, 2, 3], y=[1, 2, 3], width=[0.2, 0.3, 0.1], height=0.3,
          angle=pi/2, color="#CAB2D6")

show(p)

 

Output

Stacked Areas

 

Use varea() for a vertically directed area and harea() for a horizontally directed one; varea_stack() and harea_stack() stack several such areas on top of (or beside) each other.

Code:

from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, output_notebook, show

output_notebook()

input_data = ColumnDataSource(data=dict(
    x=[1, 4, 6, 7, 8],
    y1=[1, 3, 5, 7, 8],
    y2=[1, 4, 2, 2, 3],
))
p = figure(width=500, height=500)

p.varea_stack(['y1', 'y2'], x='x', color=("grey", "black"), source=input_data)

show(p)

 

Output:

 

 
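For comparison, a horizontal version of the stacked area example might look like the sketch below. It reuses the same numbers with the roles of the axes swapped; the column names x1 and x2 are chosen only for this illustration.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# same values as the vertical example, but stacked along the x direction
source = ColumnDataSource(data=dict(
    y=[1, 4, 6, 7, 8],
    x1=[1, 3, 5, 7, 8],
    x2=[1, 4, 2, 2, 3],
))

p = figure(width=500, height=500)

# one renderer is created per stacked column
renderers = p.harea_stack(["x1", "x2"], y="y", color=("grey", "black"), source=source)
```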

Wedge And Arc Graphs

The wedge() and arc() glyphs take similar properties, but arc() draws only the curved outline, whereas wedge() renders a filled wedge.

 

Code for arc()

from bokeh.plotting import figure, output_notebook, show

p = figure(width=400, height=400)
p.arc(x=[1, 2, 3], y=[2, 4, 6], radius=0.1, start_angle=0.4, end_angle=4.8, color="black")

show(p)

 

Output

 

Code for Wedge:

from bokeh.plotting import figure, show

p = figure(width=400, height=400)
p.wedge(x=[1, 2, 3], y=[1, 2, 3], radius=0.5, start_angle=0., end_angle=1,
        color="black", alpha=1, direction="clock")

show(p)

 

 

Output 

 

Histograms

We will look at histograms in more detail, since a histogram shows the distribution of the data and is one of the most commonly used plotting techniques. A histogram shows how often the values of a single variable fall into each interval.

So we will use a well-known dataset, the Titanic dataset, to visualize with the help of histograms. We will plot the number of passengers in different Fare ranges. So let us get started.

 

Step 1:

Import all the required libraries

from bokeh.plotting import figure, output_notebook, show
import pandas as pd
import numpy as np
from math import pi

 

Step 2:

Read the dataset

df=pd.read_csv(r"C:\Users\goyal\Desktop\ml\jupyter\titanic.csv")

 

Step 3:

Prepare the dataset

df['Fare'].describe()

 

Output:

count    417.000000
mean      35.627188
std       55.907576
min        0.000000
25%        7.895800
50%       14.454200
75%       31.500000
max      512.329200
Name: Fare, dtype: float64

To create a histogram, we use the quad glyph, for which we specify the top, bottom, left, and right of each rectangle. The left and right values are the x-coordinates of a bin's edges. The x-axis is divided into intervals called bins, and the height of each bin is the count of data points that fall in it.

So, to create data for the histogram, we will use NumPy's histogram function. The output above shows that the 75th percentile is $31.50, so we treat fares above 36 as outliers and leave them out of the plot.

Each bin will be $4 wide, so the number of bins is the length of the range divided by the bin width: 36 / 4 = 9. The range is [0, 36].

arr_hist, edges=np.histogram(df['Fare'], bins=int(36/4),range=[0,36])

 

##Converting into dataframe
price = pd.DataFrame({'Fare': arr_hist, 'left': edges[:-1], 'right': edges[1:]})

 

Therefore our final input data will look like this:
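The actual table depends on the local CSV file, so as a stand-in, here is a sketch that builds the same kind of frame from a handful of made-up fares (these values are not taken from the Titanic data). It shows the nine bins of width 4 and how a value above 36 is simply left out.

```python
import numpy as np
import pandas as pd

# made-up fares for illustration only; 40.0 lies outside the range
fares = [1.0, 2.0, 5.0, 9.0, 35.0, 40.0]

# 9 bins of width 4 covering [0, 36]
arr_hist, edges = np.histogram(fares, bins=int(36 / 4), range=[0, 36])

# one row per bin: count, left edge, right edge
price = pd.DataFrame({"Fare": arr_hist, "left": edges[:-1], "right": edges[1:]})
print(price)
```

Each row describes one bin, which is exactly what the quad glyph needs for its top, left, and right arguments.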

 

The Fare column counts the number of different passengers in the interval from left to right. From here, we can make a Bokeh figure with a quad glyph specifying the appropriate parameters:

output_notebook()


p = figure(height=600, width=600, title='Price Distribution', x_axis_label='Price', y_axis_label='Number of Passengers')


p.quad(bottom=0, top=price['Fare'], 
      left=price['left'], right=price['right'], 
      fill_color='black', line_color='yellow')



show(p)

 

 

Output

 

Those are some of the basic glyphs used for visualization in Bokeh. We can do better visualizations with custom attributes like themes, using hover tools, changing the font, color, and many more attributes.
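As one example of such customization, the sketch below attaches a hover tool that shows each point's coordinates and enlarges the title font. The data, tooltip labels, and font size are arbitrary choices for illustration.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

source = ColumnDataSource(data=dict(x=[1, 2, 3], y=[4, 6, 5]))

p = figure(title="Hover example", tools="pan,reset")
p.scatter("x", "y", size=12, color="black", source=source)

# "@x" and "@y" refer to columns of the data source
p.add_tools(HoverTool(tooltips=[("x", "@x"), ("y", "@y")]))

# change the title font size
p.title.text_font_size = "16pt"
```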

 

Frequently Asked Questions

1. Is bokeh better than matplotlib?
Ans. It depends on the task. Matplotlib is a lower-level library aimed mainly at static figures, while Bokeh's plotting interface is high-level. Therefore, Bokeh can create many sophisticated, interactive plots with fewer lines of code.

2. How to get the sample data?
Ans. Due to the size of sample data, these are not present in the Bokeh GitHub repository or released packages, but we can download them using the following syntax :

import bokeh.sampledata
bokeh.sampledata.download()

3. Does Bokeh use D3.js?
Ans. No, the purpose of D3 is to provide a JavaScript-based scripting layer for the DOM, which is not the current purpose of Bokeh.

4. Why did the Bokeh developers start writing a new plotting library?
Ans. The main reason is maximizing flexibility for exploring new design spaces to achieve long-term visualization goals.

Key Takeaways

So that is the end of the article. Let us recap:

First, we saw the basic features of Bokeh and how Bokeh increases interactivity.

We then looked at what sets Bokeh apart from other data visualization libraries, the basic steps involved in plotting different glyphs, and how adding renderers helps us communicate data better. Lastly, we walked through basic implementations of several glyphs and a histogram.

Thus Bokeh is most impactful when we want to extend our vision beyond static figures.

 

Bokeh is an excellent tool for users who want to explore glyphs in depth, but for users who only need simple static visualizations, Matplotlib may be the better fit.

 

Do not worry if we do not get Bokeh at first; we have a perfect tutor to help us out. 

 

Happy Learning Ninjas!
