PhDs Dataset Exploration and Dashboard Deployment

Description: Explore the dataset of PhDs awarded in the US, do some analysis in pandas, and build a Streamlit dashboard that visualizes a few interesting aspects of the data.

About Dataset

Data presented in Doctorate Recipients from U.S. Universities: 2017 were collected by the Survey of Earned Doctorates (SED). The survey collects data on the number and characteristics of individuals receiving research doctoral degrees from U.S. academic institutions. The dataset contains 72 tables in total. In this blog, we explore only a few of them and try out a dashboard tool, Streamlit.

What is Streamlit?

If you are a non-tech person who is looking for a way to build your own interactive dashboard, you can consider Streamlit.

Streamlit is an open-source Python package that helps you build deployable, interactive web apps without any knowledge of HTML or CSS. Python is all you need. The great thing about Streamlit is that it can refresh your web app whenever the source code is edited or a user changes a widget input.

Streamlit supports numerous visualization libraries for building interactive dashboards, such as Plotly, Altair, and Bokeh. With these, telling a story through bar, pie, and line charts has never been simpler.

Get started

To use Streamlit, first install it with the command:

pip install streamlit

To explore the sample Streamlit dashboards, open a command window and type:

streamlit hello

You can get a first look at Streamlit visualizations by exploring the demos listed on the “Welcome to Streamlit” page, which runs at a local URL.

Dashboard building

Let’s begin building the dashboard by importing the required libraries:

import streamlit as st               # dashboard framework
import pandas as pd                  # data loading and reshaping
import plotly.graph_objects as go    # choropleth, pie, and line charts

To prepare the dataset, I first used sed17-sr-tab005.xlsx:

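df = pd.read_excel('data_tables/sed17-sr-tab005.xlsx')
df = df[2:]                  # skip the two rows above the real header row
df.columns = df.iloc[0]      # promote the header row to column names
df = df.drop(df.index[0])    # remove the header row from the data
# Lookup table of state names and two-letter codes, taken from a Plotly sample dataset
df_ = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv')
df_code = df_[['code','state']]
# The inner merge keeps only rows whose name matches a state in the lookup table
result = pd.merge(df, df_code, how='inner', left_on='State or location', right_on='state')
result.head()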

	State or location	Rank	Doctorate recipients	code	state
0	California	1	6105	CA	California
1	Texas	2	4068	TX	Texas
2	New York	3	4064	NY	New York
3	Massachusetts	4	2879	MA	Massachusetts
4	Pennsylvania	5	2628	PA	Pennsylvania
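
One thing to keep in mind is that an inner merge silently drops any row whose name has no match in the lookup table. The sketch below repeats the header-promotion steps on a toy table and then uses a left merge with indicator=True to list the rows that would be lost. California and Texas use the counts shown above; the "Puerto Rico" row, the title rows, and the two-entry lookup table are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the raw sheet: two title rows sit above the real header row.
# The "Puerto Rico" row and its numbers are made up for this example.
raw = pd.DataFrame([
    ["Table 5", None, None],
    ["(notes)", None, None],
    ["State or location", "Rank", "Doctorate recipients"],
    ["California", 1, 6105],
    ["Texas", 2, 4068],
    ["Puerto Rico", 3, 100],
])

body = raw.iloc[2:].copy()
body.columns = body.iloc[0].to_list()        # promote the header row to column names
body = body.iloc[1:].reset_index(drop=True)  # drop the header row from the data

# Hypothetical lookup table that has no entry for Puerto Rico.
codes = pd.DataFrame({"code": ["CA", "TX"], "state": ["California", "Texas"]})

# A left merge with indicator=True keeps every row and records whether it matched.
checked = body.merge(codes, how="left", left_on="State or location",
                     right_on="state", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "State or location"].to_list()
print(unmatched)
```

Running the same check on the real table before switching to how='inner' shows which locations, if any, will be missing from the map.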

Now, the first step is to name your dashboard and give it a description. Below is a simple one:

st.title("♟PhDs Dataset Exploration♟")
st.write("Data presented in *Doctorate Recipients from U.S. Universities:* *2017* were collected by the Survey of Earned Doctorates (SED). The Survey collects data on the number and characteristics of individuals receiving research doctoral degrees from U.S. academic institutions. There are 72 tables in total in this dataset.")

Next, readers might be curious about what my data frame looks like, so I add a checkbox that lets them show or hide it.

if st.checkbox('Show dataframe'):
    st.write(df)

Choropleth map

It is time to build the graphs to tell the story. Let’s do some visualizations.

fig = go.Figure(data=go.Choropleth(
    locations=result['code'],  # two-letter state codes
    z=result['Doctorate recipients'].astype(float),  # data to be color-coded
    locationmode='USA-states',  # interpret `locations` as US state codes
    colorscale='Reds',
    colorbar_title="Number of Doctorate recipients",
))

fig.update_layout(
    title_text='State, ranked by number of doctorate recipients: 2017',
    geo_scope='usa',  # limit map scope to the USA
)

# Render the map in the app
st.plotly_chart(fig, use_container_width=True)

Here, I built a choropleth map to describe the number of doctorate recipients from U.S. universities by state.

Second, I used sed17-sr-tab006.xlsx:

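df_6 = pd.read_excel('data_tables/sed17-sr-tab006.xlsx')
df_6 = df_6.iloc[2:5,1:]  # keep three rows and drop the first column
subjects = df_6.iloc[0].dropna().to_list()  # non-empty cells of the first kept row are the subject names
sex = ['Male','Female']
# In the third kept row, even positions are read as male counts and odd positions as female counts
df_male = df_6.iloc[2].to_list()[::2]
df_female = df_6.iloc[2].to_list()[1::2]
df_6 = pd.DataFrame({'Male': df_male,'Female': df_female},index=subjects)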
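
The slicing with [::2] and [1::2] is easier to follow on a toy table. The sketch below assumes the same layout as the three kept rows (field names with gaps from merged header cells, then sex labels, then counts); the field names and counts are made up.

```python
import pandas as pd

# Toy stand-in for the three kept rows; field names and counts are made up.
# Row 0: field names with gaps, row 1: sex labels, row 2: counts.
toy = pd.DataFrame([
    ["Field A", None, "Field B", None],
    ["Male", "Female", "Male", "Female"],
    [10, 20, 30, 40],
])

fields = toy.iloc[0].dropna().to_list()   # keep only the non-empty header cells
counts = toy.iloc[2].to_list()            # counts alternate male, female, male, female

by_sex = pd.DataFrame(
    {"Male": counts[::2], "Female": counts[1::2]},  # even positions, odd positions
    index=fields,
)
print(by_sex)
```

The result has one row per field and one column per sex, which is the shape the pie chart code reads with df_6.loc[...].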

Pie charts with Selectbox

if st.checkbox('Show Dataframe--Doctorates awarded by sex'):
  st.write(df_6)

selected_metrics_2 = st.selectbox(
  label="Choose...", options=subjects
)

# Draw the pie chart for the subject chosen in the selectbox
fig = go.Figure(data=[go.Pie(labels=sex, values=df_6.loc[selected_metrics_2], title=selected_metrics_2 + ' Doctorates awarded by sex')])

st.plotly_chart(fig, use_container_width=True)

Third, I used sed17-sr-tab002.xlsx:

df_2 = pd.read_excel('data_tables/sed17-sr-tab002.xlsx')
df_2 = df_2.iloc[4:,:]
df_2.columns = ['Year','Doctorate-granting institutions','Total','Mean (per institution)','Median (per institution)']

Year = list(df_2.Year)

df_2 = df_2.iloc[:,1:]

if st.checkbox('Show Dataframe--Number of Doctorate recipients by Institutions'):
  st.write(df_2)

Line chart with Multiselectbox

subset_data = df_2  # default: plot every column

selected_columns = st.multiselect('Choose one or more', df_2.columns)

if len(selected_columns) > 0:
    subset_data = df_2[selected_columns]

fig = go.Figure()

# One line per selected column, plotted against the year
for i in subset_data.columns:
    fig.add_trace(go.Scatter(x=Year, y=df_2[i], mode='lines+markers', name=i))

st.plotly_chart(fig, use_container_width=True)

The three plots above are examples of how Streamlit works. You can create as many charts and sections as you need. Now, to view these charts in Streamlit, follow these steps:

Streamlit runs a “.py” file, so save your script with the “.py” extension.
Open a command window and navigate to the folder where the file is saved. Then run the command “streamlit run yourfilename.py”. A browser tab displaying your dashboard will open automatically.

In my case, I saved my file as “hw6.py” and stored it on the desktop. So, here is how I run it:

streamlit run hw6.py

These are just some basics of how I explored Streamlit and built simple visualizations for the PhDs dataset. Now it’s your turn to play with these resources and create incredible dashboards.
