Yuhua Cai

https://yc392.github.io/
Recent content on Yuhua Cai
Hugo -- gohugo.io
en-us
© Yuhua Cai Duke 2021
Mon, 16 Nov 2020 22:28:00 +0200

About Me
https://yc392.github.io/about-me/
Thu, 09 Jul 2020 16:57:33 -0400
This is Yuhua Cai.
Education: Duke University, Master of Engineering in Electrical and Computer Engineering.
Data Analyst and Machine Learning Engineer.
Hobby: Basketball.
Contact Me: Email: yuhua.cai@duke.edu

Build My Personal Blog with Hugo and Github Pages
https://yc392.github.io/build-my-personal-blog-with-hugo-and-github-pages/
Sun, 23 Aug 2020 20:14:38 +0530
Introduction
Hugo is one of the most popular open-source static site generators and is written in Go. It is simple and easy to deploy. GitHub Pages is a static web hosting service provided by GitHub that deploys a site directly from a GitHub repository. In this brief post I’ll walk you through how I created this website in 5 simple steps using Hugo and GitHub Pages.
Step 1: Install Hugo and get a GitHub account
As I’m a Mac user, I’ll provide instructions here for installing Hugo on macOS.

Three Euler Project Problems
https://yc392.github.io/euler/
Tue, 01 Sep 2020 22:28:00 +0200
Project Euler is a series of challenging mathematical/computer programming problems that require more than just mathematical insight to solve. In this post, three Project Euler problems are solved using Python. Every function is annotated with a NumPy-style docstring.
10001st prime
Problem 7 (420680 Solved)
By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13. What is the 10 001st prime number?
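The feed summary stops before the post's own solution, so the following is only a minimal sketch of one way to approach Problem 7, not the author's code. It assumes a sieve of Eratosthenes sized with the known bound p_n < n(ln n + ln ln n) for n >= 6; the function name `nth_prime` is hypothetical.

```python
import math


def nth_prime(n: int) -> int:
    """Return the n-th prime number (1-indexed).

    Parameters
    ----------
    n : int
        Position of the wanted prime; must be at least 1.

    Returns
    -------
    int
        The n-th prime.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if n < 6:
        # The size bound used below is only valid from n = 6 onwards.
        return [2, 3, 5, 7, 11][n - 1]

    # Upper bound on the n-th prime, valid for n >= 6.
    limit = int(n * (math.log(n) + math.log(math.log(n)))) + 1

    # sieve[i] == 1 means "i is still considered prime".
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            # Cross out every multiple of i, starting at i * i.
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))

    # Walk the sieve and stop at the n-th surviving number.
    count = 0
    for number, is_prime in enumerate(sieve):
        if is_prime:
            count += 1
            if count == n:
                return number
    raise RuntimeError("sieve bound too small")  # not expected for n >= 6


print(nth_prime(6))  # 13, matching the example in the problem statement
```

Calling `nth_prime(10001)` answers the question posed in the problem. A sieve is used here instead of trial division because it finds all candidates in one pass; for a target this small either approach would finish quickly.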
Malaria Data Visualization
https://yc392.github.io/malaria-data-visualization/
Tue, 08 Sep 2020 20:14:38 +0530
Three informative visualizations about malaria are created using Python, starting with the data sets at https://github.com/rfordatascience/tidytuesday/tree/master/data/2018/2018-11-13.
Malaria Dataset
The Malaria Dataset includes 3 informative datasets.
3 Datasets
malaria_inc.csv - Malaria incidence by country for all ages across the world and time.
malaria_deaths.csv - Malaria deaths by country for all ages across the world and time.
malaria_deaths_age.csv - Malaria deaths by age across the world and time.

SQLite3 schema for Spotify dataset in 3NF
https://yc392.github.io/spotify/
Tue, 06 Oct 2020 22:28:00 +0200
Description: Download the Spotify songs data set. Create a SQLite3 schema to store this data in at least 3rd normal form (3NF), and populate the tables. Use an SQL query to find the names of all playlists that contain instrumentals.
Spotify Songs Dataset
First, let’s download and look at the dataset we are going to use. It is a dataset of Spotify songs.
df = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv')
df
  | track_id | track_name | track_artist | track_popularity | track_album_id | track_album_name | track_album_release_date | playlist_name | playlist_id | playlist_genre | … | key | loudness | mode | speechiness | acousticness | instrumentalness | liveness | valence | tempo | duration_ms
0 | 6f807x0ima9a1j3VPbc7VN | I Don’t Care (with Justin Bieber) - Loud Luxur… | Ed Sheeran | 66 | 2oCs0DGTsRO98Gh5ZSl2Cx | I Don’t Care (with Justin Bieber) [Loud Luxury… | 2019-06-14 | Pop Remix | 37i9dQZF1DXcZDD7cfEKhW | pop | … | 6 | -2.
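The post's actual schema is not part of this summary, so the following is only a sketch of what a 3NF layout and the "playlists that contain instrumentals" query could look like. Everything in it is an assumption made for illustration: the table and column names, the idea that a playlist's genre depends only on its `playlist_id`, the rule that a track counts as instrumental when `instrumentalness` exceeds 0.5, and the handful of toy rows (invented, not taken from the dataset).

```python
import sqlite3

# Hypothetical 3NF layout: every non-key column depends only on its table's key,
# and the many-to-many link between playlists and tracks gets its own table.
SCHEMA = """
CREATE TABLE artist (
    artist_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE track (
    track_id         TEXT PRIMARY KEY,
    name             TEXT NOT NULL,
    artist_id        INTEGER NOT NULL REFERENCES artist(artist_id),
    instrumentalness REAL NOT NULL
);
CREATE TABLE playlist (
    playlist_id TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    genre       TEXT NOT NULL
);
CREATE TABLE playlist_track (
    playlist_id TEXT NOT NULL REFERENCES playlist(playlist_id),
    track_id    TEXT NOT NULL REFERENCES track(track_id),
    PRIMARY KEY (playlist_id, track_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema and invented toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO artist VALUES (?, ?)",
                     [(1, "Artist A"), (2, "Artist B")])
    conn.executemany("INSERT INTO track VALUES (?, ?, ?, ?)",
                     [("t1", "Song with vocals", 1, 0.0),
                      ("t2", "Ambient piece", 2, 0.9)])
    conn.executemany("INSERT INTO playlist VALUES (?, ?, ?)",
                     [("p1", "Vocal Mix", "pop"),
                      ("p2", "Focus Mix", "edm")])
    conn.executemany("INSERT INTO playlist_track VALUES (?, ?)",
                     [("p1", "t1"), ("p2", "t1"), ("p2", "t2")])
    conn.commit()
    return conn


def instrumental_playlists(conn: sqlite3.Connection, threshold: float = 0.5) -> list:
    """Return the names of playlists holding at least one track above the threshold."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.name
        FROM playlist AS p
        JOIN playlist_track AS pt ON pt.playlist_id = p.playlist_id
        JOIN track AS t ON t.track_id = pt.track_id
        WHERE t.instrumentalness > ?
        ORDER BY p.name
        """,
        (threshold,),
    ).fetchall()
    return [name for (name,) in rows]


print(instrumental_playlists(build_demo_db()))  # ['Focus Mix']
```

With the real data, the tables would be populated from the `df` columns listed above (for example by de-duplicating the playlist and track columns in pandas before inserting); the threshold is a parameter so a different definition of "instrumental" can be swapped in.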
Data Request and Extract From Star Wars API
https://yc392.github.io/starwar/
Wed, 21 Oct 2020 22:28:00 +0200
Description: Using the requests library, download all the people in the Star Wars universe using the Star Wars API (https://swapi.dev/documentation). Show the name of the oldest person (or robot or alien) and list the titles of all the films they appeared in.
The Requests Package
The requests module allows us to send HTTP requests using Python. An HTTP request returns a Response object with all the response data (content, encoding, status, etc.).

Insects Images Classification with Deep Learning
https://yc392.github.io/deep-learning/
Mon, 16 Nov 2020 22:28:00 +0200
Description: Train a deep learning model to classify beetles, cockroaches and dragonflies using these images. Note: Original images from https://www.insectimages.org/index.cfm.
Insect Image Dataset
The original dataset is found at the Insect Images website. There are 1199 images of beetles, cockroaches and dragonflies in total. In this project, we will train a neural network model to classify these three different types of insects. So first let’s take a look at these insects.

Mortality Prediction Models using MIMIC-III
https://yc392.github.io/mimic-iii/
Mon, 16 Nov 2020 22:28:00 +0200
Description: This is the final project for Duke BIOS-823-2020.
Brief Introduction
An intensive care unit (ICU) provides intensive care medicine for patients with severe and life-threatening illnesses and injuries, or those transferred directly from the emergency department. ICUs have a higher staff-to-patient ratio than normal wards so that they can provide intensive care and comprehensive monitoring to severely ill patients, and hence generate a massive amount of electronic healthcare records (EHR), which are useful for predicting patients’ disease status and the amount of healthcare needed.
PhDs Dataset Exploration and Dashboard Deployment
https://yc392.github.io/phds/
Wed, 04 Nov 2020 22:28:00 +0200
Description: Explore the dataset of PhDs awarded in the US. Do some analysis in pandas. Make a dashboard visualization of a few interesting aspects of the data using Streamlit.
About Dataset
Data presented in Doctorate Recipients from U.S. Universities: 2017 were collected by the Survey of Earned Doctorates (SED). The survey collects data on the number and characteristics of individuals receiving research doctoral degrees from U.S. academic institutions. There are 72 tables in total in this dataset.

Support Vector Machine
https://yc392.github.io/support-vector-machine/
Tue, 15 Sep 2020 16:57:33 -0400
LINEAR SVM
Due Date: 9/28 Monday 10:15 PM EST
import numpy as np
import matplotlib.pyplot as plt
import scipy.io as io
import libsvm
from libsvm.svmutil import *
import pandas as pd
%matplotlib inline
3.1 Linear Support Vector Machine on toy data
3.1.1 Generate a training set of size $100$ with 2D features (X) drawn at random as follows:
$X_{neg} \sim \mathcal{N}([-5, -5], 5 I_2)$, corresponding to negative labels (-1)
$X_{pos} \sim \mathcal{N}([5, 5], 5 I_2)$, corresponding to positive labels (+1)
Accordingly, $X = [X_{neg}, X_{pos}]$ is a $100\times2$ array, and Y is a $100\times1$ array of values $\in \{-1, 1\}$.

CART4.5, ID3 and Random Forest
https://yc392.github.io/cart4.5-id3-and-random-forest/
Sun, 13 Sep 2020 16:57:33 -0400
Instructions
If there is a conflict between the problem description in the IPython notebook and the question in the PDF, follow the question in the PDF file. The part you need to fill in is commented as “Code Clip”. You can search for “Code Clip” in this notebook to find the part you need to complete. After you finish the required part, you may need to run other related code blocks for evaluation or visualization.

Adaboost
https://yc392.github.io/adaboost/
Mon, 14 Sep 2020 22:28:00 +0200
Boosting a decision stump
The goal of this notebook is to implement your own boosting module. Go through an implementation of decision trees. Implement Adaboost ensembling. Use your implementation of Adaboost to train a boosted decision stump ensemble. Evaluate the effect of boosting (adding more decision stumps) on performance of the model. Explore the robustness of Adaboost to overfitting. This file is adapted from course material by Carlos Guestrin and Emily Fox.
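The notebook itself is not included in this summary, so the following is only a rough sketch of the idea behind boosting decision stumps, not the course's implementation. It assumes labels in {-1, +1}, threshold stumps of the form "feature <= threshold", and the standard AdaBoost weight update; all function names and the ten-point toy dataset are invented for illustration.

```python
import math


def stump_predict(stump, row):
    """Predict -sign when row[feature] <= threshold, otherwise +sign."""
    feature, threshold, sign = stump
    return -sign if row[feature] <= threshold else sign


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted training error."""
    best_err, best_stump = float("inf"), None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump


def adaboost(X, y, n_rounds):
    """Train n_rounds weighted stumps; returns a list of (alpha, stump) pairs."""
    n = len(y)
    w = [1.0 / n] * n  # start with uniform example weights
    ensemble = []
    for _ in range(n_rounds):
        err, stump = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        # Raise the weight of misclassified examples, lower the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((alpha, stump))
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Invented 1-D toy data that no single threshold can separate.
X = [[float(i)] for i in range(10)]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost(X, y, rounds)
    errors = sum(predict(model, row) != yi for row, yi in zip(X, y))
    print(rounds, "round(s):", errors, "training errors")
```

On this toy set a single stump cannot fit the alternating labels, while a few boosted stumps can, which is the "effect of adding more decision stumps" the notebook sets out to evaluate. Training error alone says nothing about overfitting; that would need held-out data.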