---
title: "Titanic Data Analysis"
author: "Renu"
date: "January 22, 2022 16:15"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Data Analysis of the Titanic Dataset Using R Markdown

## Required libraries for R Markdown, data wrangling and visualization

**The following libraries need to be installed and loaded for the code below to run.**

__knitr and rmarkdown are required to render the R code to an HTML output file.__

* Data visualization
    + ggplot2
* Data wrangling
    + tidyverse
    + dplyr
* R Markdown rendering
    + rmarkdown
    + knitr
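
If some of these packages are not installed, the `library()` calls below will stop the knit. A check like the following can list what is missing first. This is a minimal sketch, not part of the original analysis: `required_pkgs` and `missing_pkgs` are illustrative names, and the install line is left commented out so that knitting never installs anything on its own.

```{r}
# Illustrative check (not part of the original analysis): which packages are missing?
required_pkgs <- c("knitr", "rmarkdown", "tidyverse")
# requireNamespace() returns TRUE/FALSE instead of throwing an error
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install them from CRAN
}
```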
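```{r message=FALSE}
library(knitr)      # kable() tables and knitting
library(rmarkdown)  # rendering to HTML
library(tidyverse)  # attaches ggplot2, dplyr and other core packages
library(ggplot2)    # already attached by tidyverse; kept for clarity
library(dplyr)      # already attached by tidyverse; kept for clarity
```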
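
## Summary of `Titanic`, a dataset built into R (`datasets` package)

For a table like `Titanic`, `summary()` reports the number of cases, the number of factors and a chi-squared test for independence of all factors.

```{r}
summary(Titanic)
```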
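
Because `Titanic` is a contingency table (an array of counts) rather than a data frame, its dimensions, level names and overall total can also be inspected directly. A small sketch: it refers to `datasets::Titanic` explicitly so it still works after `Titanic` is overwritten later in this document, and `titanic_tbl` is an illustrative name.

```{r}
# Explicit namespace: always the built-in table, even if `Titanic` is masked later
titanic_tbl <- datasets::Titanic
dim(titanic_tbl)       # number of levels per factor: Class, Sex, Age, Survived
dimnames(titanic_tbl)  # the level names of each factor
sum(titanic_tbl)       # 2201 people in total
```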

## Compact display of the internal structure of the Titanic dataset

**`str()` is a compact alternative to `summary()`**

*`kable()` displays all the data in the Titanic dataset, one row per combination of Class, Sex, Age and Survived, with the count in the `Freq` column*

```{r}
str(Titanic)
kable(Titanic)
```
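
A long one-row-per-combination listing can be hard to scan. As an optional alternative that is not used elsewhere in this analysis, base R's `ftable()` flattens the four-way table into a cross-tabulation with some factors as rows and the rest as columns. Choosing `Class` and `Sex` as the row variables is an arbitrary illustration, and `titanic_flat` is a hypothetical name.

```{r}
# Flatten the 4-way table: Class x Sex as rows, Age x Survived as columns
titanic_flat <- ftable(datasets::Titanic, row.vars = c("Class", "Sex"))
titanic_flat
```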
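
## Converting Titanic to a data frame and converting Survived and Class to factors

```{r echo=FALSE}
# This overwrites the built-in table with a data frame of the same name;
# the original remains available as datasets::Titanic.
Titanic <- as.data.frame(Titanic)
# as.data.frame() on a table already returns factor columns, so these are safeguards
Titanic$Survived <- as.factor(Titanic$Survived)
Titanic$Class <- as.factor(Titanic$Class)
```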
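
Converting a table with `as.data.frame()` yields one row per cell of the table, with the counts in a `Freq` column, and by default the grouping columns come back as factors. The sketch below checks this on a separately named copy, `titanic_df` (an illustrative name), so the document's own `Titanic` object is left untouched.

```{r}
# One row per Class x Sex x Age x Survived combination, counts in Freq
titanic_df <- as.data.frame(datasets::Titanic)
str(titanic_df)
is.factor(titanic_df$Class)     # as.data.frame.table() returns factors by default
is.factor(titanic_df$Survived)
```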

## Create a data frame grouped by Sex, with the counts in Freq summed per group

```{r}
gender_titanic <- Titanic %>% group_by(Sex) %>% summarise(sum_tot = sum(Freq))
kable(gender_titanic)
```
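
For comparison, the same per-sex totals can be computed in base R with `aggregate()` and a formula, without dplyr. This is a sketch rather than the author's method; `sex_totals` is an illustrative name, and the data frame is rebuilt from `datasets::Titanic` so the chunk stands on its own.

```{r}
# Base R counterpart of group_by(Sex) %>% summarise(sum(Freq))
sex_totals <- aggregate(Freq ~ Sex, data = as.data.frame(datasets::Titanic), FUN = sum)
sex_totals
```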

## Counting people aboard by class

```{r}
class_titanic <- Titanic %>% group_by(Class) %>% summarise(sum_class = sum(Freq))
class_titanic
```

**The `Class` variable has `r nlevels(class_titanic$Class)` levels (`r class_titanic$Class`): three ticket classes plus the crew.**
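
Raw counts are often easier to compare as shares of the total. A minimal sketch using base R's `margin.table()` and `prop.table()` on the original table, where dimension 1 is `Class`; `class_totals` and `class_shares` are illustrative names.

```{r}
# Sum the 4-way table over every dimension except the first (Class)
class_totals <- margin.table(datasets::Titanic, 1)
# Each class's share of everyone aboard
class_shares <- prop.table(class_totals)
round(class_shares, 3)
```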

# Data Visualization for the Titanic Dataset

## Bar plot of the number of males and females on the Titanic

```{r fig.cap="Number of males and females on the Titanic"}
ggplot(data = gender_titanic, mapping = aes(x = Sex, y = sum_tot)) +
  geom_col() +
  ggtitle('Titanic - Count by Gender') +
  xlab('Gender') +
  ylab('Count')
```

## Note for macOS users

On macOS, [XQuartz](https://www.xquartz.org) needs to be installed.
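
XQuartz provides the X11 window system on macOS. One way to see whether the current R session can use X11 is base R's `capabilities()`; this is an optional diagnostic sketch, not a step the analysis depends on, and `x11_available` is an illustrative name.

```{r}
# TRUE if this R build and session can use X11 graphics devices
x11_available <- capabilities("X11")
x11_available
```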
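
## Scatter plot of class against age group, weighted by the number of people

```{r}
# Each row is a Class x Sex x Age x Survived combination, not a person,
# so weight by Freq: point size then reflects how many people share each cell
ggplot(data = Titanic, mapping = aes(x = Class, y = Age)) +
  geom_count(aes(weight = Freq), colour = "#1380A1") +
  scale_size_area(max_size = 12) +  # zero-count cells (e.g. crew children) get no visible point
  labs(title = "Age Group Distribution by Class on the Titanic",
       x = "Ticket Class",
       y = "Age Group",
       size = "People")
```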
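
Because each row of the converted data frame is a combination of factor levels rather than a person, any plot drawn from it without weights shows at most 32 points. If a person-level data set is wanted instead, for example for a jittered plot where each dot is one individual, the rows can be repeated `Freq` times. A sketch; `titanic_cells` and `titanic_people` are illustrative names.

```{r}
# Repeat each Class x Sex x Age x Survived row Freq times: one row per person
titanic_cells <- as.data.frame(datasets::Titanic)
row_index <- rep(seq_len(nrow(titanic_cells)), times = titanic_cells$Freq)
titanic_people <- titanic_cells[row_index, c("Class", "Sex", "Age", "Survived")]
rownames(titanic_people) <- NULL
nrow(titanic_people)  # one row for each of the 2201 people
```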
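
## Passenger survival count by class

```{r}
# .groups = "drop" (dplyr >= 1.0.0) returns an ungrouped result and
# silences the message about regrouping
SummarySurvival <- Titanic %>%
  group_by(Class, Survived) %>%
  summarise(Total = sum(Freq), .groups = "drop")
SummarySurvival
```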
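
These are counts; survival *rates* within each class also require dividing by the class total. A base R sketch, not the author's method: `margin.table()` keeps dimensions 1 (`Class`) and 4 (`Survived`), then `prop.table(..., 1)` makes each row sum to 1. The object names are illustrative.

```{r}
# Class x Survived counts, summed over Sex and Age
survival_counts <- margin.table(datasets::Titanic, c(1, 4))
# Row-wise proportions: share of each class that did or did not survive
survival_rates <- prop.table(survival_counts, 1)
round(survival_rates, 3)
```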

## Bar chart of passenger survival by class

```{r}
ggplot(SummarySurvival, aes(x = Survived, y = Total, fill = Class)) +
  geom_col(position = position_dodge()) +
  geom_text(aes(label = Total), position = position_dodge(width = 0.9), vjust = 0) +
  labs(y = "Number of Passengers",
       title = "Survival Counts by Class")
```

# END