@arshren
Created January 22, 2022 16:15
---
title: "Titanic Data Analysis"
output: html_document
author: Authored by Renu
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Data Analysis of Titanic dataset using R Markdown
## Required libraries for R Markdown, data wrangling, and visualization
**The following libraries need to be installed and loaded for the code below to run.**
__knitr and rmarkdown are required for rendering R code to the HTML output file.__

* ggplot2 is for data visualization
* For data wrangling
    + tidyverse
    + dplyr
* For R Markdown
    + rmarkdown
    + knitr
```{r message=FALSE}
library(knitr)
library(rmarkdown)
library(tidyverse)
library(ggplot2)
library(dplyr)
```
## Extract the summary of Titanic, a dataset built into R (in the datasets package)
```{r }
summary(Titanic)
```
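For a contingency table such as `Titanic`, `summary()` reports the number of cases and a chi-squared test of independence rather than per-variable totals. As a minimal sketch, the marginal totals can be read with base R's `margin.table()`. The name `titanic_tab` is illustrative, and `datasets::Titanic` is used so that the chunk always refers to the built-in table, even after `Titanic` is reassigned later in this document.

```{r}
# Built-in 4-way contingency table: Class x Sex x Age x Survived
titanic_tab <- datasets::Titanic

# Names of the four classification variables
names(dimnames(titanic_tab))

# Total number of people recorded in the table
sum(titanic_tab)

# Marginal totals for one variable at a time
margin.table(titanic_tab, margin = 1)  # by Class
margin.table(titanic_tab, margin = 4)  # by Survived
```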
## Compactly display the internal structure of the Titanic dataset
**str() is a compact alternative to summary()**
*kable() renders all the data in the Titanic dataset as a table*
```{r}
str(Titanic)
kable(Titanic)
```
## Converting Titanic to a data frame and then converting Survived and Class to factors
```{r echo=FALSE}
# Long format: one row per Class/Sex/Age/Survived combination, with a Freq count
Titanic <- as.data.frame(Titanic)
Titanic$Survived <- as.factor(Titanic$Survived)
Titanic$Class <- as.factor(Titanic$Class)
```
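Because the chunk above uses `echo=FALSE`, its code does not appear in the rendered output. The following sketch checks what the conversion produces, using an illustrative copy named `titanic_df` so the document's own `Titanic` object is left untouched. With the default arguments, `as.data.frame()` on a table already returns the classification columns as factors, so the `as.factor()` calls act mainly as a safeguard.

```{r}
# Convert the built-in table to long format: one row per combination
titanic_df <- as.data.frame(datasets::Titanic)

dim(titanic_df)             # number of rows and columns
sapply(titanic_df, class)   # column types after conversion
levels(titanic_df$Class)    # factor levels of Class
sum(titanic_df$Freq)        # total count carried over from the table
```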
## Create a data frame grouped by the Sex column, with the Freq column summed within each group
```{r}
gender_titanic <- Titanic %>% group_by(Sex) %>% summarise(sum_tot = sum(Freq))
kable(gender_titanic)
```
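The grouped sum above can be cross-checked in base R. This is a minimal sketch using `xtabs()`, which sums the `Freq` column for each level of `Sex`. The names `titanic_df` and `sex_totals` are illustrative.

```{r}
titanic_df <- as.data.frame(datasets::Titanic)

# Sum Freq within each level of Sex
sex_totals <- xtabs(Freq ~ Sex, data = titanic_df)
sex_totals

# The same totals expressed as shares of everyone on board
prop.table(sex_totals)
```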
## Finding the number of people in each Class
```{r}
class_titanic <- Titanic %>% group_by(Class) %>% summarise(sum_class = sum(Freq))
class_titanic
```
**There were `r nrow(class_titanic)` classes on the Titanic: `r class_titanic$Class`**
# Data Visualization for Titanic dataset
## Bar plot to summarize Males and Females on Titanic
```{r fig.cap="Number of males and females on the Titanic"}
ggplot(data = gender_titanic, mapping = aes(x = Sex, y = sum_tot)) +
  geom_col() +
  ggtitle("Titanic - Count by Gender") +
  xlab("Gender") +
  ylab("Count")
```
## Additional installation for macOS
[XQuartz installation required for macOS](https://www.xquartz.org)
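## Visualize Class against Age group using a jittered scatter plot
```{r}
# Age in this dataset is categorical (Child/Adult). Each point is one
# Class/Sex/Age/Survived combination, sized by its count, not one passenger.
ggplot(data = Titanic, mapping = aes(x = Class, y = Age)) +
  geom_jitter(aes(size = Freq), colour = "#1380A1", width = 0.2, height = 0.2) +
  labs(title = "Age Group by Class on the Titanic",
       x = "Class",
       y = "Age Group",
       size = "Count")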
```
## Passenger survival count by Class
```{r}
# .groups = "drop" returns an ungrouped result and silences the grouping message
SummarySurvival <- Titanic %>%
  group_by(Class, Survived) %>%
  summarize(Total = sum(Freq), .groups = "drop")
SummarySurvival
```
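Counts alone make the classes hard to compare because the groups differ in size. As a minimal sketch, the survival rate per class can be derived from the same `Freq` column. The names `titanic_df`, `survival_rate`, `survived`, `total`, and `rate` are illustrative.

```{r}
library(dplyr)

titanic_df <- as.data.frame(datasets::Titanic)

survival_rate <- titanic_df %>%
  group_by(Class) %>%
  summarise(
    survived = sum(Freq[Survived == "Yes"]),  # people who survived
    total    = sum(Freq),                     # everyone in the class
    rate     = survived / total               # share who survived
  )
survival_rate
```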
## Summary of Passenger survival by class
```{r}
ggplot(SummarySurvival, aes(x = Survived, y = Total, fill = Class)) +
  geom_col(position = position_dodge()) +
  geom_text(aes(label = Total), position = position_dodge(width = 0.9), vjust = 0) +
  labs(y = "Number of Passengers",
       title = "Survival Counts by Class")
```
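The chart above shows counts. To compare survival across classes of different sizes, one option is a bar chart normalised within each class. This is a minimal sketch using `position = "fill"`, which rescales each bar to a total of 1. The names `titanic_df` and `rate_plot` are illustrative.

```{r}
library(ggplot2)

titanic_df <- as.data.frame(datasets::Titanic)

# Each bar is one Class; the fill shows the share that did or did not survive
rate_plot <- ggplot(titanic_df, aes(x = Class, y = Freq, fill = Survived)) +
  geom_col(position = "fill") +
  labs(y = "Proportion of People",
       title = "Survival Proportion by Class")
rate_plot
```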
# END