Habib Mrad HabibMrad

The Data Scientist's Toolbox Quiz 1 (JHU) Coursera

GitHub repo for rest of specialization: Data Science Coursera

Question 1

Which of the following are courses in the Data Science Specialization? Select all that apply:

Business Analytics
Python Programming

The Data Scientist's Toolbox Quiz 3 (JHU) Coursera

GitHub repo for rest of specialization: Data Science Coursera

Question 1

We take a random sample of individuals in a population and identify whether they smoke and if they have cancer. We observe that there is a strong relationship between whether a person in the sample smoked or not and whether they have lung cancer. We claim that the smoking is related to lung cancer in the larger population. We explain we think that the reason for this relationship is because cigarette smoke contains known carcinogens such as arsenic and benzene, which make cells in the lungs become cancerous.

This is an example of a causal data analysis.

Plotting Data

Histogram

Histograms display a sample estimate of the density or mass function by plotting a bar graph of the frequency or proportion of times that a variable takes specific values, or a range of values for continuous data, within a sample.

Pros and Cons

Histograms are useful and easy to produce, and they apply to continuous, discrete, and even unordered data
They use a lot of ink and space to display very little information
It's difficult to display several at the same time for comparisons
Also, for this data it's probably preferable to consider log base 10, since the raw histogram simply says that most islands are small
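The point about log base 10 can be illustrated with a small sketch. The `areas` values below are hypothetical, right-skewed numbers invented for illustration (they are not the dataset the note refers to), and the choice of five bins is arbitrary. The sketch only assumes NumPy's `np.histogram` and `np.log10`.

```python
import numpy as np

# Hypothetical right-skewed sample (made up for illustration only).
areas = np.array([12, 13, 14, 15, 16, 19, 23, 25, 30, 36, 43,
                  82, 280, 840, 3745, 6795, 11506, 16988], dtype=float)

# Histogram on the raw scale: equal-width bins over the full range.
raw_counts, raw_edges = np.histogram(areas, bins=5)

# Histogram on the log10 scale: equal-width bins in log units.
log_counts, log_edges = np.histogram(np.log10(areas), bins=5)

def text_bars(counts, edges):
    """Print one row of '#' marks per bin as a crude text histogram."""
    for count, left, right in zip(counts, edges[:-1], edges[1:]):
        print(f"[{left:10.2f}, {right:10.2f}) {'#' * int(count)}")

print("raw scale")
text_bars(raw_counts, raw_edges)
print("log10 scale")
text_bars(log_counts, log_edges)
```

On the raw scale nearly all observations fall into the first bin, which is the "most islands are small" effect; on the log scale the same observations spread across more bins.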

Stem and leaf plot

Machine Learning Week 3 Quiz 1 (Logistic Regression) Stanford Coursera

GitHub repo for the course: Stanford Machine Learning (Coursera)
The quiz needs to be viewed at the repo (because the image solutions can't be viewed as part of a gist)

Question 1

Answer | Explanation

Get a list of distinct values for a column in a table

SELECT DISTINCT column FROM table;

Get the count of rows in a table

SELECT COUNT(*) FROM table;
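The two queries above use `column` and `table` as placeholders. As a rough sketch, they can be tried against an in-memory SQLite database from Python's standard `sqlite3` module. The `orders` table, its columns, and its rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10.0), ("bob", 5.5), ("ann", 7.25)],
)

# Distinct values of one column (ORDER BY added so the result is deterministic).
distinct_customers = [
    row[0]
    for row in conn.execute("SELECT DISTINCT customer FROM orders ORDER BY customer")
]

# Number of rows in the table.
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

conn.close()

print(distinct_customers)
print(row_count)
```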

Deep Learning Papers Reading Roadmap

If you are a newcomer to the Deep Learning area, the first question you may have is "Which paper should I start reading from?"

Here is a reading roadmap of Deep Learning papers!

The roadmap is constructed in accordance with the following four guidelines:

From outline to detail
From old to state-of-the-art

Data Manipulation with pandas

pandas is the world's most popular Python library, used for everything from data manipulation to data analysis. Learn how to manipulate DataFrames, as you extract, filter, and transform real-world datasets for analysis. Using real-world data, including Walmart sales figures and global temperature time series, you’ll learn how to import, clean, calculate statistics, and create visualizations—using pandas!

Led by Maggie Matsui, Data Scientist at DataCamp

Transforming Data

Inspect DataFrames and perform fundamental manipulations, including sorting rows, subsetting, and adding new columns
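The three manipulations named above (sorting rows, subsetting, and adding a column) might look like the following minimal sketch. The `sales` DataFrame and its column names are invented for illustration and are not the course's dataset.

```python
import pandas as pd

# Hypothetical data, made up for this example.
sales = pd.DataFrame({
    "store": [1, 1, 2, 2],
    "weekly_sales": [24000.0, 18000.0, 31000.0, 9000.0],
    "temperature_c": [5.0, 10.0, 20.0, 30.0],
})

# Inspect: first rows and basic shape.
print(sales.head())
print(sales.shape)

# Sort rows by a column, largest first.
sorted_sales = sales.sort_values("weekly_sales", ascending=False)

# Subset rows with a boolean condition.
warm_weeks = sales[sales["temperature_c"] > 10]

# Add a new column derived from an existing one.
sales["temperature_f"] = sales["temperature_c"] * 9 / 5 + 32
```

`sort_values` and boolean indexing return new objects, so `sales` itself changes only when the new column is assigned.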

Data Scientist with Python

https://learn.datacamp.com/career-tracks/data-scientist-with-python

Introduction to Python
Intermediate Python
PROJECT. TV, Halftime Shows, and the Big Game: Load, clean, and explore Super Bowl data in the age of soaring ad costs and flashy halftime shows.
Data Manipulation with pandas
PROJECT. The Android App Market on Google Play: Load, clean, and visualize scraped Google Play Store data to understand the Android app market.

	"""
	Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
	BSD License
	"""
	import numpy as np

	# data I/O
	data = open('input.txt', 'r').read() # should be simple plain text file
	chars = list(set(data))
	data_size, vocab_size = len(data), len(chars)
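The snippet above stops right after building the character vocabulary. As a hedged sketch of what a character-level model typically needs next, the example below maps characters to integer indices and back. It uses an inline string instead of `input.txt`, and sorts the vocabulary so the mapping is reproducible. Both choices are assumptions made for illustration, not part of the excerpt.

```python
# Stand-in text so the example runs without an input file (assumption).
data = "hello world"

# Sorted so that the index assignment is the same on every run.
chars = sorted(set(data))
data_size, vocab_size = len(data), len(chars)

# Lookup tables between characters and integer indices.
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}

# Encode the text as indices, then decode it back to check the round trip.
encoded = [char_to_ix[ch] for ch in data]
decoded = "".join(ix_to_char[i] for i in encoded)

print(f"data has {data_size} characters, {vocab_size} unique")
```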