- Machine Learning cheatsheet: https://stanford.edu/~shervine/teaching/cs-229.html
- Pattern Recognition and Machine Learning Book
Difference between convex and non-convex cost function; what does it mean when a cost function is non-convex?
Convex
- Local min = global min
- Efficient solvers
- Strong theoretical guarantees
Examples of ML algorithms:
- Linear regression/ Ridge regression, with Tikhonov regularisation
- Sparse linear regression with L1 regularisation, such as Lasso
- Support vector machines
- Parameter estimation in Linear-Gaussian time series (Kalman filter and friends)
Non-convex
- Multiple local minima
- Many solvers come from the convex world
- Weak theoretical guarantees, if any
Examples of ML algorithms:
- Neural networks
- Maximum likelihood mixtures of Gaussians
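A minimal sketch of the practical difference, assuming two hypothetical 1-D cost functions that are not from the notes above: on the convex one, gradient descent reaches the same minimum from any start; on the non-convex one, the result depends on the starting point.

```python
def gradient_descent(grad, x0, lr=0.01, steps=2000):
    """Plain gradient descent on a 1-D function, given its derivative."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Convex example (assumed for illustration): f(x) = (x - 3)^2, one minimum at x = 3.
convex_grad = lambda x: 2 * (x - 3)

# Non-convex example (assumed for illustration): f(x) = x^4 - 3x^2 + x,
# which has two separate local minima, one negative and one positive.
nonconvex_grad = lambda x: 4 * x**3 - 6 * x + 1

# Same two starting points for both functions.
convex_results = [gradient_descent(convex_grad, x0) for x0 in (-2.0, 2.0)]
nonconvex_results = [gradient_descent(nonconvex_grad, x0) for x0 in (-2.0, 2.0)]
```

Both entries of `convex_results` end up at the same point, while the two entries of `nonconvex_results` land in different local minima.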
https://en.wikipedia.org/wiki/Overfitting
Describe Decision Tree, SVM, Random Forest and Boosting. Talk about their advantages and disadvantages.
https://www2.isye.gatech.edu/~tzhao80/Lectures/Lecture_6.pdf
http://www.stat.cmu.edu/tr/tr759/tr759.pdf
- Linear regression: Linear relationship between the predictors and the response, Independence of residuals, Normal distribution of residuals, Equal variance of residuals (homoscedasticity). http://blog.uwgb.edu/bansalg/statistics-data-analytics/linear-regression/what-are-the-four-assumptions-of-linear-regression/
- Logistic regression: Dependent variable is binary, Observations are independent of each other, Little or no multicollinearity among the independent variables, Linearity of independent variables and log odds. https://www.statisticssolutions.com/assumptions-of-logistic-regression/
https://wiseodd.github.io/techblog/2017/01/01/mle-vs-map/
How does K-means work? What kind of distance metric would you choose? What if different features have different dynamic range?
- Explain why and pseudo-code: http://stanford.edu/~cpiech/cs221/handouts/kmeans.html
- Distance metrics: Euclidean distance, Manhattan distance. https://pdfs.semanticscholar.org/a630/316f9c98839098747007753a9bb6d05f752e.pdf
- Explain normalization for K-means and different results you can have: https://www.edupristine.com/blog/k-means-algorithm
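The points above can be sketched in plain Python. This is an illustrative version of Lloyd's algorithm with squared Euclidean distance, plus a z-score `standardize` helper for features with different dynamic ranges; the function names, the random initialisation, and the empty-cluster handling are assumptions of this sketch, not taken from the linked handout.

```python
import random

def standardize(points):
    """Rescale each feature to zero mean and unit variance (z-score)."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    # A constant feature has std 0; fall back to 1.0 to avoid dividing by zero.
    stds = [
        (sum((p[d] - means[d]) ** 2 for p in points) / n) ** 0.5 or 1.0
        for d in range(dims)
    ]
    return [[(p[d] - means[d]) / stds[d] for d in range(dims)] for p in points]

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_euclidean(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its points.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids
```

A typical call would be `labels, centroids = kmeans(standardize(points), k=2)`. Without the standardisation step, a feature measured in thousands dominates the distance and a feature measured in fractions is effectively ignored.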
What are generative and discriminative algorithms? What are their strengths and weaknesses? Which type of algorithms are usually used and why?
https://cedar.buffalo.edu/~srihari/CSE574/Discriminative-Generative.pdf
Why is scaling of the input important? For which learning algorithms is this important? What is the problem with Min-Max scaling?
https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
With macro-averaging, which weights every class equally: PRE = (PRE1 + PRE2 + ... + PREk) / k https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
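A small sketch of macro-averaged precision, assuming hard class labels as input. The function name and the choice to score a never-predicted class as 0 are assumptions of this example.

```python
def macro_precision(y_true, y_pred):
    """Average the per-class precision, giving every class equal weight."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        true_positives = sum(1 for t, p in zip(y_true, y_pred) if p == c and t == c)
        predicted = sum(1 for p in y_pred if p == c)
        # Assumption: a class that is never predicted contributes precision 0.
        per_class.append(true_positives / predicted if predicted else 0.0)
    return sum(per_class) / len(per_class)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_precision(y_true, y_pred)  # (1/2 + 2/3 + 1/1) / 3
```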
Is random weight assignment better than assigning same weights to the units in the hidden layer?
Yes. Because of the symmetry problem, hidden units that start with the same weights compute the same values during forward propagation and receive the same gradient updates, so they never learn different features. This can also bias you toward a specific local minimum. https://stackoverflow.com/questions/20027598/why-should-weights-of-neural-networks-be-initialized-to-random-numbers https://medium.com/usf-msds/deep-learning-best-practices-1-weight-initialization-14e5c0295b94
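The symmetry problem can be checked on a toy network. The sketch below assumes a hypothetical 2-input, 2-hidden-unit tanh network with a linear output and squared-error loss; none of these choices come from the linked posts.

```python
import math

def hidden_gradients(W1, w2, x, y):
    """Gradient of 0.5 * (out - y)^2 w.r.t. each hidden unit's input weights.

    W1[j] holds the input weights of hidden unit j; w2[j] is its output weight.
    """
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    out = sum(w * hj for w, hj in zip(w2, h))
    err = out - y
    # Backprop through tanh: d tanh(z) / dz = 1 - tanh(z)^2.
    return [
        [err * w2[j] * (1 - h[j] ** 2) * xi for xi in x]
        for j in range(len(W1))
    ]

x, y = [1.0, 2.0], 1.0

# Identical initial weights: both hidden units get the same gradient.
same = hidden_gradients([[0.5, 0.5], [0.5, 0.5]], [0.5, 0.5], x, y)

# Different initial weights: the gradients differ, so the units can specialise.
different = hidden_gradients([[0.1, -0.3], [0.7, 0.2]], [0.4, -0.6], x, y)
```

With identical weights, `same[0] == same[1]`, so every update keeps the two units identical and the layer behaves like a single unit.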
Gradient checking can help to find bugs in a backpropagation implementation. It is done by comparing the analytical gradient (derived with calculus) against a numerical gradient computed with finite differences. https://stackoverflow.com/questions/47506521/what-exactly-is-gradient-checking http://cs231n.github.io/optimization-1/
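A minimal gradient check, assuming a hypothetical test function `f` and central differences with a fixed step `eps`. The relative-error formula used here is one common choice, not the only one.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central finite differences: (f(x + eps) - f(x - eps)) / (2 * eps)."""
    grad = []
    for i in range(len(x)):
        plus = x[:i] + [x[i] + eps] + x[i + 1:]
        minus = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad

def relative_error(a, b):
    """Largest element-wise relative difference between two gradients."""
    return max(abs(p - q) / max(abs(p) + abs(q), 1e-12) for p, q in zip(a, b))

# Hypothetical function to check: f(w) = sum(w_i^2) + w_0 * w_1.
f = lambda w: sum(wi ** 2 for wi in w) + w[0] * w[1]
analytic = lambda w: [2 * w[0] + w[1], 2 * w[1] + w[0], 2 * w[2]]
# A deliberately wrong gradient that forgets the w_0 * w_1 term.
buggy = lambda w: [2 * w[0], 2 * w[1], 2 * w[2]]

w = [0.3, -1.2, 2.0]
error = relative_error(analytic(w), numerical_gradient(f, w))
buggy_error = relative_error(buggy(w), numerical_gradient(f, w))
```

A correct analytical gradient gives a tiny `error`, while the buggy one gives a large `buggy_error`.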
The loss function depends on the type of problem:
- Regression: Mean squared error
- Binary classification: Binary cross entropy
- Multiclass: Cross entropy
- Ranking: Hinge loss
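For reference, the four losses can be written out as below. This is a plain-Python sketch; the function names, the clipping constant `eps`, and the pairwise form of the ranking hinge loss are assumptions of this example.

```python
import math

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

def cross_entropy(true_class, probs, eps=1e-12):
    """Single example: probs is a predicted distribution over classes."""
    return -math.log(max(probs[true_class], eps))

def pairwise_hinge(score_relevant, score_irrelevant, margin=1.0):
    """Ranking: penalise a relevant item not beating an irrelevant one by `margin`."""
    return max(0.0, margin - (score_relevant - score_irrelevant))
```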
There is a neuron in the hidden layer that always has a large error found in backpropagation. What can be the reason?
Either the weights from the input layer to that hidden neuron are to blame, or the activation function for the neuron should be changed. https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
http://www.cs.toronto.edu/~kswersky/wp-content/uploads/svm_vs_lr.pdf
Posterior probability (P(y|x))
http://www.cs.cornell.edu/courses/cs678/2007sp/platt.pdf
The vectors that define the hyperplane (margin) of SVM.
You can use any evaluation metric such as Precision, Recall, AUC, F1.
http://www-hsc.usc.edu/~eckel/biostat2/notes/notes14.pdf
I replied that for 2 Gaussians, the prior or the mixture weight can be assumed to be a Bernoulli distribution. http://www.aishack.in/tutorials/expectation-maximization-gaussian-mixture-model-mixtures/
N(0,2) https://en.wikipedia.org/wiki/Sum_of_normally_distributed_random_variables
http://www.robots.ox.ac.uk/~fwood/teaching/C19_hilary_2013_2014/gmm.pdf
Given a monthly time series with a large number of records, how would you find out whether this month's values differ significantly from previous months' values?
Many possible answers here, mine: draw a sample with N large enough to reduce uncertainty over the large data, then compare the months with a statistical test. https://www.sas.upenn.edu/~fdiebold/Teaching104/Ch14_slides.pdf
When users are navigating through the Amazon website, they are performing several actions. What is the best way to model if their next action would be a purchase?
A sequential machine learning algorithm where you keep the state of the user and predict his/her next action. Many options are possible here: HMM, RNN, bandits.
When you recommend a set of items horizontally, there is a problem called position bias. How do you use click data without position bias?
Sample the clicks by position so that positions follow a uniform distribution.
If you can build a perfect (100% accuracy) classification model to predict some customer behaviour, what will be the problem in application?
All the problems that can happen with overfitting.
https://mattgadient.com/2013/02/03/9-marbles-and-a-weight-balance-which-is-the-heaviest-one/
Estimate the disease probability in one city given that the probability is very low nationwide. 1000 people in this city were asked at random, and all responses were negative (no disease). What is the probability of disease in this city?
https://medium.com/acing-ai/interview-guide-to-probability-distributions-a6dfb08c3766
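One possible way to put numbers on this is sketched below; it is not necessarily the answer the link intends. It assumes the 1000 responses are independent Bernoulli trials and shows two views: a Bayesian posterior mean under a Beta prior (the prior parameters are the caller's assumption, for example chosen to match the nationwide rate), and a frequentist upper bound for zero observed events.

```python
def beta_posterior_mean(prior_a, prior_b, positives, n):
    """Posterior mean of a Beta(prior_a, prior_b) prior after n Bernoulli trials."""
    return (prior_a + positives) / (prior_a + prior_b + n)

def upper_bound_zero_events(n, confidence=0.95):
    """Largest p consistent with 0 positives in n trials: (1 - p)^n = 1 - confidence."""
    return 1 - (1 - confidence) ** (1 / n)

# Uniform prior Beta(1, 1), 0 positives out of 1000 (assumed prior for illustration).
posterior_mean = beta_posterior_mean(1, 1, 0, 1000)

# 95% upper bound; close to the "rule of three" approximation 3 / n.
bound = upper_bound_zero_events(1000)
```

A prior that encodes the low nationwide rate would pull the posterior mean further toward that rate than the uniform prior used here.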
Given a bar plot, imagine you are pouring water from the top. How would you quantify how much water the bar chart can hold?
https://www.geeksforgeeks.org/trapping-rain-water/
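A sketch of the two-pointer approach to this problem, one of several solutions discussed for it. It runs in O(n) time and O(1) extra space; the function name is an assumption of this example.

```python
def trapped_water(heights):
    """Total water held between bars, using two pointers moving inward."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        # The shorter side limits the water level, so advance that side.
        if heights[left] < heights[right]:
            left_max = max(left_max, heights[left])
            water += left_max - heights[left]
            left += 1
        else:
            right_max = max(right_max, heights[right])
            water += right_max - heights[right]
            right -= 1
    return water
```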
Find the cumulative sum of the top 10 most profitable products of the last 6 months for customers in Seattle.
Solution: heap that keeps and updates the most profitable products.
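A sketch of the heap idea, assuming the orders arrive as hypothetical `(product_id, city, order_date, profit)` tuples and that the caller computes the cutoff date for "the last 6 months". `heapq.nlargest` maintains the bounded heap internally.

```python
import heapq
from collections import defaultdict
from datetime import date
from itertools import accumulate

def top_products_cumulative(orders, city, since, k=10):
    """Return (product_id, profit, running_total) for the k most profitable products."""
    profit_by_product = defaultdict(float)
    for product_id, order_city, order_date, profit in orders:
        if order_city == city and order_date >= since:
            profit_by_product[product_id] += profit
    # nlargest keeps a heap of size k instead of sorting every product.
    top = heapq.nlargest(k, profit_by_product.items(), key=lambda kv: kv[1])
    running = accumulate(profit for _, profit in top)
    return [(pid, profit, total) for (pid, profit), total in zip(top, running)]

# Hypothetical sample data for illustration only.
orders = [
    ("a", "Seattle", date(2024, 5, 1), 10.0),
    ("b", "Seattle", date(2024, 5, 2), 30.0),
    ("a", "Seattle", date(2024, 6, 1), 5.0),
    ("c", "Portland", date(2024, 5, 1), 100.0),
    ("d", "Seattle", date(2023, 1, 1), 50.0),
]
result = top_products_cumulative(orders, "Seattle", since=date(2024, 1, 1), k=2)
```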
https://www.geeksforgeeks.org/circular-queue-set-1-introduction-array-implementation/
Given a ‘csv’ file with ID and Quantity columns, 50 million records and size of data as 2 GBs, write a program in any language of your choice to aggregate the QUANTITY column.
Grep-like solution: stream the file line by line instead of loading it into memory, and be careful with overflow!
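A streaming sketch in Python, assuming "aggregate" means summing Quantity per ID and that the header names are exactly `ID` and `Quantity`. Python integers do not overflow; in a language with fixed-width integers, the accumulator would need a 64-bit (or wider) type.

```python
import csv
from collections import defaultdict

def aggregate_quantity(path):
    """Sum the Quantity column per ID, reading one row at a time."""
    totals = defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["ID"]] += int(row["Quantity"])
    return dict(totals)
```

Memory use grows with the number of distinct IDs, not with the file size. The grand total is `sum(aggregate_quantity(path).values())`.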
Given a function whose inputs are an array of N randomly ordered numbers and an int K, return an array with the K largest numbers.
https://www.geeksforgeeks.org/kth-smallestlargest-element-unsorted-array/
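One of the approaches for this problem is a min-heap of size K, sketched below. It runs in O(N log K) time; returning the result in descending order is a choice of this example.

```python
import heapq

def k_largest(numbers, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            # x beats the smallest of the current top k, so swap it in.
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)
```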
Given two strings, print all the inter-leavings of the Strings in which characters from two strings should be in same order as they were in original strings.
e.g. for "abc", "de", print all of these: adebc, abdec, adbce, deabc, dabce, etc.
https://gist.github.com/geraldyeo/6c4eaea8a1a6bcc480cac5328cbff664
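A recursive sketch in Python (the linked gist may take a different approach): at each step, take the next character from either string and recurse on the remainder. For strings of lengths m and n with distinct characters this yields C(m + n, m) results.

```python
def interleavings(a, b):
    """Yield every interleaving of a and b that preserves each string's order."""
    if not a or not b:
        # One string is exhausted: the rest of the other is forced.
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield a[0] + rest
    for rest in interleavings(a, b[1:]):
        yield b[0] + rest

results = list(interleavings("abc", "de"))
```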