```cpp
#include <iostream>
#include <math.h>
using namespace std;

bool is_prime(int n) {
    if (n <= 1) return false;
    if (n <= 3) return true;
    if (n % 2 == 0 || n % 3 == 0) return false;
    // Every prime > 3 has the form 6k +/- 1, so only those candidates need testing
    for (int i = 5; i * i <= n; i += 6)
        if (n % i == 0 || n % (i + 2) == 0) return false;
    return true;
}

bool is_fibo(int n) {
    // n is a Fibonacci number iff 5n^2 - 4 or 5n^2 + 4 is a perfect square
    long long n1 = 5LL * n * n - 4;
    long long n2 = 5LL * n * n + 4;
    long long s1 = (long long)round(sqrt((double)n1));
    long long s2 = (long long)round(sqrt((double)n2));
    return s1 * s1 == n1 || s2 * s2 == n2;
}

int get_fibo(int n) {
    // Binet's formula: F(n) = round(phi^n / sqrt(5))
    double phi = (1 + sqrt(5)) / 2;
    return round(pow(phi, n) / sqrt(5));
}

// SAKAMOTO ALGORITHM to determine the day of the week (0 = Sunday)
int day_of_week(int year, int month, int day) {
    int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    year -= month < 3;
    return (year + year / 4 - year / 100 + year / 400 + t[month - 1] + day) % 7;
}
```
```js
function getWebName(url) {
  // http://example1.com/a/b?c=d => example1
  // http://www.example2.com/b?c=d => example2
  // https://ww.example3.com.vn => example3
  const hostnameParts = new URL(url).hostname.split('.');
  // If the TLD is a 2-letter country code (e.g. .vn), the site name is one part further left
  return hostnameParts[hostnameParts.length - 1].length === 2
    ? hostnameParts[hostnameParts.length - 3]
    : hostnameParts[hostnameParts.length - 2];
}

// Check even and odd without `if else`
const number = 3;
["even", "odd"][number % 2]; // "odd"

// Get intersection
const a = new Set([1, 2, 3]);
const b = new Set([4, 3, 2]);
const intersection = [...a].filter(x => b.has(x));
console.log(intersection); // [2, 3]

function getCookieField(name) {
  const cookie = document.cookie.split("; ").find(item => item.startsWith(`${name}=`));
  return cookie ? decodeURIComponent(cookie.split("=")[1]) : null;
}

// Convert a number to its binary-string representation
(265 >>> 0).toString(2);

// Obfuscated recursive version of the same conversion (explained in the comment below)
(_$=($,_=[]+[])=>$?_$($>>+!![],($&+!![])+_):_)(265);
```
```js
/*
This is not a RegEx but an arrow function whose function name, variable names and the number 1
are written with special characters; the number 1 is expressed with the array expression +!![].
Here is a slightly easier-to-read version of the code:
(toBinary = (val, str = "") => val ? toBinary(val >> 1, (val & 1) + str) : str)(265);
[]+[] is just the empty string "".
+!![] is just the number 1.
It uses recursion to take each bit and prepend it to the string str (initially the empty string "").
The stopping condition is val equal to 0 (the ternary operator at val ? ...).
Written out more readably, with comments:
(
  toBinary = (val, str = "") =>             // assign toBinary to an arrow function with 2 parameters val and str (default "")
    val ?                                   // if val is non-zero...
      toBinary(val >> 1, (val & 1) + str) : // ...recurse on the next bit
      str                                   // ...otherwise stop the recursion and return the result
)(265); // call toBinary immediately
*/
```
```python
# Compare hyperparameter search results
import matplotlib.pyplot as plt
import numpy as np

def plot_param_performance(clf, param_name, title):
    results = clf.search_cv.cv_results_
    plt.figure(figsize=(13, 5))
    plt.title(title)
    plt.xlabel(param_name)
    plt.ylabel("Score")
    plt.grid()
    ax = plt.gca()
    ax.set_ylim(0.96, 1)
    # Get the regular numpy array from the MaskedArray
    X_axis = np.array(results[f'param_{param_name}'].data, dtype=float)
    for scorer, color in zip(('test_score', 'train_score'), ('g', 'r')):
        score_mean = results[f'mean_{scorer}']
        score_std = results[f'std_{scorer}']
        # Shade the +/- 1 standard deviation band around the mean CV score
        ax.fill_between(X_axis, score_mean - score_std, score_mean + score_std,
                        alpha=0.1, color=color)
        ax.plot(X_axis, score_mean, '-', color=color, label=f'mean {scorer}')
    plt.legend(loc="best")
    plt.tight_layout()
    plt.show()

plot_param_performance(rf_classifier, 'n_estimators', "Random Forest: Performance vs Number of Estimators")
```
Experimental design
- Define the independent and dependent variables in the experiment:
- In your clinical trial, you want to find out how the medicine affects recovery time. Therefore:
- Your independent variable is the medicine—the cause you want to investigate.
- Your dependent variable is recovery time—the effect you want to measure.
- In a more complex experiment, you might test the effect of different medicines on recovery time, or different doses of the same medicine. In each case, you manipulate your independent variable (medicine) to measure its effect on your dependent variable (recovery time).
- Formulate your hypothesis: H0 is that the medicine has no effect. Ha is that the medicine is effective.
- Assign test subjects to treatment and control groups:
- Experiments such as clinical trials and A/B tests are controlled experiments. A typical A/B test has at least 3 main features: Test design, Sampling, Hypothesis testing.
- In a controlled experiment, test subjects are assigned to a treatment group and a control group. The treatment is the new change being tested in the experiment. The treatment group is exposed to the treatment. The control group is not exposed to the treatment. The difference in metric values between the 2 groups measures the treatment’s effect on the test subjects.
- As a next step, you might conduct a 2-sample t-test to determine whether the observed difference in recovery time is statistically significant or due to chance.
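A minimal sketch of that last step with scipy (the group arrays here are made-up illustration data, not from an actual trial):

```python
import numpy as np
from scipy import stats

# Hypothetical recovery times (in days) for each group
treatment = np.array([8, 7, 9, 6, 7, 8, 6, 7])
control = np.array([9, 10, 8, 11, 9, 10, 9, 11])

# Welch's 2-sample t-test (does not assume equal variances)
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

# Reject H0 (the medicine has no effect) if p_value is below the chosen significance level
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```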
Randomized controlled experiment
An A/B test is a basic version of what’s known as a randomized controlled experiment. This design allows researchers to control for other factors that might influence the test results and draw causal conclusions about the effect of the treatment.
- For example, imagine the subjects in your treatment group have a much healthier diet than the subjects in your control group. Any observed decrease in recovery time for the treatment group might be due to their healthier diet—and not to the medicine. In this case, you cannot say with confidence that the medicine alone is the cause of the faster recovery time.
Typically, data professionals randomly assign test subjects to treatment and control groups. Randomization helps control the effect of other factors on the outcome of an experiment:
- Completely randomized design: test subjects are assigned to treatment and control groups using a random process.
- For example, in a clinical trial, you might use a computer program to label each subject with a number => randomly select numbers for each group => this may still not be effective because of nuisance factors.
- Randomized block design: minimize the impact of known nuisance factors by dividing subjects into blocks => randomly assign the subjects within each block to treatment and control groups.
- For example, you know that people under the age of 35 tend to recover faster than older people => age is a nuisance factor => divide the test subjects into 21-35, 36-50, and 51-65 => randomly assign the subjects within each block to treatment and control groups => more confident that this result is due to the treatment (medicine) and not to the nuisance factor (age).
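A rough sketch of both assignment schemes (made-up subjects and ages, numpy/pandas only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
subjects = pd.DataFrame({"id": range(12), "age": rng.integers(21, 66, size=12)})

# Completely randomized design: shuffle group labels across all subjects
subjects["group_crd"] = rng.permutation(["treatment", "control"] * 6)

# Randomized block design: block on the nuisance factor (age),
# then randomize to treatment/control separately within each block
subjects["block"] = pd.cut(subjects["age"], bins=[20, 35, 50, 65],
                           labels=["21-35", "36-50", "51-65"])
groups = []
for _, block in subjects.groupby("block", observed=True):
    labels = ["treatment", "control"] * (len(block) // 2 + 1)
    groups.append(pd.Series(rng.permutation(labels)[:len(block)], index=block.index))
subjects["group_rbd"] = pd.concat(groups)
print(subjects.sort_values("block"))
```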
GAN
Discriminator wants the fake examples to seem as fake as possible, but the Generator wants fake examples to seem as real as possible. That is, it wants to fool the Discriminator. The Generator doesn't see the real images. It learns over time by using feedback from the Discriminator.
They need to learn from each other over time and both should be at similar skill levels. If one model is significantly better than the other, it doesn't help the other learn because the feedback isn't useful. Imagine you were a beginning artist and you showed your work to an art expert, asking whether your painting looked like a famous piece, and all they said was 'no'. Because they have a very discerning eye, they know your image is not right, but they won't be able to tell you how close you are.
Activations
ReLU comes with the dying ReLU problem, while sigmoid and tanh come with vanishing gradient and saturation problems.
Sigmoid activation isn't used very often in hidden layers because the derivative of the function approaches 0 at the tails of this function => Vanishing gradient problems, or saturated outputs here at the tails of the function.
You can imagine that this function continues to go in both directions because it can take any real value as input. It is asymptotically approaching 1 at the top and asymptotically approaching 0 on the bottom => Vanishing gradient problems, because you have these saturated outputs here at the tails, and the values will always be ~1 or ~0.
Although Tanh has a similar shape to Sigmoid, its range of -1 to 1 preserves the sign of the input z, so negatives stay negative. That can be useful in some applications. Because its shape is similar to the sigmoid, however, the same saturation and vanishing gradient issues do occur. Again, the tails extend on both sides, approaching 1 at the top and -1 at the bottom.
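A quick numeric check of the saturation claim (plain numpy, nothing from the course): the sigmoid's derivative is σ(z)(1 − σ(z)), which collapses toward 0 for large |z|.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for z in [0.0, 2.0, 5.0, 10.0]:
    grad = sigmoid(z) * (1 - sigmoid(z))  # derivative of the sigmoid
    print(f"z = {z:5}: sigmoid = {sigmoid(z):.6f}, gradient = {grad:.6f}")
# At z = 10 the gradient is ~4.5e-05: the output saturates near 1 and the gradient vanishes.
```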
Batch normalization
- Batch normalization smooths the cost function and reduces the internal covariate shift.
- You use the batch mean and standard deviation during training and the running statistics (that were computed over the training data) for testing. The running values are fixed after training.
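A minimal PyTorch sketch of that train/test distinction (PyTorch is an assumption here; the notes don't name a framework): in train() mode BatchNorm normalizes with the batch statistics and updates its running estimates, in eval() mode it uses the frozen running statistics.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=4)
x = torch.randn(32, 4) * 3 + 5   # a batch with non-zero mean and non-unit variance

bn.train()
_ = bn(x)                        # uses the batch mean/std and updates running_mean/running_var
print(bn.running_mean, bn.running_var)

bn.eval()
y = bn(x)                        # now normalizes with the fixed running statistics instead
```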
Problem with BCE Loss
Mode collapse
Mode collapse happens when the generator gets stuck in one mode. The discriminator will eventually learn to differentiate the generator's fakes when this happens and outskill it, ending the model's learning.
For example, a discriminator has learned to be good at identifying which handwritten digits are fakes, except for cases where the generated images look like a 1 or a 7. This could mean the discriminator is stuck in a local minimum of its cost function: it classifies most of the digits correctly, except for the ones that resemble a 1 or a 7, and this information is passed on to the generator. The generator looks at this feedback from the discriminator and gets a good idea of how to fool the discriminator in the next round, so it starts producing mostly 1s and 7s and collapses onto those few modes.
Vanishing gradient
GANs try to make 2 distributions look similar. When the discriminator improves too much, the function approximated by BCE Loss contains flat regions, i.e. vanishing gradients.
At the beginning, there's some overlap between the 2 distributions. However, as the discriminator gets better during training, its outputs on the real distribution will be centered around 1 and its outputs on the generated distribution will start to approach 0. As a result, as the discriminator is getting better, it starts giving less informative feedback.
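A small numeric illustration of those flat regions (plain numpy; this uses the minimax form of the generator loss, log(1 − D(G(z))), which is an assumption about which variant is meant): the gradient the generator receives is −D, so when the discriminator confidently scores fakes near 0 there is almost no signal left.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Generator loss on one fake: L = log(1 - D), with D = sigmoid(logit) from the discriminator.
# dL/dlogit = -sigmoid(logit), which vanishes as the discriminator gets confident (logit << 0).
for logit in [0.0, -2.0, -5.0, -10.0]:
    print(f"D = {sigmoid(logit):.5f}, dL/dlogit = {-sigmoid(logit):.5f}")
```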
Decision tree
The entropy measures the impurity of a set of data. It starts from 0, goes up to 1, and then comes back down to 0 as a function of the fraction of positive examples in your sample:
- We take `log2` just to make the peak of this curve equal to 1; if we were to take `log e` (the natural logarithm), that would just vertically scale this function -> it still works, but is a bit harder to interpret.
- If p1 or p0 = 0 -> `0*log(0)`, and `log(0)` is technically undefined (negative infinity). But by convention, we take `0*log(0) = 0`.
- It looks a little bit like the logistic loss -> there is actually a mathematical rationale for why these 2 formulas look so similar.
- There are other functions that look like this, going from 0 up to 1 and then back down, such as the Gini criterion.
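A quick sanity check of that curve (plain Python, not course code), using H(p1) = −p1·log2(p1) − (1 − p1)·log2(1 − p1) with the 0·log(0) = 0 convention:

```python
import numpy as np

def entropy(p1):
    # Binary entropy, with the convention 0 * log2(0) = 0
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

for p1 in [0.0, 0.2, 0.5, 0.8, 1.0]:
    print(f"p1 = {p1}: H = {entropy(p1):.3f}")   # peaks at 1.0 when p1 = 0.5
```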
The information gain measures the reduction in entropy that you get in your tree as a result of making a split. Why do we bother to compute the reduction in entropy rather than just the entropy at the left and right sub-branches?
- It turns out that one of the stopping criteria for deciding when not to bother splitting any further is if the reduction in entropy is too small.
- In that case, splitting would just increase the size of the tree unnecessarily and risk overfitting, so you decide not to bother if the reduction in entropy is below a threshold.
- For a continuous-valued feature (such as the weight of the animal), with 10 animals in the dataset, the recommended way to find the best split for that feature is to take the 9 mid-points between the 10 sorted examples as candidate splits and pick the one that gives the highest information gain.
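A hedged sketch of that mid-point search (the weights and labels are made up, not the course's dataset):

```python
import numpy as np

def entropy(p1):
    if p1 in (0, 1):
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def info_gain(labels, left_mask):
    # Entropy at the root minus the weighted entropy of the two sub-branches
    w_left = left_mask.mean()
    return (entropy(labels.mean())
            - w_left * entropy(labels[left_mask].mean())
            - (1 - w_left) * entropy(labels[~left_mask].mean()))

weight = np.array([3.2, 4.1, 5.0, 5.5, 6.3, 7.2, 8.4, 8.8, 9.2, 10.1])  # 10 animals
is_cat = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 0])

# Candidate thresholds: the 9 mid-points between consecutive sorted values
sorted_w = np.sort(weight)
midpoints = (sorted_w[:-1] + sorted_w[1:]) / 2
best = max(midpoints, key=lambda t: info_gain(is_cat, weight <= t))
print(best, info_gain(is_cat, weight <= best))
```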
Tree ensembles
A single decision tree can be highly sensitive to small changes in the data -> changing just 1 training example can cause the algorithm to come up with a different split at the root -> a totally different tree, which makes the algorithm not that robust -> use a whole bunch of different trees and let them vote.
Sample the training data with replacement and select a random subset of features to build each tree so that the trees are not all identical to each other.
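A minimal sketch of that sampling scheme (numpy only; this follows the per-tree feature subset described above, not scikit-learn's per-split feature sampling):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # made-up training data: 100 examples, 8 features
y = rng.integers(0, 2, size=100)

n_trees, max_features = 5, 3
for _ in range(n_trees):
    # Bootstrap sample: draw rows with replacement (some repeat, some are left out)
    rows = rng.integers(0, len(X), size=len(X))
    # Random feature subset so the trees are decorrelated from each other
    cols = rng.choice(X.shape[1], size=max_features, replace=False)
    X_boot, y_boot = X[rows][:, cols], y[rows]
    # ...fit one decision tree on (X_boot, y_boot), then average/vote over all trees at prediction time
```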
Kmeans
Shouldn't use the
Choose
The K-means algorithm can end up at different solutions (local optima of the distortion cost function) depending on the random initialization:
- Therefore, in practice, the K-means algorithm is usually run a few times with different random initializations.
- One way to choose between these different solutions from different random initializations is to choose the one with the lowest cost function value (distortion).
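A minimal sketch of that procedure with scikit-learn (KMeans can do this itself via n_init, but the explicit loop mirrors the description; the data is made up):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(1).normal(size=(300, 2))    # made-up data

best_model, best_distortion = None, np.inf
for seed in range(10):                                 # 10 different random initializations
    km = KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X)
    if km.inertia_ < best_distortion:                  # inertia_ is the distortion (cost) value
        best_model, best_distortion = km, km.inertia_
print(best_distortion)
```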
Anomaly detection
Collaborative Filtering
If you were to run the algorithm on this dataset, you actually end up with the parameters w = [0 0] & b = 0 for the user Eve. Because Eve hasn't rated any movies yet, w & b don't affect the first term in the cost function, since none of Eve's movie ratings play a role in the squared error cost function -> normalize the rows so that you can still give reasonable ratings.
Normalizing the columns would help if there was a brand-new movie that no one has rated yet. But in that case, you probably shouldn't show that movie to too many users initially anyway, because you don't know much about it. So normalizing the columns is less important than normalizing the rows to help with the case of a new user who has hardly rated any movies yet.
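A small numpy sketch of that row normalization (assuming the course's convention that rows are movies and columns are users; the ratings matrix here is made up):

```python
import numpy as np

# Y: ratings (movies x users), R: 1 where a rating exists, 0 otherwise
Y = np.array([[5, 4, 0],
              [4, 0, 0],
              [1, 2, 0]], dtype=float)
R = np.array([[1, 1, 0],
              [1, 0, 0],
              [1, 1, 0]])

# Per-movie (row) mean over the ratings that actually exist
Ymean = (Y * R).sum(axis=1) / np.maximum(R.sum(axis=1), 1)
Ynorm = (Y - Ymean[:, None]) * R       # subtract the mean only where a rating exists

# For a brand-new user like Eve (w = 0, b = 0), the prediction w.x + b + Ymean
# falls back to each movie's mean rating instead of 0.
print(Ymean)
```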
Content-based filtering
The retrieval step tries to prune out a lot of items that are just not worth running the more detailed inference and inner product on. The ranking step then makes a more careful prediction of which items the user is actually likely to enjoy. Retrieving more items results in better performance but slower recommendations.
To optimize the trade-off, carry out offline experiments to see if retrieving additional items results in more relevant recommendations.
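A rough sketch of those two steps (made-up embeddings and a placeholder ranking function, not the course's notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(10_000, 32))   # precomputed item embeddings
user_vec = rng.normal(size=32)              # the current user's embedding

# Retrieval: cheap dot-product scores over all items, keep only the top-K candidates
K = 100
candidates = np.argsort(item_vecs @ user_vec)[-K:]

def rank_score(item_emb, user_emb):
    # Placeholder for the full (more expensive) ranking model's prediction
    return item_emb @ user_emb

# Ranking: run the detailed model only on the retrieved candidates, best first
ranked = candidates[np.argsort([rank_score(item_vecs[i], user_vec) for i in candidates])[::-1]]
```

Increasing K is the knob those offline experiments tune: retrieving more items can only improve the final ranking, but makes the recommendation slower.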
Reinforcement learning
One way to think of why reinforcement learning is so powerful is you have to tell it what to do rather than how to do it:
- For an autonomous helicopter, you could try to train a neural network using supervised learning to directly learn the mapping from the state $s$ (x) to the action $a$ (y).
- But it turns out that when the helicopter is moving through the air, the exact right action to take is actually very ambiguous -> it's very difficult to get a dataset of states x and ideal actions y -> for lots of tasks of controlling a robot like a helicopter & other robots, the supervised learning approach doesn't work well, and we instead use reinforcement learning.
- Specifying the reward function (make it impatient) rather than the optimal action gives you more flexibility in how to design the system.
Reinforcement learning is more finicky in terms of the choice of hyperparameters. For example, in supervised learning, if you set the learning rate a little bit too small, then the algorithm may take 3 times longer to train, which is annoying but not that bad. Whereas in reinforcement learning, if you don't set the value of epsilon or other parameters well, it may take 10 or 100 times longer to learn.
Bellman Equation
When the RL problem is stochastic, there isn't a sequence of rewards that you see for sure -> what we're interested in is not maximizing the return (because that's a random number) but maximizing the average (expected) value of the sum of discounted rewards. In cases where both the state and action space are discrete, we can estimate the action-value function iteratively by using the Bellman equation:
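The iteration being referred to is presumably the standard Bellman update, with an expectation over the random next state (here $R(s)$ is the reward, $\gamma$ the discount factor, and $s'$ the next state):

$$Q_{i+1}(s,a) = \mathbb{E}\left[\, R(s) + \gamma \max_{a'} Q_i(s',a') \;\middle|\; s, a \,\right]$$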
This iterative method converges to the optimal action-value function $Q^*(s,a)$ as the number of iterations goes to infinity.
- However, in cases where the state space is continuous it becomes practically impossible to explore the entire state-action space. Consequently, this also makes it practically impossible to gradually estimate
$Q(s,a)$ until it converges to$Q^*(s,a)$ .
In the Deep Q-Learning (DQN) algorithm:
- Using neural networks in RL to estimate action-value functions has proven to be highly unstable -> use a Target Network (with soft updates) and Experience Replay, storing the states, actions, and rewards the agent receives in a memory buffer and then sampling random mini-batches to generate uncorrelated experiences for training the agent.
- Towards the end of training, the agent will lean towards selecting the action that it believes (based on past experiences) will maximize $Q(s,a)$ -> we set the minimum 𝜖 value to 0.01 (not 0) because we always want to keep a little bit of exploration during training.
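A hedged sketch of those two stabilizers (names like q_network, target_network and the Keras-style .assign are assumptions, not the lab's exact code):

```python
import random
from collections import deque, namedtuple

# Experience Replay: store transitions, then sample random mini-batches so that
# the training examples are uncorrelated with each other
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])
memory_buffer = deque(maxlen=100_000)
# during interaction:  memory_buffer.append(Experience(s, a, r, s_next, done))
# during training:     batch = random.sample(memory_buffer, k=64)

TAU = 1e-3  # soft-update rate

def soft_update(q_network, target_network):
    # Target Network: move the target weights only slowly toward the Q-network weights
    for q_w, t_w in zip(q_network.weights, target_network.weights):
        t_w.assign(TAU * q_w + (1.0 - TAU) * t_w)
```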
Deep reinforcement learning
```python
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return (f"trainable model parameters: {trainable_model_params}\n"
            f"all model parameters: {all_model_params}\n"
            f"percentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%")

print(print_number_of_trainable_model_parameters(original_model))
```
Calculate
I know this is not a practical way, but I just want to strengthen my understanding by looking for a pure substitution method. Please correct me if I'm doing something wrong 😅
Now, we used the fact that
Now, find
One-tailed test
Say your test statistic is a z-score of -1.75 and your p-value is 0.04. In a left-tailed test, the p-value is the probability of observing a value more than 1.75 standard units below the mean (z-score < -1.75). The probability of getting a value less than your z-score of -1.75 is calculated by taking the area under the distribution curve to the left of the z-score.
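A quick check of that number (assuming a standard normal test statistic, scipy only):

```python
from scipy.stats import norm

z = -1.75
p_left = norm.cdf(z)        # area under the curve to the left of the z-score
print(round(p_left, 4))     # ~0.0401, the left-tailed p-value
```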