#include <cmath>
#include <iostream>
using namespace std;

bool is_prime(int n) {
    if (n <= 1) return false;
    if (n <= 3) return true;
    if (n % 2 == 0 || n % 3 == 0) return false;
    // Every prime above 3 has the form 6k +/- 1, so only those divisors are tested
    for (int i = 5; i * i <= n; i += 6)
        if (n % i == 0 || n % (i + 2) == 0) return false;
    return true;
}

// Exact perfect-square test on 64-bit integers (avoids float rounding errors)
bool is_perfect_square(long long x) {
    if (x < 0) return false;
    long long r = llround(sqrt((long double)x));
    return r * r == x;
}

// n is a Fibonacci number iff 5n^2 - 4 or 5n^2 + 4 is a perfect square
bool is_fibo(int n) {
    long long m = 5LL * n * n; // 64-bit arithmetic to avoid int overflow
    return is_perfect_square(m - 4) || is_perfect_square(m + 4);
}

// Binet's formula; the result fits in an int only for n <= 46
int get_fibo(int n) {
    double phi = (1 + sqrt(5)) / 2;
    return round(pow(phi, n) / sqrt(5));
}

// SAKAMOTO ALGORITHM to check what day of the week it is (0 = Sunday, ..., 6 = Saturday)
int day_of_week(int year, int month, int day) {
    int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    year -= month < 3;
    return (year + year / 4 - year / 100 + year / 400 + t[month - 1] + day) % 7;
}
function getWebName(url) {
    // http://example1.com/a/b?c=d => example1
    // http://www.example2.com/b?c=d => example2
    // https://ww.example3.com.vn => example3
    // Heuristic: a 2-letter last label is treated as a country code that follows
    // a second-level suffix (e.g. .com.vn), so the name is third from the end.
    // Two-label hosts with a 2-letter TLD (e.g. example.io) are not handled.
    const hostnameParts = new URL(url).hostname.split('.');
    return hostnameParts[hostnameParts.length - 1].length === 2
        ? hostnameParts[hostnameParts.length - 3]
        : hostnameParts[hostnameParts.length - 2];
}
// Check even and odd without `if else`
// The semicolon matters: without it the next line is parsed as an index into 3
const number = 3;
["even", "odd"][Math.abs(number % 2)]; // "odd"
// Get intersection
const a = new Set([1, 2, 3]);
const b = new Set([4, 3, 2]);
const intersection = [...a].filter(x => b.has(x));
console.log(intersection); // [2, 3]
function getCookieField(name) {
    const cookie = document.cookie.split("; ").find(item => item.startsWith(`${name}=`));
    // Slice after the first "=" so values that themselves contain "=" are not truncated
    return cookie ? decodeURIComponent(cookie.slice(name.length + 1)) : null;
}
(265 >>> 0).toString(2);
(_$=($,_=[]+[])=>$?_$($>>+!![],($&+!![])+_):_)(265);
/*
This is not a RegEx but an arrow function whose function name, variable names, and the number 1 are written with special characters; the number 1 is expressed by the array expression +!![].
Here is a slightly more readable version of the code:
(toBinary = (val, str = "") => val ? toBinary(val >> 1, (val & 1) + str):str)(265);
[]+[] is the empty string "".
+!![] is the number 1.
It uses recursion to take one bit at a time and prepend it to the string str (initially empty ""). The stop condition is val equal to 0 (that is the ternary operator at val ? ...).
Written out for readability, with comments:
(
toBinary = (val, str = "") => // assign toBinary to an arrow function with 2 parameters, val and str (default "").
val ? // if val is not 0...
toBinary(val >> 1, (val & 1) + str) : // ... then recurse on the next bit
str // ...otherwise end the recursion and return the value
)(265); // call toBinary immediately
*/
# Compare hyperparameter search results
import matplotlib.pyplot as plt
import numpy as np

def plot_param_performace(clf, param_name, title):
    # Assumes clf.search_cv is a fitted scikit-learn search object
    # created with return_train_score=True
    results = clf.search_cv.cv_results_
    plt.figure(figsize=(13, 5))
    plt.title(title)
    plt.xlabel(param_name)
    plt.ylabel("Score")
    plt.grid()
    ax = plt.gca()
    ax.set_ylim(0.96, 1)

    # Get the regular numpy array from the MaskedArray
    X_axis = np.array(results[f'param_{param_name}'].data, dtype=float)
    for scorer, style, color in (('test_score', '-', 'g'), ('train_score', '--', 'r')):
        score_mean = results[f'mean_{scorer}']
        score_std = results[f'std_{scorer}']
        # Shade one standard deviation around the mean score
        ax.fill_between(X_axis, score_mean - score_std,
                        score_mean + score_std, alpha=0.1, color=color)
        ax.plot(X_axis, score_mean, style, color=color,
                label=f'mean {scorer} (band: +/- 1 std)')
    plt.legend(loc="best")
    plt.tight_layout()
    plt.show()

plot_param_performace(rf_classifier, 'n_estimators', "Random Forest: Performance vs Number of Estimators")
Reinforcement learning
One way to think about why reinforcement learning is so powerful: you only have to tell it what to do rather than how to do it.
- For an autonomous helicopter, you could try to train a neural network using supervised learning to directly learn the mapping from the state $s$ (x) to the action $a$ (y).
- But it turns out that when the helicopter is moving through the air, the exact right action to take is very ambiguous, so it is very difficult to get a dataset of x and the ideal action y -> For lots of tasks of controlling a robot like a helicopter & other robots, the supervised learning approach doesn't work well, and we instead use reinforcement learning.
- Specifying the reward function (e.g., making the agent impatient) rather than the optimal action gives you more flexibility in how you design the system.
Reinforcement learning is more finicky in terms of the choice of hyperparameters. For example, in supervised learning, if you set the learning rate a little bit too small, the algorithm may take 3 times longer to train, which is annoying but maybe not that bad. In reinforcement learning, by contrast, if you choose a poor value for 𝜖 or other parameters, it may take 10 times or 100 times longer to learn.
Bellman Equation
When the RL problem is stochastic, there isn't a sequence of rewards that you see for sure -> what we're interested in is not maximizing the return (because that's a random number) but maximizing the average value of the sum of discounted rewards. In cases where both the state and action space are discrete we can estimate the action-value function iteratively by using the Bellman equation:
$$Q_{i+1}(s,a) = R + \gamma \max_{a'} Q_i(s',a')$$
This iterative method converges to the optimal action-value function $Q^*(s,a)$ as $i \to \infty$.
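For the discrete case, the iterative update can be sketched in a few lines. This is only an illustration under stated assumptions: the five-state chain, its rewards, the two actions, and the discount factor of 0.5 are all hypothetical and are not taken from the notes.

```python
# Hypothetical 5-state chain; states 0 and 4 are terminal (illustrative values).
REWARDS = [100.0, 0.0, 0.0, 0.0, 40.0]  # R(s): reward received in state s
TERMINAL = {0, 4}
ACTIONS = (-1, +1)                       # move left, move right
GAMMA = 0.5                              # assumed discount factor


def q_iteration(num_sweeps=50):
    """Repeatedly apply Q_{i+1}(s, a) = R(s) + gamma * max_a' Q_i(s', a')."""
    n = len(REWARDS)
    q = [[0.0 for _ in ACTIONS] for _ in range(n)]
    for _ in range(num_sweeps):
        new_q = [[0.0 for _ in ACTIONS] for _ in range(n)]
        for s in range(n):
            for a_idx, step in enumerate(ACTIONS):
                if s in TERMINAL:
                    # No future reward after a terminal state
                    new_q[s][a_idx] = REWARDS[s]
                else:
                    s_next = s + step
                    new_q[s][a_idx] = REWARDS[s] + GAMMA * max(q[s_next])
        q = new_q
    return q


if __name__ == "__main__":
    for state, values in enumerate(q_iteration()):
        print(state, values)
```

In this toy problem the table stops changing after a handful of sweeps, which is the convergence to $Q^*(s,a)$ in miniature. It works only because every state-action pair can be visited in each sweep.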
- However, in cases where the state space is continuous it becomes practically impossible to explore the entire state-action space. Consequently, this also makes it practically impossible to gradually estimate $Q(s,a)$ until it converges to $Q^*(s,a)$.
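In Deep $Q$-Learning, this problem is addressed by using a neural network to estimate the action-value function, $Q(s,a) \approx Q^*(s,a)$.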
- Using neural networks in RL to estimate action-value functions has proven to be highly unstable -> Use a Target Network (with soft updates) and Experience Replay, which stores the agent's states, actions, and rewards in a memory buffer and then samples a random mini-batch from it to generate uncorrelated experiences for training the agent.
- Towards the end of training, the agent will lean towards selecting the action that it believes (based on past experiences) will maximize $Q(s,a)$ -> We set the minimum 𝜖 value to 0.01 (not 0) because we always want to keep a little bit of exploration during training.
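The three stabilizing ideas above (experience replay, soft target updates, and an 𝜖 floor) can be sketched in plain Python. Everything here is a hypothetical illustration: the names `ReplayBuffer`, `soft_update`, `decay_epsilon`, and `choose_action`, the `tau` and `decay` defaults, and the use of flat lists of floats in place of real network weights are assumptions; only the minimum 𝜖 of 0.01 comes from the notes.

```python
import random
from collections import deque, namedtuple

Experience = namedtuple(
    "Experience", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-size memory; the oldest experiences are dropped first."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutive steps of the same episode.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def soft_update(target_weights, online_weights, tau=0.001):
    """Move each target weight a small step (tau, assumed) toward the online weight."""
    return [
        tau * w + (1.0 - tau) * w_target
        for w, w_target in zip(online_weights, target_weights)
    ]


def decay_epsilon(epsilon, decay=0.995, epsilon_min=0.01):
    """Shrink epsilon each episode but never below epsilon_min."""
    return max(epsilon_min, decay * epsilon)


def choose_action(q_values, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Because `decay_epsilon` clamps at 0.01, the agent still takes a random action about 1% of the time late in training, which is the "little bit of exploration" the note asks for.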
Deep reinforcement learning
def print_number_of_trainable_model_parameters(model):
    trainable_model_params = 0
    all_model_params = 0
    for _, param in model.named_parameters():
        all_model_params += param.numel()
        if param.requires_grad:
            trainable_model_params += param.numel()
    return f"trainable model parameters: {trainable_model_params}\nall model parameters: {all_model_params}\npercentage of trainable model parameters: {100 * trainable_model_params / all_model_params:.2f}%"
I know this is not a practical approach, but I just want to strengthen my understanding by looking for a pure substitution method. Please correct me if I did something wrong 😅
Now, we used the fact that
Now, find
Content-based filtering
The retrieval step prunes out a lot of items that are not worth running the more detailed inference and inner product on. The ranking step then makes a more careful prediction of which items the user is actually likely to enjoy. Retrieving more items results in better performance but slower recommendations.
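The two steps can be sketched as follows. This is a minimal illustration, assuming that user and item vectors have already been computed and that retrieval simply merges a few precomputed candidate lists; the function names, the example vectors, and the `max_items` and `top_k` parameters are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(candidate_lists, max_items):
    """Cheap step: merge candidate lists, drop duplicates, cap the total."""
    seen = set()
    merged = []
    for items in candidate_lists:
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:max_items]


def rank(user_vec, item_vecs, candidates, top_k):
    """Careful step: score only the retrieved items, best score first."""
    scored = [(dot(user_vec, item_vecs[item]), item) for item in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored[:top_k]]


if __name__ == "__main__":
    # Hypothetical precomputed vectors
    item_vecs = {
        "a": [1.0, 0.0],
        "b": [0.0, 1.0],
        "c": [1.0, 1.0],
        "d": [-1.0, 0.0],
    }
    user_vec = [2.0, 1.0]
    candidates = retrieve([["a", "b"], ["b", "c"]], max_items=3)
    print(rank(user_vec, item_vecs, candidates, top_k=2))
```

Raising `max_items` lets the ranking step see more candidates, which mirrors the trade-off in the text: potentially better recommendations at the cost of more inner products per request.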