Machine Learning

Precision and Recall

The terrorist detection task is an imbalanced classification problem: we have two classes we need to identify (terrorists and not terrorists), with one category (non-terrorists) representing the overwhelming majority of the data points. Another imbalanced classification problem occurs in disease detection when the rate of the disease in the public is very low. In both these cases, the positive class (disease or terrorist) is greatly outnumbered by the negative class. These types of problems are examples of the fairly common case in data science where accuracy is not a good measure for assessing model performance.

Recall

Intuitively, we know that proclaiming all data points as negative (not a terrorist) in the terrorist detection problem isn’t helpful, and, instead, we should focus on identifying the positive cases. The metric our intuition tells us we should maximize is known in statistics as recall, or the ability of a model to find all the relevant cases within a data set.
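
In confusion-matrix terms (true positives TP, false negatives FN), recall is the fraction of actual positives that the model retrieves:

$$\text{recall} = \frac{TP}{TP + FN}$$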


Precision

While recall expresses the ability to find all relevant instances of a class in a data set, precision expresses the proportion of the data points our model assigns to the relevant class that are indeed relevant.
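
In the same confusion-matrix terms (true positives TP, false positives FP):

$$\text{precision} = \frac{TP}{TP + FP}$$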


ROC Curve

One of the main visualization techniques for showing the performance of a classification model is the Receiver Operating Characteristic (ROC) curve. The idea is simple: the ROC curve shows how the true positive rate (recall) and the false positive rate change together as we vary the threshold for identifying a positive data point in our model. The threshold represents the value above which we consider a data point to be in the positive class.
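
The two axes of the curve come straight from the confusion matrix:

$$\text{TPR} = \frac{TP}{TP + FN}, \qquad \text{FPR} = \frac{FP}{FP + TN}$$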

(Figure: ROC curves for two models, one blue and one red, with a black diagonal line for a random classifier.)

AUC

We can quantify a model’s ROC curve by calculating the total Area Under the Curve (AUC), a metric that falls between zero and one with a higher number indicating better classification performance. In the graph above, the AUC for the blue curve will be greater than that for the red curve, meaning the blue model is better at achieving a blend of precision and recall. A random classifier (the black line) achieves an AUC of 0.5.
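
As a concrete illustration, scikit-learn's roc_curve and roc_auc_score compute these quantities from true labels and predicted scores; the labels and scores below are synthetic, just to make the sketch runnable:

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Synthetic labels and model scores, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

# Each threshold gives one (FPR, TPR) point on the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print(list(zip(fpr, tpr)))

# AUC summarizes the whole curve; 1.0 is perfect, 0.5 is a random classifier.
print(roc_auc_score(y_true, y_score))
```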

Metric for Evaluating Recommendation Engines

Cumulative Gain (CG)

If every recommendation has a graded relevance score associated with it, CG is the sum of graded relevance values of all results in a search result list.

The Cumulative Gain at a particular rank position p is the sum of the graded relevance values of the top p results, where rel_i is the graded relevance of the result at position i. The problem with CG is that it does not take into consideration the rank of each result when determining the usefulness of the result set.
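
Formally:

$$CG_p = \sum_{i=1}^{p} rel_i$$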

Discounted Cumulative Gain

To overcome this we introduce DCG. DCG penalizes highly relevant documents that appear lower in the search results by reducing the graded relevance value logarithmically in proportion to the position of the result.
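
The standard form discounts each result's gain by the logarithm of its position; the figures in the example below are consistent with the exponential-gain variant, which rewards highly relevant results more strongly:

$$DCG_p = \sum_{i=1}^{p} \frac{rel_i}{\log_2(i+1)} \qquad \text{or} \qquad DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i+1)}$$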

For example:

  • setA: [3, 1, 2, 3, 2, 0] CG setA: 11
  • setB: [3, 3, 2, 2, 1, 0] CG setB: 11
  • DCG setA: 13.306224081788834
  • DCG setB: 14.595390756454924
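
A minimal Python sketch, assuming the exponential-gain DCG above, reproduces these figures:

```python
import math

def cg(rels):
    """Cumulative Gain: plain sum of graded relevances."""
    return sum(rels)

def dcg(rels):
    """Discounted Cumulative Gain with exponential gain (2**rel - 1)."""
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(rels))

set_a = [3, 1, 2, 3, 2, 0]
set_b = [3, 3, 2, 2, 1, 0]

print(cg(set_a), cg(set_b))  # 11 11
print(dcg(set_a))            # 13.306224081788834
print(dcg(set_b))            # 14.595390756454924
```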

Normalized Discounted Cumulative Gain

An issue arises with DCG when we want to compare the search engine's performance from one query to the next, because search result lists can vary in length depending on the query that has been provided.

Hence, by normalizing the cumulative gain at each position for a chosen value of p across queries, we arrive at NDCG. We perform this by sorting all the relevant documents in the corpus by their relative relevance, producing the maximum possible DCG through position p (a.k.a. the Ideal Discounted Cumulative Gain, IDCG).
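
That is, with IDCG_p denoting the DCG of that ideally ordered list:

$$NDCG_p = \frac{DCG_p}{IDCG_p}$$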

The ratio will always be in the range [0, 1], with 1 being a perfect score, meaning that the DCG is the same as the IDCG. Therefore, NDCG values can be averaged over all queries to obtain a measure of the average performance of a recommender system's ranking algorithm.

Limitations of NDCG

  • NDCG does not penalize bad documents in the results.
  • It does not penalize missing documents in the results.
  • It may not be suitable for measuring the performance of queries that often have several equally good results.

LambdaRank and LambdaMART

What problem are they solving?

  • In ranking, your goal is not just to predict relevance scores — it's to get the order right.
  • Metrics like NDCG are non-differentiable (they involve sorting and discrete jumps), so you can't directly optimize them with gradient descent.
  • LambdaRank and LambdaMART were invented to indirectly optimize ranking metrics like NDCG, using clever tricks.

What are LambdaRank and LambdaMART?

  • Both LambdaRank and LambdaMART are algorithms designed to optimize ranking metrics like NDCG (rather than a generic regression or classification loss).
  • They are based on the idea of learning to rank by pairwise comparisons, while smartly adjusting how much each kind of ranking mistake matters.

LambdaRank

Problem it solves:

  • Standard ranking models (like RankNet) used a pairwise loss but did not take the ranking metric into account directly.
  • LambdaRank adjusts the gradient dynamically based on the change in NDCG if you swap two items.

High-Level Idea of LambdaRank:

  • For each pair of documents/items for a query:
      • Compute the swap effect: if we swap these two items, how much would the NDCG change?

Use that swap impact to scale the gradient:

  • If swapping the items would hurt NDCG a lot, you apply a larger gradient.
  • If swapping would not affect NDCG much, you apply a smaller gradient.

Thus: the model focuses more on mistakes that matter more to the actual ranking quality (NDCG).
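
A minimal sketch of this swap-effect weighting, assuming the exponential-gain DCG from the earlier example (the full LambdaRank gradient also includes a RankNet-style pairwise term, omitted here):

```python
import numpy as np

def dcg(rels):
    """Exponential-gain DCG of a list in its current ranked order."""
    positions = np.arange(1, len(rels) + 1)
    return np.sum((2.0 ** np.asarray(rels) - 1) / np.log2(positions + 1))

def delta_ndcg(rels, i, j):
    """|Change in NDCG| if the items at ranks i and j trade places."""
    ideal = dcg(sorted(rels, reverse=True))
    swapped = list(rels)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return abs(dcg(swapped) - dcg(rels)) / ideal

# Graded relevances in the model's current ranked order (toy data).
rels = [1, 3, 0, 2]

# Pairs whose swap would move NDCG a lot get a larger gradient weight.
for i in range(len(rels)):
    for j in range(i + 1, len(rels)):
        print(f"swap ranks {i} and {j}: delta NDCG = {delta_ndcg(rels, i, j):.4f}")
```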

LambdaMART

  • LambdaRank is usually trained with a neural network.
  • LambdaMART replaces the neural net with gradient boosted decision trees (GBDT).
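
For instance, LightGBM's LGBMRanker trains a LambdaMART-style GBDT. A minimal sketch, assuming synthetic features and graded relevance labels made up purely for illustration:

```python
import numpy as np
from lightgbm import LGBMRanker

rng = np.random.default_rng(0)

# Two queries with 5 and 3 candidate items; features and labels are synthetic.
X = rng.normal(size=(8, 4))
y = np.array([3, 2, 0, 1, 0, 2, 1, 0])
group = [5, 3]  # number of items per query, in the order they appear in X

ranker = LGBMRanker(objective="lambdarank", n_estimators=50)
ranker.fit(X, y, group=group)

# Higher predicted scores should mean higher rank within each query.
print(ranker.predict(X[:5]))
```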