
Alex Punnen alexcpn

@alexcpn
alexcpn / gpt2-training-output.txt
Last active April 20, 2023 12:57
Taking GPT2 for a spin
2023-04-18 20:32:36,678 [INFO] Training data ./data/small_3.txt
2023-04-18 20:32:36,679 [INFO] length of dataset in words: 22,420
2023-04-18 20:32:36,713 [INFO] encoding.input_ids.shape torch.Size([1, 4742])
2023-04-18 20:32:36,713 [INFO] encoding.attention_mask.shape torch.Size([1, 4742])
2023-04-18 20:32:36,713 [INFO] length of dataset in tokens = 4742
2023-04-18 20:32:57,546 [INFO] Over-fit check answer: Formation of Granulation Tissue
2023-04-18 20:32:57,546 [INFO] len_train_data=4742 block_size =256 batch_size= 4
2023-04-18 20:32:57,547 [INFO] Epoch 1 of 50
2023-04-18 20:33:04,405 [INFO] Epoch 0 complete. Loss: 5.974085330963135 saving ./test-gpt2-4/gpt2-epoch-1-2023-04-18 20:32:35.343858
2023-04-18 20:33:06,065 [INFO] Over-fit check answer: Formation of Granulation Tissueation of granulation tissue of granulation tissue of granulation tissue of granulation tissue of granulation tissuea,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,,1P
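The log reports 4,742 tokens, block_size = 256, batch_size = 4 and an epoch-1 loss of about 5.97. Here is a minimal sketch of how those numbers relate. It assumes the token stream is cut into non-overlapping blocks of block_size tokens, with the trailing remainder dropped. That is a common choice, but the log does not confirm it. It also assumes the logged loss is the mean per-token cross-entropy in nats, which makes its exponential the perplexity.

```python
import math

def count_blocks(num_tokens: int, block_size: int) -> int:
    # Non-overlapping blocks; a leftover tail shorter than block_size is dropped (assumption).
    return num_tokens // block_size

def count_batches(num_blocks: int, batch_size: int) -> int:
    # Ceiling division: the last batch may hold fewer than batch_size blocks.
    return math.ceil(num_blocks / batch_size)

def perplexity(mean_ce_loss: float) -> float:
    # Perplexity is exp of the mean per-token cross-entropy (natural log).
    return math.exp(mean_ce_loss)

num_tokens, block_size, batch_size = 4742, 256, 4  # values taken from the log above
blocks = count_blocks(num_tokens, block_size)
batches = count_batches(blocks, batch_size)
dropped = num_tokens - blocks * block_size
print(f"blocks={blocks} batches_per_epoch={batches} dropped_tokens={dropped}")
print(f"perplexity after epoch 1 ~ {perplexity(5.974085330963135):.1f}")
```

Under these assumptions, each epoch has 18 blocks in 5 batches, and the last batch holds 2 blocks. The 50 epochs would therefore be about 250 optimizer steps.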
alexcpn / GPT2-output.md
Created April 18, 2023 05:46
Huggingface GPT2 output generation based on parameters

Processing Message from input() Question: New York

Generated:

New York City. New Yorkers live within walking distance of the capital, and over 90% are located at or near high-speed Internet access points (h/t to WIRED). NYC is a global cultural center with an important influence on commerce; it constitutes one major city in terms
[a]century's worth [of news content]. With its rich media culture coupled by vibrant online communities that foster collaboration among writers from aroundthe world—from emerging markets like China through Latin America into Europe via Asia —NYC has become perhaps most influential place for new creative expression.[1][2], where innovative ideas can be disseminated quickly across disparate audiences without compromising quality control as well,[3],[4](http://www:washingtonpost.-times/.wp.] NYX provides opportunities both inside your home town hall meeting room full time but also outside when you're not there because many people don't have internet connections yet! It offers unp
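The description says the output depends on generation parameters, but the parameter values behind the text above are not known. The pure-Python sketch below illustrates what the common sampling parameters do to a next-token distribution. It applies temperature, top_k and top_p (nucleus) filtering to a toy list of logits. It mirrors the usual meaning of those parameters but is not the Hugging Face implementation. The logits and vocabulary are made up.

```python
import math
import random

def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # Temperature rescales logits: < 1 sharpens, > 1 flattens the distribution (must be > 0).
    scaled = [l / temperature for l in logits]
    # Softmax, subtracting the max for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        probs = probs[:top_k]
    # Top-p: keep the smallest most-likely prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cum += p
        if cum >= top_p:
            break
    # Renormalise the surviving probabilities so they sum to 1.
    z = sum(p for p, _ in kept)
    return {i: p / z for p, i in kept}

def sample(dist, rng):
    """Draw one token id from a {token_id: probability} distribution."""
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for a 4-token vocabulary
print(filter_logits(logits, top_k=2))
print(filter_logits(logits, top_p=0.9))
print(sample(filter_logits(logits, temperature=0.7, top_k=3), random.Random(0)))
```

Lower temperature and tighter top_k or top_p make output more repetitive. Looser settings produce the kind of wandering text shown above.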
alexcpn / line_segment_intersction.md
Created March 23, 2023 14:48
Explanation of line segment intersection for two points
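Below is a minimal sketch of one standard way to decide whether two line segments intersect, each segment given by its two endpoints. It may differ from the method the gist explains. The test compares orientations, which are the signs of 2-D cross products, and handles collinear and touching cases separately.

```python
from typing import Tuple

Point = Tuple[float, float]

def orientation(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): 1 = counter-clockwise, -1 = clockwise, 0 = collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)

def on_segment(p: Point, q: Point, r: Point) -> bool:
    """For collinear p, q, r: check that r lies within the bounding box of segment pq."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(p1: Point, p2: Point, p3: Point, p4: Point) -> bool:
    """True if segment p1p2 and segment p3p4 share at least one point."""
    d1 = orientation(p3, p4, p1)
    d2 = orientation(p3, p4, p2)
    d3 = orientation(p1, p2, p3)
    d4 = orientation(p1, p2, p4)
    # General case: each segment's endpoints lie strictly on opposite sides of the other segment.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Collinear or touching cases: an endpoint lies on the other segment.
    if d1 == 0 and on_segment(p3, p4, p1):
        return True
    if d2 == 0 and on_segment(p3, p4, p2):
        return True
    if d3 == 0 and on_segment(p1, p2, p3):
        return True
    if d4 == 0 and on_segment(p1, p2, p4):
        return True
    return False

print(segments_intersect((0, 0), (4, 4), (0, 4), (4, 0)))  # crossing diagonals
print(segments_intersect((0, 0), (1, 1), (2, 2), (3, 3)))  # collinear but disjoint
```

The orientation test uses only multiplications and comparisons. With integer coordinates, it therefore avoids the division and rounding issues of solving for the intersection point directly.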
from transformers import T5Tokenizer, T5ForConditionalGeneration
import numpy as np
import torch


class FlaxDataCollatorForT5MLM:
    """
    From https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py
    """
    def __init__(self, tokenizer, noise_density, mean_noise_span_length) -> None:
        self.tokenizer = tokenizer
from transformers import T5Tokenizer
import numpy as np


class FlaxDataCollatorForT5MLM:
    """
    From https://github.com/huggingface/transformers/blob/main/examples/flax/language-modeling/run_t5_mlm_flax.py
    """
    def __init__(self, tokenizer, noise_density, mean_noise_span_length) -> None:
        self.tokenizer = tokenizer
        self.noise_density = noise_density
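FlaxDataCollatorForT5MLM implements T5-style span corruption, controlled by noise_density and mean_noise_span_length. The simplified sketch below shows the two pieces of logic involved. The first is how many tokens and spans get masked. The second is how a noise mask turns a token sequence into an input with sentinel tokens (<extra_id_0>, <extra_id_1>, ...) and a matching target. The function names are hypothetical. The counting is modelled on the rounding in the linked Hugging Face script, but is not copied from it. Word strings stand in for token ids. The example mask is fixed by hand rather than sampled randomly as the real collator does.

```python
def noise_counts(length: int, noise_density: float, mean_noise_span_length: float):
    """Return (num_noise_tokens, num_noise_spans) for a sequence of `length` tokens (simplified)."""
    num_noise_tokens = round(length * noise_density)
    # Keep at least one noise token and at least one non-noise token.
    num_noise_tokens = min(max(num_noise_tokens, 1), length - 1)
    # At least one span; spans average mean_noise_span_length tokens.
    num_noise_spans = max(round(num_noise_tokens / mean_noise_span_length), 1)
    return num_noise_tokens, num_noise_spans

def apply_span_corruption(tokens, noise_mask):
    """Replace each run of masked tokens with one sentinel in the input; the target lists
    each masked run after its sentinel and ends with one extra sentinel."""
    inputs, targets = [], []
    sentinel = 0
    prev_noise = False
    for tok, is_noise in zip(tokens, noise_mask):
        if is_noise:
            if not prev_noise:  # a new noise span starts here
                marker = f"<extra_id_{sentinel}>"
                inputs.append(marker)
                targets.append(marker)
                sentinel += 1
            targets.append(tok)
        else:
            inputs.append(tok)
        prev_noise = is_noise
    targets.append(f"<extra_id_{sentinel}>")  # closing sentinel
    return inputs, targets

tokens = "The cute dog walks in the park".split()
mask = [False, True, True, False, False, True, False]  # hand-picked for illustration
print(noise_counts(512, noise_density=0.15, mean_noise_span_length=3.0))
inp, tgt = apply_span_corruption(tokens, mask)
print(" ".join(inp))
print(" ".join(tgt))
```

The model learns to reconstruct the target from the corrupted input. Each sentinel tells it which gap the following tokens fill.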
loki:
  auth_enabled: false
  commonConfig:
    path_prefix: /var/loki
    replication_factor: 1
  compactor:
    apply_retention_interval: 1h
    compaction_interval: 5m
    retention_delete_worker_count: 500
    retention_enabled: true
alexcpn / bert.py
Created January 27, 2023 12:58
Sentence classification with transformer model Bert
'''
Adapted and extended from
https://github.com/huggingface/transformers/issues/1950#issuecomment-558679189
'''
import pandas as pd
from transformers import BertTokenizer, BertModel
from sklearn.metrics.pairwise import cosine_similarity
import torch
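The bert.py imports suggest comparing sentence embeddings with cosine_similarity. The numpy sketch below shows one common recipe. It assumes sentence vectors come from mean-pooling BERT's last hidden states over non-padding tokens; the gist may pool differently, for example by taking the [CLS] vector. A query sentence gets the label of its most similar labelled reference sentence. Random arrays stand in for real BertModel outputs, so the example runs without downloading a model. The labels are hypothetical.

```python
import numpy as np

def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors while ignoring padding (mask == 0).
    last_hidden_state: (batch, seq_len, hidden); attention_mask: (batch, seq_len)."""
    mask = attention_mask[:, :, None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between rows of a and rows of b (non-zero rows assumed)."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T

def classify(query_vecs, ref_vecs, ref_labels):
    """Give each query the label of its most similar reference sentence."""
    sims = cosine_sim(query_vecs, ref_vecs)
    return [ref_labels[i] for i in sims.argmax(axis=1)]

rng = np.random.default_rng(0)
hidden = rng.normal(size=(3, 5, 8))  # stand-in for BertModel(...).last_hidden_state as numpy
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1], [1, 1, 0, 0, 0]])
vecs = mean_pool(hidden, mask)
print(classify(vecs[:1], vecs[1:], ["sports", "finance"]))  # hypothetical labels
```

For non-zero rows, cosine_sim gives the same values as sklearn's cosine_similarity. The explicit version just makes the row normalisation visible.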
alexcpn / wrongly_classified.py
Created October 20, 2022 13:32
List out wrongly and rightly classified classes
from collections import Counter

#---------------------------------------------------------------------------------------------
# Populate the Confusion Matrix
#---------------------------------------------------------------------------------------------
for key, val in wrong_per_class.items():  # key is the true category; val is a list of wrongly predicted classes
    summed_wrong_classes = Counter(val).most_common()  # [(predicted_class, count), ...], most frequent first
    print(f"**To Predict {categories[key]}")
    for ele in summed_wrong_classes:
        print(f" --Predicted {categories[ele[0]]} count={ele[1]}")
        confusion_matrix[key][ele[0]] = ele[1]  # row = true class, column = predicted class
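The loop above assumes that wrong_per_class, categories and confusion_matrix already exist. The self-contained sketch below shows one way to build them from true and predicted class indices. The category names and predictions are toy data, and build_confusion is a hypothetical helper. It also fills the diagonal with the correctly classified counts, so the matrix is complete.

```python
from collections import Counter, defaultdict

def build_confusion(y_true, y_pred, num_classes):
    """Group misclassifications per true class and build a num_classes x num_classes
    matrix (rows = true class, columns = predicted class)."""
    wrong_per_class = defaultdict(list)
    right_per_class = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            right_per_class[t] += 1
        else:
            wrong_per_class[t].append(p)
    confusion_matrix = [[0] * num_classes for _ in range(num_classes)]
    for key, val in wrong_per_class.items():
        for pred, count in Counter(val).most_common():
            confusion_matrix[key][pred] = count  # off-diagonal: wrongly classified
    for key, count in right_per_class.items():
        confusion_matrix[key][key] = count  # diagonal: rightly classified
    return wrong_per_class, confusion_matrix

categories = ["cat", "dog", "bird"]  # hypothetical class names
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 0, 2, 2, 0]
wrong, cm = build_confusion(y_true, y_pred, len(categories))
for key, val in wrong.items():
    print(f"**To Predict {categories[key]}")
    for pred, count in Counter(val).most_common():
        print(f" --Predicted {categories[pred]} count={count}")
print(cm)
```

Each row of the matrix sums to the number of samples of that true class. This is a quick consistency check on the counts.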
alexcpn / bind_to_released_pv.md
Created August 2, 2022 11:26
Kubernetes: How to Get the Data back for a StorageClass with reclaimPolicy as Retain

How to Get the Data back for a StorageClass with reclaimPolicy as Retain

Testing Using Rook-Ceph Storage Class with External Ceph

(Should work with any storage class)

Step 1: Create a Storage Class with reclaimPolicy as Retain
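The sketch below shows the kind of object Step 1 creates: a StorageClass with reclaimPolicy: Retain. It builds the manifest as a Python dict and prints it as JSON, which kubectl apply -f also accepts. The name, provisioner and parameters are assumptions. rook-ceph.rbd.csi.ceph.com is the provisioner name Rook's Ceph-CSI RBD examples use when the operator runs in the rook-ceph namespace. A working Rook StorageClass also needs pool and CSI secret parameters, which are omitted here.

```python
import json

def retained_storage_class(name: str, provisioner: str, parameters: dict) -> dict:
    """Build a StorageClass manifest whose PersistentVolumes survive PVC deletion."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters,
        # Retain: deleting the PVC moves the PV to the "Released" phase instead of
        # deleting it, so the backing volume and its data are kept.
        "reclaimPolicy": "Retain",
    }

manifest = retained_storage_class(
    name="rook-ceph-block-retain",             # hypothetical name
    provisioner="rook-ceph.rbd.csi.ceph.com",  # assumes the Rook operator namespace is rook-ceph
    parameters={"clusterID": "rook-ceph"},     # incomplete: pool and secret parameters omitted
)
print(json.dumps(manifest, indent=2))  # could be piped to: kubectl apply -f -
```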