A Map for Studying Pre-training in LLMs
- Data Collection
- General Text Data
- Specialized Data
- Data Preprocessing
- Quality Filtering
- Deduplication
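Deduplication, the last preprocessing step listed above, can be sketched as exact-match filtering over content hashes. This is a minimal illustration only — the function name and the light normalization are placeholders, and real pre-training pipelines typically add fuzzy matching (e.g. MinHash) on top of exact dedup:

```python
import hashlib

def deduplicate_exact(documents):
    """Drop documents whose normalized text has been seen before."""
    seen_hashes = set()
    unique_docs = []
    for doc in documents:
        # Normalize lightly so trivial whitespace/case changes still match
        digest = hashlib.sha256(doc.strip().lower().encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            unique_docs.append(doc)
    return unique_docs

corpus = ["The cat sat.", "the cat sat.  ", "A different document."]
print(deduplicate_exact(corpus))  # → ['The cat sat.', 'A different document.']
```

Hashing a normalized form rather than the raw string is a common design choice: it keeps the memory cost at one digest per document while still catching near-trivial duplicates.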
```python
import gradio as gr
import numpy as np
import torch
from PIL import Image

'''
TODOs:
- Fetch the SAM model
- Fetch the inpainting model
'''
```
Bahdanau Attention is often called Additive Attention because of how its attention scores are computed: a transformed query and key are added together and passed through a non-linear activation. In contrast, Dot-Product (Multiplicative) Attention computes scores by multiplying the query and key vectors.
Let's go through the math step-by-step:
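A minimal sketch of that computation, in standard notation: $s_{t-1}$ is the previous decoder state, $h_i$ the $i$-th encoder state, $T_x$ the source length, and $W_a$, $U_a$, $v_a$ are learned parameters.

```latex
% Step 1 — alignment score: add the transformed states, apply tanh (hence "additive")
e_{t,i} = v_a^{\top} \tanh\left(W_a s_{t-1} + U_a h_i\right)

% Step 2 — normalize the scores into attention weights
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{k=1}^{T_x} \exp(e_{t,k})}

% Step 3 — context vector: weighted sum of the encoder states
c_t = \sum_{i=1}^{T_x} \alpha_{t,i} \, h_i
```

The addition plus $\tanh$ in step 1 is exactly what dot-product attention replaces with $s_{t-1}^{\top} h_i$.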
```python
from typing import List, Union

import numpy as np
import tensorflow as tf


def shape_list(tensor: Union[tf.Tensor, np.ndarray]) -> List[int]:
    """
    Deal with dynamic shape in tensorflow cleanly.

    Args:
        tensor (`tf.Tensor` or `np.ndarray`): The tensor we want the shape of.

    Returns:
        `List[int]`: The shape of the tensor as a list.
    """
    if isinstance(tensor, np.ndarray):
        return list(tensor.shape)

    # Prefer the static dimension where it is known; fall back to the
    # dynamic shape for dimensions that are None at graph-construction time.
    dynamic = tf.shape(tensor)
    static = tensor.shape.as_list()
    return [dynamic[i] if s is None else s for i, s in enumerate(static)]
```