Sayak Paul sayakpaul

TF-Hub text embedding modules for underrepresented languages

Mentors:

Morgan Roff
Sayak Paul
jaeyounkim

This is a summary of my GSoC 2021 project. In this project, I tried to produce text embedding modules trained on underrepresented languages like Arabic and Swahili and publish them on tfhub.dev.

To be posted in: https://forums.fast.ai/c/fastai-users/fastai-v2/

Title: Proposed workflow to compare & monitor models using WandbCallback

Content:

Hi,

I’ve been working on WandbCallback for the past few months (with a lot of help from @sgugger) and I'm very excited to show how it works!

Google Cloud configuration

First: install the CLI program for your distribution: https://cloud.google.com/sdk/install

Parameters

Modify accordingly:

export REGION='us-central1'
export ZONE='us-central1-f'
export PROJECT_NAME='proj'
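For Python scripts that need the same settings, here is a minimal sketch, assuming the three variables above were exported in the current shell. The helper name `load_gcloud_params` and the fallback defaults are hypothetical; they only mirror the example values shown above.

```python
import os

# Hypothetical fallbacks that mirror the example values above.
DEFAULTS = {
    "REGION": "us-central1",
    "ZONE": "us-central1-f",
    "PROJECT_NAME": "proj",
}


def load_gcloud_params(env=None):
    """Return REGION, ZONE and PROJECT_NAME from the environment, with fallbacks."""
    env = os.environ if env is None else env
    params = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # A zone name is its region plus a suffix such as "-f"; fail early on a mismatch.
    if not params["ZONE"].startswith(params["REGION"] + "-"):
        raise ValueError(
            f"zone {params['ZONE']!r} is not in region {params['REGION']!r}"
        )
    return params


if __name__ == "__main__":
    print(load_gcloud_params())
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching the real environment.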

DISCLAIMER

This gist is unofficial. It was created for personal use, but I have kept it public in case it is useful to others. This document is not updated regularly and may not reflect the current status of the CUDA backend.

	import torch
	from diffusers import FluxPipeline
	from torch import nn


	class ModelOffloaderV2:
	    def __init__(self, model: nn.Module, record_stream: bool = False):
	        # Move the model to pinned memory; keep a model copy in CPU pinned memory.
	        for p in model.parameters():
	            p.data = p.data.cpu().pin_memory()

	# Copyright 2022 Google LLC.
	# SPDX-License-Identifier: Apache-2.0
	# Author: Maithra Raghu <maithra@google.com>


	import numpy as np


	def compute_distance_matrix(patch_size, num_patches, length):
	    """Helper function to compute distance matrix."""

	    distance_matrix = np.zeros((num_patches, num_patches))

	# Copyright 2021 Google LLC.
	# SPDX-License-Identifier: Apache-2.0
	import kfp
	import json
	import time
	from google.cloud import bigquery
	from google.cloud.exceptions import NotFound
	from kfp.v2.google.client import AIPlatformClient

	client = bigquery.Client()

	import functools
	import numpy as np
	import tensorflow.compat.v1 as tf
	from tensorflow.python.tpu import tpu_function


	BATCH_NORM_DECAY = 0.9
	BATCH_NORM_EPSILON = 1e-5

	import pandas as pd
	from sklearn import metrics


	def get_classification_report(y_test, y_pred):
	    """Source: https://stackoverflow.com/questions/39662398/scikit-learn-output-metrics-classification-report-into-csv-tab-delimited-format"""
	    # output_dict=True returns the report as a nested dict instead of a string.
	    report = metrics.classification_report(y_test, y_pred, output_dict=True)
	    # One row per class (plus the summary rows), one column per metric.
	    df_classification_report = pd.DataFrame(report).transpose()
	    # Best-performing classes first.
	    df_classification_report = df_classification_report.sort_values(by=['f1-score'], ascending=False)
	    return df_classification_report