Whistleblowing within AI companies, especially when it concerns sensitive information such as model sizes and training methods, is a serious action that requires careful consideration of the legal, ethical, and personal implications. This guide aims to give employees comprehensive information on whistleblowing or leaking confidential AI-related data safely and anonymously while minimizing personal and professional risks.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
import matplotlib.pyplot as plt
import seaborn as sns

# Define model names for base and fine-tuned versions
base_model_name = "pszemraj/tFINE-900m-e16-d32-1024ctx"
ft_model_name = "BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e"

# Load the models with appropriate dtype
"""
Merges all text files in a directory into a single markdown file.
"""
import logging
import re
from datetime import datetime
from pathlib import Path

import fire
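The rest of this script is cut off in the preview. As a minimal sketch of what the docstring describes, assuming one `##` heading per file, alphabetical ordering, and a `*.txt` glob (the name `merge_text_files` and its parameters are hypothetical, not the author's):

```python
from pathlib import Path


def merge_text_files(input_dir, output_file, pattern="*.txt"):
    """Concatenate matching files into one markdown file, one heading per file.

    Hypothetical helper: the heading style, sort order, and glob pattern
    are assumptions, not taken from the original script.
    """
    input_dir = Path(input_dir)
    # Sort for a deterministic order in the merged output.
    files = sorted(p for p in input_dir.glob(pattern) if p.is_file())

    sections = []
    for path in files:
        text = path.read_text(encoding="utf-8", errors="replace").strip()
        sections.append(f"## {path.name}\n\n{text}\n")

    # Sections are separated by a single blank line.
    Path(output_file).write_text("\n".join(sections), encoding="utf-8")
    return len(files)
```

Because the output file uses a `.md` extension, it is not picked up by the `*.txt` glob even when it is written into the input directory.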
In addition to a significant decrease in hepatic lipid accumulation in the IOE group, which inhibited energy intake by propionate enrichment, hepatic lipids were also significantly reduced in the mice in the IOP group, which was largely enriched with butyrate. Compared with the IOE group, IOP had a stronger regulatory effect on hepatic metabolism and triglyceride metabolism and higher levels of TCA cycle in the host. In addition, butyrate has the ability to promote browning of white adipose tissue (WAT) to brown adipose tissue (BAT).^[@ref39],[@ref40]^ WAT stores energy, whereas BAT uses energy for heating and consequently host energy expenditure increases.^[@ref41],[@ref42]^ However, adipose tissue weight does not change after WAT browning.^[@ref43]^ Therefore, the weight of adipose tissue of mice in the IOP group dominated by butyrate was greater than that of the mice in the IOE group dominated by propionate.
In conclusion ([Figure 7](#fig7){ref-type="fig"}C), the improvement of ob
import logging
import os

import fire
import torch
from datasets import load_dataset
from huggingface_hub import PyTorchModelHubMixin
from torch import nn
from transformers import AutoConfig, AutoModel, AutoTokenizer
# !pip install -q sentence-splitter
import os

from sentence_splitter import split_text_into_sentences

REFUSAL_TERMS = [
    "sorry",
    "i can't",
    "unfortunately,",
    "as a language model",
    "as an ai language model",
import re


def extract_comments_and_docs(multiline_string):
    # Pattern to match lines where the first non-whitespace character is '#'
    comment_pattern = r"^\s*#(.*)"
    # Pattern to match any text within triple quotes (either ''' or """)
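The function body is cut off in the preview. The two patterns named in its comments could be applied roughly as follows; this is a sketch under assumptions, not the author's code. The triple-quote regex, the function name `extract_comments_and_docs_sketch`, and the choice to return comments first and docstrings second are all hypothetical.

```python
import re


def extract_comments_and_docs_sketch(source: str) -> list[str]:
    """Return full-line comments, then triple-quoted strings, found in source.

    Hypothetical completion: only comment_pattern appears in the original
    snippet; the docstring pattern and the return shape are assumptions.
    """
    # Lines whose first non-whitespace character is '#'.
    comment_pattern = r"^\s*#(.*)"
    # Text between matching triple quotes; \1 forces the same closing quote.
    docstring_pattern = r'("""|\'\'\')(.*?)\1'

    comments = [
        match.strip()
        for match in re.findall(comment_pattern, source, flags=re.MULTILINE)
    ]
    docs = [
        match[1].strip()
        for match in re.findall(docstring_pattern, source, flags=re.DOTALL)
    ]
    return comments + docs
```

Trailing comments such as `x = 1  # note` are skipped, because the comment pattern only matches lines that start with `#` after optional whitespace. A regex approach like this does not parse Python, so a `#` line inside a docstring is reported twice.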
import boto3
import gzip
from datasets import load_dataset
from botocore import UNSIGNED
from botocore.config import Config

num_proc = 32
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
bucket_name = "softwareheritage"
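The preview ends before the client is used. One plausible next step, given the `gzip` import, is fetching a gzip-compressed object and decoding it. The sketch below assumes objects are stored under a `content/<blob_id>` key; that layout, the helper name `fetch_blob_text`, and its parameters are assumptions not shown in the snippet. The client is passed in as an argument so the helper works with any object exposing boto3's `get_object(Bucket=..., Key=...)` call.

```python
import gzip


def fetch_blob_text(s3_client, bucket, blob_id, encoding="utf-8"):
    """Download one gzip-compressed object and return its decoded text.

    ASSUMPTION: objects live under the key "content/<blob_id>"; the
    original snippet does not show the key layout.
    """
    key = f"content/{blob_id}"  # hypothetical key layout
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # The response body is a stream; read it fully, then decompress in memory.
    compressed = response["Body"].read()
    return gzip.decompress(compressed).decode(encoding, errors="replace")
```

With the `s3` client and `bucket_name` defined above, a call would look like `fetch_blob_text(s3, bucket_name, blob_id)`, where `blob_id` would presumably come from a dataset row.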
import logging
import os

import fire
import torch
from datasets import load_dataset
from huggingface_hub import PyTorchModelHubMixin
from torch import nn
from transformers import AutoConfig, AutoModel, AutoTokenizer
import base64
import os
from pathlib import Path

import fire
from openai import OpenAI
from tqdm.auto import tqdm
from joblib import Memory

# Set up joblib caching