@KuRRe8
Last active June 22, 2025 09:09
A set of tutorials on using Python, split into separate files by category

Python Tutorials

Python is a beginner-friendly language, and the machine learning community now depends heavily on Python alongside C++, CUDA C, R, and others, which keeps Python firmly at the top of the popularity rankings. This Gist collects a number of Python tutorials that can be run directly in Jupyter Notebook.

  1. Language-level tutorials, generally skipping beginner topics;
  2. Standard-library tutorials, covering basic usage of the most common standard-library modules;
  3. Third-party library tutorials, mainly common libraries such as numpy and pytorch, covering only basic usage and ignoring new features

Other content will not go into this Gist. Note that a Gist is still version-controlled by git, so you can git clone it locally, or open the corresponding ipynb files directly in Google Colab or Kaggle.

When browsing directly on the web there is no file list, so press Ctrl + F to search for the relevant table of contents, or click the hyperlinks below.

If you would like to contribute, just leave a comment; questions are also welcome in the comments ^.^

Table of Contents – Language

Table of Contents – Libraries

Table of Contents – Domain Libraries (this tutorial focuses mainly on machine learning and deep learning)

Table of Contents – Appendix


Python Descriptors

A descriptor is a class attribute whose type implements any of __get__, __set__, or __delete__; whether it defines __set__ (or __delete__) determines whether it is a data descriptor or a non-data descriptor.

class DataDesc:
    def __get__(self, obj, objtype=None):
        return 'data'
    def __set__(self, obj, value):
        pass

class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return 'nondata'

class C:
    a = DataDesc()
    b = NonDataDesc()
    c = 42
    d = NonDataDesc()

obj = C()
obj.a = 'inst_a'
obj.b = 'inst_b'
obj.c = 'inst_c'
# d has no instance attribute, so access goes straight to the descriptor

print(obj.a)  # 'data'    (the data descriptor takes priority)
print(obj.b)  # 'inst_b'  (the instance attribute takes priority)
print(obj.c)  # 'inst_c'  (the instance attribute takes priority)
print(obj.d)  # 'nondata' (the non-data descriptor is triggered)

From this we can draw the following summary:

  1. A descriptor must be a member of a type, i.e. an attribute defined in a class body (or in a type created dynamically via type(name, bases, dict)).
  2. A descriptor can be accessed through the class, C.a, or through an instance, obj.a. Instance access follows this lookup order:
    1. Regardless of whether obj.__dict__ contains a, data descriptors found in C.__dict__ (searched along the MRO) are consulted first.
    2. If step 1 finds nothing, fall back to the ordinary entry named a in obj.__dict__ (when obj only defines __slots__, the slot is looked up instead).
    3. If step 2 finds nothing, look for a non-data descriptor named a in C.__dict__.
    4. If step 3 finds nothing, return the ordinary object named a in C.__dict__.
    5. If step 4 finds nothing, raise AttributeError.
  3. Note step 3 of the lookup order above: every function defined with def inside a class body is an instance of the function type, which implements __get__, so functions are non-data descriptors. When you call an instance method obj.foo (and obj.__dict__ naturally has no entry shadowing foo), the descriptor protocol of function kicks in, and what you get back is a bound method rather than the plain function. This is why accessing a function through the class name and through an instance behaves differently. In particular, the built-in staticmethod and classmethod decorators are likewise objects implementing __get__, so access to methods wrapped in them also goes through the descriptor protocol.
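To make point 3 concrete, here is a minimal, self-contained sketch (the Greeter class and greet method are made up for illustration): the same attribute arrives as a plain function when fetched through the class, and as a bound method when fetched through an instance.

class Greeter:
    def greet(self):
        return f"hello from {self!r}"

g = Greeter()

print(type(Greeter.__dict__['greet']))  # <class 'function'>: the raw object stored in the class
print(Greeter.greet)                    # plain function (Python 3 has no "unbound method")
print(g.greet)                          # bound method: function.__get__(g, Greeter) was invoked
print(g.greet())                        # the bound method already carries g as self

# The binding can be reproduced by calling the descriptor protocol by hand:
bound = Greeter.__dict__['greet'].__get__(g, Greeter)
print(bound() == g.greet())             # True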
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 实验跟踪与可视化教程 (TensorBoard, Weights & Biases, MLflow)\n",
"\n",
"欢迎来到机器学习实验跟踪与可视化教程!在进行机器学习项目时,有效地跟踪实验过程、记录关键指标和可视化结果对于保证可复现性、比较不同尝试的效果以及深入理解模型行为至关重要。\n",
"\n",
"本教程将分别介绍三个广泛使用的工具,它们各有侧重,但都能帮助你更好地管理和理解你的机器学习实验:\n",
"\n",
"1. **TensorBoard**: Google 开发的可视化工具包,擅长实时监控训练指标、可视化模型图和数据,通常在本地运行。\n",
"2. **Weights & Biases (WandB)**: 一个流行的云平台(对个人和学术免费),提供实验跟踪、高级可视化、协作和模型管理功能。\n",
"3. **MLflow (Tracking 组件)**: 一个开源的端到端 MLOps 平台,其 Tracking 组件专注于记录和查询实验参数、指标、代码和模型,可本地或远程部署。\n",
"\n",
"我们将通过训练一个简单的 CNN 模型对 FashionMNIST 数据集进行分类的示例,分别展示如何集成和使用这三个工具。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 准备工作:安装必要的库\n",
"\n",
"请确保在运行相应部分之前已安装所需的库。\n",
"\n",
"```bash\n",
"# 通用依赖\n",
"pip install torch torchvision scikit-learn pandas matplotlib numpy\n",
"\n",
"# TensorBoard (如果尚未随 PyTorch/TensorFlow 安装)\n",
"pip install tensorboard\n",
"\n",
"# Weights & Biases\n",
"pip install wandb\n",
"\n",
"# MLflow\n",
"pip install mlflow\n",
"```\n",
"\n",
"**重要提示**: \n",
"* **WandB**: 需要注册免费账号并在首次使用时 `wandb login`。\n",
"* **MLflow**: 默认在本地 `./mlruns` 目录记录,可通过 `mlflow ui` 查看。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 使用 TensorBoard 进行可视化\n",
"\n",
"TensorBoard 通过读取事件文件来可视化训练过程。PyTorch 提供了 `SummaryWriter` 来方便地生成这些文件。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- TensorBoard: 导入与设置 ---\n",
"print(\"--- Setting up for TensorBoard ---\")\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"import torchvision\n",
"import torchvision.transforms as transforms\n",
"from torch.utils.data import DataLoader\n",
"from torch.utils.tensorboard import SummaryWriter\n",
"import numpy as np\n",
"import os\n",
"import time\n",
"\n",
"device_tb = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"print(f\"TensorBoard section using device: {device_tb}\")\n",
"\n",
"tb_config = {\n",
" \"learning_rate\": 0.001,\n",
" \"epochs\": 2,\n",
" \"batch_size\": 64,\n",
" \"optimizer\": \"Adam\",\n",
"}\n",
"\n",
"# --- TensorBoard: 数据准备 --- \n",
"tb_dataset_loaded = False\n",
"try:\n",
" print(\"TensorBoard: Preparing FashionMNIST dataset...\")\n",
" tb_transform = transforms.Compose([\n",
" transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))\n",
" ])\n",
" tb_trainset = torchvision.datasets.FashionMNIST(root='./data_tb', train=True, download=True, transform=tb_transform)\n",
" tb_testset = torchvision.datasets.FashionMNIST(root='./data_tb', train=False, download=True, transform=tb_transform)\n",
" tb_trainloader = DataLoader(tb_trainset, batch_size=tb_config['batch_size'], shuffle=True, num_workers=0)\n",
" tb_testloader = DataLoader(tb_testset, batch_size=tb_config['batch_size']*2, shuffle=False, num_workers=0)\n",
" tb_classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', \n",
" 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')\n",
" print(\"TensorBoard: Dataset prepared.\")\n",
" tb_dataset_loaded = True\n",
"except Exception as e:\n",
" print(f\"TensorBoard: Error loading dataset: {e}\")\n",
"\n",
"# --- TensorBoard: 模型定义 --- \n",
"class SimpleCNN_TB(nn.Module):\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.network = nn.Sequential(\n",
" nn.Conv2d(1, 16, kernel_size=3, padding=1),\n",
" nn.ReLU(),\n",
" nn.MaxPool2d(2, 2),\n",
" nn.Conv2d(16, 32, kernel_size=3, padding=1),\n",
" nn.ReLU(),\n",
" nn.MaxPool2d(2, 2),\n",
" nn.Flatten(),\n",
" nn.Linear(32 * 7 * 7, 128),\n",
" nn.ReLU(),\n",
" nn.Linear(128, 10)\n",
" )\n",
" def forward(self, x): return self.network(x)\n",
"print(\"TensorBoard: SimpleCNN_TB model defined.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- TensorBoard: 训练循环与日志记录 ---\n",
"def train_with_tensorboard(cfg, train_loader, test_loader, classes):\n",
" print(\"\\n--- Running Training Loop for TensorBoard --- \")\n",
" if not tb_dataset_loaded: return None\n",
" \n",
" model = SimpleCNN_TB().to(device_tb)\n",
" criterion = nn.CrossEntropyLoss()\n",
" optimizer = optim.Adam(model.parameters(), lr=cfg['learning_rate'])\n",
" \n",
" run_name = f\"TB_LR={cfg['learning_rate']}_{int(time.time())}\"\n",
" tb_log_dir = os.path.join(\"runs_tb\", run_name)\n",
" writer = SummaryWriter(log_dir=tb_log_dir)\n",
" print(f\"TB Train: Logging to {writer.log_dir}\")\n",
" \n",
" global_step = 0\n",
" for epoch in range(cfg['epochs']):\n",
" model.train()\n",
" running_loss = 0.0\n",
" for i, data in enumerate(train_loader, 0):\n",
" inputs, labels = data[0].to(device_tb), data[1].to(device_tb)\n",
" optimizer.zero_grad()\n",
" outputs = model(inputs)\n",
" loss = criterion(outputs, labels)\n",
" loss.backward()\n",
" optimizer.step()\n",
" running_loss += loss.item()\n",
" global_step += 1\n",
" \n",
" if i % 400 == 399:\n",
" batch_loss = running_loss / 400\n",
" writer.add_scalar('Loss/train_step_tb', batch_loss, global_step)\n",
" running_loss = 0.0\n",
" \n",
" # Epoch evaluation & logging\n",
" model.eval()\n",
" correct, total, test_loss = 0, 0, 0.0\n",
" sample_images = None\n",
" with torch.no_grad():\n",
" for batch_idx, data in enumerate(test_loader):\n",
" images, labels = data[0].to(device_tb), data[1].to(device_tb)\n",
" if batch_idx == 0: sample_images = images.cpu()\n",
" outputs = model(images)\n",
" loss = criterion(outputs, labels)\n",
" test_loss += loss.item()\n",
" _, predicted = torch.max(outputs.data, 1)\n",
" total += labels.size(0)\n",
" correct += (predicted == labels).sum().item()\n",
" \n",
" epoch_accuracy = 100 * correct / total\n",
" epoch_test_loss = test_loss / len(test_loader)\n",
" print(f'TB Run - Epoch {epoch + 1} Test Acc: {epoch_accuracy:.2f}%, Test Loss: {epoch_test_loss:.4f}')\n",
" \n",
" writer.add_scalar('Accuracy/test_tb', epoch_accuracy, epoch)\n",
" writer.add_scalar('Loss/test_tb', epoch_test_loss, epoch)\n",
" if epoch == cfg['epochs'] - 1:\n",
" if sample_images is not None:\n",
" img_grid = torchvision.utils.make_grid(sample_images[:16], nrow=4)\n",
" writer.add_image('Test_Samples_tb', img_grid, epoch)\n",
" for name, param in model.named_parameters():\n",
" if param.requires_grad:\n",
" writer.add_histogram(f\"Weights_tb/{name.replace('.', '/')}\", param.cpu().data.numpy(), epoch)\n",
" if param.grad is not None: \n",
" writer.add_histogram(f\"Gradients_tb/{name.replace('.', '/')}\", param.cpu().grad.numpy(), epoch)\n",
" \n",
" writer.close()\n",
" print(\"TB Train: TensorBoard training finished.\")\n",
" return model\n",
"\n",
"# --- Run Training --- \n",
"if tb_dataset_loaded:\n",
" model_tb = train_with_tensorboard(tb_config, tb_trainloader, tb_testloader, tb_classes)\n",
"else:\n",
" print(\"Skipping TensorBoard training run due to dataset loading error.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### A.4 查看 TensorBoard UI\n",
"\n",
"1. 打开终端。\n",
"2. 导航到包含 `runs_tb` 的目录。\n",
"3. 运行 `tensorboard --logdir runs_tb`。\n",
"4. 在浏览器中打开 `http://localhost:6006/`。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 使用 Weights & Biases (WandB) 进行跟踪"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### B.1 WandB 简介与设置\n",
"WandB 提供云端服务来跟踪实验,需要注册并登录。它通过 `wandb.init()` 开始跟踪,并使用 `wandb.log()` 记录数据。\n",
"\n",
"**重要**: 运行下面的代码前,请确保你已经在环境中通过 `wandb login` 登录,或设置了 `WANDB_API_KEY` 环境变量,否则日志记录将处于 `disabled` 模式。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- WandB: 导入与设置 ---\n",
"print(\"\\n--- Setting up for Weights & Biases ---\")\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"import torchvision\n",
"import torchvision.transforms as transforms\n",
"from torch.utils.data import DataLoader\n",
"import wandb\n",
"import numpy as np\n",
"import os\n",
"import time\n",
"\n",
"device_wandb = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"print(f\"WandB section using device: {device_wandb}\")\n",
"\n",
"# --- 配置 (在此部分内定义) ---\n",
"wandb_config = {\n",
" \"learning_rate\": 0.0015,\n",
" \"epochs\": 2,\n",
" \"batch_size\": 128,\n",
" \"optimizer\": \"RMSprop\",\n",
"}\n",
"\n",
"# --- 数据准备 (在此部分内执行) --- \n",
"wandb_dataset_loaded = False\n",
"try:\n",
" print(\"WandB Section: Preparing FashionMNIST dataset...\")\n",
" wandb_transform = transforms.Compose([\n",
" transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))\n",
" ])\n",
" wandb_trainset = torchvision.datasets.FashionMNIST(root='./data_wandb', train=True, download=True, transform=wandb_transform)\n",
" wandb_testset = torchvision.datasets.FashionMNIST(root='./data_wandb', train=False, download=True, transform=wandb_transform)\n",
" wandb_trainloader = DataLoader(wandb_trainset, batch_size=wandb_config['batch_size'], shuffle=True, num_workers=0)\n",
" wandb_testloader = DataLoader(wandb_testset, batch_size=wandb_config['batch_size']*2, shuffle=False, num_workers=0)\n",
" wandb_classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n",
" 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')\n",
" print(\"WandB Section: Dataset prepared.\")\n",
" wandb_dataset_loaded = True\n",
"except Exception as e:\n",
" print(f\"WandB Section: Error loading dataset: {e}\")\n",
"\n",
"# --- 模型定义 (在此部分内定义) --- \n",
"class SimpleCNN_WandB(nn.Module):\n",
" # (模型结构同前)\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.network = nn.Sequential(\n",
" nn.Conv2d(1, 16, kernel_size=3, padding=1),\n",
" nn.ReLU(), nn.MaxPool2d(2, 2),\n",
" nn.Conv2d(16, 32, kernel_size=3, padding=1),\n",
" nn.ReLU(), nn.MaxPool2d(2, 2),\n",
" nn.Flatten(),\n",
" nn.Linear(32 * 7 * 7, 128), nn.ReLU(),\n",
" nn.Linear(128, 10)\n",
" )\n",
" def forward(self, x): return self.network(x)\n",
"print(\"WandB Section: SimpleCNN_WandB model defined.\")\n",
"\n",
"# Re-check WandB login possibility (defined in first code cell)\n",
"wandb_mode = \"online\" if 'wandb_online_mode_possible' in globals() and wandb_online_mode_possible else \"disabled\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- B.3 WandB 训练循环 --- \n",
"def train_with_wandb(cfg, train_loader, test_loader, classes):\n",
" print(\"\\n--- Running Training Loop for WandB --- \")\n",
" if not wandb_dataset_loaded: return None\n",
" \n",
" run_timestamp = int(time.time())\n",
" run_name = f\"WandB_{cfg['optimizer']}_lr{cfg['learning_rate']}_{run_timestamp}\"\n",
" \n",
" wandb_run = None\n",
" try:\n",
" wandb_run = wandb.init(\n",
" project=\"pytorch-tracking-tutorial-revised\", \n",
" config=cfg, name=run_name, reinit=True, mode=wandb_mode\n",
" )\n",
" print(f\"WandB Train: Run initialized (mode: {wandb_mode}). URL: {wandb_run.url if wandb_run and wandb_mode == 'online' else 'N/A'}\")\n",
" except Exception as e:\n",
" print(f\"WandB Train: Could not initialize WandB: {e}. Aborting training for WandB.\")\n",
" return None\n",
"\n",
" model = SimpleCNN_WandB().to(device_wandb)\n",
" criterion = nn.CrossEntropyLoss()\n",
" if wandb.config.optimizer == 'Adam':\n",
" optimizer = optim.Adam(model.parameters(), lr=wandb.config.learning_rate)\n",
" else:\n",
" optimizer = optim.RMSprop(model.parameters(), lr=wandb.config.learning_rate)\n",
"\n",
" if wandb_mode == 'online':\n",
" wandb.watch(model, log=\"all\", log_freq=100)\n",
" \n",
" global_step = 0\n",
" for epoch in range(wandb.config.epochs):\n",
" model.train()\n",
" running_loss = 0.0\n",
" for i, data in enumerate(train_loader, 0):\n",
" inputs, labels = data[0].to(device_wandb), data[1].to(device_wandb)\n",
" optimizer.zero_grad()\n",
" outputs = model(inputs)\n",
" loss = criterion(outputs, labels)\n",
" loss.backward()\n",
" optimizer.step()\n",
" running_loss += loss.item()\n",
" global_step += 1\n",
" \n",
" if i % 400 == 399:\n",
" batch_loss = running_loss / 400\n",
" if wandb_mode == 'online':\n",
" wandb.log({\"step_loss_wandb\": batch_loss, \"global_step_wandb\": global_step})\n",
" running_loss = 0.0\n",
" \n",
" # Epoch evaluation & logging\n",
" model.eval()\n",
" correct, total, test_loss = 0, 0, 0.0\n",
" sample_images, sample_labels, sample_preds = [], [], []\n",
" with torch.no_grad():\n",
" for batch_idx, data in enumerate(test_loader):\n",
" images, labels = data[0].to(device_wandb), data[1].to(device_wandb)\n",
" if batch_idx == 0: \n",
" sample_images = images.cpu()\n",
" sample_labels = labels.cpu()\n",
" outputs = model(images)\n",
" loss = criterion(outputs, labels)\n",
" test_loss += loss.item()\n",
" _, predicted = torch.max(outputs.data, 1)\n",
" if batch_idx == 0:\n",
" sample_preds = predicted.cpu()\n",
" total += labels.size(0)\n",
" correct += (predicted == labels).sum().item()\n",
" \n",
" epoch_accuracy = 100 * correct / total\n",
" epoch_test_loss = test_loss / len(test_loader)\n",
" print(f'WandB Run - Epoch {epoch + 1} Test Acc: {epoch_accuracy:.2f}%, Test Loss: {epoch_test_loss:.4f}')\n",
" \n",
" if wandb_mode == 'online':\n",
" wandb_logs = {\"epoch\": epoch + 1, \"test_accuracy_wandb\": epoch_accuracy, \"test_loss_wandb\": epoch_test_loss}\n",
" if len(sample_images) > 0:\n",
" wandb_images = []\n",
" num_samples_to_log = min(16, len(sample_images))\n",
" for idx in range(num_samples_to_log):\n",
" wandb_images.append(wandb.Image(\n",
" sample_images[idx],\n",
" caption=f\"Pred: {classes[sample_preds[idx]]}, True: {classes[sample_labels[idx]]}\"\n",
" ))\n",
" wandb_logs[\"test_samples_wandb\"] = wandb_images\n",
" wandb.log(wandb_logs)\n",
" \n",
" if wandb_run:\n",
" wandb_run.finish()\n",
" print(\"WandB Train: WandB training finished.\")\n",
" return model\n",
"\n",
"# --- Run Training with WandB ---\n",
"if wandb_dataset_loaded:\n",
" model_wandb = train_with_wandb(wandb_config, wandb_trainloader, wandb_testloader, wandb_classes)\n",
"else:\n",
" print(\"Skipping WandB training run due to dataset loading error.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### B.4 查看 WandB Dashboard\n",
"\n",
"1. 如果 `wandb_mode` 是 `online`,访问你的 WandB 账户 ([https://wandb.ai/](https://wandb.ai/))。\n",
"2. 找到名为 `pytorch-tracking-tutorial-revised` 的项目。\n",
"3. 查看对应的 run。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Section C: 使用 MLflow Tracking"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### C.1 MLflow Tracking 简介\n",
"MLflow Tracking 用于记录实验运行的参数、指标和 Artifacts。它可以将日志保存到本地 `mlruns` 目录或配置远程服务器。\n",
"\n",
"**核心步骤**:\n",
"1. `mlflow.set_experiment()`: 设置实验名称。\n",
"2. `with mlflow.start_run():`: 开始一个运行。\n",
"3. 在 `with` 块内使用 `mlflow.log_*` 方法记录信息。\n",
"4. `mlflow.pytorch.log_model()`: (可选) 记录 PyTorch 模型。\n",
"5. 在终端运行 `mlflow ui` 查看结果。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- C.2 MLflow 设置与导入 ---\n",
"print(\"\\n--- Setting up for MLflow ---\")\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.optim as optim\n",
"import torchvision\n",
"import torchvision.transforms as transforms\n",
"from torch.utils.data import DataLoader\n",
"import mlflow\n",
"import mlflow.pytorch\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt # For logging plot artifact\n",
"import os\n",
"import time\n",
"\n",
"device_mlflow = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
"print(f\"MLflow section using device: {device_mlflow}\")\n",
"\n",
"# --- 配置 (在此部分内定义) ---\n",
"mlflow_config = {\n",
" \"learning_rate\": 0.005,\n",
" \"epochs\": 2,\n",
" \"batch_size\": 64,\n",
" \"optimizer\": \"SGD\",\n",
"}\n",
"\n",
"# --- 数据准备 (在此部分内执行) --- \n",
"mlflow_dataset_loaded = False\n",
"try:\n",
" print(\"MLflow Section: Preparing FashionMNIST dataset...\")\n",
" mlflow_transform = transforms.Compose([\n",
" transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))\n",
" ])\n",
" mlflow_trainset = torchvision.datasets.FashionMNIST(root='./data_mlflow', train=True, download=True, transform=mlflow_transform)\n",
" mlflow_testset = torchvision.datasets.FashionMNIST(root='./data_mlflow', train=False, download=True, transform=mlflow_transform)\n",
" mlflow_trainloader = DataLoader(mlflow_trainset, batch_size=mlflow_config['batch_size'], shuffle=True, num_workers=0)\n",
" mlflow_testloader = DataLoader(mlflow_testset, batch_size=mlflow_config['batch_size']*2, shuffle=False, num_workers=0)\n",
" mlflow_classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',\n",
" 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot')\n",
" print(\"MLflow Section: Dataset prepared.\")\n",
" mlflow_dataset_loaded = True\n",
"except Exception as e:\n",
" print(f\"MLflow Section: Error loading dataset: {e}\")\n",
"\n",
"# --- 模型定义 (在此部分内定义) --- \n",
"class SimpleCNN_MLflow(nn.Module):\n",
" # (模型结构同前)\n",
" def __init__(self):\n",
" super().__init__()\n",
" self.network = nn.Sequential(\n",
" nn.Conv2d(1, 16, kernel_size=3, padding=1),\n",
" nn.ReLU(), nn.MaxPool2d(2, 2),\n",
" nn.Conv2d(16, 32, kernel_size=3, padding=1),\n",
" nn.ReLU(), nn.MaxPool2d(2, 2),\n",
" nn.Flatten(),\n",
" nn.Linear(32 * 7 * 7, 128), nn.ReLU(),\n",
" nn.Linear(128, 10)\n",
" )\n",
" def forward(self, x): return self.network(x)\n",
"print(\"MLflow Section: SimpleCNN_MLflow model defined.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# --- C.3 MLflow 训练循环 --- \n",
"def train_with_mlflow(cfg, train_loader, test_loader, classes):\n",
" print(\"\\n--- Running Training Loop for MLflow --- \")\n",
" if not mlflow_dataset_loaded: return None\n",
"\n",
" mlflow.set_experiment(\"FashionMNIST Classification Revised\")\n",
" run_timestamp = int(time.time())\n",
" run_name = f\"MLflow_{cfg['optimizer']}_lr{cfg['learning_rate']}_{run_timestamp}\"\n",
"\n",
" with mlflow.start_run(run_name=run_name) as run:\n",
" print(f\"MLflow Train: Run started. Run ID: {run.info.run_id}\")\n",
" mlflow.log_params(cfg)\n",
" \n",
" model = SimpleCNN_MLflow().to(device_mlflow)\n",
" criterion = nn.CrossEntropyLoss()\n",
" optimizer = optim.SGD(model.parameters(), lr=cfg['learning_rate'], momentum=0.9)\n",
"\n",
" global_step = 0\n",
" final_epoch_accuracy = 0 \n",
" for epoch in range(cfg['epochs']):\n",
" model.train()\n",
" running_loss = 0.0\n",
" for i, data in enumerate(train_loader, 0):\n",
" inputs, labels = data[0].to(device_mlflow), data[1].to(device_mlflow)\n",
" optimizer.zero_grad()\n",
" outputs = model(inputs)\n",
" loss = criterion(outputs, labels)\n",
" loss.backward()\n",
" optimizer.step()\n",
" running_loss += loss.item()\n",
" global_step += 1\n",
" \n",
" if i % 400 == 399:\n",
" batch_loss = running_loss / 400\n",
" mlflow.log_metric(\"step_loss_mlflow\", batch_loss, step=global_step)\n",
" running_loss = 0.0\n",
" \n",
" # Epoch evaluation & logging\n",
" model.eval()\n",
" correct, total, test_loss = 0, 0, 0.0\n",
" with torch.no_grad():\n",
" for data in test_loader:\n",
" images, labels = data[0].to(device_mlflow), data[1].to(device_mlflow)\n",
" outputs = model(images)\n",
" loss = criterion(outputs, labels)\n",
" test_loss += loss.item()\n",
" _, predicted = torch.max(outputs.data, 1)\n",
" total += labels.size(0)\n",
" correct += (predicted == labels).sum().item()\n",
" \n",
" epoch_accuracy = 100 * correct / total\n",
" epoch_test_loss = test_loss / len(test_loader)\n",
" final_epoch_accuracy = epoch_accuracy\n",
" print(f'MLflow Run - Epoch {epoch + 1} Test Acc: {epoch_accuracy:.2f}%, Test Loss: {epoch_test_loss:.4f}')\n",
" \n",
" mlflow.log_metric(\"test_accuracy_mlflow\", epoch_accuracy, step=epoch)\n",
" mlflow.log_metric(\"test_loss_mlflow\", epoch_test_loss, step=epoch)\n",
"\n",
" # --- End of Training (within 'with' block) ---\n",
" print(\"MLflow Train: Logging model and artifacts...\")\n",
" mlflow.pytorch.log_model(model, \"model_mlflow\")\n",
" mlflow.log_metric(\"final_accuracy\", final_epoch_accuracy)\n",
" \n",
" # Log a simple config file as artifact\n",
" cfg_path = \"mlflow_config.txt\"\n",
" with open(cfg_path, \"w\") as f: f.write(str(cfg))\n",
" mlflow.log_artifact(cfg_path)\n",
" if os.path.exists(cfg_path): os.remove(cfg_path) \n",
" \n",
" print(f\"MLflow Train: MLflow run finished. Run ID: {run.info.run_id}\")\n",
" return model\n",
"\n",
"# --- Run Training with MLflow ---\n",
"if mlflow_dataset_loaded:\n",
" model_mlflow = train_with_mlflow(mlflow_config, mlflow_trainloader, mlflow_testloader, mlflow_classes)\n",
"else:\n",
" print(\"Skipping MLflow training run due to dataset loading error.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### C.4 查看 MLflow UI\n",
"\n",
"1. 打开终端。\n",
"2. 导航到包含 `mlruns` 的目录。\n",
"3. 运行 `mlflow ui`。\n",
"4. 在浏览器中打开 `http://localhost:5000`。\n",
"5. 找到名为 `FashionMNIST Classification Revised` 的实验查看运行结果。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 比较与总结"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 比较与总结\n",
"\n",
"本教程分别独立地展示了如何使用 TensorBoard, Weights & Biases, 和 MLflow Tracking 来跟踪和可视化一个简单的 PyTorch 训练过程。每个工具都有其独特的优势和适用场景。\n",
"\n",
"| 特性 | TensorBoard | Weights & Biases (WandB) | MLflow Tracking |\n",
"|------------------|---------------------------------|-------------------------------------|--------------------------------|\n",
"| **类型** | 开源可视化工具包 | 商业云平台 (个人/学术免费) | 开源 MLOps 平台 (Tracking组件) |\n",
"| **核心功能** | 实时监控, 可视化 (图, 图表, 嵌入)| 实验跟踪, 可视化, 协作, 模型管理 | 实验跟踪, 参数/指标/代码/模型记录 |\n",
"| **设置** | 简单 (通常随框架安装) | 需要注册登录, `pip install wandb` | `pip install mlflow`, 可本地运行 |\n",
"| **UI 托管** | 本地运行 `tensorboard` 命令 | 云端仪表板 | 本地运行 `mlflow ui` 或远程服务器 |\n",
"| **协作** | 有限 (共享日志文件) | 强大 (团队, 报告, 项目) | 良好 (共享 Tracking Server) |\n",
"| **集成** | PyTorch, TensorFlow, JAX 等 | PyTorch, TF, Keras, Sklearn, XGBoost等 | 多种框架 (PyTorch, TF, Sklearn等) |\n",
"| **超参数扫描** | HParams 插件 (较基础) | 内置强大的 Sweeps 功能 | 需要与其他库 (如 Hyperopt) 集成 |\n",
"| **模型/数据版本**| 不直接支持 | 支持 Artifacts (版本化) | 支持 Artifacts (版本化) |\n",
"| **部署/注册** | 无 | 有限 (集成部署工具) | MLflow Models & Registry |\n",
"\n",
"**选择建议**: \n",
"* **TensorBoard**: 快速本地可视化和调试训练过程的首选。\n",
"* **WandB**: 需要强大云端协作、高级可视化和集成超参数扫描时非常好用。\n",
"* **MLflow**: 适合需要开源、可自托管、关注实验复现性、代码/模型版本管理和 MLOps 集成的场景。\n",
"\n",
"选择哪个工具取决于你的具体需求和工作流程偏好。"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 5
}

Some Reflections on Python as a Dynamic Language

As is well known, Python is a fully dynamic language, which shows in:

  1. Dynamic type binding
  2. Runtime checking
  3. The structure and contents of objects can be modified at runtime (not just their values)
  4. Reflection
  5. Everything is an object (instances, classes, methods)
  6. Code can be executed dynamically (eval, exec)
  7. Support for duck typing

Dynamic languages impose fewer constraints, which makes them easier to pick up, but the price is substantial runtime overhead, and because execution is completely decoupled from the underlying machine code, you never quite know how your code is actually being run. The sketch below illustrates a few of these dynamic features.
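This is a minimal sketch of some of the features listed above; the Dog and Robot classes and their methods are made up purely for illustration.

class Dog:
    def speak(self):
        return "woof"

d = Dog()

# 1 & 3: types are bound at runtime and object structure can be edited on the fly
d.name = "Rex"                        # add an attribute the class never declared
Dog.fetch = lambda self: "fetching"   # add a method to the class after its definition
print(d.fetch(), d.name)              # 'fetching Rex'

# 4 & 5: classes and methods are themselves objects you can inspect (reflection)
print(type(Dog), list(Dog.__dict__.keys()))

# 6: code can be built and executed at runtime
namespace = {}
exec("result = 6 * 7", namespace)
print(namespace["result"])            # 42

# 7: duck typing — anything with a speak() method works, no shared base class needed
class Robot:
    def speak(self):
        return "beep"

for thing in (Dog(), Robot()):
    print(thing.speak())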

There are also a few points that I consider fairly serious flaws. Let me go through them below.

It undermines OOP semantics

Most popular programming languages support the OOP paradigm, that is, inheritance and polymorphism. Likewise, Python can be used in a purely imperative style for simple tasks, or with full object-oriented OOP for complex ones.

However, its dynamic nature undermines the structure of OOP:

  1. Blurred types: an instance of any type can have attributes or methods added or removed at runtime (whereas a static language only lets you change their values at runtime). An instance modified this way arguably no longer belongs to its original type, since it now differs from it in obvious ways, yet its built-in __class__ attribute still points to the original class, which muddies what the type is supposed to mean. Conforming to a class should not be merely nominal; the contents should conform as well.
  2. Broken inheritance, in two respects:
    1. In practice, most code does not inherit from an abstract interface. The abc module provides ABC as a base class for abstract interfaces; the classic approach is to derive your own abstract class from ABC, derive concrete classes from it, and implement the abstract methods. But PEP guidance considers the Pythonic approach to be typing.Protocol instead of ABC: the concrete class inherits from no abstract base at all, and as long as it implements the right methods, a static checker treats it as conforming to the Protocol (see the sketch after this list).
    2. No need to inherit from a concrete parent either. As with the previous point, even a class with no parent at all (other than object) can define methods of the same names and thereby expose the same call interface as a would-be parent. Semantically, the class definition then tells you nothing about how it relates to other classes; the whole codebase can be a loose collection of classes with no inheritance relationship between any two of them.
  3. Broken polymorphism: no parameter or return value carries any inherent type restriction. Passing a subtype where a parent type is expected loses its meaning, again because any type can be dynamically modified to satisfy the requirement.
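A small sketch of point 2.1, assuming a made-up Serializer protocol: the concrete class JsonSerializer never inherits from it, yet a static checker (and, thanks to runtime_checkable, an isinstance check) treats it as conforming.

import json
from typing import Protocol, runtime_checkable

@runtime_checkable
class Serializer(Protocol):
    def dumps(self, obj: object) -> str: ...

class JsonSerializer:                      # no base class beyond object
    def dumps(self, obj: object) -> str:
        return json.dumps(obj)

def save(serializer: Serializer, obj: object) -> str:
    # a static checker accepts any object with a matching dumps() method
    return serializer.dumps(obj)

print(save(JsonSerializer(), {"a": 1}))          # {"a": 1}
print(isinstance(JsonSerializer(), Serializer))  # True: structural check at runtime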

It undermines design patterns

Classic patterns such as Factory, Abstract Factory, and Visitor depend heavily on inheritance and polymorphism. In Python's design, however, the language's dynamic capabilities make these patterns largely superfluous. Among widely used libraries, transformers does employ one: its from_pretrained family is a factory, where a string name selects the concrete constructor that produces a concrete subclass, while the factory's declared output type is a base class shared by all models. A rough sketch of that string-to-class factory idea follows.
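This is only a toy illustration of the idea, not transformers' actual implementation; the registry, the Model base class, and the concrete classes are all invented here.

class Model:
    """Common base class that the factory advertises as its return type."""
    def predict(self, x):
        raise NotImplementedError

class BertLike(Model):
    def predict(self, x):
        return f"bert-ish prediction for {x!r}"

class GptLike(Model):
    def predict(self, x):
        return f"gpt-ish prediction for {x!r}"

_REGISTRY = {"bert-base": BertLike, "gpt2": GptLike}

def from_pretrained(name: str) -> Model:
    # the string picks the concrete constructor; callers only see the Model interface
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown model name: {name}") from None

model = from_pretrained("bert-base")
print(type(model).__name__, model.predict("hello"))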

Security concerns

At the code level Python generally does not manipulate raw pointers, so out-of-bounds, wild, and dangling pointers are mostly non-issues, and the gc machinery reclaims garbage automatically, so these safety concerns rarely intrude while coding. In exchange, Python has security problems of its own. Attacking unmanaged (native) code is comparatively hard: injected code that wants to run reliably must avoid corrupting the existing structures, or the program simply crashes with a segmentation fault. Python, by contrast, lets an attacker inject arbitrary code that rewrites the original logic, and since nothing sits in a fixed code segment, no such extra care is needed. The contents of globals() and locals() can also be modified by hand at runtime, which carries its own risk. A further danger is code failing because of type mismatches: since types are only known at runtime, nothing can be guaranteed ahead of time, and a type error may surface as an exception that crashes the program. The sketch below shows how easily behaviour can be rewritten at runtime.
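A small, self-contained sketch of how existing behaviour can be replaced at runtime; check_password and the patching code are invented for illustration only.

def check_password(user: str, password: str) -> bool:
    # stand-in for a real credential check
    return password == "correct horse battery staple"

print(check_password("alice", "guess"))   # False

# Any code running later in the same process can silently replace the function:
globals()["check_password"] = lambda user, password: True
print(check_password("alice", "guess"))   # True — the original logic is gone

# exec() can do the same from a string (e.g. read from a file or the network):
exec("def check_password(user, password):\n    return True", globals())
print(check_password("bob", ""))          # still True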

Summary

I come from a C++ background but have mostly been programming in Python in recent years. Python has held the top spot in language market share for years, by a wide margin, and that is inseparable from its flexibility; for a language aimed at a broad audience, such traits are necessary. Even with all the lapses in rigor described above, a programmer can still choose a disciplined, object-oriented style. In the end, the quality of a program depends not on the language but on the programmer, and programmers have a responsibility to write maintainable, clear, well-organized code~

KuRRe8 commented May 8, 2025


Insights, questions, or plain casual chatter are all welcome here!

Because there are many documents, an ipynb sometimes fails to render due to browser performance; just refresh the page.

Or git clone the Gist locally and read it there.

