Burn (Rust ML library) CHANGELOG up to v0.15.0

v0.15.0 - 2024-10-28

Summary

This release brings major performance improvements to tensor operations, particularly in matrix multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL runtimes. It also introduces foundational features for multi-backend compatibility and adds new quantization operations.

Support for ONNX models has been expanded, with additional operators and bug fixes for better operator coverage.

As with previous releases, this version includes various bug fixes, further performance optimizations, new tensor operations, and enhanced documentation.

Module & Tensor

Bug Fixes

Backends

Bug Fixes

Documentation & Examples

Fixes

ONNX Support

Enhancements

Refactoring

Miscellaneous

Changes

v0.14.0 - 2024-08-27

Summary

This release marks the debut of our CubeCL integration, which brings cross-platform GPU programming capabilities directly to Rust. With CubeCL now supporting both CUDA and WebGPU, Burn benefits from a new CUDA backend that can be enabled using the cuda-jit feature. Please note that this backend is still considered experimental, and some operations, particularly those related to vision, may experience issues.
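
As a quick orientation, using the new backend should look roughly like this (a minimal sketch: the `CudaJit` type alias and its path are assumptions based on these release notes, not confirmed API):

```rust
// Assumes Cargo.toml enables the experimental feature from this release:
//   burn = { version = "0.14", features = ["cuda-jit"] }
use burn::tensor::Tensor;

// Assumption: the CUDA backend type re-exported under this name/path.
type B = burn::backend::CudaJit;

fn main() {
    let device = Default::default();
    let x = Tensor::<B, 2>::ones([2, 3], &device);
    let y = x.clone().matmul(x.transpose()); // runs on the CUDA runtime
    println!("{y}");
}
```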

Additionally, this release features significant enhancements to ONNX support, including bug fixes, new operators, and improvements in code generation.

As always, it also includes numerous bug fixes, performance enhancements, new tensor operations, and improved documentation.

Burn 0.14.0 introduces a new tensor data format that significantly improves serialization and deserialization speeds, and adds Quantization, a new Beta feature in this release. The format is not compatible with previous versions of Burn, but you can migrate your previously saved records using this guide.

Module & Tensor

Bug Fixes

ONNX Support

Bug Fixes

Enhancements

Refactoring

Documentation & Examples

CubeCL

Miscellaneous

Bug Fixes

Changes

v0.13.2 - 2024-05-03

Bugfix

  • Fix autodiff graph memory management strategy to improve performance (#1702 #1710) @louisfd
  • Fix matmul double broadcasting for ndarray (#1646 #1679) @lancelet

Changes

v0.13.1 - 2024-04-26

Bugfix

  • Fix autodiff memory leak and improve performance with a new graph memory management strategy (#1698) @nathanielsimard @louisfd
  • Fix inplace fused operations (#1682) @nathanielsimard

Improvements

  • Linear 1D support, helpful for ONNX support (#1682) @nathanielsimard
  • Upgrade wgpu to 0.19.4 (#1692) @nathanielsimard

Changes

v0.13.0 - 2024-04-12

The Burn Release 0.13 is a significant update introducing numerous new features and performance enhancements. One major change is the removal of the Sync trait implementation from most Burn types, see Core User APIs. Additionally, the release introduces several new tensor operations, module features, optimizers, as well as improvements to the autodiff backend. Notably, a new bridge mechanism facilitates runtime switching between backends, and significant work has been done on the Just-in-Time and Wgpu backends. The release also addresses numerous bug fixes, documentation improvements, infrastructure updates, CI enhancements, and miscellaneous changes to improve code quality and usability.

Core User APIs

A major change in this release is that most Burn types no longer implement the Sync trait, such as modules, optimizers, and tensors. This change should not impact users of the Learner struct for model training. However, it may affect those who implemented their own training loop or inference server. While modules, optimizers, and tensors can be sent to other threads, they cannot be accessed concurrently by multiple threads. This aligns with Burn's workflow, where each tensor operation requires an owned version of the tensor. The change was made to safely reduce the number of locks needed when modifying the state of the autodiff graph, fusion state, allocation cache, and various other use cases. While not all locks have been removed, the type signature no longer poses a problem for follow-up optimizations. Note that the same tensor can still be sent to multiple threads without copying the underlying data; it simply has to be cloned before being sent to each thread (see the sketch below). (#1575) @nathanielsimard
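
In practice, this means a tensor is moved (or cloned, then moved) into another thread rather than shared. A minimal sketch, using the NdArray backend for illustration:

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let tensor = Tensor::<NdArray, 1>::ones([3], &device);

    // Clone before sending: the underlying data is not copied, but each
    // thread ends up with its own owned handle (no concurrent access).
    let sent = tensor.clone();
    let handle = std::thread::spawn(move || sent + 1.0);

    let doubled = tensor * 2.0; // the original handle stays usable here
    println!("{}\n{}", doubled, handle.join().unwrap());
}
```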

Tensor

Module

Optimizer

Train

Backend

This release also introduces the backend bridge, a new mechanism for runtime switching between backends. While an improvement, it remains compatible with previous methods of supporting mixed precision. (#1529) @nathanielsimard
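
The bridge API itself isn't shown here, but the use case can be sketched: moving a tensor from one backend to another previously required an explicit data round-trip. A minimal sketch of that baseline, assuming the current from_data/into_data methods:

```rust
use burn::backend::{LibTorch, NdArray};
use burn::tensor::Tensor;

// Baseline backend-to-backend transfer via an explicit data round-trip.
// The new bridge mechanism targets this use case at runtime; this sketch
// illustrates the problem space, not the bridge API itself.
fn to_ndarray(t: Tensor<LibTorch, 2>) -> Tensor<NdArray, 2> {
    let device = Default::default();
    Tensor::from_data(t.into_data(), &device)
}
```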

JIT

Significant effort has been devoted over the past few months to refactor the previous Wgpu backend into a shader-agnostic Just-in-Time backend. All lower-level dependencies have been abstracted into the Just-in-Time Runtime trait, requiring a compiler, compute server, and storage. The bulk of this work was carried out by @nathanielsimard and @louisfd.

Commits: #1274 #1280 #1313 #1340 #1356 #1359 #1378 #1391 #1396 #1398 #1417 #1429 #1423 #1424 #1433 #1456 #1474 #1457 #1480 #1472 #1493 #1509 #1530 #1528 #1541 #1550 #1569

Wgpu

Autodiff

Extensive work has also been undertaken on Burn's autodiff backend. The backend now supports gradient checkpointing to reduce memory usage and has been refactored into a client/server architecture. These updates result in significantly less blocking when tracking gradients, enhancing performance particularly on smaller models. Furthermore, various bugs have been fixed where some graph nodes weren't used, potentially truncating the autodiff graph. Overall, these changes make the autodiff process more reliable and efficient. (#1575) (#1358) @louisfd @nathanielsimard

Candle

Data

Import

Benchmarks

We have implemented a system that enables the comparison of backends across a variety of tasks. Currently, most of these tasks consist of micro-benchmarks, but we plan to expand the range of benchmarks in the future. To ensure Burn's portability and performance across different devices, the community can run and upload benchmarks! 🔥

Bug Fix

Infrastructure

The minimum Rust version has been updated to 1.75. (#1297) @syl20bnr

Docs

CI

Tests

  • Add NaN and Inf detection in assert_approx_eq to catch potential numerical bugs. (#1209) @skewballfox

Misc

Changes

v0.12.1 - 2024-02-01

Bugfix

  • Fix wgpu performance issue: revert to wgpu 0.18.0 #1221 @nathanielsimard
  • Fix problem with batch norm on LibTorch backend #1226 @nathanielsimard
  • Fix docs build #1212 #1229 @syl20bnr @nathanielsimard
  • Fix training dashboard metrics switch #1228 @nathanielsimard

Chores

  • Put all dependency versions in the workspace #1210 @nathanielsimard

Changes

v0.12.0 - 2024-01-31

This release highlights an optimized Wgpu Backend, clearer examples and documentation, and numerous bug fixes. Notably, breaking changes in device management mandate explicit device specification to prevent potential bugs. Additionally, the new PyTorch recorder simplifies model porting by enabling automatic import of PyTorch's weights. We also put a lot of effort into improving our CI infrastructure for enhanced reliability, efficiency, and scalability.

Changes

Tensor & Module API

  • Added support for generic modules #1147 @nathanielsimard
  • Added support for tuple modules #1186 @varonroy
  • Enabled loading PyTorch .pt (weights/states) files directly to module's record, currently available on Linux & MacOS #1085 @antimora
  • Added mish and softplus activation functions #1071 @pacowong
  • Improved chunk performance in backends #1032 @Kelvinyu1117
  • [Breaking] Added the device as an argument for tensor operations that require it, replacing the previous optional device usage #1081 #518 #1110 @kpot
    • Updating code involves either using Default::default for the previous behavior or specifying the desired device (see the sketch after this list).
  • Allowed raw tensors to be serialized/deserialized directly with serde #1041 @jmacglashan
  • [Breaking] Forced the choice of the device for deserialization #1160 #1165 @nathanielsimard
  • Added element-wise pow operation #1133 @skewballfox
  • Refactored the tensor backend API names #1174 @skewballfox
  • [Breaking] Changed the default recorder to NamedMpkFileRecorder #1161 #1151 @laggui
    • After a bit of exploration, we removed compression entirely because it added too much overhead.
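
For illustration, the new calling convention looks like this (a minimal sketch; Default::default() reproduces the old implicit-device behavior):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    // The device is now an explicit argument on creation ops.
    let device = Default::default(); // same behavior as the previous default
    let zeros = Tensor::<NdArray, 2>::zeros([2, 3], &device);
    let ones = Tensor::<NdArray, 2>::ones([2, 3], &device);
    println!("{}", zeros + ones);
}
```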

Examples & Documentation

Wgpu Backend

Fusion

Infra

Chore

Bug Fixes

Changes

v0.11.1 - 2023-12-04

Burn v0.11.1 fixes a few bugs in the recent v0.11.0 release.

Bugfixes

  • Fix concurrency issue in burn-fusion, related to freeing tensors that are never read @nathanielsimard
  • Fix typos in the book @shanmo
  • Fix README @nathanielsimard
  • Fix docs build @dcvz

Thanks

Thanks to all the aforementioned contributors.

Changes

v0.11.0 - 2023-12-01

The main feature of Burn v0.11.0 is automatic kernel fusion, which is still in active development but already usable. Many enhancements and new features have been added throughout the framework for better efficiency and reliability.

Warnings:

  • There are some breaking changes, see below.
  • The organization has been renamed from burn-rs to tracel-ai.

Changes

Overall changes

Burn Fusion

Burn Core

Burn Tensor

  • New operators in tensor API: unsqueeze_dim, narrow, stack, chunk, tril, triu (see the sketch after this list) @dcvz

  • Recip operation support on all backends @gzsombor

  • Implement DoubleEndedIterator for DimIter @wcshds
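
A quick tour of the new operators (a minimal sketch, written with the explicit-device constructors from later releases for readability):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let t = Tensor::<NdArray, 2>::ones([4, 4], &device);

    let lower = t.clone().tril(0);              // keep the lower triangle
    let rows = t.clone().narrow(0, 1, 2);       // rows 1..3
    let parts = t.clone().chunk(2, 1);          // two [4, 2] tensors
    let stacked = Tensor::stack::<3>(parts, 0); // shape [2, 4, 2]
    let expanded = t.unsqueeze_dim::<3>(1);     // shape [4, 1, 4]

    println!("{lower}\n{rows}\n{stacked}\n{expanded}");
}
```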

Burn Compute

Burn Import

Burn Train

Burn WGPU

Burn Candle

  • Support conv_transpose_1d @louisfd

  • Enable accelerate for MacOS CPU @dcvz

Backend Comparison

Bugfixes

  • Allow arbitrary precision threshold for float equality assertion @meteor-lsw

  • Update serde_rusqlite to the new version with MIT/Apache2 license @antimora

  • Fix SQLite database tests on Windows @syl20bnr

  • Fix max_dim and min_dim tensor operations @gzsombor

  • Fix inplace double binary broadcasting in the LibTorch backend @nathanielsimard

Documentation

Continuous Integration

Thanks

Thanks to all the aforementioned contributors.

Changes

v0.10.0 - 2023-10-24

Burn v0.10.0 sees the addition of the burn-compute crate to simplify the process of creating new custom backends, a new training dashboard, and the possibility of using the GPU in the browser along with a web demo. Additionally, numerous new features, bug fixes, and CI improvements have been made.

Warning: there are breaking changes, see below.

Changes

Burn Compute

Burn Import

  • Add more ONNX record types @antimora

  • Support no-std for ONNX imported models @antimora

  • Add custom file location for loading record with ONNX models @antimora

  • Support importing erf operation to ONNX @AuruTus

Burn Tensor

Burn Dataset

  • Improved speed of the SQLite dataset @antimora

  • Use gix-tempfile only when sqlite is enabled @AlexErrant

Burn Common

  • Add benchmark abstraction @louisfd

  • Use thread-local RNG to generate IDs @dae

Burn Autodiff

  • Use AtomicU64 for node ids improving performance @dae

Burn WGPU

Burn Candle

  • Candle backend is now available as a crate and updated with Candle advances @louisfd @agelas

Burn Train

  • New training CLI dashboard using ratatui @nathanielsimard

  • [Breaking] Heavy refactor of burn-train making it more extensible and easier to work with @nathanielsimard

  • Checkpoints can be customized with criteria based on collected metrics @nathanielsimard

  • Add the possibility to do early stopping based on collected metrics @nathanielsimard

Examples

  • Add image classifier web demo using different backends, including WebGPU @antimora

Bugfixes

Documentation

Chores

Thanks

Thanks to all the aforementioned contributors and to our sponsors @smallstepman, @0x0177b11f and @premAI-io.

Changes

v0.9.0 - 2023-09-06

Burn v0.9.0 sees the addition of the Burn Book, a new model repository, and many new operations and optimizations.

Burn Book

The Burn Book is available at https://burn-rs.github.io/book/

Model repository

The Model repository is available at https://github.com/burn-rs/models

Changes to Burn

Neural networks

Tensors

Training

  • New training metrics @Elazrod56
    • CPU temperature and use
    • GPU temperature
    • Memory use
  • Custom training and validation metric loggers @nathanielsimard
  • Migration from log4rs to tracing, better integration in a GUI app @dae
  • Training interruption @dae
  • New custom optimize method @nathanielsimard

Backends

Dataset

Import & ONNX

Fix

  • Hugging Face downloader Windows support @Macil
  • Fix grad replace and autodiff backward broadcast @nathanielsimard
  • Fix processed count at learning completion @dae
  • Adjust some flaky tests @dae
  • Ability to disable experiment logging @dae

Configuration

Documentation

Thanks

Thanks to all the aforementioned contributors and to our sponsors @smallstepman and @premAI-io.

Changes

v0.8.0 - 2023-07-25

In this release, our main focus was on creating a new backend using wgpu. We greatly appreciate the meaningful contributions made by the community across the project. As usual, we have expanded the number of supported operations.

Changes

Tensor

Dataset

  • Added a dataset using SQLite for storage, now used to store Hugging Face datasets. @antimora
  • New speech command audio dataset. @antimora
  • Create a Python virtual environment for Hugging Face dependencies. @dengelt

Burn-Import

Backend

Neural Networks

  • Added LSTM module. @agelas
  • Added GRU module. @agelas
  • Better weights initialization with added support for Xavier Glorot. @louisfd
  • Added MSE loss. @bioinformatist
  • Cleanup padding for convolution and pooling modules. @luni-4
  • Added sinusoidal positional embedding module. @antimora

Fix

Documentation

  • Improve documentation across the whole project ♥! @antimora

Thanks

Thanks to all contributors and to the sponsor @smallstepman.

Changes

v0.7.0 - 2023-05-06

Serialization

Serialization has been completely revamped since the last release. Modules, Optimizers, and Learning Rate Schedulers now have an associated type, allowing them to determine the type used for serializing and deserializing their state. The solution is documented in the new architecture doc.

State can be saved with any precision, regardless of the backend in use. Precision conversion is performed during serialization and deserialization, ensuring high memory efficiency since the model is not stored twice in memory with different precisions.

All saved states can be loaded from any backend. The precision of the serialized state must be set correctly, but the element types of the backend can be anything.

Multiple (de)serialization recorders are provided:

  • Default (compressed gzip with named message pack format)
  • Bincode
  • Compressed gzip bincode
  • Pretty JSON

Users can extend the current recorder using any serde implementation.

Multiple precision settings are available:

  • Half (f16, i16)
  • Full (f32, i32)
  • Double (f64, i64)

Users can extend the current settings using any supported number type.
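
Putting the two lists together, a recorder is parameterized by a precision setting. A small sketch (the identifiers follow later Burn versions, so the exact v0.7.0 names may differ):

```rust
use burn::record::{
    BinGzFileRecorder, DoublePrecisionSettings, FullPrecisionSettings,
    HalfPrecisionSettings, NamedMpkGzFileRecorder, PrettyJsonFileRecorder,
};

// Recorder x precision pairings from the lists above.
type DefaultRecorder = NamedMpkGzFileRecorder<FullPrecisionSettings>; // gzip named msgpack
type CompactRecorder = BinGzFileRecorder<HalfPrecisionSettings>;      // gzip bincode, f16
type DebugRecorder = PrettyJsonFileRecorder<DoublePrecisionSettings>; // human-readable
```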

Optimizer

The optimizer API has undergone a complete overhaul. It now supports the new serialization paradigm with a simplified trait definition. The learning rate is now passed as a parameter to the step method, making it easier to integrate the new learning rate scheduler. The learning rate configuration is now a part of the learner API. For more information, please refer to the documentation.
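
A sketch of one training step under the reworked API (identifiers follow later Burn versions; for example, the module trait has since been renamed, so the exact v0.7.0 names may differ):

```rust
use burn::module::AutodiffModule;
use burn::optim::{GradientsParams, Optimizer};
use burn::tensor::{backend::AutodiffBackend, Tensor};

// The learning rate is now an explicit argument to `step`, which makes it
// trivial for a scheduler to supply a fresh value on every iteration.
fn train_step<B, M, O>(model: M, optim: &mut O, loss: Tensor<B, 1>, lr: f64) -> M
where
    B: AutodiffBackend,
    M: AutodiffModule<B>,
    O: Optimizer<M, B>,
{
    let grads = GradientsParams::from_grads(loss.backward(), &model);
    optim.step(lr, model, grads)
}
```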

Gradient Clipping

You can now clip gradients by norm or by value. An integration is done with optimizers, and gradient clipping can be configured from optimizer configs (Adam & SGD).
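
For instance, clipping by L2 norm on Adam might be configured like this (a sketch; the module paths follow later Burn versions):

```rust
use burn::grad_clipping::GradientClippingConfig;
use burn::optim::AdamConfig;

// Clip gradients by L2 norm before each optimizer update.
// `GradientClippingConfig::Value(0.5)` would clip element-wise instead.
fn adam_with_clipping() -> AdamConfig {
    AdamConfig::new().with_grad_clipping(Some(GradientClippingConfig::Norm(1.0)))
}
```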

Learning Rate Scheduler

A new trait has been introduced for creating learning rate schedulers. This trait follows a similar pattern to the Module and Optimizer APIs, utilizing an associated type that implements the Record trait for state (de)serialization.

The following learning rate schedulers are now available:

  • Noam learning rate scheduler
  • Constant learning rate scheduler
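
For reference, the Noam schedule listed above follows the formula from "Attention Is All You Need": warm up linearly, then decay with the inverse square root of the step. A plain-Rust sketch of the formula itself (not Burn's scheduler API):

```rust
/// Noam learning rate at a given step:
/// lr = init * d_model^(-0.5) * min(step^(-0.5), step * warmup^(-1.5))
fn noam_lr(init_lr: f64, model_size: usize, warmup_steps: usize, step: usize) -> f64 {
    let step = step.max(1) as f64;
    let warmup = warmup_steps as f64;
    init_lr * (model_size as f64).powf(-0.5) * step.powf(-0.5).min(step * warmup.powf(-1.5))
}
```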

Module

The module API has undergone changes. There is no longer a need to wrap modules with the Param struct; only the Tensor struct requires a parameter ID.

All modules can now be created with their configuration and state, eliminating the unnecessary tensor initializations during model deployment for inference.

Convolution

Significant improvements have been made to support all convolution configurations. The stride, dilation, and groups can now be set, with full support for both inference and training.
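
A configuration sketch (the builder names follow later Burn versions; the exact v0.7.0 API may differ):

```rust
use burn::nn::conv::{Conv2d, Conv2dConfig};
use burn::tensor::backend::Backend;

// 16 -> 32 channels, 3x3 kernel, with stride, dilation, and groups all set.
fn build_conv<B: Backend>(device: &B::Device) -> Conv2d<B> {
    Conv2dConfig::new([16, 32], [3, 3])
        .with_stride([2, 2])
        .with_dilation([1, 1])
        .with_groups(1)
        .init(device)
}
```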

Transposed convolutions are available in the backend API but do not currently support the backward pass. Once they are fully supported for both training and inference, they will be exposed as modules.

Pooling

The implementation of the average pooling module is now available.

Transformer

The transformer decoder has been implemented, offering support for efficient inference and autoregressive decoding by leveraging layer norms, position-wise feed forward, self-attention, and cross-attention caching.

Tensor

The developer experience of the Tensor API has been improved, providing more consistent error messages across different backends for common operations. The Tensor struct now implements Display, allowing values, shape, backend information, and other useful details to be displayed in an easily readable format.
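
For example (a minimal sketch using the explicit-device constructor from later releases):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let t = Tensor::<NdArray, 2>::ones([2, 2], &device);
    println!("{t}"); // prints values plus shape, dtype, and backend details
}
```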

New operations

  • The flatten operation
  • The mask scatter operation

Torch Backend

The Torch backend now supports bf16.

ONNX

The burn-import project now has the capability to generate the required Burn code and model state from an ONNX file, enabling users to easily import pre-trained models into Burn. The code generation utilizes the end user API, allowing the generated model to be fine-tuned and trained using the learner struct.

Please note that not all operations are currently supported, and assistance from the community is highly appreciated. For more details, please refer to the burn-import repository: https://github.com/burn-rs/burn/tree/main/burn-import.
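
In practice, the code generation is typically driven from a build script. A sketch assuming the ModelGen builder from burn-import (the file paths are illustrative):

```rust
// build.rs — generate Burn model code from an ONNX file at build time.
use burn_import::onnx::ModelGen;

fn main() {
    ModelGen::new()
        .input("src/model/mnist.onnx") // illustrative path to the ONNX file
        .out_dir("model/")             // generated code lands under OUT_DIR
        .run_from_script();
}
```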

Bug Fixes

  • Backward pass issue when there is implicit broadcasting in add burn-rs/burn#181

Thanks 🙏

Thanks to all contributors: @nathanielsimard, @antimora, @agelas, @bioinformatist, @sunny-g. Thanks to current sponsors: @smallstepman.

Changes

v0.6.0 - 2023-03-21

Backend API

  • Almost all tensor operations now receive owned tensors instead of references, which enables backend implementations to reuse tensor-allocated memory.
  • Backends now have a different type for their int tensor, with its own set of operations.
  • Removed the IntegerBackend type.
  • Simpler Element trait with fewer functions.
  • New index-related operations (index_select, index_select_assign, index_select_dim, and index_select_dim_assign).

Tensor API

  • The Tensor struct now has a third generic parameter Kind with a default value of Float.
  • There are three kinds of tensors: Float, Bool, and Int.
    • Float Tensor ⇒ Tensor<B, D> or Tensor<B, D, Float>
    • Bool Tensor ⇒ Tensor<B, D, Bool>
    • Int Tensor ⇒ Tensor<B, D, Int>
  • You still don't have to import any trait to have functions enabled, but operations now carry an extra constraint based on the tensor kind, so you can't call matmul on a bool tensor. All of this with zero match or if statements: pure zero-cost abstraction (see the sketch after this list).
  • The BoolTensor struct has been removed.
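
A sketch of the three kinds in action (written with the explicit-device constructors from later releases for readability):

```rust
use burn::backend::NdArray;
use burn::tensor::{Bool, Int, Tensor};

fn main() {
    let device = Default::default();
    let floats = Tensor::<NdArray, 2>::ones([2, 2], &device); // Tensor<B, D, Float>
    let ints = Tensor::<NdArray, 2, Int>::ones([2, 2], &device);
    let bools: Tensor<NdArray, 2, Bool> = floats.clone().equal(ints.float());

    // `matmul` is constrained to float tensors: calling it on `bools`
    // would be a compile-time error, not a runtime one.
    println!("{}\n{}", floats.clone().matmul(floats), bools);
}
```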

Autodiff

  • Tensors are no longer all tracked by default. You now have to call require_grad (see the sketch after this list).
  • The state is no longer always captured: operations must manually clone the state they need for their backward step. This results in a massive performance enhancement.
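
A sketch of the new opt-in tracking (type names follow later releases, where the autodiff decorator is called Autodiff):

```rust
use burn::backend::{Autodiff, NdArray};
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    // Tracking is opt-in now: without `require_grad`, no graph is recorded.
    let x = Tensor::<Autodiff<NdArray>, 2>::ones([2, 2], &device).require_grad();
    let y = x.clone().matmul(x.clone());

    let grads = y.sum().backward();
    let grad_x = x.grad(&grads).expect("x is tracked");
    println!("{grad_x}");
}
```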

No Std

  • Some Burn crates don't require std anymore, which enables them to run on any platform:
    • burn-core
    • burn-ndarray
    • burn-common
    • burn-tensor
  • We have a WebAssembly demo with MNIST inference. The code is also available here with a lot of details explaining the process of compiling a model to WebAssembly.

Performance

  • The Tch backend now leverages in-place operations.
  • The NdArray backend now leverages in-place operations.
  • The convolution and max pooling layers in the NdArray backend have been rewritten with much better performance.
  • The cross-entropy loss module leverages the new index_select operation, resulting in a big performance boost when the number of classes is high.

And of course, a lot of fixes and enhancements everywhere.

Thanks to all the contributors for their work @antimora @twitchax @h4rr9

Changes

v0.5.0 - 2023-02-12

New Modules for Vision Tasks

  • Conv1D and Conv2D, currently without support for stride, dilation, or grouped convolution
  • MaxPool2D
  • BatchNorm2D

New General Tensor Operations

Breaking Changes

  • Devices are now passed by reference, thanks to feedback from @djdisodo.
  • The shape function now returns an owned struct, and backends no longer need to cache each shape.

Changes

v0.4.0 - 2022-12-30

Changes

v0.3.0 - 2022-11-20

  • Separated backend crates

Changes
