Burn (Rust ML library) CHANGELOG up to v0.15.0

v0.15.0 - 2024-10-28

Summary

This release brings major performance improvements to tensor operations, particularly in matrix multiplication and convolution, along with experimental ROCm/HIP and SPIR-V support enabled by CubeCL runtimes. It also introduces foundational features for multi-backend compatibility and adds new quantization operations.

Support for ONNX models has been expanded, with additional operators and bug fixes for better operator coverage.

As with previous releases, this version includes various bug fixes, further performance optimizations, new tensor operations, and enhanced documentation.

Module & Tensor

Bug Fixes

Backends

Bug Fixes

Documentation & Examples

Fixes

ONNX Support

Enhancements

Refactoring

Miscellaneous

Changes

v0.14.0 - 2024-08-27

Summary

This release marks the debut of our CubeCL integration, which brings cross-platform GPU programming capabilities directly to Rust. With CubeCL now supporting both CUDA and WebGPU, Burn benefits from a new CUDA backend that can be enabled using the cuda-jit feature. Please note that this backend is still considered experimental, and some operations, particularly those related to vision, may experience issues.
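
As a quick orientation, using the new backend should look roughly like this (a minimal sketch: the `CudaJit` type alias and its path are assumptions based on these release notes, not confirmed API):

```rust
// Assumes Cargo.toml enables the experimental feature from this release:
//   burn = { version = "0.14", features = ["cuda-jit"] }
use burn::tensor::Tensor;

// Assumption: the CUDA backend type re-exported under this name/path.
type B = burn::backend::CudaJit;

fn main() {
    let device = Default::default();
    let x = Tensor::<B, 2>::ones([2, 3], &device);
    let y = x.clone().matmul(x.transpose()); // runs on the CUDA runtime
    println!("{y}");
}
```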

Additionally, this release features significant enhancements to ONNX support, including bug fixes, new operators, and improvements in code generation.

As always, it also includes numerous bug fixes, performance enhancements, new tensor operations, and improved documentation.

Burn 0.14.0 introduces a new tensor data format that significantly improves serialization and deserialization speeds, and adds Quantization, a new Beta feature in this release. The format is not compatible with previous versions of Burn, but you can migrate your previously saved records using this guide.

Module & Tensor

Bug Fixes

ONNX Support

Bug Fixes

Enhancements

Refactoring

Documentation & Examples

CubeCL

Miscellaneous

Bug Fixes

Changes

v0.13.2 - 2024-05-03

Bugfix

  • Fix autodiff graph memory management strategy to improve performance (#1702 #1710) @louisfd
  • Fix matmul double broadcasting for ndarray (#1646 #1679) @lancelet

Changes

v0.13.1 - 2024-04-26

Bugfix

  • Fix autodiff memory leak and improve performance with a new graph memory management strategy (#1698) @nathanielsimard @louisfd
  • Fix inplace fused operations (#1682) @nathanielsimard

Improvements

  • Linear 1D support, helpful for ONNX support (#1682) @nathanielsimard
  • Upgrade wgpu to 0.19.4 (#1692) @nathanielsimard

Changes

v0.13.0 - 2024-04-12

The Burn Release 0.13 is a significant update introducing numerous new features and performance enhancements. One major change is the removal of the Sync trait implementation from most Burn types, see Core User APIs. Additionally, the release introduces several new tensor operations, module features, optimizers, as well as improvements to the autodiff backend. Notably, a new bridge mechanism facilitates runtime switching between backends, and significant work has been done on the Just-in-Time and Wgpu backends. The release also addresses numerous bug fixes, documentation improvements, infrastructure updates, CI enhancements, and miscellaneous changes to improve code quality and usability.

Core User APIs

A major change in this release is that most Burn types no longer implement the Sync trait, such as modules, optimizers, and tensors. This change should not impact users of the Learner struct for model training. However, it may affect those who implemented their own training loop or inference server. While modules, optimizers, and tensors can be sent to other threads, they cannot be accessed concurrently by multiple threads. This aligns with Burn's workflow, where each tensor operation requires an owned version of the tensor. The change was made to safely reduce the number of locks needed when modifying the state of the autodiff graph, fusion state, allocation cache, and various other use cases. While not all locks have been removed, the type signature no longer poses a problem for follow-up optimizations. Note that the same tensor can still be sent to multiple threads without copying the underlying data; it simply has to be cloned before being sent to each thread (see the sketch below). (#1575) @nathanielsimard
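
In practice, this means a tensor is moved (or cloned, then moved) into another thread rather than shared. A minimal sketch, using the NdArray backend for illustration:

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let tensor = Tensor::<NdArray, 1>::ones([3], &device);

    // Clone before sending: the underlying data is not copied, but each
    // thread ends up with its own owned handle (no concurrent access).
    let sent = tensor.clone();
    let handle = std::thread::spawn(move || sent + 1.0);

    let doubled = tensor * 2.0; // the original handle stays usable here
    println!("{}\n{}", doubled, handle.join().unwrap());
}
```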

Tensor

Module

Optimizer

Train

Backend

This release also introduces the backend bridge, a new mechanism for runtime switching between backends. While an improvement, it remains compatible with previous methods of supporting mixed precision. (#1529) @nathanielsimard
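
The bridge API itself isn't shown here, but the use case can be sketched: moving a tensor from one backend to another previously required an explicit data round-trip. A minimal sketch of that baseline, assuming the current from_data/into_data methods:

```rust
use burn::backend::{LibTorch, NdArray};
use burn::tensor::Tensor;

// Baseline backend-to-backend transfer via an explicit data round-trip.
// The new bridge mechanism targets this use case at runtime; this sketch
// illustrates the problem space, not the bridge API itself.
fn to_ndarray(t: Tensor<LibTorch, 2>) -> Tensor<NdArray, 2> {
    let device = Default::default();
    Tensor::from_data(t.into_data(), &device)
}
```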

JIT

Significant effort has been devoted over the past few months to refactor the previous Wgpu backend into a shader-agnostic Just-in-Time backend. All lower-level dependencies have been abstracted into the Just-in-Time Runtime trait, requiring a compiler, compute server, and storage. The bulk of this work was carried out by @nathanielsimard and @louisfd.

Commits: #1274 #1280 #1313 #1340 #1356 #1359 #1378 #1391 #1396 #1398 #1417 #1429 #1423 #1424 #1433 #1456 #1474 #1457 #1480 #1472 #1493 #1509 #1530 #1528 #1541 #1550 #1569

Wgpu

Autodiff

Extensive work has also been undertaken on Burn's autodiff backend. The backend now supports gradient checkpointing to reduce memory usage and has been refactored into a client/server architecture. These updates result in significantly less blocking when tracking gradients, enhancing performance particularly on smaller models. Furthermore, various bugs have been fixed where some graph nodes weren't used, potentially truncating the autodiff graph. Overall, these changes make the autodiff process more reliable and efficient. (#1575) (#1358) @louisfd @nathanielsimard

Candle

Data

Import

Benchmarks

We have implemented a system that enables the comparison of backends across a variety of tasks. Currently, most of these tasks consist of micro-benchmarks, but we plan to expand the range of benchmarks in the future. To ensure Burn's portability and performance across different devices, the community can run and upload benchmarks! 🔥

Bug Fix

Infrastructure

The minimum Rust version has been updated to 1.75. (#1297) @syl20bnr

Docs

CI

Tests

  • Add NaN and Inf detection in assert_approx_eq to catch potential numerical bugs. (#1209) @skewballfox

Misc

Changes

v0.12.1 - 2024-02-01

Bugfix

  • Fix wgpu performance issue: revert to wgpu 0.18.0 #1221 @nathanielsimard
  • Fix problem with batch norm on LibTorch backend #1226 @nathanielsimard
  • Fix docs build #1212 #1229 @syl20bnr @nathanielsimard
  • Fix training dashboard metrics switch #1228 @nathanielsimard

Chores

  • Put all dependency versions in the workspace #1210 @nathanielsimard

Changes

v0.12.0 - 2024-01-31

This release highlights an optimized Wgpu Backend, clearer examples and documentation, and numerous bug fixes. Notably, breaking changes in device management mandate explicit device specification to prevent potential bugs. Additionally, the new PyTorch recorder simplifies model porting by enabling automatic import of PyTorch's weights. We also put a lot of effort into improving our CI infrastructure for enhanced reliability, efficiency, and scalability.

Changes

Tensor & Module API

  • Added support for generic modules #1147 @nathanielsimard
  • Added support for tuple modules #1186 @varonroy
  • Enabled loading PyTorch .pt (weights/states) files directly to module's record, currently available on Linux & MacOS #1085 @antimora
  • Added mish and softplus activation functions #1071 @pacowong
  • Improved chunk performance in backends #1032 @Kelvinyu1117
  • [Breaking] Added the device as an argument for tensor operations that require it, replacing the previous optional device usage #1081 #518 #1110 @kpot
    • Updating code involves either using Default::default for the previous behavior or specifying the desired device (see the sketch after this list).
  • Allowed raw tensors to be serialized/deserialized directly with serde #1041 @jmacglashan
  • [Breaking] Forced the choice of the device for deserialization #1160 #1165 @nathanielsimard
  • Added element-wise pow operation #1133 @skewballfox
  • Refactored the tensor backend API names #1174 @skewballfox
  • [Breaking] Changed the default recorder to NamedMpkFileRecorder #1161 #1151 @laggui
    • After a bit of exploration, we removed compression entirely because it added too much overhead.
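
For illustration, the new calling convention looks like this (a minimal sketch; Default::default() reproduces the old implicit-device behavior):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    // The device is now an explicit argument on creation ops.
    let device = Default::default(); // same behavior as the previous default
    let zeros = Tensor::<NdArray, 2>::zeros([2, 3], &device);
    let ones = Tensor::<NdArray, 2>::ones([2, 3], &device);
    println!("{}", zeros + ones);
}
```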

Examples & Documentation

Wgpu Backend

Fusion

Infra

Chore

Bug Fixes

Changes

v0.11.1 - 2023-12-04

Burn v0.11.1 fixes a few bugs in the recent v0.11.0 release.

Bugfixes

  • Fix concurrency issue in burn-fusion, related to freeing tensors that are never read @nathanielsimard
  • Fix typos in the book @shanmo
  • Fix README @nathanielsimard
  • Fix docs build @dcvz

Thanks

Thanks to all the aforementioned contributors.

Changes

v0.11.0 - 2023-12-01

The main feature of Burn v0.11.0 is automatic kernel fusion, which is still in active development but already usable. Many enhancements and new features have been added throughout the framework for better efficiency and reliability.

Warnings:

  • There are some breaking changes, see below.
  • The organization has been renamed from burn-rs to tracel-ai.

Changes

Overall changes

Burn Fusion

Burn Core

Burn Tensor

  • New operators in tensor API: unsqueeze_dim, narrow, stack, chunk, tril, triu (see the sketch after this list) @dcvz

  • Recip operation support on all backends @gzsombor

  • Implement DoubleEndedIterator for DimIter @wcshds
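
A quick tour of the new operators (a minimal sketch, written with the explicit-device constructors from later releases for readability):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let t = Tensor::<NdArray, 2>::ones([4, 4], &device);

    let lower = t.clone().tril(0);              // keep the lower triangle
    let rows = t.clone().narrow(0, 1, 2);       // rows 1..3
    let parts = t.clone().chunk(2, 1);          // two [4, 2] tensors
    let stacked = Tensor::stack::<3>(parts, 0); // shape [2, 4, 2]
    let expanded = t.unsqueeze_dim::<3>(1);     // shape [4, 1, 4]

    println!("{lower}\n{rows}\n{stacked}\n{expanded}");
}
```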

Burn Compute

Burn Import

Burn Train

Burn WGPU

Burn Candle

  • Support conv_transpose_1d @louisfd

  • Enable accelerate for MacOS CPU @dcvz

Backend Comparison

Bugfixes

  • Allow arbitrary precision threshold for float equality assertion @meteor-lsw

  • Update serde_rusqlite to the new version with MIT/Apache2 license @antimora

  • Fix SQLite database tests on Windows @syl20bnr

  • Fix max_dim and min_dim tensor operations @gzsombor

  • Fix inplace double binary broadcasting in the LibTorch backend @nathanielsimard

Documentation

Continuous Integration

Thanks

Thanks to all the aforementioned contributors.

Changes

v0.10.0 - 2023-10-24

Burn v0.10.0 sees the addition of the burn-compute crate to simplify the process of creating new custom backends, a new training dashboard, and the possibility of using the GPU in the browser along with a web demo. Additionally, numerous new features, bug fixes, and CI improvements have been made.

Warning: there are breaking changes, see below.

Changes

Burn Compute

Burn Import

  • Add more ONNX record types @antimora

  • Support no-std for ONNX imported models @antimora

  • Add custom file location for loading record with ONNX models @antimora

  • Support importing erf operation to ONNX @AuruTus

Burn Tensor

Burn Dataset

  • Improved speed of the SQLite dataset @antimora

  • Use gix-tempfile only when sqlite is enabled @AlexErrant

Burn Common

  • Add benchmark abstraction @louisfd

  • Use thread-local RNG to generate IDs @dae

Burn Autodiff

  • Use AtomicU64 for node ids improving performance @dae

Burn WGPU

Burn Candle

  • Candle backend is now available as a crate and updated with Candle advances @louisfd @agelas

Burn Train

  • New training CLI dashboard using ratatui @nathanielsimard

  • [Breaking] Heavy refactor of burn-train making it more extensible and easier to work with @nathanielsimard

  • Checkpoints can be customized with criteria based on collected metrics @nathanielsimard

  • Add the possibility to do early stopping based on collected metrics @nathanielsimard

Examples

  • Add image classifier web demo using different backends, including WebGPU @antimora

Bugfixes

Documentation

Chores

Thanks

Thanks to all the aforementioned contributors and to our sponsors @smallstepman, @0x0177b11f and @premAI-io.

Changes

v0.9.0 - 2023-09-06

Burn v0.9.0 sees the addition of the Burn Book, a new model repository, and many new operations and optimizations.

Burn Book

The Burn Book is available at https://burn-rs.github.io/book/

Model repository

The Model repository is available at https://github.com/burn-rs/models

Changes to Burn

Neural networks

Tensors

Training

  • New training metrics @Elazrod56
    • CPU temperature and use
    • GPU temperature
    • Memory use
  • Custom training and validation metric loggers @nathanielsimard
  • Migration from log4rs to tracing, better integration in a GUI app @dae
  • Training interruption @dae
  • New custom optimize method @nathanielsimard

Backends

Dataset

Import & ONNX

Fix

  • Hugging Face downloader Windows support @Macil
  • Fix grad replace and autodiff backward broadcast @nathanielsimard
  • Fix processed count at learning completion @dae
  • Adjust some flaky tests @dae
  • Ability to disable experiment logging @dae

Configuration

Documentation

Thanks

Thanks to all the aforementioned contributors and to our sponsors @smallstepman and @premAI-io.

Changes

v0.8.0 - 2023-07-25

In this release, our main focus was on creating a new backend using wgpu. We greatly appreciate the meaningful contributions made by the community across the project. As usual, we have expanded the number of supported operations.

Changes

Tensor

Dataset

  • Added a dataset using SQLite for storage, now used to store Hugging Face datasets. @antimora
  • New speech command audio dataset. @antimora
  • Create a Python virtual environment for Hugging Face dependencies. @dengelt

Burn-Import

Backend

Neural Networks

  • Added LSTM module. @agelas
  • Added GRU module. @agelas
  • Better weights initialization with added support for Xavier Glorot. @louisfd
  • Added MSE loss. @bioinformatist
  • Cleanup padding for convolution and pooling modules. @luni-4
  • Added sinusoidal positional embedding module. @antimora

Fix

Documentation

  • Improve documentation across the whole project ♥! @antimora

Thanks

Thanks to all contributors and to the sponsor @smallstepman.

Changes

v0.7.0 - 2023-05-06

Serialization

Serialization has been completely revamped since the last release. Modules, Optimizers, and Learning Rate Schedulers now have an associated type, allowing them to determine the type used for serializing and deserializing their state. The solution is documented in the new architecture doc.

State can be saved with any precision, regardless of the backend in use. Precision conversion is performed during serialization and deserialization, ensuring high memory efficiency since the model is not stored twice in memory with different precisions.

All saved states can be loaded from any backend. The precision of the serialized state must be set correctly, but the element types of the backend can be anything.

Multiple (de)serialization recorders are provided:

  • Default (compressed gzip with named message pack format)
  • Bincode
  • Compressed gzip bincode
  • Pretty JSON

Users can extend the current recorder using any serde implementation.

Multiple precision settings are available:

  • Half (f16, i16)
  • Full (f32, i32)
  • Double (f64, i64)

Users can extend the current settings using any supported number type.
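
Putting the two lists together, a recorder is parameterized by a precision setting. A small sketch (the identifiers follow later Burn versions, so the exact v0.7.0 names may differ):

```rust
use burn::record::{
    BinGzFileRecorder, DoublePrecisionSettings, FullPrecisionSettings,
    HalfPrecisionSettings, NamedMpkGzFileRecorder, PrettyJsonFileRecorder,
};

// Recorder x precision pairings from the lists above.
type DefaultRecorder = NamedMpkGzFileRecorder<FullPrecisionSettings>; // gzip named msgpack
type CompactRecorder = BinGzFileRecorder<HalfPrecisionSettings>;      // gzip bincode, f16
type DebugRecorder = PrettyJsonFileRecorder<DoublePrecisionSettings>; // human-readable
```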

Optimizer

The optimizer API has undergone a complete overhaul. It now supports the new serialization paradigm with a simplified trait definition. The learning rate is now passed as a parameter to the step method, making it easier to integrate the new learning rate scheduler. The learning rate configuration is now a part of the learner API. For more information, please refer to the documentation.
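
A sketch of one training step under the reworked API (identifiers follow later Burn versions; for example, the module trait has since been renamed, so the exact v0.7.0 names may differ):

```rust
use burn::module::AutodiffModule;
use burn::optim::{GradientsParams, Optimizer};
use burn::tensor::{backend::AutodiffBackend, Tensor};

// The learning rate is now an explicit argument to `step`, which makes it
// trivial for a scheduler to supply a fresh value on every iteration.
fn train_step<B, M, O>(model: M, optim: &mut O, loss: Tensor<B, 1>, lr: f64) -> M
where
    B: AutodiffBackend,
    M: AutodiffModule<B>,
    O: Optimizer<M, B>,
{
    let grads = GradientsParams::from_grads(loss.backward(), &model);
    optim.step(lr, model, grads)
}
```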

Gradient Clipping

You can now clip gradients by norm or by value. An integration is done with optimizers, and gradient clipping can be configured from optimizer configs (Adam & SGD).
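
For instance, clipping by L2 norm on Adam might be configured like this (a sketch; the module paths follow later Burn versions):

```rust
use burn::grad_clipping::GradientClippingConfig;
use burn::optim::AdamConfig;

// Clip gradients by L2 norm before each optimizer update.
// `GradientClippingConfig::Value(0.5)` would clip element-wise instead.
fn adam_with_clipping() -> AdamConfig {
    AdamConfig::new().with_grad_clipping(Some(GradientClippingConfig::Norm(1.0)))
}
```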

Learning Rate Scheduler

A new trait has been introduced for creating learning rate schedulers. This trait follows a similar pattern to the Module and Optimizer APIs, utilizing an associated type that implements the Record trait for state (de)serialization.

The following learning rate schedulers are now available:

  • Noam learning rate scheduler
  • Constant learning rate scheduler
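
For reference, the Noam schedule listed above follows the formula from "Attention Is All You Need": warm up linearly, then decay with the inverse square root of the step. A plain-Rust sketch of the formula itself (not Burn's scheduler API):

```rust
/// Noam learning rate at a given step:
/// lr = init * d_model^(-0.5) * min(step^(-0.5), step * warmup^(-1.5))
fn noam_lr(init_lr: f64, model_size: usize, warmup_steps: usize, step: usize) -> f64 {
    let step = step.max(1) as f64;
    let warmup = warmup_steps as f64;
    init_lr * (model_size as f64).powf(-0.5) * step.powf(-0.5).min(step * warmup.powf(-1.5))
}
```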

Module

The module API has undergone changes. There is no longer a need to wrap modules with the Param struct; only the Tensor struct requires a parameter ID.

All modules can now be created with their configuration and state, eliminating the unnecessary tensor initializations during model deployment for inference.

Convolution

Significant improvements have been made to support all convolution configurations. The stride, dilation, and groups can now be set, with full support for both inference and training.
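
A configuration sketch (the builder names follow later Burn versions; the exact v0.7.0 API may differ):

```rust
use burn::nn::conv::{Conv2d, Conv2dConfig};
use burn::tensor::backend::Backend;

// 16 -> 32 channels, 3x3 kernel, with stride, dilation, and groups all set.
fn build_conv<B: Backend>(device: &B::Device) -> Conv2d<B> {
    Conv2dConfig::new([16, 32], [3, 3])
        .with_stride([2, 2])
        .with_dilation([1, 1])
        .with_groups(1)
        .init(device)
}
```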

Transposed convolutions are available in the backend API but do not currently support the backward pass. Once they are fully supported for both training and inference, they will be exposed as modules.

Pooling

The implementation of the average pooling module is now available.

Transformer

The transformer decoder has been implemented, offering support for efficient inference and autoregressive decoding by leveraging layer norms, position-wise feed forward, self-attention, and cross-attention caching.

Tensor

The developer experience of the Tensor API has been improved, providing more consistent error messages across different backends for common operations. The Tensor struct now implements Display, allowing values, shape, backend information, and other useful details to be displayed in an easily readable format.
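
For example (a minimal sketch using the explicit-device constructor from later releases):

```rust
use burn::backend::NdArray;
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    let t = Tensor::<NdArray, 2>::ones([2, 2], &device);
    println!("{t}"); // prints values plus shape, dtype, and backend details
}
```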

New operations

  • The flatten operation
  • The mask scatter operation

Torch Backend

The Torch backend now supports bf16.

ONNX

The burn-import project now has the capability to generate the required Burn code and model state from an ONNX file, enabling users to easily import pre-trained models into Burn. The code generation utilizes the end user API, allowing the generated model to be fine-tuned and trained using the learner struct.

Please note that not all operations are currently supported, and assistance from the community is highly appreciated. For more details, please refer to the burn-import repository: https://github.com/burn-rs/burn/tree/main/burn-import.
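
In practice, the code generation is typically driven from a build script. A sketch assuming the ModelGen builder from burn-import (the file paths are illustrative):

```rust
// build.rs — generate Burn model code from an ONNX file at build time.
use burn_import::onnx::ModelGen;

fn main() {
    ModelGen::new()
        .input("src/model/mnist.onnx") // illustrative path to the ONNX file
        .out_dir("model/")             // generated code lands under OUT_DIR
        .run_from_script();
}
```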

Bug Fixes

  • Backward pass issue when there is implicit broadcasting in add burn-rs/burn#181

Thanks 🙏

Thanks to all contributors: @nathanielsimard, @antimora, @agelas, @bioinformatist, @sunny-g. Thanks to current sponsors: @smallstepman.

Changes

v0.6.0 - 2023-03-21

Backend API

  • Almost all tensor operations now receive owned tensors instead of references, which enables backend implementations to reuse tensor-allocated memory.
  • Backends now have a different type for their int tensor, with its own set of operations.
  • Removed the IntegerBackend type.
  • Simpler Element trait with fewer functions.
  • New index-related operations (index_select, index_select_assign, index_select_dim, and index_select_dim_assign).

Tensor API

  • The Tensor struct now has a third generic parameter Kind with a default value of Float.
  • There are three kinds of tensors: Float, Bool, and Int.
    • Float Tensor ⇒ Tensor<B, D> or Tensor<B, D, Float>
    • Bool Tensor ⇒ Tensor<B, D, Bool>
    • Int Tensor ⇒ Tensor<B, D, Int>
  • You still don't have to import any trait to have functions enabled, but operations now carry an extra constraint based on the tensor kind, so you can't call matmul on a bool tensor. All of this with zero match or if statements: pure zero-cost abstraction (see the sketch after this list).
  • The BoolTensor struct has been removed.
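
A sketch of the three kinds in action (written with the explicit-device constructors from later releases for readability):

```rust
use burn::backend::NdArray;
use burn::tensor::{Bool, Int, Tensor};

fn main() {
    let device = Default::default();
    let floats = Tensor::<NdArray, 2>::ones([2, 2], &device); // Tensor<B, D, Float>
    let ints = Tensor::<NdArray, 2, Int>::ones([2, 2], &device);
    let bools: Tensor<NdArray, 2, Bool> = floats.clone().equal(ints.float());

    // `matmul` is constrained to float tensors: calling it on `bools`
    // would be a compile-time error, not a runtime one.
    println!("{}\n{}", floats.clone().matmul(floats), bools);
}
```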

Autodiff

  • Tensors are no longer all tracked by default. You now have to call require_grad (see the sketch after this list).
  • The state is no longer always captured: operations must manually clone the state they need for their backward step. This results in a massive performance enhancement.
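
A sketch of the new opt-in tracking (type names follow later releases, where the autodiff decorator is called Autodiff):

```rust
use burn::backend::{Autodiff, NdArray};
use burn::tensor::Tensor;

fn main() {
    let device = Default::default();
    // Tracking is opt-in now: without `require_grad`, no graph is recorded.
    let x = Tensor::<Autodiff<NdArray>, 2>::ones([2, 2], &device).require_grad();
    let y = x.clone().matmul(x.clone());

    let grads = y.sum().backward();
    let grad_x = x.grad(&grads).expect("x is tracked");
    println!("{grad_x}");
}
```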

No Std

  • Some Burn crates don't require std anymore, which enables them to run on any platform:
    • burn-core
    • burn-ndarray
    • burn-common
    • burn-tensor
  • We have a WebAssembly demo with MNIST inference. The code is also available here with a lot of details explaining the process of compiling a model to WebAssembly.

Performance

  • The Tch backend now leverages in-place operations.
  • The NdArray backend now leverages in-place operations.
  • The convolution and max pooling layers in the NdArray backend have been rewritten with much better performance.
  • The cross-entropy loss module leverages the new index_select operation, resulting in a big performance boost when the number of classes is high.

And of course, a lot of fixes and enhancements everywhere.

Thanks to all the contributors for their work @antimora @twitchax @h4rr9

Changes

v0.5.0 - 2023-02-12

New Modules for Vision Tasks

  • Conv1D and Conv2D, currently without support for stride, dilation, or grouped convolution
  • MaxPool2D
  • BatchNorm2D

New General Tensor Operations

Breaking Changes

  • Devices are now passed by reference, thanks to feedback from @djdisodo.
  • The shape function now returns an owned struct, and backends no longer need to cache each shape.

Changes

v0.4.0 - 2022-12-30

Changes

v0.3.0 - 2022-11-20

  • Separated backend crates

Changes
