Variables are immutable by default. This makes Rust safer and makes concurrency easier.
Immutable means once a value is bound to that variable, it cannot be changed.
For example:
fn main() {
let x = 5;
println!("The value of x is: {}", x);
x = 6;
println!("The value of x is: {}", x);
}
produces the following output
$ cargo run
Compiling variables v0.1.0 (file:///projects/variables)
error[E0384]: cannot assign twice to immutable variable `x`
--> src/main.rs:4:5
|
2 | let x = 5;
| -
| |
| first assignment to `x`
| help: consider making this binding mutable: `mut x`
3 | println!("The value of x is: {}", x);
4 | x = 6;
| ^^^^^ cannot assign twice to immutable variable
For more information about this error, try `rustc --explain E0384`.
error: could not compile `variables` due to previous error
Variables can be made mutable by adding the mut
keyword in front of the variable name.
fn main() {
let mut x = 5;
println!("The value of x is: {}", x);
x = 6;
println!("The value of x is: {}", x);
}
$ cargo run
Compiling variables v0.1.0 (file:///projects/variables)
Finished dev [unoptimized + debuginfo] target(s) in 0.30s
Running `target/debug/variables`
The value of x is: 5
The value of x is: 6
Constants can be declared using the const
keyword instead of let
. For constants, the type must be explicitly stated.
Note: mut
cannot be used with constants.
fn main() {
const THREE_HOURS_IN_SECONDS: u32 = 60 * 60 * 3;
}
A new variable can be declared with the same name as the original. We say the original variable is shadowed by the new variable. let
must be used when shadowing.
fn main() {
let x = 5;
let x = x + 1;
{
let x = x * 2;
println!("The value of x in the inner scope is: {}", x);
}
println!("The value of x is: {}", x);
}
Note: when the inner scope finishes x
will return to 6.
Shadowing is different to making a variable mutable. If let
is removed, a compile-time error will appear, because the code would then be assigning to an immutable variable. Shadowing lets us apply transformations to a value and have the variable remain immutable afterwards, since each shadowing declaration effectively creates a new variable.
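As a small sketch of the difference (the helper name `spaces_len` is made up for illustration), shadowing can rebind a name to a value of a different type, which a `mut` variable cannot do:

```rust
// Hypothetical helper: shadows a &str binding with its length (a usize).
fn spaces_len(spaces: &str) -> usize {
    // Shadowing creates a new variable, so the type is allowed to change.
    let spaces = spaces.len();
    spaces
}

fn main() {
    let x = 5;
    let x = x + 1; // shadows the first x; both bindings are immutable
    println!("x = {}", x);
    println!("spaces = {}", spaces_len("   "));

    // By contrast, a `mut` variable must keep its original type:
    // let mut s = "   ";
    // s = s.len(); // error[E0308]: mismatched types
}
```

The commented-out lines are left disabled because they would stop the example from compiling.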
Rust is statically typed, which means it must know the types of all variables at compile time. The compiler can usually infer the type, but sometimes a type annotation must be added.
let guess: u32 = "42".parse().expect("Not a number");
This defines guess
as a u32
. We have to do this as the parse()
method can return many different numeric types, so the compiler cannot infer which one is wanted.
A scalar type represents a single value. Rust has four scalar types: integers, floating-point numbers, Booleans and characters.
Signed integers are in the format ixxx
and unsigned are in the format uxxx
, where xxx
is the number of bits. The number of bits allowed in Rust are:
- 8
- 16
- 32
- 64
- 128
An n-bit signed integer can store numbers from -(2^(n-1)) to 2^(n-1) - 1 inclusive. For example, i8
can store from -128 to 127 (-(2^7) to 2^7 - 1). Unsigned integers can store from 0 to 2^n - 1, so a u8
can store from 0 to 255 (2^8 - 1).
Note: signed numbers are stored using two's complement
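The range formulas can be checked against the constants on the built-in integer types. This is only a sketch: `signed_range` and `unsigned_max` are hypothetical helpers, and they assume a bit width between 1 and 64 so that the arithmetic fits in the wider types used.

```rust
// Hypothetical helpers that compute the ranges from the formulas.
// They assume 1 <= bits <= 64.
fn signed_range(bits: u32) -> (i128, i128) {
    let half = 1i128 << (bits - 1); // 2^(bits - 1)
    (-half, half - 1)
}

fn unsigned_max(bits: u32) -> u128 {
    (1u128 << bits) - 1 // 2^bits - 1
}

fn main() {
    // The computed ranges agree with the constants on the built-in types.
    assert_eq!(signed_range(8), (i8::MIN as i128, i8::MAX as i128));
    assert_eq!(unsigned_max(8), u8::MAX as u128);
    assert_eq!(signed_range(32), (i32::MIN as i128, i32::MAX as i128));
    println!("i8: {:?}, u8 max: {}", signed_range(8), unsigned_max(8));
}
```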
There also exist integer types which scale to the word size of the system
isize
usize
So on a 64-bit system, this would be 64-bits
Integer literals can be written in different bases.
- Decimal ->
98_222
- Hex ->
0xff
- Octal ->
0o77
- Binary ->
0b1111_0000
- Byte (
u8
only) ->b'A'
Note: _
is a visual separator to make the number easier to read
Number literals that could be one of several types can take a type suffix, for example 57u8
defines the value 57 as an unsigned 8-bit integer.
Rust has two primitive floating-point numbers, f32
and f64
. The default type is f64
as on modern CPUs it is roughly the same speed as f32
but with higher precision.
Booleans behave as expected, taking the value true or false. They are declared with bool
Characters are declared with the char
type and single quotes.
fn main(){
let c = 'z';
}
The character type is 4 bytes and represents a Unicode Scalar Value.
Compound types can group multiple values into one type.
A tuple is a general list of different types. They have a fixed size.
fn main(){
let tup: (i32, f64, u8) = (500, 6.4, 1);
}
Note: the type annotation is optional
The identifier tup
binds to the whole tuple. Pattern matching can be used to break a tuple into its parts; this is called destructuring. Individual elements can also be accessed directly using a period followed by the index.
fn main(){
let tup = (500, 6.4, 1);
let (x, y, z) = tup; // x = 500, y = 6.4, z = 1
let a = tup.0;
let b = tup.1;
let c = tup.2;
}
A tuple without any values, let x = ();
, is a special type that has only one value, written ()
. This type is called the unit type and the value is called the unit value. Expressions implicitly return the unit value if they don't return any other value.
Every element in an array must have the same type, and the array has a fixed length.
Values in an array are written as comma-separated inside square brackets.
fn main(){
let a = [1, 2, 3, 4, 5];
}
Arrays are allocated on the stack.
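An array's type annotation is written in square brackets as the element type, a semicolon, and the length. Writing a value in place of the type creates an array filled with that value.
fn main(){
let a: [i32; 5] = [1, 2, 3, 4, 5]; // An array of length 5 of type i32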
let b = [3; 5]; // = [3, 3, 3, 3, 3];
}
Array elements can be accessed using square brackets.
fn main(){
let a = [1, 2, 3, 4, 5];
let first = a[0];
let last = a[4];
}
An out-of-bounds index causes a panic at runtime. Before an element is accessed, the given index is checked to see if it is less than the array length.
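When a panic is not wanted, the `get` method on slices and arrays performs the same bounds check but returns an `Option` instead. A minimal sketch, where `element_at` is a hypothetical helper name:

```rust
// Hypothetical helper: returns the element at `index`, or None if the
// index is not less than the length of the slice.
fn element_at(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

fn main() {
    let a = [1, 2, 3, 4, 5];
    println!("{:?}", element_at(&a, 4)); // index in bounds
    println!("{:?}", element_at(&a, 10)); // out of bounds: None, no panic
    // let bad = a[10]; // indexing with a constant out-of-range index is rejected at compile time
}
```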
Other compound types include structs
and enums
.
The main
function is the entry point for the program. The fn
keyword is used to declare functions.
Function parameters can be passed when defining the function as such:
fn main(){
another_function(5);
}
fn another_function(x: i32){
println!("The value of x is: {}", x);
}
In function declarations parameter types must be defined.
Rust is an expression-based language. Statements perform an action and do not return a value, whereas expressions evaluate to a resulting value. An example statement is let x = 10;
. An expression could be 5+6
; this returns the value 11.
Calling a function or macro is an expression. Expressions do not include an ending semicolon;
adding one turns the expression into a statement. An example of this is the line x+1
.
fn main(){
let y = {
let x = 3;
x+1
};
println!("The value of y is {}", y);
}
Returned values are not named, but their type is defined using an arrow.
fn five() -> i32{
5
}
This function will return the number 5.
fn main(){
let number = 3;
if number < 5{
println!("number < 5");
} else if number == 5{
println!("number = 5");
}else {
println!("number > 5");
}
}
If expressions can be used to assign variables a value.
fn main(){
let condition = true;
let number = if condition {5} else {6};
}
Rust has three different types of loops: loop
, while
and for
.
loop
repeats endlessly until it is explicitly told to stop with the break
keyword.
while
repeats until a condition evaluates to false.
for
repeats once for each item in a collection or range.
The continue
keyword will skip the rest of the loop and go to the next iteration. Loop labels can be used with break
and continue
to operate on a specific loop.
fn main() {
let mut count = 0;
'counting_up: loop {
println!("count = {}", count);
let mut remaining = 10;
loop {
println!("remaining = {}", remaining);
if remaining == 9 {
break;
}
if count == 2 {
break 'counting_up;
}
remaining -= 1;
}
count += 1;
}
println!("End count = {}", count);
}
Values can be returned from loops. This can be useful to check if a thread has finished.
fn main(){
let mut counter = 0;
let result = loop{
counter += 1;
if counter == 10 {
break counter * 2;
}
};
}
The while loop uses a condition to check if the loop should continue.
fn main(){
let mut number = 3;
while number != 0{
println!("{}!", number);
number -= 1;
}
println!("LIFTOFF");
}
A for loop can iterate through a collection or run a specific number of times.
fn main(){
let a = [10, 20, 30, 40, 50];
for element in a{
println!("The value is {}", element);
}
for number in (1..4).rev(){
println!("{}", number);
}
}
Rust has no garbage collection but works on a basis of ownership.
The stack is last in, first out (LIFO). Data is pushed onto the stack, and popping removes it from the stack.
All data on the stack must have a known, fixed size. Data with an unknown size, or a size that might change, must be put on the heap. When data is allocated on the heap, the memory allocator finds a block of memory of the requested size and returns a pointer to that memory location.
Pushing to the stack is much faster than allocating to the heap.
Accessing data in the heap is much slower than accessing data on the stack.
When calling a function, values passed into the function and the function's local variables get pushed onto the stack. When the function is over, those data get popped off the stack. This is of course an oversimplification as this is optimised using registers and other methods within the compiler; but this is out of the scope of this sheet.
- Each value in Rust has a variable called its owner
- There can only be one owner at a time
- When the owner goes out of scope, the value will be dropped
A scope is the range within a program for which an item is valid.
#![allow(unused)]
fn main(){
let s = "hello";
// s is valid here until }
}
The variable s
refers to a string literal. The variable is valid from where it is declared to the end of the current scope.
To create a mutable string requires the following declaration.
fn main(){
let mut s = String::from("hello");
s.push_str(", world!"); // push_str() appends to the string
println!("{}", s);
}
Each string contains 3 pieces of data: a pointer to the character array, the length and the capacity.
For let s1 = String::from("hello");
, the data would be:
Name | Value |
---|---|
ptr | 0x... |
len | 5 |
capacity | 5 |
This data is stored on the stack.
The character array at 0x...
would appear as:
Index | Value |
---|---|
0 | h |
1 | e |
2 | l |
3 | l |
4 | o |
This is stored on the heap.
As the size of string literals is known at compile time the memory can be allocated and the data can be hard coded into the executable.
As the size of a mutable string is unknown at compile time, memory must be allocated onto the heap. Therefore
- The memory must be requested from the OS at runtime
- The memory must be returned to the OS when the program is finished using it
The first part is done automatically when the string is created. This is common across many programming languages.
The second is more difficult. In languages with a garbage collector this is done automatically. However Rust has no garbage collector. In languages without a garbage collector this is done manually and for every allocation a free
is required to prevent excess memory use or premature memory de-allocation. Rust automatically de-allocates memory when it goes out of scope. In Rust, the command for memory de-allocation is drop
and this is often done automatically at the end of a scope.
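To see when values are dropped, a type can implement the `Drop` trait and record each call. The sketch below is illustrative only: `Noisy` and `drop_order` are made-up names, and the `RefCell` is there purely so that the drop calls can write to a shared log.

```rust
use std::cell::RefCell;

// Hypothetical type that records its own drop in a shared log.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Returns the events in the order they happened.
fn drop_order() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _a = Noisy { name: "a", log: &log };
        {
            let _b = Noisy { name: "b", log: &log };
            log.borrow_mut().push(String::from("end of inner scope"));
        } // _b goes out of scope here and is dropped
        log.borrow_mut().push(String::from("end of outer scope"));
    } // _a goes out of scope here and is dropped
    log.into_inner()
}

fn main() {
    for event in drop_order() {
        println!("{}", event);
    }
}
```

Each value is dropped at the closing brace of the scope that owns it, without any explicit call.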
Multiple variables can interact with the same data.
For primitive data types, if a variable is assigned to another variable then the value is simply copied to the second one. This is because these types implement the Copy
trait.
let x = 5;
let y = x;
However, for non-primitive types (for example, strings), this is not the case.
let s1 = String::from("hello");
let s2 = s1;
When we assign s1
to s2
the string data on the stack (ptr
, len
, capacity
) are copied to s2
but the data on the heap is not copied. This makes it more efficient as the whole character array is not recreated.
This can cause a problem if both strings go out of scope at once. Both s1
and s2
will try to free the same memory on the heap (using drop
); this is known as a double free error. This is a memory safety bug and can lead to memory corruption and/or security vulnerabilities.
Rust avoids this issue by classing s1
to be no longer valid; therefore it doesn't need to be freed once it is out of scope.
For example, if you try to use s1
after s2
has been initialised, the compiler will throw an error.
let s1 = String::from("hello");
let s2 = s1;
println!("{}, world!", s1);
error[E0382]: use of moved value: `s1`
--> src/main.rs:5:28
|
3 | let s2 = s1;
| -- value moved here
4 |
5 | println!("{}, world!", s1);
| ^^ value used here after move
|
= note: move occurs because `s1` has type `std::string::String`, which does not implement the `Copy` trait
This is effectively a shallow copy, but because s1
is invalidated, it is called a move.
If we do want to create a deep copy of an object, we can use the clone method.
let s1 = String::from("hello");
let s2 = s1.clone();
println!("s1 = {}, s2 = {}", s1, s2);
This means the heap data is being copied, so cloning is less efficient and more expensive than a move.
The Copy
trait can be added to types which are stored entirely on the stack (i.e. fixed size). Copy
cannot be implemented if the type or any of its parts implement Drop
.
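As an illustration, a struct whose fields are all `Copy` can derive the trait itself. The `Point` type and the `copy_then_use` helper are assumptions made for this sketch.

```rust
// Hypothetical type made only of Copy fields, so it can derive Copy.
// Clone must be derived as well, because Copy requires it.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Assigning `p` to `q` copies it, so `p` is still usable afterwards.
fn copy_then_use(p: Point) -> (Point, Point) {
    let q = p;
    (p, q)
}

fn main() {
    let (first, second) = copy_then_use(Point { x: 1, y: 2 });
    println!("{:?} and {:?}", first, second);

    // A struct containing a String could not derive Copy,
    // because String owns heap data and is not Copy itself.
}
```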
The semantics of passing a value to a function are similar to those of assigning a value to a variable. Passing a variable to a function will move or copy it, the same as assignment.
fn main(){
let s = String::from("hello"); // s comes into scope
take_ownership(s); // s moves into the function and is no longer valid in this scope
let x = 5; // x comes into scope
makes_copy(x); // x will be copied into the function as it is an i32 and so it
// can still be used in this scope
} // x goes out of scope
fn take_ownership(some_string: String){ // some_string comes into scope
println!("{}", some_string);
} // some_string goes out of scope and `drop` is called. This frees the memory of some_string
fn makes_copy(some_int: i32){ // some_int comes into scope
println!("{}", some_int);
} // some_int goes out of scope
Returning values can also transfer ownership.
fn main(){
let s1 = gives_ownership(); // The return value is moved into `s1`
let s2 = String::from("hello"); // `s2` comes into scope
let s3 = takes_and_gives_back_ownership(s2); // `s2` is moved into the function and then the return value is moved into `s3`
}
fn gives_ownership() -> String { // Moves its return value into the calling function
let s = String::from("hello");
s
}
fn takes_and_gives_back_ownership(s: String) -> String { // `s` comes into scope
s // `s` is returned to the calling function
}
Taking ownership of a variable follows the same pattern every time.
Taking and returning ownership of a variable with every function can be quite tedious so references can be used to prevent this.
References can be used to access variables without moving them. This allows them to be accessed in different scopes.
fn main(){
let s1 = String::from("hello");
let len = calc_len(&s1);
println!("The length of {} is {}", s1, len);
}
fn calc_len(s: &String) -> usize{
s.len()
} // `s` goes out of scope but as `calc_len` doesn't own it, `s` isn't dropped
The ampersands denote a reference to the value; allowing for access without moving it. s
is simply a pointer to the parameter passed, in this case s
points to s1
.
To reverse this we can dereference a pointer by using the *
.
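A minimal sketch of dereferencing, using a hypothetical `doubled` helper that reads the integer behind a reference:

```rust
// Hypothetical helper: `*n` follows the reference to the i32 it points to.
fn doubled(n: &i32) -> i32 {
    *n * 2
}

fn main() {
    let x = 21;
    let r = &x; // r is a reference to x
    assert_eq!(*r, 21); // dereferencing gives the value of x
    println!("{}", doubled(r));
}
```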
Having references as parameters is called borrowing; we do not need to return the value at the end of the function to use it.
If we try to edit a borrowed reference it will throw an error as the references are immutable. To create a mutable reference we need to use &mut
instead of a single ampersand.
fn main(){
let mut s1 = String::from("hello");
change(&mut s1);
println!("{}", s1);
}
fn change(s: &mut String){
s.push_str(", world!");
}
There is a limit with mutable references. Only one mutable reference to a variable can exist within a single scope at one time. This prevents data races.
This issue also arises if immutable references are combined with mutable references in a single scope.
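The following sketch shows borrows that the compiler accepts, with the rejected combination left commented out. The function name `build_greeting` is made up for the example.

```rust
// Hypothetical helper that builds a string through several borrows.
fn build_greeting() -> String {
    let mut s = String::from("hello");

    {
        let r1 = &mut s;
        r1.push_str(", world");
    } // r1 goes out of scope here, so another mutable reference is allowed

    let r2 = &mut s;
    r2.push('!');
    // r2 is not used again, so immutable references can now be taken.

    let r3 = &s;
    let r4 = &s; // any number of immutable references may coexist
    println!("{} / {}", r3, r4);

    // Mixing in a mutable reference while r3 is still in use is rejected:
    // let r5 = &mut s;
    // println!("{} {}", r3, r5); // error[E0502]

    s
}

fn main() {
    println!("{}", build_greeting());
}
```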
Unlike other languages, the Rust compiler will prevent dangling references.
A slice
is another data type without ownership. Slices let you reference a contiguous sequence of elements in a collection rather than the whole collection.
A string slice is simply a reference to a substring.
let s = String::from("hello world");
let hello = &s[0..5];
let world = &s[6..11];
Note: the lower bound is inclusive and the upper bound is exclusive
This is stored internally as the starting index and then the length of the slice.
String literals are stored as slices that point into the program's binary. They are immutable because a string slice is an immutable reference.
Other slices can be created from other contiguous collections (for example vectors and arrays) using the same syntax.
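A short sketch of both kinds of slice. `first_word` is a helper written for this example: it returns a string slice that borrows from its input, and `main` then takes a slice of an array with the same range syntax.

```rust
// Returns the part of `s` before the first space as a slice of the input,
// or the whole input if it contains no space.
fn first_word(s: &str) -> &str {
    match s.find(' ') {
        Some(i) => &s[..i],
        None => s,
    }
}

fn main() {
    let s = String::from("hello world");
    println!("{}", first_word(&s));

    let a = [1, 2, 3, 4, 5];
    let middle = &a[1..3]; // type &[i32]: the elements at index 1 and 2
    assert_eq!(middle, &[2, 3]);
}
```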
A struct
is like a tuple but each element has a name and the order is not important.
struct User {
username: String,
password: String,
age: u16,
loggedIn: bool,
}
After the struct has been defined, new instances of it can be created with concrete values.
let user1 = User {
loggedIn: false,
username: String::from("name"),
password: getHash("name"), // getHash is a hypothetical helper that returns a String
age: 10,
};
Elements of a struct can be accessed and changed similar to other languages. Values can only be changed if the instance is declared as mutable.
fn main(){
let mut user = User {
loggedIn: false,
username: String::from("name"),
password: getHash("name"), // getHash is a hypothetical helper that returns a String
age: 10,
};
user.age = 11;
println!("Username: {}", user.username);
}
A function can be written to build a struct and return an instance of it.
fn make_user(username: String, age: u16, password: String) -> User {
User {
username,
password,
loggedIn: true,
age,
}
}
Note: this uses field init shorthand syntax which allows us to not repeat field names if they are exactly the same as the parameter names. For example, username: username,
becomes username,
Struct update syntax (..) is useful when you want to make a new struct based on an old one but with some values changed.
fn main(){
let user1 = User {
loggedIn: false,
username: String::from("user1"),
password: getHash("user1"), // getHash is a hypothetical helper that returns a String
age: 10,
};
let user2 = User {
username: String::from("user2"),
password: getHash("user2"),
..user1
};
}
This syntax will auto-fill any fields that have not been redefined with their values from the older struct.
Tuple structs look like normal tuples but benefit from the name of the struct.
struct Colour(i32, i32, i32);
struct Point(i32, i32, i32);
let black = Colour(0, 0, 0);
let origin = Point(0, 0, 0);
Point
and Colour
are still different structs even though they have the same fields. Other than this, these structs behave like tuples.
Unit-like structs have no fields (hence the name; they are similar to the unit type - ()
). These can be useful if you want to implement a trait but have no data for it.
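A sketch of a unit-like struct implementing a trait. The trait `Greeter` and the struct `English` are invented for illustration and are not part of any library.

```rust
// Hypothetical trait used only for this example.
trait Greeter {
    fn greet(&self) -> String;
}

// A unit-like struct: it has no fields and stores no data.
struct English;

impl Greeter for English {
    fn greet(&self) -> String {
        String::from("Hello!")
    }
}

fn main() {
    let greeter = English; // no braces or parentheses are needed
    println!("{}", greeter.greet());
}
```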
Methods are functions defined within the context of a struct
. Their first parameter is always self
which references the instance of the struct itself upon which the method will act.
To add a method to a struct we first need an impl
(implementation) block for the struct. Inside this block, define the method in the same way as a function, making sure the first parameter is self, written as &self or, when the method changes the instance, as &mut self
.
impl User {
fn logIn(&mut self){
self.loggedIn = true;
}
}
The method can then be run using typical dot notation.
user1.logIn(); // The user will then be logged in
Note: unlike C++, Rust has automatic referencing and dereferencing when calling methods. Therefore there is no requirement for pointer and instance operators (user1.logIn()
vs user1_ptr->logIn()
).
Associated functions are defined inside the impl
block but do not take self
as a parameter. They are often used for constructors which will return a new instance of a struct.
struct Rectangle {
height: i32,
width: i32,
}
impl Rectangle {
fn area(&self) -> i32 {
self.height * self.width
}
fn square(dimension: i32) -> Rectangle {
Rectangle {
height: dimension,
width: dimension,
}
}
}
let myRectangle = Rectangle {
height: 10,
width: 1000,
};
println!("The area of the rectangle is {}", myRectangle.area());
let mySquare = Rectangle::square(917);
println!("The area of the square is {}", mySquare.area());
Associated functions are called using a double colon (::
).
An enumeration (A.K.A. enum
) allows you to create a type by defining all of its possible values. An instance of an enum
can only take one value at a time.
For example, IP addresses (for now) can only be of type v4 or v6, so the following enumeration would be appropriate.
enum IpAddressVersion {
V4,
V6,
}
IpAddressVersion
is now a custom type we can use throughout the scope.
Instances of the enum
can then be defined and used. Double colon notation is used to select the variant.
fn route(ip_version: IpAddressVersion) { // the parameter name is chosen for illustration
// ...
}
enum IpAddressVersion {
V4,
V6,
}
let version4 = IpAddressVersion::V4;
let version6 = IpAddressVersion::V6;
route(version4);
route(version6);
Data can also be inserted directly into the enum
. This attaches the data to the enum
value.
enum IpAddress {
V4(u8, u8, u8, u8),
V6(String),
}
let v4 = IpAddress::V4(127, 0, 0, 1);
let v6 = IpAddress::V6(String::from("::1"));
This can be beneficial over a struct
because each value can have a different type of data attached to it.
Methods can also be defined on an enum
using an impl
block. These methods also take self
or references to self
as the first parameter (unless it's an associated function) and can be called using the dot notation (enumExample.method()
).
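For instance, a method could be added to an `IpAddress` enum like the one defined earlier. The method name `to_text` is an assumption for this sketch, and it relies on `match`, which these notes cover in a later section.

```rust
enum IpAddress {
    V4(u8, u8, u8, u8),
    V6(String),
}

impl IpAddress {
    // Hypothetical method: renders the address as text.
    fn to_text(&self) -> String {
        match self {
            IpAddress::V4(a, b, c, d) => format!("{}.{}.{}.{}", a, b, c, d),
            IpAddress::V6(text) => text.clone(),
        }
    }
}

fn main() {
    let home = IpAddress::V4(127, 0, 0, 1);
    let loopback = IpAddress::V6(String::from("::1"));
    println!("{} and {}", home.to_text(), loopback.to_text());
}
```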
Option is an enum
defined in the standard library. It is used very often as it encodes the common scenario of a value being something or nothing. This means the compiler can check that all possible values have been handled.
Due to Rust's lack of a null
value, the Option<T>
enum can be used to create the same effect of a value being absent or not.
The Option
enum is defined in the standard library as such.
enum Option<T> {
Some(T),
None,
}
Note: the Option
enum is included in the prelude.
Option<T> and T are different types, so any Option<T>
enum must be converted to a T
when being used in situations where T
is required. This helps to catch the case where the value of T
is assumed to be non-null
when it is null
.
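A small sketch of that conversion: an `i32` cannot be added to an `Option<i32>` directly, so the absent case has to be handled first. The helper `add_optional` is hypothetical and chooses to treat `None` as 0.

```rust
// Hypothetical helper: adds an optional value to a plain one.
fn add_optional(x: i32, y: Option<i32>) -> i32 {
    // `x + y` would not compile, because i32 and Option<i32> are different types.
    // unwrap_or turns the Option<i32> into an i32, using 0 when it is None.
    x + y.unwrap_or(0)
}

fn main() {
    println!("{}", add_optional(5, Some(3)));
    println!("{}", add_optional(5, None));
}
```

Treating `None` as 0 is a design choice for this example; other code might return early or report an error instead.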
The match
operator compares a value against a series of patterns and then runs code respective to which pattern matches.
Patterns can be made up of literal values, variable names, wildcards, etc.
enum Coin {
Penny,
Nickel,
Dime,
Quarter
}
fn coinToVal(coin: Coin) -> u8 {
match coin {
Coin::Penny => 1,
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter => 25,
}
}
Each part inside the match
block is called an arm. Each arm is divided into a pattern, =>
, and then the code to run.
If the value matches the pattern, then the respective code is run. If not, the next arm is checked.
Matches must be exhaustive over the whole enum
for the code to compile.
Match arms can also bind to the parts of the values that match the pattern.
#[derive(Debug)]
enum UsState {
Alabama,
Alaska,
...
}
enum Coin {
Penny,
Nickel,
Dime,
Quarter(UsState),
}
We can then use a match
to retrieve the UsState
value of any quarter.
fn coinToVal(coin: Coin) -> u8 {
match coin {
Coin::Penny => 1,
Coin::Nickel => 5,
Coin::Dime => 10,
Coin::Quarter(state) => {
println!("Quarter from {:?}", state);
25
},
}
}
This can be very useful to safely run code on a value that has the possibility of being None
.
fn addOne(x: Option<i32>) -> Option<i32> {
match x {
None => None,
Some(x) => Some(x+1),
}
}
let five = Some(5);
let six = addOne(five);
let none = addOne(None);
Rust has the pattern _
which allows us to match to anything. This can be useful if you only care about a small range of the possible values of the data type.
For example, if the data is a u8
but we only care about the values 1 to 5, we can use _
in the match.
let someVal = 0u8;
match someVal {
1 => println!("one"),
2 => println!("two"),
3 => println!("three"),
4 => println!("four"),
5 => println!("five"),
_ => (),
}
The if let syntax is useful when a match
would ignore all but one of the values.
For example, if we only cared about the value 3
, we could replace the match
with an if let
.
let someVal = Some(3u8);
match someVal {
Some(3) => println!("three"),
_ => println!("Not a three"),
}
// Can be replaced with
if let Some(3) = someVal {
println!("three");
} else {
println!("Not a three!");
}
An else
clause can also be attached to run code for any pattern that doesn't match the if let
clause.
Rust and Cargo have many features to help manage larger projects.
Packages: A Cargo feature that lets you build, test and share crates
Crates: A tree of modules that produce a library or an executable
Modules: Let you control the organisation, privacy and scope of paths
Paths: A way of naming items
The crate root is a source file that the Rust compiler starts from and makes up the root module of the project.
A package is one or more crates that provides functionality. A package contains a Cargo.toml
file which describes how to build the package. A package can contain at most one library crate and as many binary crates as desired, but it must contain at least one crate.
Note: all the functionality of a crate is defined within the crate's namespace
Modules allow us to organise the code within crates for ease of reuse and for better readability. Modules also define the privacy of items (whether an item can be used by outside code (public
) or not (private
)).
We will write a library crate to help model a restaurant. It will have both front and back of house methods.
First we must run cargo new --lib restaurant
, then put the following code into src/lib.rs
.
mod front_of_house {
mod hosting {
fn add_to_waitlist(){}
fn seat_at_table(){}
}
mod serving {
fn take_order(){}
fn serve_order(){}
fn take_payment(){}
}
}
Modules are defined using the mod
keyword followed by the name of the module.
Rust supports both absolute and relative paths to modules inside of a crate. Absolute paths start from the crate root by using a crate name or a literal crate. A relative path starts from the current module and uses self
, super
or an identifier inside the current module. A relative path that starts with super
begins in the parent module (akin to ../
in filesystem paths).
Both types of paths are followed by one or more identifiers separated by double colons (::
).
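A sketch of a relative path that uses `super`. The module `back_of_house` and the function names are assumptions made for this example, in the spirit of the restaurant theme.

```rust
// Defined in the parent module of back_of_house.
fn deliver_order() -> &'static str {
    "delivered"
}

mod back_of_house {
    pub fn fix_incorrect_order() -> String {
        // A plain identifier is a relative path within the current module.
        let cooked = cook_order();
        // `super` starts the path in the parent module, like ../ in a filesystem.
        let delivered = super::deliver_order();
        format!("{} and {}", cooked, delivered)
    }

    fn cook_order() -> &'static str {
        "cooked"
    }
}

fn main() {
    println!("{}", back_of_house::fix_incorrect_order());
}
```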
We can now expand the restaurant example from above.
mod front_of_house {
mod hosting {
fn add_to_waitlist(){}
fn seat_at_table(){}
}
mod serving {
fn take_order(){}
fn serve_order(){}
fn take_payment(){}
}
}
pub fn eat_at_restaurant(){
// Absolute path
crate::front_of_house::hosting::add_to_waitlist();
// Relative path
front_of_house::hosting::seat_at_table();
}
The function eat_at_restaurant()
is now defined within the crate's root. It uses the pub
keyword to expose it to the crate's public API.
However, this code will fail to compile due to Rust's privacy boundaries. In Rust, everything is private by default. Items in a parent module cannot use the private items inside child modules, but items in child modules can use the items in their ancestor modules.
For our example to compile, we need to add pub
to both hosting
and the methods inside of it. We do not need to add pub
to front_of_house
because it is defined inside the same module as eat_at_restaurant()
.
The process of making structs and enums public is similar to that of methods but with some extra details.
If we use pub
before the struct definition, the struct will be public but the fields will still be private. We can then denote which fields inside the struct should be public. If one or more of the fields are still private, then an associated function constructor must exist otherwise no instances of the struct could ever be created.
If you add pub
before the definition of an enum, all values inside the enum will be public.
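The sketch below illustrates both rules with made-up names (`Breakfast`, `summer`, `fruit`, `Appetizer`, `order_breakfast`): a public struct with one private field that needs a constructor, and a public enum whose variants are all usable from outside the module.

```rust
mod back_of_house {
    pub struct Breakfast {
        pub toast: String,      // public: readable and writable from outside
        seasonal_fruit: String, // private: only code in this module can touch it
    }

    impl Breakfast {
        // Constructor needed because outside code cannot set the private field.
        pub fn summer(toast: &str) -> Breakfast {
            Breakfast {
                toast: String::from(toast),
                seasonal_fruit: String::from("peaches"),
            }
        }

        // Read-only access to the private field.
        pub fn fruit(&self) -> &str {
            &self.seasonal_fruit
        }
    }

    // Marking the enum pub makes every variant public.
    #[derive(Debug, PartialEq)]
    pub enum Appetizer {
        Soup,
        Salad,
    }
}

fn order_breakfast() -> String {
    let mut meal = back_of_house::Breakfast::summer("Rye");
    meal.toast = String::from("Wheat"); // allowed: toast is public
    // meal.seasonal_fruit = String::from("kiwi"); // error: field is private
    format!("{} toast with {}", meal.toast, meal.fruit())
}

fn main() {
    println!("{}", order_breakfast());
    println!("{:?}", back_of_house::Appetizer::Soup);
}
```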
We can bring a path into scope with the use
keyword and then use items inside of it as if they're local items. This means we don't have to write the whole path every time we want to use an item.
This can help simplify our restaurant example.
mod front_of_house {
pub mod hosting {
pub fn add_to_waitlist(){}
pub fn seat_at_table(){}
}
mod serving {
fn take_order(){}
fn serve_order(){}
fn take_payment(){}
}
}
use crate::front_of_house::hosting;
pub fn eat_at_restaurant(){
hosting::add_to_waitlist();
hosting::seat_at_table();
}
We can also provide these modules new names using the as
keyword.
use std::io::Result as IoResult;
These names can be re-exported to allow external code to call items brought into scope. To do this, we can use pub use
.
To use an external package, first it must be declared inside the Cargo.toml
file.
For example, if rand
was required, Cargo.toml
would need to include the following.
[dependencies]
rand = "0.5.5"
This tells Cargo to download the rand
package (version 0.5.5) and make it available in the project.
Then to bring it into scope, we need to include a use
line starting with the name of the dependency.
If multiple items from the same package are required, we can use a nested path to save space.
use std::io;
use std::cmp::Ordering;
// Can be replaced with
use std::{io, cmp::Ordering};
self
can also be used in nested paths.
use std::io;
use std::io::Write;
// Can be replaced with
use std::io::{self, Write};
We can bring all public items defined in a path into scope using the glob
operator, *
.
use std::collections::*;
When modules get large, they can be separated into different files. This makes the code easier to navigate.
For example, if the front_of_house
module was defined in src/front_of_house.rs
, the root file (src/lib.rs
or src/main.rs
) would have to include the following lines.
mod front_of_house;
Using a semi-colon instead of a block in curly brackets after the mod
name tells Rust to load the contents from another file with the same name as the module.
This can also be used if the nested modules are moved into directories. If we moved the hosting
module into the directory src/front_of_house/
, we can still use pub use crate::front_of_house::hosting;
.
Collections can contain multiple values; unlike the built-in array and tuple types, these are stored on the heap. This means the amount of data does not need to be known at compile time and can resize during runtime.
A vector allows you to store a variable number of values next to each other.
A string is a collection of characters.
A hash map allows you to associate a value with a particular key.
There are other collections included in the standard library; they are documented under std::collections.
Vectors allow you to store more than one value in a single contiguous data structure. Vectors can only store values of one type and are useful for lists of items.
A new, empty vector can be created using Vec::new()
. When creating an empty vector, a type annotation is required unless the compiler can infer the element type from how the vector is used later.
let v: Vec<i32> = Vec::new();
If the vector is being created with initial values, the type can be inferred and type annotation becomes unnecessary. The vec
macro can be useful for this.
let v = vec![1, 2, 3];
To add elements to a vector, we use push
.
let mut v: Vec<i32> = Vec::new();
v.push(1);
v.push(2);
v.push(3);
v.push(4);
v.push(5);
Note: when a vector goes out of scope it is dropped along with its contents
There are two ways of retrieving a value from a vector.
Method 1 uses indexing syntax (&v[index]), which gives us a reference and panics if the index is out of bounds.
Method 2 uses the .get(index)
function and a match
which gives us an Option<&T>
.
let v = vec![1, 2, 3, 4, 5];
let third1: &i32 = &v[2];
println!("The third element is {}", third1);
match v.get(2) {
Some(third2) => println!("The third element is {}", third2),
None => println!("There is no third element"),
}
Note: vectors are zero-indexed
Iteration over a vector allows us to access each element successively.
One way to do this is with a for
loop to get an immutable reference to each element.
let v = vec![1, 2, 3, 4, 5];
for i in &v {
println!("{}", i);
}
We can also iterate over mutable references.
let mut v = vec![1, 2, 3, 4, 5];
for i in &mut v {
let x = *i + 10;
println!("{}", x);
}
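Iterating over mutable references is mostly useful for changing the elements in place. A minimal sketch, where `add_to_all` is a hypothetical helper:

```rust
// Hypothetical helper: adds `amount` to every element and returns the vector.
fn add_to_all(mut values: Vec<i32>, amount: i32) -> Vec<i32> {
    for i in &mut values {
        // `i` is a &mut i32, so it must be dereferenced to change the element.
        *i += amount;
    }
    values
}

fn main() {
    let v = add_to_all(vec![1, 2, 3, 4, 5], 10);
    println!("{:?}", v);
}
```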
A trick to storing different types inside one vector is to use an enum. As all of the vector elements will be of the enum type, this is valid.
enum SpreadsheetCell {
Int(i64),
Float(f64),
Text(String),
}
let row = vec![
SpreadsheetCell::Int(3),
SpreadsheetCell::Float(10.12),
SpreadsheetCell::Text(String::from("cell"))];
Strings are implemented as a collection of bytes alongside some methods to provide functionality when those bytes are interpreted as text.
Rust has only one string type in its core language, which is the string slice str
. It can only be handled behind a pointer, so is most commonly seen as &str
. String literals are stored in the program's binary and therefore are also string slices.
The String
type, provided in the standard library, is a growable, mutable, owned, UTF-8 encoded string type.
Note: both str
and String
are UTF-8 encoded.
Rust's standard library includes a number of other string types, including:
OsString
OsStr
CString
CStr
The *String
and *Str
variants are the owned and borrowed types respectively.
Other library crates can provide even more string types.
Many of the same operations available for Vec<T>
are also available for String
.
To create a new String
the new
function can be used.
let mut s = String::new();
This creates a new, empty string s
which we can load data into.
We can also create a String
from a str
using the to_string()
method. This works because str implements the Display
trait.
let data = "initial value"; // String literal
let s = data.to_string(); // String type
let s = "new value".to_string(); // Also works on the literal directly
We can also use the from("...")
method to create a String
from a string literal.
let s = String::from("Hello, World!"); // Equivalent to the above code
As strings are UTF-8 encoded, they can represent text in many different languages.
A string can grow in size and its contents can change (like a Vec<T>
). Either the +
operator or the format!
macro can be used to concatenate String
values.
The +
operator calls a function with the signature fn add(self, s: &str) -> String
.
let s1 = String::from("Hello, ");
let s2 = String::from("World!");
let s3 = s1 + &s2; // s1 has now been moved to s3
Note: Rust uses deref coercion to allow us to pass a &String
instead of &str
.
If we need to concatenate multiple values, the use of +
can become unwieldy. Instead we can use the format!
macro.
let s1 = String::from("tic");
let s2 = String::from("tac");
let s3 = String::from("toe");
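let s = s1 + "-" + &s2 + "-" + &s3; // moves s1, so s1 cannot be used afterwards
// Can be replaced with the line below (instead of the line above), which only borrows s1, s2 and s3:
// let s = format!("{}-{}-{}", s1, s2, s3);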
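One practical difference between the two approaches is ownership: + consumes its left-hand String, while format! only borrows its arguments. The sketch below illustrates this with a hypothetical join_with_dashes function.

```rust
/// Joins three string slices with dashes.
/// `join_with_dashes` is a hypothetical helper used only for this illustration.
fn join_with_dashes(a: &str, b: &str, c: &str) -> String {
    // format! borrows its arguments, so the callers keep ownership.
    format!("{}-{}-{}", a, b, c)
}

fn main() {
    let s1 = String::from("tic");
    let s2 = String::from("tac");
    let s3 = String::from("toe");

    let joined = join_with_dashes(&s1, &s2, &s3);
    println!("{}", joined); // prints tic-tac-toe

    // All three originals are still usable because nothing was moved.
    println!("{} {} {}", s1, s2, s3);
}
```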
To append to the end of a string we can use push_str
or push
if only a character is being appended.
let mut s = String::from("Hello, ");
s.push_str("World!");
let mut s2 = String::from("lo");
s2.push('l');
In many other programming languages individual characters in a string can be accessed simply by referencing them by index. In Rust, this will cause a compile-time error.
let s = String::from("Hello, World!");
let h = s[0];
The above code will produce the following output.
error[E0277]: the trait bound `std::string::String: std::ops::Index<{integer}>` is not satisfied
-->
|
3 | let h = s[0];
| ^^^^ the type `std::string::String` cannot be indexed by `{integer}`
|
= help: the trait `std::ops::Index<{integer}>` is not implemented for `std::string::String`
To understand why this doesn't work, we must understand the internal representation of a string.
A String
is a wrapper over a Vec<u8>
.
For single-byte characters, the length of the string in bytes is equal to the number of characters; therefore the memory index of each character is simply the position at which it appears in the string (zero-indexed). However, some characters in UTF-8 are not single bytes (one example is the Cyrillic alphabet, where each character is 2 bytes).
This makes it clear why simple indexed character retrieval in a string is not possible in Rust. If we had the string let s = "Здравствуйте";
what should s[0]
return? The character 'З' is made up of the bytes 208 and 151. So s[0]
would return 208, but this alone is not a valid character in UTF-8.
For UTF-8, there are three relevant ways for Rust to look at strings.
For the Hindi string "नमस्ते", it is stored as a Vec<u8>
that looks like the following.
[224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164,
224, 165, 135]
This is just 18 bytes, and this is how the computer stores the data.
If we view them as Unicode scalar values (which is what Rust's char
type is), they make the following array.
['न', 'म', 'स', '्', 'त', 'े']
This appears to be 6 characters, except the fourth and sixth characters aren't actually characters - they're diacritics.
If we view the data as grapheme clusters, we'd get what a person would call 4 letters.
["न", "म", "स्", "ते"]
This means each program can choose the interpretation of a string that it needs.
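The first two views can be checked with the standard library alone. The sketch below uses two hypothetical helpers, byte_len and char_count; grapheme clusters are left out because the standard library does not provide them.

```rust
/// Number of bytes used to store the string.
/// `byte_len` is a hypothetical helper used only for this illustration.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values (`char`s) in the string.
/// `char_count` is a hypothetical helper used only for this illustration.
fn char_count(s: &str) -> usize {
    // This walks the whole string, so it is O(n) rather than O(1).
    s.chars().count()
}

fn main() {
    let hindi = "नमस्ते";

    // View 1: the raw bytes.
    println!("{:?}", hindi.bytes().collect::<Vec<u8>>());
    println!("bytes: {}", byte_len(hindi));

    // View 2: Unicode scalar values.
    println!("{:?}", hindi.chars().collect::<Vec<char>>());
    println!("chars: {}", char_count(hindi));
}
```

For this string the two counts are 18 bytes and 6 scalar values, matching the arrays shown above.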
A final reason why indexing into strings is not allowed is that indexing operations are expected to take O(1)
time. But this cannot be guaranteed with a String
, because Rust would have to walk through the contents from the beginning up to the index to determine how many valid characters there were.
Because indexing into a string could return several types (bytes, characters, grapheme clusters or a string slice), Rust requires you to be more specific when using indexes.
To do this, you must specify that you want a string slice by providing a range of indexes.
let s = "Здравствуйте";
let ss = &s[0..4];
This is perfectly valid syntax to retrieve a string slice.
Here ss
will be a &str
; as each character is 2 bytes (in this example), ss
will hold the characters Зд
.
If we tried to pass the indexes [0..1]
, Rust would panic at runtime.
thread 'main' panicked at 'byte index 1 is not a char boundary; it is inside `З` (bytes 0..2) of `Здравствуйте`', src/libcore/str/mod.rs:2188:4
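When the byte range comes from untrusted input, the panic can be avoided with the get method on str, which returns None if the range does not fall on character boundaries. This is a minimal sketch; safe_slice is a hypothetical wrapper name.

```rust
/// Returns the slice `s[start..end]`, or `None` if the range is out of
/// bounds or does not lie on UTF-8 character boundaries.
/// `safe_slice` is a hypothetical helper used only for this illustration.
fn safe_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    s.get(start..end)
}

fn main() {
    let s = "Здравствуйте";

    // Bytes 0..4 cover exactly two 2-byte characters.
    println!("{:?}", safe_slice(s, 0, 4));

    // Byte 1 is in the middle of the first character, so this is None
    // instead of a panic.
    println!("{:?}", safe_slice(s, 0, 1));

    // is_char_boundary can be used to check an index up front.
    println!("{}", s.is_char_boundary(1));
    println!("{}", s.is_char_boundary(2));
}
```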
If you need to perform operations on individual Unicode scalar values, then you can use the .chars()
method.
for c in "नमस्ते".chars() {
println!("{}", c);
}
This code outputs the following. (The two diacritics are printed on their own lines and may render oddly in isolation.)
न
म
स
्
त
े
The .bytes()
method returns each raw byte of the string.
Getting grapheme clusters is more complicated but crates for it are available.
A hash map is a type of content-addressable memory where the data itself is (or is used to derive) the key. This means we can achieve average-case O(1) searching, inserting and deleting.
A HashMap<K, V>
stores a mapping from keys of type K
to values of type V
.
It works by using a hashing function to place the keys and associated values into memory.
These can be useful if you want to refer to data not with a numerical index but with a key of any type.
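The idea of placing a key by its hash can be sketched as follows. This is a simplified illustration under stated assumptions, not how the standard library's HashMap is laid out internally: bucket_index is a hypothetical function, and reducing the hash with a modulo is only one possible way to pick a slot.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Maps a key to one of `bucket_count` slots by hashing it.
/// `bucket_index` is a hypothetical helper; `bucket_count` must be non-zero.
fn bucket_index<K: Hash + ?Sized>(key: &K, bucket_count: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    // Feed the key into the hasher, then read out a 64-bit hash value.
    key.hash(&mut hasher);
    // Reduce the hash to a slot number in the range 0..bucket_count.
    (hasher.finish() % bucket_count as u64) as usize
}

fn main() {
    // The same key always lands in the same slot, which is what makes
    // lookups fast: there is no need to scan every entry.
    println!("Blue   -> bucket {}", bucket_index("Blue", 8));
    println!("Yellow -> bucket {}", bucket_index("Yellow", 8));
}
```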
A hash map can be created using the new
method.
The below example creates a HashMap<String, i32>
for two teams, Blue and Yellow, which start with 10 and 50 points respectively.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
Hash maps store their data on the heap. And like vectors, they are homogeneous (all keys must be the same type and all the values must be the same type).
Another way of creating a hash map is by using the collect
method on an iterator of tuples, where each tuple consists of a key and a value. The collect
method gathers data into a number of collection types.
The below example creates the hash map for the Blue and Yellow teams, which are stored in two separate vectors. The zip
method creates an iterator of tuples from the iterators over the original two vectors.
use std::collections::HashMap;
let teams = vec![String::from("Blue"), String::from("Yellow")];
let scores = vec![10, 50];
let team_scores: HashMap<_, _> = teams.iter().zip(scores.iter()).collect();
Note: type annotation is required here as collect
can be used to create many different data structures. However, Rust can still infer the types of K
and V
so we can use _
in the annotation.
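Because iter yields references, the map built with teams.iter() and scores.iter() holds borrowed keys and values (&String and &i32). If the map should own its data, into_iter can be used instead. The sketch below assumes a hypothetical build_scores function.

```rust
use std::collections::HashMap;

/// Builds an owning map from two parallel vectors.
/// `build_scores` is a hypothetical helper used only for this illustration.
fn build_scores(teams: Vec<String>, scores: Vec<i32>) -> HashMap<String, i32> {
    // into_iter consumes both vectors, so the map owns the keys and values.
    // zip stops at the end of the shorter input.
    teams.into_iter().zip(scores).collect()
}

fn main() {
    let teams = vec![String::from("Blue"), String::from("Yellow")];
    let scores = vec![10, 50];

    let team_scores = build_scores(teams, scores);
    println!("{:?}", team_scores.get("Blue"));
    println!("{:?}", team_scores.get("Yellow"));
}
```

Here the return type of the function gives collect the information it needs, so no separate annotation is required.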
If a type implements the Copy
trait (e.g. i32
), the values are copied into the hash map. For types that don't implement the Copy
trait, the values are moved into the hash map and the hash map will be the owner.
We can retrieve a value from a hash map by passing a reference to its key to the get method.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
let team_name = String::from("Blue");
let score = scores.get(&team_name);
Here score
will have the value that is associated with the key Blue
. The result will be Some(&10)
.
The result is an Option<&V>
; if there is no value for the given key it will return None
.
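Since get returns an Option, the caller has to decide what a missing key means. One common choice, sketched below with the hypothetical helpers sample_scores and score_or_zero, is to fall back to a default value.

```rust
use std::collections::HashMap;

/// Builds the example map used throughout this section.
/// `sample_scores` is a hypothetical helper used only for this illustration.
fn sample_scores() -> HashMap<String, i32> {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);
    scores.insert(String::from("Yellow"), 50);
    scores
}

/// Looks up a team's score, treating an unknown team as 0.
/// `score_or_zero` is a hypothetical helper used only for this illustration.
fn score_or_zero(scores: &HashMap<String, i32>, team: &str) -> i32 {
    // get gives Option<&i32>; copied makes it Option<i32>;
    // unwrap_or supplies the fallback for None.
    scores.get(team).copied().unwrap_or(0)
}

fn main() {
    let scores = sample_scores();
    println!("{}", score_or_zero(&scores, "Blue")); // prints 10
    println!("{}", score_or_zero(&scores, "Red")); // prints 0
}
```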
We can also iterate over the key-value pairs.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
for (key, value) in &scores {
println!("{}, {}", key, value);
}
Each key can only be associated with one value at a time, so changing a hash map's data means deciding what to do when a key already has a value.
To overwrite a value, we simply use insert
again with the key whose value we want to overwrite and the new value.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.insert(String::from("Yellow"), 50);
scores.insert(String::from("Blue"), 108); // Replaces 10 with 108
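The insert method also reports what it replaced: it returns None if the key was new and Some(old_value) if a value was overwritten. The sketch below wraps this in a hypothetical overwrite_score function.

```rust
use std::collections::HashMap;

/// Sets a team's score and returns the previous score, if there was one.
/// `overwrite_score` is a hypothetical helper used only for this illustration.
fn overwrite_score(
    scores: &mut HashMap<String, i32>,
    team: &str,
    new_score: i32,
) -> Option<i32> {
    // insert returns the old value when the key was already present.
    scores.insert(team.to_string(), new_score)
}

fn main() {
    let mut scores = HashMap::new();

    // First insert: there was nothing to replace.
    println!("{:?}", overwrite_score(&mut scores, "Blue", 10)); // prints None

    // Second insert: the old value 10 is handed back.
    println!("{:?}", overwrite_score(&mut scores, "Blue", 108)); // prints Some(10)
}
```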
We can also write a new value, only if there was not a previous value associated with that key. Rust has a method for this called entry
. It takes the key as a parameter and returns an Entry
enum which represents a value that might or might not exist.
For example, we want to write the value 50
into Yellow
, only if there isn't a value already associated with the key Yellow
. And the same for the Blue team with the score 10
.
use std::collections::HashMap;
let mut scores = HashMap::new();
scores.insert(String::from("Blue"), 10);
scores.entry(String::from("Yellow")).or_insert(50);
scores.entry(String::from("Blue")).or_insert(10);
The or_insert
method on Entry
is defined to return a mutable reference to the value for the given key if it exists. If not, the value is inserted and a mutable reference to this new value is returned.
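The behaviour described above can be observed by reading back through the reference that or_insert returns. In this sketch, insert_if_absent is a hypothetical helper that returns whatever value ends up stored for the key.

```rust
use std::collections::HashMap;

/// Inserts `score` only if `team` has no score yet, and returns the score
/// that is stored afterwards.
/// `insert_if_absent` is a hypothetical helper used only for this illustration.
fn insert_if_absent(scores: &mut HashMap<String, i32>, team: &str, score: i32) -> i32 {
    // or_insert yields &mut i32 pointing at the existing or newly added value.
    *scores.entry(team.to_string()).or_insert(score)
}

fn main() {
    let mut scores = HashMap::new();
    scores.insert(String::from("Blue"), 10);

    // Yellow is missing, so 50 is inserted.
    println!("{}", insert_if_absent(&mut scores, "Yellow", 50)); // prints 50

    // Blue already has 10, so 99 is ignored.
    println!("{}", insert_if_absent(&mut scores, "Blue", 99)); // prints 10
}
```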
Another option is to update values depending on the value that is already there.
For example, if we want to count how many times a word occurs in a string, we can have the word as the key in the hash map and the count as the value.
use std::collections::HashMap;
let mut map = HashMap::new();
let text = String::from("hello world wonderful world");
for word in text.split_whitespace() {
let count = map.entry(word).or_insert(0);
*count += 1;
}
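The word-counting loop can be packaged as a function so that the result is easy to inspect. This is a sketch under the assumption that words are separated by whitespace only; count_words is a hypothetical name.

```rust
use std::collections::HashMap;

/// Counts how often each whitespace-separated word occurs in `text`.
/// `count_words` is a hypothetical helper used only for this illustration.
fn count_words(text: &str) -> HashMap<&str, usize> {
    let mut map = HashMap::new();
    for word in text.split_whitespace() {
        // Start at 0 for an unseen word, then increment through the
        // mutable reference returned by or_insert.
        let count = map.entry(word).or_insert(0);
        *count += 1;
    }
    map
}

fn main() {
    let counts = count_words("hello world wonderful world");

    // Hash maps have no defined iteration order, so sort before printing.
    let mut pairs: Vec<(&str, usize)> = counts.into_iter().collect();
    pairs.sort();
    for (word, count) in pairs {
        println!("{}: {}", word, count);
    }
}
```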