Protocol Buffers implementation with using Elixir
How to parse a binary 00001000_10010110_00000001
to %{1 => 150}
How to do by using Elixir
https://github.com/niku
Living in Sapporo, Hokkaido
Work at Farmnote
A Software and Hardware company to solve agriculutual issues particularly hard management
Enjoy sapporo-beam
A community to talk about ErlangVM
Almost every Thursday
Since 2014
BEAM bitstring and binary basis
Bitstring and binary in BEAM
description representation a kind of bitstring? a kind of binary?
bitstring a sequence of zero or more bits <<10::size(4)>>
Yes No
binary A bitstring its size is divisible by 8 <<30>>
**Yes** Yes
Type How to Result
Making a binary as you read binary **=:binary.encode_unsigned(0b10101100_00000010)=** <<172, 2>>
bitstring **=<<0::1, 1::1, 0::1, 1::1>>=** <<5::size(4)>>
Pattern matching binary **=<<x, rest::binary>>=** = <<1, 2, 3>>
x
is 1
, rest
is <<2, 3>>
bitstring **=<<x::1, rest::bitstring>>=** = <<1, 2, 3>>
x
is 0
, rest
is <<2, 4, 3::size(7)>>
Concatenation binary **=<<1>> <> <<127, 233>>=** <<1, 127, 233>>
bitstring a
= <<1::1, 0::1>>;
b
= <<0::1, 1::1>>;
**=<<a::bitstring, b::bitstring>>=**<<9::size(4)>>
Conversion to integer binary **=:binary.decode_unsigned(<<1, 255>>)=** 511
bitstring a
= <<1::1, 0::1>>;
size
= bit_size(a);
<<x::integer-size(size)>>
= a;
**=x=**2
A mechanism for serializing structured data like XML
simpler
3 to 10 times smaller
20 to 100 times faster
By default gRPC uses protocol buffers
StructuredData:
%MyMessage{a: 150}
Proto:
message MyMessage {
optional int32 a = 1;
}
Binary:
00001000_10010110_00000001
[Serialize] [Deserialize]
StructuredData Binary
\__Binary \__StructuredData
/ /
+-----+ +-----+
|Proto| |Proto|
+-----+ +-----+
00001000_10010110_00000001_......
msb 0 ^||||||| | |
key number ^^^^||| | |
value type ^^^ | |
msb 1 ^ |
msb 0 ^
drop msb 0010110 0000001
reverse 0000001 0010110
concatenate 000000_10010110
part +-key--+ +----value------+
kv pair +------------------------+
kv pair +--------+
message +-----------------------------------+
00001000_10010110_00000001_..........
msb 0 ^||||||| | |
key number ^^^^||| | |
+---+ value type ^^^ | |
| H | => msb 1 ^ |
| E | => msb 0 ^
| | => drop msb 0010110 0000001
| R | => reverse 0000001 0010110
| E | => concatenate 000000_10010110
+---+ part +-key--+ +----value------+
kv pair +------------------------+
kv pair +--------+
message +-----------------------------------+
A type of a PB’s value
Represents an integer
Has variable-length (1 ~ 16bytes)
MSB(Most Significant Bit)
To divide to a varint chunk
If msb is 1
, next byte is also a part of varint
If msb is 0
, next byte is the last byte of varint
10010110_00000001_...
| |
1. MSB is 1. So next byte is also a part of varint -+ |
2. MSB is 0. So this byte is the last byte of varint ----------+
Varint chunk:
10010110 00000001
10010110_10000001_00000011_...
| | |
1. MSB is 1. So next byte is also a part of varint -+ | |
2. MSB is 1. So next byte is also a part of varint ----------+ |
3. MSB is 0. So this byte is the last byte of varint -------------------+
Varint chunk:
10010110 10000001 00000011
Converting varint binary to integer
10010110_10000001_00000011_..........
10010110_10000001_00000011 1. Divide to a varint chunk by using msb
0010110 0000001 0000011 2. Drop msbs
0000011 0000001 0010110 3. Reverse order by each 7bit
00000_11000000_10010110 4. Concatenate bitsrings
|| | | || 5. Cast as an integer
|| | | ||
|| | | |+ 2
|| | | +- 4
|| | +--- 16
|| +------ 128
|+-------------- 16384
+--------------- 32768
32768 + 16384 + 128 + 16 + 4 + 2 = 49302
Dividing to a varint chunk - Pattern matching is awesome
b = :binary . encode_unsigned ( 0b10010110_10000001_00000011 )
<< msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> = b
msb # => 1
rest_7bits # => <<22::size(7)>>
rest_binary # => <<129, 3>>
10010110_10000001_00000011
+ msb 1
+-----+ rest_7bits 0010110
+---------------+ rest_binary 10000001_00000011
defmodule MsbBinary do
def scan ( << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) do
IO . inspect ( msb: msb , rest_7bits: rest_7bits , rest_binary: rest_binary )
end
end
binary = :binary . encode_unsigned ( 0b10010110_10000001_00000011 )
MsbBinary . scan ( binary )
[ msb: 1 , rest_7bits: << 22 :: size ( 7 ) >> , rest_binary: << 129 , 3 >> ]
Dividing to a varint chunk - Recursion is also awesome
defmodule MsbBinary do
def scan ( << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 0 do
IO . inspect ( msb: 0 , rest_7bits: rest_7bits , rest_binary: rest_binary )
end
def scan ( << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 1 do
IO . inspect ( msb: 1 , rest_7bits: rest_7bits , rest_binary: rest_binary )
scan ( rest_binary ) # Do recursive
end
end
binary = :binary . encode_unsigned ( 0b10010110_10000001_00000011 )
MsbBinary . scan ( binary )
[ msb: 1 , rest_7bits: << 22 :: size ( 7 ) >> , rest_binary: << 129 , 3 >> ]
[ msb: 1 , rest_7bits: << 1 :: size ( 7 ) >> , rest_binary: << 3 >> ]
[ msb: 0 , rest_7bits: << 3 :: size ( 7 ) >> , rest_binary: "" ]
10010110_10000001_00000011_..........
10010110_10000001_00000011 1. Divide to a varint chunk by using msb
0010110 0000001 0000011 2. Drop msbs
0000011 0000001 0010110 3. Reverse order by each 7bit
00000_11000000_10010110 4. Concatenate bitsrings
5. Cast as an integer
defmodule MsbBinary do
def scan ( binary ) when is_binary ( binary ) do
do_scan ( << >> , binary )
end
defp do_scan ( progress , << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 0 do
IO . inspect ( progress: progress , msb: 0 , rest_7bits: rest_7bits , rest_binary: rest_binary )
{ << rest_7bits :: bitstring , progress :: bitstring >> , rest_binary }
end
defp do_scan ( progress , << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 1 do
IO . inspect ( progress: progress , msb: 1 , rest_7bits: rest_7bits , rest_binary: rest_binary )
do_scan ( << rest_7bits :: bitstring , progress :: bitstring >> , rest_binary )
end
end
binary = :binary . encode_unsigned ( 0b10010110_10000001_00000011 )
MsbBinary . scan ( binary )
[ progress: "" , msb: 1 , rest_7bits: << 22 :: size ( 7 ) >> , rest_binary: << 129 , 3 , 5 , 64 >> ]
[ progress: << 22 :: size ( 7 ) >> , msb: 1 , rest_7bits: << 1 :: size ( 7 ) >> , rest_binary: << 3 , 5 , 64 >> ]
[ progress: << 2 , 22 :: size ( 6 ) >> , msb: 0 , rest_7bits: << 3 :: size ( 7 ) >> , rest_binary: << 5 , 64 >> ]
{ << 6 , 4 , 22 :: size ( 5 ) >> , << 5 , 64 >> } # Result
defmodule MsbBinary do
def scan ( binary ) when is_binary ( binary ) do
do_scan ( << >> , binary )
end
defp do_scan ( progress , << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 0 do
{ << rest_7bits :: bitstring , progress :: bitstring >> , rest_binary }
end
defp do_scan ( progress , << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 1 do
do_scan ( << rest_7bits :: bitstring , progress :: bitstring >> , rest_binary )
end
end
defmodule Varint do
def as_int32 ( b ) when is_bitstring ( b ) do
size = bit_size ( b )
<< i :: integer - size ( size ) >> = b
i
end
end
binary = :binary . encode_unsigned ( 0b10010110_00000001 )
{ chunk , _rest } = MsbBinary . scan ( binary ) # => {<<2, 22::size(6)>>, ""}
Varint . as_int32 ( chunk ) # => 150
You’ve been able to understand the part of figure
00001000_10010110_00000001_..........
msb 0 ^||||||| | |
key number ^^^^||| | |
+---+ value type ^^^ | |
| H | => msb 1 ^ |
| E | => msb 0 ^
| | => drop msb 0010110 0000001
| R | => reverse 0000001 0010110
| E | => concatenate 000000_10010110
+---+ part +-key--+ +----value------+
kv pair +------------------------+
kv pair +--------+
message +-----------------------------------+
+---+ 00001000_10010110_00000001_..........
| H | => msb 0 ^||||||| | |
| E | => key number ^^^^||| | |
| R | => value type ^^^ | |
| E | msb 1 ^ |
+---+ msb 0 ^
drop msb 0010110 0000001
reverse 0000001 0010110
concatenate 000000_10010110
part +-key--+ +----value------+
kv pair +------------------------+
kv pair +--------+
message +-----------------------------------+
Same as varint
Taking while msb is 1
The msb which is the last byte of key part is 0
Parsing value is a little bit different
Key part has two information
message MyMessage {
optional int32 a = 1 ;
}
A type of the value
an integer between from 0 to 5
In the example, the type of value which is bind by key a is int32 due to int32 a
A value of the key
an integer greater than 0
In the example, when key a
is serialized, it represented by 1 due to a = 1
Getting a type of the value
The last 3 bits of key part
number 3 and 4 are depricated
number Meaning Used For
0 Varint int32, int64, uint32, uint64, sint32, sint64, bool, enum
1 64-bit fixed64, sfixed64, double
2 Length-delimited string, bytes, embedded messages, packed repeated fields
5 32-bit fixed32, sfixed32, float
Getting a value of the key
The last 3 bits are “type of the value”
Parsing as a varint except above
00001000_10010110_........
msb 0 ^|||||||
key number ^^^^|||
value type ^^^
part +-key--+ +----value-------
type bitstring integer meaning
A value of the key 0001 1 The Value of the key is 1
A type of the value 000 0 Type of the value is Varint
defmodule MsbBinary do
def scan ( binary ) when is_binary ( binary ) do
do_scan ( << >> , binary )
end
defp do_scan ( progress << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 0 do
{ << rest_7bits :: bitstring , progress :: bitstring >> , rest_binary }
end
defp do_scan ( progress , << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 1 do
do_scan ( << rest_7bits :: bitstring , progress :: bitstring >> , rest_binary )
end
end
defmodule Key do
@ wire_type_size 3
@ wire_types % {
0 => :varint ,
1 => :"64bit" ,
2 => :length_delimited ,
3 => :start_group ,
4 => :end_group ,
5 => :"32bit"
}
def parse ( b ) when is_bitstring ( b ) do
total_size = bit_size ( b )
key_size = total_size - @ wire_type_size
<< key_no :: size ( key_size ) , wire_type_no :: size ( @ wire_type_size ) >> = b
{ key_no , @ wire_types [ wire_type_no ] }
end
end
binary = :binary . encode_unsigned ( 0b00001000_10010110_00000001 )
{ chunk , _rest } = MsbBinary . scan ( binary ) # => {<<8::size(7)>>, <<150, 1>>}
Key . parse ( chunk ) # => {1, :varint}
You’ve been able to understand the part of figure
+---+ 00001000_10010110_00000001_..........
| H | => msb 0 ^||||||| | |
| E | => key number ^^^^||| | |
| R | => value type ^^^ | |
| E | msb 1 ^ |
+---+ msb 0 ^
drop msb 0010110 0000001
reverse 0000001 0010110
concatenate 000000_10010110
part +-key--+ +----value------+
kv pair +------------------------+
kv pair +--------+
message +-----------------------------------+
You know BEAM basis about bitstring and binary
Difference between bitstring and binary
Make a binary as you read
Bitstring pattern matching
Concatenate bitstrings
Convert bitstring to integer
You almost know what this figure represents
00001000_10010110_00000001_......
msb 0 ^||||||| | |
key number ^^^^||| | |
value type ^^^ | |
msb 1 ^ |
msb 0 ^
drop msb 0010110 0000001
reverse 0000001 0010110
concatenate 000000_10010110
part +-key--+ +----value------+
kv pair +------------------------+
kv pair +--------+
message +-----------------------------------+
00001000_10010110_00000001
is parsed to %{1 => 150}
defmodule MsbBinary do
def scan ( binary ) when is_binary ( binary ) do
do_scan ( << >> , binary )
end
defp do_scan ( progress << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 0 do
{ << rest_7bits :: bitstring , progress :: bitstring >> , rest_binary }
end
defp do_scan ( progress , << msb :: 1 , rest_7bits :: bitstring - ( 7 ) , rest_binary :: binary >> ) when msb == 1 do
do_scan ( << rest_7bits :: bitstring , progress :: bitstring >> , rest_binary )
end
end
defmodule Varint do
def as_int32 ( b ) when is_bitstring ( b ) do
size = bit_size ( b )
<< i :: integer - size ( size ) >> = b
i
end
end
defmodule Key do
@ wire_type_size 3
@ wire_types % {
0 => :varint ,
1 => :"64bit" ,
2 => :length_delimited ,
3 => :start_group ,
4 => :end_group ,
5 => :"32bit"
}
def parse ( b ) when is_bitstring ( b ) do
total_size = bit_size ( b )
key_size = total_size - @ wire_type_size
<< key_no :: size ( key_size ) , wire_type_no :: size ( @ wire_type_size ) >> = b
{ key_no , @ wire_types [ wire_type_no ] }
end
end
binary = :binary . encode_unsigned ( 0b00001000_10010110_00000001 )
{ key_part , rest } = MsbBinary . scan ( binary )
{ key_no , wire_type } = Key . parse ( key_part ) # => {1, :varint}
{ value_part , _ } = MsbBinary . scan ( rest )
value = Varint . as_int32 ( value_part ) # => 150