@jjlumagbas
Last active October 25, 2016 05:25

Machine Problem 3 - Parsing Poetry Packets

Data on the internet are broken up and passed around in what are called "packets":

[Figure: Packet - Email Example]

On the Internet, the network breaks an e-mail message into parts of a certain size in bytes. These are the packets. Each packet carries the information that will help it get to its destination -- the sender's IP address, the intended receiver's IP address, something that tells the network how many packets this e-mail message has been broken into and the number of this particular packet. - Read the whole article at HowStuffWorks

In this MP, you're going to parse a "network packet capture" (a listing of packets), reassemble corresponding sets of packets into the original "messages" sent, and display the messages in a human-readable format.

Poems

For our purposes, the messages that will be split up into packets are poems, and each packet will represent a single line. Here's a poem we know:

Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.

This is how it might be broken up into packets in a network packet capture:

00000000000000000000000000001110111001001100110111011111010101111000101101000001110000010101000100000000000000000000000000000000Roses are red,00000000000000000000010011101011
00000000000000000000000000010001111001001100110111011111010101111000101101000001110000010101000100000000000000000000000000000001Violets are blue,00000000000000000000011000110010
00000000000000000000000000001111111001001100110111011111010101111000101101000001110000010101000100000000000000000000000000000010Sugar is sweet,00000000000000000000010101110010
00000000000000000000000000001111111001001100110111011111010101111000101101000001110000010101000100000000000000000000000000000011And so are you.00000000000000000000010100011000

Yes, really. An explanation follows below.

Given a network packet capture with the above contents, your program will create an output file with the following contents:

From: 228.205.223.87
To: 139.65.193.81

Roses are red,
Violets are blue,
Sugar is sweet,
And so are you.
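
As a rough sketch of that layout, here is one way to render a single reassembled poem in Python. The name `format_poem` and its parameters are made up for illustration, and the trailing newline is an assumption; the assignment does not say how several poems should be separated in the output file.

```python
def format_poem(source, destination, lines):
    """Render one poem: a From/To header, a blank line, then the lines."""
    header = f"From: {source}\nTo: {destination}\n"
    # Assumption: the file ends with a newline after the last line.
    return header + "\n" + "\n".join(lines) + "\n"


print(format_poem("228.205.223.87", "139.65.193.81",
                  ["Roses are red,", "Violets are blue,"]), end="")
```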

Implementation details

Packets

Each line in the file represents a packet, which is composed of several sections, listed below along with the number of characters allotted to each section (every bit is written as a single 0 or 1 character):

| Size (32 bits) | From (32 bits) | To (32 bits) | Seq (32 bits) | Line ([Size] characters) | Check (32 bits) |
| --- | --- | --- | --- | --- | --- |
| 00000000000000000000000000001110 | 11100100110011011101111101010111 | 10001011010000011100000101010001 | 00000000000000000000000000000000 | Roses are red, | 00000000000000000000010011101011 |

These are the different sections of the packets:

| Field | Description |
| --- | --- |
| Size | Number of characters in the line |
| From | Source IP Address |
| To | Destination IP Address |
| Seq | The line's sequence in the poem |
| Line | Yep, you guessed it |
| Check | A checksum, to detect corrupted lines |

So, really, this is what the above network packet capture represents:

Size: 14
From: 228.205.223.87
To: 139.65.193.81
Seq: 0
Line: Roses are red,
Check: 1259

Size: 17
From: 228.205.223.87
To: 139.65.193.81
Seq: 1
Line: Violets are blue,
Check: 1586

Size: 15
From: 228.205.223.87
To: 139.65.193.81
Seq: 2
Line: Sugar is sweet,
Check: 1394

Size: 15
From: 228.205.223.87
To: 139.65.193.81
Seq: 3
Line: And so are you.
Check: 1304
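
A minimal sketch of slicing one capture line into these six sections, assuming Python. `Packet`, `parse_packet`, and `FIELD_BITS` are hypothetical names, not part of the assignment. From and To are left as 32-character bit strings here, and malformed lines are not validated.

```python
from typing import NamedTuple

FIELD_BITS = 32  # width of each numeric section, per the packet layout


class Packet(NamedTuple):
    size: int
    source: str       # From, kept as a 32-character bit string
    destination: str  # To, kept as a 32-character bit string
    seq: int
    line: str
    check: int


def parse_packet(raw: str) -> Packet:
    """Split one line of the capture into its six sections."""
    raw = raw.rstrip("\r\n")  # drop only the line ending, not spaces
    size = int(raw[0:FIELD_BITS], 2)
    source = raw[FIELD_BITS:2 * FIELD_BITS]
    destination = raw[2 * FIELD_BITS:3 * FIELD_BITS]
    seq = int(raw[3 * FIELD_BITS:4 * FIELD_BITS], 2)
    # The Line section is the only variable-width one: Size says how long.
    line_start = 4 * FIELD_BITS
    line_end = line_start + size
    line = raw[line_start:line_end]
    check = int(raw[line_end:line_end + FIELD_BITS], 2)
    return Packet(size, source, destination, seq, line, check)
```

Reading Size first matters because the poem line itself may contain any characters, including 0 and 1, so the Check section can only be located by counting.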

IP addresses

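IP addresses will be in the IPv4 addressing format: the 32 bits are read as four groups of 8 bits, and each group is written as a decimal number from 0 to 255, with dots between the groups. For example, 11100100 11001101 11011111 01010111 is 228.205.223.87.

[Figure: IPv4 addressing]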
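
For illustration, a 32-character From or To section could be turned into dotted-decimal form along these lines. This is a sketch in Python; `bits_to_ipv4` is a hypothetical helper name.

```python
def bits_to_ipv4(bits: str) -> str:
    """Convert a 32-character string of 0s and 1s to dotted-decimal form."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    # Read the bits eight at a time and convert each group from base 2.
    octets = [int(bits[i:i + 8], 2) for i in range(0, 32, 8)]
    return ".".join(str(octet) for octet in octets)


print(bits_to_ipv4("11100100110011011101111101010111"))  # 228.205.223.87
```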

Checksums

A checksum is the result of an algorithm applied to data. It is transmitted along with data to give an indication of whether there are errors in transmission.

To detect errors, you apply the checksum algorithm to the data received and compare it against the checksum transmitted.

In our case, the checksum is computed by summing the ASCII codes of each character in the line:

R  82
o 111
s 115
e 101
s 115
   32
a  97
r 114
e 101
   32
r 114
e 101
d 100
,  44
-----
 1259
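
The same sum can be written in a couple of lines of Python. This is only a sketch; the function name `checksum` is an arbitrary choice.

```python
def checksum(line: str) -> int:
    """Sum the character codes of every character in the line."""
    return sum(ord(ch) for ch in line)


print(checksum("Roses are red,"))  # 1259
```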

Extra considerations

In the given network packet capture, there may be more than one poem

Every line in a single poem will share the same From and To values: this is how you can tell which packets go together to form a single poem.
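
One possible way to sort packets into poems is a dictionary keyed by the (From, To) pair. The sketch below assumes each packet has already been parsed into a Python dict with hypothetical keys `"from"`, `"to"`, `"seq"`, and `"line"`; that representation is an illustration, not a requirement.

```python
from collections import defaultdict


def group_by_route(packets):
    """Map each (From, To) pair to its packets, kept in capture order."""
    poems = defaultdict(list)
    for packet in packets:
        poems[(packet["from"], packet["to"])].append(packet)
    return dict(poems)
```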

Lines in a poem may be transmitted out of order, or interspersed among lines of other poems

This is where Seq comes in. It tells you how to order lines when you display the poem.
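
A short sketch of that ordering step in Python, assuming the packets of one poem are dicts with a hypothetical `"seq"` key:

```python
def in_poem_order(packets):
    """Return one poem's packets sorted by Seq, lowest first."""
    # sorted() is stable, so packets with equal Seq keep their capture order.
    return sorted(packets, key=lambda packet: packet["seq"])
```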

There may be duplicate lines

That is, lines from the same poem with the same Seq number. In this case, keep the first-occurring line, and discard the duplicate.
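
One way to keep only the first line seen for each Seq, sketched in Python. It assumes the list holds packets from a single poem, in capture order, as dicts with a hypothetical `"seq"` key.

```python
def drop_duplicates(packets):
    """Keep the first packet seen for each Seq and discard later ones."""
    seen = set()
    unique = []
    for packet in packets:
        if packet["seq"] in seen:
            continue  # a packet with this Seq was already kept
        seen.add(packet["seq"])
        unique.append(packet)
    return unique
```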

Lines may be missing

Again, Seq will tell you if a line is missing. Missing lines are to be reported like this in the output file:

Roses are red,
Violets are blue,
[MISSING]
And so are you.
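
A sketch of filling the gaps, in Python. It assumes Seq numbers start at 0 (as in the example capture) and that the highest Seq received marks the end of the poem, since the packet format gives no total line count. `fill_missing` and the dict-of-lines input are hypothetical.

```python
def fill_missing(lines_by_seq):
    """Return the poem's lines in order, with [MISSING] where a Seq is absent."""
    if not lines_by_seq:
        return []
    # Assumption: the poem runs from Seq 0 to the highest Seq received.
    last = max(lines_by_seq)
    return [lines_by_seq.get(seq, "[MISSING]") for seq in range(last + 1)]


received = {0: "Roses are red,", 1: "Violets are blue,", 3: "And so are you."}
print("\n".join(fill_missing(received)))
```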

Lines may be corrupted

That is, a checksum generated by applying the algorithm to the line may not match the transmitted checksum. Corrupted lines are to be reported like this:

Roses are red,
Violets are blue,
Sugar is sweet,
[CORRUPTED]
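
The comparison could look something like this in Python. The sketch recomputes the sum of character codes for the received line and compares it with the transmitted Check value; `displayed_line` is a hypothetical name.

```python
def displayed_line(line: str, transmitted_check: int) -> str:
    """Return the line itself, or [CORRUPTED] if its checksum does not match."""
    computed = sum(ord(ch) for ch in line)
    return line if computed == transmitted_check else "[CORRUPTED]"


print(displayed_line("And so are you.", 1304))  # And so are you.
print(displayed_line("And so are yov.", 1304))  # [CORRUPTED]
```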