Go-Back-N protocol & Implementation
This article is not going to repeat the protocol detail that anyone can find via google. I’ll start with a brief high-level explanation of the problem GBN solves, and quickly dump into some interesting issues I encountered while implementing using C. You can find the source code here.
Trade Off between UDP and TCP
The well-known (read this awesome article if you don’t know it, or continue reading for the simple and important fact of this protocol) TCP ensures reliable transmission of data.It does not only transmission but also error-handling. You can feel safe without checking packet loss or corruption by yourself.
UDP provides best effort delivery service. It only ensures to deliver packet as fast as possible, without handling error. This is enough in some use cases, as long as the bottom layer system is robust enough, which means packet loss or corruption probability is under a tolerable threshold. For example, when designing a small scale online multi-player game, it might not harm if a small amount of packets(like a small stuck in user’s end) are lost due to using UDP. But if TCP is used here, we have t o sacrifice low-latency for the benefit of lossless transmission due to error-handling mechanism, which the user cannot even notice.
So there is a trade-off between transmission reliability and speed, while TCP and UDP are like two extremes on the axis. What I have implemented is UDP plus an error-handling mechanism(TCP has a lot more than error-handling beyond UDP), which can be the GBN protocol.
As I said in the beginning, the introduction part stops here because the detail of the protocol can be found anywhere.
Interesting design/Implementation issues
- Circular array for maintaining sequence number
The length of input buffer is not fixed, but the length of data in every packet is fixed. It is obvious that the sequence numbers cannot be stored in a fixed size data structure. The way I resolve this problem is to use uint_8_t type to store the sequence number in the packet data structure, and use a circular global sequence number to store the sequence number following the current window. When this number exceeds 255, it overflows back to 1.
gbn_next_seqnum = gbn_next_seqnum % MAX_SEQNUM + 1;
This introduces complexity in comparing base and sequence number, because simply comparing base and sequence number when sequence number has just overflown 255, which base is slightly smaller than 255, will result in a wrong information that sequence becomes smaller than base. So this edge case is carefully examined in the code.
2. Mode switching
In my setting, the N of Go-Back-N has four possible values: 1, 2, and 4. When the network is in a good situation, packet loss rate is low and we can expect to receive ACKs of all packets in most of the times, so we use a large window. This case, where window size goes from 1 to 2 or from 2 to 4, is pretty simple.
But deceleration is complicated because we have to discard the current array and construct a smaller array to store the new window.
We have to construct the new array from the where base points to in the file buffer, but since we increment the file pointer by 1024 every time we read new data to build packet for the window, the current file pointer has exceeded base. This could easily be ignored in the first place.
To solve this, the file pointer is decreased for a reasonable length so that it goes back to the content we want to feed into the beginning of the small array.
This is not the only problem associated with file I/O and mode switching. Consider the situation where ACKs are sent by receiver but corrupted/lost, and then mode switching happens at the sender’s side. The file pointer goes back as explained above, so the sender starts sending something that has been received and wrote to file. The solution is simple: take the case where ACK sequence number precedes the end of window into consideration. Once this happens, move the window forward. This is an easy-to-solve but hard to find problem.
3. State Machine
I confess that this is the first time I use state machine... But it’s not embarrassing because it is also the last time that I have to confess this LOL. When using state machine in a function, the common coding paradigm is to loop until the BROKEN or EXIT state is reached. Inside the loop, use either switch or if/else to check the current status and decide which logic to execute. At the end of each block, use the state transition table to transit to the next state. A cleaned code in the implementation is like this(after some clean up):
ssize_t gbn_send(int sockfd, const void *buf, size_t len, int flags)
{
// initialization goes here
while (chunk_start < len || (gbn_base < gbn_next_seqnum))
{
if (gbn_send_session.cur_state == MAKE_AND_SEND)
{
// things to do
gbn_send_session.cur_state = ACKNOWLEDGE;}if (gbn_send_session.cur_state == ACKNOWLEDGE)
{
// things to do
if (gbn_send_session.window_shifted || prev_mode > gbn_mode_session.cur_mode)
{
gbn_send_session.cur_state = lookup_gbn_send_transit(&gbn_send_session, BASE_SHIFTED);
}
}
if (gbn_send_session.cur_state == RESEND)
{
// things to do
gbn_send_session.cur_state = lookup_gbn_send_transit(&gbn_send_session, RESEND_CUR_WIN);
}
}
// return statements
}
Below are the state machines we used in this project:
Here I feel the need to explain BASE_SHIFTED and BASE_NOT_SHIFTED events: base is the starting point of a window, we use these two events to describe if the base is moved forward before timing out.
Summary
Network/Distributed programming is both challenging and interesting due to the fact that you have to force yourself to assume that everything goes wrong frequently(which is definitely true when you wok on a large number of machines). After going through all these weird bugs, you will find your inner peace despite the complicated fact that can drive normal people crazy. You know things will go wrong, and you calm down to handle them one by one.
When I was learning Physics, the professors often told us that scientists live long because they know the limit of human-being and power of nature, so that they don’t pursue to something beyond the principle of nature, or human nature. Now I feel that engineers can also live long because they are doing a job where failure is taken for granted, so that they are calm when things fail in their lie. Now the question is, if both scientists and engineers live long, who lives short?
Reference
The picture at beginning of this article is taken from Computer Networking: A Top Down Approach, 6th Edition by James F. Kurose and Keith W. Ross