Using UDP to communicate between computers (with links)

2002-08-30 Stanley Tsao

I don't guarantee that everything here is technically accurate to the letter, but there's enough to give you the general gist of things. Please contact me, Stanley Tsao, with any questions, comments, corrections, etc.

Standard Communication Protocols

Since there are many different kinds of computers, running different OSs and connected by different physical links, there needs to be some standard way for these computers to talk to each other. IP, or Internet Protocol, is what is called a "network layer" communication protocol. It's at this layer that your computer or device has its IP address. One level above the "network layer" is the "transport layer". TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) are two examples of "transport layer" protocols. TCP and UDP are built on top of IP, adding features like checksums (both) or resends (TCP only).

TCP vs. UDP

TCP and UDP are the two most common "transport layer" protocols. There are two main differences between them, as indicated in the following table.

TCP                   UDP
Connection-oriented   Connectionless
Reliable              Unreliable

TCP is connection-oriented. When device A wants to talk to device B, device A contacts device B. Device B then responds to device A and acknowledges that they are going to create a connection with each other. In other words, the computers need to meet up, shake hands, and agree that they will talk to each other. This is kind of like a telephone call: you can't send someone a message unless someone or something (an answering machine) picks up the call on the other end.
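The "someone has to pick up" behavior can be seen in a few lines of code. This is only a minimal Python sketch (Python and the helper name tcp_connect_works are my choices for illustration, not part of the original tutorial). It assumes a free port on the loopback address 127.0.0.1 and shows that a TCP connection attempt succeeds while something is listening and fails once the listener is gone.

```python
import socket

def tcp_connect_works(address, timeout=1.0):
    """Return True if a TCP connection to address can be established."""
    try:
        # create_connection performs the TCP handshake with the other side.
        with socket.create_connection(address, timeout=timeout):
            return True
    except OSError:
        # Nobody "picked up": refused, unreachable, or timed out.
        return False

if __name__ == "__main__":
    # Device B: listen on a free loopback port (port 0 lets the OS choose).
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        address = listener.getsockname()
        print("listener up:  ", tcp_connect_works(address))
    # The listener is closed here, so the same address no longer answers.
    print("listener gone:", tcp_connect_works(address))
```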

UDP is connectionless. When device A wants to talk to device B, device A only needs to know device B's address, and it can send data to device B. There is no need for the computers to negotiate and establish a "connection". This is more like the US postal service. You can send mail to someone if you know their address. You don't need to get their permission to send them mail.
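As a rough illustration of "just know the address and send", here is a minimal Python sketch. The helper name send_datagram and the use of the loopback address are assumptions made for the example. Note that the sender never negotiates anything with the receiver; it simply hands the data and the destination address to sendto.

```python
import socket

def send_datagram(payload, address):
    """Send one UDP datagram to address; no connection is set up first."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        # sendto returns the number of bytes handed to the network stack.
        return sender.sendto(payload, address)

if __name__ == "__main__":
    # Device B: a UDP socket bound to a free loopback port.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(1.0)
        # Device A: only needs B's (IP address, port) pair.
        send_datagram(b"hello", receiver.getsockname())
        data, sender_address = receiver.recvfrom(2048)
        print(data, "from", sender_address)
```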

The second main difference between the protocols is reliability. Here reliability doesn't refer to the robustness of the protocol but to the way it manages data. Data is sent as "packets" over a network, so I will use "data" and "packets" interchangeably. UDP packets are also referred to as datagrams.

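TCP is reliable because it acknowledges the data it receives, resends data that was not acknowledged, and hands data to the receiving program in the same order it was sent.

All of these behaviors are inherent in the operation of TCP. So if you were sending data files for archival purposes, for example, TCP would be a good choice because you could be confident that the files wouldn't be corrupted in transfer.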

UDP is unreliable because it provides none of these services. What this means is that UDP neither guarantees delivery nor preserves packet sequence. This makes UDP sound bad, but it's more a matter of simplicity. There is less overhead since you don't need to establish a connection, so UDP can have a higher throughput than TCP. Also, in some applications, particularly real-time streaming, resending missed data can be useless because it would arrive too late to matter. In the end, you can selectively implement the "reliable" features of TCP on top of UDP as you need them. For example, one could add a timestamp to each packet sent so the receiver can tell in what order packets were sent.
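One way to add such information is to put a small header in front of the data in every packet. The layout below (a 4-byte sequence number followed by an 8-byte timestamp) and the names HEADER, pack_packet, and unpack_packet are hypothetical choices for this Python sketch; UDP itself defines none of this.

```python
import struct
import time

# Hypothetical application header: unsigned 32-bit sequence number and a
# 64-bit float timestamp (seconds), in network byte order with no padding.
HEADER = struct.Struct("!Id")

def pack_packet(sequence, payload, timestamp=None):
    """Prefix payload with the sequence number and a send timestamp."""
    if timestamp is None:
        timestamp = time.time()
    return HEADER.pack(sequence, timestamp) + payload

def unpack_packet(packet):
    """Split a packet into (sequence, timestamp, payload)."""
    sequence, timestamp = HEADER.unpack_from(packet)
    return sequence, timestamp, packet[HEADER.size:]

if __name__ == "__main__":
    packet = pack_packet(7, b"sensor reading")
    print(len(packet), unpack_packet(packet))
```

The receiver can then use the sequence number to notice gaps or reordering, and the timestamp to discard data that is too old to be useful.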

This little tutorial is based around using UDP, but except where otherwise noted, I've tried to stick with general concepts.

Computer Identification - IP addresses and ports

We can't talk between computers unless we know how they address each other. Each computer is assigned an IP address, which is in number.number.number.number format. The more familiar .com addresses you see are text labels; each one actually maps to a numerical IP address. The IP address is how computers are identified on the internet.
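To make the number.number.number.number format a bit more concrete: each of the four numbers is one byte, so the whole address is a 32-bit value. The following Python sketch shows that conversion and a name lookup. The function names are made up for this example, and resolve simply wraps the standard socket.gethostbyname call.

```python
import socket

def dotted_quad_to_int(text):
    """Convert an address such as '192.168.1.10' to its 32-bit integer value."""
    # inet_aton packs the four numbers into four bytes.
    return int.from_bytes(socket.inet_aton(text), "big")

def resolve(name):
    """Map a host name (a text label) to a numerical IPv4 address string."""
    return socket.gethostbyname(name)

if __name__ == "__main__":
    print(dotted_quad_to_int("192.168.1.10"))  # 3232235786
    # An address that is already numerical is returned unchanged.
    print(resolve("127.0.0.1"))
```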

As a way of allowing a single computer to have multiple connections, you can also address a port on a certain computer. Each protocol (UDP, TCP) has its own set of ports. There are 65536 ports per protocol (numbered 0 to 65535), though generally the first 1024 are either reserved or have well-known applications associated with them. For example, FTP uses port 21 and telnet uses port 23, although you can set up FTP and telnet to work with different ports. So to FTP to a computer you would need to know its IP address and what port the FTP server is listening for requests on.
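The point that UDP and TCP each have their own set of ports can be checked with a short experiment. This Python sketch (the function name is hypothetical) asks the OS for a free UDP port and then binds a TCP socket to the same port number at the same time. It retries a few times in case that TCP port happens to be in use by some other program.

```python
import socket

def bind_same_port_on_both_protocols(host="127.0.0.1", attempts=5):
    """Bind a UDP and a TCP socket to the same port number; return that port."""
    for _ in range(attempts):
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            udp.bind((host, 0))            # port 0: the OS picks a free UDP port
            port = udp.getsockname()[1]
            tcp.bind((host, port))         # same number, but in TCP's port space
            return port
        except OSError:
            continue                       # TCP port was taken; try another one
        finally:
            udp.close()
            tcp.close()
    return None

if __name__ == "__main__":
    print("UDP and TCP both bound to port", bind_same_port_on_both_protocols())
```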

Sockets

Sockets are essentially end points for communication, analogous to telephones on a telephone line. You need to have one on each end in order for there to be communication. So before a program can talk using UDP or TCP, it needs to open a socket. When you open a socket you bind it (binding is like plugging in) to a specific IP address and port. After a socket is bound it is ready to send and receive data. At least in UNIX-type systems, sockets are like file descriptors, so the important thing to remember is that they need to be properly opened and closed like files in order to keep things from getting messy. Trying to bind multiple sockets to the same port (for the same protocol) on a computer can cause conflicts. Also, when you create a socket it comes back to you as a socket descriptor number. This descriptor is the handle for the socket and is how you will refer to it in the future.
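The open, bind, and close life cycle described above might look like this in Python. This is a sketch only; the function name second_bind_fails is made up for the example. It also shows the descriptor number (via fileno) and the conflict that occurs when a second socket tries to bind to a port that is already taken.

```python
import socket

def second_bind_fails(host="127.0.0.1"):
    """Return True if binding a second UDP socket to an in-use port fails."""
    # "with" closes each socket automatically, like closing a file.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as first:
        first.bind((host, 0))              # port 0: the OS picks a free port
        port = first.getsockname()[1]
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as second:
            try:
                second.bind((host, port))  # same protocol, same port
            except OSError:
                return True                # the conflict: address in use
            return False

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    print("descriptor while open:", sock.fileno())
    sock.close()
    print("descriptor after close:", sock.fileno())  # Python reports -1
    print("second bind fails:", second_bind_fails())
```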

A final bit to remember about sockets is that they have both sending and receiving buffers associated with them. In relation to UDP, the sending buffer doesn't have much relevance as long as it is bigger than any single packet you will send. On the receiving side, a socket will store up UDP packets in the buffer until the buffer is full. Of course, when you read a packet from the socket it is removed from the buffer. When the buffer is full, any further packets arriving for it are lost. So if you are sending data at high speeds and your packet-reading application can't keep up with the sender, keep in mind that packets will be dropped without any error being reported to either side.
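The effect of a full receive buffer can be reproduced on a single machine. The Python sketch below asks for a deliberately small receive buffer (the 4096-byte request is an arbitrary value for the demonstration, and operating systems are free to round it), sends a burst of packets without reading any, and then counts how many are still there to be read. The function name count_delivered is hypothetical.

```python
import socket

def count_delivered(packets_sent=2000, packet_size=1024, rcvbuf=4096):
    """Send a burst to a receiver that is not reading; count what survives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        # Request a small receive buffer before any data arrives.
        receiver.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
        receiver.bind(("127.0.0.1", 0))
        address = receiver.getsockname()
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            payload = bytes(packet_size)
            for _ in range(packets_sent):
                sender.sendto(payload, address)  # no error when packets drop
        # Now drain whatever fit in the buffer, without blocking.
        receiver.setblocking(False)
        delivered = 0
        while True:
            try:
                receiver.recvfrom(packet_size)
            except BlockingIOError:
                break                            # buffer is empty
            delivered += 1
        return delivered

if __name__ == "__main__":
    print(count_delivered(), "of 2000 packets were still in the buffer")
```

The exact count depends on the operating system, but it should be far below the number sent, and the sender is never told.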

More to remember about UDP

As I mentioned before, this tutorial is based around using UDP, so here is a little more to keep in mind about UDP. UDP packets have a maximum size limit. The maximum size allowed by the protocol is about 65 kB (65,507 bytes of data when UDP runs over IPv4), but the practical limit can be lower depending on the operating system and hardware. Generally, keeping packets small is in your best interest, as it makes it more likely that your data will successfully traverse the network. At the same time, it can be more efficient and faster to send one larger packet as opposed to multiple tiny packets.
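The "about 65kB" figure comes from simple arithmetic on header sizes. The Python sketch below works it out for IPv4, assuming the minimum 20-byte IP header, and tries sending packets of different sizes over the loopback interface. The constant and function names are my own for this example, and whether the largest legal size actually goes through depends on the system.

```python
import socket

IP_MAX_TOTAL = 65535   # largest IPv4 packet (16-bit total length field)
IPV4_HEADER = 20       # minimum IPv4 header, no options (assumption)
UDP_HEADER = 8         # fixed UDP header size
MAX_UDP_PAYLOAD = IP_MAX_TOTAL - IPV4_HEADER - UDP_HEADER  # 65507 bytes

def can_send(size):
    """Return True if a UDP packet with size bytes of data can be sent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            try:
                sender.sendto(bytes(size), receiver.getsockname())
            except OSError:
                return False   # typically "message too long"
            return True

if __name__ == "__main__":
    print("protocol limit:", MAX_UDP_PAYLOAD)
    for size in (512, MAX_UDP_PAYLOAD, MAX_UDP_PAYLOAD + 1):
        print(size, "bytes:", "sent" if can_send(size) else "rejected")
```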

As mentioned before, UDP does not guarantee that sent data will be received. Nor does it guarantee that data will be received in the same order it was sent. Changes in packet sequence can occur because packets may take different paths through the network between the two computers.
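If the sender numbers its packets 0, 1, 2, and so on (something the application has to do itself, since UDP does not), the receiver can detect both problems. Here is a minimal Python sketch of that bookkeeping; the function name and the numbering scheme are assumptions for the example.

```python
def summarize_arrivals(sequence_numbers, expected_count):
    """Given sequence numbers in arrival order, return (missing, out_of_order).

    missing:      numbers in 0..expected_count-1 that never arrived
    out_of_order: numbers that arrived after a higher number had been seen
    """
    seen = set()
    highest = -1
    out_of_order = []
    for seq in sequence_numbers:
        if seq in seen:
            continue               # ignore duplicates
        seen.add(seq)
        if seq < highest:
            out_of_order.append(seq)
        else:
            highest = seq
    missing = [n for n in range(expected_count) if n not in seen]
    return missing, out_of_order

if __name__ == "__main__":
    # Packets 0..5 were sent; 4 was lost and 2 arrived after 3.
    print(summarize_arrivals([0, 1, 3, 2, 5], expected_count=6))
    # prints ([4], [2])
```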

When you send packets you also need to make sure the receiver is properly set up. If there is no socket open on the proper port on the receiving end, the packets you send will vanish. Also, UDP does not give you a way to check the existence of the recipient (unless you add this functionality yourself).
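One simple way to add that functionality yourself is to have the receiver send something back and to give up waiting after a timeout. The Python sketch below assumes such a reply convention; the function name, the b"ping" payload, and the half-second timeout are all made up for the example. Note that the send itself reports success whether or not anyone is listening.

```python
import socket

def send_and_wait_for_reply(address, payload=b"ping", timeout=0.5):
    """Send payload to address; return (bytes_sent, reply or None)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        # This succeeds even if nothing is listening at address.
        sent = sock.sendto(payload, address)
        try:
            reply, _ = sock.recvfrom(2048)
        except OSError:
            # Timed out (or the OS reported the port as unreachable):
            # the only hint we get that nobody is there.
            return sent, None
        return sent, reply

if __name__ == "__main__":
    # Find a port that was free a moment ago and has no listener on it.
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    probe.bind(("127.0.0.1", 0))
    address = probe.getsockname()
    probe.close()
    print(send_and_wait_for_reply(address))
```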




Network programming

Java programming (in MATLAB)