Notes on writing proxy

Using an ssh tunnel

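When you run your proxy server on turing, the port you open is not reachable from the outside world. However, you can make a "tunnel" to that port from your local machine (laptop, home computer, etc.) using ssh's local port forwarding option (-L). For example, assuming your proxy listens on port 9030 and "you" stands in for your username: ssh -L 9030:localhost:9030 you@turing.slu.edu. Then set up your local web browser to use the local machine's tunnel endpoint (localhost and the local port you chose) as its proxy.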

How to configure your web browser to use a proxy server

In all of these, you'll need to set the HTTP proxy server to "turing.slu.edu" and set the port to the port your proxy server is using (see the assignment for your range of ports). If you are going through an ssh tunnel, use the tunnel's local endpoint instead.

Firefox

On turing and the linux lab machines, you can set up Firefox to use proxies. Click the settings gear, choose Advanced, then click "Set up how Firefox connects to the internet", then choose Proxies.

Mac OS X

If you're using a Mac, you can set up an HTTP proxy in the Network pane of System Preferences, by choosing Advanced and then Proxies.

Windows

I haven't tested this, but you should be able to set up a proxy via the Tools menu of Internet Explorer. Choose Internet Options, then the Connections tab, and then the Settings button. Let me know if this works!

Running proxy in the LinuxLab

If you sit down at a linux lab machine, you are not using turing. So if you run a server, such as proxy or ilisten, that server is running at linuxlab##.mcs.slu.edu, and you'll need to use that address to connect to it, for example:

telnet linuxlab4.mcs.slu.edu 9030

Since the linuxlab machines are on their own local network, connecting to a server on one of those machines will probably only work from turing or another machine in the lab.

Other notes:

Stream buffering

When you use fdopen() to turn a connected file descriptor into a FILE *, the FILE stream is buffered, which means that it holds data in a buffer until it has enough to be worth sending across the network. Read the man page for setlinebuf() for details.

With proxy, you will need to send information to a web server, and you need to ensure it gets through the buffer and actually sent across the network. You can do this manually with fflush(), which ensures that data is flushed out of the buffer and to the network. Or, you can use setlinebuf() to change the buffering behavior of the stream.

Connection: Close

In HTTP/1.0, a new TCP connection was created for each request/response pair, which wasted resources setting up lots of TCP connections. With HTTP/1.1, connections are persistent by default, so a client can send more than one request and receive more than one response over a single TCP connection (HTTP/1.0 clients asked for this with the Connection: Keep-Alive header). This is tricky to handle properly. If you don't handle it properly, you'll see pages load very slowly (30+ seconds) as the browser waits for connections to time out. For an easy workaround, change all "Connection:" headers to Connection: Close when forwarding the client request.
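One way to do the rewrite, sketched under the assumption that you forward the request one header line at a time: pass each line through a small filter before sending it on. The name rewrite_header is made up for illustration. Header names are case-insensitive, so the comparison uses strncasecmp().

```c
#include <strings.h>   /* strncasecmp (POSIX) */

/* Given one request header line, return the line that should be
 * forwarded to the web server. Any "Connection:" header is replaced;
 * every other line is passed through unchanged.
 * (rewrite_header is a hypothetical helper.) */
const char *rewrite_header(const char *line)
{
    static const char name[] = "Connection:";

    if (strncasecmp(line, name, sizeof name - 1) == 0)
        return "Connection: close\r\n";
    return line;
}
```

Note that this only changes a header that is already there. If the client's request has no Connection: header at all, an HTTP/1.1 server will treat the connection as persistent, so you may also want to add the header yourself before the blank line that ends the request.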

Don't use fgets/fputs to copy a server response

You're going to connect to a web server, send it a request, and then copy its response back to the client. That web server can send back binary data, which may include zero bytes ('\0'). fgets and fputs cannot handle embedded zero bytes, since those terminate a C string: fputs stops writing at the first one it sees. If you use fgets/fputs, you'll find your proxy works for simple text-only websites and fails for more complicated sites that compress their responses or include images.

Instead, the easiest thing to do is to simply use read() and write() calls on the socket (as opposed to the FILE * created by fdopen()).
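A minimal sketch of such a copy loop is below. The name copy_stream is made up for illustration; it assumes both descriptors are ordinary blocking sockets and that the server closes its end when the response is complete (which is what Connection: Close asks for).

```c
#include <errno.h>
#include <unistd.h>

/* Copy bytes from from_fd to to_fd until end-of-file on from_fd.
 * Works on raw bytes, so embedded zero bytes are copied like any other.
 * Returns 0 on success, -1 on a read or write error.
 * (copy_stream is a hypothetical helper.) */
int copy_stream(int from_fd, int to_fd)
{
    char buf[4096];
    ssize_t nread;

    while ((nread = read(from_fd, buf, sizeof buf)) != 0) {
        if (nread < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }

        /* write() may accept fewer bytes than requested, so keep
         * going until the whole chunk has been sent. */
        ssize_t off = 0;
        while (off < nread) {
            ssize_t nwritten = write(to_fd, buf + off, (size_t)(nread - off));
            if (nwritten < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += nwritten;
        }
    }
    return 0;
}
```

The important detail is that the number of bytes to write comes from read()'s return value, never from strlen().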

A more sophisticated approach would be to use fgets to read the response header, parse it to find the Content-Length: field, and only then switch to read/write to copy the message body. This approach is not without its hazards: the stdio stream may already have buffered part of the body, so a later read() on the descriptor would miss those bytes. See the BUGS section of the fgets man page if you plan to try it.
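If you do try the header-parsing route, picking the length out of a header line might look roughly like this. The name parse_content_length and its return convention are made up for illustration, and the sketch does nothing about the buffering hazard.

```c
#include <errno.h>
#include <stdlib.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Inspect one response header line.
 * Returns 1 and stores the value in *length if the line is a valid
 * Content-Length header, 0 if it is some other header, and -1 if it
 * is a Content-Length header with an unusable value.
 * (parse_content_length is a hypothetical helper.) */
int parse_content_length(const char *line, long *length)
{
    static const char name[] = "Content-Length:";
    const char *value;
    char *end;
    long n;

    if (strncasecmp(line, name, sizeof name - 1) != 0)
        return 0;

    value = line + sizeof name - 1;
    errno = 0;
    n = strtol(value, &end, 10);    /* skips leading whitespace */
    if (end == value || errno == ERANGE || n < 0)
        return -1;

    *length = n;
    return 1;
}
```

Keep in mind that not every response carries a Content-Length: header (a server may use chunked transfer encoding, or simply close the connection when it is done), so you still need a fallback that copies until end-of-file.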

How do I return "404 Not Found"?

You need to generate a correctly formatted HTTP response. At the very least, that's

HTTP/1.1 404 Not Found
[blank line]

(where each line is terminated with "\r\n").

However, I found that this didn't actually display in the browsers. So, I sent a message body as well, which requires some extra work:

HTTP/1.1 404 Not Found
Content-Length: [calculated value]
Content-Type: text/html
[blank line]

and then printed some HTML for the error response web page. Actually, I printed the HTML into a string (with sprintf) so I could use strlen() to get the Content-Length, then printed the string.
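Here is a sketch of that idea, not my actual code. The name build_404, the buffer sizes, and the wording of the error page are all made up for illustration, and it uses snprintf rather than sprintf so an overlong host name can't overflow the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Build a complete 404 response (headers plus HTML body) in out.
 * Returns the total length in bytes, or -1 if it does not fit.
 * (build_404 is a hypothetical helper.) */
int build_404(char *out, size_t outsize, const char *host)
{
    char body[512];
    int n;

    /* Format the body first so its length is known before the
     * headers are written. Note: host is not HTML-escaped here. */
    n = snprintf(body, sizeof body,
                 "<html><head><title>404 Not Found</title></head>"
                 "<body><h1>404 Not Found</h1>"
                 "<p>Could not reach %s.</p></body></html>\n",
                 host);
    if (n < 0 || (size_t)n >= sizeof body)
        return -1;

    n = snprintf(out, outsize,
                 "HTTP/1.1 404 Not Found\r\n"
                 "Content-Length: %zu\r\n"
                 "Content-Type: text/html\r\n"
                 "\r\n"
                 "%s",
                 strlen(body), body);
    if (n < 0 || (size_t)n >= outsize)
        return -1;
    return n;
}
```

The result can then be sent to the client with a single write() (or fputs() followed by fflush(), since this text contains no zero bytes).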

You can see my messages by running my proxy server, connecting with telnet, and giving it a request with a nonexistent Host: field, such as:

GET / HTTP/1.1
Host: not.really.a.host.at.all
[blank line]

You can see other web servers' error messages the same way: use telnet to connect to port 80 on your favorite web server and send the above request.