What Happens When You Request a Webpage

Okay, with all the “securing all the things” posts, it occurs to me that I haven’t actually explained this. (I did, however, explain this in depth for an interview at Large Internet Company, so it’s probably higher-level than I thought.) One of my first jobs included writing CGIs (web-based programs) that logged into and/or proxied other sites for our authenticated users. (That’s legit; it was a college that had paid for an online database service and wanted to provide access to their students. There are now products that do this (google EZProxy, for example), but this was back in the wild wild west frontier. But I digress.)

Okay. You fire up your computer from a dead sleep. You launch a web browser and point it at http://www.google.com. What happens then?

The first thing that happens is your computer asks your DNS server (DNS stands for Domain Name System) for the IP address of www.google.com, because names are easy for humans to remember but computers still use the IP address. (You know what an IP address is, right? It’s a number that looks like 123.122.12.123: four numbers between 0 and 255 inclusive, separated by dots. These numbers allow your computer to actually find the remote server on the internet. I’ll just leave that there for you to take as written unless someone really cares, LOL.) Unless you’re using something like OpenDNS or Google’s DNS (basically, unless you changed what your ISP gave you when they gave you your IP address), that DNS server is probably run by your ISP.
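If you want to watch that first step happen, here is a minimal Python sketch. It hands the name to whatever resolver your computer is already configured to use and prints what comes back. The function names `lookup_ipv4` and `octets` are mine, made up for illustration, and the addresses you see for www.google.com will vary by where you are and when you ask.

```python
import ipaddress
import socket


def lookup_ipv4(hostname):
    """Ask the system's configured resolver for the IPv4 addresses of a name."""
    results = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, (address, port)).
    return sorted({sockaddr[0] for *_, sockaddr in results})


def octets(address):
    """Split a dotted IPv4 address into its four numbers (each 0 to 255)."""
    return list(ipaddress.IPv4Address(address).packed)


if __name__ == "__main__":
    # Needs a working network connection and DNS server.
    for ip in lookup_ipv4("www.google.com"):
        print(ip, octets(ip))
```

`octets` will raise an error if you hand it something like 123.122.12.256, since 256 does not fit in the 0 to 255 range.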

Your DNS server doesn’t actually store any IP addresses permanently (unless it’s Authoritative for a particular domain, like your ISP’s own, and ISPs might/probably actually separate those out between authoritative DNS servers and client query DNS servers). If someone else has requested www.google.com recently, however, it stores the answer for however long Google’s DNS servers told it it could cache the answer (this is called TTL, or Time To Live). If no one has requested www.google.com recently, or if the old answer has expired, it goes out to the servers that are authoritative for the .com domains (asking the root servers where those are first, if it doesn’t already know) and says, “Who’s authoritative for google.com?” Those servers give your local DNS server the IP of Google’s DNS servers, and your local DNS server goes out to that IP and says, “What’s the IP for www.google.com?” Google’s DNS servers give it an IP address and how long it can store that answer. When your local DNS server has an answer it likes, it gives that answer back to you.
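The caching part is easier to see in code than in prose. This is a toy sketch of the idea, not how any real DNS server is written: `CachingResolver`, `fake_authoritative`, the 192.0.2.10 address (from a range reserved for documentation), and the 300 second TTL are all made up for illustration.

```python
import time


class CachingResolver:
    """Toy resolver that remembers answers until their TTL runs out."""

    def __init__(self, upstream, clock=time.monotonic):
        self._upstream = upstream  # callable: name -> (ip, ttl_seconds)
        self._clock = clock        # injectable so the passage of time can be faked
        self._cache = {}           # name -> (ip, expires_at)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: answer from memory
        # Missing or expired: go ask the authoritative side again.
        ip, ttl = self._upstream(name)
        self._cache[name] = (ip, now + ttl)
        return ip


def fake_authoritative(name):
    """Stand-in for the real lookup chain; returns a made-up IP and TTL."""
    print("asking upstream about", name)
    return "192.0.2.10", 300


if __name__ == "__main__":
    resolver = CachingResolver(fake_authoritative)
    resolver.resolve("www.example.com")  # goes upstream
    resolver.resolve("www.example.com")  # served from the cache
```

Run it and “asking upstream” prints only once; the second request never leaves the cache until the 300 seconds are up.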

Your web browser then opens a connection to that IP address on port 80 (the port for unencrypted web browsing traffic) and requests the main page (“GET /”, usually followed by a protocol version and header information about your browser and what kind of media you’re willing to accept and such. Originally the “GET /” was all that was required; these days most servers also expect the version and a Host header naming the site. If you have telnet, you can actually do this manually and see what you get back). The www.google.com web server sends you back an unencrypted HTML file and, in the simplest case, closes the connection. If there are embedded images, your browser requests each of them in turn while it loads the page.
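Here is roughly what the telnet experiment looks like if you let Python do the typing. It is a sketch under a few assumptions: the helper names `build_request` and `fetch` are mine, the particular headers are just a reasonable minimum, and a real browser sends a good deal more.

```python
import socket


def build_request(host, path="/"):
    """Build the text a browser would send for a simple, unencrypted GET."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",         # tells the server which site you want
        "Accept: text/html",     # what kind of media you're willing to accept
        "Connection: close",     # ask the server to hang up when it's done
        "",
        "",                      # a blank line marks the end of the request
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host, path="/", port=80, timeout=10):
    """Send the request over a plain TCP connection and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Needs a network connection; prints the start of the status line and headers.
    print(fetch("www.google.com")[:300].decode("latin-1"))
```

What comes back starts with a status line and the server’s own headers, then a blank line, then the HTML itself.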

If you have a fast connection, this all typically happens in milliseconds. Well. Unless the page is full of ginormous images, hahaha.

If you instead request https://www.google.com, the DNS dance is the same but there are some additional moves on the actual file requests. Your browser gets the web site’s certificate (which contains its public key) from the web server and uses that to agree on a symmetric key to encrypt the traffic, and then moves on to the file requests and deliveries. Oh, and this happens on port 443 instead of 80, because that’s the default port for https. Unless your site has a port number on the end of the host name, like https://my401kprovider.com:8124 or something.
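The port rule is simple enough to write down in a few lines. This sketch uses Python’s standard URL parser; `port_for` and `DEFAULT_PORTS` are names I made up, and it only knows about http and https.

```python
from urllib.parse import urlsplit

# The default ports described above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def port_for(url):
    """Return the port a browser would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port  # an explicit number after the host name wins
    return DEFAULT_PORTS[parts.scheme]


if __name__ == "__main__":
    for url in (
        "http://www.google.com",
        "https://www.google.com",
        "https://my401kprovider.com:8124",
    ):
        print(url, "->", port_for(url))
```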

Aside: I previously mentioned a “man in the middle” attack; that’s where someone steps in between you and the web server and sends you a fake key and does the encrypt and decrypt dance between you, proxying the content. This is part of why you should pay attention to whether or not that padlock is green in your URL window, yes. There’s a whole system of who certifies whether the certificate is good; there are certificate authorities and your browser has some public keys for the authorities stored. (Some malware messes with those, too. Your antivirus might also be man-in-the-middling you to scan encrypted content for viruses. There’s some debate as to whether or not this is benign, and whether the advantages outweigh the risks–I think it depends on the user. You can tell both in your AV settings and by clicking on the padlock and asking for more info. Some AV companies also sell certs, so if you have Symantec or Comodo and the cert is from Symantec or Comodo that doesn’t prove anything, but if it says Avast, yeah, that’s their HTTPS scanning feature at work.)
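If you’d rather not click the padlock, you can ask for the same “who signed this?” information from Python. This is a sketch, not a security tool: `issuer_fields` and `fetch_issuer` are names I made up, and it leans on Python’s default settings, which check the certificate against the certificate authorities your system trusts and check that the certificate matches the host name.

```python
import socket
import ssl


def issuer_fields(cert):
    """Flatten the nested 'issuer' structure returned by getpeercert()."""
    return {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}


def fetch_issuer(host, port=443, timeout=10):
    """Connect over TLS and report who issued the site's certificate."""
    # The default context verifies the certificate chain and the host name;
    # the connection fails with an error if either check does not pass.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return issuer_fields(tls.getpeercert())


if __name__ == "__main__":
    # Needs a network connection.
    print(fetch_issuer("www.google.com"))
```

If the organization name that comes back is your antivirus vendor rather than a certificate authority, that is the HTTPS scanning described above.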

The email client dance is similar: there’s the DNS dance of “who is mail.yourmailserver.com?” followed by really short requests sent by the software to authenticate you (your username and password) and get the content, and a similar encryption dance if you’re using encryption (which I recommend, yes). You can use telnet to send mail from the command line, which someone did in a previous embedded video.


Filed under written for nontechnical friends
