What happens when you type google.com in your browser and press Enter?
Most of us take the Internet for granted: we know that if we type a website’s address into a browser, we will get a response. Come to think of it, though, the Internet is quite a miraculous creation. For a website to load, many things must take place. Let’s pull back the veil and look at what goes on between the moment you type “google.com” and the moment you receive a response.
First of all, what exactly is the Internet? It’s a global network of networks, in effect an enormous WAN (wide area network), connecting a vast number of computers all over the planet so they can share information. The information we are trying to load lives on a server, which is a computer that receives requests from other computers (known as clients) and delivers data to them. You might wonder: if there are so many computers we could pull this information from, how does the browser know exactly which one to ask? This is where IP (Internet Protocol) addresses come into play. IP addresses identify servers in the same way physical addresses identify buildings, which is what makes it possible for your friendly postman to deliver your mail.
If you have visited the requested website before, its IP address may already be stored in your browser’s cache, which works like a contact list: a list of names (web domains) linked to a list of numbers (in our case, IP addresses). If the address can’t be found there, the browser asks the OS to check its own cache. If google.com is not in the OS’s cache either, the OS starts looking for it through DNS (Domain Name System). The first step is to ask the resolver, which is usually run by your Internet provider. For highly trafficked websites such as google.com, the resolver will often have the answer stored in its cache. Otherwise, since the resolver knows the IP addresses of the root servers, it starts looking for the desired IP address there.
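The lookup order described above (browser cache, then OS cache, then the resolver) can be sketched in a few lines of Python. This is only an illustration: the function name `lookup`, the plain dictionaries standing in for the caches, and the `ask_resolver` callback are all assumptions made for the example, and `192.0.2.10` is a documentation-only placeholder address, not Google’s.

```python
from typing import Callable, Optional

def lookup(domain: str,
           browser_cache: dict[str, str],
           os_cache: dict[str, str],
           ask_resolver: Callable[[str], Optional[str]]) -> tuple[Optional[str], str]:
    """Return (ip, where_it_was_found), checking the cheapest cache first."""
    if domain in browser_cache:
        return browser_cache[domain], "browser cache"
    if domain in os_cache:
        ip = os_cache[domain]
        browser_cache[domain] = ip      # remember it closer to the user
        return ip, "OS cache"
    ip = ask_resolver(domain)           # slowest path: go out to DNS
    if ip is not None:
        os_cache[domain] = ip
        browser_cache[domain] = ip
    return ip, "resolver"

# Hypothetical resolver that always answers with a placeholder address.
def fake_resolver(domain: str) -> Optional[str]:
    return "192.0.2.10"

browser, operating_system = {}, {}
print(lookup("google.com", browser, operating_system, fake_resolver))
print(lookup("google.com", browser, operating_system, fake_resolver))
```

The second call never reaches the resolver, which is the whole point of caching at each level.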
The root servers know the IP addresses of the TLD (Top-Level Domain) servers, such as those for .org, .com, .net and so on. Our domain ends in .com, so the resolver is referred to the .com TLD servers, which hold records for all .com domain names. At the same time, the resolver caches the IP address of the .com TLD server so that it does not have to request it from the root server again. The .com TLD server typically does not hold the website’s address itself; instead it directs the resolver to the domain’s authoritative name server.
The authoritative name server is run by, or on behalf of, whoever registered the domain name. Even though the .com TLD server might not know the address of the website we are requesting, it does know the address of that domain’s authoritative name server. Why, you may ask? Whenever you buy a domain, the domain registrar lets the TLD registry know which authoritative name servers are responsible for that domain. Once the resolver reaches the authoritative name server, it can get the desired IP address and pass it to the OS. The OS then caches it so that the same process does not have to be repeated next time. Then, finally, the address is returned to the browser.
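To make the root, TLD, and authoritative hops concrete, here is a toy resolver in Python. It is a sketch under heavy assumptions: the three dictionaries stand in for real name servers, the server labels (`tld-com`, `ns-google`) and the address `192.0.2.10` are invented placeholders, and real DNS involves network messages, record types, and expiry times that are left out here.

```python
# Toy data only: every name and address below is a placeholder.
ROOT = {"com": "tld-com"}                               # root knows the TLD servers
TLD = {"tld-com": {"google.com": "ns-google"}}          # TLD knows the authoritative servers
AUTHORITATIVE = {"ns-google": {"google.com": "192.0.2.10"}}

class Resolver:
    def __init__(self):
        self.tld_cache = {}      # TLD label -> TLD server
        self.answer_cache = {}   # domain -> IP address
        self.queries = 0         # how many upstream servers were asked

    def resolve(self, domain: str) -> str:
        if domain in self.answer_cache:
            return self.answer_cache[domain]

        tld_label = domain.rsplit(".", 1)[-1]
        if tld_label not in self.tld_cache:
            self.queries += 1                            # ask a root server
            self.tld_cache[tld_label] = ROOT[tld_label]
        tld_server = self.tld_cache[tld_label]

        self.queries += 1                                # ask the TLD server
        name_server = TLD[tld_server][domain]

        self.queries += 1                                # ask the authoritative server
        ip = AUTHORITATIVE[name_server][domain]
        self.answer_cache[domain] = ip
        return ip

resolver = Resolver()
print(resolver.resolve("google.com"), resolver.queries)
print(resolver.resolve("google.com"), resolver.queries)
```

The first lookup asks three servers; the second is answered from the resolver’s cache, so the query counter does not move.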
Let’s connect!
Now that the browser has the desired IP address, it must find a way to get there, just as your postman, once he has your address, needs to find a route to your house to deliver your mail.
TCP/IP, or Transmission Control Protocol/Internet Protocol, is a set of networking protocols for linking network computers over the internet. TCP/IP can also be used as a networking protocol in a private computer network (an intranet or an extranet).
Common application protocols that run on top of TCP/IP are:
- HTTP (Hyper Text Transfer Protocol) handles the communication between a web server and a web browser.
- HTTPS (Secure HTTP) handles secure communication between a web server and a web browser.
- FTP (File Transfer Protocol) handles transmission of files between computers.
In a TCP/IP network, each computer or device is given an “IP Address” (Internet Protocol Address), and each IP address can communicate over up to 65535 numbered “ports” for transmitting and receiving data to and from any other network device (port numbers are 16-bit values). The IP address identifies a machine or device on the network, whereas the “Port Number” identifies a specific service or connection endpoint on that machine.
At this point the browser takes the destination server’s IP address and the port number given in the URL (HTTP defaults to port 80, and HTTPS to port 443), and calls the system library function socket to request a TCP stream socket, using the address family AF_INET or AF_INET6 and the socket type SOCK_STREAM. This request is passed to the Transport Layer, which builds a TCP segment. The source port is chosen from the kernel’s dynamic port range (ip_local_port_range on Linux), and the destination port is written into the segment header.
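Python’s `socket` module exposes the same call, so the step can be tried out locally. The sketch below is an assumption-laden stand-in for what a browser does: instead of contacting a real web server, it opens a throwaway listener on the loopback address `127.0.0.1` and connects to it, which is enough to see the kernel pick a source port. The function name `open_local_connection` is made up for this example.

```python
import socket

def open_local_connection() -> dict:
    """Connect a TCP client to a throwaway listener on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(("127.0.0.1", 0))      # port 0: let the kernel choose a free port
        server.listen(1)
        dest_port = server.getsockname()[1]

        # connect() triggers the TCP handshake; the kernel assigns the source port.
        client.connect(("127.0.0.1", dest_port))
        conn, peer = server.accept()
        conn.close()

        return {
            "source_port": client.getsockname()[1],
            "dest_port": dest_port,
            "peer_seen_by_server": peer,   # (client IP, client source port)
        }
    finally:
        client.close()
        server.close()

print(open_local_connection())
```

The source port differs from run to run because it comes from the operating system’s dynamic range, while the destination port is whatever the server is listening on.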
With the rise of cyber-attacks, offering a secure browsing experience has become a top priority for website owners, businesses, and Google. Google Chrome marks websites served without an SSL/TLS certificate as “Not Secure,” and practically all other major browsers show a similar warning. So what can you do to get rid of this security notice (or prevent it from ever showing on your website)? Serve the site over a secure protocol, HTTPS, which uses port 443 by default.
This insecure connection warning can be removed by installing an SSL/TLS certificate on the web server that hosts the site. The certificate allows the client browser and the server to establish an encrypted, secure communication channel. The next time you visit the site, the connection will be made over HTTPS using port 443.
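From the client side, a verified HTTPS connection can be sketched with Python’s standard `ssl` module. Treat this as a rough illustration rather than how any particular browser works: the helper names `make_client_context` and `negotiated_tls_version` are invented here, and the second function needs real network access to run.

```python
import socket
import ssl
from typing import Optional

def make_client_context() -> ssl.SSLContext:
    # The default context verifies the server certificate and checks the hostname.
    return ssl.create_default_context()

def negotiated_tls_version(host: str, port: int = 443,
                           timeout: float = 5.0) -> Optional[str]:
    """Open a TLS connection and report the negotiated version (needs network access)."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.version()

context = make_client_context()
print(context.verify_mode, context.check_hostname)
```

If the server’s certificate is missing, expired, or issued for another name, `wrap_socket` raises an error instead of connecting, which is the programmatic counterpart of the browser warning.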
TCP uses a numerical identifier called a port as a communication endpoint. Network ports help devices recognize which service is being requested, so that traffic is directed to the right place. For example, port 80 is the default for unencrypted HTTP web traffic. When a site uses a TLS certificate, the communication channel between the browser and the server is encrypted to protect sensitive data exchanges, and that traffic uses port 443, which is the standard port for HTTPS. The two ports are not interchangeable: if port 443 is unavailable for any reason, an HTTPS-enabled site cannot be loaded securely over port 80. In practice, many servers keep port 80 open mainly to redirect visitors to the HTTPS version of the site. Google uses a secure connection by default.
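How a client decides which port to use can be shown with Python’s `urllib.parse`. This is a small sketch; the function name `effective_port` and the two-entry `DEFAULT_PORTS` table are assumptions for the example, and schemes other than `http` and `https` are deliberately not handled.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url: str) -> int:
    """Return the explicit port in the URL, or the scheme's default port."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS[parts.scheme]   # raises KeyError for unknown schemes

for url in ("http://google.com", "https://google.com", "https://google.com:8443/search"):
    print(url, effective_port(url))
```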
When your browser sends a request to a website over a secure connection, everything in the exchange stays encrypted, including, for example, your account credentials if you are logging in to the site. This means that an attacker on the network cannot read it, because the original data is encrypted and delivered to the server as ciphertext. Even if the traffic is intercepted, the attacker is left with garbled data that can only be decrypted using the appropriate decryption key.
HTTP over an SSL/TLS connection uses public key encryption to establish a shared symmetric key, which is then used for bulk data transfer. HTTPS uses port 443 by default. Before establishing a connection, the browser and server must agree on the settings that will be used throughout the communication. They reach this agreement through an SSL/TLS handshake. The steps below follow the classic RSA-based key exchange used up to TLS 1.2; TLS 1.3 agrees on keys with an ephemeral Diffie-Hellman exchange instead:
- The procedure starts with the client browser and the web server exchanging hello messages.
- During protocol negotiation, the two sides tell each other which protocol versions and cipher suites they support, and the server shares its certificate.
- The client now has the server’s public key, which it retrieves from the certificate. It first verifies that the server certificate is valid. It then generates a pre-master secret, encrypts it with the server’s public key, and sends it to the server.
- Both sides independently compute the symmetric key from the pre-master secret and the random values exchanged in the hello messages.
- Both parties send a change cipher spec message indicating that they’ve calculated the symmetric key, and symmetric encryption will be used for bulk data transport.
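The step where both sides independently arrive at the same key can be sketched in Python. The code below follows the TLS 1.2 pseudorandom function (an HMAC-SHA256 expansion) as described in RFC 5246, to the best of my reading; it is for illustration only and not something to use in place of a real TLS library. The function names `p_sha256` and `master_secret` are my own, and the random byte strings are stand-ins for the values a real client and server would exchange.

```python
import hashlib
import hmac
import os

def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """HMAC-SHA256 expansion: A(0) = seed, A(i) = HMAC(secret, A(i-1))."""
    output = b""
    a = seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]

def master_secret(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Derive the 48-byte master secret from the pre-master secret and both randoms."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master, seed, 48)

# Stand-in values; in a real handshake these travel in the hello messages
# and (encrypted) in the client key exchange message.
client_random = os.urandom(32)
server_random = os.urandom(32)
pre_master = os.urandom(48)

client_side = master_secret(pre_master, client_random, server_random)
server_side = master_secret(pre_master, client_random, server_random)
print(client_side == server_side, len(client_side))
```

Because both sides feed identical inputs into the same function, neither ever has to send the resulting key over the network.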
Load balancing
Popular websites, such as Google, must handle an enormous number of simultaneous queries and respond with the right text, images, and video. To accommodate so many requests, the content is frequently spread across numerous servers. In front of these servers, a load balancer acts as a traffic officer, directing each request to a server that can handle it. It ensures that no single server is overwhelmed and that all requests are served, which provides high availability and reliability. When a server goes down, the load balancer redirects queries to the servers that are still operational.
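One simple way to spread requests is round-robin with health tracking, sketched below in Python. This is just one possible strategy, chosen for brevity; the source does not say which algorithm Google or anyone else uses, and the class name `RoundRobinBalancer` and server labels like `web-1` are invented for the example.

```python
class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0                       # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, skipping any that are down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
print([balancer.pick() for _ in range(4)])
balancer.mark_down("web-2")
print([balancer.pick() for _ in range(3)])
```

Once `web-2` is marked down, it simply stops appearing in the rotation until it is marked up again.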
Firewalls
Web servers sit behind a firewall that safeguards the system against breaches and attacks. If a problematic source begins flooding the web server with a large number of concurrent requests, the firewall can detect the issue and block requests from that IP address before they reach the web server.
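The flood-detection behaviour can be approximated with a sliding-window request counter per IP address. The Python sketch below is a simplification under stated assumptions: real firewalls work on packets and rule sets rather than a Python object, the class name `RateLimitFirewall` and its thresholds are made up, and the addresses come from documentation-only ranges.

```python
from collections import defaultdict, deque

class RateLimitFirewall:
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.history = defaultdict(deque)    # ip -> timestamps of recent requests
        self.blocked = set()

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request may pass; block IPs that exceed the limit."""
        if ip in self.blocked:
            return False
        recent = self.history[ip]
        while recent and now - recent[0] >= self.window:
            recent.popleft()                 # forget requests outside the window
        recent.append(now)
        if len(recent) > self.max_requests:
            self.blocked.add(ip)
            return False
        return True

firewall = RateLimitFirewall(max_requests=3, window_seconds=1.0)
for t in (0.0, 0.1, 0.2, 0.3):
    print(t, firewall.allow("203.0.113.5", now=t))
print(firewall.allow("198.51.100.7", now=0.3))
```

Timestamps are passed in explicitly so the behaviour is easy to reason about; a real deployment would read a clock and would usually lift the block after some time.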
Application Server
A server dedicated to running applications is known as an application server. A web server is built to serve web pages and is frequently optimized to do so. As a result, it may be unable to run resource-intensive online applications. An application server provides the processing power and memory needed to run these applications in real time. It also provides the environment in which specific applications can execute. A cloud service, for example, might need to process data with a Windows program. Although a Linux-based server can provide the cloud service’s web interface, it cannot natively execute Windows programs. As a result, input data may be sent to a Windows-based application server. The application server processes the data and sends the result to the web server, which can then deliver it to a web browser.
Database
A database is a logically organized collection of structured data kept electronically in a computer system. A database is usually managed by a database management system (DBMS). The data, the DBMS, and the applications that go with them are together referred to as a database system, which is commonly shortened to just database.
In most cases websites serve dynamic content, which needs to be converted to a static file (typically an HTML document) before it is served to the client. At this point the application server pulls the information from a database and turns it into a static file. Then it passes the file to the web server. When the file reaches the browser, the browser interprets the HTML and displays the website.
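As a rough illustration of an application server turning database rows into an HTML document, here is a Python sketch using the built-in `sqlite3` module. Everything specific in it is an assumption for the example: the `pages` table, its two sample rows, the `example.com` URLs, and the function names `make_demo_db` and `render_results_page`.

```python
import html
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a couple of made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (title TEXT, url TEXT)")
    conn.executemany("INSERT INTO pages VALUES (?, ?)", [
        ("Cats & kittens", "https://example.com/cats"),
        ("Dog training", "https://example.com/dogs"),
    ])
    return conn

def render_results_page(conn: sqlite3.Connection, query: str) -> str:
    """Pull matching rows from the database and render them as an HTML document."""
    rows = conn.execute(
        "SELECT title, url FROM pages WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),                     # parameterized to avoid SQL injection
    ).fetchall()
    items = "\n".join(
        f'    <li><a href="{html.escape(url)}">{html.escape(title)}</a></li>'
        for title, url in rows
    )
    return (
        "<!DOCTYPE html>\n<html>\n<body>\n"
        f"  <h1>Results for {html.escape(query)}</h1>\n"
        f"  <ul>\n{items}\n  </ul>\n"
        "</body>\n</html>"
    )

print(render_results_page(make_demo_db(), "cat"))
```

The string returned by `render_results_page` is what the web server would send back, and what the browser would then parse and display.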
In a nutshell…
Now that we have gone through the main components and protocols involved in requesting and displaying a website, it is remarkable how easily we take it all for granted. Let’s recap the main things we discussed:
1- Your browser uses DNS in order to locate the IP address of google.com
2- TCP/IP protocol is used to connect the browser to the server
3- This server’s firewall restricts traffic going in and out
4- An encrypted connection takes place between the browser and the server
5- The load balancer redirects your request to a web server in order to keep traffic stable
6- The web server requests help from the application server in order to process the request
7- The application server pulls information from the database and turns it into a static file for the web server
8- The web server sends this file to your browser
9- Your browser turns this file into a web page