Anatomy of a URL

Overview

I often find that there are some misconceptions regarding the structure of a URL and want to produce a clear definition of a properly formatted URL along with an explanation of the constituent parts and the purpose they serve.

URL is an acronym of Uniform Resource Locator and contains all the information required to identify the server responsible for responding to the request along with the information that it needs to locate the specific resource.

Dissecting the URL

Here’s a diagram to dissect the URL:

vouzamo-wordpress-com-anatomy-of-a-url

Scheme

The scheme (also referred to as protocol) describes the protocol that should be used. Common protocols include http (Hypertext Transfer Protocol), https (Secure Hypertext Transfer Protocol), ftp (File Transfer Protocol).

Every URL requires a scheme but modern browsers have made us lazy by automatically adding a scheme and absolute root (usually http://) when an authority is entered into the address bar.

Absolute Root

This is required for any absolute URL and, if omitted, would result in a relative URL. Modern browsers will add the absolute root to requests typed into the address bar appended to the default scheme. The double forward slash ‘//’ precedes an authority.

A common usage of the absolute root is to request a resource with the same scheme as the current resource without having to include it explicitly (e.g. ‘//www.example.com/some-image.png’). This is particularly useful on HTML pages which can be served using either the http or https protocols and need to request their associated resources accordingly.

Authority

This should be familiar as every URL we type into an address bar will contain one (unless we use the corresponding IP address instead). The authority is the address of a server on the network that can be looked up using DNS (Domain Name System).

Here’s a diagram to dissect the authority:

vouzamo-wordpress-com-anatomy-of-a-hostname

Subdomain

This is optional but there are some common subdomains including www, mail, ftp etc. A request to http://www.example.com is a different to a request to example.com but because the www subdomain is so commonly associated with the Domain the domain administrator will often configure a redirect from example.com to http://www.example.com.

Domain

This must be unique for each top level domain (e.g. .com) and often describes the registered authority.

Top Level Domain

A top level domain is one of the domains at the highest level in the hierarchical Domain Name System.

Port

The port is required for all requests but modern browsers will include a default automatically for certain schemes. It will include port 80 for the http scheme, port 443 for the https scheme, and port 21 for the ftp scheme. A colon will prefix the port number in a URL.

Path

The path informs the server of the specific location where it can retrieve the resource. It must start with a forward slash ‘/’ and historically denoted a file system location relative to a web application root.

The simplest path is ‘/’ and it is possible for multiple paths to reference the same resource e.g. ‘/about-us’, ‘/about-us/’, and ‘/about-us/index.html’ could all refer to the same resource but this shouldn’t be taken for granted. It depends how the server has been configured and these are different addresses and it is common for administrators to configure redirects from ‘/{path}’ and ‘/{path}/’ to ‘/{path}/{default document}’.

Query

The query exposes some parameters for the server to consume and must be prefixed with a question mark ‘?’. The format of a parameter is ‘{parameter}={value}’. Multiple parameters can be specified by using ampersand ‘&’ as a delimiter and the same parameter can be included multiple times with different values.

Fragment

The fragment is never sent to the server and is the final part of a URL. Its purpose is to identify a specific location within a resource. There can only be a single fragment optionally appended to the URL and it must be prefixed with a hash ‘#’.

What are Methods?

Methods are specific to the Hypertext Transfer Protocol (including https) and provide a way of differentiating between different intentions for the same URL. We use verbs to describe these intentions and the variations are below:

vouzamo-wordpress-com-anatomy-of-the-methods

HEAD

This is identical to the GET method with the exception that the server must NOT return a response body.

GET

This is the most commonly used method and the method that is used when entering an address into a browser. It will use the URL to locate and return a resource.

POST

This method is used to send information to the server. The function performed by the POST method is determined by the server and may not result in a resource that can be identified by the URL. In this case a status code in the response can describe success or failure.

PUT

This is similar to the POST method but, if the URL refers to an already existing resource, the server should consider this a modified version.

DELETE

This method informs the server to delete any resource at the specified URL. If the deletion is deferred instead of being deleted immediately then an accepted status code can be used in the response.

CONNECT

This method can be used with a proxy to switch to being a tunnel (e.g. SSH Tunneling).

OTHERS

There are other methods such as TRACE and methods are subject to being extended by the W3C HTTP specification in the future. See https://www.w3.org/Protocols/ for more information.