URL Analysis
A Uniform Resource Locator (URL) is an application-level addressing scheme for TCP/IP that allows resources to be addressed in human-readable form.
- points to the host or service location on the Internet
- by domain name or IP address
- can encode some action or data to submit to the server host
- is a common vector for malicious activity
- URLs can provide indicators of session hijacking/replay, forgery, and injection attacks
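
As a minimal sketch of the first step in URL analysis, the snippet below splits a URL into its parts and checks whether the host is given as a domain name or as an IP address. It uses only the Python standard library; the function name `describe_url` and the sample URL are illustrative assumptions, not part of the notes above.

```python
import ipaddress
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts an analyst usually inspects first."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host:
        host_type = "none"
    else:
        try:
            # ip_address() raises ValueError for anything that is not an IP literal
            ipaddress.ip_address(host)
            host_type = "ip"
        except ValueError:
            host_type = "domain"
    return {
        "scheme": parts.scheme,
        "host": host,
        "host_type": host_type,
        "port": parts.port,
        "path": parts.path,
        "query": parts.query,  # data or action encoded in the URL, if any
    }


# Hypothetical sample URL (192.0.2.0/24 is a documentation address range)
info = describe_url("http://192.0.2.10:8080/login.php?user=admin")
print(info["host_type"], info["port"], info["query"])  # ip 8080 user=admin
```

A host given as a raw IP address is not malicious in itself, but it is one of the details worth noting when reviewing a suspicious link.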

HTTP
- important to understand how HTTP operates
- HTTP session starts with a client (a user-agent, such as a web browser) making a request to an HTTP server
- the request establishes a TCP connection between the client and the server
- the TCP connection can be used for multiple requests, or a client can start new TCP connections for different requests
- request typically comprises:
- a method
- GET—retrieve a resource
- POST—send data to the server for processing by the requested resource
- PUT—create or replace the resource
- a resource (such as a URL path)
- version number
- headers
- and body
- Data can be submitted to a server either:
- by using a POST or PUT method and the HTTP headers and body
- or by encoding the data within the URL used to access the resource
- Data submitted via a URL is delimited by the ? character, which follows the resource path
- Query parameters are usually formatted as one or more name=value pairs
- ampersands (&) delimit each pair
- server response comprises:
- the version number
- a status code and message
- plus optional headers
- and message body
- the HTTP response (status) code is the three-digit value in the first line of the response, returned by a server when a client requests a URL
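
The request and response layout described above can be sketched by taking apart raw HTTP/1.1 messages by hand. This is a simplified illustration, not a full HTTP parser: the sample messages, the host `www.example.com`, and the helper names `split_message`, `parse_request`, and `parse_response` are all assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical request: method, resource (path plus query), version, headers, empty body
RAW_REQUEST = (
    "GET /search.php?q=widgets&page=2 HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: sketch/0.1\r\n"
    "\r\n"
)

# Hypothetical response: version, status code and message, headers, body
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "hello"
)


def split_message(raw: str):
    """Return (start_line, headers, body) for a raw HTTP/1.x message."""
    head, _, body = raw.partition("\r\n\r\n")  # a blank line separates headers from body
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


def parse_request(raw: str) -> dict:
    start_line, headers, body = split_message(raw)
    method, target, version = start_line.split(" ", 2)
    parts = urlsplit(target)
    return {
        "method": method,
        "path": parts.path,
        # "?" starts the query; "&" separates the name=value pairs
        "params": parse_qsl(parts.query, keep_blank_values=True),
        "version": version,
        "headers": headers,
        "body": body,
    }


def parse_response(raw: str) -> dict:
    start_line, headers, body = split_message(raw)
    version, code, message = start_line.split(" ", 2)
    return {"version": version, "status": int(code), "message": message,
            "headers": headers, "body": body}


req = parse_request(RAW_REQUEST)
resp = parse_response(RAW_RESPONSE)
print(req["method"], req["path"], req["params"])
print(resp["status"], resp["message"])
```

In practice a library such as Python's `http.client` handles this parsing; doing it by hand here only makes the structure of each message visible.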

Percent Encoding
- a URL can contain only unreserved characters (letters, digits, and - . _ ~) and reserved characters from the standard ASCII set
- Reserved characters:
- are used as delimiters within the URL syntax
- should only be used unencoded for those purposes
: / ? # [ ] @ ! $ & ' ( ) * + , ; =
- unsafe characters:
- cannot be used in a URL
- Control characters are unsafe
- e.g., null string termination, carriage return, line feed, end of file, and tab
Percent encoding is a mechanism for encoding characters as hexadecimal values delimited by the percent sign.
- allows a user-agent to submit any safe or unsafe character to the server within the URL
- uses:
- to encode reserved characters within the URL when they are not part of the URL syntax
- to submit Unicode characters
- can be misused to obfuscate the nature of a URL (encoding unreserved characters) and submit malicious input
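
The following sketch illustrates both the legitimate uses and the misuse noted above, using `quote` and `unquote` from Python's `urllib.parse`. The helper `encoded_unreserved` and the sample URL are hypothetical. The check it performs, flagging percent-encoded triplets that decode to unreserved characters, is one simple heuristic assumed for this example, not a complete detection method.

```python
import re
from urllib.parse import quote, unquote

# Unreserved characters never need encoding, so seeing them encoded is a possible red flag
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# Legitimate use 1: reserved characters that are data, not URL syntax
print(quote("name=J&J", safe=""))      # name%3DJ%26J

# Legitimate use 2: Unicode characters, encoded as their UTF-8 bytes
print(quote("caf\u00e9", safe=""))     # caf%C3%A9


def encoded_unreserved(url: str) -> list:
    """Return (triplet, decoded_char) pairs for encoded unreserved characters."""
    hits = []
    for match in re.finditer(r"%([0-9A-Fa-f]{2})", url):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            hits.append((match.group(0), char))
    return hits


# Hypothetical obfuscated URL: "../admin" hidden behind needless encoding
suspicious = "http://www.example.com/%2E%2E/%61dmin"
print(encoded_unreserved(suspicious))  # [('%2E', '.'), ('%2E', '.'), ('%61', 'a')]
print(unquote(suspicious))             # http://www.example.com/../admin
```

Decoding a URL before inspecting it, and repeating the decoding if the result still contains percent signs, helps reveal input that was encoded to evade a simple pattern match.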