URL Analysis


A Uniform Resource Locator (URL) is an application-level addressing scheme for TCP/IP that allows resources to be addressed in human-readable form.

  • points to the host or service location on the Internet
    • by domain name or IP address
  • can encode some action or data to submit to the server host
    • is a common vector for malicious activity
  • URLs can provide indicators of session hijacking/replay, forgery, and injection attacks
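
As a rough illustration of the points above, the sketch below splits a URL into its components with Python's standard urllib.parse module. The URL itself is a made-up example (not from these notes), chosen so that every component is present.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = "https://www.example.com:8443/account/view?id=42&action=show#summary"

parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # www.example.com  (the host, by domain name or IP address)
print(parts.port)      # 8443
print(parts.path)      # /account/view    (the resource path)
print(parts.query)     # id=42&action=show  (data submitted to the server)
print(parts.fragment)  # summary
```

When reviewing a suspicious URL, the host and the query string are usually the first two components worth inspecting separately.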

HTTP

  • important to understand how HTTP operates
    • HTTP session starts with a client (a user-agent, such as a web browser) making a request to an HTTP server
    • the client first establishes a TCP connection to the server
      • a TCP connection can be reused for multiple requests
      • or the client can open new TCP connections for different requests
    • request typically comprises:
      • a method
        • GET—retrieve a resource
        • POST—send data to the server for processing by the requested resource
        • PUT—create or replace the resource
      • a resource (such as a URL path)
      • version number
      • headers
      • and body
    • Data can be submitted to a server either:
      • by using a POST or PUT method and the HTTP headers and body
      • or by encoding the data within the URL used to access the resource
    • Data submitted via a URL is introduced by the ? delimiter
      • this query string follows the resource path
    • Query parameters are usually formatted as one or more name=value pairs
      • ampersands & delimit each pair
    • server response comprises:
      • the version number
      • a status code and message
      • plus optional headers
      • and message body
    • The HTTP response code (status code) is the three-digit value at the start of the response, returned by a server when a client requests a URL
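
The request and response structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP client: the helper names build_get_request and parse_status_line are hypothetical, the URL is invented, and nothing is sent over the network.

```python
from urllib.parse import urlsplit, parse_qsl


def build_get_request(url):
    """Return the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        # The query string follows the resource path, introduced by "?".
        target += "?" + parts.query
    lines = [
        f"GET {target} HTTP/1.1",  # method, resource, version number
        f"Host: {parts.netloc}",   # headers
        "Connection: close",
        "",                        # blank line ends the headers
        "",                        # a GET request normally has no body
    ]
    return "\r\n".join(lines)


def parse_status_line(line):
    """Split a response status line into version, status code, and message."""
    version, code, message = line.split(" ", 2)
    return version, int(code), message


# Hypothetical URL used only for illustration.
url = "http://www.example.com/search?q=url+analysis&page=2"

request = build_get_request(url)
print(request.splitlines()[0])   # GET /search?q=url+analysis&page=2 HTTP/1.1

# Query parameters are name=value pairs delimited by "&".
params = parse_qsl(urlsplit(url).query)
print(params)                    # [('q', 'url analysis'), ('page', '2')]

status = parse_status_line("HTTP/1.1 404 Not Found")
print(status)                    # ('HTTP/1.1', 404, 'Not Found')
```

Note that parse_qsl also decodes the values, which is why the + in the query appears as a space in the output.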

Percent Encoding

  • A URL can contain only unreserved and reserved characters from the standard set
  • Reserved characters:
    • are used as delimiters within the URL syntax
    • should only be used unencoded for those purposes
    • : / ? # [ ] @ ! $ & ' ( ) * + , ; =
  • Unsafe characters:
    • cannot be used in a URL unencoded
    • Control characters are unsafe
      • e.g., null string termination, carriage return, line feed, end of file, and tab
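
The character classes above can be expressed as a small lookup. In this sketch the reserved set is copied from the list above, while the unreserved set (letters, digits, and - . _ ~) is taken from RFC 3986 rather than from these notes. The function name classify is hypothetical.

```python
import string

# Reserved characters, as listed above.
RESERVED = set(":/?#[]@!$&'()*+,;=")

# Unreserved characters per RFC 3986 (an addition, not listed in the notes).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def classify(ch):
    """Classify a single character for use in a URL."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    # Everything else, including control characters and spaces,
    # has to be percent-encoded before it can appear in a URL.
    return "must be percent-encoded"


for ch in ["a", "?", "\n", " "]:
    print(repr(ch), classify(ch))
```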

Percent encoding is a mechanism for encoding characters as hexadecimal values delimited by the percent sign.

  • allows a user-agent to submit any safe or unsafe character to the server within the URL
  • uses:
    • to encode reserved characters within the URL when they are not part of the URL syntax
    • to submit Unicode characters
  • can be misused to obfuscate the nature of a URL (encoding unreserved characters) and submit malicious input
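
A short sketch of both the legitimate uses and the obfuscation case, using quote and unquote from Python's standard urllib.parse module. The sample strings are invented for illustration.

```python
from urllib.parse import quote, unquote

# Legitimate use 1: a reserved character that is data, not URL syntax.
encoded_reserved = quote("a&b=c", safe="")
print(encoded_reserved)   # a%26b%3Dc

# Legitimate use 2: a Unicode character (encoded as its UTF-8 bytes).
encoded_unicode = quote("café", safe="")
print(encoded_unicode)    # caf%C3%A9

# Misuse: unreserved characters encoded only to hide what the URL requests.
obfuscated = "/%61%64%6D%69%6E"
decoded = unquote(obfuscated)
print(decoded)            # /admin
```

When analyzing a logged URL, decoding it first makes it easier to see what the request was actually targeting.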