Understanding HTML Encoding
HTML Encoding (often called HTML escaping) is the process of converting special characters into their corresponding HTML entities. This is essential for displaying characters like `<`, `>`, and `&` as text on a web page without the browser interpreting them as code.
For example, to display the text "5 > 3" on a web page, you should encode it as "5 &gt; 3". Encoding matters even more for `<`: left unencoded, text such as "x <y" may be parsed as the start of a tag, so part of it never appears on the page.
Security & XSS Prevention
The most critical use case for HTML encoding is preventing Cross-Site Scripting (XSS) attacks. When a website accepts user input (like a comment or username) and displays it back to other users, a malicious user could inject scripts (e.g., `<script>alert('hack')</script>`).
When the output is encoded before it is rendered, the script is displayed as harmless text instead of being executed. This is a fundamental security best practice for all web developers.
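As a rough illustration of encoding on output, the sketch below uses Python's standard `html.escape`. The `render_comment` helper and the `comment` class name are hypothetical, chosen only for this example; a real application would normally rely on its template engine's auto-escaping.

```python
import html


def render_comment(user_input: str) -> str:
    """Wrap user-supplied text in a paragraph, escaping it first (hypothetical helper)."""
    # html.escape converts &, < and >; with quote=True it also converts " and '
    safe_text = html.escape(user_input, quote=True)
    return f'<p class="comment">{safe_text}</p>'


malicious = "<script>alert('hack')</script>"
print(render_comment(malicious))
# prints: <p class="comment">&lt;script&gt;alert(&#x27;hack&#x27;)&lt;/script&gt;</p>
```

The browser shows the script source as text because no literal `<` or `>` from the user's input reaches the markup.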
Common HTML Entities
HTML uses a set of predefined entities to represent characters that have special meaning in markup. Here are the most common ones handled by this tool:
- `&lt;` (Less Than) - Encodes the character `<`.
- `&gt;` (Greater Than) - Encodes the character `>`.
- `&amp;` (Ampersand) - Encodes the character `&`.
- `&quot;` (Quote) - Encodes the double quote character `"`.
- `&#39;` (Apostrophe) - Encodes the single quote character `'`.
- `&nbsp;` (Non-Breaking Space) - Creates a space at which the line will not wrap.
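The list above can be read as a simple replacement table. The following is a minimal sketch, not how any particular tool implements it: `ENTITIES` and `encode_entities` are hypothetical names, and the numeric form `&#39;` is assumed for the apostrophe. The non-breaking space is left out because it is usually inserted deliberately rather than produced by escaping.

```python
# Hypothetical replacement table for illustration.
# Order matters: "&" must come first, otherwise the "&" inside
# already-produced entities such as "&lt;" would be encoded again.
ENTITIES = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),
]


def encode_entities(text: str) -> str:
    """Replace each special character with its HTML entity."""
    for char, entity in ENTITIES:
        text = text.replace(char, entity)
    return text


print(encode_entities('Tom & "Jerry" <3'))
# prints: Tom &amp; &quot;Jerry&quot; &lt;3
```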
URL Encoding Explained
While HTML encoding protects data rendered in the page body, URL encoding (also known as percent-encoding) protects data placed in a URL, such as path segments or query parameters.
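Only a limited set of characters can appear in a URL unencoded: ASCII letters and digits and the symbols `-`, `_`, `.`, and `~`. Other characters such as `?`, `&`, `=`, and `/` are reserved as delimiters. To send anything else (like spaces or Unicode text), or a reserved character as literal data, in a query parameter, it must be encoded.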
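To make this concrete, here is a short sketch using Python's standard `urllib.parse` module. The search term, the parameter names, and the `https://example.com/search` address are placeholders made up for the example.

```python
from urllib.parse import quote, urlencode

search_term = "café & crème"

# quote() leaves letters, digits and - _ . ~ untouched and percent-encodes
# everything else; safe="" means "/" is encoded too.
encoded = quote(search_term, safe="")
print(encoded)
# prints: caf%C3%A9%20%26%20cr%C3%A8me

# urlencode() builds a whole query string; it follows the HTML form
# convention, so spaces become "+" instead of "%20".
query = urlencode({"q": search_term, "page": 2})
print("https://example.com/search?" + query)
# prints: https://example.com/search?q=caf%C3%A9+%26+cr%C3%A8me&page=2
```

Note that the literal `&` in the search term is encoded as `%26`, so it cannot be mistaken for the `&` that separates the two parameters.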
How it Works
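URL encoding replaces each unsafe character with a `%` followed by two hexadecimal digits that give its byte value. For example, a space character is converted to `%20`. Non-ASCII characters are first converted to their UTF-8 bytes, and each byte is encoded separately, so `é` becomes `%C3%A9`. This allows complex data strings to be transmitted safely without breaking the URL structure.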
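The mechanism can be sketched by hand in a few lines of Python. This is only an illustration under the assumption that text is encoded as UTF-8 and that only ASCII letters, digits, and `-`, `_`, `.`, `~` are left as-is; `UNRESERVED` and `percent_encode` are hypothetical names, and in practice a library function such as `urllib.parse.quote` is the better choice.

```python
# Characters that may appear in a URL without encoding (assumed set for this sketch).
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte of the UTF-8 form of text that is not unreserved."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # "%" followed by the byte value as two uppercase hex digits
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("hello world"))  # prints: hello%20world
print(percent_encode("é"))            # prints: %C3%A9
print(percent_encode("a/b?c=d"))      # prints: a%2Fb%3Fc%3Dd
```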