In website operation, building and processing URLs (Uniform Resource Locators) is a fundamental task. When you generate links dynamically or pass user input as parameters, encoding URLs correctly is crucial: it keeps links valid, prevents garbled text, and avoids potential security issues. AnQiCMS provides two filters, urlencode and iriencode, to help manage special characters in URLs. Although both are used for encoding, their application scenarios and processing methods differ.

urlencode Filter: Strict Percent-encoding

First, let's look at the urlencode filter. Its main function is to perform standard URL percent-encoding. Any character outside the URL-safe set (letters, digits, and a few punctuation symbols such as - . _ ~) is converted to %xx, where xx is the hexadecimal value of a byte. Non-ASCII characters are first encoded as UTF-8, so a single Chinese character becomes several %xx groups.

This encoding is strict and comprehensive. Its goal is to ensure that every character in the URL can be transmitted and parsed safely, without ambiguity. For example, spaces, Chinese characters, the & symbol (a parameter separator), and the = symbol (a key-value separator) cannot be placed literally inside a parameter value. If such characters appear unencoded, the link may break, parameters may be parsed incorrectly, or security vulnerabilities may result.
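
The percent-encoding rule described above can be tried outside the template engine. The following is a minimal sketch that uses Python's urllib.parse.quote as a stand-in; it is not AnQiCMS's own implementation, and the helper name percent_encode is hypothetical.

```python
from urllib.parse import quote


def percent_encode(text: str) -> str:
    """Percent-encode every character except letters, digits and - . _ ~.

    Hypothetical helper for illustration; non-ASCII text is encoded as
    UTF-8 first, then each byte becomes one %xx group.
    """
    return quote(text, safe="")


print(percent_encode("a b"))    # a%20b
print(percent_encode("a&b=c"))  # a%26b%3Dc
print(percent_encode("安"))      # %E5%AE%89
```

Passing safe="" matters here: by default quote leaves / unencoded, which is not what strict encoding of a parameter value calls for.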

Application scenarios:

  • Encode an entire URL or query string: When you need to pass a complete URL as a parameter of another URL (for example, for redirection or tracking), or to encode a whole query string so that it survives intact as a single value, urlencode is the ideal choice.
  • Encode a single query parameter value: The most common scenario is a search box in which users type Chinese characters, spaces, or special symbols. To pass these keywords safely as URL parameters, apply urlencode.
    • Example: Suppose the user searches for "安企 CMS 官网" (AnQiCMS official website). Placed directly in the URL, http://example.com/search?q=安企 CMS 官网 may cause problems. After urlencode: http://example.com/search?q=%E5%AE%89%E4%BC%81%20CMS%20%E5%AE%98%E7%BD%91 (here %20 represents a space, and %E5%AE%89 and similar groups represent the UTF-8 bytes of the Chinese characters). Some encoders emit + instead of %20 for a space in query strings, so check the actual output.
  • Ensure that all unsafe characters are handled: When you are unsure which characters the input may contain, urlencode is the safest choice, because no unexpected character can slip through and break the URL.
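
The first two scenarios in the list can be sketched in Python. This is an illustration under assumptions, not AnQiCMS code: the helper with_param, the /go path, and the next parameter name are hypothetical, and urllib.parse.quote stands in for the urlencode filter.

```python
from urllib.parse import quote


def with_param(base: str, name: str, value: str) -> str:
    """Append one query parameter, strictly percent-encoding its value.

    Hypothetical helper; assumes `base` has no query string yet.
    """
    return f"{base}?{name}={quote(value, safe='')}"


# A user-entered search keyword containing Chinese characters and spaces.
search = with_param("http://example.com/search", "q", "安企 CMS 官网")
print(search)
# http://example.com/search?q=%E5%AE%89%E4%BC%81%20CMS%20%E5%AE%98%E7%BD%91

# A complete URL passed as the value of another URL's parameter.
redirect = with_param("http://example.com/go", "next",
                      "http://www.example.org/foo?a=b&c=d")
print(redirect)
# http://example.com/go?next=http%3A%2F%2Fwww.example.org%2Ffoo%3Fa%3Db%26c%3Dd
```

Because the inner URL's ?, & and = are encoded, the outer URL still has exactly one parameter, which is the point of strict encoding in this scenario.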

In an AnQiCMS template, the urlencode filter is used as follows:

{{ "http://www.example.org/foo?a=b&c=d"|urlencode }}
{# Output: http%3A%2F%2Fwww.example.org%2Ffoo%3Fa%3Db%26c%3Dd #}

{{ "我的搜索关键词"|urlencode }}
{# Output: %E6%88%91%E7%9A%84%E6%90%9C%E7%B4%A2%E5%85%B3%E9%94%AE%E8%AF%8D #}

iriencode Filter: Structure-preserving Internationalized Encoding

The iriencode filter offers a more lenient encoding, intended for IRIs (Internationalized Resource Identifiers). IRIs extend URLs by allowing a much wider range of Unicode characters in identifiers, so that they can support languages worldwide. When encoding, iriencode leaves the structural special characters of a URL intact and escapes only the remaining characters that need it.

According to the AnQiCMS documentation, iriencode keeps the characters /#%[]=:;$&()+,!?*@'~ as they are and escapes all other characters that need escaping. It therefore respects the structure of a URL: characters that act as separators or carry special meaning are not encoded, which preserves the URL's readability and structural integrity.
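
The "keep these characters, escape the rest" behaviour can be approximated in Python by passing the documented character list as the safe set of urllib.parse.quote. This is a sketch under that assumption, not the filter's actual source; the names IRI_SAFE and iri_encode are hypothetical.

```python
from urllib.parse import quote

# Characters the documentation lists as preserved by iriencode.
IRI_SAFE = "/#%[]=:;$&()+,!?*@'~"


def iri_encode(text: str) -> str:
    """Percent-encode text but leave URL structure characters untouched.

    Hypothetical approximation of the filter for illustration only.
    """
    return quote(text, safe=IRI_SAFE)


print(iri_encode("http://example.com/产品分类/电子产品"))
# http://example.com/%E4%BA%A7%E5%93%81%E5%88%86%E7%B1%BB/%E7%94%B5%E5%AD%90%E4%BA%A7%E5%93%81

print(iri_encode("?foo=123&bar=yes"))
# ?foo=123&bar=yes

print(iri_encode("/search?q=a b"))
# /search?q=a%20b
```

Note that % is in the preserved set, so text that is already percent-encoded is not encoded a second time, but a literal % in user input is not escaped either.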

Application scenarios:

  • Encode segments of a URL path: When the URL path contains Chinese or special characters, you usually want to keep the path delimiter / so that the path structure is preserved.
    • Example: For http://example.com/产品分类/电子产品, iriencode encodes the Chinese characters in 产品分类 and 电子产品 but keeps the /: http://example.com/%E4%BA%A7%E5%93%81%E5%88%86%E7%B1%BB/%E7%94%B5%E5%AD%90%E4%BA%A7%E5%93%81
  • Internationalized domain names or paths: If your website uses Chinese or other non-ASCII characters in domain names or paths (such as .公司 or /新闻标题), iriencode is better suited to these internationalized elements, because it is designed for a wider character set. Note that non-ASCII domain names are normally converted with Punycode rather than percent-encoding, so the filter is most useful for the path part.
  • Retain specific URL structure characters: When building complex URLs, you may know that certain characters (such as :, /, =, and &) are part of the URL structure and must not be encoded. iriencode encodes the other unsafe characters without destroying these structural ones.
  • HTML entity encoding in specific environments: Despite the name iriencode, the documentation's example "?foo=123&bar=yes"|iriencode outputs ?foo=123&amp;bar=yes. This suggests that in some cases the output is also HTML-entity encoded (for example, & becomes &amp;), possibly as a result of the template engine's automatic HTML escaping rather than the filter itself. That is useful when the result is embedded directly in HTML, but it is not typical URL encoding, in which & is either left as is or percent-encoded as %26. It is recommended to verify the actual output when using it.
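
The difference between HTML entity encoding and percent-encoding of & can be seen in a short Python comparison. This sketch only illustrates the two encodings with the standard library (html.escape and urllib.parse.quote); it makes no claim about which one AnQiCMS applies internally.

```python
from html import escape
from urllib.parse import quote

query = "?foo=123&bar=yes"

# HTML entity encoding: safe to embed in an HTML attribute such as href.
html_escaped = escape(query)
print(html_escaped)  # ?foo=123&amp;bar=yes

# Percent-encoding: & stops acting as a parameter separator.
percent_encoded = quote(query, safe="")
print(percent_encoded)  # %3Ffoo%3D123%26bar%3Dyes
```

A browser decodes &amp; back to & when it reads the attribute, so the HTML-escaped form still contains two parameters, whereas the percent-encoded form is a single opaque value.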

In AnQi CMS