In Anqi CMS template development, the index filter is a practical tool for locating a specific substring within a string. However, when the content contains Chinese characters, the positions it reports may puzzle some users. How many positions does a Chinese character occupy in the index filter, and what logic lies behind it? Let's take a closer look.

How the index filter works

In simple terms, the index filter finds the first occurrence of a substring (or element) in a string (or array). If found, it returns the starting index of the substring (counting from 0); if not found, it returns -1. This is helpful for string slicing, locating text before a replacement, or conditional logic.

For example, the location of an English string is very intuitive:

{{"Hello World"|index:"World"}}
{# Output: 6 #}

Here, "World" starts from index 6, which is consistent with our intuitive perception.

Special considerations for Chinese strings: one Chinese character occupies three positions.

Anqi CMS is developed in Go, and Go handles strings as UTF-8 by default. In UTF-8, an English (ASCII) character occupies 1 byte, while a common Chinese character occupies 3 bytes (rarer characters outside the Basic Multilingual Plane occupy 4). The index filter calculates positions by byte offset within the string rather than by the number of characters as we commonly understand them.

This means that what looks like a single character to you may count as 3 positions to the index filter. Understanding this is essential when using the index filter on multilingual content.
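
The byte-based behavior described above can be reproduced in plain Go. The following is a minimal sketch, assuming the filter behaves like Go's strings.Index; the article does not show the filter's actual implementation, and byteIndex is a hypothetical helper name used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// byteIndex is a hypothetical stand-in for the index filter.
// Assumption: the filter behaves like strings.Index, which returns the
// byte offset of the first occurrence of sub in s, or -1 if absent.
func byteIndex(s, sub string) int {
	return strings.Index(s, sub)
}

func main() {
	fmt.Println(len("A"))                          // 1: an ASCII letter is 1 byte
	fmt.Println(len("你"))                          // 3: a common Chinese character is 3 bytes
	fmt.Println(byteIndex("Hello World", "World")) // 6
	fmt.Println(byteIndex("你好世界", "世界"))           // 6: two characters, but six bytes
}
```

Both lookups return 6, yet in the first case six characters precede the match and in the second only two do.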

Example demonstration: Deep understanding of position calculation

Let us further understand this mechanism through several specific examples:

  1. Mixed Chinese and English string: Consider the string "欢迎使用安企CMS(AnQiCMS)" and find the position of the substring "CMS".

    {{"欢迎使用安企CMS(AnQiCMS)"|index:"CMS"}}
    {# Output: 18 #}


    Why is it 18? Let's break it down step by step:

    • "欢迎使用安企": 6 Chinese characters, each occupying 3 bytes, for a total of 6 * 3 = 18 bytes (byte positions 0 through 17).
    • "CMS" (first occurrence): begins immediately after those 18 bytes, at byte position 18.

    So the byte position at which "CMS" first appears is 18, which matches the output of the index filter. The "(AnQiCMS)" part comes after the first match and does not affect the result.

  2. Pure Chinese string: The rule still applies when the string consists entirely of Chinese characters.

    {{"你好世界"|index:"世界"}}
    {# Output: 6 #}


    Here, "你好" contains 2 Chinese characters, occupying 2 * 3 = 6 bytes, so "世界" starts at byte position 6.

  3. When the substring is not found: If the substring does not exist, the index filter returns -1, regardless of whether the string is English or Chinese.

    {{"安企CMS"|index:"内容管理"}}
    {# Output: -1 #}
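
The byte arithmetic in the examples above can be checked by listing the byte offset at which each character starts. This is a small Go sketch for illustration only; runeOffsets is a hypothetical helper, not something provided by Anqi CMS.

```go
package main

import "fmt"

// runeOffsets returns the byte offset at which each character of s begins.
// Ranging over a Go string yields the starting byte index of every rune.
func runeOffsets(s string) []int {
	offsets := []int{}
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "欢迎使用安企CMS"
	for i, r := range s {
		fmt.Printf("%c starts at byte %d\n", r, i)
	}
	fmt.Println(runeOffsets(s)) // [0 3 6 9 12 15 18 19 20]
}
```

The six Chinese characters start at offsets 0, 3, 6, 9, 12, and 15, so the letter C of "CMS" lands on byte 18.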
    

Why is it important to understand this?

Understanding that the index filter counts bytes rather than characters is critical for template developers. It helps you:

  • Avoid errors in string slicing or matching: If you assume that a Chinese character occupies only 1 position and then slice based on the index result, you may get garbled text or an incomplete cut.
  • Locate and manipulate string content more accurately: Once the calculation rule is clear, you can write template logic more precisely.
  • Predict string processing results for internationalized content: On multilingual websites in particular, this knowledge helps you plan how content is displayed.
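
To make the slicing pitfall concrete, here is a hedged Go sketch that converts a byte position into a character position by counting the characters before the match. charIndex is a hypothetical helper written for this illustration; whether an equivalent is available inside Anqi CMS templates is not covered in this article.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// charIndex is a hypothetical helper (not part of Anqi CMS). It returns the
// position of sub in s counted in characters (runes) instead of bytes,
// or -1 if sub does not occur in s.
func charIndex(s, sub string) int {
	i := strings.Index(s, sub) // byte offset of the match, or -1
	if i < 0 {
		return -1
	}
	// Count the characters in the prefix that precedes the match.
	return utf8.RuneCountInString(s[:i])
}

func main() {
	s := "欢迎使用安企CMS(AnQiCMS)"
	fmt.Println(strings.Index(s, "CMS"))  // 18 (bytes)
	fmt.Println(charIndex(s, "CMS"))      // 6 (characters)
	fmt.Println(charIndex("你好世界", "世界")) // 2
	fmt.Println(charIndex("安企CMS", "内容管理")) // -1
}
```

Mixing the two units, for example passing the byte position 18 to an operation that counts characters, is the kind of mismatch that leads to garbled or truncated output.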

Practical suggestions and alternative solutions

Although the behavior of the index filter follows from Go's UTF-8 string handling, there are alternatives and auxiliary tools for different needs:

  1. Only check for containment: use the contain filter. If you just want to check whether a string contains a substring and do not care about its position, the contain filter is a more concise and intuitive choice. It returns True or False directly.

    {{"欢迎使用安企CMS"|contain:"CMS"}}
    {# Output: True #}
    
  2. Get the actual character count of a string: use the length filter. If you need the number of characters in a string (not the number of bytes), the length filter can help. It counts each Chinese character as 1 character.

    {{"你好世界"|length}}
    {# Output: 4 #}
    {{"Hello World"|length}}
    {# Output: 11 #}
    
  3. Character-by-character processing: use the make_list filter. For scenarios that require character-by-character processing (for example, operating on or displaying each character individually), consider first using the make_list filter to convert the string into an array of characters, then iterating over it.

    {% set chars = "你好世界"|make_list %}
    {% for char in chars %}
        {{char}}
    {% endfor %}
    {# Output: 你好世界 (here the char variable holds a single character) #}
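
For readers curious how these three alternatives relate to the underlying language, the sketch below shows rough Go analogues. It is an assumption that the contain, length, and make_list filters map onto these standard-library operations; splitChars is a hypothetical helper introduced only for this example.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// splitChars is a hypothetical helper: it returns the characters of s as
// separate strings, roughly what the make_list example iterates over.
func splitChars(s string) []string {
	chars := make([]string, 0, utf8.RuneCountInString(s))
	for _, r := range s {
		chars = append(chars, string(r))
	}
	return chars
}

func main() {
	// Containment check: no position involved, so bytes vs. characters does not matter.
	fmt.Println(strings.Contains("欢迎使用安企CMS", "CMS")) // true

	// Byte length versus character count.
	fmt.Println(len("你好世界"))                    // 12
	fmt.Println(utf8.RuneCountInString("你好世界")) // 4

	// Character-by-character access.
	fmt.Println(splitChars("你好世界")) // [你 好 世 界]
}
```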
    

In short, the index filter in Anqi CMS treats a common Chinese character as occupying 3 positions, which is how Go's UTF-8 string encoding shows up at the template level. Understanding this lets you use the index filter and related string processing tools more flexibly and accurately, and build websites that behave as expected.


Common Questions (FAQ)

Q1: Why does a Chinese character count as 3 positions in the index filter?
A1: Anqi CMS is developed in Go, and Go handles strings as UTF-8 by default. In UTF-8, a common Chinese character occupies 3 bytes. The index filter calculates positions by byte offset, not by the number of characters as we intuitively understand them.

Q2: If I only want to check whether a string contains a substring without caring about its position, what should I do?
A2: You can use the contain filter. It returns a boolean value (True or False) indicating whether the string contains the target substring, so neither the position nor the byte length of Chinese characters matters.

Q3: How do I get the actual character count of a string (not the byte count)?
A3: You can use the length filter. It counts the actual number of characters in a string, with each Chinese character counted as 1 character, unlike the index filter, which works in bytes.