In AnQi CMS template development, the index filter is a very practical tool for locating a specific substring within a string. However, when the content contains Chinese characters, the positions it reports can confuse some users. How many positions does a Chinese character actually occupy in the index filter, and what logic lies behind it? Let's take a closer look.
The principle of the index filter
In simple terms, the index filter finds the position of the first occurrence of a substring (or element) in a string (or array). If found, it returns the starting index of the substring, counting from 0. If not found, it returns -1. This is useful for string truncation, for locating text before a replacement, and for conditional logic.
For example, locating a substring in an English string is intuitive:
{{"Hello World"|index:"World"}}
{# Output: 6 #}
Here, "World" starts at index 6 because the characters "Hello " occupy positions 0 to 5. This matches our intuition.
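AnQi CMS is written in Go, so the behavior described above can be sketched with Go's standard library. This is a minimal sketch. It assumes, as the article describes, that the index filter on strings behaves like Go's strings.Index. The helper indexOf is hypothetical and not part of AnQi CMS.

```go
package main

import (
	"fmt"
	"strings"
)

// indexOf is a hypothetical helper standing in for the index filter on strings.
// It returns the 0-based position of the first occurrence of sub in s, or -1.
// strings.Index reports that position as a byte offset.
func indexOf(s, sub string) int {
	return strings.Index(s, sub)
}

func main() {
	fmt.Println(indexOf("Hello World", "World")) // 6
	fmt.Println(indexOf("Hello World", "o"))     // 4: only the first occurrence counts
	fmt.Println(indexOf("Hello World", "world")) // -1: strings.Index is case-sensitive
}
```

For pure ASCII text, byte offsets and character offsets coincide, which is why this English example looks intuitive.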
Special consideration for Chinese strings: one character takes up three positions
AnQi CMS is developed in Go, whose strings are byte sequences that normally hold UTF-8 encoded text. In UTF-8, an English (ASCII) character occupies 1 byte and a common Chinese character occupies 3 bytes. Rare Chinese characters outside the Basic Multilingual Plane take 4 bytes. The index filter calculates positions by byte offset within the string, not by the character count we usually think in terms of.
This means that what looks like one "character" to you may count as 3 "positions" in the index filter. Understanding this is crucial for using the index filter accurately on multilingual content.
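The Go sketch below shows the byte widths involved. It compares len, which counts bytes, with utf8.RuneCountInString, which counts characters, for a few sample strings. The samples and the helper byteAndCharCount are illustrative choices and do not come from AnQi CMS. The samples include an accented Latin letter and the rare character U+20000, both written as escapes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndCharCount returns the UTF-8 byte length of s and its character (rune) count.
func byteAndCharCount(s string) (bytes, chars int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// ASCII letter, accented Latin letter (U+00E9), common Chinese character,
	// rare CJK character outside the Basic Multilingual Plane (U+20000), and a Chinese word.
	for _, s := range []string{"A", "\u00e9", "中", "\U00020000", "你好世界"} {
		b, c := byteAndCharCount(s)
		fmt.Printf("%s: bytes=%d chars=%d\n", s, b, c)
	}
}
```

The byte counts come out as 1, 2, 3, 4 and 12. Each single character counts as 1 character, and "你好世界" counts as 4.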
Examples: understanding position calculation in depth
Let's understand this mechanism further through several specific examples:
Mixed Chinese and English string: consider the string "欢迎使用安企CMS(AnQiCMS)" and find the position of the substring "CMS".
{{"欢迎使用安企CMS(AnQiCMS)"|index:"CMS"}}
{# Output: 18 #}
Why 18? Let's break it down step by step. "欢迎使用安企" contains 6 Chinese characters. Each occupies 3 bytes, for a total of 6 * 3 = 18 bytes. The first "CMS" begins immediately after them, so its starting byte position is 18. This matches the output of the index filter exactly. Counted by characters, the same position would be 6.
Pure Chinese string: if the string consists entirely of Chinese characters, the same rule applies.
{{"你好世界"|index:"世界"}}
{# Output: 6 #}
Here, "你好" is 2 Chinese characters, or 2 * 3 = 6 bytes, so "世界" starts at byte position 6.
Substring not found: if the substring does not exist, the index filter returns -1. This holds whether the string is English or Chinese.
{{"安企CMS"|index:"内容管理"}}
{# Output: -1 #}
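The three results above can be reproduced with a short Go sketch. It assumes, as the article does, that the filter's result equals the byte offset from strings.Index. The hypothetical helper byteAndCharIndex also converts that byte offset into a character offset by counting the runes that precede it.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// byteAndCharIndex is a hypothetical helper. It returns the byte offset of the
// first occurrence of sub in s (as strings.Index does) and the equivalent
// character offset, or -1 for both when sub is absent.
func byteAndCharIndex(s, sub string) (byteIdx, charIdx int) {
	byteIdx = strings.Index(s, sub)
	if byteIdx < 0 {
		return -1, -1
	}
	// Counting the runes in the prefix s[:byteIdx] turns bytes into characters.
	return byteIdx, utf8.RuneCountInString(s[:byteIdx])
}

func main() {
	cases := [][2]string{
		{"欢迎使用安企CMS(AnQiCMS)", "CMS"},
		{"你好世界", "世界"},
		{"安企CMS", "内容管理"},
	}
	for _, c := range cases {
		b, ch := byteAndCharIndex(c[0], c[1])
		fmt.Printf("%s | %s -> byte %d, character %d\n", c[0], c[1], b, ch)
	}
}
```

For the three cases, it reports byte offsets 18, 6 and -1 and character offsets 6, 2 and -1.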
Why is it important to understand this?
Understanding that the index filter counts bytes rather than characters is crucial for template developers. It helps you:
- Avoid deviations in string slicing or matching: if you assume a Chinese character occupies only 1 position and then cut a string based on the index result, you may get garbled text or an incomplete cut.
- Locate and operate on string content more accurately: once the calculation rule is clear, you can write template logic more precisely.
- Predict string processing results for internationalized content: especially on multilingual websites, this awareness helps you plan how content is displayed.
Practical suggestions and alternatives
The index filter's byte-based behavior follows from how the underlying Go language handles UTF-8. Other filters cover different needs:
Use the contain filter to check only whether a substring is present: if you just want to know whether a string contains a substring and don't care about its position, contain is a more concise and intuitive choice. It returns True or False directly.
{{"欢迎使用安企CMS"|contain:"CMS"}}
{# Output: True #}
Use the length filter to get the actual character count of a string: if you need the number of characters (not bytes), length counts each Chinese character as 1 character.
{{"你好世界"|length}}
{# Output: 4 #}
{{"Hello World"|length}}
{# Output: 11 #}
Use the make_list filter for character-by-character processing: when you need to operate on or display each character individually, first convert the string to a character array with make_list, then iterate over it.
{% set chars = "你好世界"|make_list %}
{% for char in chars %} {{char}} {% endfor %}
{# Outputs the characters of 你好世界 one by one; on each iteration char holds a single character #}
In short, in AnQi CMS the index filter counts each Chinese character as 3 positions because it works with byte offsets. This is how Go's UTF-8 string handling shows up at the template level. Understanding this lets you use the index filter and related string tools more flexibly and accurately, and build a website that behaves as expected.
Frequently Asked Questions (FAQ)
Q1: Why does a Chinese character occupy 3 positions in the index filter?
A1: AnQi CMS is developed in Go, which handles strings as UTF-8 encoded bytes. In UTF-8, a common Chinese character occupies 3 bytes. The index filter calculates positions from these byte offsets rather than from the character count we intuitively expect.
Q2: If I only want to check if a string contains a substring without caring about the specific position, what should I do?
A2: Use the contain filter. It returns a boolean value (True or False) indicating whether the target substring is present, so you don't need to deal with its position or with the byte width of Chinese characters.
Q3: How do I get the actual character count of a string (not byte position)?
A3: Use the length filter. It counts the actual number of characters in a string, with each Chinese character counted as 1. This differs from the byte-based calculation of the index filter.