Handling Multilingual Content in AnQiCMS: How the index Filter Calculates Positions in Chinese Text

When you use AnQiCMS to manage multilingual content, making flexible use of its template engine and its rich set of filters can greatly improve content-operation efficiency. One of these, the index filter, is a practical tool for quickly locating the first occurrence of a substring within a string. However, when the content contains Chinese characters, index calculates positions in a way that differs from what most people expect, and understanding it helps us process multilingual text more accurately.

Basic usage of the index filter

First, a quick review of what the index filter does. It takes two arguments: the source string and the substring to search for. If the substring is found, it returns the starting position of its first occurrence in the source string (counting from 0); if it is not found, it returns -1.

For example, to search for a word in an English sentence: {{ "Welcome to AnQiCMS"|index:"CMS" }} returns 15, because "CMS" in "Welcome to AnQiCMS" begins at the 16th character (index 15). For plain English text like this, the result matches an intuitive character count.
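As a point of reference, here is a minimal Go sketch of the same lookup. It assumes the index filter behaves like Go's standard strings.Index (offset of the first match, or -1 when absent); indexFilter is a hypothetical stand-in, not AnQiCMS source code.

```go
package main

import (
	"fmt"
	"strings"
)

// indexFilter is a hypothetical stand-in for the template's index filter,
// assuming it behaves like strings.Index: the offset of the first match,
// or -1 when the substring does not occur.
func indexFilter(s, substr string) int {
	return strings.Index(s, substr)
}

func main() {
	fmt.Println(indexFilter("Welcome to AnQiCMS", "CMS"))    // 15
	fmt.Println(indexFilter("Welcome to AnQiCMS", "Django")) // -1
}
```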

However, when the source string contains Chinese characters, the index filter calculates positions in a distinctive way. AnQiCMS stores text as UTF-8, in which a Chinese character typically occupies 3 bytes, and index counts positions over this underlying byte representation rather than over the characters you actually see.

This means that when index finds a substring preceded by Chinese characters, the returned position is larger than the character index you would count by eye: each preceding Chinese character contributes 3 positions (its 3 bytes) instead of 1.
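The gap between bytes and characters is easy to observe directly. The Go sketch below (Go being the language AnQiCMS is developed in) does not call AnQiCMS; it only shows how UTF-8 lengths work in Go. The helper name lengths is illustrative.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths reports the size of s in bytes (what len measures) and in
// characters, i.e. Unicode code points (what utf8.RuneCountInString measures).
func lengths(s string) (bytes, chars int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	fmt.Println(lengths("欢"))      // 3 1
	fmt.Println(lengths("欢迎使用安企")) // 18 6
	fmt.Println(lengths("CMS"))    // 3 3
}
```

Each Chinese character here costs 3 bytes, while each ASCII letter costs 1, which is exactly the "3 positions per Chinese character" rule.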

Consider a practical example with a string that mixes Chinese and English: "欢迎使用安企CMS(AnQiCMS)". Using index again to find "CMS", {{ "欢迎使用安企CMS(AnQiCMS)"|index:"CMS" }} returns 18.

Counted the usual way, by visible characters, "CMS" is preceded by only six characters, so its character index would be 6. Counting bytes instead gives:

  • 欢 (bytes 0-2)
  • 迎 (bytes 3-5)
  • 使 (bytes 6-8)
  • 用 (bytes 9-11)
  • 安 (bytes 12-14)
  • 企 (bytes 15-17)
  • C (byte 18, where "CMS" begins)

The six Chinese characters before "CMS" occupy 6 × 3 = 18 bytes, which is exactly the value index returns. The whole string also happens to be 18 characters long, which can make the result look like an ordinary character count at first glance, but that is a coincidence: the character index of "CMS" is 6, and 18 is its byte position.

The AnQiCMS documentation states this explicitly: if a string contains Chinese, each Chinese character counts as 3 positions when the position is calculated. This matters for more complex string operations, especially those that combine index with the slice filter, which needs exact start and end positions. If you pass the byte position returned by index straight to slice, the cut can land in the wrong place: past the intended point if slice counts characters, or in the middle of a multi-byte character if it counts bytes, which produces garbled or incomplete text.

Practical considerations and recommendations

When running a multilingual site on AnQiCMS, understanding this behavior of the index filter is especially important wherever content display must be controlled precisely.

  1. Dynamic content extraction: If you need to cut a string based on where a keyword appears, especially in mixed Chinese and English text, do not pass the result of index directly to slice as a parameter, or the outcome may not be what you expect. Test thoroughly, or add extra handling in the template logic so that Chinese characters stay intact.
  2. Adjust positions before truncating: Be especially careful when you use index to find a substring and then slice to keep the part before or after it. If Chinese characters precede the match, index counts them in bytes, so each one adds 3 to the returned value rather than 1; account for this (the number of Chinese characters multiplied by 3) when working out positions and lengths.
  3. Safe handling of multilingual content: AnQiCMS offers rich multilingual features. When handling template content in different languages, pay attention to these differences. For single-byte characters such as English letters and digits, index behaves as you would expect; languages such as Chinese, Japanese, and Korean, whose characters take multiple bytes, need extra care.
  4. Use other filters where they fit: If you only need to know whether a substring exists, not where it is, the contain filter (for example, {{ "欢迎使用安企CMS"|contain:"CMS" }} returns True) is a simpler and safer alternative, because it involves no position calculation.

AnQiCMS is an efficient, customizable content management system, and its multilingual support gives businesses solid backing for expanding into international markets. Mastering how the index filter handles Chinese characters deepens our understanding of how the system works and helps us build multilingual templates that are more robust and behave as expected. Before performing any complex string operation, thorough testing and verification is always wise.


Common Questions and Answers (FAQ)

Q1: Why does the index filter count one Chinese character as 3 positions?

A1: AnQiCMS is developed in Go, and Go strings are sequences of bytes whose text is conventionally UTF-8 encoded. In UTF-8 a Chinese character usually occupies 3 bytes, and in some contexts the index filter calculates positions over those bytes rather than over the characters we see. This is how the low-level encoding surfaces at the template level: byte positions are exact and consistent for the system, but template authors need to keep their difference from character positions in mind.

Q2: Besides index, are there other filters related to string position or length that need special attention when processing Chinese characters?

A2: The slice filter is directly affected by the result of index: if you use the position returned by index as slice's