Deftly handling multilingual content in AnQi CMS: how the index filter calculates positions in Chinese text

When you manage multilingual content with AnQi CMS, making flexible use of its template engine and rich set of filters can greatly improve your content operations. The index filter is a very practical tool among them: it quickly finds where a substring first occurs in a string. When the content contains Chinese characters, however, the index filter calculates positions in a distinctive way, and understanding it helps you handle multilingual text more accurately.

Basic usage of the index filter

First, let's review what the index filter does. It takes two arguments: the source string and the substring to search for. If the substring is found, it returns the starting position of its first occurrence in the source string (positions start at 0). If it is not found, it returns -1.

For example, to search for a word in an English sentence: {{ "Welcome to AnQiCMS"|index:"CMS" }} returns 15, because "CMS" begins at index 15 of "Welcome to AnQiCMS" (the 16th character). For plain English text like this, the result matches an intuitive character count.
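The contract described above (offset of the first match, -1 when absent) is the same one Go's standard strings.Index provides. The sketch below assumes the filter behaves like strings.Index; indexFilter is a hypothetical name used for illustration, not AnQi CMS's actual source.

```go
package main

import (
	"fmt"
	"strings"
)

// indexFilter is a hypothetical stand-in for the template filter: it returns
// the offset of the first occurrence of substr in s, or -1 if there is none.
func indexFilter(s, substr string) int {
	return strings.Index(s, substr)
}

func main() {
	fmt.Println(indexFilter("Welcome to AnQiCMS", "CMS"))     // 15
	fmt.Println(indexFilter("Welcome to AnQiCMS", "AnQiCMS")) // 11
	fmt.Println(indexFilter("Welcome to AnQiCMS", "Django"))  // -1
}
```

For ASCII-only text every character is one byte, so byte offsets and character positions coincide.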

How Chinese characters change the calculation

When the source string contains Chinese characters, however, the index filter's position calculation behaves differently. In the AnQi CMS template environment, strings are stored as UTF-8, where a Chinese character usually occupies 3 bytes. The index filter locates substrings using this underlying byte representation rather than the visible character count we normally have in mind.

This means that when the index filter finds a substring that is preceded by Chinese characters, the position it returns can be larger than the character index you would count by eye: each preceding Chinese character is counted as 3 positions (3 bytes).
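To see the byte/character distinction in isolation, here is a small sketch in plain Go (standard library only, not AnQi CMS code). The helper name sizes is hypothetical; len reports UTF-8 bytes, while utf8.RuneCountInString reports characters (Unicode code points).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes is a hypothetical helper reporting the length of s in UTF-8 bytes
// (what len measures) and in characters (what utf8.RuneCountInString measures).
func sizes(s string) (nBytes, nChars int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"CMS", "安", "欢迎使用安企"} {
		b, c := sizes(s)
		fmt.Printf("%s: %d bytes, %d characters\n", s, b, c)
	}
	// CMS: 3 bytes, 3 characters
	// 安: 3 bytes, 1 characters
	// 欢迎使用安企: 18 bytes, 6 characters
}
```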

Let's look at a real example. Take the mixed Chinese and English string "欢迎使用安企CMS(AnQiCMS)" ("Welcome to AnQi CMS (AnQiCMS)") and again use the index filter to find "CMS": {{ "欢迎使用安企CMS(AnQiCMS)"|index:"CMS" }} This expression returns 18.

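If we number the characters the way we read them, starting from 0, and note the byte at which each one starts in UTF-8:

  • 欢 (character 0, byte 0)
  • 迎 (character 1, byte 3)
  • 使 (character 2, byte 6)
  • 用 (character 3, byte 9)
  • 安 (character 4, byte 12)
  • 企 (character 5, byte 15)
  • C (character 6, byte 18)
  • M (character 7, byte 19)
  • S (character 8, byte 20)
  • ( (character 9, byte 21)
  • A (character 10, byte 22)
  • n (character 11, byte 23)
  • Q (character 12, byte 24)
  • i (character 13, byte 25)
  • C (character 14, byte 26)
  • M (character 15, byte 27)
  • S (character 16, byte 28)
  • ) (character 17, byte 29)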

Counted by eye, "CMS" starts at character 6, yet index returns 18. The difference is exactly the byte count: the six Chinese characters before "CMS" occupy 6 × 3 = 18 bytes, so the match begins at byte 18. (The whole string also happens to be 18 characters long, which can make the result look like a character count at first glance.)

The official documentation states this explicitly: if the string contains Chinese characters, each Chinese character counts as 3 positions when the position is calculated. Keep this byte-based mechanism in mind for more complex string operations, especially with the slice filter, which needs precise start and end positions. If you start cutting a string with slice at the position returned by index, and index returns a byte position, using it directly may cut in the wrong place or split a Chinese character, leaving garbled or incomplete text.

Considerations and suggestions for practical use

When running a multilingual site on AnQi CMS, understanding this behavior of the index filter is especially important in scenarios that require precise control over how content is displayed.

  1. Dynamic content extraction: If you need to truncate a string dynamically based on the position of a keyword, especially when the source string mixes Chinese and English, passing the result of index directly to slice may produce unexpected results. Test thoroughly, or add extra handling in the template logic to keep Chinese characters intact.
  2. Adjust lengths before truncating: Be careful when you locate a substring with index and then use slice to cut the part before or after it. If Chinese characters precede the match and index counts bytes, you may need to adjust the length by multiplying the number of Chinese characters by 3.
  3. Handle multilingual content carefully: AnQi CMS supports rich multilingual features. When working with templates in different languages, keep these differences in mind. For English letters, digits, and other single-byte characters, index behaves intuitively; languages such as Chinese, Japanese, and Korean, which may use multi-byte characters, need extra care.
  4. Make good use of other filters: If you only need to check whether a substring exists, not where it is, the contain filter (for example, {{ "欢迎使用安企CMS"|contain:"CMS" }} returns True) is a simpler and safer alternative, because it involves no position calculation.

As an efficient and customizable content management system, AnQi CMS offers multilingual support that helps businesses expand into international markets. Knowing how the index filter handles Chinese characters helps us understand the system better and build more robust, predictable multilingual templates. Before any complex string operation, it is always wise to test and verify thoroughly.


Frequently Asked Questions (FAQ)

Q1: Why does the index filter count each Chinese character as 3 positions?

A1: AnQi CMS is written in Go, and Go strings hold their text as UTF-8-encoded bytes. In UTF-8, a Chinese character usually occupies 3 bytes. The index filter calculates positions from these underlying bytes rather than from the characters we see. This is where the low-level encoding meets higher-level logic: it keeps data processing exact, but template authors need to be aware of the difference.

Q2: Besides index, are there other filters related to string position or length that need special attention with Chinese characters?

A2: The slice filter is directly affected by the index filter's result. If you use the position returned by index as slice's