What is special about how the `index` filter calculates positions for Chinese text in multilingual content?

Handling multilingual content in AnQi CMS: how the `index` filter calculates positions for Chinese text

When managing multilingual content with AnQi CMS, making good use of its template engine and filters can noticeably improve content operations. Among these filters, `index` is a practical tool that locates the first occurrence of a substring within a string. When the content contains Chinese characters, however, `index` calculates positions in a way that can be surprising, and understanding it helps us handle multilingual text more accurately.

Basic usage of the `index` filter

First, a quick review of what the `index` filter does. It works with two values: the source string and the substring to search for. If the substring is found, the filter returns the starting position of its first occurrence in the source string (positions start at 0). If it is not found, the filter returns -1.

For example, searching for a word in an English sentence: `{{ "Welcome to AnQiCMS"|index:"CMS" }}`. This returns 15, because "CMS" begins at index 15 of "Welcome to AnQiCMS" when counting from 0, which matches an intuitive character count.
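The basic contract (first occurrence, counting from 0, -1 when absent) can be modeled outside the template engine. The Python sketch below is an illustration only, not AnQi CMS code: `index_filter` is a hypothetical helper name, and it assumes positions are reported as UTF-8 byte offsets. For ASCII-only text, the byte offset and the character index are the same number.

```python
def index_filter(source: str, sub: str) -> int:
    """Hypothetical model of the `index` filter (not AnQi CMS code).

    Returns the 0-based UTF-8 byte offset of the first occurrence of
    `sub` in `source`, or -1 if `sub` does not occur.
    """
    return source.encode("utf-8").find(sub.encode("utf-8"))


print(index_filter("Hello AnQiCMS", "CMS"))  # 10
print(index_filter("Hello AnQiCMS", "PHP"))  # -1
```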

Special calculation characteristics brought by Chinese characters

When the source string contains Chinese characters, however, the `index` filter behaves differently. Strings in the AnQi CMS template environment are stored as UTF-8, in which a Chinese character usually occupies 3 bytes. The `index` filter bases its position calculation on this underlying byte representation rather than on the visible character count we usually have in mind.

This means that when `index` searches for a substring and Chinese characters precede it in the source string, the returned position can be larger than the character index we would count by eye. Specifically, each preceding Chinese character may be counted as 3 positions (its 3 bytes).

Here is a concrete example. Suppose we have a mixed Chinese and English string, "欢迎使用安企CMS(AnQiCMS)", and we again use the `index` filter to find "CMS": `{{ "欢迎使用安企CMS(AnQiCMS)"|index:"CMS" }}`. The result returned by this expression is 18.

If we count the characters the way we normally read them:

  • 欢 (1)
  • 迎 (2)
  • 使 (3)
  • 用 (4)
  • 安 (5)
  • 企 (6)
  • C (7)
  • M (8)
  • S (9)
  • ( (10)
  • A (11)
  • n (12)
  • Q (13)
  • i (14)
  • C (15)
  • M (16)
  • S (17)
  • ) (18)

Counted this way, the first "CMS" begins at the 7th character, which is index 6 when counting from 0, yet the `index` filter returns 18. The difference comes from the six Chinese characters in front of "CMS": each occupies 3 bytes in UTF-8, and 6 × 3 = 18. The filter is therefore reporting a byte position, not a character position.
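To make the two ways of counting concrete, the following Python sketch computes both positions for the example string. It is a model written under the assumption that the filter counts UTF-8 bytes; `byte_position` and `char_position` are hypothetical names, not AnQi CMS functions.

```python
def byte_position(source: str, sub: str) -> int:
    """0-based offset of `sub` counted in UTF-8 bytes (-1 if absent)."""
    return source.encode("utf-8").find(sub.encode("utf-8"))


def char_position(source: str, sub: str) -> int:
    """0-based offset of `sub` counted in characters (-1 if absent)."""
    return source.find(sub)


text = "欢迎使用安企CMS(AnQiCMS)"
print(byte_position(text, "CMS"))  # 18: six Chinese characters * 3 bytes each
print(char_position(text, "CMS"))  # 6: six characters precede "CMS"
```

The two results differ by exactly 2 positions for every Chinese character that precedes the match.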

The official documentation states this explicitly: if the string contains Chinese characters, each Chinese character counts as 3 positions when the position is calculated. Keep this byte-based mechanism in mind for more complex string operations, especially with filters such as `slice` that need precise start and end positions. For example, if you want `slice` to start cutting at the position returned by `index`, remember that `index` returns a byte position; using it directly may cut the string at an unintended point and leave truncated or garbled Chinese characters.
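One way to reason about the problem is to convert a byte position into a character position before cutting. The Python sketch below illustrates the idea; it is not template syntax and not AnQi CMS's implementation, and `byte_to_char_offset` is a hypothetical helper. It also shows what a cut in the middle of a multi-byte character looks like.

```python
def byte_to_char_offset(source: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a character offset.

    Assumes `byte_offset` falls on a character boundary, which holds for
    any offset at which a whole substring was found.
    """
    return len(source.encode("utf-8")[:byte_offset].decode("utf-8"))


text = "欢迎使用安企CMS(AnQiCMS)"
raw = text.encode("utf-8")

# Cutting the bytes in the middle of a character leaves an incomplete sequence.
broken = raw[:4].decode("utf-8", errors="replace")
print(broken)  # "欢" followed by the U+FFFD replacement character

# Converting the byte position first keeps every character whole.
start = byte_to_char_offset(text, 18)
print(start)         # 6
print(text[start:])  # CMS(AnQiCMS)
```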

Considerations and suggestions for practical use

For multilingual websites running on AnQi CMS, understanding this behavior of the `index` filter is particularly important wherever content display must be controlled precisely.

  1. Dynamic content extraction: If you need to truncate a string dynamically based on the position of a keyword, especially when the source string mixes Chinese and English, passing the result of `index` directly to `slice` may produce unexpected results. In this case, test thoroughly or add extra handling in the template logic to keep Chinese characters intact.
  2. Avoid computing lengths directly from the position: Take care when you locate a substring with `index` and then use `slice` to cut the part before or after it. If the string begins with Chinese characters and `index` counts in bytes, the cut length may need to be adjusted to account for 3 positions per Chinese character.
  3. Handle multilingual content carefully: AnQi CMS supports rich multilingual promotion features. When working with template content in different languages, pay attention to these differences. For English letters, digits, and other single-byte characters, `index` usually behaves intuitively; languages such as Chinese, Japanese, and Korean, which use multi-byte characters, need extra care.
  4. Make good use of other filters: If you only need to know whether a substring exists and not where it is, the `contain` filter (for example, `{{ "欢迎使用安企CMS"|contain:"CMS" }}` returns True) is a simpler, safer alternative because no position calculation is involved.
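Suggestions 1 and 4 share one idea: avoid numeric positions when they are not needed. As an illustration in Python rather than template syntax (the function names are hypothetical and not part of AnQi CMS), a keyword can be tested for presence, or used to split the text, without ever computing an offset:

```python
from typing import Tuple


def contains(source: str, keyword: str) -> bool:
    """Presence check only; no position is calculated."""
    return keyword in source


def split_at_keyword(source: str, keyword: str) -> Tuple[str, str]:
    """Return the text before and after the first `keyword`.

    If the keyword is absent, the whole string is returned as "before".
    """
    before, _, after = source.partition(keyword)
    return before, after


text = "欢迎使用安企CMS(AnQiCMS)"
print(contains(text, "CMS"))          # True
print(split_at_keyword(text, "CMS"))  # ('欢迎使用安企', '(AnQiCMS)')
```

Because the split is driven by the keyword itself, it gives the same result whether positions would have been counted in bytes or in characters.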

AnQi CMS is an efficient, customizable content management system, and its multilingual support helps enterprises expand into international markets. Understanding how the `index` filter treats Chinese characters gives us a clearer picture of how the system works and leads to multilingual templates that are more robust and behave as expected. Before performing any complex string operation, it is always wise to test and verify thoroughly.


Frequently Asked Questions (FAQ)

Q1: Why does the `index` filter count a Chinese character as 3 positions?

A1: AnQi CMS is developed in Go, and Go strings are UTF-8 encoded by default. In UTF-8, a Chinese character usually occupies 3 bytes. The `index` filter calculates positions from these underlying bytes rather than from the character count we see. This behavior reflects how low-level encoding interacts with higher-level logic: it is accurate at the byte level, but template authors need to keep the difference in mind.
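The "usually 3 bytes" figure is a property of UTF-8 itself and can be checked directly. The short Python sketch below (with `utf8_width` as a hypothetical helper name) prints how many bytes a few sample characters occupy. It also shows that 3 is typical for Chinese, Japanese, and Korean characters but is not universal, which is why a fixed multiplier should be applied with care.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# ASCII letter, accented Latin letter (U+00E9), Chinese, Japanese, Korean, emoji
for ch in ["A", "\u00e9", "企", "あ", "한", "😀"]:
    print(ch, utf8_width(ch))
# Widths printed, in order: 1, 2, 3, 3, 3, 4
```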

Q2: Besides the `index` filter, are there other filters related to string position or length that need particular attention when processing Chinese characters?

A2: The `slice` filter is directly affected by the result of the `index` filter. If you use the position returned by `index` as the `slice` …
