In AnQiCMS template development, the `slice` filter is a commonly used tool for processing strings and arrays. It makes it easy to extract part of some content, whether that is a few elements of a list or a specified segment of a long text. However, when truncating Chinese strings without understanding how the filter works underneath, we can run into a common and annoying problem: the result may contain "half a character" or garbled text.
Understanding the challenge the `slice` filter faces with Chinese characters
The `slice` filter works like the slicing operation in many programming languages: it cuts a string or array based on start and end indices. For example, `{{ "abcdef"|slice:"1:3" }}` yields `bc`. Here the index is counted in bytes. For English characters, one character usually occupies one byte, so this kind of truncation is normally unproblematic.
Chinese characters are different. In UTF-8, a Chinese character usually occupies three or more bytes. If the `slice` filter cuts a Chinese string at the byte level, it is likely to cut through the byte sequence of a single character, producing garbled text (for example `�` or other unrecognizable symbols) or incomplete characters. For example, `{{ "你好世界"|slice:"1:4" }}` applied directly attempts to take the second through fourth bytes of the string. That cuts into the middle of the first Chinese character, so the output displays incorrectly.
To avoid this, we need to cut at the character level rather than the byte level.
Combining filters to achieve character-level truncation
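Fortunately, the AnQiCMS template engine provides a variety of filters, and by combining them we can truncate Chinese strings precisely at the character level and keep the result intact. The strategy consists of three steps: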
1. Split the string into a character array with the `make_list` filter. The `make_list` filter splits a string into an array by character (not by byte). For example, `{{ "你好时间"|make_list }}` yields an array of four string elements: `["你", "好", "时", "间"]`. Each Chinese character is now a separate element, which sidesteps byte-level issues.
2. Extract characters from the character array with the `slice` filter (acting on an array). Once the string has been converted to a character array, the `slice` filter can be applied to it safely. Each element of the array is a complete character, and `slice` cuts by element index, so characters stay intact. For example, applying `slice:"1:3"` to `["你", "好", "时", "间"]` gives `["好", "时"]`, exactly the second and third characters.
3. Reassemble the sliced character array into a string with the `join` filter. The final step turns the sliced array back into a readable string. The `join` filter concatenates all elements of an array with a specified separator; if no separator is wanted, pass an empty string. For example, `{{ ["好", "时"]|join:"" }}` yields `好时`.
Integrating and Using: A Complete Example
By linking the above three steps, we can achieve the character-level extraction of Chinese character strings:
{% set original_string = "安企CMS,高效内容管理系统" %}
{% set start_index = 0 %} {# start at the first character (index 0) #}
{% set end_index = 8 %} {# stop before index 8, i.e. take 8 characters #}
{% set sliced_characters = original_string|make_list|slice:(start_index ~ ":" ~ end_index) %}
{% set result_string = sliced_characters|join:"" %}
<p>Original string: {{ original_string }}</p>
<p>Truncated result: {{ result_string }}</p>
In this example, `start_index ~ ":" ~ end_index` builds the slice argument dynamically, which keeps the parameters flexible. With `start_index` set to 0 and `end_index` set to 8, the output is `安企CMS,高效` (the eight characters at indexes 0 through 7), with no garbled or half characters.
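To make the role of the dynamically built `from:to` argument concrete, here is a small Go sketch that constructs such an argument from two indices and then applies it character by character. `buildSpec` and `sliceBySpec` are hypothetical names, and requiring both sides of the colon is a simplification of this sketch rather than a statement about what the template filter accepts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildSpec joins two indices into a "from:to" argument string.
func buildSpec(from, to int) string {
	return strconv.Itoa(from) + ":" + strconv.Itoa(to)
}

// sliceBySpec parses a "from:to" spec and returns the characters in [from, to).
// Both sides are required here; that strictness is a simplification.
func sliceBySpec(s, spec string) (string, error) {
	parts := strings.Split(spec, ":")
	if len(parts) != 2 {
		return "", fmt.Errorf("spec %q is not in from:to form", spec)
	}
	from, err := strconv.Atoi(parts[0])
	if err != nil {
		return "", err
	}
	to, err := strconv.Atoi(parts[1])
	if err != nil {
		return "", err
	}
	r := []rune(s)
	if from < 0 || to > len(r) || from > to {
		return "", fmt.Errorf("range %d:%d out of bounds for %d characters", from, to, len(r))
	}
	return string(r[from:to]), nil
}

func main() {
	out, err := sliceBySpec("安企CMS,高效内容管理系统", buildSpec(0, 8))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out) // 安企CMS,高效
}
```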
Notes and Best Practices
- Performance considerations: Although this combination solves the Chinese character problem, it involves three filter operations. For short strings the overhead is negligible. If you frequently process very long strings, consider preprocessing them at the content generation stage, or measure the effect on rendering performance. In the vast majority of content display scenarios the overhead is acceptable.
- Understanding the slice length: In the `slice` filter's `from:to` argument, `to` is the index before which the slice stops; the element at `to` itself is not included. For example, `slice:"0:5"` extracts 5 characters starting from index 0 (indexes 0, 1, 2, 3, 4). This differs slightly from languages where the second argument is a length, so take care when using it.
- Alternative, the `truncatechars` filter: If you only need to shorten a string and automatically append an ellipsis (`...`), `truncatechars` is a more concise option. It counts characters rather than bytes. For example, `{{ "安企CMS,高效内容管理系统"|truncatechars:8 }}` outputs a shortened string ending in `...`. If the three-character ellipsis counts toward the limit, as it does in Django-style implementations, the result is `安企CMS...` rather than eight full characters followed by the ellipsis. Choose the filter that fits your actual needs.
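The two notes on slice length and ellipsis truncation can be illustrated with a short Go sketch. `sliceChars` and `truncateChars` are hypothetical helpers; clamping out-of-range indices and counting the ellipsis toward the limit (Django-style) are assumptions of this sketch, not confirmed behaviour of the AnQiCMS filters.

```go
package main

import "fmt"

// sliceChars returns the characters of s in the half-open range [from, to),
// so the result holds at most to-from characters. Clamping out-of-range
// indices is a choice made for this sketch.
func sliceChars(s string, from, to int) string {
	r := []rune(s)
	if from < 0 {
		from = 0
	}
	if to > len(r) {
		to = len(r)
	}
	if from >= to {
		return ""
	}
	return string(r[from:to])
}

// truncateChars shortens s to at most n characters in total, counting the
// three-character "..." toward n (Django-style semantics, assumed here).
func truncateChars(s string, n int) string {
	r := []rune(s)
	if n < 0 {
		n = 0
	}
	if len(r) <= n {
		return s
	}
	if n < 3 {
		return string(r[:n]) // no room for the ellipsis
	}
	return string(r[:n-3]) + "..."
}

func main() {
	s := "安企CMS,高效内容管理系统"
	fmt.Println(sliceChars(s, 0, 5)) // 安企CMS (indexes 0 through 4)
	fmt.Println(truncateChars(s, 8)) // 安企CMS... (5 characters plus the ellipsis)
}
```

If your version of the filter does not count the ellipsis toward the limit, the second helper would keep `r[:n]` instead of `r[:n-3]`.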
By combining the `make_list`, `slice`, and `join` filters, you can handle Chinese string truncation in AnQiCMS with ease and keep the displayed content clear and accurate.
Common Questions (FAQ)
1. Why does applying the `slice` filter directly to Chinese text produce garbled characters?
Answer: In AnQiCMS templates, the `slice` filter cuts by bytes by default. A Chinese character usually consists of multiple bytes (typically 3 in UTF-8), so cutting by bytes can split the byte sequence of a single character. The browser then cannot decode it correctly and shows garbled text or incomplete characters.
2. Does the combined approach (`make_list|slice|join`) affect website performance?
Answer: Compared with a direct `slice`, the combination adds processing steps (converting the string to an array and then back to a string). For typical string lengths (tens to hundreds of characters), the extra overhead is very small and can usually be ignored. If your site truncates a large number of extremely long strings and has strict performance requirements, consider preprocessing the content when it is stored or generated, to lighten the load on template rendering.
3. Besides `slice`, does AnQiCMS have other filters for truncating strings?
Answer: Yes, AnQiCMS also provides the `truncatechars` and `truncatewords` filters.
- `truncatechars`: truncates a string by character count and appends an ellipsis (`...`). It handles Chinese characters correctly.
- `truncatewords`: truncates a string by word count and appends an ellipsis (`...`). It is better suited to English content, because it uses spaces as word separators and does not apply to Chinese.
When you need truncation with an automatically appended ellipsis, `truncatechars` is the more concise and convenient choice.