In AnQi CMS template development, the `slice` filter is a commonly used tool for processing strings and arrays. It extracts part of the content, whether that is a few elements of a list or a segment of a long text. However, when truncating Chinese strings, if you are not familiar with how the filter works underneath, you may run into a common and annoying problem: the result contains "half characters" or garbled characters.
Understanding the challenge of the `slice` filter with Chinese characters
The `slice` filter works like the slicing operation in many programming languages: it truncates a string or array based on start and end indices. For example, `{{ "abcdef"|slice:"1:3" }}` returns `bc`. For strings, the index is counted in bytes. An English character usually occupies one byte, so this kind of truncation normally causes no problems.
But Chinese characters are different. In UTF-8 encoding, a Chinese character usually occupies three or more bytes. If we use the `slice` filter to cut a Chinese string directly at the byte level, the cut may fall inside the byte sequence of a character, producing garbled output (for example, `�` or other unrecognizable symbols) or incomplete characters. For example, `{{ "你好世界"|slice:"1:4" }}`, applied directly, attempts to take the second through fourth bytes of the string. That cuts into the middle of the first Chinese character, so the result displays incorrectly.
To avoid this, we need a truncation method that works at the character level rather than the byte level.
Combining filters to achieve character-level truncation
Fortunately, the AnQi CMS template engine provides a variety of filters that can be combined to extract Chinese strings precisely at the character level, keeping every character intact. The approach has three steps:
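The difference between byte-level and character-level cutting can be reproduced outside the template engine. The following is a minimal Go sketch, not AnQi CMS code: `byteSlice` and `runeSlice` are hypothetical helper names introduced only for illustration, and the byte-indexed behavior is assumed to match what this article describes for `slice`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteSlice cuts s by byte offsets, the way a byte-indexed slice would.
// (Hypothetical helper for illustration only.)
func byteSlice(s string, from, to int) string {
	return s[from:to]
}

// runeSlice cuts s by character (rune) offsets, so characters stay whole.
// (Hypothetical helper for illustration only.)
func runeSlice(s string, from, to int) string {
	r := []rune(s)
	return string(r[from:to])
}

func main() {
	s := "你好世界"
	cut := byteSlice(s, 1, 4)

	// The string is 12 bytes long but only 4 characters long.
	fmt.Println(len(s), utf8.RuneCountInString(s))
	// The byte-level cut is not valid UTF-8: it starts and ends inside characters.
	fmt.Println(utf8.ValidString(cut))
	// The character-level cut returns whole characters.
	fmt.Println(runeSlice(s, 1, 3))
}
```

The byte-level result holds the last two bytes of one character and the first byte of the next, which is why a browser cannot render it.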
1. Split the string into an array of characters: the `make_list` filter. The `make_list` filter splits a string into an array of characters (not bytes). For example, `{{ "你好时间"|make_list }}` returns an array of four string elements: `["你", "好", "时", "间"]`. Each Chinese character is now an independent element, which sidesteps the byte-level problem.
2. Extract from the character array: the `slice` filter (applied to the array). Once the string has been converted to a character array, the `slice` filter can safely act on it. Each element of the array is a complete character, so `slice` extracts by element index and every character stays intact. For example, applying `slice:"1:3"` to `["你", "好", "时", "间"]` returns `["好", "时"]`, exactly the second and third characters.
3. Join the sliced character array back into a string: the `join` filter. The final step turns the sliced array back into a readable string. The `join` filter concatenates all elements of an array with a specified separator; if no separator is wanted, pass an empty string. For example, `{{ ["好", "时"]|join:"" }}` returns `好时`.
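The three steps can be sketched in Go to make the data flow concrete. This is an illustration under stated assumptions, not the template engine's implementation: `makeList` and `sliceChars` are hypothetical names, and clamping out-of-range indices is an assumption of this sketch rather than documented AnQi CMS behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// makeList mirrors the idea of make_list: one array element per character.
// (Hypothetical helper; not the engine's code.)
func makeList(s string) []string {
	chars := make([]string, 0, len(s))
	for _, r := range s { // ranging over a string yields runes, not bytes
		chars = append(chars, string(r))
	}
	return chars
}

// sliceChars takes the half-open range [from, to) of the character array.
// Clamping to the array bounds is an assumption made for this sketch.
func sliceChars(chars []string, from, to int) []string {
	if from < 0 {
		from = 0
	}
	if to > len(chars) {
		to = len(chars)
	}
	if from >= to {
		return nil
	}
	return chars[from:to]
}

func main() {
	chars := makeList("你好时间")        // step 1: string to character array
	part := sliceChars(chars, 1, 3)  // step 2: slice by element index
	fmt.Println(strings.Join(part, "")) // step 3: join with an empty separator
}
```

Because the slicing happens on array elements, the indices always refer to whole characters, whatever their byte length.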
Putting it together: a complete example
Chaining the three steps gives character-level truncation of a Chinese string:
{% set original_string = "安企CMS,高效内容管理系统" %}
{% set start_index = 0 %} {# start at the first character (index 0) #}
{% set end_index = 8 %} {# stop before index 8, i.e. keep the first 8 characters #}
{% set sliced_characters = original_string|make_list|slice:(start_index ~ ":" ~ end_index) %}
{% set result_string = sliced_characters|join:"" %}
<p>Original string: {{ original_string }}</p>
<p>Truncated result: {{ result_string }}</p>
In this example, `start_index ~ ":" ~ end_index` builds the slice parameter dynamically, which keeps the code flexible. With `start_index` set to 0 and `end_index` set to 8, the output is the first eight characters, `安企CMS,高效`, with no garbled or half characters. If your template engine version does not accept an expression in this position, write the range literally, for example `slice:"0:8"`.
Notes and best practices
- Performance considerations: Although this combined approach solves the Chinese character problem, it involves three filter operations. For short strings the cost is negligible. If you frequently process very long strings, consider preprocessing them when the content is generated, or measure the effect on rendering time. In most content display scenarios the overhead is acceptable.
- Understanding the end index: In the `slice` filter's `from:to` syntax, the `to` parameter is the index to stop before, so the element at `to` is not included. For example, `slice:"0:5"` extracts the first 5 characters (indices 0, 1, 2, 3, 4). This differs from languages whose substring functions take a length as the second argument, so keep it in mind.
- Alternative: the `truncatechars` filter. If you only need to truncate a string and automatically append an ellipsis (`...`), `truncatechars` is a more concise option. It counts characters rather than bytes. For example, `{{ "安企CMS,高效内容管理系统"|truncatechars:8 }}` outputs the beginning of the string followed by `...`. In Django-style engines the ellipsis counts toward the limit, so check the exact output length in your version. Choose the filter that fits your actual needs.
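To show what character-aware truncation with an ellipsis involves, here is a minimal Go sketch. It is not the AnQi CMS implementation: `truncateChars` is a hypothetical helper, and it assumes the Django convention in which the ellipsis counts toward the character limit. Your engine version may count differently.

```go
package main

import "fmt"

const ellipsis = "..."

// truncateChars shortens s to at most n characters (runes).
// Assumption for this sketch: the ellipsis counts toward n, as in Django.
func truncateChars(s string, n int) string {
	r := []rune(s)
	if n < 0 {
		n = 0
	}
	if len(r) <= n {
		return s // already short enough, nothing to cut
	}
	if n <= len(ellipsis) {
		return string(r[:n]) // no room for an ellipsis
	}
	return string(r[:n-len(ellipsis)]) + ellipsis
}

func main() {
	fmt.Println(truncateChars("安企CMS,高效内容管理系统", 8))
	fmt.Println(truncateChars("你好", 8))
}
```

Under this convention a limit of 8 keeps five characters of text plus the three-character ellipsis, and a string shorter than the limit is returned unchanged.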
By combining the `make_list`, `slice`, and `join` filters, you can truncate Chinese strings in AnQi CMS with confidence and keep content displayed clearly and accurately.
Frequently Asked Questions (FAQ)
1. Why does applying the `slice` filter directly to a Chinese string produce garbled characters?
Answer: In AnQi CMS templates, the `slice` filter cuts strings by bytes by default. A Chinese character is usually made up of multiple bytes (in UTF-8, usually 3), so cutting by bytes can split the byte sequence of a character. The browser then cannot parse it correctly and displays garbled or incomplete characters.
2. Does the combined approach (`make_list|slice|join`) affect website performance?
Answer: Compared with a direct `slice` operation, the combination adds some processing steps: converting the string to an array and then back to a string. For typical string lengths (tens to hundreds of characters), the extra overhead is very small, almost negligible. If your site truncates a large number of long strings and has strict performance requirements, consider preprocessing the content when it is stored or generated to reduce the load on template rendering.
3. Besides the `slice` filter, does AnQi CMS have other filters that can truncate strings?
Answer: Yes. AnQi CMS also provides the `truncatechars` and `truncatewords` filters.
- `truncatechars`: truncates a string by character count and appends an ellipsis (`...`). It handles Chinese characters correctly.
- `truncatewords`: truncates a string by word count and appends an ellipsis (`...`). It is better suited to English content, because it uses spaces as word delimiters and does not work for Chinese.
When your requirement is to truncate and automatically add an ellipsis, `truncatechars` is the more concise and convenient choice.