In AnQiCMS template design, the wordcount filter is a utility that counts the number of words in a text. For operations staff and content creators, understanding how it works, especially its counting logic for mixed Chinese-English text, helps assess content length more accurately, optimize article structure, and better serve search engine optimization (SEO) and the reading experience.
Basic usage of the wordcount filter
The wordcount filter is straightforward to use. It is applied to a string variable and returns the number of "words" in that string. The basic syntax in AnQiCMS templates is:
{{ your_string_variable | wordcount }}
Alternatively, to count the words in a block of template content, use the filter tag form:
{% filter wordcount %}
This is the text whose words should be counted.
{% endfilter %}
For example, {{ "Hello AnQiCMS World" | wordcount }} returns 3, which matches the usual notion of a word count for pure English text.
Word count logic for mixed Chinese-English text
The core of the wordcount filter's logic is how it recognizes a word: words are distinguished mainly by spaces. Any continuous sequence of characters delimited by spaces, whether English or Chinese, is counted as a single word.
Specifically, when dealing with mixed Chinese-English text:
- English words: English words are normally separated by spaces, so wordcount counts each space-separated English word as an independent word.
- Chinese text: Chinese writing does not separate words with spaces, so wordcount counts a continuous run of Chinese text without spaces as a single "word". For example, the whole sentence "安企CMS是一个内容管理系统" ("AnQiCMS is a content management system"), written without spaces, is counted as 1.
- Mixed Chinese-English text: When the text contains both, the two rules combine: English words are counted individually based on spaces, while each Chinese run without spaces is counted as one block.
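The rule above can be modelled in a few lines of Python. This is only a sketch under one assumption: that the filter splits on whitespace and nothing else. The helper name count_blocks is hypothetical and is not part of AnQiCMS.

```python
def count_blocks(text: str) -> int:
    """Count whitespace-separated blocks of characters.

    Assumption: the filter splits on whitespace only. This helper is a
    hypothetical model of that rule, not AnQiCMS code.
    """
    # str.split() with no argument splits on any run of Unicode whitespace
    # and ignores leading and trailing whitespace.
    return len(text.split())


print(count_blocks("Hello AnQiCMS World"))        # 3
print(count_blocks("安企CMS是一个内容管理系统。"))    # 1
print(count_blocks("AnQiCMS 提供了丰富的功能。"))     # 2
# A full-width comma is punctuation, not whitespace, so it does not split.
print(count_blocks("你好,世界"))                   # 1
```

Under this model, punctuation never creates a new block; only whitespace does. If a template gives a different number for the same string, the filter in that version is applying extra rules that this sketch does not capture.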
Let's better understand this logic through several examples:
Pure English example:
{{ "AnQiCMS is a powerful CMS." | wordcount }}
- Result: 5 (AnQiCMS, is, a, powerful, CMS.)
Pure Chinese example:
{{ "安企CMS是一个内容管理系统。" | wordcount }}
- Result: 1 (the entire string contains no spaces, so it is treated as one continuous block)
Mixed Chinese-English example:
{{ "Hello AnQiCMS 用户,这是一个测试文章。" | wordcount }}
- Result: 3 (Hello, AnQiCMS, 用户,这是一个测试文章。)
- The logic here is: "Hello" (1) + "AnQiCMS" (1) + "用户,这是一个测试文章。" (1) = 3. The full-width comma between "用户" and "这是一个测试文章" is not a space, so under the space-separated rule it does not split the Chinese text, and the whole run counts as one block.
- For comparison, the all-English equivalent {{ "Hello AnQiCMS user, this is a test article." | wordcount }} returns 8, because every word is separated by a space.
- If your AnQiCMS version gives punctuation special treatment as a separator, the result may differ, so verify it in a template. The core rule remains: each continuous block of non-space characters is one word.
More mixed-text examples:
{{ "AnQiCMS 提供了丰富的功能。" | wordcount }}
- Result: 2 ("AnQiCMS" (1) + "提供了丰富的功能。" (1))
{{ "GoLang 开发的 AnQiCMS,部署简单。" | wordcount }}
- Result: 3 ("GoLang" (1) + "开发的" (1) + "AnQiCMS,部署简单。" (1)). The full-width comma is not a space, so under the space-separated rule "AnQiCMS,部署简单。" stays one block.
In short, when facing Chinese text, the AnQiCMS wordcount filter counts continuous blocks of non-space characters rather than words in the Chinese linguistic sense.
Practical application and precautions
Understanding how wordcount works helps us use it well:
- English content assessment: For pages that are mainly English (such as English websites or the English version of a multilingual site), the wordcount filter gives relatively accurate word counts, which helps with content length planning and SEO keyword density control.
- Chinese content assessment: For content that is purely or mainly Chinese, note that the wordcount result is not an exact count of Chinese words; it is closer to a count of text blocks. If you need an accurate Chinese word count, you may need to combine other tools or front-end JavaScript to perform word segmentation.
- Mixed content strategy: For mixed Chinese-English content, wordcount gives an overall block count, which is still useful for roughly estimating content volume.
- Combining with the length filter: To measure content length more completely, consider also using the length filter to count the total number of characters (including Chinese, English, punctuation, and spaces). This gives a more direct sense of "word count".
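To see how the block count and the character count differ on the same text, the following Python sketch computes both, plus two rough helper figures. It is an illustration only: content_metrics is a hypothetical helper, the whitespace-only splitting is an assumption about wordcount, and the CJK range covers only the basic CJK Unified Ideographs block. It does not perform real Chinese word segmentation.

```python
import re

# Basic CJK Unified Ideographs block only; a deliberate simplification.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")
# Runs of ASCII letters and digits, each treated as one Latin word.
LATIN_WORD = re.compile(r"[A-Za-z0-9]+")


def content_metrics(text: str) -> dict:
    """Return several rough length measures for a piece of text.

    Hypothetical helper for illustration; not part of AnQiCMS.
    """
    return {
        # Whitespace-separated blocks (assumed to mirror wordcount).
        "blocks": len(text.split()),
        # Total characters, including punctuation and spaces
        # (comparable in spirit to the length filter).
        "characters": len(text),
        # Number of Chinese characters in the basic CJK block.
        "cjk_characters": len(CJK_CHAR.findall(text)),
        # Number of ASCII letter/digit runs.
        "latin_words": len(LATIN_WORD.findall(text)),
    }


print(content_metrics("AnQiCMS 提供了丰富的功能。"))
# {'blocks': 2, 'characters': 17, 'cjk_characters': 8, 'latin_words': 1}
```

For this sentence the block count is 2 while the character count is 17, which shows why a character-based figure is usually the more informative one for Chinese-heavy content.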
Understanding these details helps us avoid misreading wordcount results and apply them sensibly to content management and operations strategy in AnQiCMS, producing higher-quality content that meets expectations.
Frequently Asked Questions (FAQ)
1. Why is the result always 1 or very small when I use the wordcount filter on a Chinese article? This is because the wordcount filter identifies words mainly by spaces. Chinese text usually does not use spaces between words, so a continuous run of Chinese text, even one containing many words, is counted by wordcount as a single block.
2. Does AnQiCMS provide a more accurate Chinese word count? According to the existing documentation, the wordcount filter separates words mainly by spaces. If you need a count of words in the Chinese linguistic sense (for example, segmenting "内容管理系统" into its component words), AnQiCMS template filters do not currently provide such segmentation. This usually requires an external Chinese word segmentation library or processing in front-end JavaScript.
3. Besides wordcount, which filters can measure content length? You can use the length filter to count the total number of characters in a text. For example, {{ your_string_variable | length }} returns the total number of characters in a string (including Chinese, English, digits, punctuation, and spaces), which is a more direct measure of an article's "word count".