How does the `wordcount` filter handle strings containing numbers and special characters when counting?

AnQiCMS provides a set of practical and powerful template filters to help us process content. Among them, the `wordcount` filter is commonly used in content operations to count the number of "words" in a string. But when the string contains numbers, special characters, or even Chinese, how does `wordcount` count them? Many users find this confusing, so below we look at its internal logic in detail.

The core counting mechanism of the `wordcount` filter is simple: it **uses spaces as the only word delimiter**. Any continuous run of non-space characters, whether it contains letters, numbers, punctuation marks, or other special characters, is counted by `wordcount` as one independent "word".
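
To make the rule concrete, here is a minimal Go sketch (AnQiCMS itself is written in Go). It assumes the filter behaves like Go's `strings.Fields`; to our knowledge this is how the pongo2 template engine's `wordcount` filter is implemented, but it is an assumption about AnQiCMS's internals, not a copy of its source. Note that `strings.Fields` also treats tabs and line breaks as separators, not only spaces. The function name `wordCount` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// wordCount is a hypothetical model of the wordcount filter: every maximal
// run of non-whitespace characters counts as one "word".
// Assumption: the real filter behaves like strings.Fields, which splits on
// any Unicode whitespace (spaces, tabs, line breaks).
func wordCount(s string) int {
	return len(strings.Fields(s))
}

func main() {
	fmt.Println(wordCount("AnQiCMSv3.0"))    // 1
	fmt.Println(wordCount("Hello World"))    // 2
	fmt.Println(wordCount("Hello\tWorld\n")) // 2
}
```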

- **Numbers combined with letters:** A string such as `AnQiCMSv3.0` contains both letters and digits, but because there is no space inside it, it counts as a single word. Likewise, `Windows11` and `iPhoneX` are each counted as one word.
- **Symbols and punctuation:** Hyphens (`-`), underscores (`_`), exclamation marks (`!`), parentheses (`()`), question marks (`?`) and similar characters are treated as part of a word as long as no space separates them from the adjacent letters or digits. For example, `Hello-World!` counts as one word, because neither the hyphen nor the exclamation mark acts as a separator, and `AnQiCMS!` is also one word. Even with spaces on both sides, a symbol does not become a separator; it is simply counted as a word of its own, so `Hello - World` counts as 3 words.
- **Consecutive spaces:** Multiple consecutive spaces, as in `"Hello     World"`, are treated as a single separator, so the extra spaces do not inflate the count: the result is still 2.
- **Chinese characters:** Chinese text normally has no spaces between words, so a pure Chinese string such as `安企内容管理系统` ("AnQi content management system") is counted as one word. Chinese text is split only where it actually contains spaces, whether those spaces sit next to Chinese characters, English words, or numbers: `安企 CMS 系统` ("AnQi CMS system") counts as 3 words.
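
The rules above can be checked with a short Go sketch under the same assumption: the hypothetical `wordCount` models the filter with `strings.Fields`. The `naiveCount` helper, also hypothetical, splits on a single space character and is included only to show why consecutive spaces would inflate a naive count.

```go
package main

import (
	"fmt"
	"strings"
)

// wordCount: one word per maximal run of non-whitespace characters
// (an assumed model of the filter, based on strings.Fields).
func wordCount(s string) int {
	return len(strings.Fields(s))
}

// naiveCount splits on a single space character. It is shown only for
// contrast: it counts the empty strings between consecutive spaces as words.
func naiveCount(s string) int {
	if s == "" {
		return 0
	}
	return len(strings.Split(s, " "))
}

func main() {
	cases := []string{
		"Windows11",       // letters + digits, no space: 1
		"Hello-World!",    // symbols attached to letters: 1
		"Hello - World",   // a spaced-out hyphen is its own word: 3
		"Hello     World", // five consecutive spaces: 2
		"安企内容管理系统",        // pure Chinese, no spaces: 1
		"安企 CMS 系统",       // spaces split Chinese and English alike: 3
	}
	for _, c := range cases {
		fmt.Printf("%q -> fields=%d, naive split=%d\n", c, wordCount(c), naiveCount(c))
	}
}
```

With five spaces between the words, `naiveCount` reports 6 because `strings.Split` yields empty strings between adjacent spaces, while `strings.Fields` discards them.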

The following examples show the counting logic of `wordcount` in practice:

| Input string | Word count | Explanation |
|---|---|---|
| `"Hello World"` | `2` | `Hello` and `World` are separated by a space. |
| `"Hello-World!"` | `1` | Neither `-` nor `!` is a space, so `Hello-World!` is counted as a single word. |
| `"AnQiCMS V3.0"` | `2` | `AnQiCMS` and `V3.0` are separated by a space. |
| `"12345"` | `1` | Digits only, with no spaces. |
| `"这是一个 AnQiCMS 教程。"` | `3` | "This is an AnQiCMS tutorial." `这是一个`, `AnQiCMS`, and `教程。` are separated by spaces; the punctuation stays attached to its word. |
| `"GoLang v1.20"` | `2` | `GoLang` and `v1.20` are separated by a space. |
| `"安企CMS功能丰富！"` | `1` | "AnQiCMS is feature-rich!" Chinese, English, and punctuation with no spaces form a single word. |
| `""` (empty string) | `0` | The empty string contains no words. |

These examples show how strictly the `wordcount` filter follows the "spaces as delimiters" principle.
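
The table can also be replayed as a table-driven check. This Go sketch again models `wordcount` with `strings.Fields` (an assumption, not AnQiCMS's actual code); each row prints `ok` when the modeled count matches the table.

```go
package main

import (
	"fmt"
	"strings"
)

// wordCount is a hypothetical model of the filter (assumed to split like strings.Fields).
func wordCount(s string) int {
	return len(strings.Fields(s))
}

func main() {
	// Inputs and expected counts taken from the table above.
	tests := []struct {
		in   string
		want int
	}{
		{"Hello World", 2},
		{"Hello-World!", 1},
		{"AnQiCMS V3.0", 2},
		{"12345", 1},
		{"这是一个 AnQiCMS 教程。", 3},
		{"GoLang v1.20", 2},
		{"安企CMS功能丰富！", 1},
		{"", 0},
	}
	for _, tt := range tests {
		got := wordCount(tt.in)
		status := "ok"
		if got != tt.want {
			status = "MISMATCH"
		}
		fmt.Printf("%-28q got=%d want=%d %s\n", tt.in, got, tt.want, status)
	}
}
```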

Understanding that `wordcount` uses spaces as its separator matters for both content operations and template design. For English and other languages that separate words with spaces, `wordcount` gives reasonably accurate word counts. When numbers, special symbols, or Chinese characters are attached to English words without spaces, the whole run is counted as one unit. So if what you want to count is space-separated entries rather than words in the strict linguistic sense, this behavior may be exactly what you need.

Chinese, however, usually does not separate words with spaces, so `wordcount` often counts an entire paragraph of Chinese text as a single word (unless the text contains spaces, for example around embedded English words or numbers). This differs from what we usually mean by a word count. For a more accurate Chinese word count, you may need another approach, such as tokenizing the text with a front-end JavaScript library or preprocessing it when the content is generated.
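
As one example of preprocessing at the content generation stage, the Go sketch below uses a different and simpler technique than tokenization: it counts each Chinese (Han) character as one unit and each run of letters or digits as one word. The function `mixedWordCount` is hypothetical and not part of AnQiCMS; it is a character-based approximation, not true Chinese word segmentation.

```go
package main

import (
	"fmt"
	"unicode"
)

// mixedWordCount is a hypothetical, character-based approximation (NOT an
// AnQiCMS feature): each Han character counts as one unit, and each run of
// letters/digits, plus any symbols attached to it, counts as one word.
// Whitespace ends a run; stand-alone punctuation is ignored.
func mixedWordCount(s string) int {
	count := 0
	inWord := false
	for _, r := range s {
		switch {
		case unicode.Is(unicode.Han, r):
			count++ // one unit per Chinese character
			inWord = false
		case unicode.IsSpace(r):
			inWord = false
		case unicode.IsLetter(r) || unicode.IsNumber(r):
			if !inWord {
				count++ // start of a new Latin-script or numeric word
				inWord = true
			}
		default:
			// Punctuation or symbol: stays part of the current word if one
			// is open, otherwise it is ignored.
		}
	}
	return count
}

func main() {
	fmt.Println(mixedWordCount("安企CMS功能丰富！"))       // 7
	fmt.Println(mixedWordCount("这是一个 AnQiCMS 教程。")) // 7
	fmt.Println(mixedWordCount("Hello World"))       // 2
}
```

Under this approximation `安企CMS功能丰富！` counts as 7 (six Chinese characters plus `CMS`), whereas `wordcount` reports 1. Whether counting characters is acceptable depends on your use case; proper segmentation requires a dictionary-based tokenizer.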

If your content contains special symbols or numbers that should be counted separately but are not separated by spaces (for example, you want `产品-ID-123`, "product-ID-123", to count as 3 words), apply the `replace` filter before `wordcount` to turn the symbols into spaces, and then count: `{{ your_string|replace:"-, "|wordcount }}`. The `replace` argument takes the form `old,new`, so `"-, "` replaces each hyphen with a space.

The `wordcount` filter in AnQiCMS offers a quick way to count the words in a string using simple space-separated logic. Whether you are dealing with mixed text containing numbers and special symbols or with plain alphabetic text, once you understand this core mechanism you can use the filter more effectively for content display and operations.
