Does the `wordcount` filter count a run of consecutive non-space characters (such as a URL) as a single "word"?


When managing content in AnQiCMS, we often need to count the words in an article in order to plan content, estimate reading time, or optimize for SEO. The `wordcount` filter is a practical tool for this. However, many users wonder what happens when the text contains a continuous sequence of non-space characters, such as a complete URL, an email address, or a hyphenated term. How does the `wordcount` filter recognize and count them? Does it treat each one as a single "word", as it does an ordinary word?

Having looked closely at how the AnQiCMS `wordcount` filter works, we can state explicitly that it distinguishes and counts "words" by the spaces in the text. This means that any continuous sequence of non-space characters counts as a single "word", regardless of whether it contains letters, numbers, or special symbols (such as the slashes and dots in a URL, or the @ symbol in an email address).

For example, a complete URL such as https://en.anqicms.com contains multiple letters, slashes, and dots, but because there are no spaces inside it, the `wordcount` filter treats it as one word. Likewise, an email address such as [email protected] and a hyphenated term such as Go-Lang each count as one word.
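The whitespace rule described above can be sketched in Go, the language AnQiCMS is written in. This is a minimal illustration that assumes the filter behaves like the standard library's `strings.Fields`; `countWords` is a hypothetical helper rather than the filter's actual source, and `user@example.com` is a placeholder address.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords is a hypothetical helper that mimics the rule described above:
// every run of non-whitespace characters counts as one word.
func countWords(s string) int {
	return len(strings.Fields(s))
}

func main() {
	samples := []string{
		"https://en.anqicms.com",             // URL: no spaces inside
		"user@example.com",                   // placeholder email address
		"Go-Lang",                            // hyphenated term
		"Visit https://en.anqicms.com today", // URL surrounded by ordinary words
	}
	for _, s := range samples {
		fmt.Printf("%q -> %d\n", s, countWords(s))
	}
}
```

Under this assumption the first three samples each count as one word, and the last sentence counts as three.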

This counting method is convenient in practice. First, it makes length estimates more intuitive and predictable: when an article contains many external links or technical terms, a URL or a compound name is not split into several words, which would distort the statistics. Second, for scenarios that require strict control of length (such as social media summaries or advertising copy), the `wordcount` filter provides a uniform, reliable measure. It performs no semantic analysis of the words and looks only at their textual form, which keeps the count simple and efficient.

Using the `wordcount` filter in a template is also simple. There are two common ways to use it:

Used as an inline filter: when you need to quickly count the words in a variable, add a pipe symbol `|` followed by `wordcount` after the variable.
```
{{ my_content | wordcount }}
```

Used as a block-level filter: when you need to count the words in a block of template content, wrap the block in the `filter` tag.

```
{% filter wordcount %}
    This is a piece of text whose words need to be counted. It contains a URL: https://anqicms.com/docs and an email address: [email protected].
{% endfilter %}
```

Let's demonstrate this counting mechanism with several specific examples:

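Suppose we have the following text segments:

1. "AnQiCMS is a powerful content management system."
2. "This is a sentence containing a URL: https://en.anqicms.com and an email address: [email protected]."
3. "AnQiCMS is a system developed based on Go-Lang."
4. "" (empty string)

Using the `wordcount` filter, the results are:

For text 1: `{{ "AnQiCMS is a powerful content management system." | wordcount }}` The result is 7 (AnQiCMS, is, a, powerful, content, management, system.). The final period is attached to the last word and does not count separately.
For text 2: `{{ "This is a sentence containing a URL: https://en.anqicms.com and an email address: [email protected]." | wordcount }}` The result is 13: the twelve space-separated tokens up to "address:", including the URL as one word, plus the email address, which counts as one word together with its trailing period because an address contains no spaces.
For text 3: `{{ "AnQiCMS is a system developed based on Go-Lang." | wordcount }}` The result is 8 (AnQiCMS, is, a, system, developed, based, on, Go-Lang.). The hyphenated term counts once.
For text 4: `{{ "" | wordcount }}` The result is 0.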

These examples show that the `wordcount` filter, with its simple logic, gives AnQiCMS users an efficient and easy-to-understand way to count words, one that is especially well suited to modern web content containing URLs, email addresses, and other unbroken character sequences.

Frequently Asked Questions (FAQ)

Q1: If my text contains heavily hyphenated words (such as "all-in-one"), are they counted as a single word?
A1: Yes. As long as there are no spaces around the hyphens, the `wordcount` filter treats the term as one continuous character sequence and counts it as one word. For example, "all-in-one" is counted as a single word.

Q2: Is the `wordcount` filter case-sensitive?
A2: No. Case has no effect on the count. The filter only checks whether character sequences are separated by spaces; it does not analyze the content or the case of the words.

Q3: Can the `wordcount` filter count Chinese text?
A3: Yes, but it counts space-separated groups, not individual characters. Each group of Chinese characters set off by spaces is one "word", and a run of characters with no spaces in it is a single word however long it is. For example, "Hello World" is counted as two words.
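The difference between counting words and counting characters, which Q3 touches on, can be made concrete with a short Go sketch. It assumes the filter splits on whitespace the way `strings.Fields` does; `countWords` and `countChars` are hypothetical helpers, and the Chinese sample strings (meaning "Hello World") are illustrative data.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// countWords mimics the whitespace rule: each run of non-space characters is one word.
func countWords(s string) int {
	return len(strings.Fields(s))
}

// countChars counts the non-whitespace characters (runes) in s.
func countChars(s string) int {
	n := 0
	for _, r := range s {
		if !unicode.IsSpace(r) {
			n++
		}
	}
	return n
}

func main() {
	samples := []string{
		"你好世界",        // no space: one word, four characters
		"你好 世界",       // one space: two words, four characters
		"Hello World", // two words, ten characters
	}
	for _, s := range samples {
		fmt.Printf("%q: words=%d chars=%d\n", s, countWords(s), countChars(s))
	}
}
```

If you need a character count for Chinese content, a whitespace-based word count is therefore not a substitute.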

