When managing website content with AnQi CMS, we often need to count the words in an article, whether for content planning, SEO, or simply to meet publishing requirements. The wordcount filter is a practical tool for this: it quickly counts the words in a piece of text and gives content operations a direct number to work with.

However, when the text contains HTML entities, such as the common non-breaking space &nbsp;, how does the wordcount filter actually count? This is a frequent source of confusion.

The wordcount filter: review of basic functions

In the AnQi CMS template system, the wordcount filter counts the number of words in a given string. It works by recognizing spaces in the text and treating each sequence separated by spaces as an independent "word". For example:

{{ "Hello World"|wordcount }} {# the result will be 2 #}
{{ "安企CMS 是一个高效的内容管理系统"|wordcount }} {# the result will be 2: the string contains only one space #}

These examples show that wordcount also uses spaces as delimiters when it processes Chinese text. A run of Chinese characters with no spaces in it is counted as a single "word".
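The space-delimited rule described above can be sketched in Go. This is only an illustration under the assumption that the filter behaves like a plain whitespace split; countWords is a hypothetical helper, not the filter's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords is a hypothetical stand-in for the wordcount filter:
// it splits the input on runs of whitespace and counts the pieces.
func countWords(s string) int {
	return len(strings.Fields(s))
}

func main() {
	fmt.Println(countWords("Hello World")) // 2
	fmt.Println(countWords("内容管理系统"))      // 1: no spaces, so one "word"
	fmt.Println(countWords("内容 管理 系统"))    // 3: two spaces, so three "words"
}
```

Under this model, the count depends only on where the whitespace falls, not on the language of the text.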

When HTML entities appear in the text

Now, let's return to the core question: when the text includes HTML entities, how does wordcount count them?

HTML entities, such as &nbsp; (non-breaking space), &lt; (less-than sign), and &amp; (ampersand), are parsed into their corresponding characters when a browser renders the page. Inside a program, however, and especially when counting words in plain text, whether they are treated as ordinary spaces or decoded before counting depends on how the filter is implemented.

The wordcount filter in AnQi CMS does not automatically parse or decode HTML entities before counting. This means it treats an HTML entity itself, for example &nbsp;, as an ordinary character sequence.

Let's look at some examples to understand this behavior:

  1. Text contains HTML entities directly, but without actual spaces:

    {{ "Hello&nbsp;World"|wordcount }} {# the result may be 1 #}

    In this case, wordcount treats Hello&nbsp;World as one continuous string with no ordinary space in it, so it may be counted as only 1 word. The filter does not treat &nbsp; as a space separating "Hello" and "World".

  2. Text contains both actual spaces and HTML entities:

    {{ "Hello &nbsp; World"|wordcount }} {# the result may be 3 #}

    Here, wordcount recognizes that "Hello" and "World" are separated by actual spaces. The &nbsp; sequence itself, with a space on each side, is also treated as an independent "word". So the filter counts three words: "Hello", "&nbsp;", and "World".

  3. The text contains other HTML entities or tags:

    {{ "安企CMS<strong>非常强大</strong>"|wordcount }} {# with no spaces in the string, the result is likely 1 #}

    In this example, <strong> and </strong> are treated as ordinary character sequences; if they are separated from the surrounding text by spaces, they are counted as words too. For example, with a space between them, "安企CMS" and "<strong>非常强大</strong>" would be counted as two words. With no space in between, as written here, the whole string may be counted as one long word.
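The three cases above can be reproduced with a small Go sketch. It assumes, as the article describes, that counting is a plain whitespace split with no entity decoding; countWords is a hypothetical helper, not the filter's real code.

```go
package main

import (
	"fmt"
	"strings"
)

// countWords is a hypothetical stand-in for wordcount: it splits on
// whitespace without decoding entities or removing tags.
func countWords(s string) int {
	return len(strings.Fields(s))
}

func main() {
	// Case 1: the entity is the only "separator", so nothing is split.
	fmt.Println(countWords("Hello&nbsp;World")) // 1

	// Case 2: real spaces surround the entity, so it becomes its own word.
	fmt.Println(countWords("Hello &nbsp; World")) // 3

	// Case 3: tags are plain characters; only real spaces split the text.
	fmt.Println(countWords("安企CMS<strong>非常强大</strong>"))  // 1
	fmt.Println(countWords("安企CMS <strong>非常强大</strong>")) // 2
}
```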

Practical suggestions: How to get an 'accurate' word count

This literal treatment of HTML entities and tags may not be what we expect in some cases. We usually want the word count to reflect the content the user actually reads, not the raw string with its HTML markup.

To obtain a word count closer to how people actually read, that is, a count of the plain text with HTML tags and entities excluded, we can "clean" the content with other filters before applying the wordcount filter.

AnQi CMS provides the striptags filter, which removes all HTML tags from text. If the content also contains entities such as &nbsp;, we usually want those to be treated as spaces rather than counted as words.

A more practical way to count words looks like this:

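{# Assume the content variable contains HTML text and entities #}
{% set cleanContent = content|striptags %} {# Remove all HTML tags #}
{% set finalWordCount = cleanContent|wordcount %} {# Count words in the plain text #}

<p>The actual word count of this article (excluding HTML tags and entities) is: {{ finalWordCount }} words.</p>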

With this combination, striptags first removes HTML tags such as <p> and <strong>. It also tends to replace entities such as &nbsp; with actual spaces (the exact behavior may vary with the content and context, but it usually gives the expected result). After that, wordcount counts words in relatively clean plain text and produces a result closer to our intuitive understanding.
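The clean-then-count idea can also be sketched outside the template layer. The Go example below is a minimal illustration, not the CMS's implementation: cleanWordCount and tagPattern are hypothetical names, the regular expression is a simplification rather than a full HTML parser, and replacing each tag with a space (so that words separated only by a tag do not fuse) is a choice made for this sketch that may differ from what striptags does.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// tagPattern matches anything that looks like an HTML tag.
// It is a simplification and not a full HTML parser.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// cleanWordCount is a hypothetical helper: it replaces tags with spaces,
// decodes entities such as &nbsp; into real characters, then counts
// whitespace-separated words.
func cleanWordCount(s string) int {
	noTags := tagPattern.ReplaceAllString(s, " ")
	decoded := html.UnescapeString(noTags)
	return len(strings.Fields(decoded))
}

func main() {
	content := "<p>Hello&nbsp;World, <strong>AnQi CMS</strong> is fast.</p>"

	// Raw count: tags and entities are glued to the neighboring words.
	fmt.Println(len(strings.Fields(content))) // 5

	// Cleaned count: the decoded non-breaking space separates Hello and World.
	fmt.Println(cleanWordCount(content)) // 6
}
```

In Go, a decoded &nbsp; becomes U+00A0, which strings.Fields treats as whitespace, so decoding entities before counting is what lets "Hello" and "World" be counted separately here.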

Summary

The wordcount filter in AnQi CMS treats text containing HTML entities as plain character sequences and does not automatically decode them or handle them in any special way. This means that an entity such as &nbsp;, if separated from the surrounding text by spaces, may itself be counted as a word.

To obtain a more accurate, reader-oriented word count, preprocess the content with the striptags filter before applying wordcount, removing HTML tags and entities, and then count the plain text. This gives a better picture of the true volume of the content and more reliable data for website operations and SEO strategy.

Frequently Asked Questions (FAQ)

1. Does the wordcount filter count Chinese characters?

Yes, the wordcount filter counts Chinese text. However, it does not count individual characters; it follows the same rule of separating words by spaces. If several Chinese characters appear consecutively with no spaces between them, they are counted as one "word". For example, "安企内容管理系统" (AnQi Content Management System) is counted as 1 word because it contains no spaces.

2. How can I make the wordcount filter ignore punctuation in the text?

The wordcount filter has no built-in option to ignore punctuation. It treats punctuation as part of a word unless the punctuation is itself separated by spaces. If you need to ignore punctuation when counting, you would have to preprocess the text more heavily before wordcount, for example by using the replace filter, possibly combined with regular expressions, to remove or replace punctuation. This usually requires custom template functions or more advanced programming skills. For general needs, it is recommended to accept the default behavior.
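As a rough illustration of what such preprocessing might look like in custom code, the Go sketch below replaces every punctuation character with a space before counting. countWordsIgnoringPunct is a hypothetical helper, not an AnQi CMS function, and it assumes the same whitespace-split counting model described in this article.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// countWordsIgnoringPunct is a hypothetical helper: it replaces each
// punctuation rune with a space, then counts whitespace-separated words.
func countWordsIgnoringPunct(s string) int {
	cleaned := strings.Map(func(r rune) rune {
		if unicode.IsPunct(r) {
			return ' '
		}
		return r
	}, s)
	return len(strings.Fields(cleaned))
}

func main() {
	text := "Hello , World ! Done ."

	// Plain split: the stand-alone punctuation marks count as words.
	fmt.Println(len(strings.Fields(text))) // 6

	// Punctuation replaced by spaces first: only the real words remain.
	fmt.Println(countWordsIgnoringPunct(text)) // 3
}
```

Note that replacing punctuation with spaces also splits hyphenated terms into separate words; whether that is desirable depends on the counting rule you want.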

3. If my content contains images, will wordcount count them?

The wordcount filter only counts words in text. If an image is inserted through an <img> tag, the filter counts any text inside that tag (such as the value of the alt attribute, if it is passed to the filter as part of the text), but the image itself is not counted. To ignore HTML tags and their attribute content entirely, process the content with the striptags filter before applying wordcount.
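To see how much an <img> tag can inflate a raw count, consider the following Go sketch. It assumes the whitespace-split counting model used throughout this article; countRaw, countWithoutTags, and anyTag are hypothetical names, and the tag-matching regular expression is a simplification that would mis-handle a ">" inside an attribute value.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anyTag matches anything that looks like an HTML tag (simplified).
var anyTag = regexp.MustCompile(`<[^>]*>`)

// countRaw counts whitespace-separated pieces of the raw string.
func countRaw(s string) int {
	return len(strings.Fields(s))
}

// countWithoutTags replaces tags with spaces before counting.
func countWithoutTags(s string) int {
	return len(strings.Fields(anyTag.ReplaceAllString(s, " ")))
}

func main() {
	content := `Intro <img src="a.png" alt="A nice photo"> end`

	// The spaces inside the tag split it into several "words".
	fmt.Println(countRaw(content)) // 7

	// With the tag removed, only Intro and end remain.
	fmt.Println(countWithoutTags(content)) // 2
}
```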