How to ensure that the `wordcount` filter removes HTML tags before counting in the AnQiCMS template?

In AnQiCMS templates, we often need to perform operations on content, such as counting the number of words in an article. Word count is a basic but practical feature for content management: it helps us assess the richness of the content and is also a useful reference indicator for SEO. The `wordcount` filter makes this convenient, allowing us to obtain the word count of a piece of text easily.

However, there is a detail to pay attention to when using the `wordcount` filter. If the article content is edited in a rich text editor, it will usually contain various HTML tags, such as `<b>` (bold), `<a>` (link), `<img>` (image), or even `<p>` (paragraph). When we apply `wordcount` directly to content containing these tags, the result may be higher than the number of words we actually see on the page. This is because the `wordcount` filter by default treats the HTML tags as part of the text, so the count deviates from the plain-text word count we want.

To get an accurate count, we need the plain-text word count, which means removing the HTML tags from the content before the `wordcount` filter runs. AnQiCMS provides a practical tool for this, mainly the `striptags` filter.

The `striptags` filter, as the name implies, strips tags. No matter how many types or nesting levels of HTML tags the content contains, `striptags` removes all of them and leaves only the plain text. When we then pass the processed plain text to the `wordcount` filter, we get an accurate word count.
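
The effect can be checked with a literal string before touching real article content. The snippet below is a minimal sketch: the sample string is made up for illustration, and it assumes that `wordcount` splits on whitespace, so the space-separated attributes inside the `<img>` tag are counted as extra words. Under that assumption, the first expression should produce a higher number than the second.

```django
{# Raw string: the attributes inside the tag are separated by spaces, so they are counted too #}
{{ "<img src=/logo.png alt=logo> Hello AnQiCMS"|wordcount }}

{# Tags stripped first: only the text "Hello AnQiCMS" is left to count #}
{{ "<img src=/logo.png alt=logo> Hello AnQiCMS"|striptags|wordcount }}
```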

AnQiCMS also provides the `removetags` filter, which gives finer control by removing only the specified HTML tags instead of all of them. For a plain-text word count, however, `striptags` is usually the more direct and convenient choice.
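
As a rough side-by-side of the two filters, the sketch below applies each to the same variable. The tag list `"p,a"` is only an example. The trailing `|safe` on the second line is an assumption about how you want the remaining tags rendered (as HTML rather than escaped text); drop it if that does not match your template.

```django
{# striptags: remove every HTML tag and keep only the text #}
{{ archive.Content|striptags }}

{# removetags: remove only the listed tags (here <p> and <a>) and keep the rest #}
{{ archive.Content|removetags:"p,a"|safe }}
```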

Let's see how to implement this with a simple code example. Suppose we have an article content variable, `archive.Content`, which may contain rich HTML formatting.

If you directly count the words in the content, you may get inaccurate results:

<!-- Counting this way may include the characters of HTML tags, making the result inaccurate -->
Total word count: {{ archive.Content|wordcount }} words

To obtain an accurate plain-text word count, we should first use `striptags` to remove the HTML tags and then use `wordcount` to perform the count:

<!-- Remove the HTML tags first, then count the plain-text words -->
Plain-text word count: {{ archive.Content|safe|striptags|wordcount }} words

Here, `archive.Content` is first processed by the `|safe` filter, which ensures the template engine treats the content as safe HTML (so that already-escaped HTML entities do not interfere with `striptags`). The `striptags` filter then removes all HTML tags, and finally the `wordcount` filter counts the words in the cleaned plain text. With this combination, we can get a precise plain-text word count in AnQiCMS templates.

This technique is useful in many practical situations. For example, on an article list page you may want to display the plain-text word count below each article summary to give visitors a clear idea of the article's length; or at the bottom of an article detail page you may show a hint such as 'This article contains XXX words' to improve the user experience or meet SEO requirements. Combining the `striptags` and `wordcount` filters makes the content data on your AnQiCMS website more accurate and transparent, which better serves your operational strategy.
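
The detail-page hint mentioned above could look roughly like the following. This is a sketch rather than a recommended layout: the variable name `plainWordCount`, the 1000-word threshold, and the wording of the messages are all hypothetical, and it assumes the `{% set %}` tag is available in your template so the count can be computed once and reused.

```django
{# Strip the tags, count the words, and keep the result for reuse #}
{% set plainWordCount = archive.Content|safe|striptags|wordcount %}

<p>This article contains {{ plainWordCount }} words.</p>

{# Hypothetical threshold: show an extra note for long articles #}
{% if plainWordCount > 1000 %}
<p>This is a long read. You may want to bookmark it.</p>
{% endif %}
```

Storing the count in a variable avoids repeating the filter chain when the number is needed in more than one place on the page.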

Frequently Asked Questions (FAQ)

**How does the `wordcount` filter define a 'word'?** The AnQiCMS `wordcount` filter mainly distinguishes words by spaces. Any continuous sequence of characters separated by spaces is considered a 'word'. For example, the content 'AnQiCMS is a content management system' is counted as 6 words.
**What are the differences between the `striptags` and `removetags` filters, and how should I choose?** The `striptags` filter removes all HTML tags from the string, leaving only the plain-text content, so it is the more thorough of the two. The `removetags` filter lets you specify one or more HTML tags to remove (for example, `|removetags:"p,a"` removes the `<p>` and `<a>` tags). For counting words in plain text, `striptags` is usually the most convenient and thorough choice. If you have special requirements, such as removing only specific style tags without affecting the rest of the content structure, consider using `removetags`.
**Beforestriptagsusing `
