When running a website, we often need to analyze published content in various ways, and word count and SEO analysis are among the most important. However, rich text editors in content management systems (CMS) add many HTML tags to the text. These tags produce rich visual effects when the page is rendered on the frontend, but they get in the way when counting words or when plain text is needed for SEO analysis.

AnQiCMS is an efficient and flexible content management system. It comes with a template engine and a rich set of filters that make it easy to convert HTML content into plain text, which allows more accurate word counts and more effective SEO analysis.

Why is it necessary to convert HTML content to plain text?

When we publish articles and product details through the rich text editor in the AnQiCMS backend, the content we enter is stored as a string with HTML tags. For example, if you enter "AnQiCMS is an excellent CMS system" with the product name in bold, it may be stored in the database as `<b>AnQiCMS</b> is an excellent CMS system`.

When we need to count the words in this article, counting directly on the string that contains HTML tags gives an inaccurate result. Similarly, if tagged content is fed directly to an SEO tool for keyword density analysis, the HTML tags are included as well and distort the results. Stripping these tags to obtain plain text is therefore a prerequisite for both kinds of analysis.
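To make the distortion concrete, here is a minimal Python sketch that compares naive counts on a stored HTML string with counts on the text a reader actually sees. The `class="intro"` attribute and both variable names are assumptions made for illustration, and the whitespace-based word split is only one possible definition of a word.

```python
# Stored form, as a rich text editor might save it (illustrative markup).
stored = '<p class="intro"><b>AnQiCMS</b> is an excellent CMS system</p>'

# What the reader actually sees once the tags are gone.
visible = "AnQiCMS is an excellent CMS system"

# Character counts: the tags inflate the stored string.
print(len(stored), len(visible))

# Whitespace-delimited word counts: the space inside the <p ...> tag
# creates an extra "word" that is not part of the content.
print(len(stored.split()), len(visible.split()))
```

Both comparisons come out larger for the stored string, which is why the counts should be taken after the tags have been removed.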

Core Tool: `striptags` Filter

AnQiCMS provides a variety of filters for processing data in templates. Among them, the `striptags` filter is specifically used to convert HTML content to plain text. It removes all HTML, XML, and PHP tags, leaving only the text.

In an AnQiCMS template, applying this filter is simple. Assuming the article content is stored in the `archive.Content` variable (on article detail pages it is usually obtained with the `{% archiveDetail with name="Content" %}` tag), you can convert it to plain text with the `striptags` filter like this:

{{ archive.Content|striptags }}

With this simple operation, no matter how many HTML tags such as `<div>`, `<p>`, `<strong>`, or `<img>` the `archive.Content` value contains, the output is plain text without any tags.
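The filter itself runs inside the template engine, but the idea can be illustrated outside of it. The following Python sketch uses the standard library's `html.parser` to keep only the text nodes of an HTML fragment. The class `TagStripper` and the helper `strip_tags` are hypothetical names for this example, and the sketch is not AnQiCMS's implementation.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; the tags themselves are ignored.
        self.parts.append(data)


def strip_tags(html: str) -> str:
    """Hypothetical helper: return the text of `html` with all tags removed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any text still buffered by the parser
    return "".join(stripper.parts)


stored = "<p><b>AnQiCMS</b> is an excellent CMS system</p>"
print(strip_tags(stored))
```

Using a parser rather than a single regular expression keeps the sketch tolerant of attributes and nested tags. Note that a parser-based approach like this also keeps the text inside elements such as `<script>`, so its output may differ from the template filter's in such cases.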

Extended Applications: Word Count and SEO Analysis

Once we have obtained the plain text content, we can perform more in-depth analysis and processing on this basis.

1. Word count

Counting words in plain text is straightforward. AnQiCMS provides a practical `wordcount` filter that counts the number of words in plain text content.

By combining `striptags` with `wordcount`, we can easily display the plain text word count of an article in the template:

<p>Plain text word count: {{ archive.Content|striptags|wordcount }} words</p>

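If you need to count characters instead of words (often the more useful measure for Chinese text, which is not separated by spaces), you can use the `length` filter:

<p>Plain text character count: {{ archive.Content|striptags|length }} characters</p>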
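The difference between the two counts matters most for text without spaces. The Python sketch below contrasts a whitespace-based word count with a character count. The helper names `word_count` and `char_count` are hypothetical, and treating a word as a whitespace-delimited token is an assumption of this example; whether the `wordcount` filter uses exactly this definition should be checked against the AnQiCMS documentation.

```python
def word_count(text: str) -> int:
    """Count whitespace-delimited tokens (an assumed definition of 'word')."""
    return len(text.split())


def char_count(text: str) -> int:
    """Count Unicode characters, including spaces and punctuation."""
    return len(text)


english = "AnQiCMS is an excellent CMS system"
chinese = "安企CMS是一款优秀的CMS系统"

print(word_count(english), char_count(english))
print(word_count(chinese), char_count(chinese))
```

Under this definition the Chinese sentence counts as a single word because it contains no spaces, while its character count reflects its real length. For such content a character-based count is usually the figure worth displaying.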

2. SEO Analysis Preparation

Plain text matters for SEO analysis because it gives external SEO tools a clean data source, free of markup, for evaluating keyword density, content relevance, and similar indicators.

In addition, when generating a meta description or a site summary, we often need plain text truncated to a fixed length. This is where the `truncatechars` (truncate by character) and `truncatewords` (truncate by word) filters come in. They truncate the plain text and automatically append an ellipsis at the end, so the excerpt reads cleanly:

<meta name="description" content="{{ archive.Content|striptags|truncatechars:150 }}">

This will extract the first 150 characters (excluding HTML tags) from the article content as a description.
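The truncation step can be sketched in Python as follows. The helper `truncate_chars` is a hypothetical name, and two of its choices are assumptions of this example rather than documented AnQiCMS behavior: the ellipsis is three ASCII dots, and it counts toward the length limit.

```python
def truncate_chars(text: str, limit: int, ellipsis: str = "...") -> str:
    """Cut `text` to at most `limit` characters, ellipsis included."""
    if len(text) <= limit:
        # Short enough already: return unchanged, no ellipsis.
        return text
    if limit <= len(ellipsis):
        # No room for an ellipsis; fall back to a hard cut.
        return text[:limit]
    # Reserve room for the ellipsis and drop trailing spaces before it.
    return text[: limit - len(ellipsis)].rstrip() + ellipsis


description = truncate_chars("AnQiCMS is an excellent CMS system", 20)
print(description)
```

Counting the ellipsis inside the limit keeps the result within a fixed budget, which is convenient for fields such as a meta description where the total length is what matters.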

3. Flexible tag removal: `removetags`

Instead of removing all tags, sometimes we want to keep some HTML tags, for example keeping bold `<strong>` tags to emphasize keywords while removing others such as `<script>` and `<img>`. In this case, the `removetags` filter offers more flexibility.

`removetags` lets you specify a list of HTML tags to remove; any tag not in the list is kept. For example, if you only want to remove `<script>` and `<img>` tags from the content, you can use it like this:

{{ archive.Content|removetags:"script,img" }}

This is useful when you need fine-grained control over content output, keeping some formatting while still preparing the text for analysis.
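Selective removal can be illustrated with a short Python sketch based on a regular expression. The helper `remove_tags` is a hypothetical name, and its behavior is an assumption of this example: it removes only the opening and closing markers of the listed tags and leaves any enclosed text in place. It is not AnQiCMS's implementation.

```python
import re


def remove_tags(html: str, tag_names: str) -> str:
    """Remove the markers of the listed tags; keep everything else."""
    names = [name.strip() for name in tag_names.split(",") if name.strip()]
    if not names:
        return html
    alternation = "|".join(re.escape(name) for name in names)
    # Matches <tag>, <tag attr="...">, <tag/> and </tag>, case-insensitively.
    # The \b stops "p" from also matching "<pre>".
    pattern = re.compile(rf"</?(?:{alternation})\b[^>]*>", re.IGNORECASE)
    return pattern.sub("", html)


html = '<p><strong>AnQiCMS</strong> logo: <img src="logo.png"></p>'
print(remove_tags(html, "img"))
```

Because only the markers are removed, the body of a removed `<script>` element would remain in the output as text. The pattern also assumes attribute values do not contain a `>` character. Both points are worth checking against the real filter's output before relying on it.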

Operation steps and precautions

Determine the target content: Identify which HTML content variable you need to convert. This is usually the `archive.Content` field on the article detail page or the `Description` field on the article list page, or it may be a summary extracted from `Content`.
Edit the template file: Following the AnQiCMS template structure conventions, find the corresponding template file. For example, the article detail page may be `{model table}/detail.html`, and the list page may be `{model table}/list.html`.
Apply the filter: After the variable whose plain text you want to output, add a pipe symbol `|` followed by the corresponding filter, for example `|striptags`, `|wordcount`, or `|truncatechars:N`.
Test and verify: After making changes, test the page and check that the plain text output meets expectations. You can inspect the final HTML output in the page source, or verify it directly on the frontend.

In this way, AnQiCMS users can easily extract clean plain text from complex HTML content, whether for internal statistics or to provide standardized content to external SEO tools.

Frequently Asked Questions (FAQ)

Q1: What are the main differences between the use cases of the `striptags` and `removetags` filters?

A1: The `striptags` filter removes all HTML, XML, and PHP tags from the string. The `removetags` filter removes only the tags you list and keeps all others.
