When running a website, we often need to analyze the content we publish, and word counts and SEO analysis are two of the most common tasks. However, the rich text editors in content management systems (CMS) add many HTML tags to the text. These tags produce rich visual formatting when rendered on the front end, but they get in the way when counting words or when plain text is needed for SEO analysis.

AnQiCMS (安企CMS), an efficient and flexible content management system, takes these needs into account. It includes a template engine with a rich set of filters that make it easy to convert HTML content to plain text, enabling more accurate word counts and more effective SEO analysis.

Why do you need to convert HTML content to plain text?

When we publish articles and product details through the rich text editor in the AnQiCMS backend, the content is stored as a string with HTML tags. For example, if you enter "AnQiCMS is an excellent CMS system" with the product name in bold, it may be stored as <b>AnQiCMS</b> is an excellent CMS system.

If we count the words of this article directly on the string that includes HTML tags, the result will be inaccurate. Similarly, if tagged content is fed directly to SEO tools for keyword density analysis, the HTML tags are included in the calculation and can distort the results. Stripping these tags to obtain plain text is therefore a prerequisite for both kinds of analysis.

Core Tool: the striptags Filter

AnQiCMS provides various filters for processing data in templates. Among them, the striptags filter is designed specifically to convert HTML content to plain text. It removes all HTML, XML, and PHP tags from the content, leaving only the text.

In an AnQiCMS template, applying this filter is simple. Suppose your article content is stored in the archive.Content variable (on the article detail page it is usually obtained with the {% archiveDetail with name="Content" %} tag). To convert it to plain text, just use the striptags filter like this:

{{ archive.Content|striptags }}

With this simple operation, no matter how many HTML tags such as <div>, <p>, <strong>, or <img> your archive.Content contains, the output will always be plain text without any tags.
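If the content is fetched with the archiveDetail tag rather than read from archive.Content, the same filter can be applied to the variable the tag assigns. The following is a minimal sketch, assuming the tag accepts a variable name before with; articleContent is a hypothetical name chosen for illustration.

```django
{# articleContent is a hypothetical variable name, not part of AnQiCMS #}
{% archiveDetail articleContent with name="Content" %}

{# Output the article body with all tags stripped #}
{{ articleContent|striptags }}
```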

Extended Application: Word Count and SEO Analysis

Once we have obtained the plain text content, we can conduct more in-depth analysis and processing on this basis.

1. Word count

With plain text in hand, counting words becomes straightforward. AnQiCMS provides a wordcount filter that counts the number of words (or Chinese words) in plain text content.

By combining striptags with wordcount, we can easily display the plain-text word count in the template:

<p>Plain-text word count: {{ archive.Content|striptags|wordcount }} words</p>

If you need to count characters instead of words, you can use the length filter:

<p>Plain-text character count: {{ archive.Content|striptags|length }} characters</p>
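To see how much the markup inflates a count, the raw and stripped lengths can be printed side by side while debugging. This sketch uses only the filters already mentioned and assumes archive.Content is available in the template.

```django
{# Temporary debugging output: raw length versus plain-text length #}
<p>Characters including tags: {{ archive.Content|length }}</p>
<p>Characters in plain text: {{ archive.Content|striptags|length }}</p>
```

The gap between the two numbers is the amount of markup that would otherwise be counted as content.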

2. SEO analysis preparation

When generating a meta description or a page summary, we often need to truncate plain text to a fixed length. This is where the truncatechars (truncate by characters) and truncatewords (truncate by words) filters come in. They append an ellipsis to the truncated plain text, which keeps the excerpt readable:

<meta name="description" content="{{ archive.Content|striptags|truncatechars:150 }}">

This produces a description of up to 150 characters taken from the start of the article's text, with HTML tags excluded.
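truncatewords is used the same way, with the limit counted in words rather than characters. In this sketch the limit of 30 is an arbitrary value chosen for illustration.

```django
{# Summary limited by word count; 30 is an illustrative value #}
<p>{{ archive.Content|striptags|truncatewords:30 }}</p>
```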

3. Flexible tag removal: removetags

Sometimes you may not want to remove every tag, only specific ones. For example, you may want to keep <strong> tags to emphasize keywords while removing tags such as <script> or <img>. This is where the removetags filter comes in.

removetags lets you specify a list of HTML tags to remove; tags not in the list are retained. For example, to remove only the <script> and <img> tags from the content, use it like this:

{{ archive.Content|removetags:"script,img" }}

This is useful when you need fine-grained control over the output, keeping some formatting while still preparing the content for plain-text analysis.
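Because removetags leaves the remaining tags in place, its result is still HTML. The following is a minimal sketch, assuming the template engine escapes output by default, so that the safe filter is needed for the retained tags to render as markup rather than as literal text.

```django
{# For display: remove only <script> and <img>; tags such as <strong> are kept #}
{{ archive.Content|removetags:"script,img"|safe }}

{# For counting: strip every tag instead #}
{{ archive.Content|striptags|wordcount }}
```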

Operation Steps and Precautions

  1. Determine the target content: Identify the variable that holds the HTML content you want to convert. This is usually the archive.Content field on the article detail page, the Description field on the article list page, or any place where you need to generate a summary from Content.
  2. Edit the template file: Find the corresponding template file according to the AnQiCMS template structure conventions. For example, the article detail page may be {model table}/detail.html, and the list page may be {model table}/list.html.
  3. Apply the filter: On the variable that should output plain text, add the pipe character | followed by the appropriate filter, for example |striptags, |wordcount, or |truncatechars:N.
  4. Test and verify: After modifying the template, make sure the page still displays normally and check that the plain-text output meets expectations. You can inspect the final HTML in the page source, or verify it directly in the front-end display.
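The steps above can be sketched as a single fragment of an article detail template. This is an illustration only: it assumes archive.Content is available in that template, and the surrounding markup is hypothetical.

```django
{# Hypothetical fragment of an article detail template #}
<head>
  {# Plain-text excerpt for the meta description #}
  <meta name="description" content="{{ archive.Content|striptags|truncatechars:150 }}">
</head>
<body>
  {# Plain-text statistics #}
  <p>Words: {{ archive.Content|striptags|wordcount }}</p>
  <p>Characters: {{ archive.Content|striptags|length }}</p>
</body>
```

After saving the template, viewing the page source shows whether the description and the counts are free of leftover tags.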

With the methods above, AnQiCMS users can easily extract clean plain text from complex HTML content, whether for internal statistics or to provide standardized content to external SEO tools.


Common Questions (FAQ)

Q1: What are the main differences between the application scenarios of the striptags and removetags filters?

A1: The striptags filter removes all HTML, XML, and PHP tags from the string