When running a website, we often need to analyze published content in various ways, and word counts and SEO analysis are among the most important of these. However, the rich text editor in a content management system (CMS) usually wraps text in many HTML tags. These tags produce rich visual effects when the page is rendered on the front end, but they get in the way when you need a word count or plain text for SEO analysis.
AnQiCMS is an efficient and flexible content management system designed with these needs in mind. Its template engine comes with a rich set of filters that make it easy to convert HTML content into plain text, enabling more accurate word counts and more effective SEO analysis.
Why is it necessary to convert HTML content to plain text?
When we publish articles or product details through the rich text editor in the AnQiCMS backend, the content is stored as a string containing HTML tags. For example, if you enter "AnQiCMS is an excellent CMS system" with the product name in bold, it may be stored in the database as <b>AnQiCMS</b> is an excellent CMS system.
When we need to count the words in such an article, counting directly on the string that contains HTML tags gives an inaccurate result. Similarly, if tag-laden content is fed directly to SEO tools for keyword density analysis, the HTML tags are analyzed along with the text and distort the results. Stripping the tags to obtain pure text is therefore a prerequisite for these analyses.
Core Tool: the striptags Filter
AnQiCMS provides a variety of filters for processing data in templates. Among them, the striptags filter is designed specifically to convert HTML content to plain text: it identifies and removes all HTML, XML, and PHP tags, leaving only the text.
Applying this filter in an AnQiCMS template is simple. Suppose the article content is in the archive.Content variable (on article detail pages it is usually obtained with the {% archiveDetail with name="Content" %} tag). To convert it to plain text, apply the striptags filter like this:
{{ archive.Content|striptags }}
With this one filter, no matter how many <div>, <p>, <strong>, or <img> tags archive.Content contains, the output is plain text with no tags.
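To make the behavior concrete, here is a minimal Go sketch of what a striptags-style filter does, assuming a simple regular expression that treats anything between "<" and the next ">" as a tag. The function name stripTags is hypothetical; this is an illustration of the idea, not AnQiCMS's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagPattern matches anything that looks like a markup tag: "<", then any
// characters except ">" (non-greedy), then ">". This covers HTML, XML and
// "<?php ... ?>" style tags as long as they contain no literal ">".
var tagPattern = regexp.MustCompile(`<[^>]*?>`)

// stripTags (hypothetical) removes every tag-like sequence and trims the
// surrounding whitespace, leaving only the text.
func stripTags(html string) string {
	return strings.TrimSpace(tagPattern.ReplaceAllString(html, ""))
}

func main() {
	content := "<p><b>AnQiCMS</b> is an excellent CMS system</p>"
	fmt.Println(stripTags(content)) // AnQiCMS is an excellent CMS system
}
```

This sketch does not decode entities such as &nbsp; or &amp;, and a literal ">" inside an attribute value (for example title="a>b") would end the match early; an HTML parser handles such cases more robustly than a regular expression.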
Extended Applications: Word Count and SEO Analysis
Once we have obtained the plain text content, we can perform more in-depth analysis and processing on this basis.
1. Word count
Once the content is plain text, counting words is straightforward. AnQiCMS provides a practical wordcount filter that counts the number of words (or Chinese words) in plain text.
By combining striptags with wordcount, we can easily display the number of words in the article's plain text in the template:
<p>Plain-text word count: {{ archive.Content|striptags|wordcount }} words</p>
If you need to count characters instead of words, use the length filter:
<p>Plain-text character count: {{ archive.Content|striptags|length }} characters</p>
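A character count depends on whether it is measured in bytes or in Unicode characters, which matters for Chinese content because each Chinese character takes three bytes in UTF-8. The Go sketch below shows the difference. The source does not say which unit AnQiCMS's length filter uses, so treat the character-based charCount helper (a hypothetical name) as an assumption.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// charCount (hypothetical) counts Unicode characters (runes), not bytes.
func charCount(plain string) int {
	return utf8.RuneCountInString(plain)
}

func main() {
	plain := "安企CMS 是一款优秀的CMS系统"
	fmt.Println(len(plain))       // 37: len counts UTF-8 bytes
	fmt.Println(charCount(plain)) // 17: one per character, the space included
}
```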
2. SEO Analysis Preparation
Plain text matters for SEO analysis because it gives external SEO analysis tools a clean data source, free of markup noise, for evaluating keyword density, content relevance, and other metrics.
In addition, a Meta Description or website summary usually needs plain text cut to a fixed length. This is where the truncatechars (truncate by characters) and truncatewords (truncate by words) filters come in: they shorten the plain text and automatically append an ellipsis at the end, so the excerpt reads cleanly:
<meta name="description" content="{{ archive.Content|striptags|truncatechars:150 }}">
This will extract the first 150 characters (excluding HTML tags) from the article content as a description.
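A truncatechars-style filter can be sketched in Go as follows. It works on Unicode characters rather than bytes, so Chinese text is never cut in the middle of a character, and it assumes the Django-style convention that the "..." counts toward the limit. AnQiCMS may handle the ellipsis differently; truncateChars is a hypothetical name.

```go
package main

import "fmt"

// truncateChars (hypothetical) shortens s to at most limit characters.
// When text is cut, the last three positions are used for "..."
// (assumed Django-style convention: the ellipsis counts toward the limit).
func truncateChars(s string, limit int) string {
	if limit <= 0 {
		return ""
	}
	runes := []rune(s) // operate on characters, not bytes
	if len(runes) <= limit {
		return s // short enough: returned unchanged, no ellipsis
	}
	if limit < 3 {
		return string(runes[:limit]) // no room for the ellipsis
	}
	return string(runes[:limit-3]) + "..."
}

func main() {
	fmt.Println(truncateChars("AnQiCMS is an excellent CMS system", 15)) // AnQiCMS is a...
	fmt.Println(truncateChars("安企CMS系统", 5))                          // 安企...
}
```

Under this convention, truncatechars:150 yields at most 150 characters in total, ellipsis included.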
3. Flexible tag removal: removetags
Instead of removing every tag, we may sometimes want to keep part of the HTML, for example keeping bold <strong> tags to emphasize keywords while removing specific tags such as <script> and <img>. In that case the removetags filter offers more flexibility.
removetags lets you specify which HTML tags to remove; any tag not in the list is kept. For example, to remove only the <script> and <img> tags from the content:
{{ archive.Content|removetags:"script,img" }}
This is useful when you need fine-grained control over the output, keeping some formatting while still preparing the content for plain-text analysis.
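The difference between removing selected tags and stripping everything can be sketched in Go. This hypothetical removeTags helper assumes the filter deletes only the markup (opening, closing, and self-closing forms) of each listed tag name and keeps the text between them; it is an illustration, not AnQiCMS's implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// removeTags (hypothetical) deletes the opening, closing and self-closing
// forms of each tag named in the comma-separated list, leaving all other
// tags and all text in place.
func removeTags(html, list string) string {
	for _, name := range strings.Split(list, ",") {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		// (?i) makes the match case-insensitive; \b stops "b" from matching "<br>".
		re := regexp.MustCompile(`(?i)</?` + regexp.QuoteMeta(name) + `\b[^>]*>`)
		html = re.ReplaceAllString(html, "")
	}
	return html
}

func main() {
	content := `<p><strong>SEO</strong> tips <img src="a.png"><script>track()</script></p>`
	fmt.Println(removeTags(content, "script,img"))
	// <p><strong>SEO</strong> tips track()</p>
}
```

In this sketch the code between <script> and </script> survives as plain text once the tags are gone, so dropping script content entirely would need an extra step. Check how AnQiCMS's removetags treats tag contents before relying on either behavior.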
Operation steps and precautions
- Determine the target content: Identify which HTML content variable needs converting. This is usually the archive.Content field on the article detail page, the Description field on the article list page, or an abstract generated from Content.
- Edit the template file: Following the AnQiCMS template directory conventions, find the corresponding template file. For example, the article detail page may be {model table}/detail.html and the list page {model table}/list.html.
- Apply the filter: After the variable that should output plain text, add a pipe symbol | followed by the filter, for example |striptags, |wordcount, or |truncatechars:N.
- Test and verify: After the change, check the page to make sure the plain text output is what you expect. You can inspect the final HTML in the page source or check the rendered front end directly.
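For the verification step, a small check over the page source can catch leftover markup in the meta description. This Go sketch assumes the description is rendered as in the meta description template line earlier in this article (name before content, double-quoted value) and that the template engine HTML-escapes the output, so leftover tags would show up as &lt;p&gt;. checkDescription is a hypothetical helper, not part of AnQiCMS.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"unicode/utf8"
)

// Assumption: attribute order and quoting match the template line shown earlier.
var metaDescPattern = regexp.MustCompile(`<meta\s+name="description"\s+content="([^"]*)"`)

// leftoverTag matches something that still looks like a tag, either raw
// ("<p") or HTML-escaped ("&lt;p"), which suggests striptags was not applied.
var leftoverTag = regexp.MustCompile(`(<|&lt;)/?[a-zA-Z]`)

// checkDescription (hypothetical) returns an error if the description is
// missing, still contains markup, or exceeds maxChars characters.
func checkDescription(pageSource string, maxChars int) error {
	m := metaDescPattern.FindStringSubmatch(pageSource)
	if m == nil {
		return errors.New("meta description not found")
	}
	desc := m[1]
	if leftoverTag.MatchString(desc) {
		return fmt.Errorf("description still contains markup: %q", desc)
	}
	// Entities such as "&amp;" are counted as several characters here.
	if n := utf8.RuneCountInString(desc); n > maxChars {
		return fmt.Errorf("description has %d characters, limit is %d", n, maxChars)
	}
	return nil
}

func main() {
	good := `<head><meta name="description" content="AnQiCMS is an excellent CMS system"></head>`
	bad := `<head><meta name="description" content="&lt;p&gt;AnQiCMS&lt;/p&gt;"></head>`
	fmt.Println(checkDescription(good, 150))        // <nil>
	fmt.Println(checkDescription(bad, 150) != nil) // true
}
```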
With this approach, AnQiCMS users can easily extract clean plain text from complex HTML content, whether for internal statistics or for supplying standardized content to external SEO tools.
Frequently Asked Questions (FAQ)
Q1: What is the main difference between the striptags and removetags filters, and when should each be used?
A1: The striptags filter removes all HTML, XML, and PHP tags from a string.