When managing content in AnQiCMS, we often need to extract plain text from document content that carries rich formatting (bold, italics, images, links, and so on). This may sound contradictory: a content management system exists to present content in varied ways, so why strip the formatting? In many scenarios, though, plain text is exactly what is needed: generating concise abstracts for articles, filling the meta description for search engines, showing uniform, clean previews on list pages, or importing content into platforms that do not support HTML.
So how can we do this efficiently in the flexible template system of AnQiCMS? AnQiCMS provides filters for the job, and striptags and removetags are the two that solve this problem.
Understanding the Core Tool: the striptags Filter
The striptags filter is a very practical feature of AnQiCMS templates, and it does what its name suggests: it strips tags. When you want to remove all HTML or XML tags from a piece of content in one pass and keep only the text inside them, striptags is the tool to use.
Usage is straightforward: append the pipe symbol | followed by striptags to the variable you want to process. For example, if a variable named archive.Content stores the article content as HTML, you can get its plain text like this:
{{ archive.Content | striptags }}
This one line scans the whole of archive.Content, removes every HTML tag such as <div>, <p>, <a> and <img>, and outputs only the visible text.
Handling Markdown Content
Note that the Markdown editor may be enabled in the AnQiCMS backend for editing documents. In that case, archive.Content may hold Markdown text rather than HTML. Applying striptags directly to Markdown text may give unsatisfactory results, because the filter does not recognize Markdown syntax and cannot convert it to plain text.
Here we first use the render filter to render the Markdown text to HTML, and then use striptags to remove the HTML tags. The render filter converts Markdown syntax into HTML that browsers can recognize. The complete pipeline looks like this:
{# Assume archive.Content stores Markdown-formatted content #}
{{ archive.Content | render | striptags }}
The render filter converts the Markdown content to HTML, and striptags then removes those HTML tags, so the final output is plain text. Note that render outputs HTML: if you display that output directly on a page, you need to append the safe filter after render so that the tags are rendered as HTML instead of being escaped and shown as literal text. When render is used together with striptags, safe is not necessary, because striptags removes all the HTML and the final result is plain text.
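The two cases described above can be put side by side as a minimal sketch. It assumes archive.Content holds Markdown; the wrapping <div> and <p> elements are only illustrative and are not required by AnQiCMS.

```
{# Rich display: keep the rendered HTML, so append safe #}
<div>{{ archive.Content | render | safe }}</div>

{# Plain text: striptags removes the HTML, so safe is not needed #}
<p>{{ archive.Content | render | striptags }}</p>
```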
Flexible Control: the removetags Filter
Sometimes the requirement is more specific: we do not want to remove all HTML tags, only certain ones, while keeping the rest. For example, we may want to keep <a> tags so that users can still click links, but remove all <img> image tags or <p> paragraph tags. This is where the removetags filter is useful.
The removetags filter lets you specify one or more HTML tags to remove. Just pass a comma-separated list of tag names after the filter. For example, to remove the <i> and <span> tags but keep everything else, you can write:
{# Remove the <i> and <span> tags and keep all other tags #}
{{ "<strong><i>Hello!</i><span>AnQiCMS</span></strong>" | removetags:"i,span" }}
This code outputs <strong>Hello!AnQiCMS</strong>. As you can see, the <i> and <span> tags have been removed, while the <strong> tag is kept. This fine-grained control provides a lot of flexibility in specific content display scenarios.
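Applied to real article content, the same filter might look like the sketch below. It assumes archive.Content already holds HTML, and it appends safe on the assumption that the remaining tags (such as links) should be rendered as HTML rather than escaped, following the same reasoning given for render. Whether safe is needed in your template is worth checking against the actual output.

```
{# Remove image and paragraph tags, keep links and other markup #}
{{ archive.Content | removetags:"img,p" | safe }}
```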
Combining with Truncation to Generate a Plain Text Summary
When generating article summaries or abstracts, we need not only plain text but also control over its length. AnQiCMS provides the truncatechars and truncatewords filters, which truncate a string and automatically append an ellipsis (...). Used together with striptags, they make it easy to generate a plain text summary of the required length:
{# Get the plain text content and take the first 100 characters as the summary #}
<p>{{ archive.Content | render | striptags | truncatechars:100 }}</p>
{# Or truncate by word count #}
<p>{{ archive.Content | render | striptags | truncatewords:30 }}</p>
Note that truncatechars truncates by character count (a Chinese character counts as one character), while truncatewords truncates by word count. Choose the method that fits your needs and the characteristics of your content.
Practical Suggestions
Removing HTML tags and displaying only plain text in AnQiCMS templates comes down to two filters, striptags and removetags. In practice, you need to:
- Confirm the source of the content: Determine whether archive.Content and similar variables hold pure HTML or Markdown. If it is Markdown, be sure to apply the render filter first.
- Select an appropriate filter: Depending on the need, either remove all tags (striptags) or remove only some tags (removetags).
- Consider the length of the summary: If the output is used as a summary, combine it with truncatechars or truncatewords to keep it concise.
- SEO optimization: Use striptags inside the <meta name="description" content="..."> tag to ensure the output is plain text, which is friendly to search engines.
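As a sketch of the SEO suggestion above, the line below fills the meta description with a truncated plain text version of the article. It assumes archive.Content is available in the template that renders the page head and that the content is Markdown (hence render); the 150-character limit is an illustrative choice, not an AnQiCMS requirement.

```
{# Plain text, length-limited description for search engines #}
<meta name="description" content="{{ archive.Content | render | striptags | truncatechars:150 }}">
```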
These filters let template designers control how content is presented, from full rich text display to concise plain text output, and so build more expressive and functional websites.
Common Questions (FAQ)
1. What are the main differences between the striptags and removetags filters?
The striptags filter removes every HTML and XML tag it detects in the content, without exception, and outputs plain text. The removetags filter offers finer control: you specify one or more HTML tags to remove (such as <img> or <p>), and all other tags remain in the content. Which filter to use depends on whether you want to remove all formatting or selectively keep some of it.
2. How do I keep plain text content to an appropriate length when generating article summaries?
After obtaining the plain text content, combine it with the truncatechars or truncatewords filter to control the length. First use render (if the content is Markdown) and striptags to remove the HTML tags, then apply the truncation filter. For example, {{ archive.Content | render | striptags | truncatechars:150 }} converts the content to plain text, truncates it to 150 characters, and appends an ellipsis. truncatewords truncates by word count instead.
3. Does outputting plain text with the striptags filter affect the website's search engine optimization (SEO)?
It depends on where you use the plain text. In some cases plain text benefits SEO. For example, the website's <meta name="description"> tag should contain only plain text, because search engines usually crawl and display descriptions as plain text. When article summaries are shown on list pages, plain text also helps search engines understand the content more quickly. Note, however, that not all main content should be converted to plain text, because search engines also parse the HTML structure to understand the page layout and content focus. The best practice is to use plain text where a concise, unformatted display is needed, and to keep rich HTML formatting in the main content areas.