In website content management, a common requirement is to extract plain text from formatted dynamic content. The reasons vary: generating concise meta descriptions for search engines, displaying unformatted summaries on list pages, or simply obtaining clean plain text for data analysis. AnQi CMS accounts for these scenarios and handles them through its template engine and built-in filters.
How dynamic content coexists with HTML tags
In the AnQi CMS backend, whether we are editing articles, product details, category introductions, or single-page content, we usually work in a rich text editor. The editor lets us insert images and links, adjust font styles (such as bold and italic), create lists, and more. Once the content is saved to the database, all of this formatting is stored as HTML tags.
When a front-end template outputs this content through variables such as archive.Content, category.Description, or page.Content, the variable is usually paired with the |safe filter to retain the original formatting, for example {{ archive.Content|safe }}. This instructs the template engine to output the content as safe HTML, and the browser parses and renders the tags in it. In certain display areas or data output scenarios, however, these HTML tags are redundant and may even break the page layout or pollute the data.
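To make the effect of automatic escaping concrete, here is a minimal Python sketch. It only illustrates the general idea and is not AnQi CMS's own code: Python's html.escape stands in for the template engine's auto-escaping, and the sample string is made up for the illustration.

```python
from html import escape

# Hypothetical rich-text content, as an editor might store it.
content = "<p>Hello <strong>world</strong></p>"

# Roughly what an auto-escaping template engine emits without |safe:
# markup characters become entities, so the browser shows the tags as text.
escaped = escape(content)
print(escaped)

# With |safe the string is emitted unchanged and the browser renders the tags.
print(content)
```

Without |safe the visitor would see the literal tags on the page; with it, the tags take effect as formatting.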
The core tools for extracting plain text: AnQi CMS built-in filters
To remove tags from dynamic HTML content, the AnQi CMS template engine provides several built-in filters, the most important of which are striptags and removetags.
striptags: Completely remove all HTML tags
When your goal is completely plain text with no HTML formatting retained, the striptags filter is the first choice. It scans the input string and removes every HTML tag (both opening and closing tags), leaving only the text between the tags.
Usage example: Suppose your article content (archive.Content) contains:
<p>This is <strong>bold</strong> text, and here is <a href="#">a link</a>.</p>
If you use the following in the template:
<p>Article plain text: {{ archive.Content|striptags }}</p>
The page output will be:
<p>Article plain text: This is bold text, and here is a link.</p>
As you can see, all <p>, <strong>, and <a> tags from the content have been removed.
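The behavior described above can be sketched in a few lines of Python. This is an illustration of the idea only, not how AnQi CMS implements striptags internally; the helper name strip_all_tags and the sample string are assumptions made for the example.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Keeps the text nodes and ignores every opening and closing tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags; tags themselves never arrive here.
        self.parts.append(data)


def strip_all_tags(html: str) -> str:
    """Hypothetical helper: return only the text content of an HTML fragment."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts)


sample = '<p>This is <strong>bold</strong> text, and here is <a href="#">a link</a>.</p>'
print(strip_all_tags(sample))  # This is bold text, and here is a link.
```

Using a parser rather than a regular expression is a design choice for this sketch: it copes with attributes and nested tags without extra rules.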
Application scenarios:
- Generate plain text summaries: On the article list page, you may want to display only a plain text excerpt of the first few words of each article.
- SEO meta description: In the page's <head> section, the value of meta name="description" should be plain text, so that search engines do not pick up HTML tags that affect how the snippet is displayed.
removetags: Precisely remove specified HTML tags
Unlike the one-size-fits-all approach of striptags, the removetags filter provides finer-grained control. It lets you specify one or more HTML tags to remove while retaining every tag you have not specified.
Usage example: Continuing with the example above, suppose your content is:
<p>This is <strong>bold</strong> text, and here is <a href="#">a link</a>.</p>
If you only want to remove the link tag <a> and the paragraph tag <p> but keep the bold tag <strong>, you can use the following in the template:
<p>Partially retained HTML: {{ archive.Content|removetags:"p,a"|safe }}</p>
Note that |safe is used here because the output of removetags may still contain HTML tags that the browser needs to parse.
The page output will be:
<p>Partially retained HTML: This is <strong>bold</strong> text, and here is a link.</p>
Here the <p> and <a> tags from the content have been removed, but the <strong> tag has been retained. The outer <p> comes from the template itself.
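As a rough picture of selective removal, the following Python sketch deletes only the named tags and leaves everything else in place. It is a simplified illustration under stated assumptions, not the filter's real implementation: remove_tags is a hypothetical helper, it takes the same comma-separated list as the template example, and the regular expression does not try to handle malformed HTML.

```python
import re


def remove_tags(html: str, tags: str) -> str:
    """Hypothetical helper: drop the listed tags (e.g. "p,a"), keep their inner text."""
    names = [re.escape(t.strip()) for t in tags.split(",") if t.strip()]
    if not names:
        return html
    # Match <name ...> or </name>; \b keeps "p" from also matching <pre>.
    pattern = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(names), re.IGNORECASE)
    return pattern.sub("", html)


sample = '<p>This is <strong>bold</strong> text, and here is <a href="#">a link</a>.</p>'
print(remove_tags(sample, "p,a"))
# This is <strong>bold</strong> text, and here is a link.
```

Because the result can still contain markup such as <strong>, it has to be emitted without escaping, which is the role |safe plays in the template.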
Application scenarios:
- Local format adjustment: In some display areas, you may want to retain certain core text formatting (such as bold) but remove unrelated tags or tags that may cause layout issues (such as the image tag <img> or the video tag <video>).
- Data cleaning: When preparing data for API output, it may be necessary to remove specific HTML tags to meet the data format requirements.
Practical application: Making content fit display requirements
Combining the filters above, we can handle a variety of plain text extraction needs in AnQi CMS:
Display a plain text summary in the article list or product card: On the homepage or a category list page, usually only the article title and a concise description are displayed, which keeps the page clean and consistent.
{% archiveList archives with type="list" limit="10" %}
  {% for item in archives %}
  <div>
    <h2><a href="{{ item.Link }}">{{ item.Title|striptags }}</a></h2>
    {# Remove all HTML tags first, then truncate to the first 100 characters #}
    <p>{{ item.Description|striptags|truncatechars:100 }}</p>
  </div>
  {% endfor %}
{% endarchiveList %}
Optimize the SEO meta description: If the description obtained through the tdk tag originates from a rich text editor, it is best to run it through striptags again so that the search engine receives plain text.
<meta name="description" content="{% tdk seoDescription with name="Description" %}{{ seoDescription|striptags|truncatechars:150 }}">
Even though seoDescription is usually already plain text, the extra filter guards against stray markup, and combining it with truncatechars controls the length.
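The order of the two filters matters: stripping first, then truncating. The Python sketch below shows why. It is an illustration only; strip_tags and truncate_chars are hypothetical stand-ins for the template filters, and the assumption that the "..." suffix counts toward the limit may not match the exact behavior of truncatechars in AnQi CMS.

```python
import re


def strip_tags(html: str) -> str:
    """Hypothetical stand-in for striptags: drop anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", html)


def truncate_chars(text: str, limit: int, ellipsis: str = "...") -> str:
    """Hypothetical stand-in for truncatechars; the ellipsis counts toward limit."""
    if len(text) <= limit:
        return text
    if limit <= len(ellipsis):
        return text[:limit]
    return text[: limit - len(ellipsis)] + ellipsis


content = "<p>AnQi CMS <strong>filters</strong> make plain-text summaries easy.</p>"

# Strip first, then truncate: the whole budget is spent on visible text.
good = truncate_chars(strip_tags(content), 20)

# Truncate first, then strip: a tag is cut in half and leaks into the output.
bad = strip_tags(truncate_chars(content, 20))

print(good)
print(bad)
```

Truncating first spends part of the character budget on markup and can leave a half-open tag behind, which is why the template examples apply striptags before truncatechars.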
Safety and best practices
- The |safe filter and tag removal: The |safe filter prevents HTML content from being automatically escaped, so that the browser can parse and display it. After striptags has removed all HTML tags, the content is plain text and in principle no longer needs |safe. With removetags, however, some HTML tags remain, so |safe may still be needed for the retained tags to be parsed correctly.
- Flexible filter selection: Choose according to how precisely you need to control the final text format: striptags for comprehensive cleaning, or removetags for selective retention.
- Combine with truncation: After HTML tags are removed, the content may still be very long. Combining the truncatechars (truncate by characters) or truncatewords (truncate by words) filter further controls the length of the displayed text.
- Special requirement: retaining HTML structure while truncating. If you need to truncate content while retaining its HTML structure (for example, so that bold text remains bold after truncation), AnQi CMS also provides `truncatechars_