In website content management, a common requirement is to extract plain text from dynamically generated, richly formatted content. The reasons vary: generating concise meta descriptions for search engines, displaying unformatted summaries on list pages, or simply obtaining clean plain text for data analysis. AnQi CMS (安企CMS), a flexible and efficient content management system, addresses these scenarios through its template engine and built-in filters.
Coexistence of dynamic content and HTML tags
In the AnQi CMS backend, whether we are editing articles, product details, category introductions, or single-page content, we usually work in a rich text editor. These editors let us insert images and links, adjust font styles (such as bold and italic), and create lists. Once the content is saved to the database, all of this formatting is stored as HTML tags.
When we output this content on the front end through template variables such as archive.Content, category.Description, or page.Content, we usually pair them with the |safe filter to preserve the original visual effect, for example {{ archive.Content|safe }}. This tells the template engine to output the content as trusted HTML without escaping it, so the browser parses and renders the tags. In certain display areas or data output scenarios, however, these HTML tags are redundant and may even disrupt the layout or the purity of the data.
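The difference between escaped output and |safe output can be illustrated outside the CMS. The following is a minimal Python sketch, not AnQi CMS code: it uses the standard library's html.escape to approximate what an auto-escaping template engine does, and the sample string is hypothetical.

```python
import html

# Hypothetical content, as a rich text editor might store it.
content = "<p>Hello <strong>world</strong></p>"

# Roughly what an auto-escaping engine emits for {{ content }}:
# the tags become visible text instead of markup.
escaped = html.escape(content)

# Roughly what {{ content|safe }} emits: the string is passed through unchanged.
raw = content

print(escaped)  # &lt;p&gt;Hello &lt;strong&gt;world&lt;/strong&gt;&lt;/p&gt;
print(raw)      # <p>Hello <strong>world</strong></p>
```

Because |safe disables this protection, it is best reserved for content you trust.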
Extract the core text: AnQi CMS built-in filters
To remove tags from dynamic HTML content, the AnQi CMS template engine ships with several practical filters, the most important of which are striptags and removetags.
striptags: Completely remove all HTML tags
When your goal is plain text with no HTML formatting at all, striptags is the filter to reach for. It scans the input string and removes every HTML tag (both opening and closing tags), leaving only the text between them.
Example usage: suppose your article content (archive.Content) contains <p>This is <strong>bold</strong> text, with <a href="#">a link</a>.</p>
If you use it in the template:
<p>Plain text content: {{ archive.Content|striptags }}</p>
The output on the page will be:
<p>Plain text content: This is bold text, with a link.</p>
As you can see, all of the <p>, <strong>, and <a> tags inside the content have been removed.
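To make the behavior concrete, here is a rough Python approximation of what a striptags-style filter does. It is a sketch under a simplifying assumption (anything between < and > is treated as a tag), not the actual AnQi CMS implementation, and strip_tags is a hypothetical helper name.

```python
import re

# Simplifying assumption: a "tag" is anything from "<" to the next ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Remove everything that looks like an HTML tag, then trim whitespace."""
    return TAG_PATTERN.sub("", text).strip()


sample = '<p>This is <strong>bold</strong> text, with <a href="#">a link</a>.</p>'
print(strip_tags(sample))  # This is bold text, with a link.
```

A pattern this simple does not decode HTML entities and can be confused by a ">" inside an attribute value, so treat it as an illustration of the idea only.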
Applicable scenarios:
- Generate a plain text summary: on an article list page, you may want to display only a plain text excerpt from the beginning of the article.
- SEO meta description: in the <head> section of the page, the value of the meta name="description" tag should be plain text, so that search engines do not pick up HTML tags that could affect how the snippet is displayed.
removetags: Precisely remove the specified HTML tag
Unlike the "one-size-fits-all" approach of striptags, removetags removes only the tags you name and leaves the rest of the HTML in place.
Example usage: continuing with the example above, suppose your content is <p>This is <strong>bold</strong> text, with <a href="#">a link</a>.</p>
If you only want to remove the link tag <a> and the paragraph tag <p> but keep the bold tag <strong>, you can write this in the template:
<p>Partially retained HTML: {{ archive.Content|removetags:"p,a"|safe }}</p>
Note that |safe is used here because the output of removetags may still contain HTML tags, which the browser needs to parse.
The output on the page will be:
<p>Partially retained HTML: This is <strong>bold</strong> text, with a link.</p>
Here the <p> and <a> tags in the content have been removed, while the <strong> tag has been retained.
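The selective behavior can be sketched in Python as well. The helper below, remove_tags, is hypothetical and only approximates a removetags-style filter: it takes a comma-separated tag list, as in the template example, and deletes the opening and closing forms of just those tags.

```python
import re


def remove_tags(text: str, tags: str) -> str:
    """Remove opening and closing forms of the comma-separated tags only."""
    for tag in (t.strip() for t in tags.split(",")):
        if not tag:
            continue
        # \b keeps "p" from also matching longer tag names such as <pre>.
        pattern = re.compile(rf"</?{re.escape(tag)}\b[^>]*>", re.IGNORECASE)
        text = pattern.sub("", text)
    return text


sample = '<p>This is <strong>bold</strong> text, with <a href="#">a link</a>.</p>'
print(remove_tags(sample, "p,a"))  # This is <strong>bold</strong> text, with a link.
```

Only the tags themselves are removed; the text they wrapped stays in place, which is why the link text still appears in the result.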
Applicable scenarios:
- Partial formatting adjustment: in some display areas, you may want to keep certain core text formatting (such as bold) while removing irrelevant or potentially layout-breaking tags (such as the image tag <img> or the video tag <video>).
- Data cleaning: when preparing data for API output, you may need to remove specific HTML tags to meet data format requirements.
Actual Application: Make content more in line with display requirements
With the filters above, we can meet a variety of plain text extraction needs in AnQi CMS:
Show a plain text summary in article lists or product cards. On the homepage or a category list page, it is common to display only the article title and a concise description to keep the page clean and consistent.
{% archiveList archives with type="list" limit="10" %}
    {% for item in archives %}
    <div>
        <h2><a href="{{ item.Link }}">{{ item.Title|striptags }}</a></h2>
        {# Remove all HTML tags, then keep the first 100 characters #}
        <p>{{ item.Description|striptags|truncatechars:100 }}</p>
    </div>
    {% endfor %}
{% endarchiveList %}
Optimize the SEO meta description. When the page description is obtained through the tdk tag and the source content comes from a rich text editor, it is best to run it through striptags once more so that only plain text reaches the search engine.
<meta name="description" content="{% tdk seoDescription with name="Description" %}{{ seoDescription|striptags|truncatechars:150 }}">
Even though seoDescription is usually plain text already, the extra filter guards against stray markup, and truncatechars keeps the length under control.
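Both template examples strip tags before truncating, and the order matters. The Python sketch below illustrates why. The helper plain_summary is hypothetical, and its ellipsis handling is an assumption made for the illustration rather than a statement about how truncatechars behaves in AnQi CMS.

```python
import re

TAG_PATTERN = re.compile(r"<[^>]*>")


def plain_summary(html_text: str, limit: int) -> str:
    """Strip tags first, then cut the result to at most `limit` characters."""
    text = TAG_PATTERN.sub("", html_text).strip()
    if len(text) <= limit:
        return text
    if limit < 3:
        return text[:limit]
    # Assumption: mark the cut with "..." and count it toward the limit.
    return text[: limit - 3] + "..."


content = "<p>AnQi CMS makes <strong>content</strong> management simple.</p>"

# Strip, then truncate: a clean excerpt.
print(plain_summary(content, 20))         # AnQi CMS makes co...

# Truncate, then strip: the cut lands inside a tag and leaves a fragment.
print(TAG_PATTERN.sub("", content[:20]))  # AnQi CMS makes <s
```

Truncating raw HTML first can cut a tag in half, and the leftover fragment no longer looks like a tag to the stripping step.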
Security and Best Practices
- The |safe filter and tag removal: |safe prevents HTML content from being automatically escaped so that the browser can parse and display it. Once striptags has removed every HTML tag, the content is plain text and in principle no longer needs |safe. If you use removetags and some HTML tags remain, |safe may still be needed so that the retained tags are parsed correctly.
- Choose the filter flexibly: decide how much control you need over the final text format, then choose striptags for a complete cleanup or removetags for selective retention.
- Combine with truncation: after the HTML tags are removed, the content may still be long. Combining truncatechars (truncate by character) or truncatewords (truncate by word) lets you further control the length of the displayed text.
- Special requirement: truncating while retaining HTML structure. If you need to truncate content while keeping its HTML structure (for example, you still want bold text to stay bold after truncation), AnQi CMS also provides `truncatechars_