How to remove all HTML tags from dynamically generated HTML content?


In website content management, a common requirement is to extract plain text from formatted dynamic content. The reasons vary: generating concise meta descriptions (Meta Description) for search engines, displaying unformatted summaries on list pages, or simply obtaining clean plain text for data analysis. AnQi CMS addresses these scenarios through its template engine and built-in filters.

How dynamic content coexists with HTML tags

In the AnQi CMS backend, whether we are editing articles, product details, category introductions, or single-page content, we usually work in a rich text editor. The editor lets us insert images and links, adjust font styles (such as bold and italic), create lists, and more. When the content is saved to the database, all of this formatting is stored as HTML tags.

When we output this content in a front-end template through variables such as archive.Content, category.Description, or page.Content, we usually pair the variable with the |safe filter to preserve the original formatting, for example {{ archive.Content|safe }}. This tells the template engine to output the content as safe HTML, so the browser parses and renders the tags it contains. In some display areas or data output scenarios, however, these HTML tags are redundant and may even break the page layout or pollute the data.
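The difference between escaped output and |safe output can be illustrated outside the template engine. The sketch below is plain Python, not AnQi CMS code: `render` is a hypothetical stand-in for outputting a variable, and it assumes that output is HTML-escaped unless it is marked safe.

```python
from html import escape


def render(value: str, safe: bool = False) -> str:
    """Hypothetical stand-in for outputting a template variable."""
    # Without `safe`, markup characters are escaped and shown as literal text.
    # With `safe`, the string is passed through unchanged for the browser to parse.
    return value if safe else escape(value)


content = "<p>Some <strong>bold</strong> text.</p>"
print(render(content))             # &lt;p&gt;Some &lt;strong&gt;bold&lt;/strong&gt; text.&lt;/p&gt;
print(render(content, safe=True))  # <p>Some <strong>bold</strong> text.</p>
```

In the first case the visitor would see the tags as literal text; in the second the browser renders the paragraph with bold formatting.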

The core tools for extracting plain text: AnQi CMS built-in filters

To remove tags from dynamic HTML content, the AnQi CMS template engine includes several practical built-in filters. The two most important are striptags and removetags.

striptags: Completely remove all HTML tags

When the goal is pure text with no HTML formatting at all, the striptags filter is the first choice. It removes every HTML tag from the input string (both opening and closing tags) and leaves only the plain text between them.

Usage example: Assume your article content (archive.Content) contains:

<p>This is some <strong>bold</strong> text, and <a href="#">a link</a>.</p>

If you use the following in the template:

<p>Plain text content: {{ archive.Content|striptags }}</p>

The output of the page will be:

<p>Plain text content: This is some bold text, and a link.</p>

As you can see, all the <p>, <strong>, and <a> tags in the content have been removed; the remaining <p> comes from the template itself.
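To make the behavior concrete, here is a minimal Python sketch of the same idea. It is not the AnQi CMS implementation: `strip_tags` and `TAG_PATTERN` are hypothetical names, and the simple pattern assumes reasonably well-formed markup (a `>` inside an attribute value would confuse it).

```python
import re

# Matches anything that looks like a tag: "<", any run of non-">" characters, ">".
TAG_PATTERN = re.compile(r"<[^>]*>")


def strip_tags(html: str) -> str:
    """Remove every HTML tag and keep only the text between tags."""
    return TAG_PATTERN.sub("", html).strip()


content = '<p>This is some <strong>bold</strong> text, and <a href="#">a link</a>.</p>'
print(strip_tags(content))  # This is some bold text, and a link.
```

For untrusted or badly formed HTML, a real HTML parser is a safer choice than a regular expression.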

Application scenarios:

  • Generate a plain text summary: On the article list page, you may want to display only a plain text summary made up of the first few words of the article.
  • SEO meta description: In the page's <head> section, the value of meta name="description" should be plain text, so that search engines do not pick up HTML tags that would affect how the snippet is displayed.

removetags: Precisely remove specified HTML tags

Unlike the all-or-nothing approach of striptags, the removetags filter provides finer-grained control. It lets you specify one or more HTML tags to remove while keeping every tag you did not name.

Usage example: Continuing with the example, suppose your content is:

<p>This is some <strong>bold</strong> text, and <a href="#">a link</a>.</p>

If you want to remove only the link tag <a> and the paragraph tag <p> but keep the bold tag <strong>, you can use the following in the template:

<p>Partially preserved HTML: {{ archive.Content|removetags:"p,a"|safe }}</p>

Note that |safe is used here because the output of removetags may still contain HTML tags that the browser needs to parse.

The output of the page will be:

<p>Partially preserved HTML: This is some <strong>bold</strong> text, and a link.</p>

Here the <a> and <p> tags from the content have been removed, while the <strong> tag has been retained.
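Selective removal can be sketched in Python as follows. This is an illustration under stated assumptions, not the filter's actual source: `remove_tags` is a hypothetical helper that takes the same comma-separated tag list and removes matching opening tags (with or without attributes) and closing tags, leaving their inner text in place.

```python
import re


def remove_tags(html: str, tags: str) -> str:
    """Remove only the named tags (comma-separated), keeping their inner text."""
    for name in (t.strip() for t in tags.split(",")):
        if not name:
            continue
        # Opening tag with optional attributes, self-closing form, or closing tag.
        # Requiring whitespace before attributes keeps "p" from matching "<pre>".
        pattern = re.compile(rf"</?{re.escape(name)}(\s[^>]*)?/?>", re.IGNORECASE)
        html = pattern.sub("", html)
    return html


content = '<p>This is some <strong>bold</strong> text, and <a href="#">a link</a>.</p>'
print(remove_tags(content, "p,a"))  # This is some <strong>bold</strong> text, and a link.
```

Because the result can still contain markup, it has to be output as HTML rather than escaped text, which is why the template example pairs removetags with |safe.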

Application scenarios:

  • Local format adjustment: In some display areas, you may want to keep certain core text formatting (such as bold) while removing tags that are irrelevant or may cause layout issues (such as the image tag <img> or the video tag <video>).
  • Data cleaning: When preparing data for API output, it may be necessary to remove specific HTML tags to meet the required data format.

Practical application: Make content fit display requirements

Combining the filters above, we can handle a variety of plain text extraction needs in AnQi CMS:

  1. Display a plain text summary in an article list or product card: On the homepage or a category list page, usually only the article title and a concise description are displayed, which keeps the page clean and consistent.

    {% archiveList archives with type="list" limit="10" %}
        {% for item in archives %}
        <div>
            <h2><a href="{{ item.Link }}">{{ item.Title|striptags }}</a></h2>
            {# Strip all HTML tags, then truncate to the first 100 characters #}
            <p>{{ item.Description|striptags|truncatechars:100 }}</p>
        </div>
        {% endfor %}
    {% endarchiveList %}
    
  2. Optimize the SEO meta description: If the description obtained through the tdk tag originates from a rich text editor, it is best to run it through striptags again so that search engines receive plain text.

    <meta name="description" content="{% tdk seoDescription with name="Description" %}{{ seoDescription|striptags|truncatechars:150 }}">
    

    Even though seoDescription is usually already plain text, the extra filter guards against stray tags, and truncatechars keeps the length under control.
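The order used in both examples, strip first and truncate second, matters: truncating raw HTML first could cut a tag in half and leave broken markup behind. The Python sketch below illustrates the pipeline. `plain_summary` is a hypothetical helper, and counting the trailing "..." toward the limit is an assumption of this sketch, not a statement about how truncatechars behaves in AnQi CMS.

```python
import re


def plain_summary(html: str, limit: int) -> str:
    """Strip tags, collapse whitespace, and cut to at most `limit` characters."""
    text = re.sub(r"<[^>]*>", "", html)
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    if limit <= 3:
        # Too short to fit an ellipsis; return a bare cut.
        return text[:limit]
    # Reserve three characters for the ellipsis so the result is exactly `limit` long.
    return text[: limit - 3] + "..."


html = "<p>Hello <strong>world</strong>, this is a long summary.</p>"
print(plain_summary(html, 14))  # Hello world...
```

Python string slicing counts Unicode code points, so a Chinese character counts as one character here rather than as several bytes.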

Safety and best practices

  • The |safe filter and tag removal: The |safe filter prevents HTML content from being automatically escaped, so that the browser can parse and display it. After striptags has removed all HTML tags, the content is plain text and in principle no longer needs |safe. With removetags, however, some HTML tags remain, so |safe may still be needed for those tags to be parsed correctly.
  • Flexible filter selection: Choose based on how much control you need over the final text format: striptags for a complete cleanup, or removetags for selective retention.
  • Combine with truncation: After HTML tags are removed, the content may still be very long. Combining the truncatechars (truncate by characters) or truncatewords (truncate by words) filter further controls the length of the displayed text.
  • Special requirement: Retain HTML structure and truncate. If you need to truncate content while retaining its HTML structure (for example, you want bold text to remain bold after truncation), AnQi CMS also provides `truncatechars_
