In AnQiCMS templates, it is often necessary to truncate rich text fields (such as article details or product descriptions) for display. Cutting a string of HTML directly often breaks the tag structure, which can disrupt page layout and even cause rendering errors. Fortunately, AnQiCMS provides several practical template filters that handle this task cleanly while keeping the HTML tags intact.
Common Challenges in Extracting Rich Text Content
When we use the rich text editor in the AnQiCMS backend to publish articles or product details that contain various formatting (headings, paragraphs, images, links, bold, italic, and so on), the content is usually stored in the database as a complete HTML string. However, on the homepage, list pages, search results pages, or recommendation blocks of a website, we usually only need to display a concise summary of this content.
If you simply cut these HTML strings by character or byte count, for example with a programming language's substring function, it is easy to cut in the middle of an HTML tag. Take a piece of HTML such as <p>This is a very long piece of <em>bold text</em></p>. If the cut lands inside the <em> tag, the result is <p>This is a very long piece of <em. The browser cannot correctly recognize or close the <em> tag, which may break the page styling and even affect other elements that would otherwise display normally.
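The failure mode described above can be reproduced in a few lines. The snippet below is only an illustration in Python (not AnQiCMS code); the sample string and the cut position are chosen for the demonstration.

```python
# Illustration only: naive slicing of an HTML string can cut inside a tag.
html = "<p>This is a very long piece of <em>bold text</em></p>"

# Pick a cut position that lands in the middle of the opening <em> tag.
cut = html.index("<em>") + 3
naive = html[:cut]

print(naive)  # <p>This is a very long piece of <em
```

The result ends with an unfinished tag and has no closing </p>, which is exactly the kind of fragment a browser has to guess about.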
AnQiCMS's elegant solution: truncation that preserves the HTML structure
AnQiCMS understands the pain points of content operators and builds dedicated filters for truncating HTML content into its template engine. These filters recognize and preserve the HTML tag structure, so the truncated content is still a valid HTML fragment. They are usually applied to content obtained from backend rich text fields, such as the Content field retrieved with the archiveDetail tag.
1. Truncate by character count while preserving the HTML structure: truncatechars_html
If you want to truncate rich text to a strict character limit without breaking the HTML structure, the truncatechars_html filter is an ideal choice. It counts visible characters (the HTML tags themselves are not counted) and truncates when the specified number is reached. Most importantly, it automatically ensures that every HTML tag opened before the cut point is closed correctly.
For example, to take the first 25 characters of the Content field from the document details:
{# Assume archive.Content is a rich text content field #}
{%- archiveDetail articleContent with name="Content" %}
<div class="summary">
{{ articleContent|truncatechars_html:25|safe }}
</div>
In this example, truncatechars_html:25 attempts to take the first 25 characters of articleContent. If the cut point falls inside an HTML tag, the filter adjusts the cut point and closes all unclosed tags so that the output is valid HTML. Finally, the |safe filter is essential: it tells the template engine that this content is safe HTML that does not need to be escaped again, so the browser can parse and display it correctly.
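To make the idea of tag-aware character truncation concrete, here is a minimal sketch in Python. It is not AnQiCMS's implementation: the function name truncate_chars_html, the void-tag list, and the choice not to count the ellipsis toward the limit are all assumptions made for illustration, and the real filter may count characters differently.

```python
from html import escape
from html.parser import HTMLParser

# Tags that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class _Truncator(HTMLParser):
    """Copies HTML through until `limit` visible characters have been emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit
        self.count = 0        # visible characters emitted so far
        self.out = []         # output fragments
        self.open_tags = []   # stack of tags still waiting for a close
        self.done = False     # set once the limit has been hit

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done:
            return
        # Only emit a close that matches the innermost open tag.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.done:
            return
        room = self.limit - self.count
        if len(data) <= room:
            self.out.append(escape(data, quote=False))
            self.count += len(data)
        else:
            # Cut inside the text, never inside a tag, then mark truncation.
            self.out.append(escape(data[:room], quote=False) + "...")
            self.count = self.limit
            self.done = True


def truncate_chars_html(html, limit):
    """Hypothetical helper: keep `limit` visible characters, close open tags."""
    parser = _Truncator(limit)
    parser.feed(html)
    parser.close()
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing


sample = "<p>This is a very long piece of <em>bold text</em></p>"
print(truncate_chars_html(sample, 33))
# <p>This is a very long piece of <em>bold...</em></p>
```

Tags are copied through without counting toward the limit, and whatever is still on the open-tag stack at the cut point is closed in reverse order, which is the behavior the paragraph above describes.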
2. Truncate by word count while preserving the HTML structure: truncatewords_html
If you care more about the semantic integrity of the content and want to truncate by words, the truncatewords_html filter is a better choice. It counts the words in the HTML content, truncates when the specified word count is reached, and likewise takes care of closing HTML tags.
For example, extract the first 5 words from the document details:
{# Assume archive.Content is a rich text content field #}
{%- archiveDetail articleContent with name="Content" %}
<div class="summary">
{{ articleContent|truncatewords_html:5|safe }}
</div>
Here, truncatewords_html:5 extracts the first five words of articleContent. For content that contains HTML tags, it ensures that truncation happens outside the tags and that all necessary HTML tags are properly closed afterwards. As before, |safe ensures that the browser renders the output as HTML.
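The word-based variant can be sketched the same way. Again, this is an illustrative Python sketch rather than AnQiCMS's code: truncate_words_html is a hypothetical name, the tokenizer assumes simple well-formed tags, and a "word" here means a run of non-whitespace characters.

```python
import re

# Split into tags, words (runs of non-space text), and whitespace.
TOKEN = re.compile(r"<[^>]+>|[^<\s]+|\s+")
TAG_NAME = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)")
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


def truncate_words_html(html, limit):
    """Hypothetical helper: keep the first `limit` words, close open tags."""
    out, open_tags, words, truncated = [], [], 0, False
    for tok in TOKEN.findall(html):
        if tok.startswith("<"):
            m = TAG_NAME.match(tok)
            if m:
                closing, name = m.group(1), m.group(2).lower()
                if closing:
                    if open_tags and open_tags[-1] == name:
                        open_tags.pop()
                elif name not in VOID_TAGS and not tok.endswith("/>"):
                    open_tags.append(name)
            out.append(tok)          # tags never count as words
        elif tok.isspace():
            out.append(tok)
        else:
            if words == limit:       # one word too many: stop here
                truncated = True
                break
            words += 1
            out.append(tok)
    if truncated:
        while out and out[-1].isspace():
            out.pop()                # drop whitespace before the ellipsis
        out.append("...")
    return "".join(out) + "".join(f"</{t}>" for t in reversed(open_tags))


sample = "<p>The quick <strong>brown fox</strong> jumps over</p>"
print(truncate_words_html(sample, 3))
# <p>The quick <strong>brown...</strong></p>
```

Because this sketch splits on whitespace, a run of text without spaces (as is common in Chinese) counts as a single word here, which is worth keeping in mind when choosing between the two approaches.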
When to choose which truncation method?
Choosing between truncatechars_html and truncatewords_html usually depends on your specific design requirements and the characteristics of your content.
truncatechars_html is suitable when:
- You need to strictly control the character length of the output to fit a display area of fixed width or height (for example, in a card layout where every summary should end on the same line).
- You are generating SEO-related tags such as <meta name="description">, which usually have strict character limits.
truncatewords_html is suitable when:
- You care more about a natural reading experience and want to avoid cutting words in half, making the summary more readable.
- You are processing multilingual content, where word lengths differ greatly between languages. Truncating by words keeps the text more semantically coherent.
When content is truncated, these filters automatically append an ellipsis (...) to indicate that it has been shortened.
Overview of Implementation Steps
In an AnQiCMS template, truncating rich text content while keeping the tag structure intact usually follows a few simple steps:
- Determine the rich text field: Identify which field holds the rich text content (for example, obtained via {% archiveDetail with name="Content" %} or {{ archive.Content }}).
- Choose the appropriate filter: Decide whether to truncate by characters or by words, and pick truncatechars_html or truncatewords_html accordingly.
- Specify the truncation length: After the filter name, use a colon (:) followed by a number to specify how many characters or words to keep.
- Add the |safe filter: This is crucial! Append |safe immediately after the truncation filter so that the content is rendered as HTML rather than plain text.
For example, on an article list page, you may want to display the first 100 characters of each article as a summary:
{# Inside the archiveList loop #}
{% archiveList archives with type="page" limit="10" %}
{% for item in archives %}
<div class="article-item">
<h3><a href="{{ item.Link }}">{{ item.Title }}</a></h3>
<div class="article-summary">
{{ item.Content|truncatechars_html:100|safe }}
</div>
<a href="{{ item.Link }}" class="read-more">Read more</a>
</div>
{% endfor %}
{% endarchiveList %}
Points to note
- The importance of |safe: To emphasize again, the |safe filter is essential when outputting HTML content. By default, AnQiCMS's template engine escapes all output into HTML entities to prevent cross-site scripting (XSS) attacks. Without |safe, HTML tags are displayed as plain text and are not parsed by the browser. Therefore, when you are sure that the content is safe HTML, you must use |safe.
- Truncation length: Set the excerpt length sensibly, so that it provides enough information while keeping the page tidy. Too short and the excerpt lacks information; too long and it no longer works as a summary.
- Performance: Although AnQiCMS's filters have been optimized, if you need to truncate very large amounts of HTML content at high frequency, it is still worth assessing the potential impact on page loading performance.
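The effect of default escaping mentioned in the first point can be illustrated outside the template engine. The snippet below uses Python's standard html.escape purely as a stand-in for what an auto-escaping template engine does to output that is not marked safe; it is not AnQiCMS code.

```python
from html import escape

# A truncated fragment as a truncation filter might produce it.
raw = "<p>Hello <em>world</em>...</p>"

# What auto-escaping does when the output is not marked as safe.
escaped = escape(raw)

print(escaped)
# &lt;p&gt;Hello &lt;em&gt;world&lt;/em&gt;...&lt;/p&gt;
```

The browser shows the escaped version as literal text, tags included. Marking the output as safe skips this step, which is why it should only be done for content you trust.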
With these intelligent and easy-to-use template filters, AnQiCMS greatly simplifies the display of rich text content. Content operators do not need a deep understanding of HTML parsing logic, yet they can flexibly control how content is presented while keeping the page attractive and functional.
Frequently Asked Questions (FAQ)
1. What are the specific differences between truncatechars_html and truncatewords_html? How should I choose?
truncatechars_html truncates HTML content by character count, whereas truncatewords_html truncates by word count. Both handle the closing of HTML tags so that the structure stays complete. If you need strict control over the length of the output, truncatechars_html is more suitable; if you care more about the natural coherence and readability of the text and want to avoid cutting words in half, truncatewords_html is the better choice.
2. Why is the truncated HTML still displayed as plain text on the page, instead of being parsed by the browser, after I use truncatechars_html or truncatewords_html?
This usually happens because you forgot to add the |safe filter at the end of the filter chain. For security reasons, the AnQiCMS template engine escapes all output into HTML entities by default. |safe tells the template engine explicitly that this content is safe HTML and does not need to be escaped, so that the browser can correctly parse and