In content operations we often face this requirement: an article list page or a topic page needs to display a summary of each article. These articles are usually written in a Markdown editor and, once rendered, may contain images, links, bold text, and other rich HTML structures. Simply truncating the HTML string rendered from Markdown often breaks the tag structure, leaving unclosed tags that disrupt the page layout and seriously hurt the user experience.
AnQi CMS, an efficient and flexible content management system, takes this display problem into account. Its template engine and built-in filters let you truncate the HTML rendered from Markdown while keeping the tag structure intact.
Understanding Markdown content rendering in AnQi CMS
First, we need to understand how AnQi CMS handles Markdown content. When we write an article with the Markdown editor in the admin backend, the system stores the Markdown text. When that content is displayed on the front end, especially on full-content pages such as the document detail page, the Markdown text is usually rendered into HTML.
In an AnQi CMS template, the `archiveDetail` tag retrieves the fields of an article, including `Content`, the field that holds the Markdown editor's content. The `Content` field has a very practical `render` parameter. With `render=true`, the system converts the stored Markdown text into standard HTML. If rendering is not required, set `render=false`; the parameter can also be omitted when the Markdown editor is turned off. In that case the `Content` field outputs the original Markdown text.
For example, the rendered article content can be retrieved like this: `{% archiveDetail articleContent with name="Content" render=true %}`
Note that the template engine escapes output by default, so the rendered HTML must be output with the `|safe` filter; otherwise the markup is escaped and displayed as plain text. Default escaping is a common security practice in web development, and `|safe` tells the template engine that this content is trusted HTML and can be output as-is.
Core Strategy: Smart Extraction of HTML Content
Now we have the rendered HTML, but truncating the HTML string directly by characters or words causes problems. Take the fragment `<p>This is some <b>important</b> text.</p>`: if we cut in the middle of the word "important", the `<b>` and `<p>` tags are never closed. The browser will try to repair the markup, but the result is often unpredictable and can break the layout.
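The failure mode described above can be reproduced outside the CMS. The following Python sketch is only an illustration, not AnQi CMS code: the helper `unclosed_tags` and the English sample fragment are assumptions made for the example. It slices an HTML string at a fixed position and reports which tags are left open.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (a small, non-exhaustive set).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TagBalanceChecker(HTMLParser):
    """Records which tags are still open after a fragment is parsed."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only well-nested closing tags are handled in this sketch.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()


def unclosed_tags(fragment):
    """Return the list of tags that were opened but never closed."""
    checker = TagBalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.open_tags


source_html = "<p>This is some <b>important</b> text.</p>"
naive = source_html[:22]  # plain string slicing, unaware of tags

print(naive)                       # <p>This is some <b>imp
print(unclosed_tags(naive))        # ['p', 'b']
print(unclosed_tags(source_html))  # []
```

The sliced string still looks like HTML, but both `<p>` and `<b>` are left open, which is exactly the situation the browser then has to guess its way out of.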
To solve this problem, AnQi CMS has built-in truncation filters designed specifically for HTML content: `truncatechars_html` and `truncatewords_html`.
- `truncatechars_html:number`: truncates HTML content to the specified number of characters and closes any tags left open by the cut. The result is still a valid, structurally complete fragment, with an ellipsis ("…") added at the truncation point.
- `truncatewords_html:number`: similar to `truncatechars_html`, but truncates by the specified number of words. It also closes open tags and adds an ellipsis.
These filters are the key to extracting HTML content without damaging the tag structure.
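To make the idea of tag-aware truncation concrete, here is a minimal Python sketch of the general technique: copy the markup through, count only text characters, stop at the limit, then close whatever is still open. It is not the implementation used by AnQi CMS; the class `HtmlTruncator`, the function `truncate_html`, and the choice of `...` as the ellipsis are assumptions for illustration, and details such as whether the ellipsis counts toward the limit may differ in the real filter.

```python
from html import escape
from html.parser import HTMLParser

# Tags that never take a closing tag (a small, non-exhaustive set).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HtmlTruncator(HTMLParser):
    """Copies HTML up to a text-character limit, then closes open tags."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit   # maximum number of text characters to keep
        self.count = 0       # text characters copied so far
        self.out = []        # output pieces
        self.stack = []      # currently open tags, innermost last
        self.done = False    # set once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> is copied through unchanged.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.done:
            return
        remaining = self.limit - self.count
        if len(data) > remaining:
            # Cut inside this text node and stop copying anything further.
            self.out.append(escape(data[:remaining], quote=False) + "...")
            self.done = True
        else:
            self.out.append(escape(data, quote=False))
            self.count += len(data)

    def result(self):
        # Close every tag that is still open, innermost first.
        closing = "".join(f"</{tag}>" for tag in reversed(self.stack))
        return "".join(self.out) + closing


def truncate_html(source, limit):
    parser = HtmlTruncator(limit)
    parser.feed(source)
    parser.close()
    return parser.result()


sample = "<p>This is some <b>important</b> text.</p>"
print(truncate_html(sample, 16))   # <p>This is some <b>imp...</b></p>
print(truncate_html(sample, 100))  # <p>This is some <b>important</b> text.</p>
```

The stack of open tags is what distinguishes this from plain string slicing: the cut can land anywhere in the text, and the output is still a balanced fragment.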
In practice: truncating the HTML rendered from Markdown
Suppose we are building an article list page where each article shows a summary of about 150 characters, and we want to keep the HTML formatting that comes from the Markdown, such as bold and italic text.
In the template file, this can be written as:
{# Assume we are looping over an article list; item is the current article object #}
{% for item in archives %}
<div class="article-summary">
    <h3><a href="{{ item.Link }}">{{ item.Title }}</a></h3>
    <div class="summary-content">
        {# First fetch the Markdown content and render it to HTML #}
        {%- archiveDetail fullContent with name="Content" id=item.Id render=true %}
        {# Truncate the rendered HTML by characters and output it as safe HTML #}
        {{ fullContent|truncatechars_html:150|safe }}
    </div>
    <a href="{{ item.Link }}" class="read-more">Read more &gt;</a>
</div>
{% endfor %}
In the code above:
- First, `{% archiveDetail fullContent with name="Content" id=item.Id render=true %}` retrieves the `Content` field of the specified article and forces it to be rendered as HTML. The rendered HTML is assigned to the `fullContent` variable.
- Next, the `|truncatechars_html:150` filter is applied to `fullContent`. It truncates the HTML content to 150 characters and, most importantly, correctly closes any tags left open at the truncation point. Whether tag characters count toward the limit depends on the implementation (in Django's filter of the same name only text characters are counted), so it is worth checking the output on your own site.
- Finally, the `|safe` filter ensures that the truncated HTML summary is parsed and displayed by the browser as HTML rather than output as plain text.
In this way, we can see the brief abstracts of each article on the article list page, which not only retains the original HTML format but also avoids the problem of tag structure damage caused by truncation, keeping the page layout neat.
Further considerations: choosing a truncation method
- Truncate by characters (`truncatechars_html`): when summaries have a strict length limit, for example every summary must stay within 100 characters regardless of whether the content is Chinese or English, `truncatechars_html` is the more precise choice.
- Truncate by words (`truncatewords_html`): if the site content is mainly English and you want summaries that never cut a word in half, `truncatewords_html` is more suitable. It truncates at word boundaries, which makes the summary more readable.
- Plain-text summary (`striptags`): sometimes no HTML formatting needs to be kept and a plain-text summary is enough. In that case, use the `|striptags` filter to remove all HTML tags, then truncate the resulting plain text with `|truncatechars` or `|truncatewords`. For example: `{{ fullContent|striptags|truncatechars:150 }}`.
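The plain-text route (strip the tags, then truncate) can be sketched in Python as follows. This is an illustration of the two-step idea only, not AnQi CMS code; the helpers `strip_tags`, `truncate_chars`, and `plain_summary` are hypothetical names, and this simplified version appends the ellipsis on top of the limit, which the real `truncatechars` filter may handle differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(source):
    """Remove all tags and return the remaining text."""
    extractor = TextExtractor()
    extractor.feed(source)
    extractor.close()
    return "".join(extractor.parts)


def truncate_chars(text, limit):
    """Cut plain text to `limit` characters, adding an ellipsis if cut."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."


def plain_summary(source, limit):
    # Step 1: strip tags. Step 2: truncate the plain text.
    return truncate_chars(strip_tags(source), limit)


sample = "<p>This is some <b>important</b> text.</p>"
print(strip_tags(sample))         # This is some important text.
print(plain_summary(sample, 16))  # This is some imp...
```

Because the tags are gone before the cut, no tag-balancing step is needed here; the trade-off is that all formatting is lost.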
These built-in features of AnQi CMS are a great convenience for content operators. There is no need to clean HTML by hand or wrestle with complex regular expressions: calling the right tags and filters in the template is enough to produce a high-quality summary display.
Frequently Asked Questions (FAQ)
1. How do I get the original Markdown content instead of the rendered HTML? To output the original Markdown text on a front-end page rather than the rendered HTML, set the `render` parameter of the `archiveDetail` tag to `false`. For example: `{% archiveDetail rawMarkdown with name="Content" render=false %}`. The `rawMarkdown` variable then holds the original, unconverted Markdown text.
2. How can I get a plain-text summary without any HTML tags? If the summary should be plain text, first remove all HTML tags with the `|striptags` filter, then truncate by characters or words. For example, to extract a plain-text summary of 150 characters:
{% archiveDetail fullContent with name="Content" render=true %}
{{ fullContent|striptags|truncatechars:150 }}
Here the Markdown is first rendered into HTML, then the HTML tags are stripped, and finally the plain text is truncated.
3. Why is the `|safe` filter still needed after `truncatechars_html` or `truncatewords_html`? The AnQi CMS template engine (similar to Django's) escapes all output by default to prevent cross-site scripting (XSS) and other security problems. So even though `truncatechars_html` or `truncatewords_html` has closed the tags and produced a valid HTML fragment, without the `|safe` filter the angle brackets of tags such as `<p>` and `<b>` are escaped into entities (for example `&lt;p&gt;` and `&lt;b&gt;`), and the browser shows them as text instead of rendering them. Adding `|safe` explicitly tells the template engine that this content is trusted and can be output directly as HTML.
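The effect of default escaping can be seen with Python's standard `html.escape`. The sketch below only mimics the behavior described in this answer; the `render` function and its `safe` flag are hypothetical stand-ins for the template engine's output step and the `|safe` filter.

```python
from html import escape


def render(value, safe=False):
    """Mimic template output: escape by default, pass through when safe."""
    return value if safe else escape(value, quote=False)


summary = "<p>This is some <b>imp...</b></p>"

# Without |safe: the markup is escaped and would be shown as literal text.
print(render(summary))
# &lt;p&gt;This is some &lt;b&gt;imp...&lt;/b&gt;&lt;/p&gt;

# With |safe: the markup reaches the browser unchanged and is rendered.
print(render(summary, safe=True))
# <p>This is some <b>imp...</b></p>
```

Since `safe` switches the protection off, it should only be applied to content you trust, such as articles written by your own editors.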