How does the `truncatechars_html` filter safely truncate HTML content without breaking the tag structure?

When running a website, we often need to show a short excerpt of a longer piece of content: an article list on the homepage, a brief introduction on a product detail page, or recommended items in a module. These summaries need to draw readers in while keeping the page layout tidy. However, when the content contains rich HTML formatting (bold, italics, images, links, and so on), truncating by raw character length often breaks the HTML tag structure, which garbles the page display and can even affect the styling of the rest of the page.

Imagine a carefully formatted article whose excerpt, because of a badly placed cut, is left with an unclosed `<div>`, or with an image tag that is cut off halfway, such as `<img src="..." alt="...`. The result makes the page look disorganized and hurts the user experience, and it may also hurt search engine optimization (SEO), because search engines tend to favor pages with well-formed structure and standard code.

The AnQi CMS template system, which borrows Django's flexible template syntax, includes a practical filter named `truncatechars_html` that addresses exactly this problem. The filter truncates content that contains HTML tags while keeping the resulting HTML complete and valid, without damaging the original tag structure.

How `truncatechars_html` ensures safe truncation of HTML content

The `truncatechars_html` filter is straightforward to use. Pass the content variable through the pipe symbol `|` to `truncatechars_html` and specify the number of visible characters you want to keep.

For example, if the content of your article is stored in the variable `article.Content` and you want the first 120 visible characters as a summary:

{{ article.Content|truncatechars_html:120|safe }}

The key is the 'smart' part of `truncatechars_html`. It does not simply count 120 characters from the start and cut. Instead, it will:

Identify HTML tags: It distinguishes HTML tags (for example `<strong>`, `<a>`, `<p>`) from the actual text content.
Count visible characters: It counts only the text characters the user can actually see and ignores the characters taken up by the HTML tags themselves.
Truncate safely: If the truncation point would fall in the middle of an HTML tag when the specified length is reached, `truncatechars_html` adjusts the truncation point so that no tag is cut into an incomplete fragment.
Close open tags automatically: If any tags are still open after truncation (such as a `<div>` that was opened but has no matching `</div>`), it appends the correct closing tags at the end, so the generated fragment is a structurally complete HTML block.
Add an ellipsis: By default, if the content is truncated, `truncatechars_html` appends an ellipsis "..." to the truncated content to indicate that it is incomplete.
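
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch built on the standard library's `html.parser`, not the AnQi CMS implementation. The names `truncate_html`, `_Truncator` and `VOID_TAGS` are hypothetical, and the sketch assumes that the ellipsis is appended after the counted characters and that whitespace counts as visible text.

```python
import html
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _Truncator(HTMLParser):
    """Copies HTML to `out` until `limit` visible characters were emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit      # maximum number of visible characters
        self.count = 0          # visible characters emitted so far
        self.out = []           # output fragments
        self.open_tags = []     # stack of tags that still need closing
        self.done = False       # True once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # copy the tag verbatim
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Tags written as <br/> are copied verbatim and never left open.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close any tags still open inside this one, then the tag itself.
        while True:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        remaining = self.limit - self.count
        if len(data) <= remaining:
            self.out.append(html.escape(data, quote=False))
            self.count += len(data)
        else:
            # Only text is ever cut, so no tag can be split in half.
            self.out.append(html.escape(data[:remaining], quote=False) + "...")
            self.count = self.limit
            self.done = True


def truncate_html(html_text, limit):
    """Return html_text cut to `limit` visible characters with tags closed."""
    parser = _Truncator(limit)
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    # Close whatever is still open, innermost tag first.
    closing = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing
```

In this sketch, `truncate_html('<p>Hello <b>brave new</b> world</p>', 9)` yields `<p>Hello <b>bra...</b></p>`: nine visible characters, an ellipsis, and the two open tags closed in reverse order. HTML comments are dropped because the parser's default comment handler does nothing.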

Here is a simple example. Suppose you have the following HTML content:

<div class="foo">
  <p>这是一段很长的<b>测试文本</b>，它会被安全地截取，而不会破坏HTML结构。</p>
  <ul>
    <li>列表项1</li>
    <li>列表项2</li>
  </ul>
</div>

If you apply `truncatechars_html:25` to this content:

{{ "<div class=\"foo\"><p>这是一段很长的<b>测试文本</b>，它会被安全地截取，而不会破坏HTML结构。</p><ul><li>列表项1</li><li>列表项2</li></ul></div>"|truncatechars_html:25|safe }}

The result will look like this (the exact output depends on the content and the cut-off point, and is simplified here for readability):

<div class="foo"><p>这是一段很长的<b>测试文本</b>，它会被安全地截取，而不会破...</p></div>

As you can see, although the original `<ul>` and `<li>` tags fall after the cut-off point and are dropped, the `<div>` and `<p>` tags are both properly closed, which preserves the integrity of the HTML structure. By contrast, the regular `truncatechars` filter risks cutting inside the `<p>` tag or in the middle of the `<b>` tag, which causes HTML rendering errors.
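
To see why plain character truncation is risky, consider the following small Python illustration. It uses ordinary string slicing as a stand-in for tag-unaware truncation and is not the code behind `truncatechars`; the helper `ends_inside_tag` and the sample string are hypothetical.

```python
def ends_inside_tag(fragment):
    """Return True if the fragment stops after a '<' that was never closed by '>'."""
    return fragment.rfind("<") > fragment.rfind(">")


html_text = "<p>Read the <strong>full story</strong> here</p>"

# Plain slicing counts markup and text alike, so the cut can land anywhere.
naive = html_text[:17]

print(naive)                   # <p>Read the <stro
print(ends_inside_tag(naive))  # True
```

The fragment ends in the middle of `<strong>`, and the `<p>` tag is never closed, so anything rendered after this fragment may be affected.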

Application scenarios in practice

`truncatechars_html` is widely used in day-to-day content operations with AnQi CMS:

Article list summaries: On a blog or news list page, show a concise excerpt of each article, giving readers the key information without letting long content stretch the layout.
Short product descriptions: On e-commerce sites, show each product's core selling points on the product list page while keeping the page fast to load and visually clean.
Search result previews: In on-site search results, show fragments of the matching content so users can quickly tell whether it is what they need.
Recommendation modules: In the sidebar, footer, and other recommendation modules, show the highlights of related content to encourage clicks.

With this filter, content operators can freely use the rich text editor in the back end to create richly formatted content without worrying about complex truncation logic on the front end. `truncatechars_html` handles it, keeping your website professional and tidy.

Frequently Asked Questions (FAQ)

1. Will `truncatechars_html` truncate Chinese characters correctly? How does it calculate the length?

Yes, `truncatechars_html` truncates Chinese characters correctly. It calculates length in characters rather than bytes. This means a Chinese character and an English letter each count as 1 character, which keeps the behavior consistent and accurate in multilingual environments.
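
The difference between counting characters and counting bytes can be illustrated with a short Python snippet. It assumes the content is encoded as UTF-8 and only demonstrates the general principle, not the filter's internals; the sample string is made up for illustration.

```python
text = "安全CMS"  # two Chinese characters followed by three ASCII letters

char_count = len(text)                  # counts Unicode code points
byte_count = len(text.encode("utf-8"))  # counts UTF-8 bytes

print(char_count)  # 5
print(byte_count)  # 9

# Cutting by characters keeps every character whole.
print(text[:3])  # 安全C

# Cutting the UTF-8 bytes instead can split a character in half.
broken = text.encode("utf-8")[:4].decode("utf-8", errors="replace")
print(broken == text[:4])  # False
```

Each of the two Chinese characters occupies three bytes in UTF-8, which is why a byte-based limit would be reached much sooner for Chinese text than for English text.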

2. Will an ellipsis be added if the visible character length of the content is less than the length I set?

No. `truncatechars_html` adds the ellipsis "..." only when the actual content length exceeds the length you set. If the original content is short and does not reach that length, it is output as is, with no ellipsis.

3. What is the difference between `truncatechars_html` and `truncatewords_html`? Which one should I choose?

Both filters safely truncate HTML content. The main difference is the unit they count:

`truncatechars_html` truncates by character. It counts visible characters from the start and truncates safely once the specified length is reached. Even if the truncation point falls in the middle of a word, it keeps the part before it and adds an ellipsis.
`truncatewords_html` truncates by word. It counts visible words and truncates safely once the specified word count is reached, so the extracted content always ends with a complete word.

Which one to choose depends on your needs. If you have a strict character length limit (for example fixed-width cards), `truncatechars_html` is probably more appropriate. If you care more about semantic integrity and want the summary to always end on a complete word, `truncatewords_html` is the better choice.
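
The two counting units can be compared on plain text with a minimal Python sketch. The helpers `truncate_chars` and `truncate_words` are hypothetical, ignore HTML entirely, and assume the ellipsis is appended after the counted characters or words; they only show where each approach places the cut.

```python
def truncate_chars(text, limit):
    """Keep at most `limit` characters, then append an ellipsis."""
    if len(text) <= limit:
        return text
    return text[:limit] + "..."


def truncate_words(text, limit):
    """Keep at most `limit` whitespace-separated words, then append an ellipsis."""
    words = text.split()
    if len(words) <= limit:
        return text
    return " ".join(words[:limit]) + "..."


text = "Structured summaries keep list pages tidy"

print(truncate_chars(text, 12))  # Structured s...
print(truncate_words(text, 2))   # Structured summaries...
```

The character-based cut stops mid-word at a predictable length, while the word-based cut ends on a whole word but produces output of varying length.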
