In website operation, we often need to show summaries of a large amount of content on a single page, such as the article list on the homepage, a brief introduction on a product detail page, or recommended content in a module. These summaries must attract readers to click while keeping the page layout neat. However, when the content itself contains rich HTML formatting (bold, italics, images, links, and so on), truncating by raw character length often causes a headache: the HTML tag structure is broken, the page renders incorrectly, and the overall style may even be affected.

Imagine a well-formatted article whose summary, because of careless truncation, is left with an unclosed <div> tag or an image tag cut off halfway, such as <img src="..." alt=".... Results like this hurt the user experience and make the page look disorganized. They may also have a negative impact on search engine optimization (SEO), because search engines tend to prefer pages with good structure and standards-compliant code.

The template system of AnQi CMS, inspired by Django's flexible template syntax, includes a practical filter named truncatechars_html that was designed to solve exactly this problem. The filter truncates content containing HTML tags while ensuring that the resulting HTML is still complete and valid, without breaking the original tag structure.

How truncatechars_html keeps HTML truncation safe

The truncatechars_html filter is straightforward to use. Pass the content variable you want to truncate to truncatechars_html through the pipe character (|), and specify the number of visible characters to keep.

For example, suppose your article content is stored in the variable article.Content and you want the first 120 visible characters as a summary:

{{ article.Content|truncatechars_html:120|safe }}

The key point here is the "smart" part of truncatechars_html. It does not simply cut the string after the first 120 characters. Instead, it will:

  1. Identify HTML tags: It distinguishes HTML tags (such as <strong>, <a>, and <p>) from the actual text content.
  2. Count visible characters: When counting, it only counts the text characters the user can see and ignores the characters taken up by the HTML tags themselves.
  3. Truncate safely: If the truncation point would fall in the middle of an HTML tag, truncatechars_html adjusts the truncation point so that no tag is cut into an incomplete fragment.
  4. Close tags automatically: It adds the correct closing tags (for example </div>) at the end of the content, so that the generated fragment is a structurally complete HTML block.
  5. Add an ellipsis: By default, if the content is truncated, truncatechars_html appends an ellipsis "…" to the truncated content to indicate that it is incomplete.
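
The steps above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the code AnQi CMS actually runs. The names truncate_html, _Truncator, and VOID_TAGS are hypothetical, the "..." ellipsis default is an assumption, and this sketch counts whitespace between tags as visible text, which the real filter may handle differently.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical helper set: elements that never take a closing tag.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class _Truncator(HTMLParser):
    """Copies markup through until `limit` visible characters are emitted."""

    def __init__(self, limit, ellipsis):
        super().__init__()          # convert_charrefs=True: entities count as 1 char
        self.limit = limit
        self.ellipsis = ellipsis
        self.count = 0              # visible characters emitted so far
        self.out = []               # output fragments
        self.open_tags = []         # stack of tags still waiting for a close
        self.done = False           # True once the limit has been hit

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # tags are copied whole
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.done:
            self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if self.done:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.done:
            return
        room = self.limit - self.count
        if len(data) <= room:
            self.out.append(escape(data, quote=False))
            self.count += len(data)
        else:
            # Cut inside text only, never inside a tag, then mark truncation.
            self.out.append(escape(data[:room], quote=False) + self.ellipsis)
            self.done = True


def truncate_html(html_text, limit, ellipsis="..."):
    parser = _Truncator(limit, ellipsis)
    parser.feed(html_text)
    parser.close()
    # Close whatever is still open, innermost tag first.
    closing = "".join(f"</{t}>" for t in reversed(parser.open_tags))
    return "".join(parser.out) + closing


print(truncate_html("<p>Hello <b>world</b> and more</p>", 8))
print(truncate_html("<p>Hi</p>", 10))
```

The first call cuts inside the text of the <b> element and then closes <b> and <p>. The second call is shorter than the limit, so it is returned unchanged and no ellipsis is added.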

Let us experience its magic through a simple example. Suppose you have some HTML content:

<div class="foo">
  <p>这是一段很长的<b>测试文本</b>,它会被安全地截取,而不会破坏HTML结构。</p>
  <ul>
    <li>列表项1</li>
    <li>列表项2</li>
  </ul>
</div>

If you truncate this content with truncatechars_html:25:

{{ "<div class=\"foo\"><p>这是一段很长的<b>测试文本</b>,它会被安全地截取,而不会破坏HTML结构。</p><ul><li>列表项1</li><li>列表项2</li></ul></div>"|truncatechars_html:25|safe }}

The output will be like this (simplified for readability; the actual output may vary slightly depending on the content and truncation points):

<div class="foo"><p>这是一段很长的<b>测试文本</b>,它会被安全地截取,而不会破...</p></div>

As you can see, the original <ul> and <li> tags fall after the truncation point and are dropped, while the <div> and <p> tags are properly closed, which preserves the integrity of the HTML structure. If the regular truncatechars filter were used instead, the <p> or <b> tag would very likely be cut off directly, causing HTML rendering errors.
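
To see why tag-unaware truncation is risky, here is a small Python illustration. It does not call AnQi CMS. It simply slices a markup string by raw length, the way a naive truncation would, and uses a hypothetical helper, ends_inside_tag, to check whether the cut landed inside a tag. The sample markup is made up for the illustration.

```python
def ends_inside_tag(fragment: str) -> bool:
    """Rough heuristic: True if the last '<' has no '>' after it."""
    return fragment.rfind("<") > fragment.rfind(">")


raw = '<p>Read the <a href="/guide/intro">introduction</a> first.</p>'

# Slice the markup itself, counting tag characters like any other character.
naive = raw[:25]

print(naive)                   # <p>Read the <a href="/gui
print(ends_inside_tag(naive))  # True: the <a ...> tag was cut in half
print(ends_inside_tag(raw))    # False: the untouched markup is intact
```

The sliced fragment ends in the middle of an attribute value, which is exactly the kind of broken output a tag-aware filter is meant to prevent.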

Actual application scenarios

truncatechars_html is widely used in day-to-day content operation with AnQi CMS:

  • Article list summaries: On blog or news list pages, show a shortened version of each article, which provides the key information while preventing long content from stretching the layout.
  • Short product descriptions: On the product list pages of e-commerce sites, show the core selling points of each product while keeping the page fast to load and visually clean.
  • Search result previews: In on-site search results, show users fragments of the relevant content so they can quickly judge whether it is what they need.
  • Recommended content modules: In the sidebar, footer recommendations, and similar modules, show the essentials of related content to attract clicks.

With this filter, content operators can freely use the rich text editor in the admin backend to create richly formatted content, without worrying about complex truncation logic on the front end. truncatechars_html handles it, keeping your website professional and tidy.

Common Questions (FAQ)

1. Can truncatechars_html truncate Chinese characters? How does it calculate the length?
Yes, truncatechars_html truncates Chinese characters correctly. It counts length in characters rather than bytes, so a Chinese character and an English letter each count as 1 character, which keeps lengths consistent and accurate when truncating in multilingual environments.
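
The difference between counting characters and counting bytes can be shown with a short Python snippet. This illustrates the general concept only, using Python's own string handling. It is not AnQi CMS code, and the sample string is an assumption chosen for the demonstration.

```python
text = "中文abc"

char_count = len(text)                  # counts characters (code points)
byte_count = len(text.encode("utf-8"))  # counts UTF-8 bytes

# Truncating by characters keeps every character whole.
by_chars = text[:3]

# Truncating by bytes can split a multi-byte character in the middle.
by_bytes = text.encode("utf-8")[:4].decode("utf-8", errors="replace")

print(char_count, byte_count)  # 5 9
print(by_chars)                # 中文a
print(by_bytes)                # the second character is replaced by U+FFFD
```

Each Chinese character here takes three bytes in UTF-8, so a byte-based cut at position 4 lands inside the second character, while a character-based cut never does.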

2. If the visible character length of the content is less than the length I set, will it still add “…”?
No. truncatechars_html appends an ellipsis “…” only when the content is actually truncated, that is, when the visible character length of the original content exceeds the length you set. If the original content is shorter than that length, it is output as is, with no ellipsis added.

3. What is the difference between truncatechars_html and truncatewords_html? Which one should I choose?
Both filters safely truncate HTML content. The main difference is the unit they count:

  • truncatechars_html: Truncates by character count. It counts visible characters from the beginning and safely truncates once the specified length is reached. Even if the truncation point falls in the middle of a word, it keeps the part before that point and adds an ellipsis.
  • truncatewords_html: Truncates by word count. It counts visible words and safely truncates once the specified number of words is reached. This ensures that the truncated content ends with a complete word.

Which one to choose depends on your specific needs. If you need strict control over the displayed length, truncatechars_html may be more suitable. If you care more about the semantic integrity of the content and want the summary to always end with a complete word, truncatewords_html is the better choice.
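
The contrast between the two counting units can be sketched on plain text in Python. Both functions are hypothetical helpers written for this comparison. They ignore HTML entirely and assume a "..." ellipsis, so they show only the character-versus-word distinction, not the behavior of the actual filters.

```python
def truncate_chars(text: str, limit: int, ellipsis: str = "...") -> str:
    """Keep at most `limit` characters; may stop in the middle of a word."""
    if len(text) <= limit:
        return text
    return text[:limit] + ellipsis


def truncate_words(text: str, limit: int, ellipsis: str = "...") -> str:
    """Keep at most `limit` whitespace-separated words; always ends on a whole word."""
    words = text.split()
    if len(words) <= limit:
        return text
    return " ".join(words[:limit]) + ellipsis


sentence = "Safe truncation keeps markup valid"

print(truncate_chars(sentence, 12))  # Safe truncat...
print(truncate_words(sentence, 2))   # Safe truncation...
```

Note that splitting on whitespace does not separate words in languages such as Chinese that are written without spaces, which is one practical reason to prefer character-based truncation for that kind of content.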