When running a website, we often need to display a summary or an excerpt of content in different places, such as list pages, search results, or related recommendations. That means truncating the content. If the content is plain text, a simple character or word truncation does the job. When the content contains HTML tags, however, the problem gets harder: cutting the string directly can split a tag or leave it unclosed, which breaks the page structure, causes display anomalies, and hurts the user experience.
AnQiCMS, as an efficient content management system, is designed with flexible and safe content display in mind. It provides two template filters built specifically for text that contains HTML tags, truncatechars_html and truncatewords_html, which solve the problem of truncating HTML content safely.
The challenges of HTML truncation
Imagine you have a piece of content like this:
<p>这是一段很长的文本,其中包含<b>重要的关键词</b>,还有一些<i>斜体字</i>和<img src="/image.jpg" alt="图片描述">。</p>
If we simply cut this text to the first 20 characters, the result may become:
<p>这是一段很长的文本,其中包含<b>重
Clearly, the <b> tag is left unclosed, and with a different cut point the <i> and <img> tags could be cut off as well, resulting in:
- Page layout is chaotic: An unclosed tag may affect the style of subsequent elements.
- Semantic destruction: Important keywords may lose their bold effect.
- User experience decline: The image cannot be displayed normally, and it may even cause browser parsing errors.
Traditional text truncators do not understand HTML structure; they simply cut at the given number of characters or words.
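The failure mode described above can be reproduced outside the template engine. The following is a minimal Python sketch, not AnQiCMS code: it slices an HTML string by raw character count and then reports which tags were left open. The names OpenTagTracker and unclosed_tags, the sample string, and the short VOID_TAGS list are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags that never take a closing tag (abbreviated list, for illustration only).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class OpenTagTracker(HTMLParser):
    """Tracks which non-void tags are still open after parsing a fragment."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Close the innermost tag only when it matches; ignore stray closers.
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()


def unclosed_tags(fragment):
    """Return the tags that are still open at the end of `fragment`."""
    tracker = OpenTagTracker()
    tracker.feed(fragment)
    tracker.close()
    return tracker.open_tags


source = "<p>Some long text with <b>important keywords</b> and <i>italics</i>.</p>"
naive = source[:30]  # plain slicing, unaware of markup

print(naive)                  # <p>Some long text with <b>impo
print(unclosed_tags(naive))   # ['p', 'b']
print(unclosed_tags(source))  # []
```

Any tag reported as unclosed would leak its styling or structure into whatever follows the excerpt on the page.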
AnQiCMS's solution: Smart HTML Extractor
The truncatechars_html and truncatewords_html filters provided by AnQiCMS exist to solve exactly these problems. Their core advantage is that they parse the HTML structure while extracting the text, and ensure that every tag left open at the cut point is properly closed, so the structure is not damaged.
1. truncatechars_html: safely truncate HTML text by character count
The truncatechars_html filter takes a character limit and, when it cuts the text at that limit, closes every HTML tag that is still open. The HTML structure of the page therefore stays intact even when the content is truncated.
Usage:
{{ obj|truncatechars_html:number }}
Here, obj is the HTML text you want to truncate, and number is the total number of characters to keep (including the trailing ellipsis).
Example: Suppose we have the following HTML text:
long_html_text = "<div class=\"foo\"><ul class=\"foo\"><li class=\"foo\"><p class=\"foo\">这是一段很长的文本,其中包含<b>重要的关键词</b>。</p></li></ul></div>"
If we want to truncate it to about 25 characters (HTML tags are not counted, but the ellipsis is, so the visible text will be slightly shorter than 25 characters):
{{ long_html_text|truncatechars_html:25 }}
The output may look something like:
<div class="foo"><ul class="foo"><li class="foo"><p class="foo">这是一段很长的文本,其中包含<b>重要的...</b></p></li></ul></div>
As the output shows, even though the text was truncated, the <b>, <p>, <li>, <ul>, and <div> tags were all closed correctly, which avoids structural errors. When the content is longer than the specified number of characters, "..." is appended automatically to indicate the truncation.
Application scenario: This filter is particularly useful when you need strict control over summary length, for example in SEO descriptions or ad slots with a limited character count.
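To make the idea of tag-aware character truncation concrete, here is a minimal Python sketch of the general technique. It is not the AnQiCMS implementation, and its exact counting rules are assumptions: only visible text counts toward the limit, the ellipsis is included in the limit, and input that already fits is returned unchanged. The names truncate_chars_html and _CharTruncator and the sample markup are hypothetical.

```python
import html
from html.parser import HTMLParser

# Tags that never take a closing tag (abbreviated list, for illustration only).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class _CharTruncator(HTMLParser):
    """Copies markup through until the visible-character budget runs out."""

    def __init__(self, budget, ellipsis):
        super().__init__(convert_charrefs=True)
        self.budget = budget      # visible characters still allowed
        self.ellipsis = ellipsis
        self.total = 0            # visible characters in the whole input
        self.out = []             # pieces of the truncated output
        self.open_tags = []       # tags opened but not yet closed
        self.cut = False          # True once the text has been cut

    def handle_starttag(self, tag, attrs):
        if self.cut:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.cut:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.total += len(data)   # keep counting even after the cut
        if self.cut:
            return
        if len(data) > self.budget:
            # The budget ends inside this text node: keep a prefix and stop.
            kept = html.escape(data[:self.budget], quote=False)
            self.out.append(kept + self.ellipsis)
            self.cut = True
        else:
            self.out.append(html.escape(data, quote=False))
            self.budget -= len(data)


def truncate_chars_html(source, limit, ellipsis="..."):
    """Truncate to `limit` visible characters (ellipsis included), closing open tags."""
    parser = _CharTruncator(max(limit - len(ellipsis), 0), ellipsis)
    parser.feed(source)
    parser.close()
    if parser.total <= limit:
        return source             # nothing to cut
    closers = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closers


sample = '<div class="card"><p>Hello <b>wonderful</b> world of templates</p></div>'
print(truncate_chars_html(sample, 15))
# <div class="card"><p>Hello <b>wonder...</b></p></div>
```

The key design choice is the stack of open tags: whatever is still on the stack at the cut point is closed in reverse order, so the excerpt is always balanced.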
2. truncatewords_html: safely truncate HTML text by word count
truncatewords_html is similar to truncatechars_html, but it cuts by number of words. It likewise identifies and closes all open HTML tags so that the truncated HTML remains valid.
Usage:
{{ obj|truncatewords_html:number }}
Here, obj is the HTML text you want to truncate, and number is the total number of words to keep.
Example: Suppose we have the following HTML text:
html_content = "<p>This is a long test which will be cutted after some words. <b>Important words here.</b></p>"
If we want to keep 5 words:
{{ html_content|truncatewords_html:5 }}
The output may look something like:
<p>This is a long test ...</p>
Here, the text is cut after 5 words, before the <b> tag is reached, so <b> does not appear in the output, while the <p> tag is closed correctly.
Application scenario: Suited to blog article lists, news summaries, and other places where readability and continuity matter most. Cutting at word boundaries avoids splitting a word in the middle.
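The word-based variant can be sketched the same way. The Python below is an illustration of the general technique under stated assumptions, not the AnQiCMS implementation: words are whitespace-separated runs inside a single text node (so a word split by an inline tag counts as several), and the ellipsis is appended after a space and is not counted as a word. The names truncate_words_html and _WordTruncator are hypothetical.

```python
import html
from html.parser import HTMLParser

# Tags that never take a closing tag (abbreviated list, for illustration only).
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class _WordTruncator(HTMLParser):
    """Copies markup through until the word budget runs out."""

    def __init__(self, budget, ellipsis):
        super().__init__(convert_charrefs=True)
        self.budget = budget      # words still allowed
        self.ellipsis = ellipsis
        self.total = 0            # words in the whole input
        self.out = []
        self.open_tags = []
        self.cut = False

    def handle_starttag(self, tag, attrs):
        if self.cut:
            return
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.cut:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        words = data.split()
        self.total += len(words)  # keep counting even after the cut
        if self.cut:
            return
        if len(words) > self.budget:
            # The budget ends inside this text node: keep whole words only.
            kept = " ".join(words[:self.budget])
            self.out.append(html.escape(kept, quote=False) + " " + self.ellipsis)
            self.cut = True
        else:
            self.out.append(html.escape(data, quote=False))
            self.budget -= len(words)


def truncate_words_html(source, limit, ellipsis="..."):
    """Keep the first `limit` words, then close whatever tags are still open."""
    parser = _WordTruncator(limit, ellipsis)
    parser.feed(source)
    parser.close()
    if parser.total <= limit:
        return source             # nothing to cut
    closers = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closers


html_content = ("<p>This is a long test which will be cutted after some words. "
                "<b>Important words here.</b></p>")
print(truncate_words_html(html_content, 5))
# <p>This is a long test ...</p>
```

Because the cut happens before the parser reaches the <b> element, that tag is never emitted and only <p> needs closing.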
Why choose the HTML filters instead of the plain ones?
AnQiCMS also provides truncatechars and truncatewords, which are filters for plain text. They do not parse HTML tags; they cut directly by number of characters or words.
- truncatechars: {{ "<b>Hello World</b>"|truncatechars:10 }} may output <b>Hello Wo..., leaving the HTML structure damaged.
- truncatewords: {{ "<b>Hello World</b>"|truncatewords:1 }} may output <b>Hello..., again leaving the HTML structure damaged.
Therefore, if your text may contain any HTML tags, prefer truncatechars_html or truncatewords_html to keep the page stable and well rendered. Use the plain-text filters only when you are sure the text contains no HTML.
Useful tips and precautions
- Always use together with the |safe filter: By default, the AnQiCMS template engine escapes output variables as HTML entities to prevent XSS attacks. Because truncatechars_html and truncatewords_html return processed HTML, you need the |safe filter to tell the template engine explicitly that this HTML is safe and does not need escaping again. Otherwise, the page will show the raw HTML string with < and > encoded as entities. For example: {{ archive.Content|truncatechars_html:100|safe }}
- Source HTML quality: These filters can safely close the tags they cut through, but they cannot repair HTML that is already broken in the source. If the source content contains many unclosed or malformed tags, the filters will not fix them. Keeping the HTML well formed at the source therefore remains important.
- Choosing the truncation length: Test and adjust the number of characters or words against the actual page layout and design, so that the summary conveys enough information without spoiling the look of the page.
- **Omit behavior: