In website operation, the article summary plays a crucial role. It is not only the first thing that entices visitors to click, but also an important basis for search engines to understand, index, and rank page content. A good summary quickly conveys the article's core information, improves the user experience, and helps SEO performance. However, when the article content itself contains rich HTML (images, links, bold text, paragraphs, and so on), extracting a summary safely, without breaking the HTML structure, becomes a common challenge.
Challenge: Why doesn't ordinary truncation work?
When a CMS extracts article summaries by simply cutting at a fixed number of characters or bytes, the cut can easily land in the middle of an HTML element. For example, take the content `<div>This is a captivating <strong>story</strong>, click to read more</div>`. If it is cut off partway through the bold text, leaving `<div>This is a captivating <strong>sto`, neither the `<strong>` nor the `<div>` tag is closed. This causes the following problems:
- Page rendering errors: The browser tries to repair the incomplete HTML, but the results are often unsatisfactory, which can lead to a chaotic page layout and lost styles.
- Broken functionality: If the cut falls inside a link tag `<a>` or an image tag `<img>`, the link may stop working or the image may fail to display.
- Search engine misreading: A damaged HTML structure may prevent search engines from correctly understanding and extracting the page content, which hurts SEO.
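To make the failure concrete, the following Python sketch (an illustration only, not AnQiCMS code) cuts a sample like the one above, translated to English, at a fixed character position, then uses the standard library's `html.parser.HTMLParser` to list the tags the cut leaves open. The cut position 38 and the `OpenTagTracker` class are hypothetical choices made for this demo.

```python
from html.parser import HTMLParser

# Sample article body (the example content, in English).
content = "<div>This is a captivating <strong>story</strong>, click to read more</div>"

# Naive truncation: cut the raw string at a fixed character position,
# ignoring the fact that part of it is markup.
naive = content[:38]


class OpenTagTracker(HTMLParser):
    """Records which tags are still open at the end of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, if there is one.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


tracker = OpenTagTracker()
tracker.feed(naive)
tracker.close()

print(naive)          # <div>This is a captivating <strong>sto
print(tracker.stack)  # ['div', 'strong']
```

A browser receiving this fragment has to guess where `<strong>` and `<div>` end, which is exactly the rendering problem described above.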
Therefore, we need a smarter truncation method that recognizes HTML tags and keeps them intact.
AnQiCMS solution: Maintain the integrity of the HTML structure.
AnQiCMS provides flexible template tags and filters that solve this problem. The core of the solution is a pair of built-in HTML-safe truncation filters: `truncatechars_html` and `truncatewords_html`.
These filters parse the HTML structure and make sure every tag is correctly closed after truncation, avoiding the rendering issues caused by naive cutting.
- `truncatechars_html:number`: Truncates by character count and appends an ellipsis (...). All HTML tags left open by the cut are properly closed.
- `truncatewords_html:number`: Truncates by word count and appends an ellipsis (...). It likewise keeps the HTML structure intact.
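To illustrate how an HTML-aware character filter can work, here is a minimal Python sketch built on the standard library's `html.parser`. It is not AnQiCMS's implementation, and the real filter may count characters and place the ellipsis differently; the function name `truncatechars_html` simply mirrors the filter for readability. The sketch counts only visible text characters, copies tags through unchanged, stops at the limit, appends `...`, and then closes every tag that is still open. Void elements such as `<img>` and `<br>` are never tracked because they have no closing tag, and comments are dropped.

```python
from html import escape
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never tracked as "open".
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class CharTruncator(HTMLParser):
    """Copies HTML until `limit` visible text characters have been emitted."""

    def __init__(self, limit):
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.limit = limit
        self.count = 0          # visible text characters emitted so far
        self.out = []           # output fragments
        self.open_tags = []     # stack of open, non-void tags
        self.truncated = False  # set once the limit has been hit

    def handle_starttag(self, tag, attrs):
        if not self.truncated:
            self.out.append(self.get_starttag_text())  # raw tag text, attributes intact
            if tag not in VOID_TAGS:
                self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing forms such as <br/> are copied and never tracked.
        if not self.truncated:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.truncated or tag not in self.open_tags:
            return
        # Close everything down to (and including) the matching tag.
        while True:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.truncated:
            return
        remaining = self.limit - self.count
        if len(data) <= remaining:
            # Re-escape decoded text so "&", "<", ">" stay valid HTML.
            self.out.append(escape(data, quote=False))
            self.count += len(data)
        else:
            # Cut inside this text node and mark the cut with an ellipsis.
            self.out.append(escape(data[:remaining], quote=False) + "...")
            self.truncated = True


def truncatechars_html(markup, limit):
    """Hypothetical Python stand-in for the template filter of the same name."""
    parser = CharTruncator(limit)
    parser.feed(markup)
    parser.close()
    # Close any tags the cut left open, innermost first.
    closers = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closers


sample = "<div>This is a captivating <strong>story</strong>, click to read more</div>"
print(truncatechars_html(sample, 25))
# <div>This is a captivating <strong>sto...</strong></div>
```

The cut lands inside the bold text, yet both `<strong>` and `<div>` are closed. In this sketch the ellipsis is not counted toward the limit; that is a design choice, not a statement about how AnQiCMS counts.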
For the browser to render the truncated HTML as page content rather than plain text, we also need the `safe` filter.
Practical application: article lists and detail pages
In AnQiCMS, article summaries are usually displayed on article list pages, category pages, search result pages, and similar views. Below are the two common sources of summaries and the corresponding safe extraction methods:
Use the article's summary field (`Description`)

The AnQiCMS article editor has a "Document Summary" field. It usually holds a brief overview of the article, and in most cases its content is plain text or lightly formatted HTML.

    {# Assume you are looping over an article list and item is the current article #}
    <div class="article-item">
        <h3><a href="{{ item.Link }}">{{ item.Title }}</a></h3>
        <p class="summary">{{ item.Description|truncatechars_html:150|safe }}</p>
        <a href="{{ item.Link }}">Read more</a>
    </div>

Here, `item.Description` is the article's summary. `truncatechars_html:150` shortens it to about 150 characters, and the `safe` filter ensures the HTML is rendered correctly.

Extract the summary from the article's content field (`Content`)

Sometimes the "Document Summary" field is not enough for the summary you want, or you may want to carry over some HTML from the article body (such as a short piece of formatted text or a key image). In that case, you can truncate the "Document Content" field (`Content`) directly.

    {# Assume you are looping over an article list #}
    <div class="article-item">
        <h3><a href="{{ item.Link }}">{{ item.Title }}</a></h3>
        {# Take a 200-character summary straight from Content, keeping the HTML structure intact #}
        <div class="summary">{{ item.Content|truncatechars_html:200|safe }}</div>
        <a href="{{ item.Link }}">Read more</a>
    </div>

In this example, `item.Content` is the article's full HTML content. `truncatechars_html:200` keeps roughly the first 200 characters of this long HTML string and properly closes any tags left open, and the `safe` filter then outputs the result as usable HTML. The wrapper is a `<div>` rather than a `<p>` because `Content` normally contains block elements such as `<p>`, and a `<p>` cannot contain another `<p>`.

If you prefer to truncate by word count (especially on English-language sites), you can use `truncatewords_html`:

    <div class="summary">{{ item.Content|truncatewords_html:50|safe }}</div>

It keeps roughly the first 50 words while still preserving the HTML structure. Note that for Chinese and other languages that do not separate words with spaces, what `truncatewords_html` counts as a "word" may not match your expectations; in that case `truncatechars_html` is usually the more intuitive choice.
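The word-based variant, and the caveat about Chinese, can be illustrated with a hypothetical Python sketch using the standard library's `html.parser`. It treats whitespace-separated runs of text as words, which is a simplifying assumption rather than AnQiCMS's documented definition of a word, and it is not the filter's actual implementation. Because Chinese text contains no spaces, a whole sentence counts as a single word and passes through untruncated.

```python
import re
from html import escape
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never tracked as "open".
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class WordTruncator(HTMLParser):
    """Copies HTML until `limit` whitespace-delimited words have been emitted."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.words = 0          # words emitted so far
        self.out = []           # output fragments
        self.open_tags = []     # stack of open, non-void tags
        self.truncated = False  # set once the limit has been hit

    def handle_starttag(self, tag, attrs):
        if not self.truncated:
            self.out.append(self.get_starttag_text())
            if tag not in VOID_TAGS:
                self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        if not self.truncated:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.truncated or tag not in self.open_tags:
            return
        # Close everything down to (and including) the matching tag.
        while True:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.truncated:
            return
        # Split the text into alternating runs of whitespace and non-whitespace.
        for token in re.findall(r"\s+|\S+", data):
            if not token.isspace():
                if self.words == self.limit:
                    # One word too many: stop here and mark the cut.
                    self.out.append("...")
                    self.truncated = True
                    return
                self.words += 1
            self.out.append(escape(token, quote=False))


def truncatewords_html(markup, limit):
    """Hypothetical Python stand-in for the template filter of the same name."""
    parser = WordTruncator(limit)
    parser.feed(markup)
    parser.close()
    # Close any tags the cut left open, innermost first.
    closers = "".join(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closers


print(truncatewords_html("<p>The quick <em>brown fox</em> jumps</p>", 3))
# <p>The quick <em>brown ...</em></p>

# "This is a piece of Chinese text without spaces": no spaces, so one "word".
print(truncatewords_html("<p>这是一段没有空格的中文文本</p>", 3))
# <p>这是一段没有空格的中文文本</p>
```

The second call shows why character-based truncation is the safer default for Chinese content: a word limit of 3 removes nothing.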
Practices and Precautions
- Prefer the summary field: If your content strategy allows it and the "Document Summary" field meets your needs, use it first. It is usually shorter and more concise than the "Document Content" field, and its HTML is simpler, or it is plain text.
- Choose the truncation length to fit your needs: Adjust the `number` parameter (such as `150` or `50`) to your site design, content type, and audience's reading habits. Test several lengths to find the best balance.
- Consider language characteristics: For Chinese content, `truncatechars_html` (truncation by character) usually behaves more predictably and intuitively than `truncatewords_html` (truncation by word).
- Always use the `safe` filter: After any `_html` truncation filter, add `|safe`; otherwise the browser shows the HTML code as plain text instead of rendering it.
- Back-end data and front-end display stay separate: This safe truncation is a convenience of the AnQiCMS template layer. It does not modify the original article content in the database, so the stored data remains complete and can be reused flexibly.
By using these HTML-aware filters, developers and operators can easily solve the summary-extraction problem: pages stay well-formed and fully functional, and the summaries better serve search engine optimization and the overall value of the content.
Frequently Asked Questions (FAQ)
**Q1: Why does my summary still show raw HTML tags instead of styled content after I use `truncatechars_html`?**
A1: This usually happens because `|safe` is missing at the end of the filter chain. For safety, the AnQiCMS template engine HTML-escapes all output by default. Even though the output of `truncatechars_html` is a string of well-formed HTML, without `|safe` telling the template engine that this HTML is "safe", the engine escapes it into forms such as `&lt;div&gt;`, and the browser displays it as plain text. Once you add `|safe`, the browser parses the HTML and renders the styles normally.
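The escaping described in A1 can be reproduced with Python's `html.escape`, used here only as a stand-in for the template engine's autoescaping; AnQiCMS's exact escaping rules may differ in details such as quote handling. The input is already well-formed HTML, but after escaping every `<` and `>` becomes an entity, so the browser shows the markup as text.

```python
from html import escape

# Well-formed output of an HTML-aware truncation filter.
summary = "<div>This is a captivating <strong>sto...</strong></div>"

# Without |safe, an autoescaping engine emits something like this:
escaped = escape(summary)
print(escaped)
# &lt;div&gt;This is a captivating &lt;strong&gt;sto...&lt;/strong&gt;&lt;/div&gt;
```

With `|safe`, the engine skips this step and emits `summary` unchanged, which the browser then renders as HTML.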
**Q2:truncatechars_htmlAnd `truncatewords_html