Robots.txt Generate

Author:

AnqiCMS

Running environment:

AnqiCMS 3.0.5 or above

Installation method:

Log in to your website's admin backend and install it from Function Management.

Price:

Free

robots.txt is the search engine crawler protocol file. It tells search engine spiders which pages of your website may be crawled and which may not. It is a plain text file that follows the Robots Exclusion Standard and consists of one or more rules. Each rule can prohibit or allow a specific search engine spider from crawling a given file path on the website. If you do not set any rules, all files may be crawled by default.

The robots.txt rules include:
User-agent: The User Agent (UA) identifier of the crawler that the rule applies to.
Allow: Paths the crawler is allowed to crawl.
Disallow: Paths the crawler is not allowed to crawl.
Sitemap: The location of a sitemap. There is no limit on the number of entries; you can add several sitemap links.
#: Comment line.
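
The directives above can be assembled into a file programmatically. The following is a minimal sketch in Python, not AnqiCMS's actual generator: the function name build_robots_txt and the (user agent, rules) input structure are assumptions made purely for illustration.

```python
def build_robots_txt(groups, sitemaps=()):
    """Render robots.txt text from rule groups.

    groups: list of (user_agent, [(directive, path), ...]) tuples, where
    directive is "Allow" or "Disallow". This input structure is a
    hypothetical one chosen for the example.
    """
    lines = ["# Generated example"]  # "#" starts a comment line
    for user_agent, rules in groups:
        lines.append(f"User-agent: {user_agent}")
        for directive, path in rules:
            if directive not in ("Allow", "Disallow"):
                raise ValueError(f"unknown directive: {directive}")
            lines.append(f"{directive}: {path}")
        lines.append("")  # a blank line separates rule groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")  # one line per sitemap link
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt(
    [("YisouSpider", [("Disallow", "/")]), ("*", [("Allow", "/")])],
    sitemaps=["/sitemap.xml"],
)
print(robots_txt)
```

Passing several entries in sitemaps produces several Sitemap lines, which matches the note that the number of sitemap links is not limited.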

Here is a simple robots.txt file with two rules:

User-agent: YisouSpider
Disallow: /

User-agent: *
Allow: /

Sitemap: /sitemap.xml

The User-agent: YisouSpider line means: the rules that follow apply to the user agent named "YisouSpider". It can also be set to bingbot, Googlebot, etc.
The Disallow: / line means: all content is off limits to that crawler.
The User-agent: * line means: the rules that follow apply to all user agents (* is a wildcard), and the Allow: / line beneath it lets every other user agent crawl the entire website. Omitting this rule gives the same result, because by default a user agent may crawl the entire website.
The Sitemap: line means: the sitemap file of the site is located at /sitemap.xml.
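
One way to check how these rules are interpreted is to feed them to a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module; the helper name can_crawl and the test path /news/1.html are hypothetical, and real search engine crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two-rule example file from this page.
EXAMPLE = """\
User-agent: YisouSpider
Disallow: /

User-agent: *
Allow: /

Sitemap: /sitemap.xml
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(can_crawl(EXAMPLE, "YisouSpider", "/news/1.html"))  # False: Disallow: /
print(can_crawl(EXAMPLE, "Googlebot", "/news/1.html"))    # True: falls under *
print(can_crawl("", "Googlebot", "/news/1.html"))         # True: no rules set
```

The last call passes an empty file, illustrating the default behavior: with no rules at all, every user agent may crawl the entire website.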

