Web site owners use the robots.txt file to give instructions about their site to web robots; this is called the Robots Exclusion Protocol.
It works like this: a robot wants to visit a Web site URL, say http://example.blogspot.com/p/welcome.html. Before it does so, it first
checks for http://example.blogspot.com/robots.txt, and finds:
User-agent: *
Disallow: /

The "User-agent: *" means this section applies to all robots. The "Disallow: /" tells the robot that it should not visit any pages on the site.
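As a rough sketch of what that check looks like in practice (this is illustrative only, not part of the protocol documents), Python's standard urllib.robotparser module can fetch and evaluate a robots.txt file; the URLs below are simply the example addresses used above:

import urllib.robotparser

robots_url = "http://example.blogspot.com/robots.txt"
page_url = "http://example.blogspot.com/p/welcome.html"

# Download and parse the site's robots.txt
rp = urllib.robotparser.RobotFileParser()
rp.set_url(robots_url)
rp.read()

# A well-behaved robot checks the rules before fetching the page
if rp.can_fetch("*", page_url):
    print("Allowed to crawl", page_url)
else:
    print("robots.txt disallows", page_url)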
There are two important considerations when using robots.txt:
- robots can ignore your robots.txt. In particular, malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers, will pay no attention.
- the robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.
The details
The robots.txt is a de-facto standard and is not owned by any standards body. There are two historical descriptions:
- the original 1994 A Standard for Robot Exclusion document.
- a 1997 Internet Draft specification, A Method for Web Robots Control.
How to create a robots.txt file for Blogger
- Go to your Blogger blog.
- Navigate to Settings >> Search Preferences >> Crawlers and indexing >> Custom robots.txt >> Edit >> Yes.
- Now paste your robots.txt code into the box.
- Click on the Save Changes button.
- You are done!
What to put in it
The "robots.txt" file is a text file, with one or more records. Usually contains a single record looking like this:User-agent: * Allow: /search/ Disallow: /search Disallow: /p/sample.html Disallow: /search/label/somelabel Disallow: /2018/10/somepost.htmlIn this example, single part allowed and four part are excluded.
Note the difference between /search/ and /search: "Disallow: /search" blocks every URL that begins with /search, including parameterized URLs such as /search?q=keyword or /search?updated-max, while "Allow: /search/" permits URLs that contain the /search/ path, such as /search/label. Note that you need a separate "Disallow" line for every URL prefix you want to exclude -- you cannot say "Disallow: /search/ /p/sample.html" on a single line. Also, you may not have blank lines in a record, as they are used to delimit multiple records. The '*' in the User-agent field is a special value meaning "any robot".
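To make the prefix behaviour concrete, here is a small, purely illustrative Python check of the example record above, again using urllib.robotparser (note that Python's parser applies the rules in order, first match wins, which can differ slightly from Google's longest-match rule; the label "recipes" is just a made-up example):

import urllib.robotparser

# The example Blogger record from above, parsed locally for illustration
rules = """\
User-agent: *
Allow: /search/
Disallow: /search
Disallow: /p/sample.html
Disallow: /search/label/somelabel
Disallow: /2018/10/somepost.html
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

base = "http://example.blogspot.com"
print(rp.can_fetch("*", base + "/search?q=keyword"))      # False: the /search prefix is disallowed
print(rp.can_fetch("*", base + "/search/label/recipes"))  # True: the /search/ prefix is allowed
print(rp.can_fetch("*", base + "/p/sample.html"))         # False: explicitly disallowed
print(rp.can_fetch("*", base + "/p/welcome.html"))        # True: no rule matches this page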
Here are some other examples:
To exclude all robots from the entire server
User-agent: *
Disallow: /
To allow all robots complete access
User-agent: *
Disallow:

(or just leave the "robots.txt" input field empty)
To exclude a single robot
User-agent: BadBot
Disallow: /
To allow a single robot
User-agent: Google
Disallow:

User-agent: *
Disallow: /
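If you want to double-check this last record, here is a small, purely illustrative test with Python's urllib.robotparser (the page URL is just the example address used earlier in this post):

import urllib.robotparser

# The "allow a single robot" record from above
rules = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

url = "http://example.blogspot.com/p/welcome.html"
print(rp.can_fetch("Google", url))  # True: the Google record allows everything
print(rp.can_fetch("BadBot", url))  # False: every other robot matches "User-agent: *"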
If you want more robots.txt and Google updates, you can check here.