
Robots Spider


Robots Spider reads your site's robots.txt file and then crawls the site to check for client-side references to the robots.txt entries. This lets you identify directories listed in robots.txt that aren't referenced by any local page, so you can avoid disclosing directories you don't want indexed. Search engines read the site and build their indexes accordingly: directories and files that aren't referenced anywhere on the site shouldn't be indexed by spiders, and their presence in the robots.txt file may be a security risk.
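The cross-check Robots Spider performs can be sketched in a few lines. This is a minimal illustration, assuming the robots.txt text and the site's HTML have already been fetched as strings; the helper names are mine, not the tool's:

```python
# Sketch of the cross-check: which Disallow entries are never
# referenced by the site's own pages?
def disallowed_paths(robots_txt):
    """Extract the paths listed on Disallow: lines."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths

def unreferenced(robots_txt, page_html):
    """Paths in robots.txt that no local page links to --
    these disclose directory names for no good reason."""
    return [p for p in disallowed_paths(robots_txt)
            if p not in page_html]

robots = "User-agent: *\nDisallow: /secret-admin\nDisallow: /blog"
html = '<a href="/blog">Blog</a>'
print(unreferenced(robots, html))  # ['/secret-admin']
```

Here /blog is linked from the page, so only /secret-admin is flagged as an entry that exists solely in robots.txt.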

I’ve recently needed to create a better robots.txt for my WordPress blog and thought I’d contribute something back for those who need help doing the same. I’m by no means an expert; this is just information I’ve researched about how to make a better robots.txt for a WordPress blog.

How To Create the Best robots.txt

Creating a better robots.txt for a WordPress blog is important because, without one, even your unique, self-written articles can appear to be duplicate content due to the way WordPress is built. Additionally, there are areas of the WordPress installation where Googlebot need not look, e.g. /wp-admin.

By making an organised robots.txt ourselves we can improve the efficiency of the crawler across our site and that helps improve SEO.

I know there are plug-ins for creating robots.txt, but the ones I looked at were either messy or no real help: you still needed to sort things out yourself. Creating a simple, tidy robots.txt from scratch seems to be the best way.

It’s also quick and easy.

So what should be included / excluded?

    User-agent: *
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /trackback
    Disallow: /comments
    Disallow: /category/*/*
    Disallow: */trackback
    Disallow: */comments
    Allow: /wp-content/uploads

Here we are saying that all bots (user-agents) are allowed to index the blog. We then disallow every folder that makes up the WordPress installation, except the uploads folder. We allow that one because it contains uploaded files such as images and videos.
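As a quick sanity check, the plain prefix rules above can be tested with Python's standard-library robots.txt parser. (Note that urllib.robotparser does not understand the * and $ wildcards, so only the prefix rules are exercised here; the file below is a shortened copy of the block above.)

```python
# Verify that the prefix rules block the WordPress internals
# but still allow the uploads folder.
import urllib.robotparser

rules = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Allow: /wp-content/uploads
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/wp-admin/options.php"))        # False
print(rp.can_fetch("*", "/wp-content/plugins/x.php"))    # False
print(rp.can_fetch("*", "/wp-content/uploads/pic.jpg"))  # True
print(rp.can_fetch("*", "/2023/05/my-post/"))            # True
```

Ordinary post URLs match no Disallow rule and remain crawlable, which is exactly what we want.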

    Disallow: /*?*
    Disallow: /*?

This part disallows all URLs containing a ? character.

Be careful with this one: you need to have changed your permalink structure in WordPress first. It’s best to do this anyway, as it also helps with SEO. If you have left permalinks at the default setting, the generated URLs will contain a ? when you click on your articles, categories, etc., so in that case you should not use this part.

I prefer to setup a custom structure in WordPress:

/%postname%/

Back to the robots.txt!

    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.wmv$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$

Here we disallow files with extensions such as .php and .js. You can alter this list to suit yourself; the above is a good starting point.
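Because the standard-library parser ignores them, the wildcard rules (the ? rule earlier and the extension rules above) need Googlebot-style matching to test: * matches any run of characters and $ anchors the end of the URL. A minimal sketch of that matching, under those assumptions:

```python
# Googlebot-style wildcard matching for robots.txt patterns:
# translate * to .* and $ to an end anchor, escape everything else.
import re

def rule_matches(pattern, path):
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

print(rule_matches("/*.php$", "/index.php"))        # True
print(rule_matches("/*.php$", "/index.php?p=1"))    # False ($ anchors the end)
print(rule_matches("/*?*", "/page/?replytocom=5"))  # True
print(rule_matches("/*.css$", "/style.css"))        # True
```

So /index.php?p=1 escapes the .php$ rule (it doesn't end in .php) but is caught by the earlier /*?* rule instead.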

Allow the Google image bot to crawl all images:

    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

Allow the Google AdSense bot on the entire site:

    User-agent: Mediapartners-Google
    Disallow:
    Allow: /*

Above we have now allowed the Google image bot and the Google AdSense bot to access everything.

Additionally, if you use the XML-Sitemap plugin for WordPress, you can add this at the end:

    #BEGIN XML-SITEMAP-PLUGIN
    Sitemap: http://yourdomain.com/sitemap.xml.gz
    # END XML-SITEMAP-PLUGIN

This tells search engine crawlers where your sitemap is.
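Crawler discovery of the sitemap works by scanning for Sitemap: lines, which can be sketched like this (the example URL is a placeholder, not a real site):

```python
# Pull Sitemap: directives out of a robots.txt, the same way a
# crawler discovers where the sitemap lives.
def sitemap_urls(robots_txt):
    urls = []
    for line in robots_txt.splitlines():
        if line.strip().lower().startswith("sitemap:"):
            # maxsplit=1 keeps the colons inside the URL intact
            urls.append(line.split(":", 1)[1].strip())
    return urls

robots = ("#BEGIN XML-SITEMAP-PLUGIN\n"
          "Sitemap: http://example.com/sitemap.xml.gz\n"
          "# END XML-SITEMAP-PLUGIN")
print(sitemap_urls(robots))  # ['http://example.com/sitemap.xml.gz']
```

Note that Sitemap: is independent of any User-agent block, which is why it can simply sit at the end of the file.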

Pingates is a service that pings (notifies) a number of services that keep track of weblogs and publish them. By pinging, you let those services know that your blog has been updated; they then crawl and index your site and publish your blog contents, increasing your blog's visibility.
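Under the hood these pings are XML-RPC calls, conventionally to a method named weblogUpdates.ping carrying the blog's name and URL. A hedged sketch of the payload such a ping carries, built with Python's standard library and without making any network call (the blog name and URL are placeholders):

```python
# Build (but do not send) the XML-RPC request body for a
# weblogUpdates.ping call, as ping services use it.
import xmlrpc.client

payload = xmlrpc.client.dumps(
    ("My Blog", "http://example.com/"),   # blog name, blog URL
    methodname="weblogUpdates.ping",
)
print("weblogUpdates.ping" in payload)  # True
```

Sending it would just be a ServerProxy call against the ping service's endpoint; services like Pingates fan the same notification out to many endpoints for you.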
