Protect Your Content
1. Set up a bad-bot-banning script on your site.
There's one offered in this forum:
This is to bar leechers. First, use 'robots.txt' to disallow a subdirectory. Then wait about two weeks, so that legitimate bots have had time to read your 'robots.txt' file, before you set up the bad-bot script.
The idea is that any bot that accesses a subdirectory you have disallowed is a bad bot. It's just hoovering up your data. It's not from a legit search engine. So it's probably a competitor, or a leech.
The script rewrites your .htaccess file to forbid the bot access to your site altogether. Many webmasters also ban such bots to save bandwidth.
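The forum script itself is not reproduced here. As a rough sketch of the kind of entries such a script might append to .htaccess, assuming Apache 2.4 (the 'Require' syntax) with mod_setenvif available; the IP addresses and the 'getout' variable name are placeholders, not taken from the original script:

```apache
# Entries a bad-bot script might append, one line per trapped client.
# The addresses are documentation placeholders (hypothetical).
SetEnvIf Remote_Addr "^192\.0\.2\.15$" getout
SetEnvIf Remote_Addr "^198\.51\.100\.7$" getout

# Allow everyone except the clients flagged above.
<RequireAll>
    Require all granted
    Require not env getout
</RequireAll>
```

Any client whose address matches one of the SetEnvIf lines gets a 403 Forbidden for the whole site, not just for the trap directory.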
Furthermore, you can ban individual download programs by their HTTP_USER_AGENT environment variable:
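A minimal sketch, assuming mod_rewrite is enabled. The four names are only examples of well-known site downloaders; the original list is not shown here, so treat this one as illustrative:

```apache
RewriteEngine On
# Match example downloader names anywhere in the User-Agent header.
RewriteCond %{HTTP_USER_AGENT} (HTTrack|WebCopier|WebZIP|Wget) [NC]
# Answer matching requests with 403 Forbidden.
RewriteRule ^ - [F]
```

Bear in mind that a user agent string is easy to fake, so this only stops the lazier leechers.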
2. Forbid visitors from Russia, China, Romania etc. using .htaccess.
This is to bar countries that are more likely to try leeching, or other jiggery-pokery:
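The original block list is not shown here. A sketch of its general shape, assuming Apache 2.4; the CIDR ranges below are documentation placeholders, not real country allocations, and a real list would be generated from a country-to-IP database:

```apache
# Placeholder ranges only (hypothetical); substitute real country ranges.
<RequireAll>
    Require all granted
    Require not ip 192.0.2.0/24
    Require not ip 198.51.100.0/24
    Require not ip 203.0.113.0/24
</RequireAll>
```

If your .htaccess already contains a <RequireAll> block, it is probably safer to add the 'Require not' lines to that block than to create a second one, since Apache treats sibling blocks as alternatives and passing either one would grant access.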
Add or delete countries according to your needs by downloading this file:
You can use the database within to sort countries by continent, and build up a list.
Some Third-World countries are noted for hacking and fraud. You may think you want the whole world to come to your site, but you don't. Most of the world is poor, and $30.00 USD is a lot of money. It's not that they wouldn't _like_ to buy your goods; they just can't afford to. They are more likely to be tyre-kickers than customers.
3. Use absolute URLs in your internal links.
This is to have lots of links to your main site in any HTML copied from you. Plaster your absolute URL, phone number, email address, AND absolute IMG URLs all over your pages. Then bar offsite image hot-linking, using .htaccess. You can use .htaccess mod_rewrite to replace hot-linked images with an image of your choice.
This makes it less worthwhile to copy your content. The copy ends up with a lot of broken images and big ads for your site.
4. Bar offsite image hotlinking by using .htaccess mod_rewrite:
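A minimal sketch, assuming mod_rewrite is enabled. 'yourdomain.com' and the replacement image path '/images/hotlink.png' are placeholders:

```apache
RewriteEngine On
# Let through requests with no referrer (direct visits, some proxies).
RewriteCond %{HTTP_REFERER} !^$
# Let through requests that come from your own pages.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?yourdomain\.com/ [NC]
# Never rewrite the replacement image itself, or the rule would loop.
RewriteCond %{REQUEST_URI} !^/images/hotlink\.png$
# Serve the image of your choice instead of the hot-linked one.
RewriteRule \.(gif|jpe?g|png)$ /images/hotlink.png [NC,L]
```

To refuse hot-linked images outright instead of swapping them, the last line could be replaced with 'RewriteRule \.(gif|jpe?g|png)$ - [NC,F]'.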
5. Have a script generate a small amount of random content in the HTML of each page.
Find a random quote script at http://www.hotscripts.com. Get one that can be called using Server Side Includes (for static HTML pages), or a PHP one that you can insert in the footer file of your PHP site. Replace the included quotes with some original ones of your own.
This will make your page seem to be the latest, updated version of the content _if_ you have a higher Google PageRank than your plagiarist, _or_ Google indexes your content before theirs. Or both!
6. If copyists use AdSense, report them to Google via the Google AdWords link in each ad.
This is to rob them of the reason to rip off your content. Google will (hopefully) terminate their account.
7. Add a frame-breaking script to your pages.
This is to break a stolen page out of any page framing it. Put this code anywhere after the <BODY> tag in your web page.
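The frame-breaking script itself is not shown here. As a server-side complement (a different technique from the in-page script), and assuming mod_headers is enabled, your own server can ask browsers not to display your pages inside frames on other sites:

```apache
<IfModule mod_headers.c>
    # Older header, widely supported by browsers.
    Header always set X-Frame-Options "SAMEORIGIN"
    # Newer equivalent. 'set' replaces an existing header of this name,
    # so merge by hand if you already send a Content-Security-Policy.
    Header always set Content-Security-Policy "frame-ancestors 'self'"
</IfModule>
```

This only covers pages served from your own site. A copy of your HTML hosted elsewhere is not affected, which is where a script inside the page still earns its keep.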
8. Use Google Alerts to email you when your domain name turns up on a site they spider.
Google Alerts: http://www.google.com/alerts. Once I got an email about a site containing a URL of mine. Went to the site. Nothing but Google AdSense; no other text at all. The Google Alert email showed text I recognised as being mine. No sign of it in the HTML of the 'linking' site. Looked suspect. Ratted them out to Google AdSense.
9. Type this into the Google search box:
inurl:www.yoursite.com -site:www.yoursite.com
... to find possible plagiarists.
This command looks for sites with 'www.yoursite.com' in the URL, excluding your own. It will show sites using redirect scripts. Many are harmless, some are not.
Some sites use an HTTP 302 (temporary) redirect to usurp your search engine results ranking. Google sometimes sees their site as the originator. It helps the plagiarists if they have a higher PageRank than you. Because Google is searching through its own cache of the competitor site, you can't block these redirects from your own site.
If the results from the query above contain URLs with script names like 'nph-proxy.cgi' in them, you may have a problem. However, Google seems to be getting better at pushing these sites into its supplemental listings.
The trick seems to be who gets spidered first, and who has higher PR. So get working on getting more links!
10. Put "noarchive" meta tags in all your pages.
This stops your content being archived by legitimate bots. Put the following code between the <HEAD> tags in your HTML document:
<meta NAME="robots" CONTENT="index,follow,noarchive">
It stops plagiarists using search engine caches of your content.
11. Use a Redirect 301 in your .htaccess file to redirect from yourdomain.com to www.yourdomain.com (or the reverse, whichever is the way you most commonly write your URLs).
Paste the following code into your .htaccess file, replacing 'yourdomain.com' with your domain name:
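A minimal sketch, assuming mod_rewrite is enabled and that the 'www' form is the one you want to keep:

```apache
RewriteEngine On
# Match requests for the bare domain only.
RewriteCond %{HTTP_HOST} ^yourdomain\.com$ [NC]
# Send a permanent (301) redirect to the same path on the www host.
# Use https:// here instead if your site is served over HTTPS.
RewriteRule ^(.*)$ http://www.yourdomain.com/$1 [R=301,L]
```

For the reverse direction, match '^www\.yourdomain\.com$' in the condition and redirect to 'http://yourdomain.com/$1'.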
This stops search engines finding 'duplicate' copies of your site, and hammering you with a duplicate content penalty.
12. http://www.CopyScape.com can help find content thieves.
I wouldn't debate the matter with plagiarists. Give them 48 hours to remove your content, then inform their ISP. Accept no excuses, like "My web developer did it". That's the oldest get-out whine in the book: "Someone else did it, it wasn't me!".
Save your email correspondence as templates for future use.
13. You could also contact the plagiarists' advertisers.
For example: "Your ad is displayed on this page here blahdeblah.com/copypage.htm, check out my page here mysite.com/original.htm."
14. Slap them with a DMCA takedown notice.
A working example: http://www.google.com/dmca.html. DMCA means Digital Millennium Copyright Act. Relax, this is relatively easy. It's a formal way of telling web sites and search engines who the real owner of the content is, and getting them to remove plagiarism.
15. Contact their web host.
Most web hosts won't tolerate plagiarism; they don't want lawsuits. State your case simply, with examples, in an email to the technical support staff.
16. Encrypt your web page source code.
You can't really have your cake and eat it, or display a web-page on the internet _and_ bar all copying, but you can frustrate it.
Individually these tricks wouldn't do much. Together, however, they will 'harden' your site.