I have recently noticed an increase in the number of spam robots hitting various sites, leaving messages eerily similar to a real person's.
The robots don't appear to be targeting any specific site; instead they are configured generically to process almost any type of web form, post a submission, and hope the site displays a referrer link. That referrer link carries the real spam payload: a link to the spammer's web site and the hope of a higher Google ranking.
Here is an example of the message left by one of the robots:
Very interesting and beautiful site. It is a lot of helpful information. Thanks!
Try searching for that exact phrase on Google to see how many web sites have been hit:
The majority of the sites are blogs.
The problem with most blogs is that they leave referrer links, trackbacks, and user comments completely open to automation. Here is the general algorithm the bots appear to be using:
1. Perform a search for a target keyword in a popular search engine.
2. For each resulting web site, perform the following:
3. Crawl the entire site looking for a submission form. Any form will do.
4. Parse every field in the form and create a submission string.
5. Insert the payload (the spam web site to promote) into a likely field, such as a username link field or web site link field, if one can be found.
6. Insert a canned response in the comment field so it appears to be a genuine user comment.
7. Generate an automatic username and email address.
8. Submit using the method specified in the form (GET or POST).
9. Go to step 2 to repeat.
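The per-site loop (steps 3 through 8) can be sketched roughly as follows. This is an illustrative reconstruction, not the bots' actual code: the field-name heuristics, the payload URL, and the fake identity are all assumptions of mine.

```python
from html.parser import HTMLParser

# Sketch of steps 4-7: parse an arbitrary form and build a submission.
# All field-name heuristics below are hypothetical guesses at bot behavior.
class FormParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "GET"
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
            self.method = attrs.get("method", "GET").upper()
        elif tag in ("input", "textarea", "select"):
            name = attrs.get("name")
            if name:
                self.fields.append(name)


def build_submission(html,
                     payload_url="http://spam.example/",       # hypothetical payload
                     comment="Very interesting and beautiful site. "
                             "It is a lot of helpful information. Thanks!"):
    parser = FormParser()
    parser.feed(html)
    data = {}
    for name in parser.fields:
        lowered = name.lower()
        if "url" in lowered or "site" in lowered:        # step 5: payload field
            data[name] = payload_url
        elif "comment" in lowered or "body" in lowered:  # step 6: canned comment
            data[name] = comment
        elif "mail" in lowered:                          # step 7: fake identity
            data[name] = "john42@example.com"
        elif "name" in lowered:
            data[name] = "john42"
        else:
            data[name] = "x"                             # any field will do
    return parser.method, parser.action, data
```

Given a typical comment form, the sketch yields the HTTP method, the action URL, and a field dictionary ready to submit, which is exactly the generic behavior that makes these bots a threat to any form.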
Looking at the algorithm above, which I believe matches their behavior, notice the danger of step 3: any form will do. The spam I have found has appeared on forms that do not even display a referrer link, or any URL at all, such as a plain comment field with no room for a link.
The bots attempt a submission on any and all forms they come across. With networks of compromised zombie PCs, spammers have access to a growing platform from which to launch these attacks, even against harmless submission forms.
The potential result of this? At a minimum, wasted bandwidth for the target web sites. Worse, blog comments littered with seemingly real user comments. And worse still, junk data added to backend databases.
A solution to this problem lies not necessarily in stopping the attacks, but in discouraging them from taking place to begin with.
Think about the benefits for the spammer: a link on thousands of blogs; a higher Google rank, since Google's ranking algorithm weighs which sites link back to yours; easy access to blog comment screens with little or no security mechanism (such as a CAPTCHA); and virtually no cost to perform the submissions.
With the growing abundance of blogs, it's no wonder people want to take advantage of this. As a first step, the major blog platforms need to enable comment security across all of their blogs and turn it on by default. Search engine evolution may also play an important role in curbing this issue.
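One lightweight form of default comment security, a honeypot field, exploits step 4 of the algorithm above: a generic bot fills every field it parses, while a real user never sees a hidden field and leaves it empty. This is a sketch of my own, not something the blog platforms necessarily implement, and the field name is hypothetical:

```python
def is_probable_bot(form_data, honeypot_field="url_confirm"):
    """Flag a submission as automated if the hidden honeypot field was
    filled in. Real users never see the field (it is hidden via CSS),
    so any non-blank value suggests a bot that filled every field.
    The field name "url_confirm" is a hypothetical example."""
    return bool(form_data.get(honeypot_field, "").strip())
```

The appeal of this approach is that it costs the reader nothing, unlike a CAPTCHA, while still raising the cost for a bot that treats every field generically.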