12-19-2011, 02:19 PM
Post: #1
[GUIDE] Scrapebox intermediate tut [GUIDE]
Found this on BHW; hope it helps.

Comments:
The most important feature for making comments work for you is spintax. This feature allows us to use one spintax'd comment to produce hundreds or thousands of different comments. Spintax works as follows:



Code:I {like|love} your {site|website|blog}, what {font|background image|Wordpress theme} do you use?
As you can see, certain words are surrounded by {curly brackets}, and are separated|by|these|pipes. What happens is Scrapebox randomly chooses one of these words and uses it in the comment. This means that this one comment will produce many varieties of output, for example:



Code:I like your blog, what background image do you use?
could be a possible comment from the above spintax.

One word in the curly brackets is randomly selected, and a comment is spun. This is the most vital part of making good, high success rate comments. But there's more!
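If you want to see what's actually happening, here's a rough Python sketch of the spinning logic (the spin() helper is my own illustration of the idea, not Scrapebox's real code):

Code:
import random
import re

def spin(text):
    # Replace every {a|b|c} group with one randomly chosen option.
    pattern = re.compile(r"\{([^{}]*)\}")
    # Keep substituting until no {...} groups remain (handles nesting too).
    while pattern.search(text):
        text = pattern.sub(lambda m: random.choice(m.group(1).split("|")), text)
    return text

comment = "I {like|love} your {site|website|blog}, what {font|background image|Wordpress theme} do you use?"
for _ in range(3):
    print(spin(comment))

Run it a few times and you'll see a different variation printed each time.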

Scrapebox is not a stupid tool; it can and will work in real time with the page it is on to create a unique comment. This can cleverly make the comment seem like it could only have been written by a person! A number of operators will take information from a website and use it within the comment. It's better that you read on to get a good understanding of this; it's hard to explain verbally. Here's the list of the comment operators you can use with Scrapebox:
  • %BLOGTITLE% – Replaced with the page <title> of the blog you’re commenting on.
  • %WEBSITE% – This will be replaced by one of the websites you have loaded in websites.txt (great for adding your links directly into your comments).
  • %BLOGURL% – This will be replaced by the blogs URL you’re commenting on (domain name).
  • %NAME% – This will be replaced by a name from names.txt, or, if you have anchors set up in websites.txt, it will use an anchor from there.
  • %EMAIL% – Will be replaced by one of your emails in the emails.txt file.
(source of this list - http://www.scrapeboxhelp.com/new-pre...en-this-before)

But these will not guarantee your comment looks genuine! They must be used effectively and creatively. Writing a comment using these operators like this:



Code:I love %BLOGTITLE%, it's so great!
Will not be effective! If the title of the page is something like "~~~ John Donahue's Magic blog! and&and Best blog in the Universe and&and - Blog Post #223 ~~~", this comment would seem strange, and noticeably automated:



Code:I love ~~~ John Donahue's Magic blog! and&and Best blog in the Universe and&and - Blog Post #223 ~~~, it's so great!
Instead, use them in a way that sounds intelligent and contributes to the topic.

For example, I could use it in the following way, which makes me sound much more human, and much more like a query rather than a spammy useless post:
Quote:I noticed the title of your blog (%BLOGTITLE%) doesn't seem to be very well optimized for what the blog post is about. Hit me back on %EMAIL% if you need any help with your on-page SEO! Would be glad to help such a great blogger :)
As you can see, I have used the blog's title, which makes me seem like a real person, and I have offered my email. But those are not the most important things - I came across as very human in my post, and I offered a helping hand, which seems beneficial to the author. Even if the blog's title IS well optimized, the comment shows some uncertainty, so at worst it reads as simple ignorance of SEO rather than spam. The comment uses page-precise information and seems human and genuine. These things are vital for successful comments, and with spintax added, this comment would probably have a very high success rate on non-AA blogs, snagging us even more backlinks. Let's move on.
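Under the hood, these operators are just token substitution. Here's a minimal Python sketch of the idea (the fill_macros() helper and the sample values are mine, purely for illustration):

Code:
import random

def fill_macros(template, page_title, blog_url, names, emails, websites):
    # Swap each Scrapebox-style %MACRO% token for a real value.
    replacements = {
        "%BLOGTITLE%": page_title,
        "%BLOGURL%": blog_url,
        "%NAME%": random.choice(names),
        "%EMAIL%": random.choice(emails),
        "%WEBSITE%": random.choice(websites),
    }
    for token, value in replacements.items():
        template = template.replace(token, value)
    return template

comment = ("I noticed the title of your blog (%BLOGTITLE%) doesn't seem to be "
           "very well optimized. Hit me back on %EMAIL% if you need any help!")
print(fill_macros(comment, "Bunny Treats Blog", "bunnies.net",
                  ["Sam"], ["sam@example.com"], ["http://example.com"]))

Combine this with the spintax from earlier and every comment comes out both page-specific and unique.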

More on footprints:
Footprints are the shizzle. I'm not gonna lie, this is where the magic should start: they are your initial way of filtering URLs, and so they catch most of the shit you don't want.

So it is important you know how to use them to your full advantage. One of the first things I realised when using footprints is that they run through Google and the other search engines, and can therefore use their commands. Here are some you should know how to use, plus some advanced ones:
  • filetype: - this allows you to filter by file type. For example, and probably the way I use it most, is to find XML sitemaps. Typing in filetype:xml makes sure all of the returned results are .xml files; I prepend "site:" to all my AA blog domains, then use filetype:xml as my custom footprint to scrape their sitemaps - they will likely list more AA blogs (see the sketch after this list). Talking of "site:"..
  • site: - this only allows results to be those of the domain stated. For instance, "site:google.com" will only return pages from Google.com, eg. "google.com/maps" and "google.com/images".
  • inurl:, intitle: - self explanatory - inurl: returns documents that have that text string in their URL (eg. inurl:"bunny-treats" could return "bunnies.net/bunny-treats" and "bunny-treats.com" as possible results). And intitle: does the same, but for the title - who'd have thought, huh.
  • An extensive list is available here (http://www.googleguide.com/advanced_operators.html), but these operators should get you through most tasks.
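Here's that sitemap trick as a quick Python sketch - it turns a file of AA blog URLs into "site:domain filetype:xml" footprints you can paste straight back into Scrapebox (the file names are my own assumption; use whatever you've exported):

Code:
from urllib.parse import urlparse

with open("aa_blogs.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

# Reduce each URL to its domain; fall back to the raw line for bare domains.
domains = {urlparse(u).netloc or u for u in urls}

with open("sitemap_queries.txt", "w") as out:
    for d in sorted(domains):
        out.write(f"site:{d} filetype:xml\n")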
However, the fun doesn't stop here. There are many more operators to master and use! Utilise these to really fine-tune your searching capabilities!
  • + and - signs - a + makes sure a word must be in the result (eg. +bunnies means documents must contain the word bunnies exactly as typed), and a - sign means pages must definitely not have that word or phrase in them (eg. -"justin beiber" will return results which definitely do not include that exact phrase - who wants him anyway).
  • AND and OR - search terms with AND between them must both appear (so "+carpets AND +removal" will return pages that contain the word carpets and the word removal). OR does.. well, yeah, you get it. They must be in CAPS LOCK.
  • .. - here's one many people do not know - the double dot. When placed between two numbers, any integer value between them will be matched in the search query, eg. using the footprint "2..50 comments" will return posts containing the phrase x comments, where x is between 2 and 50. HINT HINT there btw, it's useful ;)
  • ~ - the wavy thing produces synonyms. So the search term ~courage will find not just websites related to courage, but also to synonyms of the word (eg. valour, bravery, fearlessness, etc).
If you use these intelligently, you can make footprints that are near masterpieces. For instance:



Code:"2..50 comments" -"comments closed" "best buy" AND "coupons"
This footprint would find posts that only have between 2 and 50 comments (low number of comments = low OBL = better link juice), that do not contain the words "comments closed" (ie. more likely to have open comments), and that contain the phrases "best buy" and "coupons". So this would be very targeted towards Best Buy coupon blogs with open comments and fewer than 50 of them. Impressive!

This is not as far as footprints go, though. They have so many uses it's hard to list them all, but I do have a favourite that I'm surprised very few people mention, and it combines the power of relevancy with highly successful comments. By searching for websites about a topic you have written about - let's say Warren Buffett - you can easily produce relevant comments. So what defines a page as being about a certain topic or person? The title, of course, and I'm sure you can see where this is going:
Quote:intitle:"Warren Buffett"
And that's it. Honestly. Now save this, merge it with some business-related keywords and any other buffer footprints you have ("powered by Wordpress", "Powered by Blogengine" etc. are good examples), et voila. You can now tailor a comment around Warren Buffett; for example, "Warren Buffett is the most fantastic businessman of the last 100 years. Great article!". It's relevant and precisely tailored to the pages you have scraped. And that means high approval rates.
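The merging itself is trivial to do outside the tool as well. A small Python sketch of the footprint-times-keywords merge (the keyword list is just my example):

Code:
from itertools import product

footprints = ['intitle:"Warren Buffett"', '"powered by Wordpress"',
              '"Powered by Blogengine"']
keywords = ["investing", "stock tips", "berkshire hathaway"]

# Pair every footprint with every keyword, one finished query per line.
for fp, kw in product(footprints, keywords):
    print(f'{fp} "{kw}"')

Nine queries from three footprints and three keywords; scale the lists up and you've got thousands of targeted searches in seconds.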

How long did that take to think up? 5 seconds? The grand entrance of creative thinking, huh. If you need a footprint, this thread lists all the possible wildcards, parameters, etc. The only thing that prevents you from getting whatever you want with Scrapebox is lack of imagination. If you can't scrape it, you're not thinking hard enough.

Pinging and improving AA lists:
Pinging is great because, let's face it, AA blogs are probably not great examples of well-kept websites themselves. They are openly spammed, have thousands of OBL and usually very few IBL. So it's very predictable that many are not indexed by Google or are not frequently maintained (in the SEO sense of things), and so their value will not be consistent. Scrapebox is here to help.

Pinging is basically telling search engine spiders that a site has been updated - a cry for the spiders to come creeping. Therefore, it's a good idea to do this after you have made the backlinks. There's not much point telling the search engine that the page is updated if you have not been posting comments; what's the value of that to you?

Anyway, it's very basic and is very similar to link checking. Click the "Ping mode" radio button, and ping. This can take a couple of days though, as the search engines can take a while to crawl pages. This is also useful for pinging your own pages.
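If you're curious what a ping actually is: it's a tiny XML-RPC call. A minimal sketch, assuming the standard weblogUpdates protocol and Ping-O-Matic's public endpoint (the endpoint is my assumption; verify it yourself before relying on it):

Code:
import xmlrpc.client

# The classic weblogUpdates.ping: "this page changed, come crawl it".
server = xmlrpc.client.ServerProxy("http://rpc.pingomatic.com/")
result = server.weblogUpdates.ping("Page title",
                                   "http://example.com/page-i-commented-on")
print(result)  # a dict with an error flag and a status message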

Grooming your list:
After you have finally got yourself a nice AA list, and you've posted and pinged for one website, you will more than likely keep this list handy for other uses. But remember, a lot can happen in a small period of time online, and websites that are there one day may be gone the next. Therefore, before you start a new campaign, or before you sell that 100K AA list that you found from a couple months back, it's a good idea to check and groom the list. These are also good grooming techniques to use on any AA list to keep it up to scratch.


The first useful tool is the Alive checker addon. Does what it says on the tin, very easy to use.
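If you'd rather do the same check outside Scrapebox, here's a rough Python equivalent (the file names are my own):

Code:
import urllib.request

def is_alive(url, timeout=10):
    # Treat a 2xx/3xx response as alive; urlopen raises on 4xx/5xx and errors.
    try:
        req = urllib.request.Request(url, method="HEAD",
                                     headers={"User-Agent": "Mozilla/5.0"})
        urllib.request.urlopen(req, timeout=timeout)
        return True
    except Exception:
        return False

with open("aa_list.txt") as f:
    urls = [u.strip() for u in f if u.strip()]
with open("aa_list_alive.txt", "w") as out:
    out.write("\n".join(u for u in urls if is_alive(u)))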

Now you probably know this, but Google are very f****** smart, and they don't favour quick link building. They also do not seem to appreciate a shit torrent of spammed-to-hell blogs backlinking your page - it looks suspicious to them. So it's good to quality-test your backlinks before you blast websites that you value. The best two ways of doing this are to search for blogs that do not have many comments, and to only use high-PR blogs.

There are two ways to search for blogs with a low number of comments: first, there's the footprint that uses the .. operator (eg. 2..50 comments); second, there's the blog size limit. Going to Settings > Slow and Manual Blog Limits will limit the allowed size of the blogs - it may say that it is only for slow and manual posting, but I have definitely noticed a difference with this on fast posting too: the smaller the size stated, the fewer the comments on the blogs I post to (but also the fewer blogs that are successfully posted to, as a side effect of filtering out spammed blogs).

Finding high-PR blogs works very much the way you'd expect. Before I post to the URLs, I check their PageRank; all of those that do not have a PR of 1 or higher, I remove. (HINT: a very nifty way of doing this is to sort the URLs by PR, then use a keyboard shortcut I was taught in school - CTRL + SHIFT + END highlights from the current selection down to the bottom of the page. So highlight the first PR0 blog and press those hotkeys - you have now selected all non-PR blogs; right-click and remove.) I then post, scrape other internal pages of these blogs, and usually drag some more PR blogs out of that.
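And if you've exported your checked URLs, the same PR filter is a few lines of Python. A sketch, assuming a simple two-column url,pr CSV (that layout is my assumption, not a fixed Scrapebox export format):

Code:
import csv

# Keep only URLs whose checked PageRank is 1 or higher.
with open("checked_urls.csv") as f:            # rows like: http://...,3
    rows = [(url, int(pr)) for url, pr in csv.reader(f)]

with open("pr1_plus.txt", "w") as out:
    out.write("\n".join(url for url, pr in rows if pr >= 1))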

These two little snippets produce backlinks with high link juice. Now, many people will tell you that checking for nofollow and d0f0ll0w blogs is important, but I believe the opposite. It looks unnatural to have lots of d0f0ll0w backlinks, so I accept any type of backlink without checking this. However, some wish to, and there's an addon for it. Look into it, but do not blast only d0f0ll0w blogs, please. Google will notice, and they will penalise you for it.

Other Scrapebox uses:
This is really up to you. Scrapebox can be used for anything that involves scraping URLs, and addons have pushed it a bit further (not too far off the course of what the program does, but to some extent). By no means will this be extensive - there is no way I could possibly list the endless uses of this tool, it's designed to be too versatile for that. To give you an idea, I used it no less than 3 days ago to spin different descriptions for Fiverr gigs across the multiple accounts I have, using the commenter. It's all up to imagination.

However, I will give you a couple of examples to show the extensive abilities of the tool. Do not take this as a thorough list though; Google for ideas, search this thread, read read read to really get ideas that are unique for Scrapebox - everybody thinks in different ways, everybody comes up with different ideas. So steal them! ;)

TDNAM and Fake PR checker:
Not a new concept: the TDNAM checker will check expiring domains. These could be aged and carry PR, and if you snatch them up (sometimes for very little), you have yourself a nice website. The Fake PR checker can verify that they are not faking the PR they claim, just as a security measure. Grabbing bargain PR'd domains is not something I expected the tool to be capable of.

Whois addon:
The Whois addon lets you pull the Whois information from harvested domains. Can you see where this is going? Oh yes, data mining webmasters. Their Whois details let you do much more blackhat things, such as attacking competition with "cease and desist" type letters and claims (not something I promote btw, it's pretty dickish, but hey, it's done), or offering them services and advertising. Again, pretty neat! And because you scrape the domains, you can customise whose Whois you are data mining by targeting your audience with custom footprints.

Data mining:
This builds on the above in a more sophisticated way, using another tool. Let's take for example something mentioned before the second edit of this post: somebody asked how they could data mine phone numbers. Well, it's pretty d*** simple, to be frank. Let's say I want UK numbers from the Uxbridge area of London, where numbers start with the dialling code 01895, and all UK numbers have 11 digits. Now this takes a bit of thinking outside the box, but here's a little idea from me. From this thread you know what the .. (double dot) operator does: it matches integers between two values. See where this is going yet? Bear with me..

So we know the numbers must start with 01895, leaving 6 more digits. Therefore the footprint I would use is "1895000000..1895999999" +london (Google treats these as integers, so the leading 0 is dropped); this would match any value between 1895000000 and 1895999999, which covers every number beginning with 01895, so all such numbers would be harvested from the web with this footprint. It would be better, however, to find a good number directory and scrape all its pages, as phone numbers found this way can easily just be random duplicates.
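Once the pages are harvested, pulling the numbers out is a simple regex job. A Python sketch (the input file name is hypothetical - dump your scraped page text into it):

Code:
import re

# 01895 plus 6 more digits, allowing optional spaces or dashes between groups.
pattern = re.compile(r"\b01895[ \-]?\d{3}[ \-]?\d{3}\b")

with open("harvested_pages.txt") as f:
    page_text = f.read()

numbers = sorted(set(pattern.findall(page_text)))  # dedupe the repeats
print("\n".join(numbers))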

The basic gist of all this is that it relies heavily on building a good custom footprint, by finding something generic about the data you're trying to mine. Maybe all websites that sell Vaja iPad leather cases carry the slogan (idk) "The best leather for your iPad" (I'm not going into the slogan business anytime soon), in which case use a custom footprint specifying that pages must contain that text string. You get the idea!

Then, to data mine, we run a data mining program over the URLs we have (I will not recommend one as I don't want to advertise products I don't know, but you can PirateBay yourself a (probably) good one for free). Et voila, data mining with help from Scrapebox.

Whew, that took a long time to write. CTRL + A.. delete.. no no, I think I'd break down. Anyway, I hope this has been useful and that you've got the main point of this thread. This is all intermediate knowledge, because the real advanced knowledge comes from using all of this together and creating advanced algorithms (algorithm = a series of procedures or processes) that harvest exactly what you want. That is the key to Scrapebox. Good luck, and if you find out more, please share - and think outside the box!

..Well, I thought I was done, but I read one last request for a little more on d0f0ll0w blogs and how to find them. I don't personally do this much, but I remember reading some stuff about it, and I'm providing this from research I'm doing right now. So..

D0f0ll0w:
I'm writing this mostly because I like typing 0s instead of os, and not for your benefit, please remember that. So expect overuse of the word d0f0ll0w.

Scrapebox searches through HTML as well as page text when used in the linkchecker mode, so it finds stuff we don't see. Check the page source of nofollow blogs, and you'll see many have this in their code:



HTML Code:rel="nofollow"
or
rel='nofollow'
So all we have to do is load that large list of harvested URLs into the link checker, and check for those two strings in the web pages. We remove those that have these, and voila, we have d0f0ll0w blogs. Some will have other ways of making their pages nofollow, so this list is not 100% d0f0ll0w.
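The same check is easy to script if you want it outside Scrapebox. A rough Python sketch (file names are mine; like the linkchecker approach above, it only catches the two literal spellings):

Code:
import urllib.request

def looks_dofollow(url):
    # Fetch the page and look for either spelling of the nofollow attribute.
    try:
        req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
        html = urllib.request.urlopen(req, timeout=10).read().decode("utf-8", "ignore")
    except Exception:
        return False  # unreachable pages are no use to us either way
    return 'rel="nofollow"' not in html and "rel='nofollow'" not in html

with open("aa_list.txt") as f:
    urls = [u.strip() for u in f if u.strip()]
with open("dofollow_list.txt", "w") as out:
    out.write("\n".join(u for u in urls if looks_dofollow(u)))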

Another key way to find d0f0ll0w blogs is with your first line of defence against shit you don't want: custom footprints. Many d0f0ll0w Wordpress blogs run plugins like KeywordLuv and CommentLuv, which leave behind traces that the blogs use them. There are many threads about this throughout the forum, and SweetFunny herself has already offered her own ideas, but from what I've Googled, most suggest this as a footprint for KeywordLuv:



Code:
"This site uses KeywordLuv. Enter YourName@YourKeywords in the Name field to take advantage."

There's your footprint! Have fun ladies and gents :)
01-13-2013, 06:04 PM
Post: #2
RE:
many thanks 4 share
Quote:
+rep if thread Help You
01-13-2013, 07:58 PM
Post: #3
RE:
thanks for sharing..
01-14-2013, 10:20 AM
Post: #4
RE:
From BHW...anyways nice share buddy
02-01-2013, 01:04 AM
Post: #5
RE:
Hi, I read your topic its so good and amazing, this topic very helpful and useful for me, thanks for sharing.
02-06-2013, 04:38 PM
Post: #6
RE:
interesting thread to follow with good explanation.. i may give it a try..



