Best Blackhat Forum

[GET] yell.com business scraper script
Hi everyone,

I wrote a simple script that scrapes the yell.com (UK Yellow Pages) website, pulls out business details and writes them to a text file for future reference. It's useful if you want to keep a record of local businesses without having to keep going back to the website.

How to use:

1. Upload the yell_scraper folder to your server

2. Browse to yourdomain.com/yell_scraper/index.php?term=BUSINESSTYPE&location=AREATOSEARCH

3. The script will automatically step through the result pages, one every 20 seconds, and write all of the business details to the scrape.txt file. In this version you will need to stop the script manually, otherwise it will just keep going.

4. What I like to do then is rename the scrape.txt file to something I'll remember, then create a new, blank scrape.txt for the next search.
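
Steps 2 and 4 above can be sketched roughly as follows. This is only an illustration in Python, not the script's own code: the function names build_scraper_url and archive_scrape_file are made up for this example, and the base URL is just the placeholder from step 2.

```python
import os
from urllib.parse import urlencode


def build_scraper_url(base, term, location):
    """Return the step 2 URL with term and location safely URL-encoded."""
    return base + "?" + urlencode({"term": term, "location": location})


def archive_scrape_file(directory, new_name, filename="scrape.txt"):
    """Rename scrape.txt to new_name and leave a new, blank scrape.txt behind."""
    src = os.path.join(directory, filename)
    dst = os.path.join(directory, new_name)
    if os.path.exists(dst):
        # Refuse to overwrite an earlier saved search.
        raise FileExistsError(dst)
    os.replace(src, dst)
    # Create the empty file for the next search.
    with open(src, "w"):
        pass
    return dst
```

For example, build_scraper_url("http://yourdomain.com/yell_scraper/index.php", "plumbers", "leeds") gives the address to open in the browser, and archive_scrape_file("yell_scraper", "plumbers_leeds.txt") would be run once that search is finished.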

I wouldn't recommend scraping too many pages in a short space of time: I got temporarily blocked from the website for 2-3 days. If you take it slow you should be safe. You'll need cURL enabled on your server and JavaScript enabled in your browser.
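
A minimal sketch of the "take it slow" idea, in Python rather than the script's PHP and jQuery. The helper name paced_pages and the fetch_page callback are hypothetical. The 20 second default mirrors the wait described in step 3, and the 19 page cap is only the rule of thumb suggested in a reply further down this thread, not a documented limit of the site.

```python
import time


def paced_pages(fetch_page, first_page=1, max_pages=19,
                delay_seconds=20, sleep=time.sleep):
    """Yield (page_number, content) for up to max_pages pages.

    fetch_page is a caller-supplied function (hypothetical here) that
    takes a page number and returns that page's content.
    """
    for offset in range(max_pages):
        if offset > 0:
            # Pause between requests, never before the first one.
            sleep(delay_seconds)
        page = first_page + offset
        yield page, fetch_page(page)
```

Passing sleep in as a parameter keeps the pacing easy to check without actually waiting; in normal use the default time.sleep applies.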

When I get time I'll probably update the script to stop automatically when there are no more results and make it more automated where possible.
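
One way the automatic stop could look, sketched in Python under stated assumptions: fetch_results is a hypothetical callback that returns the list of already-parsed business lines for a page and an empty list once the results run out, and write_line is whatever appends a line to scrape.txt. Neither name comes from the actual script.

```python
import time


def scrape_until_empty(fetch_results, write_line,
                       delay_seconds=20, sleep=time.sleep):
    """Walk the result pages until one comes back empty.

    Returns the total number of lines written.
    """
    page = 1
    total = 0
    while True:
        results = fetch_results(page)
        if not results:
            # No more results: stop instead of looping forever.
            break
        for line in results:
            write_line(line)
        total += len(results)
        page += 1
        # Wait before requesting the next page.
        sleep(delay_seconds)
    return total
```

The loop keeps the same delay between pages and simply ends on the first empty page, so nobody has to stop it by hand.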

Enjoy!

Magic Button:
Code:
http://www.uploadseeds.com/download.php?uid=XA72ALWN

Virus Total:

Code:
https://www.virustotal.com/file/56038b793c07556eaaf5816ffbbafd60c16e647d1090d5cb3fdccff76aa49246/analysis/1352196559/
Dave, superb coding. If you did this all yourself, WELL DONE!

Maybe a timed stop or a cURL pause in seconds, as you mention above, would be great. If you could add rotating proxies, it would be super cool, buddy!

> On the banning side of things with Yell, just stick to 19 pages per hit, per location, and it should be fine... until it gets some timing/proxy add-ons :)
(11-06-2012 09:41 PM) supercharger wrote:
Dave, superb coding. If you did this all yourself, WELL DONE!

Maybe a timed stop or a cURL pause in seconds, as you mention above, would be great. If you could add rotating proxies, it would be super cool, buddy!

> On the banning side of things with Yell, just stick to 19 pages per hit, per location, and it should be fine... until it gets some timing/proxy add-ons :)


Hi Supercharger, thanks for your feedback :) It's the first proper script I've ever written; I usually just design websites for a living. I tried using yellabot and it wouldn't work, so I just wanted something simple to collect the data for mailing etc.

I think I got blocked before because I used the script before I implemented the 20 second wait in jQuery, so my IP was accessing the data too fast. It hasn't been a problem since the wait was put in place. I'll have a look into rotating proxies at some point; it's not something I've done before.
Thanks for sharing. I will try it.
Very nice share. Will come in handy when I am emailing the companies for my hosting services. $5 a month unlimited.
Could you possibly make one for yp.com (the American version of Yellow Pages)?
All the links are dead. Can someone please share this?
Re-up please!!
Re-up please