11-06-2012, 08:23 PM
Hi everyone,
I wrote a simple script that scrapes the yell.com (UK Yellow Pages) website, pulls out various business details and saves them to a text file for future reference. It's useful if you want to keep a record of local businesses without having to keep going back to the website.
How to use:
1. Upload the yell_scraper folder to your server
2. Browse to yourdomain.com/yell_scraper/index.php?term=BUSINESSTYPE&location=AREATOSEARCH
3. The script will automatically move on to the next page of results every 20 seconds and write all of the business details to the scrape.txt file. In this version you will need to stop the script manually, otherwise it will just keep going.
4. What I like to do then is rename the scrape.txt file to something I'll remember, then create a new, blank scrape.txt ready for another search.
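If you'd rather not type the URL or shuffle files around by hand, steps 2 and 4 could be sketched roughly like this. It's Python rather than the script's own PHP and isn't part of the download. The helper names, the http:// scheme and the timestamped file name are all assumptions made for illustration; only index.php, the term and location parameters, and scrape.txt come from the steps above.

```python
from datetime import datetime
from pathlib import Path
from urllib.parse import urlencode


def build_scraper_url(domain, term, location):
    """Build the step 2 URL with both search values URL-encoded.

    Hypothetical helper; the http:// scheme is an assumption.
    """
    query = urlencode({"term": term, "location": location})
    return f"http://{domain}/yell_scraper/index.php?{query}"


def archive_scrape_file(folder, label):
    """Rename scrape.txt to <label>_<timestamp>.txt, then create a blank scrape.txt.

    Hypothetical helper mirroring step 4; the naming scheme is an assumption.
    """
    folder = Path(folder)
    current = folder / "scrape.txt"
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    archived = folder / f"{label}_{stamp}.txt"
    if current.exists():
        current.rename(archived)
    current.touch()  # new, blank file for the next search
    return archived
```

Encoding matters as soon as a search has a space in it: a term like "hair dressers" comes out as term=hair+dressers instead of breaking the URL.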
I wouldn't recommend scraping too many pages in a short space of time, as I got temporarily blocked from the website for 2-3 days. If you take it slow you should be safe. You'll need cURL enabled on your server and JavaScript enabled in your browser.
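For anyone writing their own loop, the "take it slow" idea can be sketched as a small pacing helper. This is Python, not the script's PHP, and the function name is made up for the example. The 20 second default is just the interval the script already uses; nothing here guarantees that it is slow enough to avoid a block.

```python
import time


def paced(items, delay_seconds=20, sleep=time.sleep):
    """Yield items one at a time, pausing between them (not before the first).

    Hypothetical helper; `sleep` is injectable so the pause can be faked in tests.
    """
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay_seconds)
        yield item
```

Usage would look like `for page in paced(range(1, 6)): ...`, which handles five pages with four 20 second pauses in between.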
When I get time I'll probably update the script to stop automatically when there are no more results and to automate more of the process where possible.
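One way an automatic stop could work, as a rough Python sketch rather than the script's actual PHP: keep asking for the next page until one comes back empty. The fetch_page callable and the max_pages safety cap are assumptions for the example; how an empty results page really looks on yell.com isn't covered here.

```python
def scrape_until_empty(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... and stop at the first empty page.

    fetch_page is a hypothetical callable that returns a list of business
    entries for a page number. max_pages is a safety cap (an assumption) so
    the loop still ends if an empty page never shows up.
    """
    collected = []
    for page_number in range(1, max_pages + 1):
        results = fetch_page(page_number)
        if not results:
            break
        collected.extend(results)
    return collected
```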
Enjoy!
Magic Button :
Code:
http://www.uploadseeds.com/download.php?uid=XA72ALWN
Virus Total:
Code:
https://www.virustotal.com/file/56038b793c07556eaaf5816ffbbafd60c16e647d1090d5cb3fdccff76aa49246/analysis/1352196559/