07-25-2013, 02:08 AM
PHP Web Crawler is a program that searches for
links on the web. It stores the links and some extra data in a database
and displays them as HTML output.
Features:
- The crawler can be run as multiple instances.
- It can be run by a cron job.
- Crawl results are saved in a MySQL database. It creates the table "urls" to store the crawled links.
- For each link it saves the source URL, the destination URL and the anchor text.
- It validates URLs with a regular expression and skips links to static
data within the site, including unnecessary media files. Even so, I can't
guarantee that the crawler avoids all media files. That would be more
complex to validate.
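
To illustrate the URL validation described above, here is a minimal sketch of a regex-based filter. It is not taken from PHP Web Crawler itself: the function name isCrawlableUrl and the list of file extensions are assumptions made for this example.

```php
<?php
// Hypothetical helper, not the crawler's actual code.
// Returns true when a link looks like a crawlable page rather than a static file.
function isCrawlableUrl(string $url): bool
{
    // Accept only absolute http(s) URLs that have a host part.
    if (!preg_match('~^https?://[^\s/?#]+(?:[/?#]\S*)?$~i', $url)) {
        return false;
    }

    // Inspect the path only, so a query string cannot hide the extension.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return true; // no path at all, e.g. "http://example.com"
    }

    // Assumed list of static and media extensions to skip.
    $static = '~\.(?:jpe?g|png|gif|ico|svg|css|js|pdf|zip|mp3|mp4|avi)$~i';
    return !preg_match($static, $path);
}
```

A filter like this only looks at the file extension, so media served from URLs without a recognizable extension would still get through, which matches the limitation mentioned above.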