Answer
Mar 02, 2013 - 02:34 AM
You can't. PHP code isn't capable of performing "actions" on web pages. PHP doesn't even know there's a button there to press. As far as PHP is concerned, the site you're screen scraping is just one really big string. The answerer above is on the right track mentioning cURL, but that's a really vague answer.
What you need to do is not scrape the page with the form on it, but scrape the page that the form submits the data to. Let's say the form uses the GET method to submit data. This means the variables of the form end up in the URL. You know the URLs that look like
www.site.com?var1=value&var2=anotherva…
and everything after the ? is called the querystring. Being a PHP programmer you likely already know all about that.
But some forms submit data via the POST method. That is, the variables are sent in the body of the HTTP request, rather than in the URL. Typically, this is done when you need to submit large amounts of data, or binary data. The convention is, essentially: if the variables only affect what is displayed to the user (e.g. a search term, a page number, etc.), the form should use the GET method. If the form performs an action (e.g. manipulating data in a database, or uploading a file) or contains sensitive data, then it should use POST.
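To make the difference concrete, here is a rough sketch of what the two kinds of request look like on the wire. The host www.example.com, the path /page.php and the field names are made up for illustration; only the position of the variables matters.

```php
<?php
// Hypothetical form fields, used for illustration only.
$fields = ['var1' => 'value', 'var2' => 'another value'];

// http_build_query() URL-encodes the pairs into a querystring-style string.
$encoded = http_build_query($fields);

// GET: the variables travel in the URL, on the request line.
$getRequest = "GET /page.php?" . $encoded . " HTTP/1.1\r\n"
    . "Host: www.example.com\r\n"
    . "\r\n";

// POST: the same variables travel in the request body,
// after the blank line that ends the headers.
$postRequest = "POST /page.php HTTP/1.1\r\n"
    . "Host: www.example.com\r\n"
    . "Content-Type: application/x-www-form-urlencoded\r\n"
    . "Content-Length: " . strlen($encoded) . "\r\n"
    . "\r\n"
    . $encoded;

echo $getRequest, "\n", $postRequest, "\n";
```

Notice that the encoded string is identical in both cases; it just moves from the URL to the body.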
Now for screen scraping, the GET method is the easy case: scrape the page that results from a querystring you construct yourself. E.g. if you want to scrape the page Google shows when you search for the term "puppies", have your scraper scrape the page http://www.google.com/search?q=puppies
This works for almost any page you need to scrape, as long as the site follows the convention of using GET whenever the submitted values only change the page content (e.g. pretty much any search engine, including eBay search).
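A minimal sketch of the GET case, assuming allow_url_fopen is enabled so that file_get_contents() can read URLs. The function names buildSearchUrl() and fetchPage() are my own, not part of PHP.

```php
<?php
// Build the URL a GET form would produce for the given fields.
// (Hypothetical helper name, for illustration.)
function buildSearchUrl($baseUrl, array $params)
{
    return $baseUrl . '?' . http_build_query($params);
}

// Fetch the page as one big string, ready for scraping.
// Assumes allow_url_fopen is on; throws if the request fails.
function fetchPage($url)
{
    $html = file_get_contents($url);
    if ($html === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $html;
}

$url = buildSearchUrl('http://www.google.com/search', ['q' => 'puppies']);
echo $url, "\n";
```

Calling fetchPage($url) would then give you the HTML of the results page to run your scraping code against.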
But if the page is only accessible by submitting POST data (e.g. it can only be reached by logging in via a POSTed form), then you need to construct an HTTP request that includes the variables (e.g. in the case of a login page, the username and password). And this is where cURL comes in: whilst it's possible to build this request in pure PHP code, the cURL extension (a wrapper around the libcurl library) makes it easier and cleaner to do so, and is the most common way to do it.
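Something along these lines should work for the POST case. Treat it as a sketch, not a drop-in solution: the login URL, the field names username and password, and the helper names buildPostOptions() and postForm() are all assumptions; check the real form's action attribute and input names. The cookie file is there because a login usually sets a session cookie that later requests need to send back.

```php
<?php
// Assemble the cURL options for POSTing form fields.
// (Hypothetical helper; $cookieFile is any writable path.)
function buildPostOptions($url, array $fields, $cookieFile)
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // A urlencoded string is sent as application/x-www-form-urlencoded.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        // Return the response as a string instead of printing it.
        CURLOPT_RETURNTRANSFER => true,
        // Follow the redirect many sites issue after a login.
        CURLOPT_FOLLOWLOCATION => true,
        // Save and resend cookies so the session survives later requests.
        CURLOPT_COOKIEJAR      => $cookieFile,
        CURLOPT_COOKIEFILE     => $cookieFile,
    ];
}

// Send the POST and return the resulting page as a string.
function postForm($url, array $fields, $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildPostOptions($url, $fields, $cookieFile));
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    return $html;
}

// Example values only; nothing is sent until postForm() is called.
$options = buildPostOptions(
    'http://www.example.com/login.php',
    ['username' => 'alice', 'password' => 'secret'],
    sys_get_temp_dir() . '/scraper_cookies.txt'
);
echo $options[CURLOPT_POSTFIELDS], "\n";
```

After a successful postForm() call, you would request the protected page with another cURL handle that uses the same cookie file, and scrape that response.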
Source: http://www.botguruz.com/products