I've written a script in php to scrape the title of a product located at the top right corner in a webpage. The title is visible as Gucci.
when I execute my below script, it gives me an error Notice: Trying to get property 'plaintext' of non-object in C:\xampp\htdocs\runcode\testfile.php on line 16.
How can I get only the name Gucci from that webpage?
I've written so far:
<?php
include "simple_html_dom.php";
$link = "https://www.farfetch.com//bd/shopping/men/gucci-rhyton-web-print-leather-sneaker-item-12964878.aspx";
function get_content($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('User-Agent: Mozilla/5.0',));
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$htmlContent = curl_exec($ch);
curl_close($ch);
$dom = new simple_html_dom();
$dom->load($htmlContent);
$itemTitle = $dom->find('#bannerComponents-Container [itemprop="name"]', 0)->plaintext;
echo "{$itemTitle}";
}
get_content($link);
?>
Btw, the selector I've used within the script is flawless.
To clear the confusion I've copied a chunk of html elements from the page source which neither generats dynamically nor javascript encrypted so I don't find any reason for curl not to be able to handle that:
<div class="cdb2b6" id="bannerComponents-Container">
<p class="_41db0e _527bd9 eda00d" data-tstid="merchandiseTag">New Season</p>
<div class="_1c3e57">
<h1 class="_61cb2e" itemProp="brand" itemscope="" itemType="http://schema.org/Brand">
<a href="/bd/shopping/men/gucci/items.aspx" class="fd9e8e e484bf _4a941d f140b0" data-trk="pp_infobrd" data-tstid="cardInfo-title" itemProp="url" aria-label="Gucci">
<span itemProp="name">Gucci</span>
</a>
</h1>
</div>
</div>
Please check out the below image to recognize the title I've already marked by a pencil. 
from Trouble getting the name of a product from a webpage
No comments:
Post a Comment