I've written a script in PHP to scrape a title, shown as "hair fall shamboo", from a webpage. When I run the script below, I get the following error:
Notice: Trying to get property 'nodeValue' of non-object in C:\xampp\htdocs\runcode\testfile.php on line 16.
The script I've tried:
<?php
function get_content($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // Execute the request once and capture the response body.
    $htmlContent = curl_exec($ch);
    curl_close($ch);
    return $htmlContent;
}
$link = "https://www.purplle.com/search?q=hair%20fall%20shamboo";
$xml = get_content($link);
$dom = @DOMDocument::loadHTML($xml);
$xpath = new DOMXPath($dom);
$title = $xpath->query('//h1[@class="br-hdng"]/span')->item(0)->nodeValue;
echo "{$title}";
?>
My expected output is:
hair fall shamboo
The XPath I used in the script above seems to be correct. Here is the relevant portion of the HTML in which the title can be found:
<h1 _ngcontent-c0="" class="br-hdng"><span _ngcontent-c0="" class="pr dib">hair fall shamboo<!----></span></h1>
P.S. The title I wish to parse is loaded dynamically. As I'm new to PHP, I don't know whether my approach is correct. If it isn't, what should I do instead?
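To show what I mean by "loaded dynamically", here is a minimal Python sketch using only the standard library. It assumes the raw response is just an application shell without the h1 element: RAW_HTML is a hypothetical stand-in, not the site's actual response, and TitleFinder and find_title are names made up for illustration. RENDERED_HTML is the markup shown above.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the raw, un-rendered response (an assumption):
# an application shell that contains no <h1> at all.
RAW_HTML = '<html><body><app-root></app-root><script src="main.js"></script></body></html>'

# The markup as seen in the browser after JavaScript has run.
RENDERED_HTML = (
    '<h1 _ngcontent-c0="" class="br-hdng">'
    '<span _ngcontent-c0="" class="pr dib">hair fall shamboo<!----></span></h1>'
)


class TitleFinder(HTMLParser):
    """Collects the text of the first <span> inside <h1 class="br-hdng">."""

    def __init__(self):
        super().__init__()
        self.in_h1 = False
        self.in_span = False
        self.title = None  # stays None when no matching element exists

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and dict(attrs).get("class") == "br-hdng":
            self.in_h1 = True
        elif tag == "span" and self.in_h1 and self.title is None:
            self.in_span = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False
        elif tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_span:
            self.title += data


def find_title(html):
    """Return the title text, or None if the element is not in the markup."""
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.title


if __name__ == "__main__":
    print(find_title(RAW_HTML))
    print(find_title(RENDERED_HTML))
```

On the hypothetical raw markup, find_title returns None because there is nothing to match, which I suspect is the same situation as item(0) returning null in my PHP script. On the rendered markup it returns the title.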
The following scripts, which I wrote in two other languages, work like magic.
I succeeded using JavaScript:
const puppeteer = require('puppeteer');
function run () {
    return new Promise(async (resolve, reject) => {
        try {
            const browser = await puppeteer.launch();
            const page = await browser.newPage();
            await page.goto("https://www.purplle.com/search?q=hair%20fall%20shamboo");
            let urls = await page.evaluate(() => {
                let items = document.querySelector('h1.br-hdng span');
                return items.innerText;
            })
            browser.close();
            return resolve(urls);
        } catch (e) {
            return reject(e);
        }
    })
}
run().then(console.log).catch(console.error);
I also succeeded using Python:
import requests_html
with requests_html.HTMLSession() as session:
    r = session.get('https://www.purplle.com/search?q=hair%20fall%20shamboo')
    r.html.render()
    item = r.html.find("h1.br-hdng span", first=True).text
    print(item)
What's wrong with PHP, then?
from Trouble fetching some title from a webpage