Thursday, 23 September 2021

Extract text with cheerio

I'm trying to write a script that extracts the email address and name from this website. I tried the following snippet, but it doesn't work.

<!DOCTYPE html>
<html>

<head>
    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <title>foo</title>
    <meta name="description" content="">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" href="">
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
</head>

<body>
    <div>
        <strong style="color: darkgreen;">Can read this tag</strong>

        <object id="external_page" type="text/html" data="https://aleenarais.com/buddy/" width="800px" height="600px"
            style="overflow:auto;border:5px ridge blue">
            <!-- I want to read tag values from this object -->
        </object>
    </div>

    <script>
        // Collect the text of every matching <strong> in this (host) document.
        window.addEventListener('load', function () {
            const items = [];
            $('strong[style="color: darkgreen;"]').each(function () {
                items.push($(this).text());
            });
            console.log(items);
        });
    </script>
</body>

</html>

Is there any better way to do this? Or is it possible to convert the whole page into a string and extract the email using RegEx?
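
To make the "convert the whole page into a string" idea concrete, here is a minimal sketch, assuming the script runs in Node 18+ (which provides a global `fetch`) rather than in the browser. If the embedded page is served from a different origin than the host page, the browser's same-origin policy prevents the host page's script from reading inside the `<object>`, so fetching the HTML server-side is one way around that. The names `extractEmails`, `emailsFromPage`, and `EMAIL_PATTERN` are hypothetical, and the pattern is deliberately simple, not a full email validator.

```javascript
// Hypothetical helper: pull email addresses out of raw HTML text.
// The pattern is intentionally simple and will miss unusual addresses.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function extractEmails(html) {
  // With the g flag, match() returns every match, or null when there are none.
  const matches = html.match(EMAIL_PATTERN) || [];
  // Deduplicate while preserving first-seen order.
  return [...new Set(matches)];
}

// Sketch of fetching a page as a string (assumes Node 18+ with global fetch).
async function emailsFromPage(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return extractEmails(await response.text());
}

// Example with an inline string (sample data, not taken from the real page).
const sample =
  '<p>Contact <a href="mailto:jane@example.com">jane@example.com</a> or bob@example.org</p>';
console.log(extractEmails(sample));
// → [ 'jane@example.com', 'bob@example.org' ]
```

Calling `emailsFromPage('https://aleenarais.com/buddy/')` would apply the same extraction to the page from the snippet. Names are harder to match with a pattern, so for those the fetched string could instead be passed to cheerio's `load` and queried with selectors, which depends on the page's actual markup.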



