Thursday, 20 May 2021

Download a website locally without JavaScript using Puppeteer

I am trying to download a website as a static site, that is, without JS: only HTML and CSS.

I've tried many approaches, yet some issues with CSS and images remain.

Here is a snippet:

const puppeteer = require('puppeteer');
const {URL} = require('url');
const fse = require('fs-extra');
const path = require('path');


(async (urlToFetch) => {

    const browser = await puppeteer.launch({
        headless: true,
        slowMo: 100
    });

    const page = await browser.newPage();
    await page.setRequestInterception(true);

    page.on("request", request => {
      if (request.resourceType() === "script") {
        request.abort()
      } else {
        request.continue()
      }
    })
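    page.on('response', async (response) => {
        const url = new URL(response.url());
        let filePath = path.resolve(`./output${url.pathname}`);
        // Paths without a file extension are saved as <path>/index.html.
        if (path.extname(url.pathname).trim() === '') {
            filePath = `${filePath}/index.html`;
        }
        try {
            // response.buffer() throws for responses without a body (e.g. redirects).
            await fse.outputFile(filePath, await response.buffer());
            console.log(`File ${filePath} was written successfully`);
        } catch (err) {
            console.error(`Skipped ${response.url()}: ${err.message}`);
        }
    });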

    await page.goto(urlToFetch, {
        waitUntil: 'networkidle2'
    })

    setTimeout(async () => {
        await browser.close();
    }, 60000 * 4)


})('https://stackoverflow.com/');
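
The URL-to-file mapping inside the response handler can be pulled out and checked on its own, without launching a browser. The following is only a sketch: the helper name urlToFilePath and its outputDir parameter are hypothetical, and it uses path.join rather than string concatenation so that repeated slashes are normalized. Note that the query string is ignored, so two URLs that differ only in their query would map to the same file.

```javascript
const path = require('path');
const { URL } = require('url');

// Hypothetical helper (not part of Puppeteer): maps a response URL to a
// path under outputDir, using the same rule as the response handler.
function urlToFilePath(rawUrl, outputDir = './output') {
    const { pathname } = new URL(rawUrl);
    // path.join normalizes duplicate slashes and the leading "./".
    let filePath = path.join(outputDir, pathname);
    // No file extension: treat the URL as a page and store it as index.html.
    if (path.extname(pathname) === '') {
        filePath = path.join(filePath, 'index.html');
    }
    return filePath;
}

console.log(urlToFilePath('https://stackoverflow.com/'));
console.log(urlToFilePath('https://stackoverflow.com/questions'));
console.log(urlToFilePath('https://example.com/css/site.css?v=2'));
```

Because relative links in the saved HTML are resolved against the page's location, keeping the directory layout identical to the URL paths is what lets same-origin CSS and image references keep working offline; absolute or cross-origin URLs would still need rewriting.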

I've also tried using page.content():

const content = await page.content();
fs.writeFileSync('index.html', content, { encoding: 'utf-8' });
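
One thing to keep in mind with this approach: page.content() returns the serialized HTML of the page, which still contains the script elements even when script requests were aborted. A rough sketch of stripping them from the string before writing it is shown below. The function name stripScripts is hypothetical, and a regular expression is only an approximation of HTML parsing (it can be fooled by unusual markup); removing the script elements from the DOM inside page.evaluate before calling page.content() would likely be more robust.

```javascript
// Hypothetical helper: removes <script>...</script> elements from an HTML string.
// This is a regex approximation, not a real HTML parser.
function stripScripts(html) {
    return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

const sample =
    '<p>a</p><script src="x.js"></script><script>alert(1)</script><p>b</p>';
console.log(stripScripts(sample));
```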

I have also tried downloading it using a CDPSession.

I've tried website-scraper-puppeteer as well.

So what is the best approach to a solution where I provide a website link and it downloads the site as a static website?



