Friday 21 August 2020

Iterate over multiple payloads and take multiple screenshots with Puppeteer AWS Lambda

I am currently using the chrome-aws-lambda Puppeteer layer on AWS Lambda to visit 30 URLs, take a screenshot of each, and save the images to S3. At the moment I send 30 individual payloads, which triggers 30 separate Lambda invocations.

Each JSON payload contains a URL and an image file name. The payloads are sent to API Gateway via POST requests, one every 2-3 seconds. The first 6 to 9 invocations seem to run fine, then the rest start to fail with "Navigation failed because browser has disconnected!", as reported in AWS CloudWatch.

So I am looking for an alternative solution. How could I edit the code below to screenshot a batch of 30 URLs in one invocation, by handling a single array of JSON payloads (e.g. with a for loop)?

Here is my current code, which takes one screenshot per Lambda invocation and uploads it to S3:

// src/capture.js

// this module will be provided by the layer
const chromeLambda = require("chrome-aws-lambda");

// aws-sdk is always preinstalled in AWS Lambda in all Node.js runtimes
const S3Client = require("aws-sdk/clients/s3");

process.setMaxListeners(0); // <== Important line - fixes the MaxListeners warning

// create an S3 client
const s3 = new S3Client({ region: process.env.S3_REGION });

// default browser viewport size
const defaultViewport = {
  width: 1920,
  height: 1080
};

// here starts our function!
exports.handler = async event => {

  // launch a headless browser
  const browser = await chromeLambda.puppeteer.launch({
    args: chromeLambda.args,
    executablePath: await chromeLambda.executablePath,
    defaultViewport
  });

  console.log("Event URL string is ", event.url);

  const url = event.url;
  const domain = (new URL(url)).hostname.replace('www.', '');

  // open a new tab
  const page = await browser.newPage();

  // navigate to the page
  await page.goto(event.url);

  // take a screenshot
  const buffer = await page.screenshot()

  // upload the image, using the domain as the filename
  const result = await s3
    .upload({
      Bucket: process.env.S3_BUCKET,
      Key: domain + ".png",
      Body: buffer,
      ContentType: "image/png",
      ACL: "public-read"
    })
    .promise();

  // return the uploaded image url
  return { url: result.Location };
};

Current Individual JSON Payload


