Friday 23 July 2021

How to scrape QlikView tables using Nodejs?

This Brazilian government website publishes wage data for judges from various instances and courts. I would like to download all the tables, but the table data is not in the HTML I get back when I fetch the page with request.

To work around this, I used puppeteer and cheerio to open a browser, wait for the table to load, and then pull the data with a jQuery-style selector. This is my code:

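const puppeteer = require("puppeteer");
const cheerio = require("cheerio");

// Resolve after the given number of milliseconds.
function sleep(milliseconds) {
    return new Promise(resolve => setTimeout(resolve, milliseconds));
}

const main = async () => {
    const browser = await puppeteer.launch({ headless: false });
    try {
        const page = await browser.newPage();
        await page.goto("https://paineis.cnj.jus.br/QvAJAXZfc/opendoc.htm?document=qvw_l%2FPainelCNJ.qvw&host=QVS%40neodimio03&anonymous=true&sheet=shPORT63Relatorios");
        // Give the QlikView AJAX client time to render the table.
        await sleep(10 * 1000);
        // Parse the rendered DOM and print the text of every ".injected" element.
        const html = await page.content();
        const $ = cheerio.load(html);
        console.log($(".injected").text());
    } finally {
        // Close the browser even if a step above throws.
        await browser.close();
    }
};

main().catch(console.error);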
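
A possible variation on the script above, offered as a sketch rather than a fix: instead of a fixed sleep followed by `.text()` (which concatenates the text of every matched element into one string), wait for the selector and collect one string per element. `extractCellTexts` is a hypothetical helper name; `page.waitForSelector` and `page.$$eval` are Puppeteer `Page` methods. It assumes the `.injected` selector from the script still matches the table cells.

```javascript
// Sketch only: `extractCellTexts` is a hypothetical helper, not part of Puppeteer.
// `page` is a Puppeteer Page (or any object exposing the same two methods).
async function extractCellTexts(page, selector, timeoutMs = 30000) {
  // Wait until at least one matching element exists instead of sleeping blindly.
  await page.waitForSelector(selector, { timeout: timeoutMs });
  // Runs in the browser context; returns one trimmed string per matched element.
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}
```

Inside `main`, this would be called as `const cells = await extractCellTexts(page, ".injected");`. Note that `waitForSelector` only signals that the first match has appeared, so it does not by itself guarantee that the whole table has been rendered.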

The problem is that the table I get back is incomplete, with only a few rows and truncated cells:

P63_CE_TRIBUNALCNJTribunalMagistradoMês/Ano Ref.CNJADHAILTON LACET CORREIA PORTO12/2018ADRIANA FRANCO MELO MACHADO02/202103/202104/2021ADRIANA LINS DE OLIVEIRA BEZERRA12/2018ADRIANO DA SILVA ARAUJO08/201909/201910/201911/201912/201901/202002/202003/202004/202005/202006/202007/202008/202009/202010/202011/202012/202001/202102/202103/202104/2021ALESSANDRA VARANDAS PAIVA MA...12/2018ALEXANDRE CHINI NETO09/201810/2018Subsídio (R$)Direitos Pessoais (1)Indenizações (2)Direitos Eventuais (3)Total de Rendimentos (4)Previdência Pública (5) (R$)Imposto de Renda (6) (R$)Descontos Diversos (7) (R$)Retenção por Teto Constitucional (8) (R$)Total de Descontos (9)Rendimento Líquido (10)Remuneração do órgão de origem (11) (R$)Diárias (12) (R$)0,000,000,00463,16463,160,000,000,000,000,00463,160,000,001.698,450,000,000,001.698,450,000,000,000,000,001.698,4533.689,110,003.639,540,0067.378,220,0071.017,760,00191,130,000,00191,1370.826,6333.689,110,003.639,540,000,000,003.639,540,00191,130,000,00191,133.448,4133.689,110,000,000,000,004.631,614.631,610,001.272,050,000,001.272,053.359,560,000,003.371,830,000,000,003.371,830,00150,970,000,00150,973.220,8632.004,710,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,719.100,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,717.700,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,711.750,005.323,940,000,002.218,317.542,250,00618,290,000,00618,296.923,9632.004,719.100,005.323,940,000,002.661,977.985,910,00594,720,000,00594,727.391,1932.004,715.600,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,714.550,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,714.550,005.323,940,0032.004,710,0037.328,650,00594,720,000,00594,7236.733,9332.004,714.550,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,710,005.323,940,004.158,850,009.482,790,00594,720,000,00594,728.888,0732.004,710,005.323,940,004.158,850,009.482,790,00673,560,000,00673,568.8
09,2332.004,710,005.323,940,004.158,85286,699.769,480,00673,560,000,00673,569.095,9232.004,710,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,719.100,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,714.550,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,714.550,005.323,940,000,004.436,629.760,560,001.189,440,000,001.189,448.571,1232.004,714.550,005.323,940,000,002.661,977.985,910,00594,720,000,00594,727.391,1932.004,714.550,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,714.550,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,714.550,005.323,940,000,000,005.323,940,00594,720,000,00594,724.729,2232.004,714.550,000,000,000,004.631,614.631,610,001.272,050,000,001.272,053.359,560,000,003.127,300,000,000,003.127,300,00161,200,000,00161,202.966,1028.947,550,003.127,300,000,000,003.127,300,00114,300,000,00114,303.013,0028.947,5511.900,00
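
To make a dump like the one above easier to inspect, the run-on amounts can be split apart again. This is only a sketch for reading the output, not a way to recover the missing rows. It assumes every amount uses the Brazilian format seen in the dump (dot as thousands separator, comma followed by exactly two decimals); `splitBrazilianAmounts` and `parseBrazilianAmount` are hypothetical helper names.

```javascript
// Assumption: amounts look like "0,00" or "33.689,11" (always two decimals).
const AMOUNT_PATTERN = /\d{1,3}(?:\.\d{3})*,\d{2}/g;

// Split a run of concatenated amounts into the individual amount strings.
function splitBrazilianAmounts(text) {
  return text.match(AMOUNT_PATTERN) || [];
}

// Convert one Brazilian-formatted amount string to a JavaScript number.
function parseBrazilianAmount(amount) {
  // Drop thousands separators, then turn the decimal comma into a dot.
  return Number(amount.replace(/\./g, "").replace(",", "."));
}

// Example taken from the start of the numeric part of the dump:
// splitBrazilianAmounts("0,000,000,00463,16") returns ["0,00", "0,00", "0,00", "463,16"]
```

Because each amount ends exactly two digits after its comma, the split is unambiguous from left to right under that assumption. Collecting each cell separately in the first place is still likely to be more robust than re-splitting concatenated text.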

I tried several variations of the jQuery selector without success.

I read that I could communicate with QlikView using enigma.js and then make my request. However, not even the most basic example in the documentation worked correctly against the site I'm using. Now I'm stuck.

How do I retrieve the data from a QlikView table?

EDIT: Unfortunately, this specific URL doesn't appear to work from some countries outside Brazil. However, I think any site with a QlikView table can serve as an example for an answer. The author of this (Python) question ran into the same problem with another site. Perhaps their URL doesn't have the same access problem.


