The snippet below already works, but I need help filtering out duplicate results.
Issue #1: After running the script for a few minutes, it starts displaying duplicate results (the same rows again).
Issue #2: Sometimes it misses some data. I'm not sure if this is normal.
Goal #1: Eliminate duplicates / don't process a row if it has already been read (rough sketch of the idea below).
Goal #2: Possibly read all data in succession/continuity (10205401, 10205402, 10205403, 10205404 and so on); a sketch of what I mean follows the sample output further down.
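For Goal #1, the kind of fix I have in mind is something like this: remember the transaction hashes that were already printed and skip any row whose hash has been seen before. This is only an illustrative sketch, not tested against the live page, and the names seen_hashes / process_row are placeholders I'm making up, not part of my script yet.

# Illustrative sketch only: dedupe by remembering transaction hashes already processed.
seen_hashes = set()

def process_row(txnhashdetails, block, transval):
    if txnhashdetails in seen_hashes:
        return  # this transaction was already printed on an earlier scan, skip it
    seen_hashes.add(txnhashdetails)
    print(" Processing data -> " + txnhashdetails[60:] + " " + str(block) + " " + str(transval))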
from bs4 import BeautifulSoup
from time import sleep
import re, requests

trim = re.compile(r'[^\d,.]+')
url = "https://bscscan.com/txs?a=0x10ed43c718714eb63d5aa57b78b54704e256024e&ps=100&p=1"
baseurl = 'https://bscscan.com/tx/'
header = {"User-Agent": "Mozilla/5.0"}
scans = 0
previous_block = 0

while True:
    scans += 1
    reqtxsInternal = requests.get(url, headers=header, timeout=2)
    souptxsInternal = BeautifulSoup(reqtxsInternal.content, 'html.parser')
    blocktxsInternal = souptxsInternal.find_all('table')[0].find_all('tr')
    print(" -> Whole Page Scanned: ", scans)
    for row in blocktxsInternal[1:]:  # skip the table header row
        txnhash = row.find_all('td')[1].text
        txnhashdetails = txnhash.strip()
        block = row.find_all('td')[3].text
        if float(block) > float(previous_block):
            previous_block = block
        value = row.find_all('td')[9].text
        amount = trim.sub('', value).replace(",", "")
        transval = float(amount)
        if transval >= 0 and block == previous_block:
            print(" Processing data -> " + txnhashdetails[60:] + " " + str(block) + " " + str(transval))
    sleep(1)
Current output (after a few minutes of running the script):
-> Whole Page Scanned: 14
Processing data -> 8490f9 10205401 0.0
Processing data -> 31f486 10205401 0.753749522929516
Processing data -> 180ff9 10205401 0.0011
-> Whole Page Scanned: 15 <--- duplicate reads/data
Processing data -> 8490f9 10205401 0.0
Processing data -> 31f486 10205401 0.753749522929516
Processing data -> 180ff9 10205401 0.0011
-> Whole Page Scanned: 16 <--- just fine
Processing data -> 836486 10205402 0.0345
Processing data -> d05a8a 10205402 1.37
Processing data -> 0a035d 10205402 0.3742134
-> Whole Page Scanned: 17 <--- missed one (10205403)
Processing data -> e9d7b7 10205404 10.10
Processing data -> 9079c9 10205404 1.09
Processing data -> f8a8a0 10205404 100.2
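For Goal #2, roughly what I mean by reading in succession is a check like the one below (again just a sketch, assuming the block column parses cleanly to an int; check_continuity is a placeholder name): compare the newest block number against the previous one and flag any numbers that were skipped, e.g. 10205403 in the output above.

# Sketch of the continuity check I'm after (Goal #2).
def check_continuity(previous_block, block):
    block = int(block)
    if previous_block and block > previous_block + 1:
        missed = list(range(previous_block + 1, block))
        print(" Missed blocks: ", missed)  # would show [10205403] in the run above
    return block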
from How to eliminate duplicate data being read from a url that refreshes in python beautifulsoup