Friday, 10 December 2021

Merge two files, compute changes, and sort the updated data in Python

I need help improving the snippet below. I need to merge two files and perform a computation on the matched lines.

I have oldFile.txt, which contains old data, and newFile.txt, which contains an updated set of data.

I need to update oldFile.txt based on the data in newFile.txt and compute the changes as percentages. Any ideas would be very helpful. Thanks in advance.

from collections import defaultdict

# Sum col3 per (col2, col4, col1) key read from newFile.txt
data = defaultdict(int)
with open("newFile.txt", encoding='utf8', errors='ignore') as f:
    for line in f:
        # Skip blank lines and the "#col1 ..." header row
        if not line.strip() or line.lstrip().startswith('#'):
            continue
        grp, pname, cnt, cat = line.split(maxsplit=3)
        data[(pname, cat.strip(), grp)] += int(cnt)

# Rows as [col2, col3, col4, col1], largest col3 first
sorteddata = sorted([[k[0], v, k[1], k[2]] for k, v in data.items()], key=lambda x: x[1], reverse=True)

# Print the top 10 rows and append them to oldFile.txt (opened once, not per row)
with open("oldFile.txt", 'a', encoding='utf8', errors='ignore') as out:
    for num, subl in enumerate(sorteddata[:10], start=1):
        line = " ".join(map(str, subl))
        print("{:>5} -> {:>}".format(num, line))
        out.write(line + '\n')

oldFile.txt

 #col1             #col2        #col3  #col4
 1,396 c15e89f2149bcc0cbd5fb204   4    HUH_Token (HUH)                      
   279 9e4d81c8fc15870b15aef8dc   3    BABY BNB (BBNB)                
   231 31b5c07636dab8f0909dbd2d   6    Buff Unicorn (BUFFUN...)             
   438 1c6bc8e962427deb4106ae06   8    Charge (Charge)                      
 2,739 6ea059a29eccecee4e250414   2    MAXIMACASH (MAXCAS...)

newFile.txt #-- updated data with additional lines not found in oldFile.txt

 #col1             #col2        #col3  #col4
 8,739 6ea059a29eccecee4e250414   60   MAXIMACASH (MAXCAS...)
   138 1c6bc8e962427deb4106ae06   50   Charge (Charge)                      
   860 31b5c07636dab8f0909dbd2d   40   Buff Unicorn (BUFFUN...)             
   200 9e4d81c8fc15870b15aef8dc   30   BABY BNB (BBNB)
    20 5esdsds2sd15870b15aef8dc   30   CharliesAngel (CA)    #-- not found in the oldFile.txt
 1,560 c15e89f2149bcc0cbd5fb204   20   HUH_Token (HUH)     
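
A minimal sketch of how one of these rows could be parsed. It assumes the columns are whitespace-separated, that only col4 may contain spaces, and that col1 may use a comma as a thousands separator. The helper name `parse_row` is hypothetical, and stripping a trailing `#--` annotation is an assumption based on the samples above.

```python
def parse_row(line):
    """Split one data row into (col1, col2, col3, col4); return None for header or blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    col1, col2, col3, col4 = line.split(maxsplit=3)
    # Drop a trailing "#-- ..." annotation like the ones in the samples (assumption)
    col4 = col4.split("#--")[0].strip()
    # "8,739" -> 8739 so the value can be used in arithmetic
    return int(col1.replace(",", "")), col2, int(col3), col4


print(parse_row(" 8,739 6ea059a29eccecee4e250414   60   MAXIMACASH (MAXCAS...)"))
# -> (8739, '6ea059a29eccecee4e250414', 60, 'MAXIMACASH (MAXCAS...)')
```

Keying the parsed rows by col2 would then make it straightforward to look up the matching line from the other file.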

Needed improvement: #-- with additional columns (col5, col6), sorted by col3 (oldFile + newFile values)

 #col1             #col2        #col3      #col4                #col5 (oldFile-newFile)   #col6 (oldFile-newFile)
 8,739 6ea059a29eccecee4e250414  62   MAXIMACASH (MAXCAS...)   2900.00 % (col3 2-60)    219.06 % (col1 2,739-8,739) 
   138 1c6bc8e962427deb4106ae06  58   Charge (Charge)           525.00 % (col3 8-50)    -68.49 % (col1   438-138)
   860 31b5c07636dab8f0909dbd2d  46   Buff Unicorn (BUFFUN...)  566.67 % (col3 6-40)    272.29 % (col1   231-860)
   200 9e4d81c8fc15870b15aef8dc  33   BABY BNB (BBNB)           900.00 % (col3 3-30)    -28.32 % (col1   279-200) 
    20 5esdsds2sd15870b15aef8dc  30   CharliesAngel (CA)          0.00 % (col3 0-30)     20.00 % (col1   0-20) 
 1,560 c15e89f2149bcc0cbd5fb204  24   HUH_Token (HUH)           400.00 % (col3 4-20)     11.75 % (col1 1,396-1,560)
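
The expected output suggests that col3 is the sum of the old and new values, that col5 and col6 are percentage changes computed as (new - old) / old * 100 (for example, col3 going from 2 to 60 gives 2900.00 %), and that rows are sorted by the combined col3 in descending order. A rough sketch under those assumptions, working on in-memory dictionaries keyed by col2 rather than on the files themselves. The names `pct_change` and `merge` are hypothetical. For a key missing from the old data the change from zero is undefined, so this sketch reports 0.00 for both columns, which does not reproduce the 20.00 % shown for CharliesAngel. It also only emits keys present in the new data.

```python
def pct_change(old, new):
    """Percentage change from old to new; 0.0 when old is 0 (assumption for the undefined case)."""
    return (new - old) / old * 100 if old else 0.0


def merge(old, new):
    """old and new map col2 -> (col1, col3, col4). Returns rows sorted by combined col3, descending."""
    rows = []
    for key, (n1, n3, name) in new.items():
        # Keys missing from the old data fall back to zeros
        o1, o3, _ = old.get(key, (0, 0, name))
        rows.append((n1, key, o3 + n3, name, pct_change(o3, n3), pct_change(o1, n1)))
    return sorted(rows, key=lambda r: r[2], reverse=True)


# A few rows taken from the samples in the question
old = {
    "6ea059a29eccecee4e250414": (2739, 2, "MAXIMACASH (MAXCAS...)"),
    "31b5c07636dab8f0909dbd2d": (231, 6, "Buff Unicorn (BUFFUN...)"),
}
new = {
    "31b5c07636dab8f0909dbd2d": (860, 40, "Buff Unicorn (BUFFUN...)"),
    "6ea059a29eccecee4e250414": (8739, 60, "MAXIMACASH (MAXCAS...)"),
    "5esdsds2sd15870b15aef8dc": (20, 30, "CharliesAngel (CA)"),
}

for c1, c2, c3, c4, c5, c6 in merge(old, new):
    print(f"{c1:>6,} {c2} {c3:>4}   {c4:<26}{c5:>9.2f} % {c6:>9.2f} %")
```

Writing the result to a separate output file, rather than appending to oldFile.txt, would keep the old data available for the next comparison.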

