Saturday 31 July 2021

Perform a single task on (220 choose 5) combination in a dataframe

I have data with 220 rows. First I choose 5 rows at random and apply an operation to them. Now I have to perform the same task on every (220 choose 5) combination, which means 4102565544 data frames with 5 rows each. Python runs out of memory when I call list(itertools.combinations(list(range(0,222)),5)), and looping over each 5-row data frame is far too time-consuming. Below I have attached my data as a dictionary and replicated my problem.
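To make the memory issue concrete, here is a minimal sketch using only the standard library. It assumes 222 row labels, because range(0,222) in the call above produces 222 values (the 4102565544 figure corresponds to 220). The names n_rows, k, total and first_three are illustrative only.

```python
import itertools
import math

n_rows, k = 222, 5  # assumption: range(0, 222) yields 222 labels

# Count the k-row subsets without building any of them.
total = math.comb(n_rows, k)

# combinations() is a lazy iterator: wrapping it in list() materialises every
# tuple at once, whereas islice() pulls only as many as are requested.
first_three = list(itertools.islice(itertools.combinations(range(n_rows), k), 3))

print(total)
print(first_three)
```

Iterating lazily avoids the memory problem, but it still has to visit every tuple, so the running time remains the bottleneck.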

Data

df={'Name': {0: '004737367A89', 1: '006D631822DA', 2: '007FEEEF095D', 3: '015EA8035B5D', 4: '0168C7824FB3', 5: '02236A01C769', 6: '026A35601C28', 7: '03939D273F7D', 8: '05BE3A6A6344', 9: '0735B7F399C8', 10: '075F90DEDAAC', 11: '079D00DB87B6', 12: '08321FDDA475', 13: '084147D3DE00', 14: '08693ADAF466', 15: '08EE69FF7C9B', 16: '0996F835D14B', 17: '0A061E004649', 18: '0BDADD43DF2D', 19: '0D580A803B2C', 20: '11DCF10E0F76', 21: '1241EC5AC73C', 22: '150595F71A7A', 23: '160D7B436114', 24: '1805135DA1B7', 25: '18D26316EA11', 26: '1B744908A7E9', 27: '1CB417508187', 28: '1EA75E92E370', 29: '1F1B4DA40CE4', 30: '209D86760A9C', 31: '228BC53DB280', 32: '235D0F9A5E0E', 33: '2452814BCC90', 34: '2923CA6C88B1', 35: '2CB60EF30BAA', 36: '2CD7BD1FC443', 37: '2D03FAC79D60', 38: '2F34FFA27A7C', 39: '2F8F282FDCEE', 40: '3.03891E+11', 41: '31B4A8BDBA5F', 42: '34EC4E7D8E15', 43: '3695444ADBFF', 44: '370F1D138305', 45: '3826943C86AF', 46: '39F11738A59D', 47: '39F2FF0A2E05', 48: '3A8B6F61E548', 49: '3B256CE48F60', 50: '3C09C2C73655', 51: '3D6858B43366', 52: '3D94154B544C', 53: '3DDD62DDF6C4', 54: '3EBDAFB8E7EE', 55: '408B3D0EAF85', 56: '40ED913F4BB6', 57: '43380E855E4E', 58: '44C8332521DE', 59: '4817047FFAC1', 60: '481896BC4240', 61: '49263E82B2B8', 62: '4AF76F8D6BBB', 63: '4BC2016E5222', 64: '4CCF2D4FF5EC', 65: '4E9750936994', 66: '4F61F6A5588D', 67: '505F16F25595', 68: '50756E6D3B32', 69: '50E1E1F5F31D', 70: '516B4C9C3F45', 71: '52608C24A09E', 72: '52B2EBC622A6', 73: '539B8164BD32', 74: '5462E581A288', 75: '55149C502434', 76: '55D8B9306A65', 77: '5808368AFA0A', 78: '58F6BA305E2A', 79: '58FE73C690DA', 80: '596857EDC73F', 81: '599DF7F0CB41', 82: '59F1F27E85F4', 83: '5AE11428142F', 84: '5B27B574EA5B', 85: '5D3FA98DDD61', 86: '5DE6CFC7E471', 87: '5DF85F5EA21C', 88: '5EA87B759595', 89: '5EAA2E0BEAA2', 90: '5EAFEBA99A30', 91: '5EFC03FC84DF', 92: '5F6A8D18E234', 93: '6008B6021BAA', 94: '63765F49AC32', 95: '64099F419232', 96: '652349DF5059', 97: '6551FB43EE37', 98: '6613C12B0634', 99: 
'66C312BFDFD6', 100: '66D964D2E1D0', 101: '6790A35547E2', 102: '67A2603888E5', 103: '6991A9411704', 104: '6CFC28C22836', 105: '6D5DAED137C9', 106: '6EBB87FAD022', 107: '6EF1206450AF', 108: '70C74C90C3E2', 109: '71168B36CCFD', 110: '7177392ADD8B', 111: '74AF6AA78FB9', 112: '759CFBB05E2F', 113: '771E8EA5A4C7', 114: '7740740D57BE', 115: '7926DFB85C8B', 116: '7A6091203844', 117: '7C23D53CE5DD', 118: '7C4ED1AA239F', 119: '7E0C21E0010F', 120: '80E9914A0BF8', 121: '82867FEAF519', 122: '82C735B34C85', 123: '85EF1FFBAC47', 124: '872F22A4D018', 125: '87C72000AAB2', 126: '8978B70E88C3', 127: '8ADEF3F17E42', 128: '8B5F4EE22DF5', 129: '8B757ED14D67', 130: '8E0C10341AA8', 131: '90289E4E68F6', 132: '9259DEED6524', 133: '92754763710B', 134: '92B164934E01', 135: '96DBA1873BFF', 136: '97E7144ECEF9', 137: '9AE4EB9DF4F0', 138: '9CAC53908EE1', 139: '9F31161E7BDF', 140: 'A090B8A939CB', 141: 'A12E89E87CB5', 142: 'A31CA572620F', 143: 'A4263AA51F9A', 144: 'A540D6615FA0', 145: 'A56804CE6BAF', 146: 'A60313C4FC06', 147: 'A612803F81BA', 148: 'A77E12FFA171', 149: 'A87B6602E946', 150: 'AADE28D99973', 151: 'AEB37BE9DBFF', 152: 'B04ACAB6A193', 153: 'B41004303288', 154: 'B454AAFDA2AF', 155: 'B701B4E2F2BF', 156: 'B7EF621EC0AE', 157: 'B9084B8E2378', 158: 'BA8C4B0E8378', 159: 'BBD01B2776A8', 160: 'BE5377A632DF', 161: 'BE8D95B26DEE', 162: 'BEEB25AC3BB3', 163: 'BF585F42B5F6', 164: 'BF889C615B6A', 165: 'C1934D47BC69', 166: 'C31934680839', 167: 'C43F40D3D865', 168: 'C4955BCC1F0C', 169: 'C4F03F22DE3E', 170: 'C5BC9B26046C', 171: 'C5D2BE738C56', 172: 'C762399CAF83', 173: 'C7B9B444D117', 174: 'C943B9F6FDDF', 175: 'C9C7138CAF65', 176: 'CB66BE597E30', 177: 'CC7DA44E344E', 178: 'CE81A7E65B6B', 179: 'CE971F87D0B5', 180: 'CECC8C16ECAB', 181: 'D111860A3AC1', 182: 'D159C02757AE', 183: 'D33BB70DCA77', 184: 'D386F0671D80', 185: 'D43B801CCCA9', 186: 'D465BE3D4A94', 187: 'D49E08EEC650', 188: 'D4BD5D5DD7E4', 189: 'D64F455CB56A', 190: 'D6D99F00B58B', 191: 'D7774555E609', 192: 'D7CDFD417C01', 193: 'DBF16B9938A4', 194: 
'DCC2FA798C09', 195: 'DE6E090827B8', 196: 'E25F5A55A4D8', 197: 'E5A82C4E86C7', 198: 'E5AC30A8337B', 199: 'E6EBC0EFBF18', 200: 'EB9BBBA2FEB9', 201: 'EC8A20CAC153', 202: 'EC8EA44FDACD', 203: 'ECB284CBDDA7', 204: 'EED0F8B3B968', 205: 'EF4B578B0902', 206: 'F13986786A7A', 207: 'F17F0E81FC73', 208: 'F34CFBCB7A28', 209: 'F396C1E8BF59', 210: 'F40ED923507F', 211: 'F87A72CF9671', 212: 'F8CDE15A2FCB', 213: 'F9032EE897A9', 214: 'FAC08B5AA521', 215: 'FB3071FBA3BC', 216: 'FC6435726337', 217: 'FD5F2F4D32D7', 218: 'FD6E925243AA', 219: 'FDA85734568D', 220: 'FF18E7D41654', 221: 'FFEC03758A05'}, 'Code': {0: 375000, 1: 275000, 2: 225000, 3: 275000, 4: 175000, 5: 275000, 6: 295000, 7: 525000, 8: 175000, 9: 135000, 10: 275000, 11: 250000, 12: 275000, 13: 350000, 14: 225000, 15: 175000, 16: 395000, 17: 275000, 18: 225000, 19: 195000, 20: 225000, 21: 175000, 22: 135000, 23: 225000, 24: 250000, 25: 225000, 26: 250000, 27: 295000, 28: 275000, 29: 250000, 30: 275000, 31: 250000, 32: 295000, 33: 195000, 34: 275000, 35: 195000, 36: 275000, 37: 175000, 38: 525000, 39: 225000, 40: 350000, 41: 135000, 42: 295000, 43: 195000, 44: 495000, 45: 495000, 46: 275000, 47: 375000, 48: 295000, 49: 250000, 50: 250000, 51: 225000, 52: 175000, 53: 250000, 54: 475000, 55: 135000, 56: 350000, 57: 225000, 58: 250000, 59: 275000, 60: 225000, 61: 295000, 62: 225000, 63: 250000, 64: 225000, 65: 250000, 66: 135000, 67: 175000, 68: 295000, 69: 175000, 70: 295000, 71: 295000, 72: 225000, 73: 225000, 74: 365000, 75: 295000, 76: 225000, 77: 195000, 78: 225000, 79: 225000, 80: 225000, 81: 295000, 82: 135000, 83: 195000, 84: 295000, 85: 550000, 86: 250000, 87: 225000, 88: 275000, 89: 225000, 90: 295000, 91: 250000, 92: 250000, 93: 225000, 94: 175000, 95: 250000, 96: 175000, 97: 350000, 98: 175000, 99: 275000, 100: 295000, 101: 225000, 102: 225000, 103: 195000, 104: 175000, 105: 350000, 106: 175000, 107: 275000, 108: 275000, 109: 175000, 110: 195000, 111: 225000, 112: 275000, 113: 375000, 114: 135000, 115: 135000, 116: 
395000, 117: 295000, 118: 195000, 119: 275000, 120: 195000, 121: 375000, 122: 195000, 123: 275000, 124: 275000, 125: 175000, 126: 325000, 127: 275000, 128: 250000, 129: 135000, 130: 175000, 131: 195000, 132: 550000, 133: 225000, 134: 250000, 135: 350000, 136: 495000, 137: 275000, 138: 135000, 139: 175000, 140: 175000, 141: 225000, 142: 175000, 143: 275000, 144: 325000, 145: 295000, 146: 275000, 147: 275000, 148: 175000, 149: 350000, 150: 550000, 151: 250000, 152: 350000, 153: 325000, 154: 175000, 155: 250000, 156: 175000, 157: 250000, 158: 275000, 159: 225000, 160: 195000, 161: 175000, 162: 225000, 163: 275000, 164: 225000, 165: 135000, 166: 250000, 167: 225000, 168: 175000, 169: 275000, 170: 175000, 171: 275000, 172: 175000, 173: 195000, 174: 325000, 175: 275000, 176: 295000, 177: 350000, 178: 350000, 179: 425000, 180: 225000, 181: 135000, 182: 150000, 183: 135000, 184: 350000, 185: 225000, 186: 375000, 187: 175000, 188: 295000, 189: 195000, 190: 350000, 191: 175000, 192: 225000, 193: 195000, 194: 195000, 195: 350000, 196: 250000, 197: 175000, 198: 175000, 199: 395000, 200: 175000, 201: 225000, 202: 175000, 203: 350000, 204: 175000, 205: 250000, 206: 375000, 207: 275000, 208: 525000, 209: 175000, 210: 375000, 211: 295000, 212: 275000, 213: 175000, 214: 325000, 215: 250000, 216: 195000, 217: 275000, 218: 250000, 219: 135000, 220: 195000, 221: 135000}}

What I want is to first select 5 rows at random:

import random
import pandas as pd

data = pd.DataFrame(df)
# Sample 5 distinct row labels (restricted to labels 10-29 here)
picked = random.sample(range(10, 30), 5)
inputt = pd.DataFrame({"NameID": data.loc[picked, "Name"]})
# Collect the rows whose Name matches one of the sampled IDs
D2 = data[data["Name"].isin(inputt["NameID"])]

values = D2.Code
real_sum = values.sum()

Then I want to perform the same operation on every other 5-row combination in the data frame and find which of those combinations have a sum less than real_sum. Is there any simulation technique I can apply here, or anything else?
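One possible simulation approach is a Monte Carlo estimate: draw many random 5-row subsets and record how often their Code sum falls below real_sum. The sketch below is only an illustration under that assumption. The function estimate_fraction_below, its parameters, and the toy_codes list are hypothetical names and values, not part of the original data.

```python
import random

def estimate_fraction_below(codes, real_sum, k=5, n_draws=100_000, seed=0):
    """Estimate the share of k-element subsets whose sum is below real_sum."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    codes = list(codes)
    # rng.sample draws k distinct positions, i.e. one random k-row subset.
    hits = sum(sum(rng.sample(codes, k)) < real_sum for _ in range(n_draws))
    return hits / n_draws

# Hypothetical stand-in for the Code column; with the real data one could
# pass data["Code"].tolist() and the real_sum computed above.
toy_codes = [175000, 225000, 250000, 275000, 295000, 350000, 375000, 525000]
estimate = estimate_fraction_below(toy_codes, real_sum=1_300_000)
print(estimate)
```

This estimates the proportion of combinations below real_sum and does not list them. Multiplying the estimate by the total number of combinations gives an approximate count.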

Thanks


