Given a file with this structure:
- Lines with a single column are keys
- Lines with two columns give a sub-key and its non-zero value under the preceding key
For example:
abc
ef 0.85
kl 0.21
xyz 0.923
cldex
plax 0.123
lion -0.831
How can I create a sparse matrix (csr_matrix) from it, like this?
('abc', 'ef') 0.85
('abc', 'kl') 0.21
('abc', 'xyz') 0.923
('cldex', 'plax') 0.123
('cldex', 'lion') -0.831
I've tried:
from collections import defaultdict

x = """abc
ef 0.85
kl 0.21
xyz 0.923
cldex
plax 0.123
lion -0.831""".split('\n')

k1 = ''
arr = defaultdict(dict)
for line in x:
    # split() with no argument handles both tabs and spaces
    line = line.strip().split()
    if len(line) == 1:
        # a single-column line starts a new key
        k1 = line[0]
    else:
        k2, v = line
        v = float(v)
        arr[k1][k2] = v
[out]
>>> arr
defaultdict(dict,
{'abc': {'ef': 0.85, 'kl': 0.21, 'xyz': 0.923},
'cldex': {'plax': 0.123, 'lion': -0.831}})
Having the nested dict structure isn't as convenient as a scipy sparse matrix.
Is there an easy way to read a file in the format above into any of the scipy
sparse matrix objects?
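One possible approach, sketched here rather than given as a definitive answer: collect (row, column, value) triplets while reading the lines, map each string key to an integer index, and pass the triplets to csr_matrix. The function name parse_to_csr and the two label-to-index dicts are made up for illustration; the sketch assumes whitespace-separated columns and that every value line comes after a key line.

```python
from scipy.sparse import csr_matrix


def parse_to_csr(lines):
    """Build a CSR matrix from key / sub-key lines (hypothetical helper)."""
    row_index, col_index = {}, {}   # string label -> integer index
    rows, cols, vals = [], [], []   # COO-style triplets
    current = None                  # index of the key line last seen
    for line in lines:
        parts = line.split()
        if not parts:
            continue                # skip blank lines
        if len(parts) == 1:
            # single column: a row key; assign the next free row index
            current = row_index.setdefault(parts[0], len(row_index))
        else:
            # two columns: sub-key and value under the current row key
            key, value = parts
            rows.append(current)
            cols.append(col_index.setdefault(key, len(col_index)))
            vals.append(float(value))
    shape = (len(row_index), len(col_index))
    matrix = csr_matrix((vals, (rows, cols)), shape=shape)
    return matrix, row_index, col_index


sample = """abc
ef 0.85
kl 0.21
xyz 0.923
cldex
plax 0.123
lion -0.831""".split('\n')

matrix, row_index, col_index = parse_to_csr(sample)
# look up a value by its string labels
print(matrix[row_index['abc'], col_index['ef']])
```

scipy sparse matrices are indexed by integers, so the string labels have to live in separate lookup dicts like these. Note that csr_matrix sums the values of duplicate (row, column) pairs when built from triplets. To read from a real file, pass the open file object instead of the sample list.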
from How to load a sparse matrix efficiently?