I'm using the ProtTransBertBFDEmbedder embedder to convert my sequence into an embedding. It returns an array of length 1024. My goal is to recover the original sequence from this 1024-length array. How can I detokenize/reverse it?
# Install bio_embeddings (notebook cell; the leading "!" runs a shell command)
!pip3 install -U bio_embeddings[all] > /dev/null

from bio_embeddings.embed import ProtTransBertBFDEmbedder

embedder_bertbfd = ProtTransBertBFDEmbedder()
# Embed the sequence, then reduce the result to a single per-protein vector
embedding = embedder_bertbfd.embed("YSPNNIQHFHEEHLVHFVL")
reduce_per_protein = embedder_bertbfd.reduce_per_protein(embedding)
print(reduce_per_protein)
print(reduce_per_protein.shape)
Output (shape): (1024,)
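For context, here is a minimal sketch of what the pooling step seems to do. It assumes that reduce_per_protein averages the per-residue vectors into one vector; that is an assumption about the library, not something confirmed in this post. The helper mean_pool and the random stand-in data are hypothetical and use only the standard library, so nothing here calls the real model.

```python
import random

def mean_pool(per_residue):
    """Average a list of per-residue vectors into one fixed-size vector.

    Hypothetical stand-in for reduce_per_protein, assuming it is a plain mean.
    """
    length = len(per_residue)
    return [sum(column) / length for column in zip(*per_residue)]

random.seed(0)
sequence = "YSPNNIQHFHEEHLVHFVL"
dim = 1024

# Stand-in for the real per-residue embedding: one random vector per residue.
per_residue = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in sequence]

pooled = mean_pool(per_residue)
print(len(per_residue), len(pooled))

# Pooling the same rows in reverse order gives the same vector (up to rounding),
# so the pooled vector does not record which row came from which position.
reversed_pooled = mean_pool(per_residue[::-1])
max_diff = max(abs(a - b) for a, b in zip(pooled, reversed_pooled))
print(max_diff < 1e-9)

# A shorter input still pools to the same size, so the length is not stored either.
short_pooled = mean_pool(per_residue[:5])
print(len(short_pooled))
```

If that assumption holds, the 1024-length array is a summary with no explicit record of residue order or sequence length. The sketch only illustrates the pooling step; it says nothing about whether a separately trained decoder could approximate the sequence.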
How can I get the original sequence (YSPNNIQHFHEEHLVHFVL) back using reduce_per_protein?
You can use the original Colab notebook to try it.
from How to detokenize Protein Embedding Method