Thursday 23 June 2022

How to detokenize Protein Embedding Method

I'm using the ProtTransBertBFDEmbedder embedder to convert my sequence into an embedding. This gives me an array of length 1024. My goal is to recover the original sequence from this 1024-length array. How can I detokenize/reverse it?

!pip3 install -U bio_embeddings[all] > /dev/null

from bio_embeddings.embed import ProtTransBertBFDEmbedder

embedder_bertbfd = ProtTransBertBFDEmbedder()

embedding = embedder_bertbfd.embed("YSPNNIQHFHEEHLVHFVL")
reduce_per_protein = embedder_bertbfd.reduce_per_protein(embedding)

print(reduce_per_protein)


print(reduce_per_protein.shape)

Output: (1024,)

How can I get the original sequence (YSPNNIQHFHEEHLVHFVL) back from reduce_per_protein?

You can try this in the original Colab notebook.
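
To make the shapes concrete, here is a minimal sketch of what the pooling step appears to do. It assumes that reduce_per_protein averages the per-residue vectors over the residue axis, which I have not confirmed here, and it uses random NumPy data as a stand-in for real model output (the names per_residue, per_protein and reversed_rows are only illustrative). It shows that two different per-residue matrices can pool to the same 1024-length vector, which is why reversing this step looks hard.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the per-residue embedding of a 19-residue sequence:
# one 1024-dimensional vector per residue (random data, NOT real model output).
per_residue = rng.normal(size=(19, 1024))

# Assumption: reduce_per_protein averages over the residue axis.
per_protein = per_residue.mean(axis=0)
print(per_protein.shape)

# A different per-residue matrix (same rows in reverse order) pools to
# the same vector, so the pooled vector alone cannot tell them apart.
reversed_rows = per_residue[::-1]
same_pooled = np.allclose(reversed_rows.mean(axis=0), per_protein)
print(same_pooled)
```

Under that assumption, averaging maps many inputs to one output, so the 1024-length array by itself may not carry enough information to identify a unique sequence.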


