Saturday 13 March 2021

What is the best method to *classify* series of data?

I'm working on a project for a CCG (such as HearthStone, Yu-Gi-Oh!, etc.) that classifies (labels) the deck type when a user uploads their own deck.

Let's assume every card has its own unique ID, and that each ID is a single letter of the alphabet. So you can assume there are only 26 cards.

Every player can build their own deck, but keep in mind that the basic deck structures are already established. Players just swap a small number of cards to suit their taste.

Let me give you an example:

Assume that there are many well-known basic deck types.
Each letter is a card ID, and order does not matter.
Every deck has exactly 13 cards.

e.g.)

A A B B B C C C C D E E F - This deck is called 'Foo'.
C C C C F F G G E E E Z Z - This deck is called 'Bar'.

// and so on...

In this situation, the problem I ran into is this:

If users can swap a small number of cards while keeping the basic structure of their deck type, how can a computer classify a given deck?

So I first decided to use a similarity measure: compare the given deck against each pre-defined deck, compute the similarity (using the Jaccard index or some other measure), and assume the given deck has the type of the pre-defined deck it is most similar to.
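To make the similarity idea concrete, here is a minimal sketch using the two toy decks above. Since a deck can hold several copies of a card, it uses a multiset (weighted) variant of the Jaccard index rather than the plain set version. The names `weighted_jaccard`, `classify`, and `KNOWN_DECKS` are made up for illustration.

```python
from collections import Counter

def weighted_jaccard(deck_a, deck_b):
    """Multiset Jaccard: sum of per-card min counts over sum of per-card max counts."""
    a, b = Counter(deck_a), Counter(deck_b)
    intersection = sum((a & b).values())  # per-card minimum
    union = sum((a | b).values())         # per-card maximum
    return intersection / union if union else 1.0

# Hypothetical table of pre-defined decks (the two examples from above).
KNOWN_DECKS = {
    "Foo": "AABBBCCCCDEEF",
    "Bar": "CCCCFFGGEEEZZ",
}

def classify(deck, known=KNOWN_DECKS):
    """Return the name of the known deck most similar to `deck`."""
    return max(known, key=lambda name: weighted_jaccard(deck, known[name]))

# A 'Foo' deck where the player swapped D for G.
print(classify("AABBBCCCCGEEF"))
```

In practice you would probably also want a minimum similarity threshold, so that a deck unlike every known type is reported as unknown instead of being forced into the nearest one.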

The only problem is that this relies on decks pre-defined by the developer. We assumed above that there are just 26 cards, but in the real case there are something like 10k cards, and new deck types are created every 3~4 months. I don't even know which decks are out there. There are literally too many deck types to pre-define on my own. :')

As far as I know, the approach above is a form of supervised learning. I think supervised learning is a poor fit for this case: when the user community creates a new deck type, the program can't classify decks of that type correctly until the type has been added.

That means we would have to define a new deck type every single time one is released. That's a pain in the bum and a total waste of time and effort.

So in the end I decided to use an unsupervised learning method, which doesn't need pre-defined data for classification. A deck is just a series of numeric data (card IDs), so all we have to do is this:

  1. Somehow convert a deck (an array of card IDs) into a point in Euclidean space.
  2. Group the points into clusters using K-Means or some other algorithm.
  3. Label each cluster with the name of a deck type.
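One possible way to sketch steps 1 and 2, assuming the 26-letter toy setup: represent each deck as a vector of per-card counts (one dimension per card ID), then cluster those vectors. This count-vector embedding is only one candidate and I'm not sure it is the best one. The tiny K-Means below is hand-written with a deterministic farthest-point start so the example is reproducible; a library implementation would normally be used instead. `deck_to_vector` and `kmeans` are illustrative names.

```python
from collections import Counter
import math

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # the 26 toy card IDs

def deck_to_vector(deck, card_ids=ALPHABET):
    """Step 1: map a deck to a point whose i-th coordinate is the count of card i."""
    counts = Counter(deck)
    return [float(counts[c]) for c in card_ids]

def kmeans(vectors, k, iterations=20):
    """Step 2: plain K-Means; returns one cluster index per input vector."""
    # Assumption: farthest-point initialisation instead of random, for determinism.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(math.dist(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assign every vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda i: math.dist(v, centroids[i])) for v in vectors
        ]
        # Move each centroid to the mean of its members.
        for i in range(k):
            members = [v for v, label in zip(vectors, labels) if label == i]
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels

decks = [
    "AABBBCCCCDEEF",  # Foo
    "AABBBCCCCGEEF",  # Foo with D swapped for G
    "CCCCFFGGEEEZZ",  # Bar
    "CCCCFFGGEEEZA",  # Bar with one Z swapped for A
]
labels = kmeans([deck_to_vector(d) for d in decks], k=2)
print(labels)
```

Two caveats with this sketch: K-Means needs the number of clusters `k` up front, which is exactly what I don't know for real deck types, and with around 10k cards each vector would be very sparse.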

This sounds simple, but it's the reason my head is about to blow up 😂

So my question is:

How can we convert a series of card IDs into points in Euclidean space for clustering? Is it even possible? If not, is there any other method to group (cluster) a series of data?



