Thursday, 29 October 2020

Sentences embedding using word2vec

I'd like to compare how the same word, for example "travel", is used across different sentences. What I would like to do is:

  • Take the sentences mentioning the term "travel" as plain text.
  • In each sentence, replace "travel" with travel_sent_x.
  • Train a word2vec model on these sentences.
  • Calculate the distance between travel_sent1, travel_sent2, and the other relabelled mentions of "travel".

So each sentence's "travel" gets its own vector, which is used for comparison.
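The relabelling step could be sketched roughly as follows. This is only an illustration: the helper name `relabel_term`, the lower-casing, and the simple regex tokeniser are assumptions, not part of the original approach.

```python
import re

def relabel_term(sentences, term="travel"):
    """Tokenise each sentence and replace `term` with a per-sentence label.

    Hypothetical helper: sentence i (1-based) gets the token '<term>_sent<i>'.
    """
    relabelled = []
    for i, sentence in enumerate(sentences, start=1):
        # Naive tokeniser (an assumption): lower-case words, digits and apostrophes.
        tokens = re.findall(r"[a-z0-9']+", sentence.lower())
        relabelled.append(
            [f"{term}_sent{i}" if tok == term else tok for tok in tokens]
        )
    return relabelled

sentences = [
    "Hawaii makes a move to boost domestic travel and support local tourism",
    "Honolulu makes a move to boost travel and support local tourism",
    "Hawaii wants tourists to return so much it's offering to pay for half of their travel expenses",
]
tokenised = relabel_term(sentences)
print(tokenised[1])
```

The resulting list of token lists is the shape gensim's Word2Vec accepts for its `sentences` argument. Note that each relabelled token occurs only once in the corpus, so it would be dropped by any `min_count` greater than 1.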

I know that word2vec requires much more than a few sentences to train reliable vectors. The official page recommends datasets with billions of words, but my dataset is nowhere near that size (I have thousands of words).

I was trying to test the model with the following few sentences:

    Sentences
    Hawaii makes a move to boost domestic travel and support local tourism
    Honolulu makes a move to boost travel and support local tourism
    Hawaii wants tourists to return so much it's offering to pay for half of their travel expenses

My approach to building the vectors has been:

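import numpy as np
from gensim.models import Word2Vec

# Word2Vec expects a list of token lists, not raw strings
vocab = [s.lower().split() for s in df['Sentences']]
# 'size' is the gensim 3.x name; gensim 4 renamed it to 'vector_size'
model = Word2Vec(sentences=vocab, size=100, window=10, min_count=3, workers=4, sg=0)
# Word2Vec has no vectorize method; as an assumption, average each sentence's in-vocabulary word vectors
df['Vector'] = [np.mean([model.wv[w] for w in s if w in model.wv], axis=0) for s in vocab]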

However, I do not know how to visualise the results to see their similarity and get some useful insight. Any help or advice would be welcome.
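One common way to inspect similarity before plotting anything is a pairwise cosine-similarity matrix. The sketch below is not tied to word2vec: the function names are hypothetical and the three toy vectors are made up to stand in for trained embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

def similarity_matrix(vectors):
    """Pairwise cosine similarities as a nested list."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]

# Toy 3-dimensional vectors (made up) standing in for trained embeddings.
toy = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
matrix = similarity_matrix(toy)
for row in matrix:
    print([round(x, 3) for x in row])
```

Values close to 1 mean two vectors point in nearly the same direction, and values near 0 mean they are close to orthogonal. The matrix could then be shown as a heatmap.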

Update: I would like to use Principal Component Analysis (PCA) to visualise the embeddings in 3-dimensional space. I know how to do this for individual words, but I do not know how to do it for sentences.
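If each sentence is first reduced to a single vector (for instance by averaging its word vectors, which is an assumption and only one of several options), PCA works the same way as for words. Below is a minimal sketch using NumPy's SVD. The name `pca_project` is hypothetical and the random matrix only stands in for real sentence vectors.

```python
import numpy as np

def pca_project(vectors, n_components=3):
    """Project row vectors onto their first principal components."""
    X = np.asarray(vectors, dtype=float)
    X_centred = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X_centred, full_matrices=False)
    return X_centred @ vt[:n_components].T

# Stand-in data: 5 "sentence vectors" of dimension 100 (random, not trained).
rng = np.random.default_rng(0)
sentence_vectors = rng.normal(size=(5, 100))

coords = pca_project(sentence_vectors)
print(coords.shape)  # (5, 3)
```

Each row of `coords` is one sentence in 3-dimensional space and can be passed to any 3D scatter plot.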




Set form control to dirty in angular test

I have the following component template:

<div>
  <mat-checkbox [(ngModel)]="model.value" required="true" #checkbox="ngModel"></mat-checkbox>
  <mat-error *ngIf="checkbox.invalid && checkbox.dirty">Some Error</mat-error>
</div>

In my test I would like to check that the error state is shown. To do that, I need to access the form control behind "#checkbox" in my test and mark it as dirty.

  it('should show error on invalid', () => {
    const checkbox = fixture.debugElement.query(By.directive(MatCheckbox))
    // I have the mat-checkbox, however not sure how to access the form control to set as dirty
  })

If I add a ViewChild to my component I can access this in the test:

@Component({
  selector: 'my-checkbox',
  templateUrl: './checkbox.component.html',
  styleUrls: ['./checkbox.component.scss']
})
export class CheckboxComponent {
  @Input() model: any;

  // Don't ever use this in production, but added for tests
  @ViewChild('checkbox', { static: false }) checkbox: NgModel;
}

...

// At that point I can access the control in the test through the component
component.checkbox.control.markAsDirty();

However, I would like to access the NgModel inside my test without adding a property to the component itself, since the component never uses it.

EDIT: I have also tried to just modify the value to toggle the control to dirty:

it('should show error on invalid', (done) => {
  component.model.value = false  // set value to unchecked
  fixture.detectChanges()

  fixture.whenStable().then(() => {
    const error = fixture.debugElement.query(By.directive(MatError))
    // error is null here, verified that control is not marked as dirty
    done();
  })
})

