Sunday, October 17, 2021

Summary of the findings in Michele Banko and Eric Brill's 2001 Microsoft Research paper

"Scaling to Very Very Large Corpora for Natural Language Disambiguation", Michele Banko and Eric Brill, Microsoft Research, 2001


This NLP paper seems to show that as the quantity of high-quality training data increases, the test accuracy of all models improves significantly, for both complex and simple models. To get high-quality results from data in NLP, training corpora should grow well beyond the roughly one million words that were standard at the time, given that "hundreds of billions of words" are readily available on the internet and that amount continues to grow.

Friday, October 15, 2021

Seaborn data visualization

Cool seaborn plot: sns.jointplot



Source: official seaborn documentation.

Friday, October 8, 2021

Office tour: Airbnb, PyTorch, GitHub

In our newsletter we mentioned the many drinks on tap at Airbnb (which also has a stage decorated to look like an Airbnb house), the GitHub office filled with Octocat art, the PyTorch conference's artisan coffee machines staffed by professional latte-art baristas, and an extremely fancy Nespresso machine at Accenture.
Uniqtech virtual office tour