Sunday, December 26, 2021
Saturday, December 25, 2021
What does training data look like in image classification?
What does training data look like in image classification tasks?
Images are separated into train and valid folders.
Within each folder, images are organized into subfolders named after the numeric value of their class, e.g. 0, 1, 2, ... The labels usually run from 0 to the number of classes minus 1 when they are zero-indexed; otherwise they start at 1.
This is what the Oxford flower dataset looks like:
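The folder layout described above can be sketched in a few lines of Python. This is only an illustration: it builds a tiny mock dataset of empty files in a temporary directory, and the file names, the helper names and the choice of 3 classes are assumptions made for the example, not the actual contents of the flower dataset.

```python
import tempfile
from pathlib import Path


def build_demo_dataset(root: Path, num_classes: int = 3, images_per_class: int = 2) -> None:
    """Create an empty-file mock of the train/valid layout (hypothetical file names)."""
    for split in ("train", "valid"):
        for label in range(num_classes):  # zero-indexed labels: 0 .. num_classes - 1
            class_dir = root / split / str(label)
            class_dir.mkdir(parents=True, exist_ok=True)
            for i in range(images_per_class):
                (class_dir / f"image_{i:05d}.jpg").touch()


def list_labeled_images(split_dir: Path) -> list[tuple[str, int]]:
    """Return (relative file path, integer label) pairs; the label is the folder name."""
    pairs = []
    for class_dir in sorted(split_dir.iterdir(), key=lambda p: int(p.name)):
        for image_path in sorted(class_dir.glob("*.jpg")):
            relative = image_path.relative_to(split_dir).as_posix()
            pairs.append((relative, int(class_dir.name)))
    return pairs


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_demo_dataset(root)
    train_pairs = list_labeled_images(root / "train")
    valid_pairs = list_labeled_images(root / "valid")

print(train_pairs[:3])
print(len(train_pairs), len(valid_pairs))
```

Because the label is read directly from the folder name, no separate label file is needed for this layout.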
Monday, December 20, 2021
Uber machine learning workflow
We have talked extensively about having a framework for understanding machine learning: the machine learning workflow. We published a simplified (crystallized) version of the workflow, and we refer to it often because it is such a useful framework. Check out the list. We also routinely compare this workflow with workflow illustrations from leading engineering companies to check that it is still valid and relevant. Here is Uber Engineering's machine learning workflow.
https://ml.learn-to-code.co/topic_machine_learning_workflow.html
Tuesday, December 14, 2021
Web3
Web3 tech stack illustrated by the Web3 Foundation
The 5 layers of the web3 stack, layer 0 through layer 4, illustrated by the Web3 Foundation.
Friday, December 3, 2021
Wednesday, November 3, 2021
Blockchain news for developers - November 2021
NVIDIA CMP HX, a dedicated GPU for professional mining: remember that at the beginning of the year, NVIDIA wanted crypto miners to stop snatching up GPUs from gamers and to use this device to mine professionally instead.
#hardware #trend #GPU #crypto #mining
NFT
Check out this video by MomentRank
https://twitter.com/kissingsky/status/1428368687644364805?s=20
Sunday, October 17, 2021
Findings of the Michele Banko and Eric Brill (Microsoft Research) 2001 paper, summarized
"Scaling to Very Very Large Corpora for Natural Language Disambiguation", Michele Banko and Eric Brill, Microsoft Research, 2001
This NLP paper seems to show that as the quantity of high quality training data increases, the test accuracy of all models improves significantly, for complex and simple models alike. To use data to achieve high quality results in NLP, the authors argue that training corpora should grow well beyond the one million words or so that were standard at the time, given that "hundreds of billions of words" are readily available on the internet and that amount continues to grow.
Friday, October 15, 2021
Friday, October 8, 2021
Office tour: Airbnb, PyTorch, GitHub
Uniqtech virtual office tour
Saturday, September 25, 2021
Scikit-Learn sklearn code snippets common patterns cover photo
Wednesday, September 1, 2021
Pandas Profiling
Pandas Profiling (not an official part of pandas; it is a PyPI package) provides summary statistics that go beyond the basic df.describe(). It has 7000+ stars and 1000+ forks. It automatically performs type inference and computes histograms, missing values, and correlations.
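To make the comparison with df.describe() concrete, here is a minimal sketch that computes the same kinds of statistics by hand with plain pandas and NumPy. It does not call Pandas Profiling itself (the package's `ProfileReport` class assembles such a report automatically); the small DataFrame and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values.
df = pd.DataFrame(
    {
        "age": [23, 35, np.nan, 41, 29],
        "income": [48000.0, 61000.0, 52000.0, np.nan, 58000.0],
        "city": ["Austin", "Boston", "Austin", None, "Denver"],
    }
)

basic_stats = df.describe()                         # numeric columns only by default
inferred_types = df.dtypes                          # dtype pandas inferred per column
missing_counts = df.isna().sum()                    # number of missing values per column
correlation = df.select_dtypes("number").corr()     # pairwise correlation of numeric columns
age_counts, age_edges = np.histogram(df["age"].dropna(), bins=3)  # histogram of one column

print(basic_stats)
print(inferred_types)
print(missing_counts)
print(correlation)
print(age_counts, age_edges)
```

Each of these is a separate call here; a profiling report gathers them for every column in one place.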
Wednesday, August 25, 2021
OpenAI Codex - Uniqtech Guide to code generation, natural language processing, GPT-3
Need to brush up on GPT-3 knowledge? Check out our GPT-3 knowledge landing page. It's free. Log in to access for free. Uniqtech Guide to understanding GPT-3. Scroll to the bottom to read all about Codex.
Uniqtech Guide to OpenAI Codex Basics
Link to our knowledge flash cards:
- OpenAI Codex Basics - Uniqtech Guide to GPT-3 [public, open access]
- Intermediate Codex - Uniqtech Guide to GPT-3 Codex code generation [public, intermediate]
- Advanced Codex notes - Uniqtech Guide to OpenAI GPT-3 Codex [pro, paid subscriber]
01 Copy and paste a first grade math question from a worksheet.
02 Use the question as a prompt and get an answer from OpenAI Codex.
03 Codex translates the prompt from English to Python code.
04 Codex generates a numeric answer to the math question.
You can copy and paste the code into a notepad to customize.
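As a hypothetical illustration of steps 02 to 04, the snippet below shows the kind of Python a word problem might translate to. Both the question and the code were written by hand for this sketch; they are not actual Codex output, and the variable names are assumptions.

```python
# Prompt (hypothetical first grade question):
# "Sam has 3 apples. His friend gives him 4 more apples.
#  How many apples does Sam have now?"

apples_at_start = 3   # "Sam has 3 apples"
apples_received = 4   # "His friend gives him 4 more apples"

# The English question becomes a single addition.
total_apples = apples_at_start + apples_received
print(total_apples)
```

Changing the two numbers at the top is an example of the customization mentioned above.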
OpenAI Codex answers first grade math questions - Uniqtech Guide to Codex
Codex says Hello World
Wednesday, July 28, 2021
Friday, June 18, 2021
Job posts visualizations : Product Manager versus Program Manager versus Data Engineer
Our data visualization for job postings of Product Manager, Program Manager and Data Engineer.
Click the image to view a larger version styled with our logo. Keep in mind that our initial analysis is limited, as we are just starting to collect big tech data. As our dataset grows, these insights may evolve. In the visualization, the difference between product and program management still looks subtle, but in reality they are very different positions. We have friends doing both. The team hopes the visualization will become more informative soon.
Saturday, May 1, 2021
Imputation Strategies
Imputation is used when pre-processing training data in machine learning. It handles missing data by filling in each missing value with a substitute, such as the mean or median of the column.
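A minimal sketch of two common imputation strategies, mean and median, using only NumPy. The `impute` helper and the toy matrix are assumptions made for this example; scikit-learn's `SimpleImputer` offers the same strategies as a ready-made transformer.

```python
import numpy as np

# Toy feature matrix; np.nan marks the missing entries.
X = np.array(
    [
        [1.0, 10.0],
        [np.nan, 20.0],
        [3.0, np.nan],
        [5.0, 60.0],
    ]
)


def impute(X: np.ndarray, strategy: str = "mean") -> np.ndarray:
    """Replace each NaN with the mean or median of its column (hypothetical helper)."""
    if strategy == "mean":
        fill_values = np.nanmean(X, axis=0)    # column means, ignoring NaN
    elif strategy == "median":
        fill_values = np.nanmedian(X, axis=0)  # column medians, ignoring NaN
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # fill_values has one entry per column and broadcasts across the rows.
    return np.where(np.isnan(X), fill_values, X)


X_mean = impute(X, "mean")
X_median = impute(X, "median")
print(X_mean)
print(X_median)
```

The median is less sensitive to outliers: in the second column the value 60.0 pulls the mean fill up to 30.0, while the median fill stays at 20.0.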
Installation - Machine Learning Deep Learning Prerequisites
import numpy as np # linear algebra
import seaborn as sns # data visualization, API
from bs4 import BeautifulSoup as soup # web scraping
Install packages based on requirements.txt using command line
$sudo pip install -r requirements.txt
Other commonly used libraries:
Other scikit-learn import statements you might see in the wild:
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
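To show how these imports typically fit together, here is a small sketch that trains two of the listed classifiers on synthetic data and scores them with roc_auc_score. The dataset from `make_classification`, the split sizes and the hyperparameters are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (a stand-in for a real dataset).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=0
)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # roc_auc_score expects a score for the positive class, not hard labels.
    positive_proba = model.predict_proba(X_valid)[:, 1]
    auc_scores[name] = roc_auc_score(y_valid, positive_proba)
    print(f"{name}: AUC = {auc_scores[name]:.3f}")
```

The other classifiers in the list share the same fit / predict_proba interface, so they can be added to the `models` dictionary without changing the loop.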
Machine Learning in the Cloud
Workflow: how to generate or collect, preprocess, and train with data.
Sample tasks:
- Train machine learning models in Google Cloud.
- Collect data in Google Cloud or on Amazon Web Services (AWS).
- Analyze and preprocess training data.
- Clean and analyze data and present your findings.
- Pre-process data using Python.
- Train a basic machine learning model.
- Deploy a model for prediction using a REST API.
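The last task in the list, serving predictions over a REST API, can be sketched with Python's standard library alone. Everything here is an assumption made for illustration: the `/predict` path, the JSON shape, and the `predict` function, which is a stand-in with fixed weights and not a trained model. A cloud deployment would normally use a managed service or a web framework instead.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Stand-in for a trained model: a fixed linear score (hypothetical weights)."""
    weights = [0.5, -0.25]
    score = sum(w * x for w, x in zip(weights, features))
    return {"score": score, "label": int(score > 0)}


def handle_predict(body: bytes) -> tuple[int, dict]:
    """Turn a JSON request body into (HTTP status, JSON-serializable response)."""
    try:
        features = json.loads(body)["features"]
        if len(features) != 2:
            raise ValueError("expected exactly 2 features")
        return 200, predict([float(x) for x in features])
    except (KeyError, TypeError, ValueError) as err:
        return 400, {"error": str(err)}


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_predict(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def make_server(port: int = 8000) -> HTTPServer:
    """Build the server without starting it; call .serve_forever() to run."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)


# The request logic can be exercised without opening a socket.
status, response = handle_predict(b'{"features": [2.0, 1.0]}')
print(status, response)
```

Keeping the request handling in a plain function separate from the HTTP handler makes the prediction logic easy to test before anything is deployed.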
Sunday, April 11, 2021
AI Research and Investment at Baidu (Google of China), AI, AI chips, Autonomous Driving, Self Driving Cars
Baidu (百度) CEO Robin Li (李彦宏) discusses Baidu's strategy, research and development (R&D), investment, and innovation in AI, AI chips, and autonomous driving (self-driving cars, as a Tesla competitor), as well as self-driving in China and US-China relations in terms of tech innovation.
Read our knowledge card here. Link to video in the card.
Thursday, April 1, 2021
Wednesday, March 31, 2021
Startup office tech company interior design inspiration
The Headspace office has an open amphitheater for group meditation and meetings, intended to improve happiness and wellness.
Sunday, March 7, 2021
What does it take to be a Software Engineer or Machine Learning Engineer at Quora?
Quora is a place where people go to ask questions and get relevant answers and tips back from experts, the crowd, and people with experience. What does it take to be an engineer at Quora? This recruiting flyer tells us what it really takes to be a part of the Quora engineering team. Pro (paid) members can access this info card. Follow us for more job post analysis and career growth suggestions like this: http://ml.learn-to-code.co/.
Thursday, March 4, 2021
Monday, February 8, 2021
Next Rembrandt - Generated AI - Uniqtech curated coolest AI demo series
This is a state-of-the-art collaboration among researchers and technologists at ING, Microsoft, TU Delft ... Rembrandt Museum. It is a perfect example of AI generated fine art. Unlike hobbyist generated pictures and prototype experimental art generation, this is high fidelity, fine grained, HD generated fine art, meticulously 3D printed, which is hard to achieve. In this case, it is executed to perfection. Original source citation: see URL link.