Saturday, December 25, 2021

What does training data look like in image classification?

 What does training data look like in image classification tasks?

Images are separated into train and valid folders.


Within each folder, images are organized into subfolders named by the numeric value corresponding to their class, e.g. 0, 1, 2, ... The labels usually run from 0 to the number of classes minus 1 when they are zero-indexed. Otherwise they start at 1.
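The folder layout described above can be sketched in a few lines of Python. This is only an illustration: the helper name list_labeled_images, the file names, and the temporary directory are made up for the example, and real datasets may add a test folder or use 1-indexed class folders.

```python
from pathlib import Path
import tempfile


def list_labeled_images(split_dir):
    """Return (image_path, label) pairs for one split folder.

    The label is the integer name of the class subfolder (hypothetical helper).
    """
    samples = []
    for class_dir in sorted(Path(split_dir).iterdir(), key=lambda p: p.name):
        # Skip anything that is not a numerically named class folder
        if not class_dir.is_dir() or not class_dir.name.isdigit():
            continue
        for image_path in sorted(class_dir.iterdir()):
            samples.append((image_path, int(class_dir.name)))
    return samples


# Build a tiny throwaway layout: <root>/train/0, <root>/train/1, ... (assumed names)
root = Path(tempfile.mkdtemp())
for split in ("train", "valid"):
    for label in ("0", "1", "2"):
        class_dir = root / split / label
        class_dir.mkdir(parents=True)
        (class_dir / f"image_{label}.jpg").touch()  # empty placeholder file

train_samples = list_labeled_images(root / "train")
print([(path.name, label) for path, label in train_samples])
```

Libraries that load images from folders typically rely on the same convention: the subfolder name is the class label.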


This is what the Oxford flower dataset looks like:



Monday, December 20, 2021

Uber machine learning workflow

We have talked extensively about having a framework for understanding machine learning: the machine learning workflow. We published a simplified (crystallized) version of the workflow, and we refer to it very often because it is such a useful framework. Check out the list. We also routinely compare this workflow with workflow illustrations from leading engineering companies to check whether it is still valid and relevant. Here is Uber Engineering's machine learning workflow.

https://ml.learn-to-code.co/topic_machine_learning_workflow.html



Tuesday, December 14, 2021

Web3

Web3 tech stack illustrated by the Web3 Foundation

The Web3 stack has 5 layers, layer 0 through layer 4, as illustrated by the Web3 Foundation.


Career, jobs, resources, technical interviews

 


Wednesday, November 3, 2021

Blockchain news for developers - November 2021

NVIDIA CMP HX, a dedicated GPU for professional mining: remember that at the beginning of the year, NVIDIA wanted crypto miners to stop snatching up GPUs from gamers and instead use this device to mine professionally.



#hardware #trend #GPU #crypto #mining



NFT


Check out this video by MomentRanks




Is this what it looks like inside the #apefest?
Source: https://twitter.com/MomentRanks/status/1456087067495108609

Web3 content creation and monetization simply explained:


https://twitter.com/kissingsky/status/1428368687644364805?s=20


Sunday, October 17, 2021

Finding of Michele Banko and Eric Brill, Microsoft Research 2001 paper summarized

"Scaling to Very Very Large Corpora for Natural Language Disambiguation", Michele Banko and Eric Brill, Microsoft Research, 2001 paper


This NLP paper suggests that as the quantity of high-quality training data increases, the test accuracy of all models improves significantly, for simple models as well as complex ones. To use data to achieve high-quality results in NLP, the training corpus should grow well beyond the one million words or fewer that were standard at the time, given that "hundreds of billions of words" are readily available on the internet and that amount continues to grow.

Friday, October 15, 2021

Seaborn data visualization

 Cool seaborn plot sns.jointplot



Source: official seaborn documentation.

Friday, October 8, 2021

Office tour: Airbnb, PyTorch, GitHub

In our newsletter we mentioned the many drinks on tap at Airbnb (which also has a stage decorated to look like an Airbnb house), the GitHub office filled with Octocat art, the artisan coffee machines at the PyTorch conference staffed by professional latte art baristas, and an extremely fancy Nespresso machine at Accenture.
Uniqtech virtual office tour

Saturday, September 25, 2021

Wednesday, September 1, 2021

Pandas Profiling

Pandas Profiling (not an official part of pandas; it is a separate PyPI package) provides summary statistics and calculates important stats beyond the basic df.describe(). It has 7000+ stars and 1000+ forks. It automatically performs type inference and reports histograms, missing values, and correlations.



Wednesday, August 25, 2021

OpenAI Codex - Uniqtech Guide to code generation, natural language processing, GPT-3

Need to brush up on GPT-3 knowledge? Check out our GPT-3 knowledge landing page, the Uniqtech Guide to understanding GPT-3. It's free: just log in to access it. Scroll to the bottom to read all about Codex.


   Uniqtech Guide to OpenAI Codex Basics


Link to our knowledge flash cards:


Subscribers (pro members $5 coffee price / month) can read summary notes in the advanced flash cards. We summarize takeaways from the demo and paper. 
OpenAI Codex insights, notes, demos summarized for pro members. A big time saver.



Screenshots from the OpenAI Demo:
OpenAI Codex Challenge day


Give Codex first grade math questions:


01 Copy and paste first grade math question from a worksheet

02 Use the question as a prompt and get an answer from OpenAI Codex

03 Codex translates the prompt from English to Python Code

04 Codex generates a numeric answer to the math question. 

You can copy and paste the code into a notepad to customize.


OpenAI Codex answers first grade math questions Uniqtech Guide to Codex

Codex says Hello World
Then says Hello World with empathy lol
Here's where it gets interesting: Codex generates Python code from plain English instructions, and the Python code in turn emits HTML code. In other words, Codex can generate a simple webpage!
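To make the idea of "Python code that emits HTML" concrete, here is a small hand-written sketch. It is not actual Codex output; the function name build_page and the page contents are made up for illustration.

```python
from html import escape


def build_page(title, message):
    """Return a minimal HTML document as a string (hypothetical helper)."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"  <head><title>{escape(title)}</title></head>\n"
        f"  <body><h1>{escape(message)}</h1></body>\n"
        "</html>\n"
    )


page = build_page("Demo", "Hello World")
print(page)  # save this text as a .html file and open it in a browser
```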


Wednesday, July 28, 2021

Opening text file

 

uniqtech tutorial handling text file
Open and read from text file. 
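A minimal sketch of opening and reading a text file in Python. The file name sample.txt and its contents are made up so that the example is self-contained.

```python
import os
import tempfile

# Create a small sample file first (hypothetical name and contents)
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# Read the whole file into one string; "with" closes the file automatically
with open(path, encoding="utf-8") as f:
    text = f.read()

# Or iterate line by line, which also works for large files
with open(path, encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

print(text)
print(lines)
```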

Friday, June 18, 2021

Job posts visualizations : Product Manager versus Program Manager versus Data Engineer

Our data visualization for job postings of Product Manager, Program Manager and Data Engineer.

Click the image to view a larger version featuring our logo. Keep in mind that our initial analysis is limited, as we are just starting to collect big tech data. As our dataset grows, these insights may evolve. In the visualization, the difference between product and program management still looks subtle, but in reality they are very different positions. We have friends doing both. The team hopes the visualization will become more informative soon.









Saturday, May 1, 2021

Imputation Strategies

Imputation is a pre-processing step for training data in machine learning. It handles missing data by filling in the missing values, for example with the mean or median of the column.
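One common strategy is mean imputation. The sketch below uses plain numpy on a made-up array; it is only one of several possible strategies (median, most frequent value, and constant fill are others).

```python
import numpy as np

# Toy feature matrix with missing values marked as NaN (made-up numbers)
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# Column means computed while ignoring the missing entries
col_means = np.nanmean(X, axis=0)

# Replace each NaN with the mean of its column; other values are unchanged
X_imputed = np.where(np.isnan(X), col_means, X)

print(X_imputed)
```

Compute the fill values on the training split only and reuse them for the validation data, so that no information leaks from validation into training.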

Installation - Machine Learning Deep Learning Prerequisites

import numpy as np # linear algebra

import seaborn as sns # data visualization, API

from bs4 import BeautifulSoup as soup # web scraping

Install packages based on requirements.txt using command line

Install requirements

$ sudo pip install -r requirements.txt

Other commonly used libraries:

numpy, scipy - for scientific computing; matplotlib - for plotting

import os 
# import the os module 
# "This module provides a portable way of 
# using operating system dependent functionality."

Other scikit-learn import statements you might see in the wild:

from sklearn.metrics import roc_auc_score

from sklearn.ensemble import RandomForestClassifier

from sklearn.naive_bayes import GaussianNB

from sklearn.neighbors import KNeighborsClassifier

from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import AdaBoostClassifier

from sklearn.ensemble import GradientBoostingClassifier

from sklearn.linear_model import LogisticRegression

Machine Learning in the Cloud

Workflow :  How to generate or collect, preprocess and train with data. 

Sample tasks : 

  • Train machine learning models in Google Cloud
  • Collect data in Google Cloud or on Amazon Web Services (AWS)
  • Analyze and preprocess training data
  • Clean and analyze data, then present your findings
  • Pre-process data using Python
  • Train a basic machine learning model
  • Deploy a model for prediction using a REST API


Sunday, April 11, 2021

AI Research and Investment at Baidu (Google of China), AI, AI chips, Autonomous Driving, Self Driving Cars

Baidu (百度) CEO Robin Li (李彦宏) discusses AI, autonomous driving, AI chips, Baidu's self-driving car strategy (a Tesla competitor), research and development (R&D), investment and innovation at Baidu, self-driving in China, and US-China relations in terms of tech innovation.

Read our knowledge card here. Link to video in the card. 



Thursday, April 1, 2021

Machine Learning Jokes

 xkcd machine learning joke https://xkcd.com/1838/

Generative Adversarial Networks (GANs) 



Wednesday, March 31, 2021

Sunday, March 7, 2021

What does it take to be a Software Engineer or Machine Learning Engineer at Quora?

Quora is a place where people go to ask questions and get relevant answers and tips back from experts, the crowd, and people with experience. What does it take to be an engineer at Quora? This recruiting flyer tells us what it really takes to be a part of the Quora engineering team. Pro (paid) members can access this info card. Follow us for more job post analysis and career growth suggestions like this: http://ml.learn-to-code.co/




Thursday, March 4, 2021

Cool GPT-3 Demos

GPT-3 discussion on Clubhouse

OpenAI DALL-E
https://openai.com/blog/dall-e/






Monday, February 8, 2021

Next Rembrandt - Generated AI - Uniqtech curated coolest AI demo series

This is a state-of-the-art collaboration among researchers and technologists at ING, Microsoft, TU Delft ... Rembrandt Museum. It is a perfect example of AI-generated fine art. Unlike hobbyist-generated pictures and prototype experimental art generation, this is high-fidelity, fine-grained, HD generated fine art, meticulously 3D printed, which is hard to achieve. In this case, it is executed to perfection. Original source citation: see URL link.



Sunday, January 10, 2021

Developer stickers perks for winter interns

 A warm welcome to our Winter 2020 Machine Learning Interns. It's a fantastic Stanford group.



Friday, January 1, 2021

Machine learning versus traditional programming vs Deep Learning

In traditional programming, developers give the computer explicit instructions, either in procedural top-down scripts or via control flow statements that can “jump around” instead of running strictly top to bottom. Machine learning and deep learning are about supplying well-known, proven algorithms with cleaned, feature-selected and/or feature-engineered data, as well as the corresponding labels for the data (in unsupervised learning, only data is supplied). The algorithm uses a loss calculation, metrics, and an optimizer to update its parameters, such as weights and coefficients. Finally, these learned weights and coefficients are used along with the algorithm for prediction.


The more high quality data the better.


The biggest difference is this: in traditional programming, developers give specific instructions, whereas in machine learning and deep learning, algorithms learn parameters from data and a loss function rather than following hand-written rules.
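The contrast can be sketched with a toy example in plain Python. The rule y = 2x + 1, the data points, the learning rate, and the number of steps are all made up for illustration; real projects would use a library rather than a hand-written loop.

```python
# Traditional programming: the developer writes the rule explicitly.
def rule_based(x):
    return 2 * x + 1


# Machine learning: the rule is never written down. The parameters w and b
# are learned from example inputs (xs) and their labels (ys).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [rule_based(x) for x in xs]  # labels for the training data

w, b = 0.0, 0.0          # parameters start out knowing nothing
learning_rate = 0.05
n = len(xs)

for _ in range(2000):
    # Gradients of the mean squared error loss with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    # Optimizer step: plain gradient descent
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b


def learned(x):
    """Predict with the learned parameters."""
    return w * x + b


print(w, b)           # should end up close to 2 and 1
print(learned(10.0))  # should be close to rule_based(10.0)
```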


Not having to write all the rules has benefits, especially when the rules are hard to encode or program. The final product is also often more robust and less likely to fail on unexpected inputs, because it is not a strict rule-based program.


Deep learning uses neural networks with many layers, hence the word deep. A deep network usually consists of stacked weight-learning layers. Stacked together, these layers have the notable capability of acting as universal function approximators: they can represent complex functions without anyone explicitly coding them.