We hear a lot about Machine Learning and Deep Learning these days, including the claim that the next wave of jobs will be in AI and Machine Learning. Here we attempt to answer some common questions and offer some insight on component selection.
What is it?
Machine learning focuses on computer algorithms that draw predictive insight from static or dynamic data sources. It does this with analytic and probabilistic models that are refined through training and feedback.
It uses pattern recognition, artificial intelligence learning methods, and statistical data modeling. “Learning” happens in two ways. Supervised learning applies when the desired outcome is known: an annotated data set is used to train (fit) a model so that it accurately predicts values outside the training set. Unsupervised learning uses raw, unlabeled data to seek out inherent partitions or groups of characteristics present in the data itself.
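To make the two learning styles concrete, here is a minimal sketch in plain Python. The data values and the helper names (`fit_line`, `two_means`) are made up for illustration. A real project would more likely reach for an established ML library than hand-rolled code like this.

```python
# Supervised: annotated (x, y) pairs are used to fit a line y = slope*x + intercept.
def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line to labeled data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Unsupervised: no labels, so we look for two natural groups in the raw values.
def two_means(values, iterations=10):
    """Tiny 1-D k-means with k=2; returns the two cluster centres.

    Assumes the input contains at least two distinct values.
    """
    low, high = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to whichever centre is closer.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each centre to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return low, high


if __name__ == "__main__":
    # Made-up training set that follows y = 2x + 1.
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print("prediction for x=10:", slope * 10 + intercept)

    # Made-up unlabeled readings that fall into two clumps.
    print("cluster centres:", two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.5]))
```

The supervised half needs the answers (the `ys`) up front and predicts a value it never saw. The unsupervised half is given only raw numbers and finds the grouping by itself.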
Machine learning combines varied disciplines, including computer science, mathematics, graph and network theory, statistics and probability, and its areas of application are practically endless.
A workstation with a few graphics cards can now accomplish far more computing than the whole planet’s computing power could in 1990! Let that sink in. Estimated worldwide data storage is in the zettabyte range, and 1 zettabyte is a trillion gigabytes. We’re probably looking at the golden age of Machine Learning right now!
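If the zettabyte figure feels abstract, the arithmetic is quick to check. This small sketch assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes. The constant names are ours.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10 ** 9
ZETTABYTE = 10 ** 21

# How many gigabytes fit in one zettabyte?
gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE
print(f"{gigabytes_per_zettabyte:,} GB in 1 ZB")
```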
Right time?
Humongous amounts of data are now available from scientific instruments, business records, and mature, well-annotated databases, so things look to be falling into place for ML. Computing resources are inexpensive, programming methods for GPU acceleration are well established, and there is the money: money in research, in startups and in big-business budget allocations. Data Scientist is the hottest job title.
Quite understandably, the single biggest driving force behind the rise of Machine Learning is the effort to sell you stuff! There are high-paying jobs at places like Google, Baidu, Microsoft and Facebook, where some really amazing work is being done to improve marketing and advertising.
The applications?
ML has huge potential for scientific discovery: the terabytes of data generated by research instruments open up possibilities in chemistry, biomedicine, economics and more. The potential for robotics, autonomous vehicles, translation and voice recognition is a no-brainer.
To appreciate how far computing power has come, here’s a short note for perspective. In 2012 Google set out to analyze YouTube data using a neural network with unsupervised learning, and the resulting feature detector learned (on its own) to recognize cats in the video frames. The training was done on a conventional computing cluster of 1,000 nodes with 16,000 cores! Today you could run a model of the same complexity on 2 workstations with 4 GPUs each. This makes a great use case for GPU acceleration in ML.
What configuration should I choose?
The short answer is: it depends on your data set and your workload.
A part selection guideline would be something like the list below:
- One or two Nvidia Pascal GPUs
- Intel i7 or i9 CPU
- 32GB to 64GB of system memory (RAM)
Recommended configuration:
- GPUs: 2 (GTX 1070, GTX 1080 Ti or Titan Xp). CUDA cores are your friends!
- Core i9 7900X (10 cores)
- 64GB of memory
- 1TB SSD/HDD (SSD is preferred for faster data transfer)
A more basic configuration can be arrived at by optimizing part selection for budget and use case. Not every application needs an i9 7900X; an i5 8400 can do the job for smaller data sets, although not as fast.
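Since the right build depends on your data set, a rough sizing check can help before you buy parts. The sketch below estimates the in-memory size of a dense numeric data set and compares it with the 32GB and 64GB memory sizes from the lists above. The function names, the example row and column counts, and the 50% headroom rule are illustrative assumptions on our part, not measured guidance.

```python
def dataset_size_gb(rows, columns, bytes_per_value=4):
    """Approximate size of a dense table of numbers (4 bytes = one float32)."""
    return rows * columns * bytes_per_value / 10 ** 9


def fits_in_ram(size_gb, ram_gb, headroom=0.5):
    """True if the data uses no more than `headroom` of system RAM.

    The 50% default is an assumed rule of thumb that leaves room for the
    operating system and for working copies of the data.
    """
    return size_gb <= ram_gb * headroom


if __name__ == "__main__":
    # Hypothetical data set: 50 million rows with 100 float32 columns.
    size = dataset_size_gb(50_000_000, 100)
    for ram in (32, 64):
        print(f"{size:.1f} GB of data, {ram} GB RAM -> fits: {fits_in_ram(size, ram)}")
```

Swap in your own row and column counts. If the data comfortably fits the smaller memory size, the more basic configuration may be all you need.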
Do comment if you would like to see more detailed reasoning on component selection, or shoot us an email at firstname.lastname@example.org with any questions.
Until next time.
theMVP.in – High Performance Systems