ml-questions-tammy

ML or Tech Area
Just a few questions to know what area we could focus you on.

What have you solved with ML? Please give us an idea of something you deployed and what it solved. *

I have applied ML to disambiguate authors of PubMed journal articles, that is, to tell apart different authors who share the same name. This makes it possible to identify key opinion leaders in a specific domain.

Do you learn better by Doing (Example) or Reading then Doing? *
Reading then doing.

Why would you use a Feed-Forward vs. an LSTM algorithm? "Don't know" is also OK. *

If the input is not inherently sequential, a feed-forward network makes more sense. An LSTM is better suited to sequential data: its recurrent (feedback) connections carry state from one time step to the next, so its output can depend on what came earlier in the sequence.
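
To illustrate the feedback-loop point, here is a minimal sketch in plain Python. It is a toy recurrent cell, not a full LSTM (there are no gates), and the function names and weights are made up for the example. The feed-forward function sees each input in isolation, while the recurrent one feeds its previous state back in.

```python
import math

def feedforward_outputs(xs, w=0.5):
    """Each output depends only on the current input; nothing is remembered."""
    return [math.tanh(w * x) for x in xs]

def recurrent_outputs(xs, w_in=0.5, w_state=0.9):
    """Toy recurrent cell: the hidden state h is fed back at every step."""
    h = 0.0
    outputs = []
    for x in xs:
        h = math.tanh(w_in * x + w_state * h)  # feedback loop through h
        outputs.append(h)
    return outputs

same_input_repeated = [1.0, 1.0, 1.0]
ff = feedforward_outputs(same_input_repeated)
rnn = recurrent_outputs(same_input_repeated)
```

With the same input repeated three times, the feed-forward outputs are identical at every step, while the recurrent outputs change because each step also sees the accumulated state.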

What is A.I. to you? Meaning, what's the difference between A.I., ML, etc., or are they the same?
AI is a much broader term that covers all sorts of intelligent programming, including rule-based systems. ML lets an algorithm improve its performance from input data without being explicitly programmed (unlike rule-based systems).

Business and Professional Questions
These are the most important to us and our teams, above technical skills.

What hours are you available Mon - Friday? *
7:00 AM-11:00 AM ; 8:00 PM - 12:00 AM

What hours are you available on Sat or Sunday? *
7:00AM to 11:00AM PST

Do you know how to send Meeting Requests in Google Calendar? *
Yes No

Do you know how to use Google Drive and ask questions offline using Comments? *
Yes No

If you are an Intern, it means you're working on tasks which are purely educational and likely never project-facing. What rate do you expect to earn in the 1st month, or until you can produce "results" which match project goals? *
I hope to be able to work on real projects soon. Otherwise, the rate is negotiable; any amount is OK.

What is your rate you are hoping for in 3 months?
$35 per hour

When can you start?

How much work can you do per week?
40 hours

What time PST are you available in the AM for short meetings.
:

or what time PST are you available in the PM for short meetings if AM is not good for you.
:

Misc Skills to know

Can you deploy and use EC2 Linux Servers? *
Yes No

Do you know python? *
Yes No

Have you ever scraped data using ScrapyD? *
Yes No

Do you know Node.JS *
Yes No

Do you know Angular 1? or 2? *
Yes No

Have you ever worked as a Backend Developer *
Yes No

Have you ever worked as a Frontend Developer *
Yes No

How many years working with Angular ? *
0

How many years working with Pandas ? *
1

How many years working with Python? *
8

Thank You - The Important Part Now
We will contact you via email based on your reply and the positions we have open. Our goal is to find GREAT people who are willing and able to learn. We will make sure to pull you forward and billable so you can earn a global avg run rate.

Which of these have you used at least 2 times ? *
LSTM GAN Genetic Algorithms None

Scenarios - These give us the most info
If you were given one of these problems as a challenge, could you solve it? Just a model, algorithm, or approach is enough for us.

Stats: You are provided 4 separate, labeled data sets: names of people, cars, streets, costs of cars, and other misc data. There is a correlation between these 4 data sets. What would you do to solve this? The goal is to find the correlation, or the nearest or highest-probability match. *
Correlation is defined between two variables at a time; it can be computed with pandas (df.corr()) or NumPy (numpy.corrcoef()). To find the nearest match for a given observation, we can compute distances (e.g. using scipy.spatial.distance), sort them, and take the closest observation(s).
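
A minimal sketch of both steps, written in plain Python as a stand-in for the pandas and SciPy calls named above so it runs without extra packages. The columns (car_age, car_cost) and their values are invented for illustration; they are not from the data sets in the question.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def nearest(query, rows, k=1):
    """Return the k rows closest to query by Euclidean distance.

    In practice the features should be put on a comparable scale first,
    otherwise the largest-valued column dominates the distance.
    """
    return sorted(rows, key=lambda row: math.dist(query, row))[:k]

# Hypothetical data: car age in years, cost in thousands.
car_age = [1, 2, 3, 4, 5]
car_cost = [30, 25, 21, 16, 10]

r = pearson(car_age, car_cost)          # strongly negative for this toy data
rows = list(zip(car_age, car_cost))
closest = nearest((3, 20), rows, k=1)   # the single closest observation
```

Text columns such as names or streets would first need a numeric encoding or a string-similarity measure; that step is outside this sketch.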

NLP: You are provided a long chat log between two people. The goal is to be able to predict the conversation or any part of it. The chat log contains recurring sentences. The problem is to predict what Person 1 may ask or answer based on specific questions. How would you try to solve this? *
Since the log is stated to contain "recurring sentences", I would model this as a "predict next sentence" problem, which is directly analogous to the "predict next word" problem. During pre-processing, I would convert all sentences into a sequence of "sentence IDs" (similar to word IDs) and train the model to generate "sentence embeddings" using the same algorithm as for word embeddings. A simpler algorithm would be to compute n-grams over the training set (using multiple values for n, such as 5, 4, 3, 2) and look up the n-gram tree to retrieve the most probable next item (here, the next sentence ID). Even though that algorithm is relatively simple, it still involves a lot of coding. The recommended approach would be an RNN model, which can yield more accurate results and is also easier to code using Keras/TensorFlow.
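
The simpler n-gram variant can be sketched as follows. This is a minimal illustration, assuming sentences match exactly after pre-processing; the toy chat log, the function names, and the choice of max_n=3 are all made up for the example. It backs off from the longest available context to shorter ones.

```python
from collections import Counter, defaultdict

def build_ngram_table(ids, max_n=3):
    """Map each context (up to max_n - 1 previous IDs) to counts of the next ID."""
    table = defaultdict(Counter)
    for i in range(1, len(ids)):
        for ctx_len in range(1, max_n):
            if i - ctx_len < 0:
                break
            context = tuple(ids[i - ctx_len:i])
            table[context][ids[i]] += 1
    return table

def predict_next(table, history, max_n=3):
    """Try the longest context first, then back off to shorter ones."""
    for ctx_len in range(max_n - 1, 0, -1):
        if ctx_len > len(history):
            continue
        context = tuple(history[-ctx_len:])
        if context in table:
            return table[context].most_common(1)[0][0]
    return None  # context never seen in training

# Hypothetical chat log with recurring sentences.
chat = ["hi", "hello", "how are you", "fine thanks",
        "hi", "hello", "how are you", "fine thanks", "bye"]

vocab = {}
ids = [vocab.setdefault(sentence, len(vocab)) for sentence in chat]
id_to_sentence = {i: s for s, i in vocab.items()}

table = build_ngram_table(ids)
next_id = predict_next(table, [vocab["hi"], vocab["hello"]])
predicted = id_to_sentence[next_id]
```

Real chat logs rarely repeat sentences verbatim, so some normalization or clustering of near-duplicate sentences would likely be needed before assigning IDs.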

Computer Vision: You are provided 1 image which contains 100 objects of 3 main shapes: square, rectangle, and circle. Can you locate the shapes? How many, or which types, could you locate? *
Yes, it is possible to create a model that can detect and locate all 3 shapes. Any recent state-of-the-art detection model will work. In particular, YOLOv3 can perform well without requiring large CPU/GPU resources. Since the shapes involved are fairly simple, it is possible to generate labeled training data synthetically: random sets of images with a random number of squares, rectangles, and circles on a random-noise background.
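
The synthetic-data idea could look roughly like the sketch below. It is only an illustration in plain Python: the image is a binary grid rather than a real image file, and the sizes, noise level, and label format (kind, x, y, width, height) are assumptions chosen for the example, not the format any particular detector expects.

```python
import random

def make_sample(size=64, n_shapes=5, noise=0.05, seed=0):
    """Return (image, labels): a binary grid with random shapes and their boxes."""
    rng = random.Random(seed)
    # Background: sparse random noise pixels (1 = ink, 0 = empty).
    img = [[1 if rng.random() < noise else 0 for _ in range(size)]
           for _ in range(size)]
    labels = []
    for _ in range(n_shapes):
        kind = rng.choice(["square", "rectangle", "circle"])
        w = rng.randint(6, 14)
        if kind == "rectangle":
            # Force height != width so a rectangle is never a square.
            h = rng.choice([v for v in range(6, 15) if v != w])
        else:
            h = w
        x0 = rng.randint(0, size - w)
        y0 = rng.randint(0, size - h)
        cx, cy, radius = x0 + (w - 1) / 2, y0 + (h - 1) / 2, w / 2
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                inside_circle = (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2
                if kind != "circle" or inside_circle:
                    img[y][x] = 1
        labels.append((kind, x0, y0, w, h))  # bounding box; shapes may overlap
    return img, labels

img, labels = make_sample()
```

Calling make_sample with different seeds yields as many labeled examples as needed; converting the labels to a detector's annotation format is left out here.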

You are asked to take a .csv which has 100 rows of data and see if you can locate a MATCH in a second data set which has 100,000 rows. How do you solve it? *
Choose any algorithm that first loads the 100 rows into memory, then iterates over the 100K rows and filters them with a nested comparison. This takes O(100 * 100K) time. If the rows are too large, lower-level CSV read functions can be used to read the bigger file one row at a time, without loading it all into memory. Since 100K rows is not too big, it is also OK to load the file into a DataFrame and use higher-level operations to filter it. Note that apply with a row-wise predicate returns a boolean mask, which is then used to select the rows, e.g. result_df = big_df[big_df.apply(filter_fn_is_row_matches_small_df, axis=1)]
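
A minimal sketch of the row-at-a-time variant, assuming that a MATCH means exact equality on a chosen set of key columns (the question does not define it). Putting the 100 small-file keys in a set replaces the nested comparison with a constant-time lookup, so the work is roughly O(100 + 100K). The column names and sample rows are invented, and in-memory strings stand in for the two files.

```python
import csv
import io

def load_keys(small_file, key_columns):
    """Read the small file fully and keep its key tuples in a set."""
    reader = csv.DictReader(small_file)
    return {tuple(row[c] for c in key_columns) for row in reader}

def find_matches(big_file, keys, key_columns):
    """Stream the big file one row at a time and yield rows whose key is known."""
    for row in csv.DictReader(big_file):
        if tuple(row[c] for c in key_columns) in keys:
            yield row

# Hypothetical data standing in for the 100-row and 100K-row files.
small_csv = io.StringIO("name,car\nAna,Civic\nBo,Golf\n")
big_csv = io.StringIO(
    "name,car,cost\n"
    "Ana,Civic,21000\n"
    "Cy,Polo,15000\n"
    "Bo,Golf,18000\n"
    "Bo,Polo,15500\n"
)

key_columns = ["name", "car"]
keys = load_keys(small_csv, key_columns)
matches = list(find_matches(big_csv, keys, key_columns))
```

With real files, the io.StringIO objects would be replaced by open(path, newline=""). If a match is meant to be approximate rather than exact, the set lookup no longer applies and a distance or similarity comparison is needed instead.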