Chapter 4 Data Scientist Job

In general, the data scientist role can be divided into the decision scientist and the machine scientist (see "The Kinds of Data Scientist" on hbr.org and "4 Types of Data Science Jobs" on Udacity).

The entry level (i.e., data analyst) relies on basic analysis tools such as Excel, plus SQL programming skills to pull data. The middle level (i.e., data scientist I) works with more advanced analytics tools such as R and Python. The senior level uses the same analytics tools but can also write or modify published packages and libraries. On a side note, data product management deals with data products and services, and is akin to product management in general.

In practice, the substantive job content can be differentiated into strategy, consumer behavior, and optimization tracks. The strategy track produces analyses that managers use to make further decisions. The consumer behavior track produces analyses that elucidate psychological mechanisms and the mental models consumers use. The optimization track focuses on making things more efficient at scale or on using machine learning to automate the analysis.

I list the specific duties of each track below, along with the resources to develop the corresponding skills or business sense.

4.1 Business Strategy Track (a.k.a. Marketing Analytics)

4.1.1 Database marketing

RFM targeting, offer design, discount offer optimization. See the Database marketing page.
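
As an illustration of RFM (recency, frequency, monetary) targeting, here is a minimal sketch using pandas. The transaction table, its column names, the analysis date, and the three-level rank-based scoring rule are all assumptions made for the example, not a prescribed method.

```python
import numpy as np
import pandas as pd

# Hypothetical transaction log; column names and values are illustrative only.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3, 4],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2023-11-20",
        "2024-02-10", "2024-02-25", "2024-03-10", "2023-09-01",
    ]),
    "amount": [50.0, 30.0, 20.0, 80.0, 40.0, 60.0, 15.0],
})
as_of = pd.Timestamp("2024-03-31")  # assumed analysis date

# One row per customer: days since last order, number of orders, total spend.
rfm = tx.groupby("customer_id").agg(
    recency_days=("order_date", lambda d: (as_of - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

def score(series, higher_is_better=True, bins=3):
    """Map percentile ranks to integer scores 1..bins (bins = best)."""
    pct = series.rank(pct=True, ascending=higher_is_better)
    return np.ceil(pct * bins).astype(int)

rfm["R"] = score(rfm["recency_days"], higher_is_better=False)  # recent = better
rfm["F"] = score(rfm["frequency"])
rfm["M"] = score(rfm["monetary"])

# Customers with the top score on all three dimensions are natural targets.
best = rfm[(rfm[["R", "F", "M"]] == 3).all(axis=1)]
print(rfm)
```

In practice the number of score levels and the cut-off for targeting would be tuned to the campaign budget and the expected response rate.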

4.1.2 Programming

SQL:
- Data quality validation
- Data manipulation and merging
- https://mailmissouri-my.sharepoint.com/:b:/g/personal/ylb3c_umsystem_edu/ETfI6jNCUhhJvMF-edkqqZEBu5gcbJ0CD92edzkvNjPjwQ?e=u3MJid
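
The kinds of SQL checks meant by data quality validation and merging can be sketched as follows. To keep the example self-contained, the queries run against an in-memory SQLite database through Python's standard `sqlite3` module; the `customers` and `orders` tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id INTEGER, email TEXT);
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'a@example.com'), (2, NULL), (2, 'b@example.com');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0), (12, 3, 15.0);
""")

# Check 1: missing values in a column that should be populated.
null_emails = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL"
).fetchone()[0]

# Check 2: duplicated keys that would inflate row counts after a join.
dupes = conn.execute("""
    SELECT customer_id, COUNT(*) AS n
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
""").fetchall()

# Check 3: orders that find no matching customer when the tables are merged.
orphans = conn.execute("""
    SELECT o.order_id
    FROM orders AS o
    LEFT JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

print(null_emails, dupes, orphans)
conn.close()
```

Running checks like these before merging helps catch silent row duplication or loss early.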
Python:
- https://www.notion.so/Python-practice-110-2a8c3c764f6a489b911c8e1c432ea165
- Introduction to Data Science in Python - Week 1 | Coursera
  - map()
  - lambda, list comprehension
  - numpy library
  - series, data frame
  - groupby, pivot, merging
  - Distribution
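
The Python topics listed above can be seen together in one short sketch. All data values and names here are invented for illustration, and the example assumes numpy and pandas are installed.

```python
import numpy as np
import pandas as pd

prices = [10.0, 12.5, 8.0]

# map() with a lambda, and the equivalent list comprehension.
with_tax_map = list(map(lambda p: round(p * 1.1, 2), prices))
with_tax_lc = [round(p * 1.1, 2) for p in prices]

# numpy: vectorized arithmetic on an array.
arr = np.array(prices)
centered = arr - arr.mean()

# pandas Series and DataFrame.
s = pd.Series(prices, index=["a", "b", "c"])
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 90, 150],
})

# groupby, pivot, and merging.
by_region = sales.groupby("region")["revenue"].sum()
wide = sales.pivot(index="region", columns="quarter", values="revenue")
managers = pd.DataFrame({"region": ["East", "West"], "manager": ["Ana", "Bo"]})
merged = sales.merge(managers, on="region", how="left")

# Distribution: draw samples from a normal distribution.
rng = np.random.default_rng(0)
draws = rng.normal(loc=0.0, scale=1.0, size=1000)

print(by_region, wide, merged, sep="\n\n")
```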
R:
- Base
- map → applies the same function to all variables
- ggplot → visualization
- tidyr → data manipulation
- <https://mailmissouri-my.sharepoint.com/:f:/g/personal/ylb3c_umsystem_edu/Em8gdLAUHelNud9b7wGlTr8B6L5tVHwYiVyyh6t112Gvhg?e=i2r0PZ>
  - basic commands
  - model accuracy
  - KNN
  - Classification - LDA?
  - Bootstrapping
  - Regularization
  - Non-linear model
  - Tree based
  - SVM
  - Unsupervised

4.1.3 Statistics

Statistical tests for differences:
- Independent and paired t-tests, F-tests, chi-square tests (used for A/B testing, incrementality testing)
- Incrementality
- Regression
- Independent variables: dummy variable, variable transformation, exploratory/ descriptive analysis
- Dependent variable: binary (logit, probit regression), count (Poisson, negative binomial regression), censored (Tobit, survival regression)
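
To make the A/B testing use case concrete, here is a minimal sketch of a Pearson chi-square test on a 2x2 conversion table, written with the standard library only. The conversion counts are made up, the function name `chi_square_2x2` is hypothetical, and no continuity correction is applied.

```python
import math

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test (no continuity correction) for two groups."""
    observed = [[conv_a, n_a - conv_a], [conv_b, n_b - conv_b]]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical experiment: variant A converts 120/1000, variant B 150/1000.
chi2, p = chi_square_2x2(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

A statistics library would normally be used instead of hand-rolled code; the point here is only to show what the test computes.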
Advanced: Causal inference
- Control for observables
- Mixed model / Hierarchical linear model
- Difference in Difference
- Regression Discontinuity
- Modeling process
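
As a small illustration of difference in differences, the sketch below computes the 2x2 estimate from group means. The outcome values are invented, and the estimate is only causal under the usual parallel-trends assumption, which this toy example does not test.

```python
from statistics import mean

# Hypothetical outcomes (e.g., weekly spend); values are made up for illustration.
data = [
    # (group, period, outcome)
    ("control", "pre", 10.0), ("control", "pre", 12.0),
    ("control", "post", 13.0), ("control", "post", 15.0),
    ("treated", "pre", 11.0), ("treated", "pre", 13.0),
    ("treated", "post", 18.0), ("treated", "post", 20.0),
]

def cell_mean(group, period):
    """Average outcome for one group in one period."""
    return mean(y for g, p, y in data if g == group and p == period)

# Change in the treated group minus change in the control group.
treated_change = cell_mean("treated", "post") - cell_mean("treated", "pre")
control_change = cell_mean("control", "post") - cell_mean("control", "pre")
did = treated_change - control_change
print(did)
```

The same quantity is the coefficient on the group-by-period interaction in a regression of the outcome on group, period, and their interaction.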

4.1.4 Visualization

Drawing interaction plots
Decile plots
Written report on the analysis
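
The numbers behind a decile plot can be produced with a few lines of Python. In this sketch the customer scores and responses are simulated, and the helper name `decile_table` is hypothetical; plotting the resulting response rate or lift by decile is left to whichever charting tool is in use.

```python
import random

random.seed(0)

# Simulated (model score, responded) pairs; response is more likely at high scores.
customers = []
for _ in range(1000):
    s = random.random()
    responded = 1 if random.random() < 0.3 * s else 0
    customers.append((s, responded))

def decile_table(rows, n_bins=10):
    """Sort by score (best first), split into bins, report rate and lift per bin."""
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    size = len(ranked) // n_bins
    overall = sum(r[1] for r in ranked) / len(ranked)
    table = []
    for d in range(n_bins):
        start = d * size
        end = (d + 1) * size if d < n_bins - 1 else len(ranked)
        chunk = ranked[start:end]
        rate = sum(r[1] for r in chunk) / len(chunk)
        lift = rate / overall if overall else float("nan")
        table.append({"decile": d + 1, "n": len(chunk),
                      "response_rate": rate, "lift": lift})
    return table

table = decile_table(customers)
for row in table:
    print(row)
```

A well-ranked model shows response rates that fall steadily from decile 1 to decile 10.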

4.1.5 Automation

Function building
Analysis library

4.2 Consumer Insight Track (a.k.a. Marketing Research)

Qualitative study design
Quantitative study design (i.e., survey)
- Qualtrics for questionnaire or experiment design
- Generating research questions
Data analysis
- Mediation, path model
- Measurement model
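
A simple mediation analysis can be sketched with two ordinary least squares regressions. The data below are simulated with made-up coefficients, the `ols` helper is hypothetical, and inference on the indirect effect (for example, a bootstrap confidence interval) is deliberately left out.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Simulated path model: X -> M -> Y, plus a direct path X -> Y.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.4 * m + 0.2 * x + rng.normal(size=n)

def ols(outcome, *predictors):
    """Return OLS coefficients: intercept first, then one per predictor."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef

a = ols(m, x)[1]               # path a: effect of X on the mediator M
_, c_prime, b = ols(y, x, m)   # c': direct effect of X; b: effect of M on Y
c_total = ols(y, x)[1]         # c: total effect of X on Y

indirect = a * b               # mediated (indirect) effect
print(f"a={a:.3f}, b={b:.3f}, direct={c_prime:.3f}, "
      f"indirect={indirect:.3f}, total={c_total:.3f}")
```

For linear models fit by OLS, the total effect equals the direct effect plus the indirect effect, which gives a quick sanity check on the estimates.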

4.3 Optimization Track (a.k.a. Operational Research)