Chapter 4 Data Scientist Job
In general, the data scientist role can be divided into the decision scientist and the machine scientist. See The Kinds of Data Scientist (hbr.org) and 4 Types of Data Science Jobs | Udacity.
The entry-level role (i.e., data analyst) uses basic analysis tools such as Excel, plus SQL programming skills to pull data. The mid-level role (i.e., data scientist I) uses more advanced analytics tools such as R and Python. The senior-level role uses the same analytics tools but can also write or modify published packages and libraries. On a side note, data product management deals with data products and services and is akin to product management in general.
In practice, the substantive job content can be differentiated into strategy, consumer behavior, and optimization tracks. The strategy track produces analyses that managers use to make further decisions. The consumer behavior track produces analyses that elucidate psychological mechanisms and the mental models consumers use. The optimization track focuses on making things more efficient at scale or on using machine learning to automate the analysis.
I list the specific duties of each track below, along with the resources to develop the corresponding skills or business sense.
4.1 Business strategy Track (a.k.a. Marketing Analytics)
4.1.1 Database marketing
- RFM targeting, offer design, discount offer optimization (see the Database marketing page)
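As a rough illustration of RFM (recency, frequency, monetary) targeting, the sketch below scores customers from a transaction log with pandas. The data, the column names, and the equal-weight average of the three rank scores are all assumptions made for the example, not a prescribed scoring scheme.

```python
import pandas as pd

# Hypothetical transaction log; column names are assumptions for illustration.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3, 4],
    "order_date": pd.to_datetime([
        "2023-01-05", "2023-03-10", "2022-11-20",
        "2023-02-01", "2023-03-01", "2023-03-25", "2022-08-15",
    ]),
    "amount": [50.0, 70.0, 20.0, 100.0, 80.0, 120.0, 15.0],
})

# Reference date for recency: one day after the last observed order.
snapshot = transactions["order_date"].max() + pd.Timedelta(days=1)

rfm = transactions.groupby("customer_id").agg(
    recency_days=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Rank-based scores scaled to (0, 1]; lower recency is better, so invert it.
rfm["r_score"] = rfm["recency_days"].rank(ascending=False, pct=True)
rfm["f_score"] = rfm["frequency"].rank(pct=True)
rfm["m_score"] = rfm["monetary"].rank(pct=True)
rfm["rfm_score"] = rfm[["r_score", "f_score", "m_score"]].mean(axis=1)

# Customers with the highest combined score would be the first targets.
targets = rfm.sort_values("rfm_score", ascending=False)
print(targets)
```

In practice the scores are often binned into quintiles and the weights tuned to the offer being designed; the equal weighting here is only a starting point.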
4.1.2 Programming
SQL:
- Data quality validation
- Data manipulation and merging
- https://mailmissouri-my.sharepoint.com/:b:/g/personal/ylb3c_umsystem_edu/ETfI6jNCUhhJvMF-edkqqZEBu5gcbJ0CD92edzkvNjPjwQ?e=u3MJid
Python
https://www.notion.so/Python-practice-110-2a8c3c764f6a489b911c8e1c432ea165
Introduction to Data Science in Python - Week 1 | Coursera
map()
lambda, list comprehension
numpy library
series, data frame
groupby, pivot, merging
Distribution
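The Python topics listed above can be seen together in one short sketch. The data, variable names, and the lookup table are made up for illustration; only standard NumPy and pandas calls are used.

```python
import numpy as np
import pandas as pd

prices = [10.0, 12.5, 8.0]

# map() with a lambda: apply one function to every element.
with_tax = list(map(lambda p: round(p * 1.1, 2), prices))

# A list comprehension with a filter.
expensive = [p for p in prices if p > 9]

# NumPy: vectorized arithmetic on whole arrays.
arr = np.array(prices)
centered = arr - arr.mean()

# Distribution: draw from a standard normal with a seeded generator.
draws = np.random.default_rng(0).normal(loc=0.0, scale=1.0, size=1000)

# pandas DataFrame (hypothetical sales data); one column is a Series.
sales = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 120, 90, 110],
})
revenue = sales["revenue"]

# groupby: total revenue per region.
by_region = sales.groupby("region")["revenue"].sum()

# pivot: regions as rows, quarters as columns.
wide = sales.pivot(index="region", columns="quarter", values="revenue")

# merge: attach a lookup table on a shared key.
managers = pd.DataFrame({"region": ["east", "west"], "manager": ["Ana", "Bo"]})
merged = sales.merge(managers, on="region", how="left")
```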
R
Base
Map → applies the same function to all variables
ggplot2 → visualization
tidyr → data manipulation
basic commands
model accuracy
KNN
Classification - LDA?
Bootstrapping
Regularization
Non-linear model
Tree based
SVM
Unsupervised
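Of the topics above, bootstrapping is the easiest to show in a few lines. The sketch below is written in Python with NumPy, although the same idea carries over to R. It estimates the standard error and a 95% percentile interval for a sample mean; the sample values, the seed, and the number of resamples are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility

# Hypothetical sample (e.g., order values); the data are made up.
sample = np.array([12.0, 15.5, 9.0, 22.0, 18.5, 11.0, 14.0, 30.0, 16.5, 13.0])

n_boot = 2000
boot_means = np.empty(n_boot)
for i in range(n_boot):
    # Resample with replacement, same size as the original sample.
    resample = rng.choice(sample, size=sample.size, replace=True)
    boot_means[i] = resample.mean()

# Bootstrap standard error and a 95% percentile interval for the mean.
se = boot_means.std(ddof=1)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={sample.mean():.2f}, se={se:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```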
4.1.3 Statistics:
Statistical tests for differences:
- Independent and paired t-tests, F-tests, chi-square tests (used for A/B testing, incrementality testing)
- Incrementality
- Regression
- Independent variables: dummy variable, variable transformation, exploratory/ descriptive analysis
- Dependent variable: Binary (logit, probit regression), Count (Poisson, negative binomial regression), Censored (Tobit, survival regression)
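To make the A/B testing use of the chi-square test concrete, here is a minimal sketch using only the standard library. It computes the Pearson chi-square statistic for a 2x2 conversion table without a continuity correction. The function name and the conversion counts are hypothetical; a statistics library would normally be used in real work.

```python
import math

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test (no continuity correction) for a 2x2 table."""
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    total = n_a + n_b
    row_totals = [n_a, n_b]
    col_totals = [conv_a + conv_b, total - conv_a - conv_b]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical A/B test: control (A) vs. treatment (B) conversions.
chi2, p = chi_square_2x2(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
lift = 130 / 1000 - 100 / 1000  # incremental conversion rate
print(f"chi2={chi2:.3f}, p={p:.4f}, lift={lift:.3f}")
```

The lift is the quantity an incrementality test cares about; the test statistic only says whether a difference of that size is plausibly noise.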
Advanced: Causal inference
- Control for observables
- Mixed model / Hierarchical linear model
- Difference-in-differences
- Regression Discontinuity
- Modeling process
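For the difference-in-differences item, the simplest case is the 2x2 design with two groups and two periods, where the estimate is just a difference of two changes. The group means below are invented for illustration, and the estimate is only interpretable as causal under the parallel-trends assumption.

```python
# Hypothetical group means of an outcome (e.g., weekly sales per store).
means = {
    ("treated", "pre"): 100.0,
    ("treated", "post"): 130.0,
    ("control", "pre"): 90.0,
    ("control", "post"): 105.0,
}

def diff_in_diff(m):
    """2x2 difference-in-differences estimate from four group means."""
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    # Under the parallel-trends assumption, the control change stands in
    # for what the treated group would have done without treatment.
    return treated_change - control_change

effect = diff_in_diff(means)
print(effect)  # 15.0
```

With unit-level data, the same number is the coefficient on the treated-by-post interaction in a regression with group and period dummies.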
4.2 Consumer insight Track (a.k.a. Marketing Research)
Qualitative study design
Quantitative study design (i.e., survey)
- Qualtrics for questionnaire or experiment design
- Generating research questions
Data analysis
- Mediation, path model
- Measurement model
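A simple mediation analysis can be sketched with ordinary least squares: estimate the path from X to the mediator M, then the paths from X and M to Y, and multiply the two to get the indirect effect. The simulated data, the coefficient values, and the helper `ols` are assumptions for this example; applied work would usually add a bootstrap interval for the indirect effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated data (assumed structure): X -> M -> Y plus a direct X -> Y path.
x = rng.normal(size=n)
m = 0.5 * x + rng.normal(size=n)
y = 0.3 * x + 0.4 * m + rng.normal(size=n)

def ols(predictors, outcome):
    """Least-squares coefficients, intercept first."""
    design = np.column_stack([np.ones(len(outcome))] + predictors)
    coefs, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefs

a = ols([x], m)[1]               # path X -> M
_, c_prime, b = ols([x, m], y)   # direct effect of X, and path M -> Y
c_total = ols([x], y)[1]         # total effect of X on Y

indirect = a * b
print(f"a={a:.3f}, b={b:.3f}, indirect={indirect:.3f}, "
      f"direct={c_prime:.3f}, total={c_total:.3f}")
```

In linear models fitted by least squares on the same sample, the total effect equals the direct effect plus the indirect effect, which gives a quick consistency check on the output.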
4.3 Optimization Track (a.k.a. Operational Research)
4.3.1 Model optimization
Machine learning - regularization
dimension reduction
Chapter 34 Model Stacking | Advanced Data Analysis (bookdown.org)
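The two ideas in this subsection, regularization and dimension reduction, can each be shown in a few lines of NumPy. The sketch below uses the closed-form ridge estimator and a principal component projection via the singular value decomposition. The simulated data, the penalty value of 50, and the choice of two components are arbitrary assumptions for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients for centered data (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
true_beta = np.array([2.0, 0.0, -1.0, 0.0, 0.5])
y = X @ true_beta + rng.normal(scale=0.5, size=100)

# Center so the intercept can be dropped.
X = X - X.mean(axis=0)
y = y - y.mean()

# Regularization: a larger penalty shrinks the coefficients toward zero.
beta_ols = ridge_fit(X, y, lam=0.0)
beta_ridge = ridge_fit(X, y, lam=50.0)

# Dimension reduction: project onto the first two principal components.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt[:2].T
explained = s**2 / np.sum(s**2)

print("OLS:  ", np.round(beta_ols, 3))
print("Ridge:", np.round(beta_ridge, 3))
print("Variance explained by 2 PCs:", round(float(explained[:2].sum()), 3))
```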
4.3.2 Macro level models
Marketing Mix Models
- https://www.notion.so/A-Better-Way-to-Calculate-the-ROI-of-Your-Marketing-Investment-f0b034ef75154ffea26bf81b3fda2b2c
- Ad stock function
- Media Attribution Modeling
- Attribution Modeling in Marketing Python | Kaggle
- https://mailmissouri-my.sharepoint.com/:b:/g/personal/ylb3c_umsystem_edu/EfV-Ld_JH8FNiCqp6MtSkFMBAcWzyMHvVKmBhdCyLT7k2A?e=TtLaQ1
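The ad stock function mentioned above is often modeled as a geometric carryover, where each period keeps a fixed fraction of the previous period's accumulated effect. The sketch below assumes that geometric form; the function name, the decay rate of 0.5, and the spend series are illustrative, and marketing mix models in practice may use other decay shapes.

```python
def geometric_adstock(spend, decay):
    """Carry over a fraction `decay` of last period's adstock into this one."""
    if not 0 <= decay < 1:
        raise ValueError("decay must be in [0, 1)")
    adstock = []
    carryover = 0.0
    for x in spend:
        carryover = x + decay * carryover
        adstock.append(carryover)
    return adstock

weekly_spend = [100.0, 0.0, 0.0, 50.0]
print(geometric_adstock(weekly_spend, decay=0.5))  # [100.0, 50.0, 25.0, 62.5]
```

The transformed series, rather than raw spend, is what would typically enter the marketing mix regression as a predictor.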