https://warpcast.com/~/channel/confluence

TheModestThief
@thief
a roundup of the ML-speedrunning course so far (linear regression):

## Jupyter Notebook ##

1. Exploratory Data Analysis (EDA):
- got a grasp of EDA methods, i.e. check/drop duplicates, deal with NA values, check/deal with outliers (drop/cap/impute mean)
- check the distribution of the target variable: if it is skewed (left/right), apply an appropriate transformation (e.g. log)
- univariate analysis/visualisation: trellis plots, scatterplots, using seaborn and plotly (for interactive plots, e.g. hovering)
- bivariate analysis: correlation matrix (any highly correlated features? maybe drop one for redundancy)

2. Data-splitting

3. Feature scaling (normalise or standardise)
- using StandardScaler() to standardise the necessary numerical values
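
a rough sketch of steps 1 to 3 on a made-up dataset, in case it helps to see them in one place. The column names (`area`, `rooms`, `price`), the planted duplicate/NA/outlier, the IQR capping rule and the 80/20 split are all assumptions for illustration, not the course's actual notebook:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: names and values are invented for this sketch.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.normal(100.0, 20.0, n),
    "rooms": rng.integers(1, 6, n).astype(float),
})
# A right-skewed target built from the clean features.
df["price"] = np.exp(0.01 * df["area"] + 0.1 * df["rooms"] + rng.normal(0.0, 0.3, n))
df.loc[0, "area"] = 1000.0                              # plant an outlier
df.loc[1, "rooms"] = np.nan                             # plant a missing value
df = pd.concat([df, df.iloc[[2]]], ignore_index=True)   # plant a duplicate row

# 1. EDA-style cleaning
df = df.drop_duplicates().reset_index(drop=True)        # drop duplicates
df["rooms"] = df["rooms"].fillna(df["rooms"].mean())    # impute NA with the mean

# Cap outliers at 1.5 * IQR beyond the quartiles (one common rule of thumb).
q1, q3 = df["area"].quantile([0.25, 0.75])
iqr = q3 - q1
df["area"] = df["area"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Check the skew of the target, then log-transform it.
skew_before = df["price"].skew()
df["log_price"] = np.log1p(df["price"])
skew_after = df["log_price"].skew()
print(f"skew before: {skew_before:.2f}, after log1p: {skew_after:.2f}")

# Correlation matrix to spot highly correlated features.
corr = df[["area", "rooms", "log_price"]].corr()
print(corr.round(2))

# 2. Data-splitting (80% train, 20% validation)
X = df[["area", "rooms"]]
y = df["log_price"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Feature scaling: fit on the training split only, then reuse on validation.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_val_scaled = scaler.transform(X_val)
```

the scaler is fitted on `X_train` only and then applied to `X_val`, so validation statistics do not leak into training. For brevity the cleaning here runs before the split; the same concern applies to the imputation mean and the outlier caps, which could also be computed on the training split alone.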

TheModestThief
@thief
4. Feature encoding pipeline (for categorical features)
- one-hot encoding (binary) or ordinal encoding

5. Assemble main pipeline (ColumnTransformer)

6. Build baseline model:
- fit the main pipeline on X_train, y_train (i.e. train the baseline model)
- predict on X_val
- evaluate metrics: MAE, MSE, RMSE, R-squared
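
steps 4 to 6 could look roughly like this in scikit-learn. Again a minimal sketch on invented data: the columns (`area`, `city`, `condition`), the category order for the ordinal encoder, and `LinearRegression` as the baseline estimator are assumptions, chosen only because the course topic is linear regression:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder, StandardScaler

# Hypothetical toy data: one numeric, one nominal and one ordinal feature.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "area": rng.normal(100.0, 20.0, n),
    "city": rng.choice(["north", "south", "east"], n),
    "condition": rng.choice(["poor", "fair", "good"], n),
})
city_effect = df["city"].map({"north": 0.0, "south": 10.0, "east": 20.0})
cond_effect = df["condition"].map({"poor": 0.0, "fair": 5.0, "good": 10.0})
df["price"] = 2.0 * df["area"] + city_effect + cond_effect + rng.normal(0.0, 5.0, n)

X = df.drop(columns="price")
y = df["price"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 4 + 5. Encoders and scaler assembled in one ColumnTransformer.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["area"]),
    ("nominal", OneHotEncoder(handle_unknown="ignore"), ["city"]),
    ("ordinal", OrdinalEncoder(categories=[["poor", "fair", "good"]]), ["condition"]),
])

# 6. Baseline model: preprocessing followed by a plain linear regression.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])
model.fit(X_train, y_train)      # train on the training split
y_pred = model.predict(X_val)    # predict on the validation split

# Evaluate metrics on the validation split.
mae = mean_absolute_error(y_val, y_pred)
mse = mean_squared_error(y_val, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_val, y_pred)
print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

wrapping the preprocessing and the estimator in one `Pipeline` means `fit` learns the scaling and encoding from `X_train` only, and `predict` applies the same transformations to `X_val` automatically.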