GitBook: [#2896] update

This commit is contained in:
CPol 2021-12-10 01:01:04 +00:00 committed by gitbook-bot
parent 5d35e0d35b
commit 3d22c28ce3
No known key found for this signature in database
GPG Key ID: 07D2180C7B12D0FF

@ -273,3 +273,20 @@ dataset.iloc[10:20] # Get some indexes that contained empty data before
```
To fill categorical data, first consider whether there is a reason why the values are missing. If it's by **choice of the users** (they didn't want to give the data), you can **create a new category** indicating that. If it's because of human error, you can **remove the rows** or the **feature** (check the steps mentioned before) or **fill it with the mode, the most used category** (not recommended).
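
The options above can be sketched as follows. This is only an illustration on a hypothetical toy DataFrame: the `color` column, the `'Unknown'` label and the data itself are assumptions, not part of the original dataset.

```python
import pandas as pd

# Hypothetical toy data: 'color' is a categorical feature with missing values
dataset = pd.DataFrame({
    'color': ['red', None, 'blue', 'red', None],
    'target': [1, 0, 1, 1, 0],
})

# Option 1: treat "missing" as its own category (users chose not to answer)
new_category = dataset['color'].fillna('Unknown')

# Option 2: remove the rows where the category is missing
rows_removed = dataset.dropna(subset=['color'])

# Option 3: fill with the mode, the most used category (not recommended)
mode_value = dataset['color'].mode()[0]
mode_filled = dataset['color'].fillna(mode_value)

print(new_category.tolist())
print(len(rows_removed))
print(mode_filled.tolist())
```

Keeping each result in its own variable makes it easy to compare the options before overwriting the original column.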
## Combining Features
If you find **two features** that are **correlated** with each other, you should usually **drop** one of them (the one that is less correlated with the target), but you could also try to **combine them and create a new feature**.
```python
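import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

# Create a new feature combining column1 and column2
dataset['new_feature'] = dataset['column1'] / dataset['column2']
# Check correlation with target column
dataset[['new_feature', 'column1', 'column2', 'target']].corr()['target']
# Check for collinearity of the 2 features and the new one (the target is not a predictor, so it is left out)
X = add_constant(dataset[['column1', 'column2', 'new_feature']])
# Calculate VIF for each column (higher values indicate stronger collinearity)
pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])], index=X.columns)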
```
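
To help read the VIF values: the VIF of a feature is `1 / (1 - R²)`, where `R²` comes from regressing that feature on the other ones, and it equals the matching diagonal entry of the inverse of the features' correlation matrix. The following is a minimal sketch of that equivalence using only `numpy` and `pandas` on synthetic data; the column names and values are hypothetical. As a common rule of thumb (not a hard limit), values above roughly 5 to 10 are treated as a sign of strong collinearity.

```python
import numpy as np
import pandas as pd

# Synthetic, hypothetical data: column2 is almost a copy of column1
rng = np.random.default_rng(0)
column1 = rng.normal(size=200)
features = pd.DataFrame({
    'column1': column1,
    'column2': column1 + 0.1 * rng.normal(size=200),
    'column3': rng.normal(size=200),  # unrelated to the other two
})

# VIF of each feature = matching diagonal entry of the inverse correlation matrix
corr = features.corr().values
vif = pd.Series(np.diag(np.linalg.inv(corr)), index=features.columns)
print(vif)
```

In this sketch `column1` and `column2` should get large VIF values while `column3` stays close to 1, which is the pattern that suggests dropping or combining the first two.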