Order No. 11243033 created
November 28, 2024
Working with a dataset: a statistical data analysis project in Python
How the customer described the requirements for the work:
1. For five variables, calculate the appropriate measures of central tendency (mode/median/mean) and interpret them. Remember that different variables allow different MCTs (for example, for age you can compute all three, but for color only the mode).
2. Look at the outliers for three variables. Visualize them using a box plot (hint: sns.boxplot()). Interpret the graphs.
3. For two variables, calculate the outliers using both the interquartile range and the standard deviation from the mean. Are the results different? Interpret them.
4. If there are outliers, delete them (if there is a lot of data) or replace them with the mean/median (if there is little data) and see how the measures of the central tendency in the variable under consideration have changed.
5. If there are missing values in the data, specify in which variables and how many of them. And fill them with the median/mean.
6. Build a correlation matrix (use seaborn.heatmap function) based only on those features for which the correlation can be calculated (If there are a lot of such pairs, build 5 of any)
7. Interpret each correlation value in the matrix
8. Plot the scatter plots based on these features (hint sns.pairplot())
9. Define the problem task (whether it is regression or binary classification) depending on your target that you're going to predict
10. Apply a machine learning algorithm (linear regression for regression / logistic regression for classification)
10.1 1st experimental data: take all numeric and encoded categorical features
10.2 2nd experimental data: take the top-3 (top-5) features with the highest correlation with the target
11. For the 1st and the 2nd experimental data separately:
11.1 Split your data into train and test sets in an 80% / 20% proportion, respectively
11.2 Train the ML algorithm on the train set
11.3 Make predictions on the test set
11.4 Calculate quality metrics (R^2, RMSE for regression; accuracy, recall, precision for classification)
12. Compare the results of the two models (the better model is the one with higher accuracy, precision, and recall, or with the lowest RMSE for a regression task)
13. Conclusion (interpretation of the obtained results, comparison between the two experiments, and a summary of the whole project in free form)
14. Create zip archive with .ipynb file and .csv/.xlsx dataset (or separate files
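
Items 1 and 3 above (measures of central tendency, and outliers by the interquartile range versus by standard deviation from the mean) can be sketched roughly as follows. This is only an illustration, not the solution delivered for this order: it uses Python's standard statistics module instead of pandas, the function names are hypothetical, the "ages" sample is made up, and the thresholds (1.5 x IQR, 3 standard deviations) are common conventions assumed here rather than values stated in the requirements.

```python
import statistics


def central_tendency(values):
    """Return the mean, median and mode of a numeric sample."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed convention)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def std_outliers(values, n_std=3.0):
    """Return values more than n_std sample standard deviations from the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > n_std * std]


# Made-up "age" column with one implausible entry (illustrative data only).
ages = [22, 25, 25, 27, 29, 31, 33, 35, 38, 120]

print(central_tendency(ages))
print("IQR rule:", iqr_outliers(ages))
print("3-sigma rule:", std_outliers(ages))
print("2-sigma rule:", std_outliers(ages, n_std=2.0))
```

On this small sample the two rules disagree: the IQR rule flags 120, while the 3-sigma rule flags nothing, because the extreme value itself inflates the standard deviation. That is the kind of difference item 3 asks to interpret. With a real dataset the same checks would more likely be written with pandas (quantile, mean, std) on a DataFrame column.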
The customer paid 100 ₽
The customer did not use an installment plan
Автор24 service guarantee: 20 days
The customer accepted the work without using the guarantee
November 29, 2024
Order completed; the customer received the final file with the work

5

Работа с датасетом, проект по статическому анализу данных в питоне.docx
2024-12-02 17:28
Latest student review of the Автор24 marketplace
Overall rating
5

Positive
The author is a genius. The work passed the first check with no revisions and no complaints. He delivered it ahead of schedule. A great guy overall. I recommend him to everyone.