library(readr)
data = read_csv("test_measurements.csv")
When we import a .csv file with read_csv(), R stores it as a tibble, readr's version of a data frame (an .xlsx file read with readxl's read_excel() is also returned as a tibble). We then check the dimensions of the data frame and print a summary of each column. The summary gives a clear view of the NA (missing) values, which are reported as NA's for every numeric column that contains them.
dim(data)
summary(data)
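As a small illustration of what dim() and summary() report, here is a sketch on a hypothetical three-row data frame (the column names a and b are made up for the example). summary() lists the number of missing values in a column under NA's:

```r
# Hypothetical toy data frame with one missing value in column a
df <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

dims <- dim(df)                     # number of rows, number of columns
print(dims)

s <- summary(df$a)                  # min, quartiles, mean, max and NA's for column a
print(s)
na_count <- as.numeric(s["NA's"])   # the NA's entry of the summary
print(na_count)
```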
This initial inspection helps identify missing values, unusual patterns, and potential issues that need to be addressed before proceeding with deeper analysis or modeling.
To find the total number of missing values, and the number in each column:
total = sum(is.na(data))
print(total)
colSums(is.na(data))
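To make the two counts concrete, here is a sketch on a hypothetical data frame: is.na() returns a logical matrix of the same shape as the data, sum() counts every TRUE in it, and colSums() counts the TRUE values column by column.

```r
# Hypothetical toy data frame with three missing values in total
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6))

na_matrix <- is.na(df)              # logical matrix, TRUE where a value is missing
total <- sum(na_matrix)             # total number of missing values
per_column <- colSums(na_matrix)    # missing values in each column

print(total)
print(per_column)
```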
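The missing values in the Presentation, Influencing and Convincing, Stress Tolerance and Achievement Orientation columns are replaced with the median of each column using the code below. First, columns 2 to 12 of the data are copied into a new data frame: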
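# Keep columns 2 to 12 of the imported data
New_df = data[,2:12]
# Replace each NA with the column median (NAs are ignored when computing the median)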
New_df$Presentation = ifelse(is.na(New_df$Presentation), median(New_df$Presentation,na.rm = TRUE),New_df$Presentation)
New_df$`Influencing and Convincing` = ifelse(is.na(New_df$`Influencing and Convincing`), median(New_df$`Influencing and Convincing`,na.rm = TRUE),New_df$`Influencing and Convincing`)
New_df$`Stress Tolerance` = ifelse(is.na(New_df$`Stress Tolerance` ), median(New_df$`Stress Tolerance`, na.rm = TRUE),New_df$`Stress Tolerance`)
New_df$`Achievement Orientation` = ifelse(is.na(New_df$`Achievement Orientation`), median(New_df$`Achievement Orientation`, na.rm = TRUE),New_df$`Achievement Orientation` )
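The four statements above repeat the same pattern. One possible way to avoid the repetition is a small helper applied in a loop. In this sketch, impute_median() is a hypothetical name and the toy data frame df stands in for New_df:

```r
# Hypothetical helper: replace the NA values of a numeric vector with its median
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}

# Toy data frame; check.names = FALSE keeps the space in "Stress Tolerance"
df <- data.frame(
  Presentation = c(3, NA, 5, 4),
  `Stress Tolerance` = c(NA, 2, 2, 8),
  check.names = FALSE
)

cols <- c("Presentation", "Stress Tolerance")
for (col in cols) {
  df[[col]] <- impute_median(df[[col]])
}

print(df)
```

With the real data, cols would hold the four column names used above and the loop would update New_df instead of df. An equivalent one-liner is df[cols] <- lapply(df[cols], impute_median).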
Next, we check New_df for missing values again.
total = sum(is.na(New_df))
print(total)
summary(New_df)
To check for outliers, we draw box plots and then replace every value that boxplot.stats() flags as an outlier with NA:
boxplot(New_df)
cols = c('Presentation', 'Influencing and Convincing', 'Stress Tolerance', 'Achievement Orientation')
boxplot(New_df[, cols])
for (x in cols) {
  # Values that fall outside the box-plot whiskers
  value = boxplot.stats(New_df[[x]])$out
  # Replace those values with NA
  New_df[[x]][New_df[[x]] %in% value] = NA
}
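boxplot.stats() flags a value as an outlier when it lies more than coef times the hinge spread below the lower hinge or above the upper hinge, with coef = 1.5 by default. Here is a sketch that reproduces this rule by hand with fivenum() on a hypothetical vector and compares the result with boxplot.stats():

```r
# Hypothetical toy vector with one extreme value
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# fivenum() returns min, lower hinge, median, upper hinge, max
h <- fivenum(x)
spread <- h[4] - h[2]               # hinge spread (close to the IQR)
lower <- h[2] - 1.5 * spread        # lower fence
upper <- h[4] + 1.5 * spread        # upper fence
manual_out <- x[x < lower | x > upper]

print(manual_out)
print(boxplot.stats(x)$out)
```

Here the hinges are 3 and 8, so the fences are -4.5 and 15.5 and only 100 is flagged. Because the rule is based on hinges rather than quantile()'s default quartiles, a hand-written IQR check with quantile() can flag slightly different values.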
We then check whether the outliers in the four columns processed by the loop have been replaced with NA:
as.data.frame(colSums(is.na(New_df)))
In some cases, NA values can reduce accuracy or cause modeling functions to fail, so we remove them. The code below uses drop_na() from tidyr to drop every row that still contains an NA and then confirms that none remain:
library(tidyr)
New_df = drop_na(New_df)
as.data.frame(colSums(is.na(New_df)))
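drop_na() removes a whole row whenever any of its columns is NA, so replacing outliers with NA and then dropping rows can remove more data than the outlier count alone suggests. Here is a base-R sketch on a hypothetical data frame that counts the affected rows with complete.cases() before removing them. Called without column arguments, drop_na(df) keeps the same rows:

```r
# Hypothetical toy data frame: 4 rows, with NAs in rows 2 and 4
df <- data.frame(
  Presentation = c(3, NA, 5, 4),
  Stress = c(2, 2, 2, NA)
)

keep <- complete.cases(df)      # TRUE for rows without any NA
rows_lost <- sum(!keep)         # number of rows that will be removed
clean <- df[keep, ]             # listwise deletion

print(rows_lost)
print(nrow(clean))
print(colSums(is.na(clean)))    # every column should now show 0
```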
