Economic and other machine learning data often come in different units or magnitudes, which makes estimation, interpretation, and visualization difficult. These issues can be handled by converting the data to a unitless or uniform scale. Yet when the need arises, what should be an easy task often proves difficult in practice.

In this blog post, I share data_transform, a function from the Dyn4cast package that can easily transform your data frame for estimation and visualization. It is a one-line call and easy to use. Usage is as follows:

```r
data_transform(data, method, x, margin)
```

data: a clean, numeric data frame

method: the method of transformation or standardization: 1 = min-max, 2 = log, 3 = mean-sd

margin: optional; indicates whether the data should be transformed column-wise or row-wise. Defaults to column-wise if not specified
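For intuition, the three methods correspond to elementary rescalings. Below is a minimal base-R sketch of what each method does to a single vector; it is illustrative only and independent of the package (I am assuming the log method uses the natural log, and the toy vector is my own):

```r
# Illustrative only: the three standardizations selected by the
# `method` argument, written out in base R for one numeric vector.
x <- c(10, 20, 40, 80)

minmax <- (x - min(x)) / (max(x) - min(x))  # method 1: rescale to [0, 1]
logged <- log(x)                            # method 2: log (natural log assumed)
meansd <- (x - mean(x)) / sd(x)             # method 3: zero mean, unit sd
```

data_transform applies the chosen rescaling to every column (or row, depending on margin) of the data frame at once.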

```r
library(Dyn4cast)
library(tidyverse)

Transform <- readRDS("data/transform.rds")

data0 <- Transform %>%
  pivot_longer(!x, names_to = "factor", values_to = "data")

ggplot(data = data0, aes(x = x, y = data, fill = factor, color = factor)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "data", x = "series", color = "factor") +
  theme_bw(base_size = 12)
```

In the raw data, the pattern of the small-valued series is obscured by the large-valued ones, and their distributions are even harder to see.

You could also transform the x column, but it is better not to.

```r
data11 <- data1 <- data_transform(Transform[, -1], 1)
data1 <- cbind(Transform[1], data1)  # keep the x column, with its name
data1 <- data1 %>%
  pivot_longer(!x, names_to = "factor", values_to = "data")

ggplot(data = data1, aes(x = x, y = data, fill = factor, color = factor)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "data", x = "series", color = "factor") +
  theme_bw(base_size = 12)
```

The pattern of each variable is now very clear.

```r
data21 <- data2 <- data_transform(Transform[, -1], 2)
data2 <- cbind(Transform[1], data2)  # keep the x column, with its name
data2 <- data2 %>%
  pivot_longer(!x, names_to = "factor", values_to = "data")

ggplot(data = data2, aes(x = x, y = data, fill = factor, color = factor)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "data", x = "series", color = "factor") +
  theme_bw(base_size = 12)
```

The log transformation is a monotonic rescaling of the data, so the pattern shows through; compared with the min-max method, though, the visible change is minimal.
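To see why the log plot reveals the pattern, note how the log compresses large values while preserving their order: equal ratios become equal steps. A toy illustration (my own numbers, not from the package):

```r
x <- c(1, 10, 100, 1000)

# Log turns the tenfold ratios into equal additive steps of log(10),
# so series on very different scales become comparable.
log(x)

# Min-max keeps the raw spacing, squeezing the small values near zero.
(x - min(x)) / (max(x) - min(x))
```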

```r
data31 <- data3 <- data_transform(Transform[, -1], 3)
data3 <- cbind(Transform[1], data3)  # keep the x column, with its name
data3 <- data3 %>%
  pivot_longer(!x, names_to = "factor", values_to = "data")

ggplot(data = data3, aes(x = x, y = data, fill = factor, color = factor)) +
  geom_line() +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  labs(y = "data", x = "series", color = "factor") +
  theme_bw(base_size = 12)
```

The result is similar to the min-max transformation, and the essential pattern of the data is apparent.

```r
raw   <- lm(col1 ~ ., data = Transform[, -1])
Data1 <- lm(col1 ~ ., data = data.frame(data11))
Data2 <- lm(col1 ~ ., data = data.frame(data21))
Data3 <- lm(col1 ~ ., data = data.frame(data31))

m_list <- list(raw = raw, max = Data1, log = Data2, mean = Data3)
modelsummary::modelsummary(m_list, stars = TRUE)
```

                raw            max         log         mean
    (Intercept) 840399.559+    0.288+      10.042***   0.000
                (429034.752)   (0.140)     (1.804)     (0.099)
    col2        642.404***     0.740***    0.114***    0.722***
                (111.262)      (0.128)     (0.014)     (0.125)
    col3        -114.479       -0.107      0.016       -0.079
                (195.042)      (0.183)     (0.168)     (0.135)
    col4        -6770.682*     -0.317*     -0.189      -0.244*
                (2935.693)     (0.135)
    col5
                (776.594)      (0.151)     (0.044)     (0.115)
    col6        2088.735       0.249       0.629+      0.186
                (1357.125)     (0.161)     (0.302)     (0.121)
    Num.Obs.    25             25          25          25
    R2          0.805          0.805       0.889       0.805
    R2 Adj.     0.753          0.753       0.860       0.753
    AIC         636.6          -17.4       -68.7       43.1
    BIC         645.1          -8.9        -60.2       51.6
    Log.Lik.    -311.285       15.701      41.363      -14.539
    F           15.671         15.671      30.555      15.671
    RMSE        61849.44       0.13        0.05        0.43
    + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

```r
modelsummary::modelplot(m_list)
```

The coefficients for the transformed data are better behaved than those for the raw data, although the effects of the variables look similar except for the intercept, col4, and col5. The log transformation estimated a highly significant intercept, while for the raw and min-max models the intercept is only marginally significant; the mean-sd transformation did not yield a significant intercept. All transformations except log estimated a significant col4, while log estimated a significant col5 where the others are only marginally significant.

In terms of model properties, the log transformation is the best, followed by min-max, then mean-sd; the raw data gave the worst model properties.
