22 Correlation Analysis

Author

Юрій Клебан

22.1 Generating the Dataset

?rnorm
set.seed(1) # for reproducibility

# Generate correlated data
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, mean = 0, sd = 0.5)
x3 <- 2*x1 + x2 + rnorm(100, mean = 0, sd = 0.2) + rnorm(100, mean = 1, sd = 0.5)
x4 <- rnorm(100, mean = 1, sd = 0.5)
x5 <- 0.5*x2 + 0.2*x4 + rnorm(100, mean = 0, sd = 0.3)

# Combine the data into a dataframe
data <- data.frame(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)

head(data)
A data.frame: 6 × 5
          x1          x2         x3        x4         x5
       <dbl>       <dbl>      <dbl>     <dbl>      <dbl>
1 -0.6264538 -0.93663715 -0.6608276 1.5372205 -0.1376835
2  0.1836433  0.20470126  1.3861135 1.9478274  0.4028555
3 -0.8356286 -1.29108944 -0.6593603 0.6985013 -0.8608171
4  1.5952808  1.67429519  5.6068592 0.8045661  1.0014486
5  0.3295078  0.00221545  2.0312565 0.7918890  0.4569658
6 -0.8204684  0.06317525  0.6778771 0.8121713  0.6722121
plot(x2, x1)
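The tight x1/x2 relationship in this scatterplot can be predicted from how x2 was built: x2 is x1 plus independent N(0, 0.5²) noise, so Cov(x1, x2) = Var(x1) = 1 and Var(x2) = 1 + 0.5² = 1.25, giving a population correlation ρ = 1/√1.25 ≈ 0.894. The sketch below compares that value with a large simulated sample. The names z1 and z2, the sample size and the seed 42 are arbitrary illustrative choices; the snippet does not modify `data`.

```r
# Population correlation of x1 and x2 = x1 + e, where e ~ N(0, 0.5^2) is independent of x1:
# Cov(x1, x2) = Var(x1) = 1, Var(x2) = 1 + 0.5^2 = 1.25
rho_theory <- 1 / sqrt(1 + 0.5^2)

# Large-sample check (n and seed are illustrative assumptions)
set.seed(42)
n  <- 1e5
z1 <- rnorm(n)
z2 <- z1 + rnorm(n, mean = 0, sd = 0.5)
rho_sample <- cor(z1, z2)

round(c(theory = rho_theory, sample = rho_sample), 3)
```

With only 100 observations, the sample value in the matrix below (0.8822902) scatters around this theoretical value rather than matching it exactly.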


22.2 Descriptive Statistics

summary(data)
       x1                x2                 x3                x4         
 Min.   :-2.2147   Min.   :-2.54005   Min.   :-6.5714   Min.   :-0.5040  
 1st Qu.:-0.4942   1st Qu.:-0.65877   1st Qu.:-0.4326   1st Qu.: 0.6212  
 Median : 0.1139   Median : 0.15697   Median : 1.4407   Median : 0.9309  
 Mean   : 0.1089   Mean   : 0.08998   Mean   : 1.3395   Mean   : 0.9804  
 3rd Qu.: 0.6915   3rd Qu.: 0.79579   3rd Qu.: 3.2001   3rd Qu.: 1.3018  
 Max.   : 2.4016   Max.   : 2.61417   Max.   : 7.3114   Max.   : 2.9051  
       x5         
 Min.   :-1.1330  
 1st Qu.:-0.1197  
 Median : 0.2936  
 Mean   : 0.2277  
 3rd Qu.: 0.6392  
 Max.   : 1.7167  
str(data)
'data.frame':   100 obs. of  5 variables:
 $ x1: num  -0.626 0.184 -0.836 1.595 0.33 ...
 $ x2: num  -0.93664 0.2047 -1.29109 1.6743 0.00222 ...
 $ x3: num  -0.661 1.386 -0.659 5.607 2.031 ...
 $ x4: num  1.537 1.948 0.699 0.805 0.792 ...
 $ x5: num  -0.138 0.403 -0.861 1.001 0.457 ...
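The summary shows that the columns have quite different means and spreads (compare x3 with x4). Pearson correlation ignores those differences because it is the covariance of standardized variables. The sketch below checks this on a small hypothetical stand-in data frame `d` so that it runs on its own; passing the chapter's `data` instead should give the same `TRUE` results.

```r
# Hypothetical stand-in data: columns on very different scales
set.seed(1)
d <- data.frame(a = rnorm(100), b = rnorm(100, mean = 50, sd = 10))
d$y <- d$a + d$b / 10

# Standardize each column: subtract its mean, divide by its standard deviation
z <- scale(d)

# The covariance matrix of standardized columns equals the correlation matrix
all.equal(cov(z), cor(d), check.attributes = FALSE)

# A positive rescaling plus a shift of one column leaves every correlation unchanged
d_rescaled <- transform(d, y = 100 * y - 7)
all.equal(cor(d_rescaled), cor(d))
```

A negative scale factor would keep the magnitude of each correlation but flip its sign.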

22.3 Pairwise Correlations

cor_matrix <- cor(data)  # Pearson correlation by default (method = "pearson")
cor_matrix
A matrix: 5 × 5 of type dbl
          x1        x2        x3        x4        x5
x1 1.0000000 0.8822902 0.9665719 0.1470512 0.7424915
x2 0.8822902 1.0000000 0.9276412 0.1934455 0.8558624
x3 0.9665719 0.9276412 1.0000000 0.1732075 0.7766687
x4 0.1470512 0.1934455 0.1732075 1.0000000 0.3233324
x5 0.7424915 0.8558624 0.7766687 0.3233324 1.0000000
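Each entry of this matrix is Pearson's r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²). The sketch below writes that definition out as a small helper (`pearson_r` is a hypothetical name, not part of base R) and compares it with `cor()`. Because it repeats the chapter's first two random calls right after `set.seed(1)`, `a` and `b` coincide with x1 and x2, so both values should reproduce the 0.8822902 entry for (x1, x2).

```r
# Pearson's r computed directly from its definition
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations from the mean of x
  dy <- y - mean(y)  # deviations from the mean of y
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Rebuild x1 and x2 with the chapter's seed and recipe
set.seed(1)
a <- rnorm(100)
b <- a + rnorm(100, mean = 0, sd = 0.5)

c(manual = pearson_r(a, b), builtin = cor(a, b))
```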

22.3.1 Heatmap

library(ggplot2)
library(reshape2)

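# melt() turns the 5 x 5 matrix into long format: one row per (Var1, Var2, value) cell
ggplot(melt(cor_matrix), aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  # Diverging palette centred at zero: negative correlations blue, positive red
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(title = "Heatmap for Correlation Matrix", x = "", y = "")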

# Keep only the lower triangle (row index > column index): each pair once, no diagonal
ggplot(subset(melt(cor_matrix), lower.tri(cor_matrix)),
       aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(title = "Heatmap for Correlation Matrix", x = "", y = "")
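The `subset(melt(cor_matrix), lower.tri(cor_matrix))` filter works because `melt()` lists matrix cells in column-major order (the row index Var1 varies fastest), which is also the order in which R stores the logical matrix returned by `lower.tri()`. The mask therefore keeps exactly the cells whose row index exceeds the column index. The sketch below shows this on a toy 3 × 3 matrix. It uses base R's `as.data.frame(as.table(m))` as a stand-in for `melt()` (an assumption made so the snippet needs no extra packages): it produces the same Var1/Var2 column-major layout, but names the value column `Freq`.

```r
# Toy symmetric "correlation" matrix with named rows and columns
m <- matrix(c(1.0, 0.8, 0.2,
              0.8, 1.0, 0.5,
              0.2, 0.5, 1.0),
            nrow = 3, dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

# Long format: one row per cell, Var1 (row name) varying fastest
long <- as.data.frame(as.table(m))

# lower.tri(m) flattened column-major lines up row-for-row with 'long'
mask <- as.vector(lower.tri(m))
long_lower <- long[mask, ]
long_lower  # the pairs (b, a), (c, a), (c, b)
```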

# Same lower-triangle heatmap, with each coefficient printed on its tile
ggplot(subset(melt(cor_matrix), lower.tri(cor_matrix)),
       aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red", midpoint = 0) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(title = "Heatmap for Correlation Matrix", x = "", y = "") +
  geom_text(aes(label = round(value, 2)), color = "black", size = 5)

pairs(data, upper.panel = NULL)
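`pairs(data, upper.panel = NULL)` leaves the upper triangle empty. A common variation, modelled on the `panel.cor` function in the examples of R's `?pairs` help page, fills that triangle with the correlation coefficients so that the scatterplots and the numbers appear in one figure. `panel_cor` is a hypothetical helper name. To keep the snippet self-contained, it rebuilds x1 and x2 with the chapter's seed and recipe and adds an independent `noise` column as a stand-in for the remaining variables.

```r
# Hypothetical upper-panel function: prints Pearson's r for each pair of variables
panel_cor <- function(x, y, digits = 2, ...) {
  usr <- par("usr")
  on.exit(par(usr = usr))   # restore the panel's coordinate system afterwards
  par(usr = c(0, 1, 0, 1))  # unit coordinates so the text lands in the centre
  r <- cor(x, y)
  text(0.5, 0.5, formatC(r, format = "f", digits = digits), cex = 1.2)
}

# Stand-in data: x1 and x2 as in the chapter, plus an unrelated column
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, mean = 0, sd = 0.5)
demo <- data.frame(x1 = x1, x2 = x2, noise = rnorm(100))

# Scatterplots with a smoother below the diagonal, coefficients above it
pairs(demo, lower.panel = panel.smooth, upper.panel = panel_cor)
```

In the chapter's session the same call works on the full dataset: `pairs(data, lower.panel = panel.smooth, upper.panel = panel_cor)`.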