Title: | Person-Centered Analysis |
---|---|
Description: | Provides an easy-to-use yet adaptable set of tools to conduct person-centered analysis using a two-step clustering procedure. As described in Bergman and El-Khouri (1999) <DOI:10.1002/(SICI)1521-4036(199910)41:6%3C753::AID-BIMJ753%3E3.0.CO;2-K>, hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure. |
Authors: | Joshua M Rosenberg [aut, cre], Jennifer A Schmidt [aut], Patrick N Beymer [aut], Rebecca R Steingut [ctb] |
Maintainer: | Joshua M Rosenberg <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.2.1 |
Built: | 2025-02-12 05:12:14 UTC |
Source: | https://github.com/jrosen48/prcr |
Create profiles of observed variables using two-step cluster analysis
create_profiles_cluster( df, ..., n_profiles, to_center = FALSE, to_scale = FALSE, distance_metric = "squared_euclidean", linkage = "complete" )
df |
data.frame with two or more columns of continuous variables |
... |
unquoted variable names separated by commas |
n_profiles |
The specified number of profiles to be found for the clustering solution |
to_center |
Boolean (TRUE or FALSE) for whether to center the raw data so that M = 0 |
to_scale |
Boolean (TRUE or FALSE) for whether to scale the raw data so that SD = 1 |
distance_metric |
Distance metric to use for hierarchical clustering; "squared_euclidean" is default but more options are available (see ?dist) |
linkage |
Linkage method to use for hierarchical clustering; "complete" is default but more options are available (see ?hclust) |
Function to create a specified number of profiles of observed variables using a two-step (hierarchical and k-means) cluster analysis.
A list containing the prepared data, the output from the hierarchical and k-means cluster analysis, the r-squared value, raw clustered data, processed clustered data of cluster centroids, and a ggplot object.
d <- pisaUSA15
m3 <- create_profiles_cluster(d, broad_interest, enjoyment, instrumental_mot, self_efficacy, n_profiles = 3)
summary(m3)
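The two-step procedure described above (hierarchical clustering to find an initial partition, then k-means started from that partition) can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the package's implementation: the built-in iris data stands in for pisaUSA15, the object names (x, hc, start, centers, km) are chosen for this example, and scale() is used as a rough equivalent of to_center = TRUE and to_scale = TRUE.

```r
# Stand-in data (assumption): four numeric columns from the built-in iris data
x <- scale(as.matrix(iris[, 1:4]))  # center to M = 0 and scale to SD = 1
n_profiles <- 3

# Step 1: hierarchical clustering on squared Euclidean distances
hc <- hclust(dist(x, method = "euclidean")^2, method = "complete")
start <- cutree(hc, k = n_profiles)  # initial partition

# Centroids of the initial partition: one row per cluster, one column per variable
centers <- apply(x, 2, function(col) tapply(col, start, mean))

# Step 2: k-means started from the hierarchical centroids
km <- kmeans(x, centers = centers, iter.max = 50)

# Proportion of total variance explained by the cluster solution
r_squared <- km$betweenss / km$totss
```

Starting k-means from the hierarchical centroids makes the result deterministic for a given data set, whereas the random starts that kmeans() uses when given only a number of clusters can vary from run to run.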
Identifies potential outliers
detect_outliers(df, return_index = TRUE)
df |
data.frame (or tibble) with variables to be clustered; all variables must be complete cases |
return_index |
Boolean (TRUE or FALSE) for whether to return only the row indices of the possible multivariate outliers; if FALSE, then all of the output from the function (including the indices) is returned |
* add an argument to 'create_profiles_cluster()' to remove multivariate outliers based on Hadi's (1994) procedure
either the row indices of possible multivariate outliers or all of the output from the function, depending on the value of return_index
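To illustrate what returning the row indices of possible multivariate outliers looks like, here is a simple screen based on Mahalanobis distance in base R. Note that this is a swapped-in, simpler technique and not Hadi's (1994) procedure; the simulated data, the 0.999 chi-squared cutoff, and the object names are all assumptions made for this sketch.

```r
set.seed(1)

# Hypothetical data: 50 ordinary rows plus one planted extreme row (row 51)
x <- rbind(matrix(rnorm(100), ncol = 2), c(10, 10))

# Squared Mahalanobis distance of each row from the column means
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Flag rows beyond a chi-squared cutoff (the 0.999 quantile is an arbitrary choice)
cutoff <- qchisq(0.999, df = ncol(x))
flagged <- which(d2 > cutoff)  # row indices of possible outliers
```

As with detect_outliers(), the input is assumed to contain complete cases only; rows with missing values would need to be dropped first, for example with complete.cases().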
Estimates R^2 (r-squared) values for a range of number of profiles
estimate_r_squared( df, ..., to_center = FALSE, to_scale = FALSE, distance_metric = "squared_euclidean", linkage = "complete", lower_bound = 2, upper_bound = 9, r_squared_table = TRUE )
df |
data.frame with two or more columns of continuous variables |
... |
unquoted variable names separated by commas |
to_center |
Boolean (TRUE or FALSE) for whether to center the raw data so that M = 0 |
to_scale |
Boolean (TRUE or FALSE) for whether to scale the raw data so that SD = 1 |
distance_metric |
Distance metric to use for hierarchical clustering; "squared_euclidean" is default but more options are available (see ?dist) |
linkage |
Linkage method to use for hierarchical clustering; "complete" is default but more options are available (see ?hclust) |
lower_bound |
the smallest number of profiles in the range of number of profiles to explore; defaults to 2 |
upper_bound |
the largest number of profiles in the range of number of profiles to explore; defaults to 9 |
r_squared_table |
if TRUE (default), a table, rather than a plot, is returned |
Estimates R^2 values for cluster solutions across a range of numbers of profiles, from lower_bound to upper_bound
A list containing a ggplot2 object and a tibble for the R^2 values
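The idea of comparing R^2 across a range of numbers of profiles can be sketched in base R as follows. This is an illustration under assumptions rather than the package's code: iris stands in for the PISA data, R^2 is taken to be the between-cluster sum of squares divided by the total sum of squares, and r_squared_for_k() is a hypothetical helper defined only for this example.

```r
# Stand-in data (assumption): standardized numeric columns from iris
x <- scale(as.matrix(iris[, 1:4]))

# One hierarchical tree, cut at each candidate number of profiles
hc <- hclust(dist(x)^2, method = "complete")

# Hypothetical helper: R^2 of the two-step solution with k profiles
r_squared_for_k <- function(k) {
  start <- cutree(hc, k = k)
  centers <- apply(x, 2, function(col) tapply(col, start, mean))
  km <- kmeans(x, centers = centers, iter.max = 50)
  km$betweenss / km$totss
}

# Range matching the documented defaults: lower_bound = 2, upper_bound = 9
n_profiles <- 2:9
r2 <- data.frame(
  n_profiles = n_profiles,
  r_squared = sapply(n_profiles, r_squared_for_k)
)
```

A table like r2 is typically read by looking for the number of profiles beyond which R^2 stops increasing appreciably.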
student questionnaire data with four variables from the 2015 PISA for students in the United States
pisaUSA15
Data frame with columns
international student ID
international school ID
...
http://www.oecd.org/pisa/data/
Return plot of profile centroids
plot_profiles(d, to_center = F, to_scale = F)
d |
summary data.frame output from create_profiles_cluster() |
to_center |
whether to center the data before plotting |
to_scale |
whether to scale the data before plotting |
Returns ggplot2 plot of cluster centroids
A ggplot2 object
Prints details of prcr cluster solution
## S3 method for class 'prcr'
print(x, ...)
x |
A 'prcr' object |
... |
Additional arguments |
Prints details of the prcr cluster solution
Concise summary of prcr cluster solution
## S3 method for class 'prcr'
summary(object, ...)
object |
A 'prcr' object |
... |
Additional arguments |
Prints a concise summary of prcr cluster solution