Services.Rmd
This vignette explains the R script used to transform the IVRS data used in the Publicly Funded Services dashboard.
Here tidyverse is used for functions used such as dplyr and mutate. Also the respective data files are loaded in as well.
library(tidyverse)
# load in IVRS adjusted inflation file
IVRS_data_old <- read.csv(file.choose(), header=TRUE, stringsAsFactors=FALSE, check.names = FALSE)
#load in IVRS FFY2020 file
IVRS_data_new_2020 <- read.csv(file.choose(), header=TRUE, stringsAsFactors=FALSE, check.names = FALSE,fileEncoding="UTF-8-BOM")
#load in IVRS FFY2021 file
IVRS_data_new_2021 <- read.csv(file.choose(), header=TRUE, stringsAsFactors=FALSE, check.names = FALSE)
One of the things done to this data set is parsing out location information from the Client Location column. The column is initially in the format of “city, state, zip code”.
After parsing, we now have the new columns, Client City, Client State, Client Zip Code, and Client Zip Code Extra Info for zip codes that were longer than 5 digits.
# Initial split on comma to get City
IVRS_data_transformed <- IVRS_data_old %>%
separate('Client Location', c('Client City', 'Client State Zip Code'), ',')
# Remove any leading and trailing whitespace
IVRS_data_transformed$'Client City' <- str_trim(IVRS_data_transformed$'Client City')
# Secondary split to get State and Zipcode
IVRS_data_transformed <- IVRS_data_transformed %>%
separate('Client State Zip Code',c('Client State','Client Zip Code'),sep="\\s+(?=\\S*$)")
# Remove any leading and trailing whitespace
IVRS_data_transformed$'Client State' <- str_trim(IVRS_data_transformed$'Client State')
# Split Zip code into 5 digit format and extra information
IVRS_data_transformed <- IVRS_data_transformed %>%
separate('Client Zip Code',c('Client Zip Code','Client Zip Code Extra Info'),sep=5)
# Remove any leading and trailing whitespace
IVRS_data_transformed$'Client Zip Code' <- str_trim(IVRS_data_transformed$'Client Zip Code')
# Remove '-' from extra zip code info
IVRS_data_transformed$'Client Zip Code Extra Info' <-
str_replace_all(IVRS_data_transformed$'Client Zip Code Extra Info', "[^[:alnum:]]", "")
# Remove any leading and trailing whitespace
IVRS_data_transformed$'Client Zip Code Extra Info' <- str_trim(IVRS_data_transformed$'Client Zip Code Extra Info')
The next thing done is joining extra information that was obtained for cases in 2020 - 2021, such as academic participation, to the data set.
One of Iowa’s counties is O’Brien county which has a pesky apostrophe that gets dropped or displayed as strange characters when importing and saving data sets.
To solve this, the incorrect name is restored to the correct name and the R script is saved with UTF-8 encoding as to not lose the strange characters typed into the R script.
# Renames Iowa county O'Brien to have ' instead of O’Brien (Double check wording in the data as these characters consistently incorrectly save)
# !!IMPORTANT!! File needs to be saved using UTF-8 encoding for the special characters to not disappear.
# This can be changed in File -> Save with encoding -> Choose Encoding -> UTF-8
IVRS_merged$'Client County'[IVRS_merged$'Client County' == 'O’Brien'] <- "O'Brien"
There are a number of inconsistencies in the data when it comes to state naming conventions.
Some states are full names, others are abbreviations, and some are a strange combination of the two.
This section deals with those inconsistencies and would potentially need updating in the future for other cases.
In the end, states are converted to be the full state name.
# Changes state names to abbreviations
stateNames <- rep(NA, dim(IVRS_merged)[1])
index <- 0
for(state in IVRS_merged$'Client State'){
state <- str_trim(state)
index <- index + 1
if(is.na(state)){
stateNames[index] <- NA
}
else if(state == 'ArKS'){
stateNames[index] <- 'Arkansas'
}
else if(state == 'ARKS'){
stateNames[index] <- 'Arkansas'
}
else if(state == 'District of Columbia'){
stateNames[index] <- state
}
else if(state == 'Puerto Rico'){
stateNames[index] <- NA
}
else if(state == 'West VA'){
stateNames[index] <- 'West Virginia'
}
else if(nchar(state) > 2){
stateNames[index] <- state
}
else {
stateNames[index] <- state.name[grep(state, state.abb)]
}
}
IVRS_merged$'Client State' <- stateNames
One of the most important fields in this data set is the wage information.
This data allows us to determine how effective the IVRS program is.
Please read the comments below for more detail on the calculations done.
# Change wage at application and closure values to correctly calculate wage change
#
# Note: Wage change is computed by the formula (Wage (Closure) - Wage (Application) = Wage Change).
#
# NA's in the wage data indicates that the client was unemployed.
#
# To make the formula accurate, Wage (Application) NA's are changed to 0 so as to correctly calculate if there was a wage increase.
# (i.e. client A indicates they are unemployed at the start of application [Wage (Application)], and at the end their wage was $15 an hour [Wage (Closure)].
# If unemployed was left at NA then the formula would return NA since any NA value in an equation defaults to equaling NA in R. However since
# it is being set to 0, this will correctly compute a $15 increase in wage change.)
#
# Also vice versa, any 0's in Wage (Closure) were changed to NA so that the formula would output NA to indicate that the wage change
# resulted in unemployment.
#
# hourly
IVRS_merged$'Hourly Wage (Application)'[is.na(IVRS_merged$'Hourly Wage (Application)')] <- 0
IVRS_merged$'Hourly Wage (Closure)'[IVRS_merged$'Hourly Wage (Closure)' == 0] <- NA
#monthly
IVRS_merged$'Monthly Wage (Application)'[is.na(IVRS_merged$'Monthly Wage (Application)')] <- 0
IVRS_merged$'Monthly Wage (Closure)'[IVRS_merged$'Monthly Wage (Closure)' == 0] <- NA
#annually
IVRS_merged$'Annual Wage (Application)'[is.na(IVRS_merged$'Annual Wage (Application)')] <- 0
IVRS_merged$'Annual Wage (Closure)'[IVRS_merged$'Annual Wage (Closure)' == 0] <- NA
# Recalculated wage change fields
IVRS_merged$'Hourly Wage Change' <- IVRS_merged$'Hourly Wage (Closure)' - IVRS_merged$'Hourly Wage (Application)'
IVRS_merged$'Monthly Wage Change' <- IVRS_merged$'Monthly Wage (Closure)' - IVRS_merged$'Monthly Wage (Application)'
IVRS_merged$'Annual Wage Change' <- IVRS_merged$'Annual Wage (Closure)' - IVRS_merged$'Annual Wage (Application)'
# Creates new columns for whether there was a wage increase, no change, decrease, or unemployed
IVRS_merged <- mutate(IVRS_merged, 'Hourly Wage Change Category' = case_when(
IVRS_merged$'Hourly Wage Change' < 0 ~ "Decrease",
IVRS_merged$'Hourly Wage Change' == 0 ~ "No Change",
IVRS_merged$'Hourly Wage Change' > 0 ~ "Increase",
is.na(IVRS_merged$'Hourly Wage Change') ~ "Unemployed"))
IVRS_merged <- mutate(IVRS_merged, 'Monthly Wage Change Category' = case_when(
IVRS_merged$'Monthly Wage Change' < 0 ~ "Decrease",
IVRS_merged$'Monthly Wage Change' == 0 ~ "No Change",
IVRS_merged$'Monthly Wage Change' > 0 ~ "Increase",
is.na(IVRS_merged$'Monthly Wage Change') ~ "Unemployed"))
IVRS_merged <- mutate(IVRS_merged, 'Annual Wage Change Category' = case_when(
IVRS_merged$'Annual Wage Change' < 0 ~ "Decrease",
IVRS_merged$'Annual Wage Change' == 0 ~ "No Change",
IVRS_merged$'Annual Wage Change' > 0 ~ "Increase",
is.na(IVRS_merged$'Annual Wage Change') ~ "Unemployed"))
Office areas are in the format “Office Name (Office Number)” with some having an added “(inactive)” if the office area is no longer active.
This code removes the extra info and leaves only the office areas name.
# remove numbers from office areas
IVRS_merged$`Office Area` <- gsub("\\s*\\([^\\)]+\\)","",IVRS_merged$`Office Area`)
Lastly the now clean data is written to a csv file for use in Tableau.
# write to file
write.csv(IVRS_merged, "C:/Users/joelm/Documents/GitHub/DHR-Disabilities/Data Exploration/Datasets/IVRS Data/Cleaned_Closed_Iowa_Vocational_Rehabilitation_Cases.csv", row.names = FALSE)