This vignette explains how the data used in the Mental Health and Disability Services (MHDS) dashboard was prepared.
The first step was downloading all region expenditure PDFs for 2021.
The next step was converting the PDFs to CSV files.
After trying several applications, including Excel's PDF import, I found that smallpdf.com has a tool that converted the PDFs to CSV files correctly.
Here is an example of a CSV file converted by smallpdf.com:
Unfortunately, I was unable to automate the cleaning of these converted files in the time allotted for working on the MHDS expenditure data, so the quickest option was to clean them manually. The following process was applied to each converted CSV file.
Columns were created for region, year, category, sub category 1, and sub category 2.
Next, I populated the columns with the respective information. The dark-colored, bold rows supplied the category, the lightly colored, bold rows supplied sub category 1, and the regular rows supplied sub category 2.
Finally, I removed the unneeded title and subtotal rows and the combined categories column.
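For readers who want to see the target layout, the manual steps above can be sketched in Python. This is only an illustration, not the process that was actually used: it assumes each row has already been tagged with its level, because the color and bold formatting that identifies the level in the PDF does not survive in a CSV file. The function name `tidy_rows`, the level names, and all labels and amounts are hypothetical placeholders.

```python
def tidy_rows(rows, region, year):
    """Fill category and sub category labels down onto the detail rows.

    `rows` is a list of (level, label, amount) tuples. The level tags are an
    assumption of this sketch; the converted CSV files do not contain them.
    """
    tidy = []
    category = None
    sub_category_1 = None
    for level, label, amount in rows:
        if level == "category":
            category = label
            sub_category_1 = None  # a new category resets sub category 1
        elif level == "sub_category_1":
            sub_category_1 = label
        elif level == "sub_category_2":
            tidy.append({
                "region": region,
                "year": year,
                "category": category,
                "sub_category_1": sub_category_1,
                "sub_category_2": label,
                "amount": amount,
            })
        # Rows with any other level (titles, subtotals) are dropped.
    return tidy


# Made-up rows that mimic the structure described above.
example_rows = [
    ("title", "Example Region Expenditures", None),
    ("category", "Category A", None),
    ("sub_category_1", "Group A1", None),
    ("sub_category_2", "Item A1a", 100.0),
    ("sub_category_2", "Item A1b", 50.0),
    ("subtotal", "Group A1 Total", 150.0),
    ("sub_category_1", "Group A2", None),
    ("sub_category_2", "Item A2a", 25.0),
]

cleaned = tidy_rows(example_rows, region="Example Region", year=2021)
for row in cleaned:
    print(row)
```

Only the regular (sub category 2) rows are kept, and each one carries the region, year, category, and sub category 1 it belongs to. Title and subtotal rows are dropped, which mirrors the last manual step.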