Creating Your Own Datasets for Analysis: A Faker Library Tutorial
If you have ever come across any of the multitude of videos, blog posts, or courses on data analytics on your favorite platform, such as TikTok or Udemy, you will have noticed that the recommendations remain the same: learn technical skills such as SQL, Excel, and Tableau. That is super important; nonetheless, one thing that is always left aside is the data you can use to put what you just learned into practice.
Sure enough, there are plenty of datasets available online for free on sites such as Kaggle. Oftentimes, though, the information available may not entirely suit your test case or might not include all of the features you are hoping for. An example is this dataset with credit card fraud information: to protect cardholders' private information, the metadata has been left out and the original features have been anonymized with Principal Component Analysis (PCA), which limits the analysis you can do.
So what's the answer, you may wonder? Look no further: the Faker library is here to help!
For my test case, I will be complementing this US Consumer Finance Complaints dataset with data generated by Faker. The library is available for Python, JavaScript, and other languages; I will be using the JavaScript version for this experiment.
Taking a peek at the dataset's column labels, I found dates, unique complaint IDs, and location data, among other things.
To enrich the data and make it more fun to analyze, I also want to add the following columns:
- Age of the person filing the complaint.
- Full name.
- Satisfaction rating with the company.
- Email address.
- Phone number.
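Before generating anything, it can help to pin down the shape of one enriched row. Below is a minimal TypeScript sketch of that schema plus a basic sanity check; the field names, the 18 to 90 age range, and the 1 to 5 rating scale are my own assumptions, not something defined by the dataset or by Faker.

```typescript
// Hypothetical shape of one enriched row; field names are illustrative.
interface EnrichmentRow {
  age: number;                // age of the person filing the complaint
  fullName: string;
  satisfactionRating: number; // assumed 1-5 scale
  email: string;
  phoneNumber: string;
}

// Returns a list of problems found in a row (empty when the row looks fine).
// The age bounds and rating scale are assumptions, not dataset rules.
function validateRow(row: EnrichmentRow): string[] {
  const problems: string[] = [];
  if (!Number.isInteger(row.age) || row.age < 18 || row.age > 90) {
    problems.push(`age out of range: ${row.age}`);
  }
  if (row.fullName.trim() === "") {
    problems.push("empty full name");
  }
  if (
    !Number.isInteger(row.satisfactionRating) ||
    row.satisfactionRating < 1 ||
    row.satisfactionRating > 5
  ) {
    problems.push(`rating out of range: ${row.satisfactionRating}`);
  }
  // Loose shape check: something@something.tld, no spaces.
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(row.email)) {
    problems.push(`malformed email: ${row.email}`);
  }
  if (row.phoneNumber.trim() === "") {
    problems.push("empty phone number");
  }
  return problems;
}

console.log(
  validateRow({
    age: 34,
    fullName: "Jane Doe",
    satisfactionRating: 4,
    email: "jane@example.com",
    phoneNumber: "555-0100",
  }),
);
```

Running every generated row through a check like this before merging is a cheap way to catch surprises, such as a method returning a format you did not expect.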
The next step is to browse through Faker's documentation and look for the modules and methods that can help us achieve our goal.
A brief look at the project's GitHub repository reveals the kinds of data we can generate. Sweet!
After skimming through the documentation with our requirements in mind, I wrote a short script in TypeScript and ran it with Node.js to generate the information I wanted.
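A script along those lines could look like the minimal sketch below. It assumes the `@faker-js/faker` package, version 8 or later (installed with `npm install @faker-js/faker`), and it is not necessarily the original code: the `generateRows` function, the field names, and the value ranges are illustrative assumptions. Each column maps to one Faker module: `person` for names, `number` for age and rating, `internet` for email, and `phone` for phone numbers.

```typescript
import { faker } from "@faker-js/faker"; // assumes @faker-js/faker v8+

// Illustrative row shape; field names are assumptions.
interface EnrichmentRow {
  age: number;
  fullName: string;
  satisfactionRating: number;
  email: string;
  phoneNumber: string;
}

// Generate `count` fake rows. Seeding first makes the output reproducible.
function generateRows(count: number, seed = 42): EnrichmentRow[] {
  faker.seed(seed);
  return Array.from({ length: count }, () => {
    const firstName = faker.person.firstName();
    const lastName = faker.person.lastName();
    return {
      age: faker.number.int({ min: 18, max: 90 }), // assumed age range (inclusive)
      fullName: `${firstName} ${lastName}`,
      satisfactionRating: faker.number.int({ min: 1, max: 5 }), // assumed 1-5 scale
      // Reusing the same names keeps the email consistent with the person.
      email: faker.internet.email({ firstName, lastName }),
      phoneNumber: faker.phone.number(),
    };
  });
}

// Quick look at a handful of rows.
console.table(generateRows(5));
```

Because `faker.seed` is called before generating, re-running the script produces the same fake people, which keeps your analysis repeatable. Passing the number of complaints in your copy of the dataset as `count` gives one generated row per complaint.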
Once the script finishes running, it's just a matter of copying and pasting the column data into the US Consumer Finance Complaints dataset and importing the final file into your favorite visualization tool (Tableau, Power BI, etc.) for analysis.
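Copying and pasting is easier when the generated columns are already valid CSV. Here is a minimal sketch of a helper that serializes rows to CSV text using the quoting rules from RFC 4180: a field containing a comma, a double quote, or a line break is wrapped in double quotes, and embedded quotes are doubled. The `toCsv` and `escapeCsvField` names are hypothetical, and the approach assumes the generated rows line up one-to-one, in order, with the complaints they describe.

```typescript
// Quote a field when it contains a comma, double quote, or line break,
// doubling any embedded double quotes (RFC 4180 quoting rules).
function escapeCsvField(text: string): string {
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize rows to CSV with a header line, keeping columns in the given order.
function toCsv<T>(rows: T[], columns: (keyof T)[]): string {
  const header = columns.map((col) => escapeCsvField(String(col))).join(",");
  const lines = rows.map((row) =>
    columns.map((col) => escapeCsvField(String(row[col]))).join(","),
  );
  return [header, ...lines].join("\n") + "\n";
}

// Example with hand-written rows (in practice, use the generated ones).
const sample = [
  { age: 34, fullName: "Jane Doe", email: "jane@example.com" },
  { age: 51, fullName: 'Sam "Sonny" Lee', email: "sam@example.com" },
];
console.log(toCsv(sample, ["age", "fullName", "email"]));
```

You can save the printed text as a `.csv` file and paste its columns next to the complaints data; the quoting keeps values with commas or quotes from spilling into neighboring columns.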
Next steps
Now that we have gone over the basics of using the Faker library to enrich datasets you find online, you are ready to start leveraging it to create the data you need to practice your visualization skills.
I hope you've enjoyed this introduction to Faker!
If you liked this post, stay tuned: I am planning to share more content on data analytics and electronics!