5 Comments
Varun Sagar Theegala

Wonderful post, Andres. I think people feel bound to the ready-to-use datasets they see on Kaggle and just build projects around them. I’m definitely not implying Kaggle doesn’t have interesting datasets.

However, techniques like using APIs and web scraping can help extract data for more unique topics and problem statements.

Joe Hovde

Reddit locked down their API a while ago. Is it free now?

Andres Vourakis

As far as I know, it has a free tier available for non-commercial uses, such as personal projects and academic research.

But to be honest, I haven’t tried it since the changes, so I’m not too familiar with its limitations.

I’m going to do some research and update my article if necessary, thank you!

Joe Hovde

Cool! I used to love doing projects with Pushshift, which I believe was affiliated with Reddit, and then at some point they started charging a ton (I believe because they realized all the LLM companies were getting hugely valuable data for free).

Data x Design

A great list! Now that a year has passed, are there any datasets that you would add or exclude here?