Here’s the situation: I’m friends with a couple who do foster care, and they want to start a site that is essentially a directory of doctors, dentists, etc. in our metro area who accept foster children’s insurance. The problem, of course, is that they don’t have a list of all the medical providers in the city, so they can’t even begin figuring out which ones accept which insurance. That’s the part I’m really looking for a solution to.
I had been looking through Google’s APIs to see if I could just get a list of all the doctors in the city, but they have a 60-result limit. Then I was poking around a Yellow Pages site, and it occurred to me that I could use a web scraper there to page through the results and get a list of doctors and their basic info.
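For a sense of what “page through and get basic info” involves, here is a minimal Python sketch using only the standard library. The query parameters (`q`, `loc`, `page`) and the class names (`result`, `business-name`, `phone`) are hypothetical placeholders, not Yellow Pages’ actual markup. You’d have to inspect the real pages, adjust, and re-adjust whenever the site changes.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def page_url(base, terms, location, page):
    """Build the URL for one page of search results (hypothetical parameters)."""
    return base + "?" + urlencode({"q": terms, "loc": location, "page": page})


class ListingParser(HTMLParser):
    """Collect name/phone pairs from one page of results.

    Assumes hypothetical markup: each result is a <div class="result"> that
    contains an element with class "business-name" and one with class "phone".
    """

    def __init__(self):
        super().__init__()
        self.listings = []
        self._field = None  # the listing field the current text belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "result" in classes:
            self.listings.append({"name": "", "phone": ""})
        elif self.listings and "business-name" in classes:
            self._field = "name"
        elif self.listings and "phone" in classes:
            self._field = "phone"

    def handle_endtag(self, tag):
        self._field = None  # simplification: a nested tag inside a field ends it early

    def handle_data(self, data):
        if self._field:
            self.listings[-1][self._field] += data.strip()


def parse_listings(html):
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.listings
```

In a real run you’d increase `page` one at a time, fetch each URL politely, and stop once `parse_listings` returns an empty list.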
If nobody here has an alternative idea, does anyone have experience with scraping software and can make some recommendations? Preferably something free that they can use without being developers (he’s doing some front-end web design, I think, but he’s pretty new to it; they’re both pretty smart people, though). It doesn’t need to do much, just scrape Yellow Pages now and again for probably not much data. I am a developer, but I’m not going to manage this for them long term; I’m just trying to provide guidance and probably set up a database for them to store this info for their site. They are thinking about WordPress, which I suppose is fine, but I assume I’ll need to write them some code to connect to the database, provide CRUD, and probably handle CSV/Excel uploads so they can populate the database easily.
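The CSV upload part could be as simple as the sketch below. It uses SQLite purely as a self-contained stand-in (a WordPress site would normally sit on MySQL/MariaDB), and the table and column names (`providers`, `accepts_foster_insurance`, etc.) are hypothetical. Treating (name, phone) as the key means re-uploading an updated spreadsheet replaces existing rows instead of duplicating them.

```python
import csv
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS providers (
    name      TEXT NOT NULL,
    phone     TEXT NOT NULL,
    specialty TEXT,
    accepts_foster_insurance INTEGER,  -- 1 = yes, 0 = no, NULL = unknown
    PRIMARY KEY (name, phone)
)
"""


def import_providers(conn, lines):
    """Load providers from CSV text with a header row; returns rows processed.

    `lines` can be a file opened with newline="" or any iterable of CSV lines.
    Assumed (hypothetical) columns: name, phone, specialty,
    accepts_foster_insurance (yes/no).
    """
    conn.execute(SCHEMA)
    rows = []
    for row in csv.DictReader(lines):
        answer = (row.get("accepts_foster_insurance") or "").strip().lower()
        accepts = {"yes": 1, "no": 0}.get(answer)  # anything else -> unknown (NULL)
        specialty = (row.get("specialty") or "").strip() or None
        rows.append((row["name"].strip(), row["phone"].strip(), specialty, accepts))
    with conn:  # commit the whole upload as one transaction
        # INSERT OR REPLACE overwrites an existing (name, phone) row instead of duplicating it.
        conn.executemany("INSERT OR REPLACE INTO providers VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Excel files would need to be saved as CSV first, or read with a library such as openpyxl.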
Check whether Yellow Pages has a REST API. Scraping should be a last resort: it’s horrible, and depending on how you do it, it could get you in legal trouble.
This might be an old topic, but scraping is very unlikely to get you in legal trouble, especially at such a small scale. The worst thing that usually happens is that your IP gets blocked by whatever site you’re scraping.
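If you do scrape, the usual ways to avoid getting blocked are to honor the site’s robots.txt and to throttle your requests. Here is a minimal sketch with Python’s standard `urllib.robotparser`. The user agent string and the 5-second fallback delay are assumptions, and `fetch` is a hypothetical function you’d supply (for example, one wrapping `urllib.request.urlopen`). Following robots.txt is good manners, not a legal guarantee; the site’s terms of service still apply.

```python
import time
import urllib.robotparser

USER_AGENT = "foster-directory-bot"  # hypothetical name; identify your bot honestly
DEFAULT_DELAY = 5  # seconds; an assumed fallback when robots.txt sets no Crawl-delay


def make_checker(robots_lines):
    """Build a robots.txt checker from the file's lines (downloaded separately)."""
    checker = urllib.robotparser.RobotFileParser()
    checker.parse(robots_lines)
    return checker


def polite_fetch_all(urls, checker, fetch, delay=None):
    """Fetch only the URLs robots.txt allows, sleeping between requests."""
    if delay is None:
        site_delay = checker.crawl_delay(USER_AGENT)
        delay = site_delay if site_delay is not None else DEFAULT_DELAY
    results = {}
    first = True
    for url in urls:
        if not checker.can_fetch(USER_AGENT, url):
            continue  # disallowed for our user agent; skip it
        if not first:
            time.sleep(delay)
        first = False
        results[url] = fetch(url)
    return results
```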
As for scraping web pages without being a developer, I’m not aware of any such tool.
I’m not sure web scraping is a good solution for this problem anyway. You’d need some hefty logic to determine all the details about each healthcare provider, since I imagine those details aren’t all in one place, so you’d probably end up scraping several different places for each individual provider. I can think of at least three: one for locations, one to gather the providers in each location, and one to scrape each provider’s website for healthcare information (like which insurance is accepted).
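To make the three stages concrete, here is a rough Python sketch of how they chain together. `list_providers` and `fetch_site` are hypothetical callables you’d have to write for each site, and the single `medicaid` keyword pattern is an assumption for illustration. A keyword hit only means the site mentions the word, not that the office actually takes a given plan, which is exactly the “hefty logic” problem: the results would still need to be checked by hand.

```python
import re

# Hypothetical keyword list; real wording varies and a match is only a hint.
INSURANCE_PATTERNS = {
    "medicaid": re.compile(r"\bmedicaid\b", re.IGNORECASE),
}


def build_directory(locations, list_providers, fetch_site):
    """Chain the three stages: locations -> providers -> provider websites.

    list_providers(location) -> list of dicts with "name" and optional "website"
    fetch_site(url) -> page text
    Both are hypothetical callables you'd implement per site; passing them in
    keeps the pipeline testable without network access.
    """
    directory = []
    seen = set()
    for location in locations:  # stage 1: the areas to search
        for provider in list_providers(location):  # stage 2: providers per area
            website = provider.get("website")
            key = (provider["name"], website)
            if key in seen:  # the same office can turn up under several areas
                continue
            seen.add(key)
            # Stage 3: scan the provider's own site for insurance keywords.
            text = fetch_site(website) if website else ""
            mentions = sorted(
                label for label, pattern in INSURANCE_PATTERNS.items()
                if pattern.search(text)
            )
            directory.append(dict(provider, location=location, mentions=mentions))
    return directory
```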
Any amount of web scraping would require development. If you want to venture down that path, I use and recommend Scrapy.
Otherwise, I would look into other ways to get this kind of data.
Thanks, everyone, but after talking with them more, I think I’ve convinced them to go the social-platform route, i.e., other foster parents will provide the info rather than the two of them trying to “curate” it, which nobody has time to do for free. I’m not looking into this any further for now.