Wendell's eBay Bots

I was looking for more information on Wendell's eBay/web-scraping bots. I came across a seven-year-old post he had commented on where he listed some of the tools he used to build them, and I was wondering whether anything has changed on the tooling front since then. I'd also appreciate any useful information or resources on building my own bot. I'm setting up a homelab, and a bot like this would both help me find equipment at reasonable prices and give the lab PCs something to start doing (or give me something to build experience with).

I'm trying to get more into network engineering and the homelab scene. I'm planning on running a forbidden router with high-speed networking, so I'm going to try out TNSR (maybe on top of XCP-ng, if I can actually pass things through without touching the kernel). I also want to try out clustering and automation. If you guys have any good resources or videos on these, let me know! I have an ASUS Threadripper Pro motherboard and a few 100 GbE NICs, which should make for a good project.

You could try building a bot with huginn/huginn on GitHub: "Create agents that monitor and act on your behalf. Your agents are standing by!"


Disclaimer: keep the legality of such things in mind! You might break some terms of service, or even data privacy laws, so do your own research on this. Scraping a website also costs the website money.

When scraping, there are basically two approaches.

The first is to implement a basic scraper yourself using plain GET/POST requests. This has the best performance, but it is easily detectable, and some services will block you. The browser devtools have a "copy as cURL" option that can get you started.
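Not from the original posts, but here's a minimal sketch of that first approach in Python using the requests library; the URL, query parameters, and headers are made-up placeholders, not a real eBay endpoint.

```python
import requests

# Hypothetical search URL and query parameters -- replace with the real
# endpoint and parameters copied from the devtools "copy as cURL" output.
URL = "https://www.example.com/search"
PARAMS = {"q": "used 100gbe nic", "sort": "newest"}

# A realistic User-Agent makes the request look like an ordinary browser.
HEADERS = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/125.0"}

def fetch_listings():
    resp = requests.get(URL, params=PARAMS, headers=HEADERS, timeout=30)
    resp.raise_for_status()   # fail loudly on 4xx/5xx responses
    return resp.text          # raw HTML, to be parsed e.g. with BeautifulSoup

if __name__ == "__main__":
    html = fetch_listings()
    print(len(html), "bytes fetched")
```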

The second approach is to do the scraping in a real browser. Performance is lower since an actual page is rendered, but everything you can do in a browser should be possible.

You can remote-control Chromium-based browsers via the same protocol the devtools use: the Chrome DevTools Protocol. AFAIK this requires an X11/Wayland server to function.

There are also "headless" (no desktop required) remote-controllable browsers, for example headless Chromium or Firefox driven through tools like Puppeteer, Playwright, or Selenium.
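As an illustration (not from the original thread), here's what driving headless Chromium through Playwright's Python API might look like; the URL and CSS selector are placeholders you'd swap for the real site.

```python
from playwright.sync_api import sync_playwright

# Hypothetical listing page and selector -- adjust for the real site.
URL = "https://www.example.com/search?q=used+100gbe+nic"
SELECTOR = "li.listing-title"

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # no X11/Wayland needed in headless mode
    page = browser.new_page()
    page.goto(URL, wait_until="networkidle")    # let client-side rendering finish
    titles = page.locator(SELECTOR).all_text_contents()
    browser.close()

for title in titles:
    print(title)
```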

You might also need a service to solve CAPTCHAs (e.g. if the website sits behind Cloudflare). There are some machine-learning tools available for this, but you might not be able to avoid paying a service to solve them.
Scraping websites was a lot easier in the past…

In any scenario you should rate-limit your scraping: that way you won't cause any trouble for the server owners with sudden high loads, and you are less likely to be detected or banned. Really fast, non-rate-limited scraping is basically the same as a denial-of-service attack.
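One simple way to enforce that (a generic sketch, not anything from the thread; the 5-second interval is an arbitrary example) is a small limiter that sleeps between consecutive requests:

```python
import time

class RateLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_seconds: float):
        self.min_interval = min_interval_seconds
        self._last_request = 0.0

    def wait(self):
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last_request = time.monotonic()

# Usage: call limiter.wait() before every request you send.
limiter = RateLimiter(min_interval_seconds=5.0)
```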

You never want to use your 100G to scrape some poor unsuspecting website without rate-limiting :stuck_out_tongue:


Thanks, I'll keep the ToS and rate limiting in mind. Currently I'm just looking to scrape some classifieds and eBay for equipment, maybe once every 15-30 minutes, and then have it send me an email if it finds something. I think the Huginn bot might be the ticket for this. Once I have the basics of scraping down, I'd like to learn more advanced topics like signing in to accounts and checking for more detailed parameters.
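For what it's worth, a rough sketch of that check-and-email loop in plain Python could look like the following; parse_listings() is a hypothetical scraper function, and the SMTP host and addresses are placeholders. A Huginn WebsiteAgent feeding an EmailAgent could do the same job without custom code.

```python
import random
import smtplib
import time
from email.message import EmailMessage

SEEN = set()  # listing IDs already reported, kept in memory for this sketch

def parse_listings():
    """Placeholder: fetch and parse the search page, returning (id, title, url) tuples."""
    return []

def send_alert(new_items, smtp_host="localhost", to_addr="me@example.com"):
    # Assumes an SMTP server you can relay through; both addresses are placeholders.
    msg = EmailMessage()
    msg["Subject"] = f"{len(new_items)} new listing(s) found"
    msg["From"] = "scraper@example.com"
    msg["To"] = to_addr
    msg.set_content("\n".join(f"{title}\n{url}" for _, title, url in new_items))
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)

while True:
    new_items = [item for item in parse_listings() if item[0] not in SEEN]
    if new_items:
        send_alert(new_items)
        SEEN.update(item[0] for item in new_items)
    time.sleep(random.uniform(15 * 60, 30 * 60))  # poll every 15-30 minutes
```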