Web crawler?

I don't even understand how to run this.

HAHA

Sounds legit. Need to drag some more coders in here.

Because he just asked how to set up a web crawler, and a web crawler can have a lot of uses. In the first instance, I would always think of it as something to extract info from the web, not just to obfuscate your traffic.

On-topic: it seems some people have already made a browser extension that sends random search queries to the main search engines. It's not YouTube-specific, but your Google searches also influence ads and recommendations on YouTube.

@Bender2K1, you might be interested in this:

Let's do it. I've been looking for something like this for a while, but haven't been able to get it to work. Let's use either Python or Java so it's easy to make it cross-platform.

The way I see it, we have a few options. Part of the problem with this program is that a lot of sites require JavaScript to run and cookies to be saved, so simply using wget or curl to download pages won't work. We almost need to use xdotool or something similar to interact with the window. I figure running an instance of Firefox will help us.

We can set it up with either fake accounts or existing accounts that we use. We can build it so it can sign up for accounts on its own, but that may be difficult because of reCAPTCHA and other anti-spam systems.

Then it will go straight to Facebook, YouTube, Instagram and Tumblr to create accounts if they don't exist already. I guess it will sit there and wait for input if it hits a page with a CAPTCHA it can't solve. (I know there are CAPTCHA-solving libraries out there.) Once it's got those accounts, it starts following and liking pages or people. From there, we have seed data. This will provide us with links to click. It will "watch" all the YouTube videos in "recommended videos", "read" all the articles that show up on Facebook and Tumblr, and I'm sure we can also design it to respond to the articles like an angry teen who, despite having things pretty well, can't seem to shake the thought that they deserve a lot more.


lol. I actually used to use this from time to time in the late 90s and early 2000s :D

You're missing one thing. I don't think you actually know what you want to protect yourself from, or if you even need to.

Don't want Google to suggest things you might like on YouTube? Just turn it off. https://myaccount.google.com/privacy

There's really nothing else to do there.

If you want to hide your browsing activity from government spying, you need to do a lot more than use a random agent spoofer.

I don't think Google respects privacy at all.

Anything specific you're worried about them doing with your data?

The only real solution is to not use Google services, which I guess is not something you'd consider doing.

Python please. Python ftw

Obviously Python is going to be better than Java in this situation and I agree that it's probably a good language to use, but many of the nuances in Python drive me up the fucking wall.

Ha, I have something written but the website keeps 403'ing whenever I try to post it.

For whatever reason you can't post certain content. It could be directory paths, if you have any in the content you are trying to post.



None of those things are the issue.

It's taking me longer to figure out how to post this than it took me to write it. EDIT: Figured it out, it was the time.sleep call which was breaking the post.

It's an incomplete framework that people can feel free to modify to get it working.

import requests
from bs4 import BeautifulSoup  # BeautifulSoup 4 (pip install beautifulsoup4)
from time import sleep as slp
import numpy as np

base_url = 'https://www.youtube.com'
searchStr = 'test'
with requests.session() as s:
    # Fetch the search results page for the query
    r = s.get('%s/results?search_query=%s' % (base_url, searchStr))
    assert r.status_code == 200
    soup = BeautifulSoup(r.text, 'html.parser')
    # Result titles are <h3 class="yt-lockup-title"> in the markup this targets
    items = soup.find_all('h3', {'class': 'yt-lockup-title'})
    url_gen = (item.a['href'] for item in items if item.a)
    urls = ['%s%s' % (base_url, url) for url in url_gen if 'watch' in url]
    for url in urls:
        # "Watch" each result, pausing 5 to 14 seconds between requests
        s.get(url)
        slp(np.random.randint(5, 15))
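
If you want to poke at the link-filtering step without hitting the network, here's a rough offline sketch using only the standard library. The sample HTML, the WatchLinkParser class and the extract_watch_urls helper are all made up for illustration; they only assume result links look like /watch?v=... inside an <a> tag, which may not match what the site serves today.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class WatchLinkParser(HTMLParser):
    """Collect href values from <a> tags that point at /watch pages."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep each watch link once, in the order it appears
        if href and href.startswith("/watch") and href not in self.links:
            self.links.append(href)


def extract_watch_urls(html, base_url="https://www.youtube.com"):
    """Return absolute URLs for every /watch link found in the HTML."""
    parser = WatchLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


# Hypothetical markup, shaped like the <h3> titles the snippet above targets
SAMPLE_HTML = """
<h3 class="yt-lockup-title"><a href="/watch?v=abc123">First</a></h3>
<h3 class="yt-lockup-title"><a href="/channel/xyz">Channel</a></h3>
<h3 class="yt-lockup-title"><a href="/watch?v=def456">Second</a></h3>
"""

for url in extract_watch_urls(SAMPLE_HTML):
    print(url)
```

Feeding it a saved copy of a results page is a quick way to check whether the selector still matches anything before letting the loop run.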

I'm sorry, but I don't understand what I'm supposed to do with this.

Yes, this is what OP wants. And it should be called a Dummy Traffic generator, not a "web crawler".

It needs to do these things:

  • generate random network traffic by visiting all sorts of pages
  • do random search queries on all search engines (or just skip this and use search.disconnect.me or startpage or duckduckgo)
  • generate random traffic within specific products like Facebook (hard, because mouse movements are tracked)

That last one is important because companies like FB know exactly when you sleep, when you come home from work, when you're in the mood for x and y, all based on when you're using fb and how you move your mouse on the page, how you hover on things etc.
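
The first two bullets could be sketched roughly like this. It only builds a schedule of (delay, url) pairs and doesn't send anything. The SEARCH_URLS table and the build_schedule helper are my own placeholders, and the 5 to 15 second gap is just borrowed from the snippet earlier in the thread, so treat all of it as assumptions.

```python
import random
from urllib.parse import urlencode

# Illustrative search endpoints; check each engine's terms before automating
SEARCH_URLS = {
    "duckduckgo": "https://duckduckgo.com/",
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
}


def build_schedule(queries, n_requests, min_gap=5.0, max_gap=15.0, rng=None):
    """Return a list of (delay_seconds, url) pairs for decoy searches.

    Each entry picks a random engine and a random query, and waits a
    random amount of time so the requests are not evenly spaced.
    """
    rng = rng or random.Random()
    schedule = []
    for _ in range(n_requests):
        engine = rng.choice(sorted(SEARCH_URLS))
        query = rng.choice(queries)
        url = SEARCH_URLS[engine] + "?" + urlencode({"q": query})
        delay = rng.uniform(min_gap, max_gap)
        schedule.append((delay, url))
    return schedule


if __name__ == "__main__":
    for delay, url in build_schedule(["weather tomorrow", "bread recipe"], 3):
        print("wait %.1fs then fetch %s" % (delay, url))
```

A runner would sleep for each delay and then fetch the URL. As the post says, this does nothing for the third bullet, since plain HTTP requests carry no mouse or timing behaviour inside a page.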

Also, how advanced are TrackMeNot's queries? If, for example, I'm a doctor and usually do lengthy medical queries, and TrackMeNot only has one-word queries, that's no good.
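
To make the length point concrete, here's a minimal sketch of a decoy generator that copies the word count of your own past queries. This is not how TrackMeNot works (I haven't checked its source). The make_decoy_query helper, the sample queries and the vocabulary are all hypothetical, and random words strung together would still look fake, so a real tool would need a much better phrase source.

```python
import random


def make_decoy_query(real_queries, vocabulary, rng=None):
    """Build a decoy query whose word count is drawn from real query lengths."""
    rng = rng or random.Random()
    lengths = [len(q.split()) for q in real_queries if q.split()]
    # Fall back to a single word when there is no history to imitate
    n_words = rng.choice(lengths) if lengths else 1
    n_words = min(n_words, len(vocabulary))
    return " ".join(rng.sample(vocabulary, n_words))


# Hypothetical history: long, specific queries like the doctor example
REAL_QUERIES = [
    "differential diagnosis of chronic cough in adults",
    "drug interactions warfarin amiodarone dosing",
]
# Hypothetical decoy vocabulary from an unrelated topic
VOCABULARY = [
    "sourdough", "starter", "hydration", "oven", "temperature",
    "flour", "protein", "proofing", "time", "crust",
]

if __name__ == "__main__":
    print(make_decoy_query(REAL_QUERIES, VOCABULARY))
```

With that history, every decoy comes out five or seven words long instead of one, which is at least the right shape.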

@wendell if you found any specific tools for dummy traffic generating, please drop them here.

Shit man, Facebook tracks the mouse cursor? *throws laptop* Fuck this shit, I'm going back to MySpace.

As far as mouse movement goes, it seems like in order to do this properly, you'll need a VM. You can probably get away with 2 GB of RAM and 2 threads on a Linux VM and use the xdotool package to automate the mouse movement, and we could probably generate a profile by recording our own mouse movement habits if it feels necessary. Mouse movement isn't hard to do, it's just a pain in the ass that we need to do this in the first place.
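
Something like this could generate the movement. It's only a sketch: mouse_path and to_xdotool_commands are names I made up, the easing curve and jitter are guesses rather than anything recorded from a real person, and the only xdotool feature it relies on is the `xdotool mousemove X Y` command. It builds the command strings but doesn't run them.

```python
import random


def mouse_path(start, end, steps=20, jitter=3.0, rng=None):
    """Return a list of (x, y) points from start to end with small wobble."""
    rng = rng or random.Random()
    (x0, y0), (x1, y1) = start, end
    points = []
    for i in range(1, steps + 1):
        t = i / steps
        # Ease in and out so the pointer speeds up, then slows near the target
        eased = t * t * (3 - 2 * t)
        x = x0 + (x1 - x0) * eased
        y = y0 + (y1 - y0) * eased
        if i < steps:
            # Wobble every point except the last, which lands on the target
            x += rng.gauss(0, jitter)
            y += rng.gauss(0, jitter)
        points.append((max(0, round(x)), max(0, round(y))))
    return points


def to_xdotool_commands(points):
    """Turn points into xdotool command lines (not executed here)."""
    return ["xdotool mousemove %d %d" % (x, y) for x, y in points]


if __name__ == "__main__":
    for cmd in to_xdotool_commands(mouse_path((100, 100), (400, 300), steps=10)):
        print(cmd)
```

Inside the VM you'd run each line with a short random pause in between. Swapping the easing and jitter for values fitted to a recording of your own mouse would be the "profile" part.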

Hide yo kids, hide yo wife. Facebook straight up tells you (sometimes) in a top bar, similar to "we use cookies", that "in order to improve your experience bla bla, we use your browsing habits and mouse cursor movement on our page.."

And they use fb activity and notifications as a resource, like a shitty exploitative f2p model; withholding them and timing when you get what and from whom. Out of what (they think) you're interested in, the goal is not giving you posts that would help you, or posts that you'd agree with, or posts that are clickbait, but rather posts that generate more traffic on fb from you and in turn your connections.

And typing speed and whatnot in various websites/apps is also recorded.

You could use Diaspora...

Yeah, it seems we need to split up everything into VMs and containers, from our browsers to our storage containers.
Someone needs to make a convenient lightweight VM system where you can just pull up a new VM for FB, a new VM for Google, a new VM for Skype etc. And the VM takes care of spoofing and obscuring you.

Things have escalated too far. We need to get angry, as McAfee keeps saying in his videos.

I was doing really well. I hadn't touched Facebook since February, but there was a 4th of July party that I needed to RSVP for and I was more or less forced by the host to RSVP on the Facebook event.

Things have escalated too far. That's why I'm trying to push people I know and interact with towards privacy.

Also, TrackMeNot's GitHub hasn't had any activity in 8 months. Not sure what that says about the project. It can't be very in-depth then.

I try not to get angry about things. When I get angry I don't think straight and can't debate/argue effectively. Instead, I try to convince people that I'm not a tinfoil hat nutcase. My primary example is the Snowden leaks. I kept saying that this stuff is happening and people called me crazy. Now look at me, I WAS RIGHT!!! I KNEW IT!!! Everybody called me crazy, but they didn't know what I knew in my head. /s

Seriously though. Just show people a few examples of things you told them years ago that turned out to be true. Then hopefully you can convince them that what you're saying, although it sounds far fetched, could definitely be true.

Yeah, it's even necessary to use public manipulation tactics, like appeal to emotion, slippery slope arguments, cherry picking, and other fallacies, just because the opposition does, and the public isn't tech savvy enough.

John McAfee seems to be very good at that. Check out the way he, on the spot, took down the CIA bullshitter by using his own tactics against him (and also reason and knowledge):

This shit is like that Thank You For Smoking movie where Harvey Dent is like "scientists disagree on the subject of global warming" (and the data was something like 97% of scientists vs 3% of scientists (who were paid off)).

PS: just because you're paranoid doesn't mean you're wrong.