Search Robots

bladecar
Groupie
Posts: 338
Joined: Tue, 05 Jul 2011, 16:32
Location: Brisbane


Post by bladecar » Wed, 31 Jul 2013, 22:03

Hi,

I'm wondering how the numerous "Search Robots" operate which appear in the 'who's on' list at the bottom of the main page.

I'm guessing they are scripts that look for certain key words and report their occurrence to whoever sets up the Search Robot? Is that what it is or is it something else, and if so, what do you think they are looking for?
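For what it's worth, forum software generally decides that a visitor is a "search robot" by looking at the User-Agent header sent with each request, not from anything the robot reports back. A minimal sketch of that idea follows. The `KNOWN_BOTS` table and `classify_visitor` function are made up for illustration and are not this forum's actual code:

```python
# Hypothetical table: substring to look for in the User-Agent -> display name.
KNOWN_BOTS = {
    "googlebot": "Google [Bot]",
    "bingbot": "Bing [Bot]",
    "yahoo! slurp": "Yahoo [Bot]",
}


def classify_visitor(user_agent: str) -> str:
    """Return a bot name if the User-Agent contains a known marker, else 'Guest'."""
    ua = user_agent.lower()
    for marker, name in KNOWN_BOTS.items():
        if marker in ua:
            return name
    return "Guest"


if __name__ == "__main__":
    print(classify_visitor(
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    ))
    print(classify_visitor("Mozilla/5.0 (Windows NT 6.1; rv:22.0) Firefox/22.0"))
```

A User-Agent string is easy to fake, so a match only shows what the visitor claims to be.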

acmotor
Senior Member
Posts: 3591
Joined: Thu, 26 Apr 2007, 03:30
Real Name: Tuarn
Location: Perth,Australia


Post by acmotor » Wed, 31 Jul 2013, 23:10

Some are Google, Yahoo, etc. That's how you can do Google searches and get hits on the forum.
Some may be malicious and go nowhere due to security measures, firewalls, etc.
So, basically, I have no idea.
iMiEV MY12     102,980km in pure Electric and loving it !

coulomb
Site Admin
Posts: 3165
Joined: Thu, 22 Jan 2009, 20:32
Real Name: Mike Van Emmerik
Location: Brisbane

Post by coulomb » Thu, 01 Aug 2013, 02:33

bladecar wrote: I'm guessing they are scripts that look for certain key words and report their occurrence to whoever sets up the Search Robot?

My understanding is that most if not all of them are standard search engine "spiders" that simply read every web page, following every link. Each page is stored in a massive database. When you look at a "cached" version of a page, you are looking at that copy. (Access to the cached pages seems to move around a lot, and I haven't seen Google's for a while now.)
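The "read every page, follow every link, keep a copy" idea can be sketched in a few lines. This is only a toy, assuming a made-up three-page `FAKE_WEB` dictionary in place of real HTTP fetches. Real spiders add politeness delays, revisit schedules and much more:

```python
from collections import deque
from html.parser import HTMLParser

# A made-up three-page "web" so the example runs without any network access.
FAKE_WEB = {
    "/index": '<a href="/ev">EVs</a> <a href="/karts">Karts</a>',
    "/ev": '<p>Electric vehicles</p> <a href="/index">Home</a>',
    "/karts": '<p>Go-karts</p> <a href="/ev">EVs</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag seen."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start: str) -> dict:
    """Breadth-first crawl from `start`; returns {url: cached copy of the page}."""
    cache = {}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in cache or url not in FAKE_WEB:
            continue  # already visited, or a dead link
        html = FAKE_WEB[url]  # stands in for an HTTP fetch
        cache[url] = html  # this stored copy plays the role of the "cached" page
        parser = LinkExtractor()
        parser.feed(html)
        queue.extend(parser.links)
    return cache


if __name__ == "__main__":
    print(sorted(crawl("/index")))
```

The `cache` check is what stops the spider going round in circles when pages link back to each other.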

There is a proprietary algorithm that indexes the incredible number of searchable words in every page on the internet. (Well, they cut it down a fair bit by ignoring porn sites, and any pages with robot keep-away notices.)
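The real indexing algorithms are proprietary, but the textbook structure behind keyword lookup is an "inverted index": a map from each word to the pages containing it. The sketch below builds one and skips pages that a robots.txt file (the usual form of a "keep away" notice) disallows. The page contents, the `ExampleBot` name and the `build_index` function are all assumptions for illustration:

```python
import re
from urllib.robotparser import RobotFileParser

# Made-up robots.txt: asks every robot to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Made-up pages standing in for crawled copies.
PAGES = {
    "http://example.com/ev": "Electric cars and electric bikes",
    "http://example.com/karts": "Go-karts and bikes",
    "http://example.com/private/notes": "Secret electric notes",
}


def build_index(pages: dict, robots_txt: str, agent: str = "ExampleBot") -> dict:
    """Map each lowercase word to the set of allowed URLs that contain it."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    index = {}
    for url, text in pages.items():
        if not rules.can_fetch(agent, url):
            continue  # respect the keep-away notice
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(url)
    return index


if __name__ == "__main__":
    index = build_index(PAGES, ROBOTS_TXT)
    print(sorted(index["bikes"]))
```

A search for a word then becomes a single dictionary lookup rather than a scan of every stored page.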

When you think about it, it's a heroic job. First, there is the job of crawling most of the internet at a suitable rate, crawling faster where there are more changes or more demand (popular blogs, newspaper stories that are mostly current for only a day or so, etc.). Then there is the storage of all that. Then the keyword indexing. Then the algorithm for sanitising searches (fixing spelling errors, handling plurals and other forms of keywords, and so on). Then ranking of search results, and avoiding the efforts of millions of people to "game" the results. Then you have to be able to generate just the first, say, 20 results, and be able to bring up the next 20 if the user wants more; I'm sure when Google says there are 27,000 results, it doesn't generate them all as web pages, and throw away all but the few that I look at.
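The "only generate the 20 I look at" part is easy to illustrate. Assuming each matching page already has a relevance score (the `scores` dictionary and `result_page` function here are invented for the example, and have nothing to do with how Google actually ranks), one page of results can be picked out without sorting or rendering the whole set:

```python
import heapq


def result_page(scores: dict, page: int, per_page: int = 20) -> list:
    """Return only the URLs for one page of results, best score first.

    `page` is zero-based: 0 is the first page, 1 the next, and so on.
    """
    needed = (page + 1) * per_page
    # nlargest keeps only the best `needed` items instead of sorting everything.
    top = heapq.nlargest(needed, scores.items(), key=lambda item: item[1])
    return [url for url, _ in top[page * per_page:needed]]


if __name__ == "__main__":
    # 27,000 made-up matches, where a higher number means a better score.
    scores = {f"page{i}": i for i in range(27000)}
    print(len(scores), "results")
    print(result_page(scores, 0)[:3])
```

Counting the matches (`len(scores)`) is cheap compared with building 27,000 result entries, which fits the guess that the total is reported without everything being generated.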

All paid for by advertising. Amazing, to me at least.
Learning how to patch and repair PIP-4048 inverter-chargers and Elcon chargers.

acmotor
Senior Member
Posts: 3591
Joined: Thu, 26 Apr 2007, 03:30
Real Name: Tuarn
Location: Perth,Australia


Post by acmotor » Thu, 01 Aug 2013, 09:49

I'm trying to follow this... (darned if I know why)

The Google search is done on the cached page only, and Google won't offer the result if the search string is not found in its cached copy. So its robot must update the cached page for you to be able to search it if there was a change to the page.
So does their software download new pages completely each visit? I guess so, otherwise it wouldn't know there was a change. Unless it reads some HTML flag showing there was a change?
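On the "flag" question: HTTP itself (rather than the HTML) has a mechanism for this. A client can send back the ETag or Last-Modified value it saw last time (in an If-None-Match or If-Modified-Since header), and the server may answer "304 Not Modified" with no page body. Whether a given robot uses it on a given site is another matter. The sketch below fakes both ends in one script. The `fake_server` and `revisit` functions are hypothetical stand-ins, with the ETag simply taken as a hash of the page:

```python
import hashlib
from typing import Optional, Tuple


def etag_for(body: str) -> str:
    """Made-up ETag scheme: a short hash of the page content."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]


def fake_server(body: str, if_none_match: Optional[str] = None) -> Tuple[int, str, Optional[str]]:
    """Pretend web server: returns (status, etag, body or None)."""
    tag = etag_for(body)
    if if_none_match == tag:
        return 304, tag, None  # unchanged: no body is sent
    return 200, tag, body


def revisit(cache: dict, url: str, live_body: str) -> bool:
    """Refresh cache[url]; return True only if the full page was downloaded."""
    known_tag = cache.get(url, (None, None))[0]
    status, tag, body = fake_server(live_body, known_tag)
    if status == 304:
        return False  # cached copy is still current
    cache[url] = (tag, body)
    return True


if __name__ == "__main__":
    cache = {}
    print(revisit(cache, "/thread", "first version"))   # first visit
    print(revisit(cache, "/thread", "first version"))   # nothing changed
    print(revisit(cache, "/thread", "edited version"))  # page was edited
```

So the robot still has to ask on each visit, but an unchanged page costs only a short exchange rather than a full download.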

However, if you click on the search result you then go to the live web site surely, not the Google cached version? E.g. try rain radar or just an edit on the forum.
So a search to know there are 27,000 results has to actually have been done on all its cached web pages. They are just results after all.

Still, basically, I have no idea.

edit: yeah, that spelling. Fix it before weber sees it.
Last edited by acmotor on Thu, 01 Aug 2013, 08:39, edited 1 time in total.
iMiEV MY12     102,980km in pure Electric and loving it !

coulomb
Site Admin
Posts: 3165
Joined: Thu, 22 Jan 2009, 20:32
Real Name: Mike Van Emmerik
Location: Brisbane

Post by coulomb » Thu, 01 Aug 2013, 18:29

acmotor wrote: However, if you click on the search result you then go to the live web site surely, not the google cashed version ?

Yes, though you have the choice. You can go to the cached version using the drop-down box, as shown below:

Image

Sometimes the page you want has changed such that the info you want is no longer there. But you can still view the original page using the above button.

What frustrates me is that sometimes the info you want isn't on the cached page either. In that case, it admits that the search keywords were only in pages pointing to this page, or some such.
Learning how to patch and repair PIP-4048 inverter-chargers and Elcon chargers.

acmotor
Senior Member
Posts: 3591
Joined: Thu, 26 Apr 2007, 03:30
Real Name: Tuarn
Location: Perth,Australia


Post by acmotor » Thu, 01 Aug 2013, 18:37

Yes, that can be useful, on very rare occasions. But given the uncertainty of the cached page date, your frustration may continue.

iMiEV MY12     102,980km in pure Electric and loving it !

Richo
Senior Member
Posts: 3219
Joined: Mon, 16 Jun 2008, 00:19
Real Name: Richard
Location: Perth, WA


Post by Richo » Thu, 01 Aug 2013, 20:42

Skynet's dominions are on the rise...
Help prevent road rage - get outta my way! Blasphemy is a swear word. Magnetic North is a south Pole.
