Search Robots

whimpurinter
Senior Member
Posts: 513
Joined: Tue, 05 Jul 2011, 16:32
Location: Brisbane

Post by whimpurinter »

Hi,

I'm wondering how the numerous "Search Robots" that appear in the 'who's on' list at the bottom of the main page actually operate.

I'm guessing they are scripts that look for certain key words and report their occurrence to whoever sets up the Search Robot? Is that what they are, or are they something else? Either way, what do you think they are looking for?
acmotor
Senior Member
Posts: 3614
Joined: Thu, 26 Apr 2007, 03:30
Real Name: Tuarn
Location: Perth,Australia

Post by acmotor »

Some are Google, Yahoo, etc. That's how you can do Google searches and get hits on the forum.
Some may be malicious and get nowhere because of security measures, firewalls, etc.
So, basically, I have no idea.
converted RedSuzi, the first industrial AC induction motor conversion
on to iMiEV MY12 did 114,463km
now Tesla Model 3, 4/2021 MIC pearl white
coulomb
Site Admin
Posts: 6340
Joined: Thu, 22 Jan 2009, 20:32
Real Name: Mike Van Emmerik
Location: Brisbane

Post by coulomb »

bladecar wrote: I'm guessing they are scripts that look for certain key words and report their occurrence to whoever sets up the Search Robot?

My understanding is that most if not all of them are standard search engine "spiders" that simply read every web page, following every link. Each page is stored in a massive database. When you look at a "cached" version of a page, you are looking at that copy. (Access to the cached pages seems to move around a lot, and I haven't seen Google's for a while now.)
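The "read every page, follow every link, store a copy" loop can be sketched in a few lines. This is only an illustration under stated assumptions: the three-page `FAKE_WEB` dictionary stands in for real HTTP requests, the `example.test` URLs are made up, and real spiders add politeness delays, scheduling and much more.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl. `fetch(url)` returns HTML text, or None."""
    cache = {}                    # url -> stored copy of the page
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        if url in cache:
            continue              # already visited
        html = fetch(url)
        if html is None:
            continue              # dead link: nothing to store
        cache[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            queue.append(urljoin(url, href))   # resolve relative links
    return cache


# Hypothetical three-page "web" used instead of real network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a>',
    "http://example.test/b": '<a href="/a">A</a> <a href="/missing">?</a>',
}

pages = crawl("http://example.test/", FAKE_WEB.get)
print(sorted(pages))
```

The `cache` dictionary plays the role of the "massive database": a "cached" view of a page would simply be the copy stored there at the last visit.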

There is a proprietary algorithm that indexes the incredible number of searchable words in every page on the internet. (Well, they cut it down a fair bit by ignoring porn sites, and any pages with robot keep-away notices.)
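One common form of a "robot keep-away notice" is a robots.txt file at the root of a site. As a rough sketch of how a well-behaved spider might honour one, Python's standard `urllib.robotparser` can evaluate such rules. The rules text, the `ExampleBot` name and the URLs below are invented for illustration; no real site's file is being quoted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())   # parse from text instead of fetching


def allowed(url, agent="ExampleBot"):
    """True if the rules permit `agent` to fetch `url`."""
    return rules.can_fetch(agent, url)


print(allowed("http://example.test/forum/viewtopic.php"))
print(allowed("http://example.test/private/notes.html"))
```

Note that the notice is only a request: a polite spider checks it before fetching, but nothing in the file itself stops a malicious one.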

When you think about it, it's a heroic job. First, there is the job of crawling most of the internet at a suitable rate, crawling faster where there are more changes or more demand (popular blogs, newspaper stories that are mostly current for only a day or so, etc.). Then there is the storage of all that. Then the keyword indexing. Then the algorithm for sanitising searches (fixing spelling errors, handling plurals and other forms of keywords, and so on). Then the ranking of search results, and defeating the efforts of millions of people to "game" the results. Then you have to be able to generate just the first, say, 20 results, and be able to bring up the next 20 if the user wants more; I'm sure that when Google says there are 27,000 results, it doesn't generate them all as web pages and throw away all but the few that I look at.
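Two of those steps, keyword indexing and handing out results 20 at a time, can be illustrated with a toy inverted index. This is a sketch only: the page texts are invented, there is no spelling correction or ranking (hits are just sorted by page id), and it says nothing about how any real search engine is built.

```python
import re
from collections import defaultdict
from itertools import islice


def build_index(pages):
    """Map each lowercase word to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Yield ids of pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return
    matches = set.intersection(*(index.get(w, set()) for w in words))
    yield from sorted(matches)


def page_of_results(results, page_size=20):
    """Take only the next `page_size` hits from a result iterator."""
    return list(islice(results, page_size))


# Hypothetical stored pages.
PAGES = {
    "p1": "Electric go-kart motor controller",
    "p2": "Induction motor conversion notes",
    "p3": "Solar inverter firmware patching",
}

index = build_index(PAGES)
hits = search(index, "motor")
first = page_of_results(hits, page_size=1)    # first "page" of results
second = page_of_results(hits, page_size=1)   # next page, only on demand
print(first, second)
```

The search runs against the index built from the stored copies, not against the live pages, and only the hits actually requested are turned into a result page. In this toy version the full set of matching ids is still computed up front; counting matches is cheap compared with rendering them all.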

All paid for by advertising. Amazing, to me at least.
MG ZS EV 2021 April 2021. Nissan Leaf 2012 with new battery May 2019.
5650 W solar, 2xPIP-4048MS inverters, 16 kWh battery.
Patching PIP-4048/5048 inverter-chargers.
If you appreciate my work, you can buy me a coffee.
acmotor
Senior Member
Posts: 3614
Joined: Thu, 26 Apr 2007, 03:30
Real Name: Tuarn
Location: Perth,Australia

Post by acmotor »

I'm trying to follow this... (darned if I know why)

The Google search is done on the cached page only, and Google won't offer the result if the search string is not found in its cached copy. So if a page has changed, its robot must update the cached page before you can search for the new content.
So does their software download new pages completely on each visit? I guess so, otherwise it wouldn't know there was a change. Unless it reads some HTML flag showing there was a change?
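On the "flag" question: the mechanism lives in HTTP rather than HTML. A server can label a page with an `ETag` or `Last-Modified` header, and a client can send those values back as `If-None-Match` or `If-Modified-Since`; if nothing has changed, the server may answer 304 Not Modified with no body. Whether and how any particular search robot uses this is not something this thread establishes, so the sketch below only shows the client-side bookkeeping, with simulated responses and made-up values.

```python
def conditional_headers(cached):
    """Build request headers from validators saved on the previous visit."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def update_cache(cached, status, response_headers, body):
    """Return the cache entry to keep after a response.

    304 means "not modified": keep the old copy untouched.
    200 means a full new copy arrived: store it with its new validators.
    """
    if status == 304:
        return cached
    if status == 200:
        return {
            "etag": response_headers.get("ETag"),
            "last_modified": response_headers.get("Last-Modified"),
            "body": body,
        }
    return cached  # any other status: keep what we have (a simplification)


# Simulated visits; the ETag strings are invented.
old = {"etag": '"abc123"', "last_modified": None, "body": "<p>old</p>"}
print(conditional_headers(old))

unchanged = update_cache(old, 304, {}, "")
changed = update_cache(old, 200, {"ETag": '"def456"'}, "<p>new</p>")
print(unchanged["body"], changed["body"])
```

So a full download on every visit is not strictly required, but it does depend on the server supplying and honouring those headers.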

However, if you click on the search result you then go to the live web site, surely, not the Google cached version? E.g. try a rain radar page, or just an edit on the forum.
So to know there are 27,000 results, the search has to actually have been done on all its cached web pages. They are just results, after all.

Still, basically, I have no idea.

edit: yeah, that spelling. Fixed it before weber sees it.
Last edited by acmotor on Thu, 01 Aug 2013, 08:39, edited 1 time in total.
coulomb
Site Admin
Posts: 6340
Joined: Thu, 22 Jan 2009, 20:32
Real Name: Mike Van Emmerik
Location: Brisbane

Post by coulomb »

acmotor wrote: However, if you click on the search result you then go to the live web site surely, not the google cashed version ?

Yes, though you have the choice. You can go to the cached version using the drop down box, as shown below:

[Image: screenshot of the drop-down box]

Sometimes the page you want has changed such that the info you want is no longer there. But you can still view the original page using the above button.

What frustrates me is that sometimes the info you want isn't on the cached page either. In that case, it admits that the search keywords were only in pages pointing to this page, or some such.
acmotor
Senior Member
Posts: 3614
Joined: Thu, 26 Apr 2007, 03:30
Real Name: Tuarn
Location: Perth,Australia

Post by acmotor »

Yes, that can be useful, on very rare occasions. But given the uncertainty of the cached page date, your frustration may continue.

Richo
Senior Member
Posts: 3737
Joined: Mon, 16 Jun 2008, 00:19
Real Name: Richard
Location: Perth, WA

Post by Richo »

Skynet's dominions are on the rise...
So the short answer is NO but the long answer is YES.
Help prevent road rage - get outta my way!