Posted: Wed, 31 Jul 2013, 22:03
I'm wondering how the numerous "Search Robots" that appear in the 'who's on' list at the bottom of the main page operate.
I'm guessing they are scripts that look for certain key words and report their occurrence to whoever sets up the Search Robot? Is that what they are, or is it something else? And what do you think they are looking for?
Posted: Wed, 31 Jul 2013, 23:10
Some are Google, Yahoo, etc. That's how you can do Google searches and get hits on the forum.
Some may be malicious and go nowhere due to security measures and firewalls etc.
So, basically, I have no idea.
Posted: Thu, 01 Aug 2013, 02:33
bladecar wrote: I'm guessing they are scripts that look for certain key words and report their occurrence to whoever sets up the Search Robot?
My understanding is that most if not all of them are standard search engine "spiders" that simply read every web page, following every link. Each page is stored in a massive database. When you look at a "cached" version of a page, you are looking at that copy. (Access to the cached pages seems to move around a lot, and I haven't seen Google's for a while now.)
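A toy version of the spider described above, as a sketch only: it walks a small in-memory link graph breadth-first, keeps a copy of every page it reads (the equivalent of the "cached" copy), and skips URLs that a robots.txt file tells robots to stay away from, using Python's standard `urllib.robotparser`. `WEB`, `ROBOTS_TXT`, `crawl` and the `ExampleBot` user agent are all made up for illustration; a real spider fetches pages over HTTP, parses links out of the HTML, and runs across many machines.

```python
from collections import deque
from urllib import robotparser

# Hypothetical in-memory "web": each URL maps to (page text, outgoing links).
# A real spider would fetch pages over HTTP and parse the links out of the HTML.
WEB = {
    "http://example.com/": ("Welcome to the forum",
                            ["http://example.com/topic1", "http://example.com/private/admin"]),
    "http://example.com/topic1": ("Search robots crawl pages",
                                  ["http://example.com/", "http://example.com/topic2"]),
    "http://example.com/topic2": ("Cached copies of pages", []),
    "http://example.com/private/admin": ("Keep out", []),
}

# Hypothetical robots.txt asking all robots to stay out of /private/.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

def crawl(start_url, web, robots_txt, user_agent="ExampleBot"):
    """Breadth-first crawl: read a page, store a copy, queue every link not seen yet."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    store = {}                # url -> stored ("cached") copy of the page text
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        # Honour the "keep away" notice and skip links that lead nowhere.
        if not rules.can_fetch(user_agent, url) or url not in web:
            continue
        text, links = web[url]
        store[url] = text
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store
```

Calling `crawl("http://example.com/", WEB, ROBOTS_TXT)` stores the three public pages and never reads the `/private/` one.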
There is a proprietary algorithm that indexes the incredible number of searchable words on every page on the internet. (Well, they cut it down a fair bit by ignoring porn sites and any pages with robot "keep away" notices.)
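The real indexing algorithm is proprietary, but the textbook structure behind keyword search is an inverted index: a map from each word to the set of pages that contain it. A minimal sketch, assuming a crude lower-case tokenizer and AND semantics for multi-word queries (`tokenize`, `build_index` and `search` are hypothetical names):

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into words (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(pages):
    """Map each word to the set of page URLs that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

pages = {
    "a": "Search robots crawl the forum",
    "b": "The forum cache",
    "c": "Robots and rain radar",
}
index = build_index(pages)
```

Answering a query then only touches the sets for the query's own words, rather than rereading every stored page.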
When you think about it, it's a heroic job. First, there is the job of crawling most of the internet at a suitable rate, crawling faster where there are more changes or more demand (popular blogs, newspaper stories that are mostly current for only a day or so, etc.). Then there is the storage of all that. Then the keyword indexing. Then the algorithm for sanitising searches (fixing spelling errors, handling plurals and other forms of keywords, and so on). Then ranking of search results, and resisting the efforts of millions of people to "game" the results. Then you have to be able to generate just the first, say, 20 results, and bring up the next 20 if the user wants more. I'm sure that when Google says there are 27,000 results, it doesn't generate them all as web pages and throw away all but the few that I look at.
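The guess about generating only the first 20 results can be illustrated with a sketch, assuming each matching page already has a numeric relevance score (real ranking signals are proprietary and are not modelled here). `results_page` is a hypothetical helper: it counts all the matches but only sorts out and returns enough of them to fill the requested page, using `heapq.nlargest` from Python's standard library.

```python
import heapq

def results_page(scores, page, page_size=20):
    """Return (total_matches, the requested page of (url, score) pairs).

    scores: dict mapping each matching URL to a relevance score (hypothetical).
    page is 0-based. Only the best (page + 1) * page_size entries are ever
    sorted; the rest are counted but never turned into result listings.
    """
    total = len(scores)
    start = page * page_size
    top = heapq.nlargest(start + page_size, scores.items(), key=lambda kv: kv[1])
    return total, top[start:start + page_size]

# 27,000 made-up matches with made-up scores.
scores = {f"page{i}": i for i in range(27000)}
```

Here `results_page(scores, 0)` gives the count plus the top 20, and `results_page(scores, 1)` gives results 21 to 40. Google also labels its totals "About N results", so the figure shown is an estimate rather than a guaranteed exact tally.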
All paid for by advertising. Amazing, to me at least.
Posted: Thu, 01 Aug 2013, 09:49
I'm trying to follow this... (darned if I know why)
The Google search is run on the cached page only, and Google won't offer the result if the search string is not found in its cached copy. So if a page changes, its robot must update the cached copy before the change becomes searchable.
So does their software download each page completely on every visit? I guess so, otherwise it wouldn't know there was a change. Unless it reads some HTML flag showing there was a change?
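On whether a robot must download a whole page to know it changed: HTTP (rather than HTML) does offer a mechanism for this. A client can send `If-Modified-Since` or `If-None-Match` headers, and a server that supports them may reply `304 Not Modified` with no body. When a server gives no such help, a crawler can download the page and compare a hash of it with the hash stored on the last visit. The sketch below assumes both approaches; the thread doesn't establish how any particular search engine's robot actually does it, and `conditional_headers`, `fingerprint` and `page_changed` are hypothetical names.

```python
import hashlib

def conditional_headers(etag=None, last_modified=None):
    """Build the request headers a crawler can send when revisiting a page.

    If the server still has the same version, it may answer 304 Not Modified
    with no body, so the page need not be downloaded again.
    """
    headers = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def fingerprint(body: bytes) -> str:
    """Hash the page body so a change can be spotted without keeping a diff."""
    return hashlib.sha256(body).hexdigest()

def page_changed(status, old_fingerprint, body=b""):
    """Decide whether the cached copy needs updating after a revisit."""
    if status == 304:  # server says: not modified since the last visit
        return False
    # Full page downloaded: compare its hash with the stored one.
    return fingerprint(body) != old_fingerprint
```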
However, if you click on the search result, you then go to the live website, surely, not the Google cached version? E.g. try rain radar, or just an edit on the forum.
So to know there are 27,000 results, the search has to have actually been done on all its cached web pages. They are just results, after all.
Still, basically, I have no idea.
edit: yeah, that spelling. Fix it before weber sees it.
Posted: Thu, 01 Aug 2013, 18:29
acmotor wrote: However, if you click on the search result you then go to the live web site surely, not the google cashed version ?
Yes, though you have the choice. You can go to the cached version using the drop-down box.
Sometimes the page has changed so that the info you want is no longer there, but you can still view the page as it was by choosing the cached version from that drop-down box.
What frustrates me is that sometimes the info you want isn't on the cached page either. In that case, it admits that the search keywords were only in pages pointing to this page, or some such.
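The "only in pages pointing to this page" message fits a technique known as anchor-text indexing: the words of a link are credited to the page the link points to, so a page can match a search term it never contains itself. A minimal sketch, assuming the pages and links have already been collected; `words`, `build_anchor_index` and `explain_match` are hypothetical names, and this is not claimed to be how Google does it.

```python
import re
from collections import defaultdict

def words(text):
    """Lower-case the text and return its set of words (crude tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def build_anchor_index(pages, links):
    """pages: url -> page text; links: list of (from_url, to_url, anchor_text).

    Returns two indexes: words found on the page itself, and words found in
    the anchor text of links that point at the page.
    """
    body_index = defaultdict(set)
    anchor_index = defaultdict(set)
    for url, text in pages.items():
        for w in words(text):
            body_index[w].add(url)
    for _src, dst, anchor in links:
        for w in words(anchor):
            anchor_index[w].add(dst)  # credit the link's words to its target
    return body_index, anchor_index

def explain_match(url, query, body_index, anchor_index):
    """Say where each query word matched this page: 'page', 'links only' or 'missing'."""
    report = {}
    for w in words(query):
        if url in body_index.get(w, ()):
            report[w] = "page"
        elif url in anchor_index.get(w, ()):
            report[w] = "links only"
        else:
            report[w] = "missing"
    return report
```

For example, a weather page whose text is just "Live weather map" still matches a search for "rain" if a forum link pointing at it reads "rain radar"; `explain_match` reports that word as "links only".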
Posted: Thu, 01 Aug 2013, 18:37
Yes, that can be useful, on very rare occasions. But given the uncertainty of the cached page date, your frustration may continue.
Posted: Thu, 01 Aug 2013, 20:42
Skynet's dominions are on the rise...