I’ve been in the wrong business all this time

Nothing to do with seo - this is in my Second Life. Since February this year I’ve been renting skyboxes out (apartments in the sky) in SL (Second Life), and building it up very nicely to the point of taking real money out of SL - not in fantastic amounts, but real life pocket money just the same. I had some spare land, so a while back I set up a selling area where my tenants could sell their things free of charge. Then I decided to set up a shop and sell some things myself, and in a matter of weeks my shop profits have grown to the point of exceeding the profits from skybox rentals. At the moment, every week sees an increase of at least 50% on the week before, and the current week is shaping up to be just the same. Even now, I am looking at a small, but liveable, income from SL, and I’ll achieve it very soon.

I specialise in low prim items. Prims are primitives - the basic building blocks of everything in SL - cubes, spheres, etc. Everything in SL, large or small, is made by joining basic primitives together. Land in SL can only support a certain number of prims, and property renters (tenants) are only allowed to have a certain number in their property. For those reasons, low prim, quality items are very desirable, and that’s exactly what I make. If I’d known how good a shop would be, I would never have gone into skybox rentals. Rentals take time - dealing with tenants’ questions and problems, renting out to new people when tenants leave, etc., etc. But a shop takes no time at all. People buy copies of the items on display, so the stock never goes down, and the items cost me nothing because I make them.

Comments

My Second Life

For those who don’t know, Second Life is an online, multi-user, 3-D graphics world that currently has well over 8 million registrations, over 40,000 of whom are usually logged in at any given time. That’s where I’ve been for most of this year - so far. I was introduced to it by a family member and I have hardly been out of it since January. In that time I’ve built a business up in there, renting out apartments - I have almost 60 apartments to rent, and they are often full or have very few vacancies.

Second Life (SL) is free to use, and most people use it for free, but you can also pay a small monthly subscription which entitles you to certain benefits, one of which is land ownership - you can buy land. Because of that, I started paying the subscription early on so that I could buy the land to put apartments on for renting out. While I was buying land, I was putting real money into it, over and above the monthly subscription, but I gradually built the business to a level that allows me to take money out of SL, so it’s now profitable for me. Not many people actually take money out, so I’m pleased with that.

A while back, I decided to stop buying more and more land to expand the apartments business, and now I’m engaged in creating things for people to buy - and people do buy them. I specialise in what I call “Chameleon” goods - objects such as furniture that are programmed to allow the user to change colours and textures to suit their decor and moods. Having been a programmer for a long time, I’m amazed how long it’s taken me to get into programming in SL, but I’m doing it much of the time now, with the result being more money for me :) I’m even selling my apartments (buildings) for other people to rent out - they really are that good.

For most people, SL isn’t about business - it’s about fun. There is just about everything imaginable in SL, and the experience is what each person makes of it. Today, for instance, I spent a little time relaxing by dancing, bull riding and shooting baskets (is that what they call throwing a basketball into a basket?). I skipped the mud wrestling because I always lose at that :( And, yes, there is plenty of ‘the other’ too ;)

I know what you’re thinking. If I’m so into SL, why am I spending time writing this? Sadly, right now there’s a power outage in San Francisco that’s affecting a significant part of SL, so they’ve taken it offline for a while, otherwise I would be in there, collecting rents and making things.

Comments (1)

Google’s Supplemental Index

The Big Daddy update of late 2005 to early 2006 was largely about installing a new Supplemental index. The new version is so different to the old version that it shouldn’t now be called the Supplemental index. The old Supplemental index was a repository for garbage webpages and such, and was accessed for the search results only when a reasonable number of results couldn’t be found in the regular index. The new version is very different because many millions of perfectly good pages are put in it.

Many, perhaps most, websites have plenty of their pages in the Supplemental index because their linkage profiles don’t score well enough. Even Google has pages in there - hundreds of thousands of them. A site’s linkage profile is an evaluation of the links into and out of the site. Things like linking to off-topic sites, and too high a percentage of a site’s inbound links being reciprocals, lower the score of a site’s linkage profile and reduce the number of pages that it can have in the Regular index, which means that more of its pages are placed in the Supplemental index. Improving the linkage profile brings pages out of the Supplemental index and into the Regular one.

Before Big Daddy, pages in the Supplemental index had been given the kiss of death - they rarely came out, and were rarely seen in the search results. But that has changed, and is continuing to change. It is now possible to bring pages out of the Supplemental index by getting some good links to the site, and the continued improvement is in the way that the Supplemental index is used by Google’s system.

Right now, most of the datacenters are using the new Supplemental index in the same way as the old one was used; i.e. get a results set from the Regular index and, if the set isn’t large enough, add to it from the Supplemental index. The quality of the results from the Regular index doesn’t come into it. If the results set is large enough, the Supplemental index is ignored.

But at least one datacenter operates differently. It operates along the lines of, get a results set from the Regular index. Sometimes many of those results will be poor quality matches (e.g. they only match one word of a three word query), so get some better matches from the Supplemental index. The use of the Supplemental index in a way something like this is likely to spread across the datacenters in 2007.
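To make the difference between the two behaviours concrete, here is a minimal sketch in Python. It is only an illustration of the description above, not Google's actual code: the `Page` class, the word-overlap scoring, the `min_results` threshold and the way the better Supplemental matches are merged in are all assumptions of mine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    words: frozenset  # words the page can be matched on (assumed model)


def match_count(page, query_words):
    """Number of query words that the page matches."""
    return len(page.words & query_words)


def ranked(pages, query_words):
    """Pages matching at least one query word, best matches first."""
    matches = [p for p in pages if match_count(p, query_words) > 0]
    return sorted(matches, key=lambda p: match_count(p, query_words), reverse=True)


def old_style_search(query, regular, supplemental, min_results=3):
    """Old way: the Supplemental index is used only if the set is too small."""
    q = frozenset(query.lower().split())
    results = ranked(regular, q)
    if len(results) < min_results:
        results += ranked(supplemental, q)
    return results


def new_style_search(query, regular, supplemental, min_results=3):
    """New way: also fetch better matches when Regular results are poor."""
    q = frozenset(query.lower().split())
    results = ranked(regular, q)
    if len(results) < min_results:
        return results + ranked(supplemental, q)
    # A "poor" match here means a page that misses at least one query word.
    if any(match_count(p, q) < len(q) for p in results):
        better = [p for p in ranked(supplemental, q) if match_count(p, q) == len(q)]
        # Merging by match quality is an assumption made for this sketch.
        results = sorted(results + better, key=lambda p: match_count(p, q), reverse=True)
    return results
```

With a three word query and a Regular index that returns enough pages, `old_style_search` never looks at the Supplemental index, while `new_style_search` brings in any Supplemental page that matches all three words.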

The new way makes a lot of sense. Since many of the results that are acquired from the Regular index are often poor matches for the query, and since millions of perfectly good pages are now stored in the Supplemental index, some of which will be good matches for many queries, it makes good sense to pull results from the Supplemental index when there are some poor matches from the Regular index.

It’s good news for website owners who have large numbers of pages in the Supplemental index. As the new way of operating spreads, more of their pages will rightly find their way into the search results, even though they are in the Supplemental index.

Comments (17)

Behind Closed Doors

In recent times, people have discussed a number of sites that allow the search engine spiders to crawl pages that are not allowed to be seen by non-registered people. These are some of the discussions:-

June 2006 (about the Experts-Exchange site)
http://forums.searchenginewatch.com/showthread.php?t=11974

June 2006 (about the New York Times site)
http://forums.searchenginewatch.com/showthread.php?t=12191

December 2006 (general discussion lower down in the thread)
http://www.mattcutts.com/blog/communication-in-other-languages/

December 2006 (about the WebMasterWorld site)
http://blog.outer-court.com/archive/2006-12-13-n85.html

What happens is that the pages are listed in the search engine results, but when people click on the listings, they are redirected to a login/register page, instead of receiving the page itself.

Some people want to think that what the sites do is spam - specifically cloaking. They want it to be spam because they don’t like it, but it isn’t spam, as I demonstrated here.

The real issue is that the sites allow search engine spiders to crawl those pages, for the sole purpose of having them listed in the search results, so that they will attract people, but they don’t allow unregistered people who click on the listings to see the pages without first registering. In other words, they are specifically using the engines to gain members. It’s understandable that many people would object to being denied direct access to pages that are listed in the search results.

That’s the real issue. Is it right for search engines to be used in that way? Is it right to intentionally have the search engines list pages that are denied to people unless they register? If people would discuss that, instead of clouding the issue by wrongly trying to make out that it’s spam, they may even persuade the engines to do something about it.

Personally, I prefer the pages to be listed, so that I know the information they contain is there, and I can choose to view it if I want to by registering. But I also think that it’s an abuse of the engines that they shouldn’t allow. So I have a foot in both camps - I like to know that the information is there, but I also think that search engines should not list pages that all people cannot go directly to when clicking the link in the results.

If the engines wanted to do something about it, the problem they have is that, without doing something out of the ordinary, they can’t programmatically differentiate between URLs that anyone can reach, and those that people can’t reach without being logged into a site. If a site allows the engines access to pages that all people aren’t allowed access to, the only way they can programmatically know about it is to request the pages with stealth spiders (unknown IP addresses), and they would need to do it for every page in the index. There are problems in doing that. For instance, how would they know if a ‘different’ returned page is due to something other than registered-only pages? It might be that the page has simply been changed.
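The stealth spider idea, and its weakness, can be sketched in a few lines of Python. This is a hypothetical illustration only: `fetch_as_spider` and `fetch_as_stealth` are stand-in functions that I have made up to represent a request from a known spider IP and one from an unknown IP, and comparing a hash of the page body is just one assumed way of deciding that two responses are 'different'.

```python
import hashlib


def fingerprint(body):
    """Hash of the page body, used to compare two responses."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def looks_gated(url, fetch_as_spider, fetch_as_stealth):
    """True when the known spider and the stealth request get different pages.

    Both fetch arguments are hypothetical callables that take a URL and
    return the page body as a string.
    """
    return fingerprint(fetch_as_spider(url)) != fingerprint(fetch_as_stealth(url))
```

The check cannot tell why the two responses differ. A page that was simply edited between the two requests gives exactly the same answer as a page that is hidden behind a login.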

Maybe they could write a sophisticated programme that could do a reasonable job of stealth spidering, but, since the sites aren’t in breach of any guidelines, and it is only a small problem, if they see it as a problem at all, I doubt that they would spend their time trying to deal with it programmatically when they have much bigger problems, such as spam, to deal with.

I think, if they are going to do anything at all, it will have to be done by hand, and since some of the sites that use the technique are big brand sites that the engines need in the index (New York Times and other newspapers, etc.), I can’t see anything being done about it in the near future.

Update:

In November, a Googler wrote that Google is close to making an announcement about the issue, and it will satisfy those who want to allow Google’s spider to crawl and index pages that people must register to view. This is what he wrote:

On a happier note, my colleagues and I are working on an arrangement
which I think you’ll be pleased with… balancing many Webmasters’
interest in requiring community membership or signin to content-rich
pages while still showing content in Google’s search results.  Stay
tuned :) (we’ll make an announcement in the Webmaster Central blog)

My guess is that they are coming up with a system where a site can make an arrangement with Google, to inform them which pages are behind closed doors, so that Google can mark them as such in the search results; e.g. “subscription required” and “registration required”. It could be in the form of a new meta tag to be added to each page that requires a subscription or registration to view.

Comments

Dispelling the Myth: “Subscription Cloaking”

In Matt Cutts’ blog, JohnMu asked Matt, “What’s the take on subscription cloaking?”. By “subscription cloaking”, he meant those listings in the search results where clicking on them takes some people to a login/register page instead of to the page itself. WebMasterWorld (WMW) pages that are listed in the search results often do that.

Matt chose not to answer the question, and I can’t answer it for him. What I can do is dispel the myth that it is cloaking, or any kind of spam.

Google’s guidelines state:-

Don’t … present different content to search engines than you display to users, which is commonly referred to as cloaking.

This is the guideline that is held up to show that returning a login page to people, instead of the actual page, is cloaking. So let’s look at what that guideline actually says. It says, “Don’t … present different content to search engines than you display to users”. But when clicking a listing in the search results, the person isn’t a site user. When people are logged into the site, they are its users, but not when they are not logged in. So what happens when they are not logged in is irrelevant. If the site presents different content to its users than it does to search engines, then it would be cloaking, but, since both the users and the search engines receive the same content, it isn’t cloaking. See cloaking for a detailed description of what cloaking actually is.

What WMW, and other sites like it, do is conditionally auto-redirect some people. The condition being whether the person is registered or not. So is conditional auto-redirecting spam? Well, no. Lots of perfectly clean sites do it, including Google and other search engines. For instance, if you are in a country such as the UK, where Google has a regional version, type www.google.com into your browser’s address bar (or just click on that link) and see where you land. You’ll land at www.google.co.uk. Google checks to see where you are (the condition), and automatically redirects you to their regional version (the auto-redirect) where you receive a different page to the one you asked for.
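The Google example boils down to a condition followed by a redirect, which a few lines of Python can show. This is a rough sketch and not how Google implements it: the `REGIONAL_HOSTS` table and the `resolve_host` function are names I have invented, and working out the visitor's country is assumed to have been done already.

```python
# Hypothetical table of regional versions, keyed by ISO country code.
REGIONAL_HOSTS = {"GB": "www.google.co.uk"}


def resolve_host(requested_host, country_code):
    """Return the host the visitor ends up at after the conditional redirect."""
    if requested_host == "www.google.com":
        # The condition: is there a regional version for this visitor's country?
        return REGIONAL_HOSTS.get(country_code, requested_host)
    return requested_host
```

A visitor in the UK who asks for www.google.com ends up at www.google.co.uk, while a visitor in a country with no entry in the table stays where they asked to go.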

So conditional auto-redirecting is not spam. It can be used for spam, but that’s different.

The conclusion is that WMW is not spamming - the site doesn’t break any of Google’s guidelines.

Some people have expressed a strong dislike for listings in the search results, where they don’t go to the listed page itself, but are taken to a login/register page instead. That’s fair enough. Everyone is entitled to a personal opinion. Unfortunately, some of them have tried to strengthen their personal opinions by claiming that it’s spam. It isn’t spam. It may be undesirable, but it clearly isn’t spam.

To the best of my knowledge, nobody from Google has made any comment about it, even though they have known it exists for quite a long time, and they certainly haven’t taken any action against sites that do it. So it’s reasonable to assume that Google doesn’t think of it as spam. If they did, they would surely have removed WMW from the index until the spam was cleaned up. WMW is only a forum, and there are many other forums that cover the same topics in the index, so Google doesn’t need the WMW site. Unlike the sites of major brand companies, WMW is not a ‘must have’ site for Google. They did take action against the German BMW site last year, for something quite different, but they made sure that it was back in the index very quickly, because it’s a ‘must have’ brand site. WMW isn’t anything like that.

Comments (8)

Cloaking - what it is, and what it isn’t

Cloaking keeps popping up in forums and blogs, and it’s clear that many people don’t know what it is. Back in the days before Google, cloaking was quite common and everyone in the seo business knew exactly what it was. Since then, newer people have misunderstood what it is, and have mistakenly put all sorts of things under the ‘cloaking’ umbrella.

This is cloaking

Take a 10 page website as an example. Then make another 10 pages for it - one for every normal page - and design them to rank highly in the search engines. We’ll call these pages, “engine pages”. When a person requests one of the 10 pages, send the normal page. When a search engine requests one of the 10 pages, send its equivalent ‘engine page’. People never see the ‘engine pages’, and search engines never see the normal pages. That’s all there is to cloaking.

In practice, a set of engine pages was made for each major search engine because, in those days, the engines ranked pages on content alone, and each search engine had a different algorithm. Not every page in a site had ‘engine pages’ created for it because some normal pages didn’t need them.
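As a rough illustration of the description above, this Python sketch shows the shape of a cloaking system. Everything in it is made up for the example: the IP addresses come from the range reserved for documentation, and the engine names, page tables and `serve_cloaked` function are hypothetical.

```python
# Hypothetical spider IPs, mapped to the engine they belong to.
SPIDER_IPS = {"192.0.2.10": "engine_a", "192.0.2.20": "engine_b"}

# The pages that people see.
NORMAL_PAGES = {"/index.html": "normal home page", "/contact.html": "normal contact page"}

# One set of 'engine pages' per engine; not every normal page has one.
ENGINE_PAGES = {
    "engine_a": {"/index.html": "home page tuned for engine A"},
    "engine_b": {"/index.html": "home page tuned for engine B"},
}


def serve_cloaked(path, requester_ip):
    """Send the 'engine page' to a known spider, and the normal page to people."""
    engine = SPIDER_IPS.get(requester_ip)
    if engine is not None:
        # Fall back to the normal page where no 'engine page' was made.
        return ENGINE_PAGES[engine].get(path, NORMAL_PAGES[path])
    return NORMAL_PAGES[path]
```

The point to notice is that the decision depends on who is asking. A person and a spider request the same URL and receive different pages.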

These are sometimes thought of as cloaking, but they are not

Hidden text:
People sometimes refer to hidden text as “poor man’s cloaking”, but the phrase simply means an alternative to cloaking that doesn’t involve paying for a cloaking system.

IP delivery:
When a page request is made, a cloaking system checks the IP of the requestor. If it’s the IP of a search engine’s spider, the ‘engine page’ is returned. Cloaking uses IP delivery, but so do other things such as geo-location for location-based content (the search engines often do that). Cloaking uses IP delivery, but IP delivery itself isn’t cloaking.

Auto-redirecting:
This is also sometimes known as “poor man’s cloaking”, but it isn’t cloaking at all. The method returns the same page to all requestors, and doesn’t differentiate between people and search engine spiders. If there’s a browser at the other end (a person), then it will automatically fetch the page that is being redirected to.

The auto-redirecting method has come up a few times recently because of some sites’ pages being listed in Google’s search results, but clicking on them takes many people to a login/register page. The method used is conditional auto-redirecting. It checks to see if the requestor is logged in, and if s/he is, then the request is allowed through to the page. I.e. both people and search engine spiders see the same pages. If the requestor isn’t logged in, then the auto-redirection kicks in to send a person to the login/register page.
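In contrast to a cloaking system, the conditional auto-redirect only looks at whether the requestor is logged in. The sketch below is my own illustration in Python, with a hypothetical `handle_request` function and `/login` address. How a search engine spider comes to be let through to the page is not covered here, so the sketch simply takes the logged-in state as given.

```python
LOGIN_PAGE = "/login"  # hypothetical address of the login/register page


def handle_request(path, logged_in):
    """Apply the one condition: is the requestor logged in or not?"""
    if logged_in:
        # The request is allowed through to the page itself.
        return (200, path)
    # Otherwise the auto-redirect sends the requestor to login/register.
    return (302, LOGIN_PAGE)
```

No special page is created for search engines here. There is one page per URL, and the only question is whether the requestor is allowed through to it.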

The hallmarks of cloaking are, (1) special pages are created for the search engines, whether dynamically or static, and (2) search engines never see the pages that people see, and vice versa. The ‘engine pages’ may just be modified normal pages, or completely different pages. If a method doesn’t fit both of those, it isn’t cloaking. It may be spam, but it isn’t cloaking.

Comments (4)

Hidden Text and Google

Yesterday, Google’s Matt Cutts posted an example of a Dutch newspaper site that was spamming the search engines by using hidden text. The example was to show how Google dealt with it. What they did was temporarily remove the site from the index, and contacted the owners to tell them why. The owners subsequently cleaned up the site, and it was put back into the index.

Google is very good, and unique, in that they do inform some website owners that they are transgressing their guidelines, so that the sites can be cleaned up and re-included in the index. But Matt’s example is sure to leave many webmasters very frustrated.

In the item, Matt wrote, “Hidden text is also not fair to other sites that try to compete for similar queries without hiding words from users.” and he is right. Higher rankings are more common among sites that go against Google’s guidelines by including spammy hidden text than among sites that avoid such techniques for fear of being banned, so the clean sites lose out to the dirty sites. It is unfair because the owners of clean sites feel that they must either risk being banned by Google or accept that they can’t compete at all.

So why would the Dutch newspaper example be frustrating to many webmasters? Because it is only an example. It isn’t the norm for Google to remove sites with hidden text from the index. In forums, people often post that they’ve reported a site for hidden text many times, and that nothing is ever done about it. Once in a while something gets done, but mostly that isn’t the case. That’s what makes it frustrating. It’s alright to trot out a public example now and again, but that doesn’t do anything about the problem that clean webmasters have, and that Matt acknowledges.

Google have said that they prefer to deal with things programmatically, rather than by hand, and that’s fine, but the hidden text method, for the purpose of higher rankings, is much older than Google, and Google still can’t deal with it programmatically. No search engine can deal with it, because there are many perfectly good reasons for hidden text, and they can’t yet programmatically differentiate between those and spam.

The answer is for Google to recognise the fact that a programmatic solution for spammy hidden text is not around the corner, and to deal with all reported sites by hand, just as they did with the Dutch newspaper site. It would make spammy hidden text much more dangerous, because of the risk of being reported by a competitor, which would cause the use of the method to largely fade away, and make a much more level playing field for clean sites.

Spammy hidden text is kept alive and well by Google’s reluctance to deal with most of it by hand. The other engines are also guilty, but webmasters are more concerned about Google. If Google dealt with hidden text reports by hand, then webmasters would be much more afraid of using the method. It would tend to disappear, and clean sites would not be at an unfair disadvantage so often.

Comments (6)

Some crazy ideas about search engines

Crazy ideas, and other miscellaneous rumours, start when people think of things that the search engines would probably like to do. They post their ideas in forums, and some people get the erroneous idea that, since a long time has gone by, they must be doing it by now. An example is that, since Google has been around for quite a few years now, they must have changed their PageRank formula. Yes, they could have changed it by now, but there is no reason to assume that they have, and there is no reason to think that there has been a need for changing it. Some of the ideas are a bit more outlandish than that one. Here are a few of the crazier ones that I’ve come across recently.

The search engines know me

A guy in a current forum thread is under the impression that search engines know him, and that all he has to do is write a URL in a post and, because it is associated with his name, which they know, it becomes a link in the engines’ indexes. He went as far as to show evidence, but the evidence was that the text of a URL that he posted showed up in a search - no link - just text. Anyone can post a URL without it being linked, and it will show up in certain search results as text, but this guy not only claims it is because the engines know him, but he also claims that it’s treated as a link because they know him.

Now he isn’t talking about some personnel from the engines knowing him; he’s talking about the engines knowing him algorithmically, and, because they know him algorithmically, they give what he writes some favour in the index and search results.

The reality is that all forum and blog usernames are indexed by the engines as simple words on a page - that’s all. There is nothing in the algorithms that associates text with people, and there is no index or recording of such associations.

The idea that search engines algorithmically know people is beyond just being a bad theory - it is well and truly in the realms of pure fantasy.

Search engines reward traffic with rankings

The idea that some search engines use a site’s traffic as a ranking factor isn’t as far-fetched as the previous crazy idea, but it’s crazy just the same. For one thing, search engines have no way of knowing what traffic a site gets. Some of them have toolbars, and some people use them, so they could track people as they wander through the Web, storing traffic information about websites. But the number of people who use a particular engine’s toolbar is tiny compared to the number of people who use the Web, so any meaningful traffic statistics are not possible.

Another way for a search engine to get some traffic information is to record the clicks to websites from their search results. But clicks from search results are meaningless as far as website traffic is concerned, because they only know about clicks from their own engines. The data doesn’t say anything about whether or not a visited website is good or useful, or about how long each person stays in the site, so such statistics would be useless as a ranking factor.

In spite of all the common sense logic, one person recently wrote categorically that “you need to focus on getting visitors and good traffic - the search engines will reward that with rankings”, and he’s not the only one who writes things like that these days. The idea isn’t beyond the realms of future possibility, but right now it’s a crazy idea.

Comments (11)

How to check on-page optimisation for Google?

This is a very big IF, but here goes…

I have an experimental Google Custom Search Engine (CSE) at http://travel.in-britain.co.uk/ and I have chosen the option of having it return results only from the sites that I’ve selected. I’ve selected 17 sites, including www.holidays.org.uk, so only pages from those sites will be returned in the results.

When searching google.com and google.co.uk for [uk holidays] www.holidays.org.uk is ranked in the top 10. I.e. when it’s competing against every site in the entire Google index, it’s in the top 10. But doing the same search in my CSE, it is always ranked around the #50 mark - usually just under #50. I.e. when competing against only 16 sites from Google’s index, it ranks very low down. This only happens when the CSE is set to return the results only from selected sites.

I have a thought as to why it may be happening…

Normally, when Google processes a query, they first get a results set from the short index (the index that contains words from link texts and page Titles). If they can’t get a big enough results set from there, they add to it from the long index (the index that contains all the words on all pages - page content). If they skip the short index, and go straight to the long index, for CSEs that are set to return the results only from selected sites, they would be ranking pages solely on page content, which would make drastic differences to the order in which the pages are ranked when compared with a normal Google search. That could account for the results I’m getting in my experimental CSE.
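Here is a small Python sketch of that thought, to show how skipping the short index could reorder the same pages. It is purely illustrative: the two indexes are plain dictionaries with made-up contents, the word-count scoring is my own simplification, and none of it is Google's real processing.

```python
def lookup(index, query):
    """URLs in the index that match the query, best matches first."""
    q = set(query.lower().split())
    scored = [(len(words & q), url) for url, words in index.items()]
    return [url for score, url in sorted(scored, key=lambda s: -s[0]) if score > 0]


def normal_search(query, short_index, long_index, min_results=2):
    """Short index first; add from the long index only if the set is too small."""
    results = lookup(short_index, query)
    if len(results) < min_results:
        results += [u for u in lookup(long_index, query) if u not in results]
    return results


def content_only_search(query, long_index):
    """The supposed CSE behaviour: go straight to the long index."""
    return lookup(long_index, query)


# Made-up example data: link text and Titles versus page content.
short_index = {"site-a": {"uk", "holidays"}, "site-b": {"holidays"}}
long_index = {"site-a": {"holidays"}, "site-b": {"uk", "holidays", "cottages"}}
```

With this data, `normal_search("uk holidays", short_index, long_index)` puts site-a first, but `content_only_search("uk holidays", long_index)` puts site-b first, because site-a's strength is in its link texts and Title rather than in its page content.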

That idea is a very big IF, but if it’s true, then it provides a way to check the optimisation of page content. Google launched the CSE system and chose to leave out pages from the Supplemental index, although they now intend to include them, so we know that the algorithm is intentionally significantly different for CSE searches than for normal searches, and it could be that they have also chosen to skip the short index part of processing a query for CSEs that return results from selected sites.

Update:

I did some tests and found that, when a niche CSE has fewer than 24 sites/pages included, an abnormal search is done, and the results order is different to the order in which the same pages appear in a normal Google search. But when it has 24 or more sites/pages, a normal search is done, and the results order reflects those in a normal Google search. If the ‘abnormal’ search skips the short index, then it is still good for checking page content optimisation.

Comments

ChaCha goes beta

ChaCha is a search service where live Guides can do the searching for you. It’s been written about in many places, including the forum here and at SEW, so I won’t go over it all again. The alpha version was launched in early September, and for two months it was riddled with bugs, which were never sorted out.

Yesterday, the beta version was launched, to the great delight of the Guides. But the great delight turned into great disappointment when most of them were unable to do anything all day, due to the worst bugs they’ve ever encountered. Bugs can be fixed, so they don’t matter too much, although ChaCha doesn’t exactly have a good record at finding and fixing bugs in the system.

The beta version was touted in advance as having some great new things included - and it has. In the alpha version, if you did a search without a Guide, and no Guide had previously found results for the particular search, you got no results. The beta version changes that, and in ways that I find rather deceptive.

When you type in a search term, and click the “ChaCha Search” button, the first thing you see is a message that says, “Please wait while we scour the deep web …”. So you think that ChaCha is scouring the deep web to find some results for you. Wrong! That’s the first deception that I see. What it is actually doing is waiting for some perfectly ordinary search engine results to be returned for you. They come from infospace.com, which is a meta engine that gets its results from ordinary search engines like Google, Yahoo! and Ask. So the “ChaCha Search” side of ChaCha is very ordinary indeed. It’s just a mirror of a meta engine, and meta engines aren’t search engines at all.

Another deception is that ads are mixed in with the results in a rather inconspicuous way - they are camouflaged. They aren’t conspicuous by their placements, since they are randomly mixed in, and, although they each have the words “Sponsored By” attached, the words are placed and coloured in such a way that they blend in with each listing. To be fair, that’s exactly the way that Infospace does it, but it may be in contravention of the FTC’s code of practice.

Also mixed in with the results are those that were found by Guides for previous searches. Searching with a Guide is ChaCha’s big thing. It’s the very reason for their existence, and they play it up by claiming that Guides are able to return results that are exactly what you are looking for, whereas search engine algorithms can’t do that. That’s the claim. So you’d think that they would place those exceptionally good Guides’ results at the top of the listings, wouldn’t you? But they don’t. They randomly mix them into the first page of results, and they take their chances with the ads and with the results from normal search engines. Not only do they randomly mix the Guides’ results with results from search engines and ads, but they actually limit the number of Guides’ results that they show. There’s a limit of five. So even if they have many Guide results in the database, they only show five of them. That shows real confidence in their speciality “Search With Guide” - yes/no? - no!
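Judging only by what appears on the results page, the first page seems to be put together along the lines of the Python sketch below. It is a guess at the behaviour and not ChaCha's code: the `build_first_page` function is hypothetical, and I have assumed that the five Guide results shown are simply the first five available.

```python
import random

MAX_GUIDE_RESULTS = 5  # the limit of five observed on the results page


def build_first_page(engine_results, ads, guide_results, rng=None):
    """Mix meta engine results, ads and at most five Guide results at random."""
    rng = rng or random.Random()
    # Which five Guide results are chosen is unknown; taking the first five
    # is an assumption made for this sketch.
    shown_guides = list(guide_results)[:MAX_GUIDE_RESULTS]
    page = list(engine_results) + list(ads) + shown_guides
    rng.shuffle(page)
    return page
```

However many Guide results there are in the database, no more than five reach the page, and nothing in the mixing gives them any priority over the ads or the ordinary results.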

The “ChaCha Search” is nothing more than a mirror of a meta engine, with a handful of guided search results thrown in, and all a meta engine does is display the results from normal search engines. Search engines do the crawling and ranking work, and meta engines display their results.

So does ChaCha have any value at all for searchers? Well, yes. When ChaCha first launched, I really didn’t think that there was a market for their “Search with a Guide” idea, but in the two months since then, I’ve come to the conclusion that there really is a small market for it. There really are people who are no good at finding things via search engines, and who would prefer someone else to find things for them. Whether or not there are enough people like that to keep the Guided search going, remains to be seen. Because the whole system relies on people clicking on ads (AdSense at the moment), it may work out that Guided searches run at a loss, since they are paying the Guides by the minute. Even so, the system as a whole could be profitable if enough people use it.

There are bugs to be worked out (e.g. once you’ve done a “ChaCha Search”, you can’t then decide to “Search With Guide” for the same search term), but they should be sorted out eventually. I’m sceptical about ChaCha’s long-term future as a guided search service, simply because it relies on ad clicks, and they pay the Guides by the minute of actual search time, but the future remains to be seen. I hope it does succeed, if only for the Guides’ sakes.

Comments (1)
