Indexing timeline
Heh. I wrote this hugely long post, so I pulled a Googler aside and asked “Dan, what do you think of this post?” And after a few helpful comments he said something like, “And, um, you may want to include a paragraph of understandable English at the top.”
Fair enough. Some people don’t want to read the whole mind-numbingly long post while their eyes glaze over. For those people, my short summary would be two-fold. First, I believe the crawl/index team certainly has enough machines to do its job, and we definitely aren’t dropping documents because we’re “out of space.” The second point is that we continue to listen to webmaster feedback to improve our search. We’ve addressed the issues that we’ve seen, but we continue to read through the feedback to look for other ways that we could improve.
People have been asking for more details on “pages dropping from the index” so I thought I’d write down a brain dump of everything I knew about, to have it all in one place. Bear in mind that this is my best recollection, so I’m not claiming that it’s perfect.
Bigdaddy: Done by March
- In December, the crawl/index team were ready to debut Bigdaddy, which was a software upgrade of our crawling and parts of our indexing.
- In early January, I hunkered down and wrote tutorials about url canonicalization, interpreting the inurl: operator, and 302 redirects. Then I told people about a data center where Bigdaddy was live and asked for feedback.
- February was pretty quiet as Bigdaddy rolled out to more data centers.
- In March, some people on WebmasterWorld started complaining that they saw none of their pages indexed in Bigdaddy data centers, and were more likely to see supplemental results.
- On March 13th, GoogleGuy gave a way for WMW folks to give example sites.
- After looking at the example sites, I could tell the issue in a few minutes. The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the inlinks or the outlinks of that site. Examples that might cause that include excessive reciprocal links, linking to spammy neighborhoods on the web, or link buying/selling. The Bigdaddy update is independent of our supplemental results, so when Bigdaddy didn’t select pages from a site, that would expose more supplemental results for a site.
- I worked with the crawl/index team to tune thresholds so that we would crawl more pages from those sorts of sites.
- By March 22nd, I posted an update to let people know that we were crawling more pages from those sorts of sites. Over time, we continued to boost the indexing even more for those sites.
- By March 29th, Bigdaddy was fully deployed and the old system was turned off. Bigdaddy has powered our crawling ever since.
Considering the amount of code that changed, I consider Bigdaddy pretty successful in that I only saw two complaints. The first was one that I mentioned, where we didn’t index pages from sites with less trusted links, and we responded and started indexing more pages from those sites pretty quickly. The other complaint I heard was that pages crawled by AdSense started showing up in our web index. The fact that Bigdaddy provided a crawl caching proxy was a deliberate improvement in crawling and I was happy to describe it in PowerPoint-y detail on the blog and at WMW Boston.
Okay, that’s Bigdaddy. It’s more comprehensive, and it’s been visible since December and 100% live since March. So why the recent hubbub? Well, now that Bigdaddy is done, we’ve turned our focus to refreshing our supplemental results. I’ll give my best recollection of that timeline too. Around the same time, there was speculation that our machines are full. From my personal perspective in the quality group, we certainly have enough machines to crawl/index/serve web results; in fact, Bigdaddy is more comprehensive than our previous system. Seems like a good time to throw in a link to my disclaimer right here to remind people that this is my personal take.
Refreshing supplemental results
Okay, moving right along. As I mentioned before, once Bigdaddy was fully deployed, we started working on refreshing our supplemental results. Here’s my timeline:
- In early April, we started showing some refreshed supplemental results to users.
- On April 13th, someone started a thread on WMW to ask about having fewer pages indexed.
- On April 24th, GoogleGuy gave a way for people to provide specifics. (WebmasterWorld, like many webmaster forums, doesn’t allow people to post specific site names.)
- I looked through the feedback and didn’t see any major trends. Over the next week, I gave examples to the crawl/index team. They didn’t see any major trend either. The sitemaps team investigated until they were satisfied that it had nothing to do with sitemaps either.
- The team refreshing our supplemental results checked out feedback, and on May 5th they discovered that a “site:” query didn’t return supplemental results. I think that they had a fix out for that the same day. Later, they noticed that a difference in the parser meant that site: queries didn’t work with hyphenated domains. I believe they got a quick fix out soon afterwards, with a full fix for site: queries on hyphenated domains in supplemental results expected this week.
- GoogleGuy stopped back by WMW on May 8th to give more info about site: and get any more info that people wanted to provide.
Reading current feedback
Those are the issues that I’ve heard of with supplemental results, and those have been resolved. Now, what about folks that are still asking about fewer pages being reported from their site? As if this post isn’t long enough already, I’ll run through some of the emails and give potential reasons that I’ve seen:
- First site is a .tv about real estate in a foreign country. On May 3rd, the site owner says that they have about 20K properties listed, but says that they dropped to 300 pages. When I checked, a site: query shows 31,200 pages indexed now, and the example url they mentioned is in the index. I’m going to assume this domain is doing fine now.
- Okay, let’s check one from May 11th. The owner sent only a url, with no text or explanation at all, but let’s tackle it. This is also a real estate site, this time about an Eastern European country. I see 387 pages indexed currently. Aha, checking out the bottom of the page, I see this:

Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled. As these indexing changes have rolled out, we’ve been improving how we handle reciprocal link exchanges and link buying/selling.
- Moving right along, here’s one from May 4th. It’s another real estate site. The owner says that they used to have 10K pages indexed and now they have 80. I checked out the site. Aha:

This time, I’m seeing links to mortgage sites, credit card sites, and exercise equipment. I think this is covered by the same guidance as above; if you were getting crawled more before and you’re trading a bunch of reciprocal links, don’t be surprised if the new crawler has different crawl priorities and doesn’t crawl as much.
- Someone sent in a health care directory domain. It seems like a fine site, and it’s not linking to anything junky. But it only has six links to the entire domain. With that few links, I can believe that out toward the edge of the crawl, we would index fewer pages. Hold on, digging deeper. Aha, the owner said that they wanted to kill the www version of their pages, so they used the url removal tool on their own site. I’m seeing that you removed 16 of your most important directories from Oct. 10, 2005 to April 8, 2006. I covered this topic in January 2006:
Q: If I want to get rid of domain.com but keep www.domain.com, should I use the url removal tool to remove domain.com?
A: No, definitely don’t do this. If you remove one of the www vs. non-www hostnames, it can end up removing your whole domain for six months. Definitely don’t do this. If you did use the url removal tool to remove your entire domain when you actually only wanted to remove the www or non-www version of your domain, do a reinclusion request and mention that you removed your entire domain by accident using the url removal tool and that you’d like it reincluded.
You didn’t remove your entire domain, but you removed all the important subdirectories. That self-removal just lapsed a few weeks ago. That said, your site also has very few links pointing to you. A few more relevant links would help us know to crawl more pages from your site. Okay, let’s read another.
- Somebody wrote about a “favorites” site that sells T-shirts. The site had about 100 pages, and now Google is showing about five pages. Looking at the site, the first problem that I see is that only 1-2 domains have any links at all to you. The person said that every page has original content, but every link that I clicked was an affiliate link that went to the site that actually sold the T-shirts. And the snippet of text that I happened to grab was also taken from the site that actually sold the T-shirts. The site has a blog, which I’d normally recommend as a good way to get links, but every link on the blog is just an affiliate link. The first several posts didn’t even have any text, and when I found an entry that did, it was copied from somewhere else. So I don’t think that the drop in indexed pages for this domain necessarily points to an issue on Google’s side. The question I’d be asking is why anyone would choose your “favourites” site instead of going directly to the site that sells T-shirts?
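For the www vs. non-www case above, a gentler alternative to the url removal tool is to keep both hostnames and permanently redirect one to the other. Here is a minimal sketch in Python, assuming a 301 redirect is the mechanism you choose; the function name `canonical_redirect` and the host `www.domain.com` are placeholders for illustration, not anything Google provides.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url, preferred_host="www.domain.com"):
    """Decide whether a requested URL should be redirected to the preferred host.

    Returns (301, target_url) when the request used the other www/non-www
    variant of the same domain, and None when no redirect is needed.
    Hypothetical helper: names and hosts are illustrative assumptions.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased and has no port

    # Work out the bare (non-www) form of the preferred host.
    if preferred_host.startswith("www."):
        bare = preferred_host[4:]
    else:
        bare = preferred_host

    if host == preferred_host:
        return None  # already on the canonical hostname

    if host in (bare, "www." + bare):
        # Same site, other hostname variant: keep path and query, swap the host.
        target = urlunsplit(
            (parts.scheme, preferred_host, parts.path, parts.query, parts.fragment)
        )
        return (301, target)

    return None  # some unrelated host; leave it alone


if __name__ == "__main__":
    print(canonical_redirect("http://domain.com/dir/page.html?x=1"))
    print(canonical_redirect("http://www.domain.com/"))
```

Nothing gets removed from the index this way: every url on the non-preferred hostname simply points at its twin on the preferred one.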
Closing thoughts
Okay, I’ve got to wrap up (longest. post. evar). But I wanted to give people a feel for the sort of feedback that we’re getting in the last few days. In general, several domains I’ve checked have more pages reported these days (and overall, Bigdaddy is more comprehensive than our previous index). Some folks that were doing a lot of reciprocal links might see less crawling. If your site has very few links where you’d be on the fringe of the crawl, then it’s relatively normal that changes in the crawl may change how much of your site we crawl. And if you’ve got an affiliate site, it makes sense to think about the amount of value-add that your site provides; you want to provide a reason why users would prefer your site.
In March, I was able to read feedback and identify an issue to fix in 4-5 minutes. With the most recent feedback, we did find a couple ways that we could make site: more accurate, but despite having several teams (quality, crawl/index, sitemaps) read the remaining feedback, we’re seeing more of a grab-bag of feedback than any burning issues. Just to be clear, I’m not saying that we won’t find other ways to improve. Adam has been reading and replying to the emails and collecting domains to dig into, for example. But I wanted to give folks an update on what we were seeing with the most recent feedback.
Shoemoney Said,
May 16, 2006 @ 12:37 pm
Damn Ringtone People!!!
Harith Said,
May 16, 2006 @ 12:39 pm
Hi Matt
Thanks for the much needed detailed update.
I do hope that you, GG and later Adam (when he feels ready) will post more of the same, and more often than you are doing now.
IMO, it’s not enough for Google to tell us that they are listening. We need them to talk to us too, i.e. communicate.
Once again, thanks Matt. I know you must be also busy preparing for the vacation.
Mike Said,
May 16, 2006 @ 12:45 pm
Wow, looks like someone is going to have a short interview today
Thanks for the update Matt.
colin_h Said,
May 16, 2006 @ 12:54 pm
Yawn !!!
After the past 12 months of Google messing about and still no better results … I’ve completely learned how to live without you.
Best wishes, You’re gonna need it
Aaron Pratt Said,
May 16, 2006 @ 1:05 pm
Every time someone asks a novice question in google groups while at the same time saying that google s-u-c-k-s I will refer them to this post.
Is adam bot or human?
Thanks Matt.
Wayne Said,
May 16, 2006 @ 1:07 pm
Thank you Matt for the update. I really appreciate you finally using some real estate sites as examples. Since this is an indexing issue I thought I would bring it up.
After checking the logs today I noticed this coming from Google pertaining to our site.
http://www.google.it/search?hl=it&q=fistingglessons&btnG=Cerca+con+Google&meta=
LOL now as you can see the #2 site is a real estate site listed for this search term. The page showing for this search is a property description page. As you can tell from the site’s description it has nothing to do with this subject matter. Would you mind checking with the index team to see why this would be indexed for such a phrase?
On a side note it would be nice to see more examples of real estate sites used in the future. Thanks again for the update.
Nick Said,
May 16, 2006 @ 1:13 pm
Great post Matt. That really clears up a few things about how Bigdaddy works. Still seems like it is responding very slowly and I find that large companies are getting ahead of smaller sites for local terms even though they are not located in the same country. But that’s mostly because of my own business gripes
Keep up the great posting.
Sina Said,
May 16, 2006 @ 1:34 pm
Great post Matt, thanks for putting in the effort to explain what’s been going on.
I have a quick question - how long is it taking these days for Google to index new pages? I added a forum to my site a couple of months ago, and while it doesn’t have many deep links from external domains, it is linked to pretty well from within my site and is in my submitted sitemap. Google seems to be crawling it quite enthusiastically. However, none of it’s showing up in the index with a site: search despite the intensive crawling and waiting about a month. Does this mean that Google doesn’t think my forum is worth indexing?
Anthony Cea Said,
May 16, 2006 @ 1:35 pm
Yeah, blame this disaster on webmasters, Google can’t index the web properly and it is the fault of webmasters working bad links?
Funny that those that are running the biggest links scams on the net are ranking great Matt?
Explain that one, will ya ???
Where are the indexed pages Matt, do they just disappear, do you have an answer for all of us or are we all using linking scams?
Matt Cutts Said,
May 16, 2006 @ 1:37 pm
Thanks everybody. I’m glad that I sat down and got all this down. Yup Mike, I figured if I could get this post out before I talked to Danny, then we could just sit around and shoot the breeze.
Danny: So, how’s life?
Matt: Not bad. How are you doing?
Danny: Pretty good, pretty good. So how ’bout those Reds?
Matt: The communists??
Danny: No, the Cincinnati Reds!
Matt: There’s communists in Cincinnati!?!?!
Matt Cutts Said,
May 16, 2006 @ 1:43 pm
Sina, it’s by design that we crawl somewhat more than we index in Bigdaddy. If you index everything that you crawl, you never know what you might be missing by crawling a little more, for example. I see at least one indexed post from your forum, so the fact that we’ve been visiting those pages is a good indicator that we’re aware of those pages, and they may be incorporated in the index in the future.
arubicus Said,
May 16, 2006 @ 1:44 pm
Great post Matt! Good job. Nice to hear some more detailed feedback.
Hey can you answer this for me? Finally we have been seeing some improvement to the indexing of our site. I have seen other webmasters mention the same occurrence of indexing down to about level 3 pages and that is it. Although deeper pages are being crawled (level 4+) they just don’t want to stick very long in the index. Linking a bit higher can get them to stick (turning them to level 3 and 2) but that’s just impossible to do with a lot of content. Is this something that will correct in time? We have PLENTY of links at all levels so I don’t see this as a huge problem. Pretty much looking for reassurance to sit tight.
Chris Bartow Said,
May 16, 2006 @ 1:45 pm
I read about the real estate sites hoping one was mine, but neither applied to me. My real estate site only has outbound links to Home Builders, so I doubt this should qualify as spam.
It still seems to me that you are blaming this on penalties, which I’m fine with, but why would you crawl my site thoroughly on a weekly basis, then never put the results in the index? This has been happening for 2 months now.
Lunov Said,
May 16, 2006 @ 1:47 pm
Hello Matt
Thanks for the information.
“Bigdaddy: Done by March” Is it really true? If so, I do not understand why there are still different search results between
http://66.249.93.104/ and http://64.233.179.104/
Please could you give us more details. It’s confusing.
Where is Bigdaddy, really?
Thanks for your reply.
Mark Said,
May 16, 2006 @ 1:48 pm
Thanks for a very informative post. Just one quick question though, is there ever a time when link exchanges are considered legitimate? Maybe even an example of the case? It’s easy to tell the irrelevant link exchanges, but there has to be some instances that maybe a … real estate agent exchanges links w/ a … local moving company.
Can you comment on this?
Aaron Pratt Said,
May 16, 2006 @ 1:49 pm
HA!!!
To celebrate this new information I deleted an old directory that was hanging off my most valued website. It made an awful shriek as I removed the database. In the coming weeks there will be a few autoemails asking “where is my link???” and I will reply, “you will not drain my power anymore, die die!!!”
(ok enough of this Matt Cutts fellah for today, I got work to do, how about you?)
Jim Said,
May 16, 2006 @ 1:50 pm
Hi Matt,
Thanks for the post. Problem is… none of your explanations seem to fit my site. I’m trying to maintain a straight ship in a dirty segment. My links have been accumulated by forming relationships with related sites (thus I’m building links a bit slower than straight link exchange would allow). My content is most certainly provided to educate the visitor. My affiliate linkage is quite low. But yet my pages seem to continue dropping and supplementals are increasing.
Thanks for reading this,
jim
Coen Said,
May 16, 2006 @ 1:53 pm
Matt, thank you for the explanation about Bigdaddy. But I have checked my websites for the points you just wrote down, and I can’t find any of them on my site.
I have plenty of backlinks. I don’t link to crappy sites, and still my indexed page count is like a wave.
On Monday I can have 800.000 pages indexed, on Tuesday 350.000, then back to 600.000, down to 400.000. The difference is way too big. And we had over a million records.
I also submitted a reinclusion request but we never heard back or saw any changes. My domain name is techzine.nl. I have www, forum, babes, msn and pricecheck.techzine.nl in use.
We did have some problems in the past. I e-mailed Google about it a couple of times but never got an answer.
We changed the domain name of the website from tweakzone.nl to techzine.nl (October 2005). We forwarded it with a 302 (stupid); I found that out later and changed it to a 301 (permanent) redirect. Now I am still trying to get the whole tweakzone.nl domain out of Google and get techzine.nl indexed correctly. We asked many, many webmasters to update their links and that worked. Our HTML code is by the book. But still we are not being indexed as we were. I’m running out of ideas and options to fix this. Can you explain to me what I am doing wrong? I have been reading SEO sites, webmasterworld.com and the Google guidelines for months now and I can’t figure out what I’m doing wrong…..
Kind Regards,
Coen
Ronald R Said,
May 16, 2006 @ 1:56 pm
Strange how you ignored comments before, and now you have decided to respond.
Unfortunately, the serps have become absolute trash, so the changes have failed, and I see more spam sites doing well than before.
CrankyDave Said,
May 16, 2006 @ 2:01 pm
Thank you for the timeline.
I find it rather frustrating to follow how your timeline basically outlines how everything is working just as it should, and watch pages display as regular one day, supplemental the next, a week later regular and then back to supplemental. Searchable as a regular listing, completely unsearchable as a supplemental.
Good to hear you guys have plenty of machines with plenty of room. Perhaps someone should inform the CEO.
I look forward to you finding other ways to improve.
Dave
Brian M Said,
May 16, 2006 @ 2:02 pm
Please, please, please delete all of the old supplemental results! I think if you took a poll, you would find very few webmasters (or end users) who actually value any of those old junk pages (many of which do not even exist anymore).
I have even used the URL removal tool in the past - but those old pages just keep coming back!
Joe Hayes Said,
May 16, 2006 @ 2:04 pm
I don’t think Mr. Cutts meant that the mortgage sites, credit card sites, and exercise equipment sites were junk; most likely he meant that they were unrelated.
Now, I don’t think it’s fair to penalize a site for linking to an “unrelated” site, since many webmasters link to their other websites etc. Links being devalued because they’re coming from an unrelated page would be more fair.
And what’s the deal with reciprocals? Although I rarely do them (time related), I don’t think it’s unfair. A vote is a vote right? Even if two people vote for each other. As long as it’s not automotive I don’t see why it would be a problem…
What about the impact of getting a bunch of unrelated inbound links to your site? Imagine if someone used a linking scheme to point hundreds, or thousands, of links at your domain? All those links from “unrelated” or “junk” sites would surely put a hurting on you. Not fair.
Anthony Cea Said,
May 16, 2006 @ 2:08 pm
I agree that reciprocal link directories should be removed as they are link farms, so Google is doing the right thing there!
Some reciprocal linking is natural though and sites should only have their sites removed if they have a high percentage of reciprocals in their totals.
Jason Duke Said,
May 16, 2006 @ 2:09 pm
[quote]Google should NEVER * NEVER * even entertain the idea of deciding what Products or Services are “JUNK”
This is a recipe for disaster, and extremely arrogant.
What gives any search engines the right to decide that someone’s business category is “JUNK”. This would be analogous to Yahoo Directory or DMOZ devaluing certain TYPES of products or services.
[/quote]
It aint that often you’ll see me stick up for Google but MR SEW you are VERY wrong.
Google can do what the hell they like with their search engine, cos it is THEIRS.
If they want to devalue links in their algorithm, that’s their prerogative, cos the algo is THEIRS
If they want to say certain business models are junk in their search engine then that is their right, cos the search engine is THEIRS
You have exactly the same right. On YOUR web properties you can say and do what you want. If you want to link out via affiliate URLs you can as the web site is YOURS.
If you want to buy or sell links, you can as the web site is YOURS.
When all is said and done when you own something it is up to you what you do with it. Google is no different with whatever it decides to stick anywhere on its domains than you or I am with mine.
Personally I think Google makes lots of mistakes. I also believe so do many webmasters, myself included but they are our mistakes to make the way we see fit at the time.
I’m happy with what I do and I am sure Google are happy with what they do. Personally I am going to carry on trying to beat Matt and his team at Google and I am pretty sure he and his team will carry on trying to beat me.
He wins some, I win some but therein lies the nature of the web. On his site he can do what he wants. On my site I can do what I want. I suggest you, Mr SEW do the same
alek Said,
May 16, 2006 @ 2:09 pm
Damn … great summary Matt … the “other Matt” must be saying “gulp” to try to follow that act while you are gone. And yea, what are you going to talk about in a couple of hours on the radio show?
BTW, here’s an oddball corner case that I would classify as a bug - one of your favorite subjects - redirects!
So URL1 ranked well for keyphrase1. The SERP’s show a title, some text, and a URL. A (legit) 302 (temporary) redirect was set up to URL2. After a few days, the SERP’s for keyphrase1 showed URL2, but were still using the title tag for URL1. The “other text” is pulled from URL2. Looking at the cache, it is all URL2. This persisted for several days - looked pretty darn funny actually in the SERP’s, since the URL2 title tag had nothing to do with keyphrase1.
I think (?) correct behavior would be that if you are going to show a URL in the SERP’s, you should show title/text associated with that page … but in this case, some part of the indexing machine got confused by the redirects and the title1 piece got left in even though URL2 was displayed.
Email me if you want more info, but you should easily be able to set up a test case based on that description. BTW, Yahoo has a similar bug in the SERP’s (I forgot how MSN handled it), so it’s not just the big “G” struggling with redirects.
Joe Hayes Said,
May 16, 2006 @ 2:10 pm
I had some clerical errors in my post above (automotive should be automated :), wish I could edit it… sorry.
Tim Said,
May 16, 2006 @ 2:10 pm
Hi Matt, great information as always. I have a question about this:
How might this impact the typical blog with a lengthy blogroll? Many people have blogs with lengthy blogrolls… and many of those sites in my blogroll end up linking back without it really being arranged as a reciprocal exchange.
From what you are saying I get the idea that having a blogroll/recommended reading list doesn’t sound like a good idea.
John Qazu Said,
May 16, 2006 @ 2:10 pm
Doesn’t matter…..they don’t care about results. Bad results means more money for Adwords:)
Microsoft will squash Google like it did Netscape. When Vista comes out….Google will fall.
PhilC Said,
May 16, 2006 @ 2:15 pm
Matt. For me, that was the best post that you’ve ever posted here - by a very long way.
I’m one of the people who has sites that are suffering right now. One of them is the site that we spoke about last year. It had a clean bill of health from you, and nothing has changed since then, and yet its pages are being dropped daily. Right now it’s down from a realistic 18k-20k pages to 9,350, but only around 500 of them are fully indexed - the rest are URL-only partials. Yesterday it had 11,700 but only ~600 of them were actually listed, and some of those were partials.
From your post, I would say that the site fits the description of not having many trusted IBLs. Would that be correct? Reminder - http://www.holidays.org.uk
To be honest, if it is correct, then I dislike it a lot. It would mean that it isn’t sufficient to have a decent and useful site any more to be fully indexed by Google, if the site has quite a lot of pages. It would mean that we have to run around getting unnatural IBLs just to be fully represented in the index, and unnatural IBLs are one thing that Google doesn’t want.
Matt Cutts Said,
May 16, 2006 @ 2:15 pm
Chris, I talked about this a couple comments above:
http://www.mattcutts.com/blog/indexing-timeline/#comment-27002
With Bigdaddy, it’s expected behavior that we’ll crawl some more pages than we index. That’s done so that we can improve our crawling and indexing over time, and it doesn’t mean that we don’t like your site.
arubicus, typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, I’d look at trying to make sure that your site is using your PageRank well. A tree structure with a certain fanout at each level is usually a good way of doing it.
Ronald R, I’ve got a finite amount of time. I spent a large chunk of Saturday writing this up, but I don’t have time to respond to every comment. I wish I did. But improving quality is an ongoing process; if you see spam, I’d encourage you to do a spam report so we can check it out.
CrankyDave, the supplemental results are typically refreshed less often than the main results. If your page is showing up as supplemental one day and then as a regular result the next, the most likely explanation is that your page is near the crawl fringe. When it’s in the main results, we’ll show that url. If we didn’t crawl the url to show in the main results, then you’ll often see an earlier version that we crawled in the supplemental results. Hope that helps explain things. BTW, CrankyDave, your site seems like an example of one of those sites that might have been crawled more before because of link exchanges. I picked five at random and they were all just traded links. Google is less likely to give those links as much weight now. That’s the simple explanation for why we don’t crawl you as deeply, in my opinion.
Brian M, I’ve passed that sentiment on. I believe that folks here intend to refresh all of the supplemental results over the summer months, although I’m not 100% sure.
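To make the tree-structure point above concrete, here is a rough back-of-the-envelope sketch in Python. It assumes an idealized site where every page links to the same number of new pages (the fanout); the function names and the numbers are made up for illustration and say nothing about how Google actually crawls or computes PageRank. It only counts how many clicks from the home page a site of a given size needs.

```python
def pages_within_depth(fanout, depth):
    """Count pages reachable from the home page in at most `depth` clicks,
    assuming every page links to `fanout` distinct pages one level down."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    # Level 0 is the home page (1 page), level 1 has fanout pages, and so on.
    return sum(fanout ** level for level in range(depth + 1))


def depth_needed(total_pages, fanout):
    """Smallest depth at which an idealized tree holds `total_pages` pages."""
    depth = 0
    while pages_within_depth(fanout, depth) < total_pages:
        depth += 1
    return depth


if __name__ == "__main__":
    for fanout in (10, 30):
        print(fanout, depth_needed(20000, fanout))
```

Under those assumptions, a 20,000-page site with 10 links per page needs 5 levels below the home page, while a fanout of 30 fits it into 3, so a wider, shallower tree keeps every page fewer clicks from the root.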
arubicus Said,
May 16, 2006 @ 2:15 pm
How about a tool so that we know who we should be linking to or not?
I see spammers in the google index. Maybe they should get penalized down to a PR of 3 for linking to a bad neighborhood! LOL. Just kidding.
I guess you just may as well nofollow every external link just in case.
Anthony Cea Said,
May 16, 2006 @ 2:15 pm
Yes a good example of this is our link backs here, I linked to this blog entry from my forums and my link here goes back to the forum!
Is this what Google is going to take out or are you looking for a high concentration of reciprocal links Matt?
kael Said,
May 16, 2006 @ 2:16 pm
Problem with this post is that most of us would have identified the spam examples that you listed and yet most of us still don’t understand what has been happening to our sites, in our case going from 20000 pages indexed to less than 100 instead.
You had indicated that there were only a “double-digit number” of emails sent to the bostonpub address and that someone was going through them over a week ago already. Today, you also stated that someone was still going through them. We did send an email and we still have not received a reply. Based on the most recent thread on wmw, it looks like we are not the only ones.
Real answers would help.
Many small businesses are suffering from these massive de-listings. It is not a light subject for us. From our point of view, bigdaddy has not been “pretty successful” and general replies are now a bit short on comfort at this point.
Valentine Said,
May 16, 2006 @ 2:17 pm
Nice post Matt. Very informative and not at all too long.
Shoemoney - Was that one of your ringtones sites?
arubicus Said,
May 16, 2006 @ 2:20 pm
“arubicus, typically the depth of the directory doesn’t make any difference for us; PageRank is a much larger factor. So without knowing your site, I’d look at trying to make sure that your site is using your PageRank well. A tree structure with a certain fanout at each level is usually a good way of doing it.”
Thanks MATT!
I think PR is the factor, but nothing is trickling down from the home page - (Backlinks for the homepage reported from google are completely ?????)
We keep the most logical structure you could possibly have: a pyramid structure drilling down to the articles, with articles linking to related articles. Googlebot just does not seem to like level 4+. If PR is a factor (I thought it now updates continuously), I am not sure why it does not filter down (besides, I have no clue if it actually does, since what is shown on the toolbar may not be accurate).
Matt Cutts Said,
May 16, 2006 @ 2:24 pm
Jason Duke, I did another pass to mark all SEW links as spam. Gotta muck around and delete SEW from my user database.
Anthony Cea, I gave a quick example above. Someone was complaining about their pages being supplemental, but that’s the effect, not the cause. The right question is “Why aren’t as many of my pages showing in Google’s main results?” I picked five links to the domain at random and they were all reciprocal links. My guess is that’s the cause. I mentioned that example because CrankyDave still has an open road ahead of him; he just needs to concentrate more on quality links instead of things like reciprocal links if he wants to get more pages indexed. (Again, in my opinion. I was just doing a quick/dirty check.)
Valentine, I made the links I showed an image so no one would feel the need to go digging into actual sites.
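The quick/dirty check described above (pick a handful of inbound links at random and see how many are reciprocal) can be sketched roughly as follows. This is only an illustration, not Google's method: the function name `reciprocal_fraction`, the two sets of domains, and the sample size of five are all assumptions made for the example.

```python
import random

def reciprocal_fraction(inlinks, outlinks, sample_size=5, seed=0):
    """Sample inbound-linking domains and return the share the site links back to.

    inlinks:  set of domains that link to the site (hypothetical input)
    outlinks: set of domains the site links out to (hypothetical input)
    """
    sources = sorted(inlinks)
    if not sources:
        return 0.0
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    sample = rng.sample(sources, min(sample_size, len(sources)))
    reciprocal = [domain for domain in sample if domain in outlinks]
    return len(reciprocal) / len(sample)
```

A result near 1.0 on a small sample only hints that a link profile leans heavily on exchanges; it says nothing by itself about how any search engine weighs those links.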
Jack Mitchell Said,
May 16, 2006 @ 2:24 pm
What if our problem isn’t crawling so much as seeing those pages indexed at all? I have checked the supp index and haven’t seen them there either but I have seen the Googlebot crawling the pages.
P.S. Is there an email I should send to asking about this and if so where?
Anthony Cea Said,
May 16, 2006 @ 2:29 pm
OK Matt, so what you are saying is that we should produce great content and hope we get linked to because of the value of the page!
But when is Google going to get real about schemes to game the engine so that natural links that are earned are rewarded?
Ronald R Said,
May 16, 2006 @ 2:34 pm
Matt… I have previously reported spam, and not in my sector. But nothing happens, so in the end I just gave up.
I’m wondering how you gain relevant links, in some sectors, without reciprocating, or paying? Do you believe that rivals would give you a free one way link, lol?
CJK Said,
May 16, 2006 @ 2:34 pm
@Matt:
Some days I really wonder why you even post to your blog at all lol It seems that for every 1 legitimate query there are 10 others holding you personally accountable/responsible for their serp/penalty/crappy result.
I mean really… if the amount of Q&A here was T&A a team of plastic surgeons couldn’t wipe the grin off your face
anyway …. “my site is getting crappy results and no traffic …” its your fault and Google sucks… LOL not really… but I want to get in on the fun too !
Paul Said,
May 16, 2006 @ 2:35 pm
Dear Matt, thank you for explaining Google’s view of link exchanges to us.
We dropped low-quality link exchanges months ago and are now going on only with high quality links. We have added tons of new and unique stuff to our site, but the crawler does not crawl much, and the site is low rated. One year ago it was on top of many competitive searches.
Is it possible to overcome this bad backlink reputation? It’s almost impossible to get rid of low-quality links once they are there. Do you have any advice for sites like ours?
Mike B Said,
May 16, 2006 @ 2:38 pm
I have a small site that offers a free downloadable tool. So I registered a sitemap and waited.. some months. Still not indexed. Every day the bot visits, picks up the site map, then the index page, then the download exe (which is about 3.5M). Any idea why the bot should try to spider exe files?
I needed a slightly different version of the tool for a specific audience, so I registered a new domain and copied the site with minor changes. Did not register a sitemap because I wasn’t particularly bothered if it was indexed or not. The new site was indexed in a week or so, and now has a PR of 4. The original, near identical site, still not indexed.
The original site has been in Yahoo and MSN for months….
Anthony Cea Said,
May 16, 2006 @ 2:39 pm
I don’t blame Google for dumping on webmasters that try to game the engine with manufactured links, purchased links, traded links, links from reciprocal link farm directories and so on, this is good long term if they can index the web properly taking these things into consideration!
Xig Said,
May 16, 2006 @ 2:39 pm
Better late than never
Thanks Matt, you put my mind to rest on a lot of issues
Bob Rains Said,
May 16, 2006 @ 2:42 pm
I cannot wait to forward this to my mortgage lender, who asked me just the other day,
“You work in SEO any idea why I’ve lost so many of my pages in Google?”
Your explanation sounds so much nicer and more official than… “It could be because your website has a bunch of crap in it, and on it, and connected to it”
BTW- “It could be because your website has a bunch of crap in it, and on it, and connected to it” is an accurate analysis for many of the mortgage and realtor sites who do not rank well on Google right now.
arubicus Said,
May 16, 2006 @ 2:46 pm
Personally I don’t care about where my site ranks. I believe that rankings would happen naturally if you serve your visitors well.
What many of us DO care about is having equal treatment as any other website owner large and small as well as equal opportunity. Spammers should not be there when legit sites should be there but are not being indexed for some reason. I believe that it is healthy for us to get a bit of feedback and give feedback to google so that such equal opportunities can exist.
Justin Said,
May 16, 2006 @ 2:53 pm
Matt, thanks for the information…but it doesn’t help me at the moment! My most important pages just aren’t getting indexed but are getting crawled. We have a really useful website with thousands of members but it seems that only Google thinks it’s not good enough! Any advice would be greatly appreciated.
Matt Cutts Said,
May 16, 2006 @ 3:09 pm
Anthony Cea, you’ve got some people who were relying on reciprocal linking or link buying complaining specifically that they’re not crawled as much. So as far as “when is Google going to get real about schemes to game the engine so that natural links that are earned are rewarded,” I think that we’re continually making progress on judging which links are higher-quality.
Ronald R, we’ve been checking spam reports more closely lately. You ask “I’m wondering how you gain relevant links, in some sectors, without reciprocating, or paying? Do you believe that rivals would give you a free one way link, lol?” My answer is that trying to force your way up to the top of search engines is in many ways not working in the most efficient way. To the degree that search engines reflect reputation on the web, the best way to gather links is to offer services or information that attract visitors and links on your own. Things like blogs are a great way to attract links because you’re offering a look behind the curtain of whatever your subject is, for example.
Mike B, I’ve talked to the sitemaps folks a lot. Having a sitemap for your site should *never* hurt your domain. On the other hand, don’t expect that just listing a sitemap is enough to get a domain crawled. If no one ever links to your site, that makes Googlebot less likely to crawl your pages.
That’s a very concise way to say it, Bob Rains, although a lot of variation that I see is also if someone’s domain is hardly linked at all. At the fringe of the crawl is where you’re likely to see the most variation, while a site like cnn.com with tons of links/PageRank is going to be less likely to not be crawled.
It’s funny, because most people understand that on a SERP there are 10 results, and if one webmaster is unhappy because they dropped out of the top 10, then some other webmaster is happy that they have joined the top 10. In the same way, we have a finite amount of crawling that we can do as well. Bigdaddy is more deep, but we still have to make choices about whether to crawl more from site A or site B.
Well said, arubicus. Adam recently sent me 5-6 sites that he thinks we could do a better job of crawling, for example. So I wanted to give people an update of how things looked right now, but we’ll keep looking for ways to improve crawling and indexing and ranking.
Harith Said,
May 16, 2006 @ 3:10 pm
Hi All
Anybody wish to say hello to our new friend Adam_Lasnik of Google Search Quality team?
graywolf Said,
May 16, 2006 @ 3:10 pm
>Linking to a free ringtones site, an SEO contest, and an Omega 3 fish oil site? I think I’ve found your problem. I’d think about the quality of your links if you’d prefer to have more pages crawled.
So is the conclusion that sites that are deemed “low quality” will also have “light crawling” correct?
Sina Said,
May 16, 2006 @ 3:10 pm
Thanks for the feedback Matt, I really appreciate it. Made me feel thoroughly warm and fuzzy inside :). Seriously, it’s really great to have people at Google who directly talk to webmasters and demystify things that can seem unusual to outsiders. Keep up the great work!
Matt Cutts Said,
May 16, 2006 @ 3:17 pm
graywolf, it’s true that if you had N backlinks and some fraction of those are considered lower quality, we’d crawl your site less than if all N were fantastic. Hope that makes sense. Light crawling can also mean “we just didn’t see many links to your domain” as well though.
Glad I could answer questions, Sina. It’s nice that I didn’t have any meetings this afternoon, so I could just hang and answer questions. Then I’ve got Danny in a half-hour or so. But that’s okay too. Maybe for some of the questions, I can just be like “Ah yes, Sina and I talked about this in paragraph 542. It helps us to crawl some more pages than we index so that we can see which pages might help us improve our crawl coverage in the future.”
arubicus Said,
May 16, 2006 @ 3:19 pm
Boy matt you have to have a vacation after all of these posts you are doing.
“improve crawling and indexing and ranking.”
I personally expect things to move more from an SEO standpoint to more of a QUALITY standpoint, in that businesses and sites compete more on the QUALITY level rather than on the SEO level. I believe now (after what you mentioned) this is where you want us webmasters to compete (probably always have). This push for quality will make this a WIN WIN WIN game for all of us.
Matt Cutts Said,
May 16, 2006 @ 3:22 pm
Yup, exactly, arubicus. There’s SEO and there’s QUALITY and there’s also finding the hook or angle that captivates a visitor and gets word-of-mouth or return visits. First I’d work on QUALITY. Then there’s factual SEO. Things like: are all of my pages reachable with a text browser from a root page without going through exotic stuff. Or having a site map on your site. After your site is crawlable, then I’d work on the HOOK that makes your site interesting/useful.
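One rough way to test the "are all of my pages reachable from a root page" question is a breadth-first walk over the site's plain links. The sketch below assumes you already have the link graph as a dictionary of page to linked pages (for example, extracted from your own HTML); the function name `unreachable_pages` and the sample paths are made up for illustration, and it does not model how any real crawler behaves.

```python
from collections import deque

def unreachable_pages(links, root):
    """Return the pages that cannot be reached from root by following links.

    links: dict mapping each page to the pages it links to (hypothetical input)
    root:  the starting page, e.g. the home page
    """
    seen = {root}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    # Every page mentioned anywhere in the graph, as a source or a target.
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    return all_pages - seen

site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/orphan": ["/"]}
orphans = unreachable_pages(site, "/")
```

In this made-up site, `/orphan` links to the home page but nothing links to it, so it is the only page a link-following visitor starting at `/` would never find.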
graywolf Said,
May 16, 2006 @ 3:22 pm
Yep, clear enough, and what I suspected, thanks.
Michael Said,
May 16, 2006 @ 3:30 pm
Matt, I have to agree with Joe and Anthony in that spanking webmasters for reciprocal links is often unfair. And I don’t have an intelligent suggestion on how to spot reciprocal link breeding facilities vs. honest, natural reciprocal links…at least not anything that can’t be instantly and easily “gamed”.
My industry might be a good example to use to look at reciprocal linking, actually (it’s weddings & honeymoons). In this market, there are certainly a large number of blind link-exchangers out there, adding no value to the end user with their hydroponically engineered reciprocal link spaghetti. But on the other hand, a site like mine (honeymoon travel) might have pages that list a small number of recommended related businesses (e.g. half a dozen wedding coordinator companies in Hawaii…an online jeweler for rings…an association of wedding officiants…etc.). We list other wedding-related companies on our site with whom we’ve done business (and been happy with)…and naturally, many of them also recommend us on their sites. We each are happy to recommend other companies in our general industry whom we believe do a great job for our customers and yet don’t compete with us.
Now, without thinking algorithms, should this kind of link be very important in determining good sites to return to users?
And what should one think about two companies where one thinks the other is great and links to them….but the feeling ISN’T mutual?
So there’s my argument for SEs being VERY careful when it comes to designing algorithms to discredit or punish for reciprocal links. Yes, I realize that massive reciprocal linking campaigns are evil and manipulative, but there may be some baby parts being thrown out with this bathwater.
jonah stein Said,
May 16, 2006 @ 3:30 pm
Matt:
I haven’t experienced the pages dropping problem webmasters are attributing to big daddy, but I have seen some behavior I would like to understand.
Through the middle of April, our SERPs showed with our homepage and then the product page indented on the next item. It looked really great. Over the last month, the deep linked pages no longer show up for some high volume keywords, only the homepage.
I won’t list the keywords in a blog, but if you want to look into it, I would be glad to provide a list. Alternatively, look at our sitemap page and you can see it for the 3rd, 4th, 6th and 7th term listed (terms 1 and 2 are our brand name).
Am I alone in seeing this or does it represent a trend?
Thanks
Jim Said,
May 16, 2006 @ 3:31 pm
Matt,
Something’s been eating at me…
If link exchanges are frowned upon and buying links is a no-no, how is a new site supposed to ever be able to successfully enter a competitive space? It seems the only people who would be able to compete are very old sites (not necessarily the best) and people who maintain a zillion domains for interlinking purposes. Google seems to be placing an unfair barrier to entry UNLESS spammy tactics are employed.
-Jim
Matt Cutts Said,
May 16, 2006 @ 3:33 pm
Circling back to folks who just had comments approved. Joe Hayes, it’s not that reciprocal links are automatically bad. It’s more that many reciprocal links exist for the wrong reasons. Here’s an email that I just got:
I’d recommend people spend less time on trying to gather links that way or via some automated network, and more on making a great site with a creative angle or two that makes the site stand out from the crowd.
Michael Martinez Said,
May 16, 2006 @ 3:34 pm
Matt, everyone knows that Google has a Supplemental Index, but no one outside of Google knows exactly what it is and what its purpose is.
Even if you cannot give us the details, will you please share a working definition that SEOs can point to as the most reliable description?
Matt Cutts Said,
May 16, 2006 @ 3:35 pm
Okay, I gotta go do a pass at email before meeting up with Danny. Talk to everyone later..
Matt Cutts Said,
May 16, 2006 @ 3:36 pm
Michael Martinez, personally I’d think of it as a fallback way that we can return results for specific queries where we might not have as many results in the main index. Okay, now I really am going to go.
Nintendo Said,
May 16, 2006 @ 3:39 pm
>>>>The sites that fit “no pages in Bigdaddy” criteria were sites where our algorithms had very low trust in the ***inlinks*** or the outlinks of that site.
Nice. We can destroy our competition by making spammy sites and then linking to the competition!!! SWEET!!!!
Maybe Google should update ‘There’s almost nothing a competitor can do to harm your ranking or have your site removed from our index.’
at
http://www.google.com/support/webmasters/bin/answer.py?answer=34449&topic=8524
Now it’s easy to harm the competition’s ranking!!!!
arubicus Said,
May 16, 2006 @ 3:40 pm
Thanks again for the feedback!
AHFX Said,
May 16, 2006 @ 3:48 pm
Great Update Matt!!!
It looks like I had put it together pretty well in my explanation of why people were disappearing from Google that can be found at http://www.ahfx.net/weblog/80 . I just needed to build on the devaluation of reciprocal links.
The only remaining question is whether it is the reciprocal link that is bad (we had already discussed that reciprocal links were losing value back in November), or the “unrelated” outgoing/incoming link that is bad. My bet is on the lack of quality of the inbound/outbound links. It seems the “tighter” the content, links, and tags are, the better the page does. Although, I agree also that reciprocal links should be devalued.
Chrispcritters Said,
May 16, 2006 @ 3:49 pm
Matt,
I’ve seen it mentioned that duplicate content can potentially hurt a site. On one of my sites I’ve had people write FAQs, etc, and am now wondering how much of what was written might not be original content. Can you, or anyone else, point me in the direction of a way to check for duplicate content, other than just plugging sentences into Google? How divergent does content need to be to be considered original?
Matt Cutts Said,
May 16, 2006 @ 4:00 pm
I’m sitting here watching Danny.
Stuey Said,
May 16, 2006 @ 4:01 pm
Matt,
Example: a website contains a “link exchange” button within its navigation. When you look closer, the websites forming the link exchange are real companies but the majority of links are unrelated, e.g. car-hire, wood art gifts, labels. Would I be correct in assuming that the non-related links carry no weight and that the domain is scoring only from the related “link exchanges”? Note: I say link exchanges and cringe, as I’ve usually been against this. However, having just read your latest note, I feel encouraged to build a link exchange page and provide reciprocal links to associated quality websites. Have I got the wrong end of the stick here? Thanks in advance for your time.
Adam Senour Said,
May 16, 2006 @ 4:10 pm
Clearly, I’m a bot.
Aaron Pratt, what is your a/s/l?
Matt Cutts, c/t/c?
I am a magic 8-ball. Type !future to read your future.
Okay, goofy stuff aside, this sort of a statement was long overdue. I can’t speak for anyone else, but I was ripping my hair out for the longest time watching people bitch, moan, and complain because their spamtastic sites weren’t getting indexed or that they were dropping. Tough **** for those people. Let ‘em build something worth visiting.
The only problem is that now the idiots will come up with some random and illogical explanation that “linking to other websites and forming alliances isn’t a bad thing, and Matt should be listening to me because I’ve created some 3-page keyword stuffed piece of crap and think I’m an expert.”
Anyone else wanna bet that SEW says something stupid in response?
I just have one very stupid question:
Doesn’t this also lead to the possibility of increased blogspam as far as people reading this comment going and creating BSLogs (TM) full of meaningless drivel about something loosely related to the topic at hand and/or cross-posting to other blogs related to topics (moreso the former concern)?
Personally, I’d rather not see blogs like yours and Aaron Pratt’s and Jaan Kanellis’ blog get dragged down into the mud because a few dumbasses ruin the concept.
Trisha Said,
May 16, 2006 @ 4:11 pm
Matt: ‘I’d recommend people spend less time on trying to gather links that way or via some automated network, and more on making a great site with a creative angle or two that makes the site stand out from the crowd.’
The thing is, just writing great content isn’t enough. I’m not saying my content is the greatest ever in the whole world, but it’s pretty good. If people can’t find your site, along with all its great content, they will never link to it. I don’t know what the answer is. I can see how some reciprocal links are bad, and how buying links is a problem for SE, etc. But it is extremely difficult to get links to a site with just good content. Unless maybe you know lots of people who can give you links, etc. For shy people like myself it’s tough. I just don’t know enough people, and because of the shyness I haven’t participated in any online communities like I should have - I’m working on that though. It seems that getting traffic from SE is kind of like a popularity contest - it’s like high school all over again - I could be real nice and real smart, but too shy to be popular, so my site is just ignored by SE.
Oh well, sorry to whine. I’m trying to write high quality blogs to attract links. (Doesn’t seem to be working too well yet though. )
Adam Senour Said,
May 16, 2006 @ 4:13 pm
That’s not what he said. He said the spammy IBLs would not help. He didn’t say they’d hurt. They basically have no effect at all.
The worst thing you’ll do is give that person no increase in traffic. The best thing you’ll do is give them a bunch of direct traffic from your spamlinks.
arubicus Said,
May 16, 2006 @ 4:18 pm
“They basically have no effect at all.”
The only thing I see happening is when your site used to rely on the effects of such links in the SERPS and now since the effects are gone you may see decreased rankings and spiderings (even fewer indexed pages) and lower PR.
Dave Said,
May 16, 2006 @ 4:22 pm
Matt,
That was your best post so far on this site!
The reason I liked it so much was that you gave many examples.
Please keep the examples coming. That’s where we learn the most!
Dave
PhilC Said,
May 16, 2006 @ 4:27 pm
Matt. What you’ve described really sucks, and not only from a webmaster’s point of view, but also from a Google user’s point of view. I know that you are the spam man, so it’s not your fault, but the whole thing is just plain crazy.
What you described means that a website with quite a lot of good, useful pages, won’t be fully indexed unless the site has enough IBLs, and not just any IBLs - certain types mustn’t dominate. What kind of search engine is that? FWIW, I don’t mind the death of reciprocals (I’ve never got involved in it anyway), but it’s crazy for a search engine to require a certain number of IBLs for a site with a lot of pages to be fully indexed.
For one thing, as a user I want a search engine to show me all the relevant pages that it knows about, and I don’t want good pages left out just because the sites they belong to didn’t have enough IBLs. I want good service from a search engine, and depriving me of good relevant pages is a very bad service.
For another thing, as a webmaster, if my pages are good, index them, dammit. What on earth do IBLs have to do with it? Doesn’t Google want to show good pages to its users? If you don’t want to rank them very highly, don’t rank them very highly, but there is no reason in the world to leave them out of the index, and deprive Google’s users of the possibility of seeing them. It’s just crazy, and makes no sense at all.
No, I’m not talking about the site I mentioned earlier in the thread. Forget that site - there’s nothing wrong with it, but let it go out of the index. I’m talking about Google users who are being *intentionally* deprived by Google, and the owners of perfectly good websites who are being shafted because their sites just don’t happen to have enough IBLs to satisfy Google.
The other nonsense is the outbound links that you mentioned. What the hell has it got to do with a search engine what links a website owner puts on his/her pages? If people want to put affiliate links on a page it’s entirely their own business. And if they want to link to off-topic sites it’s entirely their own business. And if they want to sell real estate on their sites, it’s entirely their own business. It has nothing whatsoever to do with search engines, so why are they penalised by not indexing all of their pages? Why are Google’s users *intentionally* deprived of good and useful information, just because a site’s pages contain things that are nothing to do with search engines?
From what you described in your post, Google has consigned many perfectly good sites to the scrap heap, just because they didn’t have enough IBLs, or because the sites had some perfectly valid links in them. And they’ve intentionally deprived their users of a lot of perfectly good results for the same stupid reasons.
Yeah right. Just what Google has always said - concentrate on making a great site for visitors. And if the site doesn’t have enough IBLs to satisfy Google??? What a load of ….
Frankly, the whole thing stinks, and it stinks big time! I’m just not going to run around getting unnatural links to satisfy a bloody search engine, as you suggested to a couple of your examples. Why should anyone need to do that? My attitude to it is “stuff it”, and stuff Google!
Anthony Cea Said,
May 16, 2006 @ 4:33 pm
Great post from PhilC. I agree with his statement that IBLs should not determine if a site’s pages are indexed. Google should not be guilty of selective indexing of the web, as Microsoft calls it.
To be a world class search engine you have to index pages to serve relevant results, Microsoft is indexing pages on the web much better than Google and so is Yahoo at this point in time, thus their results are much better and more relevant than Google SERPs.
Chris Bartow Said,
May 16, 2006 @ 4:45 pm
PhilC said it perfectly.
And what really sucks is this is KILLING small businesses that just want clients to be able to find information on them. What do they know about inbound linking or reciprocal linking? They just want to be found for [product anytown, usa]
I have a one-off Italian pizza place that just wants people searching for catering to be able to possibly find him in the area. He’s in Google Local, but some people don’t even look at that, or depending on the query it doesn’t come up. He links with all his other local buddies: a clown, a hotel for catering, an iron worker who did his little cafe fence. Now this seems to be discouraged. They just want to share business, not join this big link scheme.
If I type in my small town name on Google now, the top 20 hits are all gigantic spam sites, that contain the equivalent of a Wikipedia article.
Herb Said,
May 16, 2006 @ 4:55 pm
and what is wrong with affiliate links? how else do some sites make money?
David W Sacco Said,
May 16, 2006 @ 4:57 pm
Thank you for addressing my concerns directly Matt. I do appreciate it.
I must say that I’m really disappointed.
I’m really disappointed that related sites with good and logical reasons to exchange can no longer exchange links without harming themselves.
I’m really disappointed that if an authority site links to me, I cannot link back to the authoritative information they provide without damaging the crawling of my site and theirs.
This is not a matter of “not counting” something. This is a matter of blindly punishing sites, and most importantly, searchers.
No, Google has not moved forward. They’ve taken several steps back.
Dave
SpamHound Said,
May 16, 2006 @ 4:57 pm
So, how does this relate to the indented index page event that people have been seeing? It’s not host crowding.
Example: Search for “MY company Name” would normally bring up the index page listing from Google directory. Now, it brings up another page from the site with the index page indented under it.
Penalty, fluke, ??
arubicus Said,
May 16, 2006 @ 5:02 pm
“They just want to share business, not join this big link scheme.”
The way I see it is that there is NOTHING wrong with trading links. Just don’t expect higher rankings and faster indexing because of them. If you rely on recip. links and junk scraper/directory links and have not much for any other quality links you may see some adverse effects because those links are not counting for much anymore. Go out and promote sure but be smart on who who cross promote with just do expect your ranking to go up because of it.
arubicus Said,
May 16, 2006 @ 5:10 pm
EDIT: Go out and promote sure but be smart on who who cross promote with just do expect your ranking to go up because of it.
should read
Go out and promote sure but be smart on who who cross promote with just don’t expect your ranking to go up because of it.
tmoney Said,
May 16, 2006 @ 5:11 pm
thanks for clearing everything up matt.
enjoy your new man-boobs on your plastic surgery vacation.
love,
tmoney
Nancy Said,
May 16, 2006 @ 5:16 pm
Matt,
First, I appreciate you maintaining this blog and responding to some of the comments.
I realize you can’t analyze every site, but from what I’ve seen at Webmaster World, the sites you have picked are not very representative of the sites which are having problems with the supplemental index and not being crawled. The sites you have picked are obvious offenders, but sites such as my own and many others have none of these issues. To us, it seems that building a site to the best of one’s ability isn’t good enough; unless you can play the Google game, you’re out of luck. For instance, the inbound link issue. There are only a couple active fansites related to mine (most are no longer updated, and my site is only a few months old). Therefore, I am stuck with a couple inbound links unless I try to contrive inbound links, which I have no desire to do. Of course, the related sites also naturally link back to me - I’m related to them too, after all! Now that’s bad? It’s quite a Catch 22.
I think one should hesitate to imply that all the websites with supplemental problems “deserve it” because they’re all doing something so terribly wrong that they no longer are recognized by the index. There are many sites which do not fit into this penalty schema that have lost pages - too many to blow off as aberrations in an otherwise successful change.
I care because my site, the last time I checked, had seven pages out of over 600 that are non-supplemental, and it is jumping wildly in the Google rankings daily for main keywords, varying from 35-75 any given day. Meanwhile, it varies between #6 and #8 on other search engines.
But frankly I am more concerned with the fact that so many pages with good content are being ignored. If I were #105 for my keywords but could look at site:[my site] and see that my pages are indexed, I would be OK with that. At least they’re there, and people who are looking for content unique to my site can find it. However, now, according to Google, only 7 pages on my site are searchable for the average Google user - only seven pages of my site exist in Googleland. I can put exact phrases from supplementally indexed pages in the search engine and get no results returned. With almost nothing indexed, I feel like all my honest efforts are worthless to Google for some mysterious reason.
Yes, it’s your search engine and you may do what you like. However, I’m sure you understand that a search engine that throws out good content is not doing its job. Hopefully, you will not shrug off the numerous legitimate concerns because you were able to find in the vast array of e-mails you received some egregious offenders.
Alan Said,
May 16, 2006 @ 5:23 pm
Matt,
Thanks for confirming my theory. I - and a few others - have been saying all along that the Dropped Pages bug is being caused by a faulty or out-of-date backlink index.
You just confirmed it. Do you honestly think that all of the people making a noise at the moment are naughty people with some irrelevant outbound links, or “not enough inbound links”? Isn’t it far more likely that Google just isn’t finding or indexing the backlinks properly since Big Daddy?
Are you looking on Yahoo or MSN for backlinks before you go generalising about sites not having enough? Because that’s Big Daddy’s problem: many, many, high quality backlinks are just not registering as backlinks anymore. It’s a bug. You must have a very low opinion of an awful lot of people to just dismiss us all as whining idiots who didn’t know you need a few backlinks. Take a look at Yahoo’s backlinks for the affected sites before you condemn them all to the garbage.
How long is it going to take you guys to notice your backlink bug? It probably doesn’t help that you keep deleting any comments that mention it.
Wayne Said,
May 16, 2006 @ 5:25 pm
I would recommend, as one other poster did, that if Google wants to get a handle on reciprocal link farms, it should look at real estate sites. I have pointed this out before, and I have been guilty of this myself, but there are huge operations with high Google rankings that are nothing but link farms.
Multiple site creations on the same subject, directory creations, scraper sites: all are created to increase the manipulation of Google and to benefit the present link farm group even further in Google.
A good example of this was some research that I performed last week on our # 1 competitor in Google. Out of 1000 links, this site had 40% of them coming from 5 IPs. Yet Google has rewarded this type of linking scheme with top rankings.
Based on my own personal experience, Google has rewarded reciprocal link farms and continues to do so. Judging by these sites, if a link farm is created and is themed, Bigdaddy is rewarding these unnatural link schemes.
You have groups and some SEO companies that are able to point 1000s of links at their clients’ sites or create a closed-off network of themed reciprocal link exchanges that are not natural according to Google’s definition. As I am sure you understand, Matt, I and others know that these systems are only meant to manipulate Google’s SERPs.
On the flip side of this coin is the fact that new sites trying to compete with these sites must follow the example set by Google’s reward of high rankings for these practices. As long as Google rewards even a few sites with these types of practices, new sites that may offer more to the online user will forever face an uphill battle for business in Google.
dude Said,
May 16, 2006 @ 5:29 pm
so, no affiliate links? or how many is ok? cause you know, why not just kill the affiliate business model all together.
let’s have a look at some examples: amazon.com - currently nothing but a site promoting other site’s merchandise but have own transaction processing capability and sell some books whathaveyou on the side (177 million pages indexed by google). any site providing syndicated news? nothing but a “duplicate content” aggregator. every coupon site on the web (type in “coupons” in google, all those sites are there) is nothing original but a bunch of affiliate links (mostly cloaked). are you gonna not index any of those? i say let the users decide which ones they like most. bookmarking rate maybe? i don’t know. things like that. backlinks? well if you delisted all the sites that originally linked to some site, there will be no backlinks left i guess. you know all the small sites that decided to give each other a boost.
Falcon Sky Said,
May 16, 2006 @ 5:33 pm
Great post Phil C. It’s nice to see somebody who is pro business. Google wants to corner the market on search but has stifled small business’s ability to make money. BD seems to favor only their “fat cat friends”.
Google: Our goal is to index the entire world’s information but
alas we’ve found it more lucrative to censor.
Leigh-Ann Said,
May 16, 2006 @ 5:37 pm
I have a question about sites missing from the index, and I wasn’t sure where else to get a reply, so I hope you don’t mind me asking here.
Last fall I had five sites completely banned from Google for having “outgoing links to pharmacy sites”. I removed all outgoing links from all the sites, and filed reinclusion requests. One site, a PR 7, was immediately back in the index and continues to show up on page one of the search results. The other four sites have never reappeared at all, despite the fact I made the same modifications to them.
The Google reinclusion people wrote to me in March about my missing four websites, saying, “Please be assured that your site is not currently banned or penalized by Google.” When I wrote back and asked why my sites were missing completely (grey bar, and the domain not in the index at all), I was told the matter would be investigated by the engineers. That was three months ago, and my sites are still invisible. They’ve been gone from Google for 8+ months now, after being in the index previously for over two years.
Have my sites been “sandboxed” or something, prior to reinclusion? They were only a PR 5 or 6, so did the PR 7 site get some sort of priority? I really would like my sites back in your index, and I’m at a loss as to how to achieve that when your own engineering team claims my sites aren’t banned at all.
HaHa Said,
May 16, 2006 @ 5:43 pm
Matt, it seems that google picking on reciprocal links just makes it more attractive to buy expired domains.
you always avoid talking about this type of webspam, yet it’s doing more to upset the balance of good serps than any other type of spam.
You also mention that blogs are a great way to develop one way links.
That also plays into the spammers’ hands. Expired blogs still work a treat and that profile I gave you many weeks ago is still live and active. http://www.blogger.com/profile/17839170
So much for your inside man at blogger taking care of it.
Paul Said,
May 16, 2006 @ 5:52 pm
Matt, thank you for the update. While I appreciate the information, it does little to change my philosophy that it is almost impossible for a small site (25 – 100 pages) playing by the rules in a competitive market to rank in Google.
It is sad to come to the realization that the only sites that Google feels provide any value to the web are the large multi-nationals or sites with 10k+ pages and thousands of incoming links. How relevant will Google’s results be if webmasters abandon efforts to rank in your index and focus their efforts on the other engines?
Jon Said,
May 16, 2006 @ 6:07 pm
So Matt,
Are you partly responsible for this debacle then? Even if you didn’t have a backlink bug (which clearly you do), your logic is fatally flawed. The inevitable end result of requiring more and more inbound links before you will even deign to index a site is Spam. Spammers do this stuff full-time. They spend no time on content, and no time on value-added functionality.
The more ludicrous hoops you make sites jump through to qualify for the index, the more you pave the way for Huge Companies or Spammers. The in-betweens get sidelined.
Incidentally, why does a site need a gazillion artificially bartered inbound links before it is worthy? No one at Google seriously believes that inbound links are still a measure of relevance, do they? Have you read your own posts? They talk non-stop about how to go about acquiring the right kind of links.
You’ve all lost the plot. You’ll delete this message without even bothering to pause and consider whether or not I’m right.
pageoneresults Said,
May 16, 2006 @ 6:08 pm
Ah, finally. Maybe now we can finally kill off the link exchange program cottage industry. A few particular countries are not going to be happy about this!
Hey Matt, when is Google going to implement the long awaited SERPs Randomizer? I mean, we’ve talked about it in the past and it would be great to see those first 30 SERPs rotating randomly. Do that and watch the life expectancy of a search engine marketer drop by a few years.
Jeff Said,
May 16, 2006 @ 6:36 pm
Matt,
I know google is not giving us webmasters a full picture with the link command. I did the link command on yahoo and msn and I noticed some scraper sites copied my content and added some links to a few of my websites. I have a feeling google is looking at these links as questionable. I am in the process of emailing these scraper sites’ webmasters and getting the links removed because I did not request to put them there and they violated copyright by taking our content.
Since google crawls better than msn and yahoo, will there be a way in the future for us webmasters to see these links? Honestly, right now if a competitor wants to silently tank a website’s rankings in google, all they need to do is drop a bunch of bad links. Without google giving us webmasters the ability to see the links, we may never even know this could happen.
Linda Said,
May 16, 2006 @ 6:36 pm
Hi Matt. I appreciate what you have explained here. I suffered through supplemental pages earlier than many others, and at this time I am happy to report that nearly all of my pages have returned when doing a “site:” type search.
Unfortunately my Google traffic has not recovered yet. At one point it dropped down to about 2% and has recently risen to around 5%. This is not good as it used to run closer to 75-80%. Have surfers changed search engines? I don’t think so, as the total numbers from other engines haven’t varied a whole lot.
Earlier I did a search for a page on my site and it was found on the 4th page. That’s fine for that page, but the sites that came up ahead of it were not even related to the subject and only mentioned in passing the words that I had searched for. I expected to see well known sites in the very same niche appear in that search, however none did. It looked like crap was floating to the surface instead. It looked like relevancy had disappeared out the window and that cannot be good for Google’s business.
ScottW Said,
May 16, 2006 @ 6:45 pm
Great post Matt, thanks for sharing all the insight. Congrats on getting more help recently, I hope that this frees you up to make more posts like this.
Caios Said,
May 16, 2006 @ 6:52 pm
Hello, I’m really new to dealing with google and I really appreciate finding some feedback from you guys, great!
I have a new site that has about 3750 pages. The total indexed pages are constantly hopping from 30 to 340. It would be great if I could get them all indexed. lol
But I’m completely lost as to what I am supposed to do to get all my pages indexed. I really don’t want to be going around the net trying to get links to my site, and we are being told it’s better we create good content instead. But hang on, how will my great content get indexed if I have no links? As you’re also saying we need links to get indexed, but not any links, they must be “good” links. I’m lost again! lol What I mean is that for someone with little experience reading that they need links, it’s really hard to judge what are good links and be able to find places to get good links. This again seems to mean that established sites with big SEO budgets are always going to be ahead regardless of their content.
I think PhilC made a really good point above too. I have some unusual specialist information on my site that isn’t indexed. There are currently no results for related search terms for this information. Now where is the benefit for people that these pages aren’t indexed because there are not enough links pointing to them?
What if you have one large site with 60k inbound links that has a page of information about a subject and it’s the only page returned for a search term. Then you have a small site with no links that hasn’t been indexed but has a similar page that’s a 100 times better content wise. Why not index it and show it second in the results? Surely that’s better for everyone?
Lastly, if people’s pages are being dropped from the index because of site trust issues, then why is my own new site’s indexed page count going up and down like a yo-yo? Newly indexed pages, then hardly any pages, and then newly indexed again. Is it having trouble making up its mind if my site is trusted or not?
Chris Said,
May 16, 2006 @ 6:54 pm
Hi, Matt!
I was wondering if you guys changed something to the algo in the last days…
A few hours back, my site dropped from 3 pos to nothing, although it’s a good site. The sitemap acct doesn’t show any spam warning, but google started to delist the pages…
Can you have a look? I’m a total mess now…
Thank you,
Chris
Jon Said,
May 16, 2006 @ 7:03 pm
Are you kidding Chris?
Did you read Matt’s post? Your site is a piece of junk not worthy of Google’s index. It’s true. Matt has personally checked. And every site that has been de-indexed that he has looked at has not had enough inbound links or else has had outbound links that are just completely off the wall. Imagine a real estate site having the gall to link to some other kind of site. What a joke. You’d better get busy and go after links. It’s links links links from now on. It’s official. Matt says so. You are junk if you don’t have links. Google love blogs you know. You shouldn’t really be allowed to have a website nowadays unless you are willing to link yourself silly on your own blog. It’s the future you know. And it’s great. Matt says so.
Anthony Cea Said,
May 16, 2006 @ 7:24 pm
Yeah, for a porn site that is some great spam work indeed man!
Google is taking porn sites out of the index. If you have been reading the news, there are lawsuits flying around about them being in the index!
Zoe C Said,
May 16, 2006 @ 7:43 pm
Oh, my, goodness! It just so happens that at about the same time my remaining indexed pages disappeared I had just added a reciprocal link to my site!!! Ugg!
Soo… now that I’ve removed all links from my minute template based website and added a no follow command to the three remaining links, should I expect to see a change in indexed pages on the next crawl? Or am I banned for a year or something?
By the way thanks for the update I’ve been stalking your blog for over a month waiting for something like this post.
Heh… and I’ve only ever had two internet customers… (but they were recent, which is why I was inspired to get my site indexed)
Wayne Said,
May 16, 2006 @ 7:50 pm
Yeah, the funny thing about that ranking is that my site is real estate, not porn. It only shows a flaw in Google’s algo and ranking system. I kind of liked Midwestnet’s comment on DP:
Fisting lessons with your new house, anyone?
The page ranked for that term is a property detail page of a listing in Las Vegas. First I thought, ok, maybe this page was hijacked, but it hasn’t been. Then I thought, ok, did someone get access to the site to change title tags and meta descriptions? It wasn’t that.
Checking that page I found no backlinks to it with that anchor text, so this only leads me to believe that somehow someone at Google turned over a cup of coffee on their computer and caused all this mess..LOL
Harry Said,
May 16, 2006 @ 7:54 pm
Zoe C,
Shame on you. You added a reciprocal link! Why? It’s a simple fact that natural links just materialise out of thin air if you are any good. How? Because people find you, think you’re great and link to you. How do they find you? Why, on a search engine of course…ummm…wait a minute…Oh my god. The system is flawed! Heh Google. You’re a bunch of idiots.
I guarantee, history will not look kindly on this particular period in Google’s history.
Dimitris Said,
May 16, 2006 @ 8:00 pm
Hi Matt,
I wanted to ask a couple of questions. In the next days I am going to launch a new site that will be offering a certain service to bloggers and webmasters. Basically it will offer a script for free. I am going to ask the people using it at their blogs and websites to link back to my site, that can attract all kinds of backlinks because the script can be used at any kind of site. If some sites from bad neighborhoods according to google use this script and link back to me will this penalize my site?
The other thing that I would like to ask is: on my blog I have a niche affiliate store related to my blog’s theme as a way of monetizing it. Will this lower the overall trustrank of my domain? For example, can this cause a decrease in the rate my blog is being crawled or cause my site to lose its current rankings for certain keywords?
If that’s the case I think it would be very unfair, it would be like msn penalising sites that have adsense code on them.
Thank you,
Dimitris
Halfdeck Said,
May 16, 2006 @ 8:01 pm
Matt, thanks for your great post.
One question relating to sites that send traffic in exchange for linkbacks. Say 20,000 sites link to a page, and in turn that page sends traffic to each of those sites. Here’s the twist: that page rotates links in and out periodically, so that on any given day, it only displays 200 links. I consider the 20,000 incoming links as manufactured links, but technically, 16,000 of those links are not reciprocal. Will Google be dealing with this type of linking scheme anytime in the future?
“What do you think of that? Hmm? I said ‘What do you think of that?’ Don’t answer. You don’t have to answer everything.”
OBizTek Said,
May 16, 2006 @ 8:06 pm
Hi, Matt!
This is a very valuable post indeed! It has given good insight into the quality parameters which Google considers when indexing pages.
A better web can be made by openly sharing problems and comments. I feel that there is a need for some forum where volunteers and enthusiasts can anonymously share their real-time experience of black hat SEO and unethical SEO practices followed by many sites. This will help to improve the Google filters continuously, and a better web can be made.
Thanks & Regards,
Ajay
Khoj Badami Said,
May 16, 2006 @ 8:31 pm
Hi Matt,
Thanks for the post! I pretty much expected everything you have said. After all, Google is going to keep trying to improve itself, so in the long run only the quality sites are going to last. Anything that tries to game the SE with backlinks or whatever will eventually get kicked out!
Anyways, on my site I have a link to my “web stat counter” at the bottom. Will that be considered a bad link to have at the bottom?
I have other bad links too…but I want to know specifically about the web stat counter link. Is it a bad link to have?
Thanks
right reading Said,
May 16, 2006 @ 8:49 pm
I have a small, noncommercial, ad-free site (with good-quality content). You could say I’m not so much a webmaster as just some guy with a website. There are a lot of people like me, who seem to be being left behind by the new Google with its infatuation with giant business enterprise.
From my perspective, both yahoo and msn do a far better job than G at returning results from my site when they are pertinent to specific search queries. At some point — early February as I recall — I noticed that traffic to my website had virtually stopped. I then found I had dropped out of the Google index. After a little research I decided I was being penalized for duplicate content (which probably occurred when I moved the site to a new domain). I filed a reinclusion request and at least got my site indexed, although at its previous host — defunct for almost a year — it was still showing better results than the same site at its current location last time I checked.
Right now I feel I’m doing about all I can, which is to improve and expand my content and hope someone notices. Maybe Google will some day start to return better results from my site so that traffic will pick up again, but it’