Guest post: Vanessa Fox on Organic Site Review session

I almost made it. I got through one day of a three-day conference, and I was still blogging, caught up on email, and I’d checked my RSS feeds. Then on the second day, I spoke on three panels, stayed up talking SEO until 3:30, and it all crashed down. Now 80-90 emails sit unread in my inbox, and I’m behind on everything else too.

During the conference, I was talking to Vanessa Fox from the Sitemaps team. You probably know her from the Sitemaps blog, and she was also at WMW Boston. It turns out that she took lots of notes at the organic site review panel.

“Would you like to do a guest post on my blog?” I asked. “Sure, why not?” Vanessa replied. That’s cool, because my summary of that panel would have been something like:

The last time I did this panel, SEOs realized how much things like paid links could stick out like a sore thumb. In a different panel at WMW Boston, Rae Hoffman illustrated that other SEOs could easily see paid links using open tools like Yahoo’s Site Explorer, so it doesn’t even require the special tools that a search engine has. On the bright side, every site in the organic site review panel looked white-hat and had serious questions; paid links didn’t come up for discussion once during the panel.

without going into the detail of all we talked about. So I’m glad Vanessa is willing to cover it in more detail. Without further ado, here are Vanessa’s notes on that session:

“””
I’ve been having a great time here at Pubcon Boston, talking to webmasters, getting feedback, and learning about what they’d most like from Google. I sat in on the Organic Site Reviews session, both because Google Sitemaps is a site review tool from a different perspective and because I wanted to make funny faces at Matt while he was talking.

I normally blog for Google Sitemaps, but Matt asked me if I wanted to do a guest post here (probably to keep me too busy paying attention to make funny faces at him).

The strongest point I got from the session (and I knew it already, but it became so apparent) is that you don’t need any special tools or secret knowledge to evaluate your site. Search engines want to return relevant and useful results for searchers. All you really need to do is look at your site through the eyes of your audience. What do they see when they get to your site? Can they easily find what they’re looking for? Webmasters are looking for some other secret key, but really, that’s all there is to it.

The panelists (Matt, Tim Mayer from Yahoo, Thomas Bindl from ThomasBindl.com and Bruce Clay from Bruce Clay, Inc.) looked over the sites that the audience asked about and offered up advice.

Subscription-based content
Googlebot and other search engine bots can only crawl the free portions that non-subscribed users can access. So, make sure that the free section includes meaty content that offers value. If the article is about African elephants and only one paragraph is available in the free section, make sure that paragraph is about African elephants, and not an introductory section that talks about sweeping plains and brilliant sunsets. If it’s the latter, the article is likely to be returned in results for, oh say, [sweeping plains] and [brilliant sunsets] rather than [African elephants].

And compare your free section to the information offered by other sites that are ranking more highly for the keywords you care about. If your one free paragraph doesn’t compare to the content they provide, it only makes sense that those sites could be seen as more useful and relevant.

You could make the entire article available for free to users who access it from external links and then require login for any additional articles. That would enable search engine crawlers to crawl the entire article, which would help users find it more easily. And visitors to your site could read the entire article, see first-hand how useful your articles are, which would make a subscription to your site more compelling.
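
As a rough illustration of that idea, here is a minimal Python sketch that decides whether to show the whole article based on where the visitor came from. The function name, the `site_host` value, and the rule itself (require login only when the previous page was on your own site) are assumptions made for this example, not something the panel prescribed. The referrer header can be missing or spoofed, so treat this as a sketch rather than access control.

```python
from urllib.parse import urlparse


def can_read_full_article(referrer: str, site_host: str, is_subscriber: bool) -> bool:
    """Return True if the visitor should see the entire article.

    Hypothetical rule: subscribers always see everything; everyone else
    sees the full article unless they clicked through from another page
    on the same site (site_host), in which case they must log in.
    """
    if is_subscriber:
        return True
    # An empty or malformed referrer yields no hostname; treat it as external.
    referrer_host = (urlparse(referrer).hostname or "").lower()
    return referrer_host != site_host.lower()


# Arriving from a search result: full article.
print(can_read_full_article("http://www.google.com/search?q=african+elephants",
                            "www.example.com", False))  # True
# Clicking to a second article from within the site: login required.
print(can_read_full_article("http://www.example.com/african-elephants.html",
                            "www.example.com", False))  # False
```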

You could also structure your site in such a way that the value available to subscribers was easily apparent. The free content could provide several useful paragraphs on African elephants. Then, rather than one link that says something like “subscribe to read more”, you could list several links to the specific subcategories available in the article, as well as links to all related articles. You could provide some free and some behind a subscription (and make the distinction between the two obvious).

For instance:

African elephants — topics available once you register:
Habitat ($)
Diet ($)
Social patterns ($)

Related articles:
Asian elephants
Wildlife refuges ($)
History of elephants ($)

Sure, having all those keywords on the page might help the page return in results for those keywords, but it’s not just about search engines. Visitors to your site have a much better idea about what’s available after registration with a linking structure such as that than with “subscribe to read more”. Visitors want to know what “more” means. Ultimately, you care about users, not search engines. You just want the search engines to let the users know about your site. You want to make users happy once they do know about it.

Flash-based sites
It’s not only Googlebot who doesn’t watch a 20 second video load before the home page comes into view. A lot of users don’t either. Some users don’t want to wait that long; other users don’t have Flash installed. If all of your content and menus are in Flash, search engines may have a harder time following the links. If you feel strongly about using Flash, just make an HTML version of the page available as well. The search engine bots will thank you. Your users will thank you. Feel free to block the Flash version from the crawlers with a robots.txt file, since you don’t need your pages indexed twice. If your home page is Flash, put the navigation outside of the Flash content. You could offer a choice on the home page so users can choose either the HTML version or the Flash version of the site. (You might be surprised at what users choose.)
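
If you do block the Flash version, it is worth checking that the rule blocks what you intend and nothing more. The sketch below uses Python’s standard `urllib.robotparser` to test a robots.txt against two URLs. The `/flash/` and `/html/` directories and the example.com host are assumptions made for this illustration; your site layout will differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the Flash version lives under /flash/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /flash/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The HTML version should stay crawlable; the Flash version should not.
html_allowed = parser.can_fetch("Googlebot", "http://www.example.com/html/index.html")
flash_allowed = parser.can_fetch("Googlebot", "http://www.example.com/flash/index.html")

print(html_allowed, flash_allowed)  # True False
```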

Images as text and navigation
What is true for Flash is also true of images. Many users have images turned off. Try viewing your site with images turned off in your browser. Can you still see all the content and links?

Sites in general
You know what your site’s about, so it may seem completely obvious to you when you look at your home page. But ask someone else to take a look and don’t tell them anything about the site. What do they think your site is about?

Consider this text:

“We have hundreds of workshops and classes available. You can choose the workshop that is right for you. Spend an hour or a week in our relaxing facility.”

Will this site show up for searches for [cooking classes] or [wine tasting workshops] or even [classes in Seattle]?

It may not be as obvious to visitors (and search engine bots) what your page is about as you think.

Along those same lines, does your content use words that people are searching for? Does your site text say “check out our homes for sale” when people are searching for [real estate in Boston]?

Next, consider this page name:

123244ffgfhdsled99eddgdd.html

It doesn’t take a special tool to know that the URL isn’t user-friendly. Compare it to:

african-elephants.html

But you can have too much of a good thing. It also doesn’t take a special tool to know that this page name isn’t user-friendly:

african-elephants-and-their-habitats-and-diet-and-history-and-extinction-possibilities-and-this-page-is-really-great.html

And speaking of putting a dash in URLs, hyphens are often better than underscores [Ed. Note: bolded by Matt 🙂 ]. african-elephants.html is seen as two words: “African” and “elephants”. african_elephants.html is seen as one word: “african_elephants”. It’s doubtful many people will be searching for that.
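
To see why the two characters can behave differently, consider a tokenizer that treats the underscore as part of a word, which is how the `\w` class works in most regular expression engines. The Python sketch below is only an illustration of that behavior; it is not Google’s code, and the `url_words` helper is a name made up for this example.

```python
import re


def url_words(page_name: str) -> list[str]:
    """Split a page name into words the way a \\w-based tokenizer would."""
    # Drop the file extension, then collect runs of word characters.
    # In regular expressions, \w matches letters, digits, AND the underscore.
    stem = page_name.rsplit(".", 1)[0]
    return re.findall(r"\w+", stem.lower())


print(url_words("african-elephants.html"))  # ['african', 'elephants']
print(url_words("african_elephants.html"))  # ['african_elephants']
```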

Other tips:

* Don’t break your content up too much. Users don’t like to continuously click to get the content they’re looking for. Search engines can better know what your site is about when a lot of key content is on one page. When the content is broken up too much, it’s not as easily searchable. FAQ pages don’t have to be 15 different tiny pages; often one page with 15 questions on it is better for users and search engines.
* Make sure your content is unique. Give a searcher a reason to want to click your site in the results.
* Make sure each page has a descriptive <title> tag and headings. The title of a page isn’t all that useful if every page has the same one.
* Minimize the number of redirects and URL parameters [Ed. Note: I’d keep it to 1-2 parameters if possible]. And don’t use “&id=” in the URL for anything other than a session ID. Since it generally is a session ID, we treat it as such and usually don’t include those URLs in the index.
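
The URL parameter tip is easy to check mechanically. Here is a small Python sketch that flags URLs with more than two parameters or with a parameter named `id`. The `audit_url` function, its default limit, and the warning wording are assumptions for this example; the limit of two comes from the editor’s note above, and the actual crawler heuristic is not documented here.

```python
from urllib.parse import parse_qsl, urlparse


def audit_url(url: str, max_params: int = 2) -> list[str]:
    """Return a list of warnings for a URL's query string."""
    warnings = []
    params = parse_qsl(urlparse(url).query, keep_blank_values=True)
    if len(params) > max_params:
        warnings.append(f"{len(params)} parameters (more than {max_params})")
    # A parameter literally named "id" may be mistaken for a session ID.
    if any(name.lower() == "id" for name, _ in params):
        warnings.append("parameter named 'id' may be treated as a session ID")
    return warnings


print(audit_url("http://www.example.com/showthread.php?t=1"))
# []
print(audit_url("http://www.example.com/page.php?cat=2&id=77&sort=asc&lang=en"))
# ['4 parameters (more than 2)', "parameter named 'id' may be treated as a session ID"]
```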

Google isn’t secretive about these tips (http://www.google.com/webmasters/guidelines.html). And the panelists in the session reviewed the sites using tools readily available. They looked at the sites, read through them, and clicked around. No magic needed.
“””

Thanks, Vanessa!

88 Responses to Guest post: Vanessa Fox on Organic Site Review session

  1. Hi Vanessa

    Thanks a bunch for a very informative educating post.

    Wish to see more Googlers posting here, not to say that Matt isn’t doing a great job. But diversity adds value, you know 😀

  2. Excellent review of one of the best sessions during the conference! The only thing I can add is that there was a discussion about large sites with load-balancing servers, where one server could be generating an error that only a robot would see. If that error page happens to be the only thing a robot fetches during a crawl, the real page will not rank highly for its targeted search terms.

    All of you Googlers did a great job during the conference, and I’m glad that you made it back safely. I look forward to seeing you again on the East Coast!

    Brian M

  3. I’m curious why an underscore does not delineate 2 separate words, whereas a hyphen does. It seems to me a space can also be represented as an underscore, so why are the 2 not treated the same?

  4. Hi Vanessa,

    Re the note about hyphens.

    Why does african_elephants.htm & african-elephants.htm get bolded in a search for African Elephants when African_Elephants.htm & African-Elephants.htm do not?

    Why exclude capitals from bolding even when used in the search phrase.

    Gary

  5. Wow. The stuff Vanessa has highlighted is mostly obvious, but it’s great to have it spelt out in black and white by an authority.

    Flash is something that ad. agencies love to do to wow the client – it’s not something that users love to watch, except for in very rare instances. There’s way too much gratuitous flashing going on 😉

    And user testing your site is so obvious, but so rarely done. (Mine is a two-bit blog, but even that can probably do with some user-testing!) A webmaster’s familiarity with a site and its content creates a completely different baseline from which (s)he is working, which means that assumptions are made that shouldn’t be.

    Great post.

    Dan.

  6. Wonders what’s secretly being hidden in the ‘postmeta’ class tag that’s so teeny-tiny small I can’t read it 😉

  7. Hi Matt,

    While I respect your opinion, I find it silly that you often recommend dashes over underscores. I’ve detailed my reasons in an article I wrote several months ago:

    http://12pointdesign.com/advice/dashes_vs_underscores.asp

    You can probably guess my opinion based on the URL, but give it a read before you dismiss me as ignorant.

    Thanks,

    Shawn K. Hall

  8. Can Google detect a link network like the one Clear Channel runs for all of its radio stations?

    All of their stations have those text links at the bottom and I hear they charge a pretty penny to have one placed there.

    This is just one of the stations:
    http://www.aggie96.com/main.html

  9. Thanks Vanessa 😉
    We can’t know what you look like because there is no picture of you on the web 🙁
    Keep up the good work!

  10. Off topic or ….
    I honestly can’t believe that google still isn’t able to match african_elephants in a url as “african elephants”.
    And that &id=xxx or ?id=xxx is being taken as session id by the crawler.

  11. It’s interesting that google treats phrases containing _ as one word but treats a hyphen as a space. There aren’t many words in the dictionary with an underscore in, but plenty that contain hyphens (co-pilot, re-cover (as in cover again), extra-curricular) but in a URL co_pilot is one word while co-pilot is two? 🙂

  12. Nice reading Vanessa – and thank you for your time. Just to pick up on point regarding the use of variables in site query strings. What about popular forum software such as phpBB, vBulletin and InvisionBoard? Are they penalized in any way for having URLs such as forumdisplay.php?f=2 or showthread.php?t=1?

    It’s long been a question plaguing forum webmasters and some “official” confirmation would be a great insight.

  13. The recommendation about hyphens in URLs is GOOD.

    Always avoid using underscores or spaces in URLs. Those can both cause a lot of problems.

    However, my preference is to use dots in URLs, something like http://www.domain.com/the.folder/the.sub.folder/some.page.html and that also works very well too.

    In fact, dots, commas, colons, and hyphens all appear to work the same way as each other. To me, dots look much more neat than hyphens do too.

  14. Gary Elliott, I’ll pass that feedback on about capitalization.

    Dean Clatworthy, I think that should be fine. It’s the specific string “id=” that Googlebot is often allergic to.

  15. Matt,

    Maybe you could clarify if &id= makes the allergy, or is it also ?id=

    I’ve heard from people who think it’s only &id= (i.e. second parameter) and not if it’s the only or first parameter.

    Thanks for your feedback / insight. I don’t care to change one of my sites to test it, but If I could point people to a reason to change it on their sites that’d be great.

  16. I have seen pages named abc.def.html
    How does that stack up against abc-def.html?
    The difference between the hyphen and underscore was noted.
    Is there a difference between the hyphen and dot?

    Thanks for the insight!

  17. abc-def.html is the abc-def domain. abc.def.html def is the domain, and abc would be a subdomain of that, not unlike mail.google.com is still part of google, as opposed to mail-google.com would be a completely different domain. Hope that’s phrased correctly.

  18. Hi Vanessa – great to meet you in Boston and hope your first Pubcon was a lot of fun. This means you are now required to go to Las Vegas!

    Matt some pix of you here: http://www.flickr.com/photos/53175402@N00/

    I’m also blogging that excellent site review session and listing all nine of the reviewed sites.

  19. Jeremy Wong 黃泓量

    Thank Vanessa for your guest post.

    It sounds that Google is using some words of a page to classify the page, or even word statistics to verify the field a page should belong to.

  20. Vanessa!

    “* Don’t break your content up too much. Users don’t like to continuously click to get the content they’re looking for. Search engines can better know what your site is about when a lot of key content is on one page. When the content is broken up too much, it’s not as easily searchable.”

    The “Classical Webmaster Thinking” has been to break content when it’s “naturally” possible. That allows for a better possibility that one of the pages might rank high on the serps for the specific keywords/keyphrases. Furthermore, it also contributes to site growth. Of course, when all done in ethical manner 😉

  21. Thanks Vanessa,

    >>>make sure that paragraph is about African elephants…

  22. Thanks Matt,

    As Google bolds both african & elephant in the url african_elephant.htm for a search for African Elephant this seems to contradict the “One Word” underscore advice in Vanessa’s post.

    We also have URL’s using underscore that rank No.1 or very high in the serps.

    Gary

  23. Consistent use of human-friendly directory and page naming schemes will have the additional benefit of rendering Web Analytics (site statistics) reports much more intelligible.


  24. But you can have too much of a good thing. It also doesn’t take a special tool to know that this page name isn’t user-friendly:
    african-elephants-and-their-habitats-and-diet-and-history-and-extinction-possibilities-and-this-page-is-really-great.html

    and is this URL user-friendly?
    http://www.mattcutts.com/blog/guest-post-vanessa-fox-on-organic-site-review-session/

    Not sure…
    Do What I Say, Not What I Do…

    (thanks Aye-Aye)

  25. Matt,

    Waiting for your post on Adsense bot doubling as Googlebot in crawling after your confirmation on Jensense.

    So practically are they the same now? If adsense bot crawls a page and googlebot doesn’t, is the page going to be in SERPs? Or is adsense bot only for a refresh crawl of a page, and not for the first crawl?

    As googlebot visits have dropped on many sites (and new pages not being crawled despite linking from the homepage) will an adsense bot visit to those pages mean that its OK, and no need to worry?

    As you said, if the double duty is a bandwidth saving measure, is the bandwidth issue that has led to fewer crawls on several sites?

    Thanks

  26. > Why does african_elephants.htm & african-elephants.htm get bolded
    > in a search for African Elephants when African_Elephants.htm &
    > African-Elephants.htm do not?

    Because the last time they rewrote the text highlighting routine for URLs, they (mistakenly?) used a case sensitive search instead of a case insensitive search.

    And since the keywords inserted by the user are always “lowerized” before the comparison, the routine highlights only lowercase keywords in the URL.
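
The behavior this commenter is guessing at can be reproduced in a few lines of Python. This is only a sketch of the guess (lowercase the query, then compare case-sensitively); nothing in the thread confirms how the real highlighting routine works, and `bold_terms` is a made-up name.

```python
import re


def bold_terms(url: str, query: str, case_sensitive: bool) -> str:
    """Wrap each query term found in the URL in <b> tags."""
    flags = 0 if case_sensitive else re.IGNORECASE
    # The query is lowercased first, as the comment above supposes.
    for term in query.lower().split():
        url = re.sub(re.escape(term), lambda m: f"<b>{m.group(0)}</b>", url, flags=flags)
    return url


print(bold_terms("african_elephants.htm", "African Elephants", case_sensitive=True))
# <b>african</b>_<b>elephants</b>.htm
print(bold_terms("African_Elephants.htm", "African Elephants", case_sensitive=True))
# African_Elephants.htm
print(bold_terms("African_Elephants.htm", "African Elephants", case_sensitive=False))
# <b>African</b>_<b>Elephants</b>.htm
```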

  27. Dashes vs. underscores

    In fact, Matt has also posted last year about the same subject

    http://www.mattcutts.com/blog/dashes-vs-underscores/

    And it seems that Vanessa and Matt do agree upon:

    Always choose dashes instead of underscores 🙂

  28. All good guys.. Insightful information but apart from the long urls advice, nothing new here. You keep puzzling those who develop a genuine website with unique content. There is probably a ton of reading to do in order to find what is happening and we’re not ranked well for our target keywords, but no crucial tips.

    Imagine what would our sites look like if we did not have to spend hours reading around forums to find out how spammy sites make it to the top of your search results and genuine sites have difficulty appearing even in the top 100 results.

    I am gonna keep saying that if Google spent more on improving their search algorithm(s?) and not on making the most out of AdWords, we wouldn’t be facing any such problems. Nothing personal here. It is just really unfair for genuine sites.

    You say that you care about the results, but still you get like hundreds of spammy subdomains in both generic and non-generic keywords. And those sites make it to the top, while most of us who are small sites struggle to survive…

    Hey, has the fact that all these reports say 15% of ppc traffic is fraudulent lowered your advertisers’ spending?? Is it a coincidence that ever since, so many webmasters complain about being badly hurt in terms of ranking?

    Is that why you sent credits to adwords advertisers? You seem to push us towards advertising and I can prove my sayings with hard facts. If this is indeed the case, then your business practices have really changed in a completely different direction than the original google idea.

    Sorry for the tone of my message… I am so disappointed by the search engine I grew up with… Nothing personal to Matt or Vanessa here, who I am sure are great folks…

  29. Wow Vanessa!

    Well written. Great advice for anyone! Kind of reminds me a bit of some of the old school SEOs.

    I really like your comments on the Flash, but my feeling is that Adobe (formerly Macromedia) 🙂 needs to work closer with Google and the other SEs to get a better solution for indexing the content and not the animations and movies in Flash.

    I notice some companies using CSS content and a Flash Replace script to avoid a splash page that stops the visitor and makes them have to give an extra click (I personally hate that extra click and splash pages). Is this new Flash Replace methodology safe? I think it’s great, but I expect it will get used for spam and then eventually some SEs may start penalizing it. Any opinions on this are welcome!

  30. QUOTE: “* Don’t break your content up too much. Users don’t like to continuously click to get the content they’re looking for. Search engines can better know what your site is about when a lot of key content is on one page. When the content is broken up too much, it’s not as easily searchable.”QUOTE

    From a consumers point of view and I will use real estate content as the example.

    Most consumers will not nor do they want to read through a homepage that has 1500 or more words on the page. People’s attention spans are short, and they therefore lose interest after reading through about 500 to 700 words. You can judge this by the average amount of time a visitor spends on a page.

  31. Hi Matt,

    thanks for the collection of site-creating hints.

    Did you and Google ever thought about loosing white Hat Webmasters to the “Dark side”?
    The guys on wmw
    (see details on the last 3 comments on
    http://www.webmasterworld.com/forum30/33893-15-10.htm )
    discuss about this circumstance.
    Many White-Hat-SEOs with good unique content websites hire high-quality programmers and other expensive investments.
    Terrible Problems like the so-called “Supplemental Hell” (4 Weeks) and the ongoing mysterious site: and “sites dropping fast” Problem lowers their income to 20% of the January or 2005 level.
    So they seriously think about producing spamsites and/or creating black-hat sites for buffering their income for such Problems. (so called “go to the dark-side”)

    Sorry for my weird english – but i think, this is a serious reason for upcoming spamsites with “good quality”, because the sites were created by Professionals.

    What do you think about this “idea”? What think the other Comment-Writers?

    Greetings – Markus

  32. Bad example, as 1500 words on a real estate home page would be absurd.

    Don’t confuse what Vanessa’s point is with intelligent site architecture.

  33. Interesting post, thanks Matt and hiya Vanessa!

    Will read every word of this monday morning.

    Just don’t let some madman have access to your blog like Robert Scoble did while he focuses on his mental and physical health, guest bloggers are great but we do actually come here to visit YOU Matt. 🙂

    I have been meaning to look around and write up a “Who else is cool at Google” thing then harass them daily until they allow me to interview them.

  34. Hyphens better than underscores…. gasp. There is work to be done… I suppose changing thousands of pages from underscores to hyphens in one swoop wouldn’t be beneficial in the short term either. You can never win at this game. 😛

  35. Brian Mark, I’d always thought of them as interchangeable (?id= vs. &id=), but it could be that they’re not because of various regex matching. I’d try it both ways as an experiment. Personally, I should try to ask the crawl team to take a fresh look at that heuristic. Adding it to my list for the next meeting.

    Thanks Joe! I really like the skull and crossbones laptop pic:
    http://www.flickr.com/photos/53175402@N00/132679356/
    That sticker is from Guitar Hero. 🙂

    Gary Elliott, if you’re happy, I wouldn’t change or migrate the page. Don’t assume that just because we’re bolding in the african_elephant url that the elephant term is getting credit though.

    TOMHTML, point taken. I should really choose a shorter slug when I do obnoxiously long (but accurate) titles.

    Matt, I was hoping to do that post today, but catching up on my email is taking longer than I expected. My wife is out of town for a few days starting tomorrow, so maybe I can do it tomorrow.

    Wayne, it’s true that you can go overboard on long FAQ pages. Let me drill down. Okay, I just found the timeline feature of Google Desktop (bitchin’!). The page I was specifically talking about was http://www.corporatecasuals.com/main/faq.aspx
    Note how the FAQ has the questions up top, and they link to the answers, which are all on that same page? For a relatively short FAQ, a page like that provides plenty of material to match in a search engine. Putting each of those answers on a separate page would give you more pages, but because each page is shorter, it will be less likely to match what people really type in. The first answer, for example, has the words “Concord MA” on it, but not the word “embroidery.” The second answer has the word “embroidery.” So that FAQ page can match a query like [custom embroidery concord ma], whereas putting those questions/answers each on a separate page wouldn’t match that query.

    I hope that makes sense. Certainly you can overdo it if you carry that out too far, because few people want to read boringly long web pages. But in general if you have a kinda-short FAQ, I’d recommend doing it in the all-on-one-page style that http://www.corporatecasuals.com/main/faq.aspx used, unless you know of a good reason not to.
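
A toy model makes the FAQ argument concrete. The Python sketch below assumes a page matches a query only when every query term appears somewhere on that page, which is a deliberate simplification and not a description of how any real search engine ranks. The two answer strings are invented placeholders, not the text of the FAQ linked above.

```python
def matches(query: str, page_text: str) -> bool:
    """True if every query term appears as a word on the page (toy AND match)."""
    page_words = set(page_text.lower().split())
    return all(term in page_words for term in query.lower().split())


# Hypothetical FAQ answers: the first mentions the location, the second the service.
answer_1 = "We are located in Concord MA"
answer_2 = "We offer custom embroidery on shirts and hats"
query = "custom embroidery concord ma"

# Each answer on its own page: neither page contains all four terms.
separate = [matches(query, answer_1), matches(query, answer_2)]
# Both answers on one page: the page contains every term.
combined = matches(query, answer_1 + " " + answer_2)

print(separate, combined)  # [False, False] True
```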

  36. Thanks Vanessa for the nice post.

    Regarding the use of hyphens instead of underscores in the URL, how much an impact can it have in SERPs? I mean, would you invest your time replacing your underscores for hyphens? Or would you, say, invest it creating new fresh content?

    Actually, if I search for “Machu Picchu hotels” (http://www.google.com/search?hl=en&q=machu+picchu+hotels), hyphens don’t seem to dominate over underscores… is this example the exception to the rule?

  37. Good evening Matt

    You need to exercise more 😉

    http://static.flickr.com/48/132679356_39defd0a5d.jpg?v=0

  38. Matt,
    I just ran across this weirdness. What in the world is this????

    http://www.googlecom.com

    One of the links on it says something about Google’s homepage and that link is googlenet.com or something like that? Is this a joke or is someone getting ready to get a legal notice? Looks like there’s a duplicate of about every page that would normally be linked to Google.com….

    Jan

  39. Harith, that’s because I spend all my time fighting spam. 😉 Would you rather have a larger Matt or less spam?

    Charlotte, if you already have a system set up to do underscores and have links/rankings, it’s probably not worth changing things over. But if you’re starting on a new domain, I’d go with dashes.

  40. I really like your comments on the Flash, but my feeling is that Adobe (formerly Macromedia) needs to work closer with Google and the other SEs to get a better solution for indexing the content and not the animations and movies in Flash.

    I’m normally all for progress, but what scares me about this idea is that abominations like 2advanced.com will end up outranking sites that cater more to the users with content that doesn’t consist of needless animations and sounds.

    The problem with Flash is that it’s horribly misused in some cases. Flash is best used as an accent to an existing site, perhaps for a banner ad, interactive content, and of course Group X rock videos, but not for entire sites.

  41. Would you rather have a larger Matt or less spam?

    Less spam, because spam sucks.

    Oh yeah, and according to the question you’d live longer and be healthier if you got rid of the spam. But your health and well-being doesn’t matter to the rest of us. Just get rid of the spam. 😀

  42. Good morning Matt

    “Harith, that’s because I spend all my time fighting spam. 😉 Would you rather have a larger Matt or less spam?”

    Less spam, of course. But you shouldn’t be surprised if I write Mrs Cutts to hide your car keys and give you only one choice:

    Mountain bike… from home to the plex and visa versa 😀

  43. hi matt,

    is there a difference for google(bot) between these two urls:

    http://www.domain.com/news_person.php?person=456&name=bruce+campbell
    http://www.domain.com/news-person-456-bruce-campbell.html

    i assume the second one is ‘better’ (cleaner).

  44. Thanks Matt, shall wait for the post on the adsense bot.

    While on the topic of bots, there has been a reported sudden drop in googlebot visits and crawls of new pages for several sites.

    I am sure you must be aware of the issue.

    Among the sites I monitor:

    site a) googlebot visits drop massively from 500-1000 a day to 10-100. New pages are not crawled. Old pages retain rankings.

    b) well ranked niche site – initial page drop from index for a site: search, then remaining pages turn supplemental.

    c) powerless site left alone after creating it mostly – pages crawled, ranked, then suddenly pages vanish, old nonexistent pages reappear in index as supplemental.

    d) site that got a 100 crawls a day now gets 1-2, maybe 5 on a good day.

    In WW, some say that its only sites which had some kind of canonical issue that are seeing this problem. There has been no clarity on this issue. As far as I know, none of the sites have any kind of penalty and are as whitehat as they get – meaning, not even attempts at link-building – pure content sites.

    Is it something the webmaster has to fix? Something Google has to fix? Any information / advice on this would be a big relief.

    Matt (not Cutts)

  45. Matt,

    I have a serious new spam issue I would like you to know about. Its a trend dominating some UK financial serps. How best to report as I tried the generic spam report and nothing was done?

  46. Thanks for the post Vanessa!

    Can we say this is the definitive answer on the _ versus – in links debate now then?

    I am in the same position as many others I imagine and have fallen on the wrong side of the fence as my main site is built using underscores. However, I am doing well in Google for my important keywords so I am not going to create a load of 301’s to change pages to new hyphen based versions if Matt and others think it will not make a great difference but will certainly take this advice on board for my next sites!

  47. Hi,

    I have seen some sites using slashes to separate auto generated
    pages in this context. How does google treat that?

    alt 1) somesite.com/children/books/bob-the-builder.html
    alt 2) somesite.com/children-books-bob-the-builder.html

    of the two above, which would be preferable?

    also, a small followup question on the subject would be, if both are leading to serving the exact same content, will this make google choke, or redflag the site?

    Thanks in advance
    Andreas

  48. I would like to add to Matt’s comments above.

    Googlebot visits a few of my sites about the same as it used to, but it no longer updates all the pages. One page Googlebot visited last Tuesday, and also about 2 weeks before, but the old cached version remains.

    This seems to be a new behavior because Googlebot would visit a page and about 24-48 hours later the cached version would be updated.

    Some of these sites were plagued with the supplemental result problem, and now those pages have been just stripped leaving very few pages left in the index.

    Seems like I’m being penalized, but submitting a reinclusion request doesn’t really work since I don’t know why I’m being penalized or if this is just some new behavior.

  49. Andy, that sounds exactly right on _ vs. -.

    Chris and Matt, Bigdaddy will crawl differently than the original Googlebot. Are you willing to mention some concrete sites that I could check out?

    frank, those two would be nearly equivalent in Google’s eyes, in my experience. I agree the second one looks a little bit cleaner to my eyes though.

    Gary Elliot, can you leave a comment under a different name/email so that it isn’t pre-approved, and put *** DO NOT APPROVE *** right at the top, and then include the specifics? That will let me read it.

  50. Good evening Matt

    Are you still interested in receiving spam reports regarding keyword stuffing, gateway pages and hidden text? Are there any keywords that should be mentioned at the top of such reports?

    http://www.google.com/contact/spamreport.html

    Can you assure us that such reports will be looked at and that, if justified, prompt action will be taken?

    Thanks.

  51. “Chris and Matt, Bigdaddy will crawl differently than the original Googlebot. Are you willing to mention some concrete sites that I could check out?”

    Come on Matt, you must know there is a problem.

    Every webmaster I know has been wondering why they are losing pages, why some pages have turned supplemental, and why Googlebot isn’t crawling properly.

    Check the forums for post after post about these issues.

  52. With regard to the comment about http://www.googlecom.com

    I think that is just a domain alias – that domain is owned by Google itself.

    I have noticed that people sometimes miss off the dots in the domain name – and browsers can presume to add a .com to the end of domain names at times.

    Hence, typing in http://www.googlecom (minus a dot) will sometimes be rewritten by a browser (depending on its settings) to http://www.googlecom.com

    The big (scam) domain name is to buy up wwwdomain.com – minus the leading dot – which sort of points to http://www.wwwdomain.com

    Try it for big brand domain names – you’ll be surprised how many do not go where you expect.

    Try Sprint Nextel’s website – http://wwwsprint.com for example (sorry about the popups!)

    There is money to be made from mistyped domains – and the fingerslip miss of a dot when typing is quite a common error. I now typically buy the wwwdomain.com variant when buying a domain name as well. Just to be safe.

  53. Matt,

    I have been looking for a guideline somewhere concerning the length of a URL but have not found one anywhere.

    I posted a query on a couple of fora about the levels to which Google has crawled my site, as I am seeing none of my lowest-level pages indexed. If I type in the name of one of my pages (xxx-xxxxxxx-xxxxxxxxx.htm), I can find the parent page with the list of related topics but not the page that is most relevant.

    When this was discussed, it was posited that Google might consider the page and folder names too long and thus count them as spam. If this is the case, I would hope that the guideline we were breaching could be found easily and in the same place as the other webmaster guidelines: /webmasters/guidelines.html#quality, but I cannot find such advice.

    I know you don’t usually comment on individual cases, but could you give advice in general?

    Séan

  54. “And speaking of putting a dash in URLs, hyphens are often better than underscores [Ed. Note: bolded by Matt]. african-elephants.html is seen as two words: “African” and “elephants”. african_elephants is seen as one word: african_elephants. It’s doubtful many people will be searching for that.”

    Are you talking about the title tag or the URL?
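
The word-splitting behavior quoted above can be illustrated with an ordinary regular-expression tokenizer. This is only an analogy, not Google's actual parsing: it assumes a tokenizer that treats the underscore as a word character (as the `\w` class does in Python's `re` module), and the function name `url_words` is made up for this sketch.

```python
import re
from typing import List


def url_words(filename: str) -> List[str]:
    """Split a file name into 'words' using the regex word-character class.

    Hypothetical helper for illustration only. Hyphens are not word
    characters, so they separate words; underscores are, so they do not.
    """
    stem = filename.rsplit(".", 1)[0]  # drop the extension, if any
    return re.findall(r"\w+", stem.lower())


print(url_words("african-elephants.html"))  # ['african', 'elephants']
print(url_words("african_elephants.html"))  # ['african_elephants']
```

Under that assumption, the hyphenated name yields two searchable words while the underscored name yields a single token that few people would type into a search box.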

  55. Vanessa,

    Excellent round-up of the organic site session. Unfortunately I missed this PubCon, but I hope to make it over “the pond” next year, time permitting.

  56. You say: “Search engines want to return relevant and useful results for searchers. All you really need to do is look at your site through the eyes of your audience. What do they see when they get to your site? Can they easily find what they’re looking for? Webmasters are looking for some other secret key, but really, that’s all there is to it.”

    IF ONLY! Try looking for lyrics on Google. HUNDREDS of sites are returned. Try looking for lyrics to a specific song: the same HUNDREDS show up.

    But visit some. Oh, the song title’s there on the site. There’s even a page with the meta tags set to show the lyrics are there. But the page has NO LYRICS.

    The problem? Google, in its desire to “return relevant and useful results for searchers”, is doing anything but. It’s actually sending searchers off on wild goose chases all over the net because it is unable to monitor what these sites actually offer as content.

    So you end up with searchers finding site after site listing every song a singer ever sang, promising Google and the searcher the lyrics are there, and then turning up pages with no lyrics.

    If you rocked up at a car showroom where the signs said: “We sell cars, we have cars in stock, you can have a car now”, and then you found it was an empty showroom, you’d be p****d.

    But HOW do we get Google to listen to our complaints?

    I run a very modest web site for lyric hunters. But it delivers what it says on the can – lyrics. If we don’t have ’em, we don’t pretend to, unlike others who use Google to propagate the myth that they do.

    It’s time you guys invested some of the cash we fund you with (through AdSense etc.) by employing folks to actually check out web sites rather than simply rely on your bots which, with the best will in the world, are thick as two short proverbials.

  57. As with John’s note above, I have been submitting reinclusion requests that just don’t seem to get looked at. For some 2 years we have been redeveloping the site to get away from affiliate content, and now it is 90-95% unique. Even after this investment and development, we are losing pages in your index, not gaining them. What are we supposed to do about this?

    Unique content on and around the subject matter. Internal text links to the pages that say what the pages are about. Unique titles and meta tags that once again say what the pages are about. What more can one do?

  58. Good to see common sense practice being confirmed by the authority on such things.

    Couple of points…

    Underscores in URLs also disappear when you underline them (and they look like a space). Another reason why I prefer dashes.

    And I don’t understand why there is such a discussion about ?id=45 vs &id=45 at all. Both SUCK as URLs for a public website, so instead of trying to figure out which format is best, go read up on mod_rewrite (Apache) or ISAPI Rewrite (IIS).
    Apart from replacing that ugly question mark with Google-friendly keywords, it has the added bonus of being good for the user. As Vanessa rightly pointed out, what is the user going to expect to find when they click on “african-elephants.htm” compared with “index.php?id=45”?
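
The idea of swapping "index.php?id=45" for a keyword URL can be sketched in a few lines. This is not mod_rewrite or ISAPI Rewrite configuration, just a minimal illustration of the mapping such a rewrite layer relies on; the page table and the names `slugify`, `friendly_url` and `resolve` are assumptions invented for the example.

```python
import re
from typing import Optional

# Hypothetical page table: numeric id -> page title.
PAGES = {45: "African Elephants", 46: "Asian Elephants"}


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))


# Reverse lookup used when a keyword URL is requested.
SLUG_TO_ID = {slugify(title): page_id for page_id, title in PAGES.items()}


def friendly_url(page_id: int) -> str:
    """Build the keyword URL to publish instead of index.php?id=<page_id>."""
    return "/" + slugify(PAGES[page_id]) + ".htm"


def resolve(path: str) -> Optional[int]:
    """Map an incoming keyword URL back to the internal numeric id."""
    slug = path.strip("/")
    if slug.endswith(".htm"):
        slug = slug[: -len(".htm")]
    return SLUG_TO_ID.get(slug)


print(friendly_url(45))                   # /african-elephants.htm
print(resolve("/african-elephants.htm"))  # 45
```

In a real deployment the rewrite rule on the web server would hand the requested path to something like `resolve`, so the database-driven page keeps working while visitors and crawlers only ever see the readable URL.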

  59. Thanks Matt and Vanessa for another great post. It is most amusing to read these thread comments though, and notice that even when senior Google staffers themselves spell out widely known ‘SEO 101 facts’, many web designers ‘shoot the messenger’ and question Google’s proven system rather than accommodate it.

  60. Although it’s been a year since this conference and the 2006 conference is only 3 months away, these guidelines still look fresh, with all the important tips and careful advice.

    Thanks Vanessa for your guest post at our favorite blog; it has surely helped many people.

  61. I started a little experiment to check the claim that google ignores the id parameter in URLs at .
    The setup is simple: The script accepts a parameter “id”, “page” or “pid” to select content – the content is different for different values of the parameter, but it doesn’t matter which parameter is used to select the content.

    After two weeks, all combinations of parameter names and values have been visited by Google’s and MSN’s bots and the content can be found via Google and MSN. A Google search shows the pages with the “id” parameter, while an MSN search shows the pages with the “page” parameter.

    So, I conclude that the claim that Google ignores the id parameter is false. In fact it seems to prefer the URL with the id parameter to other URLs with the same content (maybe because it is shorter).
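
The setup of the experiment described above might look roughly like the following. This is a guess at the shape of such a script, not the commenter's actual code: the content table and the name `select_content` are hypothetical, and only the three parameter names come from the comment.

```python
from typing import Optional
from urllib.parse import parse_qs

# Parameter names from the experiment; any of them may select the content.
PARAM_NAMES = ("id", "page", "pid")

# Hypothetical content table: different values give different content.
CONTENT = {
    "1": "Content for item one",
    "2": "Content for item two",
}


def select_content(query_string: str) -> Optional[str]:
    """Return the content chosen by whichever known parameter is present."""
    params = parse_qs(query_string)
    for name in PARAM_NAMES:
        values = params.get(name)
        if values:
            return CONTENT.get(values[0])
    return None


# The same value selects the same content regardless of the parameter name.
print(select_content("id=2") == select_content("page=2") == select_content("pid=2"))  # True
print(select_content("id=1") == select_content("id=2"))  # False
```

Because every parameter name serves identical content for a given value, which of the three URLs a search engine chooses to show says something about how it treats that parameter name.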

  62. Oops, the URL got lost because I enclosed it in angle brackets:
    http://www.hjp.at/tests/google/paramtest.cgi

  63. Hi Matt,

    Thanks for your tremendous insight. I am wondering about your comment pertaining to subscription-based content…
    “Googlebot and other search engine bots can only crawl the free portions that non-subscribed users can access.”

    If this is the case, how is it that big sites like the New York Times and Wall Street Journal seem to have some content indexed that is only accessible via login/subscription? I don’t notice it as much today as I have in the past.

    Here’s one example in the search results:
    WSJ.com – Special Report – Breakaway: Focus on Small Business
    It can be a drag going on vacation with small-business executives. … See previous editions of Breakaway: A Focus on Small Business: …
    online.wsj.com/documents/breakaway2001-4.htm

    Are they using cloaking? Thx!

  64. Hello Matt,

    I’m quite happy to have discovered your blog via Aaron Wall’s website. Postings like this one are very helpful and for this I thank you. Now I will be following the insights and useful information that Vanessa Fox has to share. Your apparent desire to help like minded people is admirable. I just wanted to let you know that your efforts are appreciated.

  65. Hi Matt and Vanessa,

    Thanks for the information. I am always looking for ways to improve and it is articles like this that help me to do so. Thanks for sharing your wisdom.

  66. Can someone name me one single word in any dialect of English (apart from Ebonics, perhaps) which includes an UNDERSCORE? In fact, I’m unaware the underbar appears in ANY natural language word. So how come search engines have promoted this symbol to express a meaning it has never had in the real world? — acknowledging, of course, that search engine logic is never inevitably connected with the real world.

    A bit of history: several generations of *nix folk have been DELIBERATELY using underscore in file names rather than dashes to assure these names are unambiguously understood to be spaces, and they have likewise avoided using dashes for that purpose because many SINGLE words in natural language DO include a dash. Underbars may be reliably understood to indicate word separators, therefore, whereas dashes most certainly cannot.

    So while it is reasonable to have search engines treat a dash as a space in domain names and other URL elements (which, like *nix, do not allow literal spaces), deliberately wiring search engine logic to regard underbars as characters in human words is spectacularly bizarre and perversely counter-intuitive, since underbars never appear within words when used by real people.

    So if search engine logic is actually broken in this way, the appropriate response is not to merely explain it away like a spring thunderstorm, but to open up the source code — and fix it.

  67. I was not at this session at Webmasterworld about organic site review but Vanessa Fox (who I spoke with a couple of times at the WMW conference) was and made some pretty good notes – and I’m including those notes

  68. Thanks Vanessa for useful tips!

  69. Vanessa, can you share the link to your own blog?

  70. Vanessa, you are the BEST! 🙂
    thanks!

  71. I’ve had some good results with africanelephants.htm – but I take it Google sees this as one word?

    Still seems to work, and Google can definitely pick out the word (according to SERPs) – hyphens do seem more intuitive though….

  72. I have read the entire topic and comments, and I must say it is a really useful insight into the way Google sees URLs. I have always known Google and other search engines preferred african-elephant.html to id=154.

    I thought this was a thing of the past (like in 2002), but it seems this practice still stands, so it makes sense to change all our links from the id=458&jde=144 style to the african-elephant style.

    But I am still confused about the session IDs used by amazon.com in their URLs. How come they do not get penalized by search engines for using them? They must serve an immense amount of duplicate content. Or do they serve different pages (without the session ID) to bots?

  73. Good site. How to start a presentation design. OK

  74. “…However, my preference is to use dots in URLs, something like http://www.domain.com/the.folder/the.sub.folder/some.page.html and that also works very well too.

    In fact, dots, commas, colons, and hyphens all appear to work the same way as each other…”

    Can anybody confirm this?

    thanks
    fox
    Switzerland

  75. Thanks Vanessa for the useful tips, and thanks a bunch for a very informative, educational post.

  76. Thank you very much. This information makes webmasters’ work easier!

  77. “I’m getting married!”

  78. Bad example, as 1500 words on a real estate home page would be absurd.
    Don’t confuse Vanessa’s point with intelligent site architecture.

  79. Okay..
    Firstly, hi folks.
    Secondly, thank you for the fantastic article.

    Now to get cranky 🙂

    So, does no one else spot how damaging the distinction between a – and a _ (hyphen/underscore) is going to be to a lot of sites?
    There are some out there that have been following a naming convention for several years… there are others using CMS’s that use one or the other with no real choice between the two…

    … and someone(s) somewhere decided to go ahead and make such a fundamental and serious decision without making it widely known?

    Just wondering if anyone else finds that sort of thing not only annoying and concerning, but more than a little … well, irresponsible ?

    Regards,
    Lyndon

  80. Hi Matt (and Vanessa),

    I’m interested in this comment:

    “You could make the entire article available for free to users who access it from external links and then require login for any additional articles.”

    We plan to offer free books online but require a user to register to read beyond the first chapter. I’m a little confused by the suggestion above. If we link to the book from another website, track the HTTP referrer and allow the whole book to be viewed when people come from that site, but not from other sites, would that be helpful for SEO or would that be considered cheating?

    Thanks!

    Nat

  81. Nice post… hope it will help… But I get better results for index.html than anyone else!!

  82. Interesting point on the short FAQ pages. I hadn’t really thought about the fact that longer pages might get better results (more visits) due to the fact they have a wider range of keywords. I’ve always worked on trying to concentrate subjects so they would rank on more specific terms.

  83. Hello,
    Thanks Vanessa for the nice post. I also agree; I’m getting more results compared to anyone else!!

    Thanks for your attention and prompt response to my letter.

    Regards
    Alex Bell.

  84. Misleading information, Matt! “&id=” is indeed typically a session id; however, “?id=” is typically a page and is 100% indexable and used by millions of database-driven websites.

  85. The hyphen is better, and the information Matt has provided is true, but it is a Google-only bug/feature, as the underscore is a well-used separator.

  86. Which would be better: the hyphen, underscore, directory, etc.? Also, why do I get so much trouble for not having dates in my sitemap – it’s not accepted, but for me as the user, I don’t want them…hmm

  87. Vanessa and Matt,

    Thanks for this post. I’m not an expert in websites. I’m a volunteer for a spiritual website that we maintain to update the devotees on our mission activities, events, etc.

    Recently, I’ve been concerned about our website not being listed by search engines (only a few pages are displayed). In this post I’ve found that “hyphens are often better than underscores” – I’ll start implementing it, as most of the pages I’ve created have underscores. I also created a sitemap.xml and submitted it to Google using webmaster tools 🙂

  88. Thanks for the good information. I’m getting better results.
