Google Europe Blog: Working with News Publishers

Working with News Publishers

Wednesday, July 15, 2009

declaration

User-agent: *
Disallow: /

<meta name="googlebot" content="noindex">

unavailable_after
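
The two robots.txt lines quoted above deny every crawler access to every path. As a minimal sketch of how a compliant crawler might read them, the snippet below uses Python's standard `urllib.robotparser`; the helper name `may_fetch`, the crawler names, and the example.com URLs are assumptions made up for illustration and do not come from the post.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt quoted in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and both crawler names are placeholders for illustration.
print(may_fetch(ROBOTS_TXT, "Googlebot", "https://example.com/news/story.html"))  # False
print(may_fetch(ROBOTS_TXT, "AnyOtherBot", "https://example.com/"))  # False
```

Both checks return False because `User-agent: *` applies to every crawler and `Disallow: /` matches every path on the site.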
Image: 'Robots wallpaper,' Jelene (Creative Commons Attribution)

Update on 7/20/2009: The word "crawling" in the fourth paragraph has been replaced with "indexing."

Posted by Josh Cohen, Senior Business Product Manager

31 comments :

The Asnah's Journey - July 15, 2009 at 10:37 PM
Totally agree! Publishers should wake up and reconsider how to stay with the rest of the people who are always 'online' in a humble and sustainable way...

Stop ego-ing and they will survive the downside of the printed media era... pray for them!

Thesingh - July 15, 2009 at 11:10 PM
Very well and politely phrased reaction. I'm interested if anyone from the new industry will respond.

Unknown - July 16, 2009 at 12:19 AM
I believe the problem is not Google, search engines, or crawlers; it is the content business model that has to be reviewed. Fragmented advertising alone does not pay for quality content (created by someone who makes a living doing so). Google will shortly become the largest content seller once publishers start to offer subscriptions or micropayments for content; that's my end game.

Andy Beard - July 16, 2009 at 12:41 AM
You might want to be a little more accurate with your descriptions.

A page blocked with robots.txt can still appear in the SERPs, but Google won't have crawled it.
The snippet would use either link anchor text or DMOZ title for the title, and DMOZ description if available.

With noindex, Google will still crawl the page, and its links can confer PageRank to other pages.
The page won't appear in the SERPs.

For Noindex to work, Google has to have access to a page, thus if you mix both the robots.txt disallow directive with meta noindex, Google will obey the robots.txt, and thus can't read the noindex.
The page can still appear in the SERPs, with title from anchor text or DMOZ.
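
The interaction described in this comment can be sketched in a few lines of Python. This is a simplified model, not Google's implementation: the function name `crawler_outcome`, the outcome strings, and the sample path are all invented for illustration. Only the standard-library `urllib.robotparser` is used to read the robots.txt rules.

```python
from urllib.robotparser import RobotFileParser

BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # deny every path
ALLOW_ALL = "User-agent: *\nDisallow:\n"    # an empty Disallow permits everything


def crawler_outcome(robots_txt: str, path: str, page_has_noindex: bool) -> str:
    """Describe what a compliant crawler could do with one page.

    Simplified model of the behaviour described in the comment above.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch("Googlebot", path):
        # The page is never fetched, so a noindex tag inside it is never read.
        return "not crawled; URL may still be listed from link data"
    if page_has_noindex:
        return "crawled; kept out of search results"
    return "crawled; eligible for search results"


# Mixing both mechanisms: the robots.txt block wins and the noindex goes unseen.
print(crawler_outcome(BLOCK_ALL, "/story.html", page_has_noindex=True))
# Crawling allowed: the noindex tag can be read and honoured.
print(crawler_outcome(ALLOW_ALL, "/story.html", page_has_noindex=True))
```

The ordering of the two checks is the whole point: the robots.txt test comes first, so the noindex flag only matters for pages the crawler is allowed to fetch.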

Anonymous - July 16, 2009 at 1:38 AM
A reaction in style. I'm eagerly waiting, as the tension builds, to see what the news industry has to say in reply.

Anonymous - July 16, 2009 at 1:50 AM
The problem is that the Robots Exclusion Protocol is far too binary. A publisher must either allow all uses or none.

Those who, like the author of this article, want to think that the solution is so simple are not looking at the problem from all sides.

Denis Gorodetskiy - July 16, 2009 at 2:38 AM
Kick 'em off, Google!

Unknown - July 16, 2009 at 5:46 AM
I'm surprised there was no mention of ACAP, the standard for robots.txt files that the World Association of Newspapers has been working on with, as I understood it, Google and others.

Unknown - July 16, 2009 at 6:15 AM
Big news media can't have it both ways.

If they want to play in the social media sandbox, they will have to play by social media rules as they evolve.

They don't want to be indexed?

Perfect.

More eyeballs for me.

Maurice Cardinal
Editor: www.OlyBLOG.com

Viajar Temporada - July 16, 2009 at 6:23 AM
Well Google invades the sites, many people do not know how to block indexing.

We have seen cases on the Internet of sites being hacked even before they launched, because Google indexed their pages without permission.

I believe the way Google works is deeply flawed in some respects, including the exploitation of other people's content. We know that small businesses cannot grow when so much of their spending goes to advertising within the search engines (where the content is used so wrongly).

I cannot be unfair and say that Google is a bad company, but I believe that the internet goes beyond online advertising. The business model used by Google is no different from that of old media, and I believe it will become obsolete just as old media has.

Besides that, if you have good content you are competing with low-quality sites that are only there because they pay for some of the links. Even with an entire verification process within AdWords, quality is still very imperfect: some searches turn up little content in the search engines, and there you can see a clear difference in quality. Even if the service is free, there could be a better way to find the company paying the bills.

Anonymous - July 16, 2009 at 6:33 AM
Ah, you guys are cool-tempered. I would have just removed Burda Media from the index after they complained.

EZ - July 16, 2009 at 7:01 AM
I've been in the newspaper business since 1989. I was on the internet before most newspapers were, before there was even a World Wide Web. I always thought that as soon as publishers started to realize that they were losing money because they had content available for free they would just pull it back.

Some have and some haven't. The NYT has gone back and forth. The Wall Street Journal has always had a pay wall. Neither is doing particularly well.

That's because it's not about the price charged for the content; it's about selling the audience.

Newspapers are in trouble because they forgot how to sell audiences to advertisers.

When I worked as a newspaper circulation executive my goal was to make enough revenue to cover variable costs of printing and distribution, in essence making it free to distribute. Advertising sales had to cover all of the overhead - salaries, benefits, building, maintenance, presses, trucks, news gathering, etc.

This is not unlike TV, radio or even the internet. Radio and TV always gave away content because once the studio was built, the contract with the talent was signed and the transmitter was in place, there was very little cost to deliver the content.

However, that content was sponsored so all of those other fixed costs could be paid with enough left over for profit, pension plans and a decent Christmas party.

This is what newspapers need to relearn. They can block Google or any other search engine from indexing their content so they can charge for it, but that's just not going to generate enough money to run the rest of the operation. Newspapers need to engage audiences and sell those audiences to people that want to reach them. Until they start doing that again they will never be profitable. People are just not going to pay enough for content to make up for all the losses in advertising dollars newspapers have seen.

Hiding from Google isn't going to help. Finding new ways to delight and amaze audiences, and proving to advertisers that your audience is delighted, will.

kochi64 - July 16, 2009 at 8:36 AM
Looks like a fair solution to me. Media have a choice.

Michael Andersen - July 16, 2009 at 8:43 AM
Well, I'm in the news industry (and let me tell you, it's an old industry) and I also agree completely with the above.

That said, it's worth acknowledging that the newspaper/magazine publishers do have a valid argument: Google is more valuable because of their content, and maybe there ought to be a half-measure between "block all bots" and "search engines crawl everything for free."

Maybe Google thinks there's no profit for them between those two extremes. And they're probably right. But let's not pretend traditional publishers are the only ones making a choice here.

Unknown - July 16, 2009 at 8:50 AM
@Brendan:
How should the news industry respond to that polite and *right* article? "Sorry, we didn't do our homework and didn't read the f....g manual"?

Two lines of code. As easy as this. Nothing more. End of discussion.

I think the news industry has to learn that it can earn money even on the web, but under new and different conditions than in recent decades. Publishers have to change, and they have to be willing to change. Otherwise, their companies will die.

Yours,
Thomas, Munich, Germany

Elmar - July 16, 2009 at 9:34 AM
There is a fundamental difference between a) indexing web pages and then directing traffic to them and b) using (parts of) the content on your own sites.

Unfortunately, Google's reply completely disregards this aspect, which IMHO is the true core of the problem.

Elmar Thiel, Hamburg, Germany

Unknown - July 16, 2009 at 3:22 PM
Way to go, Google!

vanderleun - July 16, 2009 at 6:27 PM
"Well Google invades the sites, many people do not know how to block indexing."

So, you're saying that people who know how to build a web site don't know how to block indexing? Sorry but that doesn't make any sense at all.

One of the stalking horses here is typical of Europe. Wanting someone else to pay.

What would make these news sites happy would be if they could force Google to pay them for content they won't keep off the web. A share of google ad money for content.

That's what's up here. The quest for easy money.

Patrick A. Goff - July 16, 2009 at 9:04 PM
But what if you are a publisher of original news and feature stories and Google refuses to carry your content whilst carrying that of your competitor publications? They have an anti-European, pro-American bias. They should either publish all news stories in a subject area or none, and be impartial. Currently Google is myopic and very US-oriented. They should be barred from all European news media until they operate fairly.

Anonymous - July 16, 2009 at 10:14 PM
Let them die in peace and rest in history.

Unknown - July 16, 2009 at 11:17 PM
A VERY elegant answer! The Google search results page is also an aggregator. Anyone who doesn't want to be aggregated can easily avoid it.

Igor Loginov

Unknown - July 16, 2009 at 11:29 PM
Hmmm . . . but what happens when Google ignores a Robots.txt protocol?

This . . .
http://healthkey.com/robots.txt

Versus this . . .
http://www.google.com/search?source=ig&hl=en&rlz=1G1GGLQ_ENUS263&=&q=site%3Ahealthkey.com&aq=f&oq=&aqi=

I see it ALL the time, btw. ;-)

Brent D. Payne
SEO Director
Tribune Company

Douglas Carnall, @juliuzbeezer - July 16, 2009 at 11:36 PM
I heartily concur with your article. If publishers don't wish their content to be online, there is a simple solution: take it off.

Underlying this, I wish Google would NOT serve links to material that is hidden behind toll access barriers, or at least make a browsing option that hides this from view.

When I use the internet I want FREE information. I know that there is more information out there on a topic published commercially, but if I wanted to buy that I would be looking in a library or bookshop.

It seems to me publishers want it both ways: they want Google to generate demand for their product by placing it in searches, then hide it behind toll access so they can charge for it. This isn't how the internet works: the internet is primarily about free access to information. If you don't want to play in this game, that's fine: please do go and consign yourself to irrelevance.

L ' Individu - July 17, 2009 at 12:24 AM
Totally agree.
No publisher is forced to have a website, so get off the internet if you don't like it, but don't try to force others to follow your dead rules. The rules of the internet have been clear since Sir Berners-Lee created the WWW a long time ago, and no businessman has the right to change them.

Lhasa-Apso - July 22, 2009 at 1:45 PM
There is a lot you can do with robots.txt, such as specifying which directories are indexed and which are not...
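
As a rough illustration of per-directory rules, the sketch below parses a hypothetical robots.txt with Python's standard `urllib.robotparser`. The directory names `/archive/` and `/subscribers/`, the crawler name `ExampleBot`, and the helper `allowed` are invented for this example. Strictly speaking, these rules control which paths a compliant crawler may fetch; keeping a fetched page out of the index is the job of the noindex meta tag.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the directory names are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /subscribers/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str) -> bool:
    """Return True if a crawler calling itself ExampleBot may fetch path."""
    return parser.can_fetch("ExampleBot", path)


print(allowed("/news/today.html"))          # True: no rule matches this path
print(allowed("/archive/2008/story.html"))  # False: under /archive/
print(allowed("/subscribers/login"))        # False: under /subscribers/
```

Each Disallow line is a path prefix, so a publisher can open some sections to crawlers while closing others within the same file.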

Socke - July 24, 2009 at 1:32 AM
It's not about Google; the name is just used as a synonym for the internet by people who cannot adapt to new distribution channels.

They claim that their "quality journalism" has to be protected as their source of income. The internet, and especially the search engines, help us consumers to find the source of most news: the big agencies like Reuters! Most of the "quality journalism" is copied word for word from the agencies' tickers.
In the days of dead tree publishing we rarely found out.

A lot of the rest of the "quality journalism" is copied from blogs and social websites, often without credit to the author and no payment thrown in as good measure.

I have to subscribe to the printed edition of the local paper so I can subscribe to the online edition at extra cost, this just doesn't make any economic sense! Especially when said paper has not much more to offer than the agency news.

Oh, and they have a sports journalist who used one of my pictures without my permission!

Anonymous - July 28, 2009 at 6:14 AM
Great one.. Thanks for posting..

Work from home

Vijay - August 11, 2009 at 8:07 AM
Dear Josh Cohen,

I have a site that I did "protect" from indexing, from the very beginning, with a robots.txt file, i.e.:
User-agent: *
Disallow: /
This robots.txt file is still there. However, my site content is showing up on Google.

Whilst I agree that publishers should protect their content if they don't want Google to index it, it seems to me, at this point, that Google has ignored the robots.txt file. Can you help me figure out how THAT happened and what I can do to have it fixed?

Thank you

RNB Research - September 5, 2009 at 2:53 PM
Hello I just entered before I have to leave to the airport, it's been very nice to read your post, it is very interesting and very informative. I liked it!!!!!!

Anonymous - November 5, 2009 at 10:46 AM
hi! This template is simply super.... website development