- Oct. 4th, 2011
After SMX, everyone is scrambling to support Schema.org's embodiment of Microformats. Our customers are asking us to implement it, but I don't think everyone is thinking this through. What do you think? This is what I think:
1. It's Sex for Spammers
Take a Mozilla Instance, fire up jQuery, walk the internet as you please, and select the same things over and over again. It changes a whole lot for spammers because you can write exactly one commodity spammers' toolkit to handle exactly 1 Microformat standard (Schema.org) and get reliably clean data. It hurts you. It hurts Google. It hurts the internet. There is no way to avoid this.
Vendors will be trivially ripping off other vendors for their painstakingly massaged data. But Google doesn't care about that. To Google, information is free. Information is free, but facilitating plagiarism on a massive scale of refined embodiments of that information is a big deal. Furthermore, the synthesis of the information is copyrighted. We all know vendors scrape other vendors' product PDFs, descriptions, etc., but up until now it required writing some custom code per site. Tools like Boilerpipe and DiffBot make this easier, but with Microformats this becomes trivial for even amateur spammers with 1 silly little toolkit.
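To make the "one toolkit" point concrete, here is a rough sketch of how little code it takes once everyone marks up their pages the same way. It uses only Python's standard `html.parser`. The two vendor pages, the class name `ItempropExtractor`, and the helper `extract_itemprops` are all made up for illustration, and the extraction is deliberately naive: it flattens nested items and ignores `itemtype`. A real scraper would do more, but the idea is the same: the page template no longer matters, only the `itemprop` attributes do.

```python
from html.parser import HTMLParser

# Tags that never get an end tag, so we must not wait for one.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class ItempropExtractor(HTMLParser):
    """Collect a flat itemprop -> [values] mapping (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.props = {}   # property name -> list of extracted values
        self._open = []   # text captures still waiting for their end tag

    def _record(self, name, value):
        self.props.setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # A same-named tag nested inside an open capture needs its own end tag.
        for cap in self._open:
            if cap["tag"] == tag:
                cap["depth"] += 1
        name = attrs.get("itemprop")
        if name is None or "itemscope" in attrs:
            return  # not a property, or a nested item whose children we visit anyway
        if "content" in attrs:                      # e.g. <meta itemprop content>
            self._record(name, attrs["content"])
        elif tag in ("a", "link") and "href" in attrs:
            self._record(name, attrs["href"])
        elif tag == "img" and "src" in attrs:
            self._record(name, attrs["src"])
        elif tag not in VOID_TAGS:                  # value is the element's text
            self._open.append({"tag": tag, "name": name, "depth": 0, "text": []})

    def handle_endtag(self, tag):
        still_open = []
        for cap in self._open:
            if cap["tag"] != tag:
                still_open.append(cap)
            elif cap["depth"] > 0:
                cap["depth"] -= 1
                still_open.append(cap)
            else:
                # Collapse whitespace and store the captured text.
                self._record(cap["name"], " ".join("".join(cap["text"]).split()))
        self._open = still_open

    def handle_data(self, data):
        for cap in self._open:
            cap["text"].append(data)


def extract_itemprops(html):
    parser = ItempropExtractor()
    parser.feed(html)
    parser.close()
    return parser.props


# Two invented vendor pages with completely different templates.
VENDOR_A = """
<div itemscope itemtype="http://schema.org/Product">
  <h1 itemprop="name">Acme Widget</h1>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <span itemprop="price">19.99</span>
  </div>
</div>
"""
VENDOR_B = """
<table itemscope itemtype="http://schema.org/Product"><tr>
  <td><b itemprop="name">Bolt Gizmo</b></td>
  <td><meta itemprop="price" content="5.00"></td>
</tr></table>
"""

for page in (VENDOR_A, VENDOR_B):
    print(extract_itemprops(page))
```

The same function handles both pages without knowing anything about either vendor's layout, which is exactly the per-site custom code that used to slow scrapers down.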
2. It's Broken
For example, the markup for breadcrumbs is idiotic. This was obviously a rush job, and it's not a community effort. Nope. Google owns it, much like Sitemaps. See #4.
The test tool doesn't work, ironically, mostly for the Schema.org format. The preview rarely, if ever, works, and it doesn't even understand the (broken) breadcrumb microformat.
I would link directly to the part of the page that describes the specification for breadcrumbs, but their markup is ridiculously bad. I'm not saying I'm a saint, but if you're going to preach about the semantic web, at least use semantically-meaningful markup.
3. It's a Cop Out
Google is admitting its natural language processing just isn't there yet. Much like rel=canonical and friends, this is basically an admission by Google that it can't really figure this stuff out, so you'll just have to do it. It's like asking your friends to diagram their sentences.
Buzz (VB.) off (ADV.)?
The next step is that they get everything force-fed to them by you, use it for Google Products, and don't even want or need your navigation. Trust me, that's next.
On a truly semantic web, Google can take your ItemPages, but ignore all of your CollectionPages. After all, they have your SKU, product name, description, and price. They don't need you. Once everyone realizes how toxic this can be, it will be too late. Much like those people who vow they won't shop at Walmart, the follow-through never really materializes. They're not going to remove the data.
4. It's a Power Grab
See #2. This is a rush job. RDFa-based approaches are much cleaner, but we have to do what Google says. Google also laid down the law on Sitemaps, if we recall. Google, after all, wants to organize the world's information, so it's natural that they'd also want to control the underlying format. Some others have noticed this. See:
5. It's Yet Another Thing To Do & Maintain
When you re-template, you're going to have to do it all over again. No, you can't just throw all the data in a hidden DIV.
God, er, Google said you'll be turned into a pillar of salt if you try to make your life easier. Whenever you modify your document, you're just going to have to make sure you don't mess up the microformat data.
I'm not in love with this. Not at all, but on some level I'm also disagreeing with Tim Berners-Lee, so take this with a grain of salt. Joost de Valk's blog has only positive thoughts on this topic. That's not to say there aren't positive aspects. It's just pretty clear that there are things to worry about.