Monday, July 06, 2009

Follow Me On Twitter: I'm Changing My Sharing Pattern

While I've just had a burst of posts after digging out from a busy and successful second quarter, thanks to:
  • My newfound love of bit.ly and the ease with which you can Tweet from bit.ly
  • My deep backlog of blog posts; I actually blog on about 1/4 of my candidate articles
  • The realization that I can provide valuable sharing with a short comment and a link to a story
I've decided to change my sharing pattern and share more work-related items via Twitter. So my Tweet rate and my Tweet/post ratio have both already started increasing.

Historically, I viewed Facebook as about friends, the blog (and LinkedIn) about work, and Twitter somewhere in between. Going forward, I'm going to view Twitter as an extension of the blog and the only relic of my past confusion will be my username, inspired by the Allman Brothers.

XQuery's Real Potential: Transforming Application Development

Oracle's Daniela Florescu gave a talk at UC Irvine this May entitled The Magic Is In The Glue: XQuery + Cloud. A coworker pointed me to the slides here, which I thought I'd share in a more convenient SlideShare-based format and agree with violently in more than a few respects.

First, let's excerpt the abstract to tee things up:
Ten years have passed since the W3C initiated its effort to design a query language for what, in 1999, was a new and controversial semi-structured data format, namely XML. A decade (and a lot of effort) later, the (now programming) language and its implementations are finally reaching industrial strength and are being taken up by customers as a solid alternative for building complex applications.

Meanwhile, independently of the development of XQuery, and completely orthogonal to any programming language or application development infrastructure, a new buzzword is becoming more and more visible in the IT arena: the "Cloud." In this talk I will describe the poor state of current application development, which has serious limitations and inconveniences, and I will explain why, today, innovation in this area is unavoidable. The applications bubble is about to burst: existing software components, architectures, programming languages, database models, and communication protocols are under significant pressure to change.

I will argue that a combination of those two important technologies, "XQuery + Cloud," might provide a breakthrough in the area of application development infrastructure.
Because, as Daniela points out, cloud is orthogonal, I'm not going to explore that angle here. (But promise I will in the future as I agree that XQuery in the cloud is a great idea.) Instead, I want to focus on the transformative power of XQuery on web application development.

Frankly, when most people start to use MarkLogic they don't do so because of the potential to transform application development. They come to us because they are having trouble storing, searching, querying, and delivering XML content or semi-structured data. Only after they have built a few applications do they realize -- hey, wait a minute, I could build my entire application in XQuery and replace my RDBMS, my enterprise search engine, and my J2EE application server in one fell swoop by building my applications in top-to-bottom XML.

To be clear, not all of our customers do this. Many are content with the rest of the stack and use MarkLogic to help with XML heavy lifting. But a growing fraction do.

Let's examine some of Daniela's points:
  • The problems with the traditional application development stack she highlights on slides 9-12: high cost, inflexibility, and slow time to market.
  • The argument for XML on slide 16: spot on.
  • (Warning: code on slides 19-26, but keep clicking.)
  • Slide 29 is a reasonable argument in favor of XQuery.
  • The whole permissiveness angle on ACID transactions on slides 32-33 is new to me so I need to think about it more. MarkLogic offers ACID transactions, by the way. But I like the idea (in part because it's good critical thinking) that perhaps the database community is too dogmatic in this regard and that we pay a high price for that dogma.
  • Feel free to skip over slide 35 entirely (kidding). I think it mostly summarizes as "XQuery is relatively new" and there is no totally free lunch. Over time the holes will be filled in and MarkLogic fills in several holes already. I don't think XQuery is particularly complicated and I'm certain it's a heck of a lot less complicated than SQL/XQuery Franglais queries that RDBMSs often want you to write to access XML columns. I've seen real deep experts argue for hours over the correct semantics when you're mixing SQL and XQuery. Stay away from that.
  • While I'm mostly skipping cloud in this post, I have two comments. First, internally we run some demo systems with MarkLogic installed (quite easily) on Amazon Web Services, so as consumers we like the model. Second, the other day I met with Chris Barbin, CEO of Appirio, and thought he was a fascinating guy and Appirio a fascinating company. Among other things they help you, at a strategic level, to figure out which cloud services to use where, and how to link them to each other and to your on-premises infrastructure. In a world where you can rent anything from raw disk blocks to CPU to database to applications to application platforms to BI in the cloud, it surely helps to have a strategy.
  • But my favorite slide is back at slide 28 which shows "XQuery's real potential: standalone programming language for information intensive applications [which can let you] build extremely rich applications." I couldn't agree more. And I like the picture even better. It's what we call top-to-bottom XML.

I've embedded the entire presentation below via SlideShare. The original link off the UCI website in PowerPoint format is here.

Sunday, July 05, 2009

New York Times on the Changing Ways of Silicon Valley PR

Quick post to highlight this New York Times story, Spinning the Web: PR in Silicon Valley. The article starts with the story of a start-up first pondering whether to pitch the big tech blogs like TechCrunch, and then deciding not to.

Excerpt:

Instead, [publicist Brooke Hammerling] decides that she will “whisper in the ears” of Silicon Valley’s Who’s Who — the entrepreneurs behind tech’s hottest start-ups, including Jay Adelson, the chief executive of Digg; Biz Stone, co-founder of Twitter; and Jason Calacanis, the founder of Mahalo.

Notably, none are journalists.

This is the new world of promoting start-ups in Silicon Valley, where the lines between journalists and everyone else are blurring and the number of followers a pundit has on Twitter is sometimes viewed as more important than old metrics like the circulation of a newspaper.
The article goes on to discuss what, in my opinion, are truly massive changes to the business of Silicon Valley PR over the past five years, driven by changes in the B2B trade press and the rise of social media.

While the article raises many good points, I think its over-reliance on Ms. Hammerling starts to make it feel -- in an ironic twist of journalistic narcissism -- like a puff piece about her: the journalist admiring the PR person instead of focusing on the changes in the business.

Over the years, her contact list swelled to the point that her stories now overflow with dropped names. There are the e-mail messages from Larry Ellison, the chief executive of Oracle, and the time she handled a client’s crisis from her BlackBerry while traveling to St. Barts to join the former Hollywood überagent Michael Ovitz and his family on his yacht. Or the time she was in her bikini at a Mexican resort, checking her e-mail at the hotel’s computer, when Ron Conway, a veteran tech investor, walked in.

Or the purportedly secret poker party she threw in her suite at a recent tech conference: “All my friends were there — Arianna was there, the Twitter boys were there,” ...

“Arianna told me I was a great hostess, and I thought I was going to die,” she said.

Thursday, July 02, 2009

BlueGuru: JetBlue's MarkLogic-Based Publishing and Content Management System

Just a quick post to highlight and share this great case study by Mitch Kramer of the Patricia Seybold Group on BlueGuru, JetBlue's content management and publishing system.

Excerpt to tempt you into reading the 26-page document:
XML is BlueGuru’s enabling technology, and MarkLogic Server is its most critical architectural element. XML addresses JetBlue’s requirements for structured documents—multiple types, multiple components within each type, hierarchical relationships between components, and component sharing across documents. MarkLogic Server is an XML content management system that automates BlueGuru’s documentation processes. Its repository stores BlueGuru’s documents and supports their access and retrieval by Crewmembers, partners, and regulators.

This case study report tells the story of JetBlue’s business transformation from a documentation system of decentralized and manually maintained manuals to a distributed content management and publishing system.
I've embedded the full document below in Scribd epaper format. Thanks to Mitch for writing a great document and to the folks at JetBlue for their faith in us, for their support of Mitch in writing the case study, and for the help and input they've provided us.

Semantic Technology at the New York Times

I recently had the pleasure of meeting Evan Sandhaus, semantic technologist at The New York Times R&D, and wanted to highlight and share a few things that we discussed.

Evan gave an information-packed, 79-slide keynote address at the recent Semantic Technology Conference in San Jose. During our meeting, we went through some of the slides and they were fantastic. While the slides aren't publicly posted, I hope they soon will be, and I will update this post with a link if and when they are.

He also told me about the New York Times' recent release of a 1.8M article corpus to the computer science research community, known as The New York Times Annotated Corpus. The corpus includes nearly every article published in the New York Times for twenty years (between 1/1/87 and 6/19/07) in XML format (NITF to be precise) along with various metadata about the articles.

They believe the corpus can be a valuable resource for a number of natural language processing research areas, including document summarization, document categorization and automatic content extraction. I think that's true not only because it's real content in real volume, but because that content comes with real, high-quality metadata that you can use to build upon or validate various text processing algorithms.

Finally, in prepping for the meeting I found this video interview with Evan at the New York Semantic Meetup. Great stuff, embedded below.

Stonebraker: Send Relational DBMSs to the Home for Tired Software

Mike Stonebraker spoke today at SIGMOD (see Tweetstream) where, among other things, there was a 40-year anniversary celebration of the relational DBMS and, in what I suspect is non-coincidental timing, Mike did a post on the CACM site entitled The End of a DBMS Era (Might be Upon Us).

Excerpt:
Moreover, the code line from all of the major vendors is quite elderly, in all cases dating from the 1980s. Hence, the major vendors sell software that is a quarter century old, and has been extended and morphed to meet today’s needs. In my opinion, these legacy systems are at the end of their useful life. They deserve to be sent to the “home for tired software.”
His key argument is all about performance: in any given use-case, Stonebraker thinks RDBMSs can be beaten by about a factor of 50.
  • In OLTP he says a memory-resident DBMS wins by 50x
  • For RDF, he says column stores do a reasonable job and is confident that specialized RDF triple stores will do better, i.e., 50x or more. (I'd add that at MarkLogic we think we do a reasonable job as well.)
  • For text, he points out that no major search engine uses a relational database so they didn't even qualify for consideration.
  • For XML, he cites a private report I sent him a while back done for one of our customers comparing MarkLogic performance to a relational DBMS. When on "our turf," we usually win by no less than 10x and sometimes 100x or more. Sometimes, queries are not even processable in an RDBMS and/or need to be hand-optimized and hand-joined between a DBMS and a search engine.
He reduces to three cases how special-purpose DBMS vendors get their advantage:
  • A non-relational data model
  • A different implementation of tables
  • A different implementation of transactions
We're in the first category, using XML as our data model instead of a table. It's a great post. Check it out and check out the cited references as well.

Wednesday, July 01, 2009

The New Phone Book's Here: Mark Logic Mentioned in Forrester DBMS Wave

One of my favorite movie scenes comes from the otherwise-average film, The Jerk, with Steve Martin. Script excerpt:
Navin R. Johnson: The new phone book's here! The new phone book's here!

Harry Hartounian: Boy, I wish I could get that excited about nothing.

Navin R. Johnson: Nothing? Are you kidding? Page 73 - Johnson, Navin R.! I'm somebody now! Millions of people look at this book everyday! This is the kind of spontaneous publicity - your name in print - that makes people. I'm in print! Things are going to start happening to me now.
The first "thing" that happens to Johnson is he's selected by a sniper as his next random victim, leading to the famous "he hates the cans, stay away from the cans" line as the sniper repeatedly misses Johnson, blowing holes in the oil cans all around him.

But I'm getting too deep into my metaphor.

The purpose of this post is to say that Mark Logic is mentioned in the new Forrester Wave: Enterprise Database Management Systems, Q2 2009, published 6/30/09 and authored by Principal Analyst Noel Yuhanna.

(The new Forrester Wave's here! The new Forrester Wave's here!)

One of the more stunning highlights of the report is the degree of dominance held by the IBM, Microsoft, Oracle oligopoly: they estimate that those three vendors control more than 88% of the market. Huge market shares and high operating margins breed complacency faster than stagnant water breeds mosquitoes, so I remain confident in the disruption potential in segments of this, per Forrester, $27B market.

We get a mention in the description of the database market landscape, which breaks the market into three segments: OLTP databases, data warehouse databases, and specialized databases.

(I'm in print! Things are going to start happening to me, now.)

Excerpt:
Specialized databases. Beyond the OLTP and warehouse categories, the specialized database category provides DBMSes used by applications for specific purposes — such as mobile applications, XML applications, or standalone applications that need an embedded database repository. Most of these requirements come from value-added resellers (VARs), original equipment manufacturers (OEMs), and independent software vendors (ISVs) that use a specialized database to store data and metadata for their applications. Vendors of specialized databases include IBM, Microsoft, Oracle, and Sybase, as well as smaller vendors such as Mark Logic, Progress, and Software AG.
I am happy for two reasons:
  • The growing acceptance of special-purpose DBMSs as a valid segment of the DBMS market. Ten years ago, most analysts didn't concede the need for specialized DBMSs to exist.
  • We get mentioned as a member of the class. There are literally scores of specialized DBMSs out there (e.g., column stores, stream stores, XML stores, DW stores) so I'm happy that we were cited as an example.
The wave itself is focused on the relative positions of the various general-purpose DBMS vendors, so it doesn't delve into specialized DBMSs and/or do comparisons among them. It nevertheless makes for good reading so if you're a Forrester subscriber, I'd read it here. Otherwise it costs $2000 a la carte (though with a money-back guarantee) so you'd better have a keen interest in DBMSs.

Tuesday, June 30, 2009

TinyURL vs. Bitly: Web Remains Not For Those Flat of Foot

In a MapQuest-like display of flat-footedness, the original URL-shortener TinyURL seems on the cusp of being crushed by the new kid in town, Bit.ly.



To me, it happened overnight. One day everyone was using TinyURL on Twitter, the next Bitly. As it turns out, that wasn't an accident as I learned in this TechCrunch story, URL Shortening Wars: Twitter Ditches TinyURL for Bitly. It seems that Summize (subsequently acquired by Twitter and now Twitter Search) and Bitly are backed by the same entity, Betaworks, which also has common investors with Twitter.

So my gut says the story goes something like:
  • Hey, we're driving a lot of traffic for TinyURL
  • Maybe we should get that traffic ourselves
  • I bet we could get Twitter to make us the default URL shortener
  • And -- this part's also key -- I bet we could do it better
Hence I wasn't surprised to start reading stories in the past two days about Bitly's big vision. See, for example, this TechCrunch story, Bitly's Grand Plans, and Their Inevitable Clash with Digg: Bitly Now.

Excerpt:

The magic behind Bit.ly are the stats that the service makes available on the underlying domains being clicked. Investor John Borthwick explained it all to investors in an email we obtained earlier this month:

bit.ly has been on a tear since we launched it last summer ...bit.ly is on its surface a link or URL shortener, helping people take long and unwieldy links and make them short and easy to share via email, Twitter, Facebook etc. But once you shorten a link with bit.ly the fun begins. You can put a simple “+” on the end of any bit.ly link and see, real time, the pace at which that link is getting shared and clicked on as it moves around these social distribution networks.

Bit.ly Now will take all of this deep (and wide) data on popular real time URLs and turn it into a service. That’s where the inevitable clash with Digg comes in.

Bitly, as it turns out, thinks it has some key advantages against Digg, reminding me that the web is also not for those bad at math.

Bit.ly says that the data flow they are seeing is so massive that they are getting very good at predicting the number of clicks a link will get in the future. They look at acceleration of clicks as well as the source (Facebook, Twitter, IM, whatever) and whether people are clicking that are outside of the social graphs of other people clicking.

In other words, you could say that Bit.ly knows what will be on the Digg home page tomorrow.
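To make the "acceleration of clicks" idea concrete, here is a toy Python sketch. It is not Bit.ly's actual method, which the excerpt doesn't describe; it simply assumes cumulative click counts sampled at equal intervals and takes their second difference. The function names, the example link, and the numbers are all made up for illustration. The only detail taken from the excerpt is the "+" suffix for viewing a link's stats.

```python
from typing import List


def stats_link(short_url: str) -> str:
    """Append '+' to a bit.ly link, the stats trick described in the excerpt."""
    return short_url.rstrip("/") + "+"


def click_acceleration(cumulative_clicks: List[int]) -> List[int]:
    """Second difference of cumulative clicks sampled at equal intervals.

    velocity[i]     = clicks gained during interval i
    acceleration[i] = change in velocity from one interval to the next
    """
    velocity = [b - a for a, b in zip(cumulative_clicks, cumulative_clicks[1:])]
    return [b - a for a, b in zip(velocity, velocity[1:])]


# Hypothetical hourly cumulative click counts for one link (invented numbers).
hourly = [0, 10, 30, 70, 150]
print(stats_link("http://bit.ly/example"))  # http://bit.ly/example+
print(click_acceleration(hourly))           # [10, 20, 40]
```

Positive and growing values mean the link is picking up clicks faster each hour, which is the kind of signal one would presumably want when guessing tomorrow's popular stories.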

The amazing thing, from a strategic marketing perspective, is that TinyURL has been written out of the story. By looking ahead towards Digg as the new competition, by painting a vision of where it wants to go, by discussing how it wants to get there, Bitly just blows by TinyURL and writes them out of the story line. This isn't about URL shortening; it's about information sharing and communications.

Well done! I sometimes call this Lot's Wife's Law of Marketing Communications: don't look back; only look ahead.

The lesson for TinyURL is that you can't remain static, or someone will come along, reinvent you with a broader vision, and paint you out of the picture -- turning you, if you will, into the pillar of salt.