Friday, December 7, 2012

Your Own Private Google: The Quest for an Open Source Search Engine

Google built its web empire on commodity hardware. Rather than invest in powerful, expensive supercomputers, it harnessed the collective power of tens of thousands of ordinary servers, cutting costs but also finding a better way to deal with hardware failure. This was at the heart of the company's genius, and now, so many others are following suit.

Google created many custom software platforms that take advantage of its massive server farms, and it made a habit of publishing academic papers that detail these innovations. That has led to a proliferation of open source clones that operate in much the same way. These include file systems for storing data, and processing platforms for crunching all that data.

But what about the most famous Google innovation, the one that has used its sweeping server farms to the greatest effect? What about Google search?

The company guards its search platform like the crown jewels. It's not about to release a paper describing how it all works, so producing an open source clone is more difficult. But there are options, and the push toward open source versions of the Google search engine has gathered some steam in recent months with the arrival of a new company called ElasticSearch.

These projects aren't trying to compete with Google's public search engine, the one you use every day. They're trying to compete with Google's search appliance and other products that help enterprises (i.e., big businesses) find stuff inside their own private networks.

Ayan Barua, co-founder of Credii (a startup that helps businesses select software and services), says: "Open source has hardly made a splash in the enterprise." But he does believe that ElasticSearch and a similar outfit, LucidWorks, are gaining mindshare among developers.

In 1999, a man named Doug Cutting started work on a project called Lucene, which was meant to provide an open source alternative to the Google and Yahoo search engines. Eventually, Cutting shifted his focus to Hadoop (a clone of Google's MapReduce number-crunching platform that was originally designed to underpin Lucene), but Lucene lives on.

"It's probably the most advanced library out there today, open source or not," says Shay Banon, the founder of ElasticSearch, which oversees an open source search engine based on Lucene.

Lucene, you see, is a software library. It provides the basic building blocks for a search engine. You still need to build the search engine itself, and over the years, the library has given rise to two major projects that seek to provide a search engine capable of scaling out across hundreds or thousands of servers. First there was Solr. And now, there's ElasticSearch.

Solr was created in 2004 by a CNET developer named Yonik Seeley. The online publisher had been using a custom search service from AltaVista to power its site search features, but the service was being discontinued after AltaVista was acquired by a competitor. Seeley says the company solicited bids for another commercial solution, but because CNET is spread out over so many different properties, the bids all seemed too high.

So, the IT team decided to develop their own search solution based on the open source database MySQL. But they also wanted to have a "Plan B" based on Lucene, in case they ever needed a more sophisticated search solution. Seeley was hired to work on the search team, and he built this "Plan B."

Originally called SOLAR (for Search On Lucene And Resin), Solr was eventually adopted throughout CNET, according to Seeley. It was open sourced in 2006. In 2007, Seeley left CNET to co-found LucidWorks (originally called Lucid Imagination), a company that sells tools based on Solr.

ElasticSearch arrived in 2010, but it wasn't until this year that its creator, Banon, founded a company that seeks to commercialize the code. ElasticSearch raised its first round of funding, $10 million from Benchmark Capital, just last month.

The difference from Solr is that ElasticSearch was specifically designed to scale across hundreds of thousands of servers, Google-style. Banon built his first open source search server, Compass, using Lucene in 2004, but ElasticSearch is a different animal. He says he originally thought about scaling Solr to many servers in this way, but he soon decided it would be better to start over and build another search server from scratch, while still using Lucene as a starting point.

Source: http://feeds.wired.com/~r/wired/index/~3/psv2BvrN3rg/