Add enterprise-level search to your site.
News and Follow-Ups – 01:00
Square now being sold in Apple’s store
Check-Ins dying out?
Dropbox: 25 million users
Geek Tools – 14:13
Yikerz! - Super fun magnet game
Webapps - 16:12
Surfboard - Flipboard as a web app
InstaLyrics - Find lyrics quickly
Full Text Search - 22:11
Options
Google Custom Search
Commercial
Benefits
Super fast to set up
Easy to implement
Ability to add AdSense to search results
Downsides
Unable to adjust content ranking and do custom integration
Mainly just for indexing HTML pages, not for running search queries against other text.
Sphinx
“Searching via SphinxAPI is as simple as 3 lines of code, and querying via SphinxQL is even simpler, with search queries expressed in good old SQL.”
Open source with commercial support
Result relevance ranking is the default. You can set up your own sorting should you wish, and give specific fields higher weightings.
The search service daemon (searchd) is pretty low on memory usage - and you can set limits on how much memory the indexer process uses too.
API for:
Java, PHP, Python, Ruby, Perl, C, and other languages.
Written in C++
Stats
Indexing speed: 60+ MB/sec per server
Search speed: 500+ queries/sec
Biggest known Sphinx cluster indexes 5 billion documents, resulting in over 6 TB of data. Busiest known one is, unsurprisingly, Craigslist, which serves 50+ million search queries/day.
Companies using Sphinx
Craigslist
Slashdot
Mozilla
WordPress.org
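To make the SphinxQL and field-weighting points above a bit more concrete, here is a minimal sketch that builds a SphinxQL query string. The index name `articles` and the fields `title` and `body` are hypothetical, and the escaping only covers the SQL string literal, not Sphinx's full-text operators. searchd accepts SphinxQL over the MySQL wire protocol, so in practice a string like this would be sent through an ordinary MySQL client.

```python
def build_sphinxql(index, text, field_weights, limit=10):
    """Return a SphinxQL SELECT string for a weighted full-text match.

    `index` and the keys of `field_weights` are assumed to be trusted
    identifiers (hypothetical names here), not user input.
    """
    # Escape backslashes and single quotes so the text is a valid string literal.
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    # field_weights=(title=10, body=1) gives matches in `title` more weight.
    weights = ", ".join(f"{field}={weight}" for field, weight in field_weights.items())
    return (
        f"SELECT id FROM {index} WHERE MATCH('{escaped}') "
        f"LIMIT {int(limit)} OPTION field_weights=({weights})"
    )


query = build_sphinxql("articles", "full text search", {"title": 10, "body": 1})
```

With the hypothetical names above, `query` is a single SELECT statement with a `MATCH(...)` clause, a `LIMIT`, and an `OPTION field_weights=(...)` clause.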
Lucene
Developed by the Apache Software Foundation
Open source
Written in Java
Search types
ranked searching -- best results returned first
many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
fielded searching (e.g., title, author, contents)
date-range searching
sorting by any field
multiple-index searching with merged results
allows simultaneous update and searching
Stats
Indexing speed: over 95GB/hour on modern hardware
small RAM requirements -- only 1MB heap
index size roughly 20-30% the size of text indexed
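For a feel of what "ranked" and "fielded" searching mean, here is a toy inverted index. It is not Lucene's implementation or API, just a sketch under simple assumptions: whitespace tokenization, lowercase terms, and a score equal to raw term frequency. The class name `TinyIndex` and the sample documents are made up for illustration.

```python
from collections import Counter, defaultdict


class TinyIndex:
    """Toy inverted index with per-field postings (illustrative only)."""

    def __init__(self):
        # (field, term) -> Counter mapping doc_id to term frequency
        self.postings = defaultdict(Counter)

    def add(self, doc_id, **fields):
        """Index each field of a document by its lowercase, whitespace-split terms."""
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)][doc_id] += 1

    def search(self, field, query):
        """Return doc ids matching any query term in `field`, best score first."""
        scores = Counter()
        for term in query.lower().split():
            for doc_id, tf in self.postings.get((field, term), {}).items():
                scores[doc_id] += tf
        # Sort by descending score, then by doc id for a stable order.
        return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


idx = TinyIndex()
idx.add(1, title="full text search", contents="search search engines")
idx.add(2, title="magnet game", contents="a search for magnets")
hits = idx.search("contents", "search")  # doc 1 ranks above doc 2
```

A real engine adds much more (analysis, better scoring, compressed on-disk index segments), but the lookup-by-term structure is the same basic idea.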
Solr
Lucene is a library, whereas Solr is a search server built on Lucene that you talk to over HTTP using XML and REST-style requests.
Benefits over Sphinx
Solr is easily embeddable in Java applications.
Solr can be integrated with Hadoop to build distributed applications
Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
Companies using Solr
eHarmony
Ticketmaster
Digg
AOL
Zappos
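Because Solr is a server you query over HTTP, a search is just a URL. The sketch below builds one with the standard `q`, `wt`, and `rows` request parameters; the base URL `http://localhost:8983/solr` and the `title` field are assumptions for illustration, and the exact path depends on how your Solr install and cores are set up.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, query, rows=10):
    """Build a URL for Solr's select handler (base_url is an assumed install path)."""
    params = {
        "q": query,    # the search query, e.g. a fielded query like title:search
        "wt": "json",  # ask for a JSON response instead of the XML default
        "rows": rows,  # maximum number of results to return
    }
    return f"{base_url.rstrip('/')}/select?{urlencode(params)}"


url = solr_select_url("http://localhost:8983/solr", "title:search")
```

Fetching `url` with any HTTP client would then return the matching documents, assuming a Solr instance is actually running at that address.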