Building a High Performance Search Engine with Perl and Swish3

Peter Karman

http://peknet.com/slides/swish3/

In the beginning was SWISH

A patchy project

Limitations of 2.4.x

Swish3

Swish3 is like Perl6

Swish3 is like DBI or CHI

Aside: Anatomy of a search engine

Every search application has these five basic components:
  1. aggregator
  2. normalizer
  3. analyzer
  4. indexer
  5. searcher
Swish3 handles (1) and (2), and optionally (3); it defers (4) and (5) to the backend engines.
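
To make the five components concrete, here is a toy sketch of how they might fit together. It is a language-neutral illustration in Python, not Swish3 or SWISH::Prog code: every function name, the in-memory `docs` dict, and the tag-stripping normalizer are assumptions made up for this example. A real backend would replace the indexer and searcher steps.

```python
import re
from collections import defaultdict


def aggregate(sources):
    """1. Aggregator: collect raw documents (here, from an in-memory dict)."""
    for uri, raw in sources.items():
        yield uri, raw


def normalize(raw):
    """2. Normalizer: reduce a document to plain text (here, strip tags)."""
    return re.sub(r"<[^>]+>", " ", raw)


def analyze(text):
    """3. Analyzer: split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(sources):
    """4. Indexer: build an inverted index mapping token -> set of URIs."""
    index = defaultdict(set)
    for uri, raw in aggregate(sources):
        for token in analyze(normalize(raw)):
            index[token].add(uri)
    return index


def search(index, query):
    """5. Searcher: return URIs that contain every query token (AND search)."""
    tokens = analyze(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)


# Hypothetical sample documents, used only for this illustration.
docs = {
    "a.html": "<p>Swish3 parses documents</p>",
    "b.html": "<p>Xapian indexes documents</p>",
}
index = build_index(docs)
print(search(index, "documents"))
print(search(index, "swish3 documents"))
```

In this sketch, steps (1) to (3) sit on the side of the line that Swish3 covers, while `build_index` and `search` stand in for the work that is deferred to a backend engine.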

One implementation: SWISH::Prog

Another implementation: swish_xapian

SWISH::Prog

Aggregators for:
  • filesystem (File::Find)
  • web (WWW::Mechanize)
  • database (DBI)
  • email (Mail::Box)
  • Perl objects (JSON)

SWISH::Prog (cont...)

Normalization via SWISH::Filter for:
  • PDF
  • Office (.doc, .xls, .ppt)
  • gzip
  • images (IPTC)
  • MP3 (ID3)

Real World Project

What I Did

What I Used

For indexing:

What I Used (cont...)

For searching:

Demo Search::OpenSearch

Further Reading