Crawling the Web: Harnessing the Power of Nutch with Scala

Apache Nutch is a powerful open-source web crawler written in Java. It can run very large crawls in parallel, downloading, indexing, and archiving millions of pages. In this talk we will walk through the key architectural details of Nutch and see how easy it is to extend its behavior with Scala plugins.

[…]