Apache Nutch is a powerful open source web crawler written in Java. It can run very large crawls in parallel, downloading, indexing, and archiving millions of pages. In this talk we cover the key architectural details of Nutch and see how easy it is to extend its behavior with Scala plugins.
The presentation shows the power that Scala brings to plugin development, including its built-in support for actors, which can make the crawl process much more efficient.
Takeaways include an understanding of web crawling, Apache Nutch, and how to integrate Scala plugins into the Nutch framework.