Crawling the Web: Harnessing the Power of Nutch with Scala

Apache Nutch is a powerful open source web crawler written in Java. It can run very large crawls in parallel, downloading, indexing, and archiving millions of pages. In this talk we will cover the key architectural details of Nutch and see how easy it is to extend its behavior with Scala plugins.

The presentation will show the power that Scala brings to plugin development, where its support for actors can make the crawl process much more efficient.

Takeaways include an understanding of web crawling, Apache Nutch, and how to integrate Scala plugins into the Nutch framework.

Speaker:

Vikas is the co-founder and software craftsman at Knoldus. In his 16 years of experience he has become a recognized speaker, mentor, and practitioner in the software industry. Cont…

Sessions:
IndicThreads Conference On Software Development will be held on 13-14 July 2012 in Delhi, India.
