The newthinking store Berlin is hosting the Hadoop Get Together user group meeting. It features talks on Hadoop, Lucene, Solr, UIMA, katta, Mahout and various other projects that deal with making large amounts of data accessible and processable. The event brings together leaders from the developer and user communities. The speakers present projects that build on top of Hadoop as well as case studies of applications built and deployed on Hadoop. After the talks there is plenty of time for discussion, beer and food. There is also a related Xing group on building scalable information retrieval systems. Feel free to join and meet other developers working on scalable solutions.
Talks scheduled so far:
* Torsten Curdt: "Data Legacy - the challenges of an evolving data warehouse".
Abstract: "MapReduce is great for processing large data sets. A distributed file system can be used to store huge amounts of data. But what if your data format needs to adapt to new requirements? This talk gives a simple introduction to Thrift and Protocol Buffers, sprinkled with some rants and approaches for managing your big data sets."
* Christoph M. Friedrich (Fraunhofer Institute for Algorithms and Scientific Computing): "SCAIView - Lucene for Life Science Knowledge Discovery".
Abstract: "In the Life Sciences, the amount of freely available information is growing immensely. In Medline, a medical information system, more than 3000 new citations are indexed every day. Today Medline contains approximately 19 million references and abstracts. Using machine learning and dictionary-based Named Entity Recognition, we extracted information on genes, drugs, SNPs and other Life Science entities from Medline. The talk presents SCAIView, a Life Science Knowledge Discovery system that uses a multi-threaded Lucene to enable semantic and ontological search on this data. Questions that can now be answered quickly include: What drugs are mentioned in the context of Alzheimer's disease? Or: What genes are co-mentioned with diabetes and are on the insulin signalling pathway?"
* Uri Boness from JTeam in Amsterdam: Solr - From Theory to Practice.
Abstract: "This session will introduce attendees to Solr through a real-world example. We will show how Solr enabled us to replace an existing commercial search engine in one of the most popular online company directories in the Netherlands. We'll briefly discuss the decision-making process that led the company to explore open source alternatives for their search back end in general, and why Solr was chosen in particular. We will then show how Solr's extensible infrastructure enabled us to implement non-trivial search functionality such as geo-location search and complex ranking rule schemes."
If you would like to give a presentation yourself: there are additional 20-minute slots available. A projector is provided; just bring your slides. To have your topic included on this web site and in the upcoming.org entry, please send your proposal to Isabel: mainec at isabel-drost.de
After the talks there will be time for an open discussion. We will go to a nearby restaurant after the event, so there will be plenty of time for talking, discussion and new ideas.
Official Website: http://www.newthinking-store.de/stammtisch/hadoop/20090625
Added by newthinking on June 5, 2009