Fast Analytics on Fast Data for Apache Hadoop™.
Apache Kudu is an open source storage layer that provides the Hadoop ecosystem with a storage engine capable of fast analytics on fast data. Kudu eliminates the need for complex Lambda architectures that bifurcate data processing into speed and batch layers, instead combining them within a single data layer. This simpler architecture opens up the Hadoop ecosystem as a low-cost alternative for a variety of use cases, including time series data, machine data analytics, online reporting, and predictive IoT.
Kudu was specifically built for the Hadoop ecosystem, allowing Apache Spark, Apache Impala (incubating), and MapReduce to process and analyze data natively.
At Cloudera, we’re constantly pushing the boundaries of what’s possible with Hadoop—making it faster, easier to work with, and more secure.
Colm Moynihan – Partner Sales Engineer EMEA
Colm Moynihan is a Senior Sales Engineer at Cloudera, working with Cloudera partners to enable, integrate, and develop solutions across EMEA.
Before that, Colm was a Director of Presales for Informatica in EMEA, working on Big Data, Data Integration, Data Quality, MDM, and Cloud solutions.
Prior to that, Colm was a J2EE developer, eCommerce consultant, and architect, working for systems integrators, stock exchanges, and software companies across the globe.
Colm holds a Master's degree (MSc) in Distributed Systems from Trinity College.
Integrating the processing of geographical information into your Big Data workloads
Big Data platforms are routinely used to process, filter, aggregate, and analyze a variety of data types and structures: numbers, strings, and dates, but also JSON and XML documents.
One aspect that has not yet been fully integrated is geographic/geospatial data. While some Hadoop deployments can process simple longitude/latitude coordinates, none fully handle complex geographical structures such as custom administrative boundaries, natural features, or satellite imagery.
The presentation will go over the way we use Big Data concepts, specifically Hadoop and Spark, to solve these questions: classifying large streams of geographic events against any kind of geographical data (administrative areas, roads, points of interest), discovering clusters of events (such as flu cases), and binning events into regularly spaced cells.
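To make the binning idea concrete, here is a minimal sketch (not the speaker's code) of snapping point events to regularly spaced grid cells and counting events per cell; the function names, the cell size, and the sample coordinates are all illustrative assumptions.

```python
# Illustrative sketch: bin (longitude, latitude) events into a regular grid.
from collections import Counter

def cell_of(lon, lat, cell_size_deg=0.5):
    """Snap a coordinate to the index of its fixed-size grid cell."""
    return (int(lon // cell_size_deg), int(lat // cell_size_deg))

def bin_events(events, cell_size_deg=0.5):
    """Count events, given as (lon, lat) pairs, per grid cell."""
    return Counter(cell_of(lon, lat, cell_size_deg) for lon, lat in events)

# Hypothetical event reports as (longitude, latitude) pairs.
reports = [(2.35, 48.85), (2.36, 48.86), (2.29, 48.88), (-0.12, 51.50)]
counts = bin_events(reports)  # three nearby events fall into one cell
```

In a Spark job the same cell function would typically serve as the key in a map/reduce-style count, so the per-cell aggregation scales across the cluster.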
We will also talk about more GIS-specific processing for structures like satellite imagery, terrain models, and statistical rasters, by applying raster algebra.
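As a rough illustration of what raster algebra means, the sketch below (an assumption, not material from the talk) combines two aligned rasters cell by cell; the NDVI vegetation index is used as the example operation, with made-up band values.

```python
# Illustrative sketch: cell-wise raster algebra on two aligned 2D rasters.
def raster_op(a, b, op):
    """Apply a binary operation cell by cell to two aligned 2D rasters."""
    return [[op(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

# Hypothetical near-infrared and red reflectance bands for a 2x2 scene.
nir = [[0.6, 0.8], [0.7, 0.5]]
red = [[0.2, 0.2], [0.1, 0.3]]

# NDVI = (NIR - Red) / (NIR + Red), a standard vegetation index.
ndvi = raster_op(nir, red, lambda n, r: (n - r) / (n + r))
```

Real deployments apply the same idea tile by tile so that very large rasters can be processed in parallel across a cluster.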
The session will be presented by Albert Godfrind of Oracle Corporation. Albert has over 25 years of experience in designing, developing, and deploying IT applications. His interest and enthusiasm for spatial information and geographical information systems started at Oracle, when he began using the spatial extensions of the Oracle database in 1998. Ever since, Albert has been evangelizing the use of spatial information to GIS and BI communities across Europe, consulting with partners and customers, speaking at conferences, and designing and delivering in-depth technical training.
Albert is one of the authors of the first book on Oracle Spatial, "Pro Oracle Spatial: The Essential Guide to Developing Spatially Enabled Business Applications".