From the Big Bang to Bitcoins: Data-intensive Approach in Sciences
We live in a revolutionary age in which exponential developments in technology make previously unimaginable endeavors possible. New instruments collect the largest and most detailed datasets ever about elementary particles, genetic sequences, weather systems, faraway galaxies, etc. Most of this information, and the knowledge we extract from it, is promptly accessible through a planet-wide communication network. The human mind is very good at discovering associations and gaining insights but very poor at working with myriads of small details. Just as we need a microscope to see the tiny details of the living cell, our brain needs a “prosthesis”, a tool with a fast and tireless capacity to crunch data, estimate parameters and run simulations. Moore’s law has given us both sensitive, high-throughput instruments and high-performance computers with large storage capacity. In the talk, I will cover examples from astronomy through network engineering and genetics to social networks, and give an overview of some of the challenges and opportunities of data-intensive sciences.
♦
István Csabai is a professor of physics at the Department of Physics of Complex Systems at Eötvös Loránd University, Budapest, Hungary. Prof. Csabai conducts research in several multidisciplinary fields where new technologies make it possible to collect and analyze large amounts of data. His research focuses on understanding complex systems, whether the large-scale structure of the Universe, the living cell, or the complex networks of the man-made Internet and the social web. In astronomy he has earned “builder” status in the Sloan Digital Sky Survey project, which produced the first large-scale 3D map of the Universe and created the first multi-terabyte science database and Virtual Observatory. He was part of the team that developed the database and several statistical data analysis tools. Building on this experience with large scientific databases and data mining technologies, Prof. Csabai took part in several European ICT projects focusing on measuring and modelling the Internet and social and financial networks. Prof. Csabai’s current interest is in genomics, one of the rapidly developing fields of science, where the data avalanche produced by new sequencing equipment requires the same kind of advanced statistical and informatics-heavy methods that have proved effective in other areas. Prof. Csabai has co-authored over 200 publications and is one of the most highly cited scientists.
Big Data Management and Scalable Data Science: Key Challenges and (Some) Solutions
Abstract. The shortage of qualified data scientists is effectively preventing Big Data from fully realizing its potential to deliver insight and provide value for scientists, business analysts, and society as a whole. Data science draws on a wide range of advanced concepts from the mathematical, statistical, and computer sciences, in addition to requiring knowledge of an application domain. Teaching these diverse skills alone will not enable us to exploit, on a broad scale, the power of predictive and prescriptive models for huge, heterogeneous, and high-velocity data. Instead, we will have to simplify the tasks a data scientist needs to perform, bringing technology to the rescue: for example, by developing novel ways for the specification, automatic parallelization, optimization, and efficient execution of deep data analysis workflows. This will require us to integrate concepts from data management systems, scalable processing, and machine learning in order to build widely usable and scalable data analysis systems. In this talk, I will present some of our research results towards this goal, including the Apache Flink open-source big data analytics system, concepts for the scalable processing of iterative data analysis programs, and ideas on enabling optimistic fault tolerance.
♦
Volker Markl is a Full Professor and Chair of the Database Systems and Information Management (DIMA) group at the Technische Universität Berlin (TU Berlin). Volker also holds a position as an adjunct full professor at the University of Toronto and is director of the research group “Intelligent Analysis of Mass Data” at DFKI, the German Research Center for Artificial Intelligence. Earlier in his career, Dr. Markl led a research group at FORWISS, the Bavarian Research Center for Knowledge-based Systems in Munich, Germany, and was a Research Staff Member and Project Leader at the IBM Almaden Research Center in San Jose, California, USA. His research interests include new hardware architectures for information management, scalable processing and optimization of declarative data analysis programs, and scalable data science, including graph and text mining and scalable machine learning.
Volker Markl has presented over 200 invited talks in numerous industrial settings and at major conferences and research institutions worldwide. He has authored and published more than 100 research papers at world-class scientific venues. Volker regularly serves as a member and chair of program committees of major international database conferences. He has been a member of the computer science evaluation group of the Natural Sciences and Engineering Research Council of Canada (NSERC). Volker has 18 patent awards, and he has submitted over 20 invention disclosures to date. Over the course of his career, he has garnered many prestigious awards, including the European Information Society and Technology Prize, an IBM Outstanding Technological Achievement Award, an IBM Shared University Research Grant, an HP Open Innovation Award, an IBM Faculty Award, a Trusted Cloud Award for Information Marketplaces from the German Ministry of Economics and Technology, the Pat Goldberg Memorial Best Paper Award, and a VLDB Best Paper Award. He has been speaker and principal investigator of the Stratosphere collaborative research unit funded by the German Research Foundation (DFG), which resulted in numerous top-tier publications as well as the “Apache Flink” big data analytics system. Apache Flink is available as open source, is currently used in production by several companies, and serves as a basis for teaching and research at several institutions in Germany, Europe and the United States. Dr. Markl currently serves as the secretary of the VLDB Endowment and advises several companies and startups. In 2014 he was elected one of Germany’s leading “digital minds” (Digitale Köpfe) by the German Informatics Society (GI).
Volker has a strong history of innovation and technology transfer to companies. The transfer of the UB-Tree multidimensional indexing technology to the German SME TransAction Software earned him an award from the European Commission. For creating information marketplaces he earned the Trusted Cloud Award of the German Ministry of Economics and Technology. Volker also transferred several of his ideas into IBM products during his tenure at IBM. He received the “Innovation Supporter 2012” award from TU Berlin for supporting several entrepreneurially minded student teams and helping them to obtain seed funding. Volker collaborates on research and technology transfer with several IT companies, such as IBM, SAP, Microsoft, Deutsche Telekom, and HP, as well as many SMEs (Internet Memory Foundation, Parstream, Vico Research, Datamarket, Okkam, Neofonie, etc.), in industry-funded projects and in projects funded by German and European government programs and by the EIT. Volker is currently advising two startup companies in the information management field.
Volker is director of the Berlin Big Data Center, a collaborative research center bringing together research groups in the areas of distributed systems, scalable data processing, text mining, networking, machine learning and applications in several areas, such as healthcare, logistics, Industrie 4.0, and information marketplaces. Volker is also the speaker of the “data analytics and cloud lab” at TU Berlin.