Fill your Data Lake with Potable Data

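Oracle OpenWorld 2017 has ended. It ran for five days, from October 1 to October 5, and drew some 60,000 attendees: a grand database feast. Now that the noise has died down, here is a summary of my impressions of the conference. China's National Day holiday is also drawing to a close, so after the break you can skim through Oracle's latest developments, innovations and technology announcements.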

2. CON-5465 Filling your Data Lake with potable data using Oracle Data Integration
Mike Matthews, Senior Director, Product Management
Jayant Mahto, Senior Product Manager
October 2nd 2017
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. Confidential – Oracle Internal/Restricted/Highly Restricted
3. Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
4. Oracle Cloud Platform
• Develop & Deploy
• Integrate & Extend
• Publish & Engage
• Analyze & Predict
• Secure & Manage
Innovate with a Comprehensive, Open, Integrated and Hybrid Cloud Platform that is Highly Scalable, Secure and Globally Available
5. Oracle Cloud Platform: Comprehensive, Open, Integrated, Hybrid
• Services: Data Management; Application Development; Enterprise Integration; Data Integration; Analytics and Big Data; Content & Experience; Identity & Security; Systems Management
• Built on High Performant Oracle Cloud Infrastructure
• Deployment options: Oracle Public Cloud (Oracle Data Center); Oracle Cloud at Customer (Your Data Center)
6. Oracle Cloud Platform Momentum
• 14,000+ Oracle Cloud Platform Customers
• 3,000+ Apps in the Oracle Cloud Marketplace
• $1.4 Billion FY17 Oracle Cloud Platform Revenue (60% YoY Growth)
• 10 PaaS Categories where Oracle is a Leader According to Industry Analysts
7. Oracle Cloud Platform for Integration: Application and Data Integration
• Complete, Simplified, Open
• APPLICATION INTEGRATION
• API MANAGEMENT
• PROCESS AUTOMATION
• STREAM ANALYTICS
• DATA GOVERNANCE
• DATA QUALITY
• BULK DATA TRANSFORMATION
• REAL TIME DATA STREAMING AND DATA REPLICATION
8. Data Lake… or Data Swamp?
9. Key Success Factors for your Data Lake
• Timely access to data
• Flexibility to extract and work the data as needed
• Trust in the quality of the data
• Ability to find and understand the available data
Source: Knowledgent - https://knowledgent.com/whitepaper/design-successful-data-lake/
10. Reference Architecture with Oracle Data Integration
• Oracle Data Integration, Fast Data Delivery: GoldenGate, Data Integrator
• Oracle Data Integration, Assured Data Trust: Enterprise Data Quality, Metadata Management
• Your Data Lake
• SaaS Apps
11. Key Success Factors for your Data Lake
• Timely access to data
• Flexibility to extract and work the data as needed
• Trust in the quality of the data
• Ability to find and understand the available data
Source: Knowledgent - https://knowledgent.com/whitepaper/design-successful-data-lake/
12. Why GoldenGate?
• The Sushi Principle – ‘Data is best served raw’
• Some of the biggest data lakes use Oracle GoldenGate’s change data capture capability for real-time ingestion from source databases
• Traditional normalization, aggregation and schematization are skipped to simplify data flows and improve timeliness and performance
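The "served raw" idea on slide 12 can be illustrated with a small sketch. This is not GoldenGate's API or trail-file format: the event layout (`op`, `key`, `after`) and the helper names `land_raw` and `replay` are hypothetical, chosen only to show change events landing unmodified and any shaping being deferred to read time.

```python
import json
from typing import Dict, Iterable, List


def land_raw(events: Iterable[dict], sink: List[str]) -> None:
    """Append each change event unchanged, as one JSON line (no reshaping)."""
    for event in events:
        sink.append(json.dumps(event, sort_keys=True))


def replay(lines: Iterable[str]) -> Dict[int, dict]:
    """Rebuild the current row state by applying events in arrival order."""
    state: Dict[int, dict] = {}
    for line in lines:
        event = json.loads(line)
        if event["op"] == "D":          # delete: drop the row if present
            state.pop(event["key"], None)
        else:                           # "I" and "U" carry the full after-image
            state[event["key"]] = event["after"]
    return state


# Hypothetical change stream from a source table.
changes = [
    {"op": "I", "key": 1, "after": {"name": "Ann", "city": "Oslo"}},
    {"op": "I", "key": 2, "after": {"name": "Bob", "city": "Rome"}},
    {"op": "U", "key": 1, "after": {"name": "Ann", "city": "Bergen"}},
    {"op": "D", "key": 2, "after": None},
]

raw_zone: List[str] = []      # stand-in for the lake's raw landing area
land_raw(changes, raw_zone)
current = replay(raw_zone)
print(current)
```

Because the landing step keeps every event, consumers can still derive either the full change history or the latest state, which is the flexibility the slide attributes to skipping up-front schematization.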
13. Oracle GoldenGate for Big Data: Modular & Pluggable Architecture
• GoldenGate for Big Data (Running On-Premises or Cloud)
• High Performance
• Low Impact and Non-Intrusive
• Flexible and Heterogeneous
• Resilient and FIPS Secure
• Targets shown: HDFS, Hive, Flume, Kafka, HBASE, Mongo, Cassandra, Elastic, JMS, OSA, Kinesis, JDBC
• Diagram labels: Capture; Trail Files; Network Firewall; Cloud; Trail Files; Native Java Replicat; Big Data and Cloud; Replicat Parameters; Big Data Properties; JAR
14. Key Success Factors for your Data Lake
• Timely access to data
• Flexibility to extract and work the data as needed
• Trust in the quality of the data
• Ability to find and understand the available data
Source: Knowledgent - https://knowledgent.com/whitepaper/design-successful-data-lake/
15. Continued Focus on Our Vision: Integrate Any Data Shape, Speed, Action, Volume & Location
• Any Data Shape: Polyglot
• Any Data Speed: Lambda
• Any Data Action: Dataflow Pipes
• Any Data Volume: Open Source Platforms
• Any Data Location: Cloud Infrastructure
16. Why Oracle Data Integrator?
• To provide true analytical flexibility and accuracy, some data re-shaping may be needed, especially as Data Lakes are increasingly working with Master Data as well as Transactional Data
• ODI’s E-LT architecture can be very important when working with large volumes
• This may be done reading from a Data Lake and writing to a Data Warehouse
• ODI can also push down data transformations into the Data Lake
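The E-LT pattern named on slide 16 (extract and load first, then transform inside the target engine) can be sketched as follows. This is a minimal illustration, not ODI-generated code: an in-memory SQLite database stands in for the target engine, and the table and column names are assumptions made up for the example.

```python
import sqlite3

# Stand-in for the target engine (a warehouse or the lake's SQL layer).
conn = sqlite3.connect(":memory:")

# Extract + Load: rows land in the target untransformed.
conn.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount REAL)"
)
raw_rows = [(1, "acme", 100.0), (2, "acme", 50.0), (3, "globex", 75.0)]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

# Transform: the aggregation is pushed down as SQL and runs where the data
# already lives, so no separate ETL engine pulls the rows out first.
conn.execute(
    """
    CREATE TABLE customer_totals AS
    SELECT customer, SUM(amount) AS total, COUNT(*) AS orders
    FROM raw_orders
    GROUP BY customer
    """
)

totals = conn.execute(
    "SELECT customer, total, orders FROM customer_totals ORDER BY customer"
).fetchall()
print(totals)
```

The design point is that only the transformation statement moves, not the data, which is why the approach matters more as volumes grow.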
17. Big Data Transformation with Data Integrator
Data Integrator for Big Data
• Batch data ingestion with Sqoop, native loaders & Oozie
• Generate data transformations in Hive, Pig, Spark & Spark Streaming
• Extract data into external DBs, Files or Cloud
Benefits
• No ETL Engine: native E-LT execution, 1000s of references
• Zero Footprint: does not require any Oracle install on cluster
• Loosely Coupled: design time means you can reuse mapping logic in many big data languages
Diagram labels: Capture, Trail, Pump, Route, Deliver; Oracle Data Integrator; Raw Data Layer; GG; Speed Layer; Streaming Analytics; Serving Layer; REST Services; Batch Layer; Visualization Tools; Reporting Tools; Data Marts; API/File; SQOOP; SQOOP + Native Loaders
18. Key Success Factors for your Data Lake
• Timely access to data
• Flexibility to extract and work the data as needed
• Trust in the quality of the data
• Ability to find and understand the available data
Source: Knowledgent - https://knowledgent.com/whitepaper/design-successful-data-lake/
19. Some data can only be trusted if it is prepared
• Data Consumers need access to Master Data as well as Transactional Data
• Relating the two can be very powerful…
• … but this is where raw data can be poisonous to strong business analytics:
  • Incomplete records
  • Hard-to-find Duplicates
  • Out-of-date information
  • Inconsistencies in data capture
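The four problems listed on slide 19 can each be expressed as a simple check. The sketch below is an illustration only and does not reflect how Oracle Enterprise Data Quality works internally: the record layout, the `REQUIRED` field list, the one-year staleness threshold, and the `normalize` and `profile` helpers are all assumptions made for the example.

```python
from datetime import date

REQUIRED = ("name", "email", "updated")  # assumed mandatory fields


def normalize(value: str) -> str:
    """Smooth over capture inconsistencies: case and repeated whitespace."""
    return " ".join(value.lower().split())


def profile(records, today, max_age_days=365):
    """Return record ids flagged as incomplete, duplicate, or stale."""
    issues = {"incomplete": [], "duplicate": [], "stale": []}
    seen = {}
    for rec in records:
        rid = rec["id"]
        # Incomplete: any required field missing or empty.
        if any(not rec.get(field) for field in REQUIRED):
            issues["incomplete"].append(rid)
        # Duplicate: same name and email after normalization.
        name, email = rec.get("name"), rec.get("email")
        if name and email:
            key = (normalize(name), normalize(email))
            if key in seen:
                issues["duplicate"].append(rid)
            else:
                seen[key] = rid
        # Stale: not updated within the allowed window.
        updated = rec.get("updated")
        if updated and (today - updated).days > max_age_days:
            issues["stale"].append(rid)
    return issues


customers = [
    {"id": 1, "name": "Ann Lee", "email": "ann@example.com", "updated": date(2017, 9, 1)},
    {"id": 2, "name": "ann  LEE", "email": "Ann@Example.com", "updated": date(2017, 8, 1)},
    {"id": 3, "name": "Bob Ray", "email": "", "updated": date(2017, 7, 1)},
    {"id": 4, "name": "Cy Dow", "email": "cy@example.com", "updated": date(2015, 1, 1)},
]

report = profile(customers, today=date(2017, 10, 2))
print(report)
```

Record 2 is only detectable as a duplicate of record 1 after normalization, which is the "hard-to-find" case the slide warns about; real matching would need fuzzier rules than this exact-key comparison.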
20. Why Oracle Enterprise Data Quality?
Enterprise DQ Platform (Common Access/UI)
• Profile: Quickly understand data content
• Standardize: Drive conformance to standards
• Match: Identify & merge duplicates
• Govern: Monitor effectiveness & resolve problems
Strengths
• Market-leading usability for all types of data
• Unparalleled time-to-value
• High performance engine
• Out-of-the-box global knowledge-base
• Foundation for governance program
21. EDQ ∙ Collaborative Data Quality Governance
Data Analysts
• Immediate Data Insight
• Reusable DQ Services and Rules
• Transparent, self-documenting configuration
Data Stewards
• Flexible Data Review and Remediation options in EDQ Case Management
• Integrated with DQ Rules
• Fully audited with comments, attachments, history, reports
Data Stakeholders
• Zero Training EDQ Dashboard
• View by Data Asset, Data Domain, Rule
• Trend Analysis
22. Key Success Factors for your Data Lake
• Timely access to data
• Flexibility to extract and work the data as needed
• Trust in the quality of the data
• Ability to find and understand the available data
Source: Knowledgent - https://knowledgent.com/whitepaper/design-successful-data-lake/
23. Why Metadata Management for the Data Lake?
Without Metadata Management:
• Silos of Data known only to their owners
• No documentation
• Duplicate effort and inefficient usage
• No data usage analysis
With Metadata Management:
• Searchable
• Enriched with documentation
• Shared knowledge
• Lineage/impact analysis
• Semantic analysis
24. Value of Enterprise Metadata Management
Solves significant pain points for wide variety of business consumers and technical staff
Questions shown on the slide:
• Which reports use this customer data?
• What reports use the mainframe data?
• What will happen if I change this table?
• Can I trust the sources of this customer data?
• How do I organize my DW and Reports?
• How was sales figure calculated?
• Where did this data come from?
• I want to design an experiment to measure the success of a signup page. What data do I have?
Roles shown: Application User; Sys Admin; ETL Developer; Data Steward; Enterprise Architect; Executive; Data Scientist; BI Developer
Diagram labels: App; GG; ETL; CDC; BI Dashboards; Data Reservoir
25. Find and Understand your Data
• Metadata Management – horizontal and semantic data lineage for all data sources
• Business Glossary – simple tools to catalog, link and collaborate on business terms
Features:
• Business Data Catalog
• Report to Source Lineage
• Impact Analysis
• Audit, Versioning & Diff Reports
• Social/Collaboration Features
• Annotations and Tagging
• Comprehensive Harvesting
• 3rd Party BI Metadata
• 3rd Party ETL Metadata
• 3rd Party DB Metadata
• 3rd Party Modeling Tools
• Big Data Metadata
• Metadata Standards
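Lineage and impact analysis, as listed on slide 25, both reduce to walking a graph of data flows in one direction or the other. The sketch below is a generic illustration and not Oracle Enterprise Metadata Management's model or API: the asset names in `LINEAGE` and the `downstream` and `upstream` helpers are hypothetical.

```python
from collections import deque

# Hypothetical harvested metadata: each asset maps to the assets it feeds.
LINEAGE = {
    "crm.customers": ["lake.customers_raw"],
    "lake.customers_raw": ["dw.dim_customer"],
    "dw.dim_customer": ["report.sales_by_customer", "report.churn"],
    "erp.orders": ["dw.fact_orders"],
    "dw.fact_orders": ["report.sales_by_customer"],
}


def downstream(asset, edges):
    """Impact analysis: everything reachable from `asset` (breadth-first)."""
    seen, queue = set(), deque([asset])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(asset, edges):
    """Lineage: reverse the edges, then reuse the same traversal."""
    reverse = {}
    for source, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, []).append(source)
    return downstream(asset, reverse)


# "What will happen if I change this table?"
impact = sorted(downstream("crm.customers", LINEAGE))
# "Where did this data come from?"
sources = sorted(upstream("report.sales_by_customer", LINEAGE))
print(impact)
print(sources)
```

The two calls correspond to two of the questions on slide 24: the forward walk answers what a change would affect, and the reverse walk answers where a report's data originated.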
26. What does Potable Data mean?
• Quickly and Easily Consumable and Trusted
• You can use GoldenGate to make data more quickly available, streamed into (and through) the Lake using CDC
• You can use ODI to make the data easier to consume
• Trust is not only about ‘how good it is’, but knowing how good it is (or not), and where it came from
• You can use EDQ to add Data Quality dimensions to your data as it is streamed into the Lake… and the analytics tools you already use to tell you how good the data is
• You can use OEMM to understand the data, and where it comes from
27. Get a sneak peek at cutting-edge data integration designs and receive a free gift!
• Oracle is constantly developing new software and features that will make your work easier, and Oracle's User Experience team would love to get your feedback on new data integration designs.
• Feedback sessions will take place at a date and time of your own choice.
• You can take part via webconference, from the comfort and convenience of your own office.
• If you’re interested, please fill out the 1-page form at http://bit.ly/2vIHlSg (uppercase I, lowercase l)
• To show our appreciation, we will post all participants their choice from a wide selection of thank-you gifts.
28. Data Integration Programme – FOCUS ON DOC LINK
Presentations on:
• Oracle Data Integration Platform Cloud
• Oracle Data Integrator
• Oracle GoldenGate
• Oracle Enterprise Data Quality
• Oracle Enterprise Metadata Management
Hands-on Labs:
• Oracle GoldenGate Real-Time Data Replication in the Cloud HOL7715
• Oracle Enterprise Data Quality HOL7653
• ODI and OGG for Big Data HOL7708
• Oracle Data Integration Platform Cloud HOL7673
Demo Stations:
• The EXchange Integration Area - Moscone West
• The EXchange Data Management Area - Moscone West
• The EXchange Analytics & Big Data Area - Moscone West
29. Data Integration Programme – FOCUS ON DOC LINK
Sunday, October 1
• Lift and Shift Workloads to Cloud with Oracle Data Integration Platform Cloud [SUN6653]
• Data Movement between On-Prem, Fusion ERP Cloud, Fusion HCM Cloud and Salesforce [SUN7286]
• Accelerate Migration to Cloud Infrastructure with Data Integration Platform [SUN6896]
Monday, October 2
• Oracle Data Integration Platform Strategy and Roadmap [CON6646]
• Filling Your Data Lake with Potable Data, Using Data Integration [CON5465]
• GoldenGate: Deep Dive into Automating OGG using the new Microservices [CON6569]
• Oracle Data Integration Platform: Foundation for Cloud Integration [CON6650]
• Oracle Data Integration Platform Empowers Enterprise Grade Big Data Solutions [CON6893]
• Oracle Data Integration Platform Cloud Deep Dive [CON6651]
• Oracle GoldenGate Cloud Service: Real-Time Data Replication in the Cloud [HOL7715]
Tuesday, October 3
• Oracle Data Integrator Product Update and Strategy [CON6654]
• Oracle Enterprise Data Quality: Product Overview and Roadmap [CON6656]
• Accelerate Cloud On-Boarding Using Oracle GoldenGate Cloud Service [CON6894]
• Oracle Enterprise Data Quality for All Types of Data [HOL7653]
• Oracle Data Integration Platform: a Cornerstone for Big Data [CON6655]
• GoldenGate: MAA and Best Practices for Oracle GoldenGate Microservices [CON6570]
• Oracle GoldenGate Product Update and Strategy [CON6897]
Wednesday, October 4
• A Practical Path to Enterprise Data Governance with Oracle Enterprise Data Quality [CON6657]
• Oracle Data Integrator and Oracle GoldenGate for Big Data [HOL7708]
• Introduction to Oracle Data Integration Platform Cloud [HOL7673]
• An Enterprise Databus: GoldenGate in the Cloud Working with Kafka and Spark [CON6895]
• GoldenGate: Best Practices & Deep Dive on OGG 12.3 Microservices at Cloud [CON6568]
• Oracle GoldenGate for Big Data [CON6898]
• Oracle Data Integration Platform Cloud Service Governance Edition [CON6652]
30. Connect with Oracle Integration
• Oracle Data Integration; @OracleDI; Blogs.oracle.com/DataIntegration/; Oracle Data Integration
• Oracle FMW; @OracleIntegrate; Blogs.oracle.com/Integration/; Oracle SOA
31. Stay Informed During and After OpenWorld
• Twitter: @OracleExadata, @OracleBigData, @Infrastructure
• Follow #CloudReady
• LinkedIn: Oracle IT Infrastructure – Oracle Showcase Page; Oracle Big Data – Oracle Showcase Page
32. Converged Infrastructure Forum
Tuesday, Oct 3 from 6:30-9pm, SF MOMA
RSVP Required: https://www.oracle.com/goto/Openworld/CIEventOct3