Learn the basics of Data Warehouse
Data Warehouse is a core component of Business Intelligence and Data Analysis. It is a system used to store data for analysis and reporting. LiveEdu is a great platform to start learning and to improve your Data Warehouse skills. We have a dedicated section for data science tutorials and resources. You can also see how a data warehouse works by searching for data warehouse topics in our video library. Join the data warehouse community and become part of it!
Data Warehouse Introduction
Data Warehouse is a core component of Business Intelligence and Data Analysis. It is a system used to store data for analysis and reporting. A data warehouse (DW) acts as a central repository, so that data is kept in one place rather than in disparate sources. Both current and historical data are stored there.
Data stored in a single place is then used for different purposes such as sales or marketing. The typical operation of a data warehouse is ETL (extract, transform, load). Before data is stored, it is cleaned, cataloged, transformed and managed by a business professional.
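The ETL flow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular warehouse product works: the sample CSV data, the `sales` table, and the function names `extract`, `transform`, `load` and `run_etl` are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (made-up sample data).
RAW_CSV = """order_id,region,amount
1, north ,100.50
2,South,
3,north,20.00
"""


def extract(text):
    """Extract: read raw rows from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount and normalize the rest."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # cleaning step: skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["region"].strip().lower(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the (assumed) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run extract, transform and load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(text)))
    return conn


if __name__ == "__main__":
    conn = run_etl()
    # A reporting-style query over the loaded data.
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    for region, total in conn.execute(query):
        print(region, total)
```

Once the data is loaded, reporting tools only need to query the single `sales` table instead of going back to each source system. In a real warehouse the extract step would read from several sources and the cleaning rules would be far more extensive.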
History of Data Warehouse
The concept of Data Warehouse is not new; it dates back to the 1980s. Much of the early work was done by Paul Murphy and Barry Devlin, who developed the “business data warehouse.” The initial aim of the data warehouse was to provide an architectural model for the flow of data to decision support environments. Let’s list some key events in the history of Data Warehouse.
- 1960s – General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
- 1970s – ACNielsen and IRI provide dimensional data marts for retail sales.
- 1970s – Bill Inmon begins to define and discuss the term: Data Warehouse.
- 1975 – Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL. It is the first platform designed for building Information Centers (a forerunner of contemporary data warehouse technology).
- 1983 – Teradata introduces the DBC/1012 database computer, specifically designed for decision support.
- 1984 – Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases a hardware/software package and GUI for business users to create a database management and analytic system.
- 1988 – Barry Devlin and Paul Murphy publish the article “An architecture for a business and information system”, where they introduce the term "business data warehouse".
- 1990 – Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
- 1991 – Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
- 1992 – Bill Inmon publishes the book Building the Data Warehouse.
- 1995 – The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
- 1996 – Ralph Kimball publishes the book The Data Warehouse Toolkit.
- 2012 – Bill Inmon develops and makes public a technology known as "textual disambiguation". Textual disambiguation applies context to raw text and reformats the raw text and context into a standard database format. Once the raw text is passed through textual disambiguation, it can easily and efficiently be accessed and analyzed by standard business intelligence technology. Textual disambiguation is accomplished through the execution of textual ETL. It is useful wherever raw text is found, such as in documents, Hadoop, email, and so forth.
Data Warehouse Tools
Data Warehouse tools enable data scientists, data wranglers, managers and anyone working with data to make decisions quickly or to extract and import data. There are many warehouse tools, both open source and proprietary, that can make your work process more efficient. We will list some that you can use in your Data Warehouse work.
- Ab Initio Software: The product offers a full range of features, including batch processing, data analysis, and GUI-based parallel processing software. The company has 20 years of experience in data processing.
- Amazon Redshift: Amazon Redshift is a popular data warehouse solution. It is managed by Amazon and is part of Amazon Web Services. With Redshift, a company can use SQL and existing BI tools to run complex queries on petabytes of structured data.
- CodeFutures: CodeFutures dbShards offers features for database sharding. It offers scalability and the opportunity to work with traditional database platforms such as PostgreSQL and MySQL.
- Teradata Corporation: Teradata offers analytics data platforms and other functionality that companies need to make their data usable and stay competitive.
- Oracle: Oracle offers a data warehousing service for its customers. It uses the well-known Oracle Database 12c for performance and scalability, and it follows industry standards. The company’s main platform for warehouse functions is the Oracle Exadata Machine.
- Cloudera: Cloudera is another major player in data warehouse systems. It offers Hadoop-based data processing and storage solutions. Cloudera’s Enterprise Data Hub (EDH) is what makes it an interesting choice.
- MarkLogic: MarkLogic uses a NoSQL database for its data warehouse solution. It has recently received some attention and was included in Gartner’s Magic Quadrant for DBMS.
Education Ecosystem Data Warehouse Project Creators
If you are wondering where to start learning Data Warehouse, we recommend watching the Data Warehouse Project Creators on Education Ecosystem.
Data Warehouse Best Books
There are many Data Warehouse books available online. One of the best ways to start learning Data Warehouse is to invest in books. So why wait? Let’s go through the best books for learning Data Warehouse. The books are categorized into Beginner, Intermediate and Advanced, so pick the one that best suits you.
This book provides an enhanced, comprehensive overview of data warehousing together with in-depth explanations of critical issues in planning, design, deployment, and ongoing maintenance. IT professionals eager to get into the field will gain a clear understanding of techniques for data extraction from source systems, data cleansing, data transformations, data warehouse architecture and infrastructure, and the various methods for information delivery.
In this IBM Redbooks publication we describe and demonstrate dimensional data modeling techniques and technology, specifically focused on business intelligence and data warehousing. It is intended to help the reader understand how to design, maintain, and use a dimensional model for data warehousing that can provide the data access and performance required for business intelligence.
This book presents the solution: a clear, consistent approach to defining, designing, and building data integration components to reduce cost, simplify management, enhance quality, and improve effectiveness. Leading IBM data management expert Tony Giordano brings together best practices for architecture, design, and methodology, and shows how to do the disciplined work of getting data integration right.
Data Warehousing in the Age of Big Data will help you and your organization make the most of unstructured data with your existing data warehouse.
This reference provides strategic, theoretical and practical insight into three information management technologies: data warehousing, online analytical processing (OLAP), and data mining. It shows how these technologies can work together to create a new class of information delivery system: the information factory.
This book begins with fundamental design recommendations and progresses step by step through increasingly complex scenarios.
Learn the best practices of dimensional design. Star Schema: The Complete Reference offers in-depth coverage of design principles and their underlying rationales. Organized around design concepts and illustrated with detailed examples, this is a step-by-step guidebook for beginners and a comprehensive resource for experts.
The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence
Designing a complete visualization system involves many subtle decisions. When designing a complex, real-world visualization system, such decisions involve many types of constraints, such as performance, platform (in)dependence, available programming languages and styles, user-interface toolkits, input/output data format constraints, integration with third-party code, and more.
Data Warehouse Projects
The best way to learn is to involve yourself in projects. Let’s look at some of the best Data Warehouse projects that you can follow. You can also find Data Warehouse projects on Education Ecosystem. If you are interested, check the Education Ecosystem Data Warehouse Project Creators section for more information.
A simple data warehouse project that offers a good way to get started with Data Warehouse. If you are a newbie looking for a sample project to work on, this is what you need. Explore this project!
Wolfram Data Repository is a good example of a data warehouse project. Even though you cannot dive deep into the code and learn from it, you can use the tool to extract data for your own Data Warehouse project. Explore this project!
Data Warehouse Community
The Data Warehouse community is one of the biggest in terms of growth and numbers. If you want to learn Data Warehouse, it is best to be part of a community and contribute to it. Let’s list some of the Data Warehouse communities you can become part of.
- Data Warehouse Community by Toolbox: A community built around Data Warehouse where you can ask questions, read about Data Warehouse and explore everything regarding the subject.
- TDWI: TDWI is a community completely dedicated to the growth of Data Warehouse. It regularly holds leadership events, conferences, seminars, bootcamps, etc.
- Education Ecosystem: Here you can find all the awesome Project Creators who love to share their knowledge about Data Warehouse.
Data Warehouse Gurus
David McCandless is a well-known data visualization specialist. He maintains a blog and has written popular books. He has also given a TED talk for data enthusiasts. His recent work focuses on the use of data visualization and infographics.
Aaron Koblin is an entrepreneur who is well known for his work in data visualization. That work shaped his career significantly: he created the Data Arts Team at Google and has given multiple TED talks.
Evan Sinar is the chief scientist and VP at Development Dimensions International. He has over 36K followers on Twitter and regularly shares insights on data visualization.
Cole Nussbaumer is a data visualization expert renowned for her ability to tell stories using data. She is also the author of “Storytelling with Data”, which helps businesses understand their data better.
Naomi Robbins is a seminar leader and consultant who specializes in the graphical display of data. If you want to keep learning new things, Naomi Robbins is a must-follow on Twitter. She has also written “Creating More Effective Graphs”.
Data Warehouse Conferences
Since Data Warehouse is a trending topic in the market, there are many conferences that you can attend.