What is a data warehouse?

A data warehouse (abbreviated DW or DWH) is a set of strategies that supports decision making at every level of the enterprise with all types of data. It is a single data store created for analytical reporting and decision support. For businesses that need business intelligence, it provides guidance for improving business processes and for monitoring and controlling time, cost, and quality.

The characteristics of the data warehouse

1. Thematic

A data warehouse is generally built around the actual needs of its users: data from sources on different platforms is divided and integrated according to predefined subjects. Unlike a traditional transaction-oriented operational database, it works at a high level of abstraction. Subject-oriented data organization gives a complete, unified, and consistent description of each analysis object at a higher level, covering both the enterprise data involved in that object and the relationships among the data.

2. Integration

Most of the data stored in the data warehouse comes from traditional databases, but the original data is not simply imported directly; it must be preprocessed first. Transactional data is generally noisy, incomplete, and inconsistent in form, and importing this "dirty data" directly would mislead any data mining built on the data warehouse. Dirty data must be extracted, cleaned, and transformed before it enters the data warehouse, producing a data set that is subject-oriented rather than transaction-oriented. Data integration is the most important and most complicated step in data warehouse construction.
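As a minimal sketch of the cleaning step, assume a few hypothetical rows exported from a transactional system. The field names (customer, city, amount) and the cleaning rules are illustrative assumptions, not part of any particular product:

```python
from typing import Optional

# Hypothetical rows from a transactional system; field names are assumptions.
raw_rows = [
    {"customer": " Alice ", "city": "beijing", "amount": "120.50"},
    {"customer": "Bob", "city": "Shanghai", "amount": ""},      # incomplete: no amount
    {"customer": "alice", "city": "Beijing", "amount": "30"},
    {"customer": "", "city": "Shanghai", "amount": "15.00"},    # incomplete: no customer
]

def clean_row(row: dict) -> Optional[dict]:
    """Return a normalized row, or None if the row is too incomplete to keep."""
    customer = row["customer"].strip().title()   # fix stray spaces and casing
    city = row["city"].strip().title()
    amount_text = row["amount"].strip()
    if not customer or not amount_text:
        return None
    return {"customer": customer, "city": city, "amount": float(amount_text)}

# Keep only the rows that survive cleaning.
cleaned = [r for r in (clean_row(row) for row in raw_rows) if r is not None]
```

Here the inconsistent spellings of the same customer and city are unified, and the two incomplete rows are dropped before anything is loaded.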

3. Stability

The data in the data warehouse mainly serves as the basis for decision makers' analysis, and the data on which a decision is based must not be modified. Once data is saved to the data warehouse, users can only query and analyze it through analysis tools; they cannot modify it. Updates are mainly handled during data integration, and expired data is filtered out of the data warehouse directly.

4. Dynamic

The data in the data warehouse is updated regularly over time. Its stability applies from the application's point of view: the data does not change while a user is analyzing it. At fixed intervals, the data generated in the operational database systems is extracted, transformed, and loaded into the data warehouse. Over time, data is continuously consolidated at higher levels of aggregation to meet the requirements of trend analysis. When data exceeds the storage period of the data warehouse, or is no longer useful for analysis, it is deleted from the data warehouse. Information about the structure and maintenance of the data warehouse is stored in its metadata. Maintenance is either performed automatically by the system according to those definitions or carried out periodically by the system administrator.
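The storage-period rule can be sketched as follows. The 365-day period, the row layout, and the function name are assumptions made for illustration; a real warehouse would take these from its own metadata:

```python
from datetime import date, timedelta

RETENTION_DAYS = 365  # assumed storage period; a real warehouse defines its own

def purge_expired(rows: list, today: date) -> list:
    """Keep only rows whose load date is still inside the storage period."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [row for row in rows if row[0] >= cutoff]

# Hypothetical warehouse rows: (load_date, value).
warehouse_rows = [
    (date(2022, 12, 31), "old snapshot"),
    (date(2023, 9, 1), "recent snapshot"),
]
kept = purge_expired(warehouse_rows, today=date(2024, 1, 1))
```

Only the row loaded within the last 365 days remains in `kept`.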

The basic architecture of the data warehouse

The purpose of the data warehouse is to build an analysis-oriented, integrated data environment that provides decision support for the enterprise. In fact, the data warehouse itself does not "produce" any data, nor does it need to "consume" any: the data comes from outside and is made available to external applications. This is why it is called a "warehouse" rather than a "factory". The basic architecture of the data warehouse therefore mainly covers the inflow and outflow of data, and it can be divided into three layers: source data, data warehouse, and data application.

What is a data warehouse? The characteristics of a data warehouse _ the difference between a data warehouse and a database

The figure shows that the data in the data warehouse comes from different sources and feeds various data applications. Data flows into the data warehouse from the source layer and is then opened up to the application layer; the data warehouse itself is only an intermediate platform for integrated data management.

Data source for the data warehouse

The data warehouse obtains data from each data source, and the conversion and flow of data within the data warehouse can be regarded as an ETL (Extract, Transform, Load) process. ETL is the pipeline of the data warehouse and can also be thought of as its blood: it maintains the metabolism of data in the data warehouse, and most day-to-day management and maintenance work consists of keeping ETL running normally and stably.
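A minimal sketch of the three ETL stages, using Python's built-in sqlite3 module. Two in-memory databases stand in for the source system and the warehouse; the table and column names (orders, dw_orders, amount_cents) are hypothetical:

```python
import sqlite3

def extract(source: sqlite3.Connection) -> list:
    """Extract: read raw order rows from the operational (source) database."""
    return source.execute("SELECT order_id, city, amount_cents FROM orders").fetchall()

def transform(rows: list) -> list:
    """Transform: normalize city names and convert cents to currency units."""
    return [(order_id, city.strip().title(), amount_cents / 100.0)
            for order_id, city, amount_cents in rows]

def load(warehouse: sqlite3.Connection, rows: list) -> None:
    """Load: append the transformed rows to a warehouse table."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS dw_orders (order_id INTEGER, city TEXT, amount REAL)")
    warehouse.executemany("INSERT INTO dw_orders VALUES (?, ?, ?)", rows)
    warehouse.commit()

# Hypothetical source system, simulated with an in-memory database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, city TEXT, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, " beijing", 1250), (2, "SHANGHAI ", 990)])

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))
loaded = warehouse.execute("SELECT * FROM dw_orders ORDER BY order_id").fetchall()
```

Keeping the three stages as separate functions makes it easier to monitor and rerun each one, which is where most of the daily maintenance effort goes.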

Data warehouse data storage

The data warehouse does not need to store all of the original data, but it does need to store detailed data. A brief explanation:

a. Why is not all of the raw data needed? A data warehouse exists for analytical processing, and some source data either has no value for analysis or would yield far less value than the implementation and performance costs of storing it. For example, knowing a user's province and city is sufficient; the user's exact address may matter only to logistics providers. Likewise, users' blog comments may be useful only for text mining, so storing these lengthy comments in the data warehouse may cost more than it is worth;

b. Why keep the details? Detailed data is necessary because the analysis requirements placed on a data warehouse can change at any time, and with detailed data on hand the warehouse can meet changing requirements without itself changing. If we stored only data models built for particular requirements, frequently changing requirements would clearly be overwhelming.

On top of the detailed data it maintains, the data warehouse processes the data to make it truly suitable for analysis. This mainly involves three aspects:

1. Aggregation of data

Aggregated data here refers to simple aggregation based on specific needs (aggregation over multidimensional data belongs to the multidimensional data model). Simple aggregates can be totals such as a website's Pageviews, Visits, and Unique Visitors, or averages such as Avg. time on page and Avg. time on site, all of which can be displayed directly in reports.
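These simple aggregates can be sketched over a small, hypothetical page-view log. The tuple layout (visitor, visit, page, seconds on page) is an assumption made for illustration:

```python
# Hypothetical page-view log: (visitor_id, visit_id, page, seconds_on_page).
page_views = [
    ("u1", "v1", "/home", 30),
    ("u1", "v1", "/pricing", 90),
    ("u2", "v2", "/home", 20),
    ("u1", "v3", "/home", 40),
]

total_pageviews = len(page_views)                                     # every row is one page view
visits = len({visit_id for _, visit_id, _, _ in page_views})          # distinct visits
unique_visitors = len({visitor_id for visitor_id, _, _, _ in page_views})  # distinct visitors
avg_time_on_page = sum(seconds for *_, seconds in page_views) / total_pageviews
```

For this sample the log contains 4 page views, 3 visits, and 2 unique visitors, with an average of 45 seconds per page.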

2. Multidimensional data model

The multidimensional data model supports multi-angle, multi-level analysis. Examples include star and snowflake models of sales built on a time dimension, a geographic dimension, and so on, which allow cross-queries over the time and geographic dimensions as well as subdivision within each of them. The data marts that a data warehouse provides for specific groups of users are therefore built on multidimensional data models.
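A small star model can be sketched with sqlite3: one fact table surrounded by a time dimension and a geographic dimension. All table names, column names, and figures below are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE dim_geo  (geo_id INTEGER PRIMARY KEY, province TEXT, city TEXT);
CREATE TABLE fact_sales (time_id INTEGER, geo_id INTEGER, amount REAL);
INSERT INTO dim_time VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO dim_geo  VALUES (1, 'Jiangsu', 'Suzhou'), (2, 'Jiangsu', 'Nanjing'),
                            (3, 'Zhejiang', 'Hangzhou');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0), (2, 3, 30.0);
""")

# Cross-query over both dimensions: sales per month and province.
sales_by_month_and_province = conn.execute("""
    SELECT t.month, g.province, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_geo  g ON g.geo_id  = f.geo_id
    GROUP BY t.month, g.province
    ORDER BY t.month, g.province
""").fetchall()
```

Replacing `g.province` with `g.city` in the query subdivides the same facts one level further down the geographic dimension.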

3. Business model

The business model here refers to data models built for particular kinds of data analysis and decision support, such as the user evaluation model, the association recommendation model, and the RFM analysis model that I have introduced before, or decision-support models such as linear programming models and inventory models. Data preprocessing for data mining can also be done at this stage.
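As one illustration, the raw inputs of an RFM (recency, frequency, monetary) analysis can be derived from detailed purchase records. The records, the analysis date, and the variable names are assumptions, and the scoring that a full RFM model applies afterwards is left out:

```python
from datetime import date

# Hypothetical purchase records: (customer, purchase_date, amount).
purchases = [
    ("alice", date(2024, 1, 5), 40.0),
    ("alice", date(2024, 3, 1), 60.0),
    ("bob", date(2023, 6, 10), 200.0),
]
as_of = date(2024, 3, 31)  # assumed analysis date

# customer -> (days since last purchase, number of purchases, total spend)
rfm = {}
for customer, day, amount in purchases:
    recency, frequency, monetary = rfm.get(customer, (None, 0, 0.0))
    days_ago = (as_of - day).days
    recency = days_ago if recency is None else min(recency, days_ago)
    rfm[customer] = (recency, frequency + 1, monetary + amount)
```

Because the calculation starts from detailed records, the same data can later feed a different model without being reloaded.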

Data application for data warehouse

Report display

Reports are an indispensable type of data application for every data warehouse. Presenting aggregated data and multidimensional analysis data in reports is the simplest and most intuitive way to deliver data.

Instant query

In theory, all data in the data warehouse (including detailed data, aggregated data, multi-dimensional data and analytical data) should be open for instant query. Instant query provides flexible data acquisition methods, and users can query and obtain data according to their own needs.

Data analysis

Most data analysis is based on the business models that have been built. Aggregated data can of course also be used for trend analysis, comparative analysis, correlation analysis, and so on, while the multidimensional data model provides the data foundation for multidimensional analysis. Drawing sample data from the detailed data for a specific analysis is also a common approach.

Data mining

Data mining uses advanced algorithms to uncover results in the data that are often surprising. Data mining can build on the business models already in the data warehouse, but most of the time it starts directly from the detailed data, with the data warehouse providing data interfaces for mining tools such as SAS and SPSS.

Metadata

An important aspect of the data warehouse environment is metadata. Metadata is data about data. As long as there are programs and data, metadata is part of the information processing environment. But in the data warehouse, metadata plays a new and important role. It is also because of the metadata that you can make the most efficient use of the data warehouse. Metadata enables end users/DSS analysts to explore possibilities.

Metadata sits on top of the data warehouse and records the location of the objects in it. Typically, metadata records:

The data structure known to the programmer.

The data structure known to DSS analysts.

Source data for the data warehouse.

The conversion of data when it is added to the data warehouse.

Data model.

The relationship between the data model and the data warehouse.

The history of data extracts.
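The items listed above could be held in a record like the following. This is only a sketch: the class name, the field names, and the sample values are hypothetical and do not follow any specific metadata standard:

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    """Hypothetical metadata entry for one warehouse table."""
    table_name: str
    columns: dict            # column name -> data type (the data structure)
    source_system: str       # where the data came from
    transformation: str      # how the data was converted on the way in
    extract_history: list = field(default_factory=list)  # dates of past extracts

    def record_extract(self, extract_date: str) -> None:
        """Append one extract run to the history."""
        self.extract_history.append(extract_date)

orders_meta = TableMetadata(
    table_name="dw_orders",
    columns={"order_id": "INTEGER", "city": "TEXT", "amount": "REAL"},
    source_system="orders database",
    transformation="trim and title-case city; convert cents to currency units",
)
orders_meta.record_extract("2024-03-31")
```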

Data warehouse use

In an environment of information technology and data intelligence, the data warehouse provides many economical and efficient computing resources across software and hardware, Internet and intranet solutions, and databases. It can store large amounts of data for analysis and allows multiple data access technologies to be used.

Open system technology makes the cost of analyzing large amounts of data more reasonable, and hardware solutions are more mature. The main techniques used in data warehousing applications are as follows:

Parallelism

Hardware environments, operating system environments, database management systems and all related database operations, query tools and techniques, applications, and more can all benefit from the latest advances in parallel processing.

Partition

Partitioning makes it easier to support large tables and indexes, while also improving data management and query performance.
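The idea can be sketched in plain Python. Real database systems implement partitioning internally, so the month-based scheme, the row layout, and the function name here are illustrative assumptions only:

```python
from collections import defaultdict

# Hypothetical fact rows: (sale_date as "YYYY-MM-DD", amount).
rows = [("2024-01-15", 10.0), ("2024-01-20", 5.0), ("2024-02-03", 7.5)]

# Partition by month: each partition holds only that month's rows.
partitions = defaultdict(list)
for sale_date, amount in rows:
    partitions[sale_date[:7]].append((sale_date, amount))

def monthly_total(month: str) -> float:
    """Scan only the partition for the requested month, not the whole table."""
    return sum(amount for _, amount in partitions.get(month, []))

january_total = monthly_total("2024-01")
```

A query for one month touches a single partition, and an expired month can be dropped as a unit, which is where the management and performance gains come from.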

Data compression

Data compression reduces the cost of disk systems that are typically required to store large amounts of data in a data warehouse environment. New data compression techniques have also eliminated the negative impact of compressed data on query performance.
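As a rough illustration of why warehouse data compresses well, the sketch below compresses a highly repetitive column with Python's zlib module. The column contents are made up, and real warehouses use their own storage-level compression schemes rather than zlib:

```python
import zlib

# Hypothetical column of 1000 identical province values, stored as text.
column = ",".join(["Jiangsu"] * 1000).encode("utf-8")

compressed = zlib.compress(column)
restored = zlib.decompress(compressed)

original_size = len(column)        # bytes before compression
compressed_size = len(compressed)  # bytes after compression
```

Columns with many repeated values, which are common in warehouse tables, shrink considerably, and decompression restores them exactly.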

Five benefits of data warehousing

1. Provide enhanced business intelligence (BI)

With data from a variety of data sources, managers and executives no longer need to make business decisions with limited data or their intuition. In addition, “data warehousing and related business intelligence (BI) can be used directly in business processes including market segmentation, inventory management, financial management, and sales.”

2. Save time

Because business users can quickly access data from many sources in one place, they can quickly make informed decisions on critical matters without wasting valuable time retrieving data from multiple data sources.

Not only that, business executives can query the data themselves with little or no IT support, saving more time and money. Business users do not have to wait for IT to generate reports, and hard-working IT staff can focus on what they do best: keeping the business running.

3. Improve the quality and consistency of data

The implementation of a data warehouse involves transforming data from a multitude of source systems into a common format. Because the data from each department is standardized, each department produces results that are consistent with those of all other departments. You can therefore have more confidence in the accuracy of your data, and accurate data is the foundation of strong business decisions.

4. Provide historical insight

A data warehouse stores a large amount of historical data, so you can make predictions about the future by analyzing different periods and trends. Such data usually cannot be stored in a transactional database, nor can it be used to generate reports from a transaction system.

5. Create a high return on investment

Finally, and most noteworthy, is the return on investment. Organizations that have installed data warehouses and improved business intelligence (BI) systems can generate more profit and save more money than those that do not invest in them. This should be reason enough for senior management to adopt data warehousing quickly.

The difference between database and data warehouse

In short, a database is transaction-oriented by design, while a data warehouse is subject-oriented.

A database generally stores online transaction data, while a data warehouse generally stores historical data.

Database design avoids redundancy as far as possible and generally follows the rules of normalization (the normal forms). Data warehouse design deliberately introduces redundancy and is denormalized.

A database is designed to capture data, while a data warehouse is designed to analyze data. The two basic elements of a data warehouse are dimension tables and fact tables. A dimension is an angle from which a problem is viewed, such as time or department. A dimension table holds the definitions of these things, while a fact table holds the data to be queried together with the IDs of the dimensions.

Described purely in concepts, this is somewhat abstract. Every technology serves an application, so it is easier to understand through an application. Take banking as an example. The database is the data platform of the transaction system: every transaction a customer makes at the bank is written into the database and recorded. It can be thought of simply as bookkeeping done with a database. The data warehouse is the data platform of the analysis system: it obtains data from the transaction system, then summarizes and processes it to give decision makers a basis for their decisions. Examples are how many transactions occur at a branch in a month and what the branch's current deposit balance is. If an area has many deposits and many consumer transactions, it may be worth setting up an ATM there.

Obviously, the volume of transactions in a bank is huge, usually counted in millions or even tens of millions. The transaction system is real-time and must respond promptly: if depositing a sum of money took a customer tens of seconds, that would be intolerable, so the database can hold data for only a short period. The analysis system works after the fact and must provide all valid data for the period of interest. This data is massive and summary calculations over it are slower, but as long as the system delivers useful analytical data, its goal is achieved.
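The bank example can be sketched with sqlite3: individual transactions are the detailed records, and the analysis side summarizes them per branch and month. The table layout, the branch names, and the amounts are hypothetical, and the query computes net deposits for the month rather than a true account balance:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (branch TEXT, tx_date TEXT, kind TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", [
    ("North", "2024-03-02", "deposit", 500.0),
    ("North", "2024-03-15", "withdrawal", 200.0),
    ("South", "2024-03-20", "deposit", 300.0),
    ("North", "2024-04-01", "deposit", 100.0),
])

# Summarize per branch and month: transaction count and net deposits.
monthly_summary = conn.execute("""
    SELECT branch,
           substr(tx_date, 1, 7) AS month,
           COUNT(*) AS tx_count,
           SUM(CASE WHEN kind = 'deposit' THEN amount ELSE -amount END) AS net_deposits
    FROM transactions
    GROUP BY branch, month
    ORDER BY branch, month
""").fetchall()
```

The transaction system only ever inserts single rows quickly; the slower summary query belongs on the analysis side.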

The data warehouse emerged when databases already existed in large numbers, in order to further mine data resources and to meet the needs of decision making; it is by no means a so-called "large database". So what are the differences between a data warehouse and a traditional database? Let's look at W. H. Inmon's definition of a data warehouse: a subject-oriented, integrated, time-related, and non-modifiable collection of data.

"Subject-oriented": Traditional databases mainly serve the data processing of applications and do not necessarily store data by subject; data warehouses focus on data analysis and store data by subject. This resembles the difference between a traditional farmers' market and a supermarket. In the market, cabbage, radish, and parsley all sit on one stall if one vendor sells them; in the supermarket, cabbage, radish, and parsley each have their own section. In other words, the vegetables (data) in the market are piled up (stored) by vendor (application), while in the supermarket they are piled up by type of vegetable (subject).

"Time-related": When a database saves information, it does not insist that time information be included. The data warehouse is different: because it serves decision making, its data must be marked with time attributes, which matter greatly in decisions. Take two customers who have both purchased a total of nine car products: one bought all nine in the last three months, while the other has bought none in the last year. To a decision maker, these two customers are very different.

"Not modifiable": The data in the data warehouse is not the latest data but is derived from other data sources. The data warehouse reflects historical information, not the kind of daily transaction data that many databases process (some databases, such as telecom billing databases, even process real-time information). The data in the data warehouse is therefore rarely or never modified; adding data to the data warehouse is, of course, allowed.

The emergence of data warehouses is not to replace the database. Currently, most data warehouses are managed using a relational database management system. It can be said that the database and the data warehouse complement each other and each has its own merits.

So the main difference is:

(1) A database is transaction-oriented by design, while a data warehouse is subject-oriented.

(2) A database generally stores online transaction data, while a data warehouse generally stores historical data.

(3) Database design avoids redundancy as far as possible, while data warehouse design deliberately introduces redundancy.

(4) A database is designed to capture data, while a data warehouse is designed to analyze data.
