Data Warehouse Modeling Using Online Analytical Processing Approach

Abstract


INTRODUCTION
Growing business needs intensify competition among companies that rely on information technology. Companies use information technology to gain an advantage by streamlining their business processes [1], [2]. One such use is decision support for the business processes of middle-to-upper-scale companies that handle large amounts of data. A key success factor for a company is its ability to analyze the market well: when consumer behavior is captured accurately, management can evaluate and analyze it to produce strategic policy. Evaluation, planning, and decision-making activities are carried out better when an organization or company has complete, fast, precise, and accurate information.
The sales division's transaction data on goods is so large that it requires substantial storage media, and processing it into the reports management needs for decision making is difficult. Producing reports takes a long time because the data is still processed with Microsoft Excel. The stored transactional sales data cannot serve as a basis for analysis until it is processed further into a form that top-level management can use for decision making.
Online Analytical Processing (OLAP) is an approach for quickly answering multidimensional analytical queries; it covers the design of applications and technologies that collect, store, and manipulate multidimensional data for analysis [3]-[5]. OLAP belongs to the broader category of business intelligence, which also encompasses reporting and data mining. An OLAP database has its own schema structure and is usually a data warehouse: a large-capacity repository of data collected from various sources and used for decision making. However, OLAP can also draw directly on an operational database, provided that database already has an OLAP-friendly design.
The data warehouse is expected to help the marketing, sales, and finance divisions manage sales transaction data so that the executive information system can support fast and precise decision making. ETL is the collection of processes that prepare data from operational sources for the data warehouse. It consists of extraction, transformation, loading, and several preliminary steps carried out before the data is published in the warehouse. The ETL process is the phase in which incoming source data is processed into the data warehouse; its purpose is to collect, filter, process, and combine relevant data from various sources for storage there. ETL can also be used to integrate data with a pre-existing system. The result of the ETL process is data that meets data warehouse criteria (historical, integrated, summarized, and static) with a structure designed for the analysis process.
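The three ETL phases described above can be sketched as a minimal pipeline. This is an illustration only: it uses Python's built-in sqlite3 in place of the paper's MySQL warehouse, and the sample rows and column names are assumptions, not the actual CV AL schema.

```python
import sqlite3

# Hypothetical operational rows, standing in for the Excel source at CV AL.
source_rows = [
    {"kode_barang": "B01", "harga": "15000", "qty": "3"},
    {"kode_barang": "B02", "harga": "20000", "qty": "1"},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_penjualan "
    "(kode_barang TEXT, jumlah INTEGER, total_harga INTEGER)"
)

for row in source_rows:                                 # extract
    jumlah = int(row["qty"])                            # transform: cast type
    total = jumlah * int(row["harga"])                  # transform: derived value
    conn.execute(
        "INSERT INTO fact_penjualan VALUES (?, ?, ?)",  # load
        (row["kode_barang"], jumlah, total),
    )
conn.commit()
```

The same extract-transform-load shape applies whatever the source and target platforms are; only the connection and casting logic change.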

Extraction
The sales database at CV AL is an Excel file; the staging area and the data warehouse use MySQL. This database platform was chosen because it is currently developed and licensed as open source. Users who can access the data warehouse system are those connected to the web server. For reasons of company confidentiality, the operational data stored in the database is presented here only as tables relevant to the sales data used in this research.
Data is extracted from the sales operational database into MySQL. Because the platforms differ, changes to the data structure are needed. Extraction selects the required data from the source and stores it in a database placed on the same storage, machine, and platform as the data warehouse, so that the ETL process runs faster and does not interfere with operational processes. This extraction takes the necessary sources (data on goods, customers, drivers, districts, provinces, and sales) and uses them to build the sales data warehouse database.
Data cleansing is performed because this study uses sources without a uniform data format. The data was cleansed and organized into a consistent form according to the needs of this study.
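A cleansing pass of this kind can be sketched as follows. The records, field names, and normalization rules here are hypothetical examples of non-uniform source formatting, not the paper's actual data:

```python
# Hypothetical raw extract with inconsistent formatting and a duplicate key.
raw = [
    {"kode": " b01", "nama": "Item A "},
    {"kode": "B01 ", "nama": "Item A"},   # same item, different formatting
    {"kode": "B02",  "nama": "Item B"},
]

def cleanse(rec):
    # Normalize codes and names into one consistent format.
    return {"kode": rec["kode"].strip().upper(),
            "nama": rec["nama"].strip()}

seen, cleaned = set(), []
for rec in map(cleanse, raw):
    if rec["kode"] not in seen:           # discard records with duplicate keys
        seen.add(rec["kode"])
        cleaned.append(rec)
```

After this pass, every key appears once and all fields share one format, which is what the later transformation stage assumes.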

Transformation
The transformation stage converts the extracted data into a form that suits user needs. It is carried out after the data has passed the extraction and cleansing processes. The transformation is divided into two levels: record level and field level. Selection, merging, and aggregation are performed to obtain summary data matching the dimensions to be created. The transformation is implemented as procedures in SQL. In the sales table, the transformation takes and denormalizes the customer, goods, driver, district, province, and sales tables.
In this transformation stage, the field-level functions performed are:
a. Mapping input data from the original schema to the data warehouse schema: goods data is transformed into dim_barang, customer data into dim_pelanggan, driver data into dim_sopir, district data into dim_kabupaten, province data into dim_provinsi, and sales data into fact_penjualan.
b. Converting data types and formats into date, int, and decimal types, because in the source data all fields previously shared the same format.
c. Cleaning and removing duplicates and data errors, i.e., cleaning data that could cause redundancy during the loading process. This check is performed on the values that will become primary keys in the dimension and fact tables of the data warehouse.
d. Calculating derived values, i.e., computing the total price and total payment in the transaction data.
e. Merging data, i.e., combining the transaction data that occurs within one year.
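Steps (a), (b), and (d) above can be sketched as mapping functions. The dimension and fact table names come from the paper; the source field names and the sample record are assumptions for illustration:

```python
from datetime import date

def to_dim_barang(src):
    # (a) map source fields to the dim_barang schema; (b) convert types.
    return {"kode_barang": src["kode"].strip(),
            "nama_barang": src["nama"],
            "harga": int(src["harga"])}

def to_fact_penjualan(src):
    jumlah = int(src["qty"])
    return {"tanggal": date.fromisoformat(src["tanggal"]),  # (b) text -> date
            "kode_barang": src["kode"].strip(),
            "jumlah": jumlah,
            "total_harga": jumlah * int(src["harga"])}      # (d) derived value

row = to_fact_penjualan(
    {"tanggal": "2022-03-14", "kode": "B01", "qty": "3", "harga": "15000"})
```

In the actual system these mappings would be SQL procedures rather than Python functions, but the field-level logic is the same.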

Loading
Loading moves OLTP data into the data warehouse. Two kinds of loading are performed. The first is the initial load, done once the data warehouse has been designed and built; the volume of data entered is very large, and the load takes relatively longer. The initial load populates the dimension tables dim_barang, dim_pelanggan, dim_sopir, dim_provinsi, and dim_kabupaten, and the fact table fact_penjualan, for one year of transactions. The second is the incremental load, performed once the data warehouse is in operation: extraction, transformation, and loading are repeated on the new data. The initial load is done only once, while the incremental load can be scheduled as needed (daily, monthly, quarterly, or yearly) according to the transaction processes that occur in the company.
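The two load modes can be sketched with one routine whose cutoff parameter distinguishes them. Again this uses sqlite3 and hypothetical rows in place of the MySQL warehouse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_penjualan (tanggal TEXT, total_harga INTEGER)")

def load(rows, since=None):
    """Initial load when since is None; incremental load of newer rows otherwise."""
    batch = [r for r in rows if since is None or r[0] > since]
    conn.executemany("INSERT INTO fact_penjualan VALUES (?, ?)", batch)
    conn.commit()
    return len(batch)

# Initial load: one full year of transactions, run exactly once.
load([("2022-01-10", 45000), ("2022-06-01", 20000)])
# Incremental load: run on a schedule, inserting only rows after the cutoff.
load([("2022-06-01", 20000), ("2022-07-02", 15000)], since="2022-06-30")
```

The date cutoff ensures a scheduled incremental run does not re-insert rows that the previous run already loaded.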

Star Schema
The star schema of this prototype consists of 5 (five) dimension tables, namely dim_barang, dim_pelanggan, dim_sopir, dim_kabupaten, and dim_provinsi, and 1 (one) fact table, namely fact_penjualan. The design of the data warehouse star schema can be seen in Figure 2 below.
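The five dimension tables and one fact table can be expressed as DDL. The table names follow the paper; the column names and types are plausible assumptions (sketched here in sqlite3 rather than MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_barang    (kode_barang    TEXT PRIMARY KEY, nama_barang    TEXT, harga INTEGER);
CREATE TABLE dim_pelanggan (kode_pelanggan TEXT PRIMARY KEY, nama_pelanggan TEXT);
CREATE TABLE dim_sopir     (kode_sopir     TEXT PRIMARY KEY, nama_sopir     TEXT);
CREATE TABLE dim_kabupaten (kode_kabupaten TEXT PRIMARY KEY, nama_kabupaten TEXT);
CREATE TABLE dim_provinsi  (kode_provinsi  TEXT PRIMARY KEY, nama_provinsi  TEXT);
-- The fact table at the center of the star references every dimension.
CREATE TABLE fact_penjualan (
    tanggal        TEXT,
    kode_barang    TEXT REFERENCES dim_barang(kode_barang),
    kode_pelanggan TEXT REFERENCES dim_pelanggan(kode_pelanggan),
    kode_sopir     TEXT REFERENCES dim_sopir(kode_sopir),
    kode_kabupaten TEXT REFERENCES dim_kabupaten(kode_kabupaten),
    kode_provinsi  TEXT REFERENCES dim_provinsi(kode_provinsi),
    jumlah         INTEGER,
    total_harga    INTEGER
);
""")
```

Each dimension key appears once in the fact table, giving the single-join "star" shape that OLAP queries aggregate over.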

OLAP Monthly Goods Sales Report
The first step at this stage is to display the reports generated by the data warehouse. The monthly goods sales report looks as follows.

Figure 3. Display of Monthly Item Sales Report
The next step is to design the hypercube for the monthly goods sales report described above. Each hypercube is made more efficient by reducing the dimension tables and merging them into the fact table: reducing the number of tables joined in a query improves query performance.

Figure 4. Hypercube of Monthly Item Sales Report
As the figure shows, the hypercube design for the monthly goods sales report consists of the date, item code, item name, item price, quantity sold, and total fields. Based on this design, 1 fact table and 1 dimension table are formed: from the fact table fact_penjualan only the item code, quantity, and total price fields are taken, while from the dimension table only the item name and item price fields are taken. The fact and dimension tables can be seen in the following figure.
The next step, for the report on sales of goods by region, is likewise to design the hypercube described above; each hypercube is made more efficient by reducing the dimension tables and merging them into the fact table, since reducing the number of tables joined in a query improves query performance.

Figure 7. Hypercube of Sales Report by Region
Based on the hypercube design for the sales report by region, 1 fact table and 2 dimension tables are formed. From the fact table fact_penjualan only the item code, quantity, and total price fields are taken; from the goods dimension table only the item name and item price fields; and from the district dimension table the district code and district name fields. The fact and dimension tables can be seen in the following figure.
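A report of this shape, one fact table joined with two dimension tables, can be sketched as an aggregation query. The sample rows and region name are invented for illustration; only the table names follow the paper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_barang    (kode_barang TEXT PRIMARY KEY, nama_barang TEXT, harga INTEGER);
CREATE TABLE dim_kabupaten (kode_kabupaten TEXT PRIMARY KEY, nama_kabupaten TEXT);
CREATE TABLE fact_penjualan (kode_barang TEXT, kode_kabupaten TEXT,
                             jumlah INTEGER, total_harga INTEGER);
INSERT INTO dim_barang     VALUES ('B01', 'Item A', 15000);
INSERT INTO dim_kabupaten  VALUES ('K01', 'Sleman');
INSERT INTO fact_penjualan VALUES ('B01', 'K01', 3, 45000);
INSERT INTO fact_penjualan VALUES ('B01', 'K01', 2, 30000);
""")

# One fact table joined with two dimension tables, aggregated by region.
report = conn.execute("""
    SELECT k.nama_kabupaten, b.nama_barang, SUM(f.jumlah), SUM(f.total_harga)
    FROM fact_penjualan f
    JOIN dim_barang    b ON b.kode_barang    = f.kode_barang
    JOIN dim_kabupaten k ON k.kode_kabupaten = f.kode_kabupaten
    GROUP BY k.nama_kabupaten, b.nama_barang
""").fetchall()
```

Because only two dimension tables remain in the join, the query plan stays small, which is the performance rationale given above.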

Testing the Response Time of OLAP Prototype Results
At this stage the prototype is tested to check, against the hypothesis stated in the previous chapter, whether the prototype solves the response-time problem. The test measures how long it takes to display sales reports at CV AL, and thus how well the prototype helps the executive information system produce sales reports. The following is the prototype display of the goods sales report.

Figure 9. Goods Sales Report Response Time Test Results
Based on the figure above, the response time to display the sales data report on the OLAP prototype for CV AL sales data is 0.0038769999519 seconds. The result confirms the research hypothesis: the OLAP development makes it easier for the executive information system to produce sales data reports quickly, with a response time well under the 10-second threshold for displaying the goods sales report.
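A response-time measurement of this kind can be sketched with a monotonic timer around the report query. The data is a hypothetical stand-in; only the 10-second acceptance threshold comes from the paper:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_penjualan (kode_barang TEXT, total_harga INTEGER)")
conn.executemany("INSERT INTO fact_penjualan VALUES (?, ?)",
                 [("B01", 45000), ("B02", 20000)])

start = time.perf_counter()          # monotonic clock suited to short intervals
rows = conn.execute(
    "SELECT kode_barang, SUM(total_harga) FROM fact_penjualan "
    "GROUP BY kode_barang").fetchall()
elapsed = time.perf_counter() - start
```

`time.perf_counter()` is preferred over wall-clock time here because sub-millisecond report queries need a high-resolution monotonic timer.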
The same test is applied to the prototype's sales graph: against the hypothesis from the previous chapter, it checks whether the prototype solves the response-time problem when displaying sales graphs. The test measures how long it takes to display a graph of goods sales at CV AL, and thus how well the prototype supports the executive information system in decision making. The following is the prototype display of the goods sales graph.

Figure 10. Goods Sales Graph Response Time Test Results
Based on the figure above, the response time to display the goods sales graph on the OLAP prototype for CV AL sales data is 0.0009969999908239 seconds. The result again confirms the research hypothesis: the OLAP prototype makes it easier for the executive information system to produce sales graphs quickly, with a response time well under 10 seconds, supporting decision making based on the graphical reports the prototype generates.

CONCLUSION
The implementation of this prototype has hardware and software requirements that match the non-functional requirements, and its response time differs for each query executed when displaying graphs and generating sales reports at CV AL. However, no executed query takes more than 10 seconds to produce the graphs and reports on sales of goods.