/
New_The Data Structure Explained

New_The Data Structure Explained

The objective is to initiate the development process of a data warehouse (DW) and take its initial steps. On top of the DW, we intend to develop the PBI reports. Traditionally, a data warehouse involves transferring data from a transactional database to a dedicated repository at specific intervals. However, our current stage does not involve this transfer. Instead, we establish an intermediary layer with a warehouse-like design consisting of DIM (dimension) and FACT tables. These tables are views created from our relational database tables, aiming to replicate the structure and functionality of a data warehouse.

We currently don't possess expert-level knowledge in data warehousing but have studied its principles. Consequently, the design might need to be improved.

Getting this right is very important to us, and therefore, I would appreciate your insights regarding the design and any recommendations you have. Perhaps we should discuss this person-to-person and ensure we have a proper data model to move forward. 

Please let me try and explain the design and the details. 

FACT_LANDINGS

FACT_LANDINGS contains duplicated rows. Should we consider all of them - which will cause cartesian on the final metrics results - or should we consider removing duplicates? Should we consider the same rule for the other fact/dimension tables if that is the case?

Within our relational database, we have three interconnected tables: SURVEY, SURVEY_ITEM, and SURVEY_ITEM_DETAILS, each associated through one-to-many relationships.

Our current endeavor involves crafting a data model that effectively tackles the following scenarios.

SURVEY = Survey Session, master

SURVEY_ITEM = Interview, detail1

SYRVEY_ITEM_DETAILS = Catches, detail2

By designing FACT_LANDINGS we flatten this data structure. So for each combination of Species, Gear, Fishing_Zone and Condition_id a new line is created in the database (at the third level) and therefore there are repetitions in the two levels above.

Lucas it is possible to work with this in a flattened table (FACT_LANDINGS) or would you recommend to for us to create an API for SURVEY, SURVEY_ITEM, SYRVEY_ITEM_DETAILS?

 

SELECT s.survey_id, s.survey_date AS ldate, s.landing_site_id, si.vessel_id, sid.species_id, sid.weight, sid.price, si.fuel_used, sid.weight * sid.price AS total, sid.gear_id, sid.fishing_zone_id, sid.condition_id, s.tenant_id, s.is_active, s.created_by, s.created_time FROM survey s, survey_item si, survey_item_dtl sid WHERE s.survey_id = si.survey_id (+) AND si.survey_item_id = sid.survey_item_id(+)

DIM_DATES

What is the purpose of the DIM_DATES table? We intend to create a new dynamic calendar table to encompass all dates period - from Jan 1th to Dec 31th of all your data.
DIM_DATES contains duplicate values for the indicated key, which causes a N:N relationship in Power BI, which is not recommended

The purpose of DIM_DATES is simply to extract the dates found in table SURVEY. Perhaps we should redesign and have all possible dates?
Please advise!

SELECT survey.survey_date, survey.survey_id, EXTRACT(DAY FROM survey.survey_date) "DAY", EXTRACT(MONTH FROM survey.survey_date) "MONTH", EXTRACT(YEAR FROM survey.survey_date) "YEAR", to_char(survey.survey_date, 'IW' ) "WEEK", to_char(survey.survey_date, 'Q' ) "QUARTER", survey.tenant_id, survey.is_active FROM survey WHERE is_active = 0

DIM_PARTIES

DIM_PARTIES only matches the join criteria for PartyID on the fact tables, not with OPERATORID and OWNERID. Should DIM_PARTIES consider OPERATORID and OWNERID for their relationship to the fact tables? If yes, what are the corresponding keys on the fact tables?

PARTY_ID should match the join for OPERATORID and OWNERID in FACT_VESSELS—the following works.

 

SELECT datistica.fact_vessels.vessel_id, datistica.fact_vessels.vessel_name, datistica.fact_vessels.registration_no, datistica.fact_vessels.district_no, datistica.dim_parties.party_id, datistica.fact_vessels.operator_id, datistica.dim_parties.first_name as "OPERATOR_FIRST_NAME", datistica.fact_vessels.owner_id, dim_parties1.first_name AS "OWNER_FIRST_NAME" FROM datistica.fact_vessels LEFT JOIN datistica.dim_parties ON datistica.fact_vessels.owner_id = datistica.dim_parties.party_id LEFT JOIN datistica.dim_parties dim_parties1 ON datistica.fact_vessels.owner_id = dim_parties1.party_id

 

VESSEL_ID
VESSEL_NAME
REGISTRATION_NO
PARTY_ID
OPERATOR_ID
OPERATOR_FIRST_NAME
OWNER_ID
OWNER_FIRST_NAME
2056
Easy Does It / Niesha
V4-469-BE
14359
14359
Shenequa
14359
Shenequa
2039
God's Grace
V4-615-SP
14361
14307
Silas
14361
Silas
2022
Busy
V4-470-OL
14362
14362
Spencer
14362
Spencer
2022
Busy
V4-470-OL
14362
14362
Spencer
14362
Spencer
1986
TERENZ
V4-308-LH
14363
14363
St.
14363
St.
1872
First Time
V4-039-BE
14364
14364
Stuyesant
14364
Stuyesant
1884
Just In Time
V4-451-BE
14365
14365
Sydney
14365
Sydney
1877
Sea Dreams
V4-497-OL
14366
14366
Sylpatra
14366
Sylpatra

DIM_COORDINATES

DIM_COORDINATES do not relate to any other table. Is it just for map plotting purposes?

I don't think we are using DIM_COORDINATES as of now. The only coordinates we are using are in the DIM_LANDING_SITES.

 

 

Related content