New_The Data Structure Explained

The objective is to initiate the development process of a data warehouse (DW) and take its initial steps. On top of the DW, we intend to develop the PBI reports. Traditionally, a data warehouse involves transferring data from a transactional database to a dedicated repository at specific intervals. However, our current stage does not involve this transfer. Instead, we establish an intermediary layer with a warehouse-like design consisting of DIM (dimension) and FACT tables. These tables are views created from our relational database tables, aiming to replicate the structure and functionality of a data warehouse.

We currently don't possess expert-level knowledge in data warehousing but have studied its principles. Consequently, the design might need to be improved.

Getting this right is very important to us, and therefore, I would appreciate your insights regarding the design and any recommendations you have. Perhaps we should discuss this person-to-person and ensure we have a proper data model to move forward.

Please let me try and explain the design and the details.

FACT_LANDINGS

FACT_LANDINGS contains duplicated rows. Should we consider all of them - which will cause cartesian on the final metrics results - or should we consider removing duplicates? Should we consider the same rule for the other fact/dimension tables if that is the case?

Within our relational database, we have three interconnected tables: SURVEY, SURVEY_ITEM, and SURVEY_ITEM_DETAILS, each associated through one-to-many relationships.

Our current endeavor involves crafting a data model that effectively tackles the following scenarios.

SURVEY = Survey Session, master

SURVEY_ITEM = Interview, detail1

SYRVEY_ITEM_DETAILS = Catches, detail2

By designing FACT_LANDINGS we flatten this data structure. So for each combination of Species, Gear, Fishing_Zone and Condition_id a new line is created in the database (at the third level) and therefore there are repetitions in the two levels above.

Lucas it is possible to work with this in a flattened table (FACT_LANDINGS) or would you recommend to for us to create an API for SURVEY, SURVEY_ITEM, SYRVEY_ITEM_DETAILS?

SELECT
        s.survey_id,
        s.survey_date          AS ldate,
        s.landing_site_id,
        si.vessel_id,
        sid.species_id,
        sid.weight,
        sid.price,
        si.fuel_used,
        sid.weight * sid.price AS total,
        sid.gear_id,
        sid.fishing_zone_id,
        sid.condition_id,
        s.tenant_id,
        s.is_active,
        s.created_by,
        s.created_time
    FROM
        survey          s,
        survey_item     si,
        survey_item_dtl sid
    WHERE
        s.survey_id = si.survey_id (+)
        AND si.survey_item_id = sid.survey_item_id(+)

DIM_DATES

What is the purpose of the DIM_DATES table? We intend to create a new dynamic calendar table to encompass all dates period - from Jan 1th to Dec 31th of all your data.
DIM_DATES contains duplicate values for the indicated key, which causes a N:N relationship in Power BI, which is not recommended

The purpose of DIM_DATES is simply to extract the dates found in table SURVEY. Perhaps we should redesign and have all possible dates?
Please advise!

SELECT
        survey.survey_date,
        survey.survey_id,
        EXTRACT(DAY FROM survey.survey_date)   "DAY",
        EXTRACT(MONTH FROM survey.survey_date) "MONTH",
        EXTRACT(YEAR FROM survey.survey_date)  "YEAR",
        to_char(survey.survey_date, 'IW' )  "WEEK",
        to_char(survey.survey_date, 'Q' ) "QUARTER",
        survey.tenant_id,
        survey.is_active
    FROM
        survey
    WHERE
        is_active = 0

DIM_PARTIES

DIM_PARTIES only matches the join criteria for PartyID on the fact tables, not with OPERATORID and OWNERID. Should DIM_PARTIES consider OPERATORID and OWNERID for their relationship to the fact tables? If yes, what are the corresponding keys on the fact tables?

PARTY_ID should match the join for OPERATORID and OWNERID in FACT_VESSELS—the following works.

SELECT
    datistica.fact_vessels.vessel_id,
    datistica.fact_vessels.vessel_name,
    datistica.fact_vessels.registration_no,
    datistica.fact_vessels.district_no,
    datistica.dim_parties.party_id,
    datistica.fact_vessels.operator_id,
    datistica.dim_parties.first_name as "OPERATOR_FIRST_NAME",
    datistica.fact_vessels.owner_id,
    dim_parties1.first_name AS "OWNER_FIRST_NAME"
FROM
    datistica.fact_vessels
    LEFT JOIN datistica.dim_parties ON datistica.fact_vessels.owner_id = datistica.dim_parties.party_id
    LEFT JOIN datistica.dim_parties dim_parties1 ON datistica.fact_vessels.owner_id = dim_parties1.party_id

VESSEL_ID	VESSEL_NAME	REGISTRATION_NO	PARTY_ID	OPERATOR_ID	OPERATOR_FIRST_NAME	OWNER_ID	OWNER_FIRST_NAME
2056	Easy Does It / Niesha	V4-469-BE	14359	14359	Shenequa	14359	Shenequa
2039	God's Grace	V4-615-SP	14361	14307	Silas	14361	Silas
2022	Busy	V4-470-OL	14362	14362	Spencer	14362	Spencer
2022	Busy	V4-470-OL	14362	14362	Spencer	14362	Spencer
1986	TERENZ	V4-308-LH	14363	14363	St.	14363	St.
1872	First Time	V4-039-BE	14364	14364	Stuyesant	14364	Stuyesant
1884	Just In Time	V4-451-BE	14365	14365	Sydney	14365	Sydney
1877	Sea Dreams	V4-497-OL	14366	14366	Sylpatra	14366	Sylpatra

DIM_COORDINATES

DIM_COORDINATES do not relate to any other table. Is it just for map plotting purposes?

I don't think we are using DIM_COORDINATES as of now. The only coordinates we are using are in the DIM_LANDING_SITES.

New_The Data Structure Explained

Please let me try and explain the design and the details.

FACT_LANDINGS

DIM_DATES

DIM_PARTIES

VESSEL_ID

VESSEL_NAME

REGISTRATION_NO

PARTY_ID

OPERATOR_ID

OPERATOR_FIRST_NAME

OWNER_ID

OWNER_FIRST_NAME

2056

Easy Does It / Niesha

V4-469-BE

14359

14359

Shenequa

14359

Shenequa

2039

God's Grace

V4-615-SP

14361

14307

Silas

14361

Silas

2022

Busy

V4-470-OL

14362

14362

Spencer

14362

Spencer

2022

Busy

V4-470-OL

14362

14362

Spencer

14362

Spencer

1986

TERENZ

V4-308-LH

14363

14363

St.

14363

St.

1872

First Time

V4-039-BE

14364

14364

Stuyesant

14364

Stuyesant

1884

Just In Time

V4-451-BE

14365

14365

Sydney

14365

Sydney

1877

Sea Dreams

V4-497-OL

14366

14366

Sylpatra

14366

Sylpatra

DIM_COORDINATES

Related content