Provide users with meaningful names
Models & Datasets naming
This applies to both model and dataset files:
General rules
- Naming:
- File names and model names can reuse the name defined in the dbt project.
use_lower_case_and_dash
For i.e: Usefirst_name
, not"First Name"
.- Entity name in the name of files should be in plural form
- If verb is used in the name, they must follow the forms
- verb (active voice) - entity:
map_users_visitors
- entity name - verb (passive voice):
search_results_unnested
- verb (active voice) - entity:
- Labeling: Since the label is what the end-user sees, it should be descriptive and easy to understand
- Avoid too much abbreviation
- Remove prefixes and suffixes (like dim, facts...)
Use Upper Case And No Dash
Staging models
- Staging models used to produce a final model. Multiple actions can be done here.
- These tend to be convenient models used for reporting purpose, not for exploration.
- Format:
stg_<some useful names>
- Example:
stg_query_reports
stg_dashboards
stg_map_holistics_landing_visitors_trials
Assisting models
Assisting models should not be exposed to end-user, commonly perform one specific action.
The general name format is verb (present tense)- entity (plural)
. Some common verbs:
map_
: use for identity stitching tasksdedup_
: use to deduplicate recordsunnest_
: Use to unnest the nested fields, and use the result to do other things- Examples
- map_exchange_rates
- map_visitors_trial
- map_pipedrive_deals_holistics_tenants
- dedup_trial_submission_emails
- unnest_holistics_app_global_search_results
- unnest_trial_submissions_what_you_like
- unnest_trial_submissions_reporting_data_sources
Another format is entity (plural) - verb (passive voice)
.
- For example:
visitors_trials_mapped
trial_emails_deduped
Cleaned and reporting models
These data models are where we start applying business logic, and as a result, typically have heavier transformations than staging models. Here is a guide of naming models which adapt Kimball's dimensional modeling techniques
dim_
models
- Dim models represent certain objects/entities in our products. These models tend to be short and wide (few rows, many fields).
- Format:
dim_<object name in plural>
- Example:
dim_query_reports
dim_dashboards
.
fact_
models
- contains information about "transactions" or "interactions".
- Have measurements, is not necessary a raw, atomic event table
- Tend to be a long, narrow table
- Examples:
- fact_jobs
- fact_business_metrics_monthly
Field naming & labeling
A model field has two things: Field name and field label.
Field names should follow stricter rules compared to field labels. This is applicable for anything that is analyst-facing instead of business user-facing.
Field name
- Case: use
snake_case
- Name fields consistently and represent values consistently across the model. For example:
- user vs. customer vs. member vs. account
- company vs. account vs. organization
- registration vs. creation
- In denormalized tables, should try to retain the original name of the table. For example,
id
,created_at
,updated_at
field in tableusers
should still beid
,created_at
,updated_at
instead ofuser_id
,user_created_at
,user_updated_at
- In a normalized table (where you joined different tables to get info of an entity into another entity's model), ambiguous field names should be made clear: For example, in a derived
query_report
model (that create a normalized table):select
query_reports.id as query_report_id
, query_reports.name as query_report_name
, query_reports.user_id
, users.name as user_name -- `name` exists in both tables
from query_reports
left join users on query_reports.user_id = users.id
Dimensions
Numeric Dimensions
- General form
[aggregation name]_object_number
- If the aggregated dimension is monhubspot_customer_statusetary:
[aggregation name]_value_amount
- Aggregation name is optional
- Ratios:
x_per_y
- Percentage:
pct_x_over_y
For example:
user_stats: {
dimensions: {
user_id
, user_name
, query_reports_number -- Number of reports belong to this user
, avg_reports_per_day_number
, pct_models_in_dataset_number
, total_revenue_amount
, total_payment_this_monnth_amount
, total_active_days_number
}
, measures: {}
}
Boolean Dimensions
Dimensions that have boolean (TRUE/FALSE) type
- General form:
is_...
,has_...
,clicked_button_a
,created_multi_ds_report
- Verb tense depends on the context.
Measures
General form: [aggregation name]_object
- Average:
avg_
- Sum:
total_
(Use Total instead of Sum because it sound more natural) - Count:
count_
- Ratios:
x_per_y
- Percentage:
pct_x_over_y
Example:
user_stats: {
dimensions: {
user_id
, user_name
}
, measures: {
count_created_users: @sql count(users.id)
, count_existing_users: @sql count(users.id if is not deleted)
, avg_activity_days: @sql avg(total_active_days_number)
}
}
Field label in Holistics modeling layer
- Case:
Please Use Capital
- Field label is what the end-user see when exploring the dataset, so it should be descriptive
- Fields that have the same base name across multiple models (id, created_at, updated_at...) should be labeled explicitly. For example:
- Field
jobs.created_at
should have label Job Created At - Field
users.created_at
should have label User Created At - Field
query_reports.created_at
should have label Report Created At
- Field