{"id":1594,"date":"2024-03-02T10:42:54","date_gmt":"2024-03-02T09:42:54","guid":{"rendered":"https:\/\/staging.sql.marcus-belz.de\/?p=1594"},"modified":"2026-05-07T01:06:38","modified_gmt":"2026-05-06T23:06:38","slug":"data-quality-in-an-etl-process","status":"publish","type":"post","link":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en","title":{"rendered":"Data quality in an ETL process"},"content":{"rendered":"\n<p>There is a lot of good literature on the subject of data quality. As a rule, this topic is described more theoretically, with impressive examples and the importance is proven with theories. However, I have not yet found any concrete instructions on how exactly to implement the checking and handling of bad data in an ETL process.<\/p>\n\n\n\n<p>What appears to be an easy task turns out to be challenging upon a closer look.<\/p>\n\n\n\n<p>There are essentially two aspects that need to be taken into account: From a technical point of view, it must first be checked whether the extracted data can be converted into the data types of the target system and whether they correspond to possible restrictions such as value ranges and notation. From a technical perspective, it is more about ensuring the completeness and consistency of the data and whether the data is uptodate. These properties are often summarized under the term <em>data integrity<\/em>.<\/p>\n\n\n\n<p>Based on this distinction, I summarize these two aspects under the terms <em>technical data quality <\/em>and <em>dataintegrity<\/em>.<\/p>\n\n\n\n<p>Checking the dataintegrity and technical data quality ensures that the delivered data can be processed safely and loaded into the target system. Technical and errors related to dataintegrity may lead to an error when loading the processed data into the target system and, in the worst case, to the termination of the ETL process. An ETL process should be designed to proactively identify, log and handle technical data and dataintegrity errors. If in doubt, incorrect data should not be loaded into the target system. In order to develop such an ETL process, the following must be taen into account:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Checks technical data quality<\/li>\n\n\n\n<li>Check dataintegrity<\/li>\n\n\n\n<li>Logging of all found errors<\/li>\n\n\n\n<li>Mark erroneous data<\/li>\n\n\n\n<li>Exclude erroneous data from further processing <\/li>\n<\/ul>\n\n\n\n<p>With this series of articles I will present a design pattern for an ETL process in which the above mentioned tasks can be implemented with a reasonable amount of effort.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Related Posts<\/h1>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1594\">Data quality in an ETL process<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1181\">Datenqualit\u00e4t \/\/ Grundlagen der Typ-Konvertierung mit T-SQL<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1576\">Data quality \/\/ Converting data to&nbsp;bigint, int, smallint, tinyint<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1564\">Data quality \/\/ Converting data to decimal or numeric<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1607\">Data quality \/\/ Converting data to money, smallmoney<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1651\">Data quality \/\/ Converting data to float, real<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1673\">Data quality \/\/ Converting data to bit<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>There is a lot of good literature on the subject of data quality. As a rule, this topic is described more theoretically, with impressive examples and the importance is proven with theories. However, I have not yet found any concrete instructions on how exactly to implement the checking and handling of bad data in an &#8230; <a title=\"Data quality in an ETL process\" class=\"read-more\" href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1594&#038;lang=en\" aria-label=\"Mehr Informationen \u00fcber Data quality in an ETL process\">Weiterlesen<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[106,108],"tags":[110,114,120],"class_list":["post-1594","post","type-post","status-publish","format-standard","hentry","category-all-languages-en","category-english-en","tag-data-quality-en","tag-design-pattern-en","tag-etl-process-en"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data quality in an ETL process - Just another SQL blog<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en\" \/>\n<meta property=\"og:locale\" content=\"de_DE\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data quality in an ETL process - Just another SQL blog\" \/>\n<meta property=\"og:description\" content=\"There is a lot of good literature on the subject of data quality. As a rule, this topic is described more theoretically, with impressive examples and the importance is proven with theories. However, I have not yet found any concrete instructions on how exactly to implement the checking and handling of bad data in an ... Weiterlesen\" \/>\n<meta property=\"og:url\" content=\"https:\/\/staging.sql.marcus-belz.de\/?p=1594&amp;lang=en\" \/>\n<meta property=\"og:site_name\" content=\"Just another SQL blog\" \/>\n<meta property=\"article:published_time\" content=\"2024-03-02T09:42:54+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-06T23:06:38+00:00\" \/>\n<meta name=\"author\" content=\"marcus\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Geschrieben von\" \/>\n\t<meta name=\"twitter:data1\" content=\"marcus\" \/>\n\t<meta name=\"twitter:label2\" content=\"Gesch\u00e4tzte Lesezeit\" \/>\n\t<meta name=\"twitter:data2\" content=\"2\u00a0Minuten\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?p=1594&lang=en#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?p=1594&lang=en\"},\"author\":{\"name\":\"marcus\",\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/#\\\/schema\\\/person\\\/98ab042e47d7286f64530ee18f20f675\"},\"headline\":\"Data quality in an ETL process\",\"datePublished\":\"2024-03-02T09:42:54+00:00\",\"dateModified\":\"2026-05-06T23:06:38+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?p=1594&lang=en\"},\"wordCount\":381,\"keywords\":[\"Data Quality\",\"Design Pattern\",\"ETL-Process\"],\"articleSection\":[\"All Languages\",\"English\"],\"inLanguage\":\"de\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?p=1594&lang=en\",\"url\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?p=1594&lang=en\",\"name\":\"Data quality in an ETL process - Just another SQL blog\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/#website\"},\"datePublished\":\"2024-03-02T09:42:54+00:00\",\"dateModified\":\"2026-05-06T23:06:38+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/#\\\/schema\\\/person\\\/98ab042e47d7286f64530ee18f20f675\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?p=1594&lang=en#breadcrumb\"},\"inLanguage\":\"de\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?p=1594&lang=en\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?p=1594&lang=en#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data quality in an ETL process\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/#website\",\"url\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/\",\"name\":\"Just another SQL blog\",\"description\":\"Marcus Belz \u00b7 SQL Server \u00b7 ETL \u00b7 Datenqualit\u00e4t\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"de\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/#\\\/schema\\\/person\\\/98ab042e47d7286f64530ee18f20f675\",\"name\":\"marcus\",\"url\":\"https:\\\/\\\/staging.sql.marcus-belz.de\\\/?author=1\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data quality in an ETL process - Just another SQL blog","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en","og_locale":"de_DE","og_type":"article","og_title":"Data quality in an ETL process - Just another SQL blog","og_description":"There is a lot of good literature on the subject of data quality. As a rule, this topic is described more theoretically, with impressive examples and the importance is proven with theories. However, I have not yet found any concrete instructions on how exactly to implement the checking and handling of bad data in an ... Weiterlesen","og_url":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en","og_site_name":"Just another SQL blog","article_published_time":"2024-03-02T09:42:54+00:00","article_modified_time":"2026-05-06T23:06:38+00:00","author":"marcus","twitter_card":"summary_large_image","twitter_misc":{"Geschrieben von":"marcus","Gesch\u00e4tzte Lesezeit":"2\u00a0Minuten"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en#article","isPartOf":{"@id":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en"},"author":{"name":"marcus","@id":"https:\/\/staging.sql.marcus-belz.de\/#\/schema\/person\/98ab042e47d7286f64530ee18f20f675"},"headline":"Data quality in an ETL process","datePublished":"2024-03-02T09:42:54+00:00","dateModified":"2026-05-06T23:06:38+00:00","mainEntityOfPage":{"@id":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en"},"wordCount":381,"keywords":["Data Quality","Design Pattern","ETL-Process"],"articleSection":["All Languages","English"],"inLanguage":"de"},{"@type":"WebPage","@id":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en","url":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en","name":"Data quality in an ETL process - Just another SQL blog","isPartOf":{"@id":"https:\/\/staging.sql.marcus-belz.de\/#website"},"datePublished":"2024-03-02T09:42:54+00:00","dateModified":"2026-05-06T23:06:38+00:00","author":{"@id":"https:\/\/staging.sql.marcus-belz.de\/#\/schema\/person\/98ab042e47d7286f64530ee18f20f675"},"breadcrumb":{"@id":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en#breadcrumb"},"inLanguage":"de","potentialAction":[{"@type":"ReadAction","target":["https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/staging.sql.marcus-belz.de\/?p=1594&lang=en#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/staging.sql.marcus-belz.de\/"},{"@type":"ListItem","position":2,"name":"Data quality in an ETL process"}]},{"@type":"WebSite","@id":"https:\/\/staging.sql.marcus-belz.de\/#website","url":"https:\/\/staging.sql.marcus-belz.de\/","name":"Just another SQL blog","description":"Marcus Belz \u00b7 SQL Server \u00b7 ETL \u00b7 Datenqualit\u00e4t","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/staging.sql.marcus-belz.de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"de"},{"@type":"Person","@id":"https:\/\/staging.sql.marcus-belz.de\/#\/schema\/person\/98ab042e47d7286f64530ee18f20f675","name":"marcus","url":"https:\/\/staging.sql.marcus-belz.de\/?author=1"}]}},"_links":{"self":[{"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=\/wp\/v2\/posts\/1594","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1594"}],"version-history":[{"count":4,"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=\/wp\/v2\/posts\/1594\/revisions"}],"predecessor-version":[{"id":1700,"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=\/wp\/v2\/posts\/1594\/revisions\/1700"}],"wp:attachment":[{"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1594"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1594"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staging.sql.marcus-belz.de\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1594"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}