From a349e9d65163081f8eefe453bc390f41e35856cb Mon Sep 17 00:00:00 2001 From: ravenscroftj Date: Sun, 20 Nov 2022 11:30:06 +0000 Subject: [PATCH] Add 'brainsteam/content/replies/2022/11/20/1668943111.md' --- .../content/replies/2022/11/20/1668943111.md | 81 +++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 brainsteam/content/replies/2022/11/20/1668943111.md diff --git a/brainsteam/content/replies/2022/11/20/1668943111.md b/brainsteam/content/replies/2022/11/20/1668943111.md new file mode 100644 index 0000000..531f50d --- /dev/null +++ b/brainsteam/content/replies/2022/11/20/1668943111.md @@ -0,0 +1,81 @@ +--- +date: '2022-11-20T11:18:31' +hypothesis-meta: + created: '2022-11-20T11:18:31.041323+00:00' + document: + title: + - 'Data Engineering in 2022: ELT tools' + flagged: false + group: __world__ + hidden: false + id: EF4wWGjFEe2zrM9D4rCx-g + links: + html: https://hypothes.is/a/EF4wWGjFEe2zrM9D4rCx-g + incontext: https://hyp.is/EF4wWGjFEe2zrM9D4rCx-g/rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/ + json: https://hypothes.is/api/annotations/EF4wWGjFEe2zrM9D4rCx-g + permissions: + admin: + - acct:ravenscroftj@hypothes.is + delete: + - acct:ravenscroftj@hypothes.is + read: + - group:__world__ + update: + - acct:ravenscroftj@hypothes.is + tags: + - data-engineering + - data-science + - ELT + target: + - selector: + - endContainer: /main[1]/article[1]/div[3]/ul[1]/li[1]/div[2]/p[1] + endOffset: 383 + startContainer: /main[1]/article[1]/div[3]/ul[1]/li[1]/div[2]/p[1] + startOffset: 0 + type: RangeSelector + - end: 2093 + start: 1710 + type: TextPositionSelector + - exact: "Working with the raw data has lots of benefits, since at the point of\ + \ ingest you don\u2019t know all of the possible uses for the data. If you\ + \ rationalise that data down to just the set of fields and/or aggregate it\ + \ up to fit just a specific use case then you lose the fidelity of the data\ + \ that could be useful elsewhere. 
This is one of the premises and benefits\ + \ of a data lake done well." + prefix: 'keep it at a manageable size. + + + + ' + suffix: ' + + + + + + Of course, despite what the' + type: TextQuoteSelector + source: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/ + text: absolutely right - there's also a data provenance angle here - it is useful + to be able to point to a data point that is 5 or 6 transformations from the raw + input and be able to say "yes I know exactly where this came from, here are all + the steps that came before" + updated: '2022-11-20T11:18:31.041323+00:00' + uri: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/ + user: acct:ravenscroftj@hypothes.is + user_info: + display_name: James Ravenscroft +in-reply-to: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/ +tags: +- data-engineering +- data-science +- ELT +- hypothesis +type: reply +url: /replies/2022/11/20/1668943111 + +--- + + + +
Working with the raw data has lots of benefits, since at the point of ingest you don’t know all of the possible uses for the data. If you rationalise that data down to just the set of fields and/or aggregate it up to fit just a specific use case then you lose the fidelity of the data that could be useful elsewhere. This is one of the premises and benefits of a data lake done well.
Absolutely right. There's also a data provenance angle here: it is useful to be able to point to a data point that is 5 or 6 transformations from the raw input and say "yes, I know exactly where this came from, here are all the steps that came before".
\ No newline at end of file
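To make the provenance point concrete, here is a minimal sketch, assuming each value simply carries its untouched raw input and an ordered list of step names. The `Traced` class, `ingest` function, and the step names are all hypothetical, made up for illustration, and not taken from the linked article or from any particular ELT tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Traced:
    """Hypothetical wrapper: a value, its raw input, and the steps applied so far."""
    value: Any
    raw: Any
    steps: Tuple[str, ...] = ()

    def apply(self, name: str, fn: Callable[[Any], Any]) -> "Traced":
        # Transform the value and append the step name; the raw input is carried along unchanged.
        return Traced(value=fn(self.value), raw=self.raw, steps=self.steps + (name,))


def ingest(raw: Any) -> Traced:
    # At ingest time the value and the raw input are the same thing.
    return Traced(value=raw, raw=raw)


# Five transformations away from the raw input (illustrative data only).
reading = (
    ingest(" 21.5C ")
    .apply("strip_whitespace", str.strip)
    .apply("drop_unit", lambda s: s.rstrip("C"))
    .apply("to_float", float)
    .apply("celsius_to_fahrenheit", lambda c: c * 9 / 5 + 32)
    .apply("round_1dp", lambda f: round(f, 1))
)

print(reading.value)   # the derived data point
print(reading.raw)     # the raw input it came from
print(reading.steps)   # every step that came before, in order
```

Because the raw input is never overwritten, any derived value can be traced back and recomputed from its original. A real pipeline would more likely record this in table or job metadata than on each value.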