2022-11-20 11:30:06 +00:00
---
date: '2022-11-20T11:18:31'
hypothesis-meta:
  created: '2022-11-20T11:18:31.041323+00:00'
  document:
    title:
    - 'Data Engineering in 2022: ELT tools'
  flagged: false
  group: __world__
  hidden: false
  id: EF4wWGjFEe2zrM9D4rCx-g
  links:
    html: https://hypothes.is/a/EF4wWGjFEe2zrM9D4rCx-g
    incontext: https://hyp.is/EF4wWGjFEe2zrM9D4rCx-g/rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/
    json: https://hypothes.is/api/annotations/EF4wWGjFEe2zrM9D4rCx-g
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - data-engineering
  - data-science
  - ELT
  target:
  - selector:
    - endContainer: /main[1]/article[1]/div[3]/ul[1]/li[1]/div[2]/p[1]
      endOffset: 383
      startContainer: /main[1]/article[1]/div[3]/ul[1]/li[1]/div[2]/p[1]
      startOffset: 0
      type: RangeSelector
    - end: 2093
      start: 1710
      type: TextPositionSelector
    - exact: "Working with the raw data has lots of benefits, since at the point of\
        \ ingest you don\u2019t know all of the possible uses for the data. If you\
        \ rationalise that data down to just the set of fields and/or aggregate it\
        \ up to fit just a specific use case then you lose the fidelity of the data\
        \ that could be useful elsewhere. This is one of the premises and benefits\
        \ of a data lake done well."
      prefix: 'keep it at a manageable size.
        '
      suffix: '
        Of course, despite what the'
      type: TextQuoteSelector
    source: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/
  text: absolutely right - there's also a data provenance angle here - it is useful
    to be able to point to a data point that is 5 or 6 transformations from the raw
    input and be able to say "yes I know exactly where this came from, here are all
    the steps that came before"
  updated: '2022-11-20T11:18:31.041323+00:00'
  uri: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/
tags:
- data-engineering
- data-science
- ELT
- hypothesis
2022-11-26 06:57:18 +00:00
type: annotation
url: /annotation/2022/11/20/1668943111
---
<blockquote>Working with the raw data has lots of benefits, since at the point of ingest you don’t know all of the possible uses for the data. If you rationalise that data down to just the set of fields and/or aggregate it up to fit just a specific use case then you lose the fidelity of the data that could be useful elsewhere. This is one of the premises and benefits of a data lake done well.</blockquote>

Absolutely right. There's also a data provenance angle here: it is useful to be able to point to a data point that is 5 or 6 transformations away from the raw input and say "yes, I know exactly where this came from, here are all the steps that came before".
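
As a rough illustration of that provenance idea, here is a minimal Python sketch in which a value carries its untouched raw input and the ordered names of every transformation applied to it. The `Traced` class, the `ingest` helper, and the step names are all hypothetical, made up for this example; real ELT tools track lineage in their own ways.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Traced:
    """Hypothetical wrapper: a value, its raw input, and the steps between them."""
    value: Any
    raw: Any
    steps: tuple[str, ...] = ()

    def apply(self, name: str, fn: Callable[[Any], Any]) -> "Traced":
        # Build a new Traced so the raw input and earlier steps are never overwritten.
        return Traced(fn(self.value), self.raw, self.steps + (name,))


def ingest(raw: Any) -> Traced:
    """Hypothetical entry point: keep the raw value exactly as it arrived."""
    return Traced(value=raw, raw=raw)


# Assumed example data: a price string taken through four transformations.
record = (
    ingest(" 42.50 GBP ")
    .apply("strip_whitespace", str.strip)
    .apply("drop_currency", lambda s: s.split()[0])
    .apply("to_float", float)
    .apply("to_pence", lambda x: round(x * 100))
)

print(record.value)  # the transformed data point
print(record.raw)    # the original input, still intact
print(record.steps)  # every step that came before, in order
```

Because each step returns a new object rather than mutating the old one, the raw input stays available however far downstream the data point ends up.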