---
date: '2022-11-20T11:18:31'
hypothesis-meta:
  created: '2022-11-20T11:18:31.041323+00:00'
  document:
    title:
    - 'Data Engineering in 2022: ELT tools'
  flagged: false
  group: __world__
  hidden: false
  id: EF4wWGjFEe2zrM9D4rCx-g
  links:
    html: https://hypothes.is/a/EF4wWGjFEe2zrM9D4rCx-g
    incontext: https://hyp.is/EF4wWGjFEe2zrM9D4rCx-g/rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/
    json: https://hypothes.is/api/annotations/EF4wWGjFEe2zrM9D4rCx-g
  permissions:
    admin:
    - acct:ravenscroftj@hypothes.is
    delete:
    - acct:ravenscroftj@hypothes.is
    read:
    - group:__world__
    update:
    - acct:ravenscroftj@hypothes.is
  tags:
  - data-engineering
  - data-science
  - ELT
  target:
  - selector:
    - endContainer: /main[1]/article[1]/div[3]/ul[1]/li[1]/div[2]/p[1]
      endOffset: 383
      startContainer: /main[1]/article[1]/div[3]/ul[1]/li[1]/div[2]/p[1]
      startOffset: 0
      type: RangeSelector
    - end: 2093
      start: 1710
      type: TextPositionSelector
    - exact: "Working with the raw data has lots of benefits, since at the point of\
        \ ingest you don\u2019t know all of the possible uses for the data. If you\
        \ rationalise that data down to just the set of fields and/or aggregate it\
        \ up to fit just a specific use case then you lose the fidelity of the data\
        \ that could be useful elsewhere. This is one of the premises and benefits\
        \ of a data lake done well."
      prefix: 'keep it at a manageable size.
        '
      suffix: '
        Of course, despite what the'
      type: TextQuoteSelector
    source: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/
  text: absolutely right - there's also a data provenance angle here - it is useful
    to be able to point to a data point that is 5 or 6 transformations from the raw
    input and be able to say "yes I know exactly where this came from, here are all
    the steps that came before"
  updated: '2022-11-20T11:18:31.041323+00:00'
  uri: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/
  user: acct:ravenscroftj@hypothes.is
  user_info:
    display_name: James Ravenscroft
in-reply-to: https://rmoff.net/2022/11/08/data-engineering-in-2022-elt-tools/
tags:
- data-engineering
- data-science
- ELT
- hypothesis
type: annotation
url: /annotation/2022/11/20/1668943111
---
<blockquote>Working with the raw data has lots of benefits, since at the point of ingest you don’t know all of the possible uses for the data. If you rationalise that data down to just the set of fields and/or aggregate it up to fit just a specific use case then you lose the fidelity of the data that could be useful elsewhere. This is one of the premises and benefits of a data lake done well.</blockquote>Absolutely right. There's also a data provenance angle here: it is useful to be able to point to a data point that is 5 or 6 transformations from the raw input and say "yes, I know exactly where this came from, here are all the steps that came before".
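
As a rough illustration of the provenance point, here is a minimal Python sketch that carries a record of every transformation alongside the value it produced. The `Traced` class, the step names, and the sample record are all hypothetical and chosen only for the example; real lineage tooling works differently and at a much larger scale.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass(frozen=True)
class Traced:
    """A value plus the ordered names of the steps that produced it (hypothetical helper)."""
    value: Any
    lineage: Tuple[str, ...] = ()

    def apply(self, step_name: str, fn: Callable[[Any], Any]) -> "Traced":
        # Build a new Traced value; the earlier value and its lineage are left untouched.
        return Traced(fn(self.value), self.lineage + (step_name,))


# Hypothetical raw record, kept exactly as ingested.
raw = Traced({"price_pence": "1250", "currency": "GBP"}, ("raw_ingest",))

# Each transformation derives a new value and appends its name to the lineage.
cleaned = raw.apply(
    "parse_price",
    lambda r: {**r, "price_pence": int(r["price_pence"])},
)
pounds = cleaned.apply("to_pounds", lambda r: r["price_pence"] / 100)

print(pounds.value)    # 12.5
print(pounds.lineage)  # ('raw_ingest', 'parse_price', 'to_pounds')
print(raw.value)       # the raw record is still available, unchanged
```

Because the raw record is never overwritten, any derived value can be traced back through its named steps to the original input.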