Background
My Excel workbook compares two large datasets, as read from external files. Let Data_A and Data_B be those queries, loaded to my worksheets as the tables Data_A
| a_1 | a_2 | ... | a_n |
|---|---|---|---|
| ... | ... | ... | ... |
and also Data_B:
| b_1 | b_2 | ... | b_m |
|---|---|---|---|
| ... | ... | ... | ... |
I wish to extract their column names, and load those to the worksheet as the separate tables Fields_A...
let
names = {"a_1", "a_2", ..., "a_n"},
result = Table.FromColumns({Source}, {"Field Name"})
in
result
| Field Name |
|---|
| a_1 |
| a_2 |
| ... |
| a_n |
...and Fields_B.
let
names = {"b_1", "b_2", ..., "b_m"},
result = Table.FromColumns({Source}, {"Field Name"})
in
result
| Field Name |
|---|
| b_1 |
| b_2 |
| ... |
| b_m |
Inadequate Solutions
Reference in New Query
Obviously, I can simply reference Data_A and Data_B in new queries.
let
names = Table.ColumnNames(Data_*),
result = Table.FromColumns({names}, {"Field Name"})
in
result
Now I do .Buffer() both Data_A and Data_B at the end of their own workflows, right before they are loaded to the worksheets:
let
dataset = ...
output = Table.Buffer(dataset)
in
output
Unfortunately, these queries again pull the entire Data_A and Data_B datasets from their own files. Since these files are so large, the simple Table.ColumnNames() query takes forever.
Read from Worksheet
Alternatively, I can read the data back from the Data_A and Data_B tables, which I just loaded to my worksheets:
let
sheetTable = Excel.CurrentWorkbook(){[Name = "Data_*"]}[Content],
names = Table.ColumnNames(sheetTable),
result = Table.FromColumns({names}, {"Field Name"})
in
result
Unfortunately, this approach is unstable, since the user may make manual edits to the table. They may rename columns, delete them, and even add new ones!
Question
I absolutely must load to my worksheets a list of the column names from each table Data_A and Data_B. These names must reflect the original names loaded to the tables, before any manual edits by the user. However, this info must not take forever to load—as they do when referencing Data_A and Data_B.
Is there a way to store the column names as an intermediate step in the Data_* query, and then load this step to a separate sheet, without restarting the Data_* query from scratch?
Beware of
BufferI think in this case
Bufferwon't help. Does it really speed things up?I imagine if you have "Data_A" and "Data_B" queries, and you don't buffer, then
Table.ColummNamesshould pretty fast. Failing that, see ifRecord.FieldNames(Table.First(Data_A))is any faster. Power query is very lazy neither should cause all of Data_A to be read unless you buffer!Maybe if it's a text file, you can pull just the first line in the file instead of first converting the data into a table?
Users can modify sources they can modify
If you don't want to cache the data where it's possible to change it, you can't have it in a file they can modify.
When you talk about a cache within the workbook file, it means your users can modify it. So, if you can trust that users won't bother trying to hack stuff, you can cache to a hidden worksheet if that helps, but usually that means you need to guarantee that query is refreshed before your other queries are run and that makes your solution more fragile.
Dpending on your process you might decide it's better to have three external files:
I don't know if staging data in a workbook like this might work for you. It means an extra step because it "must* be reliably updated before your users try to access it. As a safeguard you could make a query in the workbook your users edit so that errors are thrown or bright yellow warnings appear if the modified date of the "staged data' workbook's is older than the Data_1 or Data_2 files it pulls from.