Updating selective columns in and appending new rows to Historical Table from Update Table

73 Views Asked by At

I have 2 tables, #"Lead Data1" and #"Enrolment Data", with ~300k and ~16k rows respectively with key (Lead Number). I am using a separate query to pull #"Enrolment Data". I want to update the #"Lead Data1" columns {"Created On", "Course", "Program", "Business"} from the #"Enrolment Data", retaining the other columns. If any key (Lead Number) in #"Enrolment Data" is not found in #"Lead Data1", those records need to be appended to it.

I tried the below code:

let

Source = Excel.Workbook(File.Contents("xxx.xlsb"), null, true),

#"Lead Data1" = Source{[Name="Lead Data"]}[Data],

#"Promoted Headers" = Table.PromoteHeaders(#"Lead Data1", [PromoteAllScalars=true]),

#"Removed Other Columns" = Table.SelectColumns(#"Promoted Headers",{"Created On", "Owner", "Lead Number", "Source", "Source type", "TL", "Course", "Program", "Business"}),

Here I remove all unnecessary columns, these are the ones I need in my output from combining #"Lead Data1" and #"Enrolment Data". I will use #"Removed Other Columns" table again to append non-matched items.

#"Removed Other Columns1" = Table.SelectColumns(#"Removed Other Columns",{"Owner", "Lead Number", "Source", "Source type", "TL"}),

I only keep columns I need to merge to #"Enrolment Data" (which don't have these)

#"Merged Queries" = Table.Join(#"Removed Other Columns1", {"Lead Number"}, #"Enrolment Data", {"Lead Number"}, JoinKind.RightOuter, 2),

I am merging with #"Enrolment Data" and not the other way around because it has much lower number of rows (if that makes a difference). This would give me the "updated" rows, including those with keys not matching #"Lead Data1".

#"Rows to Remove" = Table.Column(#"Merged Queries","Lead Number"),

Once merged, I select the list of matched keys (updated rows) in #"Enrolment Data".

#"Selected Rows" = Table.RemoveMatchingRows(#"Removed Other Columns", each List.Contains(#"Rows to Remove",[#"Lead Number"])),

I remove those rows from #"Lead Data1" (giving "non-updated" rows)

#"Appended Queries" = Table.Combine({#"Merged Queries", #"Selected Rows"})

I append the "Updated" and "Non-Updated" rows.

in #"Appended Queries"

However, the query throws an error of "Data Source not found" - I think this might be because I am referencing #"Removed Other Columns" table wrong. Is there any way around it, or my approach itself is faulty? If so, any solution will be highly appreciated.

This shows sample data as well as output

2

There are 2 best solutions below

4
Promethee On

Your example uses the function Table.Join but as far as i know, it can't work with some columns having the same name in both joined tables so I guess your sample code has been reworked.

Here is a proposal with Table.NestedJoin (that you can eventually replace by Table.Join after a few changes in your column names; I guess Table.Join might offer better performances) where I directly perform an outer join in order to get all expected lines. Then I simply update the original columns (from "Lead data") with "Enrollment data" columns content whenever not null.
Type is lost through such an operation (there might be some workaround?) so you have to re-type impacted columns.

let
    Source = LeadData,
    #"Merged Queries" = Table.NestedJoin(Source, {"Lead Number"}, #"Enrollment Data", {"Lead Number"}, "Enrollment Data", JoinKind.FullOuter),
    #"Expanded Enrollment Data" = Table.ExpandTableColumn(#"Merged Queries", "Enrollment Data", {"Created On", "Lead Number", "Course", "Program", "Business"}, {"Enrollment Data.Created On", "Enrollment Data.Lead Number", "Enrollment Data.Course", "Enrollment Data.Program", "Enrollment Data.Business"}),
    #"Replaced Lead Number Value" = Table.ReplaceValue(#"Expanded Enrollment Data",each [Lead Number],each if [Enrollment Data.Lead Number] <> null then [Enrollment Data.Lead Number] else [Lead Number],Replacer.ReplaceValue,{"Lead Number"}),
    #"Replaced Course Value" = Table.ReplaceValue(#"Replaced Lead Number Value",each [Course],each if [Enrollment Data.Course] <> null then [Enrollment Data.Course] else [Course],Replacer.ReplaceValue,{"Course"}),
    #"Replaced Program Value" = Table.ReplaceValue(#"Replaced Course Value",each [Program],each if [Enrollment Data.Program] <> null then [Enrollment Data.Program] else [Program],Replacer.ReplaceValue,{"Program"}),
    #"Replaced Business Value" = Table.ReplaceValue(#"Replaced Program Value",each [Business],each if [Enrollment Data.Business] <> null then [Enrollment Data.Business] else [Business],Replacer.ReplaceValue,{"Business"}),
    #"Replaced Created On Value" = Table.ReplaceValue(#"Replaced Business Value",each [Created On],each if [Enrollment Data.Created On] <> null then [Enrollment Data.Created On] else [Created On],Replacer.ReplaceValue,{"Created On"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Created On Value",{"Enrollment Data.Lead Number", "Enrollment Data.Course", "Enrollment Data.Program", "Enrollment Data.Business", "Enrollment Data.Created On"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Lead Number", type text}, {"Course", type text}, {"Program", type text}, {"Business", type text}, {"Created On", type date}})<br>
in
    #"Changed Type"
7
Ron Rosenfeld On

I don't know where your error is coming from. With your data, I could not get the code you posted to produce anything meaningful. So this may not help your problem, but it will work if you can read your two tables into PQ.

This is a method of merging your two tables. In this case, the assumption is that you have read the two tables into individual queries, and are then merging those two queries.

let

//Join the two tables
    Source = Table.NestedJoin(Lead_Data, {"Lead Number"}, Enrolment_Data, {"Lead Number"}, "Enrolment_Data", JoinKind.FullOuter),

//Replace Lead number if Enrolment_Data has a new entry
    #"Update Lead Number" = Table.ReplaceValue(
        Source,
        each [Lead Number],
        each [Enrolment_Data][Lead Number]{0},
        (x,y,z)=>y??z,
        {"Lead Number"}),

//Update other columns with entries from Enrolment_data if any are present
// Note the custom replacer function will maintain the Text type of the columns
    #"Update Course/Program/Business" = List.Accumulate(
        {"Course","Program","Business"},
        #"Update Lead Number",
        (s,c)=>Table.ReplaceValue(
                s,
                each Record.Field(_,c),
                each Table.Column([Enrolment_Data],c){0},
        (x,y,z) as text => if z <> null then z else y,
        {c})
    ),

//Remove Enrolment_Data table column
    #"Removed Columns" = Table.RemoveColumns(#"Update Course/Program/Business",{"Enrolment_Data"}),

//Sort by lead number if needed
    #"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Lead Number", Order.Ascending}})
in
    #"Sorted Rows"

Lead Data
enter image description here

Enrolment Data
enter image description here

Results
enter image description here

Alternate Method

let

//Join the two tables two ways
    #"Right Outer Join" = Table.NestedJoin(Lead_Data, {"Lead Number"}, Enrolment_Data, {"Lead Number"}, "Enrols", JoinKind.RightOuter),
    #"Left-Anti Join" = Table.NestedJoin(Lead_Data,"Lead Number", Enrolment_Data,"Lead Number", "Leads", JoinKind.LeftAnti),
    #"Unmodified Leads" = Table.RemoveColumns(#"Left-Anti Join","Leads"),

//Remove columns in Right Outer that are present in the table, then expand
    #"Enrol colNames" = Table.ColumnNames(#"Right Outer Join"[Enrols]{0}),
    #"Remove Columns" = Table.RemoveColumns(#"Right Outer Join",#"Enrol colNames"),
    #"Modified Leads" = Table.ExpandTableColumn(#"Remove Columns","Enrols",#"Enrol colNames"),

//Append the tables
    #"All Leads" = Table.Combine({#"Unmodified Leads",#"Modified Leads"}),

//Sort by Lead Number
    Sort = Table.Sort(#"All Leads", {"Lead Number", Order.Ascending})

in
    Sort