Tuesday, 25 October 2022

SQL Server: How to replicate pandas merge?

How can I replicate Pandas merge in SQL server?

I want to do this:

# merge and filter out rows that are in "both" dataframes

df1 = pd.DataFrame([
            ['A', 1, 'c', 'a'],
            ['A', 2, 'c', 'a'],
            ['B', 2, 'c', 'a'],
            ['B', 3, 'c', 'a'],
            ['C', 3, 'c', 'a'],
            ['C', 4, 'c', 'a'],
            ['D', 3, 'c', 'a']
            ],
        columns = ['ID', 'Period', 'Pivot', 'Group'])

df2 = pd.DataFrame([
            ['A', 1, 'c', 'a'],
            ['A', 2, 'c', 'a'],
            ['B', 2, 'c', 'a'],
            ['B', 3, 'c', 'a'],
            ['C', 3, 'c', 'a'],
            ['C', 4, 'd', 'a'],
            ['D', 3, 'd', 'a']
            ],
        columns = ['ID', 'Period', 'Pivot', 'Group'])


out = df1.merge(df2, how='outer', left_on=['ID', 'Period', 'Pivot', 'Group'], right_on=['ID', 'Period', 'Pivot', 'Group'], indicator=True).query('_merge != "both"')

What I have tried to do is implement a variant of this:

https://stackoverflow.com/a/511022/6534818

SELECT a.SelfJoinTableID
FROM   dbo.SelfJoinTable a
       INNER JOIN dbo.SelfJoinTable b
         ON a.SelfJoinTableID = b.SelfJoinTableID
       INNER JOIN dbo.SelfJoinTable c
         ON a.SelfJoinTableID = c.SelfJoinTableID
WHERE  a.Status = 'Status to filter a'
       AND b.Status = 'Status to filter b'
       AND c.Status = 'Status to filter c' 

But it does return what I get in Pandas.



from SQL Server: How to replicate pandas merge?

No comments:

Post a Comment