How can I replicate Pandas merge in SQL server?
I want to do this:
# merge and filter out rows that are in "both" dataframes
df1 = pd.DataFrame([
['A', 1, 'c', 'a'],
['A', 2, 'c', 'a'],
['B', 2, 'c', 'a'],
['B', 3, 'c', 'a'],
['C', 3, 'c', 'a'],
['C', 4, 'c', 'a'],
['D', 3, 'c', 'a']
],
columns = ['ID', 'Period', 'Pivot', 'Group'])
df2 = pd.DataFrame([
['A', 1, 'c', 'a'],
['A', 2, 'c', 'a'],
['B', 2, 'c', 'a'],
['B', 3, 'c', 'a'],
['C', 3, 'c', 'a'],
['C', 4, 'd', 'a'],
['D', 3, 'd', 'a']
],
columns = ['ID', 'Period', 'Pivot', 'Group'])
out = df1.merge(df2, how='outer', left_on=['ID', 'Period', 'Pivot', 'Group'], right_on=['ID', 'Period', 'Pivot', 'Group'], indicator=True).query('_merge != "both"')
What I have tried to do is implement a variant of this:
https://stackoverflow.com/a/511022/6534818
SELECT a.SelfJoinTableID
FROM dbo.SelfJoinTable a
INNER JOIN dbo.SelfJoinTable b
ON a.SelfJoinTableID = b.SelfJoinTableID
INNER JOIN dbo.SelfJoinTable c
ON a.SelfJoinTableID = c.SelfJoinTableID
WHERE a.Status = 'Status to filter a'
AND b.Status = 'Status to filter b'
AND c.Status = 'Status to filter c'
But it does return what I get in Pandas.
from SQL Server: How to replicate pandas merge?
No comments:
Post a Comment