Computing the Difference, Intersection, and Union of Two DataFrames with Pandas

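Hi everyone, I'm Peter.

This article shows how to use Pandas functions to compute the difference, intersection, and union of two DataFrames.

Sample data

Let's build a small sample dataset: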

In [1]:

import pandas as pd

In [2]:

df1 = pd.DataFrame({"col1":[1,2,3,4,5],
                    "col2":[6,7,8,9,10]
                   })
df2 = pd.DataFrame({"col1":[1,3,7],
                    "col2":[6,8,10]
                   })

In [3]:

df1

Out[3]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10

In [4]:

df2

Out[4]:

	col1	col2
0	1	6
1	3	8
2	7	10

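The two DataFrames have two rows in common: (1, 6) and (3, 8).

Difference

Note that both methods below return the symmetric difference, that is, the rows that appear in exactly one of the two DataFrames, rather than the one-sided difference df1 - df2.

Method 1: concat + drop_duplicates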

In [5]:

df3 = pd.concat([df1,df2])
df3

Out[5]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [6]:

# Result 1: keep=False drops every copy of a row that occurs more than once
df3.drop_duplicates(["col1","col2"],keep=False)

Out[6]:

	col1	col2
1	2	7
3	4	9
4	5	10
2	7	10
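One caveat with this approach: keep=False removes any row that occurs more than once in the stacked frame, including a row that is repeated inside a single DataFrame. The sketch below wraps Method 1 in a small helper that de-duplicates each input first. The function name symmetric_difference is an illustrative assumption, not a Pandas API.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [6, 7, 8, 9, 10]})
df2 = pd.DataFrame({"col1": [1, 3, 7], "col2": [6, 8, 10]})


def symmetric_difference(left, right):
    """Rows that appear in exactly one of the two frames (hypothetical helper)."""
    # De-duplicate each side first so that a row repeated inside one frame
    # is not mistaken for a row shared by both frames.
    stacked = pd.concat([left.drop_duplicates(), right.drop_duplicates()])
    # keep=False drops every copy of a row that occurs more than once.
    return stacked.drop_duplicates(keep=False).reset_index(drop=True)


result = symmetric_difference(df1, df2)
print(result)
```

With the sample data this gives the same four rows as Out[6], only with a fresh 0..3 index.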

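Method 2: append + drop_duplicates

DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this method only runs on older versions. On current versions, pd.concat([df1, df2]) from Method 1 gives the same result.

In [7]:

df4 = df1.append(df2)  # requires pandas < 2.0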
df4

Out[7]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [8]:

# Result 2
df4.drop_duplicates(["col1","col2"],keep=False)

Out[8]:

	col1	col2
1	2	7
3	4	9
4	5	10
2	7	10
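Both results above contain the row (7, 10), which comes from df2, so they are symmetric differences. When only the rows of df1 that are absent from df2 are wanted (or the reverse), one possible approach is a left merge with indicator=True. The following is a minimal sketch under that assumption. The variable names only_in_df1 and only_in_df2 are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [6, 7, 8, 9, 10]})
df2 = pd.DataFrame({"col1": [1, 3, 7], "col2": [6, 8, 10]})

cols = ["col1", "col2"]

# how="left" keeps every row of the left frame; indicator=True adds a
# "_merge" column whose value is "left_only" or "both" for each row.
tagged = df1.merge(df2.drop_duplicates(), on=cols, how="left", indicator=True)
only_in_df1 = tagged.loc[tagged["_merge"] == "left_only", cols]

# Swapping the roles of the two frames gives the other one-sided difference.
tagged_rev = df2.merge(df1.drop_duplicates(), on=cols, how="left", indicator=True)
only_in_df2 = tagged_rev.loc[tagged_rev["_merge"] == "left_only", cols]

print(only_in_df1)
print(only_in_df2)
```

Stacking only_in_df1 and only_in_df2 reproduces the symmetric difference shown in Out[6] and Out[8].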

Intersection

Method 1: merge

In [9]:

# Result
# Equivalent to: df5 = pd.merge(df1, df2, how="inner")
df5 = pd.merge(df1,df2)
df5

Out[9]:

	col1	col2
0	1	6
1	3	8
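When no on argument is given, pd.merge joins on every column name the two frames share, which here is both col1 and col2. The sketch below spells the join keys out and adds a drop_duplicates call. The extra de-duplication is an assumption about what a set-style intersection should return when the inputs contain repeated rows, and it changes nothing for the sample data.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [6, 7, 8, 9, 10]})
df2 = pd.DataFrame({"col1": [1, 3, 7], "col2": [6, 8, 10]})

# Explicit join keys make it obvious that whole rows are being compared.
common = (
    pd.merge(df1, df2, on=["col1", "col2"], how="inner")
    .drop_duplicates()          # collapse repeats produced by duplicated inputs
    .reset_index(drop=True)
)
print(common)
```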

Method 2: concat + duplicated + loc

In [10]:

df6 = pd.concat([df1,df2])
df6

Out[10]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [11]:

s = df6.duplicated(subset=['col1','col2'], keep='first')
s

Out[11]:

0    False
1    False
2    False
3    False
4    False
0     True
1     True
2    False
dtype: bool

In [12]:

# Result: s is already a boolean mask, so it can be passed to loc directly
df8 = df6.loc[s]
df8

Out[12]:

	col1	col2
0	1	6
1	3	8
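The duplicated approach relies on each DataFrame having no repeated rows of its own. The sketch below uses two small hypothetical frames (left and right are not part of the article's data) to show how a row repeated inside one frame is reported as a false match, and how de-duplicating each input first avoids that.

```python
import pandas as pd

# Hypothetical data: (2, 7) occurs twice in `left` and never in `right`.
left = pd.DataFrame({"col1": [1, 2, 2], "col2": [6, 7, 7]})
right = pd.DataFrame({"col1": [1, 9], "col2": [6, 9]})

# Naive version: the second copy of (2, 7) is flagged as a duplicate
# even though the row exists only in `left`.
naive = pd.concat([left, right])
naive_hits = naive.loc[naive.duplicated(keep="first")]

# Safer version: remove repeats within each frame before stacking.
safe = pd.concat([left.drop_duplicates(), right.drop_duplicates()])
safe_hits = safe.loc[safe.duplicated(keep="first")]

print(naive_hits)
print(safe_hits)
```

Only (1, 6) is truly shared by the two frames, which is what the de-duplicated version returns.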

Method 3: concat + groupby + query

In [13]:

# df6 = pd.concat([df1,df2])
df6

Out[13]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [14]:

df9 = df6.groupby(["col1", "col2"]).size().reset_index()
df9.columns = ["col1", "col2", "count"]
df9

Out[14]:

	col1	col2	count
0	1	6	2
1	2	7	1
2	3	8	2
3	4	9	1
4	5	10	1
5	7	10	1

In [15]:

df10 = df9.query("count > 1")[["col1", "col2"]]
df10

Out[15]:

	col1	col2
0	1	6
2	3	8
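The counting step can be written a little more compactly. The sketch below is one possible variant: reset_index(name="count") names the count column in the same call, and a boolean mask replaces query (the two are interchangeable here). As with the other concat-based methods, it assumes neither frame contains repeated rows of its own.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [6, 7, 8, 9, 10]})
df2 = pd.DataFrame({"col1": [1, 3, 7], "col2": [6, 8, 10]})

# size() counts how many times each (col1, col2) pair occurs in the stack;
# name="count" labels the resulting column directly.
counts = (
    pd.concat([df1, df2])
    .groupby(["col1", "col2"])
    .size()
    .reset_index(name="count")
)

# A pair that occurs more than once is present in both frames.
common = counts.loc[counts["count"] > 1, ["col1", "col2"]]
print(common)
```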

Union

Method 1: concat + drop_duplicates

In [16]:

df11 = pd.concat([df1,df2])
df11

Out[16]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
0	1	6
1	3	8
2	7	10

In [17]:

# Result
# df12 = df11.drop_duplicates(subset=["col1","col2"],keep="last")
df12 = df11.drop_duplicates(subset=["col1","col2"],keep="first")
df12

Out[17]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
2	7	10
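The result above keeps the original index labels, so the label 2 appears twice (once from df1 and once from df2). If a clean 0..n-1 index is preferred, a sketch along the following lines should work. The variable name union is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [6, 7, 8, 9, 10]})
df2 = pd.DataFrame({"col1": [1, 3, 7], "col2": [6, 8, 10]})

union = (
    pd.concat([df1, df2], ignore_index=True)  # stack the rows, discard old labels
    .drop_duplicates()                        # keep the first copy of each row
    .reset_index(drop=True)                   # renumber after rows were dropped
)
print(union)
```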

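Method 2: append + drop_duplicates

As noted for the difference, DataFrame.append was removed in pandas 2.0, so this method only runs on older versions. On current versions, use pd.concat([df1, df2]) as in Method 1.

In [18]:

df13 = df1.append(df2)  # requires pandas < 2.0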
# df13.drop_duplicates(subset=["col1","col2"],keep="last")
df13.drop_duplicates(subset=["col1","col2"],keep="first")

Out[18]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
2	7	10

Method 3: merge

In [19]:

pd.merge(df1,df2,how="outer")

Out[19]:

	col1	col2
0	1	6
1	2	7
2	3	8
3	4	9
4	5	10
5	7	10
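An outer merge can also report where each row came from, which gives the union, the intersection, and both one-sided differences in a single pass. The sketch below assumes neither frame contains repeated rows. The names tagged, in_both, only_left and only_right are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [6, 7, 8, 9, 10]})
df2 = pd.DataFrame({"col1": [1, 3, 7], "col2": [6, 8, 10]})

cols = ["col1", "col2"]

# indicator=True adds a "_merge" column with the values
# "left_only", "right_only" or "both".
tagged = pd.merge(df1, df2, on=cols, how="outer", indicator=True)

union = tagged[cols]                                            # every distinct row
in_both = tagged.loc[tagged["_merge"] == "both", cols]          # intersection
only_left = tagged.loc[tagged["_merge"] == "left_only", cols]   # df1 - df2
only_right = tagged.loc[tagged["_merge"] == "right_only", cols] # df2 - df1

print(tagged)
```

The row order of an outer merge can differ between pandas versions, so it is safer to sort before comparing results.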
