
How do I remove duplicate items from an array in Perl?

I have an array in Perl:

my @my_array = ("one","two","three","two","three");

How do I remove the duplicates from the array?


You can do something like this, as demonstrated in perlfaq4:

sub uniq {
    my %seen;
    grep !$seen{$_}++, @_;
}

my @array = qw(one two three two three);
my @filtered = uniq(@array);

print "@filtered\n";

Output:

one two three

If you want to use a module, try the uniq function from List::MoreUtils.


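The Perl documentation comes with a nice collection of FAQs. Your question is frequently asked:

% perldoc -q duplicate

The answer, copied and pasted from the output of the command above, appears below: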

Found in /usr/local/lib/perl5/5.10.0/pods/perlfaq4.pod
 How can I remove duplicate elements from a list or array?
   (contributed by brian d foy)

   Use a hash. When you think the words "unique" or "duplicated", think
   "hash keys".

   If you don't care about the order of the elements, you could just
   create the hash then extract the keys. It's not important how you
   create that hash: just that you use "keys" to get the unique elements.

       my %hash   = map { $_, 1 } @array;
       # or a hash slice: @hash{ @array } = ();
       # or a foreach: $hash{$_} = 1 foreach ( @array );

       my @unique = keys %hash;

   If you want to use a module, try the "uniq" function from
   "List::MoreUtils". In list context it returns the unique elements,
   preserving their order in the list. In scalar context, it returns the
   number of unique elements.

       use List::MoreUtils qw(uniq);

       my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
       my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

   You can also go through each element and skip the ones you've seen
   before. Use a hash to keep track. The first time the loop sees an
   element, that element has no key in %Seen. The "next" statement creates
   the key and immediately uses its value, which is "undef", so the loop
   continues to the "push" and increments the value for that key. The next
   time the loop sees that same element, its key exists in the hash and
   the value for that key is true (since it's not 0 or "undef"), so the
   next skips that iteration and the loop goes to the next element.

       my @unique = ();
       my %seen   = ();

       foreach my $elem ( @array )
       {
         next if $seen{ $elem }++;
         push @unique, $elem;
       }

   You can write this more briefly using a grep, which does the same
   thing.

       my %seen = ();
       my @unique = grep { ! $seen{ $_ }++ } @array;


Install List::MoreUtils from CPAN.

Then in your code:

use strict;
use warnings;
use List::MoreUtils qw(uniq);

my @dup_list = qw(1 1 1 2 3 4 4);

my @uniq_list = uniq(@dup_list);
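
If installing a CPAN module is not an option, newer releases of the core List::Util module (1.45 and later, as far as I know) export a uniq function too. A minimal sketch, assuming such a version is available on your system:

```perl
use strict;
use warnings;
use List::Util 1.45 qw(uniq);    # assumption: List::Util 1.45 or newer is installed

my @dup_list  = qw(1 1 1 2 3 4 4);
my @uniq_list = uniq(@dup_list);  # keeps the first occurrence of each value

print "@uniq_list\n";
```

The version number in the use line makes the script fail early with a clear message on older installations instead of complaining that uniq is not exported.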

My usual way of doing this is:

my %unique = ();
foreach my $item (@myarray)
{
    $unique{$item} ++;
}
my @myuniquearray = keys %unique;

If you use a hash and add the items to it, you also get the bonus of knowing how many times each item appears in the list.
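
As a rough sketch of that bonus, the same counting hash can be printed to show how often each item occurred. The sample data here is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data, not taken from the answer above.
my @myarray = qw(one two three two three three);

# Count how many times each item occurs.
my %unique;
$unique{$_}++ for @myarray;

# The keys are the distinct items; the values are their counts.
my @myuniquearray = sort keys %unique;
for my $item (@myuniquearray) {
    print "$item: $unique{$item}\n";
}
```

The keys are sorted here only to get a repeatable order, since keys returns them in no particular order.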


The variable @array is the list with duplicate elements:

my %seen = ();
my @unique = grep { !$seen{$_}++ } @array;

It can be done with a simple Perl one-liner.

my @in=qw(1 3 4  6 2 4  3 2 6  3 2 3 4 4 3 2 5 5 32 3); #Sample data
my @out=keys %{{ map{$_=>1}@in}}; # Perform PFM
print join ' ', sort{$a<=>$b} @out;# Print data back out sorted and in order.

The PFM does the following:

The data in @in is fed into map, which builds an anonymous hash. The keys are then extracted from that hash and fed into @out.
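
Unrolled into separate steps, the one-liner amounts to roughly the following. The intermediate names %lookup and @sorted are made up for illustration:

```perl
use strict;
use warnings;

my @in = qw(1 3 4 6 2 4 3 2 6 3 2 3 4 4 3 2 5 5 32 3);

# Step 1: map turns each element into a key => 1 pair; duplicate keys collapse.
my %lookup = map { $_ => 1 } @in;

# Step 2: the hash keys are the distinct values, in no particular order.
my @out = keys %lookup;

# Step 3: sort numerically to get a stable, readable order.
my @sorted = sort { $a <=> $b } @out;
print join(' ', @sorted), "\n";
```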


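Method 1: Use a hash

Logic: A hash can only have unique keys, so iterate over the array, use each element as a key of the hash, and assign it any value. The keys of the hash are then your unique array.

my @unique = keys %{ { map { $_ => 1 } @array } };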

Method 2: Extend method 1 for reusability

If this functionality is needed several times in the code, it is better to put it in a subroutine.

sub get_unique {
    my %seen;
    grep !$seen{$_}++, @_;
}
my @unique = get_unique(@array);

Method 3: Use the module List::MoreUtils

use List::MoreUtils qw(uniq);
my @unique = uniq(@array);

That last one was pretty good. I'd just tweak it a bit:

my @arr;
my @uniqarr;

foreach my $var ( @arr ){
  # Compare with eq; a /$var/ regex would also match substrings.
  if ( ! grep { $_ eq $var } @uniqarr ) {
     push( @uniqarr, $var );
  }
}

I think this is probably the most readable way to do it.
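
One caveat worth sketching: a test such as /$var/ is a regex (substring) match, not an equality test, so an element that merely contains an earlier one can be mistaken for a duplicate. The sample data below is hypothetical; the second loop uses eq for an exact comparison.

```perl
use strict;
use warnings;

# Hypothetical data: "one" is a substring of "phone".
my @arr = qw(phone one two one);

# Regex test: "one" matches inside "phone", so "one" is wrongly dropped.
my @by_regex;
for my $var (@arr) {
    push @by_regex, $var unless grep { /$var/ } @by_regex;
}

# Exact string comparison keeps "one" and drops only the true duplicate.
my @by_eq;
for my $var (@arr) {
    push @by_eq, $var unless grep { $_ eq $var } @by_eq;
}

print "regex: @by_regex\n";
print "eq:    @by_eq\n";
```

Either loop rescans the result list for every element, so for large arrays the hash-based approaches scale better.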


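The previous answers pretty much summarize the possible ways of accomplishing this task.

However, I suggest a modification for those who don't care about counting the duplicates but do care about order.
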
my @record = qw( yeah I mean uh right right uh yeah so well right I maybe );
my %record;
print grep !$record{$_} && ++$record{$_}, @record;

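Note that the previously suggested grep !$seen{$_}++ ... post-increments $seen{$_} on every element, so the increment happens whether or not the element is already in %seen. The version above, however, short-circuits as soon as $record{$_} is true, so an element that has been seen once is left alone and its value in %record stays at 1.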
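
To make the difference concrete, here is a small sketch comparing what the two hashes hold afterwards. The names @by_count and @by_flag are only illustrative:

```perl
use strict;
use warnings;

my @record = qw( yeah I mean uh right right uh yeah so well right I maybe );

# Post-increment version: every occurrence bumps the counter.
my %seen;
my @by_count = grep { !$seen{$_}++ } @record;

# Short-circuit version: the value is set once and never touched again.
my %record;
my @by_flag = grep { !$record{$_} && ++$record{$_} } @record;

print "seen{right}   = $seen{right}\n";      # number of occurrences
print "record{right} = $record{right}\n";    # stays at 1
print "same result\n" if "@by_count" eq "@by_flag";
```

Both versions return the same order-preserving list; they differ only in what is left behind in the hash.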

You could also go with this ridiculousness, which takes advantage of autovivification and the existence of hash keys:

...
grep !(exists $record{$_} || undef $record{$_}), @record;

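That, however, might lead to some confusion.

And if you care about neither order nor duplicate count, you could use another hack based on hash slices and the trick I just mentioned:
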
...
undef @record{@record};
keys %record; # your record, now probably scrambled but at least deduped

Using the concept of unique hash keys:

my @array  = ("a","b","c","b","a","d","c","a","d");
my %hash   = map { $_ => 1 } @array;
my @unique = keys %hash;
print "@unique\n";

Output (hash key order is not guaranteed, so the order may differ between runs):
a c b d


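Try this. It runs uniq on both the unsorted and the sorted list so the results can be compared. It may seem that the uniq function needs a sorted list to work properly, but the hash-based uniq below does not itself require sorted input; sorting only changes the order of the output.
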
use strict;

# Helper function to remove duplicates in a list.
sub uniq {
  my %seen;
  grep !$seen{$_}++, @_;
}

my @teststrings = ("one","two","three","one");

my @filtered = uniq @teststrings;
print "uniq: @filtered\n";
my @sorted = sort @teststrings;
print "sort: @sorted\n";
my @sortedfiltered = uniq(sort @teststrings);
print "uniq sort : @sortedfiltered\n";
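
For what it's worth, the hash-based helper does not appear to depend on sorted input. A small check, using made-up data with non-adjacent duplicates:

```perl
use strict;
use warnings;

# Same hash-based helper as above.
sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical data with non-adjacent duplicates.
my @teststrings = ("one", "two", "three", "one", "two");

my @from_unsorted = uniq(@teststrings);
my @from_sorted   = uniq(sort @teststrings);

# Both contain the same elements; only the order differs.
print "unsorted input: @from_unsorted\n";
print "sorted input:   @from_sorted\n";
```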

