Union-Find (Disjoint Set) -- Junk-Mail Filter

Junk-Mail Filter

Recognizing junk mail is a tough task. The method used here consists of two steps:
1) Extract the common characteristics from the incoming email.
2) Use a filter matching the set of common characteristics extracted to determine whether the email is spam.

We want to extract the set of common characteristics from the N sample junk emails available at the moment, and thus having a handy data-analyzing tool would be helpful. The tool should support the following kinds of operations:

a) “M X Y”, meaning that we think that the characteristics of spam X and Y are the same. Note that the relationship defined here is transitive, so relationships (other than the one between X and Y) need to be created if they are not present at the moment.

b) “S X”, meaning that we think spam X has been misidentified. Your tool should remove all relationships that spam X has when this command is received; after that, spam X will become an isolated node in the relationship graph.

Initially no relationships exist between any pair of the junk emails, so the number of distinct characteristics at that time is N.
Please help us keep track of any necessary information to solve our problem.
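To pin down what the two operations mean, here is a brute-force reference model in C. It is a sketch that is not part of the original post, and the names `Op` and `brute_force_count` are hypothetical. Each email carries a class label: `M X Y` relabels X's whole class to Y's label (this is where transitivity comes from), and `S X` gives X a brand-new label, so only X leaves its class. Every merge costs O(N), so this is only meant for cross-checking small inputs, not for the real limits.

```c
#include <assert.h>
#include <stdlib.h>

/* One operation: op is 'M' (merge x and y) or 'S' (separate x; y is unused). */
typedef struct {
    char op;
    int x, y;
} Op;

/* Brute-force reference (hypothetical helper): label[i] is the class of email i.
   Each merge is O(n), so this is only for cross-checking small inputs. */
int brute_force_count(int n, const Op *ops, int m) {
    int *label = malloc(sizeof(int) * (size_t)n);
    char *seen = calloc((size_t)(n + m), 1);   /* labels are always < n + m */
    assert(label != NULL && seen != NULL);

    int next_label = n;
    for (int i = 0; i < n; i++)
        label[i] = i;                           /* initially N distinct classes */

    for (int k = 0; k < m; k++) {
        if (ops[k].op == 'M') {
            int from = label[ops[k].x], to = label[ops[k].y];
            if (from != to)                     /* transitivity: move X's whole class */
                for (int i = 0; i < n; i++)
                    if (label[i] == from)
                        label[i] = to;
        } else {
            label[ops[k].x] = next_label++;     /* S X: X alone gets a fresh label */
        }
    }

    int count = 0;                              /* number of distinct labels */
    for (int i = 0; i < n; i++)
        if (!seen[label[i]]) {
            seen[label[i]] = 1;
            count++;
        }
    free(seen);
    free(label);
    return count;
}
```

For example, with N = 5 and the operations `M 0 1`, `M 1 2`, `M 1 3`, `S 1`, `M 1 2`, `S 3`, it returns 3: {0, 1, 2}, {3} and {4}.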

Input

There are multiple test cases in the input file.
Each test case starts with two integers, N and M (1 ≤ N ≤ 10^5, 1 ≤ M ≤ 10^6), the number of email samples and the number of operations. M lines follow, each line is one of the two formats described above.
Two successive test cases are separated by a blank line. A case with N = 0 and M = 0 indicates the end of the input file, and should not be processed by your program.

Output

For each test case, please print a single integer, the number of distinct common characteristics, to the console. Follow the format as indicated in the sample below.

Sample Input

Sample Output

Case #1: 3
Case #2: 2

Sample analysis:

Approach: (the detailed explanation of this approach is reposted from elsewhere)

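When a node is deleted, we do not actually remove all of its relationships; we only hide the node. Other nodes may still have the hidden node as their parent, so everything that was connected through it stays connected. replace[i] records the node that currently stands in for email i. For example, with original nodes 1, 2, 3, deleting node 2 changes the (virtual) nodes stored in replace[1], replace[2], replace[3] to 1, 4, 3; in other words, a deleted node gets a new id via replace[i] = n++. At the end, look at the root of every node after all the merges: nodes with the same root are in the same set, and a flag[] array counts how many distinct roots, that is, how many distinct sets, there are.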
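The example above can be replayed in a few lines. This is a minimal sketch that assumes 1-indexed emails (to match the example) and a tiny fixed array size; `replace_example`, `delete_email` and `MAXNODE` are hypothetical names, and `find` uses path halving instead of the two-pass path compression used in the solution code.

```c
#include <assert.h>

#define MAXNODE 16             /* tiny fixed size, enough for this example */

static int set[MAXNODE];       /* parent pointers of the union-find forest */
static int replace[MAXNODE];   /* replace[i]: node currently standing in for email i */
static int next_id;            /* next unused virtual-node id */

static int find(int x) {
    while (x != set[x]) {
        set[x] = set[set[x]];  /* path halving: skip one level on the way up */
        x = set[x];
    }
    return x;
}

static void merge(int a, int b) {
    int fa = find(replace[a]), fb = find(replace[b]);
    if (fa != fb)
        set[fa] = fb;
}

static void delete_email(int c) {
    assert(next_id < MAXNODE);
    replace[c] = next_id;      /* email c moves to a fresh, isolated node */
    set[next_id] = next_id;    /* the old node stays in the tree, hidden */
    next_id++;
}

/* Emails 1, 2, 3 are merged, then 2 is deleted. Copies replace[1..3] into
   out[1..3] and returns the number of distinct sets. */
int replace_example(int out[4]) {
    int n = 3;
    for (int i = 1; i <= n; i++) {
        set[i] = i;
        replace[i] = i;
    }
    next_id = n + 1;           /* first free id after emails 1..3 is 4 */
    merge(1, 2);
    merge(2, 3);
    delete_email(2);

    int seen[MAXNODE] = {0}, count = 0;
    for (int i = 1; i <= n; i++) {
        out[i] = replace[i];
        int r = find(replace[i]);
        if (!seen[r]) {
            seen[r] = 1;
            count++;
        }
    }
    return count;
}
```

After `merge(1, 2)`, `merge(2, 3)` and `delete_email(2)`, `replace[1..3]` holds 1, 4, 3 and the function returns 2: email 2 now sits alone on node 4, while the old node 2 still links node 1 to node 3 inside their tree.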

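#include<stdio.h>
#include<string.h>
//Up to n original nodes plus one new virtual node per S operation:
//10^5 + 10^6 = 1,100,000 ids, so an array of 1000005 is not enough.
#define MAXNODE (100000 + 1000000 + 5)
int set[MAXNODE];      //parent array of the union-find forest
int replace[100005];   //replace[i]: node currently standing in for email i
int flag[MAXNODE];     //flag[r]==1: root r has already been counted
int find(int x){//find the root, with path compression
    int r=x;
    while(r!=set[r])
        r=set[r];
    int i=x;
    while(i!=r){//point every node on the path directly at the root
        int j=set[i];
        set[i]=r;
        i=j;
    }
    return r;
}

void merge(int x,int y){
    int fx=find(x);
    int fy=find(y);
    if(fx!=fy)//if the two roots differ, merge them
        set[fx]=fy;//attach fx's tree under fy
}
int main(){
    int n,m;
    int num_Case=0;
    while(scanf("%d%d",&n,&m)==2){//n nodes, m operations
        if(n==0&&m==0)  break;
        num_Case++;

        //initialization
        int num_VirtNode=n;//next unused virtual-node id
        for(int i=0;i<n;i++){
            set[i]=i;
            replace[i]=i;
        }

        //read the operations
        char operate;
        int a,b,c;
        for(int i=0;i<m;i++){
            scanf(" %c",&operate);//the leading space skips newlines and blank lines
            if(operate=='M'){
                scanf("%d%d",&a,&b);
                merge(replace[a],replace[b]);
            }
            else{
                scanf("%d",&c);
                replace[c]=num_VirtNode;//c is now represented by a fresh node
                set[num_VirtNode]=num_VirtNode;//the fresh node starts as its own root, like set[i]=i
                num_VirtNode++;
            }
        }

        //output
        int count=0;//number of distinct roots
        memset(flag,0,sizeof(int)*num_VirtNode);//clear only the ids used in this case
        for(int i = 0;i<n;i++){
            int r=find(replace[i]);
            if(flag[r]==0){
                flag[r]=1;
                count++;
            }
        }
        printf("Case #%d: %d\n",num_Case,count);
    }
    return 0;
}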

Here is a diagram I drew of how the replace and set arrays implement node deletion (I'm no artist, so don't mind the drawing /laugh-cry):
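As a text substitute for the diagram, the sketch below prints `replace[]` and `set[]` after each step of `M 0 1`, `M 1 2`, `S 1` on emails 0, 1, 2. It uses the same merge rule (`set[fx]=fy`) and two-pass path compression as the solution code; `trace_hidden_node`, `separate`, `dump` and the tiny `MAXNODE` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define MAXNODE 16

static int set[MAXNODE];      /* parent pointers, as in the solution code */
static int replace[MAXNODE];  /* replace[i]: node currently standing in for email i */
static int next_id;           /* next unused node id */

static int find(int x) {      /* two-pass path compression, as in the solution */
    int r = x;
    while (r != set[r])
        r = set[r];
    while (x != r) {
        int j = set[x];
        set[x] = r;
        x = j;
    }
    return r;
}

static void merge(int a, int b) {
    int fx = find(replace[a]), fy = find(replace[b]);
    if (fx != fy)
        set[fx] = fy;          /* same rule as the solution: fx goes under fy */
}

static void separate(int c) {  /* the S operation */
    assert(next_id < MAXNODE);
    replace[c] = next_id;
    set[next_id] = next_id;
    next_id++;
}

static void dump(const char *step, int n) {
    printf("%s: replace =", step);
    for (int i = 0; i < n; i++)
        printf(" %d", replace[i]);
    printf(", set =");
    for (int i = 0; i < next_id; i++)
        printf(" %d", set[i]);
    printf("\n");
}

/* Runs M 0 1, M 1 2, S 1 on emails 0..2, printing the arrays after each step.
   Returns the number of distinct sets at the end. */
int trace_hidden_node(void) {
    int n = 3;
    for (int i = 0; i < n; i++) {
        set[i] = i;
        replace[i] = i;
    }
    next_id = n;
    dump("init", n);
    merge(0, 1);
    dump("M 0 1", n);
    merge(1, 2);
    dump("M 1 2", n);
    separate(1);
    dump("S 1", n);

    int seen[MAXNODE] = {0}, count = 0;
    for (int i = 0; i < n; i++) {
        int r = find(replace[i]);
        if (!seen[r]) {
            seen[r] = 1;
            count++;
        }
    }
    return count;
}
```

Calling `trace_hidden_node()` prints:

    init: replace = 0 1 2, set = 0 1 2
    M 0 1: replace = 0 1 2, set = 1 1 2
    M 1 2: replace = 0 1 2, set = 1 2 2
    S 1: replace = 0 3 2, set = 1 2 2 3

and returns 2. After `S 1`, email 1 lives on the fresh node 3, but the old node 1 is still the link in the chain 0 → 1 → 2, so emails 0 and 2 stay in the same set.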

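I also came across another approach that is easier to understand, but it got Time Limit Exceeded no matter what I tried (T^T). It is still worth a look, though.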

Approach: (this detailed explanation is also reposted from elsewhere)

Union-find: M means merge and S means delete. Below I explain the delete operation.

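Everyone knows that a merge finds the parents (roots) of the two nodes and changes one of them. So would deletion work if we simply reset the deleted node's parent to itself?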

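No. Suppose 1, 2, 3 all have parent 1. Now delete 1: 1's parent is still 1, and 2 and 3 still point to 1, so there is still only one set, whereas the correct answer is 2.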
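The failure is easy to reproduce. This is a minimal sketch with small hypothetical helpers (`count_sets`, `naive_reset_example`) on the 1-indexed example from the text:

```c
#include <assert.h>

#define MAXNODE 8
static int father[MAXNODE];

static int find(int x) {                 /* plain root lookup, no compression needed here */
    while (x != father[x])
        x = father[x];
    return x;
}

static int count_sets(int lo, int hi) {  /* distinct roots among nodes lo..hi */
    assert(hi < MAXNODE);
    int seen[MAXNODE] = {0}, count = 0;
    for (int i = lo; i <= hi; i++) {
        int r = find(i);
        if (!seen[r]) {
            seen[r] = 1;
            count++;
        }
    }
    return count;
}

/* 1, 2, 3 all have parent 1; "delete" 1 by resetting its parent to itself. */
int naive_reset_example(void) {
    father[1] = 1;
    father[2] = 1;
    father[3] = 1;
    father[1] = 1;                       /* the naive delete: nothing actually changes */
    return count_sets(1, 3);             /* returns 1, but the correct answer is 2 */
}
```

The reset is a no-op because 1 is the root, and 2 and 3 still reach it, so the count stays at 1.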

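What if, instead of resetting the deleted node's parent to itself, we allocate a new node as its parent? Again suppose 1, 2, 3 all have parent 1 and form one set. Deleting 1 allocates 4 as 1's parent, while 2 and 3 still have parent 1. Now Find(2) looks up 2's parent: 2's parent is 1, but 1's parent is 4, so 2's parent gets updated to 4, and the same happens to 3. So this does not work either.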
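The fresh-parent attempt can be checked the same way. In this sketch (small hypothetical helpers, 1-indexed nodes, node 4 assumed to be the next free id), deleting 1 by giving it parent 4 drags 2 and 3 along, because they reach 4 through 1:

```c
#include <assert.h>

#define MAXNODE 8
static int father[MAXNODE];

static int find(int x) {                 /* two-pass path compression */
    int r = x;
    while (r != father[r])
        r = father[r];
    while (x != r) {
        int j = father[x];
        father[x] = r;
        x = j;
    }
    return r;
}

static int count_sets(int lo, int hi) {  /* distinct roots among nodes lo..hi */
    assert(hi < MAXNODE);
    int seen[MAXNODE] = {0}, count = 0;
    for (int i = lo; i <= hi; i++) {
        int r = find(i);
        if (!seen[r]) {
            seen[r] = 1;
            count++;
        }
    }
    return count;
}

/* 1, 2, 3 all have parent 1; "delete" 1 by giving it the new parent 4. */
int new_parent_example(void) {
    father[1] = 1;
    father[2] = 1;
    father[3] = 1;
    father[4] = 4;                       /* freshly allocated node */
    father[1] = 4;                       /* the attempted delete */
    return count_sets(1, 3);             /* still 1: 2 and 3 reach 4 through 1 */
}
```

After the call, path compression has set `father[2]` and `father[3]` to 4, exactly the update described above, and there is still only one set.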

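The correct method is to give every node its own virtual parent. For example, let 1, 2, 3 have parents 4, 5, 6 respectively. After merging, 1, 2, 3 are in one set whose root is 4. To delete 1, we allocate a new node 7 as its parent.

Now 2 and 3 still lead to root 4, while 1's parent is 7, so the deletion succeeds. This works because the real nodes 1, 2, 3 are never anyone's parent: merges only link roots, path compression only makes nodes point at roots, and roots are always virtual nodes. Every real node is therefore a leaf, and re-pointing its parent detaches that node alone.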
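Here is the virtual-parent example in code. It is a sketch with hypothetical names (`virtual_parent_example`, `delete_node`) and a tiny fixed array, using the same `father[fy] = fx` merge rule as the solution below; unlike that solution's `Delet`, `delete_node` initializes the fresh parent itself instead of relying on a pre-initialized range.

```c
#include <assert.h>

#define MAXNODE 16
static int father[MAXNODE];
static int next_id;                 /* next unused id for a fresh parent */

static int find(int x) {            /* two-pass path compression */
    int r = x;
    while (r != father[r])
        r = father[r];
    while (x != r) {
        int j = father[x];
        father[x] = r;
        x = j;
    }
    return r;
}

static void union_set(int x, int y) {
    int fx = find(x), fy = find(y);
    if (fx != fy)
        father[fy] = fx;            /* fy's root goes under fx's root */
}

static void delete_node(int x) {
    assert(next_id < MAXNODE);
    father[next_id] = next_id;      /* the fresh parent is a root of its own */
    father[x] = next_id++;          /* x is a leaf, so only x moves */
}

/* Real nodes 1..3 with virtual parents 4..6; merge all three, then delete 1. */
int virtual_parent_example(int *root_of_1, int *root_of_2) {
    for (int i = 1; i <= 3; i++)
        father[i] = i + 3;          /* real node i -> virtual parent i + 3 */
    for (int v = 4; v <= 6; v++)
        father[v] = v;
    next_id = 7;
    union_set(1, 2);                /* 5 goes under 4 */
    union_set(2, 3);                /* 6 goes under 4 */
    delete_node(1);                 /* 1 -> 7, while 2 and 3 keep root 4 */

    *root_of_1 = find(1);
    *root_of_2 = find(2);
    int seen[MAXNODE] = {0}, count = 0;
    for (int i = 1; i <= 3; i++) {
        int r = find(i);
        if (!seen[r]) {
            seen[r] = 1;
            count++;
        }
    }
    return count;
}
```

The function reports root 7 for node 1 and root 4 for node 2, and returns 2 sets, matching the walkthrough.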

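#include<stdio.h>
#include<string.h>
//n real nodes + n initial virtual parents + at most m fresh parents from S operations:
//2*10^5 + 10^6 ids in the worst case. The original N = 1000047 is too small, and writing
//past the end of father[] is undefined behavior (it can corrupt parent pointers into a
//cycle, which is one possible cause of the TLE mentioned above).
#define N (2*100000 + 1000000 + 5)

int father[N],flag[N];
int n,m,ID;
int find(int x){//find the root, with path compression
    int r=x;
    while(r!=father[r])
        r=father[r];
    int i=x;
    while(i!=r){//point every node on the path directly at the root
        int j=father[i];
        father[i]=r;
        i=j;
    }
    return r;
}

void union_set(int x, int y){
    int fx = find(x);//root of x
    int fy = find(y);//root of y
    if(fx != fy)//different roots: merge
        father[fy] = fx;//attach fy under fx
}

void Delet(int x){
    //x is a real node and therefore always a leaf, so re-pointing it detaches only x
    father[x] = ID++;//allocate a fresh parent (already initialized to father[ID]=ID)
}

int main(){
    int num_Case = 0;
    while(scanf("%d%d",&n,&m)==2){//n nodes, m operations
        if(n==0&&m==0)  break;

        //initialization
        ID = n+n;//first id available for a fresh parent
        for(int i = 0;i<n;i++){
            father[i] = i+n;//each real node i gets its own virtual parent i+n
        }
        for(int i = n; i<n+n+m;i++){
        //ids n..2n-1 are the virtual parents; ids 2n..2n+m-1 cover up to m deletions
            father[i] = i;
        }

        //read the operations
        char operate[3];//operation letter
        int a,b,c;
        for(int i = 0;i<m;i++){
            scanf("%2s",operate);
            if(operate[0] == 'M'){
                scanf("%d%d",&a,&b);
                union_set(a,b);
            }
            else if(operate[0] == 'S'){
                scanf("%d",&c);
                Delet(c);
            }
        }

        //output
        int count = 0;
        memset(flag,0,sizeof(int)*(n+n+m));//clear only the ids used in this case
        for(int i = 0; i < n; i++){
            int x = find(i);
            if(flag[x] == 0){//flag[] records whether this root has already been counted
                count++;
                flag[x] = 1;
            }
        }
        printf("Case #%d: %d\n",++num_Case,count);
    }
    return 0;
}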