题目3(二叉排序树)[问题描述]

利用二叉查找树(又称为二叉排序树、二叉搜索树)实现对输入的英文单词进行搜索,同时可给出单词出现的次数。(难易程度:高)

[实验目的]

1、掌握二叉链表的存储结构。
2、掌握搜索和过滤的方法。
3、掌握二叉排序树的插入和删除操作。

[实验内容及要求]

1、构造二叉查找树

(1)从文本文件中读入文本内容,能够分离出单词,过滤掉阿拉伯数字和标点符号,并将英文字母的大写形式全部转换成小写形式。

(2)按照英文字母表的顺序构造英文单词的二叉查找树。当两个英文单词的首字母相同时,按第二个字母进行排序,依次类推。

(3)当待插入的单词已在二叉查找树中,则将该单词的出现次数增1。

2、遍历二叉查找树

(1)搜索:输入一个待检索单词,在二叉查找树中进行查找,如果能找到该单词,则输出该单词及其出现次数;

(2)实现二叉查找树的中序遍历,并将遍历结果输出到屏幕上,包括单词和单词出现的位置。

3、删除结点:

给定一个停用词列表(停用词是指对搜索没有作用的词,如: of , and , a , an , the 等等),将二叉查找树中的属于停用词表中的单词依次删除。

4、可以显示菜单,在菜单中可以进行如下四项操作(但并不局限这些操作):

(1)读入文本内容,包含若干英文单词、标点符号以及阿拉伯数字,用于构建二叉查找树。

(2)输入停用词,每个停用词占一行。对于每个停用词,都需要删除二叉查找树中的相应结点,即:每输入一个停用词,执行一次删除结点的操作。

(3)中序遍历二叉查找树,遍历结果中的每个单词占一行,先输出该单词,然后输出一个空格,再输出该单词出现的次数。

(4)输入查询词。对每个查询词,都需要在二叉查找树中的搜索相应结点,如果找到,则输出该单词及其出现次数;如果未找到,则输出相应的信息。每个查询词的查询结果占一行,先输出该单词,然后输出一个空格,再输出该单词出现的次数。

[测试数据]

1、输入的文本含有大小写字母、阿拉伯数字、标点符号及其它字符。
2、单词的数量应当足够多,并有一定量的相同单词。

#include<iostream>
#include<algorithm>
#include<vector>
#include<cstdlib>
using namespace std;
 
typedef struct BST {
   
	int cnt;
	string data;
	BST* lchild, * rchild;
}BSTNode,*BSTree;
 
bool BST_Insert(BSTree& T, string data){
   
	if (T == NULL)	{
   
		T = new BSTNode;
		T->data = data;
		T->cnt = 1;
		T->lchild = T->rchild = NULL;
		return true;
	}
	else if (data < T->data) {
   
		return BST_Insert(T->lchild, data);
	}
	else if (data > T->data) {
   
		return BST_Insert(T->rchild, data);
	}
	else {
   // 已存在值不插入
		T->cnt++;
		return false;
	}
}

bool BST_Dele(BSTree& T, string data){
   
	if (T == NULL) return false;
	if (T->data == data){
   
		BSTree p, q;  // 前驱 后继
		//如果左右儿子都有比较难删除
		if (T->lchild && T->rchild){
   
			p = T;
			q = p->lchild;
			while (q->rchild) {
    // 寻找左子树最大 
				p = q;
				q = q->rchild;
			}
			T->data = q->data;
			T->cnt = 0;
			if (p != T) {
   
				p->rchild = q->lchild;
			}
			else {
   
				p->lchild = q->lchild;
			}
			delete(q);
		}
		else if (T->lchild) {
     //只有左儿子
			p = T;
			T = T->lchild;
			delete(p);
		}
		else {
    //只有右儿子或左右儿子都没有(叶子)
			p = T;
			T = T->rchild;
			delete(p);
		}
		return true;
	}
	else if (data < T->data) {
   
		return BST_Dele(T->lchild, data);
	}
	else if (data > T->data) {
   
		return BST_Dele(T->rchild, data);
	}
}
 
void _Order(BSTree& T){
   
	if (T != NULL) {
   
		_Order(T->lchild);
		if(T->cnt!=0||T->data!="")
			cout << T->data << " " << T->cnt << endl;
		_Order(T->rchild);
	}
}
void Prior_Order1(BSTree& T,string s){
   
	if (T != NULL) {
   
		if(T->data==s)
			cout << T->data << " " << T->cnt << endl;
		Prior_Order1(T->lchild,s);
		Prior_Order1(T->rchild,s);
	}
}
int Get_word(string &temp_s,FILE *in){
   
//bug可能是这里产生的,但我不知道错在哪里了
	int i=0;
	char s;
	s=fgetc(in);
	while(!isalpha(s)){
   
		s=fgetc(in);
		if(s==EOF)
			return 1;
	}
	while(isalpha(s)&&!isspace(s)){
   
		s=tolower(s);
		temp_s.push_back(s);
		s=fgetc(in);
	}
	return 0;
}

int main()
{
   
	//-----------------简单测试函数-----------------
	// BSTree T = NULL;
	// string s, ss;
	// for (int i = 0; i < 10; i++)
	// {
   
	// cin >> s;
	// BST_Insert(T, s);
	// }
	// for (int i = 0; i < 1;i++)
	// {
   
	// cin >> ss;
	// BST_Dele(T, ss);
	// }
	// Prior_Order(T);
	// return 0;
	//----------------end----------------------------
	BSTree T = NULL;
	string ss, sss;
	int number;
	FILE *in;
	int i,j,key;
	if((in=fopen("essay.txt", "r"))==NULL)
	printf("No such file!\n");
	while (!feof(in)){
   
	//文件读入后创建二叉搜索树有一个小bug
	//运行结果有一个诡异的结点
		string temp_s;
		Get_word(temp_s,in);
		BST_Insert(T, temp_s);		
	}
	fclose(in);
	_Order(T);
	printf("请输入你想停用的单词的个数:\n");
	int t;
	cin >> t;
	while(t--){
   
		cout << "请输入停用单词:";
		cin >> sss;
		BST_Dele(T, sss);
	}
	_Order(T);
	printf("你想查询单词的数量是:");
	scanf("%d",&number);
	for (int i=0;i<number;i++){
   
		cout << "要查询的单词:";
		cin >> ss;
		Prior_Order1(T, ss);
	}
	return 0;
}

测试样例:

Tale of Two Cities (1859) is a novel by Charles Dickens, set in London and Paris before and during the French Revolution. With well over 200 million copies sold, it ranks among the most famous works in the history of fictional literature.The novel depicts the plight of the French peasantry demoralised by the French aristocracy in the years leading up to the revolution, the corresponding brutality demonstrated by the revolutionaries toward the former aristocrats in the early years of the revolution, and many unflattering social parallels with life in London during the same time period.

运行结果:

 1					//bug所在地,这里的输出莫名其妙,我感觉是读入了空字符
a 1
among 1      
and 3        
aristocracy 1
aristocrats 1
before 1     
brutality 1  
by 3
charles 1    
cities 1
copies 1
corresponding 1
demonstrated 1
demoralised 1
depicts 1
dickens 1
during 2
early 1
famous 1
fictional 1
former 1
french 3
history 1
in 5
is 1
it 1
leading 1
life 1
literature 1
london 2
many 1
million 1
most 1
novel 2
of 4
over 1
parallels 1
paris 1
peasantry 1
period 1
plight 1
ranks 1
revolution 3
revolutionaries 1
same 1
set 1
social 1
sold 1
tale 1
the 15
time 1
to 1
toward 1
two 1
unflattering 1
up 1
well 1
with 2
works 1
years 2

请输入你想停用的单词的个数:
2
请输入停用单词:of
请输入停用单词:to

 1					//bug所在地,如果停用a,那么这行会和第二行一起删掉
a 1
among 1
and 3
aristocracy 1
aristocrats 1
before 1
brutality 1
by 3
charles 1
cities 1
copies 1
corresponding 1
demonstrated 1
demoralised 1
depicts 1
dickens 1
during 2
early 1
famous 1
fictional 1
former 1
french 3
history 1
in 5
is 1
it 1
leading 1
life 1
literature 1
london 2
many 1
million 1
most 1
novel 0
over 1
parallels 1
paris 1
peasantry 1
period 1
plight 1
ranks 1
revolution 3
revolutionaries 1
same 1
set 1
social 1
sold 1
tale 1
the 15
time 0
toward 1
two 1
unflattering 1
up 1
well 1
with 2
works 1
years 2
你想查询单词的数量是:3
要查询的单词:of
要查询的单词:to
要查询的单词:a
a 1
 1							//此时还有问题
a 1
among 1
and 3
aristocracy 1
aristocrats 1
before 1
brutality 1
by 3
charles 1
cities 1
copies 1
corresponding 1
demonstrated 1
demoralised 1
depicts 1
dickens 1
during 2
early 1
famous 1
fictional 1
former 1
french 3
history 1
in 5
is 1
it 1
leading 1
life 1
literature 1
london 2
many 1
million 1
most 1
novel 2
of 4
over 1
parallels 1
paris 1
peasantry 1
period 1
plight 1
ranks 1
revolution 3
revolutionaries 1
same 1
set 1
social 1
sold 1
tale 1
the 15
time 1
to 1
toward 1
two 1
unflattering 1
up 1
well 1
with 2
works 1
years 2

请输入你想停用的单词的个数:
2
请输入停用单词:a
请输入停用单词:of
						//问题行很神奇的没有了
among 1
and 3
aristocracy 1
aristocrats 1
before 1
brutality 1
by 3
charles 1
cities 1
copies 1
corresponding 1
demonstrated 1
demoralised 1
depicts 1
dickens 1
during 2
early 1
famous 1
fictional 1
former 1
french 3
history 1
in 5
is 1
it 1
leading 1
life 1
literature 1
london 2
many 1
million 1
most 1
novel 0
over 1
parallels 1
paris 1
peasantry 1
period 1
plight 1
ranks 1
revolution 3
revolutionaries 1
same 1
set 1
social 1
sold 1
tale 1
the 15
time 1
to 1
toward 1
two 1
unflattering 1
up 1
well 1
with 2
works 1
years 2
你想查询单词的数量是:3
要查询的单词:a
要查询的单词:to
to 1
要查询的单词:of

更多数据结构代码实现请关注我的专栏数据结构

或者进入2021-10-16【严蔚敏数据结构代码实现合集】【c语言学习必备】学习