Elasticsearch from Scratch

https://db-engines.com/en/ranking/search+engine

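Elasticsearch is an open-source search engine built on Apache Lucene(TM). Lucene can be considered the most advanced, best-performing and most full-featured search engine library to date, whether open source or proprietary.
Elastic, the company behind Elasticsearch, also owns the open-source projects Logstash and Kibana. Combined, the three projects form the ELK stack,
a powerful ecosystem.
In short, Elasticsearch stores, searches and analyzes the data; Logstash collects and processes it (enrichment, transformation, etc.); and Kibana handles visualization, analysis, management, monitoring and applications.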

Getting Started: Warm-up

Prerequisites: a recent Java version + the installation package

Run bin/elasticsearch on Unix/Linux, or bin\elasticsearch.bat on Windows.
Then run curl -X GET http://localhost:9200 to check that it responds.
On Windows you can install Cygwin to get curl. cURL commands talk to Elasticsearch using this pattern:

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'

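<VERB>: the appropriate HTTP method or verb, for example GET, POST, PUT, HEAD or DELETE.
<PROTOCOL>: either http or https. Use https if you have an HTTPS proxy in front of Elasticsearch, or if you use the Elasticsearch security features to encrypt HTTP communication.
<HOST>: the hostname of any node in your Elasticsearch cluster, or localhost for a node on your local machine.
<PORT>: the port running the Elasticsearch HTTP service, 9200 by default.
<PATH>: the API endpoint, for example _cluster/health.
<QUERY_STRING>: optional query-string parameters, for example ?pretty to pretty-print the JSON response.
<BODY>: a JSON-encoded request body (if needed).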

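With security enabled, pass the user name and password with -u:
curl -u elastic:password -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'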
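The curl pattern above maps one-to-one onto an HTTP request object. As a minimal sketch, assuming Python's standard library rather than curl, the hypothetical helper build_es_request below assembles <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>, JSON-encodes <BODY>, and turns -u user:password into an HTTP Basic Authorization header. It only builds the request; nothing is sent.

```python
import base64
import json
import urllib.request
from typing import Optional


def build_es_request(verb: str, path: str, body: Optional[dict] = None,
                     protocol: str = "http", host: str = "localhost",
                     port: int = 9200, query: str = "",
                     user: Optional[str] = None,
                     password: str = "") -> urllib.request.Request:
    """Build (without sending) the request that
    curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
    would send. build_es_request is a hypothetical helper, not a library API."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + query
    # <BODY>: JSON-encoded, only when the call needs one
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=verb.upper())
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if user is not None:
        # Equivalent of curl -u user:password (HTTP Basic authentication)
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
    return req
```

Against a running node you would send it with urllib.request.urlopen(req). For https with a self-signed certificate you would also pass an ssl context that trusts your CA, which this sketch leaves out.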

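Check that Elasticsearch is installed correctly by typing the following command in a terminal:
curl -XGET 'http://localhost:9200/' -H 'Content-Type: application/json'

Index a document (hl is the index name, _doc is the document endpoint, 1 is the document ID):
curl -XPUT 'http://localhost:9200/hl/_doc/1?pretty' -H 'Content-Type: application/json' -d '
{
  "user": "kimchy",
  "post_date": "2009-11-15T13:12:00",
  "message": "Trying out Elasticsearch, so far so good?"
}'

Use GET to check that the document was added to the index:
curl -XGET 'http://localhost:9200/hl/_doc/1?pretty=true'

Search for all tweets posted by kimchy:
curl -XGET 'http://localhost:9200/hl/_search?q=user:kimchy&pretty=true'

Request bodies for _search:
1. Return all documents:
{ "query": { "match_all": {} } }
2. Return documents within a range (a range query must name a field, here post_date):
{ "query": { "range": { "post_date": { "gte": "2009-11-01", "lte": "2009-11-30" } } } }

Multi-tenancy: indices and types
Elasticsearch supports multiple indices and gives you full control at the index level (mapping types were deprecated in 7.0 and removed in 8.0), for example:
{ "settings": { "index.number_of_shards": 2, "index.number_of_replicas": 1 } }

When might Elasticsearch not be the right tool? When you work with relational data sets, or when you need ACID transactions.

Some important concepts in Elasticsearch: cluster, node, index, document, shards and replicas.
cluster: the cluster name can be customized in config/elasticsearch.yml. GET _cluster/state returns the state of the whole cluster; this state can only be changed by the master node.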
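The two _search bodies and the index settings above are plain JSON, so they are easy to generate. A minimal Python sketch, assuming the helper names match_all_query, range_query and total_shard_copies (all hypothetical): the range query puts its bounds inside an object keyed by the field name, which is exactly what the broken { "range": { "match_all": {} } } form was missing. The last helper works out how many shard copies the example settings produce: 2 primaries × (1 + 1 replica) = 4.

```python
import json


def match_all_query() -> dict:
    """Body for { "query": { "match_all": {} } }"""
    return {"query": {"match_all": {}}}


def range_query(field: str, gte=None, lte=None) -> dict:
    """A range query names a field; the bounds go inside that field's object."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    if not bounds:
        raise ValueError("a range query needs at least one bound")
    return {"query": {"range": {field: bounds}}}


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary shard gets `replicas` extra copies."""
    return primary_shards * (1 + replicas)


if __name__ == "__main__":
    print(json.dumps(range_query("post_date", gte="2009-11-01", lte="2009-11-30")))
```

The dictionaries can be passed as the body argument of any HTTP client; only the JSON structure matters to Elasticsearch.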

node ==> In most environments each node runs on a separate box or virtual machine. A cluster consists of one or more nodes.

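Depending on its role, a node can be one of the following:
master-eligible: can be elected master node. Once it is the master, it manages cluster-wide settings and changes: creating, updating and deleting indices; adding or removing nodes; allocating shards to nodes.
data: a data node.
ingest: ingests (pre-processes) incoming data.
machine learning.

Document
Elasticsearch is document-oriented: the smallest unit of data that is indexed or searched is a document. Documents in Elasticsearch have some important properties: they are self-contained, can be hierarchical, and have a flexible structure.
{ "name": "Elasticsearch Denver", "organizer": "Lee", "location": "Denver, Colorado, USA" }
When Elasticsearch indexes a document, it stores it in the _source field. The following additional system fields are also added to every document: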

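The name of the index that stores the document is given by the _index field. The document's unique identifier within the index is stored in the _id field (similar to MongoDB).
An index is a collection of documents.
Shards and replicas: an index is split into primary shards, and each primary shard can have replica shards.

Getting Started: Preparation
The Java version must not be lower than 1.7_55. Since Elastic 7.0 you no longer need to install Java: the installation package bundles a matching Java version.

Installing Elastic Stack 8.x on Windows [2022]
Create a directory named elastic on the C: drive, open a terminal (Windows PowerShell) and extract the archive:
cd c:\
cd elastic
tar xzf .\kibana-8.1.2-windows-x86_64.zip
cd .\kibana-8.1.2
.\bin\kibana.bat
(This starts Kibana.)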

cd c:\
cd elastic
tar xzf .\elasticsearch-8.1.2-windows-x86_64.zip
cd .\elasticsearch-8.1.2
.\bin\elasticsearch.bat
(This starts Elasticsearch.)
Download Elasticsearch and Kibana (Elastic Stack 8.x) from https://www.elastic.co/cn/downloads/

Installing Elastic Stack 8.0 with Docker and getting started [2022]
Installing the Elastic Stack with Docker Desktop
Installing Elastic Stack 8.x from RPM packages [2022] (for Red Hat and CentOS)

Installing Elasticsearch 7.3 on Windows (past releases: https://www.elastic.co/cn/downloads/past-releases)

Download and install the Windows .zip file. Download the .zip archive of Elasticsearch v7.3.1 from:

https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.3.1-windows-x86_64.zip

Alternatively, you can download the following package, which contains only features available under the Apache 2.0 license:

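Extract it with your favorite unzip tool. This creates a folder named elasticsearch-7.3.1, which we will refer to as %ES_HOME%. In a terminal window, cd into the %ES_HOME% directory, for example:
cd c:\elasticsearch-7.3.1
On Windows, open a Command Prompt with administrator privileges to start Elasticsearch:
bin\elasticsearch.bat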

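Or run Elasticsearch 7.3 from the command line
Elasticsearch can be started from the command line as follows:
./bin/elasticsearch
By default Elasticsearch runs in the foreground, prints its logs to STDOUT and can be stopped with Ctrl-C. Two important configuration options:
in config/elasticsearch.yml (the config subdirectory of the installation directory): path.data: /data/elasticsearch
in config/jvm.options: -Xms512m, which sets the JVM's initial heap size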

If you want Elasticsearch to bind to all network interfaces on your machine rather than only localhost, change the following settings in config/elasticsearch.yml:
network.host: 0.0.0.0 (binds to every IP address of the machine)
discovery.type: single-node (a single-node cluster)
The data path and the JVM heap options can also be set on the command line:
$ ./bin/elasticsearch -E path.data=/data/elasticsearch
or
$ ES_JAVA_OPTS="-Xms512m" ./bin/elasticsearch
or
$ ES_JAVA_OPTS="-Xms512m -Xmx512m" ./bin/elasticsearch
Override the default node name as follows:
$ ./bin/elasticsearch -E node.name=mynodename
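If you start Elasticsearch from a script, the -E flags and the ES_JAVA_OPTS variable above become an argument list and an environment. A minimal Python sketch, assuming a POSIX layout (bin/elasticsearch; on Windows it would be bin\elasticsearch.bat) and a hypothetical helper named es_launch_command; it only prepares the command and does not start anything.

```python
import os
from typing import Dict, List, Optional, Tuple


def es_launch_command(es_home: str = ".",
                      settings: Optional[Dict[str, str]] = None,
                      heap: Optional[str] = None) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) equivalent to
    ES_JAVA_OPTS="-Xms<heap> -Xmx<heap>" ./bin/elasticsearch -E key=value ..."""
    argv = [os.path.join(es_home, "bin", "elasticsearch")]
    for key, value in (settings or {}).items():
        # Each setting becomes a separate "-E key=value" pair
        argv += ["-E", f"{key}={value}"]
    env = dict(os.environ)
    if heap is not None:
        # Same initial and maximum heap, so the heap is not resized at runtime
        env["ES_JAVA_OPTS"] = f"-Xms{heap} -Xmx{heap}"
    return argv, env
```

Usage: argv, env = es_launch_command("/opt/elasticsearch", {"node.name": "mynodename"}, heap="512m"), then subprocess.run(argv, env=env). The install path here is only an example.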

Kibana does not support running different major versions of Kibana and Elasticsearch (for example Kibana 5.x with Elasticsearch 2.x), nor a Kibana minor version newer than the Elasticsearch version (for example Kibana 5.1 with Elasticsearch 5.0).
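The two rules above can be checked mechanically before starting Kibana. A minimal Python sketch with hypothetical helper names: it reads the Elasticsearch version from the JSON returned by GET / (the version.number field) and rejects a different major version or a newer Kibana minor version. Passing this check does not mean the pair is recommended; running the same version of both is the officially supported setup.

```python
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """'8.1.2' -> (8, 1)"""
    parts = version.split(".")
    minor = int(parts[1]) if len(parts) > 1 else 0
    return int(parts[0]), minor


def es_version_from_root(root_response: dict) -> str:
    """Pick the version out of the JSON body returned by GET /."""
    return root_response["version"]["number"]


def kibana_version_allowed(kibana_version: str, es_version: str) -> bool:
    """False if the pair breaks one of the two rules above."""
    k_major, k_minor = major_minor(kibana_version)
    e_major, e_minor = major_minor(es_version)
    if k_major != e_major:
        return False  # e.g. Kibana 5.x with Elasticsearch 2.x
    return k_minor <= e_minor  # Kibana 5.1 with Elasticsearch 5.0 is rejected
```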

Starting with version 6.0, Kibana supports only 64-bit operating systems.

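1) Download the Windows zip file from the Kibana download page.
2) Extract the contents of the zip file to a directory on your computer, for example C:\Program Files.
3) Open a Command Prompt as administrator and navigate to the directory containing the extracted files, for example:
cd C:\Program Files\kibana-7.3.0-windows
4) Start Kibana: bin\kibana.bat
By default Kibana runs in the foreground, prints its logs to standard output (stdout) and can be stopped with Ctrl-C. By default it is available at http://localhost:5601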

Configuring Kibana via config
By default Kibana loads its configuration from $KIBANA_HOME/config/kibana.yml. Adding the following line to kibana.yml switches the Kibana UI to Chinese:
i18n.locale: "zh-CN"

You can also pass settings on the command line to run Kibana without modifying kibana.yml:
./bin/kibana --elasticsearch.hosts="http://localhost:9200" --host=0.0.0.0

./bin/kibana --elasticsearch.hosts="http://localhost:9200" --elasticsearch.username=kibana --elasticsearch.password=password
(The password here is the one you set when configuring security.)

Getting started with Elasticsearch (1)
Getting started with Elasticsearch (2): how to search
Getting started with Elasticsearch (3): how to analyze data: analyze and aggregate
Elasticsearch storage:

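Elasticsearch: inverted index, doc_values and source
Elasticsearch: understanding the store attribute in mappings
Elasticsearch: retrieving selected fields from a search

Getting Started: Hands-on

Elasticsearch data types:
text: full-text search strings
keyword: exact string matching and aggregations
date and date_nanos: strings formatted as dates, or numbers representing dates
byte, short, integer, long: integer types
boolean: boolean type
float, double, half_float: floating-point types
Hierarchical types: object and nested

Create an index (by indexing a document into it)
PUT hl/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "GuangNing",
  "province": "ZhaoQing",
  "country": "China"
}

POST: automatic ID generation
If we don't specify a document ID and let Elasticsearch generate one instead, indexing is faster. In that case we must use POST rather than PUT:
PUT hl/_doc/1    (with an ID)
POST hl/_doc     (without an ID; the ID is generated automatically)

If we are only interested in the contents of _source, we can use (7.x and later; older versions used GET hl/_doc/1/_source):
GET hl/_source/1
We can also fetch only some fields of _source:
GET hl/_doc/1?_source=city,age,province

To fetch several documents in one request, use the _mget API:
GET _mget
{
  "docs": [
    { "_index": "twitter", "_id": 1 },
    { "_index": "twitter", "_id": 2 }
  ]
}
Request the documents with IDs 1 and 2 from one index:
GET twitter/_mget
{
  "ids": ["1", "2"]
}

Modifying a document
We usually use POST to create a new document; with POST we don't even have to specify an ID, because the system generates one. When modifying a document, however, we usually use PUT and must specify the ID of the document to modify:
PUT cat/_doc/1
{
  "user": "GB",
  "uid": 1,
  "city": "北京",
  "province": "北京",
  "country": "中国",
  "location": {
    "lat": "29.084661",
    "lon": "111.335210"
  }
}
Use GET to retrieve the document:
GET cat/_doc/1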
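The choice between PUT and POST above depends only on whether you supply an ID, and the two _mget bodies differ only in where the index name goes. A minimal Python sketch with hypothetical helper names that produces the request lines and bodies shown above; it builds strings and dictionaries only and does not talk to a cluster.

```python
from typing import Iterable, Optional, Tuple


def index_request_line(index: str, doc_id: Optional[str] = None) -> str:
    """PUT <index>/_doc/<id> with an ID, POST <index>/_doc to let Elasticsearch generate one."""
    if doc_id is None:
        return f"POST {index}/_doc"
    return f"PUT {index}/_doc/{doc_id}"


def mget_docs_body(pairs: Iterable[Tuple[str, str]]) -> dict:
    """Body for GET _mget: every entry names its own index."""
    return {"docs": [{"_index": index, "_id": doc_id} for index, doc_id in pairs]}


def mget_ids_body(ids: Iterable) -> dict:
    """Body for GET <index>/_mget: the index is in the path, so only IDs are listed."""
    return {"ids": [str(i) for i in ids]}
```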
With PUT, every field of the document has to be written out each time it changes, which is inconvenient in some cases. Instead we can update only some of the fields:
POST cat/_update/1
{
  "doc": {
    "city": "成都",
    "province": "四川"
  }
}

For documents with Chinese field names, typing the Chinese field name directly in Painless (for example ctx._source.签到状态) is not accepted. Use bracket notation instead (here combined with a search query):
POST pdd/_update_by_query
{
  "query": {
    "match": {
      "姓名": "张彬"
    }
  },
  "script": {
    "source": "ctx._source['签到状态'] = params['签到状态']",
    "lang": "painless",
    "params": {
      "签到状态": "已签到"
    }
  }
}
Inside the JSON string the Painless string literals use single quotes; unescaped double quotes would end the JSON string early.

Updating documents: upsert
"Upsert" loosely means update or insert: update the document if it exists, otherwise insert a new one. The following example uses doc_as_upsert to merge into the document with ID 3, or to insert a new document if it does not exist:
POST /catalog/_update/3
{
  "doc": {
    "author": "Albert Paro",
    "title": "Elasticsearch 5.0 Cookbook",
    "description": "Elasticsearch 5.0 Cookbook Third Edition",
    "price": "54.99"
  },
  "doc_as_upsert": true
}

Check whether a document exists:
HEAD hlhs/_doc/1
Delete a document:
DELETE hlhs/_doc/1
Check whether an index exists:
HEAD cat
This returns 200 - OK if the index exists; otherwise it returns:
{"statusCode":404,"error":"Not Found","message":"404 - Not Found"}
Delete an index:
DELETE qhl

Bulk operations
POST _bulk
{ "index" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发，下一站云南！","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日，好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}
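To make the difference between PUT and _update concrete, here is a local Python model (not Elasticsearch code) of how we understand the _update API to treat a partial "doc": nested objects are merged recursively, any other value (strings, numbers, arrays) is replaced, and fields not mentioned are kept. With doc_as_upsert a missing document is created from doc; without it Elasticsearch answers 404, which the sketch models as a KeyError. The in-memory store dictionary and both function names are hypothetical.

```python
import copy


def merge_partial(source: dict, partial: dict) -> dict:
    """Merge `partial` into a copy of `source`: objects merge recursively,
    every other value is replaced."""
    merged = copy.deepcopy(source)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_update(store: dict, doc_id: str, partial: dict,
                 doc_as_upsert: bool = False) -> dict:
    """Model of POST <index>/_update/<id> {"doc": partial, "doc_as_upsert": ...}."""
    if doc_id in store:
        store[doc_id] = merge_partial(store[doc_id], partial)
    elif doc_as_upsert:
        store[doc_id] = copy.deepcopy(partial)  # insert the doc as a new document
    else:
        raise KeyError(f"document {doc_id} not found")  # Elasticsearch returns 404
    return store[doc_id]
```

A PUT, by contrast, would replace the whole stored document with whatever body you send, which is why the full document has to be written out.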

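When entering these commands, be careful: each action line and each document must stay on a single line, separated by newlines, and the body must end with a newline. Do not pretty-print the JSON or add line breaks inside an object, otherwise the request fails. (This data also specifies the IDs.)
Use the _count API to see how many documents the index holds:
GET twitter/_count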
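Generating the bulk body in code is the easiest way to respect the one-object-per-line rule. A minimal Python sketch, assuming a hypothetical bulk_body helper: json.dumps without indent always emits a single line, delete actions get no source line, update actions wrap the source in {"doc": ...}, and the result ends with the required newline. ensure_ascii=False keeps Chinese text readable instead of escaping it.

```python
import json
from typing import Iterable, Optional, Tuple


def bulk_body(index: str, docs: Iterable[Tuple[Optional[object], Optional[dict]]],
              action: str = "index") -> str:
    """Build an NDJSON _bulk body from (doc_id, source) pairs."""
    lines = []
    for doc_id, source in docs:
        meta = {"_index": index}
        if doc_id is not None:
            meta["_id"] = doc_id
        # json.dumps without indent keeps each object on one line
        lines.append(json.dumps({action: meta}, ensure_ascii=False))
        if action == "delete":
            continue  # delete has no source line
        if action == "update":
            source = {"doc": source}  # partial-document update
        lines.append(json.dumps(source, ensure_ascii=False))
    # The body must end with a newline
    return "\n".join(lines) + "\n"
```

Usage: bulk_body("twitter", [(1, {"user": "双榆树-张三", "age": 20})]) returns two lines that can be pasted after POST _bulk or saved to a file for curl --data-binary.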

Bulk create (create fails if a document with the same ID already exists, whereas index overwrites it):
POST _bulk
{ "create" : { "_index" : "twitter", "_id": 1} }
{"user":"双榆树-张三","message":"今儿天气不错啊，出去转转去","uid":2,"age":20,"city":"北京","province":"北京","country":"中国","address":"中国北京市海淀区","location":{"lat":"39.970718","lon":"116.325747"}}
{ "index" : { "_index" : "twitter", "_id": 2 }}
{"user":"东城区-老刘","message":"出发，下一站云南！","uid":3,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区台基厂三条3号","location":{"lat":"39.904313","lon":"116.412754"}}
{ "index" : { "_index" : "twitter", "_id": 3} }
{"user":"东城区-李四","message":"happy birthday!","uid":4,"age":30,"city":"北京","province":"北京","country":"中国","address":"中国北京市东城区","location":{"lat":"39.893801","lon":"116.408986"}}
{ "index" : { "_index" : "twitter", "_id": 4} }
{"user":"朝阳区-老贾","message":"123,gogogo","uid":5,"age":35,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区建国门","location":{"lat":"39.718256","lon":"116.367910"}}
{ "index" : { "_index" : "twitter", "_id": 5} }
{"user":"朝阳区-老王","message":"Happy BirthDay My Friend!","uid":6,"age":50,"city":"北京","province":"北京","country":"中国","address":"中国北京市朝阳区国贸","location":{"lat":"39.918256","lon":"116.467910"}}
{ "index" : { "_index" : "twitter", "_id": 6} }
{"user":"虹桥-老吴","message":"好友来了都今天我生日，好友来了,什么 birthday happy 就成!","uid":7,"age":90,"city":"上海","province":"上海","country":"中国","address":"中国上海市闵行区","location":{"lat":"31.175927","lon":"121.383328"}}

Bulk delete:
POST _bulk
{ "delete" : { "_index" : "twitter", "_id": 1 }}

Bulk update:
POST _bulk
{ "update" : { "_index" : "twitter", "_id": 2 }}
{"doc": { "city": "长沙"}}

If you are comfortable with scripting, you can import large amounts of data from a file:
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary @request_example.json
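The curl command uses --data-binary rather than -d because -d strips newlines from a file read with @, which would break the NDJSON format. A minimal Python sketch of the same request, assuming a hypothetical helper bulk_request_from_file: it reads the file bytes unchanged, refuses a body without the trailing newline, and sets the application/x-ndjson content type. It builds the request only; send it with urllib.request.urlopen(req).

```python
import base64
import urllib.request
from pathlib import Path
from typing import Optional


def bulk_request_from_file(path: str, url: str = "http://localhost:9200/_bulk",
                           user: Optional[str] = None,
                           password: str = "") -> urllib.request.Request:
    """Python counterpart of curl -H "Content-Type: application/x-ndjson"
    -XPOST localhost:9200/_bulk --data-binary @<file>"""
    # Like --data-binary: send the bytes unchanged, newlines included
    data = Path(path).read_bytes()
    if not data.endswith(b"\n"):
        raise ValueError("a _bulk body must end with a newline")
    req = urllib.request.Request(url, data=data, method="POST")
    req.add_header("Content-Type", "application/x-ndjson")
    if user is not None:
        # Same as curl -u user:password
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
    return req
```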

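Download the test data (use the raw file URL; the GitHub blob URL returns an HTML page instead of the JSON file):
wget https://raw.githubusercontent.com/liu-xiao-guo/elasticsearch-bulk-api-data/master/es.json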
curl -u elastic:123456 -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary @es.json

-u username:password. If security is not enabled on your Elasticsearch, remove "-u elastic:123456" entirely.

curl --cacert /home/elastic/ca.crt -u elastic:123456 -s -H "Content-Type: application/x-ndjson" -XPOST https://localhost:9200/_bulk --data-binary @es.json

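Here --cacert /home/elastic/ca.crt specifies the location of the CA certificate, and the URL uses https because the certificate only applies to TLS connections. After running the command, the index named "bank_account" is visible in Kibana.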

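Freeze/unfreeze an index
A frozen index has almost no overhead on the cluster (apart from keeping its metadata in memory) and is read-only: write operations are blocked. Searches on frozen indices are throttled to limit memory consumption per node.
POST hl/_freeze
Frozen indices are only searched when ignore_throttled=false is set:
POST hl/_search?ignore_throttled=false
Unfreeze the index:
POST hl/_unfreeze
(The freeze API was deprecated in 7.14 and removed in 8.0.)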

Using Postman to access the Elastic Stack