Indexing Data into Elasticsearch using Python

Search with Elasticsearch

Rishab Batra
3 min read · May 16, 2023

Introduction

Elasticsearch is a robust and highly scalable open-source search and analytics engine. The Elasticsearch Python library provides a comprehensive API for interacting with Elasticsearch, allowing you to leverage its search, indexing, and data manipulation capabilities in your Python applications.

Elasticsearch is widely used in applications such as search engines, logging and monitoring systems, e-commerce platforms, recommendation systems, and data analysis. Its flexibility, scalability, and robust feature set make it a popular choice for efficiently handling and searching large amounts of data.

Usage of this doc:

To interact with Elasticsearch, the search component of the ELK (Elasticsearch, Logstash, and Kibana) stack, from Python, you can use the official Elasticsearch Python client library, elasticsearch.

Prerequisites

To work with Elasticsearch in Python, install Elasticsearch and Kibana from their official download pages and make sure both servers are running.
Then install the Elasticsearch-Py library using pip:

pip install elasticsearch
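
The major version of the client matters for the code that follows: the basic_auth argument used later is the spelling of the 8.x client, while older 7.x releases used http_auth. Here is a minimal sketch for checking what pip installed. The helper name installed_version is my own and not part of the library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


es_version = installed_version("elasticsearch")
if es_version is None:
    print("elasticsearch is not installed; run: pip install elasticsearch")
else:
    print(f"elasticsearch-py {es_version} is installed")
```

If the printed version starts with 7, the connection code in this article may need http_auth instead of basic_auth.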

Interaction with the ELK (Elasticsearch, Logstash, and Kibana) stack using Python.

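try:
    # Only the Elasticsearch class is needed for the steps in this article.
    # RequestsHttpConnection is not imported: it no longer exists in the 8.x client.
    from elasticsearch import Elasticsearch
    print("All modules loaded")
except ImportError as e:
    print("Some modules are missing:", e)

# Hosts can also be described as dictionaries. This instance carries no
# credentials and is not used below; it only shows the alternative form.
es = Elasticsearch([
    {
        "host": "localhost",
        "port": 9200,
        "scheme": "https",
    }
])

# Connect over HTTPS with basic authentication.
# verify_certs=False skips TLS certificate checks; use it only for local testing.
client = Elasticsearch(
    hosts=["https://localhost:9200"],
    verify_certs=False,
    basic_auth=("my_admin", "yourpassword"),
)

print(client.cluster.health())  # call the method to fetch the cluster health
print(client.ping())            # True if the cluster is reachable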

In the above code:

  • Import the Elasticsearch class from the elasticsearch library.

  • Create an instance of the Elasticsearch class and specify the Elasticsearch cluster's host, port, certificate verification setting, and authentication credentials.

client = Elasticsearch(
    hosts=["https://localhost:9200"],
    verify_certs=False,
    basic_auth=("my_admin", "yourpassword"),
)

  • Verify the Elasticsearch connection. Note that health is a method and has to be called:

print(client.cluster.health())
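
The health response is a dictionary-like object, so instead of printing all of it you can pick out the fields you care about. Below is a small sketch of that idea. The function names summarize_health and check_connection are my own, and the sample response is made up for illustration; it only mirrors the status and number_of_nodes fields that a real health response contains.

```python
def summarize_health(health):
    """Turn a cluster-health response (dict-like) into a one-line summary."""
    return f"cluster status: {health['status']}, nodes: {health['number_of_nodes']}"


def check_connection(client):
    """Return a health summary, or None when the cluster does not answer a ping."""
    if not client.ping():
        return None
    return summarize_health(client.cluster.health())


# Sample response shaped like the output of client.cluster.health();
# the values are invented so the sketch runs without a cluster.
sample = {"cluster_name": "my-cluster", "status": "yellow", "number_of_nodes": 1}
print(summarize_health(sample))  # cluster status: yellow, nodes: 1
```

With a running server you would call check_connection(client) using the client created above. A yellow status is common on a single-node setup.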

Indexing a document

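document = {
    "title": "Example Document",
    "content": "This is some example content",
}
# Index the document; Elasticsearch generates an ID because none is given.
# The 8.x client takes document= and query= in place of the older body= argument.
response = client.index(index="my-first-index", document=document)

# Searching results: query the same index the document was written to.
search_results = client.search(index="my-first-index", query={"match": {"content": "example"}})

# Process the search results:
for hit in search_results["hits"]["hits"]:
    print(hit["_source"])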

In the above snippet:

First, you create a document and then index it using the command below:

client.index(index="my-first-index", document=document)

Now, what is indexing?

Indexing in Elasticsearch refers to storing and organizing data within an index. An index in Elasticsearch is similar to a database. It is a logical namespace that holds one or more documents.

  • MySQL => Databases => Tables => Columns/Rows
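  • Elasticsearch => Indices => Types => Documents with Properties

Note that mapping types were deprecated in Elasticsearch 7.x and removed in 8.x, so in current versions an index holds documents directly.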

When you index a document in Elasticsearch, you are storing it and making it searchable within a specific index. Each document in Elasticsearch is represented in JSON format and consists of key-value pairs.
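
To make the "JSON key-value pairs inside an index" idea concrete, here is a small sketch that prints a document as JSON and prepares several documents for one index. The helper to_actions and the second document are my own additions for illustration. The action format (_index, _id, _source) is the one accepted by the library's helpers.bulk function.

```python
import json


def to_actions(index_name, documents):
    """Yield one bulk 'index' action per document, all targeting index_name."""
    for doc_id, doc in enumerate(documents, start=1):
        yield {"_index": index_name, "_id": str(doc_id), "_source": doc}


documents = [
    {"title": "Example Document", "content": "This is some example content"},
    {"title": "Second Document", "content": "More example content"},
]

# Each document is plain JSON: key-value pairs.
print(json.dumps(documents[0], indent=2))

actions = list(to_actions("my-first-index", documents))
print(actions[0]["_index"], actions[0]["_id"])  # my-first-index 1

# With a live cluster and a connected `client`, the actions could be sent with:
#   from elasticsearch import helpers
#   helpers.bulk(client, actions)
```

Giving each document an explicit _id means that running the same load twice overwrites the documents instead of duplicating them.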

Searching a document

The search method in Elasticsearch-Py allows you to perform various types of searches using different query types, filters, aggregations, and sorting options. The call below runs a match query, which finds documents whose content field contains the word "example".

# Searching results: query the same index the document was written to.
search_results = client.search(index="my-first-index", query={"match": {"content": "example"}})
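
The same search call can also carry filters and sorting. Here is a hedged sketch of a small builder that combines a match query with exact-value filters in a bool query. The function build_search is my own helper. The field title.keyword assumes the index was created with the default dynamic mapping, which adds a keyword sub-field to string fields.

```python
def build_search(text, field="content", filters=None, sort=None, size=10):
    """Build keyword arguments for client.search(): match query plus optional filters and sort."""
    query = {
        "bool": {
            # Full-text part: contributes to the relevance score.
            "must": [{"match": {field: text}}],
            # Exact-value part: narrows results without affecting the score.
            "filter": [{"term": {name: value}} for name, value in (filters or {}).items()],
        }
    }
    params = {"query": query, "size": size}
    if sort:
        params["sort"] = sort
    return params


params = build_search(
    "example",
    filters={"title.keyword": "Example Document"},
    sort=[{"title.keyword": {"order": "asc"}}],
    size=5,
)
print(params)

# With a connected `client`:
#   search_results = client.search(index="my-first-index", **params)
```

Building the arguments as a plain dictionary keeps the query easy to print and inspect before it is sent to the server.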

Now, print out the source field for the matching query.

# Process the search results:
for hit in search_results["hits"]["hits"]:
    print(hit["_source"])
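
Besides _source, each hit also carries its ID and relevance score, and the response reports the total number of matches. The sketch below pulls these out of a response. The function extract_hits is my own helper, and sample_response is a hand-written dictionary that mirrors the shape of a search response so the code runs without a server.

```python
def extract_hits(response):
    """Return (total, rows); each row merges the hit's id and score with its source fields."""
    hits = response["hits"]
    total = hits["total"]["value"]
    rows = [
        {"id": hit["_id"], "score": hit["_score"], **hit["_source"]}
        for hit in hits["hits"]
    ]
    return total, rows


# Hand-written sample in the shape of a search response (values are illustrative).
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "my-first-index",
                "_id": "1",
                "_score": 0.2876821,
                "_source": {
                    "title": "Example Document",
                    "content": "This is some example content",
                },
            }
        ],
    }
}

total, rows = extract_hits(sample_response)
print(total)
for row in rows:
    print(row["id"], row["score"], row["title"])
```

The rows are flat dictionaries, so if you use pandas they can be passed to pd.DataFrame(rows) for further analysis.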

Conclusion:

  1. Elasticsearch is used in various applications, including search engines, logging and monitoring systems, e-commerce platforms, recommendation engines, and data analysis.
  2. Elasticsearch provides near-real-time indexing, meaning that once you index a document, it becomes quickly searchable and available for analysis. This is crucial for applications that require real-time data updates and fast search responses.
  3. Elasticsearch empowers organizations to build efficient and scalable search and analytics solutions. Whether you need to build a search engine, perform data analysis, or implement real-time monitoring, Elasticsearch provides a reliable and flexible foundation to handle your data effectively.
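
One practical consequence of near-real-time indexing: a search issued immediately after an index call can come back empty, because the document only becomes visible after the next refresh. A minimal sketch of one way to handle this in a script is to pass refresh="wait_for" to the index call. The function index_and_wait and the RecordingClient class are my own; the class is only a stand-in so the example runs without a cluster.

```python
def index_and_wait(client, index_name, document):
    """Index a document and return only once it is visible to search."""
    # refresh="wait_for" asks Elasticsearch to respond after the next
    # refresh has made the document searchable.
    return client.index(index=index_name, document=document, refresh="wait_for")


class RecordingClient:
    """Hypothetical stand-in for a real client; it just records the calls it receives."""

    def __init__(self):
        self.calls = []

    def index(self, **kwargs):
        self.calls.append(kwargs)
        return {"result": "created"}


fake = RecordingClient()
result = index_and_wait(fake, "my-first-index", {"title": "Example Document"})
print(result["result"], fake.calls[0]["refresh"])  # created wait_for
```

With a real client you would pass the client object created earlier instead of RecordingClient. Waiting on every write slows down indexing, so this suits small scripts and tests more than heavy ingestion.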

Takeaways:

In this article, you saw how to connect to an Elasticsearch server from Python, index a document, and execute a search query.
Take note of these five steps:

1) Import the Elasticsearch module and create a connection to Elasticsearch.
2) Create a document and index it.
3) Define the search query.
4) Execute the search query.
5) Process the results.

Now you are done, Okay!!

Note: Set up the server carefully. I ran into a few errors while starting it and lost a few hours, which I will cover in a separate article.

Thank You!
