Nowadays, we live in a society where people can access huge amounts of information with just a few clicks. Our devices communicate with each other to generate and exchange all kinds of data. You have probably heard terms such as IoT, smart city, cloud and big data more and more often.
Big data is probably the one you hear most. But what is it, really? Big data can be defined as data sets, or combinations of data sets, whose size, complexity and speed of growth make them difficult to capture, manage, process or analyse. It is currently a problem for many companies, whose volume of data keeps growing while they do not know how to exploit and manage it.
One of the most important factors in managing large volumes of data efficiently is choosing the right database. There are several types of databases. The most popular are the following:
— SQL: MySQL, Oracle, PostgreSQL, etc.
— NoSQL: Elasticsearch, MongoDB, Redis, Cassandra, etc.
Under these conditions, a NoSQL database is usually the best option, for several reasons:
— The stored information doesn't require a predefined format or schema.
— They relax ACID guarantees in exchange for better performance and availability.
— They support replication and distribution of the data.
— They allow horizontal scalability.
At IoTsens we use an Elasticsearch database to store and manage the data generated by the large number of devices that communicate with our platform. Among all the characteristics of this database, the most remarkable are the following:
— It is a search engine based on Apache Lucene.
— It structures the data in an inverted index, which makes searches faster.
— It allows the distribution of data on different servers.
— It has a REST API through which you can retrieve and interact with the data and also manage the database itself.
— It indexes schema-free documents in JSON format.
— It guarantees high availability of the data, since it can recover automatically if one or several of the servers holding the data fail.
— It supports aggregations, which let you run operations on the data in order to exploit it.
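To make the inverted index from the list above more concrete, here is a toy sketch in Python. It is only an illustration of the idea (each term points to the documents that contain it), not how Lucene or Elasticsearch implement it internally, and the sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, made up for this illustration.
docs = {
    1: "temperature sensor in the city centre",
    2: "humidity sensor in the park",
    3: "city traffic camera",
}
index = build_inverted_index(docs)
print(sorted(search(index, "sensor")))       # [1, 2]
print(sorted(search(index, "city sensor")))  # [1]
```

A search never scans the documents themselves: it only looks up each term and intersects the resulting sets of ids, which is why this structure gives quick searches.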
Taking the first steps in Elasticsearch is simple. Before starting, however, it is advisable to know the names of some of its basic elements:
— Cluster: A set of nodes.
— Node: A server that is part of a cluster.
— Shard: A partition of the information, needed because of the hardware limitations of a single server.
— Index: “Equivalent” to a database in SQL.
— Type: “Equivalent” to a table in SQL (types were deprecated in Elasticsearch 6 and removed in later versions).
— Mapping: “Equivalent” to a schema in SQL.
— Document: Basic unit of information or the data to save.
Now, let's see several practical examples that use the API.
Create an Index:
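A minimal sketch in Python, using only the standard library. The cluster address, the index name `books` and the shard settings are assumptions chosen for illustration. The function only builds the HTTP request, so it can be inspected without a running cluster.

```python
import json
import urllib.request

# Assumption: a local single-node cluster; adjust the address for your setup.
ES_URL = "http://localhost:9200"


def create_index_request(index, shards=1, replicas=0):
    """Build (but do not send) the request that creates an index: PUT /<index>."""
    body = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = create_index_request("books")  # "books" is a hypothetical index name
print(req.get_method(), req.full_url)
# To send it to a running cluster: urllib.request.urlopen(req)
```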
Remove an Index:
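Removing an index is a single `DELETE` call on the index URL. As a sketch under the same kind of assumptions (a local cluster and a hypothetical `books` index), with the request built but not sent:

```python
import urllib.request

# Assumption: a local single-node cluster; adjust the address for your setup.
ES_URL = "http://localhost:9200"


def delete_index_request(index):
    """Build (but do not send) the request that removes an index: DELETE /<index>."""
    return urllib.request.Request(url=f"{ES_URL}/{index}", method="DELETE")


req = delete_index_request("books")  # "books" is a hypothetical index name
print(req.get_method(), req.full_url)
# Sending it with urllib.request.urlopen(req) deletes the index and all its documents.
```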
Insert a document:
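A document is plain JSON sent to the index under an id. The sketch below assumes the `_doc` endpoint used by recent Elasticsearch versions (older versions put a type name in its place). The cluster address, the index name and the book itself are made up for illustration.

```python
import json
import urllib.request

# Assumption: a local single-node cluster; adjust the address for your setup.
ES_URL = "http://localhost:9200"


def index_document_request(index, doc_id, document):
    """Build (but do not send) the request that stores a document under a given id."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical document, invented for this example.
book = {"title": "An Example Book", "genre": "history", "price": 20.0}
req = index_document_request("books", 1, book)
print(req.get_method(), req.full_url)
# To send it to a running cluster: urllib.request.urlopen(req)
```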
Recover the same document:
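Reading a document back is a `GET` on the URL it was stored under. Again this is only a sketch: the address, index name and id are assumptions, and the `_doc` endpoint is the one used by recent versions.

```python
import urllib.request

# Assumption: a local single-node cluster; adjust the address for your setup.
ES_URL = "http://localhost:9200"


def get_document_request(index, doc_id):
    """Build (but do not send) the request that fetches a document by id."""
    return urllib.request.Request(url=f"{ES_URL}/{index}/_doc/{doc_id}", method="GET")


req = get_document_request("books", 1)  # hypothetical index name and id
print(req.get_method(), req.full_url)
# When sent to a running cluster, the JSON response carries the stored
# document under the "_source" key.
```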
As you can see in the previous examples, basic operations are not very difficult. However, the potential of Elasticsearch lies in its advanced searches, which allow you to exploit your data in various ways. An example of an advanced query is as follows:
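As an illustration, the sketch below builds a `bool` query that combines a full-text `match` with a `range` filter and sorts the hits. The index name and the fields `title` and `price` are hypothetical, as is the cluster address. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: a local single-node cluster; adjust the address for your setup.
ES_URL = "http://localhost:9200"


def search_request(index, body):
    """Build (but do not send) a search request: POST /<index>/_search."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical fields: "title" (full text) and "price" (numeric).
query = {
    "query": {
        "bool": {
            # Full-text condition: contributes to the relevance score.
            "must": [{"match": {"title": "elasticsearch"}}],
            # Exact condition: filters without affecting the score.
            "filter": [{"range": {"price": {"lte": 30}}}],
        }
    },
    "sort": [{"price": "asc"}],
    "size": 10,
}
req = search_request("books", query)
print(req.get_method(), req.full_url)
```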
In addition, Elasticsearch makes it possible to refine the results of our searches and even run operations on them thanks to aggregations. Among other things, aggregations let you:
— Group the results into buckets.
— Run operations on the grouped data to obtain new data.
— Combine different types of aggregations.
— Apply more specific filters.
The following practical example shows the use and potential of aggregations: the owner of a bookstore wants to obtain the average price of books by genre:
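One way to sketch this is a `terms` aggregation that creates one bucket per genre, with a nested `avg` aggregation on the price. The field names `genre` and `price` are assumptions, and `genre` is assumed to be mapped as a keyword field (with default dynamic mapping it would be `genre.keyword`). The sample response is hand-written to show the shape of the result, not real output.

```python
# Request body for POST /<index>/_search.
agg_body = {
    "size": 0,  # return only the aggregation results, no individual hits
    "aggs": {
        "by_genre": {
            "terms": {"field": "genre"},  # one bucket per genre
            "aggs": {"avg_price": {"avg": {"field": "price"}}},  # metric per bucket
        }
    },
}


def average_price_by_genre(response):
    """Turn the aggregation part of a search response into {genre: average price}."""
    buckets = response["aggregations"]["by_genre"]["buckets"]
    return {bucket["key"]: bucket["avg_price"]["value"] for bucket in buckets}


# Hand-written sample in the shape Elasticsearch returns; the values are invented.
sample_response = {
    "aggregations": {
        "by_genre": {
            "buckets": [
                {"key": "science fiction", "doc_count": 2, "avg_price": {"value": 12.5}},
                {"key": "history", "doc_count": 1, "avg_price": {"value": 20.0}},
            ]
        }
    }
}
print(average_price_by_genre(sample_response))
```

Nesting the `avg` inside the `terms` aggregation is what combines grouping with an operation on each group, in a single request.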
In the last examples, the complexity has increased moderately. But do not be scared: with a little patience and practice you can master it and obtain great results for your company or projects.
At IoTsens we have more than 2,500,000,000 stored documents that take up 300 GB of space. More than 9,000,000 documents are stored daily, and we are able to retrieve, process and show the user around 300,000 documents (approx. 31 MB) in about 2.5-5 seconds.
As you can see, we have obtained fantastic results with Elasticsearch. We are not the only ones: other companies such as Tesco, LinkedIn, Foursquare, Facebook, Netflix, Dell, eBay, Wikipedia, The Guardian, The New York Times, Salesforce, Docker, Orange, Groupon and Eventbrite also use it, and part of their success depends on it.