Hadoop for digital transformation

Three years ago, Sergey Zolotarev, then head of the Pivotal representative office in Russia and the CIS, spoke at the "BIG DATA 2014" forum about the importance of being able to integrate Hadoop into real corporate IT environments (see "BIG DATA 2014: integrators of the worlds", "Computerworld Russia", April 16, 2014).

In 2017, at the BIG DATA forum, now as head of the IBS division developing its own line of data management products, he presented a Russian Hadoop distribution. "Computerworld Russia" asked Zolotarev what has happened in the Russian and global Hadoop markets over these years, why the market needs one more distribution, and how the Russian product differs from foreign ones (see "Hadoop for digital transformation", "Computerworld Russia", May 10, 2017).

The situation has changed dramatically. Three or four years ago, Hadoop was used mainly in pilot projects; large telecom companies and banks were only evaluating the platform. Now the largest players in the commercial applications market, such as SAP, SAS, IBM and Tableau, have begun to use Hadoop as a standard data storage platform alongside traditional DBMSs. On the one hand, this made it easier to fit Hadoop into enterprise infrastructure, because the biggest vendors took on the task. On the other hand, the giants' interest gave investors the green light, and huge investments poured into development around Hadoop. Hadoop has turned into a huge ecosystem, in whose development hundreds of millions of dollars and millions of man-hours have been invested.

Earlier, Hadoop was a separate system for solving specific tasks. Now it is routinely used together with BI systems as a storage platform, and universal storage platforms that serve other information systems are built on top of it. This is a big leap.

The paradigm has changed. Previously, companies tried to integrate the traditional data warehouse with Hadoop, and the warehouse was the primary system. Now the warehouse is left to handle the old tasks, while a Hadoop-based platform for collecting and analyzing data is built for new analytical tasks. If these tasks need data from the warehouse, it is pulled from there, and the warehouse becomes a data source for the platform.

That is partly why, in my opinion, terms such as "data lake", and even the term "Big Data", are becoming outdated. The concept of a "Data Platform" or "Enterprise Data Platform" is often used instead.

It is becoming the basis of digital business and digital transformation. The companies that realized this in time and started building their data platforms are at least a step ahead of their competitors.

– Why did you decide to create your own Hadoop distribution?

– During customer projects, our team very often heard that the Hadoop distribution lacked components, that the versions of the components in the base build were unsatisfactory, and so on. By meeting these requests and extending the distribution, we gained valuable experience and came to understand how to build a distribution properly and what customers are missing.

The next important point is the availability of high-level expertise in the solution. This concerns not only Russia but Europe as a whole. For the main developers of Hadoop distributions, the main market is the US: more precisely, California first, then the rest of the American market, then the UK. Europe, and especially Russia, are not as important to them. I can say this from my own experience, since I worked at a large Western vendor. Resources for Europe and Russia are very limited: there are very few specialists, support is available only remotely, and its quality does not always make it possible to resolve the problems that arise in complex projects.

At some point we realized that the shortcomings of the existing products on the market, combined with the high prices for their customization and support, had created a niche for a domestic Hadoop distribution, and that if we built one, there would be demand for it. In 2015, we joined the Open Data Platform Initiative, the international community of open source software developers in the field of Big Data. Last year, our distribution, ArenaData Hadoop, was certified against the ODPi specification, putting it on a par in this respect with the products of the largest Western companies.

– Why did your team become part of IBS?

– We were looking for a partner that, on the one hand, understands this problem area well and has an established practice of working with data and, on the other hand, is focused on building universal data platforms for its clients based on open source projects. IBS is interested in developing its own solutions, and our project is a platform for developing a whole range of new products.

– What are the differences between your distribution and foreign analogues?

– "Technically", our product does not differ from its Western counterparts, nor should it: there is a single ODPi-approved specification for how a Hadoop distribution should be assembled, and we follow it. ArenaData Hadoop is an enterprise distribution with a full set of tools for automating the planning and installation of a Hadoop cluster, as well as the subsequent monitoring, administration, upgrades, and so on. For Russian users, the principal advantages of our distribution are, first of all, locally available expertise in designing the solution architecture, documentation and support in Russian, and more affordable prices for specialists and support.

We are ready to provide ArenaData Hadoop not only as software but also as a hardware-software appliance developed by IBS on the "Skala-R" platform, with unified support for the entire appliance. The latter is important: as I have already said, our big problem now is the lack of expertise backed by real experience.

We do not place special emphasis on the fact that this is a completely Russian, "import-substituting" solution, but it is nevertheless so. This is a Russian product, and for organizations where that matters, we now have something to offer.

An important technical feature of ArenaData is that we have bundled all the necessary repositories, not only for Hadoop but for the entire software environment, into a package that can be deployed without an Internet connection.

Working with the largest Russian customers, we found that almost all of them have a closed network perimeter with no Internet access from the corporate network, whereas all the distributions that existed at the time assumed an online installation in which all supporting utilities, libraries, and so on are downloaded from various network resources.

– What about product updates? Are they also offline?

– Yes, also via removable media. We provide for this when preparing new product releases.

– How much is demand for Hadoop growing in Russia?

– Two or three years ago, only the top three telecommunications companies and banks took on projects using Hadoop. Now, in these sectors as well as in retail and industry, companies from the top ten are ready to set tasks for its implementation. Many state-owned companies have tried the technology in one way or another and found tasks that Hadoop can solve effectively.

Of course, we still lag behind the West, where Hadoop has become part of the standard IT infrastructure in large companies and banks. Even those that have not yet started using Hadoop have an item in their IT strategy for implementing it. They understand that today there is no real alternative to Hadoop for building a universal data platform.