Interview With Balys Kriksciunas: 000webhost at Percona Live 2017

Scaling a Million Databases on 000webhost

The Percona Live Open Source Database Conference is one of the industry's leading events for the businesses and communities that use and develop open-source database software.

It focuses on MySQL, MongoDB, PostgreSQL, and other world-class open-source databases, offering in-depth discussions on scalability, database architecture, security, and performance to meet the needs of a growing business.

000webhost at Percona Live Europe

000webhost handles more than a million unique user queries per minute across almost a million databases. Our head of engineering, Balys Kriksciunas, gave a talk on the obstacles encountered when working at such a scale. He explained why 000webhost now runs MariaDB in LXC containers and routes queries with HAProxy and ProxySQL, and walked through the timeline of testing and trialling solutions. Balys also talked about our partnership with René Cannaò, the creator of ProxySQL, and the query-routing logic in our new infrastructure.

000webhost's database problem was a common one among hosting providers with a single logical shared platform. Databases were sharded manually across different physical database servers, which were in turn manually assigned to web servers. For 000webhost, this approach quickly became inefficient as the user base grew extremely rapidly.

000webhost now runs on heterogeneous hardware, where any user can be hosted on any server, and scales seamlessly by simply adding new servers. The automated load-balancing solution migrates users between nodes without downtime whenever a node runs out of capacity or becomes overloaded, and each web node can route web requests to any other node. The orchestration of these solutions now ensures a stable and trusted environment for the end user.

Scaling 1 Million Databases: Roadmap

What Issues Did You Encounter at the Beginning?

We depended on each server's hardware resources, so we were unable to scale user resources dynamically. Because each user is unique, servers carried different workloads: some were close to idle, others were overloaded, even with the same database count. Performance was spiky, and we could not guarantee stability.

Because hostnames were hardcoded in user scripts, account migration incurred downtime and was prone to error. For example, if a user ran a WordPress website and had to migrate from server A to server B, they would need to change their WordPress configuration files manually.
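To make the problem concrete, here is a sketch of the kind of configuration a user had to edit (the hostname is a hypothetical placeholder, not an actual 000webhost server):

```php
<?php
// wp-config.php (before the fix): the database host is hardcoded to a
// physical server, so moving the account to another server breaks the
// site until this file is edited by hand.
define( 'DB_NAME', 'user_wp' );
define( 'DB_USER', 'user_wp' );
define( 'DB_PASSWORD', 'secret' );
define( 'DB_HOST', 'db-server-a.example.internal' ); // hypothetical hostname
```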

How Did You Solve It?

We figured that if users put localhost in their configuration files, all migrations would become seamless: the configuration files could remain untouched even if we changed the server hosting the database. This set off a chain of changes, starting with an HAProxy instance listening on a local socket on each web server. All user scripts could then connect to localhost, where HAProxy picked up the connection and routed it to the physical MySQL server in the backend.
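A minimal sketch of such a local listener in haproxy.cfg, assuming a plain TCP pass-through to a single backend (the server name and address are illustrative):

```
# haproxy.cfg (sketch): accept MySQL connections on the loopback
# interface and forward them to the physical database server.
listen mysql-local
    bind 127.0.0.1:3306
    mode tcp
    # hypothetical backend; in practice this points at whichever MySQL
    # server currently holds the user's database
    server db-backend-1 10.0.0.11:3306 check
```

With this in place, a user's script can keep its database host pointed at the local address forever; only the HAProxy backend definition changes when the database moves.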

What Happened Next? How Did You Deal With the Database Issues?

Implementing HAProxy solved only the migration issue; all the other database-server problems were still present. The main ones were caused by acl_get() and file-locking glitches, so we decided to containerize the physical servers, minimizing the number of databases per instance. We chose LXC over OpenVZ because we were using CephFS at the time and needed support for the newest kernel releases. Later we migrated from CephFS to pure SSD storage but stayed with LXC.
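As a rough illustration of the approach (the container name and distribution are hypothetical), spinning up one such database container with the standard LXC tools looks like this:

```
# Create a container for a single MariaDB instance, start it, and
# install the database server inside it.
lxc-create -n db-shard-001 -t download -- --dist ubuntu --release xenial --arch amd64
lxc-start -n db-shard-001
lxc-attach -n db-shard-001 -- apt-get install -y mariadb-server
```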

What Is the Outcome of All These Implemented Changes?

We had so many containerized instances that the HAProxy configuration grew out of hand, which led us to add an extra query-routing layer. First we tried MySQL Proxy, which could rewrite database queries to smooth out overloads caused by spikes. However, it could not route by database username, and it performed poorly at our scale. We also tried MariaDB's MaxScale, which seemed optimal, but it stored all configuration in physical files; with millions of users and databases, those files grew too large to perform well.

Partnering With René Cannaò

Finally, we discovered ProxySQL. It could rewrite queries based on regexes and route them by username to different shards (database instances). Furthermore, it could apply per-user limits and backlog queries. ProxySQL stored its configuration in SQLite, and initial tests showed a great performance boost. We did run into one serious issue: each time a user was added, changed, or removed, ProxySQL had to reload its runtime configuration in memory. With a million-user base, such a reload could take up to 20 minutes and caused global downtime. We did not want to give up on ProxySQL, so we reached out to René, its author, for help. René rewrote some of ProxySQL's logic and made reloads asynchronous, so they now complete in sub-second time. Engineers at Hostinger appreciate ProxySQL a lot and have contributed back to the project, adding the full IPv6 support that 000webhost runs on at the moment.
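To give a flavour of how this fits together (all hostnames, usernames, and patterns below are illustrative, not our production rules), ProxySQL is administered through a SQL interface: backends are grouped into hostgroups, each database user is mapped to a default hostgroup (shard), and regex query rules can rewrite or reroute traffic before the changes are loaded into the runtime:

```sql
-- Register a backend shard in hostgroup 10 (hypothetical address).
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (10, '10.0.0.21', 3306);

-- Route this user's connections to hostgroup 10 by default.
INSERT INTO mysql_users (username, password, default_hostgroup)
VALUES ('user_wp', 'secret', 10);

-- Example regex rule: cap an expensive query pattern with a rewrite.
INSERT INTO mysql_query_rules (rule_id, active, username, match_pattern, replace_pattern, apply)
VALUES (1, 1, 'user_wp', '^SELECT \* FROM big_table$', 'SELECT * FROM big_table LIMIT 1000', 1);

-- Push the changes to the in-memory runtime and persist them to
-- ProxySQL's SQLite configuration store.
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;
```

It was this runtime reload step that originally blocked for up to 20 minutes at our scale, and that René's changes made asynchronous and sub-second.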

Author

Kristina Dailydaite / @kristinadailydaite
