Of course, VMware DRS is not the only such product on today’s IT market. The best-known analog in the OpenStack world is Watcher. Very often, however, cloud providers develop their own solutions. Why? The reasons vary, but the most common are the high price of commercial tools, missing functionality in the freely available ones, and the bugs found in them.
In our case, everything was more straightforward: DRS grew out of an existing internal utility for managing OpenStack resources. Initially, this utility was meant to make the support team's work easier. It tracks the number and state of all cloud entities, including virtual machines, hypervisors, routers, disks, file storage, K8s clusters, load balancers (Load Balancer as a Service, LBaaS), and much more.
The utility allows you to see the current resource consumption by servers and specific virtual machines and predict changes in these indicators in the future. Based on the data provided by the utility, the operator can see problems promptly and find ways to solve them.
Since the data the utility already collected for its own work was enough to implement DRS, at a certain point we decided to add these functions to the utility. It looked like a logical and reasonable step, although the development process itself turned out to be far from simple.
DRS Workflow In MCS Cloud
The DRS scheme changed several times and improved during testing. The following version is now in production:
- The internal name of the service is Katana. Its backend, written in Python, regularly retrieves information about cloud entities (virtual machines, hypervisors, disks, and so on) through the OpenStack API. This data set includes only the characteristics that OpenStack itself provides: the number of elements, their configuration, and so on. Resource utilization is not retrieved at this stage.
- Katana is a stateless application, but it uses Memcached to store its cache. From there, the data goes to the utility's UI, where it is displayed to the system operators. In effect, our utility is a caching layer for OpenStack: all the data it operates on is JSON received from OpenStack and presented in the UI in tabular form.
- The data from OpenStack alone is not enough for the DRS algorithm to work well. Therefore, information about the actual resource usage of virtual machines is collected every 10 seconds by a dedicated service, katana-client. The data is taken from libvirtd and arrives as incremental (cumulative) counters.
- The katana-client produces a lot of data, and it is not converted to per-second values. Therefore, the data collected by the katana-client is passed to a helper HTTP API service, katana-collector.
- The katana-collector calculates per-second resource utilization from the incremental data sent by the katana-client.
- Based on the data obtained, decisions are made on balancing various clusters of hypervisors.
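The conversion from incremental counters to per-second utilization can be sketched roughly as follows. This is only an illustration, not the katana-collector code: the `Sample` type, its field names, and the choice of nanoseconds as the unit of the CPU counter are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One cumulative reading for a VM (hypothetical structure)."""
    timestamp: float   # when the counter was read, in seconds
    cpu_time_ns: int   # CPU time consumed since VM start, in nanoseconds (assumed unit)


def cpu_percent(prev: Sample, curr: Sample) -> Optional[float]:
    """Return CPU load between two samples as a percentage of one core.

    100.0 means one fully busy core, so a VM with four busy vCPUs gives 400.0.
    Returns None if the rate cannot be computed: no time has elapsed, or the
    counter went backwards (for example, after the VM was restarted).
    """
    elapsed = curr.timestamp - prev.timestamp
    delta_ns = curr.cpu_time_ns - prev.cpu_time_ns
    if elapsed <= 0 or delta_ns < 0:
        return None
    cpu_seconds_per_second = delta_ns / 1e9 / elapsed
    return cpu_seconds_per_second * 100.0
```

With samples taken 10 seconds apart, a VM that consumed 40 seconds of CPU time in that window is reported at 400% load.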
A special algorithm looks for hypervisors whose processor utilization or remaining free physical memory goes beyond the thresholds specified in the settings: for example, utilization above 70% or less than 64 GB of free physical memory. When such hypervisors are detected, some of their VMs are moved to hypervisors with an acceptable level of utilization and free memory, for example, utilization of no more than 50% and at least 64 GB free.
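As a rough sketch of this check, the code below splits hypervisors into overloaded sources and acceptable targets. The `Hypervisor` and `Thresholds` types and all field names are hypothetical; the default numbers are simply the example values from the text, not the production settings.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Thresholds:
    # Example values from the text; the real ones come from the configuration.
    source_cpu_percent: float = 70.0   # above this, a hypervisor is overloaded
    source_free_mem_gb: float = 64.0   # below this, a hypervisor is overloaded
    target_cpu_percent: float = 50.0   # a target must not exceed this
    target_free_mem_gb: float = 64.0   # a target must have at least this much free


@dataclass(frozen=True)
class Hypervisor:
    name: str
    cpu_percent: float   # overall CPU utilization, 0-100
    free_mem_gb: float   # free physical memory


def is_overloaded(hv: Hypervisor, t: Thresholds) -> bool:
    """A source is overloaded if either CPU or free memory is out of bounds."""
    return hv.cpu_percent > t.source_cpu_percent or hv.free_mem_gb < t.source_free_mem_gb


def is_target(hv: Hypervisor, t: Thresholds) -> bool:
    """A target must satisfy both the CPU and the free-memory limit."""
    return hv.cpu_percent <= t.target_cpu_percent and hv.free_mem_gb >= t.target_free_mem_gb


def split_hypervisors(
    hypervisors: List[Hypervisor], t: Thresholds
) -> Tuple[List[Hypervisor], List[Hypervisor]]:
    """Return (overloaded sources, acceptable targets)."""
    sources = [hv for hv in hypervisors if is_overloaded(hv, t)]
    targets = [hv for hv in hypervisors if is_target(hv, t)]
    return sources, targets
```

Note that a hypervisor at, say, 60% utilization with enough free memory falls into neither group: it is not overloaded, but it is also not a suitable destination.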
Of course, not all VMs on the source hypervisor are selected for migration. The following cases are possible:
- Excessive memory consumption on the source hypervisor
- In this case, many small virtual machines are migrated to quickly free memory up to the allowable value specified in the configuration.
- Increased processor utilization on the source hypervisor
- Under these conditions, the virtual machines that utilize the processor the most are migrated. Utilization is calculated per core: given one VM with 4 vCPUs at 400% CPU load and another with 16 vCPUs at 400% CPU load, the VM with four cores is selected.
- Increased utilization of both memory and processor on the source hypervisor
- In this case, the smallest VMs with the highest processor utilization are selected for migration. The normalization function used in the algorithms builds a tree that is individual in each case.
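The first two cases can be illustrated with a small sketch. It is a simplified assumption of how such a selection might look, not the production algorithm: the `VM` type and the function names are hypothetical, and the combined case is left out because the normalization function is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VM:
    name: str
    vcpus: int
    mem_gb: float
    cpu_percent: float   # 100.0 == one fully busy core


def per_core_load(vm: VM) -> float:
    """CPU load normalized to a single vCPU."""
    return vm.cpu_percent / vm.vcpus


def pick_for_memory(vms: List[VM], mem_to_free_gb: float) -> List[VM]:
    """Pick the smallest VMs first until enough memory would be freed."""
    picked: List[VM] = []
    freed = 0.0
    for vm in sorted(vms, key=lambda v: v.mem_gb):
        if freed >= mem_to_free_gb:
            break
        picked.append(vm)
        freed += vm.mem_gb
    return picked


def pick_for_cpu(vms: List[VM], count: int = 1) -> List[VM]:
    """Pick the VMs with the highest per-core CPU load."""
    return sorted(vms, key=per_core_load, reverse=True)[:count]
```

For the example from the list, a 4 vCPU VM at 400% has a per-core load of 100%, while a 16 vCPU VM at 400% has only 25%, so `pick_for_cpu` returns the 4 vCPU machine.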
All the figures used in the calculations are stored in the configuration file and can be changed if necessary. For manual migration, which is available directly from the utility's UI, the thresholds are set in the interface.
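Keeping such thresholds in a configuration file might look like the sketch below. The INI format, the `[drs]` section, and the key names are all assumptions for illustration; the actual file format used by the utility is not described here. The defaults repeat the example values mentioned earlier.

```python
import configparser
from typing import Dict

# Hypothetical keys; defaults are the example values from the text.
DEFAULTS = {
    "source_cpu_percent": "70",
    "source_free_mem_gb": "64",
    "target_cpu_percent": "50",
    "target_free_mem_gb": "64",
}


def load_thresholds(config_text: str) -> Dict[str, float]:
    """Parse thresholds from INI text, falling back to DEFAULTS for missing keys."""
    parser = configparser.ConfigParser()
    parser.read_dict({"drs": DEFAULTS})   # start from the defaults
    parser.read_string(config_text)       # override with values from the file
    return {key: parser.getfloat("drs", key) for key in DEFAULTS}
```

An operator who wants a less aggressive balancer could then raise a single value, for instance `source_cpu_percent = 80`, without touching the rest.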
- If the automatic migration algorithm finds VMs that should be moved to balance resources, the migration is performed through the OpenStack API.
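The final step could be sketched as follows. The text only says that the OpenStack API is used; the choice of the openstacksdk compute proxy and its `live_migrate_server` call is an assumption, as are the function name `migrate_vms`, the shape of the plan, and the dry-run switch.

```python
from typing import Any, Iterable, List, Tuple


def migrate_vms(
    compute: Any,
    plan: Iterable[Tuple[str, str]],
    dry_run: bool = True,
) -> List[str]:
    """Live-migrate each (server_id, target_host) pair and return a log of actions.

    `compute` is expected to behave like the openstacksdk compute proxy
    (for example, `openstack.connect(...).compute`). With dry_run=True
    nothing is sent to the API; the planned actions are only logged.
    """
    log: List[str] = []
    for server_id, target_host in plan:
        if dry_run:
            log.append(f"would migrate {server_id} -> {target_host}")
            continue
        compute.live_migrate_server(server_id, host=target_host)
        log.append(f"migrating {server_id} -> {target_host}")
    return log
```

Defaulting to a dry run is a deliberate choice in this sketch: an operator can review the planned moves before allowing the balancer to touch running VMs.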