What has changed at WEDOS and what improvements are in the pipeline

We regularly improve our equipment and procedures, both technically and organizationally. We are also planning to communicate more with customers: more openness, and a brand-new system for automatic notifications when problems occur (e.g. server downtime). What have we forgotten? What else would you recommend? We will reward the ideas we implement with a gift.

What has changed

Read what has changed or will change. Some changes are technical, some are organizational, and some concern communication with clients. We will be even more open and will publish many things automatically (such as server outages).

  • One UPS has been replaced with a new one whose batteries last significantly longer (photo gallery here).
  • Since July, we have been testing the motor-generator under load for 15 minutes once a month (we fully simulate a power failure of the entire building); the regular weekly test remains in place.
  • We will have a second automatic motor-generator with its own ATS to back up the existing one, i.e. 100% power backup by 2 motor-generators (something that probably few datacenters in the Czech Republic have).
  • Absolutely all PCs in the offices run through a special separate UPS (so far there are no PCs in 2 offices).
  • We will replace the main switchboard with one sized for higher capacity (with a view to the future).
  • We are preparing another electricity connection from the back of the building, where we plan to build our own substation and where we have a motor-generator (in the future there will be 2 motor-generators).
  • We are speeding up preparations for a second datacenter (whether it ends up as an oil-based solution or a conventional one in another location).
  • The office firewall will also be tested once a month in full operation to verify that it runs on the backup solution; in addition, spare CF cards with the system pre-installed are kept with the firewalls.
  • A Wi-Fi router that supports automatic connection via a mobile network has been purchased for the office, so there will be an emergency connection in case of any problem.
  • In the offices, the monitors that show the camera feeds and monitoring output will also display the free-cooling and climate values in the server room and the power values of the whole building, so that we have a constant overview of what is happening in our datacenter; this overview will also be available to 24/7 customer support.
  • We can offer customers a choice of compensation directly in the administration, because we have programmed a compensation interface. If there is a problem, you will not only be informed, but you can also claim compensation without having to contact our customer support.

Organizationally, the following has taken place or will take place:

  • Social media passwords will be put in an envelope and stored in a safe place (so that, if necessary, a customer support member, for example, can access them).
  • We will set up special profiles on social networks where the status of our services will be displayed.
  • A specific shift leader will always be designated in support and will be responsible for handling outward crisis communication or for assigning someone to handle it.
  • We will standardize the rules (and introduce training) for awarding compensation, to keep customers happy.
  • We have modified, or will modify, the Wiki procedures for similar situations, and it will be useful to print some passages, because if the internal network goes down we lose access to some information.
  • We will add automatic reporting of failures and outages to the customer administration (for customers who are affected, i.e. have a service on the affected server) and to chat, i.e. some way of passing monitoring data on to the administration.
  • Larger issues or major problems will be posted immediately to the website plus administration plus chat plus social media. From our experience, we know that these are often problems that are not on our side (large-scale phishing emails, problems with large ISPs, loss of connectivity to foreign countries, a large email service provider on a blacklist). However, you need to be informed so that you can take appropriate action.
  • We are working on a solution where our website and customer support (chat, contact forms, email) will be available at all times, even in the event of a truly massive global outage.
  • We need to be able to send a mass email to all customers quickly in case of a global problem.
  • In case of any VPS failure on our side (for example, a hypervisor restart), we will automatically send an email notification; for other services, we will do so in case of a prolonged failure.
  • Call forwarding needs to be set up with the telephone operator in case of unavailability.
  • A motivational program at WEDOS for operation without outages.
  • Junior technicians will be on duty until midnight, and they now have access to some servers so that they can step in and address some issues.
  • We will regularly test various emergency situations.
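
The idea of notifying only the customers who have a service on the affected server can be illustrated with a short sketch. It is only a rough illustration under stated assumptions: the `Service` record, the `affected_customers` and `build_notifications` functions, and the server names are all hypothetical and do not describe the actual WEDOS administration or monitoring.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical record linking a customer to the server hosting their service."""
    customer_email: str
    server: str


def affected_customers(services, failed_server):
    """Return sorted, de-duplicated emails of customers with a service on failed_server."""
    return sorted({s.customer_email for s in services if s.server == failed_server})


def build_notifications(services, failed_server, description):
    """Build one notification per affected customer (sending is out of scope here)."""
    return [
        {
            "to": email,
            "subject": f"Service disruption on {failed_server}",
            "body": description,
        }
        for email in affected_customers(services, failed_server)
    ]
```

A customer with several services on the same server gets a single message, and customers on other servers get none, which matches the goal of informing only those who are actually affected.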

Testing, testing and testing in live operation…

Yes, we will test various emergency situations in live operation. It’s risky, but if you trigger a shutdown or failure in a planned way, you know what happened and have a chance to get back to a working state.

We plan to test various emergency situations. Of course, everything will be under the supervision of responsible persons so that situations that could jeopardize the operation of our services do not arise. Some tests will be scheduled and others unscheduled. Afterwards, everything will be evaluated in order to identify weak points. We will keep you informed.
If we want to be number one in the market, we have to be ready for anything and everything.

Does anyone have any other ideas?

What have we forgotten? What else would you recommend? We will reward the ideas we implement with a gift.