Game Performance Update from Kano

Discussion in 'General Discussions' started by Kendall, Nov 23, 2016.

  1. Kendall

    Kendall Administrator

    Hey y’all,

    If you were playing actively over the first few weeks of November, you may have experienced some serious moments of “lag” in a few of the games. Let me be clear: making sure you all have fast, 24/7 access to your accounts is a top concern of Kano/Apps. It has been a top concern for me personally for the past 8 years and will continue to be into the future.

    I want to let you guys know that all of us here, and I in particular, apologize for the reduced quality of game play many of you faced. We put out some announcements about this as it was happening, but it has taken us some time to get our thoughts together in a post-mortem, figure out what happened, and put action steps in place to ensure that this type of issue does not happen again. We’re making this post to provide details on what happened and what we are doing to help prevent it in the future.

    First of all: the server side lag spikes have been fixed and overall performance has been dramatically improved. We continue to monitor our server side latency (the amount of time between our servers receiving a request and responding) closely, alarming on any spikes. Since the fix, we are serving requests 10-15% faster than we were before the November 2nd issues started. The screenshot below shows Mob Wars LCN server side latency before and after the fix was put in place on November 9th. As you can see, the latency spikes, which were the issue, have disappeared.

    [Screenshot: Mob Wars LCN server side latency before and after the November 9th fix]
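
    For anyone curious what “alarming on any spikes” can look like in practice, here is a minimal sketch of a latency alarm in Amazon CloudWatch using Python and boto3. This is an illustration only, not Kano’s actual configuration: the region, metric namespace, metric name, threshold, and SNS topic are all hypothetical placeholders.

    ```python
    # Hypothetical sketch of a server-side latency alarm (not Kano's actual setup).
    # Assumes the game servers publish a custom request-latency metric to CloudWatch.
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")  # region is an assumption

    cloudwatch.put_metric_alarm(
        AlarmName="lcn-server-latency-p99-spike",       # placeholder alarm name
        Namespace="GameServers",                        # hypothetical custom namespace
        MetricName="RequestLatencyMs",                  # hypothetical custom metric
        ExtendedStatistic="p99",                        # alarm on tail latency, not the average
        Period=60,                                      # evaluate one-minute windows
        EvaluationPeriods=3,                            # require 3 consecutive bad minutes
        Threshold=500.0,                                # example threshold: 500 ms
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
        AlarmActions=["arn:aws:sns:us-west-2:123456789012:ops-alerts"],  # placeholder SNS topic
    )
    ```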

    We have still heard some reports of people experiencing lag. Slow response times are sometimes caused by local network connections or browser issues, and those can be hard for us to diagnose; please do send in a ticket if you’re experiencing lag, though, and we’ll be happy to take a look. That being said, individual slow server side requests may still exist, and we plan to keep improving and tackling these on a case-by-case basis. We are also committed to reducing server side latency further and have plans to potentially reduce it by up to 50% over the coming months, but more on that later as we get further into testing.

    So — what happened? It’s a bit of a long story that I will try to keep short.

    On November 2nd at 7:30pm Pacific Time, we started to get reports that people were experiencing the dreaded lag. Like many online games, LCN, PC, VC and ZS are all hosted on Amazon Web Services. To help monitor performance and reliability, we have some automatic alerts that tell us if game play is sluggish or down completely, and a few started going off.

    Digging into the problem, we found that some of our databases were locking up sporadically. This would cascade back to the game, causing everything else to lock up until the database would recover.

    We ran through all the usual suspects we have hit in the past, including simply having more players in the game, a high and increasing number of requests, bad hardware, or some bad game code we might’ve released, but nothing seemed to explain why we were seeing these spikes, and they were getting worse. At this point, one of our devs (Shawn) had been up for 36 hours. We needed more time, so around Thursday we brought up some additional servers to spread out the load and made some database config tweaks that seemed to improve the situation.

    By late Monday we’d found the answer, though it wasn’t something we were proud to discover. For a long time, we had been monitoring the servers at a pretty high level; we wanted to know if lots of players were experiencing a problem all at once, or if every part of the game suddenly started moving slowly. We hadn’t been looking closely enough to notice that each hard drive was getting closer and closer to maxing out the IOPS (input/output operations per second) it could handle, and would hit moments of saturation, causing the lock-ups.
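
    As an illustration of the kind of check that would have caught this earlier, here is a rough sketch that estimates how busy an EBS volume is by pulling the standard AWS/EBS VolumeReadOps and VolumeWriteOps metrics from CloudWatch and comparing the result to what the volume is rated for. The volume ID, rated IOPS figure, and region are placeholders, and this is only one possible approach, not a description of the monitoring Kano actually built.

    ```python
    # Rough sketch: estimate how close an EBS volume is getting to its IOPS ceiling.
    # Volume ID, rated IOPS, and region are placeholders, not Kano's real values.
    from datetime import datetime, timedelta

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-west-2")

    VOLUME_ID = "vol-0123456789abcdef0"   # placeholder
    RATED_IOPS = 3000                     # placeholder: the volume's provisioned/rated IOPS
    PERIOD = 300                          # 5-minute windows

    end = datetime.utcnow()
    start = end - timedelta(hours=1)

    total_ops = 0.0
    for metric in ("VolumeReadOps", "VolumeWriteOps"):
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/EBS",
            MetricName=metric,
            Dimensions=[{"Name": "VolumeId", "Value": VOLUME_ID}],
            StartTime=start,
            EndTime=end,
            Period=PERIOD,
            Statistics=["Sum"],
        )
        # Use the busiest window for each metric (a pessimistic combination),
        # so short bursts of saturation are not averaged away.
        if stats["Datapoints"]:
            total_ops += max(dp["Sum"] for dp in stats["Datapoints"])

    avg_iops = total_ops / PERIOD
    print(f"{VOLUME_ID}: ~{avg_iops:.0f} IOPS in busiest window "
          f"({avg_iops / RATED_IOPS:.0%} of rated capacity)")
    ```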

    Once we figured out the problem, our Ops team worked quickly to get a test server up using higher IOPS capacity drives for the affected databases. Once the game looked to be working well on the test server, we spent Wednesday the 9th and Thursday the 10th transferring all games over to these new servers. We confirmed the speed was much better for all games and notified as many of you as we could. Though this issue has been a dark time for us, we are happy to say that since the switch, the game is actually quicker than it has ever been.
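
    The post doesn’t say which AWS volume type the new drives were, so purely as a hedged example, this is how a Provisioned IOPS SSD (io1) EBS volume could be created with boto3. The availability zone, size, IOPS value, and tag are placeholders.

    ```python
    # Hypothetical example of provisioning a higher-IOPS EBS volume (io1 shown here;
    # the post does not say which volume type Kano actually chose).
    import boto3

    ec2 = boto3.client("ec2", region_name="us-west-2")  # region is an assumption

    volume = ec2.create_volume(
        AvailabilityZone="us-west-2a",   # placeholder AZ
        VolumeType="io1",                # Provisioned IOPS SSD
        Iops=5000,                       # placeholder: guaranteed IOPS for the volume
        Size=500,                        # placeholder size in GiB
        TagSpecifications=[{
            "ResourceType": "volume",
            "Tags": [{"Key": "Name", "Value": "lcn-db-high-iops"}],  # placeholder tag
        }],
    )
    print("Created volume", volume["VolumeId"])
    ```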

    Over the past couple of weeks, we’ve spent time working through our notes and records to figure out how we missed this and what to do next. For one thing, we’re currently working on more precise monitoring so that we can tell sooner when things start going wrong. And beyond that, we’ve found a number of areas in the code that we can optimize to reduce the risk of problems and generally speed things up.

    Optimization, or making things faster and more efficient, is something we are strongly committed to in all our products, and we hope to have some significant upgrades to roll out and announce in the next couple of months, so stay tuned.

    So, in closing — to all our players please accept my most sincere and humble apology. We thank you for your patience while we fixed this. And most of all, thank you for playing our games and being a part of our community. I hope it helps to know a bit more about what happened and what we’re up to, and if you have any questions or concerns, as always please let us know via the Support site — we’ll do our best to get back to you with whatever info we have.

    - Eric Haight, President & Co-founder
     
  2. Mad Moses

    Mad Moses Active Member

    Thank you for the gift Kano Apps Team. :)
     
  3. No Nuts 4u

    No Nuts 4u Member

    TY eric for the info and fixing the nightmare issues and thanks KANO for the gift =)
     
  4. jdc20181

    jdc20181 New Member

    Thanks for the update
     
