Redundancy plan

What’s your redundancy plan? What’s your redundancy plan? What’s your…

What is your redundancy plan for an internet connection failure at the fields during your soccer tournament? You should have at least two fully tested systems ready to go. Failing to connect to the internet is no longer an option, yet few parks have bothered to upgrade, and many still do not have full-time internet.

We were promised “set it and forget it” internet. So far that is a fantasy, and a dangerous one to assume now that your soccer tournament relies so heavily on getting, and keeping, a reliable internet connection at the fields all tournament weekend.

It has likely been a whole year since your last tournament, and in that year, the following things are rumored to have happened:

  1. Browsers have been updated (not a rumor)
  2. OSs have been updated (not a rumor)
  3. ISPs have put a lot more “customer protections” in place to kill the exponential proliferation of spam and network attacks.
  4. ISPs have begun throttling data plans, with or without telling anyone or admitting it. We’ve run tests… they are.

Saying “we didn’t do anything differently from last year” won’t explain any connection issues you may have this year. Things may not have changed on your end with your computers, but they have certainly changed on the network you are connecting to. Test. Test. Test.

To give you a fuller understanding of what TourneyCentral does during your tournament weekend, we’ve listed how we monitor your site on our end. We try to be as unobtrusive as possible while also acting as a “guardian angel.” If you want to skip the next seven paragraphs and go straight to “Here is my BEST guess,” feel free…

Resources
Before the tournament (Thursday night), we increase the RAM and CPUs on the application server, the main server your domain is served from. The database runs on a separate machine that rarely gets beyond half of one CPU’s capacity, but since we have a private cloud infrastructure, we can scale it up dynamically if we ever have to. After the tournament weekend, we ratchet everything back to “normal.”

Monitoring
On Friday, we set up our usual panel of monitoring windows: top on the app server and on the database server; a tail on the httpd (web server) access and error logs, so we can watch all the traffic flow in and out in real time; email monitoring on score updates, so we can see scores being updated as they happen; a real-time Google Analytics window, so we can watch where and when teams are flowing in; a tab set of front-page tournament windows; and a tab set of specialized internal tools we built to monitor score and standings updates and social media accounts, so if we see folks complaining, we start poking around. From Sunday morning until all the pool games are played, we pay extra close attention to the activity, as this is a stressful time for the teams and your crew. Once the semifinals or finals are set, about two-thirds of your traffic drops off almost immediately as teams lose interest (sorry, it happens!).
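For readers curious what “watching the log” boils down to, here is a minimal sketch (not TourneyCentral’s actual tooling) that counts requests per minute in a web server access log. It assumes the standard Apache common/combined log format, where each line carries a bracketed timestamp such as [10/Oct/2000:13:55:36 -0700]. A sudden drop in the per-minute count during game hours is the kind of signal that prompts a closer look.

```python
import re
from collections import Counter
from typing import Iterable

# The bracketed timestamp in Apache "common"/"combined" log lines,
# e.g. [10/Oct/2000:13:55:36 -0700]; group 1 keeps day/month/year:hour:minute.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [+-]\d{4}\]")


def requests_per_minute(lines: Iterable[str]) -> Counter:
    """Count requests per minute; lines without a recognizable timestamp are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

You would pass it an open log file (/var/log/httpd/access_log is a common but hypothetical location). Because Counter keeps insertion order, the minutes come back in the order they first appear in the log.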

The network engine
The data center is served by multiple trunk lines and redundant power, including military-grade generators. (An aside: when Hurricane Sandy hit, the electricity lines feeding the data center, which is in Allentown, ran through New Jersey and were underwater, so the entire data center ran on these generators for nine days. Nobody knew until after the fact. Imagine the website being down for nine days! It can’t happen.) Locally, in our Dayton, Ohio office, we have TWC business-class internet as well as Verizon LTE and AT&T LTE hotspots, just in case TWC fails and then Verizon fails too. We also have a studio in Kingston, NY that we can call to get online if we are unable to from here. We’re getting to the internet regardless.

If we see something spike, disconnect
So basically, if your site is down, unreachable, or not generating traffic, we will be the first to know! We’re borderline neurotic about making sure your site is up and running during your tournament weekend.

If we see trouble, we’ll run a traceroute immediately and check whether your domain is resolving. We’ll set up a continuous ping to your domain from both inside and outside the local network. If the website is getting its expected traffic load, DNS is resolving, and you don’t get a call from us, the issue has to be closer to your point of entry to the internet.
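A rough do-it-yourself version of those checks, run from the field laptop, might look like the Python sketch below. It is a stand-in under stated assumptions, not the exact commands we run: it asks the operating system’s resolver whether the domain resolves, then times a plain TCP connection to port 443 in place of ping (a true ICMP ping needs raw sockets and usually administrator rights). “DNS FAIL” points at name resolution; “CONNECT FAIL” with working DNS points further along the path.

```python
import socket
import time
from typing import List, Optional


def resolve(host: str) -> List[str]:
    """Return the IP addresses the system resolver gives for host, or [] if it fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # getaddrinfo returns one entry per address family/socket type; keep unique IPs.
    return sorted({info[4][0] for info in infos})


def connect_time(host: str, port: int = 443, timeout: float = 5.0) -> Optional[float]:
    """Seconds taken to open a TCP connection to host:port, or None if it fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return None


def watch(host: str, rounds: int = 10, interval: float = 5.0) -> None:
    """Print one DNS + connect check per round, like a slow continuous ping."""
    for _ in range(rounds):
        if not resolve(host):
            status = "DNS FAIL"
        else:
            elapsed = connect_time(host)
            status = "CONNECT FAIL" if elapsed is None else f"{elapsed * 1000:.0f} ms"
        print(f"{time.strftime('%H:%M:%S')}  {host}  {status}")
        time.sleep(interval)
```

For example, watch("example.com") checks every five seconds for ten rounds; example.com is a placeholder for your own tournament domain.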

If or when we get a panicked call from your team, we immediately look at the traffic on the site, the server load, and the log files. If everything there is normal, we will feel terrible for your plight, but there will be very little we can do to help. Hearing “your site is up and running, it must be the local connection” may be the truth, but it will not be comforting. We know coaches and parents may be yelling at you (all of us here have been there), but you will have to start methodically tracing why your connection is not working.

The trick to resolving internet connection issues is to remain calm and work the connections backwards methodically and rationally. Tune out the noise of people yelling at you.

Here is my BEST guess:

I could be entirely wrong. Without methodical testing under high load over an extended period of time, to take the intermittent timing out of the mix, this is just my best guess…

Connection issues are not likely to be hardware, i.e., the router or the MacBook, iPad, etc. The wireless router may have something to do with it, but I seriously doubt it. The same goes for any MiFi devices. Still, it never hurts to power cycle them.

I THINK ISPs now watch traffic with smart algorithms that spot bursts of traffic to the same domain over a period of time and throttle network access based on the traffic pattern from the same IP address, using assumptions that make perfect sense to them but are entirely wrong for what you are doing. Soccer tournaments are such a teeny, tiny sliver of what’s out there that very few people know (or care) about the business model and operations. Think about it: during the day, you hop on and update scores at a rapid pace, then the connection sits idle for an hour, then a ton of traffic, then idle, then a ton of traffic… To an algorithm, that pattern might say, “Whoa, someone might be trying to inject something into a domain… let’s sloooooww this down a bit…” So the network adds a delay of a few seconds: usually not long enough for a human being to notice, but just long enough for a bot to give up and move along to another domain. During normal internet use, you would never notice it, but when you are rapidly entering scores and whipping through entry screens, those three seconds become noticeable very quickly!
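To make the guess above concrete, here is a toy model of the kind of sliding-window rule I suspect. No ISP publishes its rules, so everything here is an assumption: the thresholds (more than 20 requests to one domain from one address within 60 seconds triggers a 3-second delay) and the class name ToyBurstThrottle are invented purely for illustration.

```python
from collections import defaultdict, deque


class ToyBurstThrottle:
    """Toy sliding-window rule (invented thresholds, not any real ISP's):
    if one source sends more than max_requests to the same domain within
    window seconds, each further request is delayed by penalty seconds."""

    def __init__(self, max_requests: int = 20, window: float = 60.0, penalty: float = 3.0):
        self.max_requests = max_requests
        self.window = window
        self.penalty = penalty
        # (source_ip, domain) -> timestamps of recent requests, oldest first
        self.history = defaultdict(deque)

    def delay_for(self, source_ip: str, domain: str, now: float) -> float:
        """Record a request at time `now` and return the delay this model imposes."""
        times = self.history[(source_ip, domain)]
        times.append(now)
        # Drop requests that have slid out of the window.
        while times and now - times[0] > self.window:
            times.popleft()
        return self.penalty if len(times) > self.max_requests else 0.0
```

With these made-up numbers, 30 requests sent two seconds apart start drawing the 3-second delay on the 21st request, while the same 30 requests sent four seconds apart never exceed 16 in any 60-second window and are never delayed. That is the intuition behind slowing down between entries.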

The Chrome browser appears to be more sensitive to the delay and actually tries to help by actively logging the failed lookup in its cache, in an attempt to “protect” the user. This is where you would get that ERR_NAME_NOT_RESOLVED error, which may or may not be sticky depending on whether you cleared the cache, set the cache low, etc. in the advanced settings. The Safari browser that the iPad and iPhone use is likely designed more for slower mobile networks, even when you are connected to Wi-Fi, so it tends to be less panicky. It may or may not be compensating for the network latency. I think it is, so the network delays are the same; you just may never know about them, or are alerted less often. There is no specific ERR_NAME_NOT_RESOLVED in Safari on either iOS or macOS. I’m not sure about the Windows version of Safari or the Mozilla-based Firefox browser. Internet Explorer (or is it Edge now?) is absolutely horrific, so don’t even consider it an option; it tries to protect you from everything and will end up driving you insane. We have only passing experience with Android devices, so anything I said there would be a wild guess and not very helpful.

The network sees not only your initial request to the domain but also the server’s HTTP response back to your browser. In practical terms, your score updates, etc. may be sent and processed by the server, but the confirmation page sent back to tell you so may get caught up in the delay. That is good in that what you were doing got done, but bad because it looks like maybe it wasn’t. Four calls to the DNS server within a tiny window of time on every update is about the minimum cycle.

Our advice

1. Turn off the cache. Flush the cache and local files often during the weekend. Do not trust a quick trip to google.com as a sign that your connection is up and running. (If you flush local files and cookies, you will be forced to log in to your admin area again afterward. This is normal.)

2. Enter scores in larger batches and slow down between entries. It may seem like it takes longer, but if you can avoid tripping the ISP’s throttling and save three seconds here and there, it nets out the same or better.

3. Have an independent LTE hotspot from a different mobile provider set up just in case, and maybe enter some of the scores from it. If multiple people are entering scores, make sure they filter first so they don’t inadvertently overwrite each other’s score entries (blank entry fields count as a NULL update). Having redundant ways to get online at the fields may be the new normal, especially now that everyone expects now-now-now internet.

4. Talk to whoever is providing internet at the fields and see if they have any way of whitelisting your domain so its traffic always goes through, and maybe even giving you a static IP for the weekend, run through a VPN. I doubt they will, as these things take expertise to set up, it’s only one weekend, and what do they get out of the deal, etc., but it never hurts to ask. (Just don’t accuse them of throttling! They get their dander up over that!)

5. If it were me, I would set up a MacBook Pro on the primary Wi-Fi network, but also pack my AT&T LTE iPad, my Verizon LTE iPad, and my AT&T iPhone, ready to go as hotspots so I could switch my MacBook Pro over at a moment’s notice. After the first couple of times, I would set up both iPads and alternate between them for batch score entry… but that is probably just me being obsessively redundant… (No, it’s not something TourneyCentral will do for you, not even for money… 🙂)
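The filtering warning in item 3 is easiest to see with a tiny model. The sketch below is hypothetical and is not TourneyCentral’s code: it assumes a save handler that writes every field on the submitted form and stores a blank field as NULL (None), as item 3 describes. The function names and game IDs are made up for illustration. Two volunteers each fill in one game on an unfiltered form, and whoever saves second wipes out the other’s score with a blank.

```python
from typing import Dict, Optional

# Hypothetical score table: game id -> goals, where None stands for NULL.
Scores = Dict[str, Optional[int]]


def submit(db: Scores, form: Dict[str, str]) -> None:
    """Hypothetical save handler: writes every field on the form,
    storing a blank field as None (NULL)."""
    for game_id, value in form.items():
        db[game_id] = int(value) if value.strip() else None


def unfiltered_demo() -> Scores:
    db: Scores = {}
    # Both volunteers see both games; each fills in only their own.
    submit(db, {"game1": "3", "game2": ""})  # volunteer A saves first
    submit(db, {"game1": "", "game2": "1"})  # volunteer B's blank game1 overwrites A
    return db


def filtered_demo() -> Scores:
    db: Scores = {}
    # Each volunteer filters to their own game before entering scores.
    submit(db, {"game1": "3"})  # volunteer A
    submit(db, {"game2": "1"})  # volunteer B
    return db
```

Here unfiltered_demo() ends with game1 set to None, while filtered_demo() keeps both scores.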

Again, this is just my best guess, based on what I know to be true about the server back end and my assumptions about the front end from real-world experience on-site, entering scores for my own club’s tournaments when my kids were playing. Internet should be “set it and forget it” by now, but it’s not and probably never will be. Getting and keeping a reliable connection outside at a park will more than likely be a constant challenge. The best we can do is plan for an outage by making sure there are redundant systems, tested and ready to go.

I hope this helps you get to the bottom of the typical delays at the fields. I really, really think most of the delays you will experience are the ISP fiddling with your traffic, though you will never get them to admit it. You just gotta outsmart them! Every year, we find we have to outsmart tech vendors in different ways.