
RoundTower Blog

The Importance of Choosing the Proper Method to Solve a Problem

By Kurtis Lindemann, VP Systems Engineering
October 16, 2012
[hr]

I came across a fun brainteaser this weekend and posted it to my Facebook page.

[message_box type="note" close="Close"] [one_fourth] [/one_fourth] [two_third_last]

The Problem

You see a shirt for $97.

You do not have the cash, so you borrow $50 from friend A and $50 from friend B.

$50 + $50 = $100

You buy the shirt and have $3 in change. You then give friend A $1 and friend B $1, keeping the remaining $1 for yourself. You now owe friend A $49 and friend B $49.

$49 + $49 = $98 + your $1 = $99

Where is the missing $1?
[/two_third_last][clear] [/message_box]

I wasn’t sure if it would really be a challenge for most people. As it turned out, most of the people who got the answer right away had a specific professional background. Almost everyone else appeared to be puzzled about where that missing $1 could be.

While this problem may seem trivial to some, it illustrates a common challenge that we experience in IT consulting: you may have all of the data that you need to solve a problem, but you can be misled by the way the data is presented, or by the way the original question is positioned.

One great example of this is an assessment that we performed for a customer that had posed the question “Do we need flash storage?” On the surface, given the initial statistics the customer provided and what we knew about their storage architecture, it appeared that their disk-based array was hitting its limits. They had a database that was crushing one of their arrays and severely degrading the performance of other application environments at random intervals during the day. The solution that one group was anxious to implement was a PCI-based flash technology on which to run their production database workload. It sounded like the perfect solution. It would isolate the database from other applications and provide guaranteed performance to the mission-critical database. The group had assumed that the case was so compelling that it was a slam dunk. Why would the company bring in outside consultants to validate such an elegant solution?

It turns out there was much more to the story. This company wasn’t only having problems with its largest database. It was also having issues in its VMware environment and in another mission-critical database environment, and it was periodically seeing its arrays go to 100% CPU for reasons that could not initially be explained.
We dug deeper into the problem by gathering statistics from the applications, operating systems, HBAs, networks and arrays. We interviewed key application teams within the organization as well as the teams that ran the infrastructure.

As we compiled the data, we started to see that the problem wasn’t just one application bringing the storage environment to its knees. We saw overlapping processes, ranging from nightly batch loads to cloning to backups, that were causing pressure. We saw that code in one of the databases was extremely inefficient. We witnessed multiple SELECT * queries being issued against a multi-TB database at random intervals throughout the day. We also saw that they were experiencing a significant problem with one of their arrays that could bring it to its knees when virtual machines hadn’t been properly configured. We witnessed the HBAs on one particular server being maxed out at some points during the day. We concluded that while they could implement flash in this environment, it would only mask the problem for a period of time. Eventually, without making improvements to the environment as a whole, they were going to be right back where they started.

If we had only looked at the data the way the customer had presented it, we would have designed a sub-optimal solution. As it turned out, this customer needed more than flash: they needed high-end, enterprise-class storage that could scale to multiple controllers, dedicated resources for critical workloads, and segregated workloads to prevent one application from impacting another’s performance. We also discovered that they needed to put some serious work into improving their applications and how they created and accessed data. If the customer didn’t fix the application behavior, then a high-end array would just mask the problem for a while.

At the end of the process, the customer went down two paths. The group that wanted the PCI-based flash storage got their way and implemented the solution.
We had warned them that the solution was not an enterprise-class solution with high availability. About two months after they implemented it, they lost a PCI card in the primary system and had data loss. They were able to successfully fail over to their ‘replicated’ system, and after a day were back in production, but not without getting a little egg on their faces.

The company also implemented a high-end enterprise array and fixed most of the other problems that they had with their applications in the old environment. The company has been very happy with its purchase, and has seen improved performance and reliability in its IT infrastructure. They looked at the problem in a different way than the PCI-flash group, identified the issues that they needed to address, and worked to solve all of the problems, rather than just the most painful one.

Sometimes people need to experience a tragedy to learn that they didn’t use the right method to solve their problem. Sometimes it’s best to engage a professional who has the experience to help with the issues from a more objective viewpoint. Those professionals can help you find the solution much faster and save you considerable pain and anguish throughout the process.

[message_box  icon="no" close="Close"]
Regarding “The Problem” at the top of the page, in virtually all cases, people with a background in accounting were able to solve the problem in under 1 minute. Those with more ‘traditional’ backgrounds, including engineers, took 15-20 minutes on average to come to the correct conclusion and to justify their answer with the correct solution to the problem.

The first 5 people to send me the correct answer and solution will get a $20 Amazon gift card*. Send an email with the subject line of ‘SOLUTION TO THE PROBLEM’ to me. I’ll post the answer after we have the winners.

* People with whom I’ve already discussed this, family members and RoundTower employees are not eligible.
[/message_box]
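
For readers who would rather check the bookkeeping than wait, the cash flows in the brainteaser can be tallied as a simple ledger, much as someone with an accounting background might. The Python sketch below is only one way to lay it out; the variable and account names are labels chosen for illustration and are not part of the original puzzle.

```python
# Track every dollar in the shirt puzzle as a plain cash flow.
# Account names are illustrative labels, not part of the original puzzle.

borrowed = {"friend_a": 50, "friend_b": 50}   # cash received from each friend
shirt_price = 97
repaid = {"friend_a": 1, "friend_b": 1}       # change handed straight back
kept = 1                                      # change left in your pocket

total_borrowed = sum(borrowed.values())
change = total_borrowed - shirt_price

# Every borrowed dollar must end up somewhere: shirt, repayment, or pocket.
uses_of_cash = shirt_price + sum(repaid.values()) + kept

# What you still owe after handing $1 back to each friend.
still_owed = sum(borrowed[f] - repaid[f] for f in borrowed)

print("borrowed:    ", total_borrowed)
print("change:      ", change)
print("uses of cash:", uses_of_cash)
print("still owed:  ", still_owed)
print("owed == shirt + kept:     ", still_owed == shirt_price + kept)
print("owed + repaid == borrowed:", still_owed + sum(repaid.values()) == total_borrowed)
```

Laid out this way, the $98 still owed already contains the $1 you kept (it is the $97 shirt plus that $1), so adding the $1 again counts it twice. The figure that pairs with the $98 is the $2 handed back to your friends, which brings the total to the original $100.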
