GeeForce LLC

We get technology out of the way of doing business

Browsing Posts published by aaron_gee

Most system administrators look at virtual servers such as Amazon’s EC2 or Rackspace’s Cloud Servers as a boon to IT: a way to scale to meet problems quickly, easily, and less expensively. Server virtualization is all the rage, and virtualization has replaced “web 2.0” in every vendor’s lexicon. The advantages of cloud based virtualized servers don’t just apply to your IT department. They also apply to the people who send spam, control botnets, and evade security. Here’s a look at the darker side of virtualized server applications.

Spammy Spam Spam

Late last year Spamhaus added both Amazon’s EC2 server IPs and Slicehost IPs to its block list. The Spamhaus DNSBL is widely used, so users running mail servers on EC2 or Rackspace servers suddenly found their email being refused. This left the EC2 users in quite a conundrum, because email administrators weren’t going to stop using Spamhaus. It was reported that Amazon’s “hands off” approach exacerbated the situation, hurting both EC2 customers and Amazon’s reputation.

A scalable email architecture

To give you an idea of how serious this is: as a test, GeeForce stopped using Spamhaus but retained the excellent NJABL and SpamCop RBLs. In just a few hours we saw mail server CPU loads increase and a sizable increase in the number of spam messages making it into the queue. While they got tagged by SpamAssassin or DSPAM, these emails would never have made it past the inbound SMTP server (i.e. Zone 1) with Spamhaus in place.
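For readers who have not looked under the hood, a DNSBL check is just a DNS lookup: the mail server reverses the octets of the connecting IPv4 address, appends the block list's zone, and treats any answer as "listed". Below is a minimal Python sketch of that idea. The zone name used as a default and the helper names are illustrative assumptions, not our production configuration.

```python
import ipaddress
import socket


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL lookup queries for an IPv4 address."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(client_ip: str, zone: str = "zen.spamhaus.org") -> bool:
    """Return True if the zone answers for this IP (needs network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:
        # NXDOMAIN (or a resolver failure) is treated as "not listed" here.
        return False
```

Because the check happens before the message body is ever accepted, a listed host costs the inbound server one DNS query instead of a full content scan.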

Today most spam is sent by computers that have been compromised and are part of a botnet. The rise of virtualized machines, coupled with the ability to script the start and stop of servers, has given spammers a new tool. Now they can turn up servers with a lot of bandwidth quickly, spam through them, and then take them down. The next time an instance is started it has a new IP address. This makes it harder for RBLs to be effective, with more “collateral damage” as a result.

Sending spam isn’t the only way spammers can use and abuse virtualized servers. Virtualized hosts can run bots to collect email addresses, post spam to blogs, and click on advertisements. Content can be hosted on virtualized servers with lots of bandwidth, and that content can be moved, updated, and changed incredibly quickly thanks to the programming hooks that virtual server providers build in. That means that the phishing or malware site that you block can change its location in a heartbeat. Unlike cheap shared hosting, these websites have the bandwidth to deal with large influxes of traffic.

The same quick provisioning and moving of services also applies to botnet command and control. Virtual servers can be used to control large botnets, or even to build them. Why go to the trouble of compromising machines with low upstream bandwidth when you can rent machines for pennies that can push more bandwidth than most companies have? It’s not difficult to script and turn up several large servers that are only live for a short time, but have enough bandwidth and throughput to wreak havoc on almost any website. MySpace does this for testing; others can do the exact same thing with criminal intent. An extortionist only needs to bring a big company down for 15 minutes to make their point.

A powerful dedicated attack machine

One of the primary aspects that make virtualized servers so attractive to “hackers” is the ability to control large amounts of CPU power and bandwidth relatively cheaply. With the widespread use of pre-paid credit cards, purchasing that power can be anonymous. This provides the Internet attacker with a scalable, scriptable attack machine.

Imagine the power the hacker feels. Dictionary attacks can be launched against multiple targets, vulnerability scanners can process thousands of URLs, and the results can be mailed off site for the attacker to peruse at their convenience. Attackers are no longer limited by meager consumer upstream bandwidth or by the few CPU cycles a compromised machine can spare for an attack. Today’s virtualized servers can be optimized for a specific attack, with most of the machine’s CPU dedicated to that end.

Location obfuscation

Example of using virtual servers to circumvent content monitoring

Our analysis of a recent attack against a web based email program showed that the penetration attempt came through a secured proxy running on a virtualized server.    The server was taken down within an hour of the attack ending. These types of attacks often run through several servers chained together.  Now instead of relying on a number of compromised machines with wildly variable latency between them, attackers can script the provisioning of several machines through several accounts all with low machine-to-machine latency.  Add a couple of compromised consumer hosts in the middle of these chains and it becomes next to impossible to find the perpetrator.

Virtualized servers can also be used to avoid content filters, firewalls, and other related devices.  The ability to easily route through multiple machines gives IT security officers a new problem to worry about.  Traffic to a virtualized server might be legitimate, or it might be a user trying to circumvent the corporate network controls.

This type of proxying and network obfuscation has its legitimate purposes as well. Just as MySpace uses cloud servers to load test their application, there are Asian political activists who use virtual cloud servers to circumvent government censors and spying. Using several virtual cloud servers combined with a CD/DVD based operating system distribution like DSL (Damn Small Linux) provides a layer of security for the activist that is very hard for a government to penetrate.

The unintended consequence

The human imagination is a wonderful thing.  In the coming years we will see virtual servers used in ways that the developers of virtual environments never envisioned. The side effect is that some of those uses will be dishonest and criminal.  For providers of virtual servers the race is about to begin in earnest to secure their servers against malicious use.

The consumers of virtual servers rely on the trustworthiness of the provider. Once a provider earns a reputation for poor security and response, network admins can be expected to block traffic from that provider. If such blocks become commonplace, then the virtual servers allocated by that provider are useless. Just as some email admins had to move their email out of the EC2 cloud, a poor security track record will force users to move their cloud based servers back to their own data centers or to other providers.

Cloud based virtual servers are a double-edged sword. To keep from getting cut, IT professionals will have to modify their tactics and security policies, and update their assumptions. Virtual servers provide attractive pricing for high bandwidth servers that can be “scripted” into existence as needed for almost any application. Unfortunately some of those applications can be malicious, and securing your cloud server is harder than you imagine. Providing good recommendations to your company means understanding the threats as well as the benefits created by the expanding virtual server landscape.

UPDATE 4-17-2010: Since posting this article there have been 6 responses. All of them were spam, and 4 of the 6 came from Amazon EC2 servers. The irony: spammers using bots on EC2 servers to try to post spam to an article about spammers using cloud based servers.

IPCop

A surprising number of small businesses have connected their office networks to the Internet with consumer grade NAT routers. These devices are inexpensive and easy to set up, but they lack features that most businesses should have. They are often left unsecured with default passwords and access levels. More importantly, when something goes wrong they are useless when it comes to helping the network admin (or consultant) diagnose an issue. Luckily there are several open source solutions that provide firewall protection and so much more. One solution that I have found to be useful and extremely flexible is IPCop.

IPCop is a Linux NAT firewall distribution that is built on Linux From Scratch. It has its own easy to use web based interface and, most importantly, a large and well developed set of add-on tools. The current version is 1.4.21, and a new version should be out this year with even more features.

IPCop was designed to be used on “older hardware” or very low powered hardware. There are people running IPCop on original Pentium class machines without issue.  Since you will probably want to take advantage of some of IPCop’s add ons, I highly suggest a more modern machine.  In today’s environment one can build a brand new Atom based system, or get a whitebox, or even an off lease deal from TigerDirect or NewEgg and get a perfect machine to be your office’s router for $300 USD or less. The old Dell that used to be at the front desk that needs a new hard drive, but is under your desk might be a good candidate too.

A smart setup for a small office involves the base IPCop setup plus the url-filter and update accelerator addons. The two addons provide great functionality for businesses. Objectionable or inappropriate content is blocked from all workstations via url-filter, while anti-virus and Windows updates are cached locally by the update accelerator. Together with transparent web proxying, this setup gives businesses with limited bandwidth a bit of a network performance boost. It’s especially effective when large Windows updates have been pushed out. [Take the time to purposely set up one PC to upgrade before the others in the office. This "preloads" the cache so that no other computer has to go to the Internet for OS or AV updates.]

There are lots of other addons to choose from, and IPCop has some great features built in, including the ability to set up secure site to site VPNs. IPCop provides basic QoS settings, traffic graphs, and connection tracking. Setup can be accomplished painlessly in less than 30 minutes (10 minutes if you’ve done some planning), and there are no “default passwords”. I’ve personally had an IPCop machine with more than a year of uptime (UPS mandatory) on a heavily loaded fiber Internet connection.

IPCop plus an old PC or a cheap PC is an excellent, secure, cost effective way to protect a small network.  The capabilities are easily extensible and it’s powerful enough to give some big name commercial security products a run for their money.  IPCop is highly recommended.

A majority of dynamically created websites on the web today are backended* by MySQL. Even though NoSQL solutions like Project Voldemort are all the rage, for the majority of people not doing Facebook type traffic, MySQL is still going to be the backend of choice. In other words, the reports of MySQL’s death are greatly exaggerated, and that means scaling MySQL is still going to be a skill required of web application architects.

Which MySQL Cluster?

Often when you see a web based application whiteboarded, the entire DB backend is referred to as the “SQL cluster”. When you’re dealing with MySQL that could mean many things. There is a high-availability, high-redundancy version of MySQL called “MySQL Cluster”. The non-cluster versions of MySQL can replicate data to multiple SQL servers via a master/slave relationship, and multiple servers set up in this fashion are also often called a MySQL cluster. Do not confuse the two! The official “MySQL Cluster” only supports one storage engine: NDBCLUSTER. So if you developed your application to use MyISAM or InnoDB, you will have to perform some major rewriting or other surgery on your application and data, or you’re out of luck. For many environments that makes the official “MySQL Cluster” a show stopper. If you’re not using data from a current MySQL Cluster, or you haven’t been coding with the NDBCLUSTER engine and its limitations in mind from the get-go, then using the official MySQL Cluster is a no-go. The product is a specialized version of MySQL with its own quirks, and its use must fit the problem you are trying to solve. For this exercise in scaling we’ll use regular MySQL and not the clustered version, because the databases and application code were all designed around InnoDB.

Our MySQL Scaling Goals

Our scaling goals for this project are simple. In our configuration the application has been well thought out: it expects a read only database for read only queries and a write database for everything else, and we want to take advantage of that. We also know from testing that reads are executed seven to ten times more often than writes. The first goal is high availability. We don’t want to change any web server config files or have our application wait for MySQL to time out before switching to another SQL server. Switching away from a slow or down server has to happen automatically. The second goal is higher performance. In our example we have 4 servers available for our backend, plus a monitoring server (build monitoring into your application architecture upfront and save yourself the downtime later). The final goal is the ability to grow to meet demand.
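The read/write split the application expects can be pictured with a small Python sketch. This is only an illustration of the idea, not our application code: the class name, the host names, and the simple "first keyword" test are all assumptions made for the example.

```python
import itertools

# Statements treated as read only in this sketch (an assumption; note that
# something like SELECT ... FOR UPDATE would need to go to the writer).
READ_VERBS = {"SELECT", "SHOW", "DESCRIBE", "EXPLAIN"}


class QueryRouter:
    """Send read only statements to readers, everything else to the writer."""

    def __init__(self, writer, readers):
        self.writer = writer
        self._readers = itertools.cycle(readers)  # simple round robin

    def pick(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in READ_VERBS:
            return next(self._readers)
        return self.writer


router = QueryRouter("db-write", ["db-read-1", "db-read-2"])
print(router.pick("SELECT * FROM users"))       # a reader
print(router.pick("UPDATE users SET name='x'")) # the writer
```

With reads outnumbering writes seven to ten times, almost all traffic lands on the reader pool, which is the part of the cluster that is cheapest to grow.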

A quick note about our goals: if you divorce your application from the database architecture, you won’t have an application that scales or performs very well. In this article we’re looking at a pure backend solution, but what that architecture looks like was dictated by the application itself! In the real world, high performance applications should be able to take advantage of a caching layer provided by something like memcache, and code needs to be designed from the get-go to look at multiple SQL clusters, to separate read queries from writes, and so on. In many cases memcache alone could replace or mitigate the need for more SQL servers. Relying on pure MySQL replication to scale only gets you so far, and there is a point of diminishing returns. Kellan Elliott-McCrea from Flickr brings those points home in his article “using and abusing mysql”.

Getting the right tools

The very first thing we need to decide up front is which build of MySQL we want to use. Do we compile it ourselves? Do we use our vendor’s binaries or the pre-compiled binaries from MySQL, or do we look at one of the MySQL project forks? Here’s my advice: most people in small, low transaction environments should use what their vendor provides or the official MySQL binaries. The releases are well supported and updates are rolled out on a regular basis. When you start needing other capabilities or need to squeeze more performance out of your SQL server, then it’s time to look at the high performance forks. In our case we’ve been very happy with the Percona MySQL builds, especially the ability to use their XtraBackup program. This makes setting up MySQL slave servers easier and much faster, especially with larger data sets and InnoDB tables. (In actual testing, doing a raw mysqldump and setting up a slave server with a 47G database took almost an hour; using XtraBackup the same job took less than 15 minutes on a rather vanilla server.)

MySQL Servers in Waterfall Master/Slave setup

Get, use, and love MMM (Multi-Master Replication Manager for MySQL). It is a collection of scripts that performs automated failover of your MySQL cluster in much the same way as UltraMonkey does for other services. The advantage of MMM is that it is specifically designed for MySQL. It allows you to define servers by their role (writer or reader). With MMM only one node is writeable at a time, which prevents data getting out of sync in large waterfall environments. Reader roles can be balanced across several servers. More importantly, MMM will detect if a server’s replication is running behind and remove it from the pool being queried until that server’s replication catches up. In the real world this is a life saver.
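The "remove a lagging reader until it catches up" behaviour can be sketched in a few lines of Python. This is not MMM's code, just an illustration of the decision it makes. The lag figures are assumed to come from something like the Seconds_Behind_Master column of SHOW SLAVE STATUS (which is NULL when replication is not running), and the 30 second threshold and host names are arbitrary choices for the example.

```python
def healthy_readers(replica_lag, max_lag_seconds=30):
    """Return the hosts that are safe to send read queries to.

    replica_lag maps host name -> seconds behind the master, or None
    when replication is stopped or broken on that host.
    """
    return sorted(
        host
        for host, lag in replica_lag.items()
        if lag is not None and lag <= max_lag_seconds
    )


# db3 has fallen behind and db4 has stopped replicating, so only db2
# stays in the reader pool until the others recover.
print(healthy_readers({"db2": 0, "db3": 120, "db4": None}))
```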

Get the Maatkit tool set and install it on all your MySQL servers. This toolkit should be de rigueur for any MySQL installation that has replication. It is a collection of scripts that allows your DBA to manage MySQL more easily. It has hooks built in for memcache and Postgres as well. Like MMM, it is a project that grew out of Google Code.

The Architecture

We set up the first two MySQL servers in master/master replication mode. Here’s the twist: we will probably want to add more master SQL servers to the cluster later on, so plan for it now. You can add several MySQL servers, fully synced, in a waterfall style configuration. When you create your my.cnf file, set auto_increment_increment to two times the expected number of master servers. So if you expect to only ever have five masters in replication, set auto_increment_increment=10. This allows you to add more servers to the cluster with a minimum of downtime. Never set auto_increment_offset to zero, and never give two servers the same offset (both are common mistakes).
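To see why distinct offsets with a shared increment keep masters from colliding, here is a small Python simulation of the sequence each server would hand out. It models the documented behaviour (values of the form offset + N x increment) rather than calling MySQL, and the two-master layout with an increment of 10 is just the example from the paragraph above.

```python
def generated_ids(offset, increment, count):
    """Simulate the AUTO_INCREMENT values one master hands out."""
    if not 1 <= offset <= increment:
        # Mirrors the advice above: never zero, and the offset should not
        # exceed the increment.
        raise ValueError("offset must be between 1 and increment")
    return [offset + n * increment for n in range(count)]


INCREMENT = 10  # auto_increment_increment, sized for up to five masters
master1 = generated_ids(offset=1, increment=INCREMENT, count=3)
master2 = generated_ids(offset=2, increment=INCREMENT, count=3)

print(master1)  # [1, 11, 21]
print(master2)  # [2, 12, 22]
print(set(master1) & set(master2))  # set(): no collisions
```

Offsets 3 through 10 remain unused, so a third or fourth master can join later without changing the increment on the servers already in production.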

Our decision here was to have two servers in master/master replication, with each master server having its own slave. With a read load seven times the write load, we need to spread those selects across the cluster. This is where MMM really shines. The read load is spread out among all of the machines, while the write load is confined to the master servers alone. The cluster can handle a huge read load and is much faster under load than a single server, satisfying the performance goal. If a server goes down or starts to fall behind in replication, it’s removed from the cluster so it has a chance to catch up. This happens automatically and without intervention, satisfying the high availability part of our goals.

Final MySQL Architecture

We satisfy our scalability goal by planning the architecture to grow upfront.  If we see a spike in read traffic we can add more MySQL slaves on the fly.  If we see the need to spread out write traffic, we can add more master servers.  Proper monitoring and logging provide those statistics.

Since we’ve planned for more masters up front, we don’t have to restart each server. The ability to add a master server on the fly without taking down the entire cluster is what makes MMM and the Percona XtraBackup tool so critical. When we run XtraBackup, it provides the log file name and position as part of its output! That means a single action gives us all the information required to set up and start a slave. We use the MMM scripts to take servers in and out of service and to monitor their status.
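As a rough illustration of how the log file name and position feed into slave setup, the Python sketch below turns a coordinates line into a CHANGE MASTER TO statement. It assumes the coordinates arrive as a whitespace separated "file position" line, as in the xtrabackup_binlog_info file the tool writes. The host name, user name, and sample coordinates are made up for the example, and the password clause is left out on purpose.

```python
def parse_binlog_info(text):
    """Extract (log_file, log_pos) from a 'file<whitespace>position' line."""
    fields = text.split()
    if len(fields) < 2:
        raise ValueError("expected a log file name and a position")
    return fields[0], int(fields[1])


def change_master_sql(host, user, log_file, log_pos):
    """Build the statement that points a new slave at its master."""
    return (
        "CHANGE MASTER TO "
        f"MASTER_HOST='{host}', "
        f"MASTER_USER='{user}', "
        f"MASTER_LOG_FILE='{log_file}', "
        f"MASTER_LOG_POS={log_pos};"
    )


# Hypothetical coordinates and host, for illustration only.
log_file, log_pos = parse_binlog_info("mysql-bin.000003\t57\n")
print(change_master_sql("db1", "repl", log_file, log_pos))
```

After running the generated statement on the restored copy, START SLAVE begins replication from exactly the point where the backup was taken.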

Caveats

The architecture offered here was for a specific problem where we had some good metrics. If the read vs write traffic had been more even, we would have set up the servers in a waterfall configuration. All of the servers were using directly attached storage with SAS drives in RAID 10. The databases were small enough that directly attached storage provided the best redundancy and performance for the cost. Once you start talking BIG databases, you need to look at SAN architectures and ensure those considerations are baked into any design.

*While backended isn’t really a word, it perfectly describes what we’re talking about.  Please feel free to use backended in your next database or application discussion.

Teamwork

Often part of the magic of working in a fast paced environment is the people that you work with. Below is a true story about one of those moments. I hope you have as much fun reading it as we had doing it.

The Scenario

A brand new golf course had been selected to host a PGA tournament.   Since the local ILEC had not released an easement,  the trailer for tournament staff could not be fed by the in ground fiber network yet. To make sure that no deadlines  were missed, a corporate partner had a portable 80 foot tower trucked in to provide full network connectivity back to the primary club/regional network core for these temporary offices.  This same tower was also used to feed the course’s Golf operations which had just started up.  The  tournament was moved to a new location at the last minute but the trailers were left to provide power for the wireless link, and a network back to the primary club for golf operations.  The company decided to move the trailers, but the golf operations staff still needed their corporate network access.

How it started

The corporate partner that provided the tower had the engineer who had helped deploy the tower the first time in town for another meeting.  I call him up and tell him we need to go look at this thing and see when we can get this move scheduled. On the way over a project manager from corporate IT calls me and wants to meet before he goes back home (a 2 hour drive), and we agree to meet in the field at the golf course. At 5:30pm we all meet and stare into the sky.

The golf operations personnel had impressed upon me how much better their job was when they had proper access to their corporate applications. Since the engineer responsible for the tower wasn’t going to be available again for two weeks, we were going to miss the deadline. Everyone on hand agreed that local operations should not be down for any length of time, so it was decided to redeploy the tower that night. The tower is erected behind the tournament trailers and sits in very soft sand. It goes up to 80 feet high and weighs well over 12,000 lbs, so we figure that the project manager’s diesel F250 4×4 would be perfect to move this beast. We were almost right.

First we have to tear down the current trailer/tower, which involved lots of digging, moving of jacks, moving concrete pads, large steel beams with grimy grease, climbing on the tower, bringing the tower down into “travel position”, and using muscles we forgot we had.  Even with 3 guys this was a time consuming process, did I mention it was dark?  You never realize how many things there are to trip on in an empty lot until you do it a few times.

Finally ready to move the tower, we backed the truck up, hooked on, and promptly moved the thing 1 foot before burying the truck up to its axles. We dig the truck out, re-position the tower to help put more weight on the back wheels, and move it another foot or so before burying the truck again. We try packing the pathway, digging a small trail, adding weight over the rear wheels; nothing works. It is now 8pm.

I call the project manager for a nearby construction site and ask if we can use one of the pieces of equipment on the job site to get this done. He makes a few suggestions as to which equipment might work best and where we might find keys. This involves two of us climbing all over equipment looking for keys. Since we don’t have flashlights, we’re using the screens of our BlackBerrys as lights. This attracted the attention of the local security folks, who, while professional, made sure that we weren’t up to no good. Needless to say we had some “’splainin to do”.

Finally we are able to get a lull (4 wheel drive Combination crane and fork lift) started and drive it over to the tower.  Did I mention it was dark, and the lull has no lights?  Driving heavy equipment by Braille at night is always entertainment! Since the fork lift has no tow hooks we have to hook a chain around the front of the lift to the front tow hooks of the Ford.  The combination of the lull, plus the F250 4×4 is now able to move the tower to its new home, 75 feet away.  Now all we have to do is set it all up again, re-point the antenna to something that is only 12 inches square and more than 2 miles away – at night working under the headlights of the truck.  Easy…..

Luckily the corporate IT guy has every tool imaginable, including crimpers, screwdrivers, pliers, mastic tape, zip ties, gel filled Scotchloks, punch down tools, WD-40, gear oil, and most importantly a 5 pound hammer. We got to work re-stabilizing the tower, getting the concrete pads moved, getting it level, and tying it in. Breaking it down was a piece of cake compared to the work it took to get it set up again and level. Thankfully we are able to put that 5 pound hammer to good use. We had to perform an emergency repair on the feeder wire that usually hangs 60 feet in the air. When we were done with the repair it had more splints and emergency wrap than a boy scout going for his first aid badge. We take an educated guess at the proper antenna alignment and then we hoist the whole assembly 80 feet into the air.

The Moment of Truth

We plug the golf course operation’s computer in and amazingly it works the first time.  Even better, we were seeing speeds that are easily 10 times what other remote offices were able to achieve when on the corporate WAN.

Consulting

Providing superior service takes more than just raw technical knowledge. It involves work ethic, knowledge, drive, people skills, and often a good dose of creativity. All of these are traits that typify GeeForce consultants.

No job is complete until the documentation is done. That is just as relevant in the corporate world as it is in consulting (and often just as ignored by some **ahem**). The de facto standard for producing network documentation has been Microsoft’s Visio. This program is part of Microsoft’s Office suite and lots of vendors make network templates for Visio. Until recently there wasn’t a good alternative. Then came Gliffy.

Gliffy is a cloud based alternative to Visio. The web based program is easy to use, comes with good templates, and best of all the free version is good enough to get you started. It’s user friendly and the API is extensible to other web apps. There are plugins for WordPress, Jira, and Confluence available now using the Gliffy API. Even though Gliffy considers their application beta software, it’s better than a lot of companies’ released products.

I’ve used Gliffy drawings at GeeForce.net, for clients, and to document my own networks.  It’s not a total replacement for Visio but for a majority of the documentation that I do Gliffy fits the bill.  Visio offers more templates and when one is planning and laying out a rack environment it’s almost impossible to beat Visio, especially when the vendors provide Visio templates for their equipment. Other than that niche, Gliffy is replacing Visio for producing network documentation, flowcharts, and basic room layouts (floor plans and data center space planning).

Bottom line, Gliffy is a great program that is easy to use and exports documents in either png format or as a Visio file.  You can use it just as easily for a quick “back of napkin” drawing to throw into an email or for a full blown Enterprise wide planning document.  The only thing missing is vendor templates and that is minor, more of a nice to have, not a must have feature.  For five dollars a month you get the pro version and it’s worth every penny.

According to a study done last year by Forrester Research, nearly half of large enterprises are “evaluating alternative options for managing and providing email”. Why? It’s relatively easy to build a highly available, highly redundant email system that can support tens or hundreds of thousands of users with free software. The answer to the “why” is a bit complex and different for every company, but the leading cause of email headaches is poor architecture. Most corporate email systems evolved from a single box. In a lot of SMEs there is only “the mail server”. That mindset, coupled with proprietary software, has led a lot of companies down an unsustainable email path.

A lot of email problems simply go away if the system architecture has been well designed.  The architecture that we lay out here took into consideration ease of email management, high availability, storage growth, data retention, and retrieval.  It is based on open source software, but the ideas and architecture can be applied to proprietary solutions with some modifications.

The analysis of this email problem started by breaking out each action of a typical email transaction (delivery, management, and retrieval) into very specific tasks and then, based on our requirements, deciding where those tasks belong. We try to push task intelligence to the parts of this clustered design where it makes the most sense and provides the most benefit. The key here was to never create a single point of failure and to architect the design so that each task can be scaled separately from the other tasks. That way, adding another layer of spam protection doesn’t require a total redesign.

Our solution creates 4 zones:

  1. Inbound Zone (SMTP servers facing the Internet)
  2. Storage Zone (Mail delivery and SAN)
  3. Client Zone (Webmail & IMAP servers for client access and outbound SMTP servers)
  4. Business Intelligence Zone (Archival, Tiered Storage Decisions, Company Wide Searches)

Common Data Between Zones

Some elements of your email infrastructure, such as valid usernames, need to be understood across all zones, while other information, such as passwords or mailbox locations, only needs to be known by some of the zones. The user information can be stored in a SQL or LDAP server and replicated to each zone. The data stored in SQL or LDAP can also be used for other applications not related to mail, such as user authentication, instant messaging, and billing. In some enterprises this requires the user SQL/LDAP layer to be pulled out into its own environment; in others it requires a hybrid LDAP/SQL solution. In our sample architecture the system in question relied on MySQL, and replication was used to give each machine a local SQL store.

Zone 1 : Inbound

Inbound mail servers are defined in a domain’s DNS, and it’s simple to delegate multiple inbound servers. In the classic single box solution, there is only one inbound server. The single server has to handle all inbound connections, all filtering, the mail store, and client connections. When the single server is flooded with traffic, that traffic eats up resources and ruins the end user’s email experience. In the properly architected solution the load of incoming traffic is spread out among multiple servers that can be geographically diverse.

The inbound servers are also the first line of defense against unwanted mail. The ideal is to prevent all suspect mail from ever making it into the mail infrastructure. Why waste the end user’s time, CPU cycles, or mail storage on spam or virus emails? In this configuration the inbound servers protect the mail store from unnecessary email traffic. After processing the accepted mail, the inbound servers hand it off to the mail store over a private network, delivering messages via QMQP or SMTP. This adds another layer of protection: those connections can be throttled by the mail delivery servers to protect the mail store, which lets the Zone 1 servers act as a buffer during extreme traffic conditions.

Zone 1 features:

  • Inbound servers have their own mail queue so that they can store mail if Zone 2 goes offline for any reason
  • Inbound servers decide whether to accept a connection using real-time blacklists (RBLs)
  • Inbound servers decide whether to accept mail for a user during the SMTP transaction (don’t accept mail that has to be bounced later)
  • Inbound servers handle spam and virus tagging before handing messages to Zone 2
  • Virus and spam analysis can be offloaded to other servers if the load on the inbound servers is too high, so capacity is added simply by adding more machines (virtual or otherwise) to the zone.
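
The two accept/reject decisions in the list above can be sketched in a few lines of Python. This is only an illustration, not the configuration used in the system described here: the zone name dnsbl.example.org, the function names, and the reply strings are assumptions, and a set of already-listed names stands in for the real DNS lookup so the sketch runs without a network.

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name an RBL lookup queries for an IPv4 client.

    RBLs are queried by reversing the octets of the client address and
    appending the list's zone. The zone is supplied by the caller.
    """
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def smtp_decision(client_ip, recipient, valid_users, listed_names,
                  zone="dnsbl.example.org"):
    """Return an illustrative SMTP reply for one inbound delivery attempt.

    listed_names is a stand-in for DNS: the set of query names that would
    resolve (meaning the client is listed). valid_users is the replicated
    set of known recipient addresses, stored in lowercase.
    """
    # Decision 1: refuse listed clients before any mail is accepted.
    if dnsbl_query_name(client_ip, zone) in listed_names:
        return "554 connection refused: client is listed"
    # Decision 2: reject unknown recipients during the SMTP transaction,
    # so nothing has to be bounced later.
    if recipient.lower() not in valid_users:
        return "550 no such user"
    return "250 ok"
```

Because both checks happen before the message body is accepted, a rejected message never reaches the Zone 1 queue or the mail store.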

Zone 2: Storage

The mail store consists of two parts: the delivery machines and the storage area network (SAN). The delivery machines receive email from Zone 1 and store it on the SAN, following any user-specific delivery rules. Unlike in other systems, mail sorting is done during delivery. This reduces the number of times a message “moves” around on the file system and requires less handling. Both front ends mount the same SAN share using a distributed file system (GFS2).

In our system the delivery machines were also the master SQL servers, running master/master replication with each other and master/slave replication out to the other zones. All user updates, adds, and deletes are managed via a web interface attached to the SQL servers in Zone 2. All of the Zone 1 machines were pointed to a single IP, and the two delivery machines ran in high availability mode with load balancing.

Zone 2 features:

  • Storage growth is handled by the SAN and the choice of file system. Simply add more storage and then grow the file system.
  • Tiered Storage can be provided by multiple SANs.  A high performance SAN for recent email and a slower but larger SAN for archival purposes.
  • Delivery rules are stored and executed during the first delivery.
  • Delivery can be scaled by adding front ends to either a common distributed backend storage or multiple common backends.
  • The SAN is fully mirrored. Should the primary SAN fail, the backup SAN comes online automatically. File system mirroring is handled at the SAN level.
  • Since each client’s mail store location is kept in a SQL server, migration from one SAN to another can be done “online” with no downtime.
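
The last point can be sketched as follows. This is a minimal illustration, not the schema used in the system above: the table and column names are hypothetical, sqlite3 stands in for the MySQL servers so the example is self-contained, and the copy step is passed in as a function because how the mailbox is copied between SANs is outside the scope of the sketch.

```python
import sqlite3


def make_directory():
    """Create an in-memory stand-in for the user SQL store (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE mailbox (username TEXT PRIMARY KEY, store_path TEXT NOT NULL)"
    )
    return db


def store_path(db, username):
    """Look up where a user's mail lives; delivery and client zones would do this."""
    row = db.execute(
        "SELECT store_path FROM mailbox WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


def migrate(db, username, new_path, copy_mailbox):
    """Copy a mailbox to a new location, then repoint the directory entry.

    The directory is only updated after the copy succeeds, so lookups keep
    returning the old location until the new one is ready.
    """
    old_path = store_path(db, username)
    if old_path is None:
        raise KeyError(username)
    copy_mailbox(old_path, new_path)
    with db:  # commit the pointer change as one transaction
        db.execute(
            "UPDATE mailbox SET store_path = ? WHERE username = ?",
            (new_path, username),
        )
    return old_path
```

A real migration would also have to account for mail delivered while the copy is running; the sketch only shows why keeping the location in SQL makes the switch a single row update.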

Distributed Architecture

Zone 3: Clients

Zone 3 is the end user zone. This zone takes care of webmail, SMTP relaying (outbound), and IMAP clients (Outlook and smart phones). In our configuration there are two machines that mount the same SAN and run three services: IMAP, HTTPS, and SMTP. The two servers run in load balancing/high availability mode. In this case the client traffic combined with the webmail load was light enough to run all of the client services on the same machines. Each client service can easily be moved to its own server, providing scalability. This zone deals entirely with internal client requests. If a client receives, checks, or sends an email, regardless of device (laptop, phone, etc.), it goes through this zone.

Zone 4: Business Intelligence

This zone mounts the same SAN and handles things like auto archiving, indexing of emails for better IMAP performance, and other functions that touch your email but whose primary function ISN’T email. Email management tools live in this zone (web based in this case). The advantage of a dedicated business intelligence zone is that it provides application-specific functionality and connectivity without adding to the performance requirements of any one area of typical email transactions.

Good uses of Zone 4 include document management software that indexes company-wide emails. This type of indexing becomes invaluable when discovery orders are issued or an executive leaves under dubious circumstances. Custom reporting on email usage and quotas, organized across corporate divisions, enables IT to make rational choices about where resources will be best spent. This zone is also where programs designed to automate tiered storage and auto archiving decisions need to go.

Having one place to write and execute that intelligence gives an enterprise the flexibility it needs when addressing email-specific issues AND does so in a way that minimally impacts email. A perfect example of what happens when you build that intelligence into the wrong place would be an auto archive program that a certain hypothetical email admin might install for their enterprise. The auto archiving is too aggressive in its endeavor to archive everything older than (x) days (the default setting), leading to a huge slowdown in the enterprise’s email delivery. The helpdesk phones won’t stop ringing, and one can expect the fainter of heart support staff to be reduced to quivering piles of jello in a cubicle. In the enterprise, clients get cranky when the email doesn’t work. When things finally get caught up, the legal staff shows up on the admin’s doorstep with pitchforks and torches. Not good.

Some system architects or vendors want tiered storage or auto archiving to live on the primary mail store or in storage. The issue is that neither of those areas has the native intelligence to understand how users use email, or how they are required to access it. It is hard to tell your SAN which users’ email folders need to be faster (for example, those of the CEO who refuses to archive and calls when searches take more than 5 seconds), or to have your mail server define which email documents are connected to a legal case. Business intelligence isn’t an oxymoron until your SAN decides which email is archived for you.
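
To make the point concrete, here is a minimal sketch of the kind of archive selection a Zone 4 job could run. Everything in it is an assumption made for illustration: the Message fields, the per-user overrides, the idea that messages tied to a legal case are flagged and left alone, and the batch limit that keeps one run from flooding the mail store.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Message:
    owner: str
    received: date
    legal_hold: bool = False  # hypothetical flag set by a legal/discovery tool


def select_for_archive(messages, today, default_days, per_user_days, batch_limit):
    """Pick the messages one archive run should move, oldest first.

    per_user_days maps a username to an age threshold in days, or to None
    to exempt that user entirely. Users not in the map use default_days.
    batch_limit caps how many messages a single run may select.
    """
    chosen = []
    for msg in sorted(messages, key=lambda m: m.received):
        if len(chosen) >= batch_limit:
            break
        if msg.legal_hold:
            continue  # never auto-archive mail flagged for a legal case
        days = per_user_days.get(msg.owner, default_days)
        if days is None:
            continue  # this user is exempt from auto archiving
        if today - msg.received > timedelta(days=days):
            chosen.append(msg)
    return chosen
```

None of these rules is something a SAN or a delivery server could know by itself, which is the argument for keeping the decision in its own zone.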

Design your business intelligence where it belongs, where you can react quickly without impacting the primary function of your email system, which is to deliver mail. When you tie it all together you have a low-maintenance, highly scalable email solution that a Fortune 100 company would be proud of. All it took was a little bit of up-front thought to design the proper architecture.

Yesterday I had the pleasure of being one of four speakers at the network storage event sponsored by the CFITS (Central Florida Information Technology Society). All four presentations have been put together in a single PowerPoint that is posted on the CFITS website. I’ve included a Flash version of just the GeeForce slides here. The great thing about events put on by CFITS is that they attract some really bright people and top tier vendors. While all of the presentations were good, two stuck out in my mind.

Xiotech gets storage

I enjoyed the Xiotech presentation by Peter Selin, which followed mine. His emphasis on true Total Cost of Ownership (TCO) calculations and on understanding how applications use storage dovetailed very nicely with points I had made earlier. Xiotech’s presentation went a step further and covered application tuning and how it affects storage performance. A good part of the presentation was “SSD facts or fiction”. There was an enlightening graph of SSD (Solid State Drive) sustained IOPS vs. time. This was nothing new for those of us with SSD server experience, but an eye opener for a lot of people in the room.

If you’re unfamiliar with Xiotech’s concept, then now might be the time to explain what they do. Xiotech looks at storage like a “black box”. It doesn’t matter what’s in the box; what matters is the capacity, throughput, and reliability of the data storage. Their solutions utilize Fibre Channel and provide the foundation for a high performance SAN (Storage Area Network). One of the most distinctive aspects is that the end user no longer worries about individual drives and data redundancy. Data redundancy is taken care of by “the box”. To add more capacity, add another box. This moves intelligence out of controllers and applications into storage where it belongs. Just like intelligent networks, having the right intelligence in the right place makes a lot of sense.

I haven’t had a chance to test or use the products, but their architecture deserves a very close look when high performance storage is called for. The company is talking about Fibre Channel over Ethernet in the future, and I hope that they look at ATA over Ethernet (AoE) as well.

Cisco goes after the datacenter

Cisco’s presentations always pique my interest.  This is a company that spends a lot of time figuring out how to produce a better mouse trap (or buying the company that has) and it shows.

Network and Storage: Cisco’s approach is a continuation of the approach that it helped pioneer, network convergence. Yesterday’s convergence of voice, video, and data over IP is passé; Cisco is now converging the SAN and LAN (Local Area Network) into a unified fabric. With Fibre Channel over Ethernet the same network is used for SAN and LAN connectivity, simplifying cabling and switches. With its Fibre Channel/Ethernet modules for Nexus class switches, Cisco is providing a bridge between the current SAN and LAN. With 10GE (10 Gigabit Ethernet) networks already here and 40GE just around the corner, the writing is on the wall. Eventually all LAN and SAN traffic will be carried on the same network. Robert Metcalfe’s invention lives eternal.

Servers and Virtualization: This part of Cisco’s offering is where we see radical innovation. Cisco doesn’t have a history of building servers, so their approach is a clean sheet and unlike what I’ve seen from other vendors. What Cisco did was look at large virtualized environments holistically, not just focusing on server, storage, or network individually. Cisco has tried to converge and unify many components of a large virtualized environment and build management into the entire environment from the get go. They call their approach the Unified Computing System, or UCS.

The UCS structure combines a unified (or should we say converged?) 10GE network fabric with unique super high memory blade servers that can support up to 384 GB of DDR3. The management of the entire structure is built in. Cisco provides a virtualized switch within each blade, and each virtualized server can be centrally managed in its entirety. Moving a virtual instance from one blade to another becomes simpler because the network moves with the instance and doesn’t require reprogramming the switch. Cisco’s approach will change the entire management experience for large virtualized environments.

Both presentations have given me a great excuse to dive deep into the vendors’ technology and its applications. Cisco is showing off their C-Series servers in Orlando on March 9th (register for that event here), and there is some good reading material over at Xiotech. Keeping up with new technology and its application is one of the things I enjoy most about my job.

For a lot of places in the rural US, the only game in town for WAN connectivity is the local telephone company, known in the business as the ILEC (Incumbent Local Exchange Carrier). Even if you didn’t get your Internet, ATM, or Frame network from that carrier, the local loop (the piece between you and the telco central office) always went through the ILEC. For businesses in a rural market that means that many types of service are priced much higher than the same product in another market where there is competition.

My client was a local ISP that was planning an expansion. The conventional wisdom was to build a hub and spoke network using the ILEC’s frame relay resources (this was before metro Ethernet, MPLS, and other services were available). This would carry the ISP’s customers’ Internet traffic back to their network core, where it would then get to the Internet.

Telco Network Design

Proposed Telco Hub & Spoke Network

This network design originated with the local ILEC sales and engineering team. They were pushing hard to lock up their customer in a multi-year deal. They pitched this network design as something that could grow with the client as the market grew, and if the client signed a three-year contract the ILEC would waive installation charges. The same type of network was also pitched to the client by the CLEC (Competitive Local Exchange Carrier) where they were colocating their equipment.

The business side of the company just wanted to pit both providers against one another and see which one came out on top in price. I approached the problem from a different route. What’s the best network for the type of traffic we were expecting? That analysis was pretty simple: being an ISP, the client had customers who wanted raw Internet access. The traffic coming from the users to the network core was primarily email. There was no VoIP traffic, video traffic, or VPN requirement.

The next step was to analyze traffic usage against the proposed network. Once I had those requirements it was very clear that a hub and spoke network didn’t make sense. Why bring data back to the most rural point in the network, which had the least redundancy? Why add an extra hop (or two) for customer traffic to get to the Internet when tier one providers sat right next to the client’s equipment in the other CLEC data centers?

Final Network Design

My proposed network design was to have each location directly connected to the Internet. This gave the client’s userbase the most direct connection to a tier one Internet provider and didn’t bring every region down if one location had a problem. Any traffic that was destined for the client’s core came across the Internet. By leveraging the Internet providers and the CLEC I was able to provide four times more bandwidth to each location than I could get guaranteed with the hub and spoke network, at a lower cost. By challenging the conventional wisdom we were able to provide a more robust network for the client that was also less expensive and easier to expand.

One of the great frustrations for road warriors, students, and travelers of all types is the need to get online while on the road without WiFi. Usually this involves buying a separate USB-based HSDPA modem (i.e., a cellular modem) or tethering your smart phone to your laptop (that’s if your smart phone and cellular provider allow it).

That was until JoikuSpot.

JoikuSpot is an application for Nokia Symbian or Linux-based smart phones that turns your device into a mini hotspot. This application is a paradigm changer. There is even a free “light” version that only allows access to http/https content, with a forced landing page. That’s fine for the occasional user, but most users will want the full-blown version. Without wires or extra devices you’re able to tap into your phone’s connectivity (3G for me) and get online in record time.

Verizon has attempted to solve the same problem with their MiFi solution, but why do I need to purchase and carry around another device? With my current service provider (AT&T) I was able to use my cell phone while being online at the same time. With prices coming down on unlocked E63 and E71 devices and good reviews coming in on the N900, the combination of JoikuSpot and a Nokia phone will be a must-have for hard core road warriors in the know. If Nokia were smart they would bundle the application with their phones.

JoikuSpot isn’t just handy – it’s the holy grail for staying connected on the go.

One of the most exciting times for any company is new construction. With so many systems network aware, ensuring that your project’s IT needs are being properly looked at is more critical now than ever before. Today a building or office’s network may carry traffic for voice, video, data, security, automation, power control, and HVAC (Heating, Ventilation & Air Conditioning). Unfortunately your architect(s) may not be aware of what that integration involves, and you will end up paying for that ignorance.

IT is very often an afterthought in the design process, squeezed in after a space has been laid out and all other systems have been added. This leads to added costs, network compromises, change orders, and future problems due to poor planning over a building’s life cycle. Most of that can be avoided with careful upfront planning and coordination. That means more than having your IT people talk to your architect; it means having people represent you who have construction and building systems knowledge. Just as important is an inspection regimen during the construction process to ensure that the infrastructure is going in correctly. A well planned design will still cost you time and money if it isn’t installed correctly.

While a blog post is too short to cover every detail, below are some of the most common mistakes that we see from architects.

  • The primary IT room is often placed in the farthest corner of a building when it should be placed as close to the center as possible.
  • IT rooms are often underpowered and don’t have AC. IT rooms need an abundance of power and HVAC. For most larger structures a redundant AC system should be designed in.
  • Architects often design runs between rooms that are too long for common Ethernet. If network resources are going to be farther than 85 meters away, then intermediate network closets (IDFs) need to be added to the design.
  • Rooftop spaces and voids are usually ignored. These spaces are ideal for wireless deployments, and planning the space for future uses will provide an owner with maximum flexibility without a lot of expense.
  • Rooftops have no provision for IT needs. Ensure your architect designs in penetration points with pathways back down to an IDF, MDF, or telco closet.
  • Architects often don’t know what systems have network capability.  All systems should be reviewed for network awareness and provisions made for connectivity even if the network controls are not planned to be used. (Has your IT guy talked to your security vendor, your local service providers,  the engineers responsible for designing your building’s HVAC systems?)
  • Architects and contractors often only include duct or pathways for what is on the drawing or required by a local service provider.  Always include 2 spare ducts from the curb to your building’s IT room or service entrance.  Always put in a spare duct with pull string between MDF & IDF locations.
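
The cable run check in the list above is easy to automate once the planned run lengths have been taken off the drawings. The sketch below is only an illustration: the room names and lengths are made up, and the 85 meter figure is the planning limit used in this post (the usual twisted-pair channel limit is 100 meters, so 85 leaves margin for patch cords and routing).

```python
MAX_RUN_METERS = 85  # planning limit from this post, not a standards value


def runs_needing_idf(run_lengths_m, limit_m=MAX_RUN_METERS):
    """Return the names of cable runs longer than the planning limit.

    run_lengths_m maps a location name to the measured cable path length
    in meters from the nearest network closet. Any location returned here
    needs an intermediate closet (IDF) added closer to it.
    """
    return sorted(
        name for name, length in run_lengths_m.items() if length > limit_m
    )


# Hypothetical lengths taken off a floor plan.
planned_runs = {
    "lobby": 40,
    "conference room": 85,
    "loading dock": 120,
    "rooftop access": 97,
}
too_long = runs_needing_idf(planned_runs)
```

Measure the actual cable path, including vertical rises and detours around obstructions, rather than the straight-line distance between rooms.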

These are just a few of the suggestions that will save your company time and money when it comes to new construction.  These common sense suggestions apply to almost every type of structure from an office building to a luxury hotel. Don’t forget that a lot of these suggestions apply to retrofits and building out a space as well.  Good project management and oversight during the design process right through to construction and building occupation will save you money up front and in the future.