Sometimes you just hit a dead end. It can happen when you're out driving, and it can happen just as easily in the rapidly changing world of IT.
Clothing retailer Burlington Coat Factory, for example, used Sequent NUMA-Q (Non-Uniform Memory Access) servers for its Oracle database. IBM bought Sequent Computer Systems and decided not to develop new NUMA-Q products.
“The capability of the Sequent systems compares favorably with other servers,” says CIO Mike Prince, “but it’s a dead end for us in terms of being able to upgrade or support new software.”
So, when it came time to migrate, Burlington’s data center moved to Linux clusters on low-end Intel boxes, significantly reducing costs in the process. PolyServe Matrix Server is used to manage the cluster’s file system.
Linux Early Adopter
For three decades Burlington Coat Factory (BCF), based in Burlington, N.J., has built a reputation for providing high-quality clothing for far less than department store prices. Its IT department takes a similar approach in serving around 7,000 computer users.
BCF was an early adopter of Linux for use in the retail environment. The company now has more than 2,000 Linux systems in place, including point-of-sale units in more than 300 stores.
“We have been very aggressive about moving toward Linux, mostly on small servers or combination server/desktops,” says Prince. “The stores all use Linux.”
While other retailers generally choose Windows, Prince believed his staff possessed the necessary skills to support Linux, allowing BCF to plow the money saved into IT upgrades.
At the core of operations is the data center. It runs the whole business, from the loading docks to the legal department to in-store handhelds. Driving all this activity are 40 Oracle 8i databases residing on the NUMA-Q servers.
“We are an Oracle shop in all respects,” says Prince. “We use the database for everything — ERP financials, HR — we even use the Oracle tool set for building new applications.”
Cost or Commodity
Burlington’s move to Oracle 9i prompted an infrastructure upgrade. Rather than purchasing another UNIX mainframe, it opted for low-cost, commodity Intel-based servers running Linux.
“In these challenging times it is more important to cut our costs,” says Prince. “Linux and clustering has overwhelming economic appeal.”
Making this possible was the release of Oracle 9i Real Application Clusters (RAC). 9i RAC shares a single database across all servers in a cluster in such a way that it behaves as a single-instance database. Should one server go down, the rest continue to run. This simplifies administration in a clustered environment, since tasks like backup and restore are performed as if operating a single server.
“Previously, we had to write business logic to address the issue of a node or a link going down,” says Prince. “With RAC, the coordination happens in real time and the cluster development is transparent, so there is nothing the developer or DBA has to do to facilitate the activity as the nodes go down or come up.”
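The transparency Prince describes is driven largely by client-side Oracle Net configuration. As a rough sketch (the alias, host names, and service name below are hypothetical, not BCF's actual settings), a 9i-era tnsnames.ora entry could spread new connections across the RAC nodes and fail running sessions over automatically:

    # Hypothetical Oracle Net alias. LOAD_BALANCE spreads new connections
    # across the RAC nodes; FAILOVER_MODE (Transparent Application Failover)
    # reconnects sessions to a surviving node if one goes down.
    BCFPROD =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (LOAD_BALANCE = ON)
          (FAILOVER = ON)
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac-node1)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac-node2)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = bcfprod)
          (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))
        )
      )

With TYPE = SELECT, even queries in flight when a node fails resume on a surviving node, which matches the hands-off behavior Prince describes.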
To manage the cluster’s file system, Prince selected PolyServe, Inc.’s PolyServe Matrix Server, which gives multiple servers concurrent access to storage area network (SAN) data.
Its Database Option lets databases read and write directly to disk rather than first caching the data in the file-system layer. This delivers the lower memory and CPU overhead of operating on a raw disk partition while retaining the advantages of keeping a database in a file system, such as ease of management and the ability to move files and directories around.
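The mechanism behind that trade-off is direct I/O. Here is a minimal sketch in plain Linux C, assuming a hypothetical data file path; it illustrates the general technique rather than PolyServe's actual interface:

    /* Minimal sketch of direct (unbuffered) disk I/O on Linux -- the general
       technique a database-oriented file system exposes. Not PolyServe's API;
       the file path is hypothetical. */
    #define _GNU_SOURCE            /* exposes the O_DIRECT flag */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* O_DIRECT moves data straight between user buffers and disk,
           skipping the kernel page cache -- the overhead profile of a raw
           partition, but on a named file that can be moved and backed up. */
        int fd = open("/data/demo.dbf", O_RDWR | O_CREAT | O_DIRECT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* Direct I/O requires block-aligned buffers and transfer sizes. */
        void *buf;
        if (posix_memalign(&buf, 4096, 4096) != 0) { close(fd); return 1; }
        memset(buf, 0, 4096);

        if (write(fd, buf, 4096) < 0)
            perror("write");

        free(buf);
        close(fd);
        return 0;
    }

The point of bypassing the page cache is to avoid double-buffering: without O_DIRECT, the same block would be cached once by the database and again by the kernel, costing exactly the extra memory and CPU the Database Option is designed to eliminate.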
While load balancing across commodity servers is widely used to achieve fault tolerance and reliability in external Internet applications, Prince reports that in a conventional IT environment it was not always feasible for a cluster to share the disks of a storage server across UNIX systems, which were often better served by local disks.
“You get inherent fault tolerance with load balancing, but with disk servers it is more difficult to achieve,” he states. “With PolyServe we can do it on the network side.”
The other advantage he sees in running Oracle 9i RAC and PolyServe Matrix Server is that the combination eliminates some of the problems he had detected when running multiple Linux instances on a large-scale server.
“It was difficult to isolate the source of the problem with thousands of things running on the same system,” he relates. “By doing it in a clustered environment we hope to rationalize the load on each box, so we can look at what is running on those boxes and narrow the field, which we couldn’t do on a single system.”
Planning for Disaster
With the ability to run the databases in a clustered environment, BCF is also setting them up to run from a second location for disaster recovery. The company is building a distribution center six miles from its headquarters, and that location will also house a second data center. The two are connected by 24 lines of fiber.
Since the distance between the two locations falls just within the limits of a Fibre Channel connection (roughly 10 kilometers over longwave optics), the servers at both sites will operate as a single cluster rather than having one serve merely as a backup. If one data center goes down, the other will generally have the capacity to fully service the traffic load without impacting employees or customers.
“Only during the holiday shopping season are we operating at more than a small portion of our peak capacity,” says Prince. “The rest of the year, you wouldn’t even notice a loss in service if one data center went down.”
The data is stored in a Hitachi Data Systems Lightning 9960 (20TB), a 9920 (1TB) and nine older 5800 disk arrays (480GB each) at the headquarters, plus a 9980V (up to 70+ TB) and a 9200 at the second data center. The switches are from Brocade Communications Systems, Inc.: SilkWorm 12000s at the core and 3800s at the edge, with Cisco Systems Catalyst 6509s making the optical connection between the data centers.
What about the servers? They will be IBM xSeries machines, though which model remains to be determined. If Oracle releases 9i for Intel's 64-bit Itanium processor soon enough, BCF will use Itanium-based boxes.
“The cost of the components is so low, we are not too concerned about which server we will use for the cluster,” says Prince. “We are not locked into any one box.”
The new system has now been thoroughly tested and certified and is ready for deployment. The cutover will begin after the holiday season, and Prince estimates it will take the better part of 2003 to migrate all the databases. He wants it done as quickly as possible, though, in order to start realizing the cost savings associated with 9i. He estimates the new system will pay for itself within 12 to 18 months.