You can never go fast enough in the world of Big Data; no one has ever said they don’t need to process data any faster. And while Intel continues to make faster and faster Xeons, data volumes are growing more quickly than Intel’s clock speeds.
Part of the problem is the nature of the processor design. The x86 is a general-purpose processor that has to be all things to all people, so every clock cycle goes partly to work that your Hadoop apps don’t need.
There is growing interest in field-programmable gate array (FPGA) processors as an alternative. The FPGA is a very different animal from x86. Its processing logic is configured by the customer and can be reconfigured in the field. FPGAs contain an array of programmable logic blocks, which the end user assembles like Lego bricks, except that the blocks have to be wired together through logic gates.
The result is that an FPGA can be programmed to do one specific job over and over again without wasting compute cycles on unnecessary processing. It is not meant to be a CPU, because it is not general purpose, and it doesn’t have the math skills of a GPU. But at one simple, repetitive task, like pattern matching, it excels.
But it also has its drawbacks, said Jim McGregor, principal analyst with Tirias Research. “The problem with FPGA is it doesn’t reprogram on the fly. You have to send new code to do a new algorithm. It’s not as flexible as a GPU or CPU, where instructions are executed in a fixed instruction set,” he said.
“The other problem is the toolset to build FPGAs is still ancient. Nvidia has done well with the CUDA language to leverage GPUs. With FPGA it’s still kind of a black art to build an algorithm efficiently,” he added.
He also notes that the more complex you make the system architecture, the more challenging it becomes, and a system combining x86, GPU, and FPGA would be a real stretch. “You have to design it where all these cores can work on the same memory set. [You] have to think about how this thing is designed, how it works, the complexity. A motherboard with three compute units is a huge challenge in heat dissipation and bandwidth,” he said.
The Microsoft Blessing
Microsoft gave the notion of processing massive data sets with FPGAs a big push last year when it published a white paper on Catapult, a network of 1,600 servers that pair Altera FPGAs with Xeon chips to process Bing queries. According to Microsoft, searches performed with the FPGAs were 40 times faster than those done by CPUs alone.
“The way we think about it, there are two components: the app or workload, and an algorithm that produces output. In machine learning, an algorithm like a neural network is that repetitive task. We see the opportunity to have an FPGA do a repetitive task a number of times and optimize for it,” said Jason Waxman, general manager of the Cloud Platforms Group at Intel.
“It doesn’t mean you don’t need a Xeon to run the broader workload. We see them as a companion. You get high performance and general purpose use from Xeon, and the FPGA is there to offload and accelerate it,” he added. “FPGA is not replacing Xeon. It’s giving you a turbo boost for that one component. Even search has multiple tasks.”
Intel has become a bit obsessed with FPGAs of late. It tried to buy Altera, the number two FPGA maker behind Xilinx, and was initially rebuffed, so it struck an alliance with a smaller player, eASIC, to make chips. Then Altera’s largest shareholders, upset that the company had turned down what had been a generous offer from Intel, more or less pushed it back to the bargaining table, and the $16.7 billion deal was struck in June.
Waxman would not discuss Intel’s plans but said FPGA is part of a bigger strategy of using the right processor for the right job. “Think about the algorithm. What do you want to accelerate? In some cases, more specialization may make sense. An algorithm can start off in a CPU or move to a parallel processor like Xeon Phi and then a special purpose processor like an FPGA,” he said.
“In all cases I’m trying to optimize an algorithm. If I reach the point where I identify a specific algorithm that will benefit from hardware acceleration, an FPGA will deliver better efficiency than a GPGPU. If I haven’t identified that algorithm, then parallel processing may be a better approach,” he added.
Unlimited Data Constructs
Microsoft’s Catapult effort was a mix of Xeons and FPGAs, but earlier this year Ryft introduced the Ryft ONE, a dedicated Big Data appliance built around Xilinx FPGAs alone, which it said could chew through data at more than 100 times the speed of a Xeon.
Pat McGarry, vice president of engineering at Ryft, said the inherent weakness of x86, even when clustered, is that its software parallelism still rests on sequential instruction execution, a model dating back 60 to 70 years.
“It means that you are limited by those sequential instruction sets,” he said. “The x86 instructions operate on an order n^2 or order n^4, which means you have to run an instruction up to four times to get a result. FPGA allows you to reduce them to an order n problem space, so you only need to run an instruction once.”
The other problem with sequential hardware is data width: an FPGA’s data constructs can be as wide as you want, whereas x86’s are only 64 bits wide, which limits how big a search you can do in one step. Searching for a long name, x86 can compare only a register’s worth of characters at a time, while an FPGA with effectively unlimited data constructs can match the entire string in one pass.
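To make the width argument concrete, here is a minimal Python sketch of the two styles. It is illustrative only: real FPGA logic is described in a hardware language, not Python, and the pattern and data below are invented for the example. The CPU-style loop can test only a 64-bit register’s worth of the pattern per step, while the FPGA-style function models a comparator as wide as the entire pattern, evaluated once per position.

```python
# Sketch: register-width scanning vs. one pattern-wide compare.
# A 64-bit CPU register holds 8 bytes, so matching a long pattern means
# looping over 8-byte chunks; an FPGA can wire up a comparator as wide
# as the whole pattern and test every byte of it at once.

PATTERN = b"Llanfairpwllgwyngyll"  # a 20-byte "long name" (invented example)

def cpu_style_search(data: bytes, pattern: bytes) -> int:
    """Scan like a 64-bit CPU: compare at most 8 bytes per step."""
    for i in range(len(data) - len(pattern) + 1):
        matched = True
        for j in range(0, len(pattern), 8):        # one register load per chunk
            chunk = pattern[j:j + 8]
            if data[i + j:i + j + len(chunk)] != chunk:
                matched = False
                break
        if matched:
            return i
    return -1

def fpga_style_search(data: bytes, pattern: bytes) -> int:
    """Model a pattern-wide comparator: the whole pattern in one 'cycle'."""
    for i in range(len(data) - len(pattern) + 1):
        if data[i:i + len(pattern)] == pattern:    # one wide compare per position
            return i
    return -1

data = b"somewhere in Wales lies Llanfairpwllgwyngyll, famously"
assert cpu_style_search(data, PATTERN) == fpga_style_search(data, PATTERN) == 24
```

In hardware, that wide compare is a bank of parallel comparators evaluated every clock, which is where the claimed speedups come from.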
All About the Algorithms
In the end, though, the choice of chip for chewing through giant data sets is determined not by the silicon itself but by the algorithms and their relative maturity and stability.
“For Big Data, [usage] will be toward very mature algorithms that are not changing a lot and dedicated to a specific function. One example is the financial market where there are specific calculations,” said McGregor.
“It really depends on the maturity of the code and algorithms,” he added. “If you have something you are running consistently, that will be a good app for it. If it’s processing a financial app or transaction processing or recognition, it will depend on how good the algorithms are.”
Kevin Krewell, another principal analyst at Tirias, said the software side is the real challenge for FPGA. “How do you recognize workloads and feed them off to the right side? That’s where complexity comes in. It requires a lot of human intervention,” he said.
While FPGA modules are reprogrammable to change the task, he notes that you can’t just change one or two modules; you have to reprogram the whole device. That can take hundreds of milliseconds, which may sound ridiculously fast, but in real-time computing that is a long time, he argues.
Krewell also notes that FPGAs are not very good at floating-point math; that is best left to GPUs. FPGAs are better suited to integers and strings. “They are good for a fixed app you can pour a lot of data through,” he said.
Intel’s Waxman said that FPGAs require rewriting code for the processor, and sometimes it’s just not worth it. “It depends on the particular function, but when you do something special purpose, you require people to do a lot more work to port to the software platform, and that becomes prohibitive. The majority of companies we see use general purpose processors, and when they find something to accelerate, they throw that algorithm to a GPGPU or FPGA,” he said.
Tossing a task over to an FPGA means at the very least a recompile, but also rewriting the data model and code. “That’s a lot of work to optimize code for anything special purpose, and it has to show a real benefit to do that kind of work,” he added.
Ryft’s McGarry said he sees three excellent uses for FPGA: fuzzy searching, where you look for exact matches as well as slight mismatches (Google search is a good example, catching errors and misspellings and offering an alternative); term frequency, such as how often something comes up; and image search (a sketch of the fuzzy-search case appears below).
“It turns out image processing is very parallelizable,” said McGarry. “We see a big push for that with FPGA.”
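To make “fuzzy” concrete, here is a short Python sketch, illustrative only, with the function name and inputs made up for the example. It reports every offset where a pattern matches within k character substitutions. On an FPGA, the per-character comparisons and the mismatch count can all happen in parallel within one clock; Python has to do them sequentially.

```python
# Sketch of fuzzy (approximate) matching: find every position where the
# pattern matches the text within k character substitutions. An FPGA maps
# this to a row of per-character comparators summed in parallel; this
# Python model does the same work one comparison at a time.

def fuzzy_find(text: str, pattern: str, k: int) -> list[int]:
    """Return start offsets where text matches pattern with <= k mismatches."""
    hits = []
    for i in range(len(text) - len(pattern) + 1):
        window = text[i:i + len(pattern)]
        mismatches = sum(a != b for a, b in zip(window, pattern))
        if mismatches <= k:
            hits.append(i)
    return hits

# "beleive" is two substitutions away from "believe", so k=2 catches the typo.
print(fuzzy_find("i beleive this is right", "believe", 2))  # -> [2]
```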
Another good use case for FPGA and Big Data is log analysis, often used in finance to catch abnormalities that could be indicators of fraud. McGarry said that is a “natural fit for FPGA” because it is a case of, as Krewell described it, pouring data through the processor.
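A hedged sketch of that pour-the-data-through pattern: the Python below streams once over log lines and counts hits against a fixed set of fraud indicators. The indicator strings and sample lines are invented for the example; an FPGA implementation would test all indicators against the stream in parallel, one window per clock.

```python
# Sketch of one-pass log scanning: stream each line through a fixed set of
# fraud-indicator patterns and tally hits. The indicators and sample log
# lines are hypothetical, chosen only to illustrate the streaming shape.

from collections import Counter

INDICATORS = ("card_declined", "velocity_exceeded", "geo_mismatch")

def scan_log(lines) -> Counter:
    """Stream once over the log, counting hits per fraud indicator."""
    counts = Counter()
    for line in lines:
        for indicator in INDICATORS:   # an FPGA checks these in parallel
            if indicator in line:
                counts[indicator] += 1
    return counts

sample = [
    "2015-08-01 10:02 txn=991 ok",
    "2015-08-01 10:03 txn=992 card_declined",
    "2015-08-01 10:03 txn=993 geo_mismatch card_declined",
]
print(scan_log(sample))  # Counter({'card_declined': 2, 'geo_mismatch': 1})
```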
Where FPGA does not work well is anything with a feedback loop, where the findings of one computation have to be run back against prior data. An FPGA wants to stream data forward and never look back. Very complex mathematics is one example, because it is an inherently sequential process.
“When you are searching, there is no dependency or feedback required. Feedback means sequential constructs. You can’t make those into systolic arrays,” said McGarry. A GPU is better for complex math, since that’s what it’s made for.
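The dependency argument can be seen in a few lines of Python (again a sketch with invented functions, not anyone’s production code). The feed-forward loop below has independent iterations, which hardware can unroll into a pipeline or systolic array; the recurrence needs its own previous output, so no amount of parallel hardware can start step i before step i-1 finishes.

```python
# Sketch of why feedback breaks pipelining. The feed-forward loop has
# independent iterations an FPGA can unroll into a systolic pipeline that
# accepts a new input every clock. The recurrence depends on its own
# previous output, so each step must wait for the last one.

def feed_forward(xs: list[float]) -> list[float]:
    """Independent per-element work: pipeline-friendly (no feedback)."""
    return [3.0 * x + 1.0 for x in xs]      # each result needs only xs[i]

def feedback(xs: list[float]) -> list[float]:
    """A recurrence: each result needs the previous result (feedback)."""
    out, acc = [], 0.0
    for x in xs:
        acc = 0.5 * acc + x                 # depends on the last iteration
        out.append(acc)
    return out

data = [1.0, 2.0, 3.0, 4.0]
print(feed_forward(data))   # evaluation order doesn't matter
print(feedback(data))       # evaluation order is forced
```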
But use is growing, said Waxman. “Use right now is very nascent, but we see substantial interest. We expect that as we make it easier to use, we will see substantial customer interest for broader deployment,” he said.