Nuvation HEADLINES
Increasingly, Nuvation is seeing next-generation products that require complex algorithms with faster responsiveness, lower parts costs, smaller form factors, lower power consumption, and higher resolution. Feature-rich, flexible, and lightning-fast algorithms are required for product differentiation, competitive advantage, and commercial success. Industries that use compute-intensive applications, such as digital video rendering, pattern recognition, or genomic research, depend on algorithm accuracy and performance. At the same time, companies are racing to reach market first while product life cycles shrink. The following article introduces the concept of “Algorithm Acceleration” from a Nuvation perspective.

Running complex algorithms on high-performance computing platforms limits the product to the capabilities of a general purpose computing architecture. ASIC solutions are increasingly expensive and best suited to high-volume products with long life cycles. This leaves a requirement for flexible yet specialized hardware solutions.

Algorithm Acceleration is the process by which a software algorithm is “accelerated” with specialized hardware. Depending on the algorithm, performance improvements range from 10 to 50 times, and in certain situations reach 1000 times. This is achieved by moving away from a generalized architecture that is, by definition, optimized for nothing, to specialized hardware architectures. These architectures structure hardware and data flows specifically for the algorithm at hand, and can therefore deliver drastic performance gains. Sometimes that acceleration is used to increase the algorithm’s overall data throughput. Other times systems are accelerated to reduce time to solution, reduce power, shrink form factor, or lower parts costs.

Algorithm Acceleration Benefits

1) Reduce Size and Weight

Often the specialized hardware is built on a reconfigurable platform.
This allows for very creative solutions to reduce overall system size and weight. For example, a digital camera may have a red-eye reduction algorithm, but it doesn’t need that algorithm loaded continuously, consuming active memory or logic. The specialized hardware can reconfigure, or swap, different algorithms depending on the needs of the system.

The aerospace industry is also a large consumer of algorithm acceleration. Reconfigurable computing is now emerging as an approach to reduce system size and weight. The cost savings from removing even a few pounds of electronics from a shuttle launch are substantial, and the size and weight saved in the computational portion of a system can be retargeted to other novel and critical functions. Further, reconfigurable hardware provides system redundancy in that boards, chips, and even individual gates can be routed around in the event of failure.

2) Reduce Power

Use of specialized hardware can drastically reduce power and cooling requirements, which in turn reduces system cost, size, and weight. The power reduction comes from efficient use of hardware specifically tailored to the required computation, drastically reducing the energy waste prevalent in general purpose computing. For example, generalized architectures may have to execute many instructions, consuming clock cycles and power, to achieve what can be done in a single clock cycle in accelerated hardware.

3) Reduce Cost

Reducing size and weight can often reduce packaging cost, and increasing computational performance can reduce the number of computation units required and thus reduce cost. Cost is also allocated more efficiently: it goes to hardware that is actually used for computation, whereas general purpose computing must pay for hardware both used and unused in the application at hand. In fact, with large computing platforms, a large component of the cost of operating modern supercomputers is the 10s to 100s of megawatts of power that they require.
One of our current design projects involves replacing a very high-end PC running a security algorithm with specialized embedded hardware at about half the price.

4) Increase Flexibility

The combination of the above makes new applications practical that weren’t previously feasible on general purpose computing platforms. Further, these benefits are achieved while simultaneously increasing system flexibility. Reconfigurable hardware can be reprogrammed in the field, enabling remote bug fixes and in-field upgrades. This creates a new revenue channel for on-demand product upgrades and for maintenance agreements that provide automatic bug fixes.

Underlying Technologies

In general, there are two major classifications of applications, embedded and large scale computing, both of which benefit from Algorithm Acceleration. Embedded applications may involve custom hardware design and usually have physical constraints. Large scale computing typically involves High Performance Computing (HPC) platforms, which increasingly blend multi-processor architectures with FPGAs.

Embedded

A myriad of platforms can be considered depending on the algorithm, system, and market requirements. Primarily these are technologies that combine programmability with a tighter integration of algorithm and underlying hardware architecture to achieve a drastic increase in computational efficiency.

Perhaps the best-known acceleration technology is the FPGA (Field Programmable Gate Array), which can be characterized as a fine-grained reconfigurable computing platform. FPGAs are available from companies such as Altera, Xilinx, and Lattice. Recently FPGAs have entered the new domain of system on a chip (SoC) solutions. This has occurred through a combination of a continued exponential increase in size, a radical decrease in cost per gate, new high-speed serial interfaces that increase overall chip bandwidth, and a diverse set of IP blocks that can be plugged together to rapidly build systems.
Because of their fine-grained nature, FPGAs are a highly flexible palette for hardware engineers to construct accelerated architectures. They are particularly powerful when multiple custom processing blocks need to run in parallel. One downside of FPGAs is that the highly configurable interconnect limits the chips to running, at best, in the low hundreds of megahertz.

To achieve system clock speeds in the gigahertz range, many algorithms are well served by running on DSPs (Digital Signal Processors). These are available from companies such as TI, Freescale, and Analog Devices. Embedded algorithms that are poor candidates for parallel operation, for example because they have many data dependencies, are best served by the raw high-speed processing power of a DSP. This is particularly true for applications the DSP companies have targeted with specialized hard silicon. For example, TI’s DaVinci chip is hard to beat for embedded graphics acceleration. To provide parallel datapaths, many processors are now moving to multi-core architectures, such as the IBM Cell chip.

Often we design architectures with both a DSP and an FPGA, gaining the best of both worlds. Recognizing this trend, companies such as IP Flex, Stretch, Mathstar, Velogix, and CSwitch offer a variety of new hybrid chips. Each of these chips has unique benefits and can open new opportunities for size, power, and processing enhancements.

Some ASICs have been developed to accelerate generalized algorithms in a specific domain. For example, one common approach to graphics acceleration is to use low-cost GPUs (Graphics Processing Units) from companies such as Nvidia and ATI. The MDGRAPE chip is particularly well suited to molecular dynamics calculations. The Ageia PhysX device provides dedicated physics calculations, primarily for the gaming market.

Large Scale Computing

Some algorithms require a custom system built with specific attention to specialized processor, memory, and external interconnect.
However, Algorithm Acceleration is extremely sensitive to potential system bottlenecks. An inappropriately sized cache, a memory bus of the wrong width, or a power limitation can derail the intended acceleration. Many customers choose to purchase off-the-shelf large scale computing platforms, such as those from SRC, the SGI RASC, or the Cray XD1. These systems can be configured with traditional processors integrated with FPGA accelerators. Integrating and moulding algorithms to such platforms requires in-depth knowledge of both traditional computing platforms and FPGA logic design. Nuvation has invested heavily over the past several years to develop domain expertise in this area.

Target Applications

Complex algorithms are showing up in an increasingly diverse set of applications. Today, algorithm acceleration is being used heavily in cryptography, on both sides of the battle. A reconfigurable system can reload its architecture to shift from one encoding technique to another in real time. On the code-breaking side, cryptography often relies on techniques that would take a traditional computer an unacceptable amount of time to crack, but specialized hardware can find radical short-cuts.

Other examples of how algorithm acceleration is being used in current applications include:
Some clients have algorithms running on PCs, in simulations, or as Matlab code, and approach Nuvation to port their algorithms for both acceleration and mass production. Other clients are at the functional requirements stage and are looking to Nuvation to develop the first implementation. Any algorithm can be processed in a variety of different stages with varying sensitivity to intermediate precision. Such trade-offs among algorithm performance, hardware cost, and engineering effort are all factors in the complex decision of platform and approach for algorithm acceleration.

Retargeting complex, high-performance algorithms to dedicated platforms can have a dramatic effect on a technology or product’s feasibility, computational performance, physical size and weight, power and cooling requirements, and cost.

Nuvation’s algorithm acceleration services combine world-class expertise in FPGAs, DSPs, special purpose processors, reconfigurable computing, algorithm development, and embedded systems design. Our technical expertise and experience are intertwined with our project management disciplines and quality assurance to ensure predictable, repeatable success for our clients. Nuvation has accelerated algorithms for the digital video, aerospace, military, consumer, and communication markets.

For more information, or a free consultation, please contact Nuvation at sales@nuvation.com.
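As a rough illustration of the sensitivity to intermediate precision noted above, the sketch below compares a small dot product computed in floating point against the same computation with inputs quantized to a fixed number of fractional bits, as might be done when porting to narrower hardware arithmetic. The coefficient and sample values, the function names, and the chosen bit widths are all assumptions made for illustration; they do not come from any Nuvation project.

```python
def to_fixed(x: float, frac_bits: int) -> int:
    """Quantize x to a signed fixed-point integer with frac_bits fractional bits."""
    return round(x * (1 << frac_bits))


def fixed_dot(a, b, frac_bits: int) -> float:
    """Dot product with both inputs quantized; accumulation is exact integer math."""
    acc = 0
    for x, y in zip(a, b):
        acc += to_fixed(x, frac_bits) * to_fixed(y, frac_bits)
    # Each product carries 2 * frac_bits fractional bits, so rescale once at the end.
    return acc / (1 << (2 * frac_bits))


def float_dot(a, b) -> float:
    """Reference dot product in double-precision floating point."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical filter coefficients and input samples (illustrative values only).
COEFFS = [0.1, 0.2, 0.4, 0.2, 0.1]
SAMPLES = [0.9, -0.35, 0.62, 0.08, -0.77]


def precision_errors(bit_widths=(4, 8, 12, 16)):
    """Absolute error of the fixed-point result at each fractional bit width."""
    reference = float_dot(COEFFS, SAMPLES)
    return {
        bits: abs(fixed_dot(COEFFS, SAMPLES, bits) - reference)
        for bits in bit_widths
    }


if __name__ == "__main__":
    for bits, err in precision_errors().items():
        print(f"{bits:2d} fractional bits -> absolute error {err:.2e}")
```

Narrower words generally cost less logic and power but lose accuracy, which is one form the performance, cost, and effort trade-off can take; where the acceptable error lies depends entirely on the application.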
Copyright © Nuvation Research Corporation 2006. All rights reserved.