The TU Kaiserslautern team is designing an advanced new hardware accelerator for algebraic machine learning.
The architecture contains multiple bit-array processing cores, which enable multithreaded programming.
The bit-array processing cores have novel instruction sets designed for Algebraic Machine Learning (AML). The accelerator is equipped with High Bandwidth Memory (HBM) DRAM, which provides up to 460 GB/s of memory bandwidth. Each core has its own memory channel and can work independently of the other cores. The proposed architecture is implemented on a Xilinx Alveo U280 Data Center accelerator card. The board offers about one million look-up tables (LUTs) and supports PCI Express 4.0 to leverage the latest server interconnect infrastructure for high-bandwidth host processors.
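To give a feel for what the quoted 460 GB/s means for bit-array workloads, the following back-of-envelope sketch estimates a lower bound on the time needed to stream one bitwise operation through memory. The workload size, the operand and result counts, and the function name are illustrative assumptions, not figures from the project; real throughput will be below the peak.

```python
# Back-of-envelope sketch: all workload figures below are illustrative assumptions.
PEAK_BANDWIDTH_BYTES_PER_S = 460e9  # peak HBM bandwidth quoted in the text


def streaming_time_seconds(num_bits, operands=2, results=1,
                           bandwidth=PEAK_BANDWIDTH_BYTES_PER_S):
    """Lower bound on the time to stream a bitwise operation through memory.

    Assumes every operand array is read once and every result array is
    written once, with no reuse and no overhead.
    """
    bytes_per_array = num_bits / 8
    total_bytes = bytes_per_array * (operands + results)
    return total_bytes / bandwidth


# Hypothetical workload: AND of two bit-arrays of 8 * 10**9 bits (1 GB each).
t = streaming_time_seconds(8 * 10**9)
print(f"at least {t * 1e3:.2f} ms at peak bandwidth")
```

Because the computation itself is a single logic operation per bit, this data-movement bound, rather than the arithmetic, tends to dominate in such a model.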
What's coming next: future work will include Processing-In-Memory
Our experiments have shown that the overall throughput of an AML processing task is bounded by memory bandwidth. In addition, off-chip data access energy plays a major role in system-level energy consumption. Processing-In-Memory (PIM) is therefore a promising approach to address these challenges and bridge the memory-computation gap. PIM places computational logic inside the memory to minimize data movement and exploit massive internal data parallelism. Of particular interest for AML, PIM has shown considerable results for memory-bound applications such as “set processing”. Sparse Crossing could use PIM to compute wide bit-array operations directly inside the DRAM, without moving the data to the processor. In fact, bit-array operations in memory can be made arbitrarily wide by dividing each bit-array among multiple memory banks operated in parallel.
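The idea of dividing a wide bit-array among banks that operate in parallel can be modelled in software. The sketch below is only a functional illustration, not the team's PIM design: it assumes contiguous slicing, uses a thread pool to stand in for the banks, and all names (split_across_banks, bank_and, pim_and) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_across_banks(bits: bytes, num_banks: int) -> list:
    """Divide a bit-array into contiguous, near-equal slices, one per bank."""
    chunk = -(-len(bits) // num_banks)  # ceiling division
    return [bits[i * chunk:(i + 1) * chunk] for i in range(num_banks)]


def bank_and(pair) -> bytes:
    """Bitwise AND performed locally on the two slices held by one bank."""
    a, b = pair
    return bytes(x & y for x, y in zip(a, b))


def pim_and(a: bytes, b: bytes, num_banks: int = 4) -> bytes:
    """AND two equally long bit-arrays, one slice per (modelled) bank."""
    if len(a) != len(b):
        raise ValueError("bit-arrays must have the same length")
    a_slices = split_across_banks(a, num_banks)
    b_slices = split_across_banks(b, num_banks)
    # Each worker models one bank; map() returns results in slice order.
    with ThreadPoolExecutor(max_workers=num_banks) as pool:
        partial = list(pool.map(bank_and, zip(a_slices, b_slices)))
    return b"".join(partial)


a = bytes([0xF0, 0xAA, 0xFF, 0x00, 0x0F])
b = bytes([0xCC, 0x55, 0x0F, 0xFF, 0xFF])
result = pim_and(a, b, num_banks=4)
```

In this model the result is independent of the number of banks. Only the slice handled by each bank shrinks as banks are added, which is what allows the operation width to grow with the bank count.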
By Christian Weis from Technische Universität Kaiserslautern