Implementation of the 4stack processor in Verilog. The work includes, but is not limited to, a comparison of different design flows and tools, a comparison with other processors, and especially the implementation in synthesizable Verilog. The thesis paper (125k gzipped PostScript) is now available, also as a 360k PDF.
This paper presents implementation details of the 4stack processor architecture, implemented in Verilog as a feasibility study for a diploma thesis. The design flow, synthesis tools, and verification methods are explained. The partitioning into functional units and space-time tradeoff considerations are discussed, as are techniques for the efficient implementation of special arithmetic units.
Special attention has been paid to the implementation of the stack register files and their corresponding spill buffers. Instruction and data caches have been implemented to satisfy the demands of a VLIW architecture. A fast, pipelined signal processing unit (integer multiply and accumulate with rounding) and an equally fast, pipelined floating point unit are described.