1. Is this a processor for Forth or Java?
No. It's a general-purpose processor, although it will certainly run Forth and Java, too. The fact that it has stacks doesn't make it friendlier to Forth and Java, since these languages (mostly) use only one stack. They have to be compiled for the concurrent use of several stacks, just like any other language.
The processor is first a VLIW processor, and then a stack processor. I use the stacks as an explicit memory hierarchy, and to split up the register file (which is very important for highly parallelized designs).
2. But stack processors are dead, anyway. They support no inherent parallelism. Even Hennessy and Patterson say so.
This one is different - it supports inherent parallelism. It's not a single-stack processor. It has four stacks, each with an independent operation slot. It also has two load/store units, which operate independently as well. Branches are handled separately, and for finer-grained conditions there are predicates. It is really a VLIW processor with stacks attached to the functional units. The amount of parallelism is on par with the Alpha 21264. The 4stack processor lacks the brainiac instruction scheduling (that is left to the compiler), and due to its shorter memory and FPU latencies it won't run at the same clock speed as the Alpha.
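To make the "four stacks, each with an independent operation slot" idea concrete, here is a minimal toy model. It is only a sketch under loud assumptions: the operation names (`push`, `add`, `mul`, `nop`), the `Machine` class and the bundle layout are invented for illustration and are not the real 4stack instruction set. Load/store units, branches and predicates are left out entirely.

```python
from dataclasses import dataclass, field

NUM_STACKS = 4  # from the text: four stacks, one operation slot each


@dataclass
class Machine:
    # One independent data stack per operation slot.
    stacks: list = field(default_factory=lambda: [[] for _ in range(NUM_STACKS)])

    def step(self, bundle):
        """Execute one instruction bundle; slot i only touches stack i.

        The op names are hypothetical, not the real 4stack opcodes.
        """
        if len(bundle) != NUM_STACKS:
            raise ValueError("a bundle needs exactly one slot per stack")
        for stack, (op, *args) in zip(self.stacks, bundle):
            if op == "nop":
                continue
            elif op == "push":
                stack.append(args[0])
            elif op == "add":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "mul":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            else:
                raise ValueError(f"unknown op {op!r}")


def run(program):
    """Run a list of bundles and return the final contents of all stacks."""
    machine = Machine()
    for bundle in program:
        machine.step(bundle)
    return machine.stacks


# Four independent expressions, finished in three bundles instead of twelve
# single-stack instructions: 1+2, 3*4, 5+6, 7*8.
EXAMPLE = [
    [("push", 1), ("push", 3), ("push", 5), ("push", 7)],
    [("push", 2), ("push", 4), ("push", 6), ("push", 8)],
    [("add",), ("mul",), ("add",), ("mul",)],
]
RESULT = run(EXAMPLE)  # [[3], [12], [11], [56]]
```

Because each slot in this toy model touches only its own stack, the order in which the slots of one bundle are simulated does not matter, which is what lets them run in parallel. Filling those slots with useful work is the compiler's job.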
3. So it's a difficult compile target?
Indeed. The burden of getting fast code is left to the compiler writer. However, the scheduling and register allocation technology needed to get fast code on a run-of-the-mill RISC isn't "easy", either. I wrote a compiler prototype that does those parts of scheduling and stack allocation that are atypical for other targets. It took me one afternoon to do that.
4. Wow! You have a real compiler?
No, since I left out all the important parts that are state of the art but still difficult, like loop unrolling, inlining, constant folding, etc. The output of the compiler prototype also isn't valid assembly code. I'll try to find some spare time to complete this work.
5. I've read that you target the 4stack processor at consumer-style apps. How much work is it to make it a real workstation processor?
Not much. It already contains an MMU, a supervisor/user model, and the like. These parts, however, haven't been fully implemented and tested yet.
6. What's the state?
Currently, I have a simulator, an assembler, all the documentation, synthesizable Verilog, and a prototype compiler.
7. Isn't there any hardware?
I tried to synthesize the whole processor and make a prototype in an ES2 process, but due to the limited computing resources at our institute, I couldn't finish that job before I left the university. Synthesis is a very demanding task, and even in industry, which throws many high-end workstations at it, a single synthesis run on a project of that size can take days. And synthesis is an iterative process - you must check the result and rerun it with other options or changed source.
8. I haven't found the Verilog source on your page, can't I download it?
I give out the Verilog code only on request. I want to keep track of where it goes.
9. Do you have any license ideas? GPL?
The GPL is a software license. I think it's inappropriate for hardware (it would be illegal to ship the hardware, because the end user would be "denied" the right to modify it - it's simply not possible to modify hardware, even when you have the source). The BSD license might be more appropriate, but only for those who want to harvest the fruits I planted.
I am therefore leaning toward a commercial license for the 4stack core - a cheap one for using it in an FPGA, and a much more expensive one for ASICs and custom silicon.
10. How did you create that cool logo?
I used Pov-Ray. You can render it yourself, here are the sources.