Main Article Content
The paper provides a generalization of our previous algorithm for the parallel multiplication of a vector by a Kronecker product of matrices. For any p, a factor of the problem size, our algorithm runs on p processors with a minimum number of communication steps and memory space. Specifically, on p processors with global communication, we show that the multiplication requires at least Θ (log(p)) communication steps, assuming that there is no computation redundancy. This complexity is revised according to the underlying topology, and some performance results on the CRAY T3E are given.