We propose a Domain-Specific Architecture for elementary function computation to improve throughput while reducing power consumption as a model for more general applications: support fine-grained parallelism by eliminating branches, and eliminate the duplication required by coprocessors by decomposing computation into instructions which fit existing pipelined execution models and standard register files. Our example instruction architecture (ISA) extension supports scalar and vector/SIMD implementations of table-based methods of calculating all common special functions, with the aim of improving throughput by (1) eliminating the need for tables in memory, (2) eliminating all branches for special cases, and (3) reducing the total number of instructions. Two new instructions are required, a table lookup instruction and an extended-precision floating-point multiply-add instruction with special treatment for exceptional inputs. To estimate the performance impact of these instructions, we implemented them in a modified Cell/B.E. SPU simulator and observed an average throughput improvement of 2.5 times for optimized loops mapping single functions over long vectors.