TMS always strives to have the best cost-effective hardware in the DSP arena. It all comes down to the best DSP chip available. Traditionally, TMS designs special DSP chips to fill the needs of our customers. For over two years, TMS has been developing our next-generation DSP chip, and it is finally here. The TM-100 runs at a top speed of 100 GFLOPS in real-world applications. The TM-100 is software compatible with the previous generation of DSP products, so it runs user application programs written for the XP-30 and older cards. The first PCI Express DSP accelerator card with the TM-100 is the XP-100, available now.
The TM-100 is the latest-generation DSP chip from Texas Memory Systems. Architecturally, it is very similar to the previous-generation TM-44 chip, except faster, wider, and deeper. It has a 333-MHz processor clock with a 667-MHz external memory clock. It has a dual-core architecture that again doubles the processing power compared to the TM-44. Finally, it has twice as many floating-point processing units per core as the TM-44. These additional processing units allow a 256-point radix FFT to be processed in one pass instead of two. Combined, these improvements provide a 10-12x application performance boost over the TM-44-based chip. With the dual-core architecture, the XP-100 (one chip) looks like the XP-30 board (two chips).
As with all DSP chips, chip bandwidth is a very important parameter. The TM-100 has two front-side buses (I/O) and eight back-side buses (Mem) that each run at 5 GB/s. These buses have been designed to complement the processing power of the TM-100, not to limit it.
The TM-100 has three kinds of buses: eight back-side memory buses, two front-side DMA buses, and one command bus.
The TM-100 chip has two front-side I/O buses. The front-side buses are responsible for loading each core’s program memory and for accessing each of the eight back-side RLDRAM memory banks. The front-side command bus issues control to both cores, starting DSP programs and DMAs.

The following tables describe all the TM-100 signals. All signal levels conform to the SSTL 1.8 V JEDEC standard.
The TM-100 fully utilizes high-bandwidth I/O. There are eight RLDRAM memory banks, four for each core. Each RLDRAM memory bank is independently connected with a 64-bit-wide bus. Each bus runs double data rate at 333 MHz, for a total throughput of 5 GB/sec per RLDRAM bus. Each core can sustain all four memory banks running at full speed, for a total of 20 GB/sec per core. Each chip can sustain both cores running at full speed, for a total of 40 GB/sec per chip.
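The bandwidth figures above follow from the bus width, the clock rate, and the double-data-rate signalling. The short sketch below reproduces that arithmetic; the constant and function names are illustrative and not part of any TM-100 tool. It assumes 1 GB = 10^9 bytes and counts only the 64 data bits, not the ECC bits. The quoted 5, 20, and 40 GB/sec figures appear to be these results rounded down.

```python
# Back-side memory bandwidth, computed from the figures quoted in the text.
# All names here are illustrative; 1 GB is taken as 10**9 bytes.
BUS_WIDTH_BITS = 64        # data bits per RLDRAM bus (ECC bits excluded)
CLOCK_HZ = 333e6           # core / memory clock
TRANSFERS_PER_CLOCK = 2    # double data rate
BANKS_PER_CORE = 4
CORES_PER_CHIP = 2


def bus_bandwidth_gb_per_s():
    """Throughput of one RLDRAM bus in GB/sec."""
    bytes_per_transfer = BUS_WIDTH_BITS // 8
    return bytes_per_transfer * TRANSFERS_PER_CLOCK * CLOCK_HZ / 1e9


per_bus = bus_bandwidth_gb_per_s()
per_core = per_bus * BANKS_PER_CORE
per_chip = per_core * CORES_PER_CHIP

print(f"per bus:  {per_bus:.2f} GB/sec")
print(f"per core: {per_core:.2f} GB/sec")
print(f"per chip: {per_chip:.2f} GB/sec")
```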
The back-side buses connect directly to RLDRAM. Each of the eight buses uses 72 bits (64 data bits and 8 ECC bits) to move data, plus 31 control lines. The back-side buses run synchronously with the core clock, which is specified to run at 333 MHz.
Node Memory Bank Pins |
Signal Name | Dir | # | Description |
MEM[u:v][3:0]_ADDR[10:0] | O | 11 | Connect to RLDRAM address bus |
MEM[u:v][3:0]_BA[2:0] | O | 3 | Connect to RLDRAM bank address |
MEM[u:v][3:0]_CS_n | O | 1 | Connect to RLDRAM chip select |
MEM[u:v][3:0]_REF_n | O | 1 | Connect to RLDRAM refresh |
MEM[u:v][3:0]_WE_n | O | 1 | Connect to RLDRAM write enable |
MEM[u:v][3:0]_DAT[63:0] | IO | 64 | Connect to RLDRAM data lines |
MEM[u:v][3:0]_DK[n:p][1:0] | O | 4 | Connect to RLDRAM differential free-running data strobes. Signals are split on 36-bit boundaries (32 bits of data and 4 ECC bits). |
MEM[u:v][3:0]_ECC[7:0] | IO | 8 | Connect to RLDRAM data lines (typically bits [71:64]) |
MEM[u:v][3:0]_QK[n:p][3:0] | I | 8 | Connect to RLDRAM differential free-running data strobes. Signals are split on 18-bit boundaries (16 bits of data and 2 ECC bits). |
MEM[u:v][3:0]_QVLD[1:0] | I | 2 | Connect to RLDRAM data valid pins. Signals are split on 36-bit boundaries (32 bits of data and 4 ECC bits). |
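As a cross-check of the pin budget, the "#" column of the Node Memory Bank Pins table can be summed to confirm the stated 72 data bits and 31 control lines per back-side bus. The sketch below simply transcribes the table; the dictionary and variable names are illustrative.

```python
# Pin counts per back-side memory bus, transcribed from the "#" column
# of the Node Memory Bank Pins table. Names are illustrative only.
MEM_BUS_PINS = {
    "ADDR": 11,
    "BA": 3,
    "CS_n": 1,
    "REF_n": 1,
    "WE_n": 1,
    "DAT": 64,
    "DK": 4,
    "ECC": 8,
    "QK": 8,
    "QVLD": 2,
}

# DAT and ECC carry the payload; everything else is treated as control.
DATA_SIGNALS = {"DAT", "ECC"}

data_pins = sum(n for name, n in MEM_BUS_PINS.items() if name in DATA_SIGNALS)
control_pins = sum(n for name, n in MEM_BUS_PINS.items() if name not in DATA_SIGNALS)
pins_all_buses = (data_pins + control_pins) * 8  # eight back-side buses

print(data_pins, control_pins, pins_all_buses)
```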
The TM-100 has two 64-bit double-data-rate front-side buses used to load each core’s program memory and feed pre-processed data into the RLDRAM memory banks. After processing, the same buses are used to read the processed data.
The front-side buses run off SYSCLK, which can be set as high as 333 MHz. Each DMA bus is 64 bits wide, double data rate, and feeds a single processing node. Each node can therefore move 5 GB/sec of data externally, and the chip can move 10 GB/sec.
Each DMA bus utilizes a fast, simple handshake protocol that easily interfaces to outside logic. A total of 19 lines of bus control are used, all synchronous to SYSCLK.
DMA Bus Pins |
Signal Name | Dir | # | Description |
DMA[a:b]_DAT[63:0] | IO | 64 | DMA data bus. DMAa_DAT[] connects to node U’s microcode and bank memory. DMAb_DAT[] connects to node V’s microcode and bank memory. |
DMA[a:b]_DK[3:0] | O | 4 | Output data strobes split on 16-bit boundaries (e.g. DMAa_DK[0] goes to bits DMAa_DAT[15:0]). These signals are free-running, source-synchronous clocks. |
DMA[a:b]_DVLD[3:0] | O | 4 | Output data valid for the DMA bus, split on 16-bit boundaries (e.g. DMAa_DVLD[1] represents DMAa_DAT[31:16]). |
DMA[a:b]_GOING | O | 1 | Output flag to signal that a DMA is in progress on the corresponding bus (A or B) |
DMA[a:b]_HSIN | I | 1 | Handshake to request (for writes to the TM-100) or acknowledge (for reads from the TM-100) data on the appropriate data bus (A or B). |
DMA[a:b]_HSOUT | O | 1 | Handshake to request (for reads from the TM-100) or acknowledge (for writes to the TM-100) data on the appropriate data bus (A or B). |
DMA[a:b]_OE_n | I | 1 | Output enable for reading from the TM-100 |
DMA[a:b]_QK[3:0] | I | 4 | Input data strobes for the DMA buses, split on 16-bit boundaries (e.g. DMAa_QK[0] strobes DMAa_DAT[15:0]). |
DMA[a:b]_QVLD[3:0] | I | 4 | Input data valid for the DMA buses, split on 16-bit boundaries (e.g. DMAa_QVLD[0] frames DMAa_DAT[15:0]). |
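The strobe and valid signals in the DMA Bus Pins table are each split on 16-bit boundaries, so index i of DK, DVLD, QK, or QVLD covers data bits [16i+15:16i]. A minimal sketch of that mapping follows; the helper names `dma_lane_bits` and `lane_for_bit` are hypothetical and exist only for this illustration.

```python
# Mapping between a strobe/valid index (0..3) and the DMA data bits it covers.
# Helper names are hypothetical; the 16-bit split comes from the pin table.
LANE_WIDTH = 16
LANES = 4


def dma_lane_bits(lane):
    """Return (high_bit, low_bit) of DMAx_DAT covered by strobe/valid index `lane`."""
    if not 0 <= lane < LANES:
        raise ValueError("lane must be in 0..3")
    low = lane * LANE_WIDTH
    return (low + LANE_WIDTH - 1, low)


def lane_for_bit(bit):
    """Return the strobe/valid index that covers data bit `bit` (0..63)."""
    if not 0 <= bit < LANE_WIDTH * LANES:
        raise ValueError("bit must be in 0..63")
    return bit // LANE_WIDTH


for lane in range(LANES):
    high, low = dma_lane_bits(lane)
    print(f"index {lane} -> DAT[{high}:{low}]")
```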
The CMD bus is also a front-side bus and is used to issue commands (e.g. start a DMA, run a program). The command bus runs off SYSCLK.
Command Bus Pins
The Command Bus runs synchronous to SYSCLK |
Signal Name | Dir | # | Description |
CMD_DAT[7:0] | IO | 8 | Data bus used to receive commands. Example commands: load microcode, start DMA. |
CMD_DIR | I | 1 | Read/write line. Assert high to read from the command bus. |
CMD_DVDL | O | 1 | Data valid. Asserts high when reading from the command bus and data is valid on CMD_DAT[]. |
CMD_EN | I | 1 | Write enable. Assert high when writing to the command bus. |
CMD_OE_n | I | 1 | Assert low to have the TM-100 drive the CMD_DAT[] bus. Used for reading the command bus. |
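The pin descriptions above suggest how external logic would drive the command bus inputs for a read versus a write. The sketch below records one plausible reading of the table. Only three levels are stated in the source: CMD_DIR high for reads, CMD_EN high for writes, and CMD_OE_n low for reads. The remaining levels (CMD_DIR low and CMD_OE_n high for writes, CMD_EN low for reads) are assumptions inferred by symmetry, and the function name is hypothetical.

```python
# Input pin levels for a command bus access (1 = high, 0 = low).
# Stated in the pin table: CMD_DIR=1 to read, CMD_EN=1 to write, CMD_OE_n=0 to read.
# ASSUMED (not stated in the source): the opposite level applies otherwise.
def command_bus_inputs(operation):
    """Return the levels external logic would drive for 'read' or 'write'."""
    if operation == "write":
        return {"CMD_DIR": 0, "CMD_EN": 1, "CMD_OE_n": 1}
    if operation == "read":
        # During a read the TM-100 drives CMD_DAT[] and signals valid data
        # on its data-valid output, so the host must not drive the data bus.
        return {"CMD_DIR": 1, "CMD_EN": 0, "CMD_OE_n": 0}
    raise ValueError("operation must be 'read' or 'write'")


print(command_bus_inputs("write"))
print(command_bus_inputs("read"))
```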
Miscellaneous Pins |
Signal Name | Dir | # | Description |
BZ_CAL | -- | 1 | Impedance calibration pin. Connect a 294-ohm (+/- 1%) resistor in series to GND. |
CHIP_TEST | -- | 1 | ASIC test pin. Connect to GND. |
CLKSEL[2:0] | I | 3 | Selects the speed of the front-side bus, which comprises the DMA and command interfaces. The core is specified to run at 333 MHz, and the front-side clock (SYSCLK) is set relative to the core. Core clock : SYSCLK ratio by CLKSEL[2:0] value: 0 = 1:1, 1 = 3:2, 2 = 2:1, 3 = 5:2, 4 = 3:1, 5 = 7:2, 6 = 4:1, 7 = Reserved. The core clock runs both processing nodes and the RLDRAM. The SYSCLK runs the front-side buses (both DMA and CMD). |
ERROR | O | 1 | Asserts if an error occurred, e.g. an ECC error |
MC[u:v]GOING | O | 2 | Asserts if either processing node is running microcode |
MISC_GOING | O | 1 | Asserts when chip initialization is in progress |
PLL_LOCK | O | 1 | Asserts when the TM-100 PLL is locked |
PLL_VDDA | -- | 1 | Analog VDD for the TM-100 PLL |
PLL_VSSA | -- | 1 | Analog VSS for the TM-100 PLL (tie to GND) |
POR_n | I | 1 | Assert low to perform a power-on reset. Signal is used to initialize the internal PLL. |
SYSCLK[n:p] | I | 2 | Differential reference clock. The expected clock rate is the core clock rate divided by the core clock : SYSCLK ratio selected by CLKSEL[]. So if CLKSEL[] is set to 3 (a 5:2 ratio), SYSCLK needs to be 133 MHz. SYSCLKn is the negative differential signal, SYSCLKp is the positive. |
SYSRST_n | I | 1 | Assert low to reset the TM-100. Signal is used to reset all the functional units. |
TAP_TCK | I | 1 | Reserved, pull down with a 1k resistor to GND |
TAP_TDI | I | 1 | Reserved, pull up with a 1k resistor to 1.8 V |
TAP_TDO | O | 1 | Reserved |
TAP_TMS | I | 1 | Reserved, pull up with a 1k resistor to 1.8 V |
TAP_TRST_n | I | 1 | Reserved, pull down with a 1k resistor to GND |
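The CLKSEL[2:0] ratios determine the SYSCLK frequency, and with it the front-side DMA bandwidth. The sketch below works through that arithmetic under two assumptions: the core runs at its specified 333 MHz, and a DMA bus moves 8 bytes twice per SYSCLK cycle (64 bits, double data rate), as described for the front-side buses. The function names are illustrative.

```python
from fractions import Fraction

CORE_CLOCK_MHZ = 333  # specified core clock

# Core clock : SYSCLK ratio for each CLKSEL[2:0] value (7 is reserved).
CLKSEL_RATIO = {
    0: Fraction(1, 1),
    1: Fraction(3, 2),
    2: Fraction(2, 1),
    3: Fraction(5, 2),
    4: Fraction(3, 1),
    5: Fraction(7, 2),
    6: Fraction(4, 1),
}


def sysclk_mhz(clksel, core_mhz=CORE_CLOCK_MHZ):
    """SYSCLK frequency in MHz: core clock divided by the selected ratio."""
    if clksel not in CLKSEL_RATIO:
        raise ValueError("CLKSEL value is reserved or out of range")
    return float(core_mhz / CLKSEL_RATIO[clksel])


def dma_bus_gb_per_s(clksel):
    """Peak throughput of one 64-bit DDR DMA bus in GB/sec (1 GB = 10**9 bytes)."""
    return 8 * 2 * sysclk_mhz(clksel) * 1e6 / 1e9


for sel in sorted(CLKSEL_RATIO):
    print(f"CLKSEL={sel}: SYSCLK={sysclk_mhz(sel):.1f} MHz, "
          f"DMA bus={dma_bus_gb_per_s(sel):.2f} GB/sec")
```

With CLKSEL set to 3 this gives a SYSCLK of about 133 MHz, matching the example in the SYSCLK pin description.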
Power Supply Pins |
Signal Name | Dir | # | Description |
VDD | -- | 113 | 1.2 V |
VDD_18 | -- | 96 | 1.8 V |
VDDI | -- | 31 | 3.3 V |
VREF | -- | 30 | SSTL V reference. Nominally set to 0.9 V (or VDD_18/2) |
VSS | -- | 222 | GND |
Typical usage of the TM-100 chip follows this basic order of operation:
- Programs are downloaded across the DMA buses to each core’s program memory.
- Unprocessed data is loaded into each core’s memory bank(s).
- Each core is commanded to run the loaded program.
- The cores read the unprocessed data into the processing pipeline.
- When data has been processed, each core writes the results back into a different bank than is being read from. This way the core can read and write simultaneously.
- After the program has successfully executed, it issues a “program complete” message across the command bus, synchronizing to the external connection.
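The read-from-one-bank, write-to-another pattern in the steps above can be illustrated with a small software analogy. This is only a sketch of the ping-pong idea using two Python lists as stand-ins for two of a core's memory banks; it does not model the TM-100 pipeline, and `run_passes` is a hypothetical name.

```python
# Software analogy of ping-pong banks: each pass reads from one bank and
# writes its results to the other, then the roles swap.
# Purely illustrative; this does not model real TM-100 microcode.
def run_passes(data, passes, process):
    """Apply `process` to every element `passes` times, alternating banks.

    Returns (results, index_of_bank_holding_results).
    """
    banks = [list(data), [None] * len(data)]
    src = 0
    for _ in range(passes):
        dst = 1 - src
        # Read from the source bank, write to the destination bank.
        banks[dst] = [process(x) for x in banks[src]]
        src = dst  # results of this pass feed the next pass
    return banks[src], src


results, bank = run_passes([1, 2, 3], passes=2, process=lambda x: x * 2)
print(results, "in bank", bank)
```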
The command bus can accept multiple commands, placing each command in a queue and executing them in order. Some commands can be executed simultaneously if the chip has no internal bus conflicts (e.g. both cores can load their program memory at the same time).
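The queueing behavior described above, in-order execution with simultaneous execution where no internal conflict exists, can be sketched as a simple scheduler. Everything in this example is hypothetical: the command names, the resource labels, and the rule that a command joins the current step only if it shares no resource with commands already in that step. The source does not document the chip's actual arbitration.

```python
from collections import deque


# Hypothetical model: each command is (name, set_of_resources_it_uses).
# Commands are taken strictly in queue order; consecutive commands are
# grouped into one step while they share no resource.
def schedule(commands):
    """Return a list of steps; each step is a list of command names run together."""
    queue = deque(commands)
    steps = []
    while queue:
        name, resources = queue.popleft()
        step, busy = [name], set(resources)
        # Pull in following commands as long as they do not conflict.
        while queue and not (busy & set(queue[0][1])):
            next_name, next_resources = queue.popleft()
            step.append(next_name)
            busy |= set(next_resources)
        steps.append(step)
    return steps


commands = [
    ("load_program_U", {"DMA_A", "core_U"}),
    ("load_program_V", {"DMA_B", "core_V"}),
    ("run_U", {"core_U"}),
    ("run_V", {"core_V"}),
]
for i, step in enumerate(schedule(commands), start=1):
    print(f"step {i}: {step}")
```

In this model both program loads land in the first step because they use different DMA buses and cores, which mirrors the example given in the text.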