TM-100


For 20 years, TMS has provided advanced DSP systems and has created many valuable hardware and software tools that speed the design of custom solutions.

TM-100 highlights:

  • Dual-core DSP Accelerator
  • Harvard Architecture with VLIW
  • Up to 12x the Application Performance of the TM-44
  • To order, please contact Texas Memory Systems Sales.

    The TM-100 chip is the basis of our XP accelerator cards, the XP-100, XP-35, and XP-30.

    TMS strives to offer the most cost-effective hardware in the DSP arena, and that starts with the best DSP chip available. Traditionally, TMS has designed special DSP chips to meet the needs of our customers. After more than two years of development, our next-generation DSP chip is finally here. The TM-100 runs at top speed (100 GFLOPS) on real-world applications. It is software compatible with our previous-generation DSP products, so it runs user application programs written for the XP-30 and older cards. The first PCI Express DSP accelerator card built on the TM-100, the XP-100, is available now.

    The TM-100 is the latest-generation DSP chip from Texas Memory Systems. Architecturally, it is very similar to the previous-generation TM-44 chip, only faster, wider, and deeper. It has a 333-MHz processor clock and a 667-MT/s external memory data rate (333 MHz, double data rate). Its dual-core architecture doubles the processing power compared to the TM-44, and it has twice as many floating-point processing units per core as the TM-44. These additional processing units allow a 256-point radix FFT to be processed in one pass instead of two. Combined, these improvements provide a 10-12x application performance boost over the TM-44. Because of the dual-core architecture, a single-chip XP-100 looks like a two-chip XP-30 board.
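
    The pass-count benefit can be illustrated with a simplified model. The sketch below assumes that each pass over memory applies one FFT stage of a fixed radix, and that the TM-44 needed two radix-16 passes to cover one 256-point radix step. Both assumptions are for illustration only; this is not a description of the TMS microcode.

```python
def fft_passes(n_points: int, radix: int) -> int:
    """Passes over memory for an n-point FFT when each pass applies one
    radix-`radix` stage (simplified model: ceil(log_radix(n_points)))."""
    if n_points < 2 or radix < 2:
        raise ValueError("n_points and radix must both be >= 2")
    # Integer loop instead of math.log to avoid floating-point rounding errors.
    passes, span = 0, 1
    while span < n_points:
        span *= radix
        passes += 1
    return passes


for n in (256, 4096, 65536, 1048576):
    tm100 = fft_passes(n, 256)  # one 256-point radix per pass
    tm44 = fft_passes(n, 16)    # assumed: two radix-16 passes per 256-point radix
    print(f"{n:>8}-point FFT: {tm100} pass(es) vs {tm44}")
```

    Under this model the pass count halves for sizes that are exact powers of 256; for other sizes the saving is smaller (a 4096-point FFT drops from three passes to two).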

    As with all DSP chips, chip bandwidth is a critical parameter. The TM-100 has two front-side buses (I/O) and eight back-side buses (memory), each running at 5 GB/s. These buses are designed to complement the processing power of the TM-100, not to limit it.

    Bus Overview
    The TM-100 has three types of buses: eight back-side memory buses, two front-side DMA buses, and one command bus.

    The TM-100 chip has two front-side I/O (DMA) buses. They load each core’s program memory and access each of the eight back-side RLDRAM memory banks. The front-side command bus controls both cores, starting DSP programs and DMAs.

    The following tables describe all of the TM-100 signals. All signal levels conform to the JEDEC SSTL 1.8 V standard.

    Back-Side RLDRAM Local Memory Bus
    The TM-100 makes full use of high-bandwidth I/O. There are eight RLDRAM memory banks, four for each core, and each bank is independently connected through its own 64-bit bus. Each bus runs double data rate at 333 MHz, for a throughput of 5 GB/s per RLDRAM bus. Each core can sustain all four of its memory banks running at full speed, for a total of 20 GB/s per core, and each chip can sustain both cores running at full speed, for a total of 40 GB/s per chip.
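
    The per-bus, per-core, and per-chip figures follow from simple arithmetic. The sketch below treats the 333 MHz clock as exactly 333,000,000 Hz and counts only the 64 data bits (ECC excluded); the exact products come to about 5.3, 21.3, and 42.6 GB/s, which the text rounds to 5, 20, and 40 GB/s.

```python
CORE_CLOCK_HZ = 333_000_000   # back-side buses run at the core clock
DATA_BITS = 64                # data width per RLDRAM bus, ECC bits excluded
TRANSFERS_PER_CLOCK = 2       # double data rate
BANKS_PER_CORE = 4
CORES_PER_CHIP = 2


def bus_bytes_per_s(clock_hz: int, data_bits: int = DATA_BITS,
                    transfers_per_clock: int = TRANSFERS_PER_CLOCK) -> int:
    """Peak throughput of one bus: bytes per transfer x transfers per second."""
    return (data_bits // 8) * transfers_per_clock * clock_hz


per_bus = bus_bytes_per_s(CORE_CLOCK_HZ)
per_core = per_bus * BANKS_PER_CORE
per_chip = per_core * CORES_PER_CHIP

for label, rate in (("per RLDRAM bus", per_bus), ("per core", per_core), ("per chip", per_chip)):
    print(f"{label:>14}: {rate / 1e9:.2f} GB/s")
```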

    The back-side buses connect directly to RLDRAM. Each of the eight buses uses 72 lines to move data (64 data bits and 8 ECC bits) plus 31 control lines. The back-side buses run synchronous to the core clock, which is specified to run at 333 MHz.

    Node Memory Bank Pins
    (Pin counts are per memory bus. In the signal names, u and v identify the two cores, or nodes, and [3:0] selects one of that core’s four banks.)

    Signal Name                  Dir  Pins  Description
    MEM[u:v][3:0]_ADDR[10:0]     O    11    Connect to RLDRAM address bus
    MEM[u:v][3:0]_BA[2:0]        O    3     Connect to RLDRAM bank address
    MEM[u:v][3:0]_CS_n           O    1     Connect to RLDRAM chip select
    MEM[u:v][3:0]_REF_n          O    1     Connect to RLDRAM refresh
    MEM[u:v][3:0]_WE_n           O    1     Connect to RLDRAM write enable
    MEM[u:v][3:0]_DAT[63:0]      IO   64    Connect to RLDRAM data lines
    MEM[u:v][3:0]_DK[n:p][1:0]   O    4     Connect to RLDRAM differential free-running data strobes. Signals are split on 36-bit boundaries (32 data bits and 4 ECC bits).
    MEM[u:v][3:0]_ECC[7:0]       IO   8     Connect to RLDRAM data lines (typically bits [71:64])
    MEM[u:v][3:0]_QK[n:p][3:0]   I    8     Connect to RLDRAM differential free-running data strobes. Signals are split on 18-bit boundaries (16 data bits and 2 ECC bits).
    MEM[u:v][3:0]_QVLD[1:0]      I    2     Connect to RLDRAM data valid pins. Signals are split on 36-bit boundaries (32 data bits and 4 ECC bits).
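
    As a cross-check, the pin counts in the table can be summed against the back-side bus description: 72 data and ECC lines plus 31 control lines, or 103 pins per bus. The dictionary below is transcribed from the table; grouping DAT and ECC as "data" and every other signal as "control" is an assumption that matches that description.

```python
# Pins per memory bus, transcribed from the Node Memory Bank Pins table.
MEM_BUS_PINS = {
    "ADDR[10:0]": 11,
    "BA[2:0]": 3,
    "CS_n": 1,
    "REF_n": 1,
    "WE_n": 1,
    "DAT[63:0]": 64,
    "DK[n:p][1:0]": 4,
    "ECC[7:0]": 8,
    "QK[n:p][3:0]": 8,
    "QVLD[1:0]": 2,
}
DATA_SIGNALS = {"DAT[63:0]", "ECC[7:0]"}  # the 72 lines that carry data and ECC
MEM_BUSES = 8                              # four banks per core, two cores

data_pins = sum(n for sig, n in MEM_BUS_PINS.items() if sig in DATA_SIGNALS)
control_pins = sum(n for sig, n in MEM_BUS_PINS.items() if sig not in DATA_SIGNALS)
pins_per_bus = data_pins + control_pins

print(f"data: {data_pins}, control: {control_pins}, "
      f"per bus: {pins_per_bus}, all {MEM_BUSES} buses: {pins_per_bus * MEM_BUSES}")
```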

    Front-Side DMA Bus
    The TM-100 has two 64-bit, double-data-rate front-side buses that load each core’s program memory and feed unprocessed data into the RLDRAM memory banks. After processing, the same buses are used to read back the processed data.

    The front-side buses run off SYSCLK, which can be set as high as 333 MHz. Each DMA bus is 64 bits wide, runs double data rate, and feeds a single processing node, so each node can move 5 GB/s of data externally and the chip can move 10 GB/s.

    Each DMA bus uses a fast, simple handshake protocol that interfaces easily with external logic. A total of 19 bus-control lines are used, all synchronous to SYSCLK.
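
    The DMA figures scale directly with SYSCLK. A minimal sketch, assuming the 64-bit data path is the only payload and using the 333 MHz maximum SYSCLK from the text; the exact products are about 5.3 GB/s per node and 10.7 GB/s per chip, which the text rounds to 5 and 10 GB/s.

```python
SYSCLK_MAX_HZ = 333_000_000  # highest SYSCLK setting given in the text
DMA_DATA_BYTES = 64 // 8     # 64-bit DMA data bus
TRANSFERS_PER_CLOCK = 2      # double data rate
DMA_BUSES = 2                # one DMA bus per processing node


def dma_bytes_per_s(sysclk_hz: int) -> int:
    """Peak payload throughput of one front-side DMA bus clocked at sysclk_hz."""
    return DMA_DATA_BYTES * TRANSFERS_PER_CLOCK * sysclk_hz


per_node = dma_bytes_per_s(SYSCLK_MAX_HZ)
per_chip = per_node * DMA_BUSES
print(f"per node: {per_node / 1e9:.2f} GB/s, per chip: {per_chip / 1e9:.2f} GB/s")
```

    Because throughput is linear in SYSCLK, selecting a larger core-to-SYSCLK ratio with CLKSEL[2:0] reduces DMA bandwidth in the same proportion; at a 133 MHz SYSCLK, dma_bytes_per_s(133_000_000) gives about 2.1 GB/s per node.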


    DMA Bus Pins

    Signal Name          Dir  Pins  Description
    DMA[a:b]_DAT[63:0]   IO   64    DMA data bus. DMAa_DAT[] connects to node U’s microcode and bank memory; DMAb_DAT[] connects to node V’s microcode and bank memory.
    DMA[a:b]_DK[3:0]     O    4     Output data strobes, split on 16-bit boundaries (e.g., DMAa_DK[0] strobes DMAa_DAT[15:0]). These signals are free-running, source-synchronous clocks.
    DMA[a:b]_DVLD[3:0]   O    4     Output data valid, split on 16-bit boundaries (e.g., DMAa_DVLD[1] frames DMAa_DAT[31:16])
    DMA[a:b]_GOING       O    1     Output flag signaling that a DMA is in progress on the corresponding bus (A or B)
    DMA[a:b]_HSIN        I    1     Handshake to request (for writes to the TM-100) or acknowledge (for reads from the TM-100) data on the corresponding data bus (A or B)
    DMA[a:b]_HSOUT       O    1     Handshake to request (for reads from the TM-100) or acknowledge (for writes to the TM-100) data on the corresponding data bus (A or B)
    DMA[a:b]_OE_n        I    1     Output enable for reading from the TM-100
    DMA[a:b]_QK[3:0]     I    4     Input data strobes, split on 16-bit boundaries (e.g., DMAa_QK[0] strobes DMAa_DAT[15:0])
    DMA[a:b]_QVLD[3:0]   I    4     Input data valid, split on 16-bit boundaries (e.g., DMAa_QVLD[0] frames DMAa_DAT[15:0])
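
    The strobe and data-valid signals all follow the same 16-bit lane pattern, so the mapping between a data bit and its strobe can be computed. A small sketch, assuming that the examples in the table (DMAa_DK[0] for DMAa_DAT[15:0], DMAa_DVLD[1] for DMAa_DAT[31:16]) extend uniformly to all four lanes and to both buses; the function names are hypothetical helpers, not part of any TMS software.

```python
from typing import Tuple

LANE_WIDTH = 16  # DK, DVLD, QK, and QVLD are split on 16-bit boundaries
BUS_WIDTH = 64   # DMA[a:b]_DAT[63:0]
LANES = BUS_WIDTH // LANE_WIDTH


def lane_for_bit(bit: int) -> int:
    """Index of the DK/DVLD/QK/QVLD signal that covers DMA data bit `bit`."""
    if not 0 <= bit < BUS_WIDTH:
        raise ValueError(f"bit {bit} is outside DAT[{BUS_WIDTH - 1}:0]")
    return bit // LANE_WIDTH


def bits_for_lane(lane: int) -> Tuple[int, int]:
    """(msb, lsb) of the DAT[] slice covered by strobe/valid index `lane`."""
    if not 0 <= lane < LANES:
        raise ValueError(f"lane {lane} is outside [{LANES - 1}:0]")
    lsb = lane * LANE_WIDTH
    return lsb + LANE_WIDTH - 1, lsb


for lane in range(LANES):
    msb, lsb = bits_for_lane(lane)
    print(f"DK[{lane}], DVLD[{lane}], QK[{lane}], QVLD[{lane}] -> DAT[{msb}:{lsb}]")
```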

    Front-Side Command Bus
    The CMD bus is also a front-side bus; it is used to issue commands (e.g., start a DMA or run a program). The command bus runs off SYSCLK.

    Command Bus Pins
    All command bus signals are synchronous to SYSCLK.

    Signal Name    Dir  Pins  Description
    CMD_DAT[7:0]   IO   8     Data bus used to receive commands. Example commands: load microcode, start DMA.
    CMD_DIR        I    1     Read/write line. Assert high to read from the command bus.
    CMD_DVDL       O    1     Data valid. Asserts high when reading from the command bus and data is valid on CMD_DAT[].
    CMD_EN         I    1     Write enable. Assert high when writing to the command bus.
    CMD_OE_n       I    1     Assert low to have the TM-100 drive the CMD_DAT[] bus. Used for reading the command bus.
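
    The command bus table implies two distinct pin setups for host access. The sketch below is a hypothetical host-side model, not a TMS driver: it assumes CMD_DIR is held low and CMD_OE_n high for writes, and CMD_EN is held low for reads. Cycle timing and command encodings are not given here, so the model covers only the pin levels and the CMD_DVDL data-valid check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CmdBusInputs:
    """Levels the host drives on the TM-100 command-bus inputs (1 = high, 0 = low)."""
    cmd_dir: int   # CMD_DIR: high = read from the command bus
    cmd_en: int    # CMD_EN: high = write to the command bus
    cmd_oe_n: int  # CMD_OE_n: low = TM-100 drives CMD_DAT[7:0]


# Assumed levels for each access type (hypothetical, inferred from the table).
WRITE = CmdBusInputs(cmd_dir=0, cmd_en=1, cmd_oe_n=1)
READ = CmdBusInputs(cmd_dir=1, cmd_en=0, cmd_oe_n=0)


def host_may_drive_data(pins: CmdBusInputs) -> bool:
    """The host may drive CMD_DAT[] only while the TM-100 is not driving it."""
    return pins.cmd_oe_n == 1


def sample_read(cmd_dvdl: int, cmd_dat: int) -> Optional[int]:
    """Return the byte on CMD_DAT[7:0] when CMD_DVDL marks it valid, else None."""
    return cmd_dat & 0xFF if cmd_dvdl else None


print("write: host drives CMD_DAT[] =", host_may_drive_data(WRITE))
print("read:  host drives CMD_DAT[] =", host_may_drive_data(READ))
print("read sample:", sample_read(cmd_dvdl=1, cmd_dat=0x5A))
```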


    Miscellaneous Signals and Power

    Miscellaneous Pins

    Signal Name    Dir  Pins  Description
    BZ_CAL         --   1     Impedance calibration pin. Connect to GND through a 294 Ω (±1%) resistor.
    CHIP_TEST      --   1     ASIC test pin. Connect to GND.
    CLKSEL[2:0]    I    3     Selects the speed of the front-side buses (the DMA and command interfaces). The core is specified to run at 333 MHz, and the front-side clock (SYSCLK) is set relative to the core clock:

                              CLKSEL[2:0]   Core Clock : SYSCLK
                              0             1:1
                              1             3:2
                              2             2:1
                              3             5:2
                              4             3:1
                              5             7:2
                              6             4:1
                              7             Reserved

                              The core clock runs both processing nodes and the RLDRAM. SYSCLK runs the front-side buses (both DMA and CMD).
    ERROR          O    1     Asserts if an error occurred (e.g., an ECC error)
    MC[u:v]GOING   O    2     One flag per processing node; each asserts while its node is running microcode
    MISC_GOING     O    1     Asserts while chip initialization is in progress
    PLL_LOCK       O    1     Asserts when the TM-100 PLL is locked
    PLL_VDDA       --   1     Analog VDD for the TM-100 PLL
    PLL_VSSA       --   1     Analog VSS for the TM-100 PLL (tie to GND)
    POR_n          I    1     Assert low to perform a power-on reset. This signal initializes the internal PLL.
    SYSCLK[n:p]    I    2     Differential reference clock. The expected clock rate is SYSCLK = 333 MHz / R, where R is the Core Clock : SYSCLK ratio selected by CLKSEL[2:0].
                              So if CLKSEL[] is set to 3 (5:2), SYSCLK needs to be 333 MHz / 2.5, or about 133 MHz. SYSCLKn is the negative differential signal; SYSCLKp is the positive.
    SYSRST_n       I    1     Assert low to reset the TM-100. This signal resets all of the functional units.
    TAP_TCK        I    1     Reserved; pull down to GND with a 1 kΩ resistor
    TAP_TDI        I    1     Reserved; pull up to 1.8 V with a 1 kΩ resistor
    TAP_TDO        O    1     Reserved
    TAP_TMS        I    1     Reserved; pull up to 1.8 V with a 1 kΩ resistor
    TAP_TRST_n     I    1     Reserved; pull down to GND with a 1 kΩ resistor
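
    The CLKSEL[2:0] ratio table and the relationship SYSCLK = 333 MHz / ratio (consistent with the CLKSEL = 3, 133 MHz example) can be combined into a small lookup. A sketch, assuming the nominal 333 MHz core clock; the helper name sysclk_mhz is hypothetical.

```python
from fractions import Fraction

CORE_CLOCK_MHZ = 333
# CLKSEL[2:0] -> core clock : SYSCLK ratio, from the table (7 is reserved).
CLKSEL_RATIO = {
    0: Fraction(1, 1),
    1: Fraction(3, 2),
    2: Fraction(2, 1),
    3: Fraction(5, 2),
    4: Fraction(3, 1),
    5: Fraction(7, 2),
    6: Fraction(4, 1),
}


def sysclk_mhz(clksel: int, core_mhz: int = CORE_CLOCK_MHZ) -> float:
    """SYSCLK frequency required for a CLKSEL[2:0] setting: core clock / ratio."""
    if clksel not in CLKSEL_RATIO:
        raise ValueError(f"CLKSEL={clksel} is reserved or out of range")
    # Exact rational division, converted to float only at the end.
    return float(Fraction(core_mhz) / CLKSEL_RATIO[clksel])


for sel, ratio in CLKSEL_RATIO.items():
    print(f"CLKSEL={sel}  ratio {ratio.numerator}:{ratio.denominator}  "
          f"SYSCLK = {sysclk_mhz(sel):.1f} MHz")
```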


    Power Supply Pins

    Signal Name  Dir  Pins  Description
    VDD          --   113   1.2 V
    VDD_18       --   96    1.8 V
    VDDI         --   31    3.3 V
    VREF         --   30    SSTL voltage reference. Nominally set to 0.9 V (VDD_18/2).
    VSS          --   222   GND

    Operational Description
    Typical usage of the TM-100 chip follows this basic order of operation:

    • Programs are downloaded across the DMA buses to each core’s program memory
    • Unprocessed data is loaded into each core’s memory bank(s)
    • Each core is commanded to run the loaded program
    • The cores read the unprocessed data into the processing pipeline
    • When data has been processed, each core writes the results to a different bank from the one it is reading. This way the core can read and write simultaneously.
    • After the program has executed successfully, it issues a “program complete” message across the command bus to synchronize with the externally connected logic
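
    For programs that make several passes over the data, one simple way to follow the different-bank rule is to ping-pong between two banks, so that each pass reads the previous pass’s output. This is a hypothetical sketch of that scheduling idea only; the bank numbers and the two-bank choice are assumptions, and the actual bank assignment is up to the loaded program.

```python
BANKS_PER_CORE = 4  # each core has four RLDRAM banks


def bank_schedule(passes: int, input_bank: int = 0, scratch_bank: int = 1):
    """Return (read_bank, write_bank) for each pass, alternating two banks so a
    pass never writes the bank it reads and pass k+1 reads pass k's output."""
    if input_bank == scratch_bank:
        raise ValueError("read and write banks must differ")
    for bank in (input_bank, scratch_bank):
        if not 0 <= bank < BANKS_PER_CORE:
            raise ValueError(f"bank {bank} is outside 0..{BANKS_PER_CORE - 1}")
    schedule = []
    src, dst = input_bank, scratch_bank
    for _ in range(passes):
        schedule.append((src, dst))
        src, dst = dst, src  # this pass's output becomes the next pass's input
    return schedule


for k, (rd, wr) in enumerate(bank_schedule(4)):
    print(f"pass {k}: read bank {rd}, write bank {wr}")
```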

    The command bus can accept multiple commands; it places each command in a queue and executes the commands in order. Some commands can execute simultaneously when they cause no internal bus conflicts (e.g., both cores can load their program memory at the same time).
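
    The queuing rule can be modeled as in-order issue with a conflict check. The sketch below is a simplified, hypothetical model: each command is tagged with the internal resources it occupies (the resource labels are invented for illustration), and a command starts alongside the commands ahead of it only if it shares none of their resources. The example queue follows the order of operation listed above.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    resources: frozenset  # internal buses/memories the command occupies (hypothetical labels)


def schedule(commands):
    """Group queued commands into steps, keeping queue order.

    A command joins the current step only if it shares no resource with the
    commands already in that step; the first conflict starts a new step.
    Commands are never reordered past one another.
    """
    queue = deque(commands)
    steps = []
    while queue:
        step, busy = [], set()
        while queue and busy.isdisjoint(queue[0].resources):
            cmd = queue.popleft()
            step.append(cmd.name)
            busy |= cmd.resources
        steps.append(step)
    return steps


cmds = [
    Command("load program U", frozenset({"DMA_A", "PROG_U"})),
    Command("load program V", frozenset({"DMA_B", "PROG_V"})),
    Command("load data U", frozenset({"DMA_A", "BANKS_U"})),
    Command("load data V", frozenset({"DMA_B", "BANKS_V"})),
    Command("run U", frozenset({"PROG_U", "BANKS_U"})),
    Command("run V", frozenset({"PROG_V", "BANKS_V"})),
]

for i, step in enumerate(schedule(cmds)):
    print(f"step {i}: {', '.join(step)}")
```

    In this model the two program loads share no resources and run together, as in the example above; each later step waits for the whole previous step, which is a simplification of how a real queue would overlap work.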