mirror of
https://github.com/im-tomu/fomu-workshop.git
synced 2024-09-20 03:10:12 +00:00
docs: update background
This commit is contained in:
parent
e0556b2727
commit
f6641f398e
@ -1,29 +1,42 @@
|
||||
.. _background:
|
||||
|
||||
Background
|
||||
----------
|
||||
|
||||
About FPGAs
|
||||
~~~~~~~~~~~
|
||||
|
||||
Field Programmable Gate Arrays (FPGAs) are arrays of gates that are
|
||||
programmable in the field. Unlike most chips you will encounter, which
|
||||
have transistor gates arranged in a fixed order, FPGAs can change their
|
||||
configuration by simply loading new code. Fundamentally, this code
|
||||
programs lookup tables which form the basic building blocks of logic.
|
||||
|
||||
These lookup tables (called LUTs) are so important to the design of an
|
||||
`Field Programmable Gate Arrays (FPGAs) <https://en.wikipedia.org/wiki/Field-programmable_gate_array>`_
|
||||
are integrated circuits containing arrays of gates which are programmable
|
||||
in the field. Most chips you will encounter, have transistor gates arranged
|
||||
in a fixed order, thus providing a fixed functionality. Conversely, FPGAs
|
||||
can change their internal connections by simply loading new configurations.
|
||||
Fundamentally, configurations program lookup tables (LUTs), which form the basic
|
||||
building blocks of logic. These lookup tables are so important to the design and usage of an
|
||||
FPGA that they usually form part of the name of the part. For example,
|
||||
Fomu uses a UP5K, which has about 5000 LUTs. NeTV used an LX9, which had
|
||||
about 9000 LUTs, and NeTV2 uses a XC7A35T that has about 35000 LUTs.
|
||||
Fomu uses a `UP5K <http://www.latticesemi.com/Products/FPGAandCPLD/iCE40UltraPlus>`_,
|
||||
which has about 5000 LUTs. NeTV used an `LX9 <https://www.xilinx.com/products/silicon-devices/fpga/spartan-6.html>`_,
|
||||
which had about 9000 LUTs, and NeTV2 uses a `XC7A35T <https://www.xilinx.com/products/silicon-devices/fpga/artix-7.html>`_
|
||||
that has about 35000 LUTs.
|
||||
|
||||
.. image:: _static/ice40-lut.png
|
||||
:width: 100%
|
||||
FPGA LUTs are almost always *n*-inputs to 1-output. The ICE family of
|
||||
FPGAs from Lattice have 4-input LUTs. Xilinx parts tend to have 5-input or
|
||||
6-input LUTs which generally means they can do more logic in fewer LUTs.
|
||||
Comparing LUT count between FPGAs is a bit like comparing clock speed
|
||||
between different CPUs; not entirely accurate, but certainly a helpful
|
||||
rule of thumb.
|
||||
|
||||
.. figure:: _static/ice40-lut.png
|
||||
:width: 400px
|
||||
:align: center
|
||||
:alt: The ICE40 LUT4 is a basic 4-input 1-output LUT
|
||||
|
||||
This is the ``SB_LUT4``, which is the basic building block of Fomu. It
|
||||
has four inputs and one output. To program Fomu, we must define what
|
||||
each possible input pattern will create on the output.
|
||||
The ICE40 LUT4 is a basic 4-input 1-output function table.
|
||||
|
||||
To do this, we turn to a truth table:
|
||||
The basic building block of Fomu is the ``SB_LUT4``. It
|
||||
has four 1-bit inputs and one 1-bit output. To program Fomu, we must define what
|
||||
each possible input 4-bit pattern will create on the output. To do this, we turn to
|
||||
a `truth table <https://en.wikipedia.org/wiki/Truth_table>`_:
|
||||
|
||||
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|
||||
| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|
||||
@ -40,60 +53,104 @@ To do this, we turn to a truth table:
|
||||
+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|
||||
|
||||
For example, to create a LUT that acted as an AND gate, we would define
|
||||
O to be 0 for everything except the last column. To create a NAND gate,
|
||||
we would define O to be 1 for everything except the last column.
|
||||
|
||||
FPGA LUTs are almost always *n*-inputs to 1-output. The ICE family of
|
||||
FPGAs from Lattice have 4-input LUTs. Xilinx parts tend to have 5- or
|
||||
6-input LUTs which generally means they can do more logic in fewer LUTs.
|
||||
Comparing LUT count between FPGAs is a bit like comparing clock speed
|
||||
between different CPUs - not entirely accurate, but certainly a helpful
|
||||
rule of thumb.
|
||||
|
||||
It is from this simple primitive that we build up the building blocks of
|
||||
``O`` to be 0 for everything except the last column. Conversely, to create
|
||||
a NAND gate, we would define ``O`` to be 1 for everything except the last column.
|
||||
It is from this simple primitive that we create the building logic blocks of
|
||||
FPGA design.
|
||||
|
||||
.. figure:: _static/ice40-plb.png
|
||||
:width: 400px
|
||||
:align: center
|
||||
:alt: PLB Block Diagram (from iCE40 UltraPlus Family Data Sheet)
|
||||
|
||||
Programmable Logic Block (PLB), Block Diagram.
|
||||
|
||||
Modern FPGA devices are no longer composed of arrays of gates (i.e. LUTs) only.
|
||||
Typically, LUTs are grouped with flip-flop registers (DFF) and some carry logic, in
|
||||
so-called Programmable Logic Blocks (PLBs) or Configurable Logic Blocks (CLBs);
|
||||
depending on the vendor. These resources allow combining multiple LUTs for
|
||||
describing larger logical functions and for providing sequential behaviour.
|
||||
|
||||
Furthermore, additional purpose-specific blocks are included in the devices:
|
||||
Block RAMs (BRAMs), Digital Signal Processing (DSP) blocks, high-performance serial
|
||||
transmitters/receivers, etc. Those so-called *hard* blocks allow better
|
||||
area and power usage than LUT-based components. Nevertheless, the behaviour
|
||||
of any of those *hard* blocks can be replicated using PLBs only. At the same time,
|
||||
custom behaviour can only be described through LUTs, because all *hard* blocks
|
||||
are programmable within the pre-fixed functionality only.
|
||||
|
||||
.. _background:code-into-gates:
|
||||
|
||||
Turning code into gates
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Writing lookup tables is hard, so people have come up with abstract
|
||||
Hardware Description Languages (HDLs) we can use to describe them. The
|
||||
Configuring all the LUTs, carry logic and other *hard* blocks manually is
|
||||
hard and very challenging; so people have come up with abstract
|
||||
`Hardware Description Languages (HDLs) <https://en.wikipedia.org/wiki/Hardware_description_language>`_,
|
||||
that allow us to describe the expected behaviour and/or structure. The
|
||||
two most common languages are Verilog and VHDL. However, a modern trend
|
||||
is to embed an HDL inside an existing programming language, such as how
|
||||
Migen is embedded in Python, or SpinalHDL is embedded in Scala.
|
||||
is to achieve hardware description by embedding a Domain Specific Language (DSL)
|
||||
inside an existing programming language, such as how Migen is embedded in Python,
|
||||
Clash is embedded in Haskell, or Chisel and SpinalHDL are embedded in Scala.
|
||||
|
||||
Here is an example of a Verilog module:
|
||||
Here is an example of a counter module with a registered output:
|
||||
|
||||
.. code:: verilog
|
||||
.. tabs::
|
||||
|
||||
module example (output reg [0:5] Q, input C);
|
||||
reg [0:8] counter;
|
||||
always @(posedge C)
|
||||
begin
|
||||
counter <= counter + 1'b1;
|
||||
Q <= counter[3] ^ counter[5] | counter<<2;
|
||||
end
|
||||
endmodule
|
||||
.. group-tab:: Verilog
|
||||
|
||||
We can run this Verilog module through a synthesizer to turn it into
|
||||
``SB_LUT4`` blocks, or we can turn it into a more familiar logic
|
||||
.. code:: verilog
|
||||
|
||||
module example (
|
||||
input C,
|
||||
output reg [0:5] Q
|
||||
);
|
||||
reg [0:8] counter;
|
||||
|
||||
always @(posedge C)
|
||||
begin
|
||||
counter <= counter + 1'b1;
|
||||
end
|
||||
|
||||
always @(posedge C)
|
||||
begin
|
||||
Q <= counter[3] ^ counter[5] | counter<<2;
|
||||
end
|
||||
endmodule
|
||||
|
||||
We can run this HDL module through a synthesizer to turn it into
|
||||
LUT and DFF blocks, or we can turn it into a more familiar logic
|
||||
diagram:
|
||||
|
||||
.. image:: _static/verilog-synthesis.png
|
||||
:width: 100%
|
||||
.. figure:: _static/verilog-synthesis.png
|
||||
:align: center
|
||||
:width: 600px
|
||||
:alt: A syntheis of the above logic into some gates
|
||||
|
||||
If we do decide to synthesize to ``SB_LUT4`` blocks, we will end up with
|
||||
a pile of LUTs that need to be strung together somehow. This is done by
|
||||
a Place-and-Route tool. This performs the job of assigning physical LUTs
|
||||
to each LUT that gets defined by the synthesizer, and then figuring out
|
||||
A syntheis of the above logic into basic blocks.
|
||||
|
||||
Turning gates into a bitstream
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
If we do decide to synthesize to PLBs, we will end up with
|
||||
a pile of LUTs and DFFs that need to be strung together somehow. This is done by
|
||||
a Place-and-Route (PnR) tool. This performs the job of assigning physical PLBs
|
||||
to each LUT and DFF that gets defined by the synthesizer; and then figuring out
|
||||
how to wire it all up.
|
||||
|
||||
Once the place-and-route tool is done, it generates an abstract file
|
||||
that needs to be translated into a format that the hardware can
|
||||
recognize. This is done by a bitstream packing tool. Finally, this
|
||||
bitstream needs to be loaded onto the device somehow, either off of a
|
||||
SPI flash or by manually programming it by toggling wires.
|
||||
Once the PnR tool is done, it generates an abstract file that needs to be
|
||||
translated into a format that the hardware can recognize. This is done by a
|
||||
bitstream packing tool.
|
||||
|
||||
Loading a bitstream into a device
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Finally, the bitstream needs to be loaded onto the device somehow. It can be
|
||||
done manually by toggling wires of the FPGAs programming interface. However,
|
||||
most FPGAs are volatile, which means the the configuration is lost when
|
||||
unplugged. Therefore, most FPGA boards include some flash memory, which is
|
||||
typically programmed through SPI. Then, when the FPGA is plugged again, it loads
|
||||
the configuration from a predefined location in the flash.
|
||||
|
||||
About the ICE40UP5K
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
@ -104,42 +161,57 @@ very nice features:
|
||||
1. 5280 4-input LUTs (LC)
|
||||
2. 16 kilobytes BRAM
|
||||
3. **128 kilobytes “SPRAM”**
|
||||
4. Current-limited 3-channel LED driver
|
||||
5. 2x I2C and 2x SPI
|
||||
6. 8 16-bit DSP units
|
||||
4. 1x Current-limited 3-channel LED driver
|
||||
5. 2x I2C and 2x SPI interfaces
|
||||
6. 8x 16-bit DSP units
|
||||
7. **Warmboot capability**
|
||||
8. **Open toolchain**
|
||||
|
||||
Many FPGAs have what’s called block RAM, or BRAM. This is frequently
|
||||
used to store data such as buffers, CPU register files, and large arrays
|
||||
of data. This type of memory is frequently reused as RAM on many FPGAs.
|
||||
The ICE40UP5K is unusual in that it also as 128 kilobytes of Single
|
||||
Ported RAM that can be used as memory for a softcore (a term used for a
|
||||
CPU core running inside an FPGA, to differentiate it from a ‘hard’ -
|
||||
.. figure:: _static/ice40-arch.png
|
||||
:width: 400px
|
||||
:align: center
|
||||
:alt: iCE40UP5K Device, Top View (from iCE40 UltraPlus Family Data Sheet)
|
||||
|
||||
iCE40UP5K Device, Top View.
|
||||
|
||||
BRAM is frequently used to store data such as buffers, CPU register files,
|
||||
and large arrays of data. This type of memory is frequently reused as RAM on
|
||||
many FPGAs. The ICE40UP5K is unusual in that it also has 128 kilobytes of
|
||||
Single Ported RAM that can be used as memory for a softcore (a term used for
|
||||
a CPU core running inside an FPGA, to differentiate it from a ‘hard’ -
|
||||
i.e. fixed chip - implementation). That means that, unlike other FPGAs,
|
||||
valuable block RAM isn’t taken up by system memory.
|
||||
|
||||
Additionally, the ICE40 family of devices generally supports “warmboot”
|
||||
Additionally, the ICE40 family of devices generally supports *warmboot*
|
||||
capability. This enables us to have multiple designs live on the same
|
||||
FPGA and tell the FPGA to swap between them.
|
||||
|
||||
As always, this workshop wouldn’t be nearly as easy without the open
|
||||
toolchain that enables us to port it to a lot of different platforms.
|
||||
source tools that enable us to port it to a lot of different platforms.
|
||||
|
||||
.. _background:about-fomu:
|
||||
|
||||
About Fomu
|
||||
~~~~~~~~~~
|
||||
|
||||
Fomu is an ICE40UP5K that fits in your USB port. It contains two
|
||||
megabytes of SPI flash memory, four edge buttons, and a three-color LED.
|
||||
|
||||
Unlike most other ICE40 projects, Fomu implements its USB in a softcore.
|
||||
That means that the bitstream that runs on the FPGA must also provide
|
||||
the ability to communicate over USB. This uses up a lot of storage on
|
||||
this small FPGA, but it also enables us to have such a tiny form factor,
|
||||
and lets us do some really cool things with it.
|
||||
|
||||
.. image:: _static/fomu-block-diagram.png
|
||||
:width: 100%
|
||||
.. figure:: _static/fomu-block-diagram.png
|
||||
:align: center
|
||||
:width: 600px
|
||||
:alt: Block diagram of Fomu
|
||||
|
||||
Block diagram of Fomu.
|
||||
|
||||
The ICE40UP5K at the heart of Fomu really controls everything, and this
|
||||
workshop is all about trying to unlock the power of this chip.
|
||||
|
||||
.. NOTE::
|
||||
Some figures in this section were extracted from the `iCE40 UltraPlus Family Data Sheet <https://www.latticesemi.com/view_document?document_id=51968>`_.
|
Loading…
Reference in New Issue
Block a user