Chapter 5. Primary and Shard Key Design

Table of Contents

Primary Keys
Data Type Limitations
Partial Primary Keys
Shard Keys
Row Data

Primary keys and shard keys are central to your table design. Your choice of primary and shard keys determines which rows you can read together in a single operation. Beyond that, your key design has important performance implications.

Primary Keys

Every table must have one or more fields designated as the primary key. This designation occurs at the time that the table is created, and cannot be changed after the fact. A table's primary key uniquely identifies every row in the table. In the simplest case, it is used to retrieve a specific row so that it can be examined and/or modified.

For example, a table might have five fields: productName, productType, color, size, and inventoryCount. To retrieve individual rows from the table, it might be enough to just know the product's name. In this case, you would set the primary key field as productName and then retrieve rows based on the product name that you want to examine/manipulate.

In this case, the CLI script that you would use to create this table might be:

## Enter into table creation mode
table create -name myProducts
## Now add the fields
add-field -type STRING -name productName
add-field -type STRING -name productType
add-field -type ENUM -name color -enum-values blue,green,red
add-field -type ENUM -name size -enum-values small,medium,large
add-field -type INTEGER -name inventoryCount
## A primary key must be defined for every table
## Here, we will define field 'productName' as the primary key.
primary-key -field productName
## Exit table creation mode
exit
## Add the table to the store. Use the -wait flag to
## force the script to wait for the plan to complete
## before doing anything else.
plan add-table -name myProducts -wait 

However, you can use multiple fields for your primary key. On a functional level, doing this allows you to delete multiple rows in your table in a single atomic operation. In addition, a multi-field primary key allows you to retrieve a subset of the rows in your table in a single atomic operation.

We describe how to retrieve multiple rows from your table in Reading Table Rows. We show how to delete multiple rows at a time in Using multiDelete().

Data Type Limitations

Fields can be designated as primary keys only if they are declared to be one of the following types:

  • Integer

  • Long

  • Float

  • Double

  • String

  • Enum
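As a rough illustration of this restriction, the following Python sketch checks a proposed primary key against the list above. It is not part of the product's API. The function name, the dictionary layout, and the discontinued field with its BOOLEAN type are assumptions made for the example.

```python
# Type names follow the list above, spelled the way the CLI examples
# in this chapter spell them (for example, STRING and ENUM).
KEYABLE_TYPES = {"INTEGER", "LONG", "FLOAT", "DOUBLE", "STRING", "ENUM"}


def invalid_key_fields(field_types, key_fields):
    """Return the key fields whose declared type cannot be used in a primary key.

    field_types: dict mapping field name -> declared type name.
    key_fields:  list of field names proposed for the primary key.
    A field that is not declared at all is also reported.
    """
    bad = []
    for name in key_fields:
        declared = field_types.get(name)
        if declared is None or declared.upper() not in KEYABLE_TYPES:
            bad.append(name)
    return bad


fields = {
    "productName": "STRING",
    "color": "ENUM",
    "inventoryCount": "INTEGER",
    "discontinued": "BOOLEAN",  # hypothetical field, not in this chapter's table
}

print(invalid_key_fields(fields, ["productName", "color"]))         # []
print(invalid_key_fields(fields, ["productName", "discontinued"]))  # ['discontinued']
```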

Partial Primary Keys

Some of the methods you use to perform multi-row operations allow, or even require, a partial primary key. A partial primary key is, simply, a key where only some of the fields comprising the row's primary key are specified.

For example, the following script specifies three fields for the table's primary key:

## Enter into table creation mode
table create -name myProducts
## Now add the fields
add-field -type STRING -name productName
add-field -type STRING -name productType
add-field -type STRING -name productClass
add-field -type ENUM -name color -enum-values blue,green,red
add-field -type ENUM -name size -enum-values small,medium,large
add-field -type INTEGER -name inventoryCount
## A primary key must be defined for every table
primary-key -field productName -field productType -field productClass
## Exit table creation mode
exit
## Add the table to the store. Use the -wait flag to
## force the script to wait for the plan to complete
## before doing anything else.
plan add-table -name myProducts -wait 

In this case, a full primary key would be one where you provide values for all three primary key fields: productName, productType, and productClass. A partial primary key would be one where you provide values for only one or two of those fields.

Note that order matters when specifying a partial key. The partial key must be a subset of the full key, starting with the first field specified and then adding fields in order. So the following partial keys are valid:

productName
productName, productType

But a partial key made up of productType and productClass is not valid, because it skips the first field, productName.
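The ordering rule can be sketched in Python as a leading-run check, together with a helper that selects the rows a partial key would match. This is a minimal model of the rule, not the store's API. The function names and the sample rows are invented for the example, and the sketch rejects an empty key, which is a simplification.

```python
PRIMARY_KEY = ["productName", "productType", "productClass"]


def is_valid_partial_key(primary_key, supplied):
    """True if `supplied` is a non-empty leading run of `primary_key`.

    A full key also passes, since it is the longest possible leading run.
    """
    n = len(supplied)
    return 0 < n <= len(primary_key) and list(supplied) == primary_key[:n]


def matching_rows(rows, partial):
    """Return the rows whose key fields equal every value in `partial`.

    `partial` is a dict; its insertion order is taken as the field order.
    """
    if not is_valid_partial_key(PRIMARY_KEY, list(partial)):
        raise ValueError("partial key must start at the first key field, in order")
    return [r for r in rows if all(r[f] == v for f, v in partial.items())]


# Hypothetical sample data for the myProducts table.
rows = [
    {"productName": "Hammer", "productType": "Hardware", "productClass": "Claw"},
    {"productName": "Hammer", "productType": "Hardware", "productClass": "Sledge"},
    {"productName": "Wrench", "productType": "Hardware", "productClass": "Socket"},
]

print(is_valid_partial_key(PRIMARY_KEY, ["productName", "productType"]))   # True
print(is_valid_partial_key(PRIMARY_KEY, ["productType", "productClass"]))  # False
print(len(matching_rows(rows, {"productName": "Hammer"})))                 # 2
```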

Shard Keys

Shard keys identify which primary key fields are meaningful in terms of shard storage. That is, rows which contain the same values for all the shard key fields are guaranteed to be stored on the same shard. This matters for some operations that promise atomicity of the results. (See Executing a Sequence of Operations for more information.)

For example, suppose you set the following primary key:

primary-key -field productType -field productName -field productClass

You can guarantee that rows with the same productType and productName values are placed on the same shard like this:

shard-key -field productType -field productName

Note

Shard key fields must be a leading subset of the primary key fields: they must begin with the first primary key field and be specified in the same order as the primary key fields. In the previous example, the following would result in an error, because productClass is not the first primary key field:

shard-key -field productClass
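To picture the same-shard guarantee described in this section, the following Python sketch derives a shard number from the shard key values alone. It is only a model. The CRC32 hash, the shard count of 3, and the sample rows are assumptions made for the example, and the store's actual placement function may differ.

```python
import zlib

PRIMARY_KEY = ["productType", "productName", "productClass"]
SHARD_KEY = ["productType", "productName"]
NUM_SHARDS = 3  # hypothetical shard count for this sketch

# The shard key must be a leading run of the primary key fields.
assert SHARD_KEY == PRIMARY_KEY[:len(SHARD_KEY)]


def shard_for(row, shard_key=SHARD_KEY, num_shards=NUM_SHARDS):
    """Map a row to a shard number using only its shard key values.

    CRC32 is a stand-in chosen because it is deterministic; it is not
    the store's actual placement function.
    """
    joined = "\x00".join(str(row[f]) for f in shard_key)
    return zlib.crc32(joined.encode("utf-8")) % num_shards


# Hypothetical rows that share a shard key but differ in productClass.
a = {"productType": "Hardware", "productName": "Hammer", "productClass": "Claw"}
b = {"productType": "Hardware", "productName": "Hammer", "productClass": "Sledge"}

print(shard_for(a) == shard_for(b))  # True
```

Because productClass is not part of the shard key, it has no effect on placement in this model. Rows with different shard key values may or may not land on the same shard.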