ML-KEM & ML-DSA: What is the secret key?

The TL;DR: The seed is the secret key. Store the seed and expand it on use as needed.

FIPS 203 (ML-KEM) and FIPS 204 (ML-DSA) both define two representations of a private key: a small seed and a large expanded form containing precomputed polynomial vectors. The expanded form is derived deterministically from the seed through a one-way function, so the seed fully determines the expanded key (though not the reverse), but the ecosystem has not (yet) agreed on which one to store and serialize.

Seed vs expanded key

ML-KEM key generation (FIPS 203, Algorithm 16) starts from 64 bytes of randomness: d (32 bytes, used to derive the public matrix seed and the secret polynomial vectors s and e) and z (32 bytes, the implicit rejection secret). The value z is needed because ML-KEM needs to behave identically whether a ciphertext is valid or not, otherwise an attacker with access to a decapsulation oracle can learn the secret key through chosen-ciphertext queries. When decapsulation detects an invalid ciphertext, it returns SHAKE-256(z || c) instead of failing, making the two paths indistinguishable to the caller. From d || z, you can regenerate the full decapsulation key: the NTT-domain secret vectors, the public encapsulation key, and a hash of the public key that binds decapsulation to a specific key pair.
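The first derivation step and the implicit rejection value can be sketched with Python's hashlib alone. This is a minimal sketch, assuming the FIPS 203 definitions G = SHA3-512 and J = SHAKE-256 with 32 bytes of output, and assuming that key generation appends the module rank k (3 for ML-KEM-768) to d as a domain separator. The function names are illustrative, and the sketch stops before any polynomial sampling.

```python
import hashlib

K_ML_KEM_768 = 3  # module rank k for ML-KEM-768


def derive_rho_sigma(d: bytes, k: int = K_ML_KEM_768) -> tuple[bytes, bytes]:
    """First step of key generation: (rho, sigma) = SHA3-512(d || k).

    rho seeds the public matrix, sigma seeds the sampling of s and e.
    """
    if len(d) != 32:
        raise ValueError("d must be 32 bytes")
    digest = hashlib.sha3_512(d + bytes([k])).digest()
    return digest[:32], digest[32:]


def implicit_rejection_key(z: bytes, ciphertext: bytes) -> bytes:
    """Value returned for an invalid ciphertext: SHAKE-256(z || c), 32 bytes."""
    if len(z) != 32:
        raise ValueError("z must be 32 bytes")
    return hashlib.shake_256(z + ciphertext).digest(32)
```

Because the rejection value depends only on z and the ciphertext, it looks like any other 32-byte shared secret to the caller, which is what hides the failure.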

ML-DSA key generation (FIPS 204, Algorithm 1) starts from a single 32-byte seed xi. From xi, it uses SHAKE-256 to derive the public matrix seed rho, the signing secret K, and the secret vectors s1 and s2, and then computes t0 as the low-order bits of t = A*s1 + s2. The expanded secret key bundles all of these together with tr, a hash of the public key that gets mixed into every signature to bind it to the signer (this is the same tr value that gives pure ML-DSA its non-resignability property).
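The first step of the ML-DSA expansion is a single SHAKE-256 call, which can be sketched as follows. This assumes the final FIPS 204 layout, where xi is followed by the one-byte matrix dimensions k and l and the 128-byte output splits into rho (32 bytes), an intermediate sampling seed rho' (64 bytes), and K (32 bytes). The function and table names are illustrative, and sampling s1 and s2 from rho' is left out.

```python
import hashlib

# Matrix dimensions (k, l) for each ML-DSA parameter set.
ML_DSA_DIMS = {"ML-DSA-44": (4, 4), "ML-DSA-65": (6, 5), "ML-DSA-87": (8, 7)}


def derive_ml_dsa_seeds(xi: bytes, param_set: str = "ML-DSA-65"):
    """Split SHAKE-256(xi || k || l) into (rho, rho_prime, K)."""
    if len(xi) != 32:
        raise ValueError("xi must be 32 bytes")
    k, l = ML_DSA_DIMS[param_set]
    out = hashlib.shake_256(xi + bytes([k, l])).digest(128)
    rho = out[:32]          # public matrix seed
    rho_prime = out[32:96]  # seed for sampling s1 and s2
    signing_k = out[96:]    # signing secret K
    return rho, rho_prime, signing_k
```

Including k and l in the hash input means the same xi yields unrelated keys under different parameter sets.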

For ML-KEM-768 the seed is 64 bytes and the expanded secret key is 2,400 bytes (~37x). For ML-DSA-65 the seed is 32 bytes and the expanded secret key is 4,032 bytes (126x).
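The sizes above follow from the encoded key layouts. As a rough check, assuming the layouts in FIPS 203 and FIPS 204 (12-bit coefficients for ML-KEM; bit-packed s1, s2, and t0 with d = 13 dropped bits for ML-DSA), the arithmetic can be reproduced like this. The helper names are mine.

```python
def ml_kem_dk_size(k: int) -> int:
    """Expanded decapsulation key: secret vector || encapsulation key || h || z."""
    secret_vector = 384 * k          # k polynomials, 256 coefficients, 12 bits each
    encapsulation_key = 384 * k + 32  # public vector plus matrix seed rho
    return secret_vector + encapsulation_key + 32 + 32


def ml_dsa_sk_size(k: int, l: int, eta: int, d: int = 13) -> int:
    """Expanded signing key: rho || K || tr || s1 || s2 || t0."""
    eta_bits = (2 * eta).bit_length()  # bits per coefficient of s1 and s2
    return 32 + 32 + 64 + 32 * ((k + l) * eta_bits + d * k)


kem_768 = ml_kem_dk_size(3)               # 2400 bytes
dsa_65 = ml_dsa_sk_size(k=6, l=5, eta=4)  # 4032 bytes
print(kem_768 / 64, dsa_65 / 32)          # expansion factors relative to the seed
```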

The expanded key is a precomputed cache that lets implementations skip the SHAKE derivation on repeated operations. The seed is the irreducible secret from which that cache is derived.

Only store the seed

An expanded ML-KEM decapsulation key contains polynomial coefficients in NTT form (Number Theoretic Transform, the lattice equivalent of FFT for efficient polynomial multiplication) that must each lie in the range [0, 3329), a full copy of the encapsulation key, and an embedded hash that must be consistent with that copy. FIPS 203 Section 7.3 mandates that implementations validate all of this on import: bounds-check every coefficient, recompute the hash, and reject on a mismatch. Expanded ML-DSA keys carry analogous constraints on the encoded vectors s1, s2, and t0. A seed, by contrast, is just 32 or 64 bytes of opaque randomness, and any value produces a valid key, so there is no parsing or validation to get wrong.
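To give a sense of how much parsing an expanded key import involves, here is a sketch of the checks described above for an ML-KEM decapsulation key. It assumes the FIPS 203 layout (secret vector, encapsulation key, SHA3-256 hash of the encapsulation key, then z) and little-endian 12-bit coefficient packing. The function names are hypothetical, and exactly which checks a given library performs is not something this sketch claims to reproduce.

```python
import hashlib
import hmac

Q = 3329  # ML-KEM modulus


def decode_12bit(data: bytes) -> list[int]:
    """Unpack little-endian 12-bit values: every 3 bytes hold 2 coefficients."""
    coeffs = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        coeffs.append(b0 | ((b1 & 0x0F) << 8))
        coeffs.append((b1 >> 4) | (b2 << 4))
    return coeffs


def check_expanded_ml_kem_dk(dk: bytes, k: int = 3) -> None:
    """Raise ValueError if the expanded key fails length, range, or hash checks."""
    vec = 384 * k
    if len(dk) != 2 * vec + 96:
        raise ValueError("wrong length for expanded decapsulation key")
    secret_vector = dk[:vec]
    ek = dk[vec:2 * vec + 32]             # public vector followed by 32-byte rho
    h = dk[2 * vec + 32:2 * vec + 64]     # embedded hash of ek
    # The final 32 bytes are z, which is opaque and needs no range check.
    for encoded in (secret_vector, ek[:vec]):
        if any(c >= Q for c in decode_12bit(encoded)):
            raise ValueError("coefficient out of range")
    if not hmac.compare_digest(hashlib.sha3_256(ek).digest(), h):
        raise ValueError("embedded hash does not match encapsulation key")
```

The seed-side equivalent of all of this is a single length check.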

This distinction has security implications. Sophie Schmieg's Kemmy Schmidt research demonstrated that ML-KEM lacks certain key-binding properties when an attacker can manipulate components of the expanded key independently. The paper describes two attacks. The first modifies the embedded public key hash h, breaking the binding between the shared secret and the ciphertext. The second replaces the implicit rejection secret z across two keys with a shared value, so that both keys produce the same output on the implicit rejection path for a crafted ciphertext. Both attacks require the attacker to construct malformed expanded keys where individual fields have been altered without touching the others. Seed-only storage removes the expanded key parsing surface entirely and prevents the first attack, because the hash is recomputed from the seed on every expansion. It does not fully prevent the second. The ML-KEM seed is d || z, two independently generated 32-byte values concatenated together, so an attacker who can modify the stored seed can still vary z while holding d fixed. Full mitigation would require deriving both from a single master seed, which NIST chose not to mandate. Even so, seed-only storage is strictly better than expanded key storage: it closes one attack completely and reduces the other to seed-level tampering rather than arbitrary field manipulation.

The performance cost of expanding from a seed is negligible. Filippo Valsorda benchmarked seed expansion for ML-KEM-768 at about 40 microseconds on an M2, which is less than the decapsulation operation itself. If you are in a high-throughput setting where even 40 microseconds matters, you can expand the key once at load time and keep it in memory for the lifetime of the process. The on-disk format does not need to match the in-memory representation, and in fact it is better if it doesn't, because the stored key is then always in the simplest, most easily validated form.
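One way to separate the stored form from the in-memory form is a small wrapper that holds the seed, expands lazily, and only ever serializes the seed. This is a sketch, not a real library API: SeedBackedKey and the expand callable are hypothetical, and expand stands in for whatever seed-expansion routine your ML-KEM or ML-DSA library provides.

```python
from functools import cached_property
from typing import Callable


class SeedBackedKey:
    """Holds a seed; the expanded key is an in-memory cache, never persisted."""

    def __init__(self, seed: bytes, expand: Callable[[bytes], bytes]):
        self._seed = bytes(seed)
        self._expand = expand  # assumed: deterministic seed -> expanded key

    @cached_property
    def expanded(self) -> bytes:
        # Runs the expansion on first access, then reuses the cached result.
        return self._expand(self._seed)

    def serialize(self) -> bytes:
        # The on-disk form is always the seed, regardless of what is cached.
        return self._seed
```

The expansion cost is paid once per process, and nothing other than the seed ever reaches storage.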

Serialization

The IETF LAMPS working group initially specified seed-only private keys for ML-KEM and ML-DSA in their certificate drafts. Then OpenSSL, Bouncy Castle, and others pushed back: some implementations had already generated and stored expanded keys without retaining the original seed, and because the derivation runs through SHAKE you cannot recover a seed from an expanded key. Those keys would become unrepresentable in a seed-only format.

The compromise was an ASN.1 structure with optional fields for both the seed and the expanded key, with implementations expected to output both when possible and accept whichever is available on import. This post explains why this is bad: when a serialized key contains two representations of the same secret and the format does not mandate consistency verification, one library can read the seed field and expand it while another reads the expanded key field directly, and if the two fields disagree (through an attacker modifying the blob, a bug, or a bad migration), these implementations will operate with different key material from the same file. A serialization format where two conforming implementations can derive different keys from the same input is a security hazard.
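If you have to consume the both-fields format, the defensive reading is to treat the seed as authoritative and reject any blob whose expanded field disagrees with it. A minimal sketch, assuming the ASN.1 has already been parsed into two optional byte strings and that expand is your library's seed-expansion routine (both are placeholders here, not a real API):

```python
import hmac
from typing import Callable, Optional


def load_private_key(
    seed: Optional[bytes],
    expanded: Optional[bytes],
    expand: Callable[[bytes], bytes],
) -> bytes:
    """Return the expanded key to use, refusing inconsistent inputs."""
    if seed is None and expanded is None:
        raise ValueError("no private key material present")
    if seed is None:
        # Legacy expanded-only key: the caller must still validate every field.
        return expanded
    derived = expand(seed)
    if expanded is not None and not hmac.compare_digest(derived, expanded):
        raise ValueError("seed and expanded key disagree")
    return derived
```

With this check, two implementations that prefer different fields either agree on the key or both fail on the same file.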

The right answer would have been either seed-only (which is what LAMPS originally proposed) or an explicit CHOICE in the ASN.1 that forces each key to declare which representation it carries, rather than allowing both to coexist in the same structure.

What to do

For new systems, I recommend storing and transmitting seeds only. For ML-KEM, that is the 64-byte d || z value. For ML-DSA, that is the 32-byte xi value. Expand on use and cache in memory if performance requires it.

If your system already stores expanded keys without the seed, you will most likely need to re-key (hopefully your system is agile enough to do this). The derivation through SHAKE is one-way and there is no way to recover the original seed. For ML-DSA signing keys bound to long-lived certificates this probably means reissuance, and it is better to plan for that now than to discover the incompatibility when you try to migrate key material between implementations that disagree on which field to read.

If you are designing a cryptographic API, do not expose expanded key import as a public interface; accept seeds and expand them internally. If you have to accept expanded keys for backward compatibility, validate every field fully and treat any failure as an error.
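A seed-only import function has very little to check. The sketch below shows the shape of such an entry point: the only validation is the seed length for the named parameter set, so an expanded key passed by mistake is rejected on size alone. The function name and the table are illustrative, not taken from any existing library.

```python
# Seed lengths in bytes: d || z for ML-KEM, xi for ML-DSA.
SEED_LENGTHS = {
    "ML-KEM-512": 64, "ML-KEM-768": 64, "ML-KEM-1024": 64,
    "ML-DSA-44": 32, "ML-DSA-65": 32, "ML-DSA-87": 32,
}


def import_private_seed(algorithm: str, seed: bytes) -> bytes:
    """Accept a private key only in seed form; any other size is an error."""
    expected = SEED_LENGTHS.get(algorithm)
    if expected is None:
        raise ValueError(f"unknown algorithm: {algorithm}")
    if len(seed) != expected:
        raise ValueError(
            f"{algorithm} seed must be {expected} bytes, got {len(seed)}"
        )
    return bytes(seed)  # hand this to the internal expansion routine
```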

This is the same convergence problem that I wrote about in pre-hash versus pure ML-DSA: the standard defines multiple modes but the ecosystem needs to pick one. Again, the right answer is the simpler one that eliminates unnecessary degrees of freedom. For secret key representation, use the seed.