We want to turn a list of strings (for example ["hello", "world"])
into a single string that we can later decode back into the exact same list.
To do this reliably, we need a way to know where each original string starts and ends β
even when the string itself contains digits, symbols, or repeated characters.
Encoding Strategy
We encode each string as <length>#<string>, where:
- length is the number of characters in the current original string being decoded (written in decimal, since weβre working with text). It tells us exactly how many characters belong to this specific original string within the encoded data.
-
The original string might even contain
#, but decoding still works safely.
This is because, within the portion of the encoded string that represents the current string, the first#we encounter is guaranteed to be the separator β only the length appears before it in the segment encoding that string. Any#characters belonging to the original string itself come after this separator and are treated as part of the actual string content. - string is the original string itself.
Then we just concatenate these chunks one after another with no extra delimiter.
Example:
"hello" β 5#hello
"world" β 5#world
Final encoded result: 5#hello5#world
Why This Encoding Is Safe
The key is that # gives us a clear boundary between
"this part is the length" and "this part is the actual content."
When decoding, we can scan forward until we see #,
and everything before it is guaranteed to be just the number
(the length). After that, we know exactly how many characters
to read for the string.
This solves common edge cases:
-
The original string might start with digits
(like
"12345"). -
The original string might even contain
#. - The original string might be long (2-digit, 3-digit, 10-digit length, etc.).
All of these cases are still unambiguous, because we do not
guess where the string ends β we trust the length we just parsed.
The # only ends the length portion, not the string.
Decoding Strategy
We walk through the encoded string using two moving indices:
one called left and one called right.
-
Initialize
leftat the start of the encoded string. -
Set
rightto the same position asleft, then move it forward until you find#.
The substring fromleftup to (but not including)rightrepresents the length as text. For example,"12"means the next string has length 12. -
Convert that length text into a number
L. -
Read exactly
Lcharacters immediately after the#; this substring is the decoded string. -
Move
leftso it points to the position immediately after the decoded string β this marks the beginning of the next encoded chunk to process.
Continue this process until left reaches the end of the encoded string.
At that point, all original strings have been successfully rebuilt in their original order.
Why This Works
This method is:
- Deterministic: There is never any ambiguity about where one string ends and the next begins.
-
General: It supports any string content β including digits, punctuation,
#, Unicode β and scales naturally to long strings, since lengths can have multiple digits (e.g."42#...","105#..."). -
Efficient: Both encoding and decoding run in
linear time
O(N)whereNis the total number of characters across all strings. We only scan what we actually need.
Intuition Recap
Weβre basically turning the list of strings into a self-describing stream: each piece says βI am this long, here is my body.β Because we always read the declared length instead of βguessing boundaries,β decoding can reconstruct the original list exactly, with no information loss.