Téxstego

Text is
posted on the
internet willy-nilly.
Innovations for abbreviations
and indentations are a fertile ground.
Left justification leaves plenty of room for
seemingly arbitrary wrap around to be used for repositories
of permanent secret messages. Indentation steganography was first
introduced in postings to sci.cry surreptitiously by this author in 1994.
The plaintext that is weakly encrypted against investigators might look innocuous.

Draft carrier paragraph rev 27 (j.txt):
The Perl languages would be used to process this page to provide an automated way to create an text indentation steganography result file. There are many way to do that. For examples, any new essay (a single several line draft carrier paragraph) are written using one long paragraph, and this is one. Also, a secret sentence is written in any separate file. The program outputs a new essay with multiple discrete lines of text using some words on each line in any form determined by a key. Line wraps are not used but newline characters end each line. The secrets messages are coded into the number of full letters in a line, not counting spaces. The writer then can edit the single one draft carrier paragraph so new words and new spelling mistakes or abbreviations are bad. This will result in the correct number of letters on each line, with whole words. Lines that have the correct number of letters are processed by the Perl program so no correction is needed on those lines. Lines that still have partial words will need to me manually edited. When the secret message is accurately represented in the steganographic multi-line result paragraph, it is published on a website. The authorized recipient of the paragrapher has a second Perl program to interpret the lines using a key. The secret sentence is reconstructed by that program by counting the number of letters in each line. This carrier paragraph is being used as a starting draft for this protocol.

Here is a secret sentence to be embedded in the previous draft carrier paragraph:
"meet me 11pm gazebo"
It has 19 characters (letters, digits and spaces), so 19 lines are needed in the future carrier paragraph.

INDENTATION CODE KEY:
the letter count padding is 23 (P)
numbers 0-9 are coded as 27-36 (N)
spaces coded as 0 (S)
letters a-z are coded 1-26 (L)

The indentation code key is easy to customize, perhaps with common letters given small numbers. The number of letters on each text line (M) is set by one character of the secret sentence: M is the padding P plus the code of that character.
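As a minimal sketch of how the key turns the secret sentence into target line lengths, the following Perl computes M for each character. The sub names char_code and line_lengths are made up for this illustration, and the digits 0-9 are assumed to take the ten consecutive codes starting at 27.

```perl
use strict;
use warnings;

my $PAD = 23;    # letter count padding (P) from the key above

# Map one secret character to its code under the Rev 27 key.
# Assumption: digits 0-9 take ten consecutive codes starting at 27.
sub char_code {
    my ($ch) = @_;
    return 0                       if $ch eq ' ';         # S
    return ord($ch) - ord('a') + 1 if $ch =~ /^[a-z]$/;   # L
    return 27 + $ch                if $ch =~ /^[0-9]$/;   # N
    die "character '$ch' is not in the key\n";
}

# Target count of non-space characters (M) for each carrier line.
sub line_lengths {
    my ($secret) = @_;
    return map { $PAD + char_code($_) } split //, $secret;
}

my @m = line_lengths("meet me 11pm gazebo");
print scalar(@m), " lines: @m\n";
# 19 lines: 36 28 28 43 23 36 28 23 51 51 39 36 23 30 24 49 28 25 38
```

The carrier paragraph is then broken and edited until each line holds exactly that many non-space characters.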


.............................................

The ramping shape seen on some paragraphs illustrates how the R function can be implemented to introduce variability into the indentations. As each line is put onto a carrier paragraph, the padding (the letters that carry no information) can vary according to a function defined in the key code file.
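A variable padding of this kind might be sketched as follows. The linear ramp used here is only a hypothetical stand-in for R, since the actual function is not given; the names ramp, encode_length and decode_length and the step value of 3 are likewise assumptions made for the example.

```perl
use strict;
use warnings;

my $BASE_PAD = 23;   # fixed padding P from the key

# Hypothetical R function: a linear ramp adding $step characters of padding
# per line. The real function would be defined in the key code file.
sub ramp {
    my ($line_index, $step) = @_;
    return $line_index * $step;
}

# Sender: target length of line $i for a character code $code.
sub encode_length {
    my ($i, $code, $step) = @_;
    return $BASE_PAD + ramp($i, $step) + $code;
}

# Recipient: recover the code from the observed length of line $i.
sub decode_length {
    my ($i, $length, $step) = @_;
    return $length - $BASE_PAD - ramp($i, $step);
}

my @codes   = (13, 5, 5, 20);    # "meet" under the letter key
my @lengths = map { encode_length($_, $codes[$_], 3) } 0 .. $#codes;
print "lengths: @lengths\n";     # lengths: 36 31 34 52
my @back    = map { decode_length($_, $lengths[$_], 3) } 0 .. $#lengths;
print "codes:   @back\n";        # codes:   13 5 5 20
```

Both sides must use the same function and the same line numbering, otherwise the recovered codes drift.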

Here is the resulting indentation stego, Rev 27:
The Perl languages would be used to process
this page to provide an automated
way to create an text indentation
steganography result file. There are many way to do
that. For examples, any new
essay (a single several line draft carrier
paragraph) are written using one
long paragraph, and this is
one. Also, a secret sentence is written in any separate file.
The program outputs a new essay with multiple discrete lines
of text using some words on each line in any form
determined by a key. Line wraps are not used
but newline characters end
each line. The secrets messages are
coded into the number of full
letters in a line, not counting spaces. The writer then can
edit the single one draft carrier
paragraph so new words and new
spelling mistakes or abbreviations are bad.
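
The recipient's side can be sketched in a few lines of Perl. This is an illustration under assumptions, not the author's second program: the names code_char and decode_lines are invented, digits are assumed to use codes 27-36, and punctuation is counted along with letters because the line lengths in the result above only match the key when it is.

```perl
use strict;
use warnings;

my $PAD = 23;   # letter count padding (P)

# Invert the Rev 27 key: 0 = space, 1-26 = a-z, 27-36 = digits 0-9 (assumed).
sub code_char {
    my ($code) = @_;
    return ' '                       if $code == 0;
    return chr(ord('a') + $code - 1) if $code >= 1  && $code <= 26;
    return $code - 27                if $code >= 27 && $code <= 36;
    return '?';                      # length does not match the key
}

# Count the non-space characters on each line and map them to characters.
sub decode_lines {
    my (@lines) = @_;
    my $secret = '';
    for my $line (@lines) {
        my $count = () = $line =~ /\S/g;   # punctuation counts, spaces do not
        $secret .= code_char($count - $PAD);
    }
    return $secret;
}

my @carrier = (
    "The Perl languages would be used to process",
    "this page to provide an automated",
    "way to create an text indentation",
    "steganography result file. There are many way to do",
);
print decode_lines(@carrier), "\n";   # prints "meet"
```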

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

The Rev. 52 Indentation program has a better technique, shown next. Two lines are paired to represent one Unicode character, coded as a number from 00 to 93. Even lines show the first digit and odd lines show the second digit of the two-digit code for a character. Each line starts with a padding of 40 characters. Spaces are treated poorly in Rev 52, so the result looks rough. A future version gives the option of ignoring spaces in the carrier or using the spaces.

MNDMXNEARSENSTAYNREASMTFRNENPTNSENPBSERCBRNS
ENPRSEINCPRSENMRSEOPREHLDWLDNCBETFXLEITCXLNL
BEALPRPPITXLYPPIYNCBEMGKSEWCDRCBRNSEPRSEWLDRC
BRNSENTOGNENTXSECRSLECLTRSEWLDNCBENOPFSE
NLSRENCBENTEGDDMNSENCURERCBRNETENETFRNENCBRT
SENCBEINCFLRSEPRSEONDEINCBECDNSEPRSEONSE
DENCBEPRTSEPRSEONREDENCBETFNBCMSPSOLEMRD
ELUSETOTEWLDNINLDNCBEIWLDSNCBETRFXLALPNTEGSLSE
SEERtEVLSEMTSECTSEWSEFRTSEPNRTRSEONPRSEWL
DNCBENWLDXLRCMSPNEWLDSTSMEXLDVLMTTUNSENCBEXLAIU
NSRSTENMUNARSEKLSELRSTETRSETRSEMKSENMRSESAE
NSESENMBSENMNRCBRNSEPTEPTEWSRCBREEEMLSESP
RKSEKENOOLEIRTRSELECLGSEVUNVTREBKRSEPSESHLEIMTCSE
HTLSENCVTVTRSNMRFSUNEPLSEVCRSEOLTSENSKSENBS
ENSREOVSEPVTSEWLDNCBEX0RLPNMSENRSEINNTRLE
RCBRNSENTSRCRNELSPNSENGSPSEMKSERBSENCBEAVXL
RHMCRENMRENCBEIMUNDDLSEDWMIHPLXDRLXMNDMXN
EARSENSTAYNREASMTFRNENPTNSENPBSERCBRNSENPRSEINCPR
SENMRSEOPREHLDWLDNCBETFXLEITCXLNLBEALPRPPI
TXLYPPIYNCBEMGKSEWCDRCBRNSEPRSEWLDRCBRNSENTOGNEN
TXSECRSLECLTRSEWLDNCBENOPFSENLSRENCBENTE
GDDMNSENCURERCBRNETENETFRNENCBRTSENCBEINCFLRSEP
RSEONDEINCBECDNSEPRSEONSEDENCBEPRTSEPRSEON
REDENCBETFNBCMSPSOLEMRDELUSETOTEWLDNINLDNCB
EIWLDSNCBETRFXLALPNTEGSLSESEERtEVLSEMTSE
CTSEWSEFRTSEPNRTRSEONPRSEWLDNCBENWLDXLRCMSPNEWLD
STSMEXLDVLMTTUNSENCBEXLAIUNSRSTENMUNARSEK
LSELRSTETRSETRSEMKSENMRSESAENSESENMBSENMN
RCBRNSEPTEPTEWSRCBREEEMLSESPRKSEKENOOLEI
RTRSELECLGSEVUNVTREBKRSEPSESHLEIMTCSEHTLS
ENCVTVTRSNMRFSUNEPLSEVCRSEOLTSENSKSENBSENS
REOVSEPVTSEWLDNCBEX0RLPNMSENRSEINNTRLERCB
RNSENTSRCRNELSPNSENGSPSEMKSERBSENCBEAVXLRHM
CRENMRENCBEIMUNDDLSEDWMIHPLXDRLXMNDMXNEARSENST
AYNREASMTFRNENPTNSENPBSERCBRNSENPRSEINCPRSENMRSEO
PREHLDWLDNCBETFXLEITCXLNLBEALPRPPITXLYPPIYN
CBEMGKSEWCDRCBRNSEPRSEWLDRCBRNSENTOGNENTX
SECRSLECLTRSEWLDNCBENOPFSENLSRENCBENTEGDDMNS
ENCURERCBRNETENETFRNENCBRTSENCBEINCFLRSEPR
SEONDEINCBECDNSEPRSEONSEDENCBEPRTSEPRSEONRED
ENCBETFNBCMSPSOLEMRDELUSETOTEWLDNINLDNCB
EIWLDSNCBETRFXLALPNTEGSLSESEERtEVLSEMTSECT


The plaintext looks like this: ZYXabc 1234567890 def
The advantage is that each line has a length within 9 characters of the length of any other line, using 93 possible characters. The previous version has a 37 character variation in line length for an alphabet of 37 characters.
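
The pairing of lines in Rev 52 can be sketched as below. The Rev 52 key is not shown here, so the codes 55, 7 and 93 are arbitrary examples; the sub names are invented, and the first line of each pair is assumed to be the even one (counting from zero).

```perl
use strict;
use warnings;

my $PAD = 40;   # Rev 52 padding per line

# Split each two-digit code into a pair of line lengths:
# the first line carries the tens digit, the second line the ones digit.
sub codes_to_lengths {
    my (@codes) = @_;
    return map { ($PAD + int($_ / 10), $PAD + $_ % 10) } @codes;
}

# Rebuild the codes from consecutive pairs of line lengths.
sub lengths_to_codes {
    my (@lengths) = @_;
    my @codes;
    while (@lengths >= 2) {
        my ($tens, $ones) = splice(@lengths, 0, 2);
        push @codes, ($tens - $PAD) * 10 + ($ones - $PAD);
    }
    return @codes;
}

my @lengths = codes_to_lengths(55, 7, 93);
print "@lengths\n";                                  # 45 45 40 47 49 43
print join(' ', lengths_to_codes(@lengths)), "\n";   # 55 7 93
```

Every line length stays between 40 and 49, which is where the 9 character spread comes from.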

That key has some randomness. A new key has been prepared so the most common letters get the best numbers. The 26 small letters have codes near 55, so most lines vary little from a length of 5 characters added to a constant padding of 40.


$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$


$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$

4/12/11 TEXSTEGO BEGINS HERE...

EXAMPLE CARRIER WITH HIDDEN MESSAGE:

Introducing Texstego, the text steganography program. It
hides a short secret message in a public text paragraph.
The secret message here is a phrase of twenty letters
hidden in forty lines of a carrier paragraph. Pairs
of any text lines combine to give a secret two digit code.
Each secret character comes from a set of some 99 keyboard
characters. Each line of public text has a length that
varies up to 9 characters. Pairs of lines compose secret
codes from 00 to 99. Also, common letters like " e " are
given up to five codes so codes are less frequent
than that common letter. The character "e" was then
assigned the following five codes in the key file: 89, 46
, 20, 34, and 57. But the uncommon letter " z " only
gets one code number in the keys file. The letter
"t" gets four secret codes. The paragraph you are reading
now has forty lines. That is twenty pairs of lines.
Each pair encodes one character. A carrier paragraph
is being written as some longish line with no newline
characters. The wrap around from the wordprocessor
normally indents a right side of the page by about
eight characters, depending on the vocabulary. If I
use big words, the indentation increases. This means that
the nine character variations for steganography is near
ly this same amount of line length variation as ordinary
writings. When I'll finish writing this carrier paragraph
with no newlines, it will be an input to texstegi-
02.pl, a Perl program. The output from that program
has newlines added so the coded message will correspond
to the number of characters on each line. We will manually
edit the new paragraph so whole words are never split onto
new lines. I innovate edits so the number of character
s remain the same. Then run the Perl program again and
iterate this process until each text line only has whole
words. A second Perl program texstego-01.pl decrypts.

Key file contents:

0`03
1`32
2`61
3`12
4`68
5`38
6`07
7`47
8`53
9`31
a`83
a`44
a`56
a`62
b`39
c`77
d`24
e`89
e`46
e`20
e`34
e`57
f`30
g`76
h`19
h`91
h`29
i`25
i`75
i`04
j`21
k`70
l`43
m`52
n`01
n`50
n`92
o`18
o`55
o`88
o`02
p`85
q`95
r`42
r`90
r`69
s`99
s`63
s`08
t`45
t`37
t`96
t`64
u`13
u`84
u`79
v`72
w`65
x`80
y`71
z`09
A`33
B`94
C`23
D`66
E`35
F`14
G`74
H`26
I`40
J`58
K`05
L`87
M`48
N`97
O`11
P`86
Q`49
R`73
S`10
T`82
U`36
V`93
W`22
X`81
Y`78
Z`00
.`27
,`15
"`67
(`41
)`59
!`06
@`51
$`60
?`28
=`54
/`98
:`16
`17
```50`pad
```90`ext

pad is a padding of 50 characters to start each line
ext is an extension code for flexible improvements, later.
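
Reading such a key file could look roughly like this. It is a sketch under assumptions: only a handful of key lines are copied inline instead of reading the file, the names %codes, %setting and next_code are invented, and the round-robin over every character's codes is a simplification chosen for the example.

```perl
use strict;
use warnings;

# A few lines copied from the key file above; a real run would read the whole file.
my @key_lines = ('a`83', 'a`44', 'e`89', 'e`46', 'e`20', 'e`34', 'e`57',
                 'm`52', 't`45', 't`37', 't`96', 't`64',
                 '```50`pad', '```90`ext');

my (%codes, %setting);
for my $line (@key_lines) {
    my @field = split /`/, $line;
    if (@field == 5 && $field[0] eq '') {      # settings line such as ```50`pad
        $setting{ $field[4] } = $field[3];
    } else {                                   # character`code line
        push @{ $codes{ $field[0] } }, $field[1];
    }
}

# Rotate through the codes of a character so repeats get different codes.
my %next;
sub next_code {
    my ($ch) = @_;
    my $list = $codes{$ch} or die "no code for '$ch'\n";
    my $i = $next{$ch}++;
    return $list->[ $i % scalar(@$list) ];
}

my @pairs = map { next_code($_) } split //, "meet";
print "pad=$setting{pad} codes=@pairs\n";   # pad=50 codes=52 89 46 45
```

Each two-digit code is then split over a pair of lines, so the two "e" characters in "meet" end up with different line lengths.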

Here is a Perl snippet of an important allocation of line lengths ($offset[]):
# make array called @offset of keyed numbers
# according to each letter in secret sentence
# (@sentence, $sizeS, @key_line, $key_size and $additives are set earlier in the program)
my @offset;
$count=0;
print "offsets are: ";
# loop secret of about 20 characters (sizeS)
# watching for frequent letters

# e tao inshr dlu cmfwypvbgkqjxz  english frequencies
# 5 444 33333 222 11111111111111

my $flag_match;
my $frequ_e = 5;    # number of codes given to "e" in the key file
my $count_e = 0;    # which of the "e" codes to use next
my $frequ_t = 4;    # number of codes given to "t" in the key file
my $count_t = 0;    # which of the "t" codes to use next
for ($i=0; $i < $sizeS; $i++)
{
    $flag_match = 0;
    # the last $additives key lines (pad, ext) are settings, not characters
    for ($j=0; $j < ($key_size - $additives); $j++)
    {
        @key_map = split (/`/,$key_line[$j]);
        if ((ord($key_map[0]) == ord($sentence[$i]))&&($flag_match == 0))
        {
            $offset[$i] = $key_map[1];

            # "e": skip ahead to the next unused code, then wrap around
            if ((ord($key_map[0]) == ord("e")) && ($count_e < $frequ_e))
            {
                $flag_match = 1;
                $j += $count_e;
                $count_e++;
                if ($count_e == $frequ_e)
                {
                    $count_e = 0;
                }
                @key_map = split (/`/,$key_line[$j]);
                $offset[$i] = $key_map[1];
            }

            # "t": same rotation over its four codes
            if ((ord($key_map[0]) == ord("t")) && ($count_t < $frequ_t))
            {
                $flag_match = 1;
                $j += $count_t;
                $count_t++;
                if ($count_t == $frequ_t)
                {
                    $count_t = 0;
                }
                @key_map = split (/`/,$key_line[$j]);
                $offset[$i] = $key_map[1];
            }
        }
    }
}

# look at offsets for secret of length $sizeS
for ($i=0; $i < $sizeS; $i++)
{
    print "$offset[$i] ";
}
Téxstego work continuing April 26 2011 while checking success over a variety of inputs...