Word Processing
Current word processors are a joke. What we need is an
html-based word processor.
html is a subset of sgml; perhaps that is the way to go.
MS Word
I had a small MS Word file.
I added about 100 bytes in MS Word and saved it.
As far as I can tell, both documents were saved as
Word.Document.8 files.
The second file was saved with unicode (2 bytes per character).
Size Before | Size After | Word 97 SR-2 | Size in MS Word Generated html
---|---|---|---
48 KB | 388 KB | 60 KB | 21 KB
Cleaning up the html document (fixing the formatting) took only
about an hour.
With this kind of performance, why would anyone use MS Word?
(In a typical month, I use MS Word only as a spelling checker.
All my other word processing is done with notepad/html.)
Bugs and Features
You won't believe the problems reported at
Woody's Watch:
- Using fields, MS Word can read files from your disk and include them
in the document without your knowledge.
Vol 3 No 24 (10-2-02) gives explicit details. For example:
{If {IncludeText {If { Date } = { Date } "c:\\a.txt"
"c:\\a.txt"} } = "" "" }
Use the
Hidden Field detector to help protect yourself.
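A rough check along the same lines can be done by searching the raw bytes of a .doc file for field keywords such as INCLUDETEXT. This is a minimal heuristic sketch, not the Hidden Field detector itself: it assumes the field instruction text is stored either as 8-bit characters or as UTF-16LE, and the keyword list and the name `find_field_keywords` are hypothetical.

```python
import sys

# Field keywords that can pull outside content into a document.
# This list is illustrative, not exhaustive.
SUSPICIOUS_FIELDS = ("INCLUDETEXT", "INCLUDEPICTURE")


def find_field_keywords(data: bytes, keywords=SUSPICIOUS_FIELDS) -> list:
    """Return the keywords present in data, matched case-insensitively
    as 8-bit text or as UTF-16LE text."""
    haystack = data.upper()  # bytes.upper() changes only ASCII letters
    found = []
    for kw in keywords:
        if kw.encode("ascii") in haystack or kw.encode("utf-16-le") in haystack:
            found.append(kw)
    return found


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            hits = find_field_keywords(f.read())
        print(path, ", ".join(hits) if hits else "no suspicious field keywords")
```

A hit only means the keyword text is present somewhere in the file; open the document and toggle the field codes (Alt+F9) to see what the field actually does.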
Corrupt Templates
MS Word is vulnerable to various viruses and file corruption.
If it isn't acting right, try renaming the default template:
C:\Program Files\Microsoft Office\Templates\Normal.dot
With Office 2000, try
C:\windows\application data\microsoft\templates\Normal.dot
With Office XP on Windows XP, try
C:\Documents and Settings\[UserName]\Application Data\Microsoft\Templates\Normal.dot
When Normal.dot cannot be found, MS Word simply creates a new copy.
In general, this restores missing menu options.
In one case I read about, Windows crashed every time MS Word started until
the template was deleted.
Normal.dot (the default template)
- Defines the default font, tabs, and similar properties
- Controls the display of some of the menu items
- Contains the style definitions
- May contain macros, in particular a macro that runs
automatically when Word starts
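The renaming step can be scripted. This minimal sketch assumes Python is available on the Windows machine in question, checks the three locations listed above, and builds the per-user path from the %APPDATA% environment variable (an assumption; check where your Office version actually keeps Normal.dot). The function names are hypothetical.

```python
import os
from pathlib import Path


def candidate_templates() -> list:
    """The Normal.dot locations listed above. The per-user path is built
    from %APPDATA%, which is an assumption about your setup."""
    paths = [
        Path(r"C:\Program Files\Microsoft Office\Templates\Normal.dot"),
        Path(r"C:\windows\application data\microsoft\templates\Normal.dot"),
    ]
    appdata = os.environ.get("APPDATA")
    if appdata:
        paths.append(Path(appdata) / "Microsoft" / "Templates" / "Normal.dot")
    return paths


def rename_templates(paths, suffix: str = ".old") -> list:
    """Rename each existing template to <name><suffix> so that Word creates
    a fresh Normal.dot the next time it starts. Returns the new paths.
    On Windows, the rename fails if the target name already exists."""
    renamed = []
    for p in map(Path, paths):
        if p.is_file():
            target = p.with_name(p.name + suffix)
            p.rename(target)
            renamed.append(target)
    return renamed


if __name__ == "__main__":
    for new_path in rename_templates(candidate_templates()):
        print("renamed to", new_path)
```

Run it with Word closed. Renaming instead of deleting keeps the old template, with the styles and macros listed above, in case you need to recover something from it.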
Spyware
This is interesting - open a .doc file in notepad and look at what is in it.
There are several items I find unacceptable from a privacy point of view.
- The full paths and filenames used the last 8 (or more) times the file was edited.
- The name of the machine that the file was edited on.
- The name of the company that owned the machine.
- A _PID_GUID that can be used to trace the file and every file created on that machine.
- Miscellaneous data from other files on the hard drive.
That's right, every time you save an MS Word document, Microsoft hides
information inside it that can be used to violate your privacy.
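Notepad is a clumsy viewer for binary files. The sketch below does roughly the same job by pulling out runs of readable characters, the way the Unix `strings` tool does. It assumes text is stored either one byte per character or as UTF-16LE (2 bytes per character, like the unicode file mentioned earlier); the minimum run length of 6 and the function name are arbitrary choices.

```python
import re
import sys

MIN_LEN = 6  # shortest run worth reporting; an arbitrary choice
# Printable ASCII stored one byte per character...
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# ...or as UTF-16LE, where each such character is followed by a zero byte.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def readable_strings(data: bytes) -> list:
    """Return readable text runs in data: 8-bit runs first, then UTF-16LE runs."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for s in readable_strings(f.read()):
            print(s)
```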
Using SaveAs makes no difference: the information in the original file
is saved in the new file. It appears that the _PID_GUID is
determined by the machine on which the file was originally created.
This identifier does not change when you edit the file on another machine.
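To compare the identifier across copies of a file, you can search for GUID-shaped strings. This heuristic sketch assumes the _PID_GUID value appears as a brace-delimited GUID, such as {12345678-ABCD-EF01-2345-6789ABCDEF01}, in 8-bit or UTF-16LE text. It reports every GUID it finds, so some hits may be unrelated identifiers; the function names are hypothetical.

```python
import re
import sys

# A brace-delimited GUID: 8-4-4-4-12 hex digits.
GUID = re.compile(r"\{[0-9A-Fa-f]{8}(?:-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}\}")


def text_views(data: bytes) -> list:
    """Decode the bytes three ways: 8-bit, and UTF-16LE at both byte alignments."""
    return [
        data.decode("latin-1"),  # every byte maps to exactly one character
        data.decode("utf-16-le", errors="ignore"),
        data[1:].decode("utf-16-le", errors="ignore"),
    ]


def find_guids(data: bytes) -> list:
    """Return every distinct GUID-shaped string, sorted."""
    return sorted({m.group() for text in text_views(data) for m in GUID.finditer(text)})


def has_pid_guid(data: bytes) -> bool:
    """True if the property name _PID_GUID appears as 8-bit or UTF-16LE text."""
    name = "_PID_GUID"
    return name.encode("ascii") in data or name.encode("utf-16-le") in data


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    print("_PID_GUID present:", has_pid_guid(data))
    for g in find_guids(data):
        print(g)
```

Running it on two files and comparing the output is a quick way to check whether they share the same identifier.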
For instance, I don't want everyone who gets a copy of my resume
to know the name of the directory on my machine where it is stored.
Some of this information can be removed via File / Properties.
Word fills in Title, Author, and Company automatically;
just delete them from the Summary tab.
Be sure to check the other tabs to make sure that all identifying
information is removed.
You may think that I am overreacting. Consider this:
A company wants to distribute a resume with the applicant's name removed.
Most people simply remove the name from the document. Sounds simple.
I have received resumes like this.
To retrieve the applicant's name, you just:
- Open the file in notepad
- Scroll down to the directories
- Read the name of the original writer
- Also, perhaps, see the names of the people this resume was copied from
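The "scroll down to the directories" step can be automated by searching for drive-letter paths. This sketch assumes paths are stored as printable ASCII, either 8-bit or UTF-16LE; it ignores network (\\server\share) paths and non-ASCII characters, and the name `find_paths` is hypothetical.

```python
import re
import sys

# A drive letter, a colon, a backslash, then printable ASCII.
# The match stops at the first non-printable character (such as a zero byte).
DRIVE_PATH = re.compile(r"[A-Za-z]:\\[\x20-\x7e]+")


def find_paths(data: bytes) -> list:
    """Return distinct drive-letter paths found as 8-bit or UTF-16LE text,
    in the order they are first seen."""
    views = [
        data.decode("latin-1"),
        data.decode("utf-16-le", errors="ignore"),
        data[1:].decode("utf-16-le", errors="ignore"),
    ]
    found = []
    for text in views:
        for m in DRIVE_PATH.finditer(text):
            path = m.group().rstrip()
            if path not in found:
                found.append(path)
    return found


if __name__ == "__main__":
    with open(sys.argv[1], "rb") as f:
        for path in find_paths(f.read()):
            print(path)
```

Directory names such as C:\Documents and Settings\<user> often reveal the writer even after the name has been removed from the text.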
The only way I know to delete the history and to get a new _PID_GUID
is to copy the contents of the file to the clipboard and to paste
it into a new document.
Unfortunately, it will still have the computer name and path
for the new file.
The "Miscellaneous data from other files" was even more surprising.
One of my doc files actually had registry entries in it.
My best guess is that Word allocates blocks of data and
simply writes to them without first erasing them.
Talk about leaking private information to the world.
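The guess above is easy to demonstrate in miniature. This sketch is a generic illustration of reusing a buffer without clearing it; it is not a model of Word's file format, and the 512-byte block size and the function names are assumptions.

```python
BLOCK_SIZE = 512  # illustrative block size, not Word's documented value


def write_block_without_clearing(buffer: bytearray, payload: bytes) -> bytes:
    """Copy payload into the start of a reused buffer and return the whole
    block as it would be written to disk. Old bytes past the payload remain."""
    if len(payload) > len(buffer):
        raise ValueError("payload larger than block")
    buffer[: len(payload)] = payload
    return bytes(buffer)


def write_block_cleared(buffer: bytearray, payload: bytes) -> bytes:
    """Same, but zero the buffer first so nothing from earlier use survives."""
    if len(payload) > len(buffer):
        raise ValueError("payload larger than block")
    buffer[:] = bytes(len(buffer))
    buffer[: len(payload)] = payload
    return bytes(buffer)


if __name__ == "__main__":
    buf = bytearray(BLOCK_SIZE)
    # Unrelated data passes through the buffer first...
    old = b"HKEY_CURRENT_USER\\Software\\Secret"
    buf[: len(old)] = old
    # ...then a short piece of document text is written into the same buffer.
    print(b"Secret" in write_block_without_clearing(buf, b"Dear Sir"))  # True
    print(b"Secret" in write_block_cleared(buf, b"Dear Sir"))  # False
```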
Keep in mind that a 4K text file holds the same number of letters
as a 34K Word file. (I've seen the same file grow from
23K to 35K just by using SaveAs.)
(I have valid reasons why I prefer to use html for text processing.)
WordPerfect
Well, at least you can see the edit codes with this.
I no longer use it on a regular basis because
my company won't let me.
The main reason is that our client (the US Government)
REQUIRES us to use MS Word.
(Well, that's one way to create a monopoly.)
html
Only about 5 additional tags are necessary to make html a full
word processing language:
- Forced page break
(Available via css)
- Conditional page break
- Headers & footers (could be faked using the "Forced page break" style sheet trick)
- Floating keep
- Ability to select Portrait / Landscape
(Available via css)
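For the two items marked as available via css, a document can be generated with the rules already in place. This sketch assumes the CSS 2 property `page-break-before: always` for the forced break and the `@page { size: landscape; }` rule from CSS Paged Media for orientation; print support for `size` varies by browser, and the function name is hypothetical.

```python
from html import escape


def build_paged_html(pages: list, landscape: bool = False) -> str:
    """Join plain-text pages into one html document, forcing a page
    break before every page after the first."""
    page_rule = "@page { size: landscape; }" if landscape else ""
    parts = []
    for i, text in enumerate(pages):
        style = ' style="page-break-before: always"' if i > 0 else ""
        parts.append(f"<div{style}><p>{escape(text)}</p></div>")
    return (
        "<!DOCTYPE html>\n<html><head><style>"
        + page_rule
        + "</style></head><body>\n"
        + "\n".join(parts)
        + "\n</body></html>\n"
    )


if __name__ == "__main__":
    print(build_paged_html(["Page one", "Page two"], landscape=True))
```

Newer CSS spells the forced break as `break-before: page`; the older property is used here because browsers still honor it.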
Some additional functions that would be nice are:
- Tab and Tab Set (can be simulated with tables, but not quite the same thing)
Author: Robert Clemenzi -
clemenzi@cpcug.org