As OpenXML and ODF battle for acceptance with governments and corporations, Google is quietly rolling out its own ‘contents + zip’ file format. The screenshot below shows a ‘Save as HTML (Zipped)’ option in the Google Docs File menu.
Once saved, it appears as a zip file on my system. Silly me, I tried renaming it to .docx, but Office 2007 reported file corruption. So let's explore what is inside the zip file. Since I had just inserted an image and typed a line of text, the zip contains an HTML file and an image folder, as shown below.
If I extract the contents and open the HTML file in Word 2007, it displays nicely.
So now let's look at the HTML source:
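For anyone who would rather inspect the archive from a script than unzip it by hand, here is a minimal Python sketch. It builds a stand-in archive in memory, so the HTML file name `Test.html` and the empty image bytes are assumptions for illustration; only the `Test_images` folder and the image name come from the file I saved. The `looks_like_ooxml` helper is my own naming: it checks for the `[Content_Types].xml` part that OpenXML packages carry, which is a likely (but unconfirmed) reason Word rejected the renamed file.

```python
import io
import zipfile


def list_entries(archive):
    """Return the names of all entries in a zip archive (path or file object)."""
    with zipfile.ZipFile(archive) as zf:
        return zf.namelist()


def looks_like_ooxml(archive):
    """Return True if the archive contains the [Content_Types].xml part."""
    return "[Content_Types].xml" in list_entries(archive)


# Stand-in for the downloaded file: an HTML page plus an image folder.
# "Test.html" and the empty image bytes are assumptions, not the real download.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Test.html", "<html><body><p>Test</p></body></html>")
    zf.writestr("Test_images/ddhn77sv_47dvgmzdgg.jpg", b"")

entries = list_entries(buf)
print(entries)
print(looks_like_ooxml(buf))
```

To run it against a real download, pass the path of the saved zip file to `list_entries` instead of the in-memory buffer.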
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<head>
<style>
BODY, P, DIV, H1, H2, H3, H4, H5, H6, ADDRESS, OL, UL, LI, TITLE, TD, OPTION, SELECT
{
font-family: Verdana;
}
BODY, P, DIV, ADDRESS, OL, UL, LI, TITLE, TD, OPTION, SELECT
{
font-size: 10.0pt;
margin-top:0pt;
margin-bottom:0pt;
}
BODY, P
{
margin-left:0pt;
margin-right:0pt;
}
BODY
{
line-height: ;
background: white;
margin: 6px;
padding: 0px;
}
h6
{ font-size: 10pt }
h5
{ font-size: 11pt }
h4
{ font-size: 12pt }
h3
{ font-size: 13pt }
h2
{ font-size: 14pt }
h1
{ font-size: 16pt }
blockquote
{padding: 10px; border: 1px #DDDDDD dashed }
a img
{border: 0}
</style>
</head>
<body revision="ddhn77sv_46hnkzjt:2">
<P>
Test
</P>
<P>
<IMG src="Test_images/ddhn77sv_47dvgmzdgg.jpg" mce_src="Test_images/ddhn77sv_47dvgmzdgg.jpg" style="WIDTH:400px; HEIGHT:300px">
</P>
</body>
</html>
It is quite standard HTML (not even XHTML, as there is no notation for it). Google Docs puts all the standard formatting in the style area. One addition is the revision attribute on the body tag, with the value ‘ddhn77sv_46hnkzjt’. The 2 after the colon is the revision number, since this is the second time I saved the file. An HTML editor will not increase this value, but once you upload the file to Google Docs, it is increased to 3. I wonder whether this is a precursor to a document management system for Google Apps domains later on.
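The revision attribute can also be read programmatically. Below is a minimal sketch using Python's standard `html.parser` module. It assumes the value always has the form `something:number`, as in the one file I looked at. Calling the first part a document identifier is my guess, and `RevisionFinder` and `read_revision` are names made up for this example.

```python
from html.parser import HTMLParser


class RevisionFinder(HTMLParser):
    """Remember the revision attribute of the first <body> tag seen."""

    def __init__(self):
        super().__init__()
        self.revision = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "body" and self.revision is None:
            self.revision = dict(attrs).get("revision")


def read_revision(html_text):
    """Return (identifier, revision_number), or None if absent or malformed."""
    finder = RevisionFinder()
    finder.feed(html_text)
    if finder.revision is None:
        return None
    # Split on the last colon: assumed layout is "<identifier>:<number>".
    doc_id, sep, number = finder.revision.rpartition(":")
    if not sep or not number.isdigit():
        return None
    return doc_id, int(number)


sample = '<body revision="ddhn77sv_46hnkzjt:2"><P>Test</P></body>'
result = read_revision(sample)
print(result)
```

Running this on a file before and after re-uploading it to Google Docs would be one way to watch the number change.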
That’s all for Google Docs….