Creating and Importing Webpages
Webpages differ from most conventional document formats as their visible content is produced by a related set of interlinked files of various types. The basis of this process is an html file (Hyper-Text Markup Language) which contains the text of the webpage along with a set of instructions. The instructions include any links to other pages, and the details of any other files that are required to place graphics on the page.
Most of the graphics on webpages are based on two types of bitmap file – JPEGs and GIFs. Each of these has become popular because they compress the picture information and cut down the required filesize. More recently, a new form of compressed graphic file – the PNG – has started to become popular. Although currently rarer than JPEGs and GIFs it combines the good points of them both, and adds a few useful features of its own. As a result it is likely to become very common in the future.
From version 4.10 onwards, TW/EW ‘Pro’ can import and export html files which use all of the above filetypes. Many of the common features of webpages are now recognised – including links. This page explains the import/export process and shows how you can obtain the best results. For simplicity, I’ll use ‘TW’ to refer to TW/EW Pro throughout this page.
TW 4.10 has also introduced an extended ability to control and set background colours. This is of particular use for producing attractive webpages.
To import a webpage, you simply drag the appropriate html file and drop it on the TW icon bar icon. TW will then import the file, read its contents, and attempt to load any graphics by following up the file names it contains.
For example, the file intro/html shown in the above filer window is accompanied by some GIF bitmap image files – back/gif, head/gif, next/gif, etc. When importing the html file TW discovers the references to these files, finds them, and uses them to place the appropriate graphical images in the TW document it creates. The user simply drops intro/html on the TW icon bar icon and TW will do the rest.
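The 'discovery' step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not TW's own code: the sample markup, the file names in it, and the function name find_image_refs are all made up for the example, and I have assumed that graphics are referenced by IMG SRC and BODY BACKGROUND attributes.

```python
from html.parser import HTMLParser


class ImageRefCollector(HTMLParser):
    """Collect the graphics file names referenced by a page, in document order."""

    def __init__(self):
        super().__init__()
        self.image_refs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        for name, value in attrs:
            if not value:
                continue
            if tag == "img" and name == "src":
                self.image_refs.append(value)
            elif tag == "body" and name == "background":
                self.image_refs.append(value)


def find_image_refs(html_text):
    """Return the list of graphics references found in html_text."""
    collector = ImageRefCollector()
    collector.feed(html_text)
    return collector.image_refs


# Hypothetical page fragment, invented for this example.
sample = (
    '<BODY BACKGROUND="back.gif">'
    '<IMG SRC="head.gif">'
    '<A HREF="page2.html"><IMG SRC="next.gif"></A>'
    "</BODY>"
)
print(find_image_refs(sample))
```

An importer would then try to locate each listed file relative to the html file and load it; the link in the A tag is ignored here because it points to another page rather than to a graphic.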
An alternative way to import a webpage is simply to drag the html file and drop it on a TW window. This currently imports the html but does not then attempt to find and load the graphics. This is useful if you want to read a page, or use text from it, without having to devote memory (or time) to the graphics.
When you load an html file, the styles that determine the document’s appearance are controlled by the stationery file, HTMLStyles, which is normally kept inside the TW application directory. When you drop an html file onto the TW icon bar icon this stationery file is loaded. You can edit the styles in this stationery file to set up the appearance you require. Most of the styles have names which will be fairly obvious to someone familiar with html – Hr for ‘horizontal rule’, Em for ‘emphasis’ (i.e. italic on most browsers), etc. An exception is that the html ‘header’ styles are treated by TW as sections/subsections rather than styles, so you find these via the Structures->Sections or ->Sub-sections menus. These aren’t really intended for export, but the sub-sections can be used to place header tags around text in exported html.
The process of importing JPEGs and PNGs found on webpages is essentially the same as when they are found in word documents. The process is described in detail on the page devoted to ‘word’ import/export so here we need only make the following two main points:
- If you are using RiscOS 3.60 or higher, the JPEG will normally be kept in JPEG format in the TW document and display will be handled by the operating system.
- PNGs are now displayed by TW and it treats them as if they are a ‘native’ format just like a sprite.
The above behaviour is very convenient when using TW to view or edit webpages as it means that JPEGs and PNGs are kept in their compressed formats.
When TW encounters a reference to a GIF in an html file it asks !InterGIF to convert it into an equivalent sprite. If the conversion is successful, the sprite is then stored and displayed in the resulting TW document. So, unlike JPEGs and PNGs, GIFs are converted on import, as neither RiscOS nor TW currently has a ‘native’ way to deal with them. If you do not have !InterGIF (or !ImageFS) then GIFs will not be converted and displayed.
As with JPEGs, there are some potential problems when handling GIFs if !ImageFS is active. If !ImageFS is running and set to handle GIFs ‘automatically’ it will attempt to intercept any calls from TW requesting !InterGIF to handle conversion. It is therefore worthwhile either to ensure !ImageFS is not running, or to set it not to ‘auto’ convert GIFs. The main reason for this is that !InterGIF will translate GIF/Sprite transparency masks correctly whilst current versions of !ImageFS will discard transparency data.
As with importing, exporting webpages should appear to the user as a simple ‘drag and drop’ process. The save box provides you with an html icon and the ability to choose a name before dropping the file where you want it to be saved.
During the saving process TW will use !InterGIF to convert all the DrawFiles and Sprites it encounters into GIFs. (Note: as with import, !ImageFS may ‘interfere’ with this process if you do not prevent it taking automatic control over GIFs!) JPEGs and PNGs are kept in their existing form. All of the resulting files are then saved in a subdirectory.
The above illustration shows the result of exporting a TW document by dragging and dropping an html file called test/html into the directory demo. The document contains a single graphic image. TW uses (or creates if it does not yet exist) a directory, Images, and a subdirectory of the same name as the saved file. The graphics for the webpage are then saved into the appropriately named subdirectory and linked into the html file.
This process effectively automates the conversion, saving, and linking of graphics without any need for the user to become involved. However, the user should be aware of it: if, for example, they wish to delete a webpage, they should also know where to find the associated image files. BTW, this process doesn’t limit the user to 77 graphics per webpage as TW sensibly creates up to ten additional, differently numbered, subdirectories as required, to hold graphics. Clearly, this is a major limitation for anyone who finds they simply must have over 700 graphics on a page.
(Apologies here if your browser doesn’t support the <AMUSED SARCASM> tag... I’ve not personally tried what happens when you put more than 770 graphics on a single page. I’m not going to worry about it!... )
When TW encounters a Sprite illustration during html export it sends the image to !InterGIF for conversion into a GIF. This process will preserve any transparency mask. However, as noted earlier in the section on GIF import, this process may be ‘intercepted’ by !ImageFS. Since !ImageFS does not handle transparency you should either ensure it is not running or disable its ‘auto’ handling of GIFs to obtain the best results.
DrawFiles (and TW equations) are first converted into Sprites, then the sprites are sent to !InterGIF for conversion into GIFs. To achieve good results against a coloured page background, the sprite is plotted at 3× normal resolution against an appropriate background colour, and all DrawFile/Equation text has its font background colour set to the same colour before being drawn on the sprite. The result is then ‘dithered down’ to obtain an effectively anti-aliased result where the background colour has been made transparent.
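To see why plotting oversize and then reducing gives an anti-aliased look, here is a rough Python sketch. It is not TW's method: TW 'dithers down', whereas this example simply averages each 3 by 3 block of a greyscale image, and it ignores colour and the transparency step altogether. The function name downsample and the tiny test image are invented for the illustration.

```python
def downsample(pixels, factor=3):
    """Average each factor x factor block of a greyscale image into one pixel."""
    height = len(pixels) // factor
    width = len(pixels[0]) // factor
    result = []
    for row in range(height):
        out_row = []
        for col in range(width):
            # Gather every pixel of the oversized plot that falls in this block.
            block = [
                pixels[row * factor + dy][col * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            out_row.append(sum(block) // len(block))
        result.append(out_row)
    return result


# A 6 x 6 'oversized' plot: a black (0) shape with a diagonal edge,
# drawn on a white (255) background.
big = [[0 if x <= y else 255 for x in range(6)] for y in range(6)]
small = downsample(big)
print(small)
```

Blocks that lie wholly inside the shape or wholly on the background keep their original value, while blocks that straddle the diagonal edge come out as an intermediate grey. Those intermediate values are what soften the jagged edge at normal resolution.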
TW assumes that PNGs and JPEGs can be displayed by browsers and hence makes no attempt to convert them. Any PNGs and JPEGs are therefore simply saved as graphics files and linked into the html webpage.
TW now understands most of the common HTML3.2 tags and name-entities and converts them automatically. Some characters fall outwith the 3.2 standard, but are commonly used in conventional documents. Many of these are covered by the Transitional HTML4 standard. To cope with some of these TW provides a ‘map’ file called Hi_HTML inside the TW application directory. (Note that, as is now usual, the information in this file will be overridden if there is another file of the same name in !Boot.Choices.TechWriter.)
The contents of this file control what TW will export when it encounters characters in the ASCII range 128-159 which have no unique definition in the common character set(s). An excerpt from the default file supplied with TW is as follows, shown on the left, next to a modified version to the right.
| Original | Modified |
| --- | --- |
| #138 | #138 |
| #139 | #139 |
| ... | ... |
| #141 | #141 |
| #142 | #142 |
| #143 | #143 |
| ' | &lsquo; |
| ' | &rsquo; |
| #146 | #146 |
| #147 | #147 |
| " | &ldquo; |
| " | &rdquo; |
| #150 | #150 |
| - | &ndash; |
| -- | &mdash; |
| - | &mdash; |
| #154 | #154 |
| #155 | #155 |
Each line of the file defines what output TW will place in the html file when it encounters a specific character in the relevant range. The lines are in order, so the first line of the file relates to character number 128, the second line to character number 129, and so on. (Note that the above table omits the first few lines and starts with the one related to character number 138.)
When a line starts with a hash (#) symbol TW simply uses a ‘number entity’ for output. This essentially asks the web-browser to use whatever character it has available assigned to that number. So, for example, the default settings mean that a “ ” typed into a TW document will ask the browser for “character number 142”. This will give the correct result if someone happens to be using the same operating system and browser font, but otherwise it will give ‘unpredictable’ results!
A ‘safer’ method is shown above in the ‘original’ column for characters 148 and 149. (Smart/sexed double quotation marks in most RiscOS fonts.) When TW finds a line in Hi_HTML that does not start with a hash symbol it simply copies the line into the exported webpage. As a result, the default Hi_HTML file causes all exported smart quotes to be replaced by old-fashioned ' or " quotation marks. This makes the output less attractive typographically, but it maximises the chance that all browsers will produce a readable result.
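The two rules described so far (a hash line gives a number entity, any other line is copied as it stands) can be sketched in Python as follows. This is a guess at the logic, not Icon Tech's code. In particular I am assuming that the ‘number entity’ is written in the usual &#nnn; form, and the function names are invented for the example. The excerpt is the ‘original’ column of the table, so it starts at character 138 rather than 128.

```python
def load_map(lines, first_code=128):
    """Pair each line of a Hi_HTML-style file with its character number."""
    return {first_code + i: line for i, line in enumerate(lines)}


def export_char(code, char_map):
    """Return the text to write into the html file for one character number."""
    entry = char_map.get(code)
    if entry is None:
        return chr(code)  # not covered by the map: pass through unchanged
    if entry.startswith("#"):
        return "&" + entry + ";"  # assumed form of the 'number entity'
    return entry  # any other line is copied verbatim


def export_text(codes, char_map):
    """Export a whole run of character numbers."""
    return "".join(export_char(code, char_map) for code in codes)


# Lines for characters 138 to 149, taken from the 'original' column.
excerpt = ["#138", "#139", "...", "#141", "#142", "#143",
           "'", "'", "#146", "#147", '"', '"']
char_map = load_map(excerpt, first_code=138)

# A smart-quoted "hi" followed by character 142.
document = [148] + [ord(c) for c in "hi"] + [149, 142]
print(export_text(document, char_map))
```

With the default lines the smart quotes come out as plain " marks and character 142 comes out as a number entity, which is the behaviour described above.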
An alternative is to change to the use of ‘name entities’. The ‘modified’ column shows this for the sexed/smart quotes. Once this modification has been carried out TW will replace any quotes with their HTML4 name entities – &ldquo; and &rdquo; for the left and right double quotes, and &lsquo; and &rsquo; for left and right single quotes. These produce much better results on browsers (such as !Fresco, !Browse, and MIE4) that recognise the HTML4 name entities, but can have unexpected results when viewed with older browsers.
By providing the Hi_HTML file, Icon Tech have left the choice up to the user. For maximum compatibility, leave the file as it came. To exploit the extra features of more modern and HTML4 aware browsers, modify the file as you wish. The modified example above shows how to get sexed quotes and various dashes on browsers that understand the relevant entities. It is perhaps worth pointing out that the Hi_HTML file also gives the user the ability to ‘poke’ specific strings into the html output. So if you really want to you could arrange for, say, character 154 (“ ”) to put a string like “hi there!” on the webpage. In itself this is trivial, but you could perhaps use (misuse?) it to poke html instructions that TW does not currently support...
Pages written using TW Pro and HTMLEdit
Content and pages maintained by: Jim Lesurf (jcgl@st-and.demon.co.uk)