Another grep question
Matthew Seaman
m.seaman at infracaninophile.co.uk
Tue Feb 8 03:11:26 PST 2005
On Tue, Feb 08, 2005 at 03:44:47AM +0100, Anthony Atkielski wrote:
> Giorgos Keramidas writes:
>
> GK> It may not be related to what you are seeing, but grep(1)
> GK> is locale-aware. What it considers a "text" character
> GK> depends on the current locale settings.
>
> I tried setting LC_ALL to en_US.UTF-8, en_US.ISO8859-15, and
> en_US.ISO8859-1, with no effect. The character in question is an
> opening double quotation mark in the Windows character set. I want to
> find it in my Web pages and replace it by an appropriate HTML escape
> sequence. I know it's out there, but grep isn't finding it, or I'm not
> telling it how to find the character correctly.
Ah -- well, the beauty of Unix is that if the first tool you think of
doesn't do the job, then the next one probably will.
You can use perl to match and replace arbitrary characters:
% perl -pi.bak -e 's/\x93/&ldquo;/g' foo.html
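If you would rather stay with grep, a likely reason it finds nothing is that a lone 0x93 byte is not a valid character in a UTF-8 locale, so the pattern never matches. A minimal sketch, assuming a POSIX shell, that forces byte-wise matching with LC_ALL=C and then does the replacement with sed instead of perl. The sample file and its contents are made up for illustration:

```shell
#!/bin/sh
# Sketch only: the temporary sample file stands in for a real Web page.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.html"

# Write a line containing the Windows-1252 quotes (0x93 and 0x94, octal 223/224).
printf '<p>\223Hello\224</p>\n' > "$sample"

# Build the single byte 0x93 portably; printf understands octal escapes.
quote=$(printf '\223')

# LC_ALL=C makes grep compare raw bytes instead of decoded characters.
matches=$(LC_ALL=C grep -c "$quote" "$sample")
echo "lines containing 0x93: $matches"

# Keep a backup, then replace the byte with the HTML entity.
# The backslash stops sed from treating '&' as "the matched text".
cp "$sample" "$sample.bak"
LC_ALL=C sed "s/$quote/\&ldquo;/g" "$sample" > "$sample.new" &&
    mv "$sample.new" "$sample"

# Count the lines that now carry the entity.
fixed=$(grep -c '&ldquo;' "$sample")
echo "lines containing the entity: $fixed"
```

To list affected files rather than count lines, the same pattern works with grep -l over *.html. The closing quote (0x94) would need its own substitution.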
Or you could go for the bulk method and run HTML tidy(1) over the
file, which is usually pretty good at converting any-old HTML into
something that will pass validation:
(ports: www/tidy) http://www.w3c.org/People/Raggett/tidy/
(ports: www/tidy-devel) http://tidy.sourceforge.net/
Cheers,
Matthew
--
Dr Matthew J Seaman MA, D.Phil.
8 Dane Court Manor, School Rd, Tilmanstone, Kent, CT14 0JL UK
PGP: http://www.infracaninophile.co.uk/pgpkey
Tel: +44 1304 617253