Wednesday 28 March 2007

Compact Framework... link drop

It's been a while since I've worked on CE/mobile, but I'm getting back into it. All the stuff that's new/changed since my last 'real' mobile app makes for interesting reading...

What's New for Developers in Windows Mobile 6

Windows Communication Foundation (Compact Edition) is a great read.

OpenNETCF was an essential reference last time I checked...

Synchronization Strategies: email?!

SQL Server CE - latest name for SQL Everywhere; no, it doesn't just run on Windows CE...

Thursday 22 March 2007

'Declarative' Google Maps - store locator sample

Just added another 'sample' app:
Store Locator: Help customers find you with Google Maps to CodeProject.

I started playing with GMapEZ declarative Google Maps API in Feb, originally just for mapping 'race' courses in Sydney for my running club. Later adapted it for a couple of 'proof of concept' samples related to mapping customer data and delivery truck routes (using Google's driving directions feature).

As well as the article, you can play with the NeedCoffeeNow sample online.

Monday 19 March 2007

Open DOCX using C# to extract text for search engine

Parsing text-content out of different file formats for Searcharoo (or other search engines) can be accomplished in a number of ways, including writing your own parser (e.g. not too difficult for Html) or using an IFilter loader.

However, there are always going to be new document types or formats where you want to build a custom parser... and today the new Word 2007 DOCX format is an example: I don't have Word 2007 installed on my PC, so I doubt there are any IFilter implementations for it lying around here either.

A bit of background: the DOCX format is basically a ZIP file containing a directory-tree of Xml files, and from what I can gather the main body of a (Word 2007) DOCX file is located in word/document.xml within the main ZIP archive.

Using a .NET ZIP library based on System.IO.Compression it's relatively simple to open a DOCX file, extract the document.xml and read the InnerText, like this:
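using System;
using System.IO;
using System.Xml;
// plus the namespace of the ZIP library that provides ZipFile

// ... your code to populate the DOCX filename here ...
using (ZipFile zip = ZipFile.Read(filename))
{
    // Pull the main document part out of the archive into memory
    MemoryStream stream = new MemoryStream();
    zip.Extract(@"word/document.xml", stream);
    stream.Seek(0, SeekOrigin.Begin); // don't forget to rewind before reading
    XmlDocument xmldoc = new XmlDocument();
    xmldoc.Load(stream); // parse the extracted Xml
    // InnerText concatenates every text node under the root element
    string PlainTextContent = xmldoc.DocumentElement.InnerText;
}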
If you're using .NET 3.0, the System.IO.Packaging.ZipPackage class is probably a better bet than the open source ZIP library for 2.0.
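
For the .NET 3.0 route, here is a minimal sketch of the same extraction using System.IO.Packaging. It assumes a reference to WindowsBase.dll and that the body lives at /word/document.xml as described above. The DocxText class and its Extract methods are hypothetical names made up for illustration; treat this as a starting point rather than a tested implementation.

```csharp
using System;
using System.IO;
using System.IO.Packaging; // lives in WindowsBase.dll on .NET Framework 3.0+
using System.Xml;

// Hypothetical helper (name invented for this sketch, not part of Searcharoo)
public static class DocxText
{
    // Reads the main body part of a DOCX package and returns its plain text.
    public static string Extract(Stream docx)
    {
        // Package.Open returns a ZipPackage for ZIP-based containers such as DOCX
        using (Package package = Package.Open(docx, FileMode.Open, FileAccess.Read))
        {
            // Part names inside a package are relative Uris with a leading slash
            Uri bodyUri = new Uri("/word/document.xml", UriKind.Relative);
            if (!package.PartExists(bodyUri))
            {
                return string.Empty; // not a Word document, or an unexpected layout
            }

            PackagePart body = package.GetPart(bodyUri);
            using (Stream partStream = body.GetStream(FileMode.Open, FileAccess.Read))
            {
                XmlDocument xmldoc = new XmlDocument();
                xmldoc.Load(partStream);
                // Same approach as before: concatenate all text nodes under the root
                return xmldoc.DocumentElement.InnerText;
            }
        }
    }

    // Convenience overload for a file on disk
    public static string Extract(string filename)
    {
        using (FileStream file = File.OpenRead(filename))
        {
            return Extract(file);
        }
    }
}
```

Calling DocxText.Extract(filename) gives the same kind of string as PlainTextContent above. Note that InnerText joins text nodes with nothing in between, so words at paragraph boundaries can run together; a search indexer may want to walk the paragraph elements and add whitespace instead.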

Now to do some reading on XLSX and PPTX formats...

Sunday 18 March 2007

Don't forget SQL Server 2005 'Compatibility Level' (ARITHABORT error on Xml query)

I've been working recently on a database that has been 'upgraded' from SQL Server 2000 to 2005. As part of the process, I added some 'Tools' stored procedures I commonly use (e.g. ScriptDiagram).

For "some reason" that wasn't immediately obvious to me, I could execute these procs without trouble in SQL Server Management Studio, BUT when I added some C# code that called a '2005-feature' stored procedure (such as one using the new xml column.query syntax), the call failed with:

SELECT failed because the following SET options have incorrect settings: 'ARITHABORT'. Verify that SET options are correct for use with indexed views and/or indexes on computed columns and/or query notifications and/or xml data type methods.

I started off trying to debug the C# code, thinking I'd done something wrong there... but then realised the Database Properties - Options - Compatibility Level must still be set to SQL Server 2000 (80). Changing that to SQL Server 2005 (90) got rid of the SELECT failed error... because I hadn't really 'upgraded' the database at all, just restored a 2000 backup into 2005.

Why the queries work in Studio still confuses me a little bit... but I'll worry about that some other time. Hope that helps someone - I didn't find anything useful on my first google for the error message.

Saturday 17 March 2007

Searcharoo C# search engine indexes Word, PDF and more...

The latest version of Searcharoo (4) is now available to download (and read about, of course):
C# search engine: refactored to search Word, PDF and more.

The coolest feature - searching Microsoft Office docs (Word, Excel, PowerPoint) and Acrobat/PDF files - is based on someone else's article Using IFilter in C#.

Also put a bit of work into refactoring the code structure, layout, naming, etc. At least now it might pass a "code review" (if anyone cared to conduct one:)

The project now also has its own website - the latest article isn't quite there yet (requires some formatting) but over time I'll post more frequently to that site; it takes forever to write, format and upload decent, CodeProject-quality work...