Wednesday, 30 June 2004

VS.NET 2005 : Why is MS Hard Coding?

I've Seen the Future and It's Hard Coded talks about the new /Data, /Code and /Themes folders in ASP.NET 2.0

I hadn't really thought that much about it, but it is unfortunate that MS hasn't made the folder names 'more unique' or configurable. If an ASP.NET 1.1 website uses folders with those names, you may have to update your data/links/etc... although I suspect it wouldn't be too hard to write an HttpHandler to intercept requests for "your" /Code path and process them using HttpContext.Current.RewritePath and an alternate directory name like /MyCode
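A rough sketch of the path-mapping half of that idea, untested against ASP.NET 2.0 (I don't know whether requests for the reserved folders even reach user code). FolderRewriter and MapPath are hypothetical names; the result would be handed to HttpContext.Current.RewritePath(string) from a handler or a module's BeginRequest event.

```csharp
using System;

public static class FolderRewriter
{
    // Maps a request path under the reserved folder to the renamed folder.
    // Paths outside the reserved folder are returned unchanged.
    public static string MapPath(string requestPath, string reserved, string replacement)
    {
        if (requestPath == null) return null;

        // Exact match on the folder itself, e.g. "/Code"
        if (string.Equals(requestPath, reserved, StringComparison.OrdinalIgnoreCase))
            return replacement;

        // Anything below the folder, e.g. "/Code/page.aspx".
        // The trailing slash stops "/Codex/..." from matching.
        if (requestPath.StartsWith(reserved + "/", StringComparison.OrdinalIgnoreCase))
            return replacement + requestPath.Substring(reserved.Length);

        return requestPath;
    }

    public static void Main()
    {
        // In a web app (assumption) this would be roughly:
        //   HttpContext.Current.RewritePath(MapPath(Request.Path, "/Code", "/MyCode"));
        Console.WriteLine(MapPath("/Code/util.aspx", "/Code", "/MyCode"));
    }
}
```

This assumes the site runs at the server root; in a virtual directory the application path would need to be prefixed to both folder names.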

Hmmm - planning to download Express soon, so I might give that a try...

Word stemming for Search

Version 2 of Searcharoo is now online (although still in final draft form) and I'm starting to think about version 3.

The first priority is building a database and/or file-based persistence layer, but other more advanced search-type technology is also on my mind, including STEMMING :

What is Stemming?

Paice/Husk stemmer modifications by Antonio Zamora

Limitations of Stemming

and for Japanese text (which I'm also interested in)...

A WWW JAPANESE DICTIONARY and article 2

and

The Challenges of Intelligent Japanese Searching

Saturday, 26 June 2004

JoelOnSoftware gets flamed

SecretGeek writes How Microsoft Lost the Joel War and it's pretty good too.

TODO Driven Development beats eXtreme Programming and TestDriven Development hands-down :-)

Friday, 25 June 2004

Open-source .NET search engine : Nata1

The Homepage of Nata1 is now 'live' and contains an article about implementing a .NET search engine using either custom data structures, Index Server or Google.

The search input and results are highly customizable using templated controls that have some VS designer support, and you can use Google on your site without having to deal with their API.

Interesting 'search' project to watch...

Thursday, 24 June 2004

Writing Language-Portable Transact-SQL...

Writing Language-Portable Transact-SQL... and no it doesn't mean C#, VB or JScript - this article actually discusses some of the more esoteric issues relating to SQL Server and multilingual data (you know, languages other than English) including interpreting dates, Unicode fields and 'collation'.

Useful if you are unsure about how to approach these issues in SQL - although it does make it sound more complex than it really is (for simple applications)... Some date handling should just be common sense by now!

Wednesday, 23 June 2004

CassiniEx Web Server

I think Whidbey (VS.NET05) has a version of Cassini built-in as the 'development server' -- but this looks pretty cool and is available "now"...

CassiniEx Web Server is an enhanced open-source version of Microsoft's cut-down C# web server. I currently start 2 or 3 Cassini sessions (port 8081, 8082, 8083...) on my XP Home machine when I'm developing/debugging with #develop and WebMatrix. Hoping that CassiniEx will make my XP Home development environment a whole lot easier to manage... it's downloading now...

OT: Parsing CSV files using the ODBC Text driver could come in really handy - although I'm not quite sure for what, yet.

Sunday, 20 June 2004

Unicode and multilingual support in HTML...

Don't know how I've not come across Unicode and multilingual support in HTML, fonts, Web browsers and other applications before.

Definitely a great addition to my Localization and Globalization info.

OT: Found my first commercial use for ExtendedHtmlUtility.HtmlEncode() today: a client's website is hosted on an ISP's Apache Server - configured to ALWAYS set the HTTP Header Content-Type: Shift_JIS. This was making it impossible to serve Korean and Chinese pages from this server, since W3C says
To sum up, conforming user agents must observe the following priorities when determining a document's character encoding (from highest priority to lowest):
  1. An HTTP "charset" parameter in a "Content-Type" field.
  2. A META declaration with "http-equiv" set to "Content-Type" and a value set for "charset".
  3. The charset attribute set on an element that designates an external resource.

Which means the browser (IE, Firefox, Netscape, etc.) will ALWAYS think the page is Shift_JIS (Japanese) and not display Korean or Chinese text correctly!

By converting ALL the non-ASCII (well, all non-Shift_JIS actually) characters into Html Entities (eg. &#1234;), the page will be successfully displayed in Korean or Chinese with the encoding set to Shift_JIS. This works because the characters & # 0-9 ; are all valid Shift_JIS characters, and once they're resolved into their Unicode characters, the browser is happy to display them using whatever font settings (or mappings) it knows about, regardless of the actual page encoding!
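The conversion itself can be sketched in a few lines. This is NOT the ExtendedHtmlUtility.HtmlEncode() source, just a minimal illustration of the idea under a simplifying assumption: it turns every character above 127 into a decimal numeric entity (so it also converts Japanese text that Shift_JIS could have carried directly) and it does not escape &, < or >. EntityEncoder and EncodeNonAscii are made-up names.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class EntityEncoder
{
    // Replaces every non-ASCII character with a decimal entity such as &#54620;
    public static string EncodeNonAscii(string text)
    {
        if (text == null) return null;

        StringBuilder sb = new StringBuilder(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c < 128)
            {
                sb.Append(c); // plain ASCII survives any of these encodings
                continue;
            }

            int codePoint = c;
            // Characters outside the BMP arrive as a surrogate pair;
            // combine the pair into a single code point.
            if (char.IsHighSurrogate(c) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, text[i + 1]);
                i++;
            }

            sb.Append("&#");
            sb.Append(codePoint.ToString(CultureInfo.InvariantCulture));
            sb.Append(';');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // \uD55C\uAE00 is the Korean word "Hangul"
        Console.WriteLine(EncodeNonAscii("Hello \uD55C\uAE00"));
    }
}
```

The output contains only ASCII, which is why it no longer matters what charset the server claims in its Content-Type header.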

It's not ideal, but at least it works - even in Netscape 4.7 (as long as you have specified the correct fonts, because we all know how dumb NS4 is at font substitution). I suspect if the pages had any 'text' within Javascript strings/variables/etc that would have caused a problem... Luckily not (this time).

Wednesday, 9 June 2004

UrlEncode vs. HtmlEncode

In an earlier blog about Html Entities and Unicode I touched on the HtmlEncode() method.

This page UrlEncode vs. HtmlEncode shows the complete framework HtmlEncode() method, decompiled using Reflector. It clarifies exactly what to expect from the method, and confirms that HtmlEncode() only converts a subset of Unicode characters into Html Entities (eg. &#1234;).

OT: I really like the look of this site, too.

Visual Source Safe vs Visual Studio Team System

It seems like most Microsoft products have gone from version 3 to version 11 in the same timeframe as VSS has stumbled from 6.0a to 6.0f...

So it's no surprise that there has been a lot of excitement about the news of a NEW version of VSS; and also of some cool 'Team' features in VS.NET2005... Imagine the confusion, then, as it becomes clear these are not the same product!

Visual Studio Team System is a good place to start reading about VSTS -- but not necessarily VSS. The MSDN Team System page has a good overview.

I like this quote "So VSS is to Hatteras[Team System] as Access is to SQL Server" (from somewhere here I think) except VSS doesn't have a cut-down, database driven back-end that easily migrates/scales to VSTS, like Access → SQL (??).

The Microsoft Visual SourceSafe Roadmap explains a bit more about SourceSafe 2005 -- basically sounds like the version upgrade we wished for around 2000... hard to see it lasting to the post-Whidbey VisualStudio release (i.e. beyond 2006).

Sunday, 6 June 2004

User input - who puts commas in a number, anyway?

My users keep finding little issues with our 'Site Admin' pages - they want to do crazy things like type large numbers with 'thousands' separators and/or decimal points, e.g. 12,500 and 3,500.99

Of course, in my testing I only type 1233465 or 99999 so I never see the Exceptions thrown by their "bad" input.

This page is a handy reference for all things format-related, including both ToString() output as well as parsing inputs : Inside C#, Second Edition: String Handling and Regular Expressions Part 1.

The answer to my problem is here too - Int32.Parse(string, NumberStyles) with the System.Globalization.NumberStyles enum to pick and choose what inputs are acceptable. You might still want a try/catch, or some Regex Validation on the client side, just to make sure...
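Something along these lines, as a sketch rather than the code from my admin pages. AdminInput and its method names are invented for the example, and I've pinned CultureInfo.InvariantCulture so that ',' is always the thousands separator and '.' the decimal point; a real site might want the user's culture instead. Note that an input like 3,500.99 needs a decimal (or double) parse, since Int32 can't hold the fraction.

```csharp
using System;
using System.Globalization;

public static class AdminInput
{
    // Whole numbers: accept thousands separators and surrounding whitespace only.
    // Throws FormatException for anything else (e.g. "12.5" or "abc").
    public static int ParseWholeNumber(string input)
    {
        NumberStyles styles = NumberStyles.AllowThousands
                            | NumberStyles.AllowLeadingWhite
                            | NumberStyles.AllowTrailingWhite;
        return Int32.Parse(input, styles, CultureInfo.InvariantCulture);
    }

    // Amounts: NumberStyles.Number allows sign, thousands separators and a decimal point.
    // Returns false instead of throwing when the input is not acceptable.
    public static bool TryParseAmount(string input, out decimal value)
    {
        return Decimal.TryParse(input, NumberStyles.Number, CultureInfo.InvariantCulture, out value);
    }

    public static void Main()
    {
        Console.WriteLine(ParseWholeNumber("12,500"));

        decimal amount;
        if (TryParseAmount("3,500.99", out amount))
            Console.WriteLine(amount);
    }
}
```

The TryParse form saves the try/catch where it is available; with Int32.Parse the FormatException (or OverflowException) still has to be caught.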

Thursday, 3 June 2004

.NET tools

Came across two useful-sounding tools today

QuickCode.NET for speeding up creation of code constructs within VS.NET (eg. type "prop int test" and it will expand to a complete Property get/set definition)... and you can create your own shortcuts+templates.

log4net looks waaay better than the dodgy Trace.WriteLine/TraceListener setup I'm currently using... being able to set hierarchical log targets across a number of different 'formats' (EventLog, file, email, etc).

In the process of installing them both now...

Wednesday, 2 June 2004

CAPTCHA (Completely Automated Public Turing Test to Tell Computers and Humans Apart)

You're not alone if you haven't heard that CAPTCHA acronym before - neither had I.

However I have come across them many times - Network Solutions requires you to pass a CAPTCHA test before every "whois" query, and many other sites (particularly 'free' services) are now asking new users to 'verify' themselves via CAPTCHA...

So what is it? A combination image & form input which requires you to 'decipher' some distorted/disguised/obfuscated text from the image and type it into the browser. It's a LOT easier for people to read these images than it is for automated bots to OCR or otherwise guess them - thus preventing automated/fraudulent use of your website/resources!

Interesting, eh? This article 15 Seconds : Fighting Spambots with .NET and AI is a great intro, and includes some interesting examples and Visual Basic.NET sample code. See the CAPTCHA references too.

Automating Excel with C# II - unexpected dialogs

Opening files from C# occasionally results in the app (Excel, for example) opening an unexpected dialog box (such as "Name Conflict - Name cannot be the same as a built-in name. Old name: Print_Area New Name: ________".)
AFAICT there's no parameter in the Application.Workbooks.Open() method that I can use to auto-dismiss this dialog, so I need to address it some other way.

A simple answer appears to be the SendKeys Class, which I 'discovered' via this MSDN article: HOWTO: Dismiss a Dialog Box Displayed by an Office Application with Visual Basic.

I haven't actually implemented it yet (and even if I do send an 'ESCAPE' sequence to cancel the dialog, I still won't be able to open the file). BUT at least it won't hang/wait forever for user input that is never coming. Plus, (hopefully) sending 'ESCAPE' will also deal with some other (as yet unknown) dialogs that might appear...