Author Archive
Count String Occurrences Using SQL
Suppose you want to know how many times the string str occurs in a text blob. No predefined function exusts in SQL for such a task, and doing it with loops is costly, especially if you want to rank records according to the number of occurrences for each, which would be next to impossible.
First let’s look at the context. Suppose you want to keep track of registered users who viewed certain records. Creating a new row in the database each time someone views a record would balloon the table in no time, especially for a large number of users and records.
A common practice today is to keep a running sums table structure with a string blob column to which you keep appending context related data and increase the sum. In this case, you would have a “views” column for each record to store the number of times the record has been viewed in total (not needed in this case), as well as the text blob which would store user names of those who viewed the record, all as one big string, appending the user name each time the user views the record.
This is practical since the number of rows in the table stays the same as each record has one and only one row. However, it gets tricky if you want to know how many times a particular user viewed the record. The user’s ID has been appended numerous times to the blob field, but the blob is one big string, so how would you know how many times that particular ID appears in the blob?
As I said before, no SQL function exists to deal with this problem, but there is a workaround. In simple terms, you retrieve the blob, measure its length, replace all occurences of the needed ID with an empty string, measure the length of the new blob, subtract it from the original length, and finally divide the difference by the number of digits in the ID.
SQL: ( LENGTH(blob) - LENGTH( REPLACE(blob, ‘ID’, ”) ) ) / LENGTH(ID)
If you want to sort the table according to the number of views, you add a couple of elements to get
… ( LENGTH(blob) - LENGTH( REPLACE(blob, ‘ID’, ”) ) ) / LENGTH(ID) AS param … ORDER BY param [DESC]
and if you want to output the numbers, you’ll need to wrap this in a ROUND() function to eliminate the decimal zeros.
ROUND( ( LENGTH(blob) - LENGTH( REPLACE(blob, ‘ID’, ”) ) ) / LENGTH(ID) )
The disadvantage of retrieving all the blobs from the database is still worth it, especially in the long run, since otherwise your table will keep on growing. One way to speed up the query is to extract blobs that have at least one instance of the desired ID. To do that, you ad a LIKE ‘%ID%’ in the WHERE clause, which will eliminate the irrelevant records.
No commentsThey Did It, But It’s Our Fault?
There is a reason management often distrusts the IT staff and their budget demands. We complain about how managers don’t understand what’s going on and are unappreciative of the work developers put into any given project, but that’s all thanks to a few who ruin it for all of us. Here is a perfect example:
We are currently working on the development of a massive information system for a research institution, part of which will be extracting and manipulating data from other systems already built and currently managed by various government agencies.
During the design stage, I received the documentation for the systems in place so we could put together a data transfer mechanism. In simple terms, the documentation stated that one record can have one instance in one year. Period. Example: Bob paid $10 in 2008.
We built the module to extract the data, which took considerable time and resources, tested it on some sample records provided, and were hoping to move on to other things after the first data collection. However, the collector crashed 7 hours into the run. It turns out there are exceptions to the rule. Bob might have paid the $10, but he then could have paid another $12, both in 2008. The problem was that the developers decided not to handle it, so the system wouldn’t allow it’s users to enter two records for the same year.
It is in our human nature to scramble for a creative solution to any given problem, and the more creative (read: MacGyver) it is, the better we feel about ourselves when we succeed. In this case, the users decided to rename the entity for the second record, calling Bob “Bobby” for the $12 transaction and technically creating a new entity. I can figure out that Bob is Bobby, but our data collector can’t. In our case the architecture was flawed and it was my job to explain to the client that we just burned a lot of money on something that is now a virtual paper weight.
The whole fiasco wasn’t our fault, as we went by the documentation and data samples provided, but blaming it on the previous guy doesn’t fly in the corporate world. I of course took the heat and went back to the drawing board, thanking my lucky stars that we weren’t sacked, but I’m pretty confident that our client will think three times before investing any more money into future IT projects, all thanks to a few imbeciles who didn’t bother or had a deadline they had to make at any cost, which in this case turned out to be steep.
No commentsThe World of Software Should Vote Obama
After a week of rest and relaxation in the form of work and studies, I am back with a message. The U.S. elections are about to happen and anyone related to IT should pay attention. The economy is obviously the number one topic on everyone’s mind, and people are losing jobs all over the place, but there is a bright future ahead under Obama if you are in software development or a related field.
How come? Well, Obama’s economic plan, especially with regards to health care, is to cut cost by introducing information systems to manage record keeping. This means that thousands and thousands of computer systems, large and small, will need to be built and maintained in the near future if Obama wins.
Since most health care facilities are private in the States, the systems will be build and maintained through contractual work, and that is good news for developer firms and their employees, so get out there and vote.
No commentsWhy Niche Marketing? Because Me Myself and I Matter
I often times see businesses kick off their online presence by trying to go after the mainstay giants. A local chain of bookstores would come out with a website and go after Amazon and Chapters because they think that since their web pages are prettier, people will flock like a bunch of banks on a government bailout. I previously wrote about the importance of niches for search engine optimization and marketing in general, but today I want to explore a different angle.
When visiting a website, you often times see personalized gimmicks being used to get your attention. If you are from Montreal, for example, you could see an ad about something going on in the city, but the ad is on a Brazilian travel agency website. Your location got traced and the ad got spiced up to make you feel special. Google’s massive pay-per-click money pit is based on such personalization, which is niche marketing at its best.
People want to feel special and unique, but if your marketing strategy is to go after the general public, then you miss the boat. For example, Microsoft releases multiple versions of Windows for a reason. One is basic, one is for business, one is professional (whatever that means), another one is for entertainment and media, etc. They could have just made one Windows that encompasses all of these, and it would have worked just fine, but they wanted to make the business people feel special by giving them “their own” version, and the same applies to people who use their computers for games and movies, and so on.
Niche marketing captures this concept. Here is an example from the user point of view. If you are looking for an apartment for rent, you would go to a site such as apartment.com, start digging for your hometown, and go through the ads. The problem with that is when you start, you are treated like a sand grain on the beach. However, if you visit a local website with apartment listings only for your city, you feel a little warmer inside.
If I am planning on releasing an apartment listings website, I would be better releasing a number of them, each for a different city, and focus my marketing on the different geographic niches, rather than releasing one countrywide generic site and going after apartment.com and other big name mainstays.
You see this phenomenon everywhere you look. Car dealerships let you customize your car to make it “you”, grocery shops are stacked with variety like never before (see if you can count the number of chip flavors out there), and so on. The fact of the matter is if you go generic, you will flame out unless you have an established brand name (Coke?).
If you focus on niches, you increase your “special” factor and along with it your marketing effectiveness.
No commentsGoogle Shmoogle. Let’s Talk Yahoo!
Everyone knows Yahoo! lost the search engine battle royal and now Google is just running up the score. We all seem to be in awe of the colorful logo, but Yahoo! is trying hard to get people’s attention, and I think they have some substance to offer.
Y!’s search engine results suck. Unless you are looking for a major brand name, you are most likely to be disappointed. Believe it or not, I am and always have been a Yahoo! supporter. I even stuck by their search engine for as long as I could, but infrequent trips to Google finally convinced me to convert. However, Yahoo! isn’t simply a search engine, which by the way is used to fund all of their other fun toys that I plan on talking about. If you are a developer or an online business owner, you might want to do some Y! exploring. There exists a world beyond Google, it seems. This is more surprising than finding out that we erupted from volcanoes.
These days developers do less coding and more drag & dropping, copy pasting, and so on. The mouse gets more coding done than the keyboard, so with this spirit in mind, you should take a look at Yahoo! Widgets. There is a lot of garbage in there, but if you learn to get around, you can find some really fancy stuff that could be useful and isn’t difficult to implement.
What I am really excited about is the upcoming Web Analytics tool. It looks like it could make Google’s version look like a school project, but right now it is only available to people hosting e-commerce websites with Yahoo!, which makes me feel all warm and fuzzy inside. This tool can give Yahoo! quite a boost, so why not release it to everyone as a beta version a la Google? By the time it finally gets released to the general public, the buzz will wear itself out.
Another nice offering is the SearchMonkey. While Yahoo! holds roughly 16% of the search engine traffic market compared to Google’s 70%, you have to consider the sheer numbers. The 16% is still more than 40 million daily users, so why not take advantage? SearchMonkey lets you pretty up your search result listing when it gets displayed in Yahoo!, and since people often scour search engine results before visiting a website, this works well towards a good first impression.
Here is some additional stuff you might want to check out:
- Developer Network: If you are into this kind of thing.
- Fire Eagle and GeoPlanet: Let’s you offer up services according to the geographical location of your users. There is good value in it for small business owners who can’t compete with the big mainstays and must focus their attention on small market niches.
- The OpenID Project, which I am greatly in favor of, let’s people use a unique login across all OpenID participants.
Being second banana to Google puts the “who?” in Yahoo!, but big Y has some goodies to offer if only you are willing to pay it some attention.
No commentsHow To Set Up an Online Business Presence
Everyone is doing it, so you decided to do it too, and why not? If everyone is getting a website for their business, then it’s logical to think that the benefits are there. However, most either do a half-buttocks job due to a half-buttocks commitment, or don’t get involved altogether due to either fear of the unknown or the apparent massive personal investment required.
While putting up a website for your business is a daunting task, as it should be, it doesn’t mean you can’t handle it. Also, going about it the right way can save you time and money, but going about it the wrong way can really hurt your business.
The benefits of having a good website are obvious, so we go all in.
First, you need to know what you are trying to accomplish with your online presence. You could start selling a product or simply describe your services to potential customers. Find and analyze a few websites in your industry and see what they are doing, what you like about their approach, and how it can apply to your initiative. Develop a list of functionalities you want your future website visitors to benefit from, but if you are lost on the subject, don’t worry and do your best.
Now comes the time to find someone to actually design the damn thing. This is the single most important step in the process, especially if the territory is new to you. You must pick not only someone who can design the site, but who could also help you pinpoint your needs. If you got lost in the previous step, this is where you must find your path, so whomever you pick must have some experience in online business. This is important.
You have a choice between picking a particular designer or a company to do the work. I would recommend the latter, but I might be biased. However, a good firm has tons of knowledge, experience, and resources due to the number of heads. Like the [altered] saying goes: more heads are better than one.
One important issue to keep in mind throughout the process is that if you keep the developer’s cost down, you will keep your cost down. For example, I previously had clients in need of a simple $500 brochure website, but they insisted on meeting with me and my web and graphic designers when a phone conference would have sufficed. If I have to get my people to sit in a room for two hours, the cost of it will be integrated into the development cost. A lot of people will tell you that it’s alright to torture the designers until you get what you want, but they are people too, and the more time they spend working, even if it’s for a pointless meeting, the more money it will cost you.
Make sure you have a mutual agreement on the work to be done, and get it in writing. It’s a protection blanket for both sides. It should include costs, dates of delivery, and terms of support. Also, be sure to discuss the issue of registering a domain name and securing a hosting plan, both under your name. If you are unsure on how to proceed, discuss it with the designers. This is another reason why it is vital to pick someone who can guide you through the unknown.
Finally, don’t dump the project on the developers and forget about it until delivery date. You will surely have decisions to make, and you better make them or else the developers will. You know more about your business than they do, so it is in your best interest to have a presence.
Follow these steps and you too can reap the benefits of an online presence. Man, do I sound like an infomercial or what?
No commentsHow to Write Good Copy for Your Website
With the recent news that directory submissions of a website are now pointless for SEO, it is more important than ever to focus on your content. This means that you need to keep your visitors intrigued with what’s written on you web pages, but to do that you must understand what is good copy and what isn’t.
The ability to write good copy comes from the understanding of the way we humans look at what’s in front of us. The best way to demonstrate this is with an example. Suppose you write up a nice looking paragraph that starts off as follows:
According to various scientific sources, dinosaurs died due to a meteor striking our planet some 65 million years ago.
And on it went. However, a visitor of your website sees the following:
According to various scientific sources, dinosaurs died …?
Why? Because they are only willing to spend a couple of seconds to read the first few words and make the decision on spending the next five minutes reading the rest or moving on to something more interesting. There is no way you can make a sound decision from the given excerpt whether the text is worth reading or not, but we as people trust our instinct for better or for worst.
Simply observe your own behavior. Have you ever started reading something, didn’t have the time to read the whole thing, but decided to do it anyway, as if your eyes were a fly on fresh cow manure? Just couldn’t let it go. Case in point: the first 5 or 6 words of every paragraph matter, so invest a good amount of time in them.
Another helpful self-study would be to monitor how you go through a newspaper. Do you read everything to try and find something you like? I don’t think so. Heck, we are so lazy we barely read the title, instead focusing on a few words in it to make the decision on spending our next five minutes. Case in point: write up some good headings for your content, and make them a little catchy and intriguing. It doesn’t have to be “SHE’S PREGNANT!” tabloid magazine style, but them folks are onto something.
The next tip to keep in mind is to write short paragraphs. Long paragraphs tend to appear as pains in rear to read, so people stay away. Organize your text into segments of 200 words or less, and the whole thing will appear easier to get through. Now, coincidentally, onto the next paragraph.
This is an obvious one, but a small mishap can cost you big time. If you guessed spelling, congrats. One spelling mistake can “kill the vibe” of the reader. You are focused and into whatever you are reading, and suddenly you come across a misspelled worrd, you snap back to reality. You check out the word to make sure you didn’t just imagine the fault, but it’s there. The author loses credibility in your eyes and the remainder of the text will be heavily scrutinized instead of being consumed for knowledge. Never stop editing your content, even after it is released into the wild, and yes, I misspelled on purpose. If you didn’t pick it up, never mind.
The final important point I want to make is you should know your reader. If your website is for rocket scientists, you can go nuts with the technical jargon and fancy wordings, but if your readers aren’t as linguistically gifted, adjust your copy accordingly. This aspect in itself can be split into a list of topics, but I won’t get into 6th grade English class.
However, here are a few basic tips. Keep your sentences short, but don’t overdo it. Your text will suck. People won’t like it. It will sound choppy. Read it out loud. You should be able to tell if you went over the line as I just did.
You can use little tricks to give emphasis to important points. You can really show the importance of something or other in your website copy. I can’t come up with any from the top of my head right now.
Try to stay away from the “we are awesome” way of writing. People hate being presented with promo blabbering, so keep it straight. The people making up television ads don’t get it, but that doesn’t mean you can’t.
And to finish with the cherry on top of the icing on the cake, since we are writing copy for a website, we want people to find it on Google, right? If you want search engine traffic to come your way, focus on niches. I recently posted a simple tutorial on how to find your keyword niche. Go through it, find some keywords that match your topic and are weak on search engine rankings competition, and incorporate them into your copy. Make sure that your incorporations make sense, otherwise it’s pointless. Always write knowing that the copy is for website visitors whom you are trying to convince to at least return for a second visit. Class dismissed.
1 commentDirectory Submission as an SEO Tactic
I came across an interesting blog post on the value of submitting your website to directories. This is an interesting turn of events since directory submission has been an integral part of SEO for a while now. It’s common knowledge that submissions to sleazeball directories will get you nowhere, but to find out that DMOZ is almost as useless is a tough pill to swallow. As of October 2nd, Google officially began the distribution of that pill.
Directory submissions are going the way of meta tags when it comes to the evolution of SEO. When meta tags were relevant, people squeezed all of its juice to get good rankings, and search results were losing relevancy. When the directory submission tactic came onto the scene, the same phenomenon began there too. This is the effect of people wanting the quick fix instead of focusing on content.
I still think getting listed on DMOZ or Yahoo Directory is a good thing. It can get your website crawled more often and you might get some traffic from it, but a link from DMOZ is next to impossible to get these days, and the price tag from Yahoo is a little steep for the return you’re getting.
There are also people out there who do focus on content, just not their own. I see some of my posts reprinted on web pages stuffed with Google ads and flashing banners. There is a link to the original post, but who are they kidding? I hope the next segment of evolution in SEO somehow takes care of that too.
No commentsFind Your Google Niche
This tutorial is a little follow-up on my previous post about SEO. If you decided that your business would benefit from a website, then I think it’s important for you to get a high return on your investment, but you can never tell by just looking at the way most people approach the issue.
A good idea would be for the website to do well in search engine rankings, but everybody wants that cake, but only a few can eat it too. There are fish out there you simply cannot fry, so you have to see what else is there for dinner. Alright, enough with the food idioms and lets move on to the meat of the problem. OK, seriously now…
I am assuming you aren’t an SEO expert, as most business people aren’t, but that doesn’t mean you can’t take a few baby steps. The best way I thought I can get through this post is with a concrete example, so adjust it to your particular situation accordingly.
The point of this tutorial is to find a way to get to the top of Google’s search results, and to do that we will need to use the Google AdWords Keyword Tool. As an example I will be using a Chinese food restaurant named Zuzu as I did before, and we are again located in Montreal, Canada.
Looking up “chinese restaurant” you see that the competition level is high for that exact wording, but you have a long list of alternative keyword combinations with various search volume and competition levels. Scrolling down you can see the keywords “new chinese restaurant” present no competition and have over 14000 average monthly searches. Coincidentally, this keyword set is a perfect match for Zuzu, so the website can be optimized accordingly.
The next step is for us to get more specific. Looking up “montreal chinese restaurant” returns quite feeble search numbers, but this is the crowd you want the most, since they are looking for exactly what you are offering. Be sure to have the “Montreal” keyword sprinkled in your website and links.
Besides the obvious keywords, you might want to explore additional combinations on a related topic. If you look up “chinese cuisine”, you can see that the search numbers don’t merit adjusting your site, but scrolling down to the “Additional keywords to consider” you can pick out “eat chinese” and “chinese eating” with a total of over 14000 monthly searches and zero competition. Well, you know what to do with that, and as you can see by the lack of smoothness of the post, I’m doing it as I’m writing it.
Keep on searching relevant terms, and put in the necessary time to do a thorough job. All of this has to be done before you put your website together, unless you don’t mind the extra post-release spending. The competition bar is for pay-per-click advertisements, but it’s a good indicator of what your website would be up against in the search results. In Zuzu’s case it’s painfully obvious that “chinese restaurant” is out of the question with the brutal competition already out there, which is why you must focus on a smaller target.
No commentsDealing with Database Coupling
The business world runs on information systems, and yet we just can’t get it right when it comes to building them. We learn about modularity and coupling at an early age, but to most of us the notion appears to apply only to coding, so when a database comes into the mix, we don’t think twice about the fact that all of our modules will be dependent on one central storage structure.
Here’s a hypothetical example. Say a startup supplier company XYZ built an information system to handle customer information, inventory, financial reporting, and payroll. All of these components are modules of the system, but due to the lazy nature of developers (why do you think we strive to get computers to do everything for us?) the data storage is handled by one central database. A while later additional modules are built to satisfy new business requirements as the company expands, and these modules further grow the database structure.
Now suppose a small change is introduced into the schema. Let’s say that the phone number field got expanded because the firm now deals with customers overseas. This tiny change will cost the company gigantic money. Why? Because the source code (both business and presentation logic) in all modules is now in need of attention, the data storage and retrieval procedures require changes, all of the existing automated tests are now bogus, and the documentation needs an update. You must now go through everything you have in order to find all instances where the structural change affects the system. All of this due to one small modification. If the initial schema was badly made, expect the maintenance cost of the perfectly modulated system to skyrocket.
It is surprising to me that most systems built today still go with the central data management module. I guess we never learn, but if we did, how would one avoid database coupling? There are a few ways to deal with it.
You can’t effectively “modulate” a database schema by simply creating multiple databases for the system modules. The complexity of such a system would be enormous since any particular module will need to work with several databases at once, unless you duplicate the information, in which case you must ensure coherence among the copies. All in all, it would be a gigantic mess.
One solution would be to put together multiple databases without data duplication, and cap it with a central data retrieval system. All storage and retrieval commands would have to go through this module, after which they will be spread through the databases. A module can be built to work with the database it needs, so changes to other database structures won’t affect it. Any changes will require tweaking the storage and retrieval module as well as the source code of the module affected. It isn’t a perfect solution, but it isn’t the apocalypse version.
Another way would be to use a central database and a number of local data storages. A module can have a minimal duplicate data set stored locally for frequent operations and the main database would include a trigger to update the local data set if it is updated in central storage, and vice versa. This is a good solution if the various system components mostly use unique parts of the database for standard operation.
My final proposition, the wildcard, is to use an entity-attribute-value database schema. If done right, this type of implementation is very forgiving when it comes to change. Since there is no “final” schema at any point of its existence, the modules are built in accordance with this constraint, so changes are absorbed much more easily. The cost of developing a system with an EAV model is usually higher, at least it was for the two projects I directed, but if you anticipate constant evolution of your model, the future savings are worth it.
If you are already stuck in a coupling mess, the best you can do is make sure additional modules are built using one of the techniques mentioned above, and don’t forget to learn from your mistakes.
No comments