Introduction to basic arithmetic and comparison
In its simplest form, SQL may be used to selected objects for which an expression is true. What this means exactly will be handled in a later topic, but for now we will confine ourselves to uncontroversial arithmetic and string comparisons.
The available arithmetic operators are:
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
^ | Exponent (e.g. width^2, area^0.5) |
% | Modulus |
It is also possible to use ‘-‘ to negate a number, this is known as ‘unary minus’.
These arithmetic operators follow the normal operator precedence, with unary minus having highest precedence, then exponent, then division and multiplication, then finally addition and subtraction.
For example a * b + c means multiply a and b, then add c. You can use brackets to override these precedence rules e.g. a * (b + c) means add b and c then multiply by a. There is nothing to stop you from using brackets where they are not needed in order to clarify your intention. There is no harm in writing (a * b) + c instead of a * b + c.
You can also use the addition operator ‘+’ to join strings. For example, if user text 1 were ‘X12’ and user text 2 were ‘Y34’ then user_text_1 + user_text_2 would yield ‘X12Y34’.
None of the other operators listed above have any special meaning for strings. The behaviour if you do use one of these operators with two strings, or with a string and a number etc. will be described in detail in a later topic.
You can use comparison operators to compare values to see if they are equal, one is greater than the other etc.
The available comparison operators are:
= | Equality |
>= | Greater than or equal to |
> | Strictly greater than |
<= | Less than or equal to |
< | Strictly less than |
<> | Not equal to |
The comparison operators have a higher precedence than the arithmetic operators. This means that if you say something like a + b * c >= d + e * f , the left and right hand sides of the comparison operator are evaluated, then the comparison is done.
There are three things to note about string comparison:
- String comparison in InfoWorks WS Pro is always case insensitive.
- When using less than, greater than, greater than or equal to and less than or equal to, the behaviour will depend on the behaviour of the language in which you are using Windows.
- In InfoWorks WS Pro strings are always trimmed so that they have no leading or trailing spaces or ‘tab’ characters i.e. a string can never begin with a string or a tab.
Having described the arithmetic and comparison operators, the next question that will spring to mind is what can be manipulated with the arithmetic operators and what can be compared.
We will start by considering two things: fields and constants. Other things, including variables, will be explained in a later topic.
Fields can be found in the pull down lists on the SQL dialog. The fields fall into a number of categories:
- Fields in the current table e.g. node_id, us_node_id, x, y, width, height
- Results fields e.g. sim.
- Fields in joined tables. These fall into two distinct categories:
- One-to-one links – links where there can be only one object linked – e.g. the link from a link to its upstream node is a one-to-one link since any given link can only be linked to one (or zero) upstream node
- One-to-many links – links where there can be more than one object linked – e.g. the link from a node to links upstream of it is a one-to-many link since any given node can have zero, one, two or more links upstream of it.
- Fields in ‘structure blobs’ e.g. demand_by_category for nodes in InfoWorks WS Pro.
-
OID. The read only field name OID can be used to obtain the object ID. This can be used to get multi-part IDs for objects that have multi-part IDs, with the parts separated by '.'. The name OID is used to avoid a clash with those objects that have a field named 'id'.
As well as these four main categories, there are some special cases which will be described later in this paper.
The fields in the current table have the meaning you would expect. The others will be explained in later topics.
You may wonder what happens when you have selected more than one table in the pull-down list of tables e.g. ‘All Nodes’, ‘All Links’, and a given object is of a type that does not have a particular field e.g. if the object for which the query is being evaluated is a conduit and the field is specific to a flap valve. This will be explained in the next topic once the special value used has been described.
Constants fall into 5 types:
- Numbers – simply put the number in, with or without decimal points and the minus sign e.g. 123, -123, 123.45, -123.45.
- Strings – strings should begin and end with the single quote character ‘ e.g. ‘mynodeid’ or the double quote character e.g. “mynodeid”. If you begin a string with one of these you must end it with the same one. To include the single quote character within a string, use the double quotes at the beginning and the end of the string. To include the double quote character within a string, use the single quotes at the beginning and the end of the string .
- Boolean – The Boolean data type and the two constant values will be described later in this paper.
- Dates – Date constants begin and end with the # character. The rules are as follows:
- Behaviour depends on the ‘locale’ setting if your PC
- If you are in the UK or another country using dates of the form dd/mm/yyyy then dates should be entered #day/month/year# e.g. #31/1/2008#.
- If you are in the US or another country using dates of the form mm/dd/yyyy then dates should be entered #month/day/year# e.g. #1/31/2008#.
- If you are in Japan, China or another country using dates of the form yyyy/mm/dd then dates should be entered #2008/31/1# .
- It is possible, but not recommended to enter dates without using all 4 digits in the year (e.g. #1/31/08#). The behaviour of 2 digit years is determined by the regional and language options control panel – see http://support.microsoft.com/kb/214391 for details.
- It is also possible but also not recommended to enter dates of only two parts e.g. #1/12#. In both of these cases the underlying date library will resolve this as described in the above Microsoft knowledge base article.
- If you are using dates of the form dd/mm/yyyy or mm/dd/yyyy and you enter a date which is invalid in the form you are using, but valid in the other of these two forms, the date will be interpreted as the other e.g. if you are in the US and enter #31/1/2008#, this is an invalid date since there is no 31st month, however it would be valid as a UK date, so it is treated as being the 31st of January 2008
- 5. NULL – this is a special value, described in the next topic of this paper.
NULL field values and associated operators
In InfoWorks WS Pro, numerical fields can usually be blank. In SQL this is represented as a special value called ‘NULL’, and the fields are said to ‘be NULL’. You can see if fields are aren’t NULL by using the special constructs ‘IS NULL’ and ‘IS NOT NULL’ e.g. ground_level IS NULL, ground_level IS NOT NULL.
We are now in the position to answer the question posed above, as to what value is returned for objects that do not have a particular field when the query is being evaluated for a number of different types of object e.g. ‘All Links’. The answer is simply that the NULL value is returned for any object that does not have the field in question.
Note that fields of the Boolean data type, described below, are never treated as NULL from the point of view of the SQL.
Logical operators
You can combine the expressions built up so far with the logical operators AND, OR and NOT e.g.
x>10000 AND y>20000 will select nodes where the x coordinate is greater than 10000 AND the y coordinate is greater than 20000.
x>10000 AND y>20000 will select nodes where the x coordinate is greater than 10000 AND the y coordinate is greater than 20000.
As with all implementations of SQL and in common usage for programming languages ‘A or B’ means A is true, B is true OR both A and B are true. This is often referred to as ‘inclusive or’, and is slightly different from the way in which the word ‘or’ is sometimes used in day to day English. When the keyword NOT is used it will select objects for which the expression to which the NOT applies is not true.
The keyword AND has higher precedence than the keyword OR, so A OR B AND C OR D means the same thing as (A OR B) AND (C OR D). As with the other operators described above it is possible to override this precedence rule by using brackets, so you can say things like A OR (B AND C) OR D.
String operators
There are two operators which can be used to perform more sophisticated tests on strings than the tests described above. These operators are LIKE and MATCHES.
LIKE may be used to perform relatively simple tests on strings by testing for prefixes and a match in the number of characters in a string. This is done by using the special characters ? and *. ? matches any character * matches the rest of the string.
Any other character matches only that character itself – in common with the string comparisons described above this matching is case insensitive.
e.g.
node_id LIKE ‘MH12345’ will match only the string MH12345, since no special characters are included in the string on the right of the LIKE operator.
node_id LIKE ‘????????’ will match any 8 character node ID
node_id LIKE ‘MH??????’ will match any 8 character string beginning MH
node_id LIKE ‘??1????’ will match any node ID with 1 as the 3rd character
node_id LIKE ‘MH*’ will match any node ID beginning with MH
node_id LIKE ‘MH??*’ will match any node ID at least 4 characters long beginning with MH.
You cannot check for anything after * has been used in the pattern on the right of the LIKE operator, e.g. node_id LIKE ‘*01’cannot be used to match any node ID ending in 01. Similarly you cannot use * at the beginning and end to match something in the middle, so you cannot say node_id LIKE ‘*01*’ to match any node ID with 01 somewhere in the middle.
MATCHES may be used to perform complex tests on strings by seeing if they match regular expressions.
A full description of regular expressions falls outside the scope of this paper. Please note:
- The regular expression matching, in common with the string comparisons described above and the LIKE keyword, is case insensitive.
- The regular expression matches the whole of the string i.e. node_id MATCHES ‘[0-9]*’ will only match node_ids containing only digits.
Data types and type conversion rules
Data Types
The InfoWorks WS Pro databases contain data values of a number of types. These types are as follows:
- A number of numerical types. From the point of view of SQL the differences between these types may largely be ignored as the numerical calculations are all performed using the same numerical data type (double precision floating point numbers).
- String
- Boolean. The ‘Boolean’ data type is named after the British mathematician and philosopher George Boole (which is the reason the word is written with a capital letter at the beginning) and represents something that can either be true or false. In InfoWorks WS Pro, Boolean values are typically represented by check-boxes which are checked if the variable is true and unchecked if it is false. The two constant values for Boolean variables are ‘true’ and ‘false’.
- Boolean fields are never treated as NULL from the point of view of SQL.
- Date. This represents a date and time e.g. 12th January 2003 12:34. The time may or may not be relevant depending on the context in which it is being used.
All numerical calculations within the InfoWorks SQL engine are performed using the currently selected user units.
The following is a technical description of exactly how calculations are performed, with particular reference to what happens when you combine data types e.g. add a number to a string and then store the result in a Boolean field.
Mixing Data Types in Binary Operators
A binary operator is one that takes two values e.g +, -, *, AND, >=. The word binary is used here meaning ‘two’, in this context it has nothing to do with binary numbers.
When operations are performed with a mixture of data types the following rules are applied in order:
- If the operator is = and one of the two values being compared is NULL then this will return ‘true’ only if both of the values being compared is NULL, otherwise it will return ‘false’.
- If the operator is <> and one of the two values being compared is NULL, then this will return true if only one of the values being compared is NULL, if both are NULL then it will return ‘false’.
- If the operator is + and one of the two values is a string, then the other value will be converted to a string (see below) and the two strings concatenated.
- If the operator is not OR and one of the two values is NULL then the result will be NULL
- If the operator is OR and one of the two values is NULL, then other value is converted to a Boolean (see below) and that is the value of the expression i.e. if that value is ‘false’ then the result will be false, otherwise the result will be true.
The above are the only cases where an expression where one of the two values is NULL may not result in a result of NULL. This is in-line with ‘normal’ SQL behaviour. This is known as ‘3 valued logic’ because in a sense as well as ‘true’ and ‘false’ there is a value of NULL. The underlying theory behind this is that NULL represents an absent or unknown value, and therefore if you attempt to perform any operation that combines an absent or unknown value with a known value, the result is still absent or unknown.
The following descriptions assume that neither of rules 4 and 5 have applied and therefore neither of the values is NULL.
- If the operator is +, -, /, * or ^, then both values are converted to numbers (see below) and the operation performed.
- If the operator is OR (and neither of the values is NULL) then both values will be converted to Boolean values, the result of the expression will be ‘true’ if either expression is true.
- If the operator is AND then both values will be converted to Boolean values, the result of the expression will be ‘true’ if both of the expressions are true.
- If the operator is <, >, >=, <=, <> or = (and not covered by one of rules 1, 2 or 4 above) then the two values will be compared according to the following rules in order and the result will be set to the Boolean value true or false depending on the result.
- If either value is a date, then the other value will be converted to a date and a comparison performed.
- If either value is a string, then the other value will be converted to a string (see below) and a comparison performed.
- Otherwise, both values will be converted to numbers and a comparison performed.
- If the operator is LIKE or MATCHES, the expression will be false unless both values are strings. If they are strings, the expression will be true or false depending on the result of the appropriate test.
Conversion to strings
- Boolean values will be converted to the string ‘1’ if true, ‘0’ if false
- Numbers will be converted to strings using the standard method used throughout InfoWorks WS Pro, with any trailing zeros after the decimal point removed (e.g. if the number is 12.34000 it will have the final 3 zeros removed and the string will therefore be ‘12.34’). If there are no non-zero values after the decimal point, the decimal point will also be removed e.g. 123.00000 will have the trailing zeros removed, then will have the decimal point removed, therefore the resulting string will be ‘123’
- Dates will be converted to strings using the short form which will be dd/mm/yyyy, mm/dd/yyyy or yyyy/mm/dd depending on how Windows is set up on the PC on which the software is being run. In InfoWorks WS Pro, the time in the format hh:mm will be added to the end of the date.
Conversion to Boolean
For this purpose, the following are considered to be true:
- Any string with one or more characters in it
- Any date
- Any number other than zero
The following are considered to be false, and therefore if the expression is evaluated for an object and is found to be one of the following, the object will not be selected.
- NULL
- Any string with no characters in it
- The number zero
Conversion to numbers
- If you assign a value from a date field into a number then the number given will be a floating point number representing the number of days since the 30th December 1899. This is a representation commonly used in Microsoft software.
- Strings are converted to numbers if the string contains only a number, otherwise the value 0 will be used.
- If you convert a Boolean field to a number, the value will be 1 if it is true, 0 if it is false.
Unary operators
The unary operators behave as follows:
Unary minus will have the result of –1 * the number if it is a number, NULL otherwise.
NOT will convert the value it applies to a Boolean (see above) and then invert that value (i.e. if the value it applies to is false the resulting value will be true, and vice versa).
IS NULL will have the result ‘true’ if the value it applies to is NULL, false otherwise.
IS NOT NULL will have the result ‘false’ if the value it apples to is NULL, true otherwise.
Determining which objects to select based on SQL expressions
When an SQL expression is used to determine which objects to select, all the objects will be selected for which the expression is evaluated and the result considered to be true.
The same criteria will be applied as described in ‘conversion to Boolean’ above.
Simulation Results
Within InfoWorks WS Pro, if you have the results of a simulation loaded into the GeoPlan you are able to run SQL queries incorporating the results of that simulation.
These results will appear in the list box for the object type in the SQL dialog prefixed with sim.
It is also possible to load the results of a second simulation by using the appropriate menu item. The results for this simulation will then also appear in the list box for the object type in the SQL dialog prefixed with sim2.
These results will the results at the current timestep, or the maximum results if the time control has been set to show the maximum results. In InfoWorks WS Pro only it is also possible to use simulation results for all timesteps by using the prefixes tsr and tsr2, in which tsr stands for time series results.
One to one links
It is possible to include values for linked objects in SQL expressions. As described above these fall into two categories, at this stage we are only considering links where an object is linked to ONE other object (or possibly no object) of a particular type. If you are familiar with SQL in databases such as Microsoft Access, Oracle, MySQL etc. you may be familiar with this concept as being a ‘join’. However, whereas in these packages you will have to join the table to the other table by means of a view or the JOIN keyword, in InfoWorks WS Pro the joining is normally done for you automatically, taking advantage of the software packages knowledge of their own data structures. These are sometimes referred to as ‘implicit joins’.
The implicit one-to-one joins fall into two main categories:
- In InfoWorks WS Pro, joins between objects in the Network and associated objects in the Control.
- Joins between related objects in the same Network.
The implicit one-to-one joins within the same InfoWorks Network are as follows:
Customer Point | Link | pipe |
Customer Point | Node | node |
Incident Report | Link | pipe |
Link | Node | us_node |
Link | Node | ds_node |
Demand Polygon | Node | node |
As you can see, the ‘joined’ prefix has a number of different meanings depending on what is being joined. The ‘joined’ prefix is used in exactly those circumstances where the fields of the joined object appear in the grids in the software to the right of the object and in italics.