The
sieve of
Eratosthenes finds prime numbers by creating a list of numbers, taking the smallest number in this list that is not stroken out and striking out all of its multiples, then going on to the next number that is not stroken out and so on.
For example, if the starting list is (2, 3, 4, 5, 6, 7, 8, 9) (we do leave out 1, because it is not a prime), then the first number from the list is 2, and therefore 4, 6 and 8 cannot be primes and are removed from the list: 2 is prime, and the list is now (3, 5, 7, 9). Lowest number from the list is 3, and therefore 6 and 9 cannot be primes, leaving (5, 7) and so on.
We implement this with a TEMPORARY TABLE work which contains the numbers we want to work with, and a result table prim, in which we will store the found primes.
The basic framework for a stored procedure is
DROP PROCEDURE IF EXISTS primes;
DELIMITER //
CREATE PROCEDURE primes(IN count bigint)
LANGUAGE SQL
SQL SECURITY INVOKER
DETERMINISTIC
BEGIN
END;
//
DELIMITER ;
That is, first we drop the procedure we want to create, if it already exists. That way we will not get an error if it does not already exist, and we won't get an error when we want to redefine it. A stored procedure must be entered as a single command, but the individual commands in it are terminated with ";" as usual. Thus we have to change the system delimiter from the usual ";" to something else. I chose "//" here. All between DELIMITER // and the closing // is the procedure proper.
A stored procedure consists of a CREATE PROCEDURE statement that declares a procedure head and the procedure body which has to be a simple statement or a compound statement enclosed in BEGIN and END. The procedure head states the parameters of the procedure and the different options for it. We supply one parameter, which is an input parameter ("IN") named 'count' of the SQL type bigint. 'count' is the upper limit to which we want to search for primes, starting at 2.
Our stored procedure language will be SQL, and we declare it. Actually, we do not have to declare it, because the only language MySQL understands currently is SQL, and that's why we made it the default. :) But for compatibility with other database systems, and because it is good manners we include the language declaration anyway.
A stored procedure can run with the access permission of the user that called it, the INVOKER, or it can behave like a Unix program with Set User ID bit set. The latter is called DEFINER rights in SQL. Unfortunately, the default is SQL SECURITY DEFINER, which from a security point of view is like making any program in your Unix systems Set User ID by default, so it is good practice to declare SQL SECURTY INVOKER just to make sure no unintended privilege escalation happens.
For good measure we also declare the procedure as DETERMINISTIC, that is, having the same result every time it is being called. We should not do this if we use time or random functions or other things apart from call parameters that influence the result. The DETERMINSTIC hint can be useful for the optimizer in the future, but currently it is just being ignored.
Inside the procedure we are can be using SQL statements as we are used to them, local variables and parameters, session variables and procedural SQL extensions such as loops and conditions. We will see a few of them shortly.
Local variables and parameters can be used just like identifiers. They are not prefixed with @ or any other character. They must be declared so that the parser can decide if a name is a reserved word, a variable or a syntax error. Parameters are declared in the procedure head - they can be IN for input parameters, OUT for output parameters or INOUT if you want both. They also have a type, choose any of the SQL types you are familiar with.
Local variables also must be declared, using DECLARE statements just after the BEGIN statement. For example
DECLARE i bigint DEFAULT 2;
DECLARE current bigint DEFAULT 0;
will declare two variables, i and current, both of them of the type bigint, and will also provide default values of 2 respective 0 to them.
Another thing that must be declared is handlers for error conditions. We provide EXIT handlers, and CONTINUE handlers, which provide statements to handle the error and define an action that is to be taken afterwards: Either the procedure is aborted or continued after the error.
DECLARE EXIT HANDLER FOR NOT FOUND BEGIN END;
defines an action that is to be taken when a NOT FOUND condition is encountered (a SELECT statement is not returning any rows). The action is EXIT, and the handler is empty: Just a BEGIN-END block with nothing in it.
Now for the real work. We define a result table, called prim for the result primes.
DROP TABLE IF EXISTS prim;
CREATE TABLE prim ( p bigint NOT NULL PRIMARY KEY );
We also need a TEMPORARY table to work in it. Just for the heck of it, we are using the serial shorthand for "bigint not null primary key auto_increment" and the AUTO_INCREMENT option to the "CREATE TABLE" statement to have the counters start at 2.
DROP TABLE IF EXISTS work;
CREATE TEMPORARY TABLE work ( p serial ) AUTO_INCREMENT = 2;
We now need fill this table, with NULL values for the auto_increment to work. We do it using a WHILE-loop. Here is the SQL syntax for this:
WHILE i<=count
DO
INSERT INTO work ( p ) VALUES ( NULL );
SET i = i +1;
END WHILE;
The WHILE-loop in SQL is "WHILE condition DO ... END WHILE;". We refer to local variables such as i and parameter such as count just by name, not with an @-prefix as we do with session variables. Local variables and parameters have a scope, they lose validity after the END of the block they are defined in (as opposed to session variables, which have a session scope). In order to assign values to such variables you need a SET statement.
A different kind of loop is the REPEAT-loop. We use it to execute "find lowest number, save it as a prime, and strike it and all its multiples from the work table" multiple times. The syntax is "REPEAT statements UNTIL condition END REPEAT;". So we end up with
REPEAT
SELECT p INTO current FROM work ORDER BY p LIMIT 1;
INSERT INTO prim ( p ) VALUES ( current );
DELETE FROM work WHERE p%current = 0;
UNTIL ROW_COUNT() = 0 END REPEAT;
That is, we find the lowest number in work and remember it in current. If that statement does not find anything, our NOT FOUND handler kicks in and we are finished. We then insert this value into the result table prim, and delete it and all of its multiples from the work table. The repeat condition is actually never executed, but is still there for correctness.
Cleanup work at the end: We drop the work table.
All in all, we get
DROP PROCEDURE IF EXISTS primes;
DELIMITER //
CREATE PROCEDURE primes (IN count bigint)
LANGUAGE SQL
SQL SECURITY INVOKER
DETERMINISTIC
BEGIN
DECLARE i bigint DEFAULT 2;
DECLARE current bigint DEFAULT 0;
DECLARE EXIT HANDLER FOR NOT FOUND BEGIN END;
DROP TABLE IF EXISTS prim;
CREATE TABLE prim ( p bigint NOT NULL PRIMARY KEY );
DROP TABLE IF EXISTS work;
CREATE TEMPORARY TABLE work ( p serial ) AUTO_INCREMENT = 2;
WHILE i <= count
DO
INSERT INTO work ( p ) VALUES ( NULL );
SET i = i+1;
END WHILE;
REPEAT
SELECT p INTO current FROM work ORDER BY p LIMIT 1;
INSERT INTO prim ( p ) VALUES ( current );
DELETE FROM work WHERE p%current = 0;
UNTIL ROW_COUNT() = 0 END REPEAT;
DROP TABLE IF EXISTS work;
END;
//
DELIMITER ;