In one of our current projects, we have a feature which handles recurring donations for a non-profit organization. The recurrence could happen on some frequency of week or month, such as “every six week”, “every other month”, “every quarter”, etc.
To keep track of how often a donation is to be processed, we store both the “interval” and “frequency” in the associated database table. The “interval” would be either “week”, “month”, or “year”, while frequency would be an integer value. A frequency of “6” on a “month” interval would mean “every six months.”
To determine which donations need to be processed, we need to also know when they first began. A monthly donation which began on the 15th of January should be processed on the 15th of every month thereafter.
The trick comes in when you’re dealing with donations starting on the last day of the month. A monthly donation which begins on the 31st of January, should also be processed on the 28th of February (in a non-leap year), the 31st of March, and the 30th of April.
One possible solution we’ve seen is to retrieve all the records and use a library, such as the excellent IceCube Ruby library, to determine if a record occurs within the sequence and on the specified day.
This solution is suboptimal, not because of the library, but because every record must be retrieved and processed to determine the next occurrence. Ideally, we should only retrieve those records which need to be processed.
Here’s how we can do that in pure SQL.
Here’s the basic database table from which we’ll be retrieving data.
Table "public.donations"
Column | Type | Modifiers
---------------+-----------------------------+-------------------------------
person_id | integer | not null
amount | integer | not null
begin_on | timestamp without time zone | not null
end_on | timestamp without time zone |
interval | string | not null
frequency | integer | not null
We’re going to break things up between a weekly and a monthly query. Trust me, it’s easier this way.
SELECT *
FROM donations
WHERE interval = 'week'
AND ('2016-10-31'::date - begin_on::date)::int % (frequency * 7) = 0
There’s a bit of noise in the where
clause, but don’t be fooled, it’s simpler
than it looks. The ::date
and ::int
bits merely cast the data to those
particular types.
What we are doing is getting the number of days between the start date
(begin_on
) and the date we want to run this for (2016-10-31) and then using
the modulo operator (%
) to see if the frequency of weeks (frequency * 7
)
equals zero, i.e. if it lands on today.
Overall, pretty simple. The monthly query’s a bit more involved.
SELECT *
FROM donations
WHERE interval = 'month'
AND (begin_on::date + (round(('2016-10-31'::date - begin_on::date)::int / (30.4367 * frequency)) * frequency || ' months')::interval)::date = '2016-10-31'::date
Similar to the “weekly query” above, we’re limiting our results to only those with an interval of “month”.
To see if our target date lands n months from our start date, we can use
Postegres’ interval
feature. In our calculation, we…
('2016-10-31'::date - begin_on::date)::int
/ (30.4367 * frequency))
round(('2016-10-31'::date - begin_on::date)::int / (30.4367 * frequency)) * frequency || ' months'
::interval
function to add to the
begin_on date(begin_on::date + (...)::interval)::date
(begin_on::date + (...)::interval)::date = '2016-10-31'::date
This solution isn’t perfect. If you didn’t already know, when you perform a database function or calculation within the “where clause” of a SQL statement, the database can no longer retrieve data merely by following indexed fields or other algorithms, but instead performs the calculations on the data while retrieving it to determine if it meets the constraints (in our case, comparing it to a date).
How is this any different than performing these calculations outside the database?
I had a lot of fun figuring this solution out. If it helps you, drop us a line and let us know.