Compile-time factorization

It is well known that C++ templates are Turing-complete and therefore any algorithm can be theoretically implemented to be executed at compile-time. But what about reality? How far the current available compilers can go? In this article I’d like to discover the limits of compilers, but before that I will explain a relatively simple and suitable factorization algorithm.

Yes, it is about decomposition of an integer N into its prime factors at compile-time. A list of prime factors is not available at compile-time and a generation of such a list is rather unnecessary. Let’s go a simpler way starting with a trial division. In worst case, what is needed, is to test all the integers less or equal M, where M^2<=N. Such an algorithm is feasible even at compile-time, but can involve many template instantiations. Reducing the number of instantiations, a bigger N can be factorize. Let’s see how far that can work.

The idea

Evidently, once trying to divide by 2, it is useless to divide by any number containing 2 as a factor. The same is valid for the factor 3 and so on. So, instead of having the list of primes, it better to generate a larger list of numbers using a simple formula that would contain all the primes. The first trial factor is 2. Then the first formula is 2k+1, k = 1,2,…. It returns odd numbers and reduces the amount of all trial integers by 50%. Setting k=1,2 delivers the next trial factors:3 and 5. The next formula eliminating the factor 3 is 2*3*k + r = 6*k + r. The remainder r equals 1 or 5, otherwise the numbers generated by the formula will be divisible by 2 or 3. The second formula contains only 1/3 of all integers. With the third formula, let’s generalize. It is 2*3*5*k + r = 30*k + r, where r is 1 or any number in the range (6,30) not divisible by 2, 3 and 5, i.e. r = 1, 7, 11, 13, 17, 19, 23, 29. So, r is 1 or a prime number smaller than 30, excepting 2,3,5. However, for the next formula 210*k+r, the list of reminders r will also contain a couple of composite numbers divisible by 11, 13 and 17 and totally 48 reminders. The latter formula 210*k+r represents 48/210 (less than 23%) of all integers and contains all the prime factors. Such formulas can be seen as the sieve of Eratosthenes in a functional way – exactly what is needed in template metaprogramming.

Preliminaries – compile-time arrays

My original implementation goes back before C++11 and is based on Loki::Typelist by Andrey Alexandrescu (book Modern C++ design). Those compile-time arrays can be used to save the reminders of the formulas as well as the final factorization. In C++11, the variadic templates becomes handy and can be used as compile-time arrays too. What is need is a holder typelist. Here I follow the notation of Eric Niebler. A natural operation on the arrays is insertion of new entries from both sides as well as concatenation of two arrays

template<typename ...List> struct typelist;

template <typename List1, typename List2> struct typelist_cat;

template <typename ...List1, typename ...List2>
struct typelist_cat<typelist<List1...>, typelist<List2...>>
{
  using type = typelist<List1..., List2...>;
};
template <typename T, typename ...List2>
struct typelist_cat<T, typelist<List2...>>
{
  using type = typelist<T, List2...>;
};
template <typename T, typename ...List1>
struct typelist_cat<typelist<List1...>,T>
{
  using type = typelist<List1... ,T>;
};

template<typename ...List> struct typelist;

template <typename List1, typename List2> struct typelist_cat;

template <typename ...List1, typename ...List2>

struct typelist_cat<typelist<List1...>, typelist<List2...>>

{

using type = typelist<List1..., List2...>;

};

template <typename T, typename ...List2>

struct typelist_cat<T, typelist<List2...>>

{

using type = typelist<T, List2...>;

};

template <typename T, typename ...List1>

struct typelist_cat<typelist<List1...>,T>

{

using type = typelist<List1... ,T>;

};

As a holder for a compile-time integer and a pair (factor, power) I define two simple class templates

typedef unsigned long int_t;
template<int_t N>
struct sint {
  static const int_t value = N;
};
template<typename T1, typename T2>
struct spair {
  typedef T1 first;
  typedef T2 second;
};

typedef unsigned long int_t;

template<int_t N>

struct sint {

static const int_t value = N;

};

template<typename T1, typename T2>

struct spair {

typedef T1 first;

typedef T2 second;

};

I intentionally don’t want to misuse std::tuple to emphasize a purely compile-time programming.

Implementation

A formula of the type qk+r_i should start with k=1. The main class template has therefore two tasks:

1) Process all the prime factors less than (q+r). They are given as InitialPrimesList

template<typename Num, 
unsigned int Q = StartQ,
typename StartList = InitialPrimesList>
struct factorization;

// Factorization using trial division based on InitialPrimesList
template<int_t N, unsigned int Q, typename H, typename ...Tail>
struct factorization<sint<N>, Q, typelist<H,Tail...> >
{
  static const int_t candidate = H::value;
  using trial = try_factor<N, candidate>;
  static const int_t P = trial::power;
  using T = spair<sint<candidate>, sint<P> >;
  using nextN = sint<N/trial::factor>;
  using next = typename factorization<nextN,Q,typelist<Tail...>>::type;
  using type = typename std::conditional<(P > 0), 
     typename typelist_cat<T, next>::type, next>::type;
};

template<typename Num,

unsigned int Q = StartQ,

typename StartList = InitialPrimesList>

struct factorization;

// Factorization using trial division based on InitialPrimesList

template<int_t N, unsigned int Q, typename H, typename ...Tail>

struct factorization<sint<N>, Q, typelist<H,Tail...> >

{

static const int_t candidate = H::value;

using trial = try_factor<N, candidate>;

static const int_t P = trial::power;

using T = spair<sint<candidate>, sint<P> >;

using nextN = sint<N/trial::factor>;

using next = typename factorization<nextN,Q,typelist<Tail...>>::type;

using type = typename std::conditional<(P > 0),

typename typelist_cat<T, next>::type, next>::type;

};

2) Start a loop over k

// Start factorization loop with k=1
template<int_t N, unsigned int Q>
struct factorization<sint<N>, Q, typelist<> > 
{
  static const int_t candidate = Q + 1;
  using RList = typename select_factors<N,Q,1,Reminders>::type;
  using type = typename factor_loop<N,Q,1,RList,(candidate*candidate > N)>::type;
};

// End of factorization
template<unsigned int Q, typename H, typename ...List>
struct factorization<sint<1>, Q, typelist<H, List...>> {
  using type = typelist<>;
};
template<unsigned int Q>
struct factorization<sint<1>, Q, typelist<>> {
  using type = typelist<>;
};

// Start factorization loop with k=1

template<int_t N, unsigned int Q>

struct factorization<sint<N>, Q, typelist<> >

{

static const int_t candidate = Q + 1;

using RList = typename select_factors<N,Q,1,Reminders>::type;

using type = typename factor_loop<N,Q,1,RList,(candidate*candidate > N)>::type;

};

// End of factorization

template<unsigned int Q, typename H, typename ...List>

struct factorization<sint<1>, Q, typelist<H, List...>> {

using type = typelist<>;

};

template<unsigned int Q>

struct factorization<sint<1>, Q, typelist<>> {

using type = typelist<>;

};

The actual loop over k-values and the set of prime reminders r_i is realized in the next class template. N is currently factorized integer. It is reduced, if a factor has been found. RList is a typelist<…> containing the prime reminders r_i.

template<int_t N, unsigned int Q, int_t K, typename RList, bool doExit = false>
struct factor_loop;

template<int_t N, unsigned int Q, int_t K, typename H, typename ...Tail>
struct factor_loop<N,Q,K,typelist<H,Tail...>,false>
{
  static const int_t candidate = Q*K + H::value;
  using trial = try_factor<N, candidate>;
  using T = spair<sint<candidate>, sint<trial::power> >;
  using nextIter = typename factor_loop<N/trial::factor, Q, K, typelist<Tail...>, 
                            (candidate*candidate > N)>::type;
  using type = typename std::conditional<(trial::power > 0), 
               typename typelist_cat<T,nextIter>::type, nextIter>::type;
};

template<int_t N, unsigned int Q, int_t K, typename RList, bool doExit = false>

struct factor_loop;

template<int_t N, unsigned int Q, int_t K, typename H, typename ...Tail>

struct factor_loop<N,Q,K,typelist<H,Tail...>,false>

{

static const int_t candidate = Q*K + H::value;

using trial = try_factor<N, candidate>;

using T = spair<sint<candidate>, sint<trial::power> >;

using nextIter = typename factor_loop<N/trial::factor, Q, K, typelist<Tail...>,

(candidate*candidate > N)>::type;

using type = typename std::conditional<(trial::power > 0),

typename typelist_cat<T,nextIter>::type, nextIter>::type;

};

The template loop below goes first through RList and then increments K, if RList is empty and the exit condition is still not reached.

template<int_t N, unsigned int Q, int_t K>
struct factor_loop<N,Q,K,typelist<>,false>
{
  static const int_t candidate = Q*(K+1) + 1;
  using RList = typename select_factors<N,Q,K+1,Reminders>::type;
  using type = typename factor_loop<N,Q,K+1,RList,(candidate*candidate > N)>::type;
};

template<int_t N, unsigned int Q, int_t K>

struct factor_loop<N,Q,K,typelist<>,false>

{

static const int_t candidate = Q*(K+1) + 1;

using RList = typename select_factors<N,Q,K+1,Reminders>::type;

using type = typename factor_loop<N,Q,K+1,RList,(candidate*candidate > N)>::type;

};

An exit condition occurs, if the square of a candidate factor becomes greater than the current N. That can happen either with an empty or non-empty RList.

// Different exit conditions
template<int_t N, unsigned int Q, int_t K, typename RList>
struct factor_loop<N,Q,K,RList,true>
: public factor_loop<N,Q,K,typelist<>,true> {};

template<int_t N, unsigned int Q, int_t K>
struct factor_loop<N,Q,K,typelist<>,true>
{
  using type = typelist<spair<sint<N>, sint<1>>>;
};

template<unsigned int Q, int_t K, typename RList>
struct factor_loop<1,Q,K,RList,true> { using type = typelist<>; };

template<unsigned int Q, int_t K>
struct factor_loop<1,Q,K,typelist<>,true> { using type = typelist<>; };

// Different exit conditions

template<int_t N, unsigned int Q, int_t K, typename RList>

struct factor_loop<N,Q,K,RList,true>

: public factor_loop<N,Q,K,typelist<>,true> {};

template<int_t N, unsigned int Q, int_t K>

struct factor_loop<N,Q,K,typelist<>,true>

{

using type = typelist<spair<sint<N>, sint<1>>>;

};

template<unsigned int Q, int_t K, typename RList>

struct factor_loop<1,Q,K,RList,true> { using type = typelist<>; };

template<unsigned int Q, int_t K>

struct factor_loop<1,Q,K,typelist<>,true> { using type = typelist<>; };

Besides std::conditional from the standard header <type_traits>, there are two class templates still to define:

1) try_factor is a simple loop doing trial division of N by the given factor so many times as N is divisible. It returns power and the full factor candidate^power, even if power=0.

template<int_t N, int_t Factor, bool C = (N % Factor == 0)>
struct try_factor;
  
template<int_t N, int_t Factor>
struct try_factor<N, Factor, true> {
  typedef try_factor<N/Factor, Factor> next;
  static const int_t power = next::power + 1;
  static const int_t factor = next::factor * Factor;
};

template<int_t N, int_t Factor> 
struct try_factor<N, Factor, false> {
  static const int_t power = 0;
  static const int_t factor = 1;
};

template<int_t Factor> 
struct try_factor<0, Factor, true> {
  static const int_t power = 0;
  static const int_t factor = 1;
};

template<int_t N, int_t Factor, bool C = (N % Factor == 0)>

struct try_factor;

template<int_t N, int_t Factor>

struct try_factor<N, Factor, true> {

typedef try_factor<N/Factor, Factor> next;

static const int_t power = next::power + 1;

static const int_t factor = next::factor * Factor;

};

template<int_t N, int_t Factor>

struct try_factor<N, Factor, false> {

static const int_t power = 0;

static const int_t factor = 1;

};

template<int_t Factor>

struct try_factor<0, Factor, true> {

static const int_t power = 0;

static const int_t factor = 1;

};

2) select_factors is generally optional. It can be eliminated and Reminders is taken instead of RList. But tests have shown a significant reduction of the compilation time, when for a next k, the list of reminders is sieved before applying test_factor.

template<int_t N, unsigned int Q, int_t K, typename RList>
struct select_factors;

template<int_t N, unsigned int Q, int_t K, typename H, typename ...Tail>
struct select_factors<N,Q,K,typelist<H,Tail...> >
{
  static const int_t candidate = Q*K + H::value;
  using next = typename select_factors<N,Q,K,typelist<Tail...>>::type;
  using type = typename std::conditional<(N % candidate == 0), 
               typename typelist_cat<H,next>::type, next>::type;
};

template<int_t N, unsigned int Q, int_t K>
struct select_factors<N,Q,K,typelist<> > { using type = typelist<>; };

template<int_t N, unsigned int Q, int_t K, typename RList>

struct select_factors;

template<int_t N, unsigned int Q, int_t K, typename H, typename ...Tail>

struct select_factors<N,Q,K,typelist<H,Tail...> >

{

static const int_t candidate = Q*K + H::value;

using next = typename select_factors<N,Q,K,typelist<Tail...>>::type;

using type = typename std::conditional<(N % candidate == 0),

typename typelist_cat<H,next>::type, next>::type;

};

template<int_t N, unsigned int Q, int_t K>

struct select_factors<N,Q,K,typelist<> > { using type = typelist<>; };

Two ‘global’ typelists InitialPrimesList and Reminders are similar. The latter is derived from the first one. For instance, for q=2*3=6, InitialPrimesList should contain all primes less than 6.

static const int_t StartQ = 2*3;
using InitialPrimesList = typelist<sint<2>,sint<3>,sint<5>>;

1 2	static const int_t StartQ = 2*3; using InitialPrimesList = typelist<sint<2>,sint<3>,sint<5>>;

Of course, such a small primes list can be generated for a given q at compile-time, but this is a topic of another article. Reminders is obtained from InitialPrimesList by taking out all the prime factors of q (all primes smaller than q) and adding sint<1> in the beginning. This simplified procedure to get Reminders is valid for formulas 6k+r and 30k+r. Starting with 210k+r Reminders will contain composite numbers, e.g. with 11 and 13 as factors. The class template below eliminates all entries from List that are less or equal to Start:

template<int Start, typename List> 
struct select_reminders;

template<int Start, typename T, typename ...List> 
struct select_reminders<Start, typelist<T, List...>>
{
  using next = typename select_reminders<Start, typelist<List...>>::type;
  using type = typename std::conditional<(Start < T::value), 
               typename typelist_cat<T,next>::type, next>::type;
};

template<int Start, typename T> 
struct select_reminders<Start, typelist<T>>
{
  using type = typelist<T>;
};

using Reminders = typelist_cat<sint<1>, 
                  select_reminders<3,InitialPrimesList>::type>::type;

template<int Start, typename List>

struct select_reminders;

template<int Start, typename T, typename ...List>

struct select_reminders<Start, typelist<T, List...>>

{

using next = typename select_reminders<Start, typelist<List...>>::type;

using type = typename std::conditional<(Start < T::value),

typename typelist_cat<T,next>::type, next>::type;

};

template<int Start, typename T>

struct select_reminders<Start, typelist<T>>

{

using type = typelist<T>;

};

using Reminders = typelist_cat<sint<1>,

select_reminders<3,InitialPrimesList>::type>::type;

Usage

To try out the implementation, a simple output of a typelist containing either numbers in sint<> holder or factorization as a list of spair<> is needed:

template<typename List> struct typelist_out;

template<typename T, typename ...Args>
struct typelist_out<typelist<T, Args...> >
{
  static void print(std::ostream& os = std::cout, const char sep = '\t') 
  {
    os << T::value << sep;
    typelist_out<typelist<Args...>>::print(os, sep);
  }
};

template<typename T1, typename T2, typename ...Args>
struct typelist_out<typelist<spair<T1,T2>, Args...> >
{
  static void print(std::ostream& os = std::cout, const char sep = '\t') 
  {
    os << T1::value << "^" << T2::value << sep;
    typelist_out<typelist<Args...>>::print(os, sep);
  }
};

template<>
struct typelist_out<typelist<>>
{
  static void print(std::ostream& os = std::cout, const char sep = '\t') 
  { 
    os << std::endl;
  }
};

template<typename List> struct typelist_out;

template<typename T, typename ...Args>

struct typelist_out<typelist<T, Args...> >

{

static void print(std::ostream& os = std::cout, const char sep = '\t')

{

os << T::value << sep;

typelist_out<typelist<Args...>>::print(os, sep);

}

};

template<typename T1, typename T2, typename ...Args>

struct typelist_out<typelist<spair<T1,T2>, Args...> >

{

static void print(std::ostream& os = std::cout, const char sep = '\t')

{

os << T1::value << "^" << T2::value << sep;

typelist_out<typelist<Args...>>::print(os, sep);

}

};

template<>

struct typelist_out<typelist<>>

{

static void print(std::ostream& os = std::cout, const char sep = '\t')

{

os << std::endl;

}

};

… And finally the program:

static const int_t N = 12345;
int main(int argc, char *argv[])
{
    std::cout << "Compile-time factorization of " << N << ": ";
    typelist_out<factorization<sint<N> >::type>::print();
    return 0;
}

static const int_t N = 12345;

int main(int argc, char *argv[])

{

std::cout << "Compile-time factorization of " << N << ": ";

typelist_out<factorization<sint<N> >::type>::print();

return 0;

}

Conclusion

Does this implementation work? For which values of N? Try it yourself by copying the code pieces from here in a right order and compile them with a C++11 capable compiler. Don’t forget to set a higher template depth for a large N, include <iostream> and <type_traits> headers as well as reorganize classes in a proper order. Or get the full code including an implementation using Loki::Typelist from github.

It was very surprisingly to me, that the worst case (the largest prime) in the range of unsigned int – 4294967291, could be tried within a few seconds of compilation time using g++ 4.8.3. Even more interesting, that the factorization of larger numbers from the range of unsigned long depends rather on RAM availability than the compilation time. More detailed benchmarks will be given in the next article.

The stuff here can be seen as a metaprogramming fun, but I came to this task during my work on GFFT library. The library performs fast Furier transforms of size N, where N is a compile-time integer constant. I started with the simplest and fastest case N=2^P, but later moved to a general size N, which had to be factorized. Sure, it is worthy to do that at compile-time. And I didn’t know, if a compile-time factorization will work for large numbers. That made this work even more exciting.

Get the source code to the article

References

Hans Riesel. Prime Numbers and Computer Methods for Factorization. 2nd Edition, 2011
Andrei Alexandrescu. Modern C++ Design, Generic Programming and Design Patterns Applied. 2001

Scientific C++

Modern C++ in Scientific Computing

Compile-time factorization

2 thoughts on “Compile-time factorization”

Leave a Reply Cancel reply