Breaking the Pi World Record?
Note: this post should be understandable to anyone who is a computer power user. It is meant to supplement Mr. Alexander Yee’s y-cruncher, not replace it. See his website for the crucial technical details required to set world records.
Difficulties of mathematical constants: (Excerpt from y-cruncher v0.7.8.9506, Mr. Yee thankfully let me post this)
Compute a Constant: (in ascending order of difficulty to compute)
# Constant Value Approximate Difficulty*
Fast Constants:
0 Sqrt(n) 1.46
1 Golden Ratio = 1.618034... 1.46
2 e = 2.718281... 3.88 / 3.88
Moderate Constants:
3 Pi = 3.141592... 13.2 / 19.9
4 Log(n) > 35.7
5 Zeta(3) (Apery's Constant) = 1.202056... 62.8 / 65.7
6 Catalan's Constant = 0.915965... 78.0 / 105.
7 Lemniscate = 5.244115... 60.4 / 124. / 154.
Slow Constants:
8 Euler-Mascheroni Constant = 0.577215... 383. / 574.
Other:
9 Euler-Mascheroni Constant (parameter override)
10 Custom constant with user-defined formula.
*Actual numbers will vary. Radix conversion = 1.00
If you have not read the First Post and Second Post on the significance and algorithms of mathematical constants, read them first to understand the algorithms and the significance of the main mathematical constants.
Note that more mathematical constants are available through the custom formula files shipped with the executable, but they involve more complicated math. If you really want to set custom formula records, you will learn more about them as you research them.
Introduction
Of the people I have contacted in relation to my ongoing world record project, all of them have either set a world record for Pi or will definitely set one with the highest professionalism and optimization.
Dr. Ian Cutress also did not hide his intent to pursue a world record for Pi, perhaps in collaboration with the gaming PC reviewers LinusTechTips and Gamers Nexus. But a world record is a world record. Setting one requires serious resources. If it didn’t, it wouldn’t be called one, and everyone would be setting them every day. So what is required to set these records?
What’s Required
We are going to consider 100 trillion digits of Pi with the Chudnovsky (1988) (reduced memory) algorithm, which is a sensible next step from the current record of 50 trillion digits set by Timothy Mullican in Jan 2020.
Verification of the digits is done using the BBP spigot algorithm, so the computing resources needed for verification are negligible (under a week on a desktop CPU).
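To make the verification step more concrete, here is a minimal sketch of BBP-style digit extraction in Python. It uses ordinary double-precision floats, so it is only trustworthy for a handful of hexadecimal digits at small positions. It illustrates the idea and is not how y-cruncher or any record verification is actually implemented. The function names `_series` and `pi_hex_digits` are made up for this example.

```python
def _series(j: int, n: int) -> float:
    """Fractional part of sum over k >= 0 of 16^(n-k) / (8k + j)."""
    total = 0.0
    # Terms with k <= n: modular exponentiation keeps the numerators small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Terms with k > n: they shrink by a factor of 16 each step.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digits(position: int, count: int = 8) -> str:
    """Hex digits of Pi starting `position` places after the point (0-based)."""
    x = (4 * _series(1, position) - 2 * _series(4, position)
         - _series(5, position) - _series(6, position)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        digit = int(x)
        digits.append("0123456789ABCDEF"[digit])
        x -= digit
    return "".join(digits)


if __name__ == "__main__":
    # Pi in hexadecimal starts 3.243F6A88 85A308D3 ...
    print(pi_hex_digits(0))
    print(pi_hex_digits(8))
```

The point of the formula is that digits at a given position can be computed without computing the digits before it, which is why checking the tail of a record-size run costs so little compared with the main computation.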
1. Mediocre, relatively recent server CPUs with around 28-36 cores. This is still a lot of cores, but it isn't high-end, and the computation still won't saturate them because of the I/O bottleneck. Raw CPU performance is not important; what matters is the maximum RAM and the number of PCIe lanes for secondary storage that the CPU supports. Thus it is normally better to use multiple sockets with fewer cores each.
2. At least 768 GB of RAM as the absolute minimum, and at least 2-3 times that if the objective is to finish in under 6 months rather than in over a year.
3. 440 TiB = 483.785 TB of high-speed RAID storage, and 166 TiB = 182.519 TB to store the generated digits (this doesn't need to be additional).
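The TiB-to-TB figures in the list above come from the difference between binary and decimal prefixes. A small sketch of the conversion follows, with `tib_to_tb` being a helper name chosen for this example:

```python
TIB_BYTES = 2 ** 40   # 1 TiB (binary prefix)
TB_BYTES = 10 ** 12   # 1 TB (decimal prefix, as printed on drive labels)


def tib_to_tb(tib: float) -> float:
    """Convert tebibytes to terabytes."""
    return tib * TIB_BYTES / TB_BYTES


if __name__ == "__main__":
    for tib in (440, 166):
        print(f"{tib} TiB = {tib_to_tb(tib):.3f} TB")
```

This matters when shopping for drives, because a drive sold as 8 TB holds only about 7.28 TiB.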
Hardware Costs
Let’s start by simply summing up the hardware in this specification, which is the cost of starting from scratch. This is how the Pi world record in 2010 by systems engineer Shigeru Kondo and y-cruncher developer Alexander J. Yee was set: by assembling computers and connecting many hard drives manually.
RAM: One 64 GB DDR4 PC4-23400 ECC REG Samsung RAM stick costs about $400 on Amazon. 32 of them (if we can even fit all of them on the server motherboard) cost $12800 and sum to 2.048 TB of RAM. One 128 GB DDR4 PC4-21300 ECC REG RAM stick costs around $1100. 16 of them cost $17600 and also sum to 2.048 TB of RAM. Intel Optane DIMMs (only for Intel Xeon Scalable CPUs) cost at least $2100 per 256 GB DIMM. 8 of them plus 8 64 GB DDR4 RAM sticks cost $20000 and sum to 2.56 TB, so this can also be an option when there is a lack of RAM slots.
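The three RAM options can be compared side by side with a short script. The prices and DIMM counts are the ones quoted above; the option labels and the `summarize` helper are just names picked for this sketch.

```python
# Each option is a list of (DIMM size in GB, DIMM count, price per DIMM in USD).
RAM_OPTIONS = {
    "32 x 64 GB DDR4": [(64, 32, 400)],
    "16 x 128 GB DDR4": [(128, 16, 1100)],
    "8 x 256 GB Optane + 8 x 64 GB DDR4": [(256, 8, 2100), (64, 8, 400)],
}


def summarize(parts):
    """Return (capacity in GB, total cost in USD, DIMM slots used)."""
    capacity_gb = sum(size * count for size, count, _ in parts)
    cost_usd = sum(count * price for _, count, price in parts)
    slots = sum(count for _, count, _ in parts)
    return capacity_gb, cost_usd, slots


if __name__ == "__main__":
    for label, parts in RAM_OPTIONS.items():
        capacity_gb, cost_usd, slots = summarize(parts)
        print(f"{label}: {capacity_gb} GB, ${cost_usd}, {slots} slots, "
              f"${cost_usd / capacity_gb:.2f}/GB")
```

The 64 GB sticks are the cheapest per GB but need 32 slots, which is the trade-off that decides which option fits a given motherboard.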
CPU: Timothy Mullican used 4x Xeon E7-4880 v2 CPU sockets for the record and had around 50% multicore efficiency. The closest benchmark I could find for this system has a Passmark score of 34472. Since the benchmarked E7-4890 v2 is a faster-clocked version of the E7-4880 v2, Mullican's system can be expected to perform similarly or slightly worse.
For single-processor options, Intel Xeon W-3275 (28 cores with AVX-512 and 6-channel DDR4-2933, max 1 TB and 12 DIMMs, 64 PCIe lanes) costs $4537.90. AMD EPYC 7502P (32 cores with AVX2 and 8-channel DDR4-3200, max 2 TB, 128 PCIe lanes) costs $2300. AMD EPYC 7502P theoretically works, so let’s keep it.
For multiple-processor options, two Xeon Gold 6238 CPUs, each costing $2612, sum to $5224 and 44 cores, and two EPYC 7352 CPUs, each costing $1350, sum to $2700 and 48 cores. Older-generation used CPUs would cost less, but the CPU does not impact the total cost greatly compared to the other components.
Storage: If the requirement is 440 TiB (about 484 TB), we need more than that in order to use RAID 5/6, which are configurations that tolerate one or two disk failures respectively. One WD 8 TB shuckable external hard drive costs $140 but is slower than 7200 RPM HDDs; 70 of them cost $9800. A normal 8 TB HDD with a higher I/O speed costs $200; 70 of them cost $14000. We may add NVMe SSDs for faster buffering, and a 4 TB SSD costs $800, so prepare to spend some more money. This configuration can (hopefully) reach parallel I/O speeds of around 5-6 GB/s, and more if the NVMe buffers are configured correctly and meaningfully.
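As a rough sanity check on the storage plan, the sketch below estimates the usable capacity of 70 drives of 8 TB each and how long one full pass over the working set would take. It assumes, purely for illustration, a single array with one (RAID 5) or two (RAID 6) drives' worth of parity and a sustained 5.5 GB/s. A real setup would likely split the drives into several arrays and lose more capacity to parity.

```python
REQUIRED_TB = 440 * 2 ** 40 / 10 ** 12  # 440 TiB expressed in TB


def usable_tb(drives: int, drive_tb: float, parity_drives: int) -> float:
    """Usable capacity of one array that gives up `parity_drives` to parity."""
    return (drives - parity_drives) * drive_tb


def hours_per_pass(data_tb: float, throughput_gb_s: float) -> float:
    """Hours needed to read or write `data_tb` once at a sustained rate."""
    seconds = data_tb * 1000 / throughput_gb_s
    return seconds / 3600


if __name__ == "__main__":
    for name, parity in (("RAID 5", 1), ("RAID 6", 2)):
        capacity = usable_tb(70, 8, parity)
        print(f"{name}: {capacity} TB usable, "
              f"enough for {REQUIRED_TB:.1f} TB: {capacity >= REQUIRED_TB}")
    print(f"One pass at 5.5 GB/s: {hours_per_pass(REQUIRED_TB, 5.5):.1f} hours")
```

Under these assumptions a single pass over the full working set takes about a day, which gives a feel for why I/O rather than CPU dominates the run time.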
Cost for motherboard and RAID connection: Maybe around a couple of thousand dollars plus another few thousand for motherboards and cooling, summing to maybe $4000 or more. Add another $1500 per additional socket.
Total for pure hardware: minimum $33700 with EPYC 7502P and 128 GB DDR4 ECC DIMMs, minimum $30800 with 2x EPYC 7352 and 64 GB DDR4 ECC DIMMs, and minimum $40524 with 2x Xeon Gold 6238 and Optane mixed with DDR4. Note that this is the bare minimum, and real-life parts costs will certainly be higher.
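The three totals can be reproduced from the component prices given earlier. The breakdown below is my reading of how the numbers add up: the cheaper $140 drives, the $4000 estimate for motherboard, RAID and cooling, and $1500 for each socket beyond the first. The constant and function names are invented for this sketch.

```python
STORAGE_USD = 70 * 140        # 70 shuckable 8 TB drives
BOARD_RAID_USD = 4000         # motherboard, RAID connectivity, cooling
EXTRA_SOCKET_USD = 1500       # per socket beyond the first


def build_total(cpu_usd: int, ram_usd: int, sockets: int) -> int:
    """Bare-minimum hardware total for one configuration, in USD."""
    return (cpu_usd + ram_usd + STORAGE_USD + BOARD_RAID_USD
            + EXTRA_SOCKET_USD * (sockets - 1))


BUILDS = {
    "EPYC 7502P + 16 x 128 GB DDR4": build_total(2300, 17600, sockets=1),
    "2x EPYC 7352 + 32 x 64 GB DDR4": build_total(2 * 1350, 12800, sockets=2),
    "2x Xeon Gold 6238 + Optane/DDR4": build_total(2 * 2612, 20000, sockets=2),
}

if __name__ == "__main__":
    for label, total in BUILDS.items():
        print(f"{label}: ${total}")
```

Storage and RAM together make up most of each total, which is why swapping CPUs changes the result so little.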
This does not include electricity. Two CPUs draw under 400 W, closer to 300 W; one draws under 200 W. 70 HDDs draw around 560 W if we assume 8 W per HDD. Without a GPU, the system will draw around 900 W in total. Assuming $0.20/kWh, running for 6 months will cost roughly $800. So the bare minimum total is expected to be around $32000. I would advise preparing around $50000 if starting from scratch. If you have no substantial experience with computer hardware, you will also have to hire someone to set it all up.
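The electricity estimate is simple enough to check in a few lines. The sketch assumes six months means 182.5 days of continuous operation at a constant 900 W draw, and `electricity_cost_usd` is a name chosen for this example.

```python
def electricity_cost_usd(power_w: float, days: float, usd_per_kwh: float) -> float:
    """Cost of running a constant load around the clock for `days` days."""
    kwh = power_w / 1000 * days * 24
    return kwh * usd_per_kwh


if __name__ == "__main__":
    cost = electricity_cost_usd(power_w=900, days=182.5, usd_per_kwh=0.20)
    print(f"Six months at 900 W: about ${cost:.0f}")
    print(f"Cheapest build plus electricity: about ${30800 + cost:.0f}")
```

Electricity is a small fraction of the hardware cost here, so the estimate is not very sensitive to the exact power draw or tariff.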
I don’t think many people would be willing to do this from scratch, apart from some extremely rich people who happen to be interested in mathematics. It is more likely to be done by people who do computing work as a job and thus have access to hardware.
Cloud Computing?
I also thought about cloud computing, since Google’s Emma Haruka Iwao achieved the world record of Pi using the Google Cloud Platform m1-megamem-96 (formerly n1-megamem-96) instance. The upside of cloud computing is that it is scalable and requires far fewer man-hours, which matters if the time of whoever is doing this is expensive. The downside is that we have to use the provided computing node as-is and thus cannot optimize as much as when building our own system.
We can use Ubuntu 18.04 or CentOS 8 so we don’t have to pay for the OS. For GCP, the new m1-ultramem-80 looks perfect. Its price is around $6430 per month with the sustained usage discount. The sole-tenant node m1-node-96-1433, which has less RAM, is around $500 cheaper than m1-ultramem-80 at around $5900 per month. If the computation lasts 6 months, that is already similar to the whole cost of buying all the hardware, but you don’t get to keep or sell the hardware.
The real problem is not the compute node but the storage. Around 460 TiB of storage costs $31260 a month. Even if the real usage cost is approximately 2/3 of that, since peak usage isn’t sustained throughout, it still costs $20840 a month on average.
For AWS, the x1e.16xlarge costs around $9800 a month. Dedicated hosts also exist, but they have inadequate ratios between CPU cores and RAM. An Amazon EBS volume has a maximum size of 16 TB (costing around $700 per month), so we require an average of 15 of them. Thus the total is over $20000 a month.
Both of these fees could be reduced by using a different storage service, but the total would still be far higher than the cost of assembling a system from scratch.
You can now see that setting a Pi world record with cloud providers is a huge expenditure and is impractical unless the resources are sponsored by AWS or Google themselves. Hiring people to set up a system is clearly cheaper.
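To put the cloud numbers next to the build-it-yourself budget, here is a small sketch that totals six months of each option using the monthly prices quoted above. The six-month duration and the $50000 from-scratch budget are the figures used earlier in this post; the variable names are just for this example.

```python
MONTHS = 6
SCRATCH_BUDGET_USD = 50000  # suggested budget for building from scratch

# Monthly cost in USD: compute node plus average storage.
GCP_MONTHLY = 6430 + 20840          # m1-ultramem-80 + storage at 2/3 of peak
AWS_MONTHLY = 9800 + 15 * 700       # x1e.16xlarge + 15 EBS volumes on average

CLOUD_TOTALS = {
    "GCP": GCP_MONTHLY * MONTHS,
    "AWS": AWS_MONTHLY * MONTHS,
}

if __name__ == "__main__":
    for provider, total in CLOUD_TOTALS.items():
        ratio = total / SCRATCH_BUDGET_USD
        print(f"{provider}: ${total} over {MONTHS} months "
              f"({ratio:.1f}x the from-scratch budget)")
```

Even against the generous $50000 budget, both providers come out at more than twice the price, and the gap widens if the run takes longer than planned.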
HPC Clusters
This is the most plausible form of computing resource that could work. Distributed file systems are already deployed and are designed to handle large data at high throughput. There are so many nodes that occupying one node for a month doesn’t have a big impact. This was the case for the world record by Dr. Peter Trueb, which used an HPC cluster supported by Dectris, a manufacturer of high-energy physics research instruments. This is efficient in the sense that many HPC clusters sit idle and can easily be utilized for a few months. File systems like Ceph RADOS Block Device or BeeGFS can provide a very easy solution for RAID-level parallel file I/O once deployed.
A new kind of computing framework for computing Pi could also work. Instead of utilizing secondary storage, we could connect supercomputer nodes using recent high-speed Mellanox InfiniBand and spread the required 500 TB of RAM over hundreds to thousands of nodes, since InfiniBand is definitely faster than SSDs or HDDs. Since supercomputers have high ratios of CPU cores to RAM, this can be inefficient for y-cruncher workloads and leave a lot of CPU cores idle, but it can reduce the total compute time dramatically.
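How many nodes "hundreds to thousands" means depends on the RAM per node. The sketch below works this out for a few per-node sizes. These sizes are hypothetical examples, not the specification of any particular cluster, and the calculation ignores the memory the operating system and communication buffers would need.

```python
REQUIRED_RAM_GB = 500_000  # the 500 TB of RAM mentioned above


def nodes_needed(ram_per_node_gb: int) -> int:
    """Smallest number of nodes whose combined RAM covers the requirement."""
    return -(-REQUIRED_RAM_GB // ram_per_node_gb)  # ceiling division


if __name__ == "__main__":
    for ram_gb in (256, 1024, 4096):  # hypothetical per-node RAM sizes
        print(f"{ram_gb} GB per node: {nodes_needed(ram_gb)} nodes")
```

With typical compute nodes in the 256 GB range this is close to two thousand nodes, so the approach only makes sense on a large machine that can spare that many at once.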
Academic HPC clusters are expected to be very efficient, as long as someone is up for the task. Cost estimates are (obviously) unavailable since every cluster or supercomputer is different.
Conclusion
A world record is a world record. It wouldn’t be called one if everyone could easily do it. I have introduced three main ways that the world record of Pi could be set, and all of them share the same problem: the I/O wall. The 4 GHz power wall for the CPU was once the bottleneck of numerical computations, and it has been worked around by multiprocessing. The I/O bottleneck now stands in the way of any reasonable method of breaking the current record, and it will get harder and harder to set records as long as the speeds of economical media such as HDDs stay this way. SSDs are still expensive, and their write endurance is insufficient for the extreme read/write load that y-cruncher requires. This can cause SSDs to fail during the computation or to be literally single-use. Storage technology must improve for the quest for the world record of Pi to continue as rapidly as before.