Google makes extensive use of SWIG. Greg indicated that SWIG has improved a lot in recent years. All C++ projects at Google have SWIG bindings generated at build time, so Python programmers can take advantage of this work. Greg said that neither Boost nor ctypes was as direct or clean as using SWIG.
RPC is the method Google uses to scale horizontally so well. They have their own internal binary wire format that speaks over HTTP. Programmers can easily speak this format using Java, C++ or Python. Using RPC allows Google to divide computing problems up across large numbers of servers.
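To make the idea concrete, here is a minimal sketch of splitting a job across workers with RPC over HTTP. It is not Google's system: their wire format is internal and binary, so this uses XML-RPC from the Python standard library as a stand-in, and the "workers" are just threads on localhost. The names word_count, start_worker, distributed_word_count and run_demo are made up for illustration.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def word_count(text):
    """Hypothetical unit of work that a worker machine would perform."""
    return len(text.split())


def start_worker():
    # Port 0 asks the OS for any free port; logRequests=False keeps it quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(word_count)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def distributed_word_count(chunks, worker_urls):
    # Hand the chunks out round-robin and sum the partial results.
    total = 0
    for i, chunk in enumerate(chunks):
        worker = ServerProxy(worker_urls[i % len(worker_urls)])
        total += worker.word_count(chunk)
    return total


def run_demo():
    # Two local servers stand in for two separate machines (an assumption).
    workers = [start_worker() for _ in range(2)]
    urls = ["http://127.0.0.1:%d" % w.server_address[1] for w in workers]
    try:
        return distributed_word_count(["a b c", "d e", "f"], urls)
    finally:
        for w in workers:
            w.shutdown()
            w.server_close()
```

Calling run_demo() sends three chunks to two workers and adds up the counts. In a real deployment the URLs would point at different machines, and adding capacity would mean adding URLs to the list.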
Internally Google has been using Python 2.2. It has been hard for them to move forward to 2.3 or 2.4 because of the large number of machines that they have, and they have to maintain compatibility among those machines (I'm assuming this is more of an IT issue, since Python is pretty good at backwards compatibility, but I guess if you deploy 2.4 and start using decorators, any machine running 2.2 will choke). Greg said that they will soon try to move to 2.4.
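The decorator case is easy to illustrate. The "@" syntax arrived in Python 2.4, so a file that uses it fails with a SyntaxError on an older interpreter before any code runs. The sketch below (the logged, add and sub names are made up) shows the 2.4-style form next to the equivalent manual wrapping, which is ordinary function-call syntax and so also parses on older versions.

```python
calls = []


def logged(func):
    """Hypothetical decorator that records the name of each function called."""
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


# Python 2.4+ only: the '@' line is a syntax error on older interpreters.
@logged
def add(a, b):
    return a + b


# Same effect without the '@' syntax, using a plain call and reassignment.
def sub(a, b):
    return a - b
sub = logged(sub)
```

Both functions end up wrapped in exactly the same way; only the spelling differs, which is why mixed fleets tend to stick to the older form until every machine is upgraded.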
Python programmers at Google must follow a strict style guideline (based on PEP 8 with two-space indentation). When engineers are first granted commit access to their SCM system, they must pass a style test. All code must pass through two sets of eyes before being checked in. That, combined with liberal doses of unittest, pychecker and code coverage, eliminates most non-algorithmic issues that might appear in Python code.
Greg commented that the code Google has released as Open Source thus far is not too interesting. But hopefully that will change in the future. He noted that they will probably release their packaging system.
Greg said that Python is really rarely a bottleneck at Google. (With bits going over a network and hitting a database, both of those will have an impact before Python even comes into the picture.) As mentioned previously, MS ported their eShops code to ASP because "Python is interpreted and has to be slow" (not a direct quote, but something like that). When the port to ASP was done, the code was in fact slower!
When programming in Python, one should design for scalability. If you design properly, then you should be able to just throw more machines at the problem and your app should scale fine.
Greg's final point on performance was that you can always write in C/C++ and wrap it if you need to. This argument is an attempt to justify the use of Python to higher-up types who may disapprove of its use. Greg stated (and this is a direct quote), "People have been saying [wrap C/C++] for 10 years. I've never done this once!"
Alex Martelli commented that you can throw more machines at the bandwidth problem, but you can't do that to solve the latency problem.
Someone asked about the use of MySQL at Google. Greg said, "We use it specifically because it is open source". When you buy from a vendor you are subject to the whims of that vendor. Google had an instance where they were using proprietary software and needed a feature to be added. The vendor said no. Google offered to pay for a developer. The vendor said no! Google obviously wasn't happy with this, and open source is one solution. (Not having to pay Oracle a per-CPU license for thousands of machines is probably another reason, but not one that Greg mentioned.)
There was talk about testing and code quality in Python. Greg recommended using code coverage for an interpreted language: once you have coverage on a line, you know it will "compile", much like a C or Java program. So you can get around typos somewhat with that.
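A small sketch of why coverage matters here (the function names and the typo are invented for illustration): Python accepts a misspelled name when the function is defined, and only complains when that particular line executes. A test that never reaches the line never sees the error.

```python
def describe(n):
    if n >= 0:
        return "non-negative"
    # Deliberate typo: 'mesage' is never defined. Python accepts this at
    # definition time; the error only appears when this line actually runs.
    return mesage


def exercise(value):
    """Run describe() and report whether the typo surfaced."""
    try:
        return describe(value)
    except NameError as exc:
        return "NameError: %s" % exc
```

Calling exercise(1) looks healthy because the broken line is never reached, while exercise(-1) hits it and reports the NameError. A coverage report would flag the last line of describe() as unexecuted until a test passes in a negative number.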
Someone asked why Google even used Java at all. Greg appeared to bite his tongue and then said that there are a lot of good Java programmers out there and Google hires a few of them.
On the issue of catching bugs, Greg mentioned that because they are running code on tens of thousands of machines, they see rare bugs A LOT more often than most people do. They've run into obscure kernel bugs that other people rarely encounter. So, he said, Google's production code is quite good, because bugs get exposed early and often.
There was a question on the GIL (global interpreter lock) in python. Greg said he tried to remove the GIL a few years back, but it was a quick hack and slowed some things down. He said that trying to remove it now would prove very difficult (hopefully PyPy helps here). But Greg's suggestion was to use RPC instead of threading.
All in all it was a very good talk, providing interesting insight from a man who has a lot of experience using Python. It was also fun to see where Google is using python inside.
Thanks for taking the time to write this summary.
RPC is the method Google uses to scale horizontally so well. They have
their own internal binary wire format that speaks over http. Programmers
can easily speak this format using Java, C++ or Python. Using RPC allows
Google to divide computing problems up across large numbers of
servers.
>Not true. Google uses 9P, not http.
I would very much like to know how you know that.
Hi. I develop a code generator for the Boost.Python library.
This code generator introduces a few ideas not found in others.
It helps keep support and development time to a minimum.
Hi,