From e72458a3d6818b97b9bdc41f462c23264f5ce481 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Reynir=20Bj=C3=B6rnsson?=
Date: Mon, 5 Feb 2024 12:53:21 +0100
Subject: [PATCH] Update python-str-rep

---
 posts/2024-02-03-python-str-repr.md | 31 +++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/posts/2024-02-03-python-str-repr.md b/posts/2024-02-03-python-str-repr.md
index b3aa15d..5d36f17 100644
--- a/posts/2024-02-03-python-str-repr.md
+++ b/posts/2024-02-03-python-str-repr.md
@@ -264,6 +264,35 @@ Now I can't find anymore failing test cases!
 
 ## Epilogue
 
+What can we learn from this?
+It is easy to say in hindsight that a different representation should have been chosen.
+However, arriving at this insight takes time.
+The exact behavior of `str.__repr__()` is poorly documented.
+Reaching my understanding of `str.__repr__()` took hours of research and reading the C implementation.
+It often doesn't seem worth spending that much time researching a small function.
+Technical debt is a real thing and often hard to predict.
+Below is the output of `help(str.__repr__)`:
+
+```Python
+__repr__(self, /)
+    Return repr(self)
+```
+
+Language and (standard) library designers could consider whether the slightly nicer-looking strings are worth the added complexity that users will eventually rely on - inadvertently or not.
+I do think strings and bytes in Python are a bit too complex.
+It is not easy to reach a language lawyer[^language-lawyer] level of understanding.
+In my opinion, it is a mistake not to at least print a warning if there are illegal escape sequences - especially considering there are escape sequences that are valid in one string literal but not another.
+
+Unfortunately, getting a precise specification often requires looking at the implementation.
+For testing your implementation, hand-written tests are good.
+Testing against the original implementation is great, and if combined with property-based testing or fuzzing you may find failing test cases you couldn't dream up!
+I certainly didn't see it coming that the output depends on the Unicode version.
+As the saying goes, testing can only show the presence of bugs, but for a function whose domain is, in a sense, limited like this one, you can get pretty close to showing the absence of bugs.
+
+I enjoyed working on this.
+Sure, it was frustrating and at times I discovered some ungodly properties, but it's a great feeling to study and understand something at a deeper level.
+It may be the last time I need to understand Python's `str.__repr__()` this well, but if I ever need to again, I now have the OCaml code and this blog post to reread.
+
 If you are curious to read the resulting code you may find it on github at [github.com/reynir/python-str-repr](https://github.com/reynir/python-str-repr).
 I have documented the code to make it more approachable and maintainable by others.
 Hopefully it is not something that you need, but in case it is useful to you it is licensed under a permissive license.
@@ -291,3 +320,5 @@ If you have a project in OCaml or want to port something to OCaml and would like
 The Python code in question is using `str` however.
 
 [^raw-escape-example]: Note I use single quotes for the output. This is what Python would do. It would be equivalent to `"\\\""`.
+
+[^language-lawyer]: [A person, usually an experienced or senior software engineer, who is intimately familiar with many or most of the numerous restrictions and features (both useful and esoteric) applicable to one or more computer programming languages. A language lawyer is distinguished by the ability to show you the five sentences scattered through a 200-plus-page manual that together imply the answer to your question “if only you had thought to look there”.](http://catb.org/jargon/html/L/language-lawyer.html)