We prime-encode the natural numbers via recursive factorisation, iterated to the exponents, generating a corpus of planar rooted trees equivalently represented as Dyck words. This forms a deterministic text endowed with internal rules. Statistical analysis of the corpus reveals that the dictionary and the entropy grow sublinearly, compression shows non-monotonic trend, and the rank-frequency curves assume a stable parabolic form deviating from Zipf’s law. Correlation analysis using mean-squared displacement reveals a transition from normal diffusion to superdiffusion in the associated walk. These findings characterise the tree-encoded sequence as a statistically structured text with long-range correlations grounded in its generative arithmetic law, providing an empirical basis for subsequent theoretical investigations and empirical ones with large language models.
Statistical properties of the rooted-tree encoding of N / Contucci, Pierluigi; Giberti, Claudio; Osabutey, Godwin; Vernia, Cecilia. - In: PHYSICA. A. - ISSN 0378-4371. - 686:(2026), pp. 1-16. [10.1016/j.physa.2026.131361]
Statistical properties of the rooted-tree encoding of N
Giberti, Claudio;Osabutey, Godwin
;Vernia, Cecilia
2026
Abstract
We prime-encode the natural numbers via recursive factorisation, iterated to the exponents, generating a corpus of planar rooted trees equivalently represented as Dyck words. This forms a deterministic text endowed with internal rules. Statistical analysis of the corpus reveals that the dictionary and the entropy grow sublinearly, compression shows non-monotonic trend, and the rank-frequency curves assume a stable parabolic form deviating from Zipf’s law. Correlation analysis using mean-squared displacement reveals a transition from normal diffusion to superdiffusion in the associated walk. These findings characterise the tree-encoded sequence as a statistically structured text with long-range correlations grounded in its generative arithmetic law, providing an empirical basis for subsequent theoretical investigations and empirical ones with large language models.| File | Dimensione | Formato | |
|---|---|---|---|
|
1-s2.0-S037843712600097X-main.pdf
Open access
Tipologia:
VOR - Versione pubblicata dall'editore
Licenza:
[IR] creative-commons
Dimensione
3.84 MB
Formato
Adobe PDF
|
3.84 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate

I metadati presenti in IRIS UNIMORE sono rilasciati con licenza Creative Commons CC0 1.0 Universal, mentre i file delle pubblicazioni sono rilasciati con licenza Attribuzione 4.0 Internazionale (CC BY 4.0), salvo diversa indicazione.
In caso di violazione di copyright, contattare Supporto Iris




