Scholarly Output


Publications

International Conferences.
Sandy Aoun, Varvara Arzt, Daniel Luger, Georg Vogeler. "Information Extraction from German Medieval Charters Abstracts". 19th Annual International Conference of the Alliance of Digital Humanities Organizations (DH 2024), Washington, D.C., United States, August 6-10, 2024. (Accepted!)
Andreas Habring, Anguelos Nicolaou, Daniel Luger, Florian Atzenhofer-Baumgartner, Florian Lamminger, Franziska Decker, Sandy Aoun, Tamás Kovács, Georg Vogeler, Martin Holler. "Probabilistic Modeling of Chronological Dates to Serve Machines and Scholars". 18th Annual International Conference of the Alliance of Digital Humanities Organizations (DH 2023), Graz, Austria, July 10-14, 2023.
Tamás Kovács, Sandy Aoun, Georg Vogeler, Anguelos Nicolaou, Daniel Luger, Florian Atzenhofer-Baumgartner, Florian Lamminger, Franziska Decker. "Few Shot Classification for Labeling of Medieval and Early Modern Charter Texts". Poster. 18th Annual International Conference of the Alliance of Digital Humanities Organizations (DH 2023), Graz, Austria, July 10-14, 2023.
Daniel Luger, Anguelos Nicolaou, Franziska Decker, Florian Atzenhofer-Baumgartner, Florian Lamminger, Georg Vogeler, Sandy Aoun, Tamás Kovács. "Digital contributions to a 300 years old methodology: Diplomatics & DH". Poster. 18th Annual International Conference of the Alliance of Digital Humanities Organizations (DH 2023), Graz, Austria, July 10-14, 2023.
Georg Vogeler, Anguelos Nicolaou, Daniel Luger, Tamás Kovács, Florian Atzenhofer-Baumgartner, Sandy Aoun, Franziska Decker. "Computational Methods in Studying Late Medieval Charters". Poster. Third Conference on Computational Humanities Research (CHR 2022), Antwerp, Belgium, December 12-14, 2022.
Other Conferences.
Florian Atzenhofer-Baumgartner, Daniel Luger, Tamás Kovács, Johannes Laroche, Anguelos Nicolaou, Franziska Decker, Nicolas Renet, Sandy Aoun, Niklas Tscherne, Georg Vogeler. "Formulaic Language in Diplomatics: Investigating Formulas as Charter Type Discriminators". Conference on Formulaic Language in Historical Research and Data Extraction, Amsterdam, The Netherlands, February 7-9, 2024.
Georg Vogeler, Daniel Luger, Anguelos Nicolaou, Tamás Kovács, Florian Atzenhofer-Baumgartner, Florian Lamminger, Sandy Aoun, Franziska Decker. "Building a virtual research environment to move from digital to distant Diplomatics (ERC project DiDip)". Poster. 9. Tagung des Verbands Digital Humanities im deutschsprachigen Raum (DHd 2023), Belval, Luxembourg and Trier, Germany, March 13-17, 2023.

Other Scholarly Manuscripts

Research Proposal. Sandy Aoun. "Automatic Speech Recognition of Arabic Speech Using Sequence-to-Sequence Models". Submitted to the Grant Research Program, which is jointly supported by the American University of Beirut (AUB) and the National Council for Scientific Research. 2019/2020 Academic Year. [I also prepared and delivered a 30-minute oral presentation.]
It is worth noting that such a proposal can only be submitted by faculty members (full-time AUB professors). A proposal of this kind is usually written by one or more professors, who compete against other AUB professors submitting their own proposals for one of a few available research grants.
Master's Thesis. Sandy Aoun. "Optimal Constitution of the Speech Corpus for the Speech Synthesis of Audiobooks". Lebanese University and University of Toulouse III - M.Sc. Thesis in Computer Science. September 2016. Written in French. [Thesis defense: I also prepared and delivered a 20-minute oral presentation.]
Technical Report. Sandy Aoun. "Conception and Implementation of a JSON to XML Compiler". Lebanese University - Graduate Research Project in Computer Science. May 2015. Written in French. [Project defense: I also prepared and delivered a 30-minute oral presentation.]

Research Software

Constructing Bilad al-Sham Flora Database: I implemented programs that transform unstructured factual text into a structured biological database. Specific information is extracted from encyclopedia-like PDF files covering flora in the Eastern Mediterranean, and the extracted data is then refined into a standardized database. 2021. [Programming language used: Python.]
Building End-to-End ASR Dataset: I carried out an experiment on building datasets suitable for training end-to-end automatic speech recognition (ASR) systems for spoken Arabic dialects. Our proposed automatic dataset collection method has two steps: first, crawling YouTube videos whose Arabic closed captions are provided by the channel owner (the most frequent words in Arabic tweets are used as search keywords); second, passing the videos and their associated captions through several filtering heuristics to ensure a satisfactory outcome. I also developed a program that assesses the effectiveness of our approach by generating relevant statistics. 2020. [Technologies used: Python, Bash, YouTube Search API, SoX.]
Packaging MGB-2 Dataset: I implemented programs that process the MGB-2 dataset [Ali+16] and convert it into a form readable by the pipeline of the high-performance speech recognition framework Wav2letter++ [Pra+19]. 2019. [Technologies used: Python, SoX, Wav2letter++ [Pra+19], Docker.]
Optimal Constitution of TTS Speech Corpus: I carried out an experiment that aims to optimize the construction of the speech corpus for unit selection text-to-speech (TTS) systems. In this context, I implemented a greedy algorithm (spitting strategy) to expose the trade-off between the amount of text to be recorded and the quality of the resulting synthesized speech signals. The implementation is based on our formal theoretical analysis, which draws on concepts from the following domains: Set Cover Problem, Approximation Algorithms, and Linear Algebra. 2016. [Technologies used: Python, IRISA TTS System [Ala+16], ROOTS [CLL14].]
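To illustrate the set-cover view mentioned in this entry, here is a minimal sketch of the textbook greedy approximation for set cover, applied to choosing sentences to record. It is not the thesis code and does not implement the spitting strategy; the function name `greedy_cover`, the sentence ids, and the unit labels are all hypothetical.

```python
def greedy_cover(sentences):
    """Pick sentences until every unit occurring in the corpus is covered.

    `sentences` maps a (hypothetical) sentence id to the set of units,
    e.g. diphones, that the sentence contains.
    """
    uncovered = set().union(*sentences.values())
    chosen = []
    while uncovered:
        # Take the sentence that covers the most still-uncovered units;
        # iterating over sorted ids makes ties resolve to the smallest id.
        best = max(sorted(sentences), key=lambda s: len(sentences[s] & uncovered))
        chosen.append(best)
        uncovered -= sentences[best]
    return chosen


corpus = {"s1": {"a", "b"}, "s2": {"b", "c", "d"}, "s3": {"d"}, "s4": {"a"}}
selection = greedy_cover(corpus)
```

Stopping the loop early, after a fixed number of sentences, would give one point on the trade-off between recorded text and unit coverage.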
Objective Evaluation of Speech Signals: I implemented software that measures the objective distance between natural and synthesized speech signals. In our case, the objective distance is the normalized Dynamic Time Warping cost, computed on the Euclidean distance between the Mel-Generalized Cepstral sequences of the signals. 2016. [Technologies used: Python, SPTK (Speech Signal Processing Toolkit), SoX (Sound eXchange).]
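A minimal sketch of such a distance, assuming each signal has already been converted to a non-empty sequence of feature vectors (the extraction step with SPTK is not shown). The normalization used in the original software is not stated here, so dividing by the sum of the two sequence lengths is an assumption, as is the function name `normalized_dtw`.

```python
import math


def normalized_dtw(x, y):
    """DTW cost between two sequences of feature vectors.

    Assumption: the cost is normalized by len(x) + len(y); both sequences
    must be non-empty and all vectors must have the same dimension.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


natural = [(0.0, 0.0), (1.0, 0.0)]
synthesized = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
distance = normalized_dtw(natural, synthesized)
```

Because the warping path may repeat frames, two sequences that differ only in local timing get a distance of zero.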
JSON to XML Compiler: I implemented a compiler that translates a JSON-formatted document into an equivalent XML-formatted document. The implementation is based on my theoretical analysis, which consisted of first defining a formal grammar and formulating the lexical and syntactic analysis of JSON syntax, then devising a semantic analysis based on a suitable attribute grammar. 2015. [Programming language used: C++.]
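To give an idea of the kind of translation involved, here is a small Python sketch that leans on the standard `json` parser and `xml.etree.ElementTree` rather than a hand-written grammar, so it is a different technique from the C++ compiler described above. The mapping itself is an assumption: object keys become element names, array entries become `item` elements, and the root element is named `root`. Keys that are not valid XML names are not handled.

```python
import json
import xml.etree.ElementTree as ET


def to_element(tag, value):
    """Recursively convert a parsed JSON value into an XML element."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_element(key, child))
    elif isinstance(value, list):
        for child in value:
            elem.append(to_element("item", child))  # hypothetical tag for array entries
    elif isinstance(value, bool):
        elem.text = json.dumps(value)  # keep JSON spelling: true / false
    elif value is not None:
        elem.text = str(value)  # null becomes an empty element
    return elem


def json_to_xml(text, root="root"):
    """Translate a JSON document (string) into an XML string."""
    return ET.tostring(to_element(root, json.loads(text)), encoding="unicode")


xml_text = json_to_xml('{"a": 1, "b": [true, "x"]}')
```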
