Publications

2025

TOSEM
VexIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity

S. VenkataKeerthy, Soumya Banerjee, Sayan Dey, and 5 more authors

ACM Trans. Softw. Eng. Methodol., Mar 2025

Just Accepted

Binary similarity involves determining whether two binary programs exhibit similar functionality, with applications in vulnerability detection, malware analysis, and copyright detection. However, variations in compiler settings, target architectures, and deliberate code obfuscations significantly complicate the similarity measurement by effectively altering the syntax, semantics, and structure of the underlying binary. To address these challenges, we propose VexIR2Vec, a robust, architecture-neutral approach based on VEX-IR to solve binary similarity tasks. VexIR2Vec consists of three key components: a peephole extractor, a normalization engine (VexINE), and an embedding model (VexNet). The process to build program embeddings starts with the extraction of sequences of basic blocks, or peepholes, from control-flow graphs via random walks, capturing structural information. These generated peepholes are then normalized using VexINE, which applies compiler-inspired transformations to reduce architectural and compiler-induced variations. Embeddings of peepholes are generated using representation learning techniques, avoiding Out-Of-Vocabulary (OOV) issues. These embeddings are then fine-tuned with VexNet, a feed-forward Siamese network that maps functions into a high dimensional space for diffing and searching tasks in an application-independent manner. We evaluate VexIR2Vec against five baselines (BinDiff, DeepBinDiff, SAFE, BinFinder, and histograms of opcodes) on a dataset comprising 2.7M functions and 15.5K binaries from 7 projects compiled across 12 compilers targeting x86 and ARM architectures. The experiments span four adversarial settings (cross-optimization, cross-compilation, cross-architecture, and obfuscations) that are typically exploited by malware and vulnerabilities. In diffing experiments, VexIR2Vec outperforms the nearest baseline in these four scenarios by 40%, 18%, 21%, and 60%, respectively. In the searching experiment, VexIR2Vec achieves a mean average precision of 0.76, outperforming the nearest baseline by 46%. Our framework is highly scalable and is built as a lightweight, multi-threaded, parallel library using only open-source tools. VexIR2Vec is approximately 3.1 to 3.5 times faster than the closest baselines and orders-of-magnitude faster than other tools.
@article{venkatakeerthy-2025-VexIR2Vec, author = {VenkataKeerthy, S. and Banerjee, Soumya and Dey, Sayan and Andaluri, Yashas and PS, Raghul and Kalyanasundaram, Subrahmanyam and Pereira, Fernando Magno Quint\~{a}o and Upadrasta, Ramakrishna}, title = {VexIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity}, year = {2025}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, issn = {1049-331X}, url = {https://doi.org/10.1145/3721481}, doi = {10.1145/3721481}, note = {Just Accepted}, journal = {ACM Trans. Softw. Eng. Methodol.}, month = mar, keywords = {Binary Similarity, Program Embedding, Representation Learning}, }
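The abstract above describes extracting peepholes (sequences of basic blocks) from control-flow graphs via random walks. The following is only a minimal sketch of that general idea, not the authors' implementation: the adjacency-list CFG, the function name `extract_peepholes`, and the parameters `walk_length` and `walks_per_block` are all assumptions made for illustration.

```python
import random


def extract_peepholes(cfg, walk_length, walks_per_block, seed=0):
    """Collect sequences of basic-block ids by random walks over a CFG.

    `cfg` maps each block id to a list of successor block ids.
    All names and parameters here are hypothetical, not from the paper.
    """
    rng = random.Random(seed)
    peepholes = []
    for start in cfg:
        for _ in range(walks_per_block):
            walk = [start]
            while len(walk) < walk_length:
                successors = cfg[walk[-1]]
                if not successors:
                    break  # reached a block with no outgoing edge
                walk.append(rng.choice(successors))
            peepholes.append(walk)
    return peepholes


# A toy diamond-shaped CFG: entry branches to then/else, both join at exit.
cfg = {"entry": ["then", "else"], "then": ["exit"], "else": ["exit"], "exit": []}
peepholes = extract_peepholes(cfg, walk_length=3, walks_per_block=2)
```

Each returned walk follows real CFG edges, so the sequences carry some of the graph's structure into a form that a sequence-oriented embedding step could consume.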

2024

CC
The Next 700 ML-Enabled Compiler Optimizations

S. VenkataKeerthy, Siddharth Jain, Umesh Kalvakuntla, and 6 more authors

In Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, Mar 2024

There is a growing interest in enhancing compiler optimizations with ML models, yet interactions between compilers and ML frameworks remain challenging. Some optimizations require tightly coupled models and compiler internals, raising issues with modularity, performance, and framework independence. Practical deployment and transparency for the end-user are also important concerns. We propose ML-Compiler-Bridge to enable ML model development within a traditional Python framework while making end-to-end integration with an optimizing compiler possible and efficient. We evaluate it on both research and production use cases, for training and inference, over several optimization problems, multiple compilers and their versions, and gym infrastructures.
@inproceedings{venkatakeerthy-2024-MLCompilerBridge, author = {VenkataKeerthy, S. and Jain, Siddharth and Kalvakuntla, Umesh and Gorantla, Pranav Sai and Chitale, Rajiv Shailesh and Brevdo, Eugene and Cohen, Albert and Trofin, Mircea and Upadrasta, Ramakrishna}, title = {The Next 700 ML-Enabled Compiler Optimizations}, year = {2024}, isbn = {9798400705076}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3640537.3641580}, doi = {10.1145/3640537.3641580}, booktitle = {Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction}, pages = {238--249}, numpages = {12}, keywords = {Machine Learning for Compiler Optimizations, ONNX, Pipes, TensorFlow AOT, gRPC}, location = {Edinburgh, United Kingdom}, series = {CC 2024}, }

2023

CC
RL4ReAl: Reinforcement Learning for Register Allocation

S. VenkataKeerthy, Siddharth Jain, Anilava Kundu, and 3 more authors

In Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction, Mar 2023

We aim to automate decades of research and experience in register allocation, leveraging machine learning. We tackle this problem by embedding a multi-agent reinforcement learning algorithm within LLVM, training it with state-of-the-art techniques. We formalize the constraints that precisely define the problem for a given instruction-set architecture, while ensuring that the generated code preserves semantic correctness. We also develop a gRPC-based framework providing a modular and efficient compiler interface for training and inference. Our approach is architecture independent: we show experimental results targeting Intel x86 and ARM AArch64. Our results match or outperform the heavily tuned, production-grade register allocators of LLVM.
@inproceedings{venkatakeerthy-2023-RL4ReAl, author = {VenkataKeerthy, S. and Jain, Siddharth and Kundu, Anilava and Aggarwal, Rohit and Cohen, Albert and Upadrasta, Ramakrishna}, title = {RL4ReAl: Reinforcement Learning for Register Allocation}, year = {2023}, isbn = {9798400700880}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3578360.3580273}, doi = {10.1145/3578360.3580273}, booktitle = {Proceedings of the 32nd ACM SIGPLAN International Conference on Compiler Construction}, pages = {133--144}, numpages = {12}, keywords = {Register Allocation, Reinforcement Learning}, location = {Montr\'{e}al, QC, Canada}, series = {CC 2023}, }
arXiv
VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity

S. VenkataKeerthy, Yashas Andaluri, Sayan Dey, and 2 more authors

Dec 2023

We propose VEXIR2Vec, a code embedding framework for finding similar functions in binaries. Our representations rely on VEX IR, the intermediate representation used by binary analysis tools like Valgrind and angr. Our proposed embeddings encode both syntactic and semantic information to represent a function, and are both application and architecture independent. We also propose POV, a custom Peephole Optimization engine that normalizes the VEX IR for effective similarity analysis. We design several optimizations like copy/constant propagation, constant folding, common subexpression elimination, and load-store elimination in POV. We evaluate our framework on two experiments, diffing and searching, involving binaries targeting different architectures, compiled using different compilers and versions, optimization sequences, and obfuscations. We show results on several standard projects and on real-world vulnerabilities. Our results show that VEXIR2Vec achieves superior precision and recall values compared to the state-of-the-art works. Our framework is highly scalable and is built as a multi-threaded, parallel library using only open-source tools. VEXIR2Vec achieves about a 3.2x speedup over the closest competitor, and orders-of-magnitude speedup over other tools.
@misc{venkatakeerthy-2023-vexir2vec, title = {VEXIR2Vec: An Architecture-Neutral Embedding Framework for Binary Similarity}, author = {VenkataKeerthy, S. and Andaluri, Yashas and Dey, Sayan and Banerjee, Soumya and Upadrasta, Ramakrishna}, year = {2023}, eprint = {2312.00507}, archiveprefix = {arXiv}, primaryclass = {cs.PL}, }
APNET
Packet Processing Algorithm Identification using Program Embeddings

S. VenkataKeerthy, Yashas Andaluri, Sayan Dey, and 3 more authors

In Proceedings of the 6th Asia-Pacific Workshop on Networking, Mar 2023

To keep up with network speeds, many recent works propose to offload network functions to SmartNICs. The process involves identifying packet-processing algorithms in a network function program and then offloading them to appropriate accelerators available on SmartNICs. This process is often done manually for each architecture and is error-prone and laborious. In this work, we propose an automated solution to identify algorithms in network function programs. We model our approach as a Machine Learning (ML) classification problem and propose using sophisticated program embeddings for representing the network function programs. We also identify the limited availability of datasets and propose a way of extrapolating them by systematically generating equivalent programs using (existing) compiler transformations in popular compiler infrastructures. Our approach relies on modeling programs as embeddings, uses ML models trained on such extrapolated datasets, and shows superior results over the recent works.
@inproceedings{VenkataKeerthy-2022-PacketID, author = {VenkataKeerthy, S. and Andaluri, Yashas and Dey, Sayan and Shah, Rinku and Tammana, Praveen and Upadrasta, Ramakrishna}, title = {Packet Processing Algorithm Identification using Program Embeddings}, year = {2023}, isbn = {9781450397483}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, url = {https://doi.org/10.1145/3542637.3542649}, doi = {10.1145/3542637.3542649}, booktitle = {Proceedings of the 6th Asia-Pacific Workshop on Networking}, pages = {76--82}, numpages = {7}, keywords = {Machine Learning, Network Function program identification, Program Embeddings, SmartNICs}, location = {Fuzhou, China}, series = {APNet '22}, }

2022

ISPASS
POSET-RL: Phase ordering for Optimizing Size and Execution Time using Reinforcement Learning

Shalini Jain, Yashas Andaluri, S. VenkataKeerthy, and 1 more author

In International Symposium on Performance Analysis of Systems and Software, Mar 2022

The ever-increasing memory requirements of several applications have led to increased demands which might not be met by embedded devices. Constraining the usage of memory in such cases is of paramount importance. It is important that such code size improvements do not have a negative impact on the runtime. Improving the execution time while optimizing for code size is a non-trivial but significant task. The ordering of standard optimization sequences in modern compilers is fixed, and is heuristically created by compiler domain experts based on their expertise. However, this ordering is sub-optimal and does not generalize well across all cases. We present a reinforcement-learning-based solution to the phase ordering problem, where the ordering improves both the execution time and code size. We propose two different approaches to model the sequences: one by manual ordering, and the other based on a graph called the Oz Dependence Graph (ODG). Our approach uses minimal data as the training set and is integrated with LLVM. We show results on x86 and AArch64 architectures on benchmarks from SPEC-CPU 2006, SPEC-CPU 2017, and MiBench. We observe that the proposed model based on ODG outperforms the current Oz sequence in terms of both size and execution time, by 6.19% and 11.99% respectively, on average in SPEC 2017 benchmarks.
@inproceedings{Shalini-2022-PosetRL, title = {{POSET-RL: Phase ordering for Optimizing Size and Execution Time using Reinforcement Learning}}, author = {Jain, Shalini and Andaluri, Yashas and VenkataKeerthy, S. and Upadrasta, Ramakrishna}, booktitle = {{International Symposium on Performance Analysis of Systems and Software}}, year = {2022}, doi = {10.1109/ISPASS55109.2022.00012}, }

LLVM-HPC
Reinforcement Learning assisted Loop Distribution for Locality and Vectorization

Shalini Jain, S. VenkataKeerthy, Rohit Aggarwal, and 3 more authors

In 2022 IEEE/ACM Eighth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), Mar 2022

@inproceedings{Shalini-2022-RLLoopDistribution,
  author = {Jain, Shalini and VenkataKeerthy, S. and Aggarwal, Rohit and Dangeti, Tharun Kumar and Das, Dibyendu and Upadrasta, Ramakrishna},
  booktitle = {2022 IEEE/ACM Eighth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC)},
  title = {Reinforcement Learning assisted Loop Distribution for Locality and Vectorization},
  year = {2022},
  pages = {1--12},
  keywords = {Training;Measurement;Costs;Computational modeling;Pipelines;Reinforcement learning;Benchmark testing;Loop Distribution;Vectorization;Locality;Reinforcement Learning},
  doi = {10.1109/LLVM-HPC56686.2022.00006},
}

2020

TACO
IR2Vec: LLVM IR Based Scalable Program Embeddings

S. VenkataKeerthy, Rohit Aggarwal, Shalini Jain, and 3 more authors

ACM Trans. Archit. Code Optim., Dec 2020

We propose IR2VEC, a concise and scalable encoding infrastructure to represent programs as a distributed embedding in continuous space. This distributed embedding is obtained by combining representation learning methods with flow information to capture the syntax as well as the semantics of the input programs. As our infrastructure is based on the Intermediate Representation (IR) of the source code, the obtained embeddings are both language and machine independent. The entities of the IR are modeled as relationships, and their representations are learned to form a seed embedding vocabulary. Using this infrastructure, we propose two incremental encodings: Symbolic and Flow-Aware. Symbolic encodings are obtained from the seed embedding vocabulary, and Flow-Aware encodings are obtained by augmenting the Symbolic encodings with the flow information. We show the effectiveness of our methodology on two optimization tasks (heterogeneous device mapping and thread coarsening). Our way of representing the programs enables us to use non-sequential models, resulting in orders-of-magnitude faster training. Both encodings generated by IR2VEC outperform the existing methods in both tasks, even while using simple machine learning models. In particular, our results improve or match the state-of-the-art speedup in 11/14 benchmark-suites in the device mapping task across two platforms and 53/68 benchmarks in the thread coarsening task across four different platforms. When compared to the other methods, our embeddings are more scalable, are not data-hungry, and have better Out-Of-Vocabulary (OOV) characteristics.
@article{VenkataKeerthy-2020-IR2Vec, author = {VenkataKeerthy, S. and Aggarwal, Rohit and Jain, Shalini and Desarkar, Maunendra Sankar and Upadrasta, Ramakrishna and Srikant, Y. N.}, title = {{IR2Vec: LLVM IR Based Scalable Program Embeddings}}, year = {2020}, issue_date = {December 2020}, publisher = {Association for Computing Machinery}, address = {New York, NY, USA}, volume = {17}, number = {4}, issn = {1544-3566}, url = {https://doi.org/10.1145/3418463}, doi = {10.1145/3418463}, journal = {ACM Trans. Archit. Code Optim.}, month = dec, articleno = {32}, numpages = {27}, keywords = {heterogeneous systems, representation learning, compiler optimizations, LLVM, intermediate representations}, }

2019

IJESDF
Secure Gray code-based reversible data hiding scheme in radiographic images

B. Karthikeyan, S. VenkataKeerthy, and G. Hariharan

International Journal of Electronic Security and Digital Forensics, Dec 2019

Transmitting medical information through a network for the purpose of tele-diagnosis involves a greater risk of losing confidentiality and integrity of the information being transmitted. This paper presents a scheme that ensures reversibility of the cover image and also makes it suitable for the field of telemedicine. The methodology uses cryptographic and steganographic methods. The proposed work decreases the overhead by reducing the size of the auxiliary data to be embedded, which is used to achieve the reversibility of the cover image. The proposed method also improves security of the data and enhances the image quality. The algorithm yields a reversible data hiding (RDH) scheme based on pixel value ordering (PVO). The methodology differs from other basic schemes as it uses Gray code instead of ordinary binary codes. It is naturally suited to medical steganography, as the carrier image can be reconstructed after extraction of the secret data and the distortion caused by embedding is very low. The method is also robust, as a one-time pad cryptographic technique is used to generate the key.
@article{Karthikeyan-2019-reversibleDataHiding, title = {{Secure Gray code-based reversible data hiding scheme in radiographic images}}, author = {Karthikeyan, B. and VenkataKeerthy, S. and Hariharan, G.}, journal = {International Journal of Electronic Security and Digital Forensics}, volume = {11}, number = {3}, year = {2019}, }
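The abstract above notes that the scheme uses Gray code instead of ordinary binary codes. As background only, and not a reproduction of the paper's data hiding scheme, here is a short sketch of the standard reflected binary Gray code conversion; the function names `to_gray` and `from_gray` are chosen for this illustration.

```python
def to_gray(n: int) -> int:
    """Convert a non-negative integer to its reflected binary Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

The defining property is that consecutive integers map to code words differing in exactly one bit, and the mapping is fully invertible, which is the kind of property a reversible scheme can rely on.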

2018

P4WE, ICNP
P4LLVM: An LLVM Based P4 Compiler

Tharun Kumar Dangeti*, S. VenkataKeerthy*, and Ramakrishna Upadrasta

In P4WE workshop, International Conference on Network Protocols (ICNP), Dec 2018

We propose P4LLVM, an LLVM based P4 compiler for achieving better optimizations to improve the runtime performance of the network. The front-end of P4LLVM converts P4-16’s code to LLVM’s Intermediate Representation (IR). This IR is passed through various optimizations of LLVM and is translated to JSON for targeting a BMV2 Switch. We show the performance improvements obtained by running LLVM optimization passes in P4LLVM when compared to P4C.
@inproceedings{Dangeti-2018-p4llvm, title = {{P4LLVM: An LLVM Based P4 Compiler}}, author = {Dangeti, Tharun Kumar and VenkataKeerthy, S. and Upadrasta, Ramakrishna}, booktitle = {{P4WE workshop, International Conference on Network Protocols (ICNP)}}, year = {2018}, }
IoTSMS
INSTRUCT: A Clustering Based Identification of Valid Communications in IoT Networks

Mohd Saalim Jamal, S. VenkataKeerthy, Hideya Ochiai, and 2 more authors

In International Conference on Internet of Things: Systems, Management and Security, Dec 2018

Providing access control to IoT devices is an essential task in today’s ever-growing IoT network. IoT devices are deployed in smart homes, smart buildings, social infrastructures, etc. Illegitimate users or malware should be denied access to these devices to protect the sensitive information they collect and the login privileges of their operating systems. This paper proposes INSTRUCT, a mechanism for providing access control by identifying valid communication in a network consisting of IoT devices using clustering techniques. INSTRUCT uses the fact that IoT devices usually communicate with a fixed set of hosts/servers repetitively. By capturing the network traffic and learning patterns from it, this mechanism allows the automatic generation of an access control list that can be deployed at the intermediate network switches. INSTRUCT proposes two different algorithms, for TCP and UDP respectively. These algorithms are applied to two different IoT networks for evaluation. A signature-based manual analysis is used as the reference against which the automatically generated access control list is compared. In our experiments, INSTRUCT achieved an accuracy of 100% relative to the signature-based analysis in identifying valid TCP communication. In the case of UDP, it is close to 95%.
@inproceedings{Jamal-2018-instruct, title = {{INSTRUCT: A Clustering Based Identification of Valid Communications in IoT Networks}}, author = {Jamal, Mohd Saalim and VenkataKeerthy, S. and Ochiai, Hideya and Esaki, Hiroshi and Kataoka, Kotaro}, booktitle = {{International Conference on Internet of Things: Systems, Management and Security}}, year = {2018}, }

2015

ICIIECS
A hybrid technique for quadrant based data hiding using Huffman coding

S. VenkataKeerthy, T. K. C. Rhishi Kishore, B. Karthikeyan, and 2 more authors

In International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS), Dec 2015

The paper proposes a robust steganography technique to hide data in an image. The proposed method uses Huffman coding to minimize the number of bits to be embedded and to improve the security of the information. Security is further improved by using a cryptographic substitution cipher and quadrant-based embedding of the data. The quadrant-based embedding helps distribute the data bits uniformly over the entire image rather than concentrating them in a particular region. The quality of the stego image and the embedding capacity are also improved by the use of Huffman coding. The LSB embedding technique is used in the algorithm for concealing the data in the image.
@inproceedings{VenkataKeerthy-2015-datahiding, title = {{A hybrid technique for quadrant based data hiding using Huffman coding}}, author = {VenkataKeerthy, S. and Rhishi Kishore, T. K. C. and Karthikeyan, B. and Vaithiyanathan, V. and Anishin Raj, M. M.}, booktitle = {{International Conference on Innovations in Information, Embedded and Communication Systems (ICIIECS)}}, year = {2015}, }
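The abstract above mentions LSB embedding as the concealment step. The sketch below shows only plain least-significant-bit embedding over a flat list of 8-bit pixel values; it is an illustrative assumption and omits the paper's Huffman coding, substitution cipher, and quadrant-based distribution. The names `embed_lsb` and `extract_lsb` are hypothetical.

```python
def embed_lsb(pixels, bits):
    """Write each payload bit into the least significant bit of a pixel."""
    if len(bits) > len(pixels):
        raise ValueError("payload does not fit in the cover pixels")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return stego


def extract_lsb(pixels, count):
    """Read back the first `count` embedded bits."""
    return [p & 1 for p in pixels[:count]]


cover = [200, 13, 255, 0]
payload = [1, 0, 0, 1]
stego = embed_lsb(cover, payload)
recovered = extract_lsb(stego, len(payload))
```

Because only the lowest bit changes, each pixel value moves by at most one intensity level, which is why LSB embedding tends to leave the cover image visually close to the original.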

* denotes equal contribution