Dr. Aprajit Mahajan, Associate Professor, University of California, Berkeley

Dr. Shekhar Mittal, Postdoc, University of California, Berkeley

About the Study

Improving the state’s ability to tax effectively is increasingly seen as central to the development process, and value added tax (VAT) is often proposed as a key tool for accomplishing this goal. Under the VAT regime, increasing compliance and reducing tax evasion are significant challenges for the state government. One way to evade tax is through fraudulent firms: companies that exist only on paper to provide tax credits to other firms. Researchers plan to apply machine-learning methods to a large network data set (the universe of all tax returns for five years from Delhi) to identify such fraudulent firms, and then use field verification of these predictions to further improve the accuracy of the machine-learning algorithm.

Context and Details of the Evaluation

VAT implementation in many low-compliance environments is plagued by firms generating false paper trails. The demand for false paper trails has led to the creation of fraudulent firms that issue fake receipts to genuine firms, allowing the latter to lower their tax liability. The tax department has documented the existence of these firms through physical inspections. However, identifying such firms by evaluating the available information is a laborious process.

The collaboration between the Finance Department, Government of NCT of Delhi, and researchers from UC Berkeley proposes using “big-data” methods to reduce part of the burden on tax officials in identifying tax evasion in two ways: reducing the effort needed to find such firms and finding more of them. The immediate and long-term goals of the research project include:

  • To build a machine-learning (ML) tool using existing administrative data to identify dealers that have a high probability of evading taxes. Machine learning is an application of artificial intelligence (AI) that gives software the ability to automatically learn and improve from experience without being explicitly programmed.
    • The tool will predict a list of firms that have a high probability of engaging in fraudulent activities.
  • To evaluate this tool in the field by asking tax officials to verify the predictions. These verified predictions will improve the quality of the “training data” and will then be fed back into the ML tool to improve its accuracy. This iterative procedure is key to developing a powerful ML tool.
  • Finally, to integrate the tool within the existing system so that the tool can be used repeatedly on a monthly basis.
    • This includes demonstrating the tool to the internal IT wing and external contracted vendors to help them integrate the tool with the main tax administrative system, and
    • Evaluating the cost-effectiveness of the tool in detecting evasions relative to the existing system of conducting manual checks.
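The iterative procedure described in the goals above (predict, verify in the field, feed the verified cases back into the training data, retrain) can be illustrated with a small sketch. This is not the researchers' tool: the data are synthetic, the two features (`credit_ratio`, `value_added`) and the function names are hypothetical, and a plain logistic regression stands in for whatever model the project actually uses. The hidden `truth` field plays the role of the tax official's inspection result.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, x):
    # Predicted probability that a firm is fraudulent (w[0] is the bias).
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))


def train_logistic(rows, labels, epochs=200, lr=0.5):
    """Fit a logistic regression by batch gradient descent."""
    w = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(rows, labels):
            err = score(w, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w


def make_synthetic_firms(n, rng):
    """Hypothetical firms with two made-up features and a hidden true label."""
    firms = []
    for firm_id in range(n):
        is_bogus = rng.random() < 0.2
        # Assumed pattern, for illustration only: fraudulent firms pass on
        # more tax credit and report less value added.
        credit_ratio = min(1.0, max(0.0, rng.gauss(0.8 if is_bogus else 0.4, 0.15)))
        value_added = min(1.0, max(0.0, rng.gauss(0.1 if is_bogus else 0.5, 0.15)))
        firms.append({"id": firm_id, "x": [credit_ratio, value_added],
                      "truth": int(is_bogus)})
    return firms


def run_feedback_loop(firms, n_seed=30, batch_size=20, rounds=3, seed=0):
    """Train, flag the highest-scoring firms, 'inspect' them, and retrain."""
    rng = random.Random(seed)
    shuffled = firms[:]
    rng.shuffle(shuffled)
    labeled = shuffled[:n_seed]      # firms already inspected in the past
    unlabeled = shuffled[n_seed:]    # firms with no inspection result yet
    hit_rates = []
    for _ in range(rounds):
        w = train_logistic([f["x"] for f in labeled],
                           [f["truth"] for f in labeled])
        unlabeled.sort(key=lambda f: score(w, f["x"]), reverse=True)
        to_inspect, unlabeled = unlabeled[:batch_size], unlabeled[batch_size:]
        # Looking up "truth" stands in for a field visit by a tax official.
        hit_rates.append(sum(f["truth"] for f in to_inspect) / batch_size)
        labeled.extend(to_inspect)   # verified cases join the training data
    return hit_rates, len(labeled)


if __name__ == "__main__":
    firms = make_synthetic_firms(500, random.Random(1))
    rates, n_labeled = run_feedback_loop(firms)
    print("share of flagged firms confirmed per round:", rates)
    print("labeled firms after the last round:", n_labeled)
```

The share of flagged firms that inspection confirms in each round is one simple quantity that could be compared against the hit rate of manual checks when assessing cost-effectiveness, although the study's actual evaluation metric is not specified here.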


Sample - Tax returns for the five years from 2012 to 2017

Location - Delhi

Evaluation ongoing


Design and Developed by NIELIT © Copyright DDCD 2019 Last modified on October 19th, 2019